WEEKLY AI NEWS: RESEARCH, NEWS, RESOURCES, AND PERSPECTIVES
AI & ML news: Week 6–12 May
AlphaFold 3, OpenAI wants to create its own search engine, and more
The most interesting news, repositories, articles, and resources of the week
Check and star this repository where the news will be collected and indexed:
You will find the news first on GitHub. Single posts are also collected here:
Research
- Mantis: Interleaved Multi-Image Instruction Tuning. A newly developed dataset and trained visual language model that enable better instruction following across sequences of interleaved images.
- FeNNol: an Efficient and Flexible Library for Building Force-field-enhanced Neural Network Potentials. A state-of-the-art library called FeNNol makes it easier to create and use hybrid neural network potentials in molecular simulations.
- Spider: A Unified Framework for Context-dependent Concept Understanding. Spider is a revolutionary unified paradigm intended to improve comprehension of context-dependent (CD) concepts that rely largely on visual contexts, like medical lesions and items concealed in the environment.
- Frequency-mixed Single-source Domain Generalization for Medical Image Segmentation. A novel algorithm known as RaffeSDG has been created by researchers to enhance the precision of medical imaging models when evaluating data from various sources.
- SlotGAT: Slot-based Message Passing for Heterogeneous Graph Neural Network. SlotGAT is a new approach that improves heterogeneous graph neural networks by addressing the semantic mixing issue in traditional message passing.
- Frequency Masking for Universal Deepfake Detection. This novel technique detects deepfakes by focusing on masked image modeling, particularly in the frequency domain. The strategy differs from conventional approaches and shows a notable improvement in recognizing synthetic images, even those produced by recently developed generative AI techniques.
- Auto-Encoding Morph-Tokens for Multimodal LLM. Researchers have created “Morph-Tokens” to enhance AI’s capacity for image creation and visual comprehension. These tokens take advantage of the sophisticated processing capabilities of the MLLM framework to convert abstract notions required for comprehension into intricate graphics for image creation.
- Introducing AlphaFold 3. In a paper published in Nature, we introduce AlphaFold 3, a revolutionary model that can predict the structure and interactions of all life’s molecules with unprecedented accuracy. For the interactions of proteins with other molecule types we see at least a 50% improvement compared with existing prediction methods, and for some important categories of interaction, we have doubled prediction accuracy.
- ImageInWords: Unlocking Hyper-Detailed Image Descriptions. An extraordinarily detailed coupling of images and text was produced via a novel labeling technique that made use of two passes of VLMs. Strong multimodal models can be trained with the help of the captions, which include significantly more detail than any previous dataset.
- Inf-DiT: Upsampling Any-Resolution Image with Memory-Efficient Diffusion Transformer. To get beyond memory constraints in the creation of ultra-high-resolution images, a novel diffusion model presents a unidirectional block attention mechanism.
- DocRes: A Generalist Model Toward Unifying Document Image Restoration Tasks. A novel model called DocRes handles five tasks in one system: de-warping, deshadowing, appearance enhancement, deblurring, and binarization, making document image restoration easier.
- QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving. QoQ is a quantization approach that combines 4-bit weights, 8-bit activations, and a 4-bit KV cache to accelerate large language model inference (a minimal quantization sketch follows this list).
- Navigating Chemical Space with Latent Flows. ChemFlow is a new framework that uses deep generative models to rapidly navigate chemical space, improving molecular science.
- Consistency Large Language Models: A Family of Efficient Parallel Decoders. Predicting many tokens at once is an intriguing line of ongoing research: if it works, generation times for many large language models would drop significantly. The method in this post fine-tunes LLMs for a parallel decoding scheme, akin to consistency models from image synthesis. Initial results show roughly a 3x speedup, on par with speculative decoding.
- You Only Cache Once: Decoder-Decoder Architectures for Language Models. The decoder-decoder YOCO architecture maintains global attention capabilities while using less GPU memory. It consists of a self-decoder and a cross-decoder, which allow key-value pairs to be cached once and reused. With notable gains in throughput, latency, and inference memory over standard Transformers, YOCO performs favorably and is well suited to large language models and long context lengths.
- Optimal Group Fair Classifiers from Linear Post-Processing. This innovative post-processing approach ensures compliance with many group fairness criteria, including statistical parity, equal opportunity, and equalized odds, by recalibrating output scores after imposing a “fairness cost” to address model bias.
- DiffMatch: Visual-Language Guidance Makes Better Semi-supervised Change Detector. DiffMatch is a new semi-supervised change detection technique that generates pseudo labels for unlabeled data by using visual language models, hence offering extra supervision signals.
- Gemma-10M Technical Overview. A technical overview of Gemma-10M, a community project that extends Gemma 2B to a context window of roughly 10 million tokens using recurrent local attention while keeping memory requirements modest.
- Vision Mamba: A Comprehensive Survey and Taxonomy. A thorough examination of Mamba's applications across a range of visual tasks and its evolving significance, tracking the latest findings and developments around the Mamba architecture.
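To make the 4-bit weight side of the QServe entry above a bit more concrete, here is a minimal, hypothetical NumPy sketch of symmetric per-group int4 weight quantization. It only illustrates the general idea of group-wise scales and 4-bit codes; it is not QoQ's actual algorithm, which also quantizes activations to 8 bits and the KV cache to 4 bits and relies on system-level co-design.

```python
import numpy as np

def quantize_int4_grouped(weights: np.ndarray, group_size: int = 128):
    """Symmetric per-group 4-bit quantization of a 2D weight matrix (illustrative only)."""
    rows, cols = weights.shape
    assert cols % group_size == 0, "columns must be divisible by group_size"
    grouped = weights.reshape(rows, cols // group_size, group_size)

    # One scale per group: the largest magnitude maps to the int4 limit (7).
    scales = np.abs(grouped).max(axis=-1, keepdims=True) / 7.0
    scales = np.where(scales == 0, 1.0, scales)  # avoid division by zero

    codes = np.clip(np.round(grouped / scales), -8, 7).astype(np.int8)
    return codes, scales

def dequantize_int4_grouped(codes: np.ndarray, scales: np.ndarray) -> np.ndarray:
    """Reconstruct an approximate float matrix from int4 codes and per-group scales."""
    grouped = codes.astype(np.float32) * scales
    return grouped.reshape(grouped.shape[0], -1)

if __name__ == "__main__":
    w = np.random.randn(4, 256).astype(np.float32)
    codes, scales = quantize_int4_grouped(w)
    w_hat = dequantize_int4_grouped(codes, scales)
    print("max abs reconstruction error:", float(np.abs(w - w_hat).max()))
```

The reconstruction error stays small because each group of 128 weights gets its own scale; real W4A8KV4 pipelines add calibration, activation quantization, and fused kernels on top of this basic idea.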
News
- Lamini Raises $25M For Enterprises To Develop Top LLMs In-House. Lamini, an enterprise AI platform, lets software teams build LLM capabilities that reduce hallucinations on proprietary data, run models securely from cloud VPCs to on-premise, and scale infrastructure with model evaluations that put ROI and business outcomes ahead of hype. Amplify Partners led the $25 million Series A round.
- Microsoft-backed OpenAI may launch a search engine, taking on Google’s ‘biggest product’. Speculation in the tech world suggests that OpenAI is gearing up for a major announcement, possibly a new search engine. According to Jimmy Apples, a purported insider, the company is planning an event this month (May), tentatively scheduled for May 9, 2024, at 10 am.
- An AI-controlled fighter jet took the Air Force leader for a historic ride. What that means for war. AI marks one of the biggest advances in military aviation since the introduction of stealth in the early 1990s, and the Air Force has aggressively leaned in. Even though the technology is not fully developed, the service is planning for an AI-enabled fleet of more than 1,000 unmanned warplanes, the first of them operating by 2028.
- Stack Overflow and OpenAI Partner to Strengthen the World’s Most Popular Large Language Models. Stack Overflow and OpenAI today announced a new API partnership that combines the collective strengths of the world’s leading knowledge platform for highly technical content with the world’s most popular LLMs for AI development.
- Elon Musk’s Plan For AI News. Musk emails with details on AI-powered news inside X. An AI bot will summarize news and commentary, sometimes looking through tens of thousands of posts per story.
- Microsoft says it did a lot for responsible AI in the inaugural transparency report. The report covers its responsible AI achievements in 2023 but doesn’t talk about Mario flying a plane to the Twin Towers.
- Cohere’s Command R Model Family is Now Available In Amazon Bedrock. Command R Model Family is now available in Amazon Bedrock.
- Fake Monet and Renoir on eBay among 40 counterfeits identified using AI. Paintings identified as fake using cutting-edge technology are the ‘tip of the iceberg’, specialist Dr Carina Popovici says.
- ‘A chilling prospect’: should we be scared of AI contestants on reality shows? Netflix’s hit show The Circle recently introduced an AI chatbot contestant, a potentially worrying sign of where we’re heading.
- ‘ChatGPT for CRISPR’ creates new gene-editing tools. In the never-ending quest to discover previously unknown CRISPR gene-editing systems, researchers have scoured microbes in everything from hot springs and peat bogs to poo and even yogurt. Now, thanks to advances in generative artificial intelligence (AI), they might be able to design these systems with the push of a button.
- Microsoft Working on ‘Far Larger’ In-House AI Model. Microsoft is reportedly working on a new, in-house artificial intelligence (AI) model that is “far larger” than the other open-source models it has trained.
- Apple unveils M4: Its first chip made for AI from the ground up. Apple on Tuesday unveiled M4, the next generation of its Apple Silicon chip. Built on a 3-nanometer process, the M4 is the first Apple chip designed for AI from the ground up. M4 powers the new generation iPad Pro and will soon be inside Macs.
- OpenAI Model Spec. This is the first draft of the Model Spec, a document that specifies the desired behavior for our models in the OpenAI API and ChatGPT. It includes a set of core objectives, as well as guidance on how to deal with conflicting objectives or instructions.
- AI engineers report burnout and rushed rollouts as ‘rat race’ to stay competitive hits the tech industry. Artificial intelligence engineers at top tech companies told CNBC that the pressure to roll out AI tools at breakneck speed has come to define their jobs. They say that much of their work is assigned to appease investors rather than to solve problems for end users and that they are often chasing OpenAI. Burnout is an increasingly common theme as AI workers say their employers are pursuing projects without regard for the technology’s effect on climate change, surveillance, and other potential real-world harms.
- The teens making friends with AI chatbots. Teens are opening up to AI chatbots as a way to explore friendship. But sometimes, the AI’s advice can go too far.
- GPT-2-Chatbot Confirmed As OpenAI. The gpt-2-chatbot recently appeared in the LMSYS arena; information uncovered from OpenAI’s API via a 429 rate-limit error confirmed that it is a new OpenAI model.
- OpenAI Is Readying a Search Product to Rival Google, Perplexity. The feature would let ChatGPT users search the web and cite sources in its results.
- DatologyAI raises $46M Series A. The data curation platform adds to its September $11 million seed round, with the goal of growing its workforce and advancing corporate development.
- Yellow raises $5M from A16z for Gen AI-powered 3D modeling tool. Yellow has raised $5 million in seed funding from A16z Games to fund further development of its Gen AI-powered 3D modeling tool. With its YellowSculpt tool, artists can generate clean, pre-rigged 3D character meshes based on a text prompt in under three minutes.
- Stable Artisan: Media Generation and Editing on Discord. Stable Artisan enables media generation on Discord powered by Stability AI’s cutting-edge image and video models, Stable Diffusion 3, Stable Video Diffusion, and Stable Image Core. In addition to media generation, Stable Artisan offers tools to edit your creations like Search and Replace, Remove Background, Creative Upscale, and Outpainting.
- ElevenLabs previews a music-generating AI model. Voice AI startup ElevenLabs is offering an early look at a new model that turns a prompt into song lyrics. To raise awareness, it’s following a similar playbook Sam Altman used when OpenAI introduced Sora, its video-generating AI, soliciting ideas on social media and turning them into lyrics.
- Sources: Mistral AI raising at a $6B valuation, SoftBank ‘not in’ but DST is. Paris-based Mistral AI, a startup working on open-source large language models — the building block for generative AI services — has been raising money at a $6 billion valuation, three times its valuation in December, to compete more keenly against the likes of OpenAI and Anthropic, TechCrunch has learned from multiple sources.
- Leaked Deck Reveals How OpenAI Is Pitching Publisher Partnerships. The generative artificial intelligence firm OpenAI has been pitching partnership opportunities to news publishers through an initiative called the Preferred Publishers Program, according to a deck obtained by ADWEEK and interviews with four industry executives.
- Alibaba rolls out the latest version of its large language model to meet robust AI demand. Alibaba Cloud on Thursday said its large language model has seen more than 90,000 deployments in companies across industries. Alibaba Cloud said the latest version of its Tongyi Qianwen model, Qwen2.5, possesses “remarkable advancements in reasoning, code comprehension, and textual understanding compared to its predecessor Qwen2.0.”
Resources
- Prometheus-Eval. GPT-4 is widely used as a judge for evaluating generation quality. Prometheus, built on Mistral, is an open model that excels at this particular purpose.
- Bonito. Bonito is an open-source model for conditional task generation: the task of converting unannotated text into task-specific training datasets for instruction tuning. This repo is a lightweight library for Bonito to easily create synthetic datasets built on top of the Hugging Face transformers and vllm libraries.
- Penzai. Penzai is a JAX library that provides clear, useful Pytree structures for training and interpreting models. It comes with a wide range of tools for component analysis, debugging, and model visualization. Penzai is easy to install and use, and it offers comprehensive tutorials for learning how to create and interact with neural networks.
- Realtime Video Stream Analysis with Computer Vision. This in-depth article shows you how to create a system that generates reports on the density of vehicle traffic. It counts cars over time using state-of-the-art computer vision.
- DOCCI — Descriptions of Connected and Contrasting Images. A great new dataset from Google that contains detailed and comprehensive labels.
- Unsloth.ai: Easily finetune & train LLMs. An animation by Unsloth’s founder demonstrating how the team builds kernels, designs API surfaces, and uses PyTorch. The Unsloth framework and library are remarkably robust and user-friendly.
- LeRobot. LeRobot aims to provide models, datasets, and tools for real-world robotics in PyTorch. The goal is to lower the barrier to entry to robotics so that everyone can contribute and benefit from sharing datasets and pre-trained models. LeRobot contains state-of-the-art approaches that have been shown to transfer to the real world with a focus on imitation learning and reinforcement learning.
- Vibe-Eval. A benchmark for evaluating multimodal chat models, including especially challenging examples.
- DeepSeek-V2-Chat. DeepSeek-V2 is a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference (see the routing sketch after this list). It comprises 236B total parameters, of which 21B are activated for each token. Compared with DeepSeek 67B, DeepSeek-V2 achieves stronger performance while saving 42.5% of training costs, reducing the KV cache by 93.3%, and boosting the maximum generation throughput to 5.76 times.
- Visual Reasoning Benchmark. Vision-language models’ ability to understand and interact with both text and images is developing quickly, as demonstrated by GPT-4V. A recent study reveals their significant limitations in visual deductive reasoning: using challenging visual puzzles similar to those found in IQ tests, researchers found that these models struggle with multi-step reasoning and abstract pattern recognition.
- AI Index: State of AI in 13 Charts. In the new report, foundation models dominate, benchmarks fall, prices skyrocket, and on the global stage, the U.S. overshadows.
- Buzz Pretraining Dataset. Preference data is a new addition to the pretraining mix in Buzz. Multiple models that were trained on this data have also been made available by its researchers. They discovered that the models show good results on several tasks related to human preferences.
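To give a flavor of the Mixture-of-Experts design mentioned in the DeepSeek-V2 entry above (only a small fraction of parameters is activated per token), here is a minimal, hypothetical top-k routing layer in PyTorch. The expert count, dimensions, and top-k value are placeholders for illustration and do not reflect DeepSeek-V2's actual configuration, which also uses shared experts and multi-head latent attention not shown here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    """Minimal top-k Mixture-of-Experts feed-forward layer (illustrative only)."""

    def __init__(self, d_model: int = 64, d_hidden: int = 256,
                 n_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts)  # token -> expert scores
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_hidden), nn.GELU(),
                          nn.Linear(d_hidden, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (tokens, d_model)
        scores = self.router(x)                          # (tokens, n_experts)
        weights, idx = scores.topk(self.top_k, dim=-1)   # pick k experts per token
        weights = F.softmax(weights, dim=-1)             # normalize over chosen experts

        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, k] == e                    # tokens routed to expert e in slot k
                if mask.any():
                    out[mask] += weights[mask][:, k:k + 1] * expert(x[mask])
        return out

if __name__ == "__main__":
    layer = TopKMoE()
    tokens = torch.randn(10, 64)
    print(layer(tokens).shape)  # torch.Size([10, 64])
```

Routing each token through only top_k of n_experts experts is what lets MoE models of this kind hold a very large total parameter count while activating only a small fraction of it per token.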
Perspectives
- From Baby Talk to Baby A.I. Could a better understanding of how infants acquire language help us build smarter A.I. models?
- The AI Hardware Dilemma. Even though recent AI-powered hardware releases such as the Humane Pin and Rabbit R1 have drawn criticism, the category is still attracting heavy venture capital investment, and well-known figures like Sam Altman are considering sizable investments. The appeal lies in AI’s potential to transform consumer hardware through the innovative use of sensors, silicon, and interfaces. Still, hardware startups find it difficult to compete with well-established tech giants, and AI itself needs to mature before these devices can offer a compelling alternative to flexible smartphones.
- AI Prompt Engineering Is Dead. Automating prompt optimization for AI models points to more effective, model-driven prompt generation techniques in the future, possibly rendering human prompt engineering unnecessary.
- The Next Big Programming Language Is English. GitHub Copilot Workspace is a robust programming tool that allows users to code in plain English via the browser, from planning to implementation. It is currently available in a limited technical preview. In contrast to ChatGPT, the AI easily integrates with codebases, suggesting block-by-block code execution and managing complex tasks with less active user interaction.
- Is AI lying to me? Scientists warn of growing capacity for deception. Researchers find instances of systems double-crossing opponents, bluffing, pretending to be human, and modifying behavior in tests.
Meme of the week
What do you think? Did any of this week’s news capture your attention? Let me know in the comments.
If you have found this interesting:
You can look for my other articles, and you can also connect with or reach me on LinkedIn. Check this repository containing weekly updated ML & AI news. I am open to collaborations and projects, and you can also subscribe for free to get notified when I publish a new story.
Here is the link to my GitHub repository, where I am collecting code and many resources related to machine learning, artificial intelligence, and more.
or you may be interested in one of my recent articles: