AI & ML news: Week 6–12 May

AlphaFold 3, OpenAI wants to create its own search engine, and so on

Salvatore Raieli
14 min read · May 14, 2024


Photo by Ariana Tafur on Unsplash

The most interesting news, repositories, articles, and resources of the week

Check and star this repository where the news will be collected and indexed:

You will find the news first in GitHub. Single posts are also collected here:

Weekly AI and ML news - each week the best of the field



  • Auto-Encoding Morph-Tokens for Multimodal LLM. Researchers have created “Morph-Tokens” to enhance AI’s capacity for image creation and visual comprehension. These tokens take advantage of the sophisticated processing capabilities of the MLLM framework to convert abstract notions required for comprehension into intricate graphics for image creation.
  • Introducing AlphaFold 3. In a paper published in Nature, we introduce AlphaFold 3, a revolutionary model that can predict the structure and interactions of all life’s molecules with unprecedented accuracy. For the interactions of proteins with other molecule types we see at least a 50% improvement compared with existing prediction methods, and for some important categories of interaction, we have doubled prediction accuracy.
  • ImageInWords: Unlocking Hyper-Detailed Image Descriptions. An extraordinarily detailed coupling of images and text was produced via a novel labeling technique that made use of two passes of VLMs. Strong multimodal models can be trained with the help of the captions, which include significantly more detail than any previous dataset.
  • Navigating Chemical Space with Latent Flows. ChemFlow is a new framework that uses deep generative models to rapidly navigate chemical space, improving molecular science.
  • Consistency Large Language Models: A Family of Efficient Parallel Decoders. One intriguing line of ongoing research is predicting many tokens at once. If it works, generation times for many large language models would be significantly reduced. This post’s method aims to accelerate generation by using a parallel decoding mechanism on fine-tuned LLMs, akin to consistency models from image synthesis. Initial results show roughly a 3x speedup, comparable to speculative decoding.
  • You Only Cache Once: Decoder-Decoder Architectures for Language Models. The decoder-decoder YOCO architecture maintains global attention capabilities while using less GPU memory. It is made up of a cross-decoder and a self-decoder, which enable effective key-value pair caching and reuse. With notable gains in throughput, latency, and inference memory over standard Transformers, YOCO performs favorably and is well suited to large language models and long context lengths.
The technology found that a painting listed for $599,000 on eBay as a ‘Monet’ had a ‘high probability’ of not being authentic. https://www.theguardian.com/artanddesign/article/2024/may/08/fake-monet-and-renoir-on-ebay-among-counterfeits-identified-using-ai#img-1
  • Gemma-10M Technical Overview. A technical overview of Gemma-10M, a Gemma variant extended to a context length of 10M tokens.
  • Vision Mamba: A Comprehensive Survey and Taxonomy. A thorough examination of Mamba’s uses in a range of visual tasks and its changing significance. Keep up with the latest discoveries and developments about the Mamba project.
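The parallel decoding idea mentioned in the Consistency LLM item above can be illustrated with a toy example. This is a minimal sketch, assuming a Jacobi-style fixed-point iteration as the parallel decoding scheme; `toy_next_token` is a hypothetical stand-in for a greedy LLM step, not any real model or library API. It shows that refining all positions at once reaches the same output as token-by-token greedy decoding. The speedup depends on the model being fine-tuned to converge in few iterations, which this toy does not attempt to reproduce.

```python
from typing import Callable, List, Sequence, Tuple


def toy_next_token(prefix: Sequence[int]) -> int:
    """Hypothetical stand-in for a greedy LLM step (deterministic in the prefix)."""
    return (sum(prefix) * 31 + len(prefix)) % 50


def greedy_decode(prompt: Sequence[int], n: int,
                  step: Callable[[Sequence[int]], int] = toy_next_token) -> List[int]:
    """Standard autoregressive decoding: one token per step, n sequential steps."""
    out = list(prompt)
    for _ in range(n):
        out.append(step(out))
    return out[len(prompt):]


def jacobi_decode(prompt: Sequence[int], n: int,
                  step: Callable[[Sequence[int]], int] = toy_next_token
                  ) -> Tuple[List[int], int]:
    """Jacobi-style decoding: guess all n tokens, then refine them together.

    Each position is recomputed from the previous iterate only, so in a real
    model the n updates of one iteration could share a single batched forward
    pass. Iteration stops when the guess no longer changes (a fixed point).
    """
    guess = [0] * n  # arbitrary initial guess for every position
    iterations = 0
    while True:
        iterations += 1
        new = [step(list(prompt) + guess[:i]) for i in range(n)]
        if new == guess:
            return new, iterations
        guess = new


if __name__ == "__main__":
    prompt = [3, 1, 4]
    tokens, iters = jacobi_decode(prompt, 8)
    print(tokens == greedy_decode(prompt, 8), iters)
```

In the worst case the fixed point is reached after n refinement passes plus one confirming pass, which is no better than sequential decoding. The appeal of the approach is that a suitably trained model can fix several tokens correctly per pass.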


  • Lamini Raises $25M For Enterprises To Develop Top LLMs In-House. Lamini, an enterprise AI platform, lets software teams within enterprises create new LLM capabilities that reduce hallucinations on proprietary data, run their LLMs securely from cloud VPCs to on-premise, and scale their infrastructure with model evaluations that put ROI and business outcomes ahead of hype. Amplify Partners led the $25 million Series A financing round.
  • Microsoft-backed OpenAI may launch search, taking on Google’s ‘biggest product’. Speculation in the tech world suggests that OpenAI is gearing up for a major announcement, possibly a new search engine. According to Jimmy Apples, who reports the claim as an insider, the company is planning an event this month (May), tentatively scheduled for May 9, 2024, at 10 am.
  • OpenAI Model Spec. This is the first draft of the Model Spec, a document that specifies the desired behavior for our models in the OpenAI API and ChatGPT. It includes a set of core objectives, as well as guidance on how to deal with conflicting objectives or instructions.
  • AI engineers report burnout and rushed rollouts as ‘rat race’ to stay competitive hits the tech industry. Artificial intelligence engineers at top tech companies told CNBC that the pressure to roll out AI tools at breakneck speed has come to define their jobs. They say that much of their work is assigned to appease investors rather than to solve problems for end users and that they are often chasing OpenAI. Burnout is an increasingly common theme as AI workers say their employers are pursuing projects without regard for the technology’s effect on climate change, surveillance, and other potential real-world harms.
  • Stable Artisan: Media Generation and Editing on Discord. Stable Artisan enables media generation on Discord powered by Stability AI’s cutting-edge image and video models, Stable Diffusion 3, Stable Video Diffusion, and Stable Image Core. In addition to media generation, Stable Artisan offers tools to edit your creations like Search and Replace, Remove Background, Creative Upscale, and Outpainting.
  • ElevenLabs previews a music-generating AI model. Voice AI startup ElevenLabs is offering an early look at a new model that turns a prompt into song lyrics. To raise awareness, it is following a playbook similar to the one Sam Altman used when OpenAI introduced Sora, its video-generating AI: soliciting ideas on social media and turning them into lyrics.
  • Sources: Mistral AI raising at a $6B valuation, SoftBank ‘not in’ but DST is. Paris-based Mistral AI, a startup working on open-source large language models — the building block for generative AI services — has been raising money at a $6 billion valuation, three times its valuation in December, to compete more keenly against the likes of OpenAI and Anthropic, TechCrunch has learned from multiple sources.


  • Prometheus-Eval. GPT-4 is widely used as a judge for evaluating generation quality. Built upon Mistral, Prometheus is an open model that excels at this particular purpose.
  • Bonito. Bonito is an open-source model for conditional task generation: the task of converting unannotated text into task-specific training datasets for instruction tuning. This repo is a lightweight library for Bonito to easily create synthetic datasets built on top of the Hugging Face transformers and vllm libraries.
  • Penzai. Penzai is a JAX library that provides clear, useful Pytree structures for training and interpreting models. It comes with a wide range of tools for component analysis, debugging, and model visualization. Penzai is easy to install and use, and it offers comprehensive tutorials for learning how to create and interact with neural networks.
  • LeRobot. LeRobot aims to provide models, datasets, and tools for real-world robotics in PyTorch. The goal is to lower the barrier to entry to robotics so that everyone can contribute and benefit from sharing datasets and pre-trained models. LeRobot contains state-of-the-art approaches that have been shown to transfer to the real world with a focus on imitation learning and reinforcement learning.
  • Vibe-Eval. A benchmark for evaluating multimodal chat models, including especially challenging examples.
  • DeepSeek-V2-Chat. DeepSeek-V2 is a strong Mixture-of-Experts (MoE) language model characterized by economical training and efficient inference. It comprises 236B total parameters, of which 21B are activated for each token. Compared with DeepSeek 67B, DeepSeek-V2 achieves stronger performance while saving 42.5% of training costs, reducing the KV cache by 93.3%, and boosting the maximum generation throughput to 5.76 times.
  • Visual Reasoning Benchmark. The ability of vision-language models to comprehend and interact with text and visuals is developing quickly, as demonstrated by GPT-4V. A recent study, however, reveals important limits in their visual deductive reasoning. Using challenging visual puzzles similar to those in IQ tests, researchers assessed these models and found that they had trouble with multi-step reasoning and abstract pattern recognition.
  • AI Index: State of AI in 13 Charts. In the new report, foundation models dominate, benchmarks fall, prices skyrocket, and on the global stage, the U.S. overshadows.
  • Buzz Pretraining Dataset. Preference data is a new addition to the pretraining mix in Buzz. Multiple models that were trained on this data have also been made available by its researchers. They discovered that the models show good results on several tasks related to human preferences.
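To put the DeepSeek-V2 figures quoted above in perspective, here is a small back-of-the-envelope calculation. It is only a sketch: the constants are the numbers reported in the item (236B total parameters, 21B activated, 93.3% KV cache reduction, 42.5% training cost saving relative to DeepSeek 67B), and the function name and dictionary keys are hypothetical, chosen for illustration.

```python
# Figures as quoted in the DeepSeek-V2 item above.
TOTAL_PARAMS_B = 236.0        # total parameters, in billions
ACTIVE_PARAMS_B = 21.0        # parameters activated per token, in billions
KV_CACHE_REDUCTION = 0.933    # fraction of KV cache removed vs. DeepSeek 67B
TRAINING_COST_SAVING = 0.425  # fraction of training cost saved vs. DeepSeek 67B


def deepseek_v2_summary() -> dict:
    """Derive a few ratios from the reported figures (illustrative helper)."""
    kv_remaining = 1.0 - KV_CACHE_REDUCTION
    return {
        # Share of the model that does work for any single token.
        "active_fraction": ACTIVE_PARAMS_B / TOTAL_PARAMS_B,
        # How many times smaller the KV cache is than the baseline's.
        "kv_cache_shrink_factor": 1.0 / kv_remaining,
        # Training cost relative to the baseline (1.0 = same cost).
        "relative_training_cost": 1.0 - TRAINING_COST_SAVING,
    }


if __name__ == "__main__":
    for name, value in deepseek_v2_summary().items():
        print(f"{name}: {value:.3f}")
```

Read this way, fewer than one in ten parameters is active per token, and the KV cache is roughly fifteen times smaller than the baseline's, which is where most of the inference efficiency claimed in the item would come from.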


  • From Baby Talk to Baby A.I. Could a better understanding of how infants acquire language help us build smarter A.I. models?
  • The AI Hardware Dilemma. Even though recent AI-powered hardware releases, such as the Humane Pin and Rabbit R1, have drawn criticism, the industry is still receiving a lot of venture capital investment, and well-known individuals like Sam Altman are considering making sizable investments. The appeal is in AI’s ability to transform consumer hardware through the innovative use of sensors, silicon, and interfaces. Hardware startups find it difficult to compete with well-established tech giants, and AI still needs to evolve before it can offer a compelling alternative to flexible smartphones.
  • AI Prompt Engineering Is Dead. Automating prompt optimization for AI models points to more effective, model-driven prompt generation techniques in the future, possibly rendering human prompt engineering unnecessary.
  • The Next Big Programming Language Is English. GitHub Copilot Workspace is a robust programming tool that allows users to code in plain English via the browser, from planning to implementation. It is currently available in a limited technical preview. In contrast to ChatGPT, the AI easily integrates with codebases, suggesting block-by-block code execution and managing complex tasks with less active user interaction.
  • Is AI lying to me? Scientists warn of growing capacity for deception. Researchers find instances of systems double-crossing opponents, bluffing, pretending to be human, and modifying behavior in tests.

Meme of the week

What do you think about it? Did any news capture your attention? Let me know in the comments.

If you have found this interesting:

You can look for my other articles, and you can also connect with or reach me on LinkedIn. Check this repository containing weekly updated ML & AI news. I am open to collaborations and projects. You can also subscribe for free to get notified when I publish a new story.

Here is the link to my GitHub repository, where I am collecting code and many resources related to machine learning, artificial intelligence, and more.

or you may be interested in one of my recent articles:



Salvatore Raieli

Senior data scientist | about science, machine learning, and AI. Top writer in Artificial Intelligence