WEEKLY AI NEWS: RESEARCH, NEWS, RESOURCES, AND PERSPECTIVES

ML news: Week 18–24 March

Stable Diffusion maker leaves Stability AI, Microsoft devours Inflection AI, and much more

Salvatore Raieli
20 min read · Mar 25, 2024
Photo by Priscilla Du Preez 🇨🇦 on Unsplash

The most interesting news, repositories, articles, and resources of the week

Check and star this repository where the news will be collected and indexed:

You will find the news first on GitHub. Single posts are also collected here:

Weekly AI and ML news - each week the best of the field


Research

  • ScoreHMR: Score-Guided Diffusion for 3D Human Recovery. Score-Guided Human Mesh Recovery (ScoreHMR) is an approach for solving inverse problems in 3D human pose and shape reconstruction. It mimics model-fitting approaches, but alignment with the image observation is achieved through score guidance in the latent space of a diffusion model. The authors demonstrate the approach on videos, using keypoint detections and score guidance with keypoint-reprojection and temporal-smoothness terms. Project page: https://statho.github.io/ScoreHMR/
  • Cappy: Outperforming and boosting large multi-task language models with a small scorer. A small model called Cappy has been trained to take an instruction and a candidate completion, then return a score for how well the completion satisfies the instruction. It outperforms significantly larger models on this task, suggesting it can serve as a feedback mechanism for both generation and training.
Figure: instruction-following pre-training of multi-task LLMs (e.g., FLAN); pre-training on such tasks improves performance on unseen tasks.
  • RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Horizon Generation. Demonstrates that LLM reasoning and generation on long-horizon tasks can be greatly enhanced by iteratively revising a chain of thoughts with retrieved information. The key idea is that each thought step is revised using information retrieved with the task query together with the current and past thought steps. RAT is a zero-shot prompting approach that offers notable improvements over baselines including vanilla RAG and zero-shot CoT prompting, and it can be applied to models such as GPT-4 and CodeLlama-7B on long-horizon generation tasks (e.g., creative writing and embodied task planning). A toy sketch of the loop appears at the end of this section. Paper: https://arxiv.org/pdf/2403.05313.pdf
  • Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking. Outlines Quiet-STaR, a generalization of STaR that helps language models (LMs) acquire more scalable and general reasoning skills. Quiet-STaR has the LM produce a rationale at each token to explain future text, using a token-wise parallel sampling scheme to generate these internal thoughts efficiently; REINFORCE is used to improve rationale generation. Paper: https://arxiv.org/pdf/2403.09629.pdf
  • Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM. Proposes BTX, a more compute-efficient way to train capable LLMs: first train (in parallel) multiple copies of a seed LLM, each specialized to a different domain (i.e., expert LLMs), then combine them into a single model using MoE feed-forward layers, and finally fine-tune the unified model. BTX is shown to be more effective than training a single specialized LLM or a larger generalist LLM.
  • Large language models surpass human experts in predicting neuroscience results. Proposes BrainBench, a benchmark for assessing LLMs' ability to predict neuroscience results; finds that LLMs outperform human experts at forecasting the outcomes of experiments, and that an LLM tuned on the neuroscience literature does even better. Paper: https://arxiv.org/pdf/2403.03230.pdf
  • Uni-SMART: Universal Science Multimodal Analysis and Research Transformer. The constant growth of the scientific literature makes comprehensive literature analysis difficult. LLMs are a promising option because of their summarization abilities, yet they are not well suited to the multimodal elements common in scientific content. Uni-SMART (Universal Science Multimodal Analysis and Research Transformer) was created to fill this gap by understanding and analyzing the complex multimodal data found in scientific publications.
  • Mechanics of Next Token Prediction with Self-Attention. Next-token prediction is a simple objective that gives rise to complex behavior. This work shows that a single self-attention layer trained with gradient descent decomposes the problem into two parts, hard retrieval and soft composition, which is enough to deliver good overall performance and in-context learning.
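To make RAT's retrieve-and-revise loop (from the RAT entry above) concrete, here is a minimal sketch. The llm() and retrieve() helpers are hypothetical stubs standing in for any generator (e.g., GPT-4 or CodeLlama-7B) and any retriever; this is my illustration of the idea, not code from the paper.

```python
def llm(prompt: str) -> str:
    """Stub for any text-generation model (e.g., GPT-4, CodeLlama-7B)."""
    raise NotImplementedError

def retrieve(query: str) -> list[str]:
    """Stub for any retriever over an external corpus."""
    raise NotImplementedError

def rat(task: str, num_steps: int = 4) -> list[str]:
    """Draft a chain of thoughts, then revise each step with retrieval."""
    # 1. Draft an initial chain of thoughts without any retrieval.
    draft = [llm(f"Step {i + 1} toward solving: {task}") for i in range(num_steps)]

    revised: list[str] = []
    for step in draft:
        # 2. Retrieve documents relevant to the task query plus the
        #    current and past thought steps (the core idea of RAT).
        docs = retrieve(query=f"{task}\n" + "\n".join(revised + [step]))
        # 3. Revise the current step in light of the retrieved evidence.
        revised.append(llm(
            f"Task: {task}\nPrevious steps: {revised}\nDraft step: {step}\n"
            f"Evidence: {docs}\nRewrite the draft step so it is consistent "
            "with the evidence."))
    return revised
```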

News

https://arxiv.org/pdf/2403.10301.pdf
https://vision.huji.ac.il/podd/
  • IBM and NASA build language models to make scientific knowledge more accessible. In a new collaboration, IBM and NASA created a suite of efficient language models by training on scientific literature. Based on the transformer architecture, these models can be used in a variety of applications, from classification and entity extraction to question answering and information retrieval. They achieve high performance across a variety of domains while responding quickly, and they have been open-sourced on Hugging Face for the benefit of the scientific and academic community.
  • Introducing RAG 2.0. Retrieval augmented generation (RAG) adds external knowledge to a language model whose built-in knowledge can go stale. Unfortunately, outside of demos, the current "frozen RAG" paradigm, in which only part of the pipeline is trained and the language model itself is never updated, performs poorly. This post describes the next generation of RAG, where all the components are fine-tuned for the task at hand; in such a system, an open model like Mistral 7B can outperform a conventional GPT-4 RAG setup.
  • Fitbit Using Google Gemini for New AI That Could Become Your Fitness Coach. Google is training Gemini on health data, and it’s creating a new AI model for the Fitbit app that can give advice tailored to your needs.
https://sites.google.com/view/chain-of-spot/
  • Stable Diffusion maker leaves Stability AI. Robin Rombach helped build the tech that made Stability AI famous; now he's leaving the company.
  • Introducing Copilot4D: A Foundation Model for Self-Driving. Waabi's Copilot4D is a foundation model that advances the capabilities of autonomous machines by using LiDAR data to understand and forecast the 3D dynamics of the environment over time.
  • NLX Raises $15M in Series A Funding. In March 2024, NLX extended its Series A funding to $15M, adding Comcast Ventures.
  • Triton Puzzles. Triton is an alternative open-source language that lets you code at a higher level and compile to accelerators like GPUs. This set of puzzles teaches you Triton from first principles in an interactive fashion: you start with trivial examples and build up to real algorithms like FlashAttention and quantized neural networks. The puzzles do not require a GPU, since they run through a Triton interpreter; a sample kernel is sketched below.
https://rlawjdghek.github.io/StableVITON/
https://www.mmlab-ntu.com/project/fresco/
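As a taste of what the Triton Puzzles entry above builds toward, here is the canonical vector-add kernel in Triton. This is my own minimal example in the style of the official tutorial, and unlike the puzzles it assumes a CUDA device:

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one BLOCK_SIZE-wide slice of the vectors.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements  # guard the ragged final block
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

x = torch.randn(10_000, device="cuda")
y = torch.randn(10_000, device="cuda")
out = torch.empty_like(x)
grid = (triton.cdiv(x.numel(), 1024),)  # one program per block of 1024
add_kernel[grid](x, y, out, x.numel(), BLOCK_SIZE=1024)
```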

Resources

  • tlm — Local CLI Copilot, powered by CodeLLaMa. tlm is a CLI companion that requires nothing except your workstation: it runs CodeLLaMa locally to give you command-line suggestions.
  • Multi-node LLM Training on AMD GPUs. This blog post describes the full stack Lamini uses to train models on AMD GPUs, including schedulers, model-training software, and more.
https://ai.meta.com/blog/scenescript-3d-scene-reconstruction-reality-labs-research/
  • clarity-upscaler. A state-of-the-art image upscaling tool.
  • musiclang_predict. MusicLang is an API and a set of models that generate music.
  • Optimizing Technical Docs for LLMs. Capa.ai provides guidance on how to organize LLM documentation, including how to include troubleshooting FAQs, self-contained code snippets, segmentation into sub-products, and community forum creation.
  • lamini/earnings-calls-qa. This dataset contains transcripts of earnings calls for various companies, along with questions and answers about the companies' financial performance and other relevant topics.
  • Knowledge Conflicts for LLMs: A Survey. A summary of the prevalent problem of knowledge conflict that arises while working with LLMs; the survey article divides these conflicts into three categories: intra-memory, inter-context, and context-memory conflict. It also offers insights into the sources of these conflicts and possible solutions.
  • Enhancing RAG-based application accuracy by constructing and leveraging knowledge graphs. A practical guide to constructing and retrieving information from knowledge graphs in RAG applications with Neo4j and LangChain; a minimal sketch of the pattern follows below.
https://arxiv.org/pdf/2403.13315v1.pdf
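Here is a minimal sketch of the knowledge-graph RAG pattern from the Neo4j/LangChain guide above. This is my own example, assuming a running local Neo4j instance and the langchain, langchain-community, and langchain-openai packages:

```python
from langchain.chains import GraphCypherQAChain
from langchain_community.graphs import Neo4jGraph
from langchain_openai import ChatOpenAI

# Connect to a running Neo4j instance (credentials are placeholders).
graph = Neo4jGraph(url="bolt://localhost:7687", username="neo4j", password="password")

# Build a tiny knowledge graph with a single Cypher statement.
graph.query(
    "MERGE (a:Company {name: 'Acme'}) "
    "MERGE (p:Person {name: 'Ada'}) "
    "MERGE (p)-[:CEO_OF]->(a)")

# Let an LLM translate questions into Cypher and answer from the graph
# instead of (or alongside) a vector store.
chain = GraphCypherQAChain.from_llm(llm=ChatOpenAI(temperature=0), graph=graph)
print(chain.invoke({"query": "Who is the CEO of Acme?"}))
```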
  • How to Evaluate Your RAG System? Retrieval Augmented Generation (RAG) is a powerful technique that enhances output quality by retrieving relevant context from an external vector database. However, building and evaluating a RAG system can be challenging, especially when it comes to measuring performance. This post explores the most effective metrics for each stage of a RAG pipeline and how to use them to evaluate the whole system (a sketch of two retrieval metrics appears a few items below).
  • Anthropic Prompt Library. Although Claude 3 is already widely used, these models respond to a somewhat different prompting style. Anthropic has compiled a library of prompts that work well for a wide range of tasks and subjects: https://docs.anthropic.com/claude/prompt-library
  • Pretraining 16 language models on different tokenizers. One peculiarity of contemporary language modeling is that the model is not trained until the tokenizer has been trained. A second peculiarity is that, at large scale, vocabulary size doesn't appear to matter all that much. A minimal tokenizer-training example appears a few items below.
  • LLM4Decompile. Reverse Engineering: Decompiling Binary Code with Large Language Models
https://www.androidauthority.com/chat-gpt-4-5-turbo-3425326/
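To make the retrieval-stage metrics from the RAG-evaluation post above concrete, here is a small, self-contained sketch of two standard ones, recall@k and MRR. This is my own illustration, not code from the post:

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant documents that appear in the top-k results."""
    hits = sum(1 for doc in retrieved[:k] if doc in relevant)
    return hits / len(relevant) if relevant else 0.0

def mrr(retrieved: list[str], relevant: set[str]) -> float:
    """Reciprocal rank of the first relevant document (0 if none is found)."""
    for rank, doc in enumerate(retrieved, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0

# Toy check: the single relevant chunk is ranked second.
assert recall_at_k(["d3", "d7", "d1"], {"d7"}, k=2) == 1.0
assert mrr(["d3", "d7", "d1"], {"d7"}) == 0.5
```

And since the tokenizer-pretraining post above notes that the tokenizer is trained before the model, here is what that first step can look like with the Hugging Face tokenizers library. This is a generic BPE example of mine, not the post's setup, and it assumes a local corpus.txt:

```python
from tokenizers import Tokenizer, models, pre_tokenizers, trainers

# Train a small BPE tokenizer from scratch, before any model training.
tokenizer = Tokenizer(models.BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = pre_tokenizers.Whitespace()
trainer = trainers.BpeTrainer(vocab_size=8_000, special_tokens=["[UNK]"])
tokenizer.train(files=["corpus.txt"], trainer=trainer)

print(tokenizer.encode("The tokenizer comes before the model.").tokens)
```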
  • Under The Hood: How OpenAI’s Sora Model Works. In this blog post, we dive into some of the technical details behind Sora. We also talk about our current thinking around the implications of these video models. Finally, we discuss our thoughts around the compute used for training models like Sora and present projections for how that training compute compares to inference, which has meaningful indications for estimated future GPU demand.
  • Quiet-STaR. Quiet-STaR is a reasoning framework that enhances language models' ability to produce accurate output. The code is provided, along with a model that reasons eight steps ahead per token.
  • MoE-Adapters4CL. Continual learning can let vision-language models acquire new knowledge continuously, without access to the entire historical dataset. Across extensive experiments in various settings, the proposed method consistently outperforms previous state-of-the-art approaches while cutting parameter-training overhead by 60%.
  • LlamaGym. Fine-tune LLM agents with online reinforcement learning. https://github.com/KhoomeiK/LlamaGym
https://www.anthropic.com/news/claude-3-haiku
  • Stylized image binning algorithm. A tutorial on using a JavaScript binning method to build an image-processing application that produces pixel-art-style output, with customizable interactive web features like sliders. By averaging pixel brightness within bins, the binning technique turns photos into stylized, pixelated artwork, driven by parameters like bin size and spacing. The tutorial covers efficient looping structures and pixel manipulation on HTML canvas elements; a Python transliteration of the core idea follows below.
  • TorchTune. A native-PyTorch library for easily authoring, fine-tuning, and experimenting with LLMs.
  • MVFA-AD. Adapting visual-language models for generalizable anomaly detection in medical images.
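The stylized-binning tutorial a few items above works in JavaScript on an HTML canvas; the same pixel-averaging idea transliterated to Python with Pillow and NumPy (my sketch, not the tutorial's code) looks like this:

```python
import numpy as np
from PIL import Image

def pixelate(path: str, bin_size: int = 12, spacing: int = 2) -> Image.Image:
    """Average the pixels inside square bins to get a pixel-art effect."""
    img = np.asarray(Image.open(path).convert("RGB")).astype(float)
    h, w, _ = img.shape
    out = np.zeros_like(img)
    for y in range(0, h, bin_size):
        for x in range(0, w, bin_size):
            block = img[y:y + bin_size, x:x + bin_size]
            # Fill the bin, minus a spacing border, with its mean color.
            out[y:y + bin_size - spacing,
                x:x + bin_size - spacing] = block.mean(axis=(0, 1))
    return Image.fromarray(out.astype(np.uint8))
```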

Perspectives

  • What I learned from looking at 900 most popular open source AI tools. By examining the GitHub stars of well-known AI models, we can uncover some fascinating patterns. The majority of open-source AI tools appear to be geared toward apps and infrastructure.
  • LLM inference speed of light. This article explores the theoretical "speed of light" limit for transformer-based language model inference and emphasizes the significance of memory bandwidth over computational power: the primary constraint on inference speed is how fast data can be read from memory, not how fast calculations can be performed, which makes bandwidth an essential factor both to optimize for and to reason about when assessing performance. A back-of-the-envelope version of the argument appears below.
  • AI is bad/good actually. The author suggests abandoning the vague good/bad continuum and instead using terms like "harmful," "helpful," "capable," and "incapable" to sharpen AI conversations. In their view, AI is capable yet potentially harmful because of unresolved problems like bias amplification and copyright infringement. Using these more precise terms, the author invites readers to articulate their own opinions on AI.
  • Captain’s log: the irreducible weirdness of prompting AIs. A wealth of free AI and machine learning tools can be found on the new companion website, More Useful Things. These resources highlight the amusing and useful ways in which AI-generated prompts, such as creative scenarios, can surpass human-crafted ones in tasks like solving mathematical puzzles. For more consistent prompting outcomes, the experiment emphasizes the value of adding context, few-shot learning, and chain-of-thought strategies. Though organized prompting is still an evolving art with considerable potential benefits, prompting as a talent may become less important as AI models advance and get better at inferring user intent.
  • AI Prompt Engineering Is Dead, Long live AI prompt engineering. According to recent studies, as AI models get better at optimizing their own prompts, human prompt engineers might become obsolete. Algorithmically generated prompts can be strange but powerful: they outperform human-crafted ones and significantly cut optimization time. Despite the promise of automatically tuned prompts, experts predict that prompt-related occupations will change rather than vanish, perhaps taking the form of new roles like LLMOps (Large Language Model Operations).
https://sakana.ai/evolutionary-model-merge/
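A back-of-the-envelope version of the "speed of light" argument from the LLM-inference article above, with illustrative numbers of my choosing rather than the article's exact figures:

```python
params = 7e9            # a 7B-parameter model
bytes_per_param = 2     # fp16/bf16 weights
bandwidth = 1.0e12      # assume 1 TB/s of GPU memory bandwidth

# During decoding, every generated token must stream all the weights
# through the memory bus once, so bandwidth, not FLOPs, sets the floor.
seconds_per_token = params * bytes_per_param / bandwidth
print(f"{seconds_per_token * 1e3:.0f} ms/token, "
      f"~{1 / seconds_per_token:.0f} tokens/s upper bound")
# -> 14 ms/token, ~71 tokens/s upper bound
```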
  • The Road to Biology 2.0 Will Pass Through Black-Box Data. This year marks perhaps the zenith of expectations for AI-based breakthroughs in biology, transforming it into an engineering discipline that is programmable, predictable, and replicable. Drawing insights from AI breakthroughs in perception, natural language, and protein structure prediction, we endeavor to pinpoint the characteristics of biological problems that are most conducive to being solved by AI techniques. Subsequently, we delineate three conceptual generations of bio AI approaches in the biotech industry and contend that the most significant future breakthrough will arise from the transition away from traditional “white-box” data, understandable by humans, to novel high-throughput, low-cost AI-specific “black-box” data modalities developed in tandem with appropriate computational methods.
  • ”AI, no ads please”: 4 words to wipe out $1tn. AI poses a huge threat to ad-based platforms by slashing how many ads we see
  • OpenAI’s “Own Goal”. And why it is becoming increasingly difficult to take them at their word
  • What if it isn’t happening, AGI is not coming? No matter what appears to be happening, we always have to consider what if it isn’t. What If LLMs fail to turn into AGIs? Has our quest for intelligence simply unveiled our demonstrable lack thereof? Will trillions of dollars turn unpredictable hallucination machines into reliable universal productivity tools that can do anything?
  • How OpenAI’s text-to-video tool Sora could change science — and society. OpenAI’s debut of its impressive Sora text-to-video tool has raised important questions.
  • Chatbot AI makes racist judgements on the basis of dialect. Some large language models harbor hidden biases that cannot be removed using standard methods.
  • Could AI-designed proteins be weaponized? Scientists lay out safety guidelines. AI tools that can come up with protein structures at the push of a button should be used safely and ethically, say researchers in the field.
  • Three reasons why AI doesn’t model human language. Artificial intelligence (AI) is being used to develop large language models (LLMs) with considerable success. But they should not be seen as being models of how human language works and is acquired.
  • So … you’ve been hacked. Research institutions are under siege from cybercriminals and other digital assailants. How do you make sure you don’t let them in?
  • 8 Google Employees Invented Modern AI. Here’s the Inside Story. They met by chance, got hooked on an idea, and wrote the “Transformers” paper — the most consequential tech breakthrough in recent history.
  • Using LLMs to Generate Fuzz Generators. Claude and other LLMs can produce effective fuzzers for parsers, automating a task that has historically required a great deal of human labor. Because fuzzing is stochastic, LLMs seem well suited to writing fuzz generators, even though they are usually not precise enough for static analysis. A hybrid approach that combines LLM-driven analysis with targeted fuzzing may be a promising way to find and exploit code vulnerabilities. A toy fuzz generator is sketched below.
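This is the kind of fuzz generator an LLM might be asked to write for a parser: random, mostly well-formed inputs with occasional corruption. A hand-rolled sketch of mine targeting a hypothetical INI-style parser, not code from the article:

```python
import random

def fuzz_ini(rng: random.Random) -> str:
    """Generate a mostly well-formed INI file with occasional corruption."""
    lines = []
    for _ in range(rng.randint(1, 5)):
        lines.append(f"[{rng.choice(['core', 'net', ''])}]")
        for _ in range(rng.randint(0, 4)):
            # Keys deliberately include characters that stress the parser.
            key = "".join(rng.choices("abcdef=[", k=rng.randint(1, 8)))
            lines.append(f"{key}={rng.randint(-1, 2**32)}")
    if rng.random() < 0.3:  # occasionally inject structural corruption
        lines.insert(rng.randrange(len(lines)), "\x00" * rng.randint(1, 16))
    return "\n".join(lines)

rng = random.Random(0)
for _ in range(1000):
    sample = fuzz_ini(rng)
    # parse(sample)  # feed each sample to the parser under test
```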
  • First Impressions of Early-Access GPT-4 Fine-Tuning. A few weeks ago we finally got access to the GPT-4 fine-tuning API (in limited early access), and were super excited to check out how well it works. We’d been a user of OpenAI’s fine-tuned models since fine-tuning the original GPT-3 Davinci model first became available.
  • AI and the Future of Work. High Mensa exam scores for Anthropic’s most recent AI, Claude, indicate that self-improving AI is not far off and presents both prospects and existential concerns. As seen at Klarna, where a customer support AI replaced 700 workers, machine learning is already eliminating jobs. This suggests that automation is becoming more and more common. Recent layoffs at Duolingo as a result of AI’s translation capabilities highlight this change and the increasing influence of AI on the nature of work in the future.
  • Two years later, deep learning is still faced with the same fundamental challenges. Gary Marcus revisits his forecasts two years after writing a pessimistic AI paper, and he maintains his original mistrust. Even with breakthroughs like GPT-4, basic problems like true understanding and reliable AI are still unsolved. Marcus draws the conclusion that multidisciplinary cooperation is essential to achieving AGI and that increasing data and processing capacity alone won’t be enough.
  • From 0 to 10 million users in four years. In just four years, the AI-powered writing tool Copy.ai has amassed an impressive 10 million users.

Medium articles

A list of the Medium articles I have read and found the most interesting this week:

Meme of the week

What do you think about it? Did some news capture your attention? Let me know in the comments.

If you have found this interesting:

You can look for my other articles, and you can also connect with or reach me on LinkedIn. Check this repository, which contains weekly updated ML & AI news. I am open to collaborations and projects, and you can subscribe for free to get notified when I publish a new story.

Here is the link to my GitHub repository, where I am collecting code and many resources related to machine learning, artificial intelligence, and more.

or you may be interested in one of my recent articles:


Salvatore Raieli

Senior data scientist | about science, machine learning, and AI. Top writer in Artificial Intelligence