WEEKLY AI NEWS: RESEARCH, NEWS, RESOURCES, AND PERSPECTIVES

ML news: Week 22–28 January

Salvatore Raieli
14 min read · Jan 29, 2024

AI phones are coming, Google Chrome gains AI features and much more

Photo by Christian Lue on Unsplash

The most interesting news, repositories, articles, and resources of the week

Check and star this repository where the news will be collected and indexed:

You will find the news first on GitHub. Single posts are also collected here:

Weekly AI and ML news - each week the best of the field


Research

  • OMG-Seg: Is One Model Good Enough For All Segmentation? OMG-Seg handles over ten different segmentation tasks in one framework, including image-level and video-level segmentation, interactive segmentation, and open-vocabulary segmentation. According to the authors, it is the first model to unify these four directions.
  • Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation. A new preference-optimization method applied to machine translation. For this task it is more data-efficient than DPO. Crucially, the objective discourages the model from producing translations that are correct but inadequate, allowing it to perform competitively on WMT.
  • WARM: On the Benefits of Weight Averaged Reward Models. In RLHF, reward models stand in for human preferences; however, the model being aligned frequently “hacks the reward” and ends up performing poorly. WARM counters this by merging several reward models that maintain linear mode connectivity; a model aligned against the merged reward is preferred 79% of the time over one aligned with a single reward model. Although model merging may amount to little more than regularization, it has proven an effective step in the general language-model training pipeline and has performed quite well for general models.
  • Benchmarking Large Multimodal Models against Common Corruptions. This technical study introduces MMCBench, a new benchmark created to evaluate large multimodal models’ (LMMs) consistency and dependability on a variety of tasks, including text-to-image and speech-to-text. It covers more than 100 well-known models with the goal of helping readers better comprehend how various AI systems function in practical situations.
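To make the weight-averaging idea behind WARM concrete, here is a minimal sketch: several reward models fine-tuned from the same initialization are merged by averaging their parameters element-wise. The dict-of-lists representation is purely illustrative, not WARM's actual code.

```python
# Toy sketch of reward-model merging by element-wise weight averaging.
# Each "model" is a dict mapping parameter names to lists of floats.

def average_weights(models):
    """Average a list of parameter dicts (name -> list of floats)."""
    merged = {}
    for name in models[0]:
        stacked = [m[name] for m in models]
        merged[name] = [sum(vals) / len(vals) for vals in zip(*stacked)]
    return merged

# Three toy "reward models" sharing the same architecture:
rm1 = {"w": [1.0, 2.0], "b": [0.0]}
rm2 = {"w": [3.0, 2.0], "b": [1.0]}
rm3 = {"w": [2.0, 2.0], "b": [2.0]}

warm = average_weights([rm1, rm2, rm3])
print(warm)  # {'w': [2.0, 2.0], 'b': [1.0]}
```

In WARM the averaged model then serves as the single reward signal during RLHF, which the paper argues is more robust to reward hacking than any individual reward model.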

News


Resources

  • nanotron. This library aims to provide easy-to-use distributed-training primitives for training a variety of models efficiently with 3D parallelism.
  • DataTrove. DataTrove is a library to process, filter, and deduplicate text data at a very large scale. It provides a set of prebuilt commonly used processing blocks with a framework to easily add custom functionality.
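As an illustration of the kind of processing block a DataTrove-style pipeline chains together, here is a toy exact-deduplication step that hashes normalized text. This is a concept sketch only, not DataTrove's actual API.

```python
import hashlib

# Toy deduplication block: normalize whitespace and case, hash the result,
# and keep only the first document for each hash.

def dedup(docs):
    seen, out = set(), []
    for doc in docs:
        key = hashlib.sha1(" ".join(doc.split()).lower().encode()).hexdigest()
        if key not in seen:
            seen.add(key)
            out.append(doc)
    return out

docs = ["Hello  world", "hello world", "Different text"]
print(dedup(docs))  # ['Hello  world', 'Different text']
```

Real large-scale pipelines use tricks like MinHash to catch near-duplicates rather than only exact matches, but the block-in, block-out structure is the same.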
  • CaptionIMG. A simple Python program for manually captioning your images (or other file types) so you can use them for AI training. The author uses it for DreamBooth training with Stable Diffusion.
  • AI Toolkit. AI Toolkit is a header-only C++ library that provides tools for building the brain of your game’s NPCs.
  • Face Mixer Diffusion. This piece demonstrates how to clone faces in photos using diffusion. Although there are other methods for creating deepfakes, diffusion is intriguing because it can also inpaint the surrounding image elements as needed.
  • Self-Rewarding Language Model. Implementation of the training framework proposed in the Self-Rewarding Language Model paper from Meta AI.
  • snorkelai/Snorkel-Mistral-PairRM-DPO. A powerful new Mistral fine-tune that cleverly uses weak supervision and synthetic data to create a DPO-compatible dataset. The described procedure can be iterated and adapted to a broad range of enterprise use cases.
  • nanoColBERT. ColBERT is a powerful late-interaction model that can perform both retrieval and reranking.
  • RPG-DiffusionMaster. RPG is a powerful training-free paradigm that uses proprietary MLLMs (e.g., GPT-4, Gemini-Pro) or open-source local MLLMs (e.g., miniGPT-4) as the prompt recaptioner and region planner, combined with complementary regional diffusion, to achieve state-of-the-art text-to-image generation and editing. The framework is flexible and generalizes to arbitrary MLLM architectures and diffusion backbones.
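The late-interaction scoring that makes ColBERT (see the nanoColBERT entry above) work is the “MaxSim” operator: each query-token embedding is matched against its best document-token embedding, and the maxima are summed. A toy sketch with dot-product similarity:

```python
# Toy MaxSim: sum over query tokens of the best similarity against any
# document token. Real ColBERT uses normalized embeddings from a BERT encoder.

def maxsim_score(query_emb, doc_emb):
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    return sum(max(dot(q, d) for d in doc_emb) for q in query_emb)

query = [[1.0, 0.0], [0.0, 1.0]]              # two query-token embeddings
doc = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]    # three doc-token embeddings
print(maxsim_score(query, doc))  # 0.9 + 0.8 ≈ 1.7
```

Because document-token embeddings can be precomputed and indexed, the same operator serves both retrieval and reranking, which is why the model is described as doing both.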
  • Matrix Multiplication: Optimizing the code from 6 hours to 1 sec. A brief read on matrix-multiplication optimizations specific to certain hardware, and a general procedure for accelerating AI programs.
  • SyncTalk: Mastering Realism in Talking Head Videos. A significant advancement in realistic talking head videos is SyncTalk. It solves earlier problems with lip motions, expressions, and facial identity synchronization.
  • Hallucination Leaderboard. A public LLM leaderboard computed with Vectara’s Hallucination Evaluation Model, measuring how often an LLM introduces hallucinations when summarizing a document. It is updated regularly as the evaluation model and the LLMs evolve.
  • Embedding English Wikipedia in under 15 minutes. Modal provides a serverless solution for organizations grappling with scaling workloads. Its technology enables rapid scaling across many GPUs, which can be used to run large-scale workloads, such as generating embeddings for a massive text dataset, at lightning speed.
  • Concrete Steps to Get Started in Transformer Mechanistic Interpretability. Neel Nanda is one of the founders of mechanistic interpretability (MI), and this is his guide for entering the field. It lists two hundred concrete open-ended problems. MI, which studies what a model’s weights and neurons actually compute, is unusually accessible: although the field is still young, much of the work doesn’t demand a lot of computing power.
  • The Song Describer Dataset: a Corpus of Audio Captions for Music-and-Language Evaluation. SDD contains ~1.1k captions for 706 permissively licensed music recordings. It is designed for use in the evaluation of models that address music-and-language (M&L) tasks such as music captioning, text-to-music generation, and music-language retrieval.
  • DiffMoog: A Modular Differentiable Commercial-like Synthesizer. This repo contains the implementation of DiffMoog, a differentiable, subtractive, modular synthesizer incorporating the standard architecture and sound modules commonly found in commercial synthesizers.
  • TensorDict. TensorDict is a dictionary-like class that inherits properties from tensors, such as indexing, shape operations, casting to device, or point-to-point communication in distributed settings. The main purpose of TensorDict is to make code bases more readable and modular by abstracting away tailored operations.
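One of the classic optimizations in the matrix-multiplication article above is reordering loops for cache locality. The sketch below contrasts the naive i-j-k order with the i-k-j order, whose innermost loop walks rows of B contiguously; the payoff shows up in row-major compiled languages like C, and pure Python is used here only to illustrate the loop structure.

```python
# Two loop orders for C = A @ B. Both are correct; i-k-j is cache-friendly
# in row-major memory layouts because the inner loop has unit stride.

def matmul_ijk(A, B):
    n, m, p = len(A), len(B), len(B[0])
    C = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            for k in range(m):
                C[i][j] += A[i][k] * B[k][j]   # strides down a column of B
    return C

def matmul_ikj(A, B):
    n, m, p = len(A), len(B), len(B[0])
    C = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for k in range(m):
            a = A[i][k]
            for j in range(p):
                C[i][j] += a * B[k][j]         # walks a row of B contiguously
    return C

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
assert matmul_ijk(A, B) == matmul_ikj(A, B) == [[19.0, 22.0], [43.0, 50.0]]
```

Blocking (tiling), vectorization, and multithreading then stack on top of the right loop order, which is roughly the progression such write-ups follow from hours down to seconds.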
  • Evaluation Metrics for LLM Applications In Production. How to measure the performance of LLM applications without ground truth data.
  • Asynchronous Local-SGD Training for Language Modeling. This repository contains a Colab notebook that presents a minimal toy example replicating the observed optimization challenge in asynchronous Local-SGD. The task is to perform classification on a mixture of mixtures of Gaussian data.
  • SpeechGPT: Speech Large Language Models. A novel speech synthesis model called SpeechGPT-Gen effectively manages the intricacies of language and voice traits.
  • LLM Steer. A Python module to steer LLM responses towards a certain topic/subject and to enhance capabilities (e.g., making it provide correct responses to tricky logical puzzles more often). A practical tool for using activation engineering by adding steer vectors to different layers of a Large Language Model (LLM). It should be used along with the Transformers library.
  • RoMa: A lightweight library to deal with 3D rotations in PyTorch. RoMa (which stands for Rotation Manipulation) provides differentiable mappings between 3D rotation representations, mappings from Euclidean to rotation space, and various utilities related to rotations. It is implemented in PyTorch and aims to be an easy-to-use and reasonably efficient toolbox for Machine Learning and gradient-based optimization.
  • AgentBoard: An Analytical Evaluation Board of Multi-Turn LLM Agent. AgentBoard is a benchmark designed for multi-turn LLM agents, complemented by an analytical evaluation board for detailed model assessment beyond final success rates. It reports the performance of different LLMs across a variety of environments.
  • makeMoE: Implement a Sparse Mixture of Experts Language Model from Scratch. This blog walks through implementing a sparse mixture of experts language model from scratch. This is inspired by and largely based on Andrej Karpathy’s project ‘makemore’ and borrows a number of reusable components from that implementation.
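The core of a sparse mixture-of-experts layer like the one built in makeMoE is top-k routing: a gate scores every expert per token, only the k best experts are evaluated, and their outputs are mixed with renormalized softmax weights. A toy sketch with scalar “experts”:

```python
import math

# Toy sparse MoE forward pass: route one input through the top-k experts
# chosen by gate logits, mixing outputs with softmax weights over those k.

def moe_forward(x, gate_logits, experts, k=2):
    # pick the k best-scoring experts for this token
    top = sorted(range(len(experts)), key=lambda i: gate_logits[i], reverse=True)[:k]
    # softmax over the selected logits only
    exps = [math.exp(gate_logits[i]) for i in top]
    weights = [e / sum(exps) for e in exps]
    # weighted sum of just those experts' outputs (others are never evaluated)
    return sum(w * experts[i](x) for w, i in zip(weights, top))

experts = [lambda x: x + 1, lambda x: 2 * x, lambda x: -x, lambda x: x * x]
y = moe_forward(3.0, gate_logits=[0.1, 2.0, -1.0, 2.0], experts=experts, k=2)
print(y)  # equal-weight mix of experts 1 and 3: 0.5*6 + 0.5*9 = 7.5
```

In a real transformer the experts are feed-forward blocks and the gate is a learned linear layer per token; sparsity comes from never running the unselected experts.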
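The activation-engineering idea behind the LLM Steer entry above can be sketched in a few lines: a steering vector is added to a layer's hidden states during the forward pass, nudging generations toward a topic. The numbers below are toy stand-ins for transformer activations; the real module hooks into Transformers models.

```python
# Toy activation steering: shift every token's hidden state by a scaled
# "topic direction" vector, as a steering hook would inside a forward pass.

def apply_steering(hidden_states, steer_vector, coeff=1.0):
    """Add coeff * steer_vector to each token's hidden-state vector."""
    return [
        [h + coeff * s for h, s in zip(token, steer_vector)]
        for token in hidden_states
    ]

hidden = [[0.5, -0.2, 0.1], [1.0, 0.0, -0.3]]   # 2 tokens, hidden size 3
steer = [0.2, 0.2, -0.1]                         # a hypothetical "topic" direction
print(apply_steering(hidden, steer, coeff=2.0))
```

The coefficient controls how strongly generations are pushed; too large a value typically degrades fluency, so it is tuned per layer and per steering vector.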

Perspectives

  • Text-to-Video: The Task, Challenges and the Current State. Text-to-video is next in line in the long list of incredible advances in generative models. How do these models work, how do they differ from text-to-image models, and what kind of performance can we expect from them?
  • My AI Timelines Have Sped Up (Again). In light of progress in scaling up models, the author has updated their AI timeline forecasts: they now put a 10% probability on artificial general intelligence by 2028 and a 50% probability by 2045. The changes are credited to the efficacy of large language models and the realization that many intelligent capabilities may emerge at scale.
  • Should The Future Be Human? Elon Musk and Larry Page have a deep disagreement over the possible risks associated with artificial intelligence. Page has called Musk a “speciesist” for favoring humans over digital life forms, which has caused a gap in their friendship. This demonstrates the necessity for careful and deliberate development of AI technology and reflects the larger discussion on the influence of AI, which includes worries about consciousness, individuation, art, science, philosophy, and the potential for mergers between humans and AI.
  • If AI Were Conscious, How Would We Know? When discussing AI consciousness, references to Searle’s Chinese Room thought experiment and the Turing Test are frequently made. The Turing Test asks whether an AI’s behavior can be distinguished from a human’s, while the Chinese Room argues that external behavior alone is insufficient to demonstrate consciousness. Given that our understanding of consciousness derives mostly from functionalist theories and human experience, this debate highlights how difficult it is to define and detect consciousness in AI.
  • AI today and trends for an AI future. A survey of experts on: How are early adopters using AI today? Where is AI going in 2024?

Meme of the week

What do you think about it? Was there some news that captured your attention? Let me know in the comments.

If you have found this interesting:

You can look for my other articles, and check this repository containing weekly updated ML & AI news. I am open to collaborations and projects, and you can connect with or reach me on LinkedIn.

Here is the link to my GitHub repository, where I am collecting code and many resources related to machine learning, artificial intelligence, and more.

or you may be interested in one of my recent articles:


Salvatore Raieli

Senior data scientist | about science, machine learning, and AI. Top writer in Artificial Intelligence