WEEKLY AI NEWS: RESEARCH, NEWS, RESOURCES, AND PERSPECTIVES

ML news: Week 26 February — 3 March

Salvatore Raieli
16 min read · Mar 4, 2024

The Apple car is dead, Mistral Large has arrived, and Meta is planning LLaMA-3

Photo by Javy Luzania on Unsplash

The most interesting news, repositories, articles, and resources of the week

Check out and star this repository, where the news is collected and indexed:

You will find the news first on GitHub. Individual posts are also collected here:

Weekly AI and ML news - each week the best of the field


Research

https://arxiv.org/pdf/2402.04845.pdf
  • FlowMDM: Seamless Human Motion Composition with Blended Positional Encodings. FlowMDM is a diffusion-based model that generates long, continuous sequences of human motion from text descriptions. Using Blended Positional Encodings, it produces realistic motion without extra denoising stages and achieves strong accuracy and realism on standard benchmarks.
  • VSP-LLM (Visual Speech Processing incorporated with LLMs). We propose VSP-LLM, a framework that leverages the context-modeling power of LLMs for visual speech processing. VSP-LLM is designed to perform visual speech recognition and translation as multiple tasks, with the given instruction controlling which task is performed.
  • Repetition Improves Language Model Embeddings. We present echo embeddings, an embedding strategy designed to address an architectural limitation of autoregressive models: token embeddings cannot contain information from tokens that appear later in the input. Echo embeddings resolve this by repeating the input twice when passing it to the embedding model and pooling over the second occurrence (a minimal sketch follows at the end of this list). The method performs strongly on MTEB and is compatible with many other techniques for improving embedding models.
https://opencodeinterpreter.github.io/
https://arxiv.org/pdf/2402.14334v1.pdf
https://arxiv.org/pdf/2402.14905.pdf
  • Graph Diffusion Policy Optimization. This work adapts policy optimization to graph diffusion models, using reinforcement learning to steer graph generation toward arbitrary, even non-differentiable, reward signals.
  • HiGPT: Heterogeneous Graph Language Model. HiGPT is a method for learning across many heterogeneous graphs without requiring per-dataset fine-tuning. It adapts well to different data distributions thanks to a dedicated graph tokenizer and a large corpus of graph instructions.
  • PromptMM: Multi-Modal Knowledge Distillation for Recommendation with Prompt-Tuning. PromptMM uses multi-modal knowledge distillation with prompt-tuning to enhance recommendation systems on platforms like Amazon and TikTok. It distills key features from different kinds of content (textual, audio, or visual) into a smaller, simpler model and reduces noise in modeling user preferences, helping to avoid overfitting.
  • Genie: Generative Interactive Environments. We introduce Genie, a foundation world model trained from Internet videos that can generate an endless variety of playable (action-controllable) worlds from synthetic images, photographs, and even sketches.
https://arxiv.org/pdf/2310.20453v1.pdf
https://barquerogerman.github.io/FlowMDM/
  • Do Large Language Models Latently Perform Multi-Hop Reasoning? This study examines whether LLMs can perform multi-hop reasoning internally, akin to human thought processes. Using prompts such as “The mother of the singer of ‘Superstition’ is”, the researchers probe whether a model first resolves a bridge entity (Stevie Wonder) and then uses it to answer the full query. They find evidence that such latent multi-hop reasoning does occur in some settings, while also documenting its limits, offering useful guidance for future development and application.
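
To make the echo-embedding idea above concrete, here is a minimal, illustrative sketch (not the authors' code): the text is fed twice, and only the hidden states of the second occurrence are pooled, so they can attend to the whole first copy. The model choice and prompt wording are assumptions, and the prefix-length computation is approximate.

    import torch
    from transformers import AutoModel, AutoTokenizer

    # Illustrative sketch of echo embeddings: repeat the input so the second
    # copy's token representations can attend to the entire first copy, then
    # pool only over the second (echoed) occurrence.
    model_name = "gpt2"  # any decoder-only LM; the paper uses much larger models
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    model.eval()

    def echo_embed(text: str) -> torch.Tensor:
        prefix = f"Rewrite the sentence: {text}\nRewritten sentence: "
        prompt = prefix + text
        # Approximate start of the echoed copy (token boundaries may shift
        # slightly when prefix and full prompt are tokenized separately).
        echo_start = len(tokenizer(prefix)["input_ids"])
        inputs = tokenizer(prompt, return_tensors="pt")
        with torch.no_grad():
            hidden = model(**inputs).last_hidden_state  # (1, seq_len, hidden_dim)
        # Mean-pool the hidden states of the second occurrence only.
        return hidden[0, echo_start:, :].mean(dim=0)

    embedding = echo_embed("A cat sits on the windowsill.")
    print(embedding.shape)  # torch.Size([768]) for gpt2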

News

https://andreaconti.github.io/projects/range_agnostic_multi_view_depth/
  • Transformer Circuits Thread — Updates — February 2024. Researchers at Anthropic have been developing a circuits-based approach to understanding deep neural networks; circuits aim to pinpoint the model components responsible for particular behaviors. Each month the team publishes an update on the experiments they have run and what they learned from them.
  • A new tool targets voter fraud in Georgia — but is it skirting the law? A tech company supported by Trump’s former lawyer is injecting chaos into the state’s vote-counting process
  • Democratic political operative admits he commissioned robocall of AI Biden. Steve Kramer said ‘easy-to-use technology’ enabled him to send automated calls while the New Orleans magician says he was paid $150 to make it
  • Mistral Large. Mistral Large is our new cutting-edge text generation model. It reaches top-tier reasoning capabilities and can be used for complex multilingual reasoning tasks, including text understanding, transformation, and code generation. Mistral Large achieves strong results on commonly used benchmarks, making it the world’s second-ranked model generally available through an API (after GPT-4).
https://arxiv.org/pdf/2402.16153.pdf
  • Scale AI to set the Pentagon’s path for testing and evaluating large language models. The company will create a comprehensive T&E framework for generative AI within the Defense Department.
  • DatologyAI is building tech to automatically curate AI training datasets. Morcos’ company, DatologyAI, builds tooling to automatically curate datasets like those used to train OpenAI’s ChatGPT, Google’s Gemini, and other GenAI models. The platform can identify which data is most important depending on a model’s application (e.g. writing emails), Morcos claims, in addition to ways the dataset can be augmented with additional data and how it should be batched, or divided into more manageable chunks, during model training.
  • Bay Bridge: A supercomputer built for startups. With flexible short-term renting options, San Francisco Compute Company is now providing the lowest-cost H100 training clusters in the world to customers who require intensive computing for AI model training but do not want to commit to long-term agreements. Its first cluster, Angel Island, is operational at the moment, and Bay Bridge will follow shortly. The unique business strategy of SF Compute places a premium on cost and accessibility for AI entrepreneurs without requiring long-term commitments.
https://arxiv.org/pdf/2402.15627.pdf
https://arxiv.org/pdf/2402.15838v1.pdf

Resources

https://github.com/mbzuai-oryx/MobiLlama
  • FuseChat. FuseChat is a novel approach to fusing the strengths of several large language models into a single, more capable model without paying the cost of training a new model from scratch.
  • ShieldLM. ShieldLM is a bilingual (Chinese and English) safety detector that mainly aims to help detect safety issues in LLMs’ generations. It aligns with general human safety standards, supports fine-grained customizable detection rules, and provides explanations for its decisions.
  • Enable decision-making based on LLM-based simulations. Simulatrex is an open-source project dedicated to generative agent-based modeling (GABM); it uses large language models to produce more accurate simulations.
  • Training-Free Long-Context Scaling of Large Language Models. Dual chunk attention is a training-free and effective method for extending the context window of large language models (LLMs) to more than 8 times their original pre-training length. We refer to the Llama-based model with dual chunk attention as ChunkLlama.
https://arxiv.org/pdf/2402.17139.pdf
  • DPO to encourage descriptiveness. A minimal code setup with TRL for tuning a model to be more descriptive (a rough sketch of such a setup follows at the end of this list).
  • Shape suffixes for ML coding. A coding style used at Character AI makes tensor shapes far more readable by encoding each tensor’s dimensions as a suffix on its variable name (see the sketch after this list).
  • Getting started with MAX Developer Edition. Modular developed the MAX toolset to drastically reduce complexity and accelerate AI deployments. It is now available.
  • Bonito. Bonito is an open-source model for conditional task generation: the task of converting unannotated text into task-specific training datasets for instruction tuning. This repo is a lightweight library for Bonito to easily create synthetic datasets built on top of the Hugging Face transformers and vllm libraries.
  • Awesome-LLMs-for-Video-Understanding. This repository collects helpful resources for video understanding with large language models.
https://arxiv.org/pdf/2402.05445v1.pdf
  • Mist text to speech. Rime’s new text-to-speech model, Mist, has strong conversational capabilities; in contrast to earlier models, it can incorporate “ums” and realistic pauses.
  • Add your own Ollama models. Guidelines for contributing your own models to the Ollama repository for public usage.
  • 2x speed up HF inference with static KV Cache. Faster inference unlocks new use cases. This code shows how to accelerate Hugging Face inference for Llama models using a static KV cache (a minimal sketch is included after this list).
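
For the static-KV-cache item above, here is a minimal sketch assuming a recent transformers release that supports the static cache implementation; the model ID is illustrative and exact flag names may differ across versions.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Sketch: pre-allocate the KV cache to a fixed shape ("static") so that
    # torch.compile does not recompile as the generated sequence grows.
    model_id = "meta-llama/Llama-2-7b-hf"  # illustrative; any Llama-style model
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.float16, device_map="auto"
    )

    model.generation_config.cache_implementation = "static"
    model.forward = torch.compile(model.forward, mode="reduce-overhead", fullgraph=True)

    inputs = tokenizer("The theory of relativity states that", return_tensors="pt").to(model.device)
    output = model.generate(**inputs, max_new_tokens=64, do_sample=False)
    print(tokenizer.decode(output[0], skip_special_tokens=True))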
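
For the shape-suffix item, an illustrative sketch of the convention (not Character AI’s exact style guide): each tensor variable name ends with letters naming its dimensions, so mismatched shapes are visible at the call site.

    import torch

    # Dimension key (an assumed convention for this sketch):
    #   B = batch size, T = sequence length, D = model dimension, V = vocabulary size
    B, T, D, V = 4, 128, 256, 1000

    tokens_BT = torch.randint(0, V, (B, T))   # token ids
    embed_VD = torch.randn(V, D)              # embedding table

    hidden_BTD = embed_VD[tokens_BT]          # embedding lookup, shape (B, T, D)
    pooled_BD = hidden_BTD.mean(dim=1)        # average over the T axis, shape (B, D)
    logits_BV = pooled_BD @ embed_VD.T        # tied output projection, shape (B, V)

    assert logits_BV.shape == (B, V)          # the suffix documents the expected shape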
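
And for the “DPO to encourage descriptiveness” item, a rough sketch of what such a TRL setup could look like; the tiny preference dataset is invented for illustration, the model is a placeholder, and DPOTrainer’s exact argument names vary between TRL versions.

    from datasets import Dataset
    from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
    from trl import DPOTrainer

    # Illustrative DPO setup: prefer more descriptive completions over terse ones.
    model_name = "gpt2"  # placeholder; any causal LM works in principle
    model = AutoModelForCausalLM.from_pretrained(model_name)
    ref_model = AutoModelForCausalLM.from_pretrained(model_name)  # frozen reference
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    tokenizer.pad_token = tokenizer.eos_token

    # Invented example pairs: "chosen" is the more descriptive completion.
    train_dataset = Dataset.from_dict({
        "prompt": ["Describe the sunset.", "Describe the forest."],
        "chosen": [
            "The sky burned in layered bands of amber, rose, and violet.",
            "Tall pines filtered the light into pale green shafts over damp moss.",
        ],
        "rejected": ["It was nice.", "There were trees."],
    })

    trainer = DPOTrainer(
        model=model,
        ref_model=ref_model,
        beta=0.1,  # strength of the KL penalty toward the reference model
        args=TrainingArguments(
            output_dir="dpo-descriptive",
            per_device_train_batch_size=2,
            max_steps=10,
            remove_unused_columns=False,
            report_to="none",
        ),
        train_dataset=train_dataset,
        tokenizer=tokenizer,
        max_length=128,
        max_prompt_length=64,
    )
    trainer.train()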

Perspectives

  • Sam Altman Wants $7 Trillion. To meet the fast-rising costs of developing generative AI models such as GPT, Sam Altman is reportedly seeking on the order of $7 trillion, signaling an exponential increase in the resources required for further iterations. The figure highlights a critical juncture in the development of AI, balancing the quickening pace of technical progress against its wider effects on safety and societal preparedness.
  • Ten AI Insights from Databricks, Anyscale, and Microsoft. This article features interviews with founders of AI-forward companies, covering their perspectives on the emergence of artificial general intelligence (AGI), how to approach LLMs, and practical strategies for entrepreneurs integrating AI into their products.
https://arxiv.org/pdf/2311.06783v1.pdf
  • What the EU’s tough AI law means for research and ChatGPT. The EU AI Act is the world’s first major legislation on artificial intelligence and strictly regulates general-purpose models.
  • Online images amplify gender bias. We find that gender bias is consistently more prevalent in images than text for both female- and male-typed categories. We also show that the documented underrepresentation of women online is substantially worse in images than in text, public opinion, and US census data.
  • ChunkLlama. Code release for the dual chunk attention method mentioned above: a training-free way to extend LLM context windows to more than 8 times their original pre-training length. The Llama-based model with dual chunk attention is referred to as ChunkLlama.
https://sites.google.com/view/genie-2024/home
  • distilabel. AI Feedback (AIF) framework for building datasets with and for LLMs.
  • StarCoder2. The StarCoder2-15B model is a 15B parameter model trained on 600+ programming languages from The Stack v2, with opt-out requests excluded.
  • The paradox of diffusion distillation. Diffusion models decompose a hard problem, such as generating an image, into many smaller ones, such as removing a small amount of noise from an image. Single-step diffusion generation has received a lot of attention, yet collapsing the process into a single step seems to give up exactly that advantage. This article examines the diffusion distillation paradox and lists the various avenues of inquiry that might resolve it.

Meme of the week

What do you think about it? Did any of this news capture your attention? Let me know in the comments.


Salvatore Raieli

Senior data scientist | about science, machine learning, and AI. Top writer in Artificial Intelligence