WEEKLY AI NEWS: RESEARCH, NEWS, RESOURCES, AND PERSPECTIVES
ML news: Week 26 February — 3 March
The Apple car is dead, Mistral Large has arrived, and Meta is planning Llama 3
The most interesting news, repositories, articles, and resources of the week
Check and star this repository where the news will be collected and indexed:
You will find the news first on GitHub. Single posts are also collected here:
Research
- Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs. The RL algorithm REINFORCE is straightforward, well known, and simple to understand, but training stably with it in simulators is a challenge; in general, PPO is far more reliable and performant. Gemini reportedly uses REINFORCE, while GPT-4 presumably uses PPO.
- AlphaFold Meets Flow Matching for Generating Protein Ensembles. AlphaFold can predict a protein's folded structure; combining it with flow matching significantly expands modeling capability across the protein's whole conformational landscape.
- Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models. Researchers have created a new technique that focuses on “expert-level sparsification,” which minimizes model size without sacrificing performance, to make LLMs more effective and user-friendly. For Mixture-of-Experts LLMs, which are strong but typically too large to manage simply, this is very helpful.
- Towards Generalizable Hand-Object Interaction Denoising via Denoising Diffusion. A novel method called GeneOH Diffusion improves models' understanding and refinement of hand-object interactions. The technique aims to make these interactions more natural by correcting errors in hand poses and hand-object relationships.
- Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis. Setting Sora aside, Snap Research has developed a video generation model that is 3 times faster to run than the prior state of the art.
- OpenCodeInterpreter. By training on a synthetic multi-turn dataset and utilizing human feedback, a model built on CodeLlama and DeepSeek Coder was able to achieve 85%+ on the HumanEval programming benchmark.
- INSTRUCTIR: A Benchmark for Instruction Following of Information Retrieval Models. A new benchmark called INSTRUCTIR aims to improve search engines’ ability to infer users’ intentions. INSTRUCTIR assesses how well search engines can obey user instructions and adjust to different and evolving search needs, in contrast to existing approaches that primarily concentrate on the query itself.
- MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases. Meta's 350M-parameter language model shows strong reasoning performance and accuracy on API function-calling tasks, even coming close to Llama 7B. The model is not yet available, but the novelty of its fixed-parameter design is worth investigating.
- ConceptMath: A Bilingual Concept-wise Benchmark for Measuring Mathematical Reasoning of Large Language Models. A new bilingual benchmark called ConceptMath assesses LLMs' mathematical reasoning in both Chinese and English. It is distinctive because it breaks math problems down into discrete concepts, enabling a more thorough evaluation of an AI's mathematical strengths and shortcomings.
- Generate What You Prefer: Reshaping Sequential Recommendation via Guided Diffusion. DreamRec proposes a 'learning-to-generate' technique for sequential recommendation: it generates an 'oracle' item representing the ideal next choice for the user, as opposed to the conventional approach of inferring user preferences from a mixture of positive and negative items.
- FlowMDM: Seamless Human Motion Composition with Blended Positional Encodings. A novel model called FlowMDM uses text descriptions to create lengthy, continuous sequences of human movements. This groundbreaking diffusion-based model excels in accuracy and realism on important datasets by using Blended Positional Encodings to create realistic motion without the need for additional denoising stages.
- VSP-LLM (Visual Speech Processing incorporated with LLMs). We propose a novel framework, namely Visual Speech Processing incorporated with LLMs (VSP-LLM), to maximize the context modeling ability by bringing the overwhelming power of LLMs. Specifically, VSP-LLM is designed to perform multi-tasks of visual speech recognition and translation, where the given instructions control the type of task.
- Repetition Improves Language Model Embeddings. We present echo embeddings, an embedding strategy designed to address an architectural limitation of autoregressive models: that token embeddings cannot contain information from tokens that appear later in the input. Echo embeddings resolve this issue by repeating the input twice in the prompt passed to the embedding model. Our method has strong performance on MTEB and is compatible with many other methods for improving embedding models. (A minimal sketch of the idea appears after this list.)
- Range-Agnostic Multi-View Depth Estimation With Keyframe Selection. Multi-View 3D reconstruction techniques process a set of source views and a reference view to yield an estimated depth map for the latter.
- ChatMusician: Understanding and Generating Music Intrinsically with LLM. Adding a modality-specific encoder to a language model is usually necessary for comprehending music. This is unstable and costly. This study demonstrated that tokenizing music into ABC notation significantly boosted music knowledge without affecting basic language proficiency.
- MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs. Bytedance has produced a system called MegaScale that can be used to train massively parallel large language models. It succeeded in training a 175B LLM on 12,288 GPUs with 55.2% Model FLOP utilization (MFU), which is extremely impressive. Bytedance plans to open source some aspects of the codebase.
- ListT5: Listwise Reranking with Fusion-in-Decoder Improves Zero-shot Retrieval. ListT5 presents a novel reranking technique that not only increases information retrieval precision but also provides a workable solution to the issues that earlier listwise rerankers encountered.
- MobiLlama: Towards Accurate and Lightweight Fully Transparent GPT. Our primary contribution is the introduction of an accurate and fully transparent open-source 0.5 billion (0.5B) parameter SLM, named MobiLlama, catering to the specific needs of resource-constrained computing with an emphasis on enhanced performance with reduced resource demands.
- Accurate LoRA-Finetuning Quantization of LLMs via Information Retention. A novel method called IR-QLoRA improves the accuracy of quantized large language models, making them more suitable for use on resource-constrained devices.
- Video as the New Language for Real-World Decision Making. Incredible research presents video as a possible improvement over current methods for AI to communicate with humans. It demonstrates the usage of video models as environment simulators, planners, agents, and computation engines.
- The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits. Most language models represent each parameter with 16 bits or more, which yields strong models that can be costly to run. This work proposes a scheme in which each parameter takes a value in {-1, 0, 1}, requiring only 1.58 bits, and it matches full-precision performance up to 3B parameters. Models and code are not yet available. (A sketch of the ternary quantization appears after this list.)
- Q-Instruct: Improving Low-level Visual Abilities for Multi-modality Foundation Models. Enhancing multi-modality foundation models such as GPT-4V in low-level visual perception tasks is the main goal of this research. The extensive study collected comments on 18,973 photos from 58,000 people and produced the Q-Pathway dataset for brightness, color, and clarity analysis.
- Graph Diffusion Policy Optimization. This work introduces GDPO, a policy-gradient approach for optimizing graph diffusion models toward arbitrary, even non-differentiable, reward signals, improving graph generation for downstream objectives.
- HiGPT: Heterogeneous Graph Language Model. A method for learning across many heterogeneous graphs without requiring fine-tuning is called HiGPT. It excels at adapting to different data distributions thanks to its integration with a unique graph tokenizer and a large corpus of graph commands.
- PromptMM: Multi-Modal Knowledge Distillation for Recommendation with Prompt-Tuning. PromptMM uses Multi-modal Knowledge Distillation to enhance recommendation systems on sites like Amazon and TikTok. In order to avoid overfitting, it eliminates errors in user preferences and streamlines systems by extracting key characteristics from different kinds of content (textual, audio, or visual).
- Genie: Generative Interactive Environments. We introduce Genie, a foundation world model trained from Internet videos that can generate an endless variety of playable (action-controllable) worlds from synthetic images, photographs, and even sketches.
- UniVS: Unified and Universal Video Segmentation with Prompts as Queries. With a unique prompt-based methodology, UniVS is a unified architecture for video segmentation that addresses the difficulties of diverse segmentation jobs. UniVS removes the requirement for heuristic inter-frame matching by utilizing prompt characteristics as queries and providing a target-wise prompt cross-attention layer. This allows UniVS to adapt to various video segmentation settings with ease.
- Coarse-to-Fine Latent Diffusion for Pose-Guided Person Image Synthesis. With a deep semantic knowledge of pictures, the Coarse-to-Fine Latent Diffusion (CFLD) method avoids overfitting and offers a novel Pose-Guided Person Image Synthesis technique that overcomes the drawbacks of previous models.
- Evaluating Quantized Large Language Models. Large language models like OPT and LLaMA2 can be rendered more compute- and memory-efficient through the use of post-training quantization.
- Representing 3D sparse map points and lines for camera relocalization. With minimal memory and processing power, this study presents a novel method for 3D mapping and localization that processes both point and line information using a lightweight neural network, greatly improving pose accuracy.
- Driving into the Future: Multiview Visual Forecasting and Planning with World Model for Autonomous Driving. Drive-WM can produce high-quality multiview videos to forecast future events, allowing self-driving cars to make more intelligent and safer driving decisions.
- Do Large Language Models Latently Perform Multi-Hop Reasoning? This study delves into the fascinating world of Large Language Models (LLMs) and their ability to engage in multi-hop reasoning, akin to human thought processes. By crafting intricate prompts like “The mother of the singer of ‘Superstition’ is”, researchers probe how LLMs navigate complex queries. They uncover compelling evidence suggesting that these models can indeed perform multi-hop reasoning, often relying on a bridge entity like Stevie Wonder to connect disparate pieces of information. The findings highlight both the strengths and limitations of LLMs in this regard, offering valuable insights for their future development and application.
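Below is a minimal sketch of the echo-embedding idea from "Repetition Improves Language Model Embeddings": the text is fed to a decoder-only model twice, so token states in the second copy can attend to the whole input, and only the second occurrence is pooled. The model choice and the bare-bones prompt are illustrative assumptions, not the paper's exact recipe.

```python
# Minimal sketch of echo embeddings (illustrative, not the paper's exact prompt format).
import torch
from transformers import AutoModel, AutoTokenizer

model_name = "gpt2"  # any decoder-only LM; chosen only to keep the sketch small
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name).eval()

def echo_embed(text: str) -> torch.Tensor:
    # Repeat the input so tokens in the second copy "see" the full sentence.
    prompt = f"{text} {text}"
    enc = tok(prompt, return_tensors="pt")
    n_first = len(tok(text)["input_ids"])          # rough length of the first copy
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state    # (1, seq_len, dim)
    # Mean-pool over the second occurrence only.
    return hidden[0, n_first:, :].mean(dim=0)

emb = echo_embed("Echo embeddings repeat the input to the embedding model.")
print(emb.shape)
```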
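And a minimal numpy sketch of the ternary weight quantization behind "The Era of 1-bit LLMs": weights are scaled by their mean absolute value, then rounded and clipped to {-1, 0, 1}. This illustrates the weight representation only; the paper trains models with this constraint rather than quantizing an existing model after the fact.

```python
# Minimal sketch of absmean ternary quantization to {-1, 0, 1} (~1.58 bits per weight).
import numpy as np

def quantize_ternary(w: np.ndarray, eps: float = 1e-6):
    gamma = np.abs(w).mean()                           # per-tensor scale
    w_q = np.clip(np.round(w / (gamma + eps)), -1, 1)  # ternary values
    return w_q.astype(np.int8), gamma

w = np.random.randn(4, 4).astype(np.float32)
w_q, gamma = quantize_ternary(w)
w_approx = w_q * gamma                                 # dequantized approximation of w
print(w_q)
```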
News
- Microsoft reportedly makes AI server gear to cut Nvidia dependence. Microsoft is creating its own AI server hardware to intensify actions to decrease its dependency on Nvidia, according to a source familiar with the matter speaking to The Information.
- ‘Embarrassing and wrong’: Google admits it lost control of image-generating AI. Google has apologized (or come very close to apologizing) for another embarrassing AI blunder this week, an image-generating model that injected diversity into pictures with a farcical disregard for historical context. While the underlying issue is perfectly understandable, Google blames the model for “becoming” oversensitive.
- Is OpenAI the next challenger trying to take on Google Search? The Information says OpenAI is working on web search (partially powered by Bing) that would more directly compete with Google. It’s unclear if it would be standalone, or a part of ChatGPT.
- Transformer Circuits Thread — Updates — February 2024. Researchers at Anthropic have been developing a circuits-based approach to understanding deep neural networks. These circuits seek to pinpoint the model components that are used for particular capabilities. Each month, the team publishes an update on the experiments they have run and what they have learned.
- A new tool targets voter fraud in Georgia — but is it skirting the law? A tech company supported by Trump’s former lawyer is injecting chaos into the state’s vote-counting process
- Democratic political operative admits he commissioned robocall of AI Biden. Steve Kramer said ‘easy-to-use technology’ enabled him to send automated calls while the New Orleans magician says he was paid $150 to make it
- Mistral Large. Mistral Large is our new cutting-edge text generation model. It reaches top-tier reasoning capabilities. It can be used for complex multilingual reasoning tasks, including text understanding, transformation, and code generation. Mistral Large achieves strong results on commonly used benchmarks, making it the world’s second-ranked model generally available through an API (next to GPT-4)
- Scale AI to set the Pentagon's path for testing and evaluating large language models. The company will create a comprehensive T&E framework for generative AI within the Defense Department.
- DatologyAI is building tech to automatically curate AI training datasets. Morcos’ company, DatologyAI, builds tooling to automatically curate datasets like those used to train OpenAI’s ChatGPT, Google’s Gemini, and other GenAI models. The platform can identify which data is most important depending on a model’s application (e.g. writing emails), Morcos claims, in addition to ways the dataset can be augmented with additional data and how it should be batched, or divided into more manageable chunks, during model training.
- Bay Bridge: A supercomputer built for startups. With flexible short-term renting options, San Francisco Compute Company is now providing the lowest-cost H100 training clusters in the world to customers who require intensive computing for AI model training but do not want to commit to long-term agreements. Its first cluster, Angel Island, is operational at the moment, and Bay Bridge will follow shortly. The unique business strategy of SF Compute places a premium on cost and accessibility for AI entrepreneurs without requiring long-term commitments.
- mlabonne/AlphaMonarch-7B. AlphaMonarch-7B is a new DPO merge that retains all the reasoning abilities of the very best merges and significantly improves its conversational abilities. Kind of the best of both worlds in a 7B model.
- LazyAxolotl. This notebook allows you to fine-tune your LLMs using Axolotl and RunPod.
- Apple’s electric car project is dead. After a decade of work, the company is reportedly giving up on its ambitious effort to create an autonomous electric car.
- Expressive Whole-Body Control for Humanoid Robots. UCSD researchers trained robust, socially inclined, expressive policies for humanoid robots. Their videos of unchoreographed dancing on grass are quite amazing.
- Meta plans launch of new AI language model Llama 3 in July, The Information reports. Meta Platforms (META.O) is planning to release Llama 3, the newest version of its large language model, in July; it would give better responses to contentious questions posed by users, The Information reported on Wednesday.
- Tim Cook Says Apple Will ‘Break New Ground’ in Generative AI. Cook said that the company will “break new ground” in generative AI in 2024. “We believe it will unlock transformative opportunities for our users,” said Cook.
- Elon Musk sues OpenAI accusing it of putting profit before humanity. The lawsuit says chief executive Sam Altman’s deal with Microsoft has broken the organization’s mission
- Figure raises $675M at $2.6B valuation. In order to continue developing humanoid robots, Figure, a robotics startup, has secured $675 million from a number of significant investors, including OpenAI.
Resources
- Pearl — A Production-ready Reinforcement Learning AI Agent Library. Pearl is a new production-ready Reinforcement Learning AI agent library open-sourced by the Applied Reinforcement Learning team at Meta. Pearl enables the development of Reinforcement Learning AI agents.
- Large Language Models for Data Annotation: A Survey. A curated list of papers on using LLMs for data annotation.
- Automotive Object Detection with Spiking Neural Networks (SNNs). Spiking Neural Networks are a novel and efficient approach for autonomous vehicles, achieving high performance with up to 85% less energy.
- Berkeley function calling leaderboard. Function calling is when a language model carries out commands by invoking external resources through synthesized function calls; the parameters passed to those functions must be synthesized correctly. This leaderboard evaluates how well models perform on function-calling tasks. (A minimal sketch of the setup appears after this list.)
- FuseChat. FuseChat is a novel approach for combining the strengths of multiple large language models into a single, more capable model without incurring expensive training costs again.
- ShieldLM. ShieldLM is a bilingual (Chinese and English) safety detector that mainly aims to help detect safety issues in LLMs' generations. It aligns with general human safety standards, supports fine-grained customizable detection rules, and provides explanations for its decisions.
- Enable decision-making based on LLM-based simulations. Simulatrex is an open-source project dedicated to generative agent-based modeling (GABM). It uses large language models to provide more accurate simulations.
- Training-Free Long-Context Scaling of Large Language Models. Dual chunk attention is a training-free and effective method for extending the context window of large language models (LLMs) to more than 8x times their original pre-training length. We refer to the Llama-based model with dual chunk attention as ChunkLlama.
- DPO to encourage descriptiveness. A minimal code setup with TRL to tune a model to be more descriptive. (A hedged sketch of this kind of setup appears after this list.)
- Shape suffixes for ML coding. A coding style used at Character AI that makes tensor shapes significantly easier to read directly from variable names. (An example appears after this list.)
- Getting started with MAX Developer Edition. To drastically reduce complexity and accelerate AI implementations, Modular developed the MAX toolset. It is currently accessible.
- Bonito. Bonito is an open-source model for conditional task generation: the task of converting unannotated text into task-specific training datasets for instruction tuning. This repo is a lightweight library for Bonito to easily create synthetic datasets built on top of the Hugging Face transformers and vllm libraries.
- Awesome-LLMs-for-Video-Understanding. This repository collects helpful resources for video understanding with large language models.
- Mist text to speech. Rime's new text-to-speech model has strong conversational capabilities. Unlike earlier models, it can incorporate "ums" and realistic pauses.
- Add your own Ollama models. Guidelines for contributing your own models to the Ollama repository for public usage.
- 2x speed up HF inference with static KV Cache. Faster inference can unlock new use cases. This code proposes a method to accelerate Hugging Face inference with Llama models using a static KV cache. (A hedged sketch of the setup appears after this list.)
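To make the function-calling task that the Berkeley leaderboard evaluates concrete, here is a minimal, library-agnostic sketch: the model is shown a function schema, must synthesize a well-formed call, and the harness parses the arguments and invokes the real function. The schema, the model's JSON output, and get_weather are all hypothetical.

```python
# Minimal sketch of function calling: the model sees a schema, emits a JSON call,
# and the harness validates and executes it. Everything here is a made-up example.
import json

def get_weather(city: str, unit: str = "celsius") -> str:
    return f"22 degrees {unit} in {city}"

schema = {
    "name": "get_weather",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string"},
            "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
        },
        "required": ["city"],
    },
}

# Pretend this string came back from the model given the schema and a user query.
model_output = '{"name": "get_weather", "arguments": {"city": "Berlin", "unit": "celsius"}}'

call = json.loads(model_output)
assert call["name"] == schema["name"]          # correct function chosen
result = get_weather(**call["arguments"])      # correctly synthesized parameters
print(result)
```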
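For the DPO item above, this is a hedged sketch of that kind of TRL setup, assuming a recent TRL release (exact DPOTrainer arguments vary between versions, and the tiny preference dataset is purely illustrative): the "chosen" completions are the more descriptive ones, so the model is nudged toward descriptive answers.

```python
# Hedged sketch of DPO fine-tuning with TRL; argument names may differ across TRL versions.
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer, TrainingArguments
from trl import DPOTrainer

model_name = "gpt2"  # illustrative; any causal LM works
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

# Preference pairs: "chosen" is the more descriptive completion.
pairs = Dataset.from_dict({
    "prompt":   ["Describe a sunset."],
    "chosen":   ["The sky melts into amber and violet as the sun sinks below the hills."],
    "rejected": ["The sun goes down."],
})

trainer = DPOTrainer(
    model=model,
    ref_model=None,   # TRL builds a frozen reference copy when None is passed
    beta=0.1,         # strength of the KL penalty toward the reference model
    args=TrainingArguments(output_dir="dpo-descriptive",
                           per_device_train_batch_size=1,
                           num_train_epochs=1,
                           report_to=[]),
    train_dataset=pairs,
    tokenizer=tokenizer,
)
trainer.train()
```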
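The shape-suffix style mentioned above is easiest to see in code: each tensor name ends with single-letter dimension suffixes (here B = batch, L = sequence length, D = model dimension, V = vocabulary size), so shapes can be read off variable names. A small illustrative example:

```python
# Illustrative example of the shape-suffix naming convention for tensors.
import torch

B, L, D, V = 2, 16, 64, 1000   # batch, sequence length, model dim, vocab size

tokens_BL = torch.randint(0, V, (B, L))
embed_VD = torch.randn(V, D)

x_BLD = embed_VD[tokens_BL]           # (B, L, D): embedded tokens
logits_BLV = x_BLD @ embed_VD.T       # (B, L, V): tied-weight output projection
probs_BLV = logits_BLV.softmax(dim=-1)
print(probs_BLV.shape)                # torch.Size([2, 16, 1000])
```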
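Finally, a hedged sketch of the static-KV-cache setup from the last item, assuming a recent transformers release that supports cache_implementation="static"; the linked code may differ in details. Pre-allocating the cache to a fixed size lets torch.compile build a single optimized graph for the decoding step.

```python
# Hedged sketch: static KV cache + torch.compile for faster generation.
# Requires a transformers version with static-cache support; model choice is illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # the speedup was demonstrated on Llama models
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)

model.generation_config.cache_implementation = "static"  # pre-allocated KV cache
model.forward = torch.compile(model.forward, mode="reduce-overhead", fullgraph=True)

inputs = tokenizer("The benefits of a static KV cache are", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```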
Perspectives
- Sam Altman Wants $7 Trillion. To meet the fast-rising costs of developing generative AI models such as GPT, Sam Altman has proposed raising as much as $7 trillion, indicating an exponential increase in the resources required for further iterations. This goal highlights a critical juncture in the development of AI, balancing the quickening pace of technical progress against its wider effects on safety and societal preparedness.
- Ten AI Insights from Databricks, Anyscale, and Microsoft. This article features interviews with founders of AI-forward firms, covering their perspectives on the emergence of artificial general intelligence (AGI), how to approach LLMs, and basic strategies for entrepreneurs integrating AI into their products.
- What the EU’s tough AI law means for research and ChatGPT. The EU AI Act is the world’s first major legislation on artificial intelligence and strictly regulates general-purpose models.
- Online images amplify gender bias. We find that gender bias is consistently more prevalent in images than text for both female- and male-typed categories. We also show that the documented underrepresentation of women online is substantially worse in images than in text, public opinion, and US census data.
- ChunkLlama. Code and models for dual chunk attention, the training-free method described above for extending the context window of LLMs to more than 8x their original pre-training length.
- distilabel. AI Feedback (AIF) framework for building datasets with and for LLMs.
- StarCoder2. The StarCoder2-15B model is a 15B-parameter model trained on 600+ programming languages from The Stack v2, with opt-out requests excluded.
- The paradox of diffusion distillation. Diffusion models decompose a complex problem, such as image generation, into many smaller ones, such as removing a small amount of noise from an image. Single-step diffusion generation has received a lot of attention; however, it appears to miss the mark. This article examines the diffusion-distillation conundrum and lists the various avenues of inquiry that might be pursued.
Meme of the week
What do you think about it? Was there some news that captured your attention? Let me know in the comments.
If you have found this interesting:
You can look for my other articles, and you can also connect with or reach me on LinkedIn. Check this repository containing weekly updated ML & AI news. I am open to collaborations and projects. You can also subscribe for free to get notified when I publish a new story.
Here is the link to my GitHub repository, where I am collecting code and many resources related to machine learning, artificial intelligence, and more.
or you may be interested in one of my recent articles: