WEEKLY AI NEWS: RESEARCH, NEWS, RESOURCES, AND PERSPECTIVES

ML news: Week 12–18 February

OpenAI releases Sora, Sam Altman seeks trillions for AI chips, Neuralink's first human implant

Salvatore Raieli
15 min read · Feb 19, 2024
Photo by Annie Spratt on Unsplash

The most interesting news, repositories, articles, and resources of the week

Check and star the repository where the news is collected and indexed; the news appears there first.

Single posts are also collected in this Medium list:

Weekly AI and ML news - each week the best of the field


Research

  • Skill Set Optimization: Reinforcing Language Model Behavior via Transferable Skills. Transferring expertise between RL agents has so far proven difficult. This work optimizes an environment-neutral skill set, and its generalization performance is encouraging. https://allenai.github.io/sso/
  • Self-Play Fine-Tuning (SPIN). A new fine-tuning method that starts from a supervised fine-tuned model. At the heart of SPIN lies a self-play mechanism, where the LLM refines its capability by playing against instances of itself: it generates training data from its previous iterations and refines its policy by discerning these self-generated responses from those obtained from human-annotated data (a minimal sketch follows this list). https://github.com/uclaml/SPIN
  • Real-World Fluid Directed Rigid Body Control via Deep Reinforcement Learning. "Box o Flows" addresses the difficulty of replicating complicated fluid dynamics for reinforcement learning (RL) by introducing a novel experimental system for testing RL algorithms in dynamic real-world environments. It demonstrates how model-free RL algorithms can produce complex behaviors from simple rewards, improve data efficiency through offline RL, and open the door to wider RL use in complex systems. https://arxiv.org/pdf/2402.06102.pdf
  • WebLINX. WebLINX is a collection of 100,000 web interactions in conversational format, released to advance research on web navigation guided by language models. https://mcgill-nlp.github.io/weblinx/
  • ImplicitDeepfake: Plausible Face-Swapping through Implicit Deepfake Generation using NeRF and Gaussian Splatting. To produce highly lifelike 3D avatars, this work presents ImplicitDeepfake, a novel method that combines deepfake technology with Gaussian Splatting (GS) and Neural Radiance Fields (NeRF). https://arxiv.org/pdf/2402.06390v1.pdf
  • Text-Driven Image Editing via Learnable Regions. Given an input image and a language description of the edit, this method generates realistic and relevant images without user-specified editing regions. It performs local image editing while preserving the image context, and it can handle multiple objects and long-paragraph descriptions. https://yuanze-lin.me/LearnableRegions_page/
  • Video Annotator. The Video Annotator framework incorporates subject experts directly into the annotation process. By combining human expertise with zero-shot and active learning techniques, this approach increases the accuracy and efficiency of the resulting models.
  • Automated Unit Test Improvement using Large Language Models at Meta. Meta used large language models to generate tests for its code base and found significant gains in overall code quality and test coverage.
  • Meta's V-JEPA model. According to Yann LeCun, VP and Chief AI Scientist at Meta, more data-efficient self-supervised models are required for general intelligence. V-JEPA, which trains models on video to understand parts of the world, is a first step in that direction. The models are publicly available.
  • Extreme Video Compression with Pre-trained Diffusion Models. Researchers have used diffusion models to create a novel video compression technique that produces high-quality video frames at low data rates.
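
To make the self-play idea concrete, here is a minimal sketch of the SPIN objective: a DPO-style pairwise loss in which the previous (frozen) iterate generates synthetic responses and the current model is trained to prefer the human-annotated response over them. The helper names and loss form here are illustrative assumptions, not the authors' exact implementation (see https://github.com/uclaml/SPIN for that).

```python
# Illustrative sketch of one SPIN self-play round (not the official code).
import torch
import torch.nn.functional as F

def spin_loss(model, opponent, tokenizer, prompts, human_responses, beta=0.1):
    """The frozen opponent (previous iterate) generates responses; the current
    model is trained to prefer the human data over these self-generated ones."""
    opponent.eval()
    losses = []

    def total_logp(m, ids):
        # The HF causal-LM loss is the mean token NLL; rescale to a total log-prob.
        return -m(input_ids=ids, labels=ids).loss * (ids.shape[1] - 1)

    for prompt, human in zip(prompts, human_responses):
        # 1) Self-play: the previous model iterate answers the prompt.
        enc = tokenizer(prompt, return_tensors="pt")
        with torch.no_grad():
            synth_ids = opponent.generate(**enc, max_new_tokens=128)
        human_ids = tokenizer(prompt + human, return_tensors="pt").input_ids

        # 2) DPO-style margin: raise the current model's relative likelihood of
        # the human response, lower it for the opponent's own generation.
        margin = beta * (
            (total_logp(model, human_ids) - total_logp(opponent, human_ids).detach())
            - (total_logp(model, synth_ids) - total_logp(opponent, synth_ids).detach())
        )
        losses.append(-F.logsigmoid(margin))
    return torch.stack(losses).mean()
```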

News

  • Laion releases assistant BUD-E. The Laion research group has released an open assistant that runs on a gaming laptop, using highly optimized language models and natural voice. The project's goal is a capable, low-resource personal assistant that is simple to deploy. https://laion.ai/blog/bud-e/
  • OpenAI Hits $2 Billion Revenue Milestone. Microsoft-backed OpenAI reached $2 billion in annualized revenue in December, driven by strong growth of its ChatGPT product. The Information had previously reported annualized revenue of $1.6 billion as of December, up from $1.3 billion as of mid-October.
  • DeepMind framework offers a breakthrough in LLMs' reasoning. Researchers from Google DeepMind and the University of Southern California have unveiled a new approach to enhancing the reasoning abilities of large language models (LLMs). Their 'SELF-DISCOVER' prompting framework, published this week on arXiv and Hugging Face, represents a significant leap beyond existing techniques and could improve the performance of leading models such as OpenAI's GPT-4 and Google's PaLM 2.
  • Meta will start detecting and labeling AI-generated images from other companies. The feature will arrive on Facebook, Instagram, and Threads in the coming months. https://www.techspot.com/news/101779-meta-start-detecting-labeling-ai-generated-images-other.html
  • Stability AI releases Stable Cascade. A new text-to-image model building on the Würstchen architecture. Stable Cascade is exceptionally easy to train and fine-tune on consumer hardware thanks to its three-stage approach. In addition to checkpoints and inference scripts, Stability is releasing scripts for fine-tuning, ControlNet, and LoRA training so users can experiment further with the new architecture, all available on the Stability GitHub page.
  • Memory and new controls for ChatGPT. OpenAI is testing a feature that lets ChatGPT remember facts across conversations; it can be switched off if desired. It allows a higher degree of personalization when interacting with the chat system.
  • Neuralink's first human implant. Elon Musk announced that the first human has received a Neuralink brain implant. https://twitter.com/elonmusk/status/1752098683024220632
  • Apple releases MGIE, an AI model for instruction-based image editing. https://venturebeat.com/ai/apple-releases-mgie-a-revolutionary-ai-model-for-instruction-based-image-editing/
  • Nvidia is now worth as much as the whole Chinese stock market. Bank of America chief investment strategist Michael Hartnett points out that Nvidia's market cap has hit $1.7 trillion, matching the combined value of all Chinese companies listed on the Hong Kong Stock Exchange (H-shares). Nvidia's stock soared 239% in 2023 and is up 41% in 2024, through Thursday.
  • OpenAI Sora. OpenAI revealed a new video-generation model of remarkable quality. It is currently available to red teamers for testing.
  • Lambda Raises $320M To Build A GPU Cloud For AI. Lambda's mission is to build the #1 AI compute platform in the world. To accomplish this, it will need lots of NVIDIA GPUs, ultra-fast networking, lots of data center space, and plenty of new software for AI engineering teams.
  • USPTO says AI models can't hold patents. The United States Patent and Trademark Office (USPTO) published guidance on inventorship for AI-assisted inventions, clarifying that while AI systems can play a role in the creative process, only natural persons (human beings) who make significant contributions to the conception of an invention can be named as inventors. It also rules out using AI models to churn out patent ideas without significant human input.

Resources

  • RLX: Reinforcement Learning with MLX. RLX is a collection of reinforcement learning algorithms implemented in MLX, Apple's new machine learning framework, following CleanRL's implementations.
  • llmware. llmware is a unified framework for developing LLM-based application patterns, including Retrieval-Augmented Generation (RAG). It provides an integrated set of tools that anyone, from beginner to sophisticated AI developer, can use to rapidly build industrial-grade, knowledge-based enterprise LLM applications, with a specific focus on integrating open-source small specialized models and connecting enterprise knowledge safely and securely.
https://arxiv.org/pdf/2402.07625v1.pdf
  • Point Transformer V3. Point Transformer V3 (PTv3) is a simple and effective model for processing 3D point clouds. By emphasizing efficiency and scale over fine-grained design details, it attains faster processing speeds and better memory economy.
  • phidata. Phidata is a toolkit for building AI assistants using function calls. Function calling enables LLMs to accomplish tasks by calling functions and intelligently choosing their next step based on the response, much as humans solve problems.
  • ml-mgie. Apple released code that uses multimodal language models to turn human-provided natural-language instructions into image edits.
  • Lag-Llama: Towards Foundation Models for Probabilistic Time Series Forecasting. Lag-Llama is the first open-source foundation model for time series forecasting.
  • Learning to Fly in Seconds. This repository contains the code for the paper "Learning to Fly in Seconds". It trains end-to-end control policies using deep reinforcement learning; training is done in simulation and finishes within seconds on a consumer-grade laptop, and the trained policies generalize and can be deployed on real quadrotors.
  • Packing Inputs Without Cross-Contamination Attention. Packing training examples by concatenation can improve training efficiency, but handled carelessly it causes cross-contamination, since attention cannot tell where one example ends and the next begins. The community has found that an EOS token is often enough to separate examples, but issues can still arise. This repository offers a Hugging Face implementation that correctly packs input data for popular models (a sketch of the masking idea follows this list).
https://arxiv.org/pdf/2402.07633v1.pdf
  • ZLUDA. ZLUDA lets you run unmodified CUDA applications with near-native performance on AMD GPUs.
  • GenTranslate. GenTranslate is a novel method that leverages large language models to enhance translation quality, focusing on generating the final translation from the best hypotheses produced by foundation models. Tests show the approach outperforms state-of-the-art translation models.
  • Design2Code. Design2Code is an open-source project that converts various web design formats, including sketches, wireframes, Figma, XD, etc., into clean and responsive HTML/CSS/JS code. Upload a design image, and Design2Code automatically generates the code for you. It's that simple!
  • SGLang. SGLang is a structured generation language designed for large language models (LLMs). It makes your interaction with LLMs faster and more controllable by co-designing the frontend language and the runtime system.
  • DALI. This study presents cutting-edge techniques to guarantee that autonomous intelligent agents, which are essential in life-critical applications, remain morally and ethically sound even as they develop.
  • Large World Model (LWM). Project page for LWM, a family of large-context multimodal models trained on long video and language. https://largeworldmodel.github.io/
  • Reor Project. Reor is an AI-powered desktop note-taking app: it automatically links related ideas, answers questions about your notes, and provides semantic search. Everything is stored locally, and you can edit your notes with an Obsidian-like markdown editor.
  • Dinosaur: differentiable dynamics for global atmospheric modeling. Google has released code to support atmospheric modeling; DeepMind's latest weather modeling tools are built around it.
  • Neural Flow. A Python script for plotting the intermediate layer outputs of Mistral 7B. Running the script produces a 512x256 image representing the output at every layer of the model. The concept is straightforward: collect the output tensors from each layer, normalize them between zero and one, and plot these values as a heatmap. The resulting image reveals a surprising amount of structure, and the author has found it enormously helpful for visually inspecting outputs when fine-tuning models (a simplified sketch follows this list).
  • Tabula Rasa: not enough data? Generate them! How you can apply generative AI to tabular data.
  • A practical guide to neighborhood image processing. Love thy neighbors: how a pixel is influenced by its neighbors.
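
A note on the packing item above: the core idea is a block-diagonal causal attention mask, so that tokens from one packed example can never attend to a neighboring example. Here is a minimal NumPy sketch of that mask; the repository itself patches the attention of popular Hugging Face models, and this only shows the concept.

```python
import numpy as np

def packed_attention_mask(lengths):
    """Causal attention mask for a packed sequence, where each example
    attends only to its own (earlier) tokens, never to its neighbors."""
    total = sum(lengths)
    mask = np.zeros((total, total), dtype=bool)
    start = 0
    for n in lengths:
        # Causal (lower-triangular) attention within the example's own block.
        mask[start:start + n, start:start + n] = np.tril(np.ones((n, n), dtype=bool))
        start += n
    return mask

# Three examples of lengths 3, 2, and 4 packed into one sequence of 9 tokens:
print(packed_attention_mask([3, 2, 4]).astype(int))
```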
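
And for the Neural Flow item: the recipe it describes (collect each layer's output, normalize to [0, 1], plot as a heatmap) is easy to reproduce with any Hugging Face causal LM. Below is a simplified sketch that mean-pools each layer over the sequence rather than reproducing the original 512x256 layout; the model name is the one the item mentions, but any causal LM works.

```python
import torch
import matplotlib.pyplot as plt
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "mistralai/Mistral-7B-v0.1"  # matches the item; swap in a smaller model to test
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, output_hidden_states=True)

ids = tok("The quick brown fox", return_tensors="pt").input_ids
with torch.no_grad():
    hidden = model(ids).hidden_states  # (num_layers + 1) tensors of (1, seq, dim)

# One row per layer: mean-pool over the sequence, then min-max normalize to [0, 1].
rows = torch.stack([h[0].mean(dim=0) for h in hidden]).float()
rows = (rows - rows.min()) / (rows.max() - rows.min())

plt.imshow(rows.numpy(), aspect="auto", cmap="viridis")
plt.xlabel("hidden dimension")
plt.ylabel("layer")
plt.savefig("neural_flow.png")
```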

Perspectives

  • AI agents as a new distribution channel. By making purchase decisions on behalf of customers, AI agents are starting to emerge as a new distribution channel that could level the playing field between startups and established players. As this trend develops, businesses will need to tailor their products to AI tastes rather than human ones, altering the conventional dynamics of product discovery, appraisal, and purchase. It portends a time when agent-driven commerce may completely change the way goods are advertised and bought.
  • Thinking about High-Quality Human Data. This piece looks at how humans produce training data, covering labeling, annotation, and the gathering of preference data, among other topics.
  • AI Aesthetics. Artificial intelligence will radically transform the way we create, appreciate, and produce art. This article delves into the topic and identifies the businesses spearheading the shift.
  • NYC: Brain2Music. A research talk from Google on reconstructing music from a person's brain activity.
  • Massed Muddler Intelligence. The idea of massed muddler intelligence (MMI) represents a move away from conventional monolithic AI scaling toward a paradigm based on distributed, agent-based systems that learn and adapt in real time. Rather than accumulating ever-larger datasets and computational resources, MMI promotes AI development that stresses scalable, interactive agents with a degree of autonomy and mutual governance, based on the principles of embodiment, boundary intelligence, temporality, and personhood.
  • AI Could Actually Help Rebuild The Middle Class. AI doesn't have to be a job destroyer; it offers us the opportunity to extend expertise to a larger set of workers.
  • Letter from the YouTube CEO: 4 Big bets for 2024. YouTube is investing in diverse revenue streams for creators: the platform saw a 50% increase in the use of channel memberships, it is building creator support networks through programs like the Creator Collective, and it is working to help policymakers appreciate and respect the economic and entertainment value of creators.

Meme of the week

What do you think about it? Did some news capture your attention? Let me know in the comments.

If you have found this interesting:

You can look for my other articles and connect with or reach me on LinkedIn. Check this repository, which contains weekly updated ML & AI news. I am open to collaborations and projects.

Here is the link to my GitHub repository, where I am collecting code and many resources related to machine learning, artificial intelligence, and more.

Or you may be interested in one of my recent articles:


Salvatore Raieli

Senior data scientist | about science, machine learning, and AI. Top writer in Artificial Intelligence