WEEKLY AI NEWS: RESEARCH, NEWS, RESOURCES, AND PERSPECTIVES
ML news: Week 12–18 February
OpenAI releases Sora, Sam Altman seeks trillions for AI chips, Neuralink's first human implant
The most interesting news, repositories, articles, and resources of the week
Check and star this repository where the news will be collected and indexed:
You will find the news first in GitHub. Single posts are also collected here:
Research
- Skill Set Optimization: Reinforcing Language Model Behavior via Transferable Skills. Transferring expertise between RL agents has so far proven difficult. This work optimizes an environment-agnostic skill set, and its generalization performance is encouraging.
- Self-Play Fine-Tuning (SPIN). We propose a new fine-tuning method called Self-Play fine-tuning (SPIN), which starts from a supervised fine-tuned model. At the heart of SPIN lies a self-play mechanism, where the LLM refines its capability by playing against instances of itself. More specifically, the LLM generates its training data from its previous iterations, refining its policy by discerning these self-generated responses from those obtained from human-annotated data.
- Real-World Fluid Directed Rigid Body Control via Deep Reinforcement Learning. "Box o Flows" addresses the difficulty of replicating complicated fluid dynamics for reinforcement learning (RL) by introducing a unique experimental system for testing RL algorithms in dynamic real-world environments. It demonstrates how model-free RL algorithms can produce complex behaviors from simple rewards, improve data efficiency through offline RL, and open the door to wider RL use in complex systems.
- WebLINX. WebLINX is a collection of 100,000 web-based interactions in conversational format, released to advance research on web navigation guided by language models.
- ImplicitDeepfake: Plausible Face-Swapping through Implicit Deepfake Generation using NeRF and Gaussian Splatting. To produce incredibly lifelike 3D avatars, this work presents ImplicitDeepfake, a novel method that blends deepfake technology with Gaussian Splatting (GS) and Neural Radiance Fields (NeRFs).
- AutoMathText: Autonomous Data Selection with Language Models for Mathematical Texts. Researchers have developed a novel method to improve language models' mathematical proficiency by letting base models autonomously select high-quality mathematical content.
- Complete Instances Mining for Weakly Supervised Instance Segmentation. Researchers present a novel image segmentation method that identifies particular portions of a picture, such as a dog, using only simple image-level labels. They overcome the difficulty of a network identifying multiple instances of the same object with an innovative technique that improves efficiency and reduces error rates.
- Whispers in the Machine: Confidentiality in LLM-integrated Systems. The increasing integration of large language models with external tools has given rise to new vulnerabilities around data leakage. This research offers a systematic way to assess how effectively various AI systems protect confidentiality.
- This AI learned language by seeing the world through a baby's eyes. An artificial intelligence (AI) model has learned to recognize words such as 'crib' and 'ball' by studying headcam recordings covering a tiny fraction of a single baby's life.
- World Model on Million-Length Video and Language with RingAttention. Using ring attention and an optimized 7B-parameter model, this system can correctly answer queries over videos up to a million tokens long. It performs exceptionally well on retrieval benchmarks and beats commercial VLMs.
- LUMIERE — A Space-Time Diffusion Model for Video Generation. A new text-to-video model from Google that accepts images and style references as input. It generates the entire video at once via a new "space-time UNet."
- SEINE: Short-to-Long Video Diffusion Model for Generative Transition and Prediction. Guided by textual descriptions, SEINE is a novel video diffusion model that can expand short AI-generated video clips into longer, narrative-level segments with smooth and creative scene transitions.
- Text-Driven Image Editing via Learnable Regions. Given an input image and a language description for editing, our method can generate realistic and relevant images without the need for user-specified regions for editing. It performs local image editing while preserving the image context. Our method can also handle multiple-object and long-paragraph scenarios.
- Video Annotator. The Video Annotator framework incorporates subject-matter experts directly into the annotation process. Combining human expertise with zero-shot and active learning techniques improves the model's accuracy and efficiency.
- Automated Unit Test Improvement using Large Language Models at Meta. Meta used large language models to generate tests for its code base and found significant gains in test coverage and overall code quality.
- Meta’s V-JEPA model. According to Yann LeCun, VP and Chief AI Scientist at Meta, more data-efficient self-supervised models are required for general intelligence. This approach, which uses models trained on video to comprehend parts of the world, is a first step in that direction. The models can be accessed by the general public.
- Extreme Video Compression with Pre-trained Diffusion Models. Diffusion models have been used by researchers to create a novel video compression technique that produces high-quality video frames at low data rates.
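The SPIN entry above describes an objective in which the model's own previous-iteration generations are distinguished from human-annotated responses. A minimal numeric sketch of that idea, assuming a DPO-style logistic loss over policy log-probabilities (the function name and toy log-probabilities here are illustrative, not the paper's code):

```python
import math

def spin_loss(logp_theta_human, logp_old_human,
              logp_theta_self, logp_old_self, beta=0.1):
    """One SPIN-style training term: push the current policy (theta) to
    assign relatively higher likelihood to the human-annotated response
    than to a response generated by the previous-iteration model."""
    margin = beta * ((logp_theta_human - logp_old_human)
                     - (logp_theta_self - logp_old_self))
    # Logistic loss on the margin; minimized when the margin is large.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Toy check: when the current policy already prefers the human response,
# the loss drops below log(2), the value at a zero margin.
loss = spin_loss(logp_theta_human=-1.0, logp_old_human=-2.0,
                 logp_theta_self=-3.0, logp_old_self=-2.0, beta=1.0)
```

Iterating this (regenerate self-play data with the updated model, then retrain) is what lets SPIN improve without additional human annotations.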
News
- Laion releases assistant BUD-E. An open assistant that runs on a gaming laptop and utilizes highly optimized language models and natural voice has been made available by the Laion research group. The project’s goal is to offer a capable, low-resource personal assistant that is simple to deploy.
- OpenAI Hits $2 Billion Revenue Milestone. Microsoft-backed OpenAI hit the $2 billion revenue milestone in December. The company's annualized revenue topped $1.6 billion in December on strong growth from its ChatGPT product, up from $1.3 billion as of mid-October, The Information had previously reported.
- AI PCs will make up nearly 60% of total PC shipments by 2027. Demand for AI PCs is expected to start ramping up this year.
- The first human received an implant from Neuralink yesterday and is recovering well. Initial results show promising neuron spike detection.
- Reka Flash: An Efficient and Capable Multimodal Language Model. Reka Flash is a state-of-the-art 21B model trained entirely from scratch and pushed to its absolute limits. It serves as the “turbo-class” offering in our lineup of models. Reka Flash rivals the performance of many significantly larger models, making it an excellent choice for fast workloads that require high quality. On a myriad of language and vision benchmarks, it is competitive with Gemini Pro and GPT-3.5.
- Apple releases ‘MGIE’, a revolutionary AI model for instruction-based image editing. Apple has released a new open-source AI model, called “MGIE,” that can edit images based on natural language instructions. MGIE, which stands for MLLM-Guided Image Editing, leverages multimodal large language models (MLLMs) to interpret user commands and perform pixel-level manipulations. The model can handle various editing aspects, such as Photoshop-style modification, global photo optimization, and local editing.
- DeepMind framework offers a breakthrough in LLMs' reasoning. A breakthrough approach to enhancing the reasoning abilities of large language models (LLMs) has been unveiled by researchers from Google DeepMind and the University of Southern California. Their new 'SELF-DISCOVER' prompting framework — published this week on arXiv and Hugging Face — represents a significant leap beyond existing techniques, potentially revolutionizing the performance of leading models such as OpenAI's GPT-4 and Google's PaLM 2.
- Meta will start detecting and labeling AI-generated images from other companies. The feature will arrive on Facebook, Instagram, and Threads in the coming months
- Stability and Würstchen release new text-to-image model. Stable Cascade is a new text-to-image model building upon the Würstchen architecture. It is exceptionally easy to train and fine-tune on consumer hardware thanks to its three-stage approach. In addition to checkpoints and inference scripts, Stability is releasing scripts for fine-tuning, ControlNet, and LoRA training so users can further experiment with the new architecture; everything can be found on the Stability GitHub page.
- Memory and new controls for ChatGPT. OpenAI is testing a new feature that allows ChatGPT to remember facts across conversations. This can be switched off if desired. It will allow for a higher measure of personalization when interacting with the chat system.
- Report: Sam Altman seeking trillions for AI chip fabrication from UAE, others. On Thursday, The Wall Street Journal reported that OpenAI CEO Sam Altman is in talks with investors to raise $5 trillion to $7 trillion for AI chip manufacturing, according to people familiar with the matter. The funding seeks to address the scarcity of graphics processing units (GPUs) crucial for training and running large language models like those that power ChatGPT, Microsoft Copilot, and Google Gemini.
- Meta to deploy in-house custom chips this year to power AI drive. Facebook owner Meta Platforms plans to deploy into its data centers this year a new version of a custom chip aimed at supporting its artificial intelligence (AI) push, according to an internal company document seen by Reuters on Thursday.
- Google Launches €25 Million AI Opportunity Initiative for Skills Training Across Europe. By investing in AI literacy, infrastructure, and partnerships across sectors, the company hopes to empower broad segments of the workforce with valuable future-proof skills.
- The brain area that lights up in prickly people. Those who are quick to take offense show similar levels of activity in a region of the brain that’s crucial for decision-making.
- Disrupting malicious uses of AI by state-affiliated threat actors. OpenAI discovered and terminated accounts affiliated with nation-states that were using GPT models for malicious purposes.
- Andrej Karpathy is leaving OpenAI again — but he says there was no drama. Andrej Karpathy, a widely respected research scientist, announced today that he has left OpenAI. This is the second time Karpathy has left the top AI firm and his departure is not because of any event, issue, or drama, he said.
- NVIDIA’s new AI chatbot runs locally on your PC. NVIDIA just released a free demo version of a chatbot that runs locally on your PC. This is pretty neat, as it gives the chatbot access to your files and documents. You can feed Chat with RTX a selection of personal data and have it create summaries based on that information. You can also ask it questions, just like any chatbot, and dive into your data for answers.
- MAGNeT: Masked Audio Generation using a Single Non-Autoregressive Transformer. Meta unveiled an advanced open-source audio model that is 7 times faster than competing models without compromising on quality. It can produce sound effects and music, and the paper is now available.
- MIMIR. A Python package for measuring memorization in LLMs.
- Nvidia is now worth as much as the whole Chinese stock market. Nvidia is now worth the same as the whole Chinese stock market as defined by Hong Kong-listed H-shares, Bank of America chief investment strategist Michael Hartnett pointed out in a new note. The company’s market cap has hit $1.7 trillion, the same as all Chinese companies listed on the Hong Kong Stock Exchange. Nvidia’s stock soared 239% in 2023 and is up 41% in 2024, through Thursday.
- OpenAI Sora. OpenAI revealed Sora, a new video-generation model of amazing quality. Red teamers currently have access to test it.
- Lambda Raises $320M To Build A GPU Cloud For AI. Lambda’s mission is to build the #1 AI compute platform in the world. To accomplish this, we’ll need lots of NVIDIA GPUs, ultra-fast networking, lots of data center space, and lots of great new software to delight you and your AI engineering team.
- USPTO says AI models can’t hold patents. The United States Patent and Trademark Office (USPTO) published guidance on inventorship for AI-assisted inventions, clarifying that while AI systems can play a role in the creative process, only natural persons (human beings) who make significant contributions to the conception of an invention can be named as inventors. It also rules out using AI models to churn out patent ideas without significant human input.
Resources
- RLX: Reinforcement Learning with MLX. RLX is a collection of Reinforcement Learning algorithms implemented based on the implementations from CleanRL in MLX, Apple’s new Machine Learning framework.
- llmware. llmware is a unified framework for developing LLM-based application patterns including Retrieval Augmented Generation (RAG). This project provides an integrated set of tools that anyone can use — from a beginner to the most sophisticated AI developer — to rapidly build industrial-grade, knowledge-based enterprise LLM applications with a specific focus on making it easy to integrate open-source small specialized models and connecting enterprise knowledge safely and securely.
- Point Transformer V3. Point Transformer V3 (PTv3) is a simple and efficient model for processing 3D point clouds. By emphasizing efficiency and scale over fine-grained design details, it attains faster processing speeds and improved memory efficiency.
- phidata. Phidata is a toolkit for building AI Assistants using function calls. Function calling enables LLMs to achieve tasks by calling functions and intelligently choosing their next step based on the response, just like how humans solve problems.
- ml-mgie. Apple released code that uses multimodal language models to perform image edits from human-provided natural-language instructions.
- Lag-Llama: Towards Foundation Models for Probabilistic Time Series Forecasting. Lag-Llama is the first open-source foundation model for time series forecasting!
- Learning to Fly in Seconds. This repository contains the code for the paper Learning to Fly in Seconds. It allows training end-to-end control policies using deep reinforcement learning. Training is done in simulation and finishes within seconds on a consumer-grade laptop. The trained policies generalize and can be deployed on real quadrotors.
- Packing Inputs Without Cross-Contamination Attention. Packing concatenated examples into one sequence during training improves efficiency, but handled naively it can cause cross-contamination because attention does not know where one example ends and the next begins. The community has found that EOS tokens often suffice, but issues can still arise. This repository offers a Hugging Face implementation that correctly packs input data for popular models.
- ZLUDA. ZLUDA lets you run unmodified CUDA applications with near-native performance on AMD GPUs.
- GenTranslate. GenTranslate is a novel method that leverages large language models to enhance translation quality by selecting the best translations produced by foundation models. Tests show the approach outperforms state-of-the-art translation models.
- Design2Code. Design2Code is an open-source project that converts various web design formats, including sketches, wireframes, Figma, XD, etc., into clean and responsive HTML/CSS/JS code. Just upload your design image, and Design2Code will automatically generate the code for you. It’s that simple!
- SGLang. SGLang is a structured generation language designed for large language models (LLMs). It makes your interaction with LLMs faster and more controllable by co-designing the frontend language and the runtime system.
- DALI. This study presents cutting-edge techniques to guarantee that autonomous intelligent agents, which are essential in safety-critical applications, remain morally and ethically sound even as they evolve.
- Reor Project. Reor is an AI-powered desktop note-taking app: it automatically links related ideas, answers questions on your notes, and provides semantic search. Everything is stored locally and you can edit your notes with an Obsidian-like markdown editor.
- Dinosaur: differentiable dynamics for global atmospheric modeling. Google has released code to support atmospheric modeling; DeepMind's latest weather modeling tools are built around it.
- Neural Flow. This is a Python script for plotting the intermediate layer outputs of Mistral 7B. When you run the script, it produces a 512x256 image representing the output at every layer of the model. The concept is straightforward: collect the output tensors from each layer, normalize them between zero and one, and plot these values as a heatmap. The resulting image reveals a surprising amount of structure. I have found this enormously helpful for visually inspecting outputs when fine-tuning models.
- Tabula Rasa: not enough data? Generate it! How you can apply generative AI to tabular data.
- A practical guide to neighborhood image processing. Love thy neighbors: how neighboring pixels influence a given pixel.
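The cross-contamination problem in the packing entry above comes down to the attention mask: a packed sequence needs a block-diagonal causal mask so that tokens never attend across example boundaries. A minimal sketch of building such a mask from the packed example lengths (the function name is hypothetical, not this repository's API):

```python
import numpy as np

def packed_attention_mask(segment_lengths):
    """Block-diagonal causal mask for a packed sequence: token i may attend
    to token j only when j <= i AND both tokens belong to the same packed
    example. True means attention is allowed."""
    total = sum(segment_lengths)
    # Assign each position the index of the example it came from.
    seg_ids = np.repeat(np.arange(len(segment_lengths)), segment_lengths)
    same_segment = seg_ids[:, None] == seg_ids[None, :]
    causal = np.tril(np.ones((total, total), dtype=bool))
    return same_segment & causal

# Two examples of lengths 3 and 2 packed into one length-5 sequence:
# token 3 (the first token of the second example) cannot see tokens 0-2.
mask = packed_attention_mask([3, 2])
```

Without the `same_segment` term this reduces to the ordinary causal mask, which is exactly the naive packing that lets later examples attend to earlier, unrelated ones.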
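The Neural Flow entry above spells out its algorithm: collect each layer's output tensor, min-max normalize per layer, and stack the rows into a heatmap. A few-line sketch of that idea, with random arrays standing in for real Mistral 7B hidden states:

```python
import numpy as np

def layer_heatmap(hidden_states):
    """Stack per-layer outputs into one image: each row is a layer's
    activations, min-max normalized to [0, 1] independently per layer."""
    rows = []
    for h in hidden_states:                       # h: (hidden_dim,) array
        lo, hi = h.min(), h.max()
        rows.append((h - lo) / (hi - lo + 1e-8))  # normalize to [0, 1]
    return np.stack(rows)                         # (num_layers, hidden_dim)

# Stand-in for the 32 transformer layers of a Mistral-7B-sized model.
img = layer_heatmap([np.random.randn(128) for _ in range(32)])
```

Plotting `img` with any heatmap function then reveals the layer-by-layer structure the script's author describes.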
Perspectives
- AI agents as a new distribution channel. By making purchase decisions on behalf of customers, AI agents are starting to emerge as a new distribution channel that might level the playing field between startups and incumbents. As this trend develops, businesses will need to adapt their products to cater to AI preferences rather than human ones, altering the conventional dynamics of product discovery, evaluation, and purchase. The rise of AI agents portends a time when agent-driven commerce may completely change the way goods are marketed and bought.
- Thinking about High-Quality Human Data. This piece covers how people generate data, including labeling, annotation, and gathering preference data, among other topics.
- AI Aesthetics. Artificial Intelligence will radically transform the way we create, appreciate, and produce art. This article delves deeper into this topic and identifies the businesses spearheading the shift.
- NYC: Brain2Music. Research talk from Google about reading music from a person’s brain.
- Massed Muddler Intelligence. The idea of massed muddler intelligence, or MMI, represents a move away from conventional monolithic AI scaling and toward a paradigm based on distributed, agent-based systems that learn and adapt in real time. Rather than the current focus on accumulating larger datasets and computational resources, MMI promotes AI development that stresses scalable, interactive agents with a degree of autonomy and mutual governance. The approach is based on the principles of embodiment, boundary intelligence, temporality, and personhood.
- AI Could Actually Help Rebuild The Middle Class. AI doesn’t have to be a job destroyer. It offers us the opportunity to extend expertise to a larger set of workers.
- Letter from the YouTube CEO: 4 Big bets for 2024. YouTube is investing in diverse revenue streams for creators. The platform witnessed a 50% increase in the use of channel memberships, and it is building creator support networks through programs like the Creator Collective. It is also working to help policymakers appreciate the economic and entertainment value of creators.
- Meta’s AI Chief Yann LeCun on AGI, Open-Source, and AI Risk. Ahead of the award ceremony in Dubai, LeCun sat down with TIME to discuss the barriers to achieving “artificial general intelligence” (AGI), the merits of Meta’s open-source approach, and what he sees as the “preposterous” claim that AI could pose an existential risk to the human race.
- Deepfakes, trolls and cybertroopers: how social media could sway elections in 2024. Faced with data restrictions and harassment, researchers are mapping out fresh approaches to studying social media’s political reach.
- Why “Chat over Your Data” Is Harder Than You Think. Contrary to popular belief, developing chat-based, domain-specific LLM applications and copilots is challenging. Achieving strong performance, managing intricate queries and data, and providing robust data retrieval for LLM-based chat apps are a few of the difficulties.
Meme of the week
What do you think? Did some news capture your attention? Let me know in the comments.
If you have found this interesting:
You can look for my other articles, and you can also connect with or reach me on LinkedIn. Check this repository containing weekly updated ML & AI news. I am open to collaborations and projects.
Here is the link to my GitHub repository, where I am collecting code and many resources related to machine learning, artificial intelligence, and more.
or you may be interested in one of my recent articles: