WEEKLY AI NEWS: RESEARCH, NEWS, RESOURCES, AND PERSPECTIVES
ML news: Week 11–17 March
Devin arrives, Google is set to revolutionize search against spam, and much more
The most interesting news, repositories, articles, and resources of the week
Check and star the repository where the news is collected and indexed.
You will find the news first on GitHub; single posts are also collected there.
Research
- Yi: Open Foundation Models by 01.AI. The Yi model has long been one of the strongest open language models. The team has published a report with significant new detail on how they collect data and train their models.
- From One to Many: Expanding the Scope of Toxicity Mitigation in Language Models. This research takes on the task of minimizing harmful content in AI across many languages, using translation to extend safety measures to situations where direct data is unavailable.
- Plum: Prompt Learning using Metaheuristic. This research presents metaheuristics, a broad class of more than 100 discrete optimization techniques, as a powerful tool for improving prompt learning in large language models.
- ViewFusion: Towards Multi-View Consistency via Interpolated Denoising. A new technique called ViewFusion aims to improve how diffusion models generate images from novel viewpoints while maintaining consistency across views.
- Functional Benchmarks for Robust Evaluation of Reasoning Performance, and the Reasoning Gap. Reveals a reasoning gap of 58.35% to 80.31% between current models and the proposed functional benchmarks for evaluating the reasoning abilities of LLMs. The authors note, however, that these gaps may narrow with more advanced prompting techniques.
- Can Large Language Models Reason and Plan? A recent position paper covers the subject of reasoning and planning for LLMs. An overview of the author’s findings: “In summary, I don’t have any strong evidence from anything I’ve read, checked, or done to suggest that LLMs engage in typical reasoning or planning. Instead, they use web-scale training to perform a type of universal approximate retrieval, which is sometimes confused for reasoning abilities, as I have explained.”
- KnowAgent: Knowledge-Augmented Planning for LLM-Based Agents. Introduces KnowAgent, a novel approach designed to enhance the planning capabilities of LLMs by incorporating explicit action knowledge. Specifically, KnowAgent employs an action knowledge base and a knowledgeable self-learning strategy to constrain the action path during planning, enabling more reasonable trajectory synthesis and thereby improving the planning performance of language agents.
- Stealing Stable Diffusion Prior for Robust Monocular Depth Estimation. The new Stealing Stable Diffusion (SSD) method improves monocular depth estimation in challenging settings such as low-light or adverse-weather conditions.
- VideoElevator: Elevating Video Generation Quality with Versatile Text-to-Image Diffusion Models. VideoElevator presents a plug-and-play method that improves text-to-video diffusion models by drawing on the strengths of text-to-image models. It splits the enhancement process into two parts, refining temporal motion and improving spatial quality, yielding videos with better frame quality and text alignment.
- Face2Diffusion for Fast and Editable Face Personalization. Face2Diffusion is an approach to fast, editable face personalization in text-to-image diffusion models, encoding a subject’s facial identity so it can be inserted into generated images while prompts remain free to edit attributes such as expression, style, and scene.
- Stealing Part of a Production Language Model. By querying their public APIs, you can recover parts of closed language models, such as the embedding projection layer, on a budget of less than $2,000.
- Caduceus: Bi-Directional Equivariant Long-Range DNA Sequence Modeling. A DNA sequence prediction model built on Mamba, the rival architecture to the Transformer. For a small model, it is remarkably powerful and efficient.
- V3D: Video Diffusion Models are Effective 3D Generators. This research presents a method that leverages video diffusion models to create detailed, high-quality 3D objects from a single image.
- A generalist AI agent for 3D virtual environments. New research on a Scalable Instructable Multiworld Agent (SIMA) that can follow natural-language instructions to carry out tasks in a variety of video game settings.
- SSM Meets Video Diffusion Models: Efficient Video Generation with Structured State Spaces. By concentrating on linear memory consumption, this study overcomes the memory limitations of conventional attention-based diffusion models and presents a novel method for producing videos using state-space models (SSMs). As tested with the UCF101 and MineRL Navigate datasets, SSMs allow the generation of lengthier video sequences with competitive quality.
- SemCity: Semantic Scene Generation with Triplane Diffusion. SemCity transforms 3D scene production by emphasizing real-world outdoor environments — a problem that is sometimes disregarded because of how difficult and sparse outdoor data may be.
- Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM. This study demonstrates how to train several models and combine them into a single Mixture-of-Experts model.
- LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code. Evaluating language models trained to code is difficult. Most evaluations use OpenAI’s HumanEval, yet some open models appear to overfit this benchmark. LiveCodeBench measures coding performance while reducing contamination issues.
- Evil Geniuses: Delving into the Safety of LLM-based Agents. ‘Evil Geniuses’ is a virtual team researchers used in a recent study to examine the safety of LLM-based agents. They discovered that these AI agents are less resistant to malicious attacks, give more nuanced answers, and make improper responses harder to detect.
- ViT-CoMer: Vision Transformer with Convolutional Multi-scale Feature Interaction for Dense Predictions. In this work, a novel backbone architecture called ViT-CoMer is presented, which improves on Vision Transformers (ViT) for dense prediction tasks without requiring pre-training.
- MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training. Apple just released a multimodal model and described its training in detail.
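Several of the papers above, notably Branch-Train-MiX, revolve around Mixture-of-Experts layers, where a router sends each token to a few specialized experts and mixes their outputs. The sketch below is an illustrative toy in plain NumPy, not code from any of the papers; all names and shapes are assumptions.

```python
import numpy as np

def moe_forward(x, router_w, experts, k=2):
    """Route each token to its top-k experts and mix their outputs.

    x:        (tokens, d_model) token activations
    router_w: (d_model, n_experts) router projection
    experts:  list of callables, each mapping (d_model,) -> (d_model,)
    """
    logits = x @ router_w                              # (tokens, n_experts)
    probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)         # softmax over experts
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        top = np.argsort(probs[t])[-k:]                # indices of k largest gates
        gate = probs[t, top] / probs[t, top].sum()     # renormalize selected gates
        for g, e in zip(gate, top):
            out[t] += g * experts[e](x[t])
    return out

# Toy demo: two "experts" that transform their input differently.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
router_w = rng.normal(size=(8, 2))
experts = [lambda v: 2.0 * v, lambda v: -1.0 * v]
y = moe_forward(x, router_w, experts, k=1)
print(y.shape)  # (4, 8)
```

Branch-Train-MiX's contribution is obtaining the experts by training separate dense models and merging their feed-forward layers into one such MoE; the routing mechanic itself is the standard top-k gating shown here.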
News
- From Wait Times to Real-Time: Assort Health Secures $3.5 Million to Scale First Generative AI for Healthcare Call Centers. Solution Erases Long Phone Holds for Patients, Supports Overwhelmed Medical Front Desk Workers and Improves Patient Access to Physicians
- OpenAI announces new members to board of directors. Dr. Sue Desmond-Hellmann, Nicole Seligman, Fidji Simo join; Sam Altman rejoins the board
- So long and thanks for all the pixels: Nvidia reportedly retiring the GTX brand for good. Nvidia has stopped producing GPUs based on its Turing architecture. The last of them included the likes of the GTX 1660, 1650, and 1630 series of GPUs. Once remaining stocks sell, they’ll be gone and with them, the “GTX” brand itself, leaving all Nvidia gaming graphics cards as “RTX” models.
- Google’s upcoming Tensor G4 Chip set to rival Snapdragon 8 Gen 4 and Apple A18 Pro. Let’s say you’re a smartphone manufacturer aiming to develop a new model. You have two options: partner with an established chipmaker like Qualcomm or MediaTek or follow the path of Apple by designing your own custom chipset. Google has taken a similar approach, developing its in-house Tensor processors. Recent information suggests the Pixel 9 will feature the Tensor G4 chipset, promising improved heat and power management for an enhanced user experience.
- Microsoft may debut its first ‘AI PCs’ later this month. A report suggests an OLED Surface Pro 10 and Surface Laptop 6 are imminent.
- Looks like we may now know which OpenAI execs flagged concerns about Sam Altman before his ouster. Two OpenAI execs raised concerns about Sam Altman before his ouster, The New York Times reported. The outlet reported that the company’s chief technology officer, Mira Murati, played a key role. Altman returned as CEO in days, leaving many unanswered questions about what happened.
- Cloudflare announces Firewall for AI. Today, Cloudflare is announcing the development of a Firewall for AI, a protection layer that can be deployed in front of Large Language Models (LLMs) to identify abuses before they reach the models.
- Google announces they are tackling spammy, low-quality content on Search. We’re making algorithmic enhancements to our core ranking systems to ensure we surface the most helpful information on the web and reduce unoriginal content in search results. We’re updating our spam policies to keep the lowest-quality content out of Search, like expired websites repurposed as spam repositories by new owners and obituary spam.
- This week, xAI will open-source Grok. Official tweet of Elon Musk
- Covariant is building ChatGPT for robots. The UC Berkeley spinout says its new AI platform can help robots think more like people. Covariant this week announced the launch of RFM-1 (Robotics Foundation Model 1).
- AI solves huge problem holding back fusion power. Princeton researchers have trained an AI to predict and prevent a common problem arising during nuclear fusion reactions — and they think it might be able to solve other problems, too.
- Midjourney bans all Stability AI employees over alleged data scraping. Midjourney blamed a near 24-hour service outage on ‘botnet-like activity’ from two accounts linked to the Stable Diffusion creator.
- Microsoft compares The New York Times’ claims against OpenAI to Hollywood’s early fight against VCR. Microsoft is helping OpenAI fight back against claims of copyright infringement by The New York Times. The news outlet’s lawsuit, filed in December, seeks to hold Microsoft and OpenAI accountable for billions of dollars in damages. In a court filing on Monday, Microsoft accuses the publisher of “unsubstantiated” claims that the use of OpenAI’s technology is harming its business.
- Introducing Devin, the first AI software engineer. Devin, a new system from Cognition, receives a 14% on the difficult SWE-Bench benchmark, which evaluates AI’s capacity for writing code. GPT-4 received a 1.7% score. This model demonstrates excellent contextual learning skills.
- Building Meta’s GenAI Infrastructure. This Meta blog post describes the Llama 3 training infrastructure, covering networking, storage, PyTorch, NCCL, and many other enhancements. It paves the way for Meta’s H100s to come online over the remaining months of this year.
- Physical Intelligence Raises $70M to Build AI-Powered Robots for Any Application. Pi differentiates itself by aiming to create software that can be applied across a wide range of robotics hardware.
- Researchers create AI worms that can spread from one system to another. Worms could potentially steal data and deploy malware. Now, in a demonstration of the risks of connected, autonomous AI ecosystems, a group of researchers has created one of what they claim is the first generative AI worms — which can spread from one system to another, potentially stealing data or deploying malware in the process.
- Perplexity brings Yelp data to its chatbot. Perplexity’s responses can source multiple Yelp reviews for that cafe you were considering, along with location data and other information.
- Gemini now lets you tune and modify responses with a prompt. Google is launching “a more precise way for you to tune Gemini’s responses” on the web app. When selecting (by highlighting) a part of Gemini’s response to your prompt, a pencil/sparkle icon appears to “Modify selected text.” This opens a box with Regenerate, Shorter, Longer, and Remove options, as well as an open text field.
- Microsoft’s neural voice tool for people with speech disabilities arrives later this year. At the Microsoft Ability Summit today, the company is continuing to raise awareness about inclusive design.
- Together AI $106M round of funding. Together AI has raised $106M in a new round of financing led by Salesforce Ventures with participation from Coatue and existing investors.
- Autonomous Vehicle Startup Applied Intuition Hits $6B Valuation After $250M Series E. Autonomous vehicle software developer Applied Intuition locked up a $250 million Series E valuing the company at $6 billion, a 67% uptick in value from its previous round. The deal comes even as venture funding for autonomous vehicle-related startups has declined in recent years.
- OpenAI CTO Says It’s Releasing Sora This Year. But now, OpenAI chief technology officer Mira Murati told the Wall Street Journal that the company will publicly release Sora “later this year.”
- Google now wants to limit the AI-powered search spam it helped create. The ranking update targets sites “created for search engines instead of people.”
- OpenAI Partners With Le Monde And Prisa Media. We have partnered with international news organizations Le Monde and Prisa Media to bring French and Spanish news content to ChatGPT.
- World’s first major act to regulate AI passed by European lawmakers. The European Union’s parliament on Wednesday approved the world’s first major set of regulatory ground rules to govern artificial intelligence, the technology at the forefront of tech investment.
- Figure 01 can now have full conversations with people. Figure’s robots can now hold in-depth discussions with humans thanks to the integration of OpenAI’s technology. While Figure’s neural networks provide quick, low-level dexterous robot operations, OpenAI’s models offer high-level visual and linguistic intelligence. This X post includes a video of a human conversing with a Figure robot, teaching it how to complete tasks, explaining the rationale behind the tasks, and providing a self-evaluation of the activities’ effectiveness.
- Claude 3 Is The Most Human AI Yet. Claude 3, Anthropic’s latest AI model, is distinguished by its “warmth,” which makes it a reliable collaborator on creative writing assignments. Claude 3 is said to feel more human and lifelike, balancing deep reflection with practical good sense. Though technological benchmarks have not fully captured this subtlety, Claude 3 may transform our relationship with AI in creative processes.
Resources
- DeepSpeed-FP6: The Power of FP6-Centric Serving for Large Language Models. A recent upgrade to Microsoft’s robust DeepSpeed training library lets models use as few as six bits per parameter, which can speed up inference by a factor of more than two.
- You can now train a 70b language model at home. A fully open-source system that, for the first time, can efficiently train a 70b large language model on a regular desktop computer with two or more standard gaming GPUs (RTX 3090 or 4090). This system, which combines FSDP and QLoRA, is the result of a collaboration between Answer.AI, Tim Dettmers (U Washington), and Hugging Face’s Titus von Koeller and Sourab Mangrulkar.
- Retrieval-Augmented Generation for AI-Generated Content: A Survey. Gives a summary of RAG’s application in several generative domains, such as code, images, and audio, and includes a taxonomy of RAG enhancements along with citations to important works.
- Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models. Based on public technical reports and reverse engineering, this paper presents a comprehensive review of the model’s background, related technologies, applications, remaining challenges, and future directions of text-to-video AI models.
- SaulLM-7B: A pioneering Large Language Model for Law. With 7 billion parameters, SaulLM-7B is the first LLM designed explicitly for legal text comprehension and generation. Leveraging the Mistral 7B architecture as its foundation, SaulLM-7B is trained on an English legal corpus of over 30 billion tokens.
- A Practical Guide to RAG Pipeline Evaluation (Part 1: Retrieval). Retrieval is a critical and complex subsystem of RAG pipelines. After all, unless your app relies solely on the LLM’s training data, the LLM output is only as good as the information you provide it. The core of measuring retrieval is assessing whether each retrieved result is relevant to a given query.
- C4AI Command-R. C4AI Command-R is a research release of a 35 billion parameter highly performant generative model. Command-R is a large language model with open weights optimized for a variety of use cases including reasoning, summarization, and question-answering. Command-R has the capability for multilingual generation evaluated in 10 languages and highly performant RAG capabilities.
- Artificial Intelligence Controller Interface (AICI). The Artificial Intelligence Controller Interface (AICI) lets you build Controllers that constrain and direct the output of a Large Language Model (LLM) in real time. Controllers are flexible programs capable of implementing constrained decoding, dynamic editing of prompts and generated text, and coordinating execution across multiple, parallel generations.
- US Public Domain Books (English). This dataset contains more than 650,000 English books (~ 61 billion words) presumed to be in the public domain in the US which were digitized by the Internet Archive and cataloged as part of the Open Library project.
- transformer-debugger. Transformer Debugger (TDB) is a tool developed by OpenAI’s Superalignment team with the goal of supporting investigations into specific behaviors of small language models. The tool combines automated interpretability techniques with sparse autoencoders.
- VideoMamba. VideoMamba is a technology that efficiently manages global dependencies and local redundancy to tackle the challenges of video understanding.
- FastV. FastV is a plug-and-play inference acceleration method for large vision language models relying on visual tokens. It could reach a 45% theoretical FLOP reduction without harming the performance through pruning redundant visual tokens in deep layers.
- Maximizing training throughput using PyTorch FSDP. Together, teams from IBM and Meta have achieved 57% MFU while rapidly training powerful models in parallel on huge A100 and H100 clusters.
- MoAI. MoAI is a new large language and vision model that integrates auxiliary visual data from specific computer vision tasks to improve upon existing models.
- superopenai: logging and caching superpowers for the openai SDK. superopenai is a minimal convenience library for logging and caching LLM requests and responses for visibility and rapid iteration during development.
- TripoSR. TripoSR, a state-of-the-art open-source model for fast feedforward 3D reconstruction from a single image, was collaboratively developed by Tripo AI and Stability AI.
- Exploring Alternative UX Patterns for GenAI Interfaces. In the rapidly evolving landscape of GenAI interfaces, it is crucial to venture beyond the established norms. The current dominance of Quick Actions and Multi-Turn engagement patterns in these interfaces, while effective in many cases, should not limit our imagination or hinder the potential for innovation.
- rerankers. Rerankers are an important part of any retrieval architecture, but they’re also often more obscure than other parts of the pipeline. rerankers seeks to address this problem by providing a simple API for all popular rerankers, no matter the architecture.
- skyvern. Skyvern automates browser-based workflows using LLMs and computer vision. It provides a simple API endpoint to fully automate manual workflows, replacing brittle or unreliable automation solutions.
- Licensing AI Means Licensing the Whole Economy. Because artificial intelligence is a process that is essential to many different economic uses, it is not possible to regulate it like a physical thing.
- Enhancing RAG-based application accuracy by constructing and leveraging knowledge graphs. A practical guide to constructing and retrieving information from knowledge graphs in RAG applications with Neo4j and LangChain.
- Pricing sheet with all popular token-based pricing providers and the top-performing models. Pricing and comparison between different LLMs.
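The RAG evaluation guide above frames retrieval measurement as checking whether each retrieved result is relevant to the query. A minimal sketch of the standard precision@k / recall@k checks, in plain Python with illustrative document ids (not code from the guide itself):

```python
def retrieval_metrics(retrieved, relevant, k):
    """Precision@k and recall@k for a single query.

    retrieved: ranked list of document ids returned by the retriever
    relevant:  set of document ids judged relevant for the query
    """
    top_k = retrieved[:k]
    hits = sum(1 for doc in top_k if doc in relevant)
    precision = hits / k                               # how much of the top-k is useful
    recall = hits / len(relevant) if relevant else 0.0 # how much of the useful set we found
    return precision, recall

# Hypothetical judgments: the retriever found d1 and d3, missed d5.
p, r = retrieval_metrics(["d1", "d7", "d3", "d9"], {"d1", "d3", "d5"}, k=4)
print(p, r)  # precision 0.5, recall ≈ 0.667
```

Averaging these over a set of queries with relevance judgments gives a first-order picture of the retriever before any LLM-side evaluation.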
Perspectives
- Winning Strategies for Applied AI Companies. Key Success Factors after reviewing over 70 companies that have raised at least $7M
- AI startups require new strategies: This time it’s actually different. The typical dynamics between startups and incumbents do not apply in AI as they did in previous technology revolutions like mobile and the Internet. Ignore this at your peril.
- The GPT-4 barrier has finally been broken. Four weeks ago, GPT-4 remained the undisputed champion: consistently at the top of every key benchmark, but more importantly the clear winner in terms of “vibes”. Today that barrier has finally been smashed. We have four new models, all released to the public in the last four weeks, that are benchmarking near or even above GPT-4.
- Embrace AI to break down barriers in publishing for people who aren’t fluent in English. E. M. Wolkovich describes having a paper rejected because of an unfounded accusation that ChatGPT was used to write it. We think that both the rejection and the bias against the use of artificial intelligence (AI) in scientific writing are misguided.
- Why scientists trust AI too much — and what to do about it. Some researchers see superhuman qualities in artificial intelligence. All scientists need to be alert to the risks this creates.
- The Future of Poetry. Questions about whether poems were authored by humans or artificial intelligence (AI) were given to 38 AI experts and 39 English experts. Human-written poems took first prize, followed by Bard, ChatGPT-4, and Claude in that order, for both writing quality and the ability to convince respondents that the poetry was written by a human. English specialists were far better at identifying which poems were composed by AI, suggesting they should be more involved in the development of upcoming AI systems.
- Barack Obama on AI, free speech, and the future of the internet. The former president joined me on Decoder to discuss AI regulation, the First Amendment, and of course, what apps he has on his home screen.
- Top AIs still fail IQ tests — When asked to read image-based questions. According to recent testing, sophisticated AI models such as ChatGPT-4 and Google’s “Gemini Advanced” do poorly on visual IQ tests, receiving lower-than-average scores. Although ChatGPT-4 exhibits mediocre pattern recognition abilities, it misidentifies objects visually and makes logical mistakes, indicating a considerable difference in comparison to human intellect. These results suggest that the development of universally intelligent AI systems may still be some way off.
- The Top 100 Gen AI Consumer Apps. Over 40% of the top web products are new, having entered the top 50 in the last six months, according to Andreessen Horowitz’s most recent consumer analysis on the top 100 Gen AI consumer apps.
- This Nvidia Cofounder Could Have Been Worth $70 Billion. Instead, He Lives Off The Grid. If Curtis Priem, Nvidia’s first CTO, had held onto all his stock, he’d be the 16th richest person in America. Instead, he sold out years ago and gave most of his fortune to his alma mater Rensselaer Polytechnic Institute.
- How to thrive in a crowded enterprise AI market. At a Lightspeed event, Arvind Jain, CEO of Glean, spoke on the difficulties and solutions facing corporate AI startups. He emphasized the need to provide genuine business value, being tenacious in hiring, and placing a higher priority on product quality than speed and cost. Jain also emphasized how privacy and security issues have slowed down the deployment of generative AI tools in businesses. Glean wants to become a widely used workplace AI platform that completely transforms how people work by becoming firmly integrated into organizational operations.
- As AI tools get smarter, they’re growing more covertly racist, experts find. ChatGPT and Gemini discriminate against those who speak African American Vernacular English, report shows
Medium articles
A list of the Medium articles I have read and found the most interesting this week:
- Mirko Peters, Unlocking the Power of Machine Learning in C# with Top Libraries
- Stephanie Shen, Consciousness: Concepts, Theories, and Neural Networks
- Everton Gomede, PhD, Robust Watershed Transform: Enhancing Image Segmentation in the Digital Era
- Ignacio de Gregorio, LWM, The Dawn of AI World Models?
- Daniel Parris, What’s the Greatest Year in Oscar History? A Statistical Analysis
- Tejaswi kashyap, Deciphering Mixtral-8x7B: Navigating the Sparse Expert Model Ensemble by Mistral AI
- Benjamin Marie, A Simple Introduction to RoPE for Transformers
Meme of the week
What do you think about it? Some news that captured your attention? Let me know in the comments
If you have found this interesting:
You can look for my other articles, and you can also connect with or reach me on LinkedIn, where I am open to collaborations and projects. Check this repository containing weekly updated ML & AI news. You can also subscribe for free to get notified when I publish a new story.
Here is the link to my GitHub repository, where I am collecting code and many resources related to machine learning, artificial intelligence, and more.
or you may be interested in one of my recent articles: