WEEKLY AI NEWS: RESEARCH, NEWS, RESOURCES, AND PERSPECTIVES

ML news: Week 4–10 March

Musk sues OpenAI (which fights back), Claude 3 and Le Chat released, and much more

Salvatore Raieli
17 min read · Mar 11, 2024
Photo by the blowup on Unsplash

The most interesting news, repositories, articles, and resources of the week

Check and star this repository where the news will be collected and indexed:

You will find the news first on GitHub. Single posts are also collected here:

Weekly AI and ML news - each week the best of the field

49 stories

Research

  • Evo: Long-context modeling from molecular to genome scale. Introducing Evo, a long-context biological foundation model based on the StripedHyena architecture that generalizes across the fundamental languages of biology: DNA, RNA, and proteins. Evo is capable of both prediction tasks and generative design, from molecular to whole genome scale (over 650k tokens in length). Evo is trained at a nucleotide (byte) resolution, on a large corpus of prokaryotic genomic sequences covering 2.7 million whole genomes.
  • Resonance RoPE: Improving Context Length Generalization of Large Language Models. To help LLMs understand and generate text in sequences longer than those they were trained on, researchers have developed a new method dubbed Resonance RoPE. The method improves on the standard Rotary Position Embedding (RoPE) technique, boosting model performance on long texts while using less compute.
  • The All-Seeing Project V2: Towards General Relation Comprehension of the Open World. The All-Seeing Project V2 introduces the ASMv2 model, which blends text generation, object localization, and understanding the connections between objects in images.
  • GPQA: A Graduate-Level Google-Proof Q&A Benchmark. A formidable challenge is posed by a new dataset named GPQA, which contains 448 difficult multiple-choice questions covering physics, chemistry, and biology. Even domain experts struggle, scoring about 65% accuracy, while non-experts reach only 34%. Advanced AI systems such as GPT-4 reach only 39% accuracy. The dataset is intended to support methods for overseeing AI outputs on challenging scientific problems.
  • SURE: SUrvey REcipes for building reliable and robust deep networks. SURE is a strategy that integrates multiple approaches to improve the accuracy of deep neural networks' uncertainty estimates, particularly for image classification tasks.
  • Stable Diffusion 3: Research Paper. Stable Diffusion 3 outperforms state-of-the-art text-to-image generation systems such as DALL·E 3, Midjourney v6, and Ideogram v1 in typography and prompt adherence, based on human preference evaluations. The new Multimodal Diffusion Transformer (MMDiT) architecture uses separate sets of weights for image and language representations, which improves text understanding and spelling capabilities compared to previous versions of Stable Diffusion.
  • Researchy Questions: A Dataset of Multi-Perspective, Decompositional Questions for LLM Web Agents. Language models are now quite good at answering queries, so most benchmarks in use today are saturated. 'Researchy' questions are a new breed of open-ended questions that require multiple steps to answer. This particular dataset is sourced from search engine queries and includes instances where GPT-4 struggled to respond.
  • UniCtrl: Improving the Spatiotemporal Consistency of Text-to-Video Diffusion Models via Training-Free Unified Attention Control. UniCtrl presents a novel method for improving motion quality and semantic coherence in videos produced by text-to-video models. Motion injection and cross-frame self-attention enhance video coherence and realism without requiring further training.
  • 3D Diffusion Policy. DP3 presents a novel method for imitation learning that effectively teaches robots difficult skills by fusing diffusion policies with 3D visual representations.
  • Co-LLM: Learning to Decode Collaboratively with Multiple Language Models. An innovative approach lets multiple large language models collaborate by alternately producing text token by token. This tactic helps models apply their distinct strengths and areas of expertise to a variety of tasks, including following instructions, answering domain-specific questions, and solving reasoning problems.
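As context for the Resonance RoPE item above, here is a minimal sketch of the vanilla Rotary Position Embedding rotation that such methods extend. This is not the paper's method, just the standard RoPE formula applied to a single vector in plain Python:

```python
import math

def rope_rotate(x, position, base=10000.0):
    """Apply Rotary Position Embedding (RoPE) to one vector.

    Pairs of dimensions (x[2i], x[2i+1]) are rotated by an angle
    theta_i = position / base**(2i / d), so the relative offset
    between two positions shows up as a relative rotation between
    their embeddings.
    """
    d = len(x)
    assert d % 2 == 0, "RoPE expects an even embedding dimension"
    out = []
    for i in range(d // 2):
        theta = position / (base ** (2 * i / d))
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        x1, x2 = x[2 * i], x[2 * i + 1]
        out.append(x1 * cos_t - x2 * sin_t)  # rotated first component
        out.append(x1 * sin_t + x2 * cos_t)  # rotated second component
    return out

# The rotation preserves the vector's norm, and position 0 leaves it
# unchanged; long-context methods like Resonance RoPE adjust how the
# angles theta_i behave beyond the training length.
q = [1.0, 0.0, 0.5, 0.5]
print(rope_rotate(q, position=0))
```

Because only the angles depend on position, extending context length comes down to how these rotations generalize past the positions seen in training, which is exactly the failure mode such methods target.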

News

  • Amazon to spend $1 billion on startups that combine AI with robots. Amazon’s $1 billion industrial innovation fund is to step up investments in companies that combine artificial intelligence and robotics, as the e-commerce giant seeks to drive efficiencies across its logistics network.
  • Claude 3 released. Anthropic has trained three new Claude 3 family models, the best of which exceeds the benchmark scores GPT-4 has publicly disclosed. It is a multimodal model and excels at visual tasks. Notably, Claude's coding skills have significantly improved with this release.
  • ChatGPT can read its answers out loud. OpenAI’s new Read Aloud feature for ChatGPT could come in handy when users are on the go by reading its responses in one of five voice options out loud to users. It is now available on both the web version of ChatGPT and the iOS and Android ChatGPT apps.
  • Adobe reveals a GenAI tool for music. Adobe unveiled Project Music GenAI Control, a platform that can generate audio from text descriptions (e.g. “happy dance,” “sad jazz”) or a reference melody and let users customize the results within the same workflow.
  • Brave’s Leo AI assistant is now available to Android users. Brave is launching its AI-powered assistant, Leo, to all Android users. The assistant allows users to ask questions, translate pages, summarize pages, create content, and more. The Android launch comes a few months after Brave first launched Leo on desktop. Brave says Leo will be available on iOS devices in the coming weeks.
  • Inflection-2.5. A new model has been introduced by Inflection to power Pi, its personal assistant. The model achieves remarkable reasoning scores on benchmarks, coming within 94% of GPT-4's average performance while, Inflection claims, requiring only 40% of the training compute. The post also offers an intriguing statistic: a typical conversation with Pi lasts 33 minutes.

Resources

  • Using Claude 3 Opus for video summarization. Andrej Karpathy challenged readers to turn one of his latest long videos into a blog post. Claude 3 completed the job, with help from some data pre-processing; the end product is an excellent and engaging blog post.
  • Dual-domain strip attention for image restoration. A new technique that greatly enhances image restoration tasks is the dual-domain strip attention mechanism.
  • Open-Sora-Plan. This project aims to reproduce Sora (OpenAI's text-to-video model) with limited resources; the maintainers hope the wider open-source community will contribute.
  • ML system design: 300 case studies to learn from. We put together a database of 300 case studies from 80+ companies that share practical ML use cases and learnings from designing ML systems.
  • orca-math-word-problems-200k. This dataset contains ~200K grade-school math word problems. All answers in this dataset were generated using Azure GPT-4 Turbo. Please refer to Orca-Math: Unlocking the Potential of SLMs in Grade School Math for details about the dataset construction.
  • mlx-swift-examples. Apple created the MLX framework, which is used to train AI models on Macs. This repository demonstrates how to use Swift for model training on Apple devices; an MNIST classifier can be trained directly on an iPhone.
  • Text Clustering. A free and open-source text clustering tool that makes it simple and rapid to embed, cluster, and semantically label clusters. On 100k samples, the full pipeline runs in 10 minutes.
  • EasyLM. Large language models (LLMs) made easy: EasyLM is a one-stop solution for pre-training, finetuning, evaluating, and serving LLMs in JAX/Flax. EasyLM can scale LLM training to hundreds of TPU/GPU accelerators by leveraging JAX's pjit functionality.
  • You can now train a 70b language model at home. Today, we're releasing Answer.AI's first project: a fully open-source system that, for the first time, can efficiently train a 70b large language model on a regular desktop computer with two or more standard gaming GPUs (RTX 3090 or 4090). This system, which combines FSDP and QLoRA, is the result of a collaboration between Answer.AI, Tim Dettmers (U Washington), and Hugging Face's Titus von Koeller and Sourab Mangrulkar.
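The Text Clustering entry above describes a three-step pipeline: embed the texts, cluster the embeddings, then semantically label each cluster. A toy, stdlib-only sketch of that pattern follows, with bag-of-words vectors standing in for the neural sentence embeddings a real pipeline would use (the docs, vocabulary, and k-means seeding here are all illustrative assumptions, not the tool's actual implementation):

```python
from collections import Counter

def embed(text, vocab):
    """Bag-of-words vector over a fixed vocabulary: a toy stand-in
    for the neural sentence embeddings a real pipeline would use."""
    counts = Counter(text.lower().split())
    return [float(counts[w]) for w in vocab]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, k, iters=20):
    """Plain k-means, seeded with the first k points as centroids."""
    centroids = [list(v) for v in vectors[:k]]
    assign = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: nearest centroid for every vector.
        for i, v in enumerate(vectors):
            assign[i] = min(range(k), key=lambda c: sq_dist(v, centroids[c]))
        # Update step: move centroids to the mean of their members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assign) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign

docs = [
    "cats purr and cats sleep",
    "dogs bark and dogs run",
    "cats chase mice",
    "dogs fetch sticks",
]
vocab = sorted({w for d in docs for w in d.lower().split()})
labels = kmeans([embed(d, vocab) for d in docs], k=2)

# Semantic labeling step: name each cluster after its most common word.
for c in sorted(set(labels)):
    words = Counter(w for d, l in zip(docs, labels) if l == c for w in d.split())
    print(c, words.most_common(1)[0][0])
```

The real tool swaps each stage for stronger components (learned embeddings, scalable clustering, LLM-generated labels), but the pipeline shape is the same.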
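The 70b-at-home item above leans on QLoRA's 4-bit quantization to shrink the model's memory footprint. As a rough illustration of why that helps, here is a simplified absmax quantize/dequantize round trip; QLoRA's actual NF4 scheme uses a non-uniform codebook and per-block scales, so treat this as a conceptual sketch only:

```python
def quantize_4bit(weights):
    """Absmax 4-bit quantization: map each weight to a signed 4-bit
    integer in [-7, 7] plus one float scale for the block (here, the
    whole list). A simplified stand-in for QLoRA's NF4 scheme; the
    memory-saving story is the same."""
    scale = max(abs(w) for w in weights) / 7.0 or 1.0  # guard all-zero input
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize_4bit(q, scale):
    """Recover approximate float weights from the 4-bit codes."""
    return [v * scale for v in q]

w = [0.7, -0.35, 0.05, -0.7]
q, s = quantize_4bit(w)
w_hat = dequantize_4bit(q, s)

# Each 4-bit code replaces a 16- or 32-bit float, roughly a 4-8x
# reduction, which is what makes a 70b model fit on gaming GPUs
# once FSDP additionally shards the quantized weights across them.
print(q, [round(v, 3) for v in w_hat])
```

Rounding to the nearest code bounds the reconstruction error by half the scale per weight; the low-rank LoRA adapters trained on top absorb the rest.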

Perspectives

  • On the Societal Impact of Open Foundation Models. A position paper centered on open foundation models that discusses their advantages, disadvantages, and effects; it also suggests a framework for risk analysis and explains why, in certain situations, the marginal risk of these models is low. Finally, it provides a more sober evaluation of open foundation models' effects on society.
  • Towards Long Context RAG. The one-million-token context window that Google's Gemini 1.5 Pro has brought to the AI community has sparked a debate about the future viability of retrieval-augmented generation (RAG).
  • Aggregator’s AI Risk. The impact of the Internet, especially through Aggregators like Google and Meta, is comparable to that of the printing press on the spread of knowledge and the establishment of nation-states. However, the rise of generative AI puts the Aggregator model to the test by offering unique answers that embody ingrained worldviews. This could undermine the universal appeal of Aggregator economics and point to the need for a move toward personalized AI to preserve its dominance.
  • Is Synthetic Data the Key to AGI? The caliber of training data has a major impact on how effective large language models are. By 2027, projections indicate that there will be a shortage of high-quality data. A possible answer to this problem is synthetic data generation, which could change internet business models and emphasize the significance of fair data access and antitrust laws.
  • AI Research Internship Search as a CS PhD Student. Tips and thoughts from my relatively successful summer research internship hunt during the third year of my Computer Science PhD.
  • How AI Could Disrupt Hollywood. New platforms and tools may allow a person to create a feature-length film from their living room. But can they really compete with the studios?
  • Training great LLMs entirely from ground zero in the wilderness as a startup. Reka co-founder Yi Tay detailed in a blog post the experience of building very powerful language models outside of Google. The primary obstacles stemmed from hardware instability and cluster issues; software maturity was also a challenge.
  • Claude 3 Is The Most Human AI Yet. Anthropic’s Claude 3, a large language model similar to GPT-4, is notable not so much for its cost-effectiveness or benchmark test results as for its distinctly human-like, creative, and naturalistic interaction quality. This represents a major breakthrough in AI’s capacity to collaborate imaginatively with writers.
  • Licensing AI Means Licensing the Whole Economy. AI is a broad set of statistical approaches, and it would be impractical to control its use across all organizations; regulating AI like a tangible commodity is therefore a mistake. Given AI's imminent economic ubiquity, targeted regulation of particular misuses, akin to current strategies for programming or email abuses, is more effective.
  • Is ChatGPT making scientists hyper-productive? The highs and lows of using AI. Large language models are transforming scientific writing and publishing. However, the productivity boost that these tools bring could have a downside.
  • Artificial intelligence and illusions of understanding in scientific research. Why are AI tools so attractive and what are the risks of implementing them across the research pipeline? Here we develop a taxonomy of scientists’ visions for AI, observing that their appeal comes from promises to improve productivity and objectivity by overcoming human shortcomings.
  • AI will likely increase energy use and accelerate climate misinformation — report. Claims that artificial intelligence will help solve the climate crisis are misguided, warns a coalition of environmental groups.
  • We Need Self-Driving Cars. Anyone rooting against self-driving cars is cheering for tens of thousands of deaths, year after year. We shouldn’t be burning self-driving cars in the streets. We should be celebrating…
  • Subprime Intelligence. Significant problems in OpenAI’s Sora demonstrate the limitations of generative AI’s comprehension. The technology presents both practical obstacles and revolutionary possibilities, as seen in its high computing needs and potential impact on the creative industry.
  • Sora, Groq, and Virtual Reality. A few years ago, Facebook’s drive into the metaverse looked misguided, and the idea of the metaverse seemed like fiction from an Ernest Cline novel. Things feel different now. Groq’s deterministic circuits streamline machine-learning workloads for faster processing, while Sora creates intricate video scenes. Together, these developments bring us one step closer to real-time video simulation and full-fledged virtual reality.
  • AI Is Like Water. For GenAI companies to have a competitive advantage, technology alone is no longer sufficient. This means that since the basic product is virtually the same, GenAI and bottled water are comparable. The primary differentiators need to originate from elements like distribution, user experience, perceived customer value, branding, and marketing.

Meme of the week

What do you think? Was there some news that captured your attention? Let me know in the comments.


Salvatore Raieli

Senior data scientist | writing about science, machine learning, and AI. Top writer in Artificial Intelligence.