WEEKLY AI NEWS: RESEARCH, NEWS, RESOURCES, AND PERSPECTIVES

AI & ML news: Week 31 March — 6 April

OpenAI raises $40B from SoftBank, Isomorphic Labs (DeepMind) raises $600M to push AI-driven drug discovery, Gemini 2.5 Pro by Google, Amazon Nova Act, Satya Nadella says DeepSeek is Microsoft’s new benchmark after $80B AI investment, Earth AI finds minerals in ignored regions using its machine-learning platform, Trump extends TikTok sale deadline to avoid U.S. ban


The most interesting news, repositories, articles, and resources of the week

Check and star this repository where the news will be collected and indexed:

You will find the news first on GitHub. All the Weekly News stories are also collected here:

Weekly AI and ML news - each week the best of the field


Artificial intelligence is transforming our world, shaping how we live and work. Understanding how it works and its implications has never been more crucial. If you’re looking for simple, clear explanations of complex AI topics, you’re in the right place. Hit Follow or subscribe for free to stay updated with my latest stories and insights.

Research

  • Tracing the thoughts of a large language model. Anthropic researchers introduce new interpretability tools for examining LLMs, using Claude 3.5 Haiku as a testbed. Their studies reveal insights into model internals, such as circuits, plans, and conceptual thinking. Key findings include Claude’s multilingual “language of thought,” where concepts like “small” are processed similarly across languages, enabling transfer learning. Claude also plans ahead, even in poetry, and computes sums with parallel circuits, explaining answers using human-style logic. The tools help detect unfaithful reasoning, where Claude fabricates steps to fit answers. Researchers can also intervene in multi-step reasoning, showing that Claude’s reasoning is dynamic. The tools also reveal that Claude’s hallucinations are caused by misfires in circuits and that jailbreaks can bypass safety features temporarily.
  • Harmful Fine-Tuning Attacks. Researchers have identified weaknesses in current defenses against harmful fine-tuning attacks and introduced Panacea, an adaptive perturbation method that maintains model safety without compromising fine-tuning performance.
  • AgentRxiv. Researchers from Johns Hopkins and ETH Zurich introduce AgentRxiv, a framework that allows LLM agents to autonomously generate and share research papers, similar to how human scientists collaborate. The system functions like an open-source preprint server for agents, enabling labs to upload, search, and refine papers iteratively. Using this framework, a single agent improved GPT-4o mini’s accuracy by 11.4% on the MATH-500 benchmark through better prompt strategies. The framework also improved other benchmarks, showing consistent performance gains across multiple LLMs. Collaboration between agent labs led to faster progress, with higher accuracy achieved by sharing results via AgentRxiv. Agents refine their own ideas without plagiarism, but the system requires further improvements in reliability and novelty guarantees.
  • Neural Alignment via Speech Embeddings. Google Research and collaborators reveal significant similarities between LLM embeddings and human brain activity during conversation. Their findings show that embeddings from OpenAI’s Whisper model align with brain signals in regions responsible for speech, language, and motor planning. The study suggests a “soft hierarchy” in brain areas, with overlapping processing of speech and language. Brain regions also predict upcoming words, mirroring autoregressive LLM behavior. Additionally, the geometry of word relationships in brain activity reflects that of LLM embeddings, indicating convergent structures in language representation. Despite architectural differences — brains process speech serially, while Transformers process in parallel — these studies highlight potential for using LLMs to reverse-engineer the brain’s language mechanisms and inspire more brain-like AI models.
  • Unlearning Sensitive Content from LLMs. This paper introduces a model merging technique that enables selective forgetting of sensitive content in large language models while retaining their general knowledge.
  • Chain-of-Tools: Utilizing Massive Unseen Tools in the CoT Reasoning of Frozen Language Models. The paper introduces Chain-of-Tools (CoTools), a method allowing LLMs to incorporate external tools, including unseen ones, while maintaining chain-of-thought (CoT) reasoning. CoTools keeps the LLM’s parameters frozen and fine-tunes additional modules (Tool Judge and Tool Retriever) to interact with a wide array of tools. It represents tools as semantic vectors, allowing even unfamiliar tools to be used without retraining the model. CoTools integrates tool calls within the reasoning process, selecting the best tool from many based on query context, improving accuracy on complex tasks. Experiments on various benchmarks show significant improvements in tool-selection accuracy and overall performance, with CoTools successfully handling large and unseen toolsets (a minimal sketch of vector-based tool selection follows this list).
  • Structured Memory Augmentation for Smarter LLM Agents. MemInsight is a framework that autonomously enhances and organizes memory for LLM agents, improving context retention and retrieval. It uses a backbone LLM to mine and structure memory attributes from past conversations, organizing them into entity- and conversation-centric augmentations. MemInsight outperforms traditional retrieval methods, achieving up to 34% higher recall on the LoCoMo QA dataset compared to Dense Passage Retrieval (DPR). It also improves movie recommendations by matching genres and reducing memory size by 90%, while increasing persuasive outputs by 12%. MemInsight can summarize long conversations using memory alone, achieving coherence similar to raw-dialogue baselines. The system shows minimal hallucinations and stable performance, particularly when using carefully selected models for memory augmentation (a toy sketch of attribute-centric memory follows this list).
  • Anthropic Economic Index: Insights from Claude 3.7 Sonnet. Anthropic has launched the Economic Index, leveraging Claude 3.7 Sonnet to evaluate AI’s impact across various job sectors. The model analyzes productivity changes, automation risks, and labor market trends, offering data-driven insights to help policymakers and businesses adapt to AI-driven economic transformations.
  • Investigating Affective Use and Emotional Well-being on ChatGPT. Researchers from OpenAI and MIT Media Lab examine the impact of emotionally engaging interactions with ChatGPT, particularly in Voice Mode, on user well-being. They combine a large-scale analysis of 4M+ conversations and 4,000+ surveys with a randomized controlled trial (RCT) involving 981 participants. The studies reveal that high usage, especially with voice interactions, is linked to emotional dependence, a preference for chatbot interactions, and discomfort with changes in voice/personality. Voice mode had mixed effects, improving well-being for some, but long-term usage led to increased loneliness and problematic use. A small group of users drove most emotionally charged conversations, forming pseudo-relationships with the model. Automated classifiers developed to analyze conversations mirrored user self-reports and highlighted the need for socioaffective alignment, urging developers to design models that support users without exploiting emotional needs.
  • PLAY2PROMPT: Zero-shot Tool Instruction Optimization for LLM Agents via Tool Play. Researchers from MIT CSAIL and IBM introduce Play2Prompt, a framework that enables LLM agents to learn how to use external tools in a zero-shot manner, without needing labeled examples or high-quality documentation. Play2Prompt discovers tool usage patterns through trial-and-error API calls, generating query-answer pairs based on successful tool invocations. The system iteratively refines tool demonstrations and documentation via self-reflective beam search and rejection sampling. Play2Prompt shows strong zero-shot performance, improving accuracy by 5–7% on benchmark tasks and even boosting GPT-4o by up to 3.3%. It remains robust even with poor documentation and outperforms methods like EasyTool that rely on labeled examples, particularly in challenging tool settings.
  • Global modules robustly emerge from local interactions and smooth gradients. This paper describes the principle of peak selection, by which local interactions and smooth gradients drive the self-organization of discrete global modules.
  • Evolutionary optimization of model merging recipes. Akiba et al. developed an evolutionary approach to automatically merge artificial intelligence models, creating powerful hybrid models without extensive training. The method produces models with enhanced mathematical and visual capabilities that outperform larger models.
  • Enhanced Cell Segmentation. CellVTA improves vision transformer-based models for cell instance segmentation by injecting high-resolution spatial features through a CNN-based adapter, achieving state-of-the-art performance on multiple datasets.
  • Synthetic Data Generation Using Large Language Models: Advances in Text and Code. LLMs are increasingly employed to generate synthetic training data for language and code tasks, enhancing performance in low-resource settings through methods like prompt-based generation and self-refinement. The paper outlines advantages such as reduced cost and broader coverage, while also addressing challenges like factual inaccuracies and bias. It proposes mitigations and highlights future research directions in prompt automation and data quality evaluation.
  • Current and Future Use of LLMs for Knowledge Work. A two-part survey of 216 and 107 participants shows that knowledge workers currently use LLMs for tasks such as code generation and text enhancement, but anticipate more integrated use within workflows and data systems. The results provide insights for shaping future design and adoption of generative AI in professional environments.
  • Backdoor Attacks in CLIP. CLIP models are extremely susceptible to backdoor poisoning attacks, with almost perfect attack success rates using very little poisoned data. A practical way to detect this is to apply local outlier detection, which also reveals accidental backdoors in existing datasets (a minimal detection sketch follows this list).
  • Large Small Net. A new class of efficient vision models draws inspiration from the human visual system’s ability to process broad scenes while focusing on details, a principle the authors call “See Large, Focus Small”. LSNet delivers leading performance with strong efficiency across multiple vision tasks and features a novel convolution kernel design.
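
To ground the Chain-of-Tools item above, here is a minimal sketch of tool selection over semantic vectors. The paper fine-tunes dedicated Tool Judge and Tool Retriever modules around a frozen LLM; purely as an illustrative assumption, a frozen sentence encoder plus cosine similarity stands in for both, which is what lets tools unseen at training time be scored without retraining anything.

```python
# Sketch of Chain-of-Tools-style selection: every tool is a semantic
# vector, so new tools can be scored at inference time while the LLM
# stays frozen. The encoder and tool set here are stand-ins, not the
# paper's fine-tuned Tool Judge / Tool Retriever modules.
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # any frozen encoder works

# Tool descriptions, including one "unseen" tool added after deployment.
tools = {
    "calculator": "Evaluates arithmetic expressions and returns a number.",
    "web_search": "Searches the web and returns snippets for a query.",
    "unit_converter": "Converts values between physical units.",  # unseen tool
}
tool_vecs = encoder.encode(list(tools.values()), normalize_embeddings=True)

def select_tool(reasoning_step: str) -> str:
    """Score all tools against the current CoT step; return the best name."""
    q = encoder.encode([reasoning_step], normalize_embeddings=True)
    scores = (tool_vecs @ q.T).ravel()  # cosine similarity (vectors normalized)
    return list(tools)[int(np.argmax(scores))]

print(select_tool("Next step: convert 5 miles to kilometers"))
# -> 'unit_converter', even though that tool was never seen in training
```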
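
The MemInsight item describes mining structured attributes from past conversations and retrieving by attribute match rather than raw-text similarity. Here is a toy sketch of that shape, with a keyword heuristic standing in for the backbone-LLM attribute miner (in the paper this is a prompted model) and a flat (attribute, value) index as an assumed storage layout:

```python
# Toy sketch of attribute-centric memory: turns are indexed by mined
# (attribute, value) pairs instead of being embedded as raw text.
from collections import defaultdict

def mine_attributes(turn: str) -> dict[str, str]:
    """Stand-in for the backbone-LLM miner (normally a prompted LLM call
    returning structured annotations such as {'genre': 'sci-fi'})."""
    attrs, text = {}, turn.lower()
    for genre in ("sci-fi", "comedy", "horror"):
        if genre in text:
            attrs["genre"] = genre
    if "recommend" in text:
        attrs["intent"] = "recommendation"
    return attrs

class AttributeMemory:
    def __init__(self):
        self.index = defaultdict(list)  # (attribute, value) -> [turns]

    def add(self, turn: str):
        for pair in mine_attributes(turn).items():
            self.index[pair].append(turn)

    def retrieve(self, query: str) -> list[str]:
        # Match on the query's mined attributes, not its surface text.
        hits = []
        for pair in mine_attributes(query).items():
            hits.extend(self.index.get(pair, []))
        return hits

mem = AttributeMemory()
mem.add("User: I loved that sci-fi movie we discussed last week.")
mem.add("User: The comedy you suggested was too slow for me.")
print(mem.retrieve("Can you recommend another sci-fi film?"))
# -> the sci-fi turn, found via the mined genre attribute
```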
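
For the CLIP backdoor item, the detection intuition is that poisoned samples form a small, unusually tight cluster off the clean manifold in embedding space, which density-based outlier detectors can surface. A minimal sketch using scikit-learn’s LocalOutlierFactor on synthetic stand-in vectors (real use would take actual CLIP image embeddings; the hyperparameters are assumptions):

```python
# Sketch of local-outlier-detection screening for poisoned training data.
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
clean = rng.normal(0.0, 1.0, size=(1000, 512))   # stand-in clean embeddings
poison = rng.normal(4.0, 0.1, size=(10, 512))    # tight off-manifold cluster
embeddings = np.vstack([clean, poison])

# n_neighbors is chosen larger than the suspected poison-cluster size,
# so the cluster cannot look "locally dense" among its own members.
lof = LocalOutlierFactor(n_neighbors=20, contamination="auto")
labels = lof.fit_predict(embeddings)             # -1 marks outliers
suspects = np.where(labels == -1)[0]
print(f"flagged {len(suspects)} suspects (planted poison sits at 1000-1009)")
```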

News

Resources

  • Qwen2.5-Omni. Qwen2.5-Omni is an end-to-end multimodal model capable of perceiving and understanding text, audio, images, and video, while generating both text and speech in real-time. It features the Thinker-Talker architecture, where Thinker handles perception and text generation, and Talker generates speech, trained together for synchronized output. The model’s streaming-first design uses block-wise encoders and TMRoPE for real-time interaction. Trained on over 1.2 trillion tokens, Qwen2.5-Omni is fine-tuned for natural speech and performs well across multiple modalities. It achieves state-of-the-art results on OmniBench, outperforms previous models in ASR and TTS, and significantly closes the gap in voice-text instruction following.
  • Test-Time Visual In-Context Tuning. A new method enables test-time adaptation of VICL models using only the test sample, enhancing generalization across different visual tasks under domain shifts.
  • High-Fidelity Simultaneous Speech-To-Speech Translation. Kyutai has unveiled its latest audio system, a real-time audio-to-audio translation tool powered by a robust multi-stream transformer. It features expressive voice capabilities, delivering impressive performance in speech translation.
  • Mobile-VideoGPT. A compact multimodal video model with under 1B parameters, incorporating dual visual encoders and token pruning to enable real-time inference on edge devices.
  • Multimodal Adaptation Methods. A curated list of methods for multimodal adaptation, including traditional domain adaptation, test-time adaptation, and recent innovative approaches.
  • ReAG — Reasoning Augmented Generation. Traditional Retrieval-Augmented Generation (RAG) systems use a two-step approach: semantic search retrieves documents based on surface-level similarity, followed by a language model generating responses from those documents. While effective, this often overlooks deeper context and introduces irrelevant information. ReAG — Reasoning Augmented Generation — proposes a stronger alternative by feeding raw documents directly into the language model, enabling it to process and integrate the full context. This unified method results in more accurate, nuanced, and context-aware outputs (a minimal sketch follows this list).
  • Awesome Vision-to-Music Generation. A curated and regularly updated list of methods, datasets, and demos focused on converting visual inputs into music (V2M), showcasing both academic and industrial advancements in the field.
  • Video Generation Faithfulness Benchmark. A benchmark designed to evaluate how accurately video generation aligns with the given prompt. It also introduces methods to improve the quality of generated videos in relation to the user’s input prompt.
  • Optimal Stepsize in Diffusion Models. Optimal Stepsize for Diffusion Sampling (OSS) improves diffusion model sampling by learning efficient stepsize schedules using dynamic programming, achieving a 10× speedup with minimal loss in generation quality (a dynamic-programming sketch follows this list).
  • SAMWISE video segmentation. This work gives SAM 2 open-vocabulary segmentation and more precise semantic tracking over long videos.
  • Orpheus. Orpheus is a text-to-speech system that is easy to install and runs without a GPU, similar to llama.cpp.
  • Video-R1. Video-R1 presents a rule-based reinforcement learning method for video reasoning, utilizing a temporal variant of GRPO and introducing new datasets. It is efficiently trainable on 4 H20 or 5 A100 GPUs.
  • Fast Text-to-3D. Progressive Rendering Distillation enables training 3D generators from text prompts without ground-truth meshes, producing high-quality 3D meshes in just 1.2 seconds and outperforming previous approaches.
  • TIDE for Underwater Scene Understanding. A text-to-image and dense annotation generation method for underwater scenes that produces high-quality synthetic datasets with consistent pixel-level labels.
  • OpenAI launches OpenAI Academy, a free AI learning platform for everyone. OpenAI has introduced OpenAI Academy, a free platform offering AI courses that range from beginner content to advanced subjects like AI safety and governance. Designed for diverse audiences, the platform seeks to expand access to AI education and promote thoughtful engagement with its societal implications. Early feedback praises its accessibility and thorough approach to making AI more understandable worldwide.
  • Video Motion Segmentation. Building on recent trends in tracking systems, this work incorporates dense pixel tracking to enhance long-term segmentation using DINO and SAM 2.
  • Open Hands Coding Model. A powerful 32B model fine-tuned with reinforcement learning on top of Qwen, outperforming many much larger models on agentic coding tasks.
  • Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model. There’s a major open question in RL reasoning around whether a sufficiently strong base model is essential for emergent reasoning. This work explored various cases of scaling RL on a base model and found that strong base models significantly aid reasoning convergence.
  • Easi3R: Estimating Disentangled Motion from DUSt3R Without Training. Easi3R is a 3D vision system designed to more accurately estimate 3D scenes with high motion. It significantly outperforms previous methods in full scene reconstruction by masking moving objects and learning them separately from the background.
  • Benchmark for RL-based Video Understanding. SEED-Bench-R1 is a benchmark designed to assess post-training methods such as RL and SFT for multimodal LLMs on complex video-based tasks. It highlights RL’s advantages in perception and data efficiency while also revealing its difficulties in maintaining logical coherence.
  • Flow Prediction for Autonomous Driving. UniOcc is a unified framework for forecasting and flow prediction in driving scenarios, designed for multi-dataset training and cross-domain evaluation across both real and synthetic environments.
  • PaperBench. OpenAI has introduced PaperBench, a benchmark that evaluates whether AI agents can fully replicate selected research papers, including comprehending their experiments and reproducing their results, to measure deeper understanding and research ability in AI models.
  • GeometryCrafter: Consistent Geometry Estimation for Open-world Videos with Diffusion Priors. A powerful model that leverages video diffusion as a prior for consistent geometry estimation over time. It operates at approximately 1.5 FPS for full point cloud estimation and also performs accurate camera pose estimation.
  • DSO: Aligning 3D Generators with Simulation Feedback for Physical Soundness. Most 3D synthesized data is created with a focus on aesthetic quality, which often results in models that can’t stand or support themselves in physics-based environments. This work introduces slight fine-tuning to improve the physical plausibility of these models.
  • Medical Reasoning Dataset. A large-scale medical reasoning dataset designed to enable faithful and explainable problem-solving in LLMs, aiming to advance research in medical AI.
  • DeepMind’s Study for Kernel Fuzzing. Snowplow is a kernel fuzzer that uses a learned white-box mutator to enhance test mutation efficiency, leading to significantly improved code coverage and increased bug discovery in Linux kernel fuzzing.
  • The hottest AI models, what they do, and how to use them. This article reviews the leading AI models released since 2024, showcasing their applications and strengths. Key highlights include OpenAI’s GPT-4.5 Orion for its robust world knowledge, Google’s Gemini 2.5 Pro for its coding capabilities, and Cohere’s Aya Vision, which stands out in image-related tasks. The overview helps make sense of the fast-changing AI landscape.
  • DeepSite open source canvas. DeepSite is a DeepSeek-powered open-source canvas for “vibe coding” that updates apps in real time as the system writes the code.
  • Articulated Kinematics Distillation from Video Diffusion Models. This work presents Articulated Kinematics Distillation (AKD), a framework that combines skeleton-based animation with generative diffusion models to generate high-fidelity, physically plausible character motions with lower complexity. It ensures structural consistency and surpasses existing methods in 3D coherence and expressive motion quality by employing Score Distillation Sampling for precise joint-level control.
  • Enhanced LoRA-based Fine Tuning. MetaLoRA introduces dynamic parameter generation based on meta-learning principles, improving the flexibility and task-awareness of LoRA-based fine-tuning approaches.
  • Pplx CUDA Kernels. Perplexity has open-sourced some of its MoE kernels, which surpass DeepSeek’s in large-scale performance and offer more flexibility with fewer constraints on the MoE architecture.
  • HateBench for Evaluating Hate Speech. HateBench offers a framework to assess hate speech detection models on content generated by LLMs, including a hand-labeled dataset and tools for analyzing subtle and adversarial hate campaigns.
  • Zonos TTS. An impressive Apache 2.0 model for speech synthesis and voice cloning, featuring multilingual support and expressive real-time generation.
  • Hugging Face’s AI Agents Course. Hugging Face has released a free AI agents course that takes you from beginner to expert in understanding, using, and building AI agents.
  • The LLM Course from Hugging Face. Hugging Face has updated its well-known NLP course into a more comprehensive LLM curriculum, adding new chapters on fine-tuning, reasoning models, and current AI agent workflows.
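
To make the ReAG contrast concrete: there is no vector-search step; the raw documents go straight into the model’s context, and it judges relevance while answering. A minimal sketch, where `complete` is a hypothetical stand-in for any LLM completion API:

```python
# Sketch of Reasoning Augmented Generation: hand the model the raw
# documents in one pass instead of retrieving top-k chunks first.
def complete(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM client here")  # hypothetical stub

def reag_answer(question: str, documents: list[str]) -> str:
    doc_block = "\n\n".join(
        f"[Document {i + 1}]\n{doc}" for i, doc in enumerate(documents)
    )
    prompt = (
        "Read the documents below, decide which are actually relevant, "
        "and answer the question using only facts they support.\n\n"
        f"{doc_block}\n\nQuestion: {question}\nAnswer:"
    )
    return complete(prompt)
```

The trade-off is cost and context length: ReAG pays to process every document on every query, but it cannot miss a relevant document the way a lossy retrieval step can.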
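
The optimal-stepsize item reduces to a classic dynamic program: given a dense grid of N timesteps and a budget of K sampling steps, choose the K-step path that minimizes accumulated local error. The quadratic jump cost below is a placeholder assumption (the paper derives its cost from the actual sampling trajectory), but it shows the schedule-search mechanics:

```python
# Sketch of DP-based stepsize-schedule search for diffusion sampling.
import numpy as np

def jump_cost(i: int, j: int) -> float:
    return float((j - i) ** 2)  # hypothetical local-error proxy

def best_schedule(n_fine: int, k_steps: int) -> list[int]:
    INF = float("inf")
    # dp[s, t]: minimal cost to reach fine step t using exactly s jumps.
    dp = np.full((k_steps + 1, n_fine + 1), INF)
    parent = np.zeros((k_steps + 1, n_fine + 1), dtype=int)
    dp[0, 0] = 0.0
    for s in range(1, k_steps + 1):
        for t in range(1, n_fine + 1):
            for prev in range(t):
                c = dp[s - 1, prev] + jump_cost(prev, t)
                if c < dp[s, t]:
                    dp[s, t], parent[s, t] = c, prev
    path, t = [n_fine], n_fine          # backtrack from the final step
    for s in range(k_steps, 0, -1):
        t = int(parent[s, t])
        path.append(t)
    return path[::-1]

print(best_schedule(n_fine=100, k_steps=10))
# -> [0, 10, 20, ..., 100]: evenly spaced, as expected for a quadratic cost
```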

Perspectives

Meme of the week

What do you think? Did any of this news catch your attention? Let me know in the comments.

If you have found this interesting:

You can look for my other articles, and you can also connect with or reach me on LinkedIn. Check this repository containing weekly updated ML & AI news; I am open to collaborations and projects. You can also subscribe for free to get notified when I publish a new story.

Here is the link to my GitHub repository, where I am collecting code and many resources related to machine learning, artificial intelligence, and more.

or you may be interested in one of my recent articles:


Written by Salvatore Raieli

Senior data scientist | about science, machine learning, and AI. Top writer in Artificial Intelligence
