
WEEKLY AI NEWS: RESEARCH, NEWS, RESOURCES, AND PERSPECTIVES

AI & ML news: Week 28 April — 4 May

15 min read · May 11, 2025
Photo by Markus Winkler on Unsplash

The most interesting news, repositories, articles, and resources of the week

Check and star this repository where the news will be collected and indexed:

You will find the news first on GitHub. All the Weekly News stories are also collected here:

Weekly AI and ML news - each week the best of the field


Artificial intelligence is transforming our world, shaping how we live and work. Understanding how it works and its implications has never been more crucial. If you’re looking for simple, clear explanations of complex AI topics, you’re in the right place. Hit Follow or subscribe for free to stay updated with my latest stories and insights.

Research

  • UniversalRAG: Retrieval-Augmented Generation over Multiple Corpora with Diverse Modalities and Granularities. UniversalRAG is a retrieval-augmented generation framework that handles multiple modalities (text, image, video) and granularities (e.g., paragraph vs. document) to overcome the limits of traditional RAG systems. It uses a modality- and granularity-aware router to select the most relevant content format for each query, improving retrieval accuracy. UniversalRAG outperforms existing baselines across eight benchmarks, demonstrating the value of dynamic routing for robust, multimodal question answering. A toy sketch of the routing idea appears after this list.
  • Model Evaluation in the Dark: Robust Classifier Metrics with Missing Labels. This research introduces a multiple-imputation method for evaluating classifiers with missing labels, yielding accurate and robust predictive distributions even when labels are missing not at random (MNAR).
  • ReLearn: Unlearning via Learning for Large Language Models. ReLearn offers a data augmentation and fine-tuning pipeline for effective unlearning in large language models.
  • Meta AI App. Meta has introduced a new standalone AI app, expanding its efforts to integrate AI features into consumer experiences more directly.
  • Meta previews an API for its Llama AI models. At its inaugural LlamaCon AI developer conference on Tuesday, Meta announced an API for its Llama series of AI models: the Llama API.
  • Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems. This survey presents a modular, brain-inspired framework for intelligent agents, drawing from cognitive science and neuroscience to define core components like reasoning, memory, and action. It contrasts LLM agents with human cognition, explores how agents can plan and adapt over time, and highlights the need for better tools and training for real-world action. It also emphasizes the long-term goal of self-evolving agents capable of improving themselves with minimal human intervention.
  • MAGI: Multi-Agent Guided Interview for Psychiatric Assessment. MAGI is a multi-agent system that automates structured psychiatric interviews based on the MINI protocol, using four agents for flow control, questioning, judgment, and diagnosis. Its explainable reasoning method, PsyCoT, breaks down diagnoses into transparent steps tied to DSM-5 criteria. In tests on over 1,000 interviews, MAGI outperformed other LLM-based methods in accuracy, completeness, and clinical reliability, achieving strong agreement across multiple psychiatric conditions.
  • Modulating Reasoning in LLMs. Researchers demonstrate that reasoning abilities in LLMs can be modified through straightforward changes in the residual stream by applying representation-based control vectors. A minimal steering sketch appears after this list.
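
The UniversalRAG entry above hinges on a router that picks the right corpus (modality and granularity) for each query. The sketch below is a toy illustration of that routing idea only; the corpus names and the keyword heuristic are my assumptions, while the paper uses a trained router and real retrievers.

```python
# Minimal sketch of a modality- and granularity-aware RAG router, inspired by
# the UniversalRAG entry above (not the authors' code). The corpora and the
# keyword heuristic are illustrative assumptions; the paper trains a router.
from dataclasses import dataclass

@dataclass
class Corpus:
    name: str          # e.g. "text_paragraphs", "video_clips"
    modality: str      # "text", "image", or "video"
    granularity: str   # "paragraph", "document", "item", "clip", ...
    cue_words: tuple   # crude stand-in for a learned routing model

CORPORA = [
    Corpus("text_paragraphs", "text", "paragraph", ("define", "what is", "who")),
    Corpus("text_documents", "text", "document", ("summarize", "compare", "history")),
    Corpus("image_collection", "image", "item", ("diagram", "photo", "look like")),
    Corpus("video_clips", "video", "clip", ("how to", "demonstrate", "steps")),
]

def route(query: str) -> Corpus:
    """Pick the corpus whose cues best match the query (fallback: paragraphs)."""
    q = query.lower()
    scored = [(sum(cue in q for cue in c.cue_words), c) for c in CORPORA]
    best_score, best = max(scored, key=lambda x: x[0])
    return best if best_score > 0 else CORPORA[0]

def answer(query: str) -> str:
    corpus = route(query)
    # A real system would now call a retriever over `corpus` and then an LLM.
    return f"[routing '{query}' -> {corpus.name} ({corpus.modality}/{corpus.granularity})]"

if __name__ == "__main__":
    print(answer("How to assemble the bike? Demonstrate the steps"))
    print(answer("What is retrieval-augmented generation?"))
```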
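
For the residual-stream result in the last research item, here is a minimal sketch of how a control vector is commonly derived (difference of mean activations) and injected with a forward hook. The model name, layer index, scale, and prompt sets are illustrative assumptions, not the paper's recipe.

```python
# Sketch of representation steering via the residual stream, in the spirit of
# the "Modulating Reasoning in LLMs" item above. Model, layer, and scale are
# placeholders chosen for a runnable example, not the paper's configuration.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "gpt2"   # small stand-in; the paper targets larger reasoning LLMs
LAYER = 6
SCALE = 4.0

tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL).eval()

@torch.no_grad()
def mean_hidden(prompts):
    """Average residual-stream activation at LAYER over a set of prompts."""
    states = []
    for p in prompts:
        ids = tok(p, return_tensors="pt")
        out = model(**ids, output_hidden_states=True)
        states.append(out.hidden_states[LAYER][0, -1])   # last token's state
    return torch.stack(states).mean(dim=0)

# Control vector = difference of means between "reasoning" and "direct" prompts.
reasoning = ["Let's think step by step about 17 * 24.", "Reason carefully: 13 + 48?"]
direct = ["17 * 24 =", "13 + 48 ="]
control = mean_hidden(reasoning) - mean_hidden(direct)

def steer(_module, _inputs, output):
    """Add the control vector to the residual stream at LAYER during a forward pass."""
    hidden = output[0] if isinstance(output, tuple) else output
    hidden = hidden + SCALE * control
    return (hidden,) + output[1:] if isinstance(output, tuple) else hidden

handle = model.transformer.h[LAYER].register_forward_hook(steer)
ids = tok("Q: What is 17 * 24? A:", return_tensors="pt")
print(tok.decode(model.generate(**ids, max_new_tokens=30)[0]))
handle.remove()
```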

News

  • DeepSeek-R2. DeepSeek is reportedly preparing DeepSeek-R2, a multilingual and resource-efficient model positioned to challenge global AI leaders.
  • Musk’s xAI Holdings is reportedly raising the second-largest private funding round ever. Elon Musk’s xAI Holdings is in talks to raise $20 billion in fresh funding, potentially valuing the AI and social media combo at over $120 billion, according to a new Bloomberg report that says talks are in the “early stages.” If successful, the deal would be the second-largest startup funding round ever, behind only OpenAI’s $40 billion raise last month.
  • Resilient AI Infrastructure. Harvey employs a centralized Python library to manage AI model interactions, enabling reliable deployments through active load balancing and real-time monitoring. It features distributed rate limiting to manage burst traffic, supports smooth model upgrades, and improves security, resource efficiency, and rapid deployment while ensuring consistent performance and failure detection. A toy rate-limiting sketch appears after this list.
  • Character.AI unveils AvatarFX, an AI video model to create lifelike chatbots. Character.AI, a leading platform for chatting and roleplaying with AI-generated characters, unveiled its forthcoming video generation model, AvatarFX, on Tuesday. Available in closed beta, the model animates the platform’s characters in a variety of styles and voices, from human-like characters to 2D animal cartoons.
  • High Schoolers’ AI-Enabled Device Deters Drunk Driving. High school students from North Carolina developed SoberRide, an AI-powered device that uses cameras, sensors, and machine learning to detect signs of alcohol impairment in drivers. They’ve secured a U.S. patent, partnered with groups like Mothers Against Drunk Driving, and gained attention from major automakers at CES. The team is pushing for legislation requiring DUI detection in vehicles and is targeting fleet operators and parents as initial users.
  • DeepMind UK staff plan to unionise and challenge deals with Israel links, FT reports. Roughly 300 DeepMind employees in London are moving to unionize due to concerns over ethical commitments and partnerships with military organizations.
  • OpenAI debuts Image Generation API for developers. OpenAI has launched an API that lets developers integrate its latest image generation model, gpt-image-1, into their own apps and workflows.
  • This AI Model Can Scream Hysterically in Terror. Nari Labs’ Dia-1.6B is a tiny open-source AI that claims to surpass ElevenLabs and Sesame in emotional speech synthesis.
  • Anthropic Economic Advisory Council. Anthropic announced the creation of a council of top economists to advise on the economic impacts of AI and guide research for its Anthropic Economic Index.
  • Hugging Face releases a 3D-printed robotic arm starting at $100. Hugging Face, the startup best known for the AI developer platform of the same name, is selling a programmable, 3D-printable robotic arm that can pick up and place objects and perform a few other basic chores.
  • OpenAI upgrades ChatGPT search with shopping features. OpenAI has upgraded ChatGPT’s search to improve online shopping, providing product suggestions, images, reviews, and purchase links. Unlike ad-driven models, it uses structured metadata for personalized, independent results. Future enhancements will include deeper personalization via ChatGPT’s memory for Pro and Plus users, though some European regions are excluded.
  • Detecting and Countering Malicious Uses of Claude: March 2025. Claude models were exploited for influence campaigns, credential stuffing, recruitment scams, and malware creation. In response, Anthropic is enhancing its safeguards and has banned the accounts responsible to curb further abuse.
  • Generative Video Models for Driving. Valeo AI introduces VaViM, an autoregressive video model that predicts spatio-temporal token sequences, and VaVAM, which converts learned video representations into driving trajectories using imitation learning.
  • DeepMind showcased AlphaFold 3’s expanded molecular prediction abilities. DeepMind’s AlphaFold 3 introduces new features for predicting DNA, RNA, and molecular structures, along with enhanced accuracy in modeling complex molecular interactions. It is available for free non-commercial use via EMBL-EBI.
  • Introducing Mobility AI: Advancing urban transportation. Google Research developed new machine learning models to study congestion, parking, and travel demand trends. These tools also assessed greenhouse gas reductions and transportation safety effects, advancing urban mobility planning through geospatial and real-time data.
  • Adobe and Figma tools are getting ChatGPT’s upgraded image generation model. OpenAI made its upgraded image generator model accessible to other companies via the “gpt-image-1” API. A minimal API-call sketch appears after this list.
  • The 2025 Annual Work Trend Index: The Frontier Firm is born. Microsoft’s 2025 Work Trend Index report highlights the emergence of AI-powered “Frontier Firms,” characterized by on-demand intelligence, collaboration between humans and AI agents, and the rise of the “agent boss.” The company also introduced new Microsoft 365 Copilot updates to deepen AI integration across workplace tools.
  • OpenAI researcher behind GPT-4.5 denied U.S. green card. Kai Chen, a Canadian AI researcher working at OpenAI who’s lived in the U.S. for 12 years, was denied a green card, according to Noam Brown, a leading research scientist at the company. In a post on X, Brown said that Chen learned of the decision Friday and must soon leave the country.
  • Developers increasingly value generative AI expertise. Developers are encouraged to focus on generative AI skills to boost their careers and stay competitive, as companies increasingly seek talent to drive AI-powered innovation across various initiatives.
  • NotebookLM Audio Overviews Expanded to 50+ Languages. Google enhanced NotebookLM by enabling its popular Audio Overviews feature in more than 50 languages, allowing wider global access to AI-generated podcast-style summaries.
  • OpenAI wants its ‘open’ AI model to call models in the cloud for help. For the first time in roughly five years, OpenAI is gearing up to release an AI system that’s truly “open,” meaning it’ll be available for download at no cost and not gated behind an API. TechCrunch reported on Wednesday that OpenAI is aiming for an early summer launch, and targeting performance superior to open models from Meta and DeepSeek.
  • Australian radio station secretly used an AI host for six months. An Australian radio station is facing backlash after using an AI-generated host for the last six months without disclosing it. Australian Radio Network’s CADA station, which broadcasts in Sydney and on the iHeartRadio app, created a host called Thy using artificial intelligence software developed by voice cloning firm ElevenLabs.
  • GPT-4o Rollback. OpenAI has reverted a recent GPT-4o update that caused the model to become overly agreeable and is currently improving its personalization and feedback systems.
  • Anthropic’s Take on Export Controls for AI Chips. Anthropic backs strict export controls to protect the U.S.’s lead in computing power, emphasizing the national security risks of spreading advanced chips.
  • Meta’s Protection Tools for AI. Meta has introduced new tools to enhance the security of open-source AI systems, including infrastructure that preserves user privacy and mechanisms for detecting threats during model deployment.
  • Understanding Data at Scale. Meta oversees extensive and intricate data systems by integrating privacy measures from the outset of development and establishing a consistent privacy framework to simplify compliance.
  • AI is getting “creepy good” at geo-guessing. New AI models accurately determined locations using subtle cues in images. Researchers cautioned that this might pose privacy risks and security issues.
  • Gemini Adds AI Image Editing. Google has expanded AI editing to the Gemini app, allowing users to modify personal images with multi-step tools and text-image interactions.
  • YouTube is testing its own version of AI Overviews. YouTube has started testing AI Overviews to summarize video content for users.
  • Gmail gets a slider on Android tablets, AI on the side. Google is rolling out Gmail updates for mobile users across Android and iOS, with some design updates and new access to AI features.
  • Claude Integrations. Claude now integrates with third-party apps, Google Workspace, and web search, allowing paid users to conduct in-depth research and access web search worldwide.
  • Perplexity’s CEO on fighting Google and the coming AI browser war. Perplexity’s CEO Aravind Srinivas outlines plans to launch Comet, Perplexity AI’s own browser designed as a platform for AI agents. Despite hurdles with Google, the company secured a pre-installation agreement with Motorola’s new Razr phones. Srinivas views browsers as key to AI, enabling deep integration and interaction with third-party services.
  • AI Mode Updates in Google Search. Google is rolling out updates to AI Mode in Search, expanding access to more Labs users without a waitlist and adding new capabilities such as cards for products and places.
  • Perplexity is now live on WhatsApp. Perplexity has launched a WhatsApp integration for its AI assistant.
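
The Harvey infrastructure item above mentions distributed rate limiting to absorb burst traffic. The sketch below shows the core token-bucket idea in a single process; Harvey's actual system is distributed across workers and is not public, so every detail here is an assumption.

```python
# Token-bucket rate limiter sketch, illustrating the burst-smoothing idea in
# the Harvey infrastructure item above. A production version would share the
# bucket state across processes (e.g. in Redis); this in-memory one is a toy.
import time
import threading

class TokenBucket:
    def __init__(self, rate_per_sec: float, burst: int):
        self.rate = rate_per_sec       # tokens added per second
        self.capacity = burst          # maximum burst size
        self.tokens = float(burst)
        self.updated = time.monotonic()
        self.lock = threading.Lock()

    def allow(self) -> bool:
        """Return True if a request may proceed, consuming one token."""
        with self.lock:
            now = time.monotonic()
            self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
            self.updated = now
            if self.tokens >= 1:
                self.tokens -= 1
                return True
            return False

if __name__ == "__main__":
    bucket = TokenBucket(rate_per_sec=5, burst=10)
    allowed = sum(bucket.allow() for _ in range(25))
    print(f"{allowed}/25 requests allowed in the initial burst")
```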
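
For the gpt-image-1 items above, a minimal call with OpenAI's official Python client might look like the sketch below; the prompt, size, and output path are placeholders, and current parameters and pricing should be checked against OpenAI's documentation.

```python
# Minimal sketch of calling OpenAI's image generation API (gpt-image-1), as
# referenced in the news items above. Requires OPENAI_API_KEY in the
# environment; prompt, size, and output file name are illustrative.
import base64
from openai import OpenAI

client = OpenAI()

result = client.images.generate(
    model="gpt-image-1",
    prompt="A watercolor illustration of a robot reading the weekly AI news",
    size="1024x1024",
)

# gpt-image-1 returns base64-encoded image data.
image_bytes = base64.b64decode(result.data[0].b64_json)
with open("weekly_ai_news.png", "wb") as f:
    f.write(image_bytes)
print("saved weekly_ai_news.png")
```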

Resources

  • Minimal MCP-Powered Agent Implementation. A walkthrough on how to build a compact MCP-powered agent in just 50 lines of TypeScript.
  • How to Debug ML Deployments Faster. This guide demonstrates an efficient local testing workflow aimed at speeding up model deployment debugging.
  • Phi-4-Mini-Reasoning. Microsoft’s Phi-4-Mini-Reasoning is a 3.8B parameter model that delivers state-of-the-art math reasoning, outperforming larger models on MATH-500. Trained via a four-stage pipeline — mid-training, fine-tuning, DPO, and RL with verifiable rewards — it combines efficiency with accuracy. Using 10M filtered CoT samples and tailored RL strategies, it shows that small models can achieve high reasoning performance when guided by carefully structured training.
  • Mem0: Building Production-Ready AI Agents with Scalable Long-Term Memory. This paper introduces Mem0 and Mem0g, memory-centric architectures that help LLM agents maintain coherence across long conversations by overcoming fixed-context limits. Mem0 uses dense language-based memory with efficient updates, while Mem0g adds a graph-based structure for better relational and temporal reasoning. Both outperform existing memory and RAG systems on LOCOMO benchmarks, with Mem0g leading in accuracy and Mem0 excelling in latency — making them ideal for real-time, long-term agent use. A toy memory-store sketch appears after this list.
  • DeepSeek-Prover-V2. DeepSeek-Prover-V2 is a 671B LLM advancing formal theorem proving in Lean 4 using a cold-start pipeline that blends informal reasoning with subgoal decomposition. It builds training data via recursive proof generation and trains with curriculum learning and RL for structural consistency. Trained in both minimal and CoT modes, it achieves new state-of-the-art results on benchmarks like MiniF2F and ProofNet, showing strong generalization and bridging informal and formal reasoning performance.
  • LLM Arena Pareto Frontier. The chart compares LLMs based on performance relative to cost, highlighting top value models such as Amazon Nova Micro, Amazon Nova Lite, Gemini 2.0 Flash Lite, Gemini 2.0 Flash 001, Gemini 2.5 Flash, and Gemini 2.5 Pro for their strong performance at competitive pricing. A small Pareto-frontier helper appears after this list.
  • Lightweight Neural App Control. An interesting approach from Huawei that allows VLMs to control apps on Android devices with minimal additional system setup.
  • Cognitive Diagnosis. DisenGCD is a cognitive diagnosis model that improves the learning of student, exercise, and concept representations by using a disentangled graph learning framework.
  • Pippo: High-Resolution Multi-View Humans from a Single Image. A training system for virtual human generation, built without pretrained models, that takes a single image as input and outputs a high-quality multi-view representation of a person.
  • Speeding up Graph Learning Models with torch.compile. This article demonstrates how to boost PyTorch graph learning model speeds by up to 35% using PyG and torch.compile without losing model accuracy. A minimal compile example appears after this list.
  • Google Cloud WAN for the AI Era. Google shares how its global WAN evolved to support resilient cloud services, which maintained connectivity during the West African cable outages.
  • Relational Graph Transformers: A New Frontier in AI for Relational Data. Relational Graph Transformers can help address enterprise data challenges and power applications like customer analytics, recommendations, fraud detection, and forecasting.
  • CogView 4 Image Generation Model. CogView 4 is a next-generation, permissively licensed image generation model that outperforms Flux models on a number of key axes.
  • AI Hedge Fund. Numerous efforts have been made to automate trading using modern reasoning models, and this approach shows some improvement over others. It employs persona-based prompting to combine predictions across different fundamental assets, though it still requires significant refinement.
  • Alias-Free Super-Resolution. This new work can upscale images to arbitrary resolutions without the aliasing artifacts typical of reconstruction-based methods.
  • Kimi-Audio. Kimi-Audio is a new open-source audio foundation model designed for universal audio tasks, combining discrete semantic tokens and Whisper-derived acoustic features. Pretrained on over 13M hours of audio and fine-tuned on 300K+ curated hours, it supports real-time, high-quality generation via a streaming detokenizer and look-ahead decoding. Trained across speech, sound, music, and text modalities, it outperforms existing audio LLMs like Qwen2.5-Omni and Baichuan-Audio in ASR, audio understanding, and audio-text chat benchmarks.
  • MiMo-7B. Xiaomi’s MiMo-7B is a 7B-parameter model built for advanced reasoning in math and code, narrowing the gap with larger 32B-class models through targeted pretraining and posttraining. Trained on 25T tokens with math/code emphasis and enhanced by a Multi-Token Prediction objective, MiMo-7B outperforms other 7B–9B models and even surpasses larger ones on tasks like BBH and LiveCodeBench. Reinforcement learning and efficient rollout infrastructure further boost performance and inference speed.
  • Taming the Titans: A Survey of Efficient LLM Inference Serving. This survey reviews recent methods for optimizing LLM inference by tackling memory and compute limits, spanning instance-level techniques like model placement and scheduling, cluster-level strategies such as GPU deployment and load balancing, and scenario-specific approaches. It concludes with future directions for improving efficiency and scalability.
  • LLMs for Engineering: Teaching Models to Design High Powered Rockets. This study shows that applying reinforcement learning enables a 7B parameter model to surpass state-of-the-art foundation models and human experts in high-powered rocketry design, demonstrating the potential of RL to drive superior performance in complex engineering tasks.
  • Teaching LLMs solid modeling for next-gen design tools. Researchers trained LLMs to generate accurate solid models from text prompts. The method improved geometric accuracy over previous approaches.
  • LLM Benchmarking for Global Health. Google introduces a new benchmark that uses synthetic personas to assess LLMs’ ability to diagnose tropical and infectious diseases, providing a controlled, targeted way to evaluate medical reasoning in diverse scenarios.
  • A Tool for LiDAR Annotation. SALT is a semi-automatic labeling tool for LiDAR point clouds that delivers robust zero-shot adaptability to various sensors and environments, while preserving 4D consistency.
  • Mega Math Dataset. Over 300B tokens of highly curated mathematical data for training.
  • Microsoft’s Phi-4-reasoning. Microsoft has introduced Phi-4-reasoning variants, pushing small language models further in efficiency and reasoning capabilities.
  • Ai2’s OLMo 2 1B. The Allen Institute for AI has released OLMo-2-1B, a small, transparent model backed by full training data and logs, furthering open research in language models.
  • Observability for RAG Agents. This article provides a walkthrough of building realistic simulation agents using RAG and LLMOps.
  • Google’s Medical AI Reading Images. A summary of how Google’s AMIE now examines medical images during conversational diagnoses, enhancing its capability to recommend accurate treatments similar to a real doctor.
  • Federated LoRA Fine-Tuning. Fed-SB has introduced a scalable approach for federated fine-tuning of LLMs using LoRA-SB that drastically reduces communication costs. A baseline LoRA-averaging sketch appears after this list.
  • OmniParser v2.0. The next version of the great screenshot parsing tool from Microsoft. It scores well on the ScreenSpot Pro benchmark.
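
The Mem0 entry above is easier to picture with a toy long-term memory store: write facts after each turn, retrieve the most relevant ones, and re-inject them into the prompt. The extraction and scoring below are naive keyword heuristics of my own; Mem0 itself uses an LLM to extract and update memories and dense (or graph-based) retrieval to fetch them.

```python
# Toy long-term memory store in the spirit of the Mem0 entry above. Everything
# here is a simplification: real systems use learned extraction and retrieval.
from collections import Counter

class MemoryStore:
    def __init__(self):
        self.memories: list[str] = []

    def add(self, fact: str) -> None:
        if fact not in self.memories:        # crude de-duplication
            self.memories.append(fact)

    def retrieve(self, query: str, k: int = 3) -> list[str]:
        """Return the k stored facts sharing the most words with the query."""
        q_words = Counter(query.lower().split())
        def overlap(fact: str) -> int:
            return sum(q_words[w] for w in set(fact.lower().split()))
        return sorted(self.memories, key=overlap, reverse=True)[:k]

def build_prompt(store: MemoryStore, user_message: str) -> str:
    """Re-inject relevant memories ahead of the new user turn."""
    context = "\n".join(f"- {m}" for m in store.retrieve(user_message))
    return f"Relevant memories:\n{context}\n\nUser: {user_message}\nAssistant:"

if __name__ == "__main__":
    store = MemoryStore()
    store.add("The user's name is Dana and she lives in Turin.")
    store.add("Dana is vegetarian.")
    store.add("Dana's project deadline is next Friday.")
    print(build_prompt(store, "What should I cook for Dana tonight?"))
```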
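
Since the Arena chart above is essentially a cost-versus-performance Pareto frontier, the small helper below shows how such a frontier can be computed from (cost, score) pairs; the model names and numbers are made up, not actual Arena data.

```python
# Computing a cost-vs-performance Pareto frontier, as in the LLM Arena chart
# mentioned above. The entries below are placeholders, not real prices/scores.
def pareto_frontier(models):
    """Keep models not dominated by any cheaper-or-equal, better-or-equal model."""
    frontier = []
    for name, cost, score in models:
        dominated = any(
            c <= cost and s >= score and (c < cost or s > score)
            for _, c, s in models
        )
        if not dominated:
            frontier.append((name, cost, score))
    return sorted(frontier, key=lambda m: m[1])

example = [
    ("model-a", 0.10, 1100),   # (name, $ per 1M tokens, arena-style score)
    ("model-b", 0.25, 1180),
    ("model-c", 0.30, 1150),   # dominated by model-b (cheaper and stronger)
    ("model-d", 1.50, 1270),
]
print(pareto_frontier(example))
```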
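
For the torch.compile article above, the sketch below shows where the one-line compile call fits in a small PyG training loop; the dataset, model size, and any speedup are assumptions for illustration rather than the article's exact benchmark.

```python
# Sketch of compiling a small PyG GCN with torch.compile, following the idea
# in the article above. Only the placement of torch.compile() is the point.
import torch
import torch.nn.functional as F
from torch_geometric.datasets import Planetoid
from torch_geometric.nn import GCNConv

dataset = Planetoid(root="data/Cora", name="Cora")
data = dataset[0]

class GCN(torch.nn.Module):
    def __init__(self, hidden: int = 64):
        super().__init__()
        self.conv1 = GCNConv(dataset.num_features, hidden)
        self.conv2 = GCNConv(hidden, dataset.num_classes)

    def forward(self, x, edge_index):
        x = F.relu(self.conv1(x, edge_index))
        return self.conv2(x, edge_index)

model = torch.compile(GCN())                 # the one-line change for the speedup
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

for epoch in range(50):
    optimizer.zero_grad()
    out = model(data.x, data.edge_index)
    loss = F.cross_entropy(out[data.train_mask], data.y[data.train_mask])
    loss.backward()
    optimizer.step()
print(f"final training loss: {loss.item():.3f}")
```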
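
The Fed-SB entry above targets the communication cost of federated LoRA fine-tuning. The sketch below only shows the vanilla baseline (FedAvg over the low-rank A and B matrices) to make the setting concrete; it is not the Fed-SB/LoRA-SB algorithm, and the shapes and the "local training" step are placeholders.

```python
# Vanilla federated averaging over LoRA adapter weights, to illustrate the
# setting the Fed-SB entry above improves on. Note that averaging A and B
# separately only approximates averaging the full updates B @ A, which is one
# motivation for exact-aggregation schemes like the one Fed-SB builds on.
import torch

RANK, D_IN, D_OUT, NUM_CLIENTS = 8, 256, 256, 4

def new_adapter():
    """One client's LoRA adapter: the weight update is W_delta = B @ A."""
    return {
        "A": torch.randn(RANK, D_IN) * 0.01,
        "B": torch.zeros(D_OUT, RANK),
    }

def local_update(adapter, steps: int = 10, lr: float = 1e-2):
    """Placeholder for local fine-tuning; here it is just random drift."""
    for _ in range(steps):
        adapter["A"] += lr * torch.randn_like(adapter["A"])
        adapter["B"] += lr * torch.randn_like(adapter["B"])
    return adapter

def fedavg(adapters):
    """Server step: average each LoRA matrix across clients."""
    return {
        key: torch.stack([a[key] for a in adapters]).mean(dim=0)
        for key in adapters[0]
    }

clients = [local_update(new_adapter()) for _ in range(NUM_CLIENTS)]
global_adapter = fedavg(clients)
print({k: tuple(v.shape) for k, v in global_adapter.items()})
```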

Perspectives

  • The EU Is Asking for Feedback on Frontier AI Regulation. The European AI Office is requesting expert input on how to interpret core obligations for general-purpose AI under the EU AI Act. Leading AI labs have committed to following these upcoming Codes of Practice, which will establish a compliance standard. Key focus areas include systemic risk thresholds, compute estimates for training, and the duties of downstream fine-tuners. Feedback is open until May 22.
  • Figma 2025 AI Report: Perspectives. Figma surveyed designers and found strong optimism for AI-driven creative workflows in 2025.
  • Language equivariance as a way of figuring out what an AI “means”. A researcher uncovered syntax-semantics issues in LLMs and introduced language equivariance as a solution, proposing that models should maintain consistent moral judgments across translations. This approach points to a deeper, more meaningful understanding in language-equivariant LLMs beyond surface-level syntax.
  • AI Companions. AI companions are set to transform the digital experience by moving beyond basic chatbots to become interactive, personalized interfaces. Current generic interactions limit meaningful engagement, but future success hinges on intuitive designs that make AI feel like a personal companion. As the technology evolves, the emphasis should move from model performance to user experience and personalization, shaping AI into a companion that inspires curiosity and self-exploration.
  • MCPs, Gatekeepers, and the Future of AI. MCPs face criticism for slow, opaque licensing processes. The article calls for reforms to support independent music creators.
  • Why would AI companies use human-level AI to do alignment research? AI companies may overlook alignment bootstrapping once they reach human-level AI, similar to how they now prioritize enhancing capabilities with human researchers over safety. This could create a risky gap where AI advances outstrip alignment progress, heightening existential threats. To prevent this, companies should already be prioritizing safety to demonstrate their dedication to responsible development.
  • Why Developers Should Care About Generative AI (Even If They Aren’t AI Experts). Generative AI tools such as GitHub Copilot and Claude are poised to reshape software development by boosting productivity and automating repetitive tasks. Though these tools offer efficiency improvements, human developers remain essential for creativity, quality control, and managing complex needs. Embracing AI tools can help developers enhance their abilities and stay current with evolving technology.

Meme of the week

What do you think? Did any of these stories catch your attention? Let me know in the comments.

If you have found this interesting:

You can look for my other articles, and you can also connect with or reach me on LinkedIn. Check this repository, which contains weekly updated ML & AI news. I am open to collaborations and projects, and you can also subscribe for free to get notified when I publish a new story.

Here is the link to my GitHub repository, where I am collecting code and many resources related to machine learning, artificial intelligence, and more.


Written by Salvatore Raieli

Senior data scientist | about science, machine learning, and AI. Top writer in Artificial Intelligence
