
WEEKLY AI NEWS: RESEARCH, NEWS, RESOURCES, AND PERSPECTIVES

AI & ML news: Week 12–18 May

18 min read · May 26, 2025
Photo by Markus Winkler on Unsplash

The most interesting news, repositories, articles, and resources of the week

Check and star this repository where the news will be collected and indexed:

You will find the news first on GitHub. All the Weekly News stories are also collected here:

Weekly AI and ML news - each week the best of the field


Artificial intelligence is transforming our world, shaping how we live and work. Understanding how it works and its implications has never been more crucial. If you’re looking for simple, clear explanations of complex AI topics, you’re in the right place. Hit Follow or subscribe for free to stay updated with my latest stories and insights.

Research

  • The Leaderboard Illusion. This paper reveals major flaws in the Chatbot Arena ranking system, showing that practices like selective score reporting, extreme data imbalances, silent model removals, and overfitting to Arena-specific dynamics distort LLM comparisons. Through an analysis of 2M battles, it finds that private testing privileges and preferential data access for proprietary models inflate scores and undermine fairness, making the leaderboard an unreliable measure of real-world model quality.
  • LLMs Get Lost in Multi-Turn Conversation. LLMs perform significantly worse in multi-turn conversations, with an average 39% drop in task performance, driven by unreliable behavior and early, incorrect assumptions that models fail to revise in later turns.
  • Sakana AI Unveils “Continuous Thought Machine” With Brain-Inspired Neural Timing. Japanese AI company Sakana has created a new type of model where individual neurons retain memory of past actions and coordinate based on timing patterns. Though it lags behind traditional models in performance, it offers greater transparency into its reasoning process. Like recent models such as o3, its responses improve when given more time to process.
  • AlphaEvolve: A Gemini-powered coding agent for designing advanced algorithms. Google DeepMind’s AlphaEvolve employs Gemini models to iteratively create and refine full algorithmic solutions rather than isolated functions. It generates code, evaluates it automatically, and evolves better versions by building on successful attempts. This method has led to major improvements across Google’s infrastructure, including data center performance, chip design, and AI training efficiency. Some researchers will get early access, but broad availability remains uncertain.
  • The effect of ChatGPT on students’ learning performance, learning perception, and higher-order thinking: insights from a meta-analysis. Amid ongoing discussions about AI in education, a meta-analysis of 51 studies reveals that ChatGPT significantly boosts student learning performance and moderately enhances perceptions of learning and higher-order thinking. Its impact was strongest in problem-based learning settings with regular use over 4–8 weeks.
  • BLIP3-o: A Family of Fully Open Unified Multimodal Models - Architecture, Training and Dataset. BLIP3-o is a new diffusion transformer architecture trained using a sequential pretraining approach. It sets state-of-the-art performance on various multimodal benchmarks. The release includes the model’s code, pretrained weights, and a 60k instruction-tuning dataset.
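To make the multi-turn finding above concrete, here is a minimal sketch of how a single-turn vs. sharded multi-turn comparison can be scored. The `ask_model` stub and the toy tasks are hypothetical placeholders, not the paper's actual harness or data:

```python
# Sketch: scoring single-turn vs. multi-turn task performance, in the spirit
# of "LLMs Get Lost in Multi-Turn Conversation". `ask_model` is a toy stub
# standing in for a real LLM call; the tasks are hypothetical placeholders.

def ask_model(messages):
    """Toy stub: 'succeeds' only when the full spec arrives in the first turn."""
    first_user = next(m for m in messages if m["role"] == "user")
    return 1.0 if "FULL_SPEC" in first_user["content"] else 0.4

def run_single_turn(task):
    # The complete instruction is delivered in one message.
    return ask_model([{"role": "user", "content": task["full_instruction"]}])

def run_multi_turn(task):
    # The instruction is revealed shard by shard across turns.
    messages = []
    score = 0.0
    for shard in task["shards"]:
        messages.append({"role": "user", "content": shard})
        score = ask_model(messages)
    return score

tasks = [
    {"full_instruction": "FULL_SPEC: sort a list", "shards": ["sort", "a list"]},
    {"full_instruction": "FULL_SPEC: parse JSON", "shards": ["parse", "JSON"]},
]

single = sum(run_single_turn(t) for t in tasks) / len(tasks)
multi = sum(run_multi_turn(t) for t in tasks) / len(tasks)
drop = (single - multi) / single  # the paper reports an average drop of ~39%
print(f"relative drop: {drop:.0%}")
```

The stub mimics the paper's diagnosis: when an early, underspecified turn locks in a wrong assumption, later turns cannot fully recover, so the multi-turn score trails the single-turn one.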

News

Resources

  • Llama-Nemotron: Efficient Reasoning Models. NVIDIA’s Llama-Nemotron series — LN-Nano (8B), LN-Super (49B), and LN-Ultra (253B) — introduces powerful, open reasoning models that rival or surpass DeepSeek-R1 while offering better efficiency. Key innovations include a dynamic reasoning toggle for inference-time control and a multi-stage training pipeline combining architecture search, distillation, and RL. LN-Ultra leads on reasoning benchmarks and chat alignment, with open weights, code, and data released to support open research.
  • Optimizing GEMM with Thread Block Clusters. Thread block clusters and 2-SM UMMA instructions on Blackwell GPUs enable higher arithmetic intensity and more efficient memory transfers in GEMM workloads using CUTLASS.
  • Meta AssetGen 2.0. Meta’s AssetGen 2.0 introduces updated diffusion-based models for creating detailed 3D meshes and textures from text and image prompts, offering improved consistency, accuracy, and view-aware texture resolution compared to the earlier version.
  • Flow-GRPO for RL-Tuned Diffusion Models. Flow-GRPO integrates reinforcement learning into flow matching by converting ODEs into SDEs and using denoising reduction to enhance sample efficiency and alignment.
  • DeerFlow. Bytedance’s DeerFlow is an open-source research assistant using a multi-agent system that integrates search engines, web crawlers, and Python tools to produce Deep Research-style reports and podcasts.
  • Single-Image to 3D Avatars. SVAD merges video diffusion with 3D Gaussian Splatting to create high-quality animatable avatars from a single image, enabling real-time rendering.
  • Amazon’s Warehouse Stowing Robot Shows Promise and Limitations. Amazon’s custom stowing robot performs on par with humans in warehouse tasks, showcasing the cutting edge of robotics. Its specialized hardware and AI vision enable large-scale handling of varied items, but a 14% failure rate illustrates why complete warehouse automation is still out of reach despite major progress.
  • China built hundreds of AI data centers to catch the AI boom. Now many stand unused. China’s rapid expansion of AI infrastructure has resulted in significant overcapacity, with 80% of computing resources in over 500 new data centers remaining idle. The release of DeepSeek’s R1 model shifted market demand from training-oriented to inference-optimized hardware, leaving many centers outdated. Despite this correction, China continues to invest heavily in infrastructure to rival U.S. efforts such as the $500 billion Stargate project.
  • OpenAI’s HealthBench. OpenAI’s HealthBench is a benchmark created in collaboration with 262 physicians to assess AI models on realistic medical dialogues.
  • A Generalist Robot Policy Framework. UniVLA enables policy learning from unlabeled video across diverse robot embodiments by inferring task-centric latent actions.
  • Bamba-9B-v2. IBM, Princeton, CMU, and UIUC have introduced Bamba v2, a Mamba2-based model that surpasses Llama 3.1 8B after training on 3 trillion tokens. Utilizing the Mamba2 architecture, Bamba v2 achieves 2 to 2.5 times faster inference and strong results on the Hugging Face Open LLM Leaderboard v1 and v2 benchmarks. The team aims to further optimize the model and encourages community involvement to improve its performance.
  • Helium 1: a modular and multilingual LLM. Helium 1, a 2 billion parameter LLM, excels in European languages and is optimized for on-device use.
  • Visual Autoregression Without Quantization. EAR presents a continuous visual autoregressive generation approach that eliminates the need for quantization by using strictly proper scoring rules, such as the energy score, allowing direct generation in continuous data spaces without explicit likelihood modeling.
  • Unified Training and Sampling for Generative Models. UCGM provides a shared framework for training and sampling across multi-step and few-step continuous generative models.
  • Hugging Face Fast Transcription Endpoint. Hugging Face has launched a new Whisper endpoint offering up to 8x faster transcription. It allows one-click deployment of optimized, cost-efficient models for speech-related tasks via its Inference Endpoints.
  • Stability AI Text-to-Audio Model. Stability AI has open-sourced Stable Audio Open Small, a 341M parameter text-to-audio model optimized for Arm CPUs. It can produce 11-second audio clips on smartphones in under 8 seconds.
  • Building Agents for Daily News Recaps with MCP, Q, and tmux. A Principal Applied Scientist at Amazon developed a smart news aggregation system using Amazon Q CLI and Model Control Protocol (MCP). It processes multiple news feeds at once through coordinated AI agents, generating outputs like category distributions and cross-source trend analysis to reveal patterns across various publications.
  • Void: Open-Source AI Code Editor. Void, a VS Code fork, enables direct connections to AI models without sending data through third-party servers. It includes features like autocomplete, Agent Mode for full file and terminal interaction, Gather Mode for read-only operations, and checkpoints to track AI-suggested changes.
  • Meta’s New Artifacts. Meta’s FAIR team has released datasets and models supporting molecular property prediction, diffusion modeling, and language learning neuroscience.
  • Visual Tool Use for AI Agents. OpenThinkIMG enables vision-language models to actively utilize visual tools through dynamic inference and distributed deployment. It features a new reinforcement learning approach called V-ToolRL and an efficient training pipeline designed to enhance multi-tool reasoning with images.
  • Making complex text understandable: Minimally-lossy text simplification with Gemini. Developers leveraged Gemini models to automate prompt evaluation and refinement for text simplification, enhancing readability without losing meaning. The system uses LLMs to assess both clarity and fidelity, aligning more closely with human evaluations than traditional approaches. By iterating prompts automatically, it reduces manual work and enables highly effective simplification through a feedback loop powered by LLMs.
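The prompt-refinement loop described in the text-simplification item above can be sketched as follows. Both `simplify` and `judge` are hypothetical stubs standing in for real Gemini API calls, and the scoring scheme, threshold, and round count are illustrative assumptions rather than Google's actual pipeline:

```python
# Sketch of an LLM-in-the-loop prompt-refinement cycle for text
# simplification. `simplify` and `judge` are toy stubs standing in for
# real Gemini calls; scores and thresholds are illustrative assumptions.

def simplify(prompt, text):
    """Stub generator: pretend accumulated critique yields shorter output."""
    return text[: max(10, len(text) - prompt.count("Avoid") * 5)]

def judge(original, simplified):
    """Stub judge: a real call would ask an LLM to rate clarity and fidelity."""
    return 1.0 - len(simplified) / max(len(original), 1)

def refine(prompt, critique):
    # Fold the judge's critique into the prompt for the next iteration.
    return prompt + f"\nAvoid: {critique}"

def simplify_with_feedback(text, rounds=5, target=0.5):
    prompt = "Simplify the following text without losing meaning."
    best_score, best_output = -1.0, text
    for _ in range(rounds):
        output = simplify(prompt, text)
        score = judge(text, output)
        if score > best_score:
            best_score, best_output = score, output
        if score >= target:
            break  # good enough; stop iterating
        prompt = refine(prompt, "residual jargon or meaning drift")
    return best_score, best_output

score, simplified = simplify_with_feedback("Some dense, jargon-heavy technical paragraph.")
```

The structure is the point: a generator, an LLM judge, and a critique-driven prompt update form the closed feedback loop the article describes, with no manual prompt tuning inside it.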

Perspectives

  • Australia has been hesitant — but could robots soon be delivering your pizza? While there have been concerns over the technology's safety and legal status, working models from local startups are demonstrating its benefits.
  • For Silicon Valley, AI isn’t just about replacing some jobs. It’s about replacing all of them. AI will do the thinking, robots will do the doing. What place do humans have in this arrangement — and do tech CEOs care?
  • What’s the carbon footprint of using ChatGPT? ChatGPT queries use much less energy than previously thought, with new estimates putting typical usage at just 0.3 Wh — ten times lower than earlier figures. Although AI’s total energy use is worth monitoring, individual text-based interactions have a minimal environmental impact, especially when compared to activities like transportation or heating.
  • ‘AI models are capable of novel research’: OpenAI’s chief scientist on what to expect. Jakub Pachocki, who leads the firm’s development of advanced models, is excited to release an open version to researchers.
  • Vision Language Models (Better, Faster, Stronger). Hugging Face has outlined how Vision Language Models have advanced with smaller, more capable architectures, enabling reasoning, video understanding, and multimodal agents.
  • Journalists Reveal Nuanced Approaches to AI Integration. A survey of media professionals from outlets like Reuters, The Washington Post, VentureBeat, and 404 Media reveals that newsrooms are selectively integrating AI — using it for tasks like transcription, data analysis, and translation, but largely avoiding AI-generated content. While Reuters notes that AI now produces roughly 25% of its code, many journalists remain cautious, emphasizing audience trust and journalistic integrity over efficiency.
  • ChatGPT is used for scientific research in countries where it’s prohibited. Researchers used a classifier to spot unique AI word choices — such as “delve” — in academic papers and found higher ChatGPT usage in countries where it’s banned by OpenAI. By August 2023, 22% of Chinese preprints contained AI-generated content, compared to 11% in countries with legal access, indicating restrictions are easily bypassed. While ChatGPT use was linked to more views and downloads, it had no effect on citations or journal acceptance.
  • Conversational Interfaces: the Good, the Ugly & the Billion-Dollar Opportunity. Chat interfaces offer an easy entry point to LLMs for new users, but they’re ultimately a design limitation that makes users adjust to the model instead of the other way around. Future assistants will feature more adaptive interfaces and proactively convey what they can do.
  • Is it OK for AI to write science papers? Nature survey shows researchers are split. Poll of 5,000 researchers finds contrasting views on when it’s acceptable to involve AI and what needs to be disclosed.
  • AI’s Second-Order Effects. Founders should consider AI’s second-order impacts, such as shifts in workforce roles and regulatory demands, to drive sustainable growth. While first-order applications are becoming commoditized and competitive, real opportunities exist in addressing broader societal and economic changes spurred by AI. Building AI-native media and infrastructure can help tap into the transformative ways people respond to these disruptions.
  • MCP is a powerful new AI coding technology: Understand the risks. Model Context Protocol (MCP), developed by Anthropic AI to link LLMs with tools and data, currently lacks built-in security features, raising serious concerns. Experts have highlighted risks such as prompt injections and tool tampering. Without stronger safeguards, developers and organizations should use MCP cautiously, emphasize robust security practices, and keep up with its evolving standards.
  • OpenAI Engineers Reveal How ChatGPT Images Handled 100M New Users in One Week. OpenAI engineers shared how they handled the March launch of ChatGPT Images, which drew 100 million new users and 700 million images in its first week, with peak demand hitting 1 million new signups per hour during a viral surge in India. When their synchronous image generation system buckled under the pressure, the team rapidly rebuilt it into an asynchronous architecture during the launch.
  • Agents, Tools, and Simulators. AI can be understood through three conceptual frameworks: as a tool, an agent, or a simulator — each offering unique perspectives on its potential and risks. Tools amplify human intent and need supervision; agents act autonomously to achieve goals; simulators replicate processes without inherent objectives. In the case of LLMs, simulator theory posits that they combine simulation with agent-like behavior, particularly when fine-tuned, reflecting a dual nature shaped by both their context of use and architectural design.
  • How AI Agents Will Change the Web for Users and Developers. AI agents are set to reshape the web by autonomously interacting and sharing content, fundamentally changing user experiences and web development. This could lead to an “autonomous internet” where AI-driven interactions become the norm, influencing how content is structured, how payments work, and how businesses operate. Developers will need to adapt by building APIs tailored for AI agents and prioritizing scalable, personalized user experiences.
  • a16z identifies nine key developer patterns in the AI era. Andreessen Horowitz has identified nine key developer patterns emerging in the AI era, patterns that are fundamentally reshaping how developers build software and which tools they use.
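The detection idea in the ChatGPT-usage study above (spotting tokens that LLMs disproportionately favor, such as "delve") can be sketched as a simple frequency check. The word list and threshold here are illustrative assumptions; the study itself relied on a trained classifier:

```python
from collections import Counter
import re

# Words observed to be disproportionately frequent in LLM-generated text.
# Illustrative subset only; the actual study used a trained classifier.
AI_FAVORED = {"delve", "intricate", "pivotal", "showcase", "underscore"}

def ai_word_rate(text):
    """Fraction of tokens drawn from the AI-favored word list."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    return sum(counts[w] for w in AI_FAVORED) / len(tokens)

def flag_likely_ai(text, threshold=0.01):
    # Threshold is an illustrative assumption; in practice it would be
    # tuned on labeled human- and AI-written corpora.
    return ai_word_rate(text) > threshold

sample = "We delve into the intricate and pivotal role of attention."
print(ai_word_rate(sample))  # → 0.3
```

A rate computed this way can then be compared across countries or time periods, which is roughly how the study surfaced higher usage in regions where ChatGPT is blocked.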

Meme of the week

What do you think? Did any of this week's news capture your attention? Let me know in the comments.

If you have found this interesting:

You can look for my other articles, and you can connect with or reach me on LinkedIn, where I am open to collaborations and projects. Check this repository containing weekly updated ML & AI news. You can also subscribe for free to get notified when I publish a new story.

Here is the link to my GitHub repository, where I am collecting code and many resources related to machine learning, artificial intelligence, and more.

Written by Salvatore Raieli

Senior data scientist | about science, machine learning, and AI. Top writer in Artificial Intelligence
