WEEKLY AI NEWS: RESEARCH, NEWS, RESOURCES, AND PERSPECTIVES
AI & ML news: Week 12–18 May
The most interesting news, repositories, articles, and resources of the week
Check and star this repository where the news will be collected and indexed:
You will find the news first on GitHub. All the Weekly News stories are also collected here:
Artificial intelligence is transforming our world, shaping how we live and work. Understanding how it works and its implications has never been more crucial. If you’re looking for simple, clear explanations of complex AI topics, you’re in the right place. Hit Follow or subscribe for free to stay updated with my latest stories and insights.
Research
- The Leaderboard Illusion. The Leaderboard Illusion reveals major flaws in the Chatbot Arena ranking system, showing that practices like selective score reporting, extreme data imbalances, silent model removals, and overfitting to Arena-specific dynamics distort LLM comparisons. Through analysis of 2M battles, the paper finds that private testing privileges and data access for proprietary models inflate scores and undermine fairness, making the leaderboard an unreliable measure of real-world model quality.
- LLMs Get Lost in Multi-Turn Conversation. LLMs perform significantly worse in multi-turn conversations, with an average 39% drop in task performance, largely because they become unreliable and commit to early, incorrect assumptions they fail to recover from.
- Sakana AI Unveils “Continuous Thought Machine” With Brain-Inspired Neural Timing. Japanese AI company Sakana has created a new type of model where individual neurons retain memory of past actions and coordinate based on timing patterns. Though it lags behind traditional models in performance, it offers greater transparency into its reasoning process. Like recent models such as o3, its responses improve when given more time to process.
- AlphaEvolve: A Gemini-powered coding agent for designing advanced algorithms. Google DeepMind’s AlphaEvolve employs Gemini models to iteratively create and refine full algorithmic solutions rather than isolated functions. It generates code, evaluates it automatically, and evolves better versions by building on successful attempts. This method has led to major improvements across Google’s infrastructure, including data center performance, chip design, and AI training efficiency. Some researchers will get early access, but broad availability remains uncertain.
- The effect of ChatGPT on students’ learning performance, learning perception, and higher-order thinking: insights from a meta-analysis. Amid ongoing discussions about AI in education, a meta-analysis of 51 studies reveals that ChatGPT significantly boosts student learning performance and moderately enhances perceptions of learning and higher-order thinking. Its impact was strongest in problem-based learning settings with regular use over 4–8 weeks.
- BLIP3-o: A Family of Fully Open Unified Multimodal Models - Architecture, Training and Dataset. BLIP3-o is a new diffusion transformer architecture trained using a sequential pretraining approach. It sets state-of-the-art performance on various multimodal benchmarks. The release includes the model’s code, pretrained weights, and a 60k instruction-tuning dataset.
News
- Meta taps former Google DeepMind director to lead its AI research lab. Meta has named Robert Fergus, former research director at DeepMind, as the new head of its FAIR lab following a stretch of leadership shifts and staff exits.
- Microsoft and OpenAI may be renegotiating their partnership. OpenAI and Microsoft are revisiting their multibillion-dollar partnership in a pivotal deal that may shape OpenAI’s future. Microsoft, having invested more than $13 billion, is proposing to trade part of its equity for extended access to OpenAI’s technology beyond their current agreement, which ends in 2030.
- Deep Research now supports GitHub repo analysis in ChatGPT. ChatGPT’s Deep Research agent now supports scanning GitHub repositories, analyzing source code and pull requests to generate detailed, cited reports. Users can directly query repositories via the Deep Research → GitHub integration.
- Open Source Project Curl Battles Wave of AI-Generated False Vulnerabilities. curl project founder Daniel Stenberg has grown frustrated with the surge of AI-generated false vulnerability reports submitted via platforms like HackerOne. Although HackerOne argues that AI can improve report quality if used properly, Stenberg is calling for better infrastructure and tools to address what he sees as a denial-of-service attack on maintainers’ time and focus.
- Gemini 2.5 Video Understanding. Gemini 2.5 Pro has achieved state-of-the-art results on video benchmarks such as YouCook2 and QVHighlights, surpassing GPT-4.1 and matching fine-tuned specialist models under the same evaluation conditions.
- Canadian Pharmacist Exposed as Key Figure Behind World’s Largest Explicit Deepfake Site. Investigative journalists have revealed Toronto-area pharmacist David Do as the key figure behind MrDeepFakes.com, which shut down permanently after the exposure. Since 2018, the site had amassed 650,000 users and over 2 billion views, hosting tens of thousands of non-consensual AI-generated explicit videos of celebrities, influencers, and private individuals. Although such deepfakes are still legal in Canada, Prime Minister Mark Carney has committed to criminalizing them, following examples set by the UK and Australia.
- OpenAI agrees to buy Windsurf for about $3 billion, Bloomberg News reports. OpenAI has agreed to buy artificial intelligence-assisted coding tool Windsurf for about $3 billion, Bloomberg News reported on Monday, citing people familiar with the matter.
- SoundCloud’s Quiet Terms Update. SoundCloud has reportedly updated its terms of service to allow AI training on content uploaded by users, raising concerns about transparency and user consent.
- FutureHouse releases AI tools it claims can accelerate science. FutureHouse, an Eric Schmidt-backed nonprofit that aims to build an “AI scientist” within the next decade, has launched its first major product: a platform and API with AI-powered tools designed to support scientific work.
- House of Lords pushes back against government’s AI plans. Peers back an amendment to the data bill requiring AI companies to reveal which copyrighted material they have used.
- OpenAI’s Stargate project reportedly struggling to get off the ground, thanks to tariffs. OpenAI’s ambitious Stargate data center project is facing delays thanks to tariff-related economic uncertainty, reports Bloomberg. Growing market volatility and cheaper AI services have made banks, private equity investors, and asset managers wary of investing in Stargate, an OpenAI-led project that aims to raise up to $500 billion for AI infrastructure in the U.S. and overseas.
- Google’s Fund for AI Startups. Google’s AI Futures Fund will invest in startups using DeepMind’s AI tools and offer access to models, cloud credits, expert support, and potential direct funding.
- Google to Join AI Coding Assistant Race. Google is reportedly preparing to launch an AI software development agent at its I/O conference on May 20, designed to support the entire development lifecycle. This positions Google in direct competition with Anthropic’s Claude Code, OpenAI’s Windsurf, and a growing number of startups in the competitive AI coding arena. Integration with Gemini and AR glasses may also be on the horizon.
- Figma Website Builder. Figma has introduced Figma Sites, allowing users to design, build, and publish responsive websites directly from Figma, streamlining the design-to-production workflow.
- Trump Administration Scraps Biden-Era AI Chip Export Controls. The Trump administration has canceled Biden’s “AI diffusion rule,” which would have restricted American technology exports.
- Manus Expands Free Access. Manus has removed its waitlist, making its virtual desktop AI agent more accessible by allowing all users one free daily task and granting a one-time bonus of 1,000 credits, significantly reducing the entry barrier for this previously hyped automation tool.
- DeepMind unveils ‘spectacular’ general-purpose science AI. System improves chip designs and tackles unsolved maths problems, but has not been rolled out to researchers outside the company.
- TikTok breached EU advertising transparency laws, commission says. The company could face a fine of 6% of annual turnover if the European Commission’s preliminary verdict is upheld.
- Trump says he has a ‘little problem’ with Tim Cook over Apple’s India production. The president rebukes the tech firm after reports it will switch assembly of iPhones for the US market from China to India.
- US tech firms secure AI deals as Trump tours Gulf states. Nvidia is to sell hundreds of thousands of AI chips in Saudi Arabia, and Cisco has also signed a deal with UAE company G42.
- ChatGPT may be polite, but it’s not cooperating with you. Big tech companies have exploited human language for AI gain. Now they want us to see their products as trustworthy collaborators.
- TikTok AI Alive. TikTok has introduced AI Alive, a feature that brings static images in Stories to life by using smart editing tools to transform photos into short-form videos with dynamic effects.
- Audible is expanding its AI-narrated audiobook library. Audible, Amazon’s audiobook service, announced on Tuesday that it’s partnering with select publishers to convert print and e-books into AI-narrated audiobooks. This initiative aims to quickly expand its catalog as it competes with Apple, Spotify, and others in the rapidly growing audiobook market.
- Tencent hires WizardLM team, a Microsoft AI group with an odd history. WizardLM, a Beijing-based Microsoft AI research group, appears to have joined Tencent, the Chinese company that owns WeChat and blockbuster games like PUBG Mobile.
- Duolingo’s Push to Be an AI-First Company. Duolingo has declared a major transition to an AI-first company, embedding AI tools throughout its products and operations. Its new principles include initiating every task with AI, setting aside time for learning, encouraging thoughtful experimentation, and maintaining technical excellence. Leadership stressed that AI is meant to boost efficiency, not increase workload.
- LlamaCon Hackathon Winners. Meta’s inaugural LlamaCon Hackathon drew 238 participants building projects with the Llama 4 toolset. From 44 submissions, the winners were selected for innovation and technical execution and have now been announced.
- Microsoft is getting ready to host Elon Musk’s Grok AI model. Microsoft plans to host Elon Musk’s Grok AI on Azure AI Foundry despite potential tensions with OpenAI.
- Ai2’s new small AI model outperforms similarly-sized models from Google, Meta. Nonprofit AI research institute Ai2 on Thursday released Olmo 2 1B, a 1-billion-parameter model that Ai2 claims beats similarly sized models from Google, Meta, and Alibaba on several benchmarks.
- Over 250 CEOs sign open letter supporting K-12 AI and computer science education. More than 250 CEOs signed an open letter published in The New York Times on Monday calling for AI and computer science to be “core components” of K-12 curricula.
- Musk’s AI Grok bot rants about ‘white genocide’ in South Africa in unrelated chats. The X chatbot told users it was ‘instructed by my creators’ to accept ‘white genocide as real and racially motivated’.
- Ministers block Lords bid to make AI firms declare use of copyrighted content. The government used an arcane procedure to strip the amendment passed by the House of Lords from its data bill.
- Labour’s open door to big tech leaves critics crying foul. Promises of tech-driven growth give big US firms access to Downing Street that leaves rivals in the cold.
- AI scientist ‘team’ joins the search for extraterrestrial life. The collaborative system generated more than 100 hypotheses relating to the origins of life in the Universe.
- Largest US crypto exchange says cost of recent cyber-attack could reach $400m. Hackers paid overseas Coinbase employees for account data; the company is offering a $20m reward for information.
- Trump agrees deal for UAE to build largest AI campus outside US. The agreement, which would give the Gulf country better access to advanced AI chips, raises concerns over Chinese influence.
- AI conjures up potential new antibody drugs in a matter of months. The company found candidates that bind to tricky proteins that deliver chemical messages in and out of cells.
- Klarna’s AI Retrenchment and Broader AI App Risk. Klarna’s retreat from aggressive AI investment highlights broader industry challenges with AI’s unpredictable behavior in real-world use. As a result, many AI applications may have to pivot toward narrow, deterministic use cases, revealing inflated valuations that obscure their true nature as standard SaaS products. Continuous AI system validation is emerging as a crucial next discipline.
- Google Deploys Gemini Nano on Device to Power New Scam Defenses. Google has integrated Gemini Nano, its on-device language model, into Chrome’s Enhanced Protection mode to detect previously unknown scams in real time across Search, Android, and Chrome.
- AWS Announces $5+ Billion AI Partnership with New Saudi Arabian AI Company. AWS and HUMAIN, a new AI firm founded by Crown Prince Mohammed bin Salman, are investing over $5 billion to establish an “AI Zone” in Saudi Arabia equipped with cutting-edge AWS AI infrastructure. HUMAIN has also secured AI partnerships with Cisco, AMD, Oracle, and NVIDIA.
- CEO Satya Nadella says up to 30% of Microsoft’s code is now written by AI. The growing role of AI in software engineering has sparked concerns about the future of programming jobs, but human expertise remains vital. At LlamaCon, the CEOs of Microsoft and Meta emphasized that AI now generates a large share of their companies’ code. While AI is effective at handling repetitive tasks, human oversight is still essential for managing complex projects and improving AI-produced code.
- Perplexity Selects PayPal to Power Agentic Commerce. Following OpenAI’s Shopify integration, Perplexity has partnered with PayPal to enable seamless purchasing directly within its AI search results.
- Grok really wanted people to know that claims of white genocide in South Africa are highly contentious. Grok kept bringing it up in response to seemingly unrelated posts.
- GPT-4.1. OpenAI has released GPT-4.1 in ChatGPT — it is accessible via the “more models” dropdown for Plus, Pro, and Team users.
- Windsurf releases suite of in-house coding models. After its acquisition by OpenAI, Windsurf has introduced a new family of models: the flagship SWE-1, comparable to Claude 3.5 Sonnet; the unlimited-use SWE-1-lite; and the compact SWE-1-mini. Trained on incomplete code states and various work surfaces, these specialized models reflect a strategic move toward surpassing general-purpose frontier models over time.
- Google AI-Powered Accessibility Features. Google has introduced Gemini-based updates to Android and Chrome that enhance screen reading, speech recognition, and image understanding.
- Nous Research’s Psyche Network Taps Idle GPUs for AI Training. Psyche is a distributed training system on Solana that enables individuals with compatible hardware to contribute their GPUs for AI model training. Its first project, “Consilience,” aims to build a 40B parameter model trained on 20T tokens, marking the largest community-driven AI training initiative to date.
- FBI warns of ongoing scam that uses deepfake audio to impersonate government officials. The FBI has issued a warning about advanced scammers using AI-generated voice deepfakes to impersonate senior U.S. officials in schemes targeting government contacts. The alert follows several high-profile incidents, including an attempted scam that used a deepfake of LastPass’s CEO and a political robocall last year that mimicked President Biden.
- Y Combinator Hosts First-Ever AI Startup School. The startup incubator is hosting an invite-only event on June 16–17 in San Francisco for 2,500 CS students and recent graduates.
- TikTok Expands Mental Health Support. TikTok has launched in-app meditation exercises and expanded its Mental Health Education Fund to promote accessible, reliable mental health information for users worldwide.
- HeyGen launches Avatar IV, its most advanced AI avatar model yet. HeyGen’s Avatar IV is a neural audio-to-expression engine that interprets vocal tone, rhythm, and emotion to drive photoreal facial motion from a single image.
Resources
- Llama-Nemotron: Efficient Reasoning Models. NVIDIA’s Llama-Nemotron series — LN-Nano (8B), LN-Super (49B), and LN-Ultra (253B) — introduces powerful, open reasoning models that rival or surpass DeepSeek-R1 while offering better efficiency. Key innovations include a dynamic reasoning toggle for inference-time control and a multi-stage training pipeline combining architecture search, distillation, and RL. LN-Ultra leads on reasoning benchmarks and chat alignment, with open weights, code, and data released to support open research.
- Optimizing GEMM with Thread Block Clusters. Thread block clusters and 2-SM UMMA instructions on Blackwell GPUs enable higher arithmetic intensity and more efficient memory transfers in GEMM workloads using CUTLASS.
- Meta AssetGen 2.0. Meta’s AssetGen 2.0 introduces updated diffusion-based models for creating detailed 3D meshes and textures from text and image prompts, offering improved consistency, accuracy, and view-aware texture resolution compared to the earlier version.
- Flow-GRPO for RL-Tuned Diffusion Models. Flow-GRPO integrates reinforcement learning into flow matching by converting the deterministic ODE into an equivalent SDE and using denoising reduction to improve sample efficiency and alignment (the ODE-to-SDE relation it relies on is sketched after this list).
- DeerFlow. ByteDance’s DeerFlow is an open-source research assistant built on a multi-agent system that integrates search engines, web crawlers, and Python tools to produce Deep Research-style reports and podcasts.
- Single-Image to 3D Avatars. SVAD merges video diffusion with 3D Gaussian Splatting to create high-quality animatable avatars from a single image, enabling real-time rendering.
- Amazon’s Warehouse Stowing Robot Shows Promise and Limitations. Amazon’s custom stowing robot performs on par with humans in warehouse tasks, showcasing the cutting edge of robotics. Its specialized hardware and AI vision enable large-scale handling of varied items, but a 14% failure rate illustrates why complete warehouse automation is still out of reach despite major progress.
- China built hundreds of AI data centers to catch the AI boom. Now many stand unused. China’s rapid expansion of AI infrastructure has resulted in significant overcapacity, with 80% of computing resources in over 500 new data centers remaining idle. The release of DeepSeek’s R1 model shifted market demand from training-oriented to inference-optimized hardware, leaving many centers outdated. Despite this correction, China continues to invest heavily in infrastructure to rival U.S. efforts such as the $500 billion Stargate project.
- OpenAI’s HealthBench. OpenAI’s HealthBench is a benchmark created in collaboration with 262 physicians to assess AI models on realistic medical dialogues.
- A Generalist Robot Policy Framework. UniVLA enables policy learning from unlabeled video across diverse robot embodiments by inferring task-centric latent actions.
- Bamba-9B-v2. IBM, Princeton, CMU, and UIUC have introduced Bamba v2, a Mamba2-based model that surpasses Llama 3.1 8B after training on 3 trillion tokens. It achieves 2 to 2.5 times faster inference and strong results on L1 and L2 benchmarks. The team aims to optimize the model further and encourages community involvement to improve its performance.
- Helium 1: a modular and multilingual LLM. Helium 1, a 2 billion parameter LLM, excels in European languages and is optimized for on-device use.
- Visual Autoregression Without Quantization. EAR presents a continuous visual autoregressive generation approach that eliminates the need for quantization by training with strictly proper scoring rules, such as the energy score, allowing direct generation in continuous data spaces without explicit likelihood modeling (the energy score is written out after this list).
- Unified Training and Sampling for Generative Models. UCGM provides a shared framework for training and sampling across multi-step and few-step continuous generative models.
- Hugging Face Fast Transcription Endpoint. Hugging Face has launched a new Whisper endpoint offering up to 8x faster transcription. It allows one-click deployment of optimized, cost-efficient models for speech-related tasks via its Inference Endpoints (a minimal client sketch appears after this list).
- Stability AI Text-to-Audio Model. Stability AI has open-sourced Stable Audio Open Small, a 341M parameter text-to-audio model optimized for Arm CPUs. It can produce 11-second audio clips on smartphones in under 8 seconds.
- Building Agents for Daily News Recaps with MCP, Q, and tmux. A Principal Applied Scientist at Amazon developed a smart news aggregation system using Amazon Q CLI and the Model Context Protocol (MCP). It processes multiple news feeds at once through coordinated AI agents, generating outputs like category distributions and cross-source trend analysis to reveal patterns across various publications.
- Void: Open-Source AI Code Editor. Void, a VS Code fork, enables direct connections to AI models without sending data through third-party servers. It includes features like autocomplete, Agent Mode for full file and terminal interaction, Gather Mode for read-only operations, and checkpoints to track AI-suggested changes.
- Meta’s New Artifacts. Meta’s FAIR team has released datasets and models supporting molecular property prediction, diffusion modeling, and language learning neuroscience.
- Visual Tool Use for AI Agents. OpenThinkIMG enables vision-language models to actively use visual tools through dynamic inference and distributed deployment. It features a new reinforcement learning approach called V-ToolRL and an efficient training pipeline designed to improve multi-tool reasoning over images.
- Making complex text understandable: Minimally-lossy text simplification with Gemini. Developers leveraged Gemini models to automate prompt evaluation and refinement for text simplification, improving readability without losing meaning. The system uses LLM judges to assess both clarity and fidelity, aligning more closely with human evaluations than traditional approaches, and it iterates on prompts automatically, reducing manual work through an LLM-powered feedback loop (a minimal sketch of the loop follows this list).
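A note on the Flow-GRPO item above: the ODE-to-SDE conversion it builds on is a standard equivalence from score-based generative modeling, stated below as a minimal sketch. The paper's exact noise schedule, parameterization, and denoising-reduction step may differ from this generic form.

```latex
% A flow-matching sampler follows a deterministic probability-flow ODE
% driven by the learned velocity field v_theta:
\[
  \mathrm{d}x_t = v_\theta(x_t, t)\,\mathrm{d}t
\]
% An SDE with the same time marginals p_t, which restores the stochasticity
% that policy-gradient RL needs for exploration:
\[
  \mathrm{d}x_t = \Big[\, v_\theta(x_t, t) + \tfrac{\sigma_t^2}{2}\,\nabla_x \log p_t(x_t) \,\Big]\mathrm{d}t
  + \sigma_t\,\mathrm{d}W_t
\]
% Here sigma_t sets the injected noise level and W_t is a standard Wiener process.
```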
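On EAR (Visual Autoregression Without Quantization): the energy score it trains with is a classical strictly proper scoring rule, so minimizing it pulls model samples toward the data distribution without ever evaluating a likelihood. The generic definition is below; EAR's concrete estimator and choice of beta are the paper's.

```latex
% Energy score of a predictive distribution P against an observation y,
% with X and X' drawn i.i.d. from P; strictly proper for beta in (0, 2):
\[
  \mathrm{ES}_\beta(P, y) \;=\; \mathbb{E}\,\lVert X - y \rVert^{\beta}
  \;-\; \tfrac{1}{2}\,\mathbb{E}\,\lVert X - X' \rVert^{\beta}
\]
% In practice both expectations are replaced by Monte Carlo averages over a
% handful of model samples per training example, so no density is required.
```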
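For the Hugging Face transcription endpoint: once an ASR endpoint is deployed, calling it is a plain authenticated HTTP POST of the audio bytes. A minimal client sketch follows; the endpoint URL is a placeholder you get from the Inference Endpoints dashboard, and the response is assumed to carry the transcript in a "text" field, which can vary by deployment.

```python
import requests

ENDPOINT_URL = "https://<your-endpoint>.endpoints.huggingface.cloud"  # placeholder from the dashboard
HF_TOKEN = "hf_..."  # your Hugging Face access token

def transcribe(audio_path: str) -> str:
    """POST raw audio bytes to the deployed ASR endpoint and return the transcript."""
    with open(audio_path, "rb") as f:
        audio_bytes = f.read()
    response = requests.post(
        ENDPOINT_URL,
        headers={
            "Authorization": f"Bearer {HF_TOKEN}",
            "Content-Type": "audio/flac",  # match the format of the file you send
        },
        data=audio_bytes,
        timeout=120,
    )
    response.raise_for_status()
    return response.json()["text"]  # assumed response schema for ASR endpoints

if __name__ == "__main__":
    print(transcribe("meeting.flac"))
```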
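Finally, the text-simplification item describes an evaluate-and-refine loop: a model rewrites the text, LLM judges score clarity and fidelity, and another call proposes a better prompt when scores fall short. The sketch below captures only that control flow; `call_llm` is a stand-in for whatever Gemini client you use, and the judge prompts, scoring scale, and threshold are illustrative.

```python
def call_llm(prompt: str) -> str:
    """Stand-in for a call to your LLM of choice (e.g., a Gemini client)."""
    raise NotImplementedError

READABILITY_JUDGE = ("Score from 0 to 1 how much easier to read the rewrite is "
                     "than the original. Reply with only the number.\n"
                     "Original:\n{original}\nRewrite:\n{simplified}")
FIDELITY_JUDGE = ("Score from 0 to 1 how faithfully the rewrite preserves the "
                  "original meaning. Reply with only the number.\n"
                  "Original:\n{original}\nRewrite:\n{simplified}")

def judge(template: str, original: str, simplified: str) -> float:
    """Ask an LLM judge for a score; assumes it replies with a bare number."""
    return float(call_llm(template.format(original=original, simplified=simplified)).strip())

def refine_prompt(prompt: str, examples: list[str], rounds: int = 5) -> str:
    """Iteratively improve the simplification prompt using judge feedback."""
    for _ in range(rounds):
        scores = []
        for text in examples:
            simplified = call_llm(f"{prompt}\n\nText to simplify:\n{text}")
            # Keep the weaker of the two scores so neither clarity nor fidelity is sacrificed.
            scores.append(min(judge(READABILITY_JUDGE, text, simplified),
                              judge(FIDELITY_JUDGE, text, simplified)))
        avg = sum(scores) / len(scores)
        if avg > 0.9:  # good enough: stop iterating
            break
        prompt = call_llm(
            f"This prompt for simplifying text scored {avg:.2f} on combined "
            f"readability/fidelity judging. Propose an improved prompt.\n\n{prompt}"
        )
    return prompt
```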
Perspectives
- Australia has been hesitant — but could robots soon be delivering your pizza? While there have been concerns over the safety and legal status of the technology, working models from local startups are showing its benefits
- For Silicon Valley, AI isn’t just about replacing some jobs. It’s about replacing all of them. AI will do the thinking, robots will do the doing. What place do humans have in this arrangement — and do tech CEOs care?
- What’s the carbon footprint of using ChatGPT? ChatGPT queries use much less energy than previously thought, with new estimates putting typical usage at just 0.3 Wh — ten times lower than earlier figures. Although AI’s total energy use is worth monitoring, individual text-based interactions have a minimal environmental impact, especially when compared to activities like transportation or heating (a back-of-the-envelope comparison follows this list).
- ‘AI models are capable of novel research’: OpenAI’s chief scientist on what to expect. Jakub Pachocki, who leads the firm’s development of advanced models, is excited to release an open version to researchers.
- Vision Language Models (Better, Faster, Stronger). Hugging Face has outlined how Vision Language Models have advanced with smaller, more capable architectures, enabling reasoning, video understanding, and multimodal agents.
- Journalists Reveal Nuanced Approaches to AI Integration. A survey of media professionals from outlets like Reuters, The Washington Post, VentureBeat, and 404 Media reveals that newsrooms are selectively integrating AI — using it for tasks like transcription, data analysis, and translation, but largely avoiding AI-generated content. While Reuters notes that AI now produces roughly 25% of its code, many journalists remain cautious, emphasizing audience trust and journalistic integrity over efficiency.
- ChatGPT is used for scientific research in countries where it’s prohibited. Researchers used a classifier to spot tell-tale AI word choices — such as “delve” — in academic papers and found higher ChatGPT usage in countries where it’s blocked by OpenAI (a toy version of the word-spotting idea is sketched after this list). By August 2023, 22% of Chinese preprints contained AI-generated content, compared to 11% in countries with legal access, indicating the restrictions are easily bypassed. While ChatGPT use was linked to more views and downloads, it had no effect on citations or journal acceptance.
- Conversational Interfaces: the Good, the Ugly & the Billion-Dollar Opportunity. Chat interfaces offer an easy entry point to LLMs for new users, but they’re ultimately a design limitation that makes users adjust to the model instead of the other way around. Future assistants will feature more adaptive interfaces and proactively convey what they can do.
- Is it OK for AI to write science papers? Nature survey shows researchers are split. Poll of 5,000 researchers finds contrasting views on when it’s acceptable to involve AI and what needs to be disclosed.
- AI’s Second-Order Effects. Founders should consider AI’s second-order impacts, such as shifts in workforce roles and regulatory demands, to drive sustainable growth. While first-order applications are becoming commoditized and competitive, real opportunities exist in addressing broader societal and economic changes spurred by AI. Building AI-native media and infrastructure can help tap into the transformative ways people respond to these disruptions.
- MCP is a powerful new AI coding technology: Understand the risks. The Model Context Protocol (MCP), developed by Anthropic to link LLMs with tools and data, currently lacks built-in security features, raising serious concerns. Experts have highlighted risks such as prompt injection and tool tampering. Without stronger safeguards, developers and organizations should use MCP cautiously, emphasize robust security practices, and keep up with its evolving standards.
- OpenAI Engineers Reveal How ChatGPT Images Handled 100M New Users in One Week. OpenAI engineers shared how they handled the March launch of ChatGPT Images, which drew 100 million new users and 700 million images in its first week, with peak demand hitting 1 million new signups per hour during a viral surge in India. When their synchronous image generation system buckled under the pressure, the team rapidly rebuilt it into an asynchronous architecture during the launch.
- Agents, Tools, and Simulators. AI can be understood through three conceptual frameworks: as a tool, an agent, or a simulator — each offering unique perspectives on its potential and risks. Tools amplify human intent and need supervision; agents act autonomously to achieve goals; simulators replicate processes without inherent objectives. In the case of LLMs, simulator theory posits that they combine simulation with agent-like behavior, particularly when fine-tuned, reflecting a dual nature shaped by both their context of use and architectural design.
- How AI Agents Will Change the Web for Users and Developers. AI agents are set to reshape the web by autonomously interacting and sharing content, fundamentally changing user experiences and web development. This could lead to an “autonomous internet” where AI-driven interactions become the norm, influencing how content is structured, how payments work, and how businesses operate. Developers will need to adapt by building APIs tailored for AI agents and prioritizing scalable, personalized user experiences.
- a16z identifies nine key developer patterns in the AI era. Andreessen Horowitz has identified nine key developer patterns emerging in the AI era, arguing that they are fundamentally reshaping how developers build software and which tools they use.
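To put the 0.3 Wh-per-query estimate from the carbon-footprint piece in perspective, here is a back-of-the-envelope comparison against an everyday heating task. The per-query figure comes from the article; the rest is textbook physics and ignores kettle inefficiency.

```python
# Energy per ChatGPT query (new estimate cited in the article), in watt-hours
WH_PER_QUERY = 0.3

# Heating 1 litre (~1 kg) of water from 20 C to 100 C:
SPECIFIC_HEAT_WATER = 4186                     # J per kg per kelvin
joules_to_boil = 1 * SPECIFIC_HEAT_WATER * 80  # ~335 kJ
wh_to_boil = joules_to_boil / 3600             # ~93 Wh

print(f"Boiling 1 L of water: ~{wh_to_boil:.0f} Wh")
print(f"Equivalent ChatGPT queries: ~{wh_to_boil / WH_PER_QUERY:.0f}")
# ~93 Wh / 0.3 Wh per query -> roughly 300 queries per kettle boil
```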
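The study on ChatGPT use in countries where it is blocked rests on spotting words LLMs over-use in academic prose. The toy sketch below illustrates the idea only; the marker list is invented for illustration, and the actual study trained a classifier rather than counting raw word frequencies.

```python
import re
from collections import Counter

# Words LLM-generated academic prose tends to over-use (illustrative list, not the study's)
MARKER_WORDS = {"delve", "intricate", "pivotal", "showcasing", "underscore"}

def marker_rate(text: str) -> float:
    """Return the fraction of tokens in `text` that are tell-tale marker words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    return sum(counts[w] for w in MARKER_WORDS) / len(tokens)

abstracts = {
    "paper_a": "We delve into the intricate and pivotal dynamics of the system, showcasing ...",
    "paper_b": "We measure the effect of X on Y using a randomized design and report ...",
}
for name, text in abstracts.items():
    print(name, f"{marker_rate(text):.2%}")
```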
Meme of the week
What do you think? Did any of this week’s news catch your attention? Let me know in the comments.
If you have found this interesting:
You can look for my other articles, and you can also connect with or reach me on LinkedIn, where I am open to collaborations and projects. Check this repository containing weekly updated ML & AI news. You can also subscribe for free to get notified when I publish a new story.
Here is the link to my GitHub repository, where I am collecting code and many resources related to machine learning, artificial intelligence, and more.