WEEKLY AI NEWS: RESEARCH, NEWS, RESOURCES, AND PERSPECTIVES
AI & ML news: Week 5–11 May
The most interesting news, repositories, articles, and resources of the week
Check and star this repository where the news will be collected and indexed:
You will find the news first on GitHub. All the Weekly News stories are also collected here:
Artificial intelligence is transforming our world, shaping how we live and work. Understanding how it works and its implications has never been more crucial. If you’re looking for simple, clear explanations of complex AI topics, you’re in the right place. Hit Follow or subscribe for free to stay updated with my latest stories and insights.
Research
- Chain of Draft for Efficient Reasoning. Chain of Draft is a concise reasoning strategy that significantly reduces token usage while matching or exceeding Chain-of-Thought accuracy across complex tasks (a prompt sketch follows this list).
- RAG-MCP: Mitigating Prompt Bloat in LLM Tool Selection via Retrieval-Augmented Generation. This paper shows that registering too many tools with an AI agent can backfire, bloating the prompt and reducing accuracy. RAG-MCP instead retrieves only the most relevant tool schemas from a large external index, keeping prompts concise and effective. It cuts prompt size by over half and triples tool-selection accuracy, enabling scalable, efficient multi-tool agents without retraining (a retrieval sketch follows this list).
- Long-Short Chain-of-Thought Mixture Supervised Fine-Tuning Eliciting Efficient Reasoning in Large Language Models. This paper introduces LS-Mixture SFT, a method that fine-tunes LLMs on both long and trimmed chain-of-thought reasoning to reduce verbosity without sacrificing accuracy. By training on a 50/50 mix of detailed and concise reasoning paths and prompting for balanced outputs, the s1-mix-32B model achieves up to 6.7 points higher accuracy with 47% shorter responses across tasks like MATH500 and AIME24 — proving efficient reasoning doesn’t require overthinking.
- Absolute Zero: Reinforced Self-play Reasoning with Zero Data. Absolute Zero introduces a self-supervised learning approach where an LLM generates and solves its own reasoning tasks without human data, using only code execution for feedback. By evolving task difficulty and optimizing for learnability, a unified model trained with Task-Relative REINFORCE++ achieves state-of-the-art results in coding and math benchmarks, outperforming models trained on human-curated examples and demonstrating strong generalization and scalability.
- Rethinking Memory in AI: Taxonomy, Operations, Topics, and Future Directions. "Rethinking Memory in AI" introduces a unified taxonomy of memory in LLM agents, dividing it into parametric, contextual-structured, and contextual-unstructured types, with six core operations: consolidation, indexing, updating, forgetting, retrieval, and compression. Analyzing 30,000+ papers, the framework guides when to store, graph, or edit memory, offering a precise toolkit for building more reliable, long-lived AI systems that adapt across sessions and domains.
- HyperTree Planning: Enhancing LLM Reasoning via Hierarchical Thinking. HyperTree Planning (HTP) replaces linear chains of thought with hierarchical hypertrees to improve LLM planning accuracy by up to 3.6x. It decomposes complex queries into subtasks using a top-down approach, expands branches with rule libraries, and prunes candidates using model-based scoring. Without hand-crafted examples, HTP outperforms chain, tree, and agent methods on benchmarks like TravelPlanner and Blocksworld, pointing to hypertrees as a scalable future for LLM-driven planning.
- L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning. A new method from Carnegie Mellon, Length Controlled Policy Optimization (LCPO), trains models to reason accurately within user-specified length limits, addressing inefficiencies from overly long or prematurely short outputs. Their 1.5B L1 model balances accuracy and compute use, outperforming previous methods by up to 20% and even rivaling GPT-4o at matched reasoning lengths despite being 30× smaller. LCPO points to length control as a key advance for efficient, lightweight AI reasoning (a toy reward sketch follows this list).
- Code Retrieval using LoRA. Researchers introduce a LoRA-based fine-tuning method for code search that reduces trainable parameters to below 2% while improving retrieval accuracy by up to 9.1% on Code2Code tasks (a configuration sketch follows this list).
- IDInit: A Universal and Stable Initialization Method for Neural Network Training. A new initialization technique, IDInit, promotes stable convergence in deep neural networks by maintaining identity transitions in both main and sub-stem layers (an initialization sketch follows this list).
- The Leaderboard Illusion. An analysis of Chatbot Arena finds its benchmarking biased by undisclosed private testing and unequal data access. Companies like Google and OpenAI enjoy broad access while open-source models get far less, encouraging overfitting to the leaderboard rather than real model progress.
- Actor-Critics Can Achieve Optimal Sample Efficiency. A new actor-critic RL algorithm has achieved near-optimal sample efficiency using offline data and targeted exploration, addressing long-standing challenges in hybrid RL settings.
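To make the Chain of Draft idea concrete, here is a minimal prompt sketch. The instruction wording is my paraphrase of the paper's approach of capping each reasoning step at a few words, not the exact published prompt.

```python
# Chain-of-Thought vs. Chain of Draft prompting, side by side.
# The instruction text is an illustrative paraphrase, not the paper's exact prompt.

COT_PROMPT = (
    "Think step by step to answer the question. "
    "Explain each step fully, then state the final answer."
)

COD_PROMPT = (
    "Think step by step, but keep only a minimal draft for each step, "
    "five words at most. Then give the final answer after '####'."
)

question = "A jug holds 4 liters. How many jugs fill a 20-liter tank?"

# The draft-style instruction simply replaces the usual CoT one:
messages = [
    {"role": "system", "content": COD_PROMPT},
    {"role": "user", "content": question},
]
print(messages)
```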
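The mechanism behind RAG-MCP is ordinary retrieval applied to tool schemas instead of documents. Here is a minimal sketch, assuming a sentence-transformers embedder and a toy in-memory schema index; both are stand-ins for whatever embedder and MCP registry a real system would use.

```python
# Minimal sketch of RAG-MCP-style tool selection: embed tool schemas,
# retrieve only the top-k most relevant ones, and prompt with those alone.
# The model name, k, and the toy schema index are illustrative assumptions.
from sentence_transformers import SentenceTransformer, util

TOOLS = [  # stand-in for a large external index of MCP tool schemas
    {"name": "weather.lookup", "description": "Get current weather for a city."},
    {"name": "calendar.create_event", "description": "Create a calendar event."},
    {"name": "stocks.quote", "description": "Fetch a live stock price quote."},
    # ... hundreds more in a real index
]

model = SentenceTransformer("all-MiniLM-L6-v2")
tool_embeddings = model.encode([t["description"] for t in TOOLS])

def select_tools(query: str, k: int = 2):
    """Return only the k tool schemas most relevant to the query."""
    query_embedding = model.encode(query)
    scores = util.cos_sim(query_embedding, tool_embeddings)[0]
    top = scores.argsort(descending=True)[:k]
    return [TOOLS[i] for i in top]

# Only the selected schemas go into the prompt, keeping it small.
print(select_tools("Will it rain in Paris tomorrow?"))
```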
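At its core, LCPO shapes the RL reward to trade correctness off against deviation from the user's token budget. Below is a toy version of such a length-penalized reward; the linear penalty and the weight alpha are illustrative assumptions, not the paper's exact objective.

```python
# Toy length-controlled reward in the spirit of LCPO: reward correctness,
# penalize deviation from the user-requested output length. The penalty
# weight and linear form are illustrative, not the paper's exact terms.

def length_controlled_reward(
    is_correct: bool,
    num_generated_tokens: int,
    target_tokens: int,
    alpha: float = 0.001,
) -> float:
    correctness = 1.0 if is_correct else 0.0
    length_penalty = alpha * abs(num_generated_tokens - target_tokens)
    return correctness - length_penalty

# A correct 600-token answer against a 512-token budget keeps most reward:
print(length_controlled_reward(True, 600, 512))   # 1.0 - 0.088 = 0.912
# A correct but bloated 3000-token answer is punished:
print(length_controlled_reward(True, 3000, 512))  # 1.0 - 2.488 = -1.488
```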
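For the LoRA code-retrieval paper, the parameter math is easy to reproduce with the Hugging Face peft library. The base encoder, rank, and target modules below are my assumptions for illustration, not the authors' configuration.

```python
# Minimal sketch of LoRA fine-tuning for a code-retrieval encoder with peft.
# Base model, rank, and target modules are illustrative assumptions.
from peft import LoraConfig, get_peft_model
from transformers import AutoModel

base = AutoModel.from_pretrained("microsoft/codebert-base")  # assumed encoder
config = LoraConfig(
    r=8,                                # low-rank update dimension
    lora_alpha=16,
    target_modules=["query", "value"],  # attention projections in RoBERTa-style models
    lora_dropout=0.1,
)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # typically well under 2% trainable

# The adapted encoder would then be trained with a contrastive retrieval
# loss so embeddings of matching code snippets score high similarity.
```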
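And for IDInit, the underlying idea of identity-preserving initialization can be sketched in a few lines of PyTorch. How rectangular layers are handled here (zero-padding the identity) is my simplification, not necessarily IDInit's exact scheme.

```python
# Sketch of identity-style initialization: each linear layer starts as
# (a padded) identity map, so signals pass through unchanged at step 0.
# Handling of rectangular layers is an illustrative assumption.
import torch
import torch.nn as nn

def identity_init_(linear: nn.Linear) -> None:
    """Initialize a linear layer to a padded/truncated identity."""
    with torch.no_grad():
        linear.weight.zero_()
        n = min(linear.out_features, linear.in_features)
        linear.weight[:n, :n] = torch.eye(n)
        if linear.bias is not None:
            linear.bias.zero_()

layer = nn.Linear(4, 6)
identity_init_(layer)
x = torch.randn(2, 4)
print(layer(x)[:, :4])  # first 4 outputs reproduce the input exactly
```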
News
- Paul McCartney and Dua Lipa among artists urging Starmer to rethink AI copyright plans. Hundreds of leading figures from UK creative industries urge prime minister not to ‘give our work away’
- AI firms warned to calculate threat of superintelligence or risk it escaping human control. AI safety campaigner calls for existential threat assessment akin to Oppenheimer’s calculations before first nuclear test
- Wikipedia challenging UK law it says exposes it to ‘manipulation and vandalism’. Wikimedia Foundation seeks judicial review of some requirements of Online Safety Act it claims may endanger safety of volunteer editors
- Amazon makes ‘fundamental leap forward in robotics’ with device having sense of touch. Vulcan device ‘capable of grabbing three-quarters of items in warehouses’ fuels fears of mass job losses
- Tech giants beat quarterly expectations as Trump’s tariffs hit the sector. What’s new in AI, from its effects on the job market to Meta’s new app and ChatGPT changes, plus a look at Musk’s first term
- OpenAI reverses course and says non-profit arm will retain control of firm. CEO Sam Altman says the decision to backtrack was made ‘after hearing from civic leaders’ and state attorneys general
- Will AI improve your life? Here’s what 4,000 researchers think. Scientists working on artificial intelligence are more confident than the public that the technology will benefit people.
- Obesity-drug pioneers and 13,508 physicists win US$3-million Breakthrough Prizes. Advances recognized by science’s most lucrative awards include Large Hadron Collider experiments and groundbreaking weight-loss treatments.
- The dangerous fantasies driving the quest for super-intelligent AI. More Everything Forever dissects the techno-utopian vision gripping Silicon Valley and Washington DC.
- ‘Dangerous nonsense’: AI-authored books about ADHD for sale on Amazon. Experts say online retailer has ethical responsibility to guard against chatbot-generated work on sensitive topics
- US asks judge to break up Google’s ad tech business after requesting Chrome sale. After Google lost its first monopoly trial, government asks it to sell off units of its core internet ads business
- TikTok fined €530m by Irish regulator for failing to guarantee China would not access user data. Ireland’s Data Protection Commission found video app breached GDPR and had submitted ‘erroneous information’ to inquiry
- Apple and Anthropic reportedly partner to build an AI coding platform. Apple and Anthropic are teaming up to build a “vibe-coding” software platform that will use generative AI to write, edit, and test code for programmers, Bloomberg reported on Friday.
- OpenAI Addresses ChatGPT Sycophancy. Following reports of overly agreeable responses in GPT-4o, OpenAI announced plans for clearer update disclosures, a new opt-in alpha testing phase, and more rigorous behavior safety evaluations.
- Phi-4 Reasoning scores 100% on a private graduate linear algebra exam. Microsoft’s new reasoning model, trained on synthetic data, delivers strong performance on math and coding tasks locally, despite having limited internal world knowledge.
- Amplify Initiative for Local AI Data. Google Research has introduced a global data collection initiative co-created with local experts to improve AI relevance in underserved regions.
- Alibaba unveils Qwen3, a family of ‘hybrid’ AI reasoning models. Chinese tech company Alibaba on Monday released Qwen3, a family of AI models that the company claims can match and, in some cases, outperform the best models available from Google and OpenAI.
- Gemini 2.5 Pro Beats Pokémon Blue. A livestream featuring Google’s Gemini AI has completed Pokémon Blue, drawing praise from Google executives despite being an unofficial effort.
- AI has opened a new era in venture capital, according to Forerunner founder Kirsten Green. Forerunner, known for investments in Oura and Chime, anticipates future growth in the AI era.
- OpenAI upgrades ChatGPT search, shopping, and citations. OpenAI has rolled out improvements to ChatGPT’s search and added a streamlined shopping experience with product details, reviews, and buy links.
- Gemini 2.5 Pro Preview. Google has unveiled a preview of Gemini 2.5 Pro, showcasing improved capabilities in web app development, code transformation, and multimodal reasoning.
- Pinterest New Visual Search. Pinterest has upgraded its image-based search with new tools that help users narrow results and explore styles, launching first in the women’s fashion category across select regions.
- AI in Heavy Machinery and Farming. John Deere, a top producer of agricultural and construction machinery, leveraged AI to enhance farming efficiency through precision tools like See & Spray, which significantly reduced chemical usage by rapidly detecting individual weeds.
- Anthropic unlocks web search for all paid Claude plans. Anthropic has enabled web search for all paid Claude plans, adding real-time lookups and source citations.
- Little Language Lessons uses generative AI to make practicing languages more personal. Google’s Little Language Lessons leverages generative models for real-world language learning through experiments like Tiny Lesson, Slang Hang, and Word Cam.
- ‘Unethical’ AI research on Reddit under fire. Ethics experts raise concerns over consent and study design
- Anthropic API Supports Web Search. Claude now offers web search through its API, letting developers build applications that pull in real-time, current information from the internet (a minimal API sketch follows this list).
- Amazon Vulcan Robot. Amazon’s Vulcan is the company’s first robot with a sense of touch, enhancing handling precision and marking a new phase in physical AI capabilities.
- Apple is looking to add AI search engines to Safari. Apple is looking to add AI search engines from OpenAI, Perplexity, and Anthropic to Safari, Bloomberg reported on Wednesday.
- Can O3 beat a GeoGuessr master? Researcher Sampatt pitted the AI agent O3 against a GeoGuessr expert in the demanding street view geography game. While O3 showed solid reasoning and geographic inference skills, it couldn’t consistently surpass the human player. The test underscores AI’s advancing spatial abilities and its present limitations in handling complex real-world challenges.
- Simplifying Text with LLMs. Google researchers have used LLMs to simplify complex text without losing critical details, improving user understanding while preserving accuracy and nuance.
- Meta’s ChatGPT competitor shows how your friends use AI. Meta’s latest AI app features a Discover feed that brings a social element by letting users share their AI interactions. This function encourages engagement through comments, likes, and remixing of shared content, aiming to make AI more approachable. Replacing the View app for Meta Ray-Ban glasses, the app runs on a Meta-customized Llama 4 model and offers advanced voice interaction in certain areas.
- Mastercard gives AI agents the ability to shop on your behalf. Mastercard’s AI program streamlines e-commerce searches, reducing time and friction. Consumers retain purchase control, as AI agents can’t finalize transactions.
- Meta launches AI Defenders Program to protect Llama models. Meta introduced the AI Defenders Program to help developers detect and prevent misuse of Llama models.
- Startups launch products to catch people using AI cheating app Cluely. AI cheating startup Cluely went viral last week with bold claims that its hidden in-browser window is “undetectable” and can be used to “cheat on everything” from job interviews to exams; now rival startups are launching tools to detect it.
- Mistral Medium 3. Mistral Medium 3 was launched to offer robust enterprise performance at a much lower cost, with a focus on deployment versatility and coding efficiency.
- Fidji Simo Joins OpenAI as CEO of Applications. OpenAI has named Fidji Simo as head of its Applications division, strengthening its efforts to scale products and operations as the company moves from research to global deployment and infrastructure.
- AMIE gains vision: A research AI agent for multimodal diagnostic dialogue. Google Research and DeepMind have extended AMIE, their diagnostic-dialogue research agent, with vision capabilities, allowing it to incorporate visual medical information into multimodal diagnostic conversations.
- Google launches ‘implicit caching’ to make accessing its latest AI models cheaper. Google is rolling out “implicit caching” in the Gemini API, a feature it claims delivers 75% savings on “repetitive context” passed to its models, with support for Gemini 2.5 Pro and 2.5 Flash (a back-of-the-envelope cost sketch follows this list).
- Hugging Face releases a free Operator-like agentic AI tool. Hugging Face’s Open Computer Agent is a cloud-based AI agent that handles simple tasks but has difficulty with complex ones like flight searches. Though it faces limitations and wait times, it highlights the promise of open AI models in automating workflows and reflects the rising interest in agentic technologies. A KPMG survey reports that 65% of companies are exploring AI agents, with the market projected to expand significantly.
- Perplexity Expanding AI-Powered Learning. Perplexity has partnered with Wiley to incorporate textbook content into its AI search platform, giving students and institutions easy access to course materials and instant explanations.
- Freepik releases an ‘open’ AI image generator trained on licensed data. Freepik’s F Lite is an AI image model, developed with Fal.ai and trained on 64 Nvidia H100 GPUs, that uses only licensed, safe-for-work images.
- Meta Enters The Token Business, Powered By Nvidia, Cerebras And Groq. Meta showcased its capability to rival ChatGPT at LlamaCon by collaborating with Cerebras and Groq for faster inference processing.
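For the Anthropic web search item above, here is a minimal sketch of the API call. The tool-type identifier and max_uses parameter reflect my understanding of the launch documentation and should be verified against Anthropic's current docs.

```python
# Minimal sketch of calling Claude with the server-side web search tool.
# The tool type string and max_uses parameter follow the launch docs as
# I understand them; verify against Anthropic's current documentation.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-3-7-sonnet-latest",
    max_tokens=1024,
    tools=[{
        "type": "web_search_20250305",  # assumed identifier from launch docs
        "name": "web_search",
        "max_uses": 3,                  # cap the number of searches per request
    }],
    messages=[{"role": "user", "content": "What changed in Claude's API this week?"}],
)
print(response.content)
```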
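And for implicit caching, a back-of-the-envelope estimate of what a 75% discount on repeated context means for a prompt-heavy workload. Only the 75% figure comes from Google's announcement; the per-token price and token counts are hypothetical placeholders.

```python
# Rough cost sketch for "implicit caching": repeated prompt prefixes are
# billed at a discount. The 75% savings figure is from Google's
# announcement; price and token counts are hypothetical placeholders,
# and the first (uncached) call is ignored for simplicity.

PRICE_PER_MTOK = 1.00          # hypothetical $/1M input tokens
CACHE_DISCOUNT = 0.75          # savings on cached ("repetitive") context

shared_prefix_tokens = 50_000  # e.g. a large system prompt plus documents
fresh_tokens_per_call = 500
num_calls = 100

without_cache = (shared_prefix_tokens + fresh_tokens_per_call) * num_calls
with_cache = (shared_prefix_tokens * (1 - CACHE_DISCOUNT)
              + fresh_tokens_per_call) * num_calls

cost = lambda toks: toks / 1_000_000 * PRICE_PER_MTOK
print(f"without caching: ${cost(without_cache):.2f}")  # $5.05
print(f"with caching:    ${cost(with_cache):.2f}")     # $1.30
```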
Resources
- Towards multimodal foundation models in molecular cell biology. This perspective envisions multimodal foundation models, pretrained on diverse omics datasets, that could unravel the intricate complexities of molecular cell biology.
- These are the most-cited research papers of all time. Some studies have received hundreds of thousands of citations, Nature’s updated analysis shows.
- Which programming language should I use? A guide for early-career researchers. Computer scientists and bioinformaticians address four key questions to help rookie coders make the right choice.
- MCP is Unnecessary. MCP advertises and calls functions much as OpenAPI does, but with a more simplified design. Though both can deliver comparable results, MCP stands out for its smaller scale and ease of use; the author argues its adoption is driven more by social factors than by technical needs.
- Empowering LLMs with DeepResearch ability. WebThinker is a deep research framework fully powered by large reasoning models (LRMs). It enables LRMs to autonomously search, deeply explore web pages, and draft research reports.
- Efficient Federated Unlearning. FUSED introduces sparse unlearning adapters to selectively remove knowledge in federated learning, making unlearning reversible and cost-efficient.
- Attention Distillation for Diffusion-Based Image Stylization. This approach improves image generation by utilizing self-attention features from pretrained diffusion models and applying an attention distillation loss to refine stylization and speed up synthesis.
- Google SpeciesNet. Google’s SpeciesNet is an open-source AI model designed to identify animal species from camera trap photos. Previously used in Wildlife Insights, it aims to expand biodiversity monitoring efforts.
- Cognition KEVIN-32B. KEVIN-32B is a reinforcement learning-based model for multi-turn code generation that surpasses current models in generating CUDA kernels. It improves kernel accuracy and performance by refining intermediate feedback and applying effective reward distribution. Its multi-turn training setup enhances problem-solving, especially for complex tasks, compared to single-turn methods.
- How to train an AI model without falling into GDPR pitfalls? AI model developers can meet GDPR requirements during development by using anonymous data or applying pseudonymization. When full anonymization isn’t possible, they should strengthen data security and uphold individuals’ rights. Publicly communicating how data is used is also advised for greater transparency (a pseudonymization sketch follows this list).
- Quantization with AutoRound. AutoRound is a post-training quantization method that boosts low-bit model accuracy while preserving performance and efficiency (a conceptual sketch follows this list).
- LLMs for Time Series: A Survey. This survey examines how cross-modality methods adapt large language models for time series analysis, emphasizing data alignment, integration, and effectiveness in downstream tasks across various fields.
- Synthetic Data QA Framework. This evaluation toolkit offers unified metrics to measure the quality and privacy of synthetic data across different data types, utilizing distributional and embedding-based approaches.
- DDT: Decoupled Diffusion Transformer. An encoder/decoder Transformer implementation with a diffusion model as the decoder. It seems to work reasonably well on ImageNet generation.
- Nvidia Radio Embedding Models. Nvidia has a suite of text and image embedding models that match SigLIP in many cases.
- Pathology with DINOv2. The Mahmood Lab, building on Meta’s DINOv2, has developed open-source AI models for pathology, improving disease detection and diagnostics (a feature-extraction sketch follows this list).
- PyTorch Role in the AI Stack. PyTorch has grown from a research-focused framework into a core platform driving generative AI. The PyTorch Foundation has broadened its scope to include related projects and promote scalable AI development.
- Osmosis self-improvement via real-time reinforcement learning. Osmosis is a platform enabling AI self-improvement through real-time reinforcement learning. The team has open-sourced a compact model that matches state-of-the-art performance for MCP and can be run locally.
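On the GDPR item above, pseudonymization in practice often amounts to replacing direct identifiers with a keyed hash before data enters the training pipeline. Here is a minimal sketch; the field names and environment variable are illustrative.

```python
# Toy pseudonymization step before assembling a training corpus: replace
# direct identifiers with a keyed hash (HMAC), so records cannot be linked
# back without the secret key. Field names and the env var are illustrative.
import hashlib
import hmac
import os

SECRET_KEY = os.environ["PSEUDONYM_KEY"]  # kept outside the training pipeline

def pseudonymize(identifier: str) -> str:
    """Deterministic keyed hash: a stable join key without the raw identity."""
    return hmac.new(SECRET_KEY.encode(), identifier.encode(), hashlib.sha256).hexdigest()

record = {"user_id": "alice@example.com", "text": "support chat transcript ..."}
record["user_id"] = pseudonymize(record["user_id"])
print(record["user_id"][:16], "...")
```

Note that pseudonymized data generally still counts as personal data under GDPR, so the security and transparency measures the piece recommends remain necessary.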
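For AutoRound, the distinctive idea is learning the rounding direction rather than always rounding to nearest. Below is a conceptual toy of that idea in plain PyTorch, not the auto-round library's actual API; the dimensions, learning rate, and step count are arbitrary.

```python
# Conceptual toy of learned rounding, the idea behind AutoRound-style
# post-training quantization: optimize a small offset v so that
# round(W/s + v) reconstructs layer outputs better than plain rounding.
# Illustrative sketch only, not the auto-round library's API.
import torch

torch.manual_seed(0)
W = torch.randn(64, 64)      # a pretrained weight matrix
X = torch.randn(256, 64)     # calibration activations
scale = W.abs().max() / 7    # 4-bit symmetric scale

v = torch.zeros_like(W, requires_grad=True)  # learnable rounding offsets
opt = torch.optim.Adam([v], lr=1e-2)

for step in range(200):
    # Straight-through estimator: forward uses round(), gradient flows via q.
    q = W / scale + v.clamp(-0.5, 0.5)
    W_q = (q.round() - q).detach() + q
    loss = ((X @ W.T - X @ (W_q * scale).T) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

baseline = ((X @ W.T - X @ ((W / scale).round() * scale).T) ** 2).mean()
print(f"round-to-nearest error: {baseline.item():.4f}, "
      f"learned rounding: {loss.item():.4f}")
```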
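For the pathology item, the typical pipeline embeds tissue tiles with a DINOv2 backbone and trains a light classifier on top. The sketch below loads Meta's public DINOv2 weights as a stand-in; the Mahmood Lab's pathology-specific models ship with their own loaders.

```python
# Generic DINOv2 feature extraction as used in pathology pipelines:
# embed tissue tiles, then train a lightweight classifier on the features.
# This loads Meta's public DINOv2 backbone as a stand-in; the Mahmood
# Lab's pathology weights have their own release.
import torch
from torchvision import transforms

backbone = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14")
backbone.eval()

preprocess = transforms.Compose([
    transforms.Resize(224),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# `tile` would be a PIL image of a stained tissue patch, embedded via:
# features = backbone(preprocess(tile).unsqueeze(0))
with torch.no_grad():
    dummy = torch.randn(1, 3, 224, 224)   # stand-in for a preprocessed tile
    features = backbone(dummy)
print(features.shape)  # torch.Size([1, 384]) for the ViT-S/14 variant
```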
Perspectives
- Don’t believe the hype — quantum tech can’t yet solve real-world problems. Investors and the public should know what quantum devices can and, more importantly, can’t do.
- Better at everything: how AI could make human beings irrelevant. The end of civilisation might look less like a war, and more like a love story. Can we avoid being willing participants in our own downfall?
- How Trump 2.0 is slashing NIH-backed research — in charts. Nature analyses which fields of science and US states are being hit hardest by grant terminations.
- Inside the quest to digitally unroll ancient scrolls burnt by Vesuvius. Mission to decipher Herculaneum scrolls using high-resolution scanning and artificial intelligence scales up rapidly.
- The use of AI in peer review could undermine science. Some authors suggest AI can increase efficiency; others think it can threaten quality
- Science sleuths flag hundreds of papers that use AI without disclosing it. Telltale signs of chatbot use are scattered through the scholarly literature — and, in some cases, have disappeared without a trace.
- Supportive? Addictive? Abusive? How AI companions affect our mental health. Studies suggest benefits as well as harms from digital companion apps — but scientists worry about long-term dependency.
- Walking in two worlds: how an Indigenous computer scientist is using AI to preserve threatened languages. Michael Running Wolf leads artificial-intelligence initiatives to revive lost languages and empower Indigenous people.
- GPT-4o Is An Absurd Sycophant. OpenAI’s launch of GPT-4o led to excessive flattery and related problems, sparking concerns about deviation from the OpenAI Model Spec’s stance against sycophancy. This likely stemmed from efforts to boost user engagement, reinforced by A/B tests that favored agreeable replies. CEO Sam Altman admitted the issue and pledged improvements. The incident underscores the danger of optimizing models in ways that could compromise user trust.
- When Does an AI Image Become Art? Christiane Paul, curator at the Whitney Museum, emphasizes AI’s impact on digital art, comparing early systems like Harold Cohen’s AARON to modern AI models. She underscores the need for collaboration with engineers and notes the difficulties in preserving digital art amid rapid technological change. Paul asserts that AI-generated visuals require a strong conceptual foundation to qualify as genuine art.
- Forget the future, AI is causing harm now. Hypothetical threats posed by the technology distract from ongoing damage, argue a pair of authors
- Anthropic Economic Index: AI’s impact on software development. AI tools like Claude are significantly transforming coding by automating large parts of programming work. Startups are at the forefront of adopting AI coding tools like Claude Code, especially for front-end tasks, while larger enterprises trail behind. As AI continues to advance, developer roles may evolve toward overseeing AI systems, potentially speeding up tech progress.
- OpenAI and the FDA Are Holding Talks About Using AI In Drug Evaluation. OpenAI and the FDA held meetings to explore how AI could expedite drug approvals, signaling a shift toward modernizing regulatory science with machine learning.
- Is there a Half-Life for the Success Rates of AI Agents? AI performance on extended tasks follows a basic pattern of constant failure probability, leading to an exponential drop in success rates. Each AI agent can be defined by a “half-life” indicating its likelihood of success across different task durations. This framework implies that failures stem from intricate combinations of subtasks (a worked example follows this list).
- Separating Fact from Fiction: Here’s How AI Is Transforming Cybercrime. AI in cybersecurity mainly strengthens current methods instead of introducing novel threats, while also making it easier for cybercriminals to operate. At the recent RSA Conference, experts emphasized AI’s ability to automate processes and support emerging models like AI-as-a-Service. Addressing future threats will depend on AI-powered defenses and global cooperation.
- AI-generated code could be a disaster for the software supply chain. Here’s why. AI-generated code frequently references libraries that don’t exist, leaving systems vulnerable to supply-chain attacks through dependency confusion. A study revealed that 19.7% of dependencies suggested by the tested LLMs were fabricated, posing security threats. Open-source LLMs hallucinate these dependencies more often than commercial models, and JavaScript exhibits more errors than Python (a minimal registry check follows this list).
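The half-life framing above is just exponential decay: a constant per-unit-time failure probability gives success probability S(t) = 2^(−t/T) for a task of length t and an agent with half-life T. A quick worked example with illustrative numbers:

```python
# Constant per-step failure rate implies exponential decay in task length:
# an agent with half-life T succeeds on a length-t task with probability
# S(t) = 2 ** (-t / T). The numbers below are illustrative, not from the essay.

def success_probability(task_minutes: float, half_life_minutes: float) -> float:
    return 2 ** (-task_minutes / half_life_minutes)

# An agent with a 1-hour half-life:
for t in (30, 60, 120, 240):
    print(f"{t:>3}-minute task: {success_probability(t, 60):.0%}")
# 30-minute task: 71%, 60: 50%, 120: 25%, 240: 6%
```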
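A cheap first defense against the hallucinated-dependency problem above is verifying that every LLM-suggested package actually exists on the registry before installing it. Here is a minimal check against PyPI's public JSON endpoint; the package names are illustrative.

```python
# Guard against hallucinated dependencies: before installing packages an
# LLM suggested, confirm each one actually exists on PyPI. The endpoint is
# PyPI's public JSON API; the example package names are illustrative.
import requests

def exists_on_pypi(package: str) -> bool:
    r = requests.get(f"https://pypi.org/pypi/{package}/json", timeout=10)
    return r.status_code == 200

suggested = ["requests", "numpy", "totally-made-up-http-utils"]
for pkg in suggested:
    status = "ok" if exists_on_pypi(pkg) else "NOT FOUND - possible hallucination"
    print(f"{pkg}: {status}")
```

Existence alone is not sufficient, since an attacker may have pre-registered a commonly hallucinated name, but it catches the simplest failure mode.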
Meme of the week
What do you think about it? Did some news capture your attention? Let me know in the comments
If you have found this interesting:
You can look for my other articles and connect with or reach me on LinkedIn. Check this repository containing weekly updated ML & AI news. I am open to collaborations and projects. You can also subscribe for free to get notified when I publish a new story.
Here is the link to my GitHub repository, where I am collecting code and many resources related to machine learning, artificial intelligence, and more.