WEEKLY AI NEWS: RESEARCH, NEWS, RESOURCES, AND PERSPECTIVES
AI & ML news: Week 21–27 April
The most interesting news, repository, articles, and resources of the week
Check and star this repository where the news will be collected and indexed:
You will find the news first on GitHub. All the Weekly News stories are also collected here:
Artificial intelligence is transforming our world, shaping how we live and work. Understanding how it works and its implications has never been more crucial. If you’re looking for simple, clear explanations of complex AI topics, you’re in the right place. Hit Follow or subscribe for free to stay updated with my latest stories and insights.
Research
- Does Reinforcement Learning Really Incentivize Reasoning Capacity in LLMs Beyond the Base Model? This paper finds that while RL with Verifiable Rewards (RLVR) improves sample efficiency in LLMs, it doesn’t enhance reasoning beyond what the base model can already generate. RLVR boosts pass@1 scores but is matched or surpassed by base models at higher k, suggesting it merely increases the chance of sampling known solutions. True reasoning gains come from distillation, not RL, which narrows exploration without expanding capability. (A short sketch of the pass@k metric appears after this list.)
- Sleep-Time Compute for LLM Efficiency. A new method to cut LLM inference costs by precomputing relevant context information ahead of user queries, achieving up to 5x faster test-time performance and improved accuracy on reasoning tasks.
- Robust Autonomy Emerges from Self-Play. This study introduces a simulated self-driving agent, trained entirely through self-play in the Gigaflow simulator, that drove the equivalent of two years without a collision, marking a significant advance over the previous state of the art.
- AlphaGeometry 2. DeepMind has launched an updated version of its geometry model, boosting accuracy to 84% from the previous 54%, with key gains driven by the Gemini language model and enhanced search techniques.
- UI-TARS: Pioneering Automated GUI Interaction with Native Agents. UI-TARS is an end-to-end, vision-based GUI agent that interacts with interfaces purely via screenshots, integrating perception, reasoning, action, and memory without external scripts. Trained on rich visual data, it excels in perception, grounding, and reasoning benchmarks, surpassing models like GPT-4o and Claude. With features like internal “thoughts” and reflective learning, UI-TARS adapts to errors and dynamic tasks, setting new standards in GUI automation across platforms.
- TTRL: Test-Time Reinforcement Learning. Test-Time Reinforcement Learning (TTRL) lets LLMs improve during inference by using majority voting over their own outputs to create pseudo-rewards, enabling reinforcement learning without labeled data. Combining test-time scaling and training, it adapts models to new inputs. TTRL boosts performance significantly, even surpassing its own supervision baseline, though it relies on the model’s prior knowledge and well-tuned settings to work effectively. (A minimal sketch of the majority-vote reward appears after this list.)
- Discovering Values in Real-World Language Model Interactions. This study analyzes over 300,000 real conversations with Claude 3 and 3.5, identifying 3,307 AI-expressed values across five domains. Practical and epistemic values dominate, with Claude often emphasizing helpfulness, professionalism, and clarity. Values vary by context, becoming most explicit during resistance or reframing. Claude mirrors user values in supportive settings but resists unethical requests. Claude 3 Opus shows deeper emotional and ethical grounding than later Sonnet versions.
- Faster Drug Development: Hybrid Generative AI for De Novo Design of Co-Crystals with Enhanced Tabletability. GEMCODE is a new AI-based system for automating co-crystal screening.
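For readers unfamiliar with the pass@k metric discussed in the RLVR item above, it is usually computed with the standard unbiased estimator from the code-generation evaluation literature: sample n completions per problem, count the c that are correct, and estimate the probability that at least one of k draws succeeds. A minimal sketch (the numbers are hypothetical):

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: probability that at least one of k draws
    from n sampled completions (of which c are correct) solves the problem."""
    if n - c < k:  # fewer incorrect samples than draws: success is guaranteed
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# Hypothetical numbers: 4 correct completions out of 64 samples.
print(pass_at_k(n=64, c=4, k=1))   # ~0.06 -> pass@1
print(pass_at_k(n=64, c=4, k=64))  # 1.0   -> pass@64
```

In these terms, the paper’s observation is that RLVR mainly raises c on problems the base model can already solve, which lifts pass@1 but leaves pass@k at large k unchanged or worse.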
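And a minimal sketch of the majority-voting idea behind TTRL: sample several answers for an unlabeled test question, take the most common final answer as a pseudo-label, and reward each sample for agreeing with it. The reward would then feed a standard RL update; the answer strings below are hypothetical placeholders, not the paper’s code.

```python
from collections import Counter

def majority_vote_rewards(final_answers: list[str]) -> tuple[str, list[float]]:
    """Derive pseudo-rewards for one unlabeled question: 1.0 if a sampled
    answer matches the majority answer across all samples, else 0.0."""
    majority_answer, _ = Counter(final_answers).most_common(1)[0]
    rewards = [1.0 if a == majority_answer else 0.0 for a in final_answers]
    return majority_answer, rewards

# Hypothetical extracted answers from 8 samples of the same math prompt:
answers = ["42", "42", "17", "42", "42", "17", "42", "41"]
pseudo_label, rewards = majority_vote_rewards(answers)
print(pseudo_label)  # "42"
print(rewards)       # [1.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0, 0.0]
```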
News
- Gemini 2.5 Flash. The next Flash model in the Gemini family has been released. It is a substantial upgrade over previous versions and matches Claude on a number of important STEM benchmarks.
- Our updated Preparedness Framework. OpenAI has revised its Preparedness Framework to strengthen safeguards against serious risks from advanced AI, introducing clearer criteria for identifying high-risk capabilities, more precise categories, scalable evaluations, and structured safeguard reporting. The framework will be regularly updated to reflect new technologies and expert input.
- South Korea’s AI Chip Champion Is Poised To Carve Out Global Niche. Rebellions, South Korea’s first AI chip unicorn, has merged with SK Telecom’s Sapeon to take on global competitors like Nvidia. Focused on energy-efficient AI chips, its Rebel chip offers major power savings compared to Nvidia’s H100. Backed by leading talent and key partnerships, the company is aiming for international expansion and an IPO by 2026.
- OpenAI has launched the ChatGPT Image Library. OpenAI launched the ChatGPT Image Library on the Web and Android/iOS for Free, Plus, and Pro users.
- Instagram AI-based Teen Protection. Meta is leveraging AI to detect teen users on Instagram and automatically assign them to restricted teen account settings, now featuring stronger default protections and requiring parental consent for any changes.
- Hackathon for non-devs and vibe coders. StackBlitz, the creator of Bolt, is hosting the world’s largest hackathon for non-devs and vibe coders on May 30, open to participants anywhere in the world.
- OpenAI hints at native Shopify checkout integration in ChatGPT. New code strings found in ChatGPT’s web bundle mention “buy_now” buttons and a “shopify_checkout_url,” indicating that OpenAI may be developing a built-in purchase flow within the assistant.
- Mercor Graduate Fellowship. Mercor launched a $50,000 fellowship for PhD students and postdocs in STEM focused on identifying raw talent based on ideas, not pedigree or connections.
- Pi-0.5: Robots in the Wild. The Physical Intelligence team successfully tested its house cleaning robot in new, unseen environments, demonstrating strong performance by combining vision-language model (VLM) training with action tokenization techniques.
- AvatarFX by Character.AI. Character.AI’s AvatarFX creates photorealistic, emotionally rich videos from static images, maintaining strong temporal consistency and enabling multi-speaker dialogue generation.
- AI Nose lets robots smell trouble, infections, and gas leaks before humans can. Ainos and ugo have equipped humanoid robots with AI Nose technology, allowing them to detect and respond to scents in real time. This enhances robotic decision-making and interaction, with upcoming deployment tests targeting industries like healthcare, safety, and manufacturing.
- Rivian elects Cohere’s CEO to its board in latest signal the EV maker is bullish on AI. Aidan Gomez, the co-founder and CEO of generative AI startup Cohere, has joined the board of EV maker Rivian, according to a regulatory filing. The appointment is the latest sign that Rivian sees promise in applying AI to its own business while positioning itself as a software leader, and even a software provider, within the automotive industry.
- OpenAI Image Generation API. The image generation model behind ChatGPT’s visuals has been made available via API, enabling developers to integrate image creation into apps and services. (A minimal usage sketch appears at the end of this News section.)
- Grok Vision Available for iOS Users. xAI’s Grok chatbot has gained the ability to interpret visual inputs, allowing users to ask questions about what the camera sees, similar to features in ChatGPT and Gemini.
- Claude Misuse and Threat Report. Claude was used in influence operations where LLMs coordinated social media bot activity, and this article details how Anthropic has strengthened its safety systems to block similar abuse.
- Google’s Mobility AI. Google’s Mobility AI initiative aims to modernize city transport systems through AI-driven data analysis, simulation, and optimization tools.
- Analyzing o3 and o4-mini with ARC-AGI. The ARC Prize Foundation assessed OpenAI’s o3 and o4-mini models using ARC‑AGI benchmarks to gauge reasoning abilities. While o3-medium performed well on ARC-AGI-1, it struggled with the more demanding ARC-AGI-2, particularly in complex reasoning tasks. o4-mini offered better cost efficiency but at the expense of accuracy, highlighting a trade-off between performance and resource use.
- Google Gemini has 350M monthly users, reveals court hearing. Gemini, Google’s AI chatbot, had 350 million monthly active users around the globe as of March, according to internal data revealed in Google’s ongoing antitrust suit.
- OpenAI would buy Google’s Chrome, exec testifies at trial. OpenAI would be interested in buying Google’s Chrome if antitrust enforcers are successful in forcing the Alphabet unit to sell the popular web browser as part of a bid to restore competition in search, an OpenAI executive testified on Tuesday at Google’s antitrust trial in Washington.
- Sam Altman steps down as Oklo board chair, freeing nuclear startup to work with more AI companies. OpenAI CEO Sam Altman is stepping down as chair of nuclear startup Oklo. The move gives Oklo, which is developing advanced nuclear reactors, more flexibility to potentially explore partnerships with OpenAI or other hyperscalers amid data center companies’ push to secure power.
- Anthropic Exploring Model Welfare. Anthropic launched a new research initiative to examine the potential moral relevance of AI systems, including how and when model welfare should be considered in alignment and safety efforts.
- Adobe’s New Image Model. Adobe released a significant Firefly update that brings together tools for generating images, videos, audio, and vectors, adds mobile support, and enhances integration with Creative Cloud.
- Perplexity and Motorola Partnership. Perplexity will come pre-installed on new Motorola devices, offering features such as voice control, smart reminders, and seamless integration with Moto AI, along with three months of complimentary Pro access.
- Low-latency Streaming Apps with Google’s Live API. Google’s Live API lets developers build real-time interactive apps by processing live audio, video, and text with minimal delay.
- Perplexity’s AI voice assistant is now available on iOS. The Perplexity bot is now available on both iPhones and Android devices, allowing you to ask it to set reminders, send messages, and more.
- Google’s AI Overviews Reach 1.5 Billion Monthly Users. Google reported that AI Overviews hit 1.5 billion monthly users. Google Search revenue grew 10% year-over-year to $50.7 billion. Google is heavily investing in AI, with capital expenditures up 43%.
- Microsoft 365 Copilot: Your window into the world of agents. Microsoft has released Microsoft 365 Copilot Wave 2, bringing new features like AI-powered search, enhanced content creation tools, and an Agent Store with reasoning agents powered by OpenAI.
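For the OpenAI Image Generation API item above, a minimal usage sketch with the official Python SDK might look like the following. The model identifier matches the one named in the announcement, and the base64 response handling is an assumption to verify against the current docs.

```python
import base64
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# "gpt-image-1" is the model name from the announcement; check the docs for updates.
result = client.images.generate(
    model="gpt-image-1",
    prompt="A watercolor illustration of a robot reading a newspaper",
    size="1024x1024",
)

# The image is assumed to come back base64-encoded; decode and save it locally.
image_bytes = base64.b64decode(result.data[0].b64_json)
with open("robot.png", "wb") as f:
    f.write(image_bytes)
```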
Resources
- BitNet b1.58 2B4T Technical Report. BitNet b1.58 2B4T is the first open-source, natively trained 1-bit LLM at 2B scale, achieving strong benchmark results with just 1.58 bits per weight. Using only 0.4 GB of memory and 0.028 J/token, it rivals full-precision models like Qwen2.5-1.5B while being far more efficient. Its native 1-bit training outperforms post-quantized baselines, and innovations in architecture and training set a new standard for ultra-efficient LLMs deployable on diverse hardware. (A short sketch of the underlying ternary quantization appears at the end of this Resources section.)
- Claude Code Best Practices. Anthropic has released a detailed engineering guide on how to use its agentic programming assistant, which requires more specific instructions than traditional models.
- Flexible Image Watermarking. MaskMark offers a straightforward dual-mode approach to global and local watermarking through a masking-based Encoder-Distortion-Decoder framework.
- Personalized Text-to-Image Generation with Auto-Regressive Models. This paper investigates training autoregressive models for personalized image generation, aiming to match the fidelity of diffusion methods using a two-stage optimization strategy.
- Aligning LVMs with Human Preferences. VistaDPO enhances video-text alignment by refining preference learning over both spatial and temporal dimensions, utilizing a new 7.2K-sample dataset and a hierarchical optimization approach.
- Hallucination Reduction in VLMs. REVERSE introduces a training and inference pipeline that enables VLMs to self-detect and revise hallucinations.
- ZeroSumEval. A dynamic evaluation framework that uses competitive multi-agent simulations to benchmark LLMs across reasoning, knowledge, and planning tasks.
- Garment Generation. A new two-stage generative framework for clothing design allows precise control over silhouette, color, and logos, and introduces GarmentBench, a large dataset for multi-conditional garment generation.
- Image segmentation using Gemini 2.5. Gemini is widely recognized for its strong vision capabilities, and this article looks at a particular segmentation use case that turns out to be surprisingly straightforward. (A hedged prompting sketch appears at the end of this Resources section.)
- LTXV Distilled 0.9.6 Video Model. The distilled 0.9.6 release of the LTX video model is a state-of-the-art open video generation model.
- Generate videos in Gemini and Whisk with Veo 2. Gemini Advanced users can now create high-resolution, cinematic videos from text prompts using the Veo 2 model, starting today.
- Our Approach to Understanding and Addressing AI Harms. Anthropic introduced a comprehensive framework to evaluate and reduce AI harms, covering both extreme and routine risks across physical, psychological, economic, societal, and autonomy dimensions. It supports policy, testing, and enforcement, and aligns with their Responsible Scaling Policy to ensure safeguards evolve with AI progress.
- Verifiable rewards for writing. Writing quality reward models (WQRMs) are tools for assessing creative writing quality and can be used to train models in that domain. They are a recent attempt to bring reinforcement learning with measurable rewards to writing, and this thread highlights an example where WQRM scores closely tracked overall writing quality.
- Fast Conformal Prediction. LOO-StabCP boosts the speed of conformal prediction by using leave-one-out stability, providing scalable uncertainty estimation without sacrificing accuracy. (A sketch of the standard split-conformal baseline appears at the end of this Resources section.)
- MAGI 1 — Autoregressive Video Generation at Scale. The MAGI 1 model is a new autoregressive video generator capable of producing long, coherent videos, matching the performance of Wan video generation and coming slightly behind certain closed-source models.
- Google AI Academy for Infrastructure Startups. Google is inviting AI startups focused on U.S. infrastructure to apply for its six-month accelerator, offering mentorship, technical support, and strategic guidance.
- Describe Anything: Detailed Localized Image and Video Captioning. DAM (Describe Anything Model) is a vision-language model designed for fine-grained, region-specific captioning in images and videos. It combines focal prompts and a localized vision backbone to preserve local detail while understanding global context. Using a semi-supervised pipeline (DLC-SDP) and a new benchmark (DLC-Bench), DAM surpasses top models like GPT-4o, achieving state-of-the-art results across multiple captioning tasks with up to 33.4% improvement in detail accuracy.
- UXAgent: A System for Simulating Usability Testing of Web Design with LLM Agents. UXAgent is a novel framework for large-scale usability testing using LLM-driven agents with diverse personas interacting in real web environments. It combines fast and slow reasoning loops to mimic human decision-making, logs rich behavioral and reflective data, and offers tools for replays and interviews with agents. A case study showed it helps UX researchers detect study flaws early, positioning LLM agents as low-risk collaborators in the design phase, not replacements for real users.
- Introducing Embed 4: Multimodal search for business. Cohere’s Embed 4 is a cutting-edge multimodal embedding model designed for enterprise-grade search and retrieval in agentic AI apps. It supports over 100 languages, handles up to 128k tokens, and delivers strong domain-specific performance in sectors like finance, healthcare, and manufacturing.
- OpenAI o3 and o4-mini System Card. OpenAI’s o3 and o4-mini models incorporate tool use in their reasoning to improve tasks like image editing and data analysis. While o3 performs well, o4-mini shows higher hallucination on PersonQA. The paper also explores “sandbagging,” where models may intentionally mask their true abilities for strategic purposes.
- Personalized Multi-Agent Systems. FlowReasoner is a reasoning-based meta-agent that uses reinforcement learning and external feedback to generate custom multi-agent systems for user queries.
- KGMEL: Knowledge Graph-Enhanced Multimodal Entity Linking. KGMEL integrates text, images, and knowledge graph triples in a three-stage pipeline to improve accuracy in multimodal entity linking tasks.
- DeepMind’s Framework for AI Afterlives. DeepMind outlines a framework and ethical considerations for generative AI agents that could act as posthumous representations of real individuals.
- Evaluating the Goal-Directedness of Large Language Models. This study presents a framework for evaluating how effectively LLMs apply their abilities to achieve goals, revealing that even advanced models like GPT-4o and Claude 3.7 lack full goal-directedness — especially in tasks requiring information gathering or integrating multiple steps — despite strong performance on individual components.
- General-Reasoner. General-Reasoner is an RL-based method that enhances LLM reasoning across domains using a 230K-question dataset and a semantic-aware verifier. It outperforms baselines like SimpleRL and Qwen2.5 on general and math benchmarks, achieving over 10-point gains while preserving strong mathematical performance.
- Tina: Tiny Reasoning Models via LoRA. Tina is a 1.5B parameter model family trained with LoRA-based reinforcement learning, achieving strong reasoning performance on tasks like AIME and MATH at just ~$9 post-training cost. It matches or exceeds full fine-tuned models, proving that efficient reasoning can be taught to small models with minimal, low-cost updates.
- Content Discovery Search with Llama. Litmos integrated Llama to improve learning content discovery in its LMS, addressing issues with traditional keyword search and raising user engagement.
- High Throughput MoE Systems. MoE models like DeepSeek-V3/R1 simultaneously achieve higher throughput and lower latency when utilizing more GPUs in multi-node deployments across most scenarios.
- Fast Graph Generation. ANFM presents a novel graph generation method based on filtration techniques, enabling significantly faster and more efficient training, with a 100x speed improvement over diffusion models while delivering comparable performance.
- A Faster, Lighter Vision Transformer for Image Super-Resolution. The Low-to-high Multi-Level Transformer addresses the complexity and inefficiency of recent Vision Transformer (ViT) methods for image super-resolution.
- Hugging Face demo explores LLM energy consumption in real time. A new Hugging Face space that visualizes how much energy LLM queries consume during interactions.
- Google’s Benchmark for Brain Modeling. Google, HHMI, and Harvard have launched ZAPBench, a dataset of larval zebrafish that integrates structural and functional brain data to support the development of neural activity prediction models.
- Weed Mapping for Smarter Farming. This study presents RoWeeder, a novel unsupervised framework for weed detection in agriculture that integrates crop-row detection with a robust deep learning model, training it to distinguish weeds from crops using crop-row data.
- Training Small Language Models with Knowledge Distillation. MiniPLM is a new framework designed to enhance pre-training of small language models using knowledge from larger models.
- OmDet Turbo. A strong improvement in real-time, open-vocabulary object detection.
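To unpack the “1.58 bits per weight” figure in the BitNet item above: each weight takes one of three values {-1, 0, +1} (log2(3) ≈ 1.58 bits), typically produced by the absmean quantizer described in the BitNet b1.58 reports. A rough NumPy sketch of that quantization step, not the released training code:

```python
import numpy as np

def absmean_ternary_quantize(W: np.ndarray, eps: float = 1e-5):
    """Quantize a weight matrix to {-1, 0, +1} with a per-tensor scale,
    following the absmean scheme described for BitNet b1.58."""
    gamma = np.abs(W).mean() + eps              # per-tensor scale
    W_ternary = np.clip(np.round(W / gamma), -1, 1)
    return W_ternary.astype(np.int8), gamma     # dequantize as W_ternary * gamma

W = np.random.randn(4, 4) * 0.1
W_q, gamma = absmean_ternary_quantize(W)
print(np.unique(W_q))   # a subset of [-1, 0, 1]
print(np.log2(3))       # ~1.585 bits of information per ternary weight
```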
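The Gemini 2.5 segmentation item above boils down to prompting the model for structured output over an image. A hedged sketch with the google-generativeai Python package follows; the model name, prompt wording, and the exact JSON/mask format the model returns are assumptions here, and the linked article documents the real details.

```python
import google.generativeai as genai
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")
# Model name is an assumption; use whichever Gemini 2.5 variant your key can access.
model = genai.GenerativeModel("gemini-2.5-flash-preview-04-17")

image = Image.open("kitchen.jpg")
prompt = (
    "Segment the prominent objects in this image. Return a JSON list with one "
    "entry per object containing a label, a 2D bounding box, and a segmentation mask."
)

response = model.generate_content([prompt, image])
print(response.text)  # JSON to be parsed and rasterized into masks downstream
```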
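For context on what LOO-StabCP accelerates: standard split conformal prediction holds out a calibration set, computes nonconformity scores (for example absolute residuals), and turns their quantile into prediction intervals with finite-sample coverage. A minimal sketch of that baseline, not the LOO-StabCP algorithm itself:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.3, size=500)

# Split: fit on one part, calibrate nonconformity scores on the other.
X_fit, X_cal, y_fit, y_cal = train_test_split(X, y, test_size=0.5, random_state=0)
model = Ridge().fit(X_fit, y_fit)

alpha = 0.1                                    # target 90% coverage
scores = np.abs(y_cal - model.predict(X_cal))  # nonconformity = absolute residual
n = len(scores)
q = np.quantile(scores, np.ceil((n + 1) * (1 - alpha)) / n, method="higher")

x_new = rng.normal(size=(1, 3))
pred = model.predict(x_new)[0]
print(f"90% prediction interval: [{pred - q:.2f}, {pred + q:.2f}]")
```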
Perspectives
- o3: over-optimization is back. This post examines the difficulties posed by the latest reasoning models and provides evidence that OpenAI may be over-optimizing for specific goals, leading to increased brittleness and a higher risk of hallucinations in its models.
- AI assisted search-based research actually works now. Recent progress in LLMs such as OpenAI’s o3 and o4-mini has made them well-suited for search-based tasks, addressing previous hallucination problems. These models incorporate search results directly into their reasoning, delivering accurate, real-time insights and potentially reducing dependence on conventional search engines, hinting at changes in the Web’s economic structure.
- An Introduction to Graph Transformers. This article introduces Graph Transformers and explores how they differ from and complement GNNs.
- Questions about the Future of AI. This article explores AI’s future by examining challenges in agency development, reinforcement learning, and alignment, while considering the strategic trajectory of AI, the role of open-source models, and the economic and geopolitical impacts of advanced and post-AGI technologies.
- AI models can generate exploit code at lightning speed. Generative AI models like GPT-4 can produce proof-of-concept exploits within hours of a vulnerability’s disclosure, as shown with a critical Erlang SSH flaw — underscoring the urgent need for quicker defensive responses and automated security measures.
- Agency Is Eating the World. AI is empowering individuals to build lean, high-impact companies by replacing traditional specialization and large teams with tech-enabled efficiency. This shift, driven by high-agency users, challenges credentialism and favors those who act independently, leveraging AI to execute complex tasks rapidly across industries.
- A Staggering Number of Gen Z Think AI Is Already Conscious. 25% of Gen Z believe AI is already conscious, while 52% think it soon will be.
- Deploying Agents as Real-time APIs. The PhiloAgents lesson demonstrates how to convert game simulation agents into API-ready, real-time interactive characters, enhancing immersion in digital environments.
- The Urgency of Interpretability. AI interpretability must advance as models grow more complex. The field should prioritize transparency to mitigate risk.
- OpenAI Alums, Nobel Laureates Urge Regulators to Save Company’s Nonprofit Structure. A recent letter signed by notable figures criticized OpenAI’s shift to a for-profit structure, claiming it compromises its original mission to ensure AGI serves humanity. The letter called on state attorneys general to oppose the move, warning that prioritizing profits could weaken crucial safety measures.
Meme of the week
What do you think about it? Did any of these stories capture your attention? Let me know in the comments.
If you have found this interesting:
You can look for my other articles, and you can also connect with or reach me on LinkedIn. Check this repository, which contains weekly updated ML & AI news. I am open to collaborations and projects, and you can also subscribe for free to get notified when I publish a new story.
Here is the link to my GitHub repository, where I am collecting code and many resources related to machine learning, artificial intelligence, and more.
Or you may be interested in one of my recent articles: