WEEKLY AI NEWS: RESEARCH, NEWS, RESOURCES, AND PERSPECTIVES
AI & ML news: Week 14 — 20 April
The most interesting news, repositories, articles, and resources of the week
Check and star this repository where the news will be collected and indexed:
You will find the news first on GitHub. All the Weekly News stories are also collected here:
Artificial intelligence is transforming our world, shaping how we live and work. Understanding how it works and its implications has never been more crucial. If you’re looking for simple, clear explanations of complex AI topics, you’re in the right place. Hit Follow or subscribe for free to stay updated with my latest stories and insights.
Research
- Document Reranking. LLM4Ranking is a recently introduced modular framework that works with both open and closed LLMs for document reranking, offering evaluation tools and reproducible benchmarks on well-known datasets.
- d1: Scaling Reasoning in Diffusion Large Language Models via Reinforcement Learning. The d1 framework enhances masked diffusion language models through a two-stage process: supervised fine-tuning on a small dataset followed by task-specific reinforcement learning using the novel diffu-GRPO method. This approach enables efficient gradient updates via random prompt masking, achieving strong performance gains on reasoning tasks like GSM8K and MATH500, outperforming similarly sized models while benefiting from longer outputs and faster convergence.
- Leveraging Reasoning Model Answers to Enhance Non-Reasoning Model Capability. Researchers show that smaller models can gain strong reasoning abilities by being fine-tuned on final answers (and optionally summarized reasoning) from large LLMs. Using a curated 1.3M-instance dataset, they test different distillation strategies, finding that training on final answers alone boosts math/coding accuracy, while combining with summarized thoughts aids alignment tasks. Results highlight trade-offs in including reasoning traces and suggest future blending techniques for improved performance.
- Visual Reasoning with Less Data. ThinkLite-VL uses MCTS to quantify sample difficulty and improves reasoning in VLMs with just 11k training samples and no distillation.
- Reasoning Models Can Be Effective Without Thinking. The paper introduces NoThinking, a prompting method that skips explicit reasoning steps yet matches or outperforms traditional chain-of-thought approaches in tasks like math, coding, and theorem proving. By jumping directly to answers with a dummy “Thinking” block, it achieves better accuracy–latency tradeoffs, excels in low-token settings, and benefits from parallel decoding, making it both faster and more efficient across multiple benchmarks.
- SocioVerse: A World Model for Social Simulation Powered by LLM Agents and A Pool of 10 Million Real-World Users. SocioVerse, developed by Fudan University and collaborators, is a large-scale social simulation platform using LLM agents aligned with real-world data across environment, user demographics, interaction scenarios, and behavior. It achieves high accuracy in modeling elections, sentiment, and economic patterns, demonstrating the value of realistic user modeling. SocioVerse offers a scalable, flexible framework for testing sociopolitical hypotheses and bridging AI with social science.
- M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models. M1 is a Mamba-based reasoning model trained with extended test-time computation, delivering solid performance — particularly on long-context tasks and throughput — though it doesn’t quite reach state-of-the-art levels.
- Large Reasoning Models as a Judge. JudgeLRM is a family of LLMs trained with reinforcement learning for judgment tasks. Unlike judges trained with SFT alone, it excels in reasoning-heavy evaluations, outperforming models like GPT-4 and DeepSeek-R1.
- Conversational AI for Cells. C2S-Scale is a new family of LLMs that interprets single-cell data and translates biological signals into natural language for applications in personalized medicine and drug discovery.
- DocAgent: A Multi-Agent System for Automated Code Documentation Generation. Meta AI’s DocAgent is a tool-integrated framework that generates high-quality docstrings for complex codebases using a team of specialized agents and a topological traversal strategy. By parsing code dependencies and incrementally building context, it avoids token overflow and improves documentation quality. Evaluated on Python projects, DocAgent significantly outperforms baselines in completeness, helpfulness, and truthfulness, with its dependency-aware Navigator proving essential to its success.
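The dependency-first traversal mentioned in the DocAgent item can be illustrated with a small sketch. This is not DocAgent's code: the component names, the `deps` map, and the helper functions are hypothetical, and the LLM call is replaced by a placeholder string. It only shows the general idea of documenting dependencies before the code that uses them, so that each step needs just a small, already-written context.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical dependency map: component -> components it calls or imports.
deps = {
    "app.main": {"core.load", "core.report"},
    "core.report": {"core.format"},
    "core.load": {"utils.read"},
    "core.format": set(),
    "utils.read": set(),
}


def documentation_order(dependencies):
    """Return components ordered so every dependency precedes its dependents."""
    # TopologicalSorter treats the mapped values as predecessors,
    # so static_order() yields dependencies first.
    return list(TopologicalSorter(dependencies).static_order())


def build_context(component, dependencies, docs):
    """Collect the already-written docs of the direct dependencies only."""
    return {dep: docs[dep] for dep in sorted(dependencies[component])}


docs = {}
for name in documentation_order(deps):
    context = build_context(name, deps, docs)
    # Placeholder for the real generation step (an LLM call in practice).
    used = ", ".join(context) or "nothing"
    docs[name] = f"Doc for {name} (uses: {used})"

for name, text in docs.items():
    print(text)
```

Because only the docs of direct dependencies are passed along, the context stays small even for a large codebase. Note that `TopologicalSorter` raises `CycleError` on circular dependencies, so a real system would need to handle cycles separately.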
News
- Trump warns exemptions on smartphones, electronics will be short-lived, promises future tariffs. The US president has said no one is ‘getting off the hook’, as he promises to launch a national security investigation into the semiconductor sector.
- Legal Defense Fund exits Meta civil rights advisory group over DEI changes. Meta ending DEI programs, getting rid of factcheckers and changing content moderation policies led to LDF’s decision.
- ‘Amazon slayer’: the Dublin minnow taking on the giants in drone deliveries. The Guardian speaks to Manna Aero founder and orders coffee via startup’s app to be delivered to a suburban home.
- OpenAI co-founder Ilya Sutskever’s Safe Superintelligence reportedly valued at $32B. Safe Superintelligence (SSI), the AI startup led by OpenAI’s co-founder and former chief scientist Ilya Sutskever, has raised an additional $2 billion in funding at a $32 billion valuation, according to the Financial Times.
- Airtable’s New AI Assistant. Airtable has introduced Airtable Assistant in beta, along with major updates that redefine app building within the platform. These enhancements simplify app development, automation, interface creation, and enable users to query enterprise data using natural language.
- Verified ID May Be Required for Future OpenAI API Access. OpenAI plans to gate access to certain upcoming models behind a new Verified Organization process that will require government-issued ID. Verification will be limited to one org per ID every 90 days.
- Google launches AI short film initiative. Google has collaborated with filmmakers to create short films centered on AI, seeking to examine the emotional and ethical aspects of living alongside artificial intelligence.
- YouTube supports the NO FAKES Act: Protecting creators and viewers in the age of AI. YouTube supports new legislation to combat AI-generated impersonations, reinforcing its commitment to protecting creators and viewers from deepfake harms.
- Canva unveils Visual Suite 2.0 with AI-powered productivity tools. At Canva Create 2025, Canva launched Visual Suite 2.0, featuring AI-powered tools such as Magic Studio, Canva Sheets, and Magic Charts to simplify design processes. The suite also offers Canva Code for building websites and an upgraded AI-enhanced Photo Editor, all designed to bring together design, data, and development in one platform.
- Samsung and Google Cloud Expand Partnership, Bring Gemini to Ballie, a Home AI Companion Robot by Samsung. Ballie, launching this summer, will provide personalized interactions and proactive home assistance using advanced multimodal reasoning.
- OpenAI GPT-4.1. OpenAI has released three new models through its API: GPT-4.1, GPT-4.1 mini, and GPT-4.1 nano. These models surpass GPT-4o and GPT-4o mini in all areas, showing significant improvements in coding and instruction following. They support up to 1 million tokens of context and demonstrate enhanced long-context understanding, with an updated knowledge cutoff of June 2024.
- AI company Hugging Face buys humanoid robot company Pollen Robotics. AI company Hugging Face is taking a big leap into robotics with the acquisition of humanoid robotics startup Pollen Robotics.
- DolphinGemma. DeepMind has unveiled DolphinGemma, a large language model created by Google to assist researchers in analyzing dolphin communication and potentially understanding their messages.
- 6 highlights from Google Cloud Next 25. Vertex AI has rolled out improvements to its video, image, speech, and music generation models, streamlining creative processes for businesses. Google AI is also supporting the development of specialized AI agents to boost productivity and security, with the new Agent2Agent Protocol enabling secure communication between agents across platforms.
- NVIDIA to Manufacture American-Made AI Supercomputers in US for First Time. NVIDIA is localizing AI hardware production by building factories in Texas and Arizona, aiming to produce Blackwell chips and AI supercomputers entirely within the U.S.
- Gemini Adds Question Generation to Google Classroom. Educators can now leverage Gemini in Google Classroom to create questions or quizzes from chosen text, boosting lesson engagement and simplifying content development.
- OpenAI is building a social network. OpenAI is working on its own X-like social network, according to multiple sources familiar with the matter. While the project is still in early stages, we’re told there’s an internal prototype focused on ChatGPT’s image generation that has a social feed.
- Nvidia H20 chip exports hit with license requirement by US government. Semiconductor giant Nvidia is facing unexpected new U.S. export controls on its H20 chips. In a filing Tuesday, Nvidia said it was informed by the U.S. government that it will need a license to export its H20 AI chips to China. This license will be required indefinitely, according to the filing — the U.S. government cited “risk that the [H20] may be used in … a supercomputer in China.”
- TxGemma Open Model for Therapeutic Development. Google has introduced a new variant of its Gemma models, specifically fine-tuned for therapeutic discovery in medical science. This model surpasses most specialized tools and all open general-purpose models in relevant benchmarks, highlighting its strength in biomedical research applications.
- OpenAI Updates Its Preparedness Framework. OpenAI has updated its safety strategy by introducing clearer risk categories, enhanced evaluation methods, and new protocols for handling advanced AI capabilities, aiming to improve oversight and mitigate potential harms more effectively.
- Canada Has Answer to Energy Needs in AI Race, Ex-Google CEO Says. Canada, home to 250 data centers, plans to grow its digital infrastructure by harnessing its rich hydro and nuclear energy resources. Conservative leader Pierre Poilievre supports ramping up resource production to enhance economic gains within the country.
- Notion Launches AI-Powered Email. Notion Mail is a Gmail-integrated client that helps users manage, search, and reply to email using AI.
- ChatGPT became the most downloaded app globally in March. ChatGPT became the most downloaded non-gaming app in March, surpassing Instagram and TikTok with 46 million downloads.
- Grok Canvas-like Tool for Document Creation. Grok, the chatbot from xAI, now includes Grok Studio, a canvas-like tool to build documents and basic apps. It’s now live for all users.
- Introducing OpenAI o3 and o4-mini. OpenAI has launched the new o3 and o4-mini models, enhancing ChatGPT’s tool usage and enabling quicker, more advanced reasoning with built-in web search, file analysis, and image generation.
- Assort Health Secures $26 Million. Assort Health, an AI-driven platform for handling patient calls, has secured new funding, raising its total to $26 million to further its goal of enhancing healthcare access. Since late 2024, its technology has driven 8x revenue growth by cutting call wait times and improving appointment accuracy, with strong patient satisfaction. Supported by top investors, Assort Health integrates with EHR systems, achieving 99% scheduling accuracy and over 90% issue resolution.
- Google Uses AI to Cut Scam Ads by 90%. Google’s 2024 Ads Safety Report highlights how LLM upgrades blocked billions of bad ads, suspended 700K+ scam accounts, and reduced impersonation scams significantly.
- Stable Diffusion Now Runs Faster on AMD GPUs. Stability AI and AMD optimized several Stable Diffusion models for Radeon GPUs and Ryzen AI, improving speed and performance for AMD users.
- OpenAI in talks to pay about $3 billion to acquire AI coding startup Windsurf. OpenAI is in talks to buy Windsurf, an artificial intelligence tool for coding help, for $3 billion, according to a person familiar with the matter. Windsurf, formerly known as Codeium, competes with Cursor, another popular AI coding tool, as well as existing AI coding features from companies such as Microsoft, Anthropic and OpenAI.
- Mistral Classifier Factory. Mistral, a French AI startup, has launched a new product that allows users to very quickly build and deploy custom classifiers for a whole variety of tasks (e.g., spam, moderation, and more).
- Goodfire raises $50m series A to steer and understand models. Goodfire is a mechanistic interpretability company with strong expertise in SAEs, among other things. It is working closely with closed and open model providers to steer, control, and understand model motivations and behavior.
- Visual Reasoning with OpenAI o3 and o4-mini. OpenAI’s latest visual models can reason with images through tool-augmented transformations, enabling a new level of multimodal understanding and step-by-step visual problem-solving.
- Cohere on Hugging Face Inference Providers. Cohere became the first model creator to directly host and serve its enterprise-focused AI models on Hugging Face.
- OpenAI Flex Processing. OpenAI has introduced Flex processing, a cost-saving API option that trades slower response times and intermittent availability for lower prices, ideal for non-production tasks.
- Anthropic enhances Claude with Research and Google Workspace integration. Anthropic has launched new Claude features: Research for autonomous multi-step search with citations and Google Workspace integration for context-aware assistance.
Resources
- Anthropic Education Report: How University Students Use Claude. Anthropic has released a great educational report on how different groups of university students are using AI. Most groups in STEM use it for homework help while groups in humanities use it less and generally more for ideation and brainstorming.
- 3D Object Part Segmentation. HoloPart is a semantic 3D object segmentation model that can identify and separate a single 3D object into meaningful sub-pieces.
- Cluster-Driven Expert Pruning for Mixture-of-Experts Large Language Models. C-Prune is a two-stage pruning method that compresses Mixture-of-Experts models by clustering similar experts and pruning redundant clusters.
- Jax Recommendation Engine. A great recommendation engine with metrics, implementations of embedding models, and training infrastructure.
- Reasoning VLM from Kimi. An early open model for visual question answering, this compact model excels at grounded image-based questions, image captioning, and even some image-related math.
- Fully open fast inference models. Apriel models from ServiceNow research are designed for fast inference and showcase good performance.
- GUI-R1. GUI-R1, developed by researchers in Singapore and China, is a reinforcement learning framework that enhances GUI agents by using a unified action space and reinforcement fine-tuning, needing only 3,000 curated examples. It achieves superior performance and generalization across platforms like Windows, Mac, Android, and Web, outperforming models trained on millions of examples while remaining efficient and adaptable with minimal data.
- AgentA/B: Automated and Scalable Web A/B Testing with Interactive LLM Agents. AgentA/B is an automated A/B testing system that uses LLM-based agents to simulate realistic user behavior on live websites, enabling fast, low-risk UX evaluation. With modular components and DOM parsing for structured interactions, it replicates human-like shopping patterns and supports inclusive prototyping. Tests on Amazon showed agents responded meaningfully to interface changes, suggesting strong alignment with real user behavior and value as a pre-deployment testing layer.
- Seaweed-7B: Cost-Effective Training of Video Generation Foundation Model. ByteDance has published a paper demonstrating how to train a competitive 7B-parameter video generation model using a relatively modest compute budget of 655,000 H100 hours, achieving strong results on several challenging temporal tasks.
- PixelFlow: Pixel-Space Generative Models with Flow. Due to computational limits, most generative models for continuous signals work in latent space. This study presents a cascade approach that enables direct generation in pixel space, removing the requirement for a pretrained VAE.
- InteractVLM: 3D Interaction Reasoning from 2D Foundational Models. A new VLM that can reason about contacts between humans and objects in 3D. It does so by leveraging a strong base model and lifting its reasoning into 3D with clever multi-view rendering.
- 3B parameter tokenizer. Scaling image tokenizers is difficult due to their tendency to collapse. This study presents GigaTok, a large-scale tokenizer that achieves excellent reconstruction quality, with stability and performance improved through decoder scaling and regularization.
- Improved MoE with C3PO. C3PO proposes a test-time optimization method that boosts accuracy in Mixture-of-Experts LLMs by adjusting expert weights using similar reference examples.
- BrowseComp Benchmark for Hard-to-Find Knowledge. OpenAI’s BrowseComp is a benchmark consisting of 1,266 tasks aimed at testing AI agents’ ability to browse the web and retrieve complex, hard-to-find information.
- DeepSeek to Open Source its Inference Engine. DeepSeek’s inference engine is built on vLLM, although it is now heavily modified.
- MoonDream 2.0 Release. MoonDream is a small, 2B parameter VLM that outperforms many open and closed models. It has recently received a strong upgrade on ChartQA and a number of other useful benchmarks.
- Data Decide. AllenAI has released a tool that can be used to help decide which data to include in pre-training. This framework is quite useful for understanding what goes into a filtering run for pre-training.
- Conversion Rate Prediction in Ad Systems. Pinterest researchers have proposed a multitask framework that uses Deep Hierarchical Ensemble Networks to improve CVR predictions in ad systems. It shows state-of-the-art results through feature combination and auxiliary learning.
- Open Source OpenAI Production Kernels. OpenAI has open-sourced some of its FP4 and MoE kernels in the Triton language GitHub repository.
- Nemotron H Models. Nvidia’s ADLR team has released the weights for its Nemotron hybrid Mamba models, which offer strong long-context handling and solid performance on general benchmarks, making them well-suited for tasks requiring extended reasoning or memory.
- Auto Deploy. A new way to transform PyTorch and Hugging Face models into a deployable format for fast inference.
- Latents for Generative Modeling. A top contender for blog post of the year for those into generative modeling, offering a clear breakdown of the history, core ideas, and major advancements in learned latents.
- NVIDIA’s Temporally Consistent Video Diffusion. NVIDIA’s EquivDM framework improves video diffusion by applying consistent noise, leading to better motion tracking and more 3D-consistent results with fewer sampling steps.
- Intellect 2 Distributed Training. Prime Intellect has developed a 32B fully distributed network trained with reinforcement learning for reasoning, and has open-sourced much of its code and valuable libraries.
- DeepMath dataset. 103K examples of highly filtered and decontaminated math problems for reasoning model training.
- Prima CPP. Prima CPP is an extension of llama.cpp that tries to use memory mapping (mmap) so that large models can run in low-RAM environments.
- Tile Language. Tile Language is a compact domain-specific language aimed at simplifying the creation of high-performance GPU/CPU kernels like GEMM, Dequant GEMM, FlashAttention, and LinearAttention. It uses a Python-like syntax built on TVM’s compiler stack, enabling developer productivity while preserving low-level optimizations for top-tier performance.
- Hugging Face Updated HELMET Benchmark. Hugging Face has expanded its HELMET benchmark to include more models and insights, helping researchers evaluate long-context LLMs like Phi-4 and Jamba 1.6.
- Junfeng5/Liquid_V1_7B. Liquid is a multimodal LLM that integrates visual comprehension and generation by tokenizing images into discrete codes.
- Efficient Line Art Colorization with Broader References. A new efficient long-context, fine-grained ID preservation framework for line art colorization delivers high accuracy, speed, and flexibility for comic coloring, converting black-and-white sketches into vivid illustrations by leveraging rich contextual references.
- Scene Captioning. 3D CoCa is a unified framework that combines vision-language contrastive learning and captioning for 3D scenes.
- DeepSpeed’s DeepCompile. The DeepSpeed team has integrated compilation into their distributed training workflow, significantly easing several performance bottlenecks using a modified version of torch.compile.
- Speech Instruction Fine-Tuning Dataset. SIFT-50M (Speech Instruction Fine-Tuning) is a dataset of 50 million examples created for instruction fine-tuning and pre-training speech-text LLMs. Sourced from 14,000 hours of public speech data, it uses LLMs and expert models, spans five languages, and supports both speech understanding and controllable speech generation. It enriches existing datasets with instruction-based QA pairs and includes around 5 million examples for generation tasks.
- End-to-End Latent Diffusion Training with REPA-E. REPA-E enables stable, joint training of VAEs and latent diffusion models using a representation-alignment loss, achieving state-of-the-art results on ImageNet.
- Meta Releases Many New Artifacts. Meta has released an image encoder, a VLM, a 3D object localization model based on JEPA, and weights for a BLT model that operates directly on bytes without tokenization.
- Create AI-generated soundtrack in Shorts with Dream Track. YouTube’s Dream Track is now accessible in the U.S. through YouTube Shorts and the YouTube Create app, offering AI-generated instrumental soundtracks for creators. These tracks can be globally remixed to produce unique Shorts, promoting collaboration, and are fully integrated with YouTube’s creation tools while following community guidelines.
- SWE-PolyBench: A multi-language benchmark for repository level evaluation of coding agents. SWE-PolyBench is a new benchmark for evaluating coding agents on real-world tasks in Java, JavaScript, TypeScript, and Python. It uses execution-based and syntax tree metrics, revealing that current agents struggle with complex problems and perform inconsistently across languages.
- A Survey of Frontiers in LLM Reasoning: Inference Scaling, Learning to Reason, and Agentic Systems. This survey organizes LLM reasoning methods by timing (inference-time vs. training) and architecture (standalone vs. agentic/multi-agent), spotlighting trends like learning-to-reason and agentic workflows. It reviews techniques including prompt design, output refinement, and training approaches like PPO and verifier-based learning.
- A Survey of Large Language Model-Powered Spatial Intelligence Across Scales: Advances in Embodied Agents, Smart Cities, and Earth Science. This paper surveys spatial intelligence across fields, linking human cognition to how LLMs manage spatial memory, reasoning, and representations. It proposes a unified framework bridging AI, robotics, urban planning, and earth science, emphasizing LLMs’ growing spatial abilities and interdisciplinary relevance.
Perspectives
- Business Leaders’ Thoughts on AI Possibilities. Executives from nine companies share how they’re leveraging Google Cloud’s AI tools to drive innovation across sectors, with over 600 real-world use cases highlighted.
- Jack Ma Advocates for AI to Serve Humanity, Not Dominate. Jack Ma emphasizes AI should enhance rather than dominate human life, calling for global cooperation on ethical standards to ensure technology supports societal welfare, echoing public concerns about responsible AI development.
Meme of the week
What do you think about it? Did any news capture your attention? Let me know in the comments.
If you have found this interesting:
You can look for my other articles, and you can also connect or reach me on LinkedIn, where I am open to collaborations and projects. Check this repository containing weekly updated ML & AI news. You can also subscribe for free to get notified when I publish a new story.
Here is the link to my GitHub repository, where I am collecting code and many resources related to machine learning, artificial intelligence, and more.