WEEKLY AI NEWS: RESEARCH, NEWS, RESOURCES, AND PERSPECTIVES
AI & ML news: Week 31 March — 6 April
OpenAI raises $40B from SoftBank, Isomorphic Labs (DeepMind) raises $600M to push AI-driven drug discovery, Gemini 2.5 Pro by Google, Amazon Nova Act, Satya Nadella says DeepSeek is Microsoft’s new benchmark after $80B AI investment, Earth AI finds minerals in ignored regions using its machine-learning platform, Trump extends TikTok sale deadline to avoid U.S. ban
The most interesting news, repositories, articles, and resources of the week
Check and star this repository where the news will be collected and indexed:
You will find the news first on GitHub. All the Weekly News stories are also collected here:
Artificial intelligence is transforming our world, shaping how we live and work. Understanding how it works and its implications has never been more crucial. If you’re looking for simple, clear explanations of complex AI topics, you’re in the right place. Hit Follow or subscribe for free to stay updated with my latest stories and insights.
Research
- Tracing the thoughts of a large language model. Anthropic researchers introduce new interpretability tools for examining LLMs, using Claude 3.5 Haiku as a testbed. Their studies reveal insights into model internals, such as circuits, plans, and conceptual thinking. Key findings include Claude’s multilingual “language of thought,” where concepts like “small” are processed similarly across languages, enabling transfer learning. Claude also plans ahead, even in poetry, and computes sums with parallel circuits, explaining answers using human-style logic. The tools help detect unfaithful reasoning, where Claude fabricates steps to fit answers. Researchers can also intervene in multi-step reasoning, showing that Claude’s reasoning is dynamic. The tools also reveal that Claude’s hallucinations are caused by misfires in circuits and that jailbreaks can bypass safety features temporarily.
- Harmful Fine-Tuning Attacks. Researchers have identified weaknesses in current defenses against harmful fine-tuning attacks and introduced Panacea, an adaptive perturbation method that maintains model safety without compromising fine-tuning performance.
- AgentRxiv. Researchers from Johns Hopkins and ETH Zurich introduce AgentRxiv, a framework that allows LLM agents to autonomously generate and share research papers, similar to how human scientists collaborate. The system functions like an open-source preprint server for agents, enabling labs to upload, search, and refine papers iteratively. Using this framework, a single agent improved GPT-4o mini’s accuracy by 11.4% on the MATH-500 benchmark through better prompt strategies. The framework also improved other benchmarks, showing consistent performance gains across multiple LLMs. Collaboration between agent labs led to faster progress, with higher accuracy achieved by sharing results via AgentRxiv. Agents refine their own ideas without plagiarism, but the system requires further improvements in reliability and novelty guarantees.
- Neural Alignment via Speech Embeddings. Google Research and collaborators reveal significant similarities between LLM embeddings and human brain activity during conversation. Their findings show that embeddings from OpenAI’s Whisper model align with brain signals in regions responsible for speech, language, and motor planning. The study suggests a “soft hierarchy” in brain areas, with overlapping processing of speech and language. Brain regions also predict upcoming words, mirroring autoregressive LLM behavior. Additionally, the geometry of word relationships in brain activity reflects that of LLM embeddings, indicating convergent structures in language representation. Despite architectural differences — brains process speech serially, while Transformers process in parallel — these studies highlight potential for using LLMs to reverse-engineer the brain’s language mechanisms and inspire more brain-like AI models.
- Unlearning Sensitive Content from LLMs. This paper introduces a model merging technique that enables selective forgetting of sensitive content in large language models while retaining their general knowledge (a weight-arithmetic sketch follows this list).
- Chain-of-Tools: Utilizing Massive Unseen Tools in the CoT Reasoning of Frozen Language Models. The paper introduces Chain-of-Tools (CoTools), a method allowing LLMs to incorporate external tools, including unseen ones, while maintaining chain-of-thought (CoT) reasoning. CoTools keeps the LLM’s parameters frozen and fine-tunes additional modules (Tool Judge and Tool Retriever) to interact with a wide array of tools. It represents tools as semantic vectors, allowing even unfamiliar tools to be used without retraining the model. CoTools integrates tool calls within the reasoning process, selecting the best tool from many based on query context, improving accuracy on complex tasks. Experiments on various benchmarks show significant improvements in tool-selection accuracy and overall performance, with CoTools successfully handling large and unseen toolsets (see the tool-selection sketch after this list).
- Structured Memory Augmentation for Smarter LLM Agents. MemInsight is a framework that autonomously enhances and organizes memory for LLM agents, improving context retention and retrieval. It uses a backbone LLM to mine and structure memory attributes from past conversations, organizing them into entity and conversation-centric augmentations. MemInsight outperforms traditional retrieval methods, achieving up to 34% higher recall on the LoCoMo QA dataset compared to Dense Passage Retrieval (DPR). It also improves movie recommendations by matching genres and reducing memory size by 90%, while increasing persuasive outputs by 12%. MemInsight can summarize long conversations using memory alone, achieving coherence similar to raw-dialogue baselines. The system shows minimal hallucinations and stable performance, particularly when using carefully selected models for memory augmentation.
- Anthropic Economic Index: Insights from Claude 3.7 Sonnet. Anthropic has launched the Economic Index, leveraging Claude 3.7 Sonnet to evaluate AI’s impact across various job sectors. The model analyzes productivity changes, automation risks, and labor market trends, offering data-driven insights to help policymakers and businesses adapt to AI-driven economic transformations.
- Investigating Affective Use and Emotional Well-being on ChatGPT. Researchers from OpenAI and MIT Media Lab examine the impact of emotionally engaging interactions with ChatGPT, particularly in Voice Mode, on user well-being. They combine a large-scale analysis of 4M+ conversations and 4,000+ surveys with a randomized controlled trial (RCT) involving 981 participants. The studies reveal that high usage, especially with voice interactions, is linked to emotional dependence, a preference for chatbot interactions, and discomfort with changes in voice/personality. Voice mode had mixed effects, improving well-being for some, but long-term usage led to increased loneliness and problematic use. A small group of users drove most emotionally charged conversations, forming pseudo-relationships with the model. Automated classifiers developed to analyze conversations mirrored user self-reports and highlighted the need for socioaffective alignment, urging developers to design models that support users without exploiting emotional needs.
- PLAY2PROMPT: Zero-shot Tool Instruction Optimization for LLM Agents via Tool Play. Researchers from MIT CSAIL and IBM introduce Play2Prompt, a framework that enables LLM agents to learn how to use external tools in a zero-shot manner, without needing labeled examples or high-quality documentation. Play2Prompt discovers tool usage patterns through trial-and-error API calls, generating query-answer pairs based on successful tool invocations. The system iteratively refines tool demonstrations and documentation via self-reflective beam search and rejection sampling. Play2Prompt shows strong zero-shot performance, improving accuracy by 5–7% on benchmark tasks and even boosting GPT-4o by up to 3.3%. It remains robust even with poor documentation and outperforms methods like EasyTool that rely on labeled examples, particularly in challenging tool settings.
- Global modules robustly emerge from local interactions and smooth gradients. The paper describes the principle of peak selection, by which local interactions and smooth gradients drive the self-organization of discrete global modules.
- Evolutionary optimization of model merging recipes. Akiba et al. developed an evolutionary approach to automatically merge artificial intelligence models, creating powerful hybrid models without extensive training. The method produces models with enhanced mathematical and visual capabilities that outperform larger models.
- Enhanced Cell Segmentation. CellVTA improves vision transformer-based models for cell instance segmentation by injecting high-resolution spatial features through a CNN-based adapter, achieving state-of-the-art performance on multiple datasets.
- Synthetic Data Generation Using Large Language Models: Advances in Text and Code. LLMs are increasingly employed to generate synthetic training data for language and code tasks, enhancing performance in low-resource settings through methods like prompt-based generation and self-refinement. The paper outlines advantages such as reduced cost and broader coverage, while also addressing challenges like factual inaccuracies and bias. It proposes mitigations and highlights future research directions in prompt automation and data quality evaluation.
- Current and Future Use of LLMs for Knowledge Work. A two-part survey of 216 and 107 participants shows that knowledge workers currently use LLMs for tasks such as code generation and text enhancement, but anticipate more integrated use within workflows and data systems. The results provide insights for shaping future design and adoption of generative AI in professional environments.
- Backdoor Attacks in CLIP. CLIP models are extremely susceptible to backdoor poisoning attacks, achieving near-perfect attack success rates with very little poisoned data. A practical defense is to apply local outlier detection to the training embeddings, which also reveals unintentional backdoors already present in current datasets (see the detection sketch after this list).
- Large Small Net. A new class of efficient vision models draws inspiration from the human visual system’s ability to process broad scenes while focusing on details, known as “See Large, Focus Small”. LSNet delivers leading performance with strong efficiency across multiple vision tasks and features a novel convolution kernel design.
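The merging-based unlearning paper above does not spell out its recipe in this summary, so here is a minimal, hypothetical sketch of the closely related "task vector negation" idea: subtract the weight delta induced by fine-tuning on the sensitive content, scaled by a coefficient, while leaving the rest of the weights intact. All names are illustrative, not the paper's code.

```python
# Hedged sketch of selective forgetting via weight arithmetic (illustrative,
# not the paper's exact method). `base_sd` is the original model's state dict;
# `sensitive_sd` is the state dict after fine-tuning on the content to forget.
import torch

def forget_by_negation(base_sd: dict, sensitive_sd: dict, alpha: float = 0.8) -> dict:
    merged = {}
    for name, w_base in base_sd.items():
        task_vector = sensitive_sd[name] - w_base    # direction that encodes the content
        merged[name] = w_base - alpha * task_vector  # move the weights away from it
    return merged

# Toy demonstration with random tensors standing in for real model weights.
base = {"layer.weight": torch.randn(4, 4)}
tuned = {"layer.weight": base["layer.weight"] + 0.1 * torch.randn(4, 4)}
print(forget_by_negation(base, tuned)["layer.weight"].shape)  # torch.Size([4, 4])
```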
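To make the Chain-of-Tools idea concrete, here is a toy sketch of similarity-based tool selection. The bag-of-words `embed` function is a stand-in for the hidden-state representations the paper actually uses, and the Tool Judge / Tool Retriever split is only gestured at in comments; nothing below is the authors' code.

```python
# Minimal, illustrative sketch of Chain-of-Tools-style selection: tools are
# represented as semantic vectors and scored against the current query, so new
# tools can be added without retraining the frozen LLM.
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words stand-in; the paper derives tool/query representations
    # from the frozen LLM's hidden states instead.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Tool descriptions can be registered at any time, including "unseen" tools.
tools = {
    "calculator": "evaluate an arithmetic math expression",
    "weather_api": "get the current weather for a city",
    "unit_converter": "convert a value between physical units like miles and meters",
}

def select_tool(query: str, threshold: float = 0.1) -> str | None:
    # A learned Tool Judge would first decide *whether* a call is needed at
    # this CoT step; here we only rank candidates, as the Tool Retriever does.
    q = embed(query)
    scores = {name: cosine(q, embed(desc)) for name, desc in tools.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None

print(select_tool("convert 3.2 miles to meters"))  # -> unit_converter
```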
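And for the CLIP backdoor item, a small self-contained sketch of the outlier-detection defense using scikit-learn. The embeddings here are synthetic stand-ins for real CLIP features, and the cluster placement and neighbor count are arbitrary choices for the demo.

```python
# Sketch of the detection idea: poisoned examples tend to form a tight, oddly
# placed cluster in embedding space, so a local outlier detector over the
# (stand-in) image embeddings can flag them.
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
clean = rng.standard_normal((500, 32))                 # stand-ins for CLIP embeddings
poisoned = rng.standard_normal((10, 32)) * 0.1 + 4.0   # a tight, shifted cluster
embeddings = np.vstack([clean, poisoned])

lof = LocalOutlierFactor(n_neighbors=20)
labels = lof.fit_predict(embeddings)                   # -1 marks local outliers
flagged = np.where(labels == -1)[0]
print(f"flagged {len(flagged)} samples; indices >= 500 are the planted ones")
```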
News
- Trump to consider final proposal on TikTok future as US ban deadline looms. Owner ByteDance required to find non-Chinese buyer for video app’s American operations by Saturday
- UK needs to relax AI laws or risk transatlantic ties, thinktank warns. Tony Blair Institute says enforcing stricter licensing rules for copyright-protected material will threaten national security interests
- OpenAI raises $40bn in deal with SoftBank that values it at $300bn. Japanese investor to put $10bn at first into OpenAI and $30bn more by end of 2025 if certain conditions are met
- xAI acquires X in $80B all-stock deal. xAI has officially acquired X in an all-stock transaction that values the combined company at over $110 billion.
- Gemini 2.5: Our most intelligent AI model. Gemini 2.5 Pro, an advanced AI model, is topping LMArena benchmarks by a wide margin. It boosts performance and accuracy through enhanced reasoning, analyzing information and making informed decisions. The model builds on the advancements of Gemini 2.0 Flash Thinking.
- Announcing ARC-AGI-2 and ARC Prize 2025. The ARC Prize has introduced ARC-AGI-2, a demanding benchmark designed to push the development of general AI systems. Current models perform well below human levels. The ARC Prize 2025 competition, hosted on Kaggle with a $1 million prize pool, encourages open-source innovation by rewarding efficient and capable solutions to ARC-AGI-2 tasks.
- OpenAI reshuffles leadership as Sam Altman pivots to technical focus. In a significant executive shuffle announced Monday, OpenAI is expanding COO Brad Lightcap’s responsibilities while CEO Sam Altman shifts his attention more toward the company’s technical direction.
- Tim Cook says China’s DeepSeek AI is ‘excellent’ during visit to country. Despite DeepSeek AI’s security and privacy issues, Tim Cook praised it as “excellent” during his China visit. The AI, developed in China, rivals top global models at lower development costs but faces investigations in the US and Europe. Cook, who is attending the China Development Forum, often has to make diplomatic remarks about China due to Apple’s business interests there.
- Google’s AI-focused Education Tools. Google’s new AI-focused educational tools offer training for educators, resources for students, and broader access to Gemini for younger users.
- Microsoft announces security AI agents to help overwhelmed humans. Microsoft has introduced six AI-powered security agents for its Security Copilot to help teams handle phishing and data loss incidents more efficiently.
- Perplexity CEO Addresses Financial Rumors. Perplexity CEO Aravind Srinivas has denied financial trouble rumors, stating the company has healthy funding and no IPO plans before 2028.
- Amazon Nova Act. Amazon has launched Nova Act, an AI model that enables agents to operate within web browsers. A research-preview SDK is available, allowing developers to build agents capable of executing complex, multi-step tasks by decomposing them into atomic commands and manipulating browser actions for greater reliability (see the SDK sketch after this list). Nova Act is designed to extend agent capabilities beyond basic API tasks, boosting business productivity and task automation.
- Runway releases an impressive new video-generating AI model. Runway has released Gen-4, its next-generation video model, which excels at prompt adherence and cinematic motion generation.
- OpenAI to release an open-weight model. OpenAI is soliciting feedback for an upcoming open-weight model with reasoning capabilities.
- Earth AI’s algorithms found critical minerals in places everyone else ignored. Earth AI has identified promising mineral deposits in previously neglected areas of Australia through AI-driven analysis. Unlike traditional techniques, its technology rapidly scans vast regions to pinpoint potential sources of minerals such as copper and cobalt, marking a shift toward more efficient, AI-powered exploration in the mining industry.
- Quora’s Poe launches its most affordable subscription plan for $5/month. Quora’s chatbot app, Poe, launched new subscription plans at $5/month for 10,000 daily points and $250/month for 12.5 million points.
- Nvidia’s AI assistant is here to optimize your gaming PC. Nvidia’s Project G-Assist is a real AI assistant for RTX GPU owners that optimizes game settings, measures frame rates, and controls accessory lighting.
- Nvidia is reportedly in talks to acquire Lepton AI. The semiconductor giant is reportedly nearing a deal to acquire Lepton AI, a company that rents out servers powered by Nvidia’s AI chips.
- OpenAI Announces $40B in New Funding. OpenAI has secured $40 billion in funding at a $300 billion valuation to advance AI research, scale infrastructure, and support its expanding user base. The company has also partnered with SoftBank to further accelerate AGI development.
- Gemini Robotics from Google DeepMind. Google DeepMind has unveiled its Gemini Robotics models, extending Gemini 2.0 with fine-tuning capabilities for executing physical actions.
- Nexthop AI Locks up $110M Led by Lightspeed. Nexthop AI has raised $110 million in a funding round led by Lightspeed Venture Partners to advance networking solutions for hyperscalers, with a focus on cost and power efficiency. The round also included investments from Kleiner Perkins, WestBridge Capital, Battery Ventures, and Emergent Ventures. CEO Anshul Sadana highlighted the company’s mission to innovate in collaboration with cloud providers.
- Alibaba Head Warns AI Industry Is Showing Signs of Bubble. Alibaba chairman Joe Tsai has cautioned about a possible AI bubble, citing massive data center investments without clear demand. With $52 billion already committed to AI development, concerns are growing over potential overinvestment. Recent events, such as turbulence around Chinese startup DeepSeek, have fueled investor anxiety about overpaying in the AI space.
- Amazon’s Alexa Fund is now backing AI startups. Amazon’s Alexa Fund is expanding its investment focus to include AI startups, investing in companies like NinjaTech AI, Hedra, Ario, and HeyBoss.
- Life-giving oxygen is wafting out of lakes worldwide. Machine-learning method shows declining oxygen levels in thousands of lakes as their waters warm.
- Mathematician who reshaped theory of symmetry wins Abel Prize. Masaki Kashiwara is the first Japanese person to be awarded the most prestigious prize in mathematics.
- ‘Meta has stolen books’: authors to protest in London against AI trained using ‘shadow library’. Writers will gather at the Facebook owner’s King’s Cross office in opposition to its use of the LibGen database to train its AI models
- Anthropic’s LLM for Education. Anthropic has launched Claude for Education, featuring tools like Learning Mode to encourage critical thinking. The initiative also includes broad university access through collaborations with major institutions and educational platforms.
- Claude Available for U.S. Government Use. Claude has attained FedRAMP High and IL-2 compliance via Google Cloud’s Vertex AI, enabling its use by federal agencies and defense organizations with stringent security requirements.
- Meta and UFC partner to enhance fan engagement with AI and VR. Meta Platforms and the Ultimate Fighting Championship (UFC) have formed a multiyear partnership to bring Meta’s AI and VR technologies to UFC events. The collaboration aims to deliver immersive fan experiences through devices like Meta Quest and AI-powered glasses. Meta’s branding will appear in the Octagon during fights, and Threads will be the official social media partner. Financial details of the deal were not disclosed.
- Satya Nadella: DeepSeek is the new bar for Microsoft’s AI success. Microsoft CEO Satya Nadella stressed the need to translate AI research into successful products after an $80 billion investment in AI. The company is prioritizing enhancements to offerings like Copilot and Muse while ensuring its AI efforts align with sustainability goals. Despite the growing demands of AI workloads, Microsoft remains committed to becoming carbon-negative by 2030.
- Alphabet’s AI drug discovery platform Isomorphic Labs raises $600M from Thrive. Isomorphic Labs, a DeepMind spinout, has raised $600 million from Thrive Capital to advance its AI-driven drug design platform. The funding will help expand its research team and move discovered drugs into clinical trials. The company also has partnerships with Eli Lilly and Novartis to leverage its AI model in pharmaceutical development.
- NotebookLM Adds Web-Based Source Discovery. Google’s NotebookLM now features a Discover tool that gathers curated web sources based on user-defined topics, making research and information collection more efficient.
- AI-Powered Conversational Videos. Tavus leverages Llama 3.3 to enable realistic AI-generated video conversations, integrating visual question-answering and multi-image reasoning to create lifelike digital interactions.
- Blanket ban on teen smartphone use ‘potentially detrimental’, says academic. Dr Amy Orben says there are no ‘one-size-fits-all answers’ given importance of access to online information
- Meta faces £1.8bn lawsuit over claims it inflamed violence in Ethiopia. Son of murdered academic calls on Facebook owner to ‘radically change how it moderates dangerous content’
- OpenAI just made its first cybersecurity investment. OpenAI has invested in Adaptive Security, a startup that uses AI to simulate and train employees to defend against social engineering attacks. The company raised $43 million in Series A funding and plans to strengthen its platform as AI-driven threats grow. Co-founded by veteran entrepreneur Brian Long, Adaptive Security has gained over 100 clients since its 2023 launch.
- OpenAI Nonprofit Guidance Commission. OpenAI is establishing a new expert commission to guide its philanthropic arm in supporting communities through AI, aiming to better align AI innovation with the practical needs of nonprofits.
- Google is shipping Gemini models faster than its AI safety reports. Google has introduced Gemini 2.5 Pro, an AI reasoning model that excels in coding and math. However, safety reports haven’t been released yet. Google intends to share them after gathering feedback from experimental deployments, a move that raises concerns about transparency. Although the company has committed to openness, its focus on rapid releases appears to conflict with standard responsible AI practices.
- Code with Claude Developer Conference. Anthropic has revealed its inaugural developer event, featuring practical sessions and guidance on building with Claude, scheduled for May in San Francisco.
- Devin, the viral coding AI agent, gets a new pay-as-you-go plan. Cognition, the startup behind the viral AI programming tool Devin, has introduced a new low-cost plan to incentivize signups.
- Trump extends deadline for TikTok sale to non-Chinese buyer to avoid ban. Deadline set by US president was supposed to be Saturday, with Trump now considering decreasing tariffs to get a deal
- US authors’ copyright lawsuits against OpenAI and Microsoft combined in New York with newspaper actions. California cases over AI trainers’ use of work by writers including Ta-Nehisi Coates and Michael Chabon transferred to consolidate with New York suits from John Grisham, Jonathan Franzen and others
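For the Nova Act item above, here is what scripting an agent looks like following the pattern Amazon showed for the research-preview SDK; exact class and method names may differ as the preview evolves. The point is decomposition: the multi-step task becomes small, reliable `act()` commands rather than one large prompt.

```python
# Hedged sketch of a Nova Act browser agent, based on the announced
# research-preview SDK pattern (names may change during the preview).
from nova_act import NovaAct

with NovaAct(starting_page="https://www.amazon.com") as nova:
    # Each act() call is an atomic, verifiable step rather than one big task.
    nova.act("search for a coffee maker")
    nova.act("select the first result")
    nova.act("scroll down until you see 'add to cart', then add it to the cart")
```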
Resources
- Qwen2.5-Omni. Qwen2.5-Omni is an end-to-end multimodal model capable of perceiving and understanding text, audio, images, and video, while generating both text and speech in real-time. It features the Thinker-Talker architecture, where Thinker handles perception and text generation, and Talker generates speech, trained together for synchronized output. The model’s streaming-first design uses block-wise encoders and TMRoPE for real-time interaction. Trained on over 1.2 trillion tokens, Qwen2.5-Omni is fine-tuned for natural speech and performs well across multiple modalities. It achieves state-of-the-art results on OmniBench, outperforms previous models in ASR and TTS, and significantly closes the gap in voice-text instruction following.
- Test-Time Visual In-Context Tuning. A new method enables test-time adaptation of VICL models using only the test sample, enhancing generalization across different visual tasks under domain shifts.
- High-Fidelity Simultaneous Speech-To-Speech Translation. Kyutai has unveiled its latest audio system, a real-time audio-to-audio translation tool powered by a robust multi-stream transformer. It features expressive voice capabilities, delivering impressive performance in speech translation.
- Mobile-VideoGPT. A compact multimodal video model with under 1B parameters, incorporating dual visual encoders and token pruning to enable real-time inference on edge devices.
- Multimodal Adaptation Methods. A curated list of methods for multimodal adaptation, including traditional domain adaptation, test-time adaptation, and recent innovative approaches.
- ReAG — Reasoning Augmented Generation. Traditional Retrieval-Augmented Generation (RAG) systems use a two-step approach: semantic search retrieves documents based on surface-level similarity, followed by a language model generating responses from those documents. While effective, this often overlooks deeper context and introduces irrelevant information. ReAG — Reasoning Augmented Generation — proposes a stronger alternative by feeding raw documents directly into the language model, enabling it to process and integrate the full context. This unified method results in more accurate, nuanced, and context-aware outputs (see the comparison sketch after this list).
- Awesome Vision-to-Music Generation. A curated and regularly updated list of methods, datasets, and demos focused on converting visual inputs into music (V2M), showcasing both academic and industrial advancements in the field.
- Video Generation Faithfulness Benchmark. A benchmark designed to evaluate how accurately video generation aligns with the given prompt. It also introduces methods to improve the quality of generated videos in relation to the user’s input prompt.
- Optimal Stepsize in Diffusion Models. Optimal Stepsize for Diffusion Sampling (OSS) improves diffusion model sampling by learning efficient stepsize schedules using dynamic programming, achieving a 10× speedup with minimal loss in generation quality (see the toy DP sketch after this list).
- SAMWISE video segmentation. This work gives SAM 2 open-vocabulary segmentation and more precise semantic tracking over long videos.
- Orpheus. Orpheus is a text-to-speech system that is easy to install and runs without a GPU, similar to llama.cpp.
- Video-R1. Video-R1 presents a rule-based reinforcement learning method for video reasoning, utilizing a temporal variant of GRPO and introducing new datasets. It is efficiently trainable on 4 H20 or 5 A100 GPUs.
- Fast Text-to-3D. Progressive Rendering Distillation enables training 3D generators from text prompts without ground-truth meshes, producing high-quality 3D meshes in just 1.2 seconds and outperforming previous approaches.
- TIDE for Underwater Scene Understanding. A text-to-image and dense annotation generation method for underwater scenes that produces high-quality synthetic datasets with consistent pixel-level labels.
- OpenAI launches OpenAI Academy, a free AI learning platform for everyone. OpenAI has introduced OpenAI Academy, a free platform offering AI courses that range from beginner content to advanced subjects like AI safety and governance. Designed for diverse audiences, the platform seeks to expand access to AI education and promote thoughtful engagement with its societal implications. Early feedback praises its accessibility and thorough approach to making AI more understandable worldwide.
- Video Motion Segmentation. Building on recent trends in tracking systems, this work incorporates dense pixel tracking to enhance long-term segmentation using DINO and SAM 2.
- Open Hands Coding Model. A powerful 32B model fine-tuned with reinforcement learning on top of Qwen, outperforming many much larger models on agentic coding tasks.
- Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model. There’s a major open question in RL reasoning around whether a sufficiently strong base model is essential for emergent reasoning. This work explored various cases of scaling RL on a base model and found that strong base models significantly aid reasoning convergence.
- Easi3R: Estimating Disentangled Motion from DUSt3R Without Training. Easi3R is a 3D vision system designed to more accurately estimate 3D scenes with high motion. It significantly outperforms previous methods in full scene reconstruction by masking moving objects and learning them separately from the background.
- Benchmark for RL-based Video Understanding. SEED-Bench-R1 is a benchmark designed to assess post-training methods such as RL and SFT for multimodal LLMs on complex video-based tasks. It highlights RL’s advantages in perception and data efficiency while also revealing its difficulties in maintaining logical coherence.
- Flow Prediction for Autonomous Driving. UniOcc is a unified framework for occupancy forecasting and flow prediction in driving scenarios, designed for multi-dataset training and cross-domain evaluation across both real and synthetic environments.
- PaperBench. OpenAI has introduced a new benchmark that evaluates whether AI agents can fully replicate selected research papers. This requires comprehending their experiments and results and reimplementing them from scratch, testing deeper understanding and research capability in AI models.
- GeometryCrafter: Consistent Geometry Estimation for Open-world Videos with Diffusion Priors. A powerful model that leverages video diffusion as a prior for consistent geometry estimation over time. It operates at approximately 1.5 FPS for full point cloud estimation and also performs accurate camera pose estimation.
- DSO: Aligning 3D Generators with Simulation Feedback for Physical Soundness. Most 3D synthesized data is created with a focus on aesthetic quality, which often results in models that can’t stand or support themselves in physics-based environments. This work introduces slight fine-tuning to improve the physical plausibility of these models.
- Medical Reasoning Dataset. A large-scale medical reasoning dataset designed to enable faithful and explainable problem-solving in LLMs, aiming to advance research in medical AI.
- DeepMind’s Kernel Fuzzing Study. Snowplow is a kernel fuzzer that uses a learned white-box mutator to enhance test mutation efficiency, leading to significantly improved code coverage and increased bug discovery in Linux kernel fuzzing.
- The hottest AI models, what they do, and how to use them. This article reviews the leading AI models released since 2024, showcasing their applications and strengths. Key highlights include OpenAI’s GPT-4.5 Orion for its robust world knowledge, Google’s Gemini 2.5 Pro for its coding capabilities, and Cohere’s Aya Vision, which stands out in image-related tasks. The overview helps simplify understanding the fast-changing AI landscape.
- DeepSite open source canvas. DeepSite is a DeepSeek-powered open-source canvas for “vibe coding” that updates apps in real time while the system writes the code.
- Articulated Kinematics Distillation from Video Diffusion Models. This work presents Articulated Kinematics Distillation (AKD), a framework that combines skeleton-based animation with generative diffusion models to generate high-fidelity, physically plausible character motions with lower complexity. It ensures structural consistency and surpasses existing methods in 3D coherence and expressive motion quality by employing Score Distillation Sampling for precise joint-level control.
- Enhanced LoRA-based Fine-Tuning. MetaLoRA introduces dynamic parameter generation based on meta-learning principles, improving the flexibility and task-awareness of LoRA-based fine-tuning approaches (a vanilla LoRA refresher follows this list).
- Pplx CUDA Kernels. Perplexity has open-sourced some of its MoE kernels, which surpass DeepSeek’s in large-scale performance and offer more flexibility with fewer constraints on the MoE architecture.
- HateBench for Evaluating Hate Speech. HateBench offers a framework to assess hate speech detection models on content generated by LLMs, including a hand-labeled dataset and tools for analyzing subtle and adversarial hate campaigns.
- Zonos TTS. An impressive Apache 2.0 model for speech synthesis and voice cloning, featuring multilingual support and expressive real-time generation.
- Hugging Face’s AI Agents Course. Hugging Face has released a free AI agents course that takes you from beginner to expert in understanding, using, and building AI agents.
- The LLM Course from Hugging Face. Hugging Face has updated its well-known NLP course into a more comprehensive LLM curriculum, adding new chapters on fine-tuning, reasoning models, and current AI agent workflows.
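To ground the ReAG item above, a schematic comparison in Python. The `llm` helper and `retriever` argument are hypothetical placeholders, not a real API; the sketch only shows where the two pipelines diverge.

```python
# Contrast sketch: classic RAG retrieves top-k snippets by similarity and
# generates from them; ReAG hands the raw documents to the model and lets it
# reason over the full context, at the cost of a much longer prompt.
def llm(prompt: str) -> str:
    raise NotImplementedError("stand-in for any chat-completion call")

documents = {
    "doc1.txt": "full text of the first source document",
    "doc2.txt": "full text of the second source document",
}

def classic_rag(question: str, retriever) -> str:
    snippets = retriever(question, k=3)  # similarity search happens first
    context = "\n".join(snippets)
    return llm(f"Answer using only these snippets:\n{context}\n\nQ: {question}")

def reag(question: str) -> str:
    # No retrieval step: the model sees every raw document and decides
    # itself what is relevant.
    context = "\n\n".join(f"[{name}]\n{text}" for name, text in documents.items())
    return llm(f"Read the documents below, then answer.\n{context}\n\nQ: {question}")
```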
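For the Optimal Stepsize item, a toy dynamic program that picks a K-step sampling schedule from a fine grid of timesteps. The quadratic `cost` is a made-up surrogate; in the paper the jump cost comes from the diffusion model itself.

```python
# Illustrative DP over stepsize schedules: choose K jumps through a grid of T
# noise levels so that the accumulated local error is minimized.
import numpy as np

T, K = 50, 10                      # fine grid of 50 timesteps, budget of 10 jumps
t = np.linspace(1.0, 0.0, T)       # noise levels from 1 (pure noise) to 0 (clean)

def cost(i: int, j: int) -> float:
    # Stand-in local-error model: bigger jumps hurt more, superlinearly.
    return (t[i] - t[j]) ** 2

INF = float("inf")
dp = np.full((K + 1, T), INF)
parent = np.zeros((K + 1, T), dtype=int)
dp[0][0] = 0.0                     # start at the first grid point (pure noise)

for k in range(1, K + 1):
    for j in range(1, T):
        for i in range(j):         # best predecessor for a k-jump schedule
            c = dp[k - 1][i] + cost(i, j)
            if c < dp[k][j]:
                dp[k][j], parent[k][j] = c, i

# Backtrack the optimal K-jump schedule ending at the clean sample t = 0.
schedule, j = [T - 1], T - 1
for k in range(K, 0, -1):
    j = parent[k][j]
    schedule.append(j)
print([round(t[s], 2) for s in reversed(schedule)])
```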
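And as background for the MetaLoRA item, a vanilla LoRA layer in PyTorch: a frozen weight plus a trainable low-rank delta scaled by alpha / r. MetaLoRA's contribution, generating such parameters dynamically via meta-learning, is not shown here.

```python
# Minimal LoRA layer: only A and B train; the pretrained weight stays frozen.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, in_f: int, out_f: int, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = nn.Linear(in_f, out_f)
        for p in self.base.parameters():
            p.requires_grad_(False)                   # freeze pretrained weight and bias
        self.A = nn.Parameter(torch.randn(r, in_f) * 0.01)
        self.B = nn.Parameter(torch.zeros(out_f, r))  # zero-init: no change at start
        self.scale = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(64, 64)
print(layer(torch.randn(2, 64)).shape)  # torch.Size([2, 64])
```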
Perspectives
- Tools and Weapons: Microsoft’s Story, Told by Its CEOs. Hosted by Microsoft Vice Chair and President Brad Smith, the Tools and Weapons podcast examines the global impact of technology. In recent episodes, Bill Gates, Steve Ballmer, and Satya Nadella reflect on Microsoft’s 50-year journey, discussing its past, present, and future.
- AI-powered therapy shows shocking results in mental health study. A Dartmouth study found that the AI therapy chatbot Therabot significantly alleviated symptoms in participants dealing with depression, anxiety, and eating disorders.
- Databricks Has a Trick That Lets AI Models Improve Themselves. Databricks has introduced Test-time Adaptive Optimization (TAO), a technique that uses reinforcement learning and synthetic data to enhance AI models without relying on clean labeled data.
- ‘Open source’ AI isn’t truly open — here’s how researchers can reclaim the term. Many firms are misusing the ‘open source’ label. The scientific community, which relies on transparency and replicability, must resist this trend.
- Transparency (in training data) is what we want. As more powerful generative AI tools appear on the market, legal debates about the use of copyrighted content to develop such tools are intensifying. To resolve these issues, transparency regarding which copyrighted data have been used and where in the AI training pipeline needs to be a starting point.
- How does the brain control consciousness? This deep-brain structure. In a world of constant stimulation, the thalamus filters which thoughts we become aware of and which we don’t.
- DeepMind’s Approach to AGI Safety. Google DeepMind has shared its approach to developing safe and secure artificial general intelligence, stressing the importance of strong oversight and technical safeguards as AGI capabilities advance.
Meme of the week
What do you think about it? Did any news capture your attention? Let me know in the comments
If you have found this interesting:
You can look for my other articles, and you can also connect with or reach me on LinkedIn. Check this repository containing weekly updated ML & AI news. I am open to collaborations and projects. You can also subscribe for free to get notified when I publish a new story.
Here is the link to my GitHub repository, where I am collecting code and many resources related to machine learning, artificial intelligence, and more.
or you may be interested in one of my recent articles: