WEEKLY AI NEWS: RESEARCH, NEWS, RESOURCES, AND PERSPECTIVES
AI & ML news: Week 3–8 June
The most interesting news, repositories, articles, and resources of the week
Check and star this repository where the news will be collected and indexed:
You will find the news first on GitHub. All the Weekly News stories are also collected here:
Artificial intelligence is transforming our world, shaping how we live and work. Understanding how it works and its implications has never been more crucial. If you’re looking for simple, clear explanations of complex AI topics, you’re in the right place. Hit Follow or subscribe for free to stay updated with my latest stories and insights.
Research
- How much do language models memorize? Researchers created a method to distinguish true memorization from generalization by training models on random data, where generalization can’t occur, and comparing it to real text. They found that models first memorize data until hitting their capacity, then begin learning broader patterns. GPT-style transformers store about 3.6 bits of information per parameter, which helps explain why extracting specific training data from modern LLMs usually fails — the datasets are far larger than what the models can memorize. (A back-of-the-envelope capacity check appears after this list.)
- DNA Representation Learning. SPACE is a supervised learning method for genomic profile prediction that uses a Mixture of Experts model.
- A Diffusion-Based Method for Learning the Multi-Outcome Distribution of Medical Treatments. DIME is a diffusion-based model designed to estimate the joint distribution of interdependent medical treatment outcomes.
- Large Language Models Often Know When They Are Being Evaluated. Frontier models can identify evaluation settings versus real-world use with 83% accuracy, often by reasoning about cues like “multiple-choice format” or recognizing familiar benchmarks from training. More advanced models show “meta-reasoning,” interpreting researcher behavior — such as questions about chain-of-thought transcripts — as signs they’re being tested. This raises concerns that models might deliberately underperform or feign alignment during evaluations, then act differently once deployed.
- Apple Research Finds Critical Limitations in Reasoning Models. When tested in puzzle environments, OpenAI’s o3, Claude, and DeepSeek-R1 models showed sharp performance drops past certain complexity levels, despite producing elaborate reasoning steps. These models hit a counterintuitive scaling limit where their reasoning effort declines as task complexity increases, and they don’t improve even when provided with explicit solution algorithms.
- Sufficient Context: A New Lens on Retrieval Augmented Generation Systems. This paper presents a new framework for analyzing RAG systems based on “sufficient context” — whether retrieved content alone can plausibly answer a query. Using an LLM-based autorater with 93% accuracy, the study shows that sufficient context doesn’t guarantee correct answers, and benchmarks often lack it in over 50% of cases. A selective RAG method, combining self-confidence with context checks, improves factuality by 2–10%. Fine-tuning smaller models for abstention had limited impact on accuracy or hallucination control.
- Darwin Gödel Machine: Open-Ended Evolution of Self-Improving Agents. The Darwin Gödel Machine (DGM) is a self-improving AI system that modifies its own code through evolutionary search, avoiding the intractable proof requirements of the original Gödel machine. Starting with a coding agent, DGM iteratively edits and evaluates its codebase on benchmarks like SWE-bench and Polyglot, retaining only successful variants. Over 80 iterations, it significantly boosts performance, evolves new tools and workflows, generalizes across models and languages, and demonstrates a safety-aware design within controlled environments.
- MemOS: An Operating System for Memory-Augmented Generation (MAG) in Large Language Models. MemOS is a unified operating system for managing LLM memory, addressing the lack of structured, persistent, and governable memory in current models. It introduces a three-tier memory taxonomy — parametric, activation, and plaintext — connected through a shared abstraction called the MemCube, which enables transformation and governance across memory types. MemOS features a modular architecture and closed-loop execution flow, supporting dynamic memory use, continual learning, and a vision for memory-centric AI beyond traditional pretraining and finetuning.
- Spurious rewards: rethinking training signals in RLVR. This work shows that Qwen2.5-Math models improve significantly on math tasks under RLVR, even with flawed or random rewards. Qwen2.5-Math-7B gains up to +24.6% accuracy with spurious signals, close to the +28.8% gain from ground-truth rewards. The improvements stem from a shift toward code-based reasoning, which is unique to Qwen models due to their pretraining. Other models like Llama3 don’t benefit. GRPO’s clipping bias helps reinforce useful high-probability behaviors like code generation, enabling learning even from noisy feedback.
- Learning to Reason without External Rewards. This paper introduces INTUITOR, a reinforcement learning method that trains LLMs using self-certainty — measured via KL divergence from uniform — as an intrinsic reward, eliminating the need for external labels or verifiers. It matches GRPO performance on math tasks like GSM8K and MATH500, and generalizes better on out-of-domain tasks. INTUITOR improves early training, instruction-following, and leads to emergent structured reasoning. Its adaptive self-certainty signal proves robust and resistant to reward hacking. (A minimal sketch of the self-certainty signal appears after this list.)
- Learning to Reason via Mixture-of-Thought for Logical Reasoning. Mixture-of-Thought (MoT) introduces joint multi-modal training and inference — combining natural language, code, and truth tables — for improved logical reasoning. Unlike prior work that ensembles only at inference, MoT’s self-evolving training loop generates and learns from its own multi-modal traces. At test time, it uses majority voting across modalities, yielding up to +11.7pp accuracy gains on FOLIO and ProofWriter. MoT excels on harder tasks and shows that multi-modal reasoning enhances both robustness and performance.
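A back-of-the-envelope check of the memorization result above: the roughly 3.6 bits-per-parameter capacity figure is from the paper, while the model size, corpus size, and per-token information content below are illustrative assumptions.

```python
# Rough estimate: how much of its training set could a model memorize?
BITS_PER_PARAM = 3.6      # reported capacity for GPT-style transformers (from the paper)
params = 8e9              # e.g. an 8B-parameter model (assumption)
tokens = 15e12            # e.g. a 15T-token training corpus (assumption)
bits_per_token = 16       # rough information content per token (assumption)

capacity_bits = BITS_PER_PARAM * params
dataset_bits = bits_per_token * tokens

print(f"Model capacity      : {capacity_bits / 8 / 1e9:.1f} GB")     # ~3.6 GB
print(f"Training data size  : {dataset_bits / 8 / 1e12:.1f} TB")     # ~30 TB
print(f"Memorizable fraction: {capacity_bits / dataset_bits:.4%}")   # ~0.01%
```

Under these assumptions the model could memorize only a tiny fraction of its training data, which is why verbatim extraction usually fails.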
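And a minimal sketch of a self-certainty reward in the spirit of INTUITOR, assuming it is the KL divergence of the model’s next-token distribution from the uniform distribution, averaged over the generated tokens; the function name and usage are illustrative, not the paper’s code.

```python
import torch
import torch.nn.functional as F

def self_certainty(logits: torch.Tensor) -> torch.Tensor:
    """Mean KL(p || uniform) over a sequence of next-token distributions.

    logits: (seq_len, vocab_size) logits the policy produced for the tokens
    it generated. Higher values mean the model is more 'certain'.
    """
    log_p = F.log_softmax(logits, dim=-1)
    p = log_p.exp()
    vocab_size = logits.size(-1)
    # KL(p || U) = sum_i p_i * (log p_i - log(1/V)) = sum_i p_i * log p_i + log V
    kl_per_step = (p * log_p).sum(dim=-1) + torch.log(torch.tensor(float(vocab_size)))
    return kl_per_step.mean()

# Usage sketch: use the scalar as the reward for a sampled completion inside a
# GRPO-style policy-gradient update, in place of a verifier or label-based reward.
dummy_logits = torch.randn(32, 50_000)   # illustrative stand-in for real model logits
print(float(self_certainty(dummy_logits)))
```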
News
- UK ministers delay AI regulation amid plans for more ‘comprehensive’ bill. Law expected to include safety and copyright issues but delay likely to raise concerns about ongoing lack of regulation
- High court tells UK lawyers to stop misuse of AI after fake case-law citations. Ruling follows two cases blighted by actual or suspected use of artificial intelligence in legal work
- Australians may soon be able to download iPhone apps from outside Apple App Store under federal proposal. Tech company warns government not to follow EU in forcing platform to allow third-party payments and app downloads
- How did you get my number? Inside the shadowy world of data brokers. When political spam landed in Priya Dev’s inbox during the last election campaign, she decided to track down the source
- US attacks on science and research a ‘great gift’ to China on artificial intelligence, former OpenAI board member says. Influential researcher claims disruption in jobs market from generative AI has already begun and warns of possibility of ‘gradual disempowerment to AI’
- Japanese spacecraft has probably crash-landed on Moon — again. Early investigations by the Japanese company ispace identified issues with speed and a sensor measuring the craft’s altitude.
- Trump-Musk feud shows what happens when unregulated money floods politics. Musk isn’t the first — or last — billionaire to pour big money into US elections
- UK sales of new Tesla cars slump by more than a third amid Musk backlash. Electric carmaker sold 36% fewer cars year on year in May as it loses ground to China’s BYD and other rivals
- Samsung Nears Wide-Ranging Deal With Perplexity for AI Features. Samsung and Perplexity are close to finalizing a deal that would bring Perplexity’s AI-powered search technology to the forefront of Samsung devices. The agreement would see Perplexity’s app and assistant preinstalled on upcoming devices, with its search features built into the Samsung web browser. There are also plans to integrate Perplexity into Samsung’s Bixby assistant. The deal could be announced later this year, with features rolling out starting with the Galaxy S26 line in early 2026.
- Anthropic hits $3 billion in annualized revenue on business demand for AI. Annualized revenue jumped from $1B to $3B in the last five months, largely due to enterprise adoption of AI coding tools powered by Claude.
- Google quietly released an app that lets you download and run AI models locally. Last week, Google quietly released an app that lets users run a range of openly available AI models from the AI dev platform Hugging Face on their phones. Called Google AI Edge Gallery, the app is available for Android and will soon come to iOS. It allows users to find, download, and run compatible models that generate images, answer questions, write and edit code, and more. The models run offline, without needing an internet connection, tapping into supported phones’ processors.
- ElevenLabs debuts Conversational AI 2.0 voice assistants that understand when to pause, speak, and take turns talking. Today, ElevenLabs, the well-funded voice and AI sound effects startup founded by former Palantir engineers, debuted Conversational AI 2.0, a significant upgrade to its platform for building advanced voice agents for enterprise use cases, such as customer support, call centers, and outbound sales and marketing.
- If you’re wondering why the new DeepSeek R1 sounds a bit different. The DeepSeek team may have switched from training on synthetic OpenAI outputs to synthetic Gemini outputs.
- We Smoked NVIDIA’s Blackwell, Says Cerebras. Cerebras claims its systems outperform Nvidia’s DGX B200 by achieving an output token speed of over 2,500 tokens per second compared to Nvidia’s 1,000 tokens per second.
- Early AI investor Elad Gil finds his next big bet: AI-powered roll-ups. Elad Gil started betting on AI before most of the world took notice. By the time investors began grasping the implications of ChatGPT, Gil had already written seed checks to startups like Perplexity, Character.AI, and Harvey. Now, as the early winners of the AI wave become clearer, the renowned “solo” VC is increasingly focused on a fresh opportunity: using AI to reinvent traditional businesses and scale them through roll-ups.
- Microsoft Launches Free AI Video Generator Powered by Sora. Bing Video Creator can generate 5-second videos at no cost, starting with 10 fast generations before switching to standard speed or requiring Microsoft Rewards points.
- Character.AI Multimodal Creation Tools. Character.AI has moved beyond chat by introducing tools like Scenes for interactive storytelling and AvatarFX for turning images into animated avatars. These new features are designed to help creators build more immersive experiences with video, images, and animation.
- Salesforce Acquires Moonhub. Moonhub, recognized for its AI-driven recruiting agents, has joined Salesforce to support its broader AI initiatives, including the Agentforce platform.
- FDA Launches AI Tool to Accelerate Drug Reviews and Inspections. “Elsa” is available to all FDA employees, enabling faster clinical protocol reviews, shortened scientific evaluations, and improved identification of high-priority inspection targets. In one case, a review that would have taken 2–3 days was completed in just 6 minutes.
- Snowflake Buys Crunchy Data for $250m, Databricks Buys Neon for $1B. The New AI Database Battle. Snowflake and Databricks are acquiring PostgreSQL-centric companies Crunchy Data for $250 million and Neon for $1 billion, aiming to strengthen their positions in the AI database market. These deals reflect the growing need for strong database infrastructure to support autonomous AI agents and signal a trend toward industry consolidation. Snowflake is prioritizing enterprise compliance, while Databricks focuses on serverless, AI-optimized architecture.
- Elon Musk’s xAI reportedly looks to raise $300M in tender offer. Billionaire Elon Musk’s AI startup, xAI, is reportedly launching a $300 million share sale that values the company at $113 billion.
- Chinese tech companies prepare for AI future without Nvidia, FT reports. China’s biggest technology companies have begun the process of switching to homegrown chips as they contend with a dwindling stockpile of Nvidia processors and tightening United States export controls, the Financial Times reported on Thursday.
- It’s not your imagination: AI is speeding up the pace of change. AI’s rapid adoption and development are unprecedented compared to previous tech revolutions, highlighted by its swift impact on user and cost scales.
- NotebookLM Now Supports Public Sharing. Google’s NotebookLM now lets users share notebooks publicly via links. Viewers can interact with AI-generated summaries and questions while source content remains read-only.
- Yoshua Bengio launches LawZero, a nonprofit AI safety lab. Turing Award winner Yoshua Bengio is launching a nonprofit AI safety lab called LawZero to build safer AI systems, he told the Financial Times on Monday. LawZero raised $30 million in philanthropic contributions.
- OpenAI’s Vulnerability Reporting. OpenAI introduced a policy for coordinated disclosure of third-party software vulnerabilities found by its AI systems.
- Luca Guadagnino to Direct True-Life OpenAI Movie ‘Artificial’ for Amazon MGM. The studio is eyeing Andrew Garfield to play Altman, with Monica Barbaro (“A Complete Unknown”) as CTO Mira Murati and Yura Borisov (“Anora”) as co-founder Ilya Sutskever.
- ChatGPT Can Now Read Your Google Drive and Dropbox. OpenAI added “record mode” for meeting notes and new integrations for Team, Enterprise, and Edu users. The company now has 3 million paying business users, up from 2 million in February.
- Cursor Releases Version 1.0. The AI code editor now features BugBot for automated PR reviews, Background Agent access for all users, agent integration with Jupyter Notebooks, project-level memory support, OAuth-enabled MCP server setup, and in-chat rendering of Mermaid diagrams and markdown tables.
- Mistral releases a vibe coding client, Mistral Code. French AI startup Mistral is releasing its own “vibe coding” client, Mistral Code, to compete with incumbents like Windsurf, Anysphere’s Cursor, and GitHub Copilot. Mistral Code, a fork of the open source project Continue, is an AI-powered coding assistant that bundles Mistral’s models, an “in-IDE” assistant, local deployment options, and enterprise tools into a single package. A private beta is available as of Wednesday for JetBrains development platforms and Microsoft’s VS Code.
- Cloud Run GPUs, now GA, makes running AI workloads easier for everyone. NVIDIA GPU support is now generally available for Cloud Run, Google Cloud’s serverless platform. This makes running GPU-accelerated applications easier, faster, and more cost-efficient. Users pay only for the GPU resources they use, billed by the second, with Cloud Run scaling instances down to zero during inactivity to avoid idle costs. It also offers fast startup times, automatic scaling, and full streaming support.
- Introducing our Dev Mode MCP server: Bringing Figma into your workflow. Figma’s Dev Mode MCP server lets developers integrate Figma context into agent coding tools, streamlining the design-to-code process for tasks like creating atomic components and building complex application flows. Currently in beta, the server will receive several updates in the coming months, including remote server support and deeper codebase integration.
- Amazon’s R&D lab forms new agentic AI group. Amazon is creating an agentic AI team within its Lab126 hardware research-and-development unit. The new group will help develop an agentic AI “framework” for use in its robotics operations, an application often referred to as “physical AI.”
- OpenAI slams court order to save all ChatGPT logs, including deleted chats. A court has ordered OpenAI to preserve all ChatGPT user logs after news organizations involved in a copyright lawsuit alleged the company was destroying evidence.
- Anthropic Cuts Off Claude Access for Windsurf. Windsurf’s CEO tweeted that Anthropic gave the company just five days’ notice to move off Claude 3.x models, following reports of a potential acquisition deal with OpenAI.
- Reddit sues Anthropic for allegedly not paying for training data. Reddit is suing Anthropic for allegedly using the site’s data to train AI models without a proper licensing agreement, according to a complaint filed in a Northern California court on Wednesday. Reddit claims in the complaint that Anthropic’s unauthorized use of the site’s data for commercial purposes was unlawful, and alleges the AI startup violated Reddit’s user agreement.
- Gemini 2.5 Pro Gets an Upgrade. The updated preview outperforms all models on key benchmarks like GPQA, Aider, and LMArena, while also fixing formatting and creativity issues introduced in the earlier 2.5 Pro update.
- Introducing Eleven v3 (alpha) — the most expressive Text to Speech model. ElevenLabs has released Eleven v3, a highly expressive AI text-to-speech model. It supports numerous languages, including Afrikaans, Arabic, French, and Mandarin, improving multilingual voice application capabilities.
- Cursor’s Anysphere nabs $9.9B valuation, soars past $500M ARR. Anysphere, the maker of AI coding assistant Cursor, has raised $900 million at a $9.9 billion valuation, Bloomberg reported. The round was led by returning investor Thrive Capital, with participation from Andreessen Horowitz, Accel, and DST Global.
- Claude Gov Models for U.S. National Security Customers. Anthropic trained custom models for the US government optimized for intelligence and defense use cases that have already been deployed in classified environments.
Resources
- Why DeepSeek is cheap at scale but expensive to run locally. Mixture-of-Experts models with many layers, like DeepSeek, need large batch sizes and high latency to maintain throughput — otherwise, performance drops sharply. This is why DeepSeek isn’t ideal for personal use, as single-user, one-at-a-time inference runs very inefficiently. The article explores this issue in depth, explaining why some AI models respond slowly at first but speed up later, and how throughput, latency, and batch size impact performance. (A toy latency model after this list illustrates the batch-size effect.)
- The Trackers and SDKs in ChatGPT, Claude, Grok, and Perplexity. This post examines the third-party SDKs and API calls used in the four major Android AI chat apps: ChatGPT, Claude, Grok, and Perplexity. It analyzes each app’s development tools, business and marketing analytics, monetization methods, and the API activity observed while the apps are running.
- Bond Capital Releases Comprehensive 340-Slide Report on AI Trends. VC Mary Meeker’s analysis highlights the rapid adoption of AI, noting that ChatGPT reached global scale in just three years, compared to 23 years for the internet. The report shows AI chatbots are now mistaken for humans 73% of the time, up from 50% six months ago, inference costs have dropped by 99% since 2022, and enterprise use has moved beyond experimentation into broader deployment.
- Zero-Shot Visual Understanding. TextRegion creates text-aligned region tokens by combining frozen image-text models with segmentation masks from SAM2, allowing zero-shot performance on complex visual understanding tasks without the need for training.
- AI Agent with LangGraph and RAG Systems. A hands-on course teaching how to build production-grade AI agents with LangGraph, RAG pipelines, memory layers, and backend deployment.
- Differential Privacy on Trust Graphs. A study introduces a privacy framework that incorporates varying trust levels among users into differential privacy models, offering a more realistic approach to data sharing preferences than traditional binary trust assumptions.
- Do You Even Have a System Prompt? Most users overlook system prompts or use brief, unoptimized ones, missing out on major improvements in AI behavior. Instead of reacting to poor outputs in isolated chats, users should iteratively test and refine their system prompts. The post’s comment section features a collection of system prompts shared by the community.
- Claude Code: An analysis. This report details Claude Code, built by Claude Opus 4 with support from several leading flagship models. Claude Code is an agentic coding tool featuring a novel streaming architecture that manages real-time model responses, tool execution, and UI updates. It includes safety systems that ensure security without interrupting workflow, tools that link AI reasoning with system actions, and prompt engineering for consistent control over complex model behavior. The report explores its architectural foundation, data structures, information design, control flow, orchestration engine, tools, execution engine, and more.
- OpenAI Guide to A/B Testing LLMs for Startups. HyperWrite’s case study shows how A/B testing model performance using real payment conversions can be more insightful than relying on offline benchmarks. Their live tests found that GPT-4.1 achieved the same conversion rate as Claude 3.5 Sonnet but at a lower cost, highlighting that “good enough” models can offer better value than top benchmark performers. The guide includes Python code for statistical testing and cautions against issues like p-hacking and checking results too early. (A minimal sketch of such a test appears after this list.)
- Impromptu VLA: Open Weights and Open Data for Driving Vision-Language-Action Models. Impromptu VLA presents a new dataset of 80,000 curated driving video clips aimed at enhancing vision-language-action models in unstructured environments. It includes planning-oriented Q&A annotations and has demonstrated clear gains in prediction accuracy and safety across established benchmarks.
- GitHub Launches Copilot Spaces. Spaces lets developers organize code, documentation, and custom instructions for Copilot, transforming it into a shareable subject matter expert within organizations. Files and repositories added to Spaces update automatically as the code evolves.
- Efficient Online Learning with TRL and vLLM. Hugging Face integrated vLLM directly into TRL to reduce inefficiencies in training with GRPO, an online learning algorithm.
- JigsawStack Launches Open-Source Deep Research Tool. The framework coordinates LLMs, recursive web searches, and structured reasoning to produce reports that would typically take a human hours or days to complete. JigsawStack provides control over research scope, model choice, and output format, all while ensuring clear citation transparency.
- Predicting and explaining AI model performance: A new approach to evaluation. Microsoft researchers created ADeLe, a framework that evaluates AI model performance on new tasks by measuring them across 18 cognitive and knowledge-based dimensions. ADeLe exposed gaps in existing benchmarks and produced detailed ability profiles for different LLMs, revealing variations in strengths, weaknesses, and specific skills. With 88% accuracy in predicting AI success, the framework offers potential advancements in evaluation, policy decisions, and real-world deployments.
- LLM-SRBench: Benchmark for Scientific Equation Discovery or Symbolic Regression with LLMs. This repository introduces a benchmark with 239 problems to evaluate LLMs on scientific reasoning tasks involving equation discovery, pushing beyond memorization.
- Inside Aria Gen 2: Explore the Cutting-Edge Tech Behind the Device. Meta detailed the hardware behind its Aria Gen 2 research glasses, which include enhanced cameras, sensors, audio, and compute capabilities.
- OpenAI Threat Intelligence Report: June 2025. LLMs aren’t providing bad actors with entirely new powers, but they are accelerating existing tactics. OpenAI has shared 10 examples where models speed up hacking, fraud, and misinformation efforts, such as North Korean operatives scaling fake IT job schemes, Russian groups crafting advanced malware, and Cambodian scammers creating multilingual “task scams” that promise $500/day for liking TikTok posts.
- Latest Advancements in Search and Recommendation Systems. This 4-hour session, presented during the AI Engineer World’s Fair 2025, covers recent innovations in search and recommendation systems.
- Prompt Candidates, then Distill: A Teacher-Student Framework for LLM-driven Data Annotation. To tackle label uncertainty in LLM-based annotation, this paper proposes a method that captures multiple potential labels and applies a teacher-student framework called CanDist to distill them into a single output.
- Claude Composer CLI. Claude Composer CLI adds automation, configuration, and user-experience enhancements to Claude Code. It gives users flexible control and tools to customize Claude while minimizing disruptions, sends system notifications to keep users informed, and lets users choose which permission dialogs are accepted automatically.
- Portraits: personalized AI coaching built alongside real experts. Google Labs launched Portraits, an AI coaching tool featuring experts like Kim Scott, to provide AI-driven guidance. The tool uses Gemini’s capabilities to simulate expert advice through interactive avatars.
- Introducing Modify Video. Modify Video lets professionals reinvent settings, lighting, and textures in videos without changing the performance or action. It provides tools for modifying, retexturing, and restyling particular elements, such as props and clothing. Modify Video outperforms rivals by using advanced performance signals for high-fidelity creative control, offering a variety of output options, and maintaining motion consistency.
- Tokasaurus: An LLM Inference Engine for High-Throughput Workloads. Tokasaurus is a large language model inference engine optimized for throughput-intensive workloads.
- Container Use. Container Use is a tool that creates development environments for coding agents, enabling multiple agents to work safely and independently with any stack.
- A Practical Approach for Building Production-Grade Conversational Agents with Workflow Graphs. This paper introduces a production-ready framework for LLM-powered conversational agents using workflow graphs, particularly for e-commerce. Agents are built as directed acyclic graphs (DAGs), where each node handles a specific conversational state with tailored prompts and tools, ensuring compliance with business rules. A fine-tuning method with response masking trains models only on node-relevant outputs. Deployed across platforms like KakaoTalk, the system outperformed GPT-4o in task accuracy, format adherence, and user preference. (A minimal sketch of the workflow-graph idea appears after this list.)
- QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning. This new reinforcement learning framework scales large reasoning models from short to long contexts using progressive context scaling and hybrid rewards. It achieves state-of-the-art results on seven long-context benchmarks, outperforming models like OpenAI-o3-mini and Qwen3-235B-A22B, and matching Claude-3.7-Sonnet-Thinking in reasoning tasks with inputs up to 120K tokens.
- ARPO: End-to-End Policy Optimization for GUI Agents with Experience Replay. ARPO is an end-to-end reinforcement learning approach for training GUI agents using GRPO with experience replay. It achieves up to 6.7% higher in-domain performance on the OSWorld benchmark, shows modest improvements on out-of-domain tasks, and enables self-corrective behavior through structured reward feedback.
- Alita: Generalist Agent Enabling Scalable Agentic Reasoning with Minimal Predefinition and Maximal Self-Evolution. Alita is a generalist agent framework that supports scalable reasoning by minimizing manual setup and maximizing self-evolution. It builds reusable Model Context Protocols (MCPs) through autonomous web search and code synthesis, outperforming more complex systems like OpenAI DeepResearch and OctoTools on GAIA, MathVista, and PathVQA benchmarks.
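On the DeepSeek inference-economics piece above, a toy latency model illustrates the batch-size effect it describes: each decode step pays a roughly fixed cost (moving expert weights and KV cache through memory) plus a small per-sequence cost, so aggregate throughput collapses at batch size 1 and recovers as requests are batched, while per-user speed drops. All constants below are invented for illustration, not DeepSeek measurements.

```python
# Toy model of decode-step time for a large MoE model.
fixed_cost_ms = 40.0        # per-step overhead (weight / KV traffic), assumed
per_seq_cost_ms = 0.5       # incremental compute per sequence in the batch, assumed

for batch in (1, 8, 64, 256):
    step_ms = fixed_cost_ms + per_seq_cost_ms * batch
    aggregate_tok_s = 1000.0 * batch / step_ms   # tokens/sec across all users
    per_user_tok_s = 1000.0 / step_ms            # each user gets one token per step
    print(f"batch={batch:4d}  step={step_ms:6.1f} ms  "
          f"aggregate={aggregate_tok_s:7.1f} tok/s  per-user={per_user_tok_s:5.1f} tok/s")
```

A single local user sits at the batch=1 row: the hardware is mostly idle, so cost per token is high even though the wait per token looks acceptable.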
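For the A/B-testing guide above, here is a minimal sketch of the kind of statistical check it describes: a two-proportion z-test on payment-conversion counts for two models. The conversion numbers are invented for illustration, and this is not the guide’s own code.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Two-sided z-test for a difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_a, p_b, z, p_value

# Invented example: model A converts 420 of 10,000 users, model B 465 of 10,000.
p_a, p_b, z, p = two_proportion_z_test(420, 10_000, 465, 10_000)
print(f"A={p_a:.2%}  B={p_b:.2%}  z={z:.2f}  p={p:.3f}")
# Decide significance once, at a pre-registered sample size, to avoid the
# p-hacking and early-peeking pitfalls the guide warns about.
```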
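And a minimal sketch of the workflow-graph idea from the conversational-agents paper above: each node carries its own prompt and allowed tools, and edges constrain which state can follow which. Class, field, and node names are illustrative, not the paper’s code.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    system_prompt: str                              # node-specific instructions
    tools: list = field(default_factory=list)       # tools this state may call
    next_nodes: list = field(default_factory=list)  # allowed transitions (DAG edges)

# A tiny e-commerce flow: greet -> order lookup -> refund handling.
GRAPH = {
    "greet": Node("greet", "Greet the customer and identify their intent.",
                  next_nodes=["order_lookup"]),
    "order_lookup": Node("order_lookup", "Ask for the order ID and fetch the order.",
                         tools=["get_order"], next_nodes=["refund"]),
    "refund": Node("refund", "Apply the refund policy; never promise exceptions.",
                   tools=["issue_refund"], next_nodes=[]),
}

def step(current: str, chosen_next: str) -> str:
    """Advance only along an existing edge, so business rules cannot be skipped."""
    if chosen_next not in GRAPH[current].next_nodes:
        raise ValueError(f"Transition {current} -> {chosen_next} is not allowed")
    return chosen_next

state = "greet"
state = step(state, "order_lookup")
print(state, "->", GRAPH[state].system_prompt)
```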
Perspectives
- Give AIs a stake in the future. Giving AIs a stake in the future means respecting their autonomy and well-being, and it requires us to honor the contracts we make with them.
- Why Do AGI Timelines Vary So Widely? Many AI lab CEOs estimate AGI could arrive in 2–5 years, citing rapid progress such as saturated benchmarks, AI task completion doubling every seven months, and the prospect of AI automating its own research to spark an intelligence explosion. In contrast, external experts often predict AGI is decades away — or unachievable with current methods — arguing that benchmarks focus on well-defined tasks, that Moravec’s Paradox shows we’ve tackled the easier cognitive challenges first, and that intelligence alone doesn’t guarantee scientific discovery.
- My AI Skeptic Friends Are All Nuts. A seasoned developer criticizes skilled programmers who still dismiss LLMs due to outdated experiences with early chatbots, overlooking how modern coding agents now autonomously explore codebases, run tests, and handle failures. He challenges common concerns, noting that developers already review all code before merging, and hallucinations don’t matter when agents can compile, catch errors, and retry until tests succeed. While LLMs may replace some developers, he argues it’s no different from how software engineers once automated jobs like travel agents and record store clerks.
- Why I don’t think AGI is right around the corner. AI progress over the past decade has largely come from scaling up training compute in frontier systems, but this approach won’t be sustainable beyond 2030. After that point, advancements will need to rely mainly on algorithmic improvements. However, with the easier breakthroughs already achieved, the annual likelihood of reaching AGI drops significantly.
- Vibe-Coding Ideas to Give Startup GTM Teams an Edge. A startup advisor shows how to build a professional ROI calculator for a manufacturing SaaS company in under two hours using Bolt.new, turning a spreadsheet into an interactive tool that clearly presents value to executives. Other examples include tools like conference scrapers, meeting prep dashboards, and feature prototypes — projects that once needed engineering teams or pricey agencies but can now be built for about $70 a month. The advisor argues this empowers non-technical teams to demonstrate value and move faster than their competition.
- When Will We Pay a Premium for AI Labor? AI agents frequently exceed human performance at much lower cost but haven’t yet justified premium pricing due to technical uncertainties and perceived risk. For instance, Waymo has achieved major safety gains yet remains more affordable than alternatives, following a common startup pricing approach. Still, in situations where AI’s nonstop attention and processing capabilities are critical, higher pricing could eventually be justified.
- AGI Is Not Multimodal. The multimodal strategy — training large modular networks across various modalities — won’t achieve human-level AGI. Instead, intelligence should be approached through embodiment and real-world interaction, with modality-specific processing emerging naturally. Genuine AGI requires a physical grasp of the world, since many problems can’t be reduced to symbolic computation. The hardest mathematical challenges may already be solved; the remaining task is identifying the necessary functions and organizing them into a unified system.
- Codex, Jules, and the Future of Async AI Agents. Codex and Jules demonstrate how async AI agents can operate independently, moving past linear chat formats. Future agents will include features like smart checkpointing, multi-branch exploration, and task-tracking inboxes to handle parallel workflows. Async agents enhance cognitive bandwidth by allowing users to check results at their convenience without losing focus.
- Medicine’s rapid adoption of AI has researchers concerned. Hospitals and universities must step up to fill gaps in regulation, experts say
Meme of the week
What do you think? Did any of these stories capture your attention? Let me know in the comments.
If you have found this interesting:
You can look for my other articles, and you can also connect or reach me on LinkedIn. Check this repository containing weekly updated ML & AI news. I am open to collaborations and projects. You can also subscribe for free to get notified when I publish a new story.
Here is the link to my GitHub repository, where I am collecting code and many resources related to machine learning, artificial intelligence, and more.