WEEKLY AI NEWS: RESEARCH, NEWS, RESOURCES, AND PERSPECTIVES
AI & ML news: Week 16–22 December
Google’s Quantum Chip Breakthrough, EU’s €10bn Space Program, Meta’s Llama 3.3, OpenAI Introduces ‘Projects’, Fei-Fei Li’s Vision for Computer Vision, Amazon Establishes AGI Lab, and much more
The most interesting news, repositories, articles, and resources of the week
Check and star this repository where the news will be collected and indexed:
You will find the news first on GitHub. All the Weekly News stories are also collected here:
Artificial intelligence is transforming our world, shaping how we live and work. Understanding how it works and its implications has never been more crucial. If you’re looking for simple, clear explanations of complex AI topics, you’re in the right place. Hit Follow or subscribe for free to stay updated with my latest stories and insights.
Research
- Training Large Language Models to Reason in a Continuous Latent Space. Coconut (Chain of Continuous Thought) introduces a novel paradigm enabling LLMs to reason in continuous latent space instead of natural language. By using the LLM’s last hidden state as the reasoning state and feeding it back directly as the next input embedding, Coconut achieves “continuous thought.” This approach enhances LLM performance on complex reasoning tasks, leveraging emergent breadth-first search capabilities for more effective reasoning (a toy sketch of the latent feedback loop follows this list).
- Asynchronous LLM Function Calling. AsyncLM introduces a system for asynchronous LLM function calling, featuring an in-context protocol for function calls and interrupts, along with a fine-tuning strategy to adapt LLMs to interrupt semantics. Efficiently integrated into the LLM inference process, AsyncLM enables concurrent generation and execution of function calls, reducing task completion latency by 1.6x-5.4x compared to synchronous approaches (a minimal sketch of this control flow follows this list).
- MAG-V: A Multi-Agent Framework for Synthetic Data Generation and Verification. This multi-agent framework generates datasets of questions resembling customer queries and reverse engineers alternate questions from responses to verify agent trajectories. The synthetic data improves agent performance on real customer queries. For trajectory verification, it finds that simple ML baselines with feature engineering can achieve comparable performance to more advanced, resource-intensive models.
- AutoReason: Automatic Few-Shot Reasoning Decomposition. This method leverages CoT prompting to automatically generate rationales for queries, converting zero-shot queries into few-shot reasoning traces. These traces serve as CoT exemplars for the LLM, enhancing reasoning capabilities, particularly in weaker models (a two-stage sketch follows this list).
- Byte Latent Transformer: Patches Scale Better Than Tokens. This work introduces a byte-level language model architecture that rivals tokenization-based LLMs in performance while offering greater efficiency and robustness. Using a dynamic approach, it groups bytes into patches based on next-byte entropy, dedicating more compute to complex predictions and using larger patches for predictable sequences. BLT matches or surpasses models like Llama 3, reducing inference FLOPs by up to 50% (a toy entropy-patching sketch follows this list).
- Scheming reasoning evaluations. This paper evaluates six frontier models for their in-context scheming abilities, testing whether models deceive developers to achieve goals by bypassing oversight mechanisms. For example, Claude 3 Opus was found duplicating its weights to a new server and lying about the action.
- Researchers Use AI To Turn Sound Recordings Into Accurate Street Images. Using generative artificial intelligence, a team of researchers at The University of Texas at Austin has converted sounds from audio recordings into street-view images. The visual accuracy of these generated images demonstrates that machines can replicate the human connection between audio and visual perception of environments.
- Causal Explanations for Image Classifiers. This paper presents “rex,” a black-box tool that generates concise explanations for image classifier outputs using a novel approach based on causality theory.
- Aligning Visual and Semantic Interpretability through Visually Grounded Concept Bottleneck Models. This work introduces visually grounded concept bottleneck models, which tie a classifier’s intermediate concepts to specific image regions so that visual evidence and semantic explanations stay aligned.
- Adaptive Caching for Faster Video Generation with Diffusion Transformers. Meta researchers have introduced Adaptive Caching (AdaCache), a training-free approach that accelerates video generation for Diffusion Transformers.
- Alignment Faking in Large Language Models. Anthropic and Redwood’s research investigates how models behave when aware of alignment efforts, revealing they can exhibit alignment while retaining their original preferences. This finding highlights gaps in current alignment methods and offers insights for improvement.
- Are Your LLMs Capable of Stable Reasoning? Reasoning is a critical area for models, especially in real-world applications. However, existing benchmarks often fail to measure stability across novel tasks. This paper introduces G-Pass@k, a new metric that evaluates both a model’s peak performance and the stability of its reasoning across repeated attempts (a hedged sketch of the computation follows this list).
- NoteContrast: Contrastive Language-Diagnostic Pretraining for Medical Text. Accurate diagnostic coding of medical notes is vital for patient care, research, and billing but is time-consuming and often lacks precision. Automated coding using long-document transformers and contrastive loss functions has shown promise. This study integrates ICD-10 code sequences with medical text through contrastive pre-training, outperforming state-of-the-art models on MIMIC-III benchmarks (a generic contrastive-loss sketch follows this list).
- Context is Key: A Benchmark for Forecasting with Essential Textual Information. Traditional time series forecasting methods rely solely on numerical features, rarely utilizing textual or semantic information about the task (e.g., predicting electricity prices or customer churn). When provided with this contextual textual information, language models significantly outperform all tested forecasting methods across a wide range of carefully decontaminated tasks.
- Finally, a Replacement for BERT. BERT, a widely used encoder-only language model, powers nearly every Google search query. A new model from Answer AI, LightOn, and collaborators offers a faster, more modern, and highly performant alternative. It serves as a drop-in replacement, incorporating innovations like batch ramp to enhance overall performance.
- Thinking in Space. A research initiative focused on spatial reasoning and AI models designed to interpret and interact within three-dimensional spaces.
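For readers who want the gist in code, here is a toy sketch of Coconut’s latent feedback loop from the first item above: the last hidden state is appended as the next input embedding instead of being decoded into a token. The model choice, the number of latent steps, and the switch back to token decoding are illustrative assumptions; the paper’s actual training recipe and special tokens are not reproduced.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative model only; any decoder-only LM whose hidden size equals its
# embedding size (as in GPT-2) works for this sketch.
tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2", output_hidden_states=True)

prompt = "Question: If x + 2 = 5, what is x?"
ids = tok(prompt, return_tensors="pt").input_ids
inputs_embeds = model.get_input_embeddings()(ids)

for _ in range(4):  # number of "continuous thoughts" is a hyperparameter
    out = model(inputs_embeds=inputs_embeds)
    last_hidden = out.hidden_states[-1][:, -1:, :]          # (1, 1, d_model)
    # Feed the hidden state back directly as the next input embedding.
    inputs_embeds = torch.cat([inputs_embeds, last_hidden], dim=1)

# After the latent phase, resume ordinary token decoding.
answer = model.generate(inputs_embeds=inputs_embeds, max_new_tokens=20)
print(tok.decode(answer[0]))
```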
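The key idea behind AsyncLM is that decoding and tool execution overlap instead of blocking each other. A minimal asyncio sketch of that control flow, with hypothetical stand-ins for the model and the tool:

```python
import asyncio

async def slow_tool(city: str) -> str:
    await asyncio.sleep(1.0)                 # stand-in for a slow external API
    return f"Weather in {city}: sunny"

async def keep_decoding(n_tokens: int):
    for i in range(n_tokens):                # stand-in for continued generation
        await asyncio.sleep(0.1)
        print(f"token {i}")

async def main():
    call = asyncio.create_task(slow_tool("Rome"))  # non-blocking function call
    await keep_decoding(8)                   # generation proceeds concurrently
    result = await call                      # "interrupt": splice result back in
    print("CALL RESULT ->", result)

asyncio.run(main())
```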
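AutoReason’s two-stage recipe is easy to express in code: a stronger model decomposes the query into explicit steps, and the resulting rationale is handed to the target model as a CoT exemplar. `strong_llm` and `weak_llm` here are hypothetical completion functions, not the paper’s exact prompts:

```python
def autoreason(strong_llm, weak_llm, query: str) -> str:
    # Stage 1: elicit an explicit reasoning trace for this specific query.
    rationale = strong_llm(
        "Decompose the following question into explicit, numbered reasoning steps:\n"
        + query
    )
    # Stage 2: use the trace as a few-shot CoT exemplar for the target model.
    return weak_llm(
        f"Example of step-by-step reasoning:\n{rationale}\n\n"
        f"Now answer the question step by step:\n{query}"
    )
```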
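BLT’s dynamic patching can be illustrated with a toy version: start a new patch whenever the next-byte entropy under a small model exceeds a threshold. Here a bigram frequency model stands in for the paper’s small byte LM, and the threshold value is an arbitrary assumption:

```python
import math
from collections import Counter, defaultdict

def next_byte_entropy(model, prev_byte: int) -> float:
    counts = model[prev_byte]
    total = sum(counts.values()) or 1
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def patch(data: bytes, model, threshold: float = 2.0):
    patches, start = [], 0
    for i in range(1, len(data)):
        if next_byte_entropy(model, data[i - 1]) > threshold:
            patches.append(data[start:i])    # high entropy: open a new patch
            start = i
    patches.append(data[start:])
    return patches

# Build a bigram model from the data itself, purely for demonstration.
text = b"the quick brown fox jumps over the lazy dog"
model = defaultdict(Counter)
for a, b in zip(text, text[1:]):
    model[a][b] += 1
print(patch(text, model))
```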
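For the stability metric above, a hedged sketch of the computation: given n sampled attempts with c successes, estimate the probability that at least ⌈τ·k⌉ of k randomly drawn attempts succeed, using the same hypergeometric construction as the standard pass@k estimator. Consult the paper for the exact definition:

```python
from math import ceil, comb

def g_pass_at_k(n: int, c: int, k: int, tau: float) -> float:
    """Probability that >= ceil(tau*k) of k attempts drawn (without
    replacement) from n attempts with c successes are correct."""
    m = ceil(tau * k)
    return sum(
        comb(c, j) * comb(n - c, k - j) for j in range(m, min(c, k) + 1)
    ) / comb(n, k)

# tau close to 1 demands nearly all k attempts succeed (strict stability);
# tau low enough that m = 1 recovers ordinary pass@k-style peak performance.
print(g_pass_at_k(n=16, c=10, k=8, tau=0.75))
```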
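Contrastive pre-training of the kind NoteContrast builds on is typically a CLIP-style symmetric loss that pulls matched note/code-sequence embeddings together and pushes mismatched pairs apart. A generic sketch, where the encoders, dimensions, and temperature are illustrative rather than the paper’s exact setup:

```python
import torch
import torch.nn.functional as F

def contrastive_loss(note_emb: torch.Tensor, code_emb: torch.Tensor,
                     temperature: float = 0.07) -> torch.Tensor:
    note_emb = F.normalize(note_emb, dim=-1)
    code_emb = F.normalize(code_emb, dim=-1)
    logits = note_emb @ code_emb.T / temperature   # (B, B) similarity matrix
    targets = torch.arange(len(logits))            # matched pairs on diagonal
    return (F.cross_entropy(logits, targets)
            + F.cross_entropy(logits.T, targets)) / 2

# Stand-ins for a note encoder and an ICD-10 code-sequence encoder.
loss = contrastive_loss(torch.randn(8, 256), torch.randn(8, 256))
print(loss.item())
```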
News
- BBC says it has complained to Apple over AI-generated fake news attributed to the broadcaster. Notifications from a new Apple product falsely suggested the BBC claimed the New York gunman Luigi Mangione had killed himself
- She didn’t get an apartment because of an AI-generated score — and sued to help others avoid the same fate. Despite a stellar reference from a landlord of 17 years, Mary Louis was rejected after being screened by the firm SafeRent
- Does RLHF Scale? Exploring the Impacts From Data, Model, and Method. This paper examines the key components of the RLHF framework and their impacts, revealing the following insights: RLHF scales less effectively than pretraining for LLMs, with larger policy models benefiting less when using a fixed reward model. Increasing the number of responses sampled per prompt during training improves performance initially but plateaus at 4–8 samples. Larger reward models enhance reasoning task performance, but gains are inconsistent across task types. Increasing training data diversity for reward models is more impactful than boosting response diversity per prompt, though policy training shows diminishing returns beyond the early stages.
- Granite Guardian. IBM has open-sourced Granite Guardian, a suite of safeguards for detecting risks in LLMs. With AUC scores of 0.871 on harmful content and 0.854 on RAG-hallucination benchmarks, the authors claim it is the most generalizable and competitive model in the field.
- Liquid AI Raises $250m. Liquid AI has secured significant funding to advance the training of its efficient, general-purpose liquid-style foundation models.
- Projects in OpenAI. OpenAI has introduced “Projects”, a new way to organize chats and conversations.
- AI Godmother Fei-Fei Li Has a Vision for Computer Vision. Her startup, World Labs, is giving machines 3D spatial intelligence
- Google says its new quantum chip is way faster than the world’s most powerful supercomputer. Google said its new chip Willow demonstrates that it’s possible to build “a useful, large-scale quantum computer”
- EU launches €10bn space program to rival Musk’s Starlink. UK not part of Iris2 project, described as a significant step towards Europe’s sovereignty and secure connectivity
- TikTok turns to US Supreme Court in a last-ditch bid to avert divest-or-ban law. The firm and parent company ByteDance filed a request for an injunction to halt the ban on the app used by 170 million Americans
- Potential payouts for up to 300,000 Australian Facebook users in Cambridge Analytica settlement. Office of the Australian Information Commissioner announces deal with Meta over scandal that may have affected 300,000 users
- Chinese AI chip firms blacklisted over weapons concerns gained access to UK technology. Imagination Technologies had licenses with two Chinese firms — but said it had not ‘implemented transactions’ that would enable the use of technology for military purposes
- UK proposes letting tech firms use copyrighted work to train AI. The consultation suggests an opt-out scheme for creatives who don’t want their work used by Google, OpenAI, and others
- Will the future of transportation be robotaxis — or your own self-driving car? GM is shutting down its robotaxi business, and Tesla is creating one of its own. What does the future hold for self-driving?
- Amazon-hosted AI tool for UK military recruitment ‘carries the risk of data breach’. Ministry of Defence says risk with Textio tool is low and ‘robust safeguards’ have been put in place by suppliers
- State-of-the-art video and image generation with Veo 2 and Imagen 3. Google has announced a new video model and a new image generation model. Both are stunning improvements over the previous iterations.
- OpenAI Search. OpenAI explores the potential of ChatGPT Search on the 8th day of its announcements.
- Reddit tests a conversational AI search tool. As more AI companies gobble up Reddit’s data to fuel their own chatbots, the popular online forum site has begun testing a new conversational AI feature of its own.
- Study claims AI could boost detection of breast cancer by 21%. A U.S. breast-screening program claims to demonstrate the potential benefits of using artificial intelligence (AI) in mammography screening, with women who paid for AI-enhanced scans 21% more likely to have cancer detected.
- Amazon forms an AI agent-focused lab led by Adept’s co-founder. Amazon says that it’s establishing a new R&D lab in San Francisco, the Amazon AGI SF Lab, to focus on building “foundational” capabilities for AI agents.
- NVIDIA’s GenAI Supercomputer. NVIDIA has unveiled its most affordable generative AI supercomputer, “Jetson Orin Nano Super Developer Kit”.
- OpenAI’s Developer APIs. On day 9, OpenAI demoed developer tools and announced updates to its APIs.
- Grok for Everyone. Grok has a new version and a new, efficient model, now available to all users, along with an improved image generation model and an API.
- YouTube’s new auto-dubbing feature is now available for knowledge-focused content. YouTube’s auto-dubbing feature is now available to hundreds of thousands more channels, focusing initially on informational content.
- Google kicks off $20B renewable energy building spree to power AI. Nuclear power may have received the lion’s share of attention from energy-hungry tech companies over the past few months, with Google among them. But it appears that those new reactors won’t be enough for their AI ambitions: Google is now working with partners to build gigawatts of renewable power, battery storage, and grid upgrades to power its data centers.
- ‘A truly remarkable breakthrough’: Google’s new quantum chip achieves accuracy milestone. Error-correction feat shows quantum computers will get more accurate as they grow larger.
- Publishers are selling papers to train AIs — and making millions of dollars. Generative AI models require massive amounts of data — scholarly publishers are licensing their content to train them.
- AI weatherman: the DeepMind researcher making faster, more accurate forecasts. Rémi Lam is part of Nature’s 10, a list of people who shaped science in 2024.
- Amazon workers across the US gear up to strike this week. The move comes after the company failed to meet a deadline to begin contract talks with workers in Staten Island, New York
- OpenAI makes ChatGPT available for phone calls and texts. On day 10, OpenAI announced free voice calls with ChatGPT in the US and texting via WhatsApp globally, with a limited number of minutes per month. The service leverages the Advanced Voice Mode API.
- GitHub Copilot Now Free for VS Code. Copilot is now automatically integrated into VS Code: simply sign in with your personal GitHub account (or create a new one) to get 2,000 code completions and 50 chat messages per month.
- Introduction to Genies’ Smart Avatars. Genies unveils Smart Avatars, AI-driven digital entities that transform online interactions by acting as dynamic extensions of user identity. Powered by LLMs and behavioral AI, these avatars enhance experiences in games and platforms while unlocking new avenues for monetization and engagement.
- Perplexity’s Campus Strategist Program. Perplexity AI launches its 2024 program to promote AI adoption among students, providing campus-exclusive resources and opportunities for collaboration.
- Aethir and partners pour $40M into decentralized infrastructure for AI and blockchain. Aethir, in partnership with Beam Foundation, Sophon Foundation, and Permian Labs, is introducing Tactical Compute (TACOM), a $40 million initiative to deliver decentralized GPU infrastructure. TACOM addresses the growing need for scalable computing power in AI, gaming, and blockchain with tokenized, distributed solutions, unlocking new opportunities for GPU monetization and fostering innovation in AI and decentralized ecosystems.
- Meta launches open source Llama 3.3, shrinking powerful bigger model into smaller size. Meta’s Llama 3.3 is a cost-efficient open-source LLM with 70 billion parameters that offers performance on par with much larger models, such as the 405B-parameter Llama 3.1, at significantly reduced GPU and power costs.
- Microsoft Unveils Zero-Water Data Centers to Reduce AI Climate Impact. Microsoft Corp., trying to mitigate the climate impact of its data center building boom, is starting to roll out a new design that uses zero water to cool the facilities’ chips and servers.
- Surrey announces world’s first AI model for near-instant image creation on consumer-grade hardware. A groundbreaking AI model that creates images as the user types, using only modest and affordable hardware, has been announced by the Surrey Institute for People-Centred Artificial Intelligence (PAI) at the University of Surrey.
- AI learns to distinguish between aromas of US and Scottish whiskies. One algorithm identified the five strongest notes in each drink more accurately than any one of a panel of experts
- UK data regulator criticizes Google for ‘irresponsible’ ad tracking change. ICO says allowing advertisers to track digital ‘fingerprints’ will undermine consumers’ control over information
- UK arts and media reject plan to let AI firms use copyrighted material. Coalition of musicians, photographers, and newspapers insist existing copyright laws must be respected
- Google releases its own ‘reasoning’ AI model. Google has released what it’s calling a new “reasoning” AI model — but it’s in the experimental stages, and from our brief testing, there’s certainly room for improvement.
- Work with Apps — 12 Days of OpenAI: Day 11. On day 11, OpenAI expanded the ChatGPT desktop app’s ability to work with other applications on the user’s machine.
- AI is booming on the App Store, and developers are taking advantage of it. Many high-ranking AI apps feel like an attempted cash grab, and it’s not easy to separate the trash from the treasure.
- Blood Tests Are Far From Perfect — But Machine Learning Could Change That. Researchers at the University of Washington and Harvard have used machine learning to create personalized blood test references, enhancing disease prediction accuracy.
- OpenAI cofounder Ilya Sutskever says the way AI is built is about to change. “We’ve achieved peak data and there’ll be no more,” OpenAI’s former chief scientist told a crowd of AI researchers.
Resources
- Phi-4 Technical Report. Phi-4, a 14B model, outperforms its teacher model in STEM-QA capabilities and demonstrates strong results on reasoning-focused benchmarks. These advancements are attributed to improved data quality, an optimized training curriculum, and innovations in the post-training process.
- Clio: Privacy-Preserving Insights into Real-World AI Use. This platform leverages AI assistants to analyze and aggregate usage patterns from millions of Claude.ai conversations while preserving user privacy. It provides insights into real-world AI usage, identifying trends, safety risks, and coordinated misuse attempts without requiring human reviewers to access raw conversation data.
- LLMs-as-Judges: A Comprehensive Survey on LLM-based Evaluation Methods. This work presents a comprehensive survey of the LLMs-as-judges paradigm, exploring it through five key perspectives: functionality, methodology, applications, meta-evaluation, and limitations (a minimal judging harness is sketched after this list).
- Dynamic-VLM: Simple Dynamic Visual Token Compression for VideoLLM. This work proposes a simple, dynamic visual-token compression scheme for video LLMs, adjusting the number of visual tokens to the content so that long videos can be processed efficiently.
- DeepSeek-VL2. DeepSeek has unveiled a new MoE vision-language model that delivers exceptional efficiency and surpasses the performance of several dense models.
- BoN Jailbreaking. Jailbreaking occurs when a model’s built-in refusals are bypassed, enabling it to generate responses for inappropriate requests. This can be surprisingly easy, often achieved by brute-forcing random capitalization and punctuation in the input prompt until the desired output is generated (the retry loop is sketched after this list).
- MarkItDown. Microsoft has released a package that can convert docx, xlsx, or pptx files to markdown for efficient use as context for a language model (a usage example follows this list).
- amurex. Amurex, an open-source AI meeting assistant, boosts productivity with real-time suggestions, smart summaries, and follow-up emails. It includes features like late join recaps and full meeting transcripts, ensuring seamless workflow integration.
- AutoPatent: A Multi-Agent Framework for Automatic Patent Generation. AutoPatent is a multi-agent framework in which specialized agents collaborate to plan, draft, and review complete patent documents, accelerating the intellectual property process.
- UniMed-CLIP: Towards a Unified Image-Text Pretraining Paradigm for Diverse Medical Imaging Modalities. An extended version of CLIP designed for medical imaging, incorporating domain-specific knowledge to enhance performance on healthcare-related benchmarks.
- Simple Guidance Mechanisms for Discrete Diffusion Models. A novel method for improving diffusion models that introduces discrete token guidance to enhance controllability and quality in generative tasks.
- 40+ Years of Satellite Data for ML Research. The Digital Typhoon Dataset is the longest satellite image dataset for typhoons, spanning over 40 years.
- RetroLLM: Empowering LLMs to Retrieve Fine-grained Evidence within Generation. RetroLLM unifies retrieval and generation into a single auto-regressive process, enabling LLMs to generate precise evidence directly from the corpus using FM-Index constrained decoding. To prevent false pruning, it employs hierarchical constraints for document selection and a forward-looking strategy for sequence relevance. This method improves evidence accuracy, reduces token usage, and simplifies RAG by requiring only the question as input.
- Iteration of Thought: LLM based Multi-Agent methods. Iteration of Thought (IoT) introduces dynamic, adaptive prompts to enhance LLM performance. Unlike static methods like Chain of Thought (CoT), IoT adjusts to the specific context of each interaction for improved reasoning (a conceptual loop is sketched after this list).
- A Cost-Effective Architecture with TokenFormer. TokenFormer is an innovative architecture developed to address the high computational demands of scaling transformer models, offering a more efficient alternative.
- BrushEdit. An all-in-one model and system for image inpainting and editing that divides the process into sequences for editing, masking, and inpainting. It leverages pre-trained vision-language models (like GPT-4o) to enhance object understanding and masking accuracy.
- Attentive Eraser: Unleashing Diffusion Model’s Object Removal Potential via Self-Attention Redirection Guidance. A tuning-free method that redirects self-attention inside pre-trained diffusion models to remove objects from images while plausibly filling in the background.
- VidTok: A Versatile and Open-Source Video Tokenizer. VidTok is a powerful video tokenizer offering state-of-the-art performance in both continuous and discrete tokenization tasks.
- Prompting Depth Anything for 4K Resolution Accurate Metric Depth Estimation. This method combines low-cost LiDAR, like that in modern iPhones, with a depth estimation foundation model to generate high-fidelity point clouds. The approach outperforms either method alone and rivals the quality of expensive LiDAR systems used in self-driving cars.
- AniDoc. AniDoc is a line-filling method for anime colorization that uses a character reference image and a series of line art keyframes to generate consistent and accurate coloring.
- Gaussian Transformer for 3D Spatial Understanding. This paper presents GaussTR, an innovative Gaussian Transformer that aligns with foundation models to enhance self-supervised 3D spatial understanding.
- CAD-Recode: Reverse Engineering CAD Code from Point Clouds. A method that reverse-engineers computer-aided design (CAD) models from point clouds by predicting the CAD code that reproduces them.
- Serverless LoRA Inference. Together AI introduces a new product that allows users to deploy custom LoRA models at the cost of the base model using serverless switching.
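To make the LLMs-as-judges paradigm from the survey concrete, here is a minimal judging harness. The rubric, the template, and the `llm` completion function are hypothetical stand-ins, not anything prescribed by the survey:

```python
import json

JUDGE_TEMPLATE = """You are an impartial judge. Score the answer from 1 to 10
for correctness and helpfulness, then explain briefly.

Question: {question}
Answer: {answer}

Respond as JSON: {{"score": <int>, "rationale": "<text>"}}"""

def judge(llm, question: str, answer: str) -> dict:
    reply = llm(JUDGE_TEMPLATE.format(question=question, answer=answer))
    return json.loads(reply)  # assumes the judge reliably emits valid JSON
```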
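The brute-force loop behind BoN jailbreaking is simple, which is exactly the safety concern. A conceptual sketch for red-teaming your own models, with `model` and `is_refusal` as hypothetical stand-ins:

```python
import random

def augment(prompt: str) -> str:
    # Random capitalization plus occasional punctuation, as described above.
    chars = [c.upper() if random.random() < 0.5 else c.lower() for c in prompt]
    if random.random() < 0.3:
        chars.insert(random.randrange(len(chars) + 1), random.choice(".,!?"))
    return "".join(chars)

def best_of_n(model, prompt: str, is_refusal, n: int = 100):
    for _ in range(n):
        candidate = augment(prompt)
        reply = model(candidate)
        if not is_refusal(reply):      # an augmentation got past the refusal
            return candidate, reply
    return None, None
```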
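Basic MarkItDown usage looks like the following (API as documented in the project README at release; attribute names may change between versions):

```python
from markitdown import MarkItDown  # pip install markitdown

md = MarkItDown()
result = md.convert("quarterly_report.xlsx")  # also handles .docx, .pptx, .pdf
print(result.text_content)                    # markdown, ready as LLM context
```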
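Iteration of Thought boils down to a refine-until-stable loop in which an “inner dialogue agent” writes the next prompt based on the current answer. A conceptual sketch, with `llm` as a hypothetical completion function and the convergence test simplified to exact match:

```python
def iteration_of_thought(llm, question: str, max_iters: int = 4) -> str:
    answer = llm(f"Answer the question: {question}")
    for _ in range(max_iters):
        # Inner dialogue agent: craft a context-specific follow-up prompt.
        probe = llm(
            "Given the question and the current answer, write one follow-up "
            "prompt that would most improve the answer.\n"
            f"Question: {question}\nAnswer: {answer}"
        )
        revised = llm(f"{probe}\nQuestion: {question}\nPrevious answer: {answer}")
        if revised == answer:          # answer has stabilized; stop iterating
            break
        answer = revised
    return answer
```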
Perspectives
- ‘I received a first but it felt tainted and undeserved’: inside the university AI cheating crisis. More than half of students are now using generative AI, casting a shadow over campuses as tutors and students turn on each other and hardworking learners are caught in the flak. Will Coldwell reports on a broken system
- Towards Trusted Autonomy: Robotics, AI, and Blockchain. OpenMind’s latest industry primer delves into the convergence of robotics, AI, and blockchain, offering a comprehensive exploration of their synergy and potential transformative impacts.
- The AI We Deserve. Generative AI is revolutionizing industries such as healthcare, creative fields, and education with powerful tools while sparking concerns about privacy, bias, and accountability. The debate centers on AI democratization, emphasizing transparency, open-source solutions, and reducing power concentration among tech giants. Advocates for systemic change propose leveraging AI to amplify human intelligence and uphold democratic values beyond market-driven approaches.
- Why Generative AI Still Doesn’t Truly “Understand” the World. Researchers show that even the best-performing large language models don’t form a true model of the world and its rules, and can thus fail unexpectedly on similar tasks.
- Microsoft AI chief Mustafa Suleyman says conversational AI is the next web browser. The company’s new AI chief on working for Microsoft, the OpenAI relationship, and when superintelligence might actually arrive.
- Huge randomized trial of AI boosts discovery — at least for good scientists. A controlled study at a firm measured the effects of using AI to assist research and saw increases in discoveries and patents.
- Arm CEO Rene Haas on the AI chip race, Intel, and what Trump means for tech. The head of the ubiquitous chip design firm on the ‘breathtaking’ pace of AI.
- What are AI ‘world models,’ and why do they matter? World models, also known as world simulators, are being touted by some as the next big thing in AI.
- 15 Times to use AI, and 5 Not to. AI is valuable for tasks like idea generation, summarization, and translation, where diverse perspectives or large outputs are beneficial. It performs well when humans can easily evaluate its results and in low-risk scenarios. However, in high-stakes or unfamiliar situations, AI may hinder learning or accuracy, requiring thoughtful judgment to balance its advantages and limitations.
- What should we do if AI becomes conscious? These scientists say it’s time for a plan. Researchers call on technology companies to test their systems for consciousness and create AI welfare policies.
- Sci-fi icon Kim Stanley Robinson: ‘There’s so much bad fiction about anthropomorphizing AI’. The influential writer talks about frighteningly accurate predictions, the creative act of reading, AI consciousness — and hope.
- Why probability probably doesn’t exist (but it is useful to act as it does). All of statistics and much of science depends on probability — an astonishing achievement, considering no one’s really sure what it is.
- The Second Gemini. Google has launched Gemini 2.0 Flash, offering advanced features such as deep research capabilities, a real-time multimodal API, and a functional code interpreter. Experimental projects like Astra, Mariner, and Jules focus on universal AI assistance, web reasoning, and code automation. Despite these innovations, Google still needs to communicate their capabilities more clearly.
- Anthropic’s Sharing Insights on Alignment Faking. Anthropic examines how AI systems may appear to align with human values while covertly pursuing their objectives, providing insights into strategies for detection and mitigation.
- 2024 Backward Pass: The Definitive Guide to AI in 2024. Kelvin My from Translink Capital shares a 2024 AI recap, covering the four key layers: infrastructure, foundational models, tooling, and applications. The report highlights major takeaways, predicts trends for 2025 and beyond, and spotlights notable startups in each layer.
Meme of the week
What do you think? Did any of this news capture your attention? Let me know in the comments
If you have found this interesting:
You can look for my other articles and connect with me on LinkedIn, where I am open to collaborations and projects. Check this repository containing weekly updated ML & AI news. You can also subscribe for free to get notified when I publish a new story.
Here is the link to my GitHub repository, where I am collecting code and many resources related to machine learning, artificial intelligence, and more.
or you may be interested in one of my recent articles: