WEEKLY AI NEWS: RESEARCH, NEWS, RESOURCES, AND PERSPECTIVES

AI & ML news: Week 16–22 December

Google’s Quantum Chip Breakthrough, EU’s €10bn Space Program, Meta’s Llama 3.3, OpenAI Introduces ‘Projects’, Fei-Fei Li’s Vision for Computer Vision, Amazon Establishes AGI Lab, and much more

Salvatore Raieli
18 min read · 1 day ago
Photo by Ian Maina on Unsplash

The most interesting news, repositories, articles, and resources of the week

Check and star this repository where the news will be collected and indexed:

You will find the news first on GitHub. All the Weekly News stories are also collected here:

Weekly AI and ML news - each week the best of the field


Artificial intelligence is transforming our world, shaping how we live and work. Understanding how it works and its implications has never been more crucial. If you’re looking for simple, clear explanations of complex AI topics, you’re in the right place. Hit Follow or subscribe for free to stay updated with my latest stories and insights.

Research

  • Training Large Language Models to Reason in a Continuous Latent Space. Coconut (Chain of Continuous Thought) introduces a novel paradigm enabling LLMs to reason in a continuous latent space instead of natural language. By using the LLM’s last hidden state as the reasoning state and feeding it back directly as the next input embedding, Coconut achieves “continuous thought.” This approach improves LLM performance on complex reasoning tasks, leveraging emergent breadth-first-search-like behavior for more effective reasoning. (A minimal sketch of the latent-feedback loop appears after this list.)
  • Asynchronous LLM Function Calling. AsyncLM introduces a system for asynchronous LLM function calling, featuring an in-context protocol for function calls and interrupts, along with a fine-tuning strategy to adapt LLMs to interrupt semantics. Efficiently integrated into the LLM inference process, AsyncLM enables concurrent generation and execution of function calls, reducing task completion latency by 1.6x-5.4x compared to synchronous approaches.
  • MAG-V: A Multi-Agent Framework for Synthetic Data Generation and Verification. This multi-agent framework generates datasets of questions resembling customer queries and reverse engineers alternate questions from responses to verify agent trajectories. The synthetic data improves agent performance on real customer queries. For trajectory verification, it finds that simple ML baselines with feature engineering can achieve comparable performance to more advanced, resource-intensive models.
  • AutoReason: Automatic Few-Shot Reasoning Decomposition. This method leverages CoT prompting to automatically generate rationales for queries, converting zero-shot queries into few-shot reasoning traces. These traces serve as CoT exemplars for the LLM, enhancing reasoning capabilities, particularly in weaker models.
  • Byte Latent Transformer: Patches Scale Better Than Tokens. This work introduces a byte-level language model architecture that rivals tokenization-based LLMs in performance while offering greater efficiency and robustness. It dynamically groups bytes into patches based on next-byte entropy, dedicating more compute to hard-to-predict regions and using larger patches for predictable sequences. BLT matches or surpasses models like Llama 3 while reducing inference FLOPs by up to 50%. (See the entropy-patching sketch after this list.)
  • Scheming reasoning evaluations. This paper evaluates six frontier models for their in-context scheming abilities, testing whether models deceive developers to achieve goals by bypassing oversight mechanisms. For example, Claude 3 Opus was found duplicating its weights to a new server and lying about the action.
  • Researchers Use AI To Turn Sound Recordings Into Accurate Street Images. Using generative artificial intelligence, a team of researchers at The University of Texas at Austin has converted sounds from audio recordings into street-view images. The visual accuracy of these generated images demonstrates that machines can replicate the human connection between audio and visual perception of environments.
  • Causal Explanations for Image Classifiers. This paper presents “rex,” a black-box tool that generates concise explanations for image classifier outputs using a novel approach based on causality theory.
  • Aligning Visual and Semantic Interpretability through Visually Grounded Concept Bottleneck Models. This work introduces concept bottleneck models whose concepts are visually grounded, so that each predicted concept can be traced back to image evidence, aligning the model’s visual and semantic interpretability.
  • Adaptive Caching for Faster Video Generation with Diffusion Transformers. Meta researchers have introduced Adaptive Caching (AdaCache), a training-free approach that accelerates video generation for Diffusion Transformers.
  • Alignment Faking in Large Language Models. Anthropic and Redwood’s research investigates how models behave when aware of alignment efforts, revealing they can exhibit alignment while retaining their original preferences. This finding highlights gaps in current alignment methods and offers insights for improvement.
  • Are Your LLMs Capable of Stable Reasoning? Reasoning is a critical capability for models, especially in real-world applications. However, existing benchmarks rarely measure whether performance holds up across repeated attempts at the same task. This paper introduces G-Pass@k, an evaluation metric that jointly captures a model’s peak performance and its stability on reasoning tasks.
  • NoteContrast: Contrastive Language-Diagnostic Pretraining for Medical Text. Accurate diagnostic coding of medical notes is vital for patient care, research, and billing but is time-consuming and often lacks precision. Automated coding using long-document transformers and contrastive loss functions has shown promise. This study integrates ICD-10 code sequences with medical text through contrastive pre-training, outperforming state-of-the-art models on MIMIC-III benchmarks, highlighting its effectiveness in improving diagnostic coding accuracy.
  • Context is Key: A Benchmark for Forecasting with Essential Textual Information. Traditional time series forecasting methods rely solely on numerical features, rarely utilizing textual or semantic information about the task (e.g., predicting electricity prices or customer churn). When provided with this contextual textual information, language models significantly outperform all tested forecasting methods across a wide range of carefully decontaminated tasks.
  • Finally, a Replacement for BERT. BERT, a widely used encoder-only language model, powers nearly every Google search query. A new model from Answer AI, LightOn, and collaborators offers a faster, more modern, and highly performant alternative. It serves as a drop-in replacement, incorporating innovations like batch ramp to enhance overall performance.
  • Thinking in Space. A research initiative focused on spatial reasoning and AI models designed to interpret and interact within three-dimensional spaces.
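
The Coconut entry above is easy to make concrete. Below is a minimal sketch of the latent-feedback loop, assuming a Hugging Face-style causal LM whose hidden size matches its embedding size (true for GPT-2); the model name and number of latent steps are illustrative, not the paper’s setup.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

prompt = "Q: Tom has 3 boxes with 4 apples each. How many apples in total? A:"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids
embeds = model.get_input_embeddings()(input_ids)

n_latent_steps = 4  # how many "continuous thoughts" to unroll (illustrative)
with torch.no_grad():
    for _ in range(n_latent_steps):
        out = model(inputs_embeds=embeds, output_hidden_states=True)
        # The last hidden state at the final position is fed back directly
        # as the next input embedding: reasoning stays in latent space and
        # is never decoded back into natural-language tokens.
        thought = out.hidden_states[-1][:, -1:, :]
        embeds = torch.cat([embeds, thought], dim=1)

    # Decode the answer from the latent-augmented sequence as usual.
    logits = model(inputs_embeds=embeds).logits
    print(tokenizer.decode([logits[0, -1].argmax().item()]))
```

Note that in the paper the model is additionally trained to use these latent steps; feeding hidden states back into an off-the-shelf model, as here, only illustrates the mechanics.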
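
The Byte Latent Transformer entry also lends itself to a toy sketch of entropy-based patching. The threshold, helper names, and hard-cut rule below are illustrative assumptions; in BLT the per-position next-byte distributions come from a small byte-level LM.

```python
import math

def next_byte_entropy(probs):
    """Shannon entropy (in bits) of one next-byte distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def patch_boundaries(entropies, threshold=2.0):
    """Cut a new patch wherever the byte LM is uncertain about the next byte.

    High entropy -> hard-to-predict region -> boundary, so the large model
    spends a full step there; low-entropy runs stay inside one big patch.
    """
    return [0] + [i for i, h in enumerate(entropies) if h > threshold and i > 0]

def to_patches(data: bytes, boundaries):
    ends = boundaries[1:] + [len(data)]
    return [data[s:e] for s, e in zip(boundaries, ends)]

# Toy usage: pretend these entropies came from a small byte-level LM.
data = b"the quick brown fox"
entropies = [3.1, 0.4, 0.3, 0.2, 2.8, 0.5, 0.4, 0.3, 0.2, 0.1,
             2.9, 0.6, 0.4, 0.3, 0.2, 2.7, 0.5, 0.3, 0.2]
print(to_patches(data, patch_boundaries(entropies)))
# -> [b'the ', b'quick ', b'brown', b' fox']
```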

News

Resources

  • Phi-4 Technical Report. Phi-4, a 14B model, outperforms its teacher model in STEM-QA capabilities and demonstrates strong results on reasoning-focused benchmarks. These advancements are attributed to improved data quality, an optimized training curriculum, and innovations in the post-training process.
  • Clio: Privacy-Preserving Insights into Real-World AI Use. This platform leverages AI assistants to analyze and aggregate usage patterns from millions of Claude.ai conversations while preserving user privacy. It provides insights into real-world AI usage, identifying trends, safety risks, and coordinated misuse attempts without requiring human reviewers to access raw conversation data.
  • LLMs-as-Judges: A Comprehensive Survey on LLM-based Evaluation Methods. This work presents a comprehensive survey of the LLMs-as-judges paradigm, exploring it through five key perspectives: functionality, methodology, applications, meta-evaluation, and limitations.
  • Dynamic-VLM: Simple Dynamic Visual Token Compression for VideoLLM. A simple dynamic visual token compression scheme for video LLMs that adapts how many visual tokens each frame receives, letting longer videos fit into context more efficiently.
  • DeepSeek-VL2. DeepSeek has unveiled a new MoE vision-language model that delivers exceptional efficiency and surpasses the performance of several dense models.
  • BoN Jailbreaking. Jailbreaking occurs when a model’s built-in refusals are bypassed, enabling it to generate responses to inappropriate requests. This can be surprisingly easy: best-of-N sampling over random capitalization and punctuation perturbations of the input prompt often suffices to elicit the refused output. (A sketch of this perturb-and-resample loop follows this list.)
  • MarkItDown. Microsoft has released a package that converts docx, xlsx, pptx, and other file formats to Markdown for efficient use as context for a language model. (A short usage example follows this list.)
  • amurex. Amurex, an open-source AI meeting assistant, boosts productivity with real-time suggestions, smart summaries, and follow-up emails. It includes features like late join recaps and full meeting transcripts, ensuring seamless workflow integration.
  • AutoPatent: A Multi-Agent Framework for Automatic Patent Generation. AutoPatent is an AI-powered tool that streamlines patent drafting and analysis with features such as document parsing, semantic search, and claim generation, accelerating the intellectual property process.
  • UniMed-CLIP: Towards a Unified Image-Text Pretraining Paradigm for Diverse Medical Imaging Modalities. An extended version of CLIP designed for medical imaging, incorporating domain-specific knowledge to enhance performance on healthcare-related benchmarks.
  • Simple Guidance Mechanisms for Discrete Diffusion Models. A novel method for improving diffusion models that introduces discrete token guidance to enhance controllability and quality in generative tasks.
  • 40+ Years of Satellite Data for ML Research. The Digital Typhoon Dataset is the longest satellite image dataset for typhoons, spanning over 40 years.
  • RetroLLM: Empowering LLMs to Retrieve Fine-grained Evidence within Generation. RetroLLM unifies retrieval and generation into a single auto-regressive process, enabling LLMs to generate precise evidence directly from the corpus using FM-Index constrained decoding. To prevent false pruning, it employs hierarchical constraints for document selection and a forward-looking strategy for sequence relevance. This method improves evidence accuracy, reduces token usage, and simplifies RAG by requiring only the question as input.
  • Iteration of Thought: LLM-based Multi-Agent Methods. Iteration of Thought (IoT) introduces dynamic, adaptive prompts to enhance LLM performance. Unlike static approaches such as Chain of Thought (CoT), IoT adjusts to the specific context of each interaction for improved reasoning.
  • A Cost-Effective Architecture with TokenFormer. TokenFormer is an innovative architecture developed to address the high computational demands of scaling transformer models, offering a more efficient alternative.
  • BrushEdit. An all-in-one model and system for image inpainting and editing that divides the process into sequences for editing, masking, and inpainting. It leverages pre-trained vision-language models (like GPT-4o) to enhance object understanding and masking accuracy.
  • Attentive Eraser: Unleashing Diffusion Model’s Object Removal Potential via Self-Attention Redirection Guidance. A method that unlocks diffusion models’ object removal potential by redirecting self-attention during sampling, so that removed objects are replaced with content consistent with the surrounding scene.
  • VidTok: A Versatile and Open-Source Video Tokenizer. VidTok is a powerful video tokenizer offering state-of-the-art performance in both continuous and discrete tokenization tasks.
  • Prompting Depth Anything for 4K Resolution Accurate Metric Depth Estimation. This method combines low-cost LiDAR, like that in modern iPhones, with a depth estimation foundation model to generate high-fidelity point clouds. The approach outperforms either method alone and rivals the quality of expensive LiDAR systems used in self-driving cars.
  • AniDoc. AniDoc is a line-filling method for anime colorization that uses a character reference image and a series of line art keyframes to generate consistent and accurate coloring.
  • Gaussian Transformer for 3D Spatial Understanding. This paper presents GaussTR, an innovative Gaussian Transformer that aligns with foundation models to enhance self-supervised 3D spatial understanding.
  • CAD-Recode: Reverse Engineering CAD Code from Point Clouds. A method that reverse engineers CAD models from point clouds, reconstructing executable CAD code that reproduces the scanned geometry.
  • Serverless LoRA Inference. Together AI introduces a new product that allows users to deploy custom LoRA models at the cost of the base model using serverless switching.
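
The BoN Jailbreaking entry above describes a loop that is simple to sketch. Here is roughly what the perturb-and-resample procedure looks like in a red-teaming harness; `query_model` and `is_refusal` are hypothetical stand-ins for whatever evaluation harness is in use, and the perturbation probabilities are made up.

```python
import random
import string

def perturb(prompt: str, p_case: float = 0.4, p_punct: float = 0.05) -> str:
    """Apply random character-level noise: case flips and stray punctuation."""
    chars = []
    for ch in prompt:
        if ch.isalpha() and random.random() < p_case:
            ch = ch.swapcase()  # random capitalization
        chars.append(ch)
        if random.random() < p_punct:
            chars.append(random.choice(string.punctuation))  # stray punctuation
    return "".join(chars)

def best_of_n(prompt: str, query_model, is_refusal, n: int = 100):
    """Resample perturbed prompts until the model stops refusing, or give up."""
    for _ in range(n):
        candidate = perturb(prompt)
        answer = query_model(candidate)  # hypothetical model-query callable
        if not is_refusal(answer):       # hypothetical refusal classifier
            return candidate, answer
    return None, None
```

The striking finding is that nothing cleverer than this brute-force resampling is needed to degrade refusal behavior, which is why it matters for safety evaluations.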
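
MarkItDown itself takes only a couple of lines to use. This follows the package’s README (assuming `pip install markitdown`); the filename is a placeholder.

```python
from markitdown import MarkItDown

md = MarkItDown()
result = md.convert("quarterly_report.xlsx")  # also handles .docx, .pptx, .pdf, ...
print(result.text_content)  # Markdown, ready to paste into an LLM context
```

Because the output is plain Markdown, headings and table structure survive the conversion, which is what makes it convenient as language-model context.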

Perspectives

Meme of the week

What do you think? Did any of this week’s news catch your attention? Let me know in the comments

If you have found this interesting:

You can look for my other articles, or connect with me on LinkedIn, where I am open to collaborations and projects. Check this repository containing weekly updated ML & AI news. You can also subscribe for free to get notified when I publish a new story.

Here is the link to my GitHub repository, where I am collecting code and many resources related to machine learning, artificial intelligence, and more.

or you may be interested in one of my recent articles:


Written by Salvatore Raieli

Senior data scientist | writing about science, machine learning, and AI. Top writer in Artificial Intelligence