WEEKLY AI NEWS: RESEARCH, NEWS, RESOURCES, AND PERSPECTIVES

AI & ML news: Week 2–8 December

Google’s Core Business Challenges, TSMC’s 2nm Ambitions, Chip War Intensifies Between the US and China, Meta’s Nuclear Energy Initiative, Generative AI’s Role in Drone Warfare, and much more

Salvatore Raieli
20 min read · Dec 9, 2024

The most interesting news, repositories, articles, and resources of the week

Check and star this repository where the news will be collected and indexed:

You will find the news first on GitHub. All the Weekly News stories are also collected here:

Weekly AI and ML news - each week the best of the field


Artificial intelligence is transforming our world, shaping how we live and work. Understanding how it works and its implications has never been more crucial. If you’re looking for simple, clear explanations of complex AI topics, you’re in the right place. Hit Follow or subscribe for free to stay updated with my latest stories and insights.

Research

  • Large language models surpass human experts in predicting neuroscience results. Researchers have introduced BrainBench, a tool designed to evaluate large language models’ (LLMs) ability to predict outcomes in neuroscience experiments. By fine-tuning an LLM on neuroscience literature, they developed BrainGPT, which achieved an 86% accuracy rate in forecasting study results, surpassing human experts who averaged 63%. Notably, when BrainGPT expressed high confidence in its predictions, its accuracy increased, indicating a strong correlation between confidence levels and correctness.
  • Foundational Generative Audio Transformer Opus 1. NVIDIA has introduced a generative AI sound model capable of creating and transforming music, voices, and sounds through text and audio inputs. The 2.5-billion-parameter model can produce unique audio outputs, such as trumpets barking or saxophones meowing.
  • o1 Replication Journey — Part 2. The study demonstrates that combining simple distillation from o1’s API with supervised fine-tuning significantly enhances performance on complex mathematical reasoning tasks. A base model fine-tuned on just tens of thousands of o1-distilled long-thought chains outperforms o1-preview on the American Invitational Mathematics Examination (AIME).
  • Beyond Examples: High-level Automated Reasoning Paradigm in In-Context Learning via MCTS. Enhances in-context learning with high-level automated reasoning, achieving state-of-the-art accuracy (79.6%) on the MATH benchmark with Qwen2.5-7B-Instruct, outperforming GPT-4o (76.6%) and Claude 3.5 (71.1%). Instead of relying on manually crafted high-quality demonstrations, it emphasizes abstract thinking patterns: five atomic reasoning actions form chain-structured patterns, and Monte Carlo Tree Search explores reasoning paths to build thought cards that guide inference (a generic MCTS sketch follows this list).
  • Generative Agent Simulations of 1,000 People. Presents a novel agent architecture leveraging LLMs to simulate real individuals’ behaviors, achieving 85% accuracy in replicating human responses on the General Social Survey and reducing demographic biases compared to traditional methods.
  • Measuring Bullshit in the Language Games played by ChatGPT. Suggests that LLM-based chatbots engage in the “language game of bullshit.” By instructing ChatGPT to produce scientific articles on topics it lacks knowledge or expertise in, the authors created a reference set illustrating how this “bullshit” manifests.
  • Study: 94% Of AI-Generated College Writing Is Undetected By Teachers. Increasingly, homework and exam essays are written by generative AI rather than by students, then submitted and passed off as authentic work for grades, credit, and degrees.
  • Mapping the ionosphere with the power of Android. Google researchers successfully mapped the ionosphere using GPS signal fluctuations from Android phones combined with innovative algorithms. Such mapping is typically costly and time-intensive, so this approach offers potential benefits for various climate solutions.
  • DeMo: Decoupled Momentum Optimization. Developed by one of the original Adam authors, this new optimizer is 2.5x faster and requires 100x less communication, delivering significant performance gains for language model training over existing optimization methods.
  • Diffusion Meets Flow Matching: Two Sides of the Same Coin. This post works through the literature and shows that, mathematically, flow matching and diffusion models are equivalent, although flow matching appears to scale more effectively in practice (a short sketch of the correspondence follows this list).
  • Genie 2: A large-scale foundation world model. Genie 2 is a large-scale latent diffusion model designed for world generation. It accepts character control as input, operates without a classifier, and produces stunning outputs with consistent control over time.
  • Virtual lab powered by ‘AI scientists’ super-charges biomedical research. Could human-AI collaborations be the future of interdisciplinary studies?
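For the MCTS item above, here is a generic, minimal Monte Carlo Tree Search skeleton over discrete reasoning actions. This is a sketch of the general technique only, not the paper’s implementation: the names in ACTIONS are hypothetical placeholders (the paper defines its own five atomic actions), and expand and reward stand in for an LLM proposing the next reasoning step and scoring a finished path.

```python
import math
import random

ACTIONS = ["decompose", "recall", "transform", "verify", "summarize"]  # hypothetical placeholders

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children, self.visits, self.value = [], 0, 0.0

    def ucb(self, c=1.4):
        # Unvisited nodes are explored first; otherwise UCT balances mean value and exploration.
        if self.visits == 0:
            return float("inf")
        return self.value / self.visits + c * math.sqrt(math.log(self.parent.visits) / self.visits)

def mcts(root_state, expand, reward, iterations=100):
    """expand(state, action) -> next state; reward(state) -> float in [0, 1]."""
    root = Node(root_state)
    for _ in range(iterations):
        node = root
        while node.children:                        # selection: descend by UCT
            node = max(node.children, key=Node.ucb)
        for action in ACTIONS:                      # expansion: try every atomic action
            node.children.append(Node(expand(node.state, action), parent=node))
        leaf = random.choice(node.children)         # pick a child to evaluate
        score = reward(leaf.state)                  # simulation (single-step rollout here)
        while leaf is not None:                     # backpropagation
            leaf.visits += 1
            leaf.value += score
            leaf = leaf.parent
    return max(root.children, key=lambda n: n.visits).state  # most-visited first move
```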
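And for the flow-matching item, a short sketch of why the two views coincide, using standard Gaussian-path identities rather than anything specific to the post:

```latex
% Both frameworks share the Gaussian probability path
x_t = \alpha_t x_0 + \sigma_t \epsilon, \qquad \epsilon \sim \mathcal{N}(0, I).

% Flow matching regresses the velocity u_t = \dot{\alpha}_t x_0 + \dot{\sigma}_t \epsilon.
% Substituting x_0 = (x_t - \sigma_t \epsilon) / \alpha_t shows the velocity is an
% affine reparameterization of the diffusion model's noise prediction \hat{\epsilon}:
u_t(x_t) = \frac{\dot{\alpha}_t}{\alpha_t} x_t
  + \left( \dot{\sigma}_t - \frac{\dot{\alpha}_t \sigma_t}{\alpha_t} \right) \hat{\epsilon}(x_t, t),

% and, via \nabla_x \log p_t(x_t) = -\hat{\epsilon}(x_t, t) / \sigma_t, of the score
% as well; the two parameterizations carry the same information and induce the same
% probability-flow ODE, differing only in loss weighting and sampler choices.
```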

News

  • Google’s plan to keep AI out of search trial remedies isn’t going very well. US District Judge Amit Mehta indicates that AI could be pivotal in shaping remedies after the government’s win in the Google search monopoly trial, potentially impacting Google’s AI products. The DOJ has proposed measures to prevent Google from leveraging AI to maintain market dominance, including limits on exclusive agreements and AI investments. Microsoft opposes Google’s requests for confidential AI deal details, citing irrelevance, while OpenAI may face pressure to disclose data in this context.

Resources

  • Large Language Model-Brained GUI Agents: A Survey. Provides an overview of LLM-powered GUI Agents, covering their techniques and applications.
  • A Survey on LLM-as-a-Judge. Offers an in-depth survey of the LLM-as-a-Judge paradigm, with a detailed exploration of strategies for developing reliable LLM-as-a-Judge systems.
  • TÜLU 3: Pushing Frontiers in Open Language Model Post-Training. Introduces a suite of fully open state-of-the-art post-trained models, along with their accompanying data, code, and training methodologies, providing a detailed guide to contemporary post-training techniques.
  • INTELLECT-1 Release: The First Globally Trained 10B Parameter Model. INTELLECT-1 is a 10B parameter model trained on 1 trillion tokens using globally distributed hardware. Its benchmarks are solid, and achieving an MFU of over 30% is remarkable considering the distributed training setup. If these results are validated, they represent a significant advancement in decentralized large-model training.
  • From Open Vocabulary to Open World: Teaching Vision Language Models to Detect Novel Objects. This framework advances object detection in open-world settings by enabling AI to recognize and learn from previously unseen objects.
  • HUPE: Heuristic Underwater Perceptual Enhancement with Semantic Collaborative Learning. HUPE is an AI-driven technique that enhances underwater image clarity while maintaining essential details for tasks such as object detection.
  • LTNtorch: PyTorch Implementation of Logic Tensor Networks. Logic Tensor Networks (LTN) combine deep learning with logical reasoning, enabling neural models to learn by optimizing a knowledge base constructed from logical formulas.
  • Programming Every Example: Lifting Pre-training Data Quality Like Experts at Scale. ProX is a framework that approaches data refinement as a programming task, enabling models to perform detailed operations on individual examples at scale. It enhances pre-training corpus quality by utilizing small language models to generate programs.
  • MMDuet. MMDuet introduces a unique “video-text duet” interaction format for VideoLLMs, enabling AI to deliver real-time responses as videos play. This method simulates a dialogue where users and AI can exchange messages during video playback.
  • Converting GPT to Llama. This repository contains code for converting a GPT implementation to Meta AI’s Llama.
  • DeMo training run. Nous is training a 15B-parameter distributed model using the DeMo optimizer, and the training run can be followed live.
  • Fine-Tune Models with LoRA-SB. LoRA-SB is a new method that brings full fine-tuning performance to low-rank adapters for large language models.
  • Making AI Datasets More Diverse. Researchers proposed a new approach, Diversity-driven EarlyLate Training (DELT), to enhance dataset distillation for large-scale tasks.
  • Using uv with PyTorch. Documentation on how to use the new package manager uv to install PyTorch.
  • Amazon Launches Nova. Amazon Nova unveils a series of multimodal models tailored for tasks such as document analysis, visual comprehension, and creative content generation. Prioritizing customization and efficiency, Nova models address various enterprise needs and excel in handling text, image, and video inputs.
  • Restructuring Vector Quantization with the Rotation Trick. Vector quantization typically relies on the straight-through estimator for gradients, whose direction can occasionally be inaccurate. This paper proposes rotating the gradients instead to correct them and improve codebook utilization (a sketch of one such construction follows this list).
  • Layout Generation with Diffusion GANs. DogLayout is a hybrid model integrating GANs with diffusion processes to address challenges in layout generation.
  • Hunyuan Video Model. Tencent’s state-of-the-art open video model stands out for its realistic motion and dual training as both a video and image generation model. This dual approach enhances the aesthetic quality of its output, making it comparable to image generation models like Flux.
  • Scene Text Recognition. TextSSR is a framework leveraging diffusion-based techniques to produce precise and realistic synthetic text images for scene text recognition.
  • T2Vid: Efficient Video Fine-tuning Scheme for MLLMs. T2Vid is a novel approach aimed at enhancing video comprehension in Multimodal Large Language Models (MLLMs). It creates video-like samples to diversify training instructions.
  • aisuite. aisuite offers a unified interface for seamless interaction with multiple LLM providers, enabling developers to test and compare outputs without modifying their code (a usage sketch follows this list).
  • Motion Prompting: Controlling Video Generation with Motion Trajectories. Motion Prompting is a technique for training video generation models using novel input types, including text, the first image frame, and a pixel tracking field. This enables innovative control during inference, allowing for new pixel fields (e.g., indicating an object moving in a different direction) to generate corresponding videos. While highly compelling, the method is not open source.
  • Remote Sensing Temporal Vision-Language Models: A Comprehensive Survey. This repository provides an extensive survey on the use of Vision-Language Models (VLMs) in remote sensing.
  • ImplicitPRM. Process reward models (PRMs) give detailed feedback by scoring reasoning step by step, unlike outcome reward models (ORMs), which evaluate only complete responses. Training PRMs, however, normally demands costly intermediate annotations. This paper shows that an implicit PRM can be obtained at no extra cost by training an ORM on response-level labels, with the reward parameterized as the log-likelihood ratio between policy and reference models, independent of the specific loss objective (a minimal sketch follows this list).
  • Unsloth — Dynamic 4-bit Quantization. The Unsloth team seeks to compress a 20GB language model into 5GB while maintaining accuracy. Although various algorithms attempt this, outliers and compressibility remain challenging. For Llama, known to be difficult to quantize, selectively skipping quantization of specific parameters significantly improves overall accuracy (an illustrative heuristic follows this list).
  • AccDiffusion v2: Tackling Repetitive Image Generation. AccDiffusion v2 enhances diffusion models for generating high-resolution images without requiring additional training, resolving issues such as object repetition and local distortions.
  • Optimizing AI Inference at Character.AI. Character AI features a robust inference pipeline. This post explores their implementation of int8 quantization and FlashAttention-3, offering valuable insights for those interested in scaling large language models.
  • Flow. Flow is a lightweight engine for creating flexible AI workflows using dynamic task scheduling and concurrent execution.
  • OpenAI o1 System Card. This report details the safety measures undertaken before releasing OpenAI o1 and o1-mini, including external red teaming and frontier risk assessments aligned with OpenAI’s Preparedness Framework.
  • PaliGemma 2: A Family of Versatile VLMs for Transfer. PaliGemma 2 is among the top Vision-Language Models (VLMs) available today, utilizing SigLIP and Gemma technologies.
  • ASANet: Asymmetric Semantic Aligning Network for RGB and SAR image land cover classification. The Asymmetric Semantic Aligning Network (ASANet) improves land cover classification using both SAR and RGB images.
  • AIM: Adaptive Inference of Multi-Modal LLMs via Token Merging and Pruning. Researchers have created a training-free method to enhance the efficiency of multimodal large language models (MLLMs) with minimal performance loss. Their technique reduces computational demands by up to sevenfold through strategic merging and pruning of visual data tokens.
  • Google DeepMind GraphCast and GenCast. DeepMind has open-sourced its GraphCast algorithm, which significantly outperforms other methods at localized weather prediction up to 36 hours ahead while running in a fraction of the time they require.
  • Anagram-MTL. Visual anagram generation using diffusion models: images that change appearance when flipped or rotated.
  • ScoreLiDAR. ScoreLiDAR is a new method that speeds up 3D LiDAR scene completion for autonomous vehicles.
  • New Fish Audio Model. Fish Audio 1.5 is currently ranked #2 on the Text-to-Speech Leaderboards, just behind ElevenLabs. It supports voice cloning and runs quickly, though the output quality can be inconsistent.
  • Deepthought-8B. Deepthought-8B is a small and capable reasoning model built on LLaMA-3.1 8B, designed to make AI reasoning more transparent and controllable. Despite its relatively small size, it achieves sophisticated reasoning capabilities that rival much larger models.
  • LLM-Brained GUI Agents. A collection of research papers and projects accompanying the survey “Large Language Model-Brained GUI Agents: A Survey.”
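For the rotation-trick item above, here is a minimal PyTorch sketch of one Householder-style construction that realizes the idea; it is an illustration under my own assumptions, not the authors’ code. The forward value equals the codebook vector q exactly, while gradients reach the encoder output e through a fixed rotation and rescale instead of the straight-through identity copy.

```python
import torch
import torch.nn.functional as F

def rotation_trick(e: torch.Tensor, q: torch.Tensor) -> torch.Tensor:
    """e: encoder output; q: nearest codebook vector (same shape, features on the last dim).
    Assumes e and q are nonzero and not antipodal."""
    e_hat = F.normalize(e, dim=-1)
    q_hat = F.normalize(q, dim=-1)
    # Householder direction bisecting e_hat and q_hat: the reflection 2 r r^T - I
    # maps e_hat onto q_hat. Detached, so autograd treats the map as a constant.
    r = F.normalize(e_hat + q_hat, dim=-1).detach()
    scale = (q.norm(dim=-1, keepdim=True) / e.norm(dim=-1, keepdim=True)).detach()
    rotated = 2.0 * r * (r * e).sum(dim=-1, keepdim=True) - e
    return scale * rotated  # numerically equal to q, but differentiable in e
```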
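aisuite’s interface mirrors the OpenAI client, with a "provider:model" prefix selecting the backend. A small usage sketch based on the project’s documented style; the model identifiers below are illustrative:

```python
# pip install aisuite  (plus the provider extras you need)
import aisuite as ai

client = ai.Client()
messages = [{"role": "user", "content": "In one sentence, what is vector quantization?"}]

# Same call, different providers: only the model string changes.
for model in ["openai:gpt-4o", "anthropic:claude-3-5-sonnet-20240620"]:
    response = client.chat.completions.create(model=model, messages=messages)
    print(model, "->", response.choices[0].message.content)
```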
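The implicit-PRM construction reduces to prefix sums of per-token log-likelihood ratios. A minimal sketch, assuming per-token log-probs for the sampled response have already been gathered and step boundaries are known (beta is an illustrative scale):

```python
import torch

def implicit_process_rewards(policy_logprobs: torch.Tensor,
                             ref_logprobs: torch.Tensor,
                             step_ends: torch.Tensor,
                             beta: float = 0.1) -> torch.Tensor:
    """policy_logprobs / ref_logprobs: (seq_len,) log-probs of each response token
    under the trained ORM's policy and the frozen reference model.
    step_ends: indices of the last token of each reasoning step.
    Returns one implicit reward per step: beta * cumulative log pi/pi_ref."""
    log_ratio = policy_logprobs - ref_logprobs   # per-token log-likelihood ratio
    prefix = torch.cumsum(log_ratio, dim=0)      # implicit reward of every prefix
    return beta * prefix[step_ends]              # read off at step boundaries
```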
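Finally, the dynamic-quantization idea can be illustrated with a toy heuristic: flag outlier-heavy weight tensors and keep them in higher precision while quantizing the rest. This is my own illustrative stand-in, not Unsloth’s actual criterion:

```python
import torch

def has_harmful_outliers(weight: torch.Tensor, z_thresh: float = 6.0,
                         max_frac: float = 1e-4) -> bool:
    # Flag tensors where "too many" values sit far from the mean; such outliers
    # stretch the quantization range and wreck 4-bit accuracy.
    z = (weight - weight.mean()) / (weight.std() + 1e-8)
    return (z.abs() > z_thresh).float().mean().item() > max_frac

def plan_selective_quantization(state_dict: dict) -> dict:
    # Keep outlier-heavy tensors in 16-bit; quantize everything else to 4-bit.
    return {name: ("fp16" if has_harmful_outliers(w) else "4bit")
            for name, w in state_dict.items() if w.is_floating_point()}
```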

Perspectives

  • AI expert Marietje Schaake: ‘The way we think about technology is shaped by the tech companies themselves’. The Dutch policy director and former MEP on the unprecedented reach of big tech, the need for confident governments, and why the election of Trump changes everything
  • If AI can provide a better diagnosis than a doctor, what’s the prognosis for medics? Studies in which ChatGPT outperformed scientists and GPs raise troubling questions for the future of professional work
  • Building LLMs is probably not going to be a brilliant business. LLM developers, including OpenAI, face major hurdles due to the industry’s structure, particularly NVIDIA’s dominance as a critical chip supplier and the intense price sensitivity and competition among buyers. While many AI companies secure significant funding, they often face profitability challenges, reminiscent of past tech firms like Netscape. Nonetheless, technology is likely to continue to progress. AI businesses may find success by focusing on leveraging existing models instead of creating new ones.
  • Rox: How to Manufacture Path Dependence in Applied AI. Rox aims to challenge incumbents like Salesforce by leveraging AI to manage unstructured data and integrate seamlessly with data warehouses. Its strategy focuses on enhancing the productivity of top sales performers through AI-powered agents while ensuring customer data security for future AI developments. This approach has attracted significant investor confidence, with Rox securing $50 million in funding from Sequoia Capital, GV, and General Catalyst across its seed and Series A rounds.
  • How close is AI to human-level intelligence? Large language models such as OpenAI’s o1 have electrified the debate over achieving artificial general intelligence, or AGI. But they are unlikely to reach this milestone on their own.
  • The race is on to make AI agents do your online shopping for you. Tech companies are creating AI shopping agents to automate online purchases, which could transform the retail industry. Perplexity’s model faces operational hurdles, while OpenAI, Google, and Amazon are also working on AI purchasing tools. These advancements aim to simplify shopping but raise concerns about privacy, retailer dynamics, and the future of online shopping.
  • Salesforce CEO Marc Benioff Has Thoughts on AI Agents, Automation, And The Future of Your Job. Salesforce CEO Marc Benioff foresees companies using AI agents to manage customer service and sales by utilizing their existing data and policies, with Salesforce serving as a central enabler of this change. He contends that AI-driven automation will boost productivity rather than replace jobs, enabling businesses to grow and operate more efficiently without adding human labor. Benioff emphasizes this transition as a pivotal moment in business evolution, offering a competitive advantage and transforming traditional workflows.
  • Reward Hacking in Reinforcement Learning. Lilian Weng has published an insightful blog post on the issue of Reward Hacking in language model alignment, a key challenge hindering the deployment of models in production environments.
  • Create JSONL dataset from API chat logs. A straightforward utility for building a JSONL dataset from messages exchanged between the user and the API (a minimal sketch follows this list).
  • The ChatGPT secret: is that text message from your friend, your lover — or a robot? People are turning to chatbots to solve all their life problems, and they like their answers. But are they on a very slippery slope?
  • A System of Agents brings Service-as-Software to life. AI is evolving software from a tool into autonomous agents capable of performing tasks traditionally handled by humans, representing a projected $4.6 trillion market opportunity. Advancements like LLMs and agents empower AI systems to handle unstructured data, make decisions, and operate independently in sectors such as sales and healthcare. The future of AI envisions Systems of Agents working collaboratively and learning from one another, akin to a highly skilled team delivering seamless services.
  • Over ½ of Long Posts on LinkedIn are Likely AI-Generated Since ChatGPT Launched. Since the launch of ChatGPT, LinkedIn has seen a 189% increase in AI-generated content, with more than half of long-form posts now likely AI-created.
  • AI’s computing gap: academics lack access to powerful chips needed for research. The survey highlights the disparity between academic and industry scientists’ access to computing power needed to train machine-learning models.
  • ’Brutal’ math test stumps AI but not human experts. The benchmark shows humans can still top machines — but for how much longer?
  • Finetuning LLM Judges for Evaluation. Evaluating LLMs is challenging due to their complex, open-ended outputs. While traditional human evaluation provides detailed insights, it is inefficient. Therefore, scalable assessments using automatic metrics and model-based approaches like LLM-as-a-Judge are essential. Innovations such as fine-tuned judges (e.g., Prometheus) and synthetic data generation are improving evaluation precision and adaptability across various tasks and domains.
  • The Gen AI Bridge to the Future. Generative AI is set to revolutionize wearable technology by creating on-demand UI interfaces that adapt to user needs and context.
  • Sam Altman Says Artificial General Intelligence Is on the Horizon. Speaking at The New York Times DealBook Summit, Sam Altman, the chief executive of OpenAI, said that the arrival of artificial general intelligence would “matter much less” to the average person than currently thought.
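For the JSONL-utility item above, a minimal sketch of the conversion, assuming the chat logs are already parsed into role/content message dicts:

```python
import json

def chat_logs_to_jsonl(conversations, path):
    """conversations: list of message lists, e.g.
    [[{"role": "user", "content": "hi"}, {"role": "assistant", "content": "hello"}], ...]
    Writes one {"messages": [...]} object per line, the format most chat
    fine-tuning APIs expect."""
    with open(path, "w", encoding="utf-8") as f:
        for messages in conversations:
            f.write(json.dumps({"messages": messages}, ensure_ascii=False) + "\n")
```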

Meme of the week

What do you think? Did any of this news capture your attention? Let me know in the comments.

If you have found this interesting:

You can look for my other articles, and you can also connect or reach me on LinkedIn. Check this repository containing weekly updated ML & AI news. I am open to collaborations and projects. You can also subscribe for free to get notified when I publish a new story.

Here is the link to my GitHub repository, where I am collecting code and many resources related to machine learning, artificial intelligence, and more.

or you may be interested in one of my recent articles:


Written by Salvatore Raieli

Senior data scientist | about science, machine learning, and AI. Top writer in Artificial Intelligence