WEEKLY AI NEWS: RESEARCH, NEWS, RESOURCES, AND PERSPECTIVES

AI & ML news: Week

Spotify’s AI Innovations in Music, Podcasts, and Recommendations. AI Model Identifies Brain Tumors in 10 Seconds, US Justice Department Pushes Google to Sell Chrome, Breakthrough Robot Performs Surgeries After Watching Videos, and much more

Salvatore Raieli
21 min read · Just now
Photo by Ian Maina on Unsplash

The most interesting news, repositories, articles, and resources of the week

Check and star this repository where the news will be collected and indexed:

You will find the news first on GitHub. All the Weekly News stories are also collected here:

Weekly AI and ML news - each week the best of the field (49 stories)

Research

  • Artificial Intelligence, Scientific Discovery, and Product Innovation. indicates that leading scientists use their expertise to focus on the most promising AI-generated suggestions, while others often expend considerable resources on false positives; shows that adopting AI technology for materials discovery boosts productivity, resulting in 44% more materials discovered, a 39% increase in patent filings, and 17% greater product innovation; notes that these improvements come with drawbacks, as 82% of scientists experienced lower job satisfaction, citing reduced creativity and underutilization of their skills.
  • Scaling Laws for Precision. presents “precision-aware” scaling laws that forecast how both training and inference precision impact LLM performance; key insights include: 1) post-training quantization becomes increasingly detrimental as models are trained on larger datasets, to the point where more pretraining may harm performance, 2) training with lower precision necessitates a larger model size to sustain performance levels, and 3) when optimizing model size, data, and precision together, the ideal training precision is around 7–8 bits, independent of compute availability; further notes that with fixed model size, the optimal precision for compute increases roughly logarithmically with data size; the authors confirm their predictions on models up to 1.7B parameters trained on up to 26B tokens, demonstrating that both very high (16-bit) and very low (under 4-bit) training precisions may be inefficient.
  • Sequence modeling and design from molecular to genome-scale with Evo. a 7B parameter AI model built to comprehend and generate DNA sequences across various biological scales; trained on 2.7 million prokaryotic and phage genomes, it can handle sequences up to 131 kilobases long while preserving single-nucleotide precision, allowing it to capture both molecular interactions and genome-wide patterns; Evo excels at predicting and generating functional DNA, RNA, and protein sequences, achieving the first experimentally validated AI-generated CRISPR-Cas complexes and transposable systems.
  • The Surprising Effectiveness of Test-Time Training for Abstract Reasoning. examines test-time training (TTT), where model parameters are temporarily updated during inference, to enhance an LLM’s abstract reasoning on the ARC benchmark; highlights three essential components: initial fine-tuning on related tasks, using auxiliary task formats and augmentations, and per-instance training; TTT yields substantial performance gains, with accuracy improvements of up to 6x over base fine-tuned models; applying TTT to an 8B LLM results in 53% accuracy on ARC’s public validation set, a nearly 25% increase over the previous state-of-the-art for neural approaches; combining their method with program generation techniques achieves a new public validation accuracy of 61.9%, on par with average human performance; the results indicate that explicit symbolic search is not the sole route to better abstract reasoning in LLMs, and that test-time training on few-shot examples can be highly effective (a minimal code sketch of the idea follows this list).
  • Toward Optimal Search and Retrieval for RAG. investigates the impact of retrieval on performance in RAG pipelines for QA tasks; performs experiments using BGE-base and ColBERT retrievers with LLaMA and Mistral, showing that incorporating more gold (relevant) documents enhances QA accuracy; observes that using approximate nearest neighbor search with lower recall has minimal performance impact while potentially boosting speed and memory efficiency; notes that introducing noisy or irrelevant documents consistently harms performance, refuting prior research claims; concludes that optimizing the retrieval of gold documents is essential for RAG effectiveness and that lower search accuracy can be a practical strategy (a toy retrieval comparison follows this list).
  • Rapid Response: Mitigating LLM Jailbreaks with a Few Examples. presents a novel approach for defending LLMs against jailbreak attacks, emphasizing the rapid adaptation of defenses upon detecting new attacks rather than striving for perfect initial adversarial robustness; using a new benchmark, the top-performing method — fine-tuning an input classifier — reduced attack success rates by over 240x for known attack types and 15x for new variations after observing just one example of each attack strategy; shows that swiftly responding to emerging jailbreaks can be an effective alternative to traditional static defenses.
  • Solving the Travelling Salesman Problem. This study highlights the often underestimated value of the “heatmap + Monte Carlo Tree Search (MCTS)” method, demonstrating that well-tuned, straightforward heatmaps can surpass more sophisticated models.
  • Graph-based AI model maps the future of innovation. MIT researchers created an AI model that employs generative knowledge extraction and graph reasoning to detect intricate patterns across domains such as biology and music. The model efficiently generates knowledge maps from scientific literature, uncovering connections and proposing novel materials inspired by art. This method boosts interdisciplinary research by uncovering hidden insights and fostering innovative concepts for material design.
  • Teaching Video Models to Understand Time Like a Story. This paper presents NumPro, an innovative approach designed to assist Video Large Language Models in managing Video Temporal Grounding tasks.
  • Generative World Explorer. The Generative World Explorer (Genex) is a system that simulates exploration of 3D spaces by generating imagined observations and leveraging those simulations to improve planning. It employs an ST-VAE and a diffusion pass for its imagination process, leading to better planning outcomes.
  • Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering. proposes verifier engineering, a post-training paradigm for foundation models built around three stages: searching for candidate responses, verifying them with automated verifiers, and feeding the verification signal back to improve the model.
  • OneNet: A Channel-Wise 1D Convolutional U-Net. OneNet is a channel-wise 1D convolutional U-Net optimized for efficient image segmentation, making it well-suited for edge devices.
  • AI’s math problem: FrontierMath benchmark shows how far technology still has to go. Artificial intelligence systems may be good at generating text, recognizing images, and even solving basic math problems — but when it comes to advanced mathematical reasoning, they are hitting a wall. A groundbreaking new benchmark, FrontierMath, exposes just how far today’s AI is from mastering the complexities of higher mathematics.
  • Enhancing Reasoning Capabilities of LLMs via Principled Synthetic Logic Corpus. Researchers have proposed Additional Logic Training to enhance reasoning in LLMs, focusing on teaching them to manage complex deductions involving varied rules and distractions.
  • Solving Cold Starts in Adaptive Testing. The “cold start” issue in adaptive testing arises when initial questions fail to align with examinees’ abilities. Researchers have addressed this with the Diffusion Cognitive States Transfer Framework (DCSR), which employs diffusion models to utilize prior learning data across domains.
  • samurai. Tracking a consistent object over an extended period is a challenging task. This work enhances SAM 2 by integrating motion-aware memory banks, ensuring consistency over time and through occlusions. It stands out as one of the most effective visual tracking systems developed so far.
  • Compress and Reconstruct Images. PCNet is a new compact network for image-compressed sensing. It reduces sampling costs while delivering high-quality reconstructions.
  • LMM-driven Semantic Image-Text Coding for Ultra Low-bitrate Learned Image Compression. Large multi-modal models can generate captions and compress images simultaneously within a single system.
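
As a concrete picture of the test-time training recipe summarized above, here is a minimal, hypothetical PyTorch sketch: it fine-tunes a temporary copy of a model on augmented versions of one task’s few-shot demonstrations, predicts on the test input, and then discards the adapted weights. The names model, demos, augment, and loss_fn are placeholders, not the paper’s actual code.

```python
import copy
import torch

def test_time_train_and_predict(model, demos, test_input,
                                augment, loss_fn, steps=20, lr=1e-4):
    """Minimal test-time training (TTT) sketch.

    demos: list of (input, target) few-shot examples for ONE task instance.
    augment: function producing an augmented (input, target) pair from the
             demos (e.g. permutations / re-colorings for ARC-style tasks).
    The adapted weights are thrown away after prediction, so the base
    model is untouched for the next task.
    """
    adapted = copy.deepcopy(model)          # per-instance copy of the model
    adapted.train()
    opt = torch.optim.AdamW(adapted.parameters(), lr=lr)

    for _ in range(steps):
        x, y = augment(demos)               # sample an augmented training pair
        opt.zero_grad()
        loss = loss_fn(adapted(x), y)
        loss.backward()
        opt.step()

    adapted.eval()
    with torch.no_grad():
        prediction = adapted(test_input)
    return prediction                        # base `model` is left unchanged
```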
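
The retrieval finding in the RAG item above (approximate nearest-neighbor search with lower recall barely hurts accuracy) can be illustrated with FAISS. The toy comparison below measures the recall of an IVF approximate index against exact search on random embeddings; the dimensions, index type, and nprobe value are illustrative assumptions, not the paper’s configuration.

```python
import numpy as np
import faiss

d, n, k = 384, 20_000, 5                       # embedding dim, corpus size, top-k
rng = np.random.default_rng(0)
corpus = rng.standard_normal((n, d)).astype("float32")
queries = rng.standard_normal((100, d)).astype("float32")

# Exact search: full scan, perfect recall.
exact = faiss.IndexFlatL2(d)
exact.add(corpus)
_, gold = exact.search(queries, k)

# Approximate search: inverted-file index, recall tuned via nprobe.
quantizer = faiss.IndexFlatL2(d)
approx = faiss.IndexIVFFlat(quantizer, d, 256)
approx.train(corpus)
approx.add(corpus)
approx.nprobe = 8                              # more probes -> higher recall, slower
_, found = approx.search(queries, k)

recall = np.mean([len(set(g) & set(f)) / k for g, f in zip(gold, found)])
print(f"recall@{k} of the approximate index: {recall:.2f}")
```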

News

Resources

  • OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models. introduces OpenCoder, a completely open-source LLM tailored for code generation and comprehension; the authors highlight key elements for creating top-performing code LLMs: (1) rigorous data cleaning using code-specific heuristic rules for deduplication, (2) effective recall of related text corpus for code context, and (3) high-quality synthetic data utilized in both annealing and supervised fine-tuning phases; OpenCoder outperforms previous open models at the 6B+ parameter level and provides not only the model weights but also the full training pipeline, datasets, and protocols to support reproducible research.
  • A Taxonomy of AgentOps for Enabling Observability of Foundation Model-based Agents. examines AgentOps platforms and tools, emphasizing the necessity of robust observability and traceability features to maintain reliability in foundation model-based autonomous agent systems throughout their development and production lifecycle.
  • Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models. presents Mixture-of-Transformers (MoT), a novel sparse multi-modal transformer architecture that achieves performance comparable to traditional models while using nearly half the computational resources for text and image tasks; MoT matches the performance of a dense baseline while utilizing only 55.8% of the FLOPs.
  • HtmlRAG: HTML is Better Than Plain Text for Modeling Retrieved Knowledge in RAG Systems. introduces a novel approach that uses HTML instead of plain text for constructing RAG systems; the core insight is that preserving HTML structure retains richer semantic and structural information compared to plain text conversion, which often loses critical formatting like headings, tables, and semantic tags; to handle the challenge of long HTML documents exceeding LLM context windows, the authors design a two-step pruning method: first, cleaning unnecessary HTML elements to cut length by 94%, and then applying a block-tree-based pruning approach that integrates embedding-based and generative pruning to retain essential content; experiments on six QA datasets show that HtmlRAG surpasses existing plain-text methods, confirming the benefits of maintaining HTML structure in RAG systems (a simplified sketch of the first cleaning step follows this list).
  • LLaMA-Mesh: Unifying 3D Mesh Generation with Language Models. NVIDIA has developed LLaMA-Mesh, a method that fine-tunes the LLaMA language model to generate 3D meshes from text prompts. By training LLaMA on a curated dataset of 3D dialogues, LLaMA-Mesh enables the model to represent and generate 3D mesh data in plain text format, integrating 3D mesh generation with language understanding.
  • Your Fixed Watermark is Fragile: Towards Semantic-Aware Watermark for EaaS Copyright Protection. Researchers have introduced the Semantic Perturbation Attack (SPA) to exploit vulnerabilities in current watermarking schemes for Embedding-as-a-Service (EaaS) systems. Traditional watermarking methods often inject fixed signals into embeddings, regardless of the input’s semantics, making them susceptible to adaptive attacks. SPA leverages semantic perturbations to identify and bypass these static watermark signals, effectively compromising watermark verification.
  • Don’t Look Twice: Faster Video Transformers with Run-Length Tokenization. By dropping video tokens that remain unchanged across frames and recording their run lengths, inference can be significantly accelerated without sacrificing performance or requiring extra training (a simplified sketch follows this list).
  • Region-Aware Text-to-Image Generation via Hard Binding and Soft Refinement. A technique for generating images with improved control over user-chosen regions.
  • Accurate Image Matching. MOP+MiHo+NCC is a non-deep, modular method for improving image matches using a combination of three techniques. Multiple Overlapping Planes (MOP) clusters inlier matches and uses RANSAC to remove outliers. Middle Homography (MiHo) minimizes distortion during planar reprojection. Normalized Cross Correlation (NCC) refines keypoint positions after transformation (a minimal NCC sketch follows this list).
  • The Beginner’s Guide to Visual Prompt Injections. Visual prompt injections present security threats to LLMs like GPT-4V by embedding harmful instructions within images, potentially causing unintended model behavior. These vulnerabilities can manipulate outputs, for instance, by causing the model to overlook certain individuals in images or misrepresent described contexts. With the increasing adoption of generative AI, companies must implement strong security measures to address these risks.
  • PyGen: Turning Your Ideas into Python Package. PyGen simplifies the process of turning your ideas into software, making coding more accessible and enjoyable. Leveraging advanced language models, PyGen acts like a tech-savvy assistant, transforming abstract concepts into complete Python tools, including testing and documentation.
  • UltraVox Audio Language Models. A suite of open-weight models that can take text and audio as input modalities.
  • Pixtral large. Pixtral Large is a 124B open-weight multimodal model built upon Mistral Large 2. As the second model in this multimodal series, it showcases advanced image comprehension, capable of interpreting documents, charts, and natural images, while retaining the top-tier text understanding of Mistral Large 2.
  • LLaVA-o1: Let Vision Language Models Reason Step-by-Step. Although this isn’t an exact replication of the training process used for o1, it remains a robust VLM trained on reasoning traces.
  • CLIP for Semantic Segmentation. Although CLIP has excelled in open-vocabulary tasks, it faces challenges in semantic segmentation due to noisy features and limited resolution. Trident tackles the resolution problem with a training-free framework, integrating CLIP and DINO features from sub-images and employing SAM’s encoder for global feature aggregation.
  • Confidence-aware Denoised Fine-tuning of Off-the-shelf Models for Certified Robustness. This work focuses on improving the certified robustness of smoothed classifiers by fine-tuning off-the-shelf models.
  • ReCapture: Generative Video Camera Controls for User-Provided Videos using Masked Video Fine-Tuning. This paper from Google demonstrates a method for altering the camera viewpoint of an existing video.
  • Evaluating-Constitutions. Code to assist in evaluating constitutions based on human feedback.
  • StableV2V: Stabilizing Shape Consistency in Video-to-Video Editing. StableV2V is a novel video editing framework that maintains shape consistency across frames, even when user prompts require significant transformations. This method ensures smooth and precise modifications throughout the video, preserving structural integrity.
  • CCExpert: Advancing MLLM Capability in Remote Sensing Change Captioning with Difference-Aware Integration and a Foundational Dataset. CCExpert is an AI model developed to describe changes in images using natural language. It can identify what has changed, where the change occurred, and how it happened.
  • SAM Decoding: Speculative Decoding via Suffix Automaton. SAM-Decoding offers a faster method for text generation in LLMs by utilizing a suffix automaton to create drafts efficiently and accurately.
  • That Chip Has Sailed: A Critique of Unfounded Skepticism Around AI for Chip Design. DeepMind has issued a robust defense of its AlphaChip project, which has faced criticism from some academic circles despite widespread industry adoption. In a recent paper titled “That Chip Has Sailed: A Critique of Unfounded Skepticism Around AI for Chip Design,” DeepMind addresses these critiques, emphasizing AlphaChip’s significant contributions to chip design. The paper highlights AlphaChip’s role in creating superhuman chip layouts for Google’s Tensor Processing Units (TPUs) and its influence on the hardware used globally.
  • PoM: Efficient Image and Video Generation with the Polynomial Mixer. Polynomial Mixer offers a faster and more memory-efficient alternative to Multi-Head Attention (MHA) in diffusion models used for image and video generation.
  • Cross-View Geo-Localization. Researchers have created a framework to address the challenges of cross-view geo-localization, including variations in viewpoints and large-scale global contexts.
  • A statistical approach to model evaluations. When two models are evaluated on a benchmark, declaring one superior to the other is often done without strong statistical confidence. This research from Anthropic introduces robust statistical methods to reliably determine when one model genuinely outperforms the other (a worked example follows this list).
  • Software is a team sport. GitHub Copilot, utilized by over 2.8 million developers, enhances the development experience with AI-powered features such as code completion, debugging, and secure code reviews. Developers can select AI models from providers like OpenAI and Google within Visual Studio Code. Integration with Azure and tools like GitHub Actions streamlines cloud deployments and continuous integration/continuous deployment (CI/CD) processes.
  • Prompt Injecting Your Way To Shell: OpenAI’s Containerized ChatGPT Environment. This article examines the interactive features of OpenAI’s Debian-based sandbox environment for ChatGPT, revealing surprising details about its structure. Users can run Python scripts, manage files, and possibly expose core instructions through prompt engineering. These capabilities have sparked debates around transparency and privacy. While designed as intentional features, OpenAI does not consider them security vulnerabilities unless they result in breaches of the sandbox environment.
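
For the HtmlRAG item above, the first pruning step (stripping HTML that adds length but no retrievable content) can be approximated in a few lines with BeautifulSoup. This is a hedged simplification of that cleaning step only; the block-tree pruning is not shown.

```python
from bs4 import BeautifulSoup, Comment

def clean_html(raw_html: str) -> str:
    """Drop elements that add length but no retrievable content, while keeping
    the structural tags (headings, tables, lists) that HtmlRAG argues are worth
    preserving."""
    soup = BeautifulSoup(raw_html, "html.parser")

    # Remove non-content elements entirely.
    for tag in soup(["script", "style", "noscript", "svg", "iframe"]):
        tag.decompose()

    # Remove HTML comments.
    for comment in soup.find_all(string=lambda s: isinstance(s, Comment)):
        comment.extract()

    # Drop attributes (classes, inline styles, tracking ids) but keep the tags.
    for tag in soup.find_all(True):
        tag.attrs = {}

    return str(soup)
```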
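
Run-Length Tokenization (the “Don’t Look Twice” item above) can be sketched as: compare the patch tokens of consecutive frames, keep only the ones that changed, and record how many frames each kept token stands in for. The version below is a simplified, hypothetical take that uses a plain mean-squared-difference threshold tau rather than the paper’s exact criterion.

```python
import torch
import torch.nn.functional as F

def run_length_tokenize(frames: torch.Tensor, patch: int = 16, tau: float = 1e-3):
    """frames: (T, C, H, W) video clip.

    Returns the kept patch tokens and a matching tensor of run lengths
    (how many consecutive frames each kept token represents). Simplified rule:
    a patch is 'unchanged' if its mean squared difference from the same patch
    in the previous frame is below tau.
    """
    T = frames.shape[0]
    # (T, num_patches, C*patch*patch)
    tokens = F.unfold(frames, patch, stride=patch).transpose(1, 2)
    num_patches = tokens.shape[1]

    kept, lengths = [], []
    active = [None] * num_patches          # index into `kept` of each patch's live token
    for t in range(T):
        for p in range(num_patches):
            unchanged = (
                t > 0
                and (tokens[t, p] - tokens[t - 1, p]).pow(2).mean() < tau
            )
            if unchanged:
                lengths[active[p]] += 1    # extend the run of the existing token
            else:
                kept.append(tokens[t, p])  # emit a new token for this patch
                lengths.append(1)
                active[p] = len(kept) - 1
    return torch.stack(kept), torch.tensor(lengths)
```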
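
The NCC stage of the image-matching item above relies on plain normalized cross-correlation. The helpers below show the textbook computation and a small search window for nudging a keypoint to the best-scoring offset; this is a generic illustration, and refine_keypoint is a hypothetical helper, not the authors’ code.

```python
import numpy as np

def ncc(patch_a: np.ndarray, patch_b: np.ndarray) -> float:
    """Normalized cross-correlation between two equally sized grayscale patches.
    Returns a value in [-1, 1]; higher means a better photometric match."""
    a = patch_a.astype(np.float64).ravel()
    b = patch_b.astype(np.float64).ravel()
    a -= a.mean()
    b -= b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0

def refine_keypoint(img_a, img_b, kp_a, kp_b, half=7, search=3):
    """Shift kp_b within a small window to maximize NCC against the patch
    around kp_a. Assumes integer keypoints lying at least half + search
    pixels away from the image borders."""
    ya, xa = kp_a
    ref = img_a[ya - half:ya + half + 1, xa - half:xa + half + 1]
    best, best_kp = -1.0, kp_b
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            yb, xb = kp_b[0] + dy, kp_b[1] + dx
            cand = img_b[yb - half:yb + half + 1, xb - half:xb + half + 1]
            if cand.shape == ref.shape:
                score = ncc(ref, cand)
                if score > best:
                    best, best_kp = score, (yb, xb)
    return best_kp, best
```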
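
For the Anthropic evaluation item above, a minimal version of the idea is a paired analysis of per-question score differences. The snippet computes the accuracy gap between two models on the same questions together with a 95% confidence interval; it is a simplified stand-in for the post’s fuller statistical treatment.

```python
import numpy as np
from scipy import stats

def compare_models(scores_a: np.ndarray, scores_b: np.ndarray, alpha: float = 0.05):
    """scores_a, scores_b: per-question scores (e.g. 0/1 correctness) of two models
    on the SAME benchmark questions. Returns the mean gap and its confidence interval."""
    diffs = scores_a - scores_b                   # paired differences, one per question
    mean_gap = diffs.mean()
    se = diffs.std(ddof=1) / np.sqrt(len(diffs))  # standard error of the mean gap
    z = stats.norm.ppf(1 - alpha / 2)
    return mean_gap, (mean_gap - z * se, mean_gap + z * se)

# Toy usage: 1000 questions, model A correct ~78% of the time, model B ~75%.
rng = np.random.default_rng(0)
a = rng.binomial(1, 0.78, 1000).astype(float)
b = rng.binomial(1, 0.75, 1000).astype(float)
gap, (lo, hi) = compare_models(a, b)
print(f"accuracy gap = {gap:.3f}, 95% CI = [{lo:.3f}, {hi:.3f}]")
```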

Perspectives

Meme of the week

What do you think about it? Did any news capture your attention? Let me know in the comments.

If you have found this interesting:

You can look for my other articles, and you can also connect with or reach me on LinkedIn. Check this repository containing weekly updated ML & AI news. I am open to collaborations and projects, and you can also subscribe for free to get notified when I publish a new story.

Here is the link to my GitHub repository, where I am collecting code and many resources related to machine learning, artificial intelligence, and more.

or you may be interested in one of my recent articles:


Written by Salvatore Raieli

Senior data scientist | about science, machine learning, and AI. Top writer in Artificial Intelligence
