WEEKLY AI NEWS: RESEARCH, NEWS, RESOURCES, AND PERSPECTIVES

AI & ML news: Week 14–20 October

Microsoft Patents Audio-to-Image Generator, Google’s World-First Nuclear Power Deal for AI Data Centers, AMD Launches AI Chip to Compete with Nvidia, and much more

Salvatore Raieli
Photo by Johann Walter Bantz on Unsplash

The most interesting news, repositories, articles, and resources of the week

Check and star this repository where the news will be collected and indexed:

You will find the news first on GitHub. All the Weekly News stories are also collected here:

Weekly AI and ML news - each week the best of the field


Research

  • Astute RAG: Overcoming Imperfect Retrieval Augmentation and Knowledge Conflicts for Large Language Models. Introduces a novel RAG method to address the challenges of imperfect retrieval augmentation and knowledge conflicts in LLMs. Astute RAG adaptively extracts critical information from the internal knowledge of LLMs, then iteratively merges it with external knowledge while maintaining source awareness. Its iterative consolidation mechanism enhances the integration of internal and external information by identifying consistent passages, detecting conflicting data, and filtering out irrelevant content.
  • ToolGen: Unified Tool Retrieval and Calling via Generation. Incorporates tool knowledge directly into LLMs by encoding tools as unique tokens, allowing the model to generate tool calls and arguments, facilitating smooth tool invocation alongside natural language generation. Experiments involving over 47,000 tools demonstrate that ToolGen outperforms existing baselines in both tool retrieval and autonomous task execution.
  • Long-Context LLMs Meet RAG: Overcoming Challenges for Long Inputs in RAG. Finds that in many long-context LLMs, output quality diminishes as the number of passages increases, with the performance decline attributed to retrieved hard negatives. The authors propose two methods to enhance long-context LLM-based RAG: retrieval reordering and RAG-specific tuning with intermediate reasoning to improve relevance identification. These approaches show marked improvements in both accuracy and robustness in long-context RAG performance. A minimal sketch of the retrieval-reordering idea appears after this list.
  • GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models. Evaluates several state-of-the-art (SoTA) models using a benchmark built with symbolic templates that allow for a range of mathematical problems. The results show that LLMs display variability when answering different versions of the same questions, and their performance drops when numerical values in the questions are adjusted. As the complexity of the questions increases (e.g., adding more clauses), performance deteriorates significantly. The authors suggest that this decline in performance is likely due to a lack of logical reasoning capabilities in current LLMs.
  • Addition is All You Need for Energy-efficient Language Models. Introduces an algorithm that approximates floating-point multiplication using integer addition operations, making it computationally less intensive than 8-bit floating-point arithmetic while achieving higher precision. The authors report that implementing the proposed L-Mul operation in tensor processing hardware could potentially reduce energy consumption by 95% for elementwise floating-point tensor multiplications and by 80% for dot product operations. A toy illustration of the L-Mul idea appears after this list.
  • I Want to Break Free! Anti-Social Behavior and Persuasion Ability of LLMs in Multi-Agent Settings with Social Hierarchy. Examines the interaction patterns of LLMs within a multi-agent setting involving a social hierarchy, specifically in a scenario where a guard and a prisoner interact, with the prisoner either seeking extra yard time or attempting to escape. The study finds that when power dynamics are present, LLMs struggle to maintain coherent conversations. Additionally, the authors highlight that agents’ personas significantly influence their behaviors. Interestingly, even without explicit prompting, merely assigning roles to agents resulted in the emergence of anti-social behaviors.
  • Were RNNs All We Needed? The paper revisits RNNs and demonstrates that removing the hidden-state dependence from the input, forget, and update gates allows for efficient parallel training, eliminating the backpropagation through time (BPTT) that architectures like LSTMs and GRUs rely on. The authors introduce new minimal variants, minLSTM and minGRU, which are about 175 times faster to train for sequences of length 512. A sketch of a single minGRU step appears after this list.
  • LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations. The study finds that “truthfulness” information in LLMs is concentrated in specific tokens, offering a way to improve error detection and address related challenges. They also suggest that the internal representations of LLMs can be used to predict the types of errors these models are prone to making.
  • Archon: An Architecture Search Framework for Inference-Time Techniques. The paper presents a modular framework for constructing and optimizing LLM systems by integrating various inference-time techniques. This approach redefines the task of LLM system design as a hyperparameter optimization problem. Tested on benchmarks like MT-Bench and CodeContests, the framework, named Archon, outperforms top models such as GPT-4o and Claude 3.5 Sonnet, achieving a 15.1% average accuracy improvement.
  • RATIONALYST: Pre-training Process-Supervision for Improving Reasoning. RATIONALYST is a model designed for process-supervision of reasoning, enabling it to generalize across a wide range of reasoning tasks. This is accomplished by pre-training on a dataset of 79k rationales from the Pile and a variety of reasoning datasets, with minimal human involvement. Fine-tuned from LLaMa-3-8B, the model achieves a 3.9% average accuracy improvement across seven reasoning benchmarks.
  • Fact, Fetch, and Reason: A Unified Evaluation of Retrieval-Augmented Generation. The paper introduces a unified framework to evaluate an LLM’s capability to provide factual responses, assess retrieval skills, and reason through the generation of final answers. The framework includes multi-hop questions that require combining information from multiple sources. It reports that state-of-the-art LLMs struggle with this task, achieving only 40% accuracy without retrieval. However, the proposed multi-step retrieval method improves performance to 66% accuracy.
  • Not All LLM Reasoners Are Created Equal. Examines how well LLMs handle chained grade-school math problems, where the answer to the first question is needed to solve the second. Most models exhibit a significant reasoning gap: accuracy on these compositional pairs is notably lower than what their performance on each question in isolation would suggest, and the gap is largest for smaller, more cost-efficient, and math-specialized models.
  • Rejection Sampling IMLE: Designing Priors for Better Few-Shot Image Synthesis. Training generative models like GANs with limited data is challenging. Existing Implicit Maximum Likelihood Estimation (IMLE) methods suffer from poor alignment between the latent codes used during training and those used during inference. The proposed approach, RS-IMLE, modifies the prior distribution during training, resulting in better test-time performance and higher-quality image generation.
  • Simplifying, Stabilizing and Scaling Continuous-Time Consistency Models. This study introduces a unified framework aimed at enhancing training stability in continuous-time consistency models, leading to substantial improvements in the performance of generative models.
  • DARNet: Dual Attention Refinement Network with Spatiotemporal Construction for Auditory Attention Detection. DARNet is an innovative model for auditory attention detection (AAD) that improves the decoding of brain signals, such as EEG, by integrating spatiotemporal and dual attention mechanisms.
  • DuoAttention: Efficient Long-Context LLM Inference with Retrieval and Streaming Heads. DuoAttention is a framework designed to optimize memory usage and reduce latency in long-context LLMs by selectively applying full key-value (KV) caching to only the most essential attention heads. A sketch of the per-head cache policy appears after this list.
  • Meta-DT: Offline Meta-RL as Conditional Sequence Modeling with World Model Disentanglement. Meta Decision Transformer (Meta-DT) aims to enhance generalization in reinforcement learning by integrating transformer-based sequential modeling with effective task representation learning.
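
To make a few of the mechanisms above concrete, here are some minimal, illustrative sketches in Python. First, the retrieval-reordering idea from the long-context RAG paper: push the highest-scoring retrieved passages toward the two ends of the context, where long-context models tend to attend most reliably. The interleaving scheme below is an assumption about the spirit of the method, not the paper’s exact recipe.

```python
def reorder_passages(passages_with_scores):
    """Place the best-scoring retrieved passages at the two ends of the
    context and the weakest ones in the middle (a sketch of retrieval
    reordering; the paper's exact procedure may differ)."""
    ranked = sorted(passages_with_scores, key=lambda p: p[1], reverse=True)
    front, back = [], []
    for i, (text, _) in enumerate(ranked):
        (front if i % 2 == 0 else back).append(text)
    return front + back[::-1]  # best passage first, second-best last

print(reorder_passages([("p1", 0.9), ("p2", 0.7), ("p3", 0.5), ("p4", 0.2)]))
# ['p1', 'p3', 'p4', 'p2']
```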
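
Next, the L-Mul idea from “Addition is All You Need”: writing each operand as (1 + x) times a power of two, the exact product mantissa is 1 + xa + xb + xa·xb, and L-Mul replaces the costly xa·xb term with a small constant offset so only additions remain. The sketch below works on ordinary Python floats rather than 8-bit hardware formats, and the fixed 2^-4 correction is a simplification of the paper’s offset term.

```python
import math

def lmul(a: float, b: float, offset_exp: int = 4) -> float:
    """Approximate a * b by adding mantissas instead of multiplying them,
    plus a constant correction term (illustrative only; the paper targets
    low-bit hardware formats and integer adders)."""
    if a == 0.0 or b == 0.0:
        return 0.0
    sign = math.copysign(1.0, a) * math.copysign(1.0, b)
    ma, ea = math.frexp(abs(a))        # abs(a) = ma * 2**ea, ma in [0.5, 1)
    mb, eb = math.frexp(abs(b))
    xa, xb = 2 * ma - 1, 2 * mb - 1    # rewrite as (1 + x) * 2**(e - 1)
    exp = (ea - 1) + (eb - 1)
    # Exact mantissa: 1 + xa + xb + xa*xb; L-Mul drops xa*xb for a constant.
    approx = 1 + xa + xb + 2.0 ** -offset_exp
    return sign * approx * 2.0 ** exp

print(lmul(3.7, -2.4), 3.7 * -2.4)     # approximately -8.45 vs. the exact -8.88
```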
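
The minGRU variant from “Were RNNs All We Needed?” keeps the gate and candidate state as functions of the current input only, so each step is an affine recurrence in the previous hidden state and training can use a parallel scan instead of BPTT. One recurrent step, sketched from that description:

```python
import torch

def min_gru_step(x_t, h_prev, W_z, W_h):
    """One minGRU-style step: the update gate z_t and candidate h_tilde depend
    only on x_t, never on h_prev, which is what enables parallel training."""
    z_t = torch.sigmoid(x_t @ W_z)   # update gate, input-only
    h_tilde = x_t @ W_h              # candidate state, input-only
    return (1 - z_t) * h_prev + z_t * h_tilde

# Sequential rollout for illustration; training would use a parallel scan.
d_in, d_h, T = 8, 16, 5
W_z, W_h = torch.randn(d_in, d_h), torch.randn(d_in, d_h)
h = torch.zeros(d_h)
for x_t in torch.randn(T, d_in):
    h = min_gru_step(x_t, h, W_z, W_h)
```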
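
Finally, DuoAttention’s memory savings come from giving only “retrieval” heads a full KV cache while “streaming” heads keep just a few initial sink tokens plus a recent window. Which heads fall into which group is learned in the paper; the snippet below only illustrates the cache-size asymmetry.

```python
def streaming_kv_indices(seq_len: int, sink: int = 4, recent: int = 256) -> list:
    """KV positions a streaming head retains: the initial 'sink' tokens plus
    the most recent window. A retrieval head would keep all range(seq_len)."""
    kept = set(range(min(sink, seq_len))) | set(range(max(0, seq_len - recent), seq_len))
    return sorted(kept)

print(len(streaming_kv_indices(10_000)))  # 260 cached positions instead of 10,000
```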

News

Resources

  • MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering. Introduces a new benchmark to assess machine learning agents’ proficiency in machine learning engineering tasks. The benchmark consists of 75 Kaggle competitions focused on key MLE skills, including model training, dataset preparation, and experiment execution. OpenAI’s o1-preview model, utilizing the AIDE scaffolding, reaches a bronze-medal level in 16.9% of the competitions.
  • Optima: Optimizing Effectiveness and Efficiency for LLM-Based Multi-Agent System. Presents a novel framework aimed at improving both communication efficiency and task effectiveness in LLM-based multi-agent systems through targeted LLM training. It introduces an iterative “generate, rank, select, and train” approach, enhanced by a reward function to optimize performance, token usage, and communication efficiency. The framework integrates Monte Carlo Tree Search-inspired techniques for DPO data generation, promoting diverse exploration. Experimental results show consistent improvements over single-agent baselines and standard multi-agent systems (MAS) using Llama 3 8B, achieving a 2.8x performance boost while utilizing fewer than 10% of tokens on tasks involving extensive information exchange.
  • Zyphra’s Mamba 2-based model beats Mistral. Introduces the first state-space-style model that surpasses transformers at the 7B scale. It excels at understanding and generating long-context data, thanks to the linear-time scaling of the Mamba 2 blocks, which significantly enhances its efficiency and performance.
  • OpenAI’s Swarm. OpenAI has introduced a lightweight framework designed to facilitate communication between agents. While it will not receive further updates, the framework could still offer valuable ideas and inspiration for future developments.
  • EvolveDirector: Approaching Advanced Text-to-Image Generation with Large Vision-Language Models. EvolveDirector aims to develop a competitive text-to-image generation model using open, publicly available resources, avoiding the limitations imposed by proprietary models.
  • Rethinking the Evaluation of Visible and Infrared Image Fusion. Researchers propose the Segmentation-oriented Evaluation Approach (SEA) to improve the evaluation of Visible and Infrared Image Fusion (VIF) techniques, which play a critical role in applications such as object detection and semantic segmentation.
  • A Gentle Introduction and Tutorial on Deep Generative Models in Transportation Research. Provides a comprehensive overview of how deep generative models can be applied to solve transportation problems.
  • Trans4D: Realistic Geometry-Aware Transition for Compositional Text-to-4D Synthesis. Trans4D is a new framework developed to address the challenges of realistic 4D scene transitions, enhancing text-to-4D synthesis. It offers improved capabilities in generating coherent, dynamic 4D scenes from textual descriptions, making it more suitable for tasks that require accurate spatial and temporal scene transitions.
  • DocMTAgent. DelTA, short for Document-levEL Translation Agent, is an online translation tool designed for handling document-level translations. It leverages a multi-level memory architecture to improve translation accuracy and coherence across larger texts, providing more context-aware translations compared to sentence-level models.
  • Fast Feedforward 3D Gaussian Splatting Compression. Fast Compression of 3D Gaussian Splatting (FCGS) is a new model designed to eliminate the need for the slow, per-scene optimization required by earlier methods. Instead, FCGS achieves rapid compression using a quick feed-forward pass, reducing the processing time from minutes to just seconds. This significantly accelerates the compression process while maintaining high-quality results for 3D data.
  • OneRef: Unified One-tower Expression Grounding and Segmentation with Mask Referring Modeling. OneRef presents an optimized framework for referring segmentation by integrating visual and language feature spaces within a unified transformer architecture.
  • SmartPretrain: Model-Agnostic and Dataset-Agnostic Representation Learning for Motion Prediction. SmartPretrain offers a versatile, model-agnostic, and dataset-agnostic self-supervised learning framework designed to enhance motion prediction in autonomous vehicles.
  • UvA — An Introduction to Group Equivariant Deep Learning. Resources for studying deep learning techniques applied to specific types of geometric data while addressing architectural limitations.
  • Diffusion model simulating CS:GO. An open-source replication of a diffusion model that generates visual simulations of a video game, using keyboard and mouse inputs to influence the output.
  • Reward-Augmented Data Enhances Direct Preference Alignment of LLMs. This study addresses the shortcomings of current alignment algorithms in large language models (LLMs), which tend to overfit to relative preferences and neglect response quality. The authors introduce reward-conditioned LLM policies and a novel data relabeling method that incorporates response quality, enabling the model to better generalize to optimal responses. A toy relabeling sketch appears after this list.
  • entropix. Entropix is a tool designed to modify the sampling behavior of language models. A toy entropy-aware sampler is sketched after this list.
  • LoLCATs Blog Part 2: How to Linearize LLMs for Me and You. Hazy Research has published another insightful post that delves into techniques for linearizing existing language models while maintaining much of their performance. This exploration highlights methods to simplify model architectures, making them more efficient, without significantly compromising their effectiveness in tasks like text generation and understanding.
  • TextCtrl: Diffusion-based Scene Text Editing with Prior Guidance Control. TextCtrl is a newly introduced diffusion-based method designed to enhance scene text editing. It achieves a balance between maintaining content accuracy and preserving the original style, ensuring that both the textual content and the visual appearance remain consistent during edits.
  • Generalizable Humanoid Manipulation with Improved 3D Diffusion Policies. iDP3 is an advanced 3D visuomotor policy designed to enable humanoid robots to autonomously navigate and perform tasks in a variety of real-world environments. This improved policy enhances the robot’s ability to perceive and interact with its surroundings, making it more adaptable and efficient in complex and dynamic settings.
  • tabled. Tabled is a small library for detecting and extracting tables. It uses Surya to find all the tables in a PDF, identifies the rows/columns, and formats cells into Markdown, CSV, or HTML.
  • HART: Efficient Visual Generation with Hybrid Autoregressive Transformer. HART is a cutting-edge visual generation model designed to produce high-quality 1024x1024 images, presenting a challenge to the capabilities of diffusion models. It enhances image reconstruction and reduces training costs by employing a hybrid tokenizer that integrates both discrete and continuous tokens, resulting in more efficient and effective image generation.
  • DeBiFormer: Vision Transformer with Deformable Agent Bi-level Routing Attention. The Deformable Bi-level Routing Attention (DBRA) module is an innovation designed to enhance attention mechanisms in vision transformers. DeBiFormer, which is built upon DBRA, optimizes the selection of key-value pairs in the attention process, resulting in more efficient computations and better interpretability of queries within attention maps. This leads to improved performance and understanding of how the model attends to different parts of an image.
  • Six tips for going public with your lab’s software. It’s not enough to write high-quality programs. If you want to make your apps public — and usable — you should also follow these steps.
  • CoTracker3: Simpler and Better Point Tracking by Pseudo-Labelling Real Videos. CoTracker is a newly developed tracking model that bridges the performance gap between synthetic and real video data by employing semi-supervised training techniques.
  • A Consistency-Aware Spot-Guided Transformer for Versatile and Hierarchical Point Cloud Registration. Researchers have developed a novel consistency-aware spot-guided Transformer designed to improve the efficiency and accuracy of point cloud registration.
  • Ditto — the simplest self-building coding agent. Ditto is a user-friendly tool that allows you to generate a multi-file Flask application from simple natural language descriptions using a no-code interface. By leveraging a simple LLM loop with a few tools, Ditto automates the coding process, (occasionally) turning your ideas into functional web applications (or at least trying and getting close).
  • F5 Text-to-Speech System. F5-TTS is a non-autoregressive, zero-shot text-to-speech system featuring a flow-matching mel spectrogram generator and a diffusion transformer. Developed on the MLX framework, F5 outperforms earlier systems such as E2 TTS by incorporating ConvNeXT v2 blocks for improved text alignment, enabling high-quality speech generation in approximately 11 seconds on modern hardware.
  • Movie Gen Bench. Movie Gen Bench is an evaluation benchmark designed to assess performance in both video (Video Bench) and audio (Audio Bench). It includes 1,003 prompts that cover a variety of testing aspects and concepts.
  • LongAlign. LongAlign enhances the capability of text-to-image (T2I) diffusion models to process lengthy text inputs by incorporating segment-level encoding and a decomposed preference optimization approach.
  • Stabilize the Latent Space for Image Autoregressive Modeling: A Unified Perspective. DiGIT is an auto-regressive generative model that forecasts tokens in a latent space through self-supervised learning. This discrete tokenizer enhances image generation on ImageNet by clustering hidden states derived from DINOv2.
  • FL-Launching (Fling). The FedPart method tackles the layer mismatch problem in federated learning by limiting model updates to designated layers in each training round.
  • Distributed Training Guide. This is an in-depth guide on best practices for distributed training, troubleshooting errors, and maximizing the use of available resources.
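
Two small sketches for the resources above as well. For the reward-augmented preference-alignment paper, one simple way to condition a policy on response quality is to fold a quality score into the prompt before fine-tuning. This relabeling is only an assumption about the spirit of the method, with hypothetical field names, not the paper’s recipe.

```python
def reward_condition(example: dict) -> dict:
    """Toy reward-conditioned relabeling: expose the response's quality score
    in the prompt so the model can learn to generate toward a target reward.
    (Illustrative assumption; field names and format are hypothetical.)"""
    prompt, response, reward = example["prompt"], example["response"], example["reward"]
    return {"prompt": f"[target quality: {reward:.1f}]\n{prompt}", "response": response}

print(reward_condition({"prompt": "Summarize the abstract.", "response": "...", "reward": 0.8}))
```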
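
And for entropix: the project tweaks how tokens are sampled from the model’s output distribution. The toy sampler below simply scales temperature with the normalized entropy of the logits (confident distribution, low temperature; flat distribution, high temperature); the real project’s heuristics are more involved, so treat this purely as a sketch of entropy-aware sampling.

```python
import numpy as np

def entropy_scaled_sample(logits: np.ndarray, t_low: float = 0.3, t_high: float = 1.2) -> int:
    """Sample a token with a temperature that grows with the entropy of the
    next-token distribution (a toy illustration of entropy-aware sampling)."""
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    entropy = -(probs * np.log(probs + 1e-12)).sum()
    frac = entropy / np.log(len(logits))            # 0 = peaked, 1 = uniform
    temperature = t_low + (t_high - t_low) * frac
    scaled = np.exp((logits - logits.max()) / temperature)
    scaled /= scaled.sum()
    return int(np.random.choice(len(logits), p=scaled))

print(entropy_scaled_sample(np.array([2.0, 1.0, 0.5, -1.0])))
```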

Perspectives

  • Nobel winner Geoffrey Hinton is the ‘godfather of AI’. Here’s an offer he shouldn’t refuse… The computer scientist’s dogged belief in the potential of neural networks helped unlock machine learning. But he’d be wise to remember the experience of a fellow laureate
  • Machines of Loving Grace. Dario Amodei, CEO of Anthropic, often writes internal memos, and one of them was published externally. In this memo, he explores the potential extremely positive impact of successfully building powerful AI systems. He envisions how AI could radically transform the world for the better, improving areas like science, economics, and societal well-being, while acknowledging the immense responsibility of ensuring AI development is aligned with human interests and safety.
  • This AI-Powered Invention Machine Automates Eureka Moments. Iprova’s AI-driven software analyzes diverse technical literature to generate patentable inventions by linking previously unrelated ideas. It uses semantic search and generative AI to identify novel inventions for companies like Procter & Gamble and Panasonic. Although AI plays a key role, human insight remains essential for applying the inventions practically, especially in fast-evolving industries. Iprova highlights the importance of human creativity in refining and validating invention ideas, ensuring that AI serves as a tool to enhance rather than replace human innovation.
  • Burn the Playbooks. AI excels at tasks that follow structured rulesets, such as automating tax processes or solving math problems, where it can often outperform humans. However, relying too much on playbook-driven approaches in our work risks stifling human creativity, a key trait that differentiates us from machines. Overemphasizing formulaic tasks could make us more dependent on AI’s strengths, limiting our own unique creative potential and inadvertently making us more “machine-like” in areas where creativity and flexibility are crucial.
  • Hurricane Helene and the ‘Fuck It’ Era of AI-Generated Slop. An AI-generated image depicting Hurricane Helene has gone viral, despite viewers being fully aware that it isn’t real. The image has sparked widespread attention and discussion, highlighting the power of AI-generated content to captivate audiences even when the authenticity is known. This trend reflects the growing influence of AI in shaping public perception and the viral nature of digital content.
  • OpenAI pursues public benefit structure to fend off hostile takeovers. OpenAI is planning to restructure as a public benefit corporation (PBC) to safeguard against hostile takeovers and ensure its mission of benefiting humanity remains intact. This change will help OpenAI maintain its commitment to ethical AI development, prioritizing public good over profit while allowing the organization to continue innovating in a sustainable and mission-driven way.
  • AI Will Take Over Human Systems From Within. In this post, Yuval Noah Harari, the Israeli historian and author of “Sapiens,” “Homo Deus,” and “Nexus,” explores the impact of information networks and AI on societal narratives, which can either unite or fragment communities. He cautions that AI, functioning as an “alien intelligence,” could centralize power due to its lack of self-correcting mechanisms, potentially threatening democratic systems. Harari stresses the importance of strong institutions to uphold truth in a world increasingly influenced by AI-driven decision-making across different sectors.
  • Sticky humans in a post-AGI world. AI tutors encounter considerable difficulties in replicating the social and intellectual interactions offered by human teachers. Although AI has made progress, it still falls short in handling complex educational tasks and cannot deliver the nuanced socio-intellectual experiences that human educators provide. A hybrid approach, where AI complements rather than replaces human teachers, may be more effective, given the essential social and cultural elements of the learning process.
  • AI has dreamt up a blizzard of new proteins. Do any of them actually work? Emerging protein-design competitions aim to sift out the functional from the fantastical. However, researchers hope that the real prize will be a revolution in the field.
  • Considerations for governing open foundation models. Foundation models drive AI innovation, but debates on their release — whether open or closed — raise concerns about potential risks and the impact of regulations on innovation.
  • I AI-generated some podcasts — and the results are uncanny. Google’s new tool NotebookLM lets you create podcasts at the click of a button. They’re way more realistic than you’d think …
  • SB 1047: Our Side Of The Story. California’s proposed SB 1047, which sought to require AI companies to address existential risks posed by their technologies, was vetoed by Governor Newsom. He argued that the bill did not adequately regulate smaller, potentially dangerous AI models. Despite strong support from AI safety advocates like Dan Hendrycks and high-profile figures such as Elon Musk, the bill faced opposition from major AI companies, including OpenAI and Google. Newsom’s veto has sparked discussions within the AI community about future regulatory strategies and potential collaborations with broader political groups to create comprehensive AI safety measures.
  • Overview of strong human intelligence amplification methods. Advancements in AI depend on developing humans with enhanced cognitive abilities to effectively manage the complexities of AGI development. Approaches such as brain emulation, genomic modifications, adult brain gene editing, and brain-brain interfaces are being explored, each presenting distinct challenges and risks. These efforts are aimed at solving deep philosophical issues, significantly amplifying human intelligence, and addressing the potential threats posed by AGI.
  • LLMs don’t do formal reasoning — and that is a HUGE problem. A study conducted by Apple raises questions about the effectiveness of large language models (LLMs), revealing that they primarily depend on pattern matching instead of formal reasoning. This reliance results in fragile and inconsistent outcomes, challenging the robustness of LLMs in tasks requiring deeper cognitive processes.
  • Why ChatGPT maker OpenAI is in a fight with Open AI. OpenAI is currently engaged in a legal dispute with Guy Ravine’s company, Open AI, over the rights to the “Open AI” name and the original open-source AI vision. The conflict centers on ownership of the name and the direction of the open-source principles that initially defined the AI development approach.
  • AI mediation tool may help reduce culture war rifts, say researchers. A system built by the Google DeepMind team takes individual views and generates a set of group statements
  • Here’s the deal: AI giants get to grab all your data unless you say they can’t. Fancy that? No, neither do I. Data is vital to AI systems, so firms want the right to take it and ministers may let them. We must wake up to the danger
  • Where’s The Generative AI ROI? Start With The Supply Chain. Generative AI is revolutionizing supply chain operations by effectively managing unstructured documents, resulting in substantial time and cost savings. Flexport, a technology company focused on supply chain solutions, has effectively implemented AI to automate and optimize document management, cutting processing time by 80%. This use of AI highlights its practical value in revenue-generating activities rather than merely in theoretical advancements.

Meme of the week

What do you think about it? Was there any news that captured your attention? Let me know in the comments.

If you have found this interesting:

You can look for my other articles, and you can also connect with or reach me on LinkedIn. Check this repository containing weekly updated ML & AI news. I am open to collaborations and projects, and you can subscribe for free to get notified when I publish a new story.

Here is the link to my GitHub repository, where I am collecting code and many resources related to machine learning, artificial intelligence, and more.

or you may be interested in one of my recent articles:


Salvatore Raieli

Senior data scientist | about science, machine learning, and AI. Top writer in Artificial Intelligence