WEEKLY AI NEWS: RESEARCH, NEWS, RESOURCES, AND PERSPECTIVES
AI & ML news: Week 26 August — 1 September
Microsoft releases Phi-3.5, scientists use AI to predict dementia risk, and much more
The most interesting news, repositories, articles, and resources of the week
Check and star this repository where the news will be collected and indexed:
You will find the news first on GitHub. Single posts are also collected here:
Research
- Automated Design of Agentic Systems. argues that entire agentic systems, including prompts, tool use, control flows, and more, can be learned automatically. The approach rests on three components: a search space (which agents can be represented), a search algorithm (how that space is explored), and an evaluation function (how candidate agents are scored). presents Meta Agent Search, a meta agent that iteratively programs and tests new agents based on a growing archive of previous discoveries.
- LLM Pruning and Distillation in Practice: The Minitron Approach. offers a thorough report on effective methods for compressing the Llama 3.1 and Mistral NeMo models, applying pruning and distillation to produce 4B and 8B parameter models, respectively. Fine-tuning the teacher model on the distillation dataset before pruning improves results; the compression strategy yields a state-of-the-art 8B model (MN-Minitron-8B) that outperforms all similarly sized models on common language-modeling benchmarks.
- The Vizier Gaussian Process Bandit Algorithm. introduces Vizier, an open-source Python implementation of the Gaussian process bandit optimization algorithm that Google uses for millions of optimizations and research studies; includes benchmarking results demonstrating the algorithm's broad applicability.
- Enhancing Robustness in Large Language Models: Prompting for Mitigating the Impact of Irrelevant Information. proposes a two-stage prompting technique that acts as a self-mitigation process: the model first identifies irrelevant information in the context and then filters it out, improving robustness and overall performance on reasoning tasks.
- MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding. demonstrates how speculative decoding can improve throughput, lower latency, and preserve accuracy in long-context generation; finds that the bottleneck shifts from compute-bound to memory-bound as sequence length and batch size grow, which makes speculative decoding effective for long sequences even at large batch sizes.
- PEDAL: Enhancing Greedy Decoding with Large Language Models using Diverse Exemplars. employs a hybrid self-ensembling approach: it generates multiple candidate responses using diverse exemplars and aggregates them with an LLM to produce a final response, achieving lower cost than self-consistency approaches and better accuracy than greedy decoding.
- Autonomous Driving with Spiking Neural Networks. Spiking Autonomous Driving (SAD) is the first unified spiking neural network (SNN) designed to tackle the energy demands of autonomous driving.
- Pre-training Small Base LMs with Fewer Tokens. Inheritune is a simple technique for deriving smaller base language models from larger ones: it inherits a few transformer blocks and trains on a very small fraction (0.1%) of the original data. With this method and a single A6000 GPU, a 1.5B-parameter model can be built in under 30 minutes, with performance comparable to larger models trained on far more data.
- Teaching chat models to solve chess puzzles. Traditional base language models are surprisingly competent chess players, averaging around 1800 Elo, yet chat models often show a sharp drop in performance. This article explains how prompting and fine-tuning can teach chat models, such as GPT-4o, to play chess.
- xGen-VideoSyn-1: High-fidelity Text-to-Video Synthesis with Compressed Representations. Salesforce's text-to-video (T2V) model xGen-VideoSyn-1 creates lifelike scenes from written descriptions. It uses a video variational autoencoder (VidVAE) to compress video data, lowering compute requirements, and a diffusion transformer (DiT) for improved temporal consistency and generalization.
- Memory-Efficient LLM Training with Online Subspace Descent. Online Subspace Descent is a novel memory-efficient optimizer that improves LLM training.
- Generative Verifiers: Reward Modeling as Next-Token Prediction. Reward models are typically trained as discriminative classifiers. In this DeepMind work, the reward signal is instead the yes/no next-token logits of a language model, which lets the verifier incorporate chain-of-thought reasoning and ensembling; doing so raised performance by sixteen percent (a minimal sketch of the idea appears after this list).
- Multilingual Arbitrage: Optimizing Data Pools to Accelerate Multilingual Progress. By strategically routing synthetic data generation across a pool of teacher models rather than relying on a single oracle model, Cohere's Aya model significantly increased its win rate over baseline models.
- Text2SQL is Not Enough: Unifying AI and Databases with TAG. Table-Augmented Generation (TAG) is a novel paradigm that answers complex natural-language queries by combining databases and language models.
- The Mamba in the Llama: Distilling and Accelerating Hybrid Models. Because Mamba models lack a KV cache for backtracking, they are difficult to accelerate with speculative decoding. This paper presents several new distillation techniques and acceleration algorithms from some of the original Mamba authors.
- Efficient LLM Scheduling by Learning to Rank. Head-of-line blocking occurs when serving many concurrent requests to a large language model, because we don't know in advance how long each output will take to generate. Learning to rank requests by relative output length lets the shortest ones be served first, increasing throughput for multi-batch generation by 6.5 times (see the scheduling sketch after this list).
- MTMamba++: Enhancing Multi-Task Dense Scene Understanding via Mamba-Based Decoders. A new model architecture, MTMamba++, aims to improve multi-task scene understanding. It captures long-range dependencies and enhances cross-task interactions using a Mamba-based decoder with two core blocks: the self-task Mamba (STM) block and the cross-task Mamba (CTM) block.
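To make the Generative Verifiers idea above concrete, here is a minimal sketch of scoring a candidate solution from a verifier LM's yes/no next-token logits. It is an illustration under assumptions, not the paper's exact setup: the model, prompt template, and verdict tokens are stand-ins.

```python
# Minimal sketch of "reward modeling as next-token prediction": score a
# candidate solution by the probability a verifier LM assigns to "Yes" vs.
# "No". Model, prompt template, and verdict tokens are illustrative stand-ins.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # small stand-in; the paper uses far larger models
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

def verifier_reward(question: str, solution: str) -> float:
    """Return P("Yes") among {Yes, No} as a scalar reward in [0, 1]."""
    prompt = (f"Question: {question}\n"
              f"Proposed solution: {solution}\n"
              "Is the solution correct? Answer Yes or No:")
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]  # logits for the next token
    yes_id = tokenizer.encode(" Yes", add_special_tokens=False)[0]
    no_id = tokenizer.encode(" No", add_special_tokens=False)[0]
    # Renormalize over just the two verdict tokens
    pair = torch.softmax(logits[[yes_id, no_id]], dim=0)
    return pair[0].item()

print(verifier_reward("What is 2 + 2?", "2 + 2 = 4"))
```

Chain-of-thought verification would insert a generated rationale before the verdict token, and ensembling would average this score over several sampled rationales.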
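And for the learning-to-rank scheduling entry, a toy sketch of why serving by predicted length helps; the hard-coded "predicted_len" scores stand in for a learned ranking model, and the sequential wait-time accounting is a deliberate simplification of batched serving.

```python
# Toy sketch of shortest-job-first scheduling from predicted output lengths.
# The predicted_len field stands in for a learned ranker's score.
from dataclasses import dataclass

@dataclass
class Request:
    prompt: str
    predicted_len: int  # ranker's estimate of output length (assumed)
    true_len: int       # actual decode steps, unknown at schedule time

def sjf(requests):
    # Serve in ascending order of predicted output length.
    return sorted(requests, key=lambda r: r.predicted_len)

def mean_wait(order):
    # Each request waits for every decode step of the jobs served before it.
    waits, elapsed = [], 0
    for r in order:
        waits.append(elapsed)
        elapsed += r.true_len
    return sum(waits) / len(waits)

reqs = [Request("summarize this book", 900, 850),
        Request("yes/no question", 5, 8),
        Request("draft a short essay", 400, 420)]
print("FIFO mean wait:", mean_wait(reqs))
print("Ranked SJF mean wait:", mean_wait(sjf(reqs)))
```

Short requests no longer sit behind long ones, which is exactly the head-of-line effect the paper targets.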
News
- Scientists to use AI to analyze 1.6m brain scans to develop tool predicting dementia risk. Researchers will use artificial intelligence to match image data of patients from Scotland with linked health records
- Microsoft releases powerful new Phi-3.5 models, beating Google, OpenAI, and more. Microsoft unveiled Phi-3.5-mini-instruct, Phi-3.5-MoE-instruct, and Phi-3.5-vision-instruct, three new models in its Phi series that each post remarkable benchmark results while tackling distinct AI tasks. The models are open source under the MIT License and available on Hugging Face. Despite being smaller than many contemporaries, the Phi models outperform rivals like GPT-4o and Llama on certain benchmarks, demonstrating near-state-of-the-art performance.
- Data Exfiltration from Slack AI via indirect prompt injection. A vulnerability in Slack AI allows attackers to use indirect prompt injection to exfiltrate data from private channels they do not have access to. By planting instructions in public channel messages, attackers can coerce the LLM into disclosing sensitive data, such as API keys, in response to queries. The issue persists, along with a phishing attack vector, even after Slack AI's August 14th update, which ingests channel and DM files and greatly increases the attack surface for this kind of exploit.
- Bringing Llama 3 to life. Llama 3.1, an enhanced open-source LLM from Meta, adds new features like model distillation and the ability to generate synthetic data.
- Anthropic reveals system prompts for Claude. Anthropic has published the system prompts for its Claude models, along with dated release notes for each update.
- D-ID launches an AI video translation tool that includes voice cloning and lip sync. AI video creation platform D-ID is the latest company to ship a tool for translating videos into other languages using AI technologies. However, in this case, D-ID also clones the speaker’s voice and changes their lip movements to match the translated words as part of the AI editing process.
- Vyond Pushes AI Video’s Enterprise Era. Vyond is an AI platform for creating videos with an emphasis on enterprise use cases.
- Mark Zuckerberg says White House ‘pressured’ Facebook to censor Covid-19 content. Meta boss regrets bowing to government power and says he would not make the same choices today
- What the Telegram founder’s arrest means for the regulation of social media firms. Pavel Durov’s detention by French authorities is a major break from the norm — but his low-moderation, non-encrypted app is an anomaly
- Tesla Is Erasing Its Own History. CEO Elon Musk’s original Tesla Motors Master Plan no longer exists on Tesla’s website.
- After a decade of free Alexa, Amazon now wants you to pay. AI is a chance for companies to charge for products we’re in the habit of using for free.
- AI for creating comics? Europe’s industry completely rejects it, Tintin executive says. Tools such as Midjourney and Dall-E have triggered a fightback in comic land as publishers gear up for litigation ahead of new EU rules
- Police officers are starting to use AI chatbots to write crime reports. Will they hold up in court? AI technology is being integrated into police work to automate the writing of reports from body camera footage.
- Questions about the safety of Tesla’s ‘Full Self-Driving’ system are growing. Tesla has been accused of deceptive marketing over its self-driving technology, as a prominent analyst questions the safety and readiness of the system, potentially leading to increased scrutiny of automated driving claims.
- Japan: AI-powered drones to monitor disaster zones and identify criminals. Drones move faster than police cars or guards, reaching incident sites quickly and allowing for prompt action and response.
- Artifacts are now generally available. Anthropic has made Artifacts widely accessible, including on mobile devices.
- Introducing Cerebras Inference. Cerebras's chips have large unified on-chip memory, letting them sidestep bandwidth bottlenecks and serve models at thousands of tokens per second.
- OpenAI Aims to Release New AI Model, ‘Strawberry,’ in Fall. “Strawberry” is a new AI product that OpenAI intends to launch in the fall. It will carry out complex tasks like creating marketing plans and will have advanced reasoning abilities, such as the capacity to solve math problems it has never seen before.
- This 1mm ‘fan on a chip’ could put active cooling inside ultra-thin gadgets. xMEMS has introduced the XMC-2400 µCooling chip, a 1mm-tall solid-state fan designed to cool thin electronics such as smartphones.
- Nvidia rides big tech’s AI investment to beat Wall Street’s sky-high expectations. The chipmaker, the third most valuable company in the world, records $30.04bn in revenue, showing AI demand continues to rise
- AI makes racist decisions based on dialect. Large language models strongly associated negative stereotypes with African American English
- Lawmakers call for crackdown on AI deepfakes after Grok backlash. A group of Democratic lawmakers are pushing the Federal Election Commission (FEC) to increase regulation on artificial intelligence (AI) deepfakes following the release of the social platform X’s chatbot Grok.
- Midjourney says it’s ‘getting into hardware’. Midjourney, the AI image-generating platform that’s reportedly raking in more than $200 million in revenue without any VC investment, is getting into hardware.
- Google rolling out Gems and Imagen 3, with people generation, to Gemini Advanced. Gems are “custom versions of Gemini” that you can create to “act as an expert on topics or refine them toward your specific goals.” They can “remember a detailed set of instructions to help you save time on tedious, repetitive or difficult tasks.”
- OpenAI in Talks for Funding Round Valuing It Above $100 Billion. OpenAI is in talks to raise several billion dollars in a new investment round led by Thrive Capital that would value the company at over $100 billion, with Microsoft expected to participate.
- How to harness AI’s potential in research — responsibly and ethically. Artificial intelligence is propelling advances in all areas of science. But vigilance is needed, warn four researchers at the leading edge.
- The On‑Device Intelligence Update. Cartesia has released several updates to its models and systems, including an open hybrid state-space model.
- Stephen Wolfram thinks we need philosophers working on big questions around AI. Stephen Wolfram, a renowned mathematician and computer scientist, has grown to appreciate the importance of philosophy in understanding and guiding the development of AI. He argues that as AI raises profound existential and moral questions, integrating philosophical thinking into AI research is crucial for addressing these complex issues, signaling a potential “golden age” of philosophy in the context of technology.
- The top AI deals in Europe this year. Despite general headwinds for startups, AI ventures continue to secure substantial funding. U.S. AI startups have closed nearly 30 deals over $100M in 2024, with Europe not far behind. Major investments include Wayve ($1B), Mistral AI (~$1B), Helsing ($484M), Poolside ($400M), DeepL ($320M), H ($220M), and Flo Health ($200M).
- California advances landmark legislation to regulate large AI models. The groundbreaking bill aims to reduce potential AI risks — requiring model testing and disclosure of safety protocol
- Nvidia shares fall on slowing growth and production concerns. Doubling quarterly revenues to £23bn fails to allay worry about delays to the next generation of AI chips
- X’s AI tool Grok lacks effective guardrails preventing election disinformation, a new study finds. The Center for Countering Digital Hate (CCDH) found that Grok was able to churn out ‘convincing’ AI fake images including one of Vice President Kamala Harris doing drugs and another of former President Donald Trump looking sick in bed
- 100M Token Context Windows. No, it isn't a typo: 100 million tokens of context for agent programming and reasoning. Magic Dev also disclosed a collaboration to build two new supercomputers on Google Cloud, following a recent $320 million fundraise to accelerate the company's product development.
- OpenAI and Anthropic will share their models with the US government. The companies will grant the AI Safety Institute access to major new models for safety testing.
- California legislature passes controversial “kill switch” AI safety bill. After passing the State Assembly, California’s contentious AI safety bill, SB-1047, is now one step closer to being signed into law by Governor Gavin Newsom. By September 30, Newsom must determine whether or not to sign it into law.
- OpenAI says ChatGPT usage has doubled since last year. OpenAI reported that 92% of Fortune 500 firms use ChatGPT and that the platform has over 200 million weekly active users, double its user base from a year ago.
- TikTok owner ByteDance launches new video search tool, eyeing Baidu’s dominance. In a direct challenge to Baidu’s search dominance, ByteDance has released Douyin Search, an app for searching short-video content on Douyin, TikTok’s Chinese counterpart.
Resources
- Language Modeling on Tabular Data: A Survey of Foundations, Techniques, and Evolution. provides a thorough survey of language-modeling techniques for tabular data, covering the classification of tabular data structures and data types, datasets used for training and evaluation, modeling techniques and training objectives, data-processing methods, popular architectures, challenges, and future research directions.
- Graph Retrieval-Augmented Generation: A Survey. focuses on the methods used in the GraphRAG workflow (graph-based indexing, graph-guided retrieval, and graph-enhanced generation) and explores GraphRAG's tasks, applications, evaluation, and industrial use cases.
- Controllable Text Generation for Large Language Models: A Survey. gives a thorough overview of controllable text generation techniques for LLMs; covers topics like helpfulness, safety, consistency, and style.
- Challenges and Responses in the Practice of Large Language Models. selects several significant questions and provides thoughtful answers; the questions are divided into groups according to themes including data, applications, infrastructure, software architecture, and brain science.
- Self-Supervised Learning of Time Series Representation via Diffusion Process and Imputation-Interpolation-Forecasting Mask. Time Series Diffusion Embedding (TSDE) is the first diffusion-based method for learning time-series representations. It divides time series data into segments and creates informative embeddings using dual-orthogonal Transformer encoders with a crossover mechanism.
- Liger Kernel: Efficient Triton Kernels for LLM Training. In a surprise release, LinkedIn has open-sourced the Liger Kernel, an efficient set of kernels for training language models. For the widely used Llama models it cuts memory use by about 60% and boosts throughput by 20%. It integrates with several common modeling frameworks and requires only a three-line code change, which matters for practitioners (a usage sketch appears after this list).
- pgvectorscale. pgvectorscale builds on pgvector with better performance for embedding search and cheaper storage for AI applications. It is roughly 28 times faster than other popular, competitive vector stores.
- GenderCARE. GenderCARE is a comprehensive framework for identifying and mitigating gender bias. It introduces novel criteria for assessing gender bias, with a focus on diversity, inclusivity, and impartiality.
- Generalized SAM: Efficient Fine-Tuning of SAM for Variable Input Image Sizes. A novel technique for more effectively fine-tuning the Segment Anything Model (SAM) with variable-size images is called Generalized SAM (GSAM).
- google/siglip-so400m-patch14–224. A new SigLIP model from Google that uses a shape-optimized vision transformer architecture.
- GaussianOcc: Fully Self-supervised and Efficient 3D Occupancy Estimation with Gaussian Splatting. Using surround views, GaussianOcc is an efficient and fully self-supervised approach to 3D occupancy estimation.
- Infinite Dataset Hub. This space, powered by phi-3-mini, generates a synthetic dataset on any topic from a prompt. It isn't the most accurate, but it is intriguing and powerful.
- Neural Assets: 3D-Aware Multi-Object Scene Synthesis with Image Diffusion Models. By conditioning on individual object representations, neural networks can represent and manipulate 3D objects in 2D scenes. This work could be the key to disentangling 3D objects.
- T3M: Text Guided 3D Human Motion Synthesis from Speech. T3M is a new technique for producing speech-driven 3D animations that can be controlled with text inputs. Because it enables more precise and customizable animation than earlier methods that relied on voice alone, T3M is useful for virtual reality, gaming, and film production.
- BiRefNet. State-of-the-art background removal via bilateral-reference segmentation.
- RB-Modulation. Google has developed a genuinely innovative method for customizing diffusion models that outperforms several widely used techniques. It works with PyTorch and, with some adjustments, Flux as well.
- FlexEdit: Marrying Free-Shape Masks to VLLM for Flexible Image Editing. FlexEdit combines free-shape masks with vision large language models (VLLMs) to let you precisely edit images from language instructions.
- Quick Fine-tuning of Phi 3.5. A quick fine-tuning script, built on Unsloth, for the new Microsoft Phi-3.5 models.
- Fire-Flyer AI-HPC: A Cost-Effective Software-Hardware Co-Design for Deep Learning. A paper detailing DeepSeek’s hardware-software co-design approach for deep learning has been published.
- Announcing Higgs Llama V2. Higgs-Llama-3-70B-v2, a new model from Boson AI, performs exceptionally well on conversation and comprehension benchmarks such as Arena-Hard and AlpacaEval 2.0. Compared to Claude 3.5 Sonnet, the model increases day-1 retention by 5.3% and decreases response regeneration rates by 21.6%. Improved with an internal reward model called Higgs Judger, its performance ties Google's Gemini 1.5 Pro.
- The Zyphra Training Cookbook. Pre-training hybrid (Mamba-style) models is not the same as pre-training standard Transformers. This post examines how to scale various hyperparameters, gather data, and tune other factors to reach the desired performance.
- LlamaDuo. This is a system that optimizes small models to act as a backup if closed API models become unavailable. It demonstrates a smooth transition from a large to a small model.
- LitServe. LitServe is a flexible, user-friendly serving engine for AI models built on FastAPI. Features like batching, streaming, and GPU autoscaling eliminate the need to rebuild a FastAPI server for each model (a minimal example appears after this list).
- IntelLabs/LlavaOLMoBitnet1B. Llava BitNet is the first ternary (-1, 0, 1) weight model trained on VLM tasks. The model, weights, and scripts are in the process of being fully open-sourced. The technical report will be released soon and suggests the model has promising performance.
- Qwen2-Audio. Qwen has released audio-input models that can reason about music, audio, and sound.
- Sketch2Scene: Automatic Generation of Interactive 3D Game Scenes from User’s Casual Sketches. This team developed an impressive pipeline that chains many models to generate fully playable 3D game scenes from a single input sketch.
- OctFusion: Octree-based Diffusion Models for 3D Shape Generation. OctFusion is an efficient and high-quality method for using diffusion models to generate 3D objects. In about 2.5 seconds, it can generate 3D shapes at any resolution using a single Nvidia 4090 GPU.
- MambaInLlama. By reusing weights from attention layers, researchers have shown that large Transformer models can be distilled into more deployable linear RNNs.
- Cross-Modal Temporal Alignment for Event-guided Video Deblurring. By incorporating an event camera — which records motion with microsecond temporal resolution — researchers have created a novel method for video deblurring that improves the quality of motion-blurred footage.
- JoyCaption Pre-Alpha. An open-source VLM built specifically for captioning images.
- Introducing RPBench-Auto. Boson AI has introduced RPBench-Auto, an automated evaluation pipeline inspired by ArenaHard and AlpacaEval, to measure the role-playing abilities of LLMs.
- Lightweight Champ: NVIDIA Releases Small Language Model With State-of-the-Art Accuracy. Mistral-NeMo-Minitron 8B is a miniaturized version of the recently released Mistral NeMo 12B model, delivering high accuracy combined with the compute efficiency to run the model across GPU-accelerated data centers, clouds, and workstations.
- NousResearch/hermes-function-calling-v1. An excellent publicly available dataset from Nous Research for training function-calling models.
- Qwen2-VL: To See the World More Clearly. Qwen2-VL is the latest version of the vision-language models based on Qwen2 in the Qwen model family.
- RAW-Adapter: Adapting Pre-trained Visual Model to Camera RAW Images. RAW-Adapter is a novel method that adapts pre-trained sRGB models so they can efficiently handle RAW data from cameras.
- Llama usage doubled May through July. Meta has published usage statistics for its Llama models, finding strong demand for the models in business environments.
- SAM & SAM 2 in 3D Slicer: SegmentWithSAM Extension for Annotating Medical Images. In order to expedite the annotation of 3D medical pictures, this study modified the Segment Anything Model 2 (SAM 2), which was initially created for video annotation.
Perspectives
- AI analysed 1,500 policies to cut emissions. These ones worked. Only 63 climate change interventions led to significant reductions in carbon emissions.
- AI cheating is overwhelming the education system — but teachers shouldn’t despair. With adjustments to the way we teach students to think about writing, we can shift the emphasis from product to process
- What’s Really Going On in Machine Learning? Some Minimal Models. Stephen Wolfram, creator of the Wolfram Language, uses minimal models to probe what is actually happening inside machine learning systems.
- AI companies are pivoting from creating gods to building products. Good. AI firms are struggling to find product-market fit for LLMs, resulting in large investments but little profit. Five primary obstacles impede the commercialization of AI products: cost, reliability, security and safety concerns, privacy, and user-interface constraints. Resolving these sociotechnical obstacles is imperative for AI to be widely integrated into consumer goods.
- My friend, Claude. Due to increased job obligations, this author relies on Anthropic’s LLM Claude for technical writing, highlighting the expanding value of LLMs in professional settings. Claude’s help has been cost-effective even though it required expert verification, and it highlights how quickly the landscape for specialty experts confronting AI-driven automation is changing. The author considers how knowledge work may change when AI technologies like Claude are more frequently used for everyday tasks.
- AI firms must play fair when they use academic data in training. Researchers are among those who feel uneasy about the unrestrained use of their intellectual property in training commercial large language models. Firms and regulators need to agree on the rules of engagement.
- Stakes high for European Union after arrest of Telegram co-founder. The charges against Pavel Durov increase pressure on Brussels to enforce the new European law on the platform
- MIT neuroscientists discover neurons with distinct language processing timescales. In language-processing areas of the brain, some cell populations respond to one word, while others respond to strings of words.
- How to Tell If What You’re Reading Was Written By AI. From the moment ChatGPT introduced the world to generative AI in late 2022, it was apparent that, going forward, you could no longer trust that something you were reading was written by a human.
- California AI bill sparks debate in Silicon Valley as some tech giants call it a threat to innovation. A first-of-its-kind AI bill is winding its way through California, causing infighting between groups of AI pioneers.
- Exodus at OpenAI: Nearly half of AGI safety staffers have left, says former researcher. Nearly half the OpenAI staff that once focused on the long-term risks of superpowerful AI have left the company in the past several months, according to Daniel Kokotajlo, a former OpenAI governance researcher.
- Technology may be advancing — but it’s making us more stupid. ‘Deskilling’ in the face of cognitive automation is a problem that is too easily ignored
- Inference is FREE and INSTANT. Large language models (LLMs) may not get much better at reasoning, but their rising speeds and falling prices will make them more helpful for repetitive jobs. These models may lack genuine understanding, yet they can still handle simple tasks effectively.
- UK’s new science minister on budget battles, Brexit and AI leadership. Former clinical scientist Patrick Vallance speaks to Nature about his priorities as the minister overseeing the nation’s research.
- Urgently clarify how AI can be used in medicine under new EU law. The European Union’s Artificial Intelligence Act entered into force on 1 August. Phased implementation begins in February 2025, banning artificial intelligence (AI) systems deemed to pose unacceptable risks. Before that happens, policymakers must do more to ensure that patients’ safety and interests are protected.
Meme of the week
What do you think? Did any of this news capture your attention? Let me know in the comments
If you have found this interesting:
You can look for my other articles, and you can also connect or reach me on LinkedIn. Check this repository containing weekly updated ML & AI news. I am open to collaborations and projects, and you can subscribe for free to get notified when I publish a new story.
Here is the link to my GitHub repository, where I am collecting code and many resources related to machine learning, artificial intelligence, and more.
or you may be interested in one of my recent articles: