WEEKLY AI NEWS: RESEARCH, NEWS, RESOURCES, AND PERSPECTIVES
AI & ML news: Week 8–14 July
Apple M5 chip, new models from Google and Meta, xAI ends its deal with Oracle, and much more
The most interesting news, repositories, articles, and resources of the week
Check and star this repository where the news will be collected and indexed:
You will find the news first on GitHub. Single posts are also collected here:
Research
- MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases. Comprehensive and fascinating work by Meta that demonstrates how to train tiny models to maximize performance.
- Non-Adversarial Learning: Vector-Quantized Common Latent Space for Multi-Sequence MRI. Without the need for paired samples, researchers have created a new generative model to enhance MRI image translation between various sequences.
- Free-SurGS: SfM-Free 3D Gaussian Splatting for Surgical Scene Reconstruction. A new approach to 3D reconstruction of surgical scenes that does not require SfM has been presented. It overcomes the drawbacks of earlier methods that struggled with inconsistent photometry and sparse textures.
- FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs. The Tongyi speech team released extremely powerful models for audio understanding and generation.
- APIGen: Automated Pipeline for Generating Verifiable and Diverse Function-Calling Datasets. APIGen presents an automated data generation pipeline to synthesize high-quality, verifiable datasets for function-calling applications; it demonstrates that 7B models trained on the curated data outperform GPT-4 and other state-of-the-art models on the Berkeley Function-Calling Benchmark. A 60K-entry dataset is also released to aid research on function-calling-enabled agents.
- Searching for Best Practices in Retrieval-Augmented Generation. Outlines best practices for building efficient RAG workflows and suggests performance- and efficiency-focused strategies, including newly developed multimodal retrieval tools.
- Self-Evaluation as a Defense Against Adversarial Attacks on LLMs. Proposes using self-evaluation as a defense against adversarial attacks: a pre-trained LLM serves as a dedicated evaluator, which significantly lowers the attack success rate and proves more effective than fine-tuned models, dedicated safety LLMs, and enterprise moderation APIs. The paper evaluates various settings, including attacks on the generator alone and on the generator + evaluator combined (see the sketch after this list).
- Adaptable Logical Control for Large Language Models. Presents the Ctrl-G framework, which combines LLMs with Hidden Markov Models to enforce logical constraints represented as deterministic finite automata. Ctrl-G achieves an over 30% higher satisfaction rate in human evaluation compared to GPT-4 (a toy constraint-as-DFA example follows this list).
- LLM See, LLM Do: Guiding Data Generation to Target Non-Differentiable Objectives. Examines in detail how synthetic data shapes a model's internal biases, calibration, attributes, and preferences. The authors find that LLMs are sensitive to certain attributes even when the synthetic-data prompts appear neutral, indicating that the generation profiles of models can be steered to reflect desirable attributes.
- Chinese developers scramble as OpenAI blocks access in China. The US firm's move, amid Beijing-Washington tensions, sparks a rush to lure users to homegrown models.
- PartCraft: Crafting Creative Objects by Parts. PartCraft is a novel approach in generative visual AI that goes beyond conventional text- or sketch-based methods by enabling users to choose visual concepts by parts.
- AriGraph: Learning Knowledge Graph World Models with Episodic Memory for LLM Agents. AriGraph is a new technique that assists AI agents in creating a memory graph that incorporates episodic and semantic memories.
- Researchers leverage shadows to model 3D scenes, including objects blocked from view. Researchers at MIT and Meta developed PlatoNeRF, an AI method that builds 3D representations of scenes, including blocked areas, using single-photon lidar and shadows. This technique could improve AR/VR experiences and increase the safety of autonomous vehicles. With lower-resolution sensors, PlatoNeRF performs better than conventional techniques and shows promise for real-world applications.
- Distilling System 2 into System 1. System 2 approaches use techniques like Chain of Thought to spend extra test-time compute on more deliberate reasoning. It turns out this behavior can often be distilled into a faster, similarly accurate System 1 model.
- Learning to (Learn at Test Time): RNNs with Expressive Hidden States. A recently developed RNN variant that beats Mamba on several tasks. Notably, the hidden-state update function is itself an ML model, which enables long contexts and in-context learning (a minimal sketch follows this list).
- NuminaMath 7B TIR: Open Math Olympiad Model Released. NuminaMath is a series of language models that are trained to solve math problems using tool-integrated reasoning (TIR).
- 4D Contrastive Superflows are Dense 3D Representation Learners. SuperFlow is a novel system that uses successive LiDAR-camera pairs for spatiotemporal pretraining to improve 3D vision in autonomous driving.
- PaliGemma: A versatile 3B VLM for transfer. PaliGemma is a powerful vision-language model based on Gemma 2B and SigLIP. The technical report explains many of the architecture and data-collection choices.
- ConceptExpress: Harnessing Diffusion Models for Single-image Unsupervised Concept Extraction. A novel task called Unsupervised Concept Extraction (UCE) extracts and reconstructs multiple concepts from a single image without the need for human annotations.
- Lookback Lens. A simple model called Lookback Lens can be used to identify contextual hallucinations in large language models.
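Below is a minimal sketch of the self-evaluation defense mentioned above. It assumes a generic `generate` callable for the generator LLM and an `evaluate` callable for the evaluator LLM; the evaluator prompt wording is illustrative and not taken from the paper.

```python
# Minimal sketch of a self-evaluation defense (assumptions: `generate` and `evaluate`
# are any callables mapping a prompt string to a model completion; the prompt text
# below is illustrative, not the paper's).
from typing import Callable

EVALUATOR_PROMPT = (
    "You are a safety evaluator. Given a user request and a candidate response, "
    "answer only 'unsafe' if the response helps with a harmful request, otherwise 'safe'.\n\n"
    "Request: {request}\nResponse: {response}\nVerdict:"
)

def guarded_generate(request: str,
                     generate: Callable[[str], str],
                     evaluate: Callable[[str], str],
                     refusal: str = "Sorry, I can't help with that.") -> str:
    """Generate a response, then let a (pre-trained) evaluator LLM check it before returning."""
    response = generate(request)
    verdict = evaluate(EVALUATOR_PROMPT.format(request=request, response=response))
    # Withhold the generation if the evaluator flags it as unsafe.
    return refusal if "unsafe" in verdict.lower() else response
```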
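To make the "logical constraints as deterministic finite automata" idea from the Ctrl-G item concrete, here is a toy, standalone DFA for the constraint "the output must mention the word paris". The keyword and code are illustrative assumptions; Ctrl-G couples such automata with an HMM over the LLM's tokens to steer generation, which is not shown here.

```python
# Toy DFA for the constraint "the text must contain the keyword".
# States count how many characters of KEYWORD have been matched so far.
# (The simple fallback transition is exact here because "paris" only
# self-overlaps on its first letter.)
KEYWORD = "paris"

def keyword_dfa_accepts(text: str) -> bool:
    state = 0                                   # start state: nothing matched yet
    for ch in text.lower():
        if state == len(KEYWORD):               # accepting state is absorbing
            return True
        if ch == KEYWORD[state]:
            state += 1                          # advance on a matching character
        else:
            state = 1 if ch == KEYWORD[0] else 0  # fall back, re-checking the current char
    return state == len(KEYWORD)

print(keyword_dfa_accepts("I plan to visit Paris in May."))  # True
print(keyword_dfa_accepts("I plan to visit Rome in May."))   # False
```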
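As a rough illustration of the "update function is itself an ML model" idea from the RNNs-with-expressive-hidden-states item, the sketch below treats the hidden state as the weight matrix of a tiny linear model and updates it with one gradient step of a self-supervised reconstruction loss per token. The plain linear inner model, identity "corruption", and fixed inner learning rate are simplifying assumptions; the paper uses learned projections and richer inner models.

```python
# Minimal test-time-training style layer: the hidden state is the weight matrix W of an
# inner linear model, updated by one gradient step of ||W x - x||^2 at every token.
import numpy as np

def ttt_linear_layer(tokens: np.ndarray, inner_lr: float = 0.1) -> np.ndarray:
    """tokens: (seq_len, d) array. Returns (seq_len, d) outputs."""
    seq_len, d = tokens.shape
    W = np.zeros((d, d))                      # hidden state = weights of the inner model
    outputs = np.empty_like(tokens)
    for t, x in enumerate(tokens):
        err = W @ x - x                       # residual of the reconstruction loss
        W -= inner_lr * np.outer(err, x)      # one gradient step updates the "hidden state"
        outputs[t] = W @ x                    # output uses the freshly updated inner model
    return outputs

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    print(ttt_linear_layer(rng.normal(size=(8, 4))).shape)  # (8, 4)
```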
News
- A Hacker Stole OpenAI Secrets, Raising Fears That China Could, Too. A security breach at the maker of ChatGPT last year revealed internal discussions among researchers and other employees, but not the code behind OpenAI’s systems.
- Figma pulls AI tool after criticism that it ripped off Apple’s design. Figma says it didn’t train the generative AI models it used and blames a ‘bespoke design system.’
- Hollywood stars’ estates agree to the use of their voices with AI. Earlier this week, AI company ElevenLabs said it is bringing digitally produced celebrity voice-overs of deceased actors, including Judy Garland, James Dean, and Burt Reynolds, to its newly launched Reader app. The company said the app takes articles, PDFs, ePubs, newsletters, e-books, or any other text on your phone and turns it into voice-overs.
- Smart Paste for context-aware adjustments to pasted code. Google presents Smart Paste, an internal tool that streamlines the code-authoring workflow by automating adjustments to pasted code. The post describes key insights from the UX and model-preparation work that led to high performance and successful adoption among Google developers.
- Apple M5 Chip’s Dual-Use Design Will Power Future Macs and AI Servers. Apple will reportedly use a more advanced SoIC packaging technology for its M5 chips, as part of a two-pronged strategy to meet its growing need for silicon that can power consumer Macs and enhance the performance of its data centers and future AI tools that rely on the cloud.
- Apple Intelligence and a better Siri may be coming to iPhones this spring. Expect Apple’s AI system in iOS 18.4, says a new Bloomberg rumor.
- Meta claims news is not an antidote to misinformation on its platforms. The company says it has ‘never thought about news’ as a way to counter misleading content on Facebook and Instagram, despite evidence to the contrary.
- Meta drops AI bombshell: Multi-token prediction models now open for research. Meta has thrown down the gauntlet in the race for more efficient artificial intelligence. The tech giant released pre-trained models on Wednesday that leverage a novel multi-token prediction approach, potentially changing how large language models (LLMs) are developed and deployed.
- Google DeepMind’s AI Rat Brains Could Make Robots Scurry Like the Real Thing. In order to investigate the brain circuits underlying complicated motor skills, DeepMind and Harvard University created a virtual rat using artificial intelligence (AI) neural networks trained on real rat motions and neural patterns. With its ability to transfer acquired movement skills to other settings, this bio-inspired AI could advance robotics and provide new insights into brain function. The study shows that brain activity associated with various behaviors may be accurately mimicked and decoded by digital simulations.
- Microsoft drops observer seat on OpenAI board amid regulatory scrutiny. The startup’s new approach means Apple will no longer be able to appoint an executive to a similar role.
- xAI ends deal with Oracle, builds own AI data center. xAI has ended its agreement with Oracle and, once Grok 2 training is completed, will build its own AI data center. The company originally had a deal with Oracle for 24k H100s.
- a16z is trying to keep AI alive with Oxygen initiative. According to The Information, VC firm Andreessen Horowitz has secured thousands of AI chips, including Nvidia H100 GPUs, to dole out to its AI portfolio companies in exchange for equity.
- Quora’s Poe now lets users create and share web apps. Poe, Quora’s subscription-based, cross-platform aggregator for AI-powered chatbots like Anthropic’s Claude and OpenAI’s GPT-4o, has launched a feature called Previews that lets people create interactive apps directly in chats with chatbots.
- Ex-Meta scientists debut gigantic AI protein design model. EvolutionaryScale’s protein language model — among the largest AI models in biology — has created new fluorescent proteins and won big investment.
- Anthropic’s Claude adds a prompt playground to quickly improve your AI apps. Prompt engineering became a hot job last year in the AI industry, but it seems Anthropic is now developing tools to at least partially automate it.
- OpenAI and Los Alamos National Laboratory announce bioscience research partnership. OpenAI and Los Alamos National Laboratory are developing evaluations to understand how multimodal AI models can be used safely by scientists in laboratory settings.
- ‘I am happy to see how my baby is bouncing’: the AI transforming pregnancy scans in Africa. While ultrasound services are normal practice in many countries, software being tested in Uganda will allow a scan without the need for specialists, providing an incentive for pregnant women to visit health services early on.
- Samsung is to launch upgraded voice assistant Bixby this year with its own AI. Samsung will launch an upgraded version of its voice assistant Bixby this year based on its own artificial intelligence models, mobile chief TM Roh told CNBC.
- Google says Gemini AI is making its robots smarter. DeepMind is using video tours and Gemini 1.5 Pro to train robots to navigate and complete tasks.
- Here’s how Qualcomm’s new laptop chips really stack up to Apple, Intel, and AMD. The Snapdragon X Elite and X Plus chips from Qualcomm are making Windows on Arm a competitive platform, roughly matching the performance and battery life of AMD Ryzen, Apple’s M3 chip, and Intel Core Ultra. The Snapdragon chips excel in multi-core scores and power efficiency, even though they don’t lead in GPU performance. The latest generation of laptops with Snapdragon processors is a more affordable option than MacBooks and conventional Intel or AMD-based devices.
- China’s Laws of Robotics: Shanghai publishes first humanoid robot guidelines. Shanghai has published China’s first governance guidelines for humanoid robots, calling for risk controls and international collaboration, as tech giants like Tesla showed off their own automatons at the country’s largest artificial intelligence (AI) conference.
- Crowdsourced Decentralized AI Market Map. Open-sourcing a community-led market map of Decentralized AI.
Resources
- CapPa: Training vision models as captioners. Craiyon’s trained CapPa vision model achieves state-of-the-art results on several difficult vision benchmarks.
- Kolors: Effective Training of Diffusion Model for Photorealistic Text-to-Image Synthesis. Trained on billions of text-image pairs, Kolors exhibits significant advantages over both open-source and proprietary models in visual quality, complex semantic accuracy, and text rendering for both Chinese and English characters.
- EGIInet: Explicitly Guided Information Interaction Network for Cross-modal Point Cloud Completion. Using geometric task guidance, EGIInet successfully combines the two modalities, presenting a novel approach to point cloud completion.
- Quality Prompts. QualityPrompts implements 58 prompting techniques explained in this survey from OpenAI, Microsoft, et al.
- Summary of a Haystack: A Challenge to Long-Context LLMs and RAG Systems. Describes a new task, SummHay, that evaluates a model’s capacity to process a “haystack” of documents and produce a summary that highlights the key insights and cites the source documents; finds that RAG components improve performance on the benchmark, making it a feasible choice for holistic RAG evaluation. Long-context LLMs score 20% on the benchmark, well below the estimated human performance of 56%.
- AI Agents That Matter. Examines existing agent evaluation procedures and identifies flaws that could prevent practical deployment; it also suggests a framework to prevent overfitting agents and an implementation that jointly optimizes accuracy and cost.
- An Extremely Opinionated Annotated List of My Favourite Mechanistic Interpretability Papers v2. A post by Neel Nanda, a Research Engineer at Google DeepMind, about his favorite papers to read in Mechanistic Interpretability.
- SAE. This library trains k-sparse autoencoders (SAEs) on the residual stream activations of HuggingFace language models, roughly following the recipe detailed in Scaling and Evaluating Sparse Autoencoders (Gao et al., 2024); a minimal sketch of such an autoencoder follows this list.
- MInference. Speeds up long-context LLM inference with approximate, dynamic sparse attention, reducing pre-filling latency by up to 10x on an A100 while maintaining accuracy.
- micro-agent. An AI agent that writes and fixes code for you.
- AnySR. AnySR is a novel method for improving efficiency and scalability in single-image super-resolution (SISR). Unlike previous techniques, it supports an ‘Any-Scale, Any-Resource’ design that reduces resource requirements at smaller scales without the need for extra parameters.
- Unsupervised Learning of Category-Level 3D Pose from Object-Centric Videos. Researchers have created a novel method for estimating category-level 3D poses from casually captured, object-centric videos without human supervision.
- SenseVoice. A speech foundation model with a variety of speech-understanding capabilities, including auditory event detection, spoken language identification, automatic speech recognition, and speech emotion recognition.
- Boosting Large Vision Language Models with Self-Training. A novel method called Video Self-Training with Augmented Reasoning (Video-STaR) aims to enhance Large Vision Language Models (LVLMs).
- GraphRAG. With GraphRAG, you may use language models to analyze unstructured text. The quick start is simple to spin up because it operates on Azure.
- iLLM-TSC. To enhance traffic signal control systems, researchers have created a novel framework that blends reinforcement learning with a large language model.
- Tutorials on Tinygrad. Tinygrad is a set of tools for training deep-learning models. This set of notes offers an in-depth look at Tinygrad internals and serves as an excellent introduction to AI compilers.
- OccSora: 4D Occupancy Generation Models as World Simulators for Autonomous Driving. OccSora, a diffusion-based 4D occupancy generation model, is intended to simulate the long-term temporal evolution of driving scenes.
- Awesome AGI Survey. The goal of Artificial General Intelligence (AGI) is to perform a wide variety of real-world tasks with human-like proficiency. This project explores the path towards AGI.
- ANOLE: An Open, Autoregressive, Native Large Multimodal Models for Interleaved Image-Text Generation. Developed from Meta’s Chameleon model, Anole is an open autoregressive multimodal model. With focused fine-tuning, this effort restores the model’s ability to generate images.
- Powerful and Flexible: Personalized Text-to-Image Generation via Reinforcement Learning. A novel reinforcement learning framework is presented by researchers to enhance customized text-to-image generation.
- PerlDiff: Controllable Street View Synthesis Using Perspective-Layout Diffusion Models. PerlDiff is a technique that incorporates 3D geometric information to increase the accuracy of street view image production.
- Paints-Undo. Paints UNDO is a system where a model generates strokes that are used to reconstruct an image. It comes from the same creators as ControlNet, IC-Light, and many other image production systems. Remarkably, in contrast to earlier stroke systems, this model is able to cancel strokes and frequently completely reevaluates its strategy halfway through — quite like a human artist would.
- minRF. A rudimentary, minimal implementation of the scalable rectified flow transformers that are partially utilized in Stable Diffusion 3, along with sweeps of the muP hyperparameters.
- RouteLLM. RouteLLM is a framework for serving and evaluating LLM routers.
- 30x speedup in model init for HF Transformers. Making some of the model loading lazy on the first pass can dramatically reduce the time lost during model initialization (a hedged sketch of the general idea follows this list).
- FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision. FlashAttention underpins contemporary fast language models. This new version reaches 75% of H100 utilization, up from 35% previously, thanks to several significant systems improvements.
- OV-DINO: Unified Open-Vocabulary Detection with Language-Aware Selective Fusion. A novel approach to open-vocabulary detection called OV-DINO addresses the difficulties of combining various data sources and making use of language-aware capabilities.
- Open-Vocabulary Video Instance Segmentation. An innovative approach to open-vocabulary video instance segmentation (VIS), OVFormer tackles important problems in the field. It uses video-based training to increase temporal consistency and better align embeddings.
- Satellite Image Time Series Semantic Change Detection: Novel Architecture and Analysis of Domain Shift. This work integrates semantic segmentation and change detection to address semantic change detection using satellite image time series (SITS-SCD).
- PosFormer: Recognizing Complex Handwritten Mathematical Expression with Position Forest Transformer. The PosFormer model overcomes the drawbacks of sequence-based methods to greatly enhance Handwritten Mathematical Expression Recognition (HMER).
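As a companion to the SAE entry above, here is a minimal sketch of a k-sparse autoencoder of the kind that library trains. The layer sizes, the top-k value, and the training objective shown are illustrative assumptions; the library's actual API and recipe differ.

```python
# Minimal k-sparse autoencoder sketch (shapes and hyperparameters are illustrative).
import torch
import torch.nn as nn

class KSparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, n_latents: int, k: int):
        super().__init__()
        self.k = k
        self.encoder = nn.Linear(d_model, n_latents)
        self.decoder = nn.Linear(n_latents, d_model)

    def forward(self, x: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
        # Encode, then keep only the top-k activations per example (all others set to 0).
        pre_acts = self.encoder(x)
        topk = torch.topk(pre_acts, self.k, dim=-1)
        latents = torch.zeros_like(pre_acts).scatter_(-1, topk.indices, topk.values.relu())
        recon = self.decoder(latents)
        return recon, latents

# Toy usage on random "residual stream" activations.
sae = KSparseAutoencoder(d_model=768, n_latents=768 * 16, k=32)
acts = torch.randn(4, 768)
recon, latents = sae(acts)
loss = (recon - acts).pow(2).mean()  # reconstruction objective; sparsity comes from the top-k
```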
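For the model-init speedup entry above, the following hedged sketch shows the general lazy-initialization idea: avoid randomly initializing weights that the checkpoint will overwrite anyway. The linked post's exact patch may differ; `low_cpu_mem_usage` is a standard Hugging Face mechanism for this, and `gpt2` is just an illustrative model id.

```python
# Hedged sketch of lazy model initialization with Hugging Face Transformers.
import torch
from transformers import AutoModelForCausalLM

model_id = "gpt2"  # illustrative model

# Slow path: weights are randomly initialized first, then overwritten by the checkpoint.
# model = AutoModelForCausalLM.from_pretrained(model_id)

# Faster path: build the model skeleton without materializing random weights,
# then load the checkpoint tensors directly into it.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    low_cpu_mem_usage=True,      # skips redundant random init, loads weights lazily
    torch_dtype=torch.float16,   # also avoids an extra full-precision copy
)
```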
Perspectives
- Real criminals, fake victims: how chatbots are being deployed in the global fight against phone scammers. The new scambaiting AI technology Apate aims to keep scammers on the line while collecting data that could help disrupt their business model.
- James Muldoon, Mark Graham, and Callum Cant: ‘AI feeds off the work of human beings’. The Fairwork trio talk about their new book on the ‘extraction machine’, exposing the repetitive labor, often in terrible conditions, that big tech is using to create artificial intelligence.
- Superintelligence — 10 years later. Ten years after the publication of Nick Bostrom’s seminal book “Superintelligence,” advances in AI have raised awareness of the potential for AGI and its associated risks. The AI research community is now giving AI safety serious attention, with 2024 shaping up as a turning point for work on control and alignment with human values. With AI technologies advancing so quickly, the field faces safety and ethics concerns that were previously thought to be purely theoretical.
- How Good Is ChatGPT at Coding, Really? Depending on the task difficulty and programming language, OpenAI’s ChatGPT may generate code with success rates anywhere from less than 1% to 89%.
- TechScape: Can AI really help fix a healthcare system in crisis? Artificial intelligence is heralded as helping the NHS fight cancer. But some warn it’s a distraction from more urgent challenges.
- Pop Culture. In a critical 31-page analysis titled “Gen AI: Too Much Spend, Too Little Benefit?”, Goldman Sachs argues that generative AI’s power consumption will drive utility spending up sharply while delivering very little in productivity gains or returns. The study questions AI’s potential to transform industries, highlighting its high cost, the strain it puts on electrical infrastructure, and its failure so far to produce appreciable increases in productivity or revenue. If significant technological advances are not made, it could portend a dismal future for the field.
- The AI summer. Compared with other tech innovations like the iPhone and e-commerce, which took years to take hold, ChatGPT’s rapid adoption (it hit 100 million users in just two months) is remarkable. Yet despite the initial excitement, few users have found ChatGPT useful over the long run, and enterprise adoption of large language models remains limited. This suggests that more work is needed to establish substantial product-market fit and long-term value.
- A Deep Dive on AI Inference Startups. AI’s “picks and shovels”, such as model fine-tuning, observability, and inference, are a popular area for venture capital investment. VCs are betting that as businesses integrate AI into their products, they won’t want to build this tooling themselves. Today, however, the TAM for AI inference is quite limited, so these investments only pay off if significant TAM expansion materializes. And although AI inference platforms benefit startups in the short run, over the long run they may hurt them.
- Cyclists can’t decide whether to fear or love self-driving cars. San Francisco cyclists have reported near misses and safety concerns with self-driving cars from Waymo and Cruise. Almost 200 complaints about these self-driving cars’ unpredictable behavior and near-misses have been filed with the California DMV. Despite the manufacturers’ claims that their cars had improved safety features, the events cast doubt on the vehicles’ suitability for widespread use in the face of heightened regulatory scrutiny.
- Augmenting Intelligence. This essay promotes a practical approach to using AI as an enhancement to human intelligence and explores bridging the divide between techno-optimists and pessimists. It discusses AI’s role in education, its effects on creativity and the arts, and its ethical application. The essay argues that AI is a tool that augments human capabilities rather than a threat, suggesting that “augmented intelligence” is the more realistic description.
Meme of the week
What do you think about it? Did any news capture your attention? Let me know in the comments.
If you have found this interesting:
You can look for my other articles, and you can also connect with or reach me on LinkedIn. Check this repository containing weekly updated ML & AI news. I am open to collaborations and projects. You can also subscribe for free to get notified when I publish a new story.
Here is the link to my GitHub repository, where I am collecting code and many resources related to machine learning, artificial intelligence, and more.
or you may be interested in one of my recent articles: