WEEKLY AI NEWS: RESEARCH, NEWS, RESOURCES, AND PERSPECTIVES
AI & ML news: Week 1–7 July
Gemini on Apple, Grok 2 announced and much more
The most interesting news, repository, articles, and resources of the week
Check and star this repository where the news will be collected and indexed:
You will find the news first in GitHub. Single posts are also collected here:
Research
- LongRAG: Enhancing Retrieval-Augmented Generation with Long-context LLMs. claims to achieve 64.3% on HotpotQA (full-wiki), which is on par with the state-of-the-art model. proposes LongRAG, which combines RAG with long-context LLMs to enhance performance; uses a long retriever to significantly reduce the number of extracted units by operating on longer retrieval units; the long reader takes in the long retrieval units and leverages the zero-shot answer extraction capability of long-context LLMs to improve performance of the overall system.
- From Artificial Needles to Real Haystacks: Improving Retrieval Capabilities in LLMs by Finetuning on Synthetic Data. suggests a fine-tuning strategy to increase the precision of information retrieval in LLMs while preserving reasoning abilities over long-context inputs; the fine-tuning dataset consists of 350 numerical dictionary key-value retrieval examples; results show that this strategy reduces the “lost-in-the-middle” effect and enhances performance on both long-context reasoning and information retrieval.
- GraphReader: Building Graph-based Agent to Enhance Long-Context Abilities of Large Language Models. enhances the long-context capabilities of LLMs by proposing a graph-based agent system that organizes long text into a graph and uses an agent to explore the graph (using predefined functions guided by a step-by-step rational plan) to efficiently generate answers to questions; consistently outperforms GPT-4–128k across context lengths ranging from 16k to 256k.
- Following Length Constraints in Instructions. explains a method for addressing length bias and training language models that adhere to length constraints more closely; it fine-tunes a model with DPO on a dataset augmented with length instructions and demonstrates fewer length-constraint violations while maintaining high response quality.
- Adam-mini: Use Fewer Learning Rates To Gain More. a new optimizer that carefully divides parameters into blocks and assigns each block a single high-quality learning rate; it achieves consistent results on language models sized from 125M–7B for pre-training, SFT, and RLHF. By using far fewer learning rates, it cuts the memory footprint by 45%–50% while performing on par with or better than AdamW.
- MUMU: Bootstrapping Multimodal Image Generation from Text-to-Image Data. a generative image model that outperforms purely text-conditioned models thanks to its ability to interleave text and images in its conditioning.
- Scaling Synthetic Data Creation with 1,000,000,000 Personas. By treating web text as originating from a persona and conditioning generation on that persona, this approach can significantly enhance downstream task performance. The researchers report a jump of 20 percentage points on MATH.
- Odd-One-Out: Anomaly Detection by Comparing with Neighbors. Researchers present a novel anomaly detection challenge that focuses on objects that appear unusual compared with the other objects in a scene. In contrast to conventional settings, anomalies here are specific to the scene and can be determined from several viewpoints.
- Adaptable Logical Control for Large Language Models. This approach enables control of model generation at inference time, as well as interactive text editing. It achieves strong performance with tiny models and permits logical constraints in the generation process.
- Pairwise Difference Learning for Classification. Scholars have expanded Pairwise Difference Learning (PDL), which was first developed as a regression method, to include classification tasks. PDL makes predictions about the differences between pairs of instances rather than the outcomes themselves.
- AXIAL. This research improves the explainability of model decisions by putting forth a novel technique for identifying Alzheimer’s disease using 3D MRI scans.
- Multi-Session SLAM with Differentiable Wide-Baseline Pose Optimization. A novel technique called Multi-Session SLAM creatively records camera movements throughout multiple disconnected video sequences using a single global frame of reference.
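The core idea behind the Pairwise Difference Learning entry above is simple enough to sketch directly: train a model on feature differences of instance pairs to predict whether their labels agree, then classify a test point by aggregating predicted agreement against labeled anchors. The code below is a toy illustration of the concept, not the authors' implementation; the distance-decay pair model is a hypothetical stand-in where a real PDL setup would fit any off-the-shelf classifier.

```python
import numpy as np

def pdl_predict(X_train, y_train, X_test, fit_pair_model):
    """Toy Pairwise Difference Learning for classification (illustrative sketch).

    Instead of predicting labels directly, a base model is trained on feature
    differences (x_i - x_j) with an "agreement" target (1 if the labels match,
    0 otherwise). A test point is then classified by summing predicted
    agreement with every labeled training anchor, per class.
    """
    # Build the pairwise training set: difference features, agreement target.
    n = len(X_train)
    idx_i, idx_j = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    diffs = X_train[idx_i.ravel()] - X_train[idx_j.ravel()]
    agree = (y_train[idx_i.ravel()] == y_train[idx_j.ravel()]).astype(int)
    pair_model = fit_pair_model(diffs, agree)

    preds = []
    classes = np.unique(y_train)
    for x in X_test:
        # Predicted probability that x shares each anchor's label.
        p_agree = pair_model(x - X_train)
        # Accumulate agreement evidence per class; pick the best-supported one.
        scores = [p_agree[y_train == c].sum() for c in classes]
        preds.append(classes[int(np.argmax(scores))])
    return np.array(preds)

def fit_pair_model(diffs, agree):
    # Minimal stand-in "learner": agreement likelihood decays with distance.
    # A real PDL setup would fit an arbitrary classifier on (diffs, agree).
    return lambda D: np.exp(-np.linalg.norm(D, axis=1))

# Usage on a trivially separable 1-D dataset:
X_train = np.array([[0.0], [0.1], [5.0], [5.1]])
y_train = np.array([0, 0, 1, 1])
preds = pdl_predict(X_train, y_train, np.array([[0.05], [5.05]]), fit_pair_model)
# → [0, 1]
```

One appeal of the pairwise formulation is that every labeled anchor contributes a vote, which can stabilize predictions on small datasets at the cost of quadratic pair construction.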
News
- An Update to Adept. The founders of Adept are heading to Amazon to license some of their technology.
- Time strikes a deal to funnel 101 years of journalism into OpenAI’s gaping maw. Time has joined a growing number of publications to sign a licensing deal with OpenAI. The ChatGPT creator will legally be able to train its large language models on 101 years’ worth of the storied publication’s journalism, as Axios first reported.
- Amazon Investigates Perplexity AI Over Potential Data-Scraping Violations. Amazon Web Services is looking into whether Perplexity is breaking its rules after Wired said the AI startup is swiping its web archives without consent. Perplexity, however, says it’s following the rules.
- Apple could announce a Google Gemini deal this fall. If you’re disappointed that the only AI model that will integrate with Apple devices so far will be ChatGPT, it sounds like you won’t have to wait long for that to change. Apple will announce “at least” one other deal — to add Google Gemini, too — this fall.
- Meta accused of breaking EU digital law by charging for ad-free social networks. European Commission objects to ‘pay or consent’ model for users of Facebook and Instagram
- Microsoft’s Mustafa Suleyman says he loves Sam Altman, believes he’s sincere about AI safety. In an interview at the Aspen Ideas Festival on Tuesday, Mustafa Suleyman, CEO of Microsoft AI, made it very clear that he admires OpenAI CEO Sam Altman.
- When the Terms of Service Change to Make Way for A.I. Training. As they negotiate a complicated web of privacy regulations and user consent, tech giants like Google and Meta are revising their privacy rules to allow the use of public and potentially private user data to train AI systems. There has been a backlash since consumers and content creators are afraid that their work will be used to train AI that may eventually replace them. The conflicts draw attention to new issues in data privacy, AI development, and striking a balance between innovation and morality in the IT sector.
- Meet Figma AI. Designers may get assistance with tasks like visual search, asset search, text editing, image editing, prototyping, layer renaming, and design generation with Figma AI, a new suite of AI-powered capabilities for Figma. During the beta phase, these features — which are driven by AI models from third parties — are free to use.
- Google’s emissions climb nearly 50% in five years due to AI energy demand. Tech giant’s goal of reducing climate footprint at risk as it grows increasingly reliant on energy-hungry data centers
- Amazon beefs up AI development, hiring execs from startup Adept and licensing its technology. Amazon has hired top executives from AI agent startup Adept, the company confirmed. As part of the deal, Amazon will license technology from Adept, including some of its AI models and datasets. Amazon has been trying to keep pace with competitors in AI by developing services and through its investment in OpenAI competitor Anthropic.
- YouTube now lets you request removal of AI-generated content that simulates your face or voice. YouTube also quietly rolled out a policy change in June that will allow people to request the takedown of AI-generated or other synthetic content that simulates their face or voice. The change allows people to request the removal of this type of AI content under YouTube’s privacy request process.
- Phil Schiller to join OpenAI board in ‘observer’ role following Apple’s ChatGPT deal. At WWDC last month, Apple announced its partnership with OpenAI to integrate ChatGPT into iOS 18. While no money is changing hands between Apple and OpenAI, a new report today reveals that Apple will get an “observer role” on OpenAI’s board of directors as part of the arrangement.
- Japan introduces enormous humanoid robot to maintain train lines. The 12-metre high machine has coke bottle eyes and a crude Wall-E-like head, as well as large arms that can be fitted with blades or paint brushes
- Elon Musk: Grok 2 AI Arrives in August. Musk says Grok 2 ‘should exceed current AI on all metrics,’ though Grok 3 is waiting in the wings.
- Nvidia CEO Jensen Huang addresses rising competition at shareholder meeting after historic stock surge. Nvidia CEO Jensen Huang answered questions at the company’s annual shareholder meeting after a more than 200% surge in the stock over the past year. The company passed a $3 trillion valuation and was briefly the most valuable public company. Without naming competitors, Huang laid out the company’s overall strategy to maintain its position.
- Persona’s founders are certain the world can use another humanoid robot. MIT research scientist Jerry Pratt is back at it. In 2022, he left Boardwalk Robotics, a humanoid startup he founded and led, and joined the well-funded ranks of the Bay Area-based robotics firm Figure as its CTO months before it exited stealth. But he and Figure quietly parted ways last month.
- Kyutai unveils today the very first voice-enabled AI openly accessible to all. Kyutai, an open research lab in France, has trained a low-latency, pure-audio LLM. The impressive demo it has produced will be made available for public use in the coming months.
- Face screening tool detects stroke in seconds. A new smartphone face-screening tool could help paramedics to identify stroke in seconds — much sooner and more accurately than is possible with current technologies.
- This is Big Tech’s playbook for swallowing the AI industry. With Amazon’s hiring of the team behind a buzzy AI startup, a pattern is emerging: the reverse acquihire.
- Intel shows off first fully integrated optical compute interconnect, designed to scale up AI workloads. Intel Corp. said today it has achieved another key milestone as it strives to make integrated photonics technology for high-speed data transfers a reality.
- OpenAI’s ChatGPT Mac app was storing conversations in plain text. After the security flaw was spotted, OpenAI updated its desktop ChatGPT app to encrypt the locally stored records.
- Jeff Bezos to sell $5bn of Amazon shares after stock hits record high. Proposed sale of 25m shares disclosed in a notice on Tuesday after the stock hit an all-time high of $200.43 during session
- Wimbledon employs AI to protect players from online abuse. Threat Matrix service monitors social media profiles and flags up death threats, racism and sexist comments
Resources
- EAGLE-2: Faster Inference of Language Models with Dynamic Draft Trees. speeds up LLM inference via speculative decoding with a context-aware dynamic draft tree; rather than using a fixed tree, it adjusts the draft-tree structure based on the draft model's confidence scores, improving acceptance rates and delivering higher speedups than EAGLE while leaving the output distribution unchanged.
- On LLMs-Driven Synthetic Data Generation, Curation, and Evaluation: A Survey. survey on LLM-based synthetic data generation, curation, and evaluation.
- Text2Bricks: Fine-tuning Open-Sora in 1,000 GPU Hours. Lambda Labs trained the Open Sora video model on its 1-click cluster to create Lego movies.
- Laplace Neural Operator. The Laplace Neural Operator is a neural-network architecture for approximating solutions of partial differential equations.
- llama-agents. llama-agents is an async-first framework for building, iterating, and productionizing multi-agent systems, including multi-agent communication, distributed tool execution, human-in-the-loop, and more!
- Suri: Multi-constraint Instruction Following for Long-form Text Generation. Suri is a collection of 20,000 long-form documents paired with complex, multi-constraint instructions, aimed at improving AI's capacity to follow intricate writing requirements. The Suri team also presents Instructional ORPO (I-ORPO), an alignment technique that derives feedback from synthetically corrupted instructions.
- Cambrian-1. A high-performing, fully open vision-centric multimodal model from NYU, with significant improvements driven by its study of vision encoders and data mixtures.
- DEX-TTS: Diffusion-based EXpressive Text-to-Speech with Style Modeling on Time Variability. A novel expressive text-to-speech (TTS) model called DEX-TTS makes use of reference speech to enhance style representation and model generalization.
- Debugging in PyTorch. PyTorch is an excellent modeling tool. Nonetheless, a few prevalent issues have the ability to significantly lower model performance. Examining this list will aid you when debugging your model code.
- vision-agent. Vision Agent is a library that helps you utilize agent frameworks to generate code to solve your vision task.
- What to do to scale up? An amazing and surprisingly understandable post about fine-tuning hyperparameters as model and dataset sizes increase.
- Web2Code. Researchers have created a new pipeline to improve Web2Code instruction tuning. It entails generating new webpage image-code pairs, creating new text question-answer pairs, refining webpage-understanding data, and building new webpage code-generation pairs.
- Block Transformer: Global-to-Local Language Modeling for Fast Inference. This repository presents a brand-new Transformer variant with a significantly smaller KV cache. Although it hasn't been tested at scale, it should perform on par with standard Transformers.
- Composio. Equip your agent with high-quality tools & integrations without worrying about authentication, accuracy, and reliability in a single line of code!
- Segment Anything without Supervision. Unsupervised SAM (UnSAM) is a ‘segment anything’ model for promptable and automatic whole-image segmentation which does not require human annotations.
- Following Length Constraints in Instructions. Most models don’t adhere to length specifications (less than 40 words, for example). This piece demonstrates how to tune them to do that.
- AI Overviews Research: Comparing pre and post-rollout results on 100K keywords. The prevalence of Google's AI Overviews (AIO) feature, which typically links to the top 10 organic results, has dropped sharply from 64% of SERPs pre-rollout to just 8.71% across 100K keywords. Since the rollout, both the length of AIO content and the number of links have grown, reflecting Google's focus on thorough responses and reliable sources. AI-generated results are now more likely to appear for longer queries with lower search volumes and lower CPC, so SEO strategies must adapt to stay relevant.
- Meta 3D Gen. Meta has trained both a PBR texture generation system and an advanced 3D object generation model. It generates synthetic data using the company's proprietary 2D image generation model.
- Mutahunter. An open-source, LLM-based mutation testing tool for automated software testing that is independent of language.
- LLaRA: Large Language and Robotics Assistant. LLaRA is a framework that leverages conversation-style instruction-response pairings and Large Language Models (LLMs) to enhance robot action policy. These Vision Language Models (VLMs) use visual inputs to evaluate state data and produce the best possible policy choices.
- MM-Instruct: Generated Visual Instructions for Large Multimodal Model Alignment.
- Parable of the Parser. Great keynote talk from CVPR.
- InstantStyle-Plus : Style Transfer with Content-Preserving in Text-to-Image Generation. Style transfer with modern diffusion models and content embedders.
- RSCaMa: Remote Sensing Image Change Captioning with State Space Model. A novel technique called RSCaMa has been presented by researchers to use natural language to describe changes in remote sensing photographs.
- Simple Diffusion Language Models. Excellent talk about utilizing diffusion as a target for language modeling by Hugging Face researcher and Cornell Tech professor Sasha Rush.
- 3D Reconstruction from Blurry Images. Researchers have created a technique that uses neural radiance fields (NeRF) and event streams to reconstruct three-dimensional scenes from a single blurry image. This novel method eliminates the requirement for pre-computed camera poses by modeling camera motion and synthesizing brightness changes to produce high-quality, view-consistent images from blurry inputs.
- Agentless. Agentless is an agentless approach to automatically solve software development problems. To solve each issue, Agentless follows a simple two-phase process: localization and repair.
- MInference 1.0: Accelerating Pre-filling for Long-Context LLMs via Dynamic Sparse Attention. A novel technique called MInference speeds up the processing of long prompts in large language models; it uses dynamic sparse attention to avoid the considerable delays introduced by conventional dense approaches.
- torch.compile, the missing manual. Manual for resolving torch.compile errors to make your code run faster.
- facebook/multi-token-prediction. Meta has released the models for its multi-token prediction work, and they perform remarkably well.
- Maestro — A Framework for Claude Opus, GPT and local LLMs to Orchestrate Subagents. This Python script demonstrates an AI-assisted task breakdown and execution workflow using the Anthropic API. It utilizes two AI models, Opus and Haiku, to break down an objective into sub-tasks, execute each sub-task, and refine the results into a cohesive final output.
- Magic Insert: Style-Aware Drag-and-Drop. Method from Google to introduce meaningful items into photos with diffusion. The demo and dataset are accessible.
- Discrete Semantic Tokenization for Deep CTR Prediction. UIST is a unique method that transforms dense embeddings into discrete, compact tokens for user and item representations, therefore significantly improving click-through rate estimates.
- CELLO: Causal Evaluation of Large Vision-Language Models. With 14,094 causal questions, CELLO is a new dataset designed to help AI understand causality beyond common sense thinking.
- OpenStreetView-5M. With more than 5 million geotagged street photos from 225 countries, OpenStreetView-5M is a sizable open-access dataset aimed at evaluating computer vision techniques for picture localization.
- PTQ4SAM: Post-Training Quantization for Segment Anything. A new framework called PTQ4SAM was created to lessen the memory and processing requirements of the large-scale Segment Anything Model (SAM).
- Boosting Smartphone Camera Clarity. In this study, a self-supervised learning model that enhances reference-based super-resolution (RefSR) is used to present a technique for improving smartphone image resolution.
- An Investigation of Incorporating Mamba for Speech Enhancement. SEMamba is a novel speech enhancement system that enhances voice signal clarity by utilizing the Mamba state-space model.
- Florence 2 on WebGPU. The tiny vision model now runs fully in the browser via ONNX and WebGPU.
- FlexiFilm: Long Video Generation with Flexible Conditions. A diffusion model called FlexiFilm was created expressly to produce long videos — more than 30 seconds — with excellent quality and consistency.
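The planner/worker pattern described in the Maestro entry above reduces to a short loop: a stronger model decomposes the objective, a cheaper model executes each sub-task, and the stronger model refines the combined output. The sketch below is a hedged illustration of that pattern, not Maestro's actual code; `call_model(role, prompt)` is a hypothetical stand-in for real API calls (in Maestro the planner would be Claude Opus and the worker Claude Haiku), and the prompt wording is illustrative.

```python
def orchestrate(objective, call_model):
    """Minimal planner/worker orchestration loop in the spirit of Maestro.

    `call_model(role, prompt)` abstracts the LLM call so the control flow
    can be shown (and tested) without any API access. This is a sketch of
    the pattern, not Maestro's implementation.
    """
    # 1. Planner decomposes the objective into sub-tasks (one per line).
    plan = call_model("planner", f"Break into sub-tasks:\n{objective}")
    sub_tasks = [line.strip() for line in plan.splitlines() if line.strip()]

    # 2. Worker executes each sub-task independently.
    results = [call_model("worker", task) for task in sub_tasks]

    # 3. Planner refines the combined results into a cohesive final answer.
    combined = "\n".join(results)
    return call_model("planner", f"Refine into a final answer:\n{combined}")
```

Because the model calls are injected, the same loop can be exercised with a deterministic stub for `call_model` before wiring in a real client.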
Perspectives
- Smudgy chins, weird hands, dodgy numbers: seven signs you’re watching a deep fake. Look out for surplus fingers, compare mannerisms with real recordings and apply good old-fashioned common sense and skepticism, experts advise
- Training MoEs at Scale with PyTorch. The Mosaic team has teamed up with PyTorch to write about scaling their MoE models to thousands of GPUs.
- Investing in the Age of Generative AI. Though there is currently a “euphoria” surrounding investment, the generative AI business is already showing signs of fragility.
- Can AI boom drive Nvidia to a $4tn valuation despite investor doubt? Powerful new chips are on the way but there are questions over whether tech firm’s growth can be sustained
- AI scaling myths. It is improbable that LLMs will achieve AGI through scaling alone. Although scaling has been found to improve model capabilities, it largely improves perplexity rather than producing emergent skills. High-quality training data is also getting harder and harder to obtain.
- A discussion of discussions on AI bias. The nature of AI bias has come under more scrutiny, with critics claiming that biases in machine learning are demonstrated by the way models like Playground AI occasionally change a user’s ethnicity in photos. Some users dispute whether this is a flaw or a relevant bias, pointing to instances in which Asian traits are overrepresented. The discussion touches on the wider ramifications of AI bias across many industries; there is no easy answer to this complicated problem.
- The shape of information. This article describes how to use binary logic to maximize scarce resources.
- why we no longer use LangChain for building our AI agents. Octomind’s codebase and team productivity improved after it eschewed the LangChain framework for AI test automation in favor of more straightforward, modular building blocks. It found LangChain’s high-level abstractions rigid, making development and maintenance more difficult. After changing strategy, Octomind now benefits from a leaner architecture and faster iteration for its AI-agent tasks.
- The Five Stages Of AI Grief. Benjamin Bratton, a professor at the University of California, San Diego and director of the Antikythera program at the Berggruen Institute, refers to the global response to artificial intelligence as a “Copernican Trauma,” comparing it to historical changes that have reshaped humanity’s understanding of itself. Bratton offers the following five stages of “AI grief” to describe how society would react to AI’s evolution: from skepticism to integration into our conception of intelligence: denial, rage, bargaining, depression, and acceptance. He contends that rather than being a uniquely human story, the integration of AI represents a larger biological and technological evolutionary process.
- How to win at Enterprise AI — A playbook. This AI-focused playbook describes AI adoption methods for enterprises, emphasizing the move from human-performed services to software-driven workflows known as “Service-as-a-software.” It explores how these changes may affect business models, including performance-based pricing, and stresses how crucial workflow capture and AI accuracy are to the implementation process’s success. The handbook also covers threats such as lateral attacks and emphasizes that in enterprise contexts, AI must show real performance, not simply potential.
- AI is disrupting Customer Support. Salesforce is feeling the pinch. Customer support software providers like Salesforce and Zendesk are facing challenges as enterprises redirect their IT spending toward AI proof-of-concept projects. For traditional software suppliers, the increasing integration of solutions such as ChatGPT in customer assistance has resulted in longer payback periods due to higher customer acquisition expenses. The creativity of these businesses and the overall macroeconomic climate will determine how much money is invested in customer support software in the future.
- Contra Acemoglu on AI. In contrast to more positive projections, economist Daron Acemoglu’s working paper on AI proposes a modest 0.06% annual rise in TFP growth. He identifies four distinct ways that AI affects productivity, but he ignores the development of new labor-intensive goods and the further automation of existing processes, perhaps underestimating the economic potential of AI. His method is criticized for being unduly restrictive and for perhaps distorting the wider socioeconomic effects of AI developments.
- Inside the maths that drives AI. Loss functions measure algorithmic errors in artificial intelligence models, but there’s more than one way to do that. Here’s why the right function is so important.
- ‘The disruption is already happening!’ Is AI about to ruin your favorite TV show? It won’t be long till everything from Drag Race to Keeping Up With the Kardashians could be written without humans — and you might be able to write yourself in as the hero of a new show. But will robot TV ever be up to snuff?
- Can the climate survive the insatiable energy demands of the AI arms race? New computing infrastructure means big tech is likely to miss emissions targets but they can’t afford to get left behind in a winner takes all market
- Our attitudes towards AI reveal how we feel about human intelligence. We’re in the untenable position of regarding the AI as alien because we’re already in the position of alienating each other
Meme of the week
What do you think about it? Some news that captured your attention? Let me know in the comments
If you have found this interesting:
You can look for my other articles, and you can also connect with or reach me on LinkedIn. Check this repository containing weekly updated ML & AI news. I am open to collaborations and projects. You can also subscribe for free to get notified when I publish a new story.
Here is the link to my GitHub repository, where I am collecting code and many resources related to machine learning, artificial intelligence, and more.
or you may be interested in one of my recent articles: