WEEKLY AI NEWS: RESEARCH, NEWS, RESOURCES, AND PERSPECTIVES

AI & ML news: Week 29 July — 4 August

OpenAI Faces Massive Financial Challenges Despite High Revenue, Llama 3.1 Launches with Advanced Capabilities, and much more

Salvatore Raieli
19 min read · Aug 6, 2024
Photo by Flipboard on Unsplash

The most interesting news, repositories, articles, and resources of the week

Check and star this repository where the news will be collected and indexed:

You will find the news first on GitHub. Single posts are also collected here:

Weekly AI and ML news - each week the best of the field


Research

  • Retrieval Augmented Generation or Long-Context LLMs? A Comprehensive Study and Hybrid Approach. Compares RAG with long-context (LC) LLMs and finds that while RAG is much less expensive, LC LLMs perform better on average. Proposes Self-Route, which uses the model's own self-reflection to route each query to either RAG or LC, and reports a substantial reduction in computational cost at performance comparable to LC; a minimal routing sketch appears after this list.
  • Recursive Introspection: Teaching Language Model Agents How to Self-Improve. Shows that LLMs can be iteratively fine-tuned to improve their own responses over multiple turns with additional feedback from the environment; the model learns to recursively detect and correct its past mistakes in subsequent iterations. This enhances the self-improvement abilities of 7B models on reasoning tasks (GSM8K and MATH), achieving gains over turns that are not observed in strong proprietary models.
  • LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference. Presents a dynamic token pruning technique for efficient long-context LLM inference that speeds up the prefilling stage of a Llama 2 7B model by 2.34x while maintaining high accuracy. In both the prefilling and decoding stages, it computes the KV only for tokens that matter for the next-token prediction, and it lets the model select different subsets of context tokens at different generation steps, even tokens that were pruned in an earlier step; see the pruning sketch after this list.
  • Generation Constraint Scaling Can Mitigate Hallucination. Suggests a training-free method to reduce hallucinations in LLMs by scaling the readout vector that constrains generation in a memory-augmented LLM decoder. Prior work suggests that LLMs with explicit memory mechanisms hallucinate less; this work builds on that by applying lightweight memory primitives to constrain generation in the decoder.
  • Align and Distill: Unifying and Improving Domain Adaptive Object Detection. A new method named ALDI addresses the difficulty of getting object detection models to perform well on domains they were not initially trained on.
  • Small Molecule Optimization with Large Language Models. By training on a newly gathered dataset of 100 million molecules (equivalent to roughly 40 billion tokens), two new language models improved performance on the Practical Molecular Optimization benchmark by 8%.
  • The Larger the Better? Improved LLM Code-Generation via Budget Reallocation. At a comparable inference cost, code-generation performance can be improved by sampling repeatedly from smaller models rather than once from a larger one; a best-of-n sketch follows this list.
  • Self-Directed Synthetic Dialogues and Revisions Technical Report. A dataset of more than 300,000 dialogues and critiques intended for training open models. Produced almost entirely synthetically, it is a strong demonstration of generating synthetic data with open models.
  • Theia: Distilling Diverse Vision Foundation Models for Robot Learning. This study presents Theia, a vision foundation model for robot learning that distills several existing vision models. The rich visual representations Theia provides improve robot learning even with smaller model sizes and less training data. Test results show that Theia outperforms its predecessors, and the authors propose that the improvement stems from higher entropy in feature norms. The models and code are publicly available.
  • Do We Really Need Graph Convolution During Training? Light Post-Training Graph-ODE for Efficient Recommendation. LightGODE is a novel strategy to increase the effectiveness and scalability of recommender systems. By adopting a continuous graph ODE and moving graph convolution to post-training, it avoids costly computations during training.
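
For intuition, here is a minimal sketch of the Self-Route idea from the RAG vs. long-context study above: the model first answers from the retrieved chunks and declares whether they suffice, falling back to the full long context only when they do not. This is a hedged illustration; the llm helper and prompt wording are hypothetical placeholders, not the paper's actual interface.

    def self_route(query, retrieved_chunks, full_context, llm):
        """Route a query to RAG or long-context (LC) via self-reflection.

        llm(prompt) is a hypothetical helper returning the model's text
        completion; the paper's real prompts differ in detail.
        """
        rag_prompt = (
            "Answer the question using ONLY the passages below. "
            "If they are insufficient, reply exactly 'UNANSWERABLE'.\n\n"
            f"Passages:\n{retrieved_chunks}\n\nQuestion: {query}"
        )
        answer = llm(rag_prompt)  # cheap path: short retrieved context
        if "UNANSWERABLE" not in answer:
            return answer  # RAG handled it at a fraction of the cost
        # Fallback: expensive long-context path over the full document
        return llm(f"Context:\n{full_context}\n\nQuestion: {query}")

Because most queries take the cheap branch, the average cost drops sharply while accuracy stays close to the LC path, which is the trade-off the paper reports.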
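
The LazyLLM entry is easiest to picture as attention-guided KV selection: at each step, only the context tokens most relevant to the current prediction keep their KV entries, and a token dropped at one step can be re-selected later. The toy numpy snippet below illustrates that selection rule under these assumptions; it is not the paper's implementation.

    import numpy as np

    def select_tokens(attn_scores, keep_ratio=0.5):
        """Toy dynamic token selection.

        attn_scores: attention weights of the current query token over
        the context tokens. Returns the indices whose KV entries are
        computed and kept for this step; the rest are skipped for now
        but may be re-selected at a later step.
        """
        k = max(1, int(len(attn_scores) * keep_ratio))
        return np.argsort(attn_scores)[-k:]  # top-k most attended tokens

    # The selection is recomputed at every generation step, so the kept
    # subset changes over time, unlike one-shot static pruning.
    print(select_tokens(np.array([0.10, 0.40, 0.05, 0.30, 0.15])))  # [3 1]
    print(select_tokens(np.array([0.30, 0.10, 0.35, 0.05, 0.20])))  # [0 2]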
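
Finally, the budget-reallocation result above boils down to a best-of-n recipe: spend the same budget on several samples from a small model and keep the candidate that scores best under some verifier, such as unit tests. A minimal sketch, where small_model and run_unit_tests are hypothetical stand-ins:

    def best_of_n(prompt, small_model, run_unit_tests, n=5):
        """Sample n programs from a cheap model and return the best one.

        small_model(prompt) returns one candidate program, and
        run_unit_tests(code) returns how many tests it passes; both are
        hypothetical. n is picked so that n small-model calls cost about
        as much as a single large-model call.
        """
        candidates = [small_model(prompt) for _ in range(n)]
        return max(candidates, key=run_unit_tests)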

News

Resources

  • OpenDevin: An Open Platform for AI Software Developers as Generalist Agents. Provides a framework for building generalist agents that interact with the world through software. Its features include 1) an interface for creating and executing code, 2) a sandboxed environment with an operating system and web browser accessible to the agents, 3) an interface for agents to interact with those environments, 4) support for multiple agents, and 5) an evaluation framework.
  • A Survey on Employing Large Language Models for Text-to-SQL Tasks. Gives an overview of using LLMs for text-to-SQL tasks, covering benchmarks, prompt-engineering strategies, and fine-tuning methods.
  • MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens. Open-sources a massive multimodal interleaved dataset with 3.4 billion images and one trillion tokens, including additional sources such as PDFs and arXiv papers.
  • StreamMOS: Streaming Moving Object Segmentation with Multi-View Perception and Dual-Span Memory. StreamMOS is a new approach for segmenting moving objects using LiDAR in autonomous driving and robotics.
  • Joint RGB-Spectral Decomposition Model Guided Image Enhancement in Mobile Photography. Scientists have devised a technique that incorporates miniature spectrometers to enhance mobile photography. To improve image quality, this innovative method combines RGB and low-resolution multi-spectral images.
  • BetterDepth: Plug-and-Play Diffusion Refiner for Zero-Shot Monocular Depth Estimation. A plug-and-play diffusion refiner that improves zero-shot monocular depth estimation across many real-world situations.
  • 3D Object Segmentation with Language. RefMask3D is a technique that segments objects in 3D point clouds from natural language descriptions. With Geometry-Enhanced Group-Word Attention and Linguistic Primitives Construction, the system improves vision-language feature fusion and tackles the sparsity and irregularity of point clouds.
  • Efficient Cell Segmentation. LKCell is a novel technique for high-accuracy cell segmentation that balances computational efficiency with large receptive fields.
  • Tactics for multi-step AI app experimentation. Typically, LLM programs have several components; this article examines various strategies along with pertinent code snippets.
  • AccDiffusion. A technique that significantly enhances diffusion models' ability to synthesize high-resolution images.
  • HybridDepth. A depth-estimation pipeline, HybridDepth, built to address scale ambiguity and hardware variation in mobile augmented reality.
  • VSSD: Vision Mamba with Non-Causal State Space Duality. The Visual State Space Duality (VSSD) paradigm is a novel method for reducing the high computational demands of vision transformers.
  • A New Benchmark for Autonomous Agents. AppWorld Engine is a sophisticated execution environment featuring nine everyday apps and 457 APIs.
  • Crash Course in Deep Learning. The creation and application of multi-layer perceptrons (MLPs), a kind of fully connected neural network used in deep learning, are covered in this article.
  • SaulLM-54B & SaulLM-141B: Scaling Up Domain Adaptation for the Legal Domain. This study introduces SaulLM-54B and SaulLM-141B, two large language models with 54 billion and 141 billion parameters, respectively, designed for the legal domain. Building on the Mixtral architecture, the researchers achieve large-scale domain adaptation by continuing pre-training on an extensive legal corpus, following a dedicated legal instruction-following protocol, and aligning outputs with human legal interpretations. The models deliver state-of-the-art performance on LegalBench-Instruct, outperforming earlier open-source models. Their base, instruct, and aligned versions are released under the MIT License for reuse and collaborative study.
  • WFEN. To boost face super-resolution, researchers have created a feature augmentation network based on wavelets. The technique uses a full domain Transformer and breaks down input data into high and low-frequency components to improve facial details without generating distortions.
  • ChartQA-MLLM. This work proposes a novel approach to chart question answering based on multimodal large language models.
  • DGFNet. DGFNet is a novel method for forecasting the trajectories of multiple traffic participants in autonomous driving. It improves predictions by accounting for differences in difficulty between agents, capturing detailed spatio-temporal features, and using a difficulty-guided decoder.
  • SAE for Gemma. This demo is a beginner-friendly introduction to interpretability that explores an AI model called Gemma 2 2B. It also contains interesting and relevant content even for those already familiar with the topic.
  • Machine Unlearning in Generative AI: A Survey. This in-depth survey examines machine unlearning in generative AI, covering problem formulations, evaluation methods, and the advantages and disadvantages of different approaches.
  • Elysium: Exploring Object-level Perception in Videos via MLLM. Elysium represents a step toward equipping multi-modal large language models (MLLMs) with object tracking and related video tasks.
  • Piano Performance Generation. This paper presents a two-stage Transformer-based model for generating emotionally expressive piano performances.
  • 3D Generative Model for Dynamic Scenes. DynaVol-S is a 3D generative model that excels at extracting object-centric representations from videos without supervision.
  • Add-SD: Rational Generation without Manual Reference. Add-SD is a tool that inserts objects into realistic scenes from short text prompts alone. Unlike other methods, it requires no bounding boxes or other explicit references.
  • Flow Matching: Matching flows instead of scores. Diffusion models are powerful but can be difficult to understand; flow matching offers one theoretical lens on them. This blog digs into the underlying math (a minimal loss sketch follows this list).
  • MMTrail: A Multimodal Trailer Video Dataset with Language and Music Descriptions. MMTrail is a large-scale multi-modality video-language dataset with over 20M trailer clips, featuring high-quality multimodal captions that integrate context, visual frames, and background music, aiming to enhance cross-modality studies and fine-grained multimodal-language model training.
  • ARCLE — ARC Learning Environment. ARCLE is an environment to aid reinforcement learning studies using the Abstraction and Reasoning Corpus (ARC).
  • Mishax. DeepMind has released a library for studying language models via mechanistic interpretability. The library helps with running models and functions from complex codebases without import headaches.
  • Engine Core. Engine Core demonstrates a pattern for enabling LLMs to undertake tasks of a given scope using a dynamic system prompt and a collection of tool functions (a toy version of the pattern is sketched after this list).
  • alphaXiv. Open research discussion directly on top of arXiv.
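
To make the flow-matching entry above concrete, here is a generic sketch of the standard conditional flow-matching loss with a linear interpolation path, where the network regresses the constant velocity x1 - x0 of the path from noise to data. This is the textbook formulation under those assumptions, not code from the linked blog; v_net is a hypothetical velocity network.

    import torch

    def flow_matching_loss(v_net, x1):
        """Conditional flow matching with a linear (straight-line) path.

        x_t = (1 - t) * x0 + t * x1 moves noise x0 ~ N(0, I) toward a
        data batch x1; the velocity of this path is the constant x1 - x0,
        which v_net(x_t, t) is trained to predict.
        """
        x0 = torch.randn_like(x1)                             # noise endpoint
        t = torch.rand(x1.shape[0], *([1] * (x1.dim() - 1)))  # per-sample time
        x_t = (1 - t) * x0 + t * x1                           # point on path
        target = x1 - x0                                      # path velocity
        return ((v_net(x_t, t) - target) ** 2).mean()         # regression loss

Sampling then amounts to integrating dx/dt = v_net(x, t) from t = 0 to t = 1, which is the sense in which flow matching "matches flows instead of scores."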
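
Similarly, the Engine Core pattern above (a dynamic system prompt plus a collection of tool functions) reduces to a small dispatch loop. The sketch below is only a guess at the shape of such a loop, with a hypothetical llm callable and toy tools; it is not Engine Core's actual API.

    import json

    TOOLS = {  # hypothetical tool functions exposed to the model
        "get_time": lambda args: "12:00",
        "add": lambda args: str(args["a"] + args["b"]),
    }

    def run_task(task, llm, max_steps=5):
        """Rebuild the system prompt each turn; the model either emits a
        JSON tool call or a plain answer, and tool results are fed back."""
        history = []
        reply = ""
        for _ in range(max_steps):
            system = (
                'Reply with JSON {"tool": name, "args": {...}} to call a '
                "tool, or answer directly.\n"
                f"Tools: {list(TOOLS)}\nHistory: {history}"
            )
            reply = llm(system, task)  # hypothetical LLM call
            try:
                call = json.loads(reply)
                result = TOOLS[call["tool"]](call["args"])
                history.append((call, result))  # result re-enters the prompt
            except (ValueError, KeyError, TypeError):
                return reply  # not a valid tool call: treat as final answer
        return reply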

Perspectives

  • My new iPhone symbolizes stagnation, not innovation — and a similar fate awaits AI. Development of ChatGPT and its ilk will plateau, just like it did for smartphones, and then what are we left with? More ho-hum consumer tech
  • AI: Are we in another dot-com bubble? A thorough examination by Translink Capital's Kelvin Mu compares the present AI cycle with the internet/telecom cycle of the 1990s. After weighing the technological, economic, and capital differences between the two eras, he concludes that, although a bubble may eventually form, we are still a long way from one.
  • Robots sacked, screenings shut down: a new movement of Luddites is rising up against AI. Company after company is swallowing the hype, only to be forced into embarrassing walkbacks by anti-AI backlash
  • Chalkboards and What They Can Teach Us About Generative AI. This article discusses the use of generative AI as a teaching tool and makes the case that the technology’s compatibility with educational ideals should be taken into account in addition to its technical analysis. Although the author is receptive to the use of AI, she is wary of its potential effects and stresses the necessity for clear justifications for the use of particular resources in the classroom. The conversation compares and contrasts AI with conventional tools such as whiteboards, taking into account the educational and cultural consequences of each.
  • The Evolution of SaaS Pricing in the AI Era. Because AI can automate work, the traditional seat-based pricing model in SaaS is becoming outdated. Work-based or outcome-based pricing models, which set prices according to the quantity of work AI completes or the results it achieves, are becoming more and more popular among businesses. While established players continue to use seat-based pricing, startups are utilizing innovative approaches to gain a competitive edge and more properly represent the value of AI.
  • TechScape: Will OpenAI’s $5bn gamble on chatbots pay off? Only if you use them. The ChatGPT maker is betting big, while Google hopes its AI tools won’t replace workers, but help them to work better
  • New online therapies could help at least twice the number of people recover from anxiety. Four internet treatments developed by the University of Oxford will be rolled out across NHS trusts
  • AI Is a Services Revolution. This article covers the effect of LLMs on the service economy, with special attention to knowledge-based industries such as education, healthcare, and law. Despite rapid breakthroughs suggesting enormous automation potential, enterprise adoption of AI remains cautious, with many companies still in the trial phase, and the actual rollout is expected to happen gradually. In the changing market, specialized AI businesses that use LLMs to enhance industry-specific workflows will have an advantage.
  • Why Big Tech Wants to Make AI Cost Nothing. Almost any firm is free to use Meta's open-sourced Llama 3.1, an LLM that competes with OpenAI's ChatGPT. This tactic could turn LLMs into commodities and increase demand for complementary products like server space. AI startups may run into difficulties when big tech companies release models comparable to theirs, and the industry titans may outpace smaller rivals in AI breakthroughs.
  • Who will control the future of AI? To maintain AI supremacy over authoritarian regimes, OpenAI’s Sam Altman has presented a strategic imperative for the US and its allies to lead a global AI initiative based on democratic values. This initiative calls for strong security, infrastructure investment, commercial diplomacy, and cooperative norms development.
  • Advanced AI assistants that act on our behalf may not be ethically or legally feasible. Google and OpenAI have recently announced major product launches involving artificial intelligence (AI) agents based on large language models (LLMs) and other generative models. Notably, these are envisioned to function as personalized ‘advanced assistants’. With other companies following suit, such AI agents seem poised to be the next big thing in consumer technology, with the potential to disrupt work and social environments.
  • Three ways AI is changing the 2024 Olympics for athletes and fans. From training to broadcasting, artificial intelligence will have an imprint on this year’s event for the first time.
  • Mixed signals on tech stocks amid debate over the viability of AI boom. Fears of fresh sell-off after Nvidia and Microsoft shares dip, but other chip stocks continue to rise
  • Cheap light sources could make AI more energy efficient. Light-based devices can reduce the energy consumption of computers, but most rely on lasers, which are expensive to integrate with other technologies. An approach that uses LEDs instead of lasers provides a path forward.
  • Raising children on the eve of AI. As transformative AI becomes more likely, this author wonders how to get kids ready for a future that might look very different from what it is today, while also struggling with the timing and unpredictability of changes. In addition, they discuss the moral implications of bearing children in the face of AI-induced uncertainty. They also offer practical advice on how to raise “AI-native” children and parenting techniques that put happiness and adaptability before conventional career-focused routes. The author promotes having an open discussion about possible hazards with children, planning for a variety of futures, and leading a balanced life.
  • Your new AI Friend is almost ready to meet you. Rather than focusing on productivity, Avi Schiffmann is creating "Friend," an AI companion housed in a wearable necklace that is meant to provide connection and support. The gadget, which connects through an app, will initially be sold in a run of 30,000 units at $99 each, with shipping scheduled for January and no subscription fee. Schiffmann sees Friend developing into a digital relationship platform, setting the product apart from task-oriented AIs and focusing instead on the emerging trend of meaningful connection with digital entities.
  • These AI firms publish the world’s most highly cited work. US and Chinese firms dominate the list of companies that are producing the most research and patents in artificial intelligence.
  • How TikTok bots and AI have powered a resurgence in UK far-right violence. Experts warn growth of extremist influencers and ‘micro-donations’ could create an even bigger wave of unrest
  • On speaking to AI. The new AI-powered Siri and ChatGPT's new Advanced Voice mode embody different ideologies. Agent systems such as ChatGPT Voice use powerful multimodal models for more natural and dynamic interactions, while Copilot systems use minimal models to prioritize safety and privacy. This illustrates the tension between less capable, lower-risk systems and those that offer greater control and potential benefits.
  • How This Brain Implant Is Using ChatGPT. Synchron has incorporated OpenAI's ChatGPT into its brain-computer interface (BCI) technology to enable quicker communication for people who are paralyzed. The BCI, known as a stentrode, can decode mental commands. It currently offers AI-generated response options and may support multimodal inputs in the future. With an eye toward FDA approval, Synchron plans to adapt its AI integrations to patients' needs.
  • At the Olympics, AI is watching you. Paris increased security in anticipation of the 2024 Olympics by using artificial intelligence (AI) to scan CCTV footage from metro and train stations for possible threats.
  • Why have the big seven tech companies been hit by AI boom doubts? Their shares have fallen 11.8% from last month’s peak but more AI breakthroughs may reassure investors
  • We must be wary of the power of AI. Robert Skidelsky is concerned about the surveillance potential of AI, Brian Reffin Smith is worried about its capacity to hijack culture, and Michael Heaton warns that it relieves us of the need to think.
  • OpenAI’s Sam Altman is becoming one of the most powerful people on Earth. We should be very afraid. Sam Altman’s ChatGPT promises to transform the global economy. But it also poses an enormous threat. Here, a scientist who appeared with Altman before the US Senate on AI safety flags up the danger in AI — and in Altman himself

Meme of the week

What do you think? Did any of this news capture your attention? Let me know in the comments.

If you have found this interesting:

You can look for my other articles, and you can also connect with or reach me on LinkedIn. Check this repository containing weekly updated ML & AI news. I am open to collaborations and projects, and you can also subscribe for free to get notified when I publish a new story.

Here is the link to my GitHub repository, where I am collecting code and many resources related to machine learning, artificial intelligence, and more.

or you may be interested in one of my recent articles:


Salvatore Raieli

Senior data scientist | about science, machine learning, and AI. Top writer in Artificial Intelligence