WEEKLY AI NEWS: RESEARCH, NEWS, RESOURCES, AND PERSPECTIVES
AI & ML news: Week 29 July — 4 August
OpenAI Faces Massive Financial Challenges Despite High Revenue, Llama 3.1 Launches with Advanced Capabilities and much more
The most interesting news, repository, articles, and resources of the week
Check and star this repository where the news will be collected and indexed:
You will find the news first on GitHub. Single posts are also collected here:
Research
- Retrieval Augmented Generation or Long-Context LLMs? A Comprehensive Study and Hybrid Approach. compares RAG with long-context (LC) LLMs and finds that while RAG is much cheaper, LC LLMs perform better on average; proposes Self-Route, which uses the model's own self-reflection to route each query to RAG or LC; claims a substantial reduction in computational cost with performance comparable to LC.
- Recursive Introspection: Teaching Language Model Agents How to Self-Improve. asserts that LLMs can be iteratively fine-tuned to improve their own responses over multiple turns using additional feedback from the environment; the LLM learns to recursively detect and correct its past mistakes in subsequent iterations; this enhances the self-improvement abilities of 7B models on reasoning tasks (GSM8K and MATH), achieving gains over turns that are not observed in strong proprietary models.
- LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference. presents a novel dynamic token pruning technique for efficient long-context LLM inference; it speeds up the prefilling stage of a Llama 2 7B model by 2.34x while maintaining high accuracy; in both the prefilling and decoding stages it computes the KV cache only for tokens that are crucial to the next-token prediction, letting the model dynamically select different subsets of context tokens at different generation steps, even tokens that were pruned in a previous step.
- Generation Constraint Scaling Can Mitigate Hallucination. proposes a novel training-free method to reduce hallucinations in LLMs: scaling the readout vector that constrains generation in a memory-augmented LLM decoder; prior research suggests that LLMs with explicit memory mechanisms hallucinate less, and this work employs a memory-augmented LLM and applies lightweight memory primitives to constrain generation in the decoder.
- Align and Distill: Unifying and Improving Domain Adaptive Object Detection. A new method named ALDI addresses the difficulty of getting object detection models to perform well on data domains they weren’t originally trained on.
- Small Molecule Optimization with Large Language Models. By gathering a dataset of 100 million molecules (equivalent to 40 billion tokens), two new language models achieved an 8% improvement on the Practical Molecular Optimization benchmark.
- The Larger the Better? Improved LLM Code-Generation via Budget Reallocation. At a comparable inference cost, code-generation performance can be improved by sampling repeatedly from smaller models rather than once from a larger one.
- Self-Directed Synthetic Dialogues and Revisions Technical Report. A dataset of more than 300,000 dialogues and critiques intended for training open models. Produced largely with open models, it is a strong demonstration of synthetic data generation.
- Theia: Distilling Diverse Vision Foundation Models for Robot Learning. This study presents Theia, a vision foundation model for robot learning that distills several existing vision models into one. Theia's rich visual representations improve robot learning even with smaller model sizes and less training data. Test results show Theia outperforming its predecessors, and the authors attribute the improvement to higher entropy in feature norms. The models and code are publicly available.
- Do We Really Need Graph Convolution During Training? Light Post-Training Graph-ODE for Efficient Recommendation. LightGODE is a novel strategy for increasing the efficiency and scalability of recommender systems. By adopting a continuous graph ODE and deferring graph convolution until after training, it avoids costly computations during training.
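The Self-Route idea from the first paper above can be sketched in a few lines. This is an illustrative toy, not the paper's code: all function names and the stub components are invented for the example. The key mechanism is asking the cheap RAG path to declare its own answer "unanswerable" before escalating to the expensive long-context model.

```python
# Illustrative sketch of Self-Route: try the cheap RAG path first;
# if the model judges the retrieved context insufficient, escalate
# to the long-context model. All names here are hypothetical.

UNANSWERABLE = "unanswerable"

def rag_answer(query, retrieve, llm):
    """Answer using only the top retrieved chunks (cheap path)."""
    chunks = retrieve(query)
    prompt = (
        "Answer from the context below, or reply 'unanswerable' "
        f"if the context is insufficient.\nContext: {chunks}\nQuestion: {query}"
    )
    return llm(prompt)

def self_route(query, retrieve, llm, long_context_llm):
    """Route each query: cheap RAG first, long-context fallback."""
    answer = rag_answer(query, retrieve, llm)
    if answer.strip().lower() == UNANSWERABLE:
        return long_context_llm(query), "long-context"
    return answer, "rag"

# Toy stand-ins so the sketch runs end to end.
def toy_retrieve(q):
    return ["Paris is the capital of France."]

def toy_llm(prompt):
    return "Paris" if "capital of France" in prompt else UNANSWERABLE

def toy_lc_llm(q):
    return "answer-from-full-context"

answer, path = self_route("What is the capital of France?",
                          toy_retrieve, toy_llm, toy_lc_llm)
print(path, answer)  # rag Paris
```

The cost saving comes from the fact that most queries terminate on the RAG branch, which processes only a handful of retrieved chunks instead of the full context.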
News
- Llama 3.1. A family of LLMs comprising models with 8B, 70B, and 405B parameters; it supports eight languages and extends the context window to 128K tokens; it exceeds state-of-the-art models in certain situations and competes favorably in other areas, such as general knowledge, math reasoning, and tool use.
- Nvidia’s new Titan GPU will beat the RTX 5090, according to leak. After skipping its ultra-expensive flagship graphics card with its Ada lineup, Nvidia could be bringing back the Titan with a Blackwell GPU.
- Elon Musk will ‘discuss’ Tesla investing $5 billion in his private AI company. Elon Musk says that he will ‘discuss’ Tesla investing $5 billion in xAI, his own private artificial intelligence company. For the last few years, Musk has claimed that “Tesla is an AI company.”
- OpenAI training and inference costs could reach $7bn for 2024, AI startup set to lose $5bn — report. The report projects that ChatGPT inference will cost OpenAI about $4 billion in 2024 on Microsoft’s Azure servers, potentially resulting in large financial losses. Even though OpenAI makes about $2 billion a year from ChatGPT, it would need more money in less than a year to cover a $5 billion deficit. With subsidized prices from Azure, it presently uses the equivalent of 350,000 Nvidia A100 chips, primarily for ChatGPT.
- Elon Musk sets new date for Tesla robotaxi reveal, calls everything beyond autonomy ‘noise’. Elon Musk says he will show off Tesla’s purpose-built “robotaxi” prototype during an event on October 10, after scrapping a previous plan to reveal it on August 8. Musk said Tesla will also show off “a couple of other things,” but didn’t explain what that meant.
- Stability AI steps into a new-gen AI dimension with Stable Video 4D. Stability AI is expanding its growing roster of generative AI models, quite literally adding a new dimension with the debut of Stable Video 4D.
- Google’s Gemini AI is getting faster with its Flash upgrade. Google’s Gemini AI chatbot will be able to respond to you more quickly and process more content in prompts thanks to an upgrade to the company’s Gemini 1.5 Flash AI model.
- Introducing SAM 2: The next generation of Meta Segment Anything Model for videos and images. Real-time promptable segmentation for videos and images from Meta.
- Apple says its AI models were trained on Google’s custom chips. Apple said in a technical paper on Monday that the two AI models underpinning Apple Intelligence, its AI system, were pre-trained on Google-designed chips in the cloud.
- AI Startup Anthropic Faces Backlash for Excessive Web Scraping. Freelancer.com CEO claims Anthropic’s crawler violated the “do not crawl” protocol, causing site slowdowns.
- Apple Intelligence Foundation Language Models. Apple has outlined the basics of its language models for its newly announced “Apple Intelligence” initiative.
- Microsoft beats revenue forecasts but poor performance of cloud services drags share price. The firm’s earnings were up 15% year-on-year, but Azure’s lower returns saw share prices fall by as much as 7%.
- UK regulator looks at Google’s partnership with Anthropic. The CMA will consider whether the deal with the AI startup constitutes a potential merger, which could prompt a full investigation.
- OpenAI has released a new ChatGPT bot that you can talk to. The voice-enabled chatbot will be available to a small group of people today, and to all ChatGPT Plus users in the fall.
- Meta’s new AI Studio helps you create your own custom AI chatbots. Headed for the web as well as Instagram, Messenger, and WhatsApp, AI Studio will let you build a chatbot that acts as a virtual extension of yourself.
- Perplexity Will Soon Start Selling Ads Within AI Search. Facing backlash for scraping publisher data, the young company says it’ll now compensate publishers whose content is used in answers to search questions.
- The AI job interviewer will see you now. AI interview services say they’re eliminating bias — but not everyone agrees. Companies are adopting AI job interview systems to handle incoming applicants. LLMs allow the interviewer to incorporate follow-up questions based on the subject’s response. Critics say the opaque models raise serious concerns about bias, particularly where there is no documentation about how a decision is made.
- Canva buys Leonardo. Leonardo, a generative picture firm, joins Canva to enhance the creative tools of both organizations.
- Announcing Phi-3 fine-tuning, new generative AI models, and other Azure AI updates. Microsoft has released updates to Azure AI. These include serverless fine-tuning for Phi-3 models, enhanced Phi-3-mini performance, and the incorporation of models such as Meta’s Llama 3.1 and GPT-4o mini into Azure AI.
- Strong earnings report pushes Meta shares up amid heavy AI spending. The stock price rose around 5% after the company outperformed analysts’ expectations for its second quarter.
- Argentina will use AI to ‘predict future crimes’ but experts worry for citizens’ rights. President Javier Milei creates a security unit as some say certain groups may be overly scrutinized by the technology
- White House says no need to restrict ‘open-source’ artificial intelligence — at least for now. The White House is coming out in favor of “open-source” artificial intelligence technology, arguing in a report Tuesday that there’s no need right now for restrictions on companies making key components of their powerful AI systems widely available.
- Samsung hints at new products as it bets on AI to drive upgrades to its latest foldable phones. Speaking to CNBC, Samsung Electronics’ mobile boss TM Roh discussed Galaxy AI and software strategy, while hinting at future foldable products and mixed reality headsets. Roh said the company hopes its suite of AI software will push users to upgrade to its latest smartphones.
- Elon Musk calls Grok ‘the most powerful AI by every metric’ but ‘secretly’ trains the new model with your X data by default. X’s new experience is automatically set to opt-in and uses your data to train its Grok AI model.
- NVIDIA Accelerates Humanoid Robotics Development. To accelerate the development of humanoid robotics, NVIDIA has introduced new services and platforms, such as teleoperated data-capturing workflows, OSMO orchestration, and NIM microservices.
- US’ first robot-assisted dual kidney transplant performed in Ohio. Joanne’s surgery was unique because doctors used the robotic surgical technique to implant two kidneys from a single deceased donor.
- Intel announces plan to cut 15,000 jobs to ‘resize and refocus’ business. Firm reported a loss in its second quarter and said it would cut 15% of its workforce to cut costs and compete with rivals
- UK shelves £1.3bn of funding for technology and AI projects. Britain’s first next-generation supercomputer, planned by Tories, is in doubt after the Labour government's move
- Black Forest Labs. The creators of Latent Diffusion, Stable Diffusion, VQGAN, and other projects have raised over $30 million to launch their new company. They have introduced new flagship image-generation models, available in multiple tiers, that are highly capable.
- OpenAI pledges to give the U.S. AI Safety Institute early access to its next model. OpenAI CEO Sam Altman says that OpenAI is working with the U.S. AI Safety Institute, a federal government body that aims to assess and address risks in AI platforms, on an agreement to provide early access to its next major generative AI model for safety testing.
- The EU’s AI Act is now in force. This starts the clock on a series of staggered compliance deadlines that the law applies to different types of AI developers and applications. Most provisions will be fully applicable by mid-2026, but the first deadline, which enforces bans on a small number of prohibited uses of AI in specific contexts, such as law enforcement use of remote biometrics in public places, will apply in just six months.
- Introducing Stable Fast 3D: Rapid 3D Asset Generation From Single Images. Stability AI has launched a fast and capable new 3D generation model. Like the company’s earlier releases, it operates under the same commercial license.
- Introducing torchchat: Accelerating Local LLM Inference on Laptop, Desktop, and Mobile. The PyTorch team has released an excellent sample library for chatting with local language models. It can run the latest Llama 3.1 models and comes with a reliable sampling system.
- Heeyo built an AI chatbot to be a billion kids’ interactive tutor and friend. Xiaoyin Qu founded the firm Heeyo, which has released AI-powered software with interactive games and a chatbot for kids three to eleven years old. With features like data protection and material created by child development specialists, the app strives to prioritize safety while offering tailored learning experiences. Though there may be worries about AI for children, Heeyo has raised $3.5 million in seed money. It presents itself as a secure and instructive substitute for well-known video and gaming platforms.
- Cerebras IPO. Cerebras Systems has filed a proposal for an IPO with the SEC.
- LLMs breach a threshold. FLOPs as a regulatory threshold have been a subject of dispute since the recent release of Meta’s open-source Llama 3.1, trained with 3.8x10²⁵ FLOPs and equipped with 405B parameters.
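The 3.8x10²⁵ figure in the last item is consistent with the standard back-of-the-envelope estimate for dense-transformer training compute, FLOPs ≈ 6 × parameters × tokens, assuming the roughly 15.6 trillion training tokens reported for Llama 3.1:

```python
# Sanity check of the 3.8e25 FLOPs figure via the common approximation
# FLOPs ~= 6 * parameters * training tokens (dense transformers).
params = 405e9    # 405B parameters
tokens = 15.6e12  # ~15.6 trillion training tokens (reported for Llama 3.1)
flops = 6 * params * tokens
print(f"{flops:.2e}")  # 3.79e+25, matching the cited 3.8e25
```

This is exactly the kind of quick calculation regulators rely on when a compute threshold (such as the 10²⁵ FLOPs level discussed in the EU AI Act) has to be checked against a released model.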
Resources
- OpenDevin: An Open Platform for AI Software Developers as Generalist Agents. provides a framework for creating generalist agents that use software to interact with the outside world. Its features include 1) an interface for creating and executing code, 2) an environment with a sandboxed operating system and web browser accessible to the agents, 3) an interface for agents to interact with those interfaces and environments, 4) support for multiple agents, and 5) an evaluation framework.
- A Survey on Employing Large Language Models for Text-to-SQL Tasks. gives an overview of using LLMs for Text-to-SQL operations, covering benchmarks, prompt engineering strategies, and fine-tuning procedures.
- MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens. Open-sources a massive multimodal interleaved dataset with 3.4 billion images and 1 trillion tokens; additional sources such as PDFs and ArXiv papers are also included.
- StreamMOS: Streaming Moving Object Segmentation with Multi-View Perception and Dual-Span Memory. StreamMOS is a new approach for segmenting moving objects using LiDAR in autonomous driving and robotics.
- Joint RGB-Spectral Decomposition Model Guided Image Enhancement in Mobile Photography. Scientists have devised a technique that incorporates miniature spectrometers to enhance mobile photography. To improve image quality, this innovative method combines RGB and low-resolution multi-spectral images.
- BetterDepth: Plug-and-Play Diffusion Refiner for Zero-Shot Monocular Depth Estimation. A fresh and enhanced monocular depth model for numerous real-world situations.
- 3D Object Segmentation with Language. RefMask3D is a technique that uses natural language descriptions to partition items in 3D point clouds. With Geometry-Enhanced Group-Word Attention and Linguistic Primitives Construction, the system improves vision-language feature fusion and tackles sparse and irregular point cloud problems.
- Efficient Cell Segmentation. A novel technique for high-accuracy cell segmentation, LKCell strikes a compromise between computational efficiency and broad receptive fields.
- Tactics for multi-step AI app experimentation. Typically, LLM programs have several components; this article examines various strategies along with pertinent code snippets.
- AccDiffusion. A technique that significantly enhances diffusion models’ ability to synthesize high-quality images.
- HybridDepth. HybridDepth is a depth-estimation pipeline created to address issues with scale ambiguity and technology variation in mobile augmented reality.
- VSSD: Vision Mamba with Non-Causal State Space Duality. A novel method for mitigating the high computing needs of vision transformers is the Visual State Space Duality (VSSD) paradigm.
- A New Benchmark for Autonomous Agents. AppWorld Engine is a sophisticated execution environment that features nine daily apps and 457 APIs
- Crash Course in Deep Learning. The creation and application of multi-layer perceptrons (MLPs), a kind of fully connected neural network used in deep learning, are covered in this article.
- SaulLM-54B & SaulLM-141B: Scaling Up Domain Adaptation for the Legal Domain. This study introduces two large language models intended for the legal domain, with 54 billion and 141 billion parameters respectively: SaulLM-54B and SaulLM-141B. Building on the Mixtral architecture, the researchers achieve large-scale domain adaptation by continuing pre-training on an extensive legal corpus, following a dedicated legal instruction-following protocol, and aligning outputs with human legal interpretations. The models achieve state-of-the-art performance on LegalBench-Instruct and outperform earlier open-source models. Base, instruct, and aligned versions are available under the MIT License for reuse and collaborative study.
- WFEN. To boost face super-resolution, researchers have created a feature augmentation network based on wavelets. The technique uses a full domain Transformer and breaks down input data into high and low-frequency components to improve facial details without generating distortions.
- ChartQA-MLLM. This experiment suggests a novel approach to multimodal large language models-based chart question answering.
- DGFNet. DGFNet is a novel method for forecasting the trajectories of multiple traffic participants in autonomous driving. It improves predictions by accounting for differences in difficulty between agents, capturing detailed spatiotemporal features, and using a difficulty-guided decoder.
- SAE for Gemma. This demo is a beginner-friendly introduction to interpretability that explores an AI model called Gemma 2 2B. It also contains interesting and relevant content even for those already familiar with the topic.
- Machine Unlearning in Generative AI: A Survey. This in-depth analysis of generative AI examines machine unlearning. It addresses how to formulate problems, how to evaluate them, and the advantages and disadvantages of different approaches.
- Elysium: Exploring Object-level Perception in Videos via MLLM. Elysium represents a step toward enabling object tracking and related tasks in videos for Multi-modal Large Language Models (MLLMs).
- Piano Performance Generation. The two-stage Transformer-based model for creating emotionally charged piano performances is presented in this paper.
- 3D Generative Model for Dynamic Scenes. DynaVol-S is a 3D generative model that excels at extracting object-centric representations from videos without supervision.
- Add-SD: Rational Generation without Manual Reference. Add-SD inserts objects into realistic scenes using only short text prompts. Unlike other methods, it requires no bounding boxes or other explicit references.
- Flow Matching: Matching flows instead of scores. Diffusion models are powerful but can be difficult to understand. Flow matching offers one theoretical lens on them, and this blog post delves into the diffusion math behind it.
- MMTrail: A Multimodal Trailer Video Dataset with Language and Music Descriptions. MMTrail is a large-scale multi-modality video-language dataset with over 20M trailer clips, featuring high-quality multimodal captions that integrate context, visual frames, and background music, aiming to enhance cross-modality studies and fine-grained multimodal-language model training.
- ARCLE — ARC Learning Environment. ARCLE is an environment to aid reinforcement learning studies using the Abstraction and Reasoning Corpus (ARC).
- Mishax. DeepMind has released a library for studying language models via MI. The library helps with running models and functions from complex codebases without tons of importing headaches.
- Engine Core. Engine Core demonstrates a pattern for enabling LLMs to undertake tasks of a given scope with a dynamic system prompt and a collection of tool functions.
- alphaXiv. Open research discussion directly on top of arXiv
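As a companion to the Flow Matching post above, here is a minimal, self-contained sketch of the conditional flow-matching training target in its simplest (linear-interpolant) form. The toy "model" and data below are purely illustrative; a real setup would replace them with a neural network and a data distribution.

```python
# Minimal conditional flow-matching target: interpolate a noise sample
# toward a data sample along a straight line and regress the model's
# predicted velocity onto the path's constant velocity (x1 - x0).
import numpy as np

rng = np.random.default_rng(0)

def flow_matching_pair(x0, x1, t):
    """Return the interpolated point x_t and its target velocity."""
    x_t = (1.0 - t) * x0 + t * x1   # straight-line path from noise to data
    v_target = x1 - x0              # constant velocity along that path
    return x_t, v_target

def fm_loss(model, x0, x1, t):
    """Mean squared error between predicted and target velocity."""
    x_t, v = flow_matching_pair(x0, x1, t)
    return float(np.mean((model(x_t, t) - v) ** 2))

x0 = rng.standard_normal(4)            # noise sample
x1 = np.ones(4)                        # toy "data" point
perfect = lambda x_t, t: x1 - x0       # a model that already knows the velocity
print(fm_loss(perfect, x0, x1, 0.5))   # 0.0 by construction
```

In training, the loss would be averaged over random draws of x0, x1, and t, and minimized with gradient descent on the model's parameters; sampling then integrates the learned velocity field from noise to data.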
Perspectives
- My new iPhone symbolizes stagnation, not innovation — and a similar fate awaits AI. Development of ChatGPT and its ilk will plateau, just like it did for smartphones, and then what are we left with? More ho-hum consumer tech
- AI: Are we in another dot-com bubble? A thorough examination by Translink Capital’s Kelvin Mu contrasts the present AI cycle with the internet/telecom cycle of the 1990s. After comparing the two eras’ technological, economic, and capital disparities, he concludes that, even though a bubble may eventually form, we are still a long way from one.
- Robots sacked, screenings shut down: a new movement of Luddites is rising up against AI. Company after company is swallowing the hype, only to be forced into embarrassing walkbacks by anti-AI backlash
- Chalkboards and What They Can Teach Us About Generative AI. This article discusses the use of generative AI as a teaching tool and makes the case that the technology’s compatibility with educational ideals should be taken into account in addition to its technical analysis. Although the author is receptive to the use of AI, she is wary of its potential effects and stresses the necessity for clear justifications for the use of particular resources in the classroom. The conversation compares and contrasts AI with conventional tools such as whiteboards, taking into account the educational and cultural consequences of each.
- The Evolution of SaaS Pricing in the AI Era. Because AI can automate work, the traditional seat-based pricing model in SaaS is becoming outdated. Work-based or outcome-based pricing models, which set prices according to the quantity of work AI completes or the results it achieves, are becoming more and more popular among businesses. While established players continue to use seat-based pricing, startups are utilizing innovative approaches to gain a competitive edge and more properly represent the value of AI.
- TechScape: Will OpenAI’s $5bn gamble on chatbots pay off? Only if you use them. The ChatGPT maker is betting big, while Google hopes its AI tools won’t replace workers, but help them to work better
- New online therapies could help at least twice the number of people recover from anxiety. Four internet treatments developed by the University of Oxford will be rolled out across NHS trusts
- AI Is a Services Revolution. The effect of LLMs on the service economy is covered in this article, with special attention to knowledge-based industries including education, healthcare, and law. Enterprise adoption of AI is gradual, with many still in the trial phase, despite the rapid breakthroughs suggesting tremendous automation possibilities. The actual rollout is anticipated to occur gradually. In the changing market, specialized AI businesses that use LLMs to enhance industry-specific workflows will have an advantage.
- Why Big Tech Wants to Make AI Cost Nothing. Almost any firm is free to use Meta’s open-source Llama 3.1, an LLM that competes with OpenAI’s ChatGPT. This tactic could turn LLMs into commodities and increase demand for complementary products like server space. AI startups may struggle when big tech firms develop models comparable to theirs, and industry giants may outpace smaller rivals in AI breakthroughs.
- Who will control the future of AI? To maintain AI supremacy over authoritarian regimes, OpenAI’s Sam Altman has presented a strategic imperative for the US and its allies to lead a global AI initiative based on democratic values. This initiative calls for strong security, infrastructure investment, commercial diplomacy, and cooperative norms development.
- Advanced AI assistants that act on our behalf may not be ethically or legally feasible. Google and OpenAI have recently announced major product launches involving artificial intelligence (AI) agents based on large language models (LLMs) and other generative models. Notably, these are envisioned to function as personalized ‘advanced assistants’. With other companies following suit, such AI agents seem poised to be the next big thing in consumer technology, with the potential to disrupt work and social environments.
- Three ways AI is changing the 2024 Olympics for athletes and fans. From training to broadcasting, artificial intelligence will have an imprint on this year’s event for the first time.
- Mixed signals on tech stocks amid debate over the viability of AI boom. Fears of fresh sell-off after Nvidia and Microsoft shares dip, but other chip stocks continue to rise
- Cheap light sources could make AI more energy efficient. Light-based devices can reduce the energy consumption of computers, but most rely on lasers, which are expensive to integrate with other technologies. An approach that uses LEDs instead of lasers provides a path forward.
- Raising children on the eve of AI. As transformative AI becomes more likely, this author wonders how to get kids ready for a future that might look very different from what it is today, while also struggling with the timing and unpredictability of changes. In addition, they discuss the moral implications of bearing children in the face of AI-induced uncertainty. They also offer practical advice on how to raise “AI-native” children and parenting techniques that put happiness and adaptability before conventional career-focused routes. The author promotes having an open discussion about possible hazards with children, planning for a variety of futures, and leading a balanced life.
- Your new AI Friend is almost ready to meet you. Rather than focusing on productivity, Avi Schiffmann is creating “Friend,” an AI companion housed in a wearable necklace and meant to provide connection and support. The device, which connects through an app, will initially be sold in a run of 30,000 units at $99 each, with shipping scheduled for January and no subscription cost. Schiffmann sees Friend developing into a digital relationship platform, distinguishing the product from task-oriented AIs and focusing instead on the emerging trend of meaningfully connecting with digital entities.
- These AI firms publish the world’s most highly cited work. US and Chinese firms dominate the list of companies that are producing the most research and patents in artificial intelligence.
- How TikTok bots and AI have powered a resurgence in UK far-right violence. Experts warn growth of extremist influencers and ‘micro-donations’ could create an even bigger wave of unrest
- On speaking to AI. The new AI-powered Siri and ChatGPT’s new Advanced Voice mode have different ideologies. Agent systems, such as ChatGPT Voice, use strong, multimodal models for more natural and dynamic interactions, while Copilot systems use minimal models to focus on safety and privacy. This demonstrates the conflict between less capable, lower-risk systems and ones that give greater control and possible advantages.
- How This Brain Implant Is Using ChatGPT. Synchron has incorporated OpenAI’s ChatGPT into their brain-computer interface (BCI) technology to provide quicker communication for individuals who are paralyzed. This BCI, known as a stentrode, is capable of deciphering mental orders. It currently provides response possibilities created by AI; in the future, it may also support multimodal inputs. With an eye toward FDA approval, Synchron plans to adapt its AI integrations to meet the demands of patients.
- At the Olympics, AI is watching you. Paris increased security in anticipation of the 2024 Olympics by using artificial intelligence (AI) to scan CCTV footage from metro and train stations for possible threats.
- Why have the big seven tech companies been hit by AI boom doubts? Their shares have fallen 11.8% from last month’s peak but more AI breakthroughs may reassure investors
- We must be wary of the power of AI. Robert Skidelsky is concerned about the surveillance potential of AI, while Brian Reffin Smith is worried about its capacity to hijack culture, and Michael Heaton warns that it relieves us of the need to think.
- OpenAI’s Sam Altman is becoming one of the most powerful people on Earth. We should be very afraid. Sam Altman’s ChatGPT promises to transform the global economy. But it also poses an enormous threat. Here, a scientist who appeared with Altman before the US Senate on AI safety flags up the danger in AI — and in Altman himself
Meme of the week
What do you think about it? Some news that captured your attention? Let me know in the comments
If you have found this interesting:
You can look for my other articles, and you can also connect with or reach me on LinkedIn. Check this repository containing weekly updated ML & AI news. I am open to collaborations and projects. You can also subscribe for free to get notified when I publish a new story.
Here is the link to my GitHub repository, where I am collecting code and many resources related to machine learning, artificial intelligence, and more.
Or you may be interested in one of my recent articles: