WEEKLY AI NEWS: RESEARCH, NEWS, RESOURCES, AND PERSPECTIVES

AI & ML news: Week 12–18 August

Uber's autonomous vehicle efforts, Apple changes its App Store rules, the US considers breaking up Google, and much more

Salvatore Raieli
20 min read · Aug 18, 2024
Photo by Adeolu Eletu on Unsplash

The most interesting news, repositories, articles, and resources of the week

Check and star this repository where the news will be collected and indexed:

You will find the news first on GitHub. Single posts are also collected here:

Weekly AI and ML news - each week the best of the field


Research

  • Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters. An extension of Ring Attention that spans many GPUs to support extremely long contexts. The researchers derive an energy function to guide how the models are sharded.
  • Bias-Aware Low-Rank Adaptation: Mitigating Catastrophic Inheritance of Large Language Models. Bias-aware low-rank adaptation (BA-LoRA) is a new fine-tuning method for LLMs that addresses bias propagated from pre-training data (a minimal LoRA-style sketch with an illustrative regularizer appears after this list).
  • MM-Forecast: A Multimodal Approach to Temporal Event Forecasting with Large Language Models. Researchers investigate how images can improve LLM-based temporal event forecasting. Their proposed framework, MM-Forecast, identifies two important roles that images play: highlighting and complementing textual information.
  • SAM 2: Segment Anything in Images and Videos. An open, unified model for promptable, real-time object segmentation in images and videos that can be applied to visual content it has never seen before, without task-specific adaptation. A memory mechanism retains information about the object and past interactions to enable precise mask prediction across video frames, and it also allows videos of arbitrary length to be processed in real time. SAM 2 considerably surpasses prior approaches in interactive video segmentation across 17 zero-shot video datasets while requiring roughly three times fewer human-in-the-loop interactions.
  • Structured Generation Limits Reasoning. Examines whether structured generation affects an LLM's reasoning and domain knowledge; finds that imposing format constraints significantly degrades reasoning compared with free-form responses, and that the degradation worsens as the format constraints on reasoning tasks become stricter.
  • RAG Foundry: A Framework for Enhancing LLMs for Retrieval Augmented Generation. Presents RAG Foundry, an open-source framework for enhancing LLMs for RAG use cases; it facilitates generating data-augmented datasets to fine-tune and evaluate LLMs in RAG settings, covering data creation, training, inference, and evaluation.
  • Synthesizing Text-to-SQL Data from Weak and Strong LLMs. Proposes SENSE, a specialized state-of-the-art text-to-SQL model built on blended synthetic data: synthetic data from strong models increases data diversity, while informative erroneous data from weaker models, combined with an executor, enables learning from execution feedback. By instruction-tuning LLMs with preference learning on both correct and incorrect samples, SENSE closes the performance gap between open-source models and approaches built on closed-source models, achieving state-of-the-art scores on the SPIDER and BIRD benchmarks.
  • Conversational Prompt Engineering. Describes a two-step process that lets users build personalized few-shot prompts by interacting with the model and sharing feedback on its output. The model drafts an initial instruction from user-provided unlabeled data, and the user then critiques the outputs and instructions. This iterative loop produces a personalized few-shot prompt that performs better on the target task.
  • Self-Taught Evaluators. An approach for improving model-based evaluators using only synthetic training data; it is reported to outperform LLM judges such as GPT-4 and to match top-performing reward models trained on labeled examples. The method first generates contrasting outputs (good and bad model responses) and trains an LLM-as-a-Judge to produce reasoning traces and final judgments, then iteratively repeats training on its own improved predictions (a hypothetical sketch of this loop appears after this list).
  • UGrid: An Efficient-And-Rigorous Neural Multigrid Solver for Linear PDEs. The UGrid solver is a recently created neural solver that combines the advantages of MultiGrid and U-Net methods for solving linear partial differential equations (PDEs).
  • Causal Agent based on Large Language Model. The Causal Agent is an agent framework that can handle causal problems by equipping an LLM with memory, reasoning, and tool modules.
  • ReCLIP++: Learn to Rectify the Bias of CLIP for Unsupervised Semantic Segmentation. Biases in CLIP can make it less effective in tasks like unsupervised semantic segmentation when images are not annotated. In this research, a technique to explicitly model and correct these biases is proposed.
  • Sakana Launches AI Scientist. A system that can independently conduct research by formulating hypotheses, carrying out experiments, developing code, and compiling the findings into well-reasoned publications has been unveiled by the Japanese artificial intelligence company Sakana. Together with an open-sourced version of the system, the company has supplied samples of the papers the system wrote.
  • Small but Mighty: Introducing answerai-colbert-small. ColBERT is a highly effective retrieval model. Despite having just 33 million parameters, this new model performs remarkably well on several measures. This article explains how to train a comparable model and what tips and techniques produced good results.
  • In Defense of Lazy Visual Grounding for Open-Vocabulary Semantic Segmentation. “Lazy visual grounding” is a two-step approach to open-vocabulary semantic segmentation that finds object masks independently of text and subsequently identifies the objects with textual information.
  • Introducing Agent Q: Research Breakthrough for the Next Generation of AI Agents with Planning & Self Healing Capabilities. An agent trained by MultiOn to perform web tasks via self-play. Its success rate on a range of web-based tasks, such as placing restaurant orders, increased from 18% to 81% during training. It improves by combining DPO with Monte Carlo Tree Search (MCTS). A paper describing the work, with contributions from Stanford researchers, is available on the website. It appears to be based on Salesforce Research’s xLAM function-calling mechanism.
  • Anchored Preference Optimization. Aligning models with human preferences typically requires post-training, but it is often unclear during training why one example is better than another. APO anchors the preference difference by pairing an existing example with a deliberately degraded version of it.
  • Mutual Reasoning Makes Smaller LLMs Stronger Problem-Solvers. Research on tree search for inference-time computation in language models is very active. This Microsoft paper makes a strong case that small models can significantly outperform large models on mathematical tasks.
  • MetaSeg: MetaFormer-based Global Contexts-aware Network for Efficient Semantic Segmentation. Based on the MetaFormer design, MetaSeg is a potent semantic segmentation network that improves the network’s decoder and backbone.
  • Long Context RAG Performance of LLMs. This article investigates the performance of long-context models on several RAG tasks. Increasing the number of examples can be beneficial, but these models frequently break down in odd but expected ways.
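
To make the fine-tuning mechanics behind the BA-LoRA entry above more concrete, here is a minimal PyTorch sketch of a LoRA adapter with an added regularization term. The specific regularizer shown (keeping the adapted output close to the frozen model's output) is purely illustrative and is not taken from the paper; the class name `BiasAwareLoRALinear` and the `lambda_reg` weight are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BiasAwareLoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank (LoRA) update.

    Hypothetical sketch: the regularizer below merely keeps the adapted
    output close to the frozen output, as an illustration of constraining
    what the adapter can inherit or amplify from pre-training.
    """

    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():      # freeze the pre-trained weights
            p.requires_grad = False
        self.lora_A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        frozen = self.base(x)
        update = F.linear(F.linear(x, self.lora_A), self.lora_B) * self.scaling
        return frozen + update

    def consistency_penalty(self, x: torch.Tensor) -> torch.Tensor:
        # Illustrative regularizer: discourage the adapter from drifting
        # too far from the frozen model's behaviour on the same input.
        with torch.no_grad():
            frozen = self.base(x)
        return F.mse_loss(self.forward(x), frozen)


if __name__ == "__main__":
    layer = BiasAwareLoRALinear(nn.Linear(128, 64), rank=4)
    x = torch.randn(2, 128)
    task_loss = layer(x).pow(2).mean()   # stand-in for the real task loss
    lambda_reg = 0.1                     # assumed regularization weight
    loss = task_loss + lambda_reg * layer.consistency_penalty(x)
    loss.backward()
    print("combined loss:", loss.item())
```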
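
As a companion to the Self-Taught Evaluators entry above, here is a hypothetical Python sketch of the self-improvement loop it describes: generate contrasting responses, have the current judge produce a reasoning trace and verdict, keep the judgments that agree with the synthetic construction, and fine-tune on them. The `generate` and `fine_tune` callables are placeholders, not a real API.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Judgment:
    instruction: str
    good_response: str
    bad_response: str
    reasoning: str
    verdict: str  # "A" or "B"

def self_taught_evaluator_loop(
    instructions: List[str],
    generate: Callable[[str], str],               # placeholder: current model generates text
    fine_tune: Callable[[List[Judgment]], None],  # placeholder: trains the judge
    iterations: int = 3,
) -> None:
    """Hypothetical sketch of the iterative self-improvement scheme."""
    for _ in range(iterations):
        training_data: List[Judgment] = []
        for instruction in instructions:
            # 1. Create contrasting outputs: a normal answer and a deliberately
            #    degraded one (e.g. answering a perturbed version of the prompt).
            good = generate(f"Answer well:\n{instruction}")
            bad = generate(f"Answer a slightly different question instead:\n{instruction}")

            # 2. Ask the current judge for a reasoning trace plus a final verdict.
            trace = generate(
                "Which response better answers the instruction? "
                f"Instruction: {instruction}\nA: {good}\nB: {bad}\n"
                "Explain, then answer 'A' or 'B'."
            )
            verdict = "A" if trace.strip().endswith("A") else "B"

            # 3. Keep only judgments that agree with the known construction
            #    (the synthetic 'bad' answer should lose).
            if verdict == "A":
                training_data.append(Judgment(instruction, good, bad, trace, verdict))

        # 4. Fine-tune the judge on its own filtered reasoning traces and repeat.
        fine_tune(training_data)


if __name__ == "__main__":
    # Stub model calls so the sketch runs end to end without any real LLM.
    canned = lambda prompt: "reasoning... final answer: A"
    self_taught_evaluator_loop(
        instructions=["Summarize the benefits of unit tests."],
        generate=canned,
        fine_tune=lambda data: print(f"fine-tuning on {len(data)} judgments"),
    )
```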

News

  • Uber highlights autonomous vehicle efforts now that Tesla’s in its rearview mirror. Uber reported strong second-quarter results, with gross bookings and net profit both up decently. But the company has chosen to highlight the success of its autonomous vehicle effort, likely to assuage investors concerned about incoming competition from Tesla, which aims to reveal its first robotaxi in October.
  • Mistral: build, tweak, repeat. La Plateforme now supports LLM customization, including Mistral Large 2 and Codestral, so developers can fine-tune models with specialized domain knowledge. The ‘Agents’ alpha release offers sophisticated, multi-layered workflows built on the capabilities of Mistral Large 2. The mistralai SDK for Python and TypeScript has reached a stable 1.0 release, improving consistency and usability.
  • Zico Kolter Joins OpenAI’s Board of Directors. Zico Kolter, a professor at Carnegie Mellon University and an expert in AI robustness and safety, has joined OpenAI’s Board of Directors and its Safety and Security Committee. His in-depth research on model robustness, alignment, and safety will strengthen OpenAI’s efforts to ensure that AI benefits humanity.
  • Apple changes EU App Store rules after commission charges. The policy change means developers will be able to communicate with customers outside the App Store.
  • World’s 1st AI-powered hearing aids boost speech understanding by 53 times. With AI and dual-chip technology, Sonova has unveiled the Phonak Audéo Sphere, a hearing aid that promises a 53x improvement in speech understanding in noisy conditions. The technology, which took years to develop, uses the DEEPSONIC chip with enhanced DNN capabilities to address the main issue facing users of hearing aids: clarity in noisy environments. Sonova hopes that this technological advancement will greatly enhance the lives of those who are hard of hearing.
  • Apple Intelligence may come to EU after all…but only for Mac. According to the most recent beta release notes, Mac users in the EU will get access to Apple’s AI features in the upcoming macOS Sequoia, unlike users on iOS 18 and iPadOS 18. The EU exclusion, which stems from Digital Markets Act compliance concerns, does not cover Macs. Mac users should be able to access Apple Intelligence if their system language is set to U.S. English.
  • Waymo is expanding its robotaxi service areas in San Francisco and Los Angeles. The company is looking to add more customers to its burgeoning driverless car business.
  • Intel reportedly gave up a chance to buy a stake in OpenAI in 2017. According to reports, Intel decided against investing in OpenAI, now a major player in the AI space, in 2017–2018 because then-CEO Bob Swan doubted the industry’s readiness for AI.
  • YouTube is testing a feature that lets creators use Google Gemini to brainstorm video ideas. YouTube is testing integration with Google Gemini to help creators brainstorm video ideas, titles and thumbnails.
  • Forget Midjourney — Flux is the new king of AI image generation and here’s how to get access. Black Forest Labs’ Flux is the newest and most promising open-source AI image-generation model. It can run on consumer laptops and, in certain areas, beats rivals such as Midjourney at rendering people and following prompts. The model comes in three versions: Pro, Dev, and Schnell. An open-source text-to-video model is also planned.
  • Paid Apple Intelligence features are likely at least 3 years away. Some analysts this week started reporting that Apple could charge as much as $20/month for paid Apple Intelligence features. While that may be true, we likely won’t see Apple charging for these features for at least 3 years.
  • Elon Musk to pause X’s AI training on some EU data, Ireland says. Irish Data Protection Commissioner Des Hogan has initiated court proceedings contesting how the company handles EU citizens’ personal data, potentially affecting the GDPR-compliant data processing used for its AI chatbot.
  • Intel is bringing GPUs to cars. The Arc A760A is a discrete GPU for automobiles from Intel that aims to improve in-car entertainment through AI-powered capabilities like gesture and speech recognition.
  • US considers breaking up Google after illegal monopoly ruling, reports say. The DoJ could force divestment of the Android operating system and the Chrome web browser following the antitrust verdict.
  • Google launches Pixel 9 phones with advanced AI. The new Pixel phones, a foldable, and earbuds feature Gemini Live for free-flowing conversations with an AI bot.
  • Grok-2 Beta Release. Grok 2, the latest model from xAI, is a frontier-class model with strong mathematical, coding, and reasoning abilities. xAI is working with Black Forest Labs to make FLUX available to X users.
  • Prompt Caching With Claude. Anthropic’s Claude models now support prompt caching, which lets developers cache frequently used context. This considerably reduces cost and latency, and early adopters like Notion are already enjoying faster and more efficient AI-powered features (a usage sketch with Anthropic’s Python SDK appears after this list).
  • OpenAI updates ChatGPT to new GPT-4o model based on user feedback. OpenAI quietly upgraded the GPT-4o model behind ChatGPT, adding improvements based on user feedback while leaving the reasoning style unchanged. Users speculated about improved multi-step reasoning and image-generation capabilities, but OpenAI clarified that the model’s reasoning is unchanged. The company also noted that the latest ChatGPT model may not be the same as the version offered in the API, which serves developer experiences.
  • 14 new things you can do with Pixel thanks to AI. The Pixel Watch 3 uses sophisticated motion sensing and machine learning for better running-form analysis, automated sleep detection, and mode adjustments. It introduces an AI-powered Loss of Pulse Detection feature that can automatically notify emergency services when needed. Pixel’s AI-powered call screening and call holding features also carry over to the watch.
  • MIT releases comprehensive database of AI risks. The AI Risk Repository, a comprehensive database of over 700 documented AI risks, was developed by MIT and other institutions to help enterprises and researchers assess and mitigate evolving AI risks through a two-dimensional classification system and regularly updated data.
  • Universal Music and Meta Announce ‘Expanded Global Agreement’ for AI, Monetization and More. With an emphasis on equitable pay and resolving difficulties with unlicensed AI content, Meta and Universal Music Group have extended their multi-year licensing deal. This move aims to increase revenue and develop creative opportunities for UMG’s artists on platforms such as Facebook, Instagram, and now WhatsApp.
  • As Alexa turns 10, Amazon looks to generative AI. Despite having a high household penetration rate, Amazon’s Alexa subsidiary lost $10 billion in 2022 and had to lay off employees, underscoring the unviability of its loss leader approach. With the growing apathy towards smart assistants such as Siri and Google Assistant, Amazon is relying on generative AI to boost user engagement and enhance Alexa’s functionality. The company’s main goals are to get around the “smart timer” restriction and improve conversational interactions.
  • Replika CEO Eugenia Kuyda says it’s okay if we end up marrying AI chatbots. CEO of Replika Eugenia Kuyda recently talked about her vision for AI partners in human interactions, emphasizing the app’s potential to provide romance, companionship, or therapy via avatars. Replika hopes to create a new class of connections by evolving LLMs to enhance human interaction rather than replace it. Even in the face of controversy — like brief bans on sexual content — the app’s goal of enhancing users’ mental health never changes. Replika, which employs 50–60 people and has millions of users, is preparing a big relaunch to improve dialogue realism and interaction.
  • Gemini 1.5 Flash price drop with tuning rollout complete, and more. With a 78% reduction in input and a 71% reduction in output token costs, Gemini 1.5 Flash has experienced a pricing reduction. Additionally, its API is now supported in more than 100 languages.
  • Prediction marketplace Polymarket partners with Perplexity to show news summaries. To incorporate event-related news summaries and data visualizations into its prediction marketplace, Polymarket has teamed up with AI search engine Perplexity.
  • Nous Hermes 3. Nous Research has released its flagship model. Trained on top of Llama 3, the model delivers strong performance and, like many of the company’s earlier models, a distinctive personality.
  • California AI bill SB 1047 aims to prevent AI disasters, but Silicon Valley warns it will cause one. Silicon Valley is opposed to California’s SB 1047, which aims to stop “critical harms” from massive AI models. Stakeholders are split on the bill’s possible effects on innovation. Prominent businesses and industry leaders discuss the bill’s benefits and implications for AI safety and advancement. The measure is headed for a final Senate vote. It mandates AI model safety protocols and third-party audits. It also outlines enforcement procedures and heavy fines for non-compliance.
  • SoftBank’s Intel AI processor plans in doubt as insiders say it is now considering a TSMC partnership. Intel failed to produce AI processors for SoftBank’s Project Izanagi, leading SoftBank to explore a partnership with TSMC. Despite setbacks, SoftBank remains committed to challenging major AI players with its own hardware and data center ecosystem, potentially backed by significant investment from global partners. The move could strain SoftBank’s relationship with Arm clients as it risks direct competition.
  • Another Apple smart ring patent granted, includes controlling smart glasses. A smart ring that can monitor health and control other Apple devices is described in a recently awarded patent by Apple, which also refers to potential integration with AR/VR headsets and smart glasses.
  • Iranian group used ChatGPT to try to influence US election, OpenAI says. AI company bans accounts and says operation did not appear to have meaningful audience engagement
  • Russia’s AI tactics for US election interference are failing, Meta says. New Meta security report finds that AI-powered deception campaigns ‘provide only incremental’ results for bad actors
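
To illustrate the Claude prompt caching item above, here is a minimal sketch using Anthropic's Python SDK as documented around the feature's beta launch; the model name, beta header value, and placeholder document are examples and may need updating for current SDK versions.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

long_reference_text = "..."  # e.g. a large knowledge base reused across requests

response = client.messages.create(
    model="claude-3-5-sonnet-20240620",          # example model name
    max_tokens=512,
    extra_headers={"anthropic-beta": "prompt-caching-2024-07-31"},  # beta flag at launch
    system=[
        {"type": "text", "text": "You answer questions about the reference text."},
        {
            "type": "text",
            "text": long_reference_text,
            # Marks this block as cacheable so repeated calls can reuse it.
            "cache_control": {"type": "ephemeral"},
        },
    ],
    messages=[{"role": "user", "content": "What does the reference say about pricing?"}],
)

print(response.content[0].text)
```

Subsequent requests that reuse the same cached system block should then be cheaper and faster to process, which is where the cost and latency savings mentioned above come from.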

Resources

Perspectives

  • ‘His rhetoric has made Tesla toxic’: is Elon Musk driving away his target market? There are signs the billionaire is becoming unpopular with the very demographic group most likely to buy EVs
  • Why Elon Musk’s fun week of stirring up unrest shows the limits of our online safety laws. Twitter under the tech owner has become the perfect test case for the UK’s new legislation — but critics say more needs to be done
  • Elon’s politics: how Musk became a driver of elections misinformation. X owner, who will interview Trump on Monday, has cast doubt on mail ballots and spread false claims
  • Don’t pivot into AI research. In AI and machine learning, scale is now the primary driver of performance. Because of the significant capital required, only a small number of players can employ productive machine-learning researchers, resulting in market consolidation. This dynamic mirrors the historical consolidation in chip design and points to a potential decline in the status and pay of machine-learning roles once supply exceeds demand. Given these industry shifts, prospective ML professionals should carefully consider why they want to pursue a career in ML.
  • OpenAI Generates More Turmoil. Just two of OpenAI’s eleven founding members are still at the company, a high rate of turnover as worries mount about the organization’s shift from its original non-profit goals toward a more profit-driven structure. Co-founder Greg Brockman is taking a sabbatical and Ilya Sutskever has left, amid talk of burnout and lucrative opportunities elsewhere. The company faces further challenges: it may need a new major financial partner, GPT-5 is expected later than planned, and the industry continues to debate the merits of “open” versus “closed” AI models.
  • Klarna’s AI chatbot: how revolutionary is it, really? By deploying an AI chatbot built with OpenAI, Klarna may be able to reduce the number of support staff it needs, thanks to the bot’s notable efficiency on customer-service tasks. In 23 markets and more than 35 languages, the bot responds quickly to standard Level 1 support inquiries and refers more complicated problems to human agents. The system cuts costs and speeds up first-level help, but compared with earlier L1 support automation, its revolutionary impact on the business landscape is questionable.
  • Why I bet on DSPy. DSPy is an open-source framework that orchestrates multiple LLM calls to solve practical problems. It is being updated to address current issues with accessibility and reliability, with a focus on verified inputs for outcome measurement. Even with limited reasoning ability, LLMs can work well as creative engines within the DSPy framework.
  • LinkedIn is a mess. Here’s how to fix it. The networking site, which some are calling a ‘cesspool’, is riddled with oversharing and lunatics — it’s time for change
  • Silicon Valley is cheerleading the prospect of human–AI hybrids — we should be worried. A pseudo-religion dressed up as technoscience promises human transcendence at the cost of extinction.
  • TechScape: Why Musk’s rabble-rousing shows the limits of social media laws. Twitter under the tech owner has become the perfect test case for the UK’s new legislation — but critics say more needs to be done
  • America & China’s Chip Race. The United States is implementing robust policies to enhance domestic semiconductor production using the CHIPS Act and sanctions designed to impede China’s technological progress. China’s semiconductor industry is booming despite these efforts, with near-record imports of manufacturing equipment and rising domestic chip production. This growing competition points to an ongoing geopolitical tug-of-war over the supremacy of the semiconductor supply chain.
  • Gas pipeline players in talks to fuel AI data center demand. As the power demands of the AI industry rise, pipeline companies such as Energy Transfer LP and Williams Companies are in talks to feed natural gas directly to data centers.
  • Does AI Deserve A Seat At The Boardroom Table? Leaders are being compelled to create strong AI strategies for data-driven decision-making as a result of AI’s integration with corporate governance. Even though AI provides insightful information, particularly when used with LLMs, there are still issues, such as competence gaps and moral dilemmas. AI and human judgment must be properly balanced to support future C-suite decision-making.
  • Self-Driving Cars Are Still The Best Way To Solve The Biggest Problem With Driving In America. Robocars promise to improve traffic even when most of the cars around them are driven by people, study finds
  • Brands should avoid AI. It’s turning off customers. According to a recent study, labeling products as “AI-powered” can lower consumers’ willingness to buy because of mistrust and anxiety about the unknown. The research finds that people are skeptical about AI’s inner workings and risks, particularly around personal data protection, and suggests that both cognitive and emotional trust matter. Instead of using “AI” as a buzzword, businesses are advised to focus on communicating the concrete benefits of AI.
  • 14% of PCs shipped globally in Q2 2024 were AI-capable. In Q2 2024, shipments of AI-capable PCs rose significantly to 8.8 million units, or 14% of all PCs shipped.
  • Brain implants to treat epilepsy, arthritis, or even incontinence? They may be closer than you think. Startups around the world are engaging in clinical trials in a sector that could change lives — and be worth more than £15bn by the 2030s

Meme of the week

What do you think? Did any news catch your attention? Let me know in the comments.

If you have found this interesting:

You can look for my other articles, and you can also connect or reach me on LinkedIn. Check this repository containing weekly updated ML & AI news. I am open to collaborations and projects. You can also subscribe for free to get notified when I publish a new story.

Here is the link to my GitHub repository, where I am collecting code and many resources related to machine learning, artificial intelligence, and more.

or you may be interested in one of my recent articles:


Salvatore Raieli

Senior data scientist | about science, machine learning, and AI. Top writer in Artificial Intelligence