WEEKLY AI NEWS: RESEARCH, NEWS, RESOURCES, AND PERSPECTIVES

AI & ML news: Week 30 September — 6 October

Gov. Newsom vetoes California’s AI bill, OpenAI to remove non-profit control, Tesla Full Self Driving challenges, and much more

Salvatore Raieli
17 min read · Oct 13, 2024
Photo by Jorge Gardner on Unsplash

The most interesting news, repositories, articles, and resources of the week

Check and star this repository where the news will be collected and indexed:

You will find the news first on GitHub. All the Weekly News stories are also collected here:

Weekly AI and ML news - each week the best of the field


Research

  • PGN: The RNN’s New Successor is Effective for Long-Range Time Series Forecasting. The Parallel Gated Network (PGN) is a new architecture designed to address the challenges traditional RNNs face in managing long-term dependencies. By shortening the information propagation path and incorporating gated mechanisms, it efficiently captures both past and present time-step information; a toy sketch of the gating idea follows this list.
  • Taming Diffusion Prior for Image Super-Resolution with Domain Shift SDEs. DoSSR is a diffusion-based super-resolution model that improves both performance and efficiency by utilizing pretrained diffusion models and initiating the process with low-resolution images. This approach accelerates the super-resolution process while maintaining high-quality results.
  • MaskLLM: Learnable Semi-Structured Sparsity for Large Language Models. MaskLLM is a pruning technique designed to decrease the computational load of large language models by introducing learnable sparsity. This method optimizes performance while maintaining model efficiency by selectively reducing the number of active parameters.
  • Law of the Weakest Link: Cross Capabilities of Large Language Models. This project emphasizes the importance of evaluating large language models (LLMs) based on their combined abilities rather than focusing solely on individual skills. While most models are trained on specialized datasets that target specific capabilities, real-world tasks frequently demand a blend of expertise across different areas, known as cross-capabilities. This approach ensures that models are better suited to handle complex, multifaceted challenges.
  • Scaling Optimal LR Across Token Horizon. This paper investigates how the optimal learning rate should shrink as the training token horizon grows. Where LLaMA applied a power-law exponent of -0.28, the paper proposes an exponent of -0.33 for transferring learning rates to larger datasets; a small helper illustrating the rule follows this list.
  • Knowledge Graph Embedding by Normalizing Flows. This paper presents a novel approach to knowledge graph embedding by leveraging group theory to incorporate uncertainty into the process. This method allows for more nuanced and flexible representations of relationships within knowledge graphs, enhancing the model’s ability to handle uncertain or ambiguous information.
  • How AI is improving simulations with smarter sampling techniques. MIT CSAIL researchers created an AI-powered method for low-discrepancy sampling, which uniformly distributes data points to boost simulation accuracy.
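For intuition on the PGN item above, here is a toy PyTorch sketch of the gating idea: summarize the whole history in one shot (a short propagation path) and let a learned gate blend it with the current step. This is a loose illustration under assumed dimensions, not the paper’s actual architecture.

    import torch
    import torch.nn as nn

    class GatedAggregation(nn.Module):
        """Toy PGN-style gate: blend a one-shot summary of all past steps
        with the current step, avoiding step-by-step recurrence."""
        def __init__(self, dim):
            super().__init__()
            self.gate = nn.Linear(2 * dim, dim)
            self.proj = nn.Linear(2 * dim, dim)

        def forward(self, x):  # x: (batch, seq, dim)
            steps = torch.arange(1, x.size(1) + 1, device=x.device).view(1, -1, 1)
            past = x.cumsum(dim=1) / steps          # running mean of the history
            h = torch.cat([past, x], dim=-1)        # past summary + current step
            g = torch.sigmoid(self.gate(h))         # gate chooses the blend
            return g * torch.tanh(self.proj(h)) + (1 - g) * x

    x = torch.randn(8, 96, 32)                      # a batch of 96-step series
    print(GatedAggregation(32)(x).shape)            # torch.Size([8, 96, 32])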
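And for the learning-rate scaling item, a back-of-envelope helper applying the power-law rule from the summary; the reference values below are placeholders, not numbers from the paper.

    def scaled_lr(base_lr, base_tokens, target_tokens, exponent=-0.33):
        """Transfer a learning rate tuned at one token horizon to another,
        assuming lr(T) = base_lr * (T / T_base) ** exponent."""
        return base_lr * (target_tokens / base_tokens) ** exponent

    # e.g. an LR tuned on a 10B-token run, transferred to a 1T-token run
    print(scaled_lr(3e-4, 10e9, 1e12))  # ~6.6e-5: bigger horizon, smaller LR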

News

  • Apple not investing in OpenAI after all, new report says. Apple is no longer planning to invest in OpenAI, according to a new report from The Wall Street Journal. This comes as OpenAI plans to close a $6.5 billion funding round next week, with investments possible from both Microsoft and Nvidia.
  • Arcade AI raises $17M to transform commerce. Arcade AI, a generative product company that launched this week, announced funding from prominent investors to develop its “prompt to product” system, which enables the immediate creation of products ready for purchase, streamlining the path from concept to consumer.
  • They stole my voice with AI. Elecrow is suspected of using AI to clone a voice for promotional videos without consent.
  • Amazon-backed Anthropic in talks to raise money at $40B valuation: report. Anthropic, a generative AI startup backed by Amazon and other major tech companies, is in discussions to raise additional funding that could potentially value the company at $40 billion.
  • OpenAI Reportedly Slated for $500 Million SoftBank Investment. SoftBank is planning to invest $500 million in OpenAI’s latest funding round, which could raise OpenAI’s valuation to as high as $150 billion. Microsoft is also participating in this round, highlighting OpenAI’s rapid 1,700% revenue growth, despite the company anticipating losses of around $5 billion.
  • OpenAI Is Growing Fast and Burning Through Piles of Money. As the company looks for more outside investors, documents reviewed by The New York Times show consumer fascination with ChatGPT and a serious need for more cash.
  • Altman reportedly asks Biden to back a slew of multi-gigawatt-scale AI datacenters. OpenAI CEO Sam Altman is calling on the Biden administration to establish AI data centers in the US that could consume up to five gigawatts of power, aiming to maintain US technological leadership over China. The proposal includes building several large-scale data centers across the country. Meanwhile, other tech giants, such as Microsoft and Amazon, are securing nuclear power deals to support their growing AI operations.
  • Samsung’s Galaxy Tab S10 Ultra and Galaxy Tab S10+ are tablets built for AI. Samsung is once again expanding its tablet lineup, and this time, the company is doing so with AI at the forefront. Today, Samsung revealed the Galaxy Tab S10 series, two models that it says are “built with AI enhancements available right out of the box.”
  • Tesla Full Self Driving requires human intervention every 13 miles. It gave pedestrians room but ran red lights and crossed into oncoming traffic.
  • OpenAI Dev Day 2024. OpenAI’s Dev Day 2024 featured several notable announcements, including vision model fine-tuning, a real-time API, prompt caching for faster and cheaper repeated requests, and model distillation for more efficient deployment of large models. These updates aim to improve the capability and performance of AI applications across domains; a minimal sketch of prompt caching in practice follows this list.
  • Pika 1.5. Pika has released version 1.5 with more realistic movement, cinematic big-screen shots, and Pikaffects.
  • Gov. Newsom vetoes California’s controversial AI bill, SB 1047. Governor Gavin Newsom has vetoed SB 1047, a proposed bill intended to regulate AI development and enforce safety protocols for high-cost models. Newsom expressed concerns that the bill’s broad application to all large, computation-heavy models was not the most effective method for regulating AI. However, he reaffirmed his commitment to AI safety by signing several other AI-related bills and consulting with experts to ensure thoughtful regulation in the future.
  • OpenAI to remove non-profit control and give Sam Altman equity, sources say. ChatGPT-maker OpenAI is working on a plan to restructure its core business into a for-profit benefit corporation that will no longer be controlled by its non-profit board, people familiar with the matter told Reuters, in a move that will make the company more attractive to investors.
  • OpenAI’s latest funding. OpenAI has secured $6.6 billion in new funding, bringing its post-money valuation to $157 billion. Notable investors in this round include Microsoft and Nvidia, with the funds aimed at further scaling AI development and innovation.
  • Google adds a multi-functional quick insert key and new AI features to Chromebook Plus. Google is announcing new Chromebook models today with Samsung and Lenovo. With Samsung’s Galaxy Chromebook Plus model in particular, the company is also introducing a new multifunctional quick insert key. But Google doesn’t want to leave existing Chromebook users behind as it added new AI-powered features for existing devices.
  • Brain-like Computers Tackle the Extreme Edge. Startup BrainChip announced a new chip design for milliwatt-level AI inference.
  • AI Can Best Google’s Bot Detection System, Swiss Researchers Find. Researchers from ETH Zurich used advanced machine learning to solve 100% of Google’s reCAPTCHA v2 challenges, which are designed to distinguish humans from bots.
  • OpenAI Training Data to Be Inspected in Authors’ Copyright Cases. At a secure room in its San Francisco office, representatives for authors suing OpenAI will examine materials that were used to train its AI system. They allege copyrighted works were utilized without their consent or compensation.
  • ByteDance will reportedly use Huawei chips to train a new AI model. US export restrictions are preventing ByteDance from using NVIDIA chips.
  • Announcing FLUX1.1 [pro] and the BFL API. FLUX1.1 [pro] has been released, offering six times faster generation speeds compared to its predecessor, alongside enhanced image quality and overall performance. The new beta BFL API introduces advanced customization options and competitive pricing, making it easier for developers to integrate FLUX’s capabilities. FLUX1.1 [pro] will be available across multiple platforms, providing greater scalability and efficiency for users and developers alike.
  • OpenAI launches new ‘Canvas’ ChatGPT interface tailored to writing and coding projects. OpenAI introduced a new way to interact with ChatGPT on Thursday: an interface it calls “canvas.” The product opens a separate window, beside the normal chat window, with a workspace for writing and coding projects. Users can generate writing or code directly in the canvas, then highlight sections of the work to have the model edit. Canvas is rolling out in beta to ChatGPT Plus and Teams users on Thursday, and Enterprise and Edu users next week.
  • Anthropic hires OpenAI co-founder Durk Kingma. Durk Kingma, one of the lesser-known co-founders of OpenAI, today announced that he’ll be joining Anthropic.
  • OpenAI unveils easy voice assistant creation at 2024 developer event. Altman steps back from the keynote limelight and lets four major API additions do the talking.
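On the Dev Day item above: prompt caching requires no new API surface. In the official Python SDK, requests that repeat a long identical prefix (for example a large system prompt) can reuse cached computation server-side automatically. A minimal sketch, with an illustrative model name and prompt:

    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    # A long, stable prefix is what lets caching kick in across requests.
    system_prompt = "You are a support assistant for ACME Corp. ..."  # placeholder

    for question in ["How do I reset my password?", "Where is my invoice?"]:
        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # illustrative model choice
            messages=[
                {"role": "system", "content": system_prompt},  # identical across calls
                {"role": "user", "content": question},         # only this part changes
            ],
        )
        print(resp.choices[0].message.content)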

Resources

  • 🚀 FlowTurbo. FlowTurbo is a method developed to accelerate the sampling process in flow-based models while maintaining high-quality outputs. It achieves faster results without compromising the precision or performance of the model.
  • Transformer4SED. This repository presents the Prototype-based Masked Audio Model, which enhances sound event detection by leveraging unlabeled data more effectively. The method generates pseudo labels through a Gaussian mixture model, which directs the training of a Transformer-based audio model for improved performance.
  • VPTQ: Extreme Low-bit Vector Post-Training Quantization for Large Language Models. Vector post-training quantization enables ultra-low-bit quantization of large language models, reducing memory and storage requirements at deployment without significantly compromising performance; a toy codebook-quantization sketch follows this list.
  • LightAvatar: Efficient Head Avatar as Dynamic NeLF. LightAvatar is a head avatar model that improves rendering speed and efficiency using neural light fields (NeLFs).
  • Separating code reasoning and editing. Aider significantly improved general-purpose code editing by pairing o1 as the architect model with DeepSeek as the editor model: one model plans the change, the other writes the edit, leading to more efficient and accurate code generation.
  • Heralax/Mistrilitary-7b. This model was trained using army handbooks and incorporates deep, specialized knowledge that is uncommon in fine-tuned models. This unique training approach allows it to possess a rare level of expertise in military-related tasks and information.
  • Developing a go bot embedding ichiban Prolog. Ichiban Prolog was integrated into Hellabot, a Go-based IRC bot, to eliminate the need for recompiling when adding new triggers. This integration enables dynamic Prolog code execution, allowing users to adjust the bot’s logic in real time. Future enhancements could focus on minimizing interpreter setup overhead and shifting more of the bot’s logic into Prolog for greater flexibility and efficiency.
  • Emu 3 open early fusion multimodal model. Emu 3 is a next-token prediction model that surpasses SDXL in image synthesis, LLaVA-1.6 in image understanding, and OpenSora in video generation. With 9 billion parameters, Emu 3 is trained on these tasks in an interleaved manner, similar to Gemini, making it versatile and effective across multiple domains.
  • LOTUS: Diffusion-based Visual Foundation Model for High-quality Dense Prediction. Using pretrained diffusion models for tasks like depth estimation has become highly popular and effective. This work demonstrates how certain previous methods contained minor inaccuracies and presents improvements that not only boost performance but also significantly simplify the overall modeling process.
  • Revisit Anything: Visual Place Recognition via Image Segment Retrieval. SegVLAD is a method for visual place recognition that emphasizes the analysis of image segments instead of relying on entire images. This approach enhances recognition accuracy by focusing on distinctive parts of the scene, making it more robust in various environments.
  • LeanRL — Turbo-implementations of CleanRL scripts. LeanRL is a lightweight library of single-file, PyTorch-based implementations of popular reinforcement learning (RL) algorithms. Its primary goal is to show the RL PyTorch user base optimization tricks that cut training time by half or more.
  • E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding. E.T. Bench is a newly developed benchmark created to assess the performance of video language models on fine-grained, event-level tasks. Unlike earlier benchmarks that emphasize video-level questions, E.T. Bench spans a variety of time-sensitive tasks across multiple domains, providing a more detailed evaluation of model capabilities.
  • MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning. Apple is continuing to strengthen its in-house AI capabilities by developing a robust multimodal foundation model. This initiative is part of Apple’s broader efforts to integrate advanced AI technologies across its ecosystem, supporting tasks that span text, image, and other data modalities for enhanced user experiences.
  • The Perfect Blend: Redefining RLHF with Mixture of Judges. Meta has introduced an impressive new paper detailing the use of a mixture of judges models to effectively conduct multi-task reinforcement learning with human feedback (RLHF) during post-training. This approach significantly enhances the final performance of models across various benchmarks, demonstrating superior results compared to previous methods.
  • A Survey on the Honesty of Large Language Models. This survey explores the honesty of large language models (LLMs), a crucial aspect in aligning AI with human values. It addresses challenges such as models confidently providing incorrect answers and the difficulty in distinguishing between what the model knows and what it doesn’t. The review highlights these obstacles as key areas for improving the reliability and trustworthiness of LLMs.
  • LexEval: A Comprehensive Benchmark for Evaluating Large Language Models in Legal Domain. LexEval is a benchmark created to evaluate large language models (LLMs) specifically in the legal domain. Recognizing the critical need for accuracy, reliability, and fairness in legal applications, LexEval provides a framework for assessing the strengths and limitations of LLMs when applied to legal tasks, ensuring they meet the rigorous demands of the field.
  • Perceptual Compression (PerCo). PerCo (SD) is a novel perceptual image compression technique built on Stable Diffusion v2.1, specifically designed for ultra-low bit ranges. This method leverages the power of diffusion models to achieve high-quality image compression at significantly reduced bitrates, optimizing storage and transmission without sacrificing visual fidelity.
  • nvidia/NVLM-D-72B. Nvidia conducted a thorough ablation study of different ways to feed images into a language model. The LLaVA-style concatenation approach outperformed the alternatives, proving the most effective way to integrate visual information into language models.
  • ProFD: Prompt-Guided Feature Disentangling for Occluded Person Re-Identification. This paper introduces a new method called Prompt-guided Feature Disentangling (ProFD) to tackle occlusion challenges in person Re-Identification (ReID) tasks. ProFD helps separate relevant features from occluded or irrelevant ones, improving the accuracy and robustness of ReID models when identifying individuals in complex or obstructed environments.
  • Local File Organizer: AI File Management Run Entirely on Your Device, Privacy Assured. This tool utilizes Llama 3.2 3B and Llava-1.6 to intelligently organize files on your computer into logical sections based on their content. By analyzing the data within the files, it categorizes and arranges them for easier navigation and more efficient file management.
  • RouterDC: Query-Based Router by Dual Contrastive Learning for Assembling Large Language Models. RouterDC is an innovative method designed to enhance collaboration between multiple large language models (LLMs) through query-based routing. It utilizes contrastive learning to determine the most suitable model for each query, leading to improved performance compared to existing routing techniques. This approach optimizes model selection, ensuring more accurate and efficient responses.
  • Distributed Training of Deep Learning Models. This post is an excellent introduction to the challenges and algorithms involved in distributed training of modern deep-learning models. It explores the difficulties and bottlenecks of training models too large for a single GPU, including communication overhead, synchronization, and memory limits, and discusses key techniques to overcome them; a standard PyTorch data-parallel sketch follows this list.
  • ComfyGen: Prompt-Adaptive Workflows for Text-to-Image Generation. Instead of directly generating an image from a prompt, the authors created a workflow using a comfy UI node-based system to guide the image generation process. This approach significantly enhanced the final output quality, allowing for greater control and precision in the generation pipeline.
  • KnobGen. KnobGen is a new framework developed to make sketch-based image generation more accessible to users of varying skill levels. By offering intuitive controls and simplified tools, KnobGen allows users to generate high-quality images from sketches, regardless of their artistic expertise.
  • Tiny Test Models. AI researcher Ross Wightman has released a collection of models trained on ImageNet-1k that are remarkably small, with fewer than 1 million parameters. Despite their compact size, these models perform reasonably well and are designed to be easy to fine-tune, making them highly accessible for various applications where model efficiency is critical.
  • entropix. Entropix explores entropy-based sampling and parallel Chain of Thought (CoT) decoding, promising strategies for pushing open models toward the reasoning performance of frontier systems; a toy entropy-gated sampler follows this list.
  • Concordia. DeepMind’s Concordia repository enables the simulation of social interactions between individuals and groups at a reasonable scale. This platform allows researchers to model complex social behaviors, study group dynamics, and explore various interaction scenarios in a controlled, scalable environment.
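To make the VPTQ item above concrete, here is the underlying idea of vector (codebook) quantization in a toy NumPy sketch: group weights into short vectors, cluster them, and store one small index per vector plus a shared codebook. This is a simplified illustration, not the paper’s algorithm.

    import numpy as np

    def vector_quantize(w, vec_len=4, codebook_size=256):
        """Toy codebook quantization: reshape weights into short vectors,
        run a few k-means steps, and keep one uint8 index per vector."""
        vecs = w.reshape(-1, vec_len)
        rng = np.random.default_rng(0)
        codebook = vecs[rng.choice(len(vecs), codebook_size, replace=False)].copy()
        for _ in range(10):                          # a few Lloyd iterations
            d = ((vecs[:, None, :] - codebook[None]) ** 2).sum(-1)
            idx = d.argmin(1)
            for k in range(codebook_size):
                members = vecs[idx == k]
                if len(members):
                    codebook[k] = members.mean(0)
        return idx.astype(np.uint8), codebook        # 8 bits per 4 weights

    w = np.random.randn(1024, 64).astype(np.float32)
    idx, cb = vector_quantize(w)
    print(idx.shape, cb.shape)                       # (16384,) (256, 4)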
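For the distributed-training post, the standard PyTorch data-parallel recipe looks like this; the per-step gradient all-reduce during backward() is exactly the communication overhead the post discusses. A minimal sketch, assuming a multi-GPU host:

    import os
    import torch
    import torch.distributed as dist
    from torch.nn.parallel import DistributedDataParallel as DDP

    def main():
        # launch with: torchrun --nproc_per_node=NUM_GPUS train.py
        dist.init_process_group("nccl")
        rank = int(os.environ["LOCAL_RANK"])
        torch.cuda.set_device(rank)

        model = torch.nn.Linear(1024, 1024).to(f"cuda:{rank}")
        model = DDP(model, device_ids=[rank])   # gradients all-reduced across ranks
        opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

        for _ in range(10):                     # dummy data in place of a real loader
            x = torch.randn(32, 1024, device=f"cuda:{rank}")
            loss = model(x).pow(2).mean()
            opt.zero_grad()
            loss.backward()                     # communication happens here, overlapped
            opt.step()

        dist.destroy_process_group()

    if __name__ == "__main__":
        main()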
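And for entropix, the core signal is the entropy of the next-token distribution: sample greedily when the model is confident, and treat high-entropy steps as candidates for forking a parallel CoT branch. A toy sketch with made-up thresholds:

    import torch
    import torch.nn.functional as F

    def entropy_gated_sample(logits, low=1.0, high=3.0):
        """Toy entropy-gated sampler: greedy when confident, temperature
        sampling otherwise; flags very uncertain steps as branch points."""
        probs = F.softmax(logits, dim=-1)
        entropy = -(probs * probs.clamp_min(1e-12).log()).sum(-1)
        if entropy < low:
            return probs.argmax(-1), False       # confident: take the top token
        token = torch.multinomial(F.softmax(logits / 0.7, dim=-1), 1).squeeze(-1)
        return token, bool(entropy > high)       # True = fork a parallel branch here

    logits = torch.randn(50_000)                 # fake next-token logits
    token, should_branch = entropy_gated_sample(logits)
    print(token.item(), should_branch)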

Perspectives

  • The Intelligence Age. AI is set to enhance human abilities, empowering us to accomplish tasks that are currently beyond imagination. With the help of deep learning and more powerful computational tools, AI will drive innovations such as personalized assistants, learning tutors, and healthcare advisors. The emphasis should be on ensuring AI is widely accessible while addressing its potential risks, creating a path toward shared prosperity in the era of intelligent systems.
  • How AlphaChip transformed computer chip design. AlphaChip is a reinforcement learning model that dramatically speeds up and improves chip design, creating layouts that surpass human capabilities. It produces optimized chip designs, such as those used in Google’s TPUs, in just hours instead of weeks. This AI-powered approach has wide-ranging applications, benefiting not only Google’s hardware but also external companies like MediaTek.
  • AI pareidolia: Can machines spot faces in inanimate objects? New dataset of “illusory” faces reveals differences between human and algorithmic face detection, links to animal face recognition, and a formula predicting where people most often perceive faces.
  • Table Extraction using LLMs: Unlocking Structured Data from Documents. This article discusses how large language models (LLMs) are transforming table extraction from complex documents, surpassing traditional methods such as OCR, rule-based systems, and classic machine learning. LLMs offer greater flexibility and contextual comprehension, improving accuracy on varied and intricate table structures. While challenges such as hallucination and high computational cost remain, combining traditional techniques with LLMs is currently the most effective approach to automated table extraction; a generic extraction pattern is sketched after this list.
  • The Other Bubble. Microsoft considered diverting its US-based server power to GPUs for AI purposes but ultimately abandoned the idea. Major tech companies like Microsoft, Google, and Amazon are making significant investments in AI, yet they continue to see underwhelming returns from generative AI applications. The industry’s reliance on SaaS and the integration of AI tools, which frequently offer limited practical value while incurring substantial costs, underscores an increasing urgency to sustain growth in a slowing market.
  • AI’s Privilege Expansion. AI is quickly broadening access to services that were once expensive and difficult to obtain, such as education, healthcare, and personal styling. Generative AI models like ChatGPT offer affordable, personalized support by acting as tutors, healthcare advisors, and stylists, reducing the need for costly human professionals. This transformation democratizes access to high-end services, making them more widely available to the general public at a significantly lower cost.
  • Behind OpenAI’s Audacious Plan to Make A.I. Flow Like Electricity. OpenAI CEO Sam Altman has proposed a global initiative to construct data centers and chip factories to drive advanced AI development. While Altman initially aimed for trillions in funding, he has now scaled back to targeting hundreds of billions. The plan envisions partnerships with global tech giants and governments, though it faces significant regulatory and logistical hurdles. Despite early skepticism, ongoing discussions suggest potential expansions across the US, Europe, and Asia to significantly increase computing power for AI advancements.
  • Devs gaining little (if anything) from AI coding assistants. A code analysis firm sees no major benefits from AI dev tools when measuring key programming metrics, though others report incremental gains from coding copilots, particularly for code review.
  • Negligence Liability for AI Developers. This article advocates for a negligence-based approach to AI accountability, emphasizing the human factors and responsibilities behind AI systems. It critiques existing regulatory frameworks for neglecting the role of AI developers and highlights California’s AI safety bill as a promising example. The article also delves into the complexities of defining “reasonable care” in AI development and the potential consequences of classifying AI developers as professionals, raising important questions about the standards and obligations they should meet.
  • I am tired of AI. The author expresses frustration with the widespread marketing and overuse of AI, especially in fields like software testing and conference proposals. They argue that AI tools often prioritize speed at the expense of quality and fail to offer the unique insights that come from human-generated work. While acknowledging some useful applications of AI, the author criticizes the increasing amount of mediocre AI-produced content, seeing it as a detriment to innovation and depth in these areas.
  • The Four Short Term Winners of AI. The global AI arms race is primarily driven by Big Tech companies, chipmakers such as NVIDIA, intellectual property lawyers, and the Big 4 consulting firms. These key players are competing to secure technological dominance, resources, and expertise in AI development, shaping the future of the industry through their influence and innovations.
  • The Art of the OpenAI Deal. OpenAI’s revenue soared to $300 million in August, with the company forecasting $3.7 billion in annual sales for this year and $11.6 billion for next year. However, it is facing a $5 billion annual loss. This rapid growth has been driven primarily by the widespread success of ChatGPT, which generates the majority of its revenue. Despite this momentum, OpenAI is actively seeking additional investors to cover its high operational costs and work towards becoming a profitable enterprise.
  • What comes after? California Governor Gavin Newsom has vetoed SB 1047, a bill aimed at regulating large AI models. He stressed the importance of creating evidence-based regulations and cautioned that overly restrictive rules could hinder innovation. Instead, Newsom plans to collaborate with experts, including Dr. Fei-Fei Li, to develop empirical, science-driven guidelines that balance safety and progress in AI development.
  • Sorry, GenAI is NOT going to 10x computer programming. Recent studies indicate that generative AI has not yet delivered the expected 10x improvement in coding productivity. While AI tools can assist with code generation and streamline certain tasks, the overall productivity gains have been more modest than initially projected, with challenges such as integration, context understanding, and debugging limiting the full potential of these technologies in real-world coding environments.
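On the table-extraction piece, a common pattern is to ask the model for the table as constrained JSON and validate it before use. A generic sketch with the OpenAI Python SDK; the model name and prompt are illustrative, and in production you would combine this with traditional layout parsing, as the article suggests:

    import json
    from openai import OpenAI

    client = OpenAI()
    document_text = "...extracted page text containing a table..."  # placeholder

    prompt = (
        'Extract every table from the text below as JSON: an object '
        '{"tables": [{"headers": [...], "rows": [[...], ...]}]}. '
        "Return JSON only.\n\n" + document_text
    )

    resp = client.chat.completions.create(
        model="gpt-4o-mini",                      # illustrative choice
        response_format={"type": "json_object"},  # constrain output to valid JSON
        messages=[{"role": "user", "content": prompt}],
    )

    tables = json.loads(resp.choices[0].message.content)["tables"]
    print(tables)                                 # validate shape before trusting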

Meme of the week

What do you think? Did any of this news capture your attention? Let me know in the comments.

If you have found this interesting:

You can look for my other articles, and you can also connect with or reach me on LinkedIn. Check this repository, which contains weekly updated ML & AI news. I am open to collaborations and projects. You can also subscribe for free to get notified when I publish a new story.

Here is the link to my GitHub repository, where I am collecting code and many resources related to machine learning, artificial intelligence, and more.

or you may be interested in one of my recent articles:

Written by Salvatore Raieli

Senior data scientist | about science, machine learning, and AI. Top writer in Artificial Intelligence
