WEEKLY AI NEWS: RESEARCH, NEWS, RESOURCES, AND PERSPECTIVES

AI & ML news: Week 3–9 June

NVIDIA is valued at $3 trillion, Anthropic and OpenAI are interpreting LLMs, and much more

Salvatore Raieli
20 min read · Jun 10, 2024
Photo by Amanna Avena on Unsplash

The most interesting news, repositories, articles, and resources of the week

Check and star this repository where the news will be collected and indexed:

You will find the news first on GitHub. Single posts are also collected here:

Weekly AI and ML news - each week the best of the field

44 stories

Research

  • Contextual Position Encoding: Learning to Count What’s Important. Proposes CoPE, a new position-encoding method that conditions positions on context by incrementing the position counter only on certain tokens. This lets attention address the i-th particular word, noun, or sentence, so the same mechanism can represent different levels of position abstraction, and it improves perplexity on language modeling and coding tasks (a toy sketch follows the next item).
  • Faithful Logical Reasoning via Symbolic Chain-of-Thought. Proposes Symbolic Chain-of-Thought, a fully LLM-based framework that strengthens logical reasoning by combining logical rules and symbolic expressions with chain-of-thought (CoT) prompting. It consists of three key steps: 1) translate the natural-language context into a symbolic format, 2) derive a step-by-step solution plan based on symbolic logical rules, and 3) employ a verifier to validate the translation and the reasoning chain.
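To make CoPE’s counting idea concrete, here is a minimal, hypothetical sketch of its core step, in the spirit of the paper rather than its actual code: a learned gate decides which past tokens “count”, and positions are accumulated over only those tokens (the real method then interpolates these fractional positions into learned position embeddings inside each head). The function name is mine.

```python
import torch

def cope_positions(q, k):
    """Toy CoPE: count only the tokens a learned gate deems relevant."""
    gates = torch.sigmoid(q @ k.T)                       # g[i, j]: does key j "count" for query i?
    gates = gates * torch.tril(torch.ones_like(gates))   # causal mask: only look backwards
    # p[i, j] = number of counted tokens between j and i (a reversed cumulative sum),
    # so "position" can mean i-th word, noun, or sentence depending on the gates
    pos = torch.flip(torch.cumsum(torch.flip(gates, dims=[-1]), dim=-1), dims=[-1])
    return pos                                           # context-dependent fractional positions

q, k = torch.randn(5, 8), torch.randn(5, 8)
print(cope_positions(q, k))                              # a (5, 5) matrix of positions
```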
https://arxiv.org/pdf/2405.18719
  • Transformers Can Do Arithmetic with the Right Embeddings. Tackles transformers’ inability to track the exact position of digits by adding an embedding to each digit that encodes its position relative to the start of the number. Trained on only 20-digit numbers with a single GPU, the model achieves 99% accuracy on 100-digit addition problems, and the gains transfer to multi-step reasoning tasks such as sorting and multiplication (a sketch of the digit-position trick follows the next item).
  • GNN-RAG: Graph Neural Retrieval for Large Language Model Reasoning. blends the reasoning powers of GNNs with the language understanding skills of LLMs in a RAG fashion; the GNN extracts relevant and useful graph information, and the LLM uses the information to answer questions over knowledge graphs (KGQA); GNN-RAG outperforms or matches GPT-4 performance with a 7B tuned LLM, and improves vanilla LLMs on KGQA.
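The digit-position trick from the arithmetic paper above is simple enough to sketch. This is a hypothetical illustration of the idea (per-digit offsets from the start of each number, plus a random shift, which the paper uses so long offsets also get trained), not the authors’ code:

```python
import random

def digit_position_ids(tokens, max_shift=100):
    """Assign each digit an id equal to its offset from the start of its number."""
    shift = random.randrange(max_shift)  # random start helps length generalization
    ids, count = [], 0
    for t in tokens:
        if t.isdigit():
            count += 1
            ids.append(shift + count)    # 1st digit of a number -> shift+1, 2nd -> shift+2, ...
        else:
            count = 0
            ids.append(0)                # non-digits share a null position id
    return ids

print(digit_position_ids(list("12+345=")))  # e.g. [s+1, s+2, 0, s+1, s+2, s+3, 0]
```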
https://www.therobotreport.com/openai-is-restarting-its-robotics-research-group/
  • Attention as an RNN. Presents a new attention mechanism that can be trained in parallel (like Transformers) and updated with new tokens using constant memory at inference time (like RNNs). It is based on the parallel prefix-scan algorithm, which enables efficient computation of attention’s many-to-many RNN output, and it matches Transformer performance on 38 datasets while being more time- and memory-efficient (the constant-memory update is sketched after the next item).
  • Are Long-LLMs A Necessity For Long-Context Tasks? Proposes a reasoning framework that lets short-LLMs handle long-context tasks by adaptively accessing and utilizing the context according to the task at hand; it decomposes the long context into short contexts and processes them through a decision-making process, arguing that long-LLMs are not necessary to solve long-context tasks.
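The constant-memory claim in “Attention as an RNN” rests on the fact that softmax attention over a growing prefix can be kept as a running numerator/denominator pair; the paper computes the same recurrence in parallel with a prefix scan. Here is a small numpy sketch of the sequential view, with my own naming and a single query vector:

```python
import numpy as np

def init_state(d_v):
    # running numerator, denominator, and max logit (for numerical stability)
    return np.zeros(d_v), 0.0, -np.inf

def update(state, q, k, v):
    """Fold one new (key, value) pair into the attention state in O(1) memory."""
    num, den, m = state
    s = q @ k                          # logit for the new token
    m_new = max(m, s)
    scale = np.exp(m - m_new)          # rescale the old accumulators
    w = np.exp(s - m_new)
    return num * scale + w * v, den * scale + w, m_new

def readout(state):
    num, den, _ = state
    return num / den                   # softmax-weighted average of all values so far

state = init_state(4)
q = np.random.randn(8)
for _ in range(10):                    # stream tokens; memory stays constant
    state = update(state, q, np.random.randn(8), np.random.randn(4))
print(readout(state))
```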
https://goombalab.github.io/blog/2024/mamba2-part1-model/
  • Sparse maximal update parameterization: A holistic approach to sparse training dynamics. All frontier model labs use muP, a potent tool, to transfer hyperparameters fine-tuned on tiny models to bigger, more costly training runs. This study investigates how to achieve that for sparse models, resulting in significantly better training results and lower computation expenses.
  • Exploring Color Invariance through Image-Level Ensemble Learning. To address color bias in computer vision, researchers have created a novel learning technique called Random Color Erasing. By selectively excluding color information from training data, this technique strikes a balance between the significance of color and other parameters, producing models that perform better in challenging situations like industrial and wide-area surveillance.
https://arxiv.org/pdf/2405.18357
https://www.theverge.com/2024/6/3/24170567/amazons-project-pi-product-defect-return-ai-computer-vision
https://arxiv.org/pdf/2405.17399
  • Tree Diffusion: Diffusion Models For Code. A wonderful diffusion paper that diffuses over the code that draws an image, which means edits can be made directly as part of the diffusion process. Although it is sluggish, it can easily be combined with search to significantly increase reasoning capacity.
  • Improved Techniques for Optimization-Based Jailbreaking on Large Language Models. Expanding upon the Greedy Coordinate Gradient (GCG) approach, researchers have enhanced methods for optimization-based jailbreaking of large language models.
  • ZeroSmooth: Training-free Diffuser Adaptation for High Frame Rate Video Generation. A training-free video interpolation technique for generative video diffusion models has been developed by researchers. This novel method improves frame rates without requiring a lot of training or big datasets and works with different models.
  • A whole-slide foundation model for digital pathology from real-world data. Prov-GigaPath is a whole-slide pathology foundation model pre-trained on 1.3 billion 256 × 256 pathology image tiles from 171,189 whole slides. To train it, the authors propose GigaPath, a novel vision transformer architecture for pretraining gigapixel pathology slides, and they further demonstrate its potential for vision-language pretraining in pathology by incorporating pathology reports. In sum, Prov-GigaPath is an open-weight foundation model that achieves state-of-the-art performance on various digital pathology tasks, demonstrating the importance of real-world data and whole-slide modeling.
https://techcrunch.com/2024/06/05/cartwheel-generates-3d-animations-from-scratch-to-power-up-creators/
https://arxiv.org/pdf/2405.20139

News

  • OpenAI Is Restarting Its Robotics Research Group. The San Francisco-based company has been a pioneer in generative artificial intelligence and is returning to robotics after a three-year break.
  • AI Overviews: About last week. Google created AI Overviews to improve search results and give users more precise and pertinent information, particularly for complex queries. After problems such as incorrect results and misread content surfaced, Google addressed them with over a dozen technical updates, such as better detection of nonsensical questions and less reliance on user-generated content in AI Overviews.
https://arxiv.org/pdf/2405.13956
  • Nvidia is said to be prepping an AI PC chip with Arm and Blackwell cores. Competition could be heating up in the Windows on Arm space amid talk in the industry that Nvidia is readying a chip pairing next-gen Arm cores with its Blackwell GPU architecture.
  • Ex-OpenAI board member reveals what led to Sam Altman’s brief ousting. In a recent interview, former OpenAI board member Helen Toner offered fresh insight into the circumstances of CEO Sam Altman’s November dismissal. The board was reportedly informed of ChatGPT’s release via Twitter. According to Toner, Altman had repeatedly lied to the board, allegedly misrepresenting events within the organization for years and withholding facts. His lies made it difficult for the board to make decisions, and it concluded that he wasn’t the best person to take the firm to AGI.
https://mistral.ai/news/customization/
https://arxiv.org/pdf/2405.15318
https://research.google/blog/heuristics-on-the-high-seas-mathematical-optimization-for-cargo-ships/
https://arxiv.org/pdf/2401.10512v1
https://openai.com/index/securing-research-infrastructure-for-advanced-ai/
https://arxiv.org/pdf/2405.20853
  • Extracting Concepts from GPT-4. The team at OpenAI has discovered 16 million interpretable features in GPT-4, including price increases, algebraic rings, and who/what correspondence. This is a great step forward for SAE interpretability at scale. They shared the code in a companion GitHub repository (a toy sparse autoencoder is sketched below).
  • Mesop: Gradio Competitor. Google has released a rival to the well-liked AI prototyping framework Gradio. Mesop is pure Python and slightly more composable, although Gradio is more mature.
  • Nvidia is now more valuable than Apple at $3.01 trillion. The AI boom has pushed Nvidia’s market cap high enough to make it the second most valuable company in the world.
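As a rough illustration of the interpretability item above: a sparse autoencoder (SAE) learns an overcomplete dictionary of features over model activations, keeping only a few active at a time; OpenAI’s paper uses a top-k activation. The sketch below is a hypothetical minimal version of that setup, not their released code:

```python
import torch
import torch.nn as nn

class TopKSAE(nn.Module):
    """Toy top-k sparse autoencoder over residual-stream activations."""
    def __init__(self, d_model: int, n_features: int, k: int):
        super().__init__()
        self.k = k
        self.encoder = nn.Linear(d_model, n_features)
        self.decoder = nn.Linear(n_features, d_model)

    def forward(self, x):
        z = self.encoder(x)
        top = torch.topk(z, self.k, dim=-1)                    # keep the k strongest features
        z_sparse = torch.zeros_like(z).scatter(-1, top.indices, top.values)
        return self.decoder(z_sparse), z_sparse                # reconstruction + sparse code

sae = TopKSAE(d_model=128, n_features=4096, k=32)
x = torch.randn(16, 128)                                       # a batch of activations
recon, code = sae(x)
loss = ((recon - x) ** 2).mean()                               # train to reconstruct
```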

Resources

  • An Introduction to Vision-Language Modeling. An introduction to VLMs aimed at anyone who would like to enter the field: what VLMs are, how they work, and how to train them.
  • Aya 23: Open Weight Releases to Further Multilingual Progress. A family of multilingual language models supporting up to 23 languages; by purposefully concentrating on fewer languages and allocating greater capacity to them, it performs better on those languages than other massively multilingual models.
  • Financial Statement Analysis with Large Language Models. Claims that LLMs can produce insightful analyses of trends and financial ratios, demonstrates that GPT-4 outperforms more specialized models, and develops a profitable trading strategy based on GPT’s predictions.
https://github.com/sunzc-sunny/ppad
  • SimPO: Simple Preference Optimization with a Reference-Free Reward. A more straightforward and efficient method for preference optimization: it uses the average log-probability of a sequence as an implicit reward (i.e., no reference model required), which makes it more compute- and memory-efficient. SimPO outperforms other methods such as DPO and is claimed to yield the strongest 8B open-source model (the objective is shown below).
  • Experimenting with local alt text generation. A model that runs in the browser and can provide alt text for web photos automatically has been trained by Mozilla.
  • Mora: More like Sora for Generalist Video Generation. Mora is a multi-agent framework designed to facilitate generalist video generation tasks, leveraging a collaborative approach with multiple visual agents. It aims to replicate and extend the capabilities of OpenAI’s Sora.
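For reference, the SimPO objective on a preference pair (x, y_w, y_l), as given in the paper: length-normalized log-probabilities replace DPO’s reference-model ratio, with a target reward margin γ:

```latex
\mathcal{L}_{\text{SimPO}} =
-\,\mathbb{E}_{(x,\, y_w,\, y_l)}
\left[\log \sigma\!\left(
\frac{\beta}{|y_w|}\log \pi_\theta(y_w \mid x)
-\frac{\beta}{|y_l|}\log \pi_\theta(y_l \mid x)
-\gamma\right)\right]
```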
https://openai.com/index/extracting-concepts-from-gpt-4/
  • FABRIC: Personalizing Diffusion Models with Iterative Feedback. FABRIC (Feedback via Attention-Based Reference Image Conditioning) is a technique to incorporate iterative feedback into the generative process of diffusion models based on Stable Diffusion.
  • KL is All You Need. KL divergence is a quick, affordable, and effective method of measuring a certain type of distance between distributions. It is widely employed in both conventional and contemporary AI. This piece examines the potent idea both mathematically and graphically (the definition is recalled below).
  • Ways AI-Native Companies Can Improve User Retention. A playbook for founders and product executives, with examples of how businesses like Perplexity, Civit, Lapse, Omnivore, and others increase retention.
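For completeness, the definition discussed in the KL piece, for discrete distributions P and Q. Note it is asymmetric, which is why it is only “a certain type of distance”:

```latex
D_{\mathrm{KL}}(P \,\|\, Q) \;=\; \sum_{x} P(x)\,\log\frac{P(x)}{Q(x)}
```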
https://tree-diffusion.github.io/
  • FineWeb: decanting the web for the finest text data at scale. The performance of a large language model (LLM) depends heavily on the quality and size of its pretraining dataset. Hugging Face has released 🍷 FineWeb, a new, large-scale dataset for LLM pretraining (15 trillion tokens, 44TB of disk space). FineWeb is derived from 96 CommonCrawl snapshots and produces better-performing LLMs than other open pretraining datasets (see the streaming example below).
  • An entirely open-source AI code assistant inside your editor. Continue enables you to easily create your own coding assistant directly inside Visual Studio Code and JetBrains with open-source LLMs. All this can run entirely on your own laptop or have Ollama deployed on a server to remotely power code completion and chat experiences based on your needs.
  • MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark. A popular benchmark for reasoning tasks is MMLU. It is frequently seen as the gold standard and as something that models overfit. A new, more rigorous, and refined benchmark called MMLU Pro is used to gauge language model reasoning.
  • Omost. Omost gives you control over how your images are generated. It comes from the creator of ControlNet. It first rewrites prompts into a collection of illustrative code, then renders the finished image from that code. Crucially, you can modify the code either before or after generation to subtly alter the model’s output.
  • Control-GIC. A novel generative image compression framework called Control-GIC enables fine-grained bitrate modification while preserving high-quality output.
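A quick way to poke at FineWeb without downloading 44TB is to stream it from the Hugging Face Hub. A sketch follows; the repo id is the published one, while the sample config name is my assumption of one of the released subsets:

```python
from datasets import load_dataset

# Stream FineWeb rather than downloading the full 15T-token dump
fw = load_dataset(
    "HuggingFaceFW/fineweb",
    name="sample-10BT",        # assumed name of a small released sample config
    split="train",
    streaming=True,
)

for doc in fw.take(3):         # inspect a few documents
    print(doc["text"][:200].replace("\n", " "), "...")
```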
https://arxiv.org/pdf/2405.21018v1
  • LLM inference speed of light. Grounding performance analysis in a theoretical “speed of light” model is extremely useful for problems where the amount of computation and memory access is known a priori: it helps assess the quality of implementations and predict the impact of architectural modifications (a back-of-envelope example follows below).
  • Neural Surface Reconstruction. Without the need for 3D supervision, GenS is an end-to-end generalizable neural surface reconstruction model that performs exceptionally well at reconstructing surfaces from multi-view images.
  • MatMul-Free LM. Researchers have managed to remove matrix multiplication (MatMul) from large language models without sacrificing performance, even at the billion-parameter scale.
  • stable-audio-open-1.0. Stability AI has released the weights for Stable Audio, which was trained to produce sound effects from permissively licensed audio samples.
  • CV-VAE: A Compatible Video VAE for Latent Generative Video Models. With its spatio-temporally compressed latent spaces, CV-VAE is a video VAE that works with current image and video models to efficiently train new ones utilizing pre-trained ones.
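The “speed of light” idea reduces to one line of arithmetic: in single-stream decoding, every generated token must read every weight once, so memory bandwidth sets a hard floor on latency. A back-of-envelope sketch with assumed figures:

```python
# Assumed figures: a 7B-parameter model in fp16 on a GPU with ~1 TB/s of
# memory bandwidth. The floor is (bytes of weights) / (bandwidth).
params = 7e9
bytes_per_param = 2          # fp16/bf16 weights
bandwidth = 1.0e12           # bytes/s, assumed HBM bandwidth

t_token = params * bytes_per_param / bandwidth
print(f"theoretical floor: {t_token * 1e3:.1f} ms/token "
      f"(~{1 / t_token:.0f} tokens/s)")   # ~14 ms/token, ~71 tokens/s
```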
https://google.github.io/mesop/
  • Qwen2. Pretrained and instruction-tuned models in 5 sizes: Qwen2-0.5B, Qwen2-1.5B, Qwen2-7B, Qwen2-57B-A14B, and Qwen2-72B. Trained on data in 27 additional languages besides English and Chinese, with state-of-the-art performance on a large number of benchmark evaluations (a loading example follows this list).
  • Dragonfly: A large vision-language model with multi-resolution zoom. The team also launched two new open-source models: Llama-3-8b-Dragonfly-v1, a general-domain model trained on 5.5 million image-instruction pairs, and Llama-3-8b-Dragonfly-Med-v1, further fine-tuned on 1.4 million biomedical image-instruction pairs. Dragonfly demonstrates promising performance on vision-language benchmarks like commonsense visual QA and image captioning. Dragonfly-Med outperforms prior models, including Med-Gemini, on multiple medical imaging tasks, showcasing its capabilities for high-resolution medical data.
  • MMLU Pro. MMLU is the industry standard for assessing knowledge and reasoning in language models; MMLU-Pro is its more robust and challenging successor.
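The Qwen2 checkpoints load with the standard transformers API. A minimal sketch; the repo id follows the Qwen team’s naming, and the prompt and generation settings are illustrative:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2-7B-Instruct"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "Summarize this week's AI news in one sentence."}]
inputs = tok.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

out = model.generate(inputs, max_new_tokens=64)
print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```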

Perspectives

https://ssyang2020.github.io/zerosmooth.github.io/
  • A Right to Warn about Advanced Artificial Intelligence. A group of AI workers, both present and past, is pleading with advanced AI companies to adopt values that guarantee openness and safeguard workers who voice concerns about risks. They emphasize how important it is for businesses to refrain from enforcing non-disparagement agreements, to make anonymous reporting procedures easier, to encourage candid criticism, and to shield whistleblowers from reprisals.
  • Will Scaling Solve Robotics? The Conference on Robot Learning, which included 11 workshops and nearly 200 submitted papers, drew over 900 attendees last year. One of the main points of contention throughout the event was whether robotics problems can be tackled by training a huge neural network on a large dataset. To help readers better understand the debate, this piece lays out the opposing viewpoints: scaling has been successful in several related domains, but skeptics argue it is not feasible here because readily available robotics data is lacking and there is no obvious method for obtaining it, so scaling, even if it performs as well as in other domains, is probably not going to solve robotics.
  • Plentiful, high-paying jobs in the age of AI. Due to comparative advantage, it is feasible that a large number of professions humans currently perform will remain human jobs indefinitely, regardless of how much better AIs become at those tasks.
https://zzzyuqing.github.io/dreammat.github.io/
https://arxiv.org/pdf/2406.02350v1
  • A Grand Unified Theory of the AI Hype Cycle. Over the years, the AI sector has experienced multiple hype cycles, each of which produced really useful technology and outlasted the previous one. Instead of following an exponential process, every cycle adheres to a sigmoid one. There is an inevitable limit to any technology development strategy, and it is not too difficult to find. Although this AI hype cycle is unlike any other that has come before it, it will probably go in the same direction.
  • Hi, AI: Our Thesis is on AI Voice Agents. The current state of AI speech agents is described in a blog post and deck created by Andreessen Horowitz, along with potential areas for advancement and investment. It outlines the present state of the B2B and B2C application layer landscape and covers the top infrastructure stack.

Medium articles

A list of the Medium articles I have read and found the most interesting this week:

Meme of the week

What do you think? Did any of this news capture your attention? Let me know in the comments.

If you have found this interesting:

You can look for my other articles, and you can also connect with or reach me on LinkedIn, where I am open to collaborations and projects. Check this repository containing weekly updated ML & AI news. You can also subscribe for free to get notified when I publish a new story.

Here is the link to my GitHub repository, where I am collecting code and many resources related to machine learning, artificial intelligence, and more.

or you may be interested in one of my recent articles:


Salvatore Raieli

Senior data scientist | about science, machine learning, and AI. Top writer in Artificial Intelligence