WEEKLY AI NEWS: RESEARCH, NEWS, RESOURCES, AND PERSPECTIVES

AI & ML news: Week 17-23 June

Ilya Sutskever creates a new company, NVIDIA becomes the most valuable company in the world, and much more

Salvatore Raieli
19 min read · Jun 24, 2024
Photo by Hayden Walker on Unsplash

The most interesting news, repositories, articles, and resources of the week

Check and star this repository where the news will be collected and indexed:

You will find the news first on GitHub. Single posts are also collected here:

Weekly AI and ML news - each week the best of the field


Research

  • Discovering Preference Optimization Algorithms with and for Large Language Models. proposes LLM-driven objective discovery for preference optimization: an LLM is prompted to propose and implement new preference optimization loss functions based on previously evaluated performance metrics, eliminating the need for human intervention; the discovered state-of-the-art objective adaptively combines logistic and exponential losses (a minimal sketch follows this list).
  • SelfGoal: Your Language Agents Already Know How to Achieve High-level Goals. a framework to increase the high-level goal-achieving capabilities of an LLM-based agent; during interaction with the environment, the framework adaptively decomposes a high-level goal into a tree structure of useful subgoals; enhances performance on a variety of tasks, including cooperative, competitive, and deferred feedback environments.
https://arxiv.org/pdf/2406.08414
  • Mixture-of-Agents Enhances Large Language Model Capabilities. a strategy that beats GPT-4o on AlpacaEval 2.0, MT-Bench, and FLASK by combining the strengths of several LLMs through a Mixture-of-Agents methodology; the system is built from layers of LLM agents, where each agent builds on the outputs of the agents in previous layers (see the sketch after this list).
  • Transformers meet Neural Algorithmic Reasoners. Tokens in the LLM can now cross-attend to node embeddings from a GNN-based neural algorithmic reasoner (NAR) thanks to a new hybrid design; the resulting model, named TransNAR, shows gains in OOD reasoning across algorithmic challenges.
  • Self-Tuning: Instructing LLMs to Acquire New Knowledge through Self-Teaching Effectively. increases an LLM’s capacity to learn new information from raw documents through self-teaching; the process consists of three steps: 1) a self-teaching component that enhances documents with a series of knowledge-intensive tasks emphasizing comprehension, memorization, and self-reflection; 2) the model is configured to continuously learn using only the new documents, aiding in the thorough acquisition of new knowledge; and 3) the deployed model is used to learn new information from new documents while evaluating its QA skills.
  • Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models. a framework that gives a multimodal LLM access to a visual sketchpad and drawing tools; it lets a model such as GPT-4 create intermediate sketches to reason over complex tasks; it outperforms strong base models that lack sketching on many tasks, and GPT-4 equipped with SketchPad sets a new state of the art on every task tested.
  • Mixture of Memory Experts. claims to enable scaling to a high number of parameters while keeping the inference cost fixed. It proposes a method to significantly reduce hallucination (10x) by tuning millions of expert adapters (e.g., LoRAs) to learn exact facts and retrieve them from an index at inference time. The memory experts are specialized to ensure faithful and factual accuracy on the data they were tuned on.
https://arxiv.org/pdf/2406.04784
https://s-sahoo.com/mdlm/
https://www.anthropic.com/news/claude-3-5-sonnet
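To make the first bullet above concrete, here is a minimal PyTorch sketch of a preference optimization objective that blends the logistic (DPO-style) and exponential losses. The function name, the fixed mix weight, and the tensor layout are illustrative assumptions; the paper's discovered objective adapts the blend rather than fixing it.

```python
import torch
import torch.nn.functional as F

def blended_preference_loss(policy_chosen_logps, policy_rejected_logps,
                            ref_chosen_logps, ref_rejected_logps,
                            beta=0.1, mix=0.5):
    """Sketch of a preference loss mixing the logistic (DPO) and
    exponential objectives. `mix` is a fixed scalar here; in the paper
    the blend is discovered/adapted by an LLM (simplifying assumption)."""
    # Implicit reward: log-ratio of policy vs. reference, chosen vs. rejected.
    rho = beta * ((policy_chosen_logps - ref_chosen_logps)
                  - (policy_rejected_logps - ref_rejected_logps))
    logistic_loss = -F.logsigmoid(rho)   # DPO-style logistic loss
    exponential_loss = torch.exp(-rho)   # exponential preference loss
    return (mix * logistic_loss + (1 - mix) * exponential_loss).mean()
```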
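The Mixture-of-Agents bullet, likewise, describes a simple layered protocol. Below is a minimal sketch of the flow; `chat(model, prompt)` is a hypothetical stand-in for whatever LLM client you use, not an API from the paper.

```python
from typing import Callable, List

def mixture_of_agents(question: str,
                      layers: List[List[str]],
                      aggregator: str,
                      chat: Callable[[str, str], str]) -> str:
    """Each agent in layer i sees the answers produced by layer i-1;
    a final aggregator model synthesizes the last layer's answers."""
    previous: List[str] = []
    for layer in layers:
        prompt = question if not previous else (
            question
            + "\n\nReference answers from other models:\n"
            + "\n---\n".join(previous)
            + "\n\nWrite an improved answer.")
        previous = [chat(model, prompt) for model in layer]
    final_prompt = (question
                    + "\n\nSynthesize a single best answer from:\n"
                    + "\n---\n".join(previous))
    return chat(aggregator, final_prompt)
```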

News

  • Apple to ‘Pay’ OpenAI for ChatGPT Through Distribution, Not Cash. The collaboration between Apple and OpenAI isn’t anticipated to bring in a significant amount of money for either company, at least not right away. Apple is not paying OpenAI as part of the agreement because it feels that integrating OpenAI’s technology and brand into its products is as valuable as or more valuable than financial compensation. The agreement isn’t exclusive; Apple is already talking about providing additional chatbot choices. In the long run, Apple intends to profit from AI by entering into revenue-sharing contracts with AI partners.
  • AI will make money sooner than you’d think, says Cohere CEO Aidan Gomez. Enterprise is the pathway to profit, Gomez says, but maybe don’t ask it to do medicine quite yet.
https://chenwu.io/attack-agent/
  • Fake beauty queens charm judges at the Miss AI pageant. An AI model from Romania named Aiyana Rainbow is a finalist in the first Miss AI pageant, which showcases AI-generated models on social media. The event is part of “The FanVue World AI Creator Awards,” organized by FanVue, and highlights the talent of AI creators who can create captivating content without having to be the face of the work. The $5,000 prize package for Miss AI will include mentorship and support from the public relations community. The results will be announced at the end of June.
  • Elon Musk reconsiders phone project after Apple Intelligence OpenAI integration. Elon Musk threatened to ban Apple devices from the premises of his companies in response to Apple integrating OpenAI’s ChatGPT into some of its devices.
  • Microsoft’s star AI chief peers into OpenAI’s code, highlighting an unusual rivalry. OpenAI was originally established as a counterweight to DeepMind, the AI startup that Google purchased in 2014. Yet Mustafa Suleyman, a co-founder of DeepMind, has recently been taking on a task that would once have been unthinkable: delving into OpenAI’s crown jewels, the proprietary algorithms that power foundation models like GPT-4, according to people familiar with the situation. That is because Suleyman is now Microsoft’s head of AI initiatives, and as part of Microsoft’s multibillion-dollar investment in OpenAI, the company holds the intellectual property rights to OpenAI’s software.
https://github.com/mbzuai-oryx/videogpt-plus
https://arxiv.org/pdf/2406.04692
  • DeepMind’s new AI generates soundtracks and dialogue for videos. V2A is an AI system that DeepMind is developing to create synchronized soundtracks for videos. It generates music, sound effects, and dialogue using diffusion models trained on audio, dialogue transcripts, and video clips.
  • Giant Chips Give Supercomputers a Run for Their Money. The California-based company Cerebras has shown in molecular dynamics calculations that its second-generation wafer-scale engine outperforms the world’s fastest supercomputer by a large margin. It can also run inference on huge sparse language models at one-third the energy cost of the full model, with no loss of accuracy. Cerebras’s hardware enables fast memory access and interconnects, which make both feats possible. Cerebras aims to expand its wafer-scale engine to a broader range of problems, such as airflow models around cars and molecular dynamics simulations of biological processes.
  • Nvidia becomes world’s most valuable company amid AI boom. The chipmaker dethrones Microsoft and Apple as a stock market surge boosts its valuation above $3.34tn.
https://linlin-dev.github.io/project/RSG.html
https://github.com/dorjeduck/llm.mojo
  • Introducing Local III. The open-source local agent, Open Interpreter, has recently received a significant upgrade. It now has the capability to control the computer seamlessly and operates entirely offline and locally.
  • Introducing the Property Graph Index: A Powerful New Way to Build Knowledge Graphs with LLMs. LlamaIndex has launched the Property Graph Index, significantly improving knowledge graph capabilities with enhanced modeling, storage, and querying features. The new index enables flexible graph construction and supports schema-guided, implicit, and free-form entity extraction. It also integrates with vector databases for hybrid search and offers querying options through keyword expansion, vector similarity, Cypher queries, and custom traversal (a minimal usage sketch follows this list).
https://arxiv.org/pdf/2406.09308
https://github.com/deepseek-ai/DeepSeek-Coder-V2
  • Apple researchers add 20 more open-source models to improve text and image AI. Apple has added 20 Core ML models to the Hugging Face open-source AI repository, broadening its selection of public models with improved image classification and depth segmentation. These contributions follow Apple’s release earlier this year of the four OpenELM models and the Ferret large language model on Hugging Face. The move shows Apple’s commitment to developing AI capabilities and its growing involvement with the AI research community.
  • Factory Raises $15M Series A from Sequoia. Led by Sequoia Capital, Factory has raised $15 million in Series A funding to grow its workforce and improve Droids, its AI-powered software development toolset. Its products are rapidly gaining customers and setting new records on the SWE-bench AI coding benchmark. Factory aims to increasingly automate software engineering, cutting down on laborious processes and speeding up development cycles.
https://arxiv.org/pdf/2406.07394
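As promised in the Property Graph Index item above, here is a minimal usage sketch. The import paths and defaults follow LlamaIndex's announcement at the time of writing and may differ across llama-index versions.

```python
# Minimal sketch; assumes llama-index >= 0.10.x with an LLM configured.
from llama_index.core import SimpleDirectoryReader, PropertyGraphIndex

documents = SimpleDirectoryReader("./data").load_data()

# Free-form extraction by default; schema-guided or implicit extractors
# can be supplied through the `kg_extractors` argument.
index = PropertyGraphIndex.from_documents(documents)

# Hybrid retrieval: graph traversal plus the source text of matched nodes.
retriever = index.as_retriever(include_text=True)
nodes = retriever.retrieve("Who works at which company?")
for node in nodes:
    print(node.text)
```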

Resources

  • Nemotron-4 340B. offers a reward model to filter data on multiple quality dimensions and an instruct model to generate high-quality data; exhibits impressive results on widely used benchmarks such as MMLU and GSM8K; it competes with GPT-4 on a number of tasks, such as scoring highly in multi-turn chat; a preference dataset is also made available together with the base model.
  • Bag of Tricks: Benchmarking of Jailbreak Attacks on LLMs. benchmarks jailbreak attacks on LLMs, systematically measuring how the implementation details of attacks, target models, and defenses affect attack success rates.
https://arxiv.org/pdf/2406.06326
  • MCTSr: Mathematic as a Blackbox for LLM. The MCT Self-Refine (MCTSr) algorithm integrates Large Language Models (LLMs) with Monte Carlo Tree Search (MCTS) to enhance performance on complex mathematical reasoning tasks by leveraging systematic exploration and heuristic self-refine mechanisms. Extensive experiments show that MCTSr significantly improves success rates on Olympiad-level mathematical problems, advancing the application of LLMs in strategic reasoning and decision-making (a toy sketch of the loop follows this list).
  • VideoGPT+. To improve video understanding, the VideoGPT+ model combines image and video encoders: image encoders capture finely detailed spatial information, while video encoders provide temporal context.
  • Scene Graph Generation in Large-Size VHR Satellite Imagery: A Large-Scale Dataset and A Context-Aware Approach. To enhance Scene Graph Generation (SGG) in very-high-resolution satellite imagery (VHR SAI), this research introduces a new dataset and methodology.
  • LLM.Mojo. This project is a port of Andrej Karpathy’s llm.c to Mojo, currently in beta and subject to change.
https://github.com/abgulati/LARS
  • Depth Anything V2. The new Depth Anything model was trained on synthetic data, and its performance on intricate scenes has improved significantly.
  • DeepSeek-Coder-V2. The robust DeepSeek Coder scores 90+ on HumanEval and matches GPT-4 Turbo on numerous other difficult benchmarks. It is free for commercial use and accessible via an API.
  • HelpSteer2: Open-source dataset for training top-performing reward models. Nvidia has released a dataset and training procedure, along with an excellent paper, for training reward models that align model output with human preferences.
https://arxiv.org/pdf/2406.09403
  • Differentiable rasterization. Given a program that produces a vector representation of an image (think SVG), rasterization turns it into a pixel representation (think PNG). Making every step differentiable lets gradients flow from the pixels back to the vector parameters. This article explains how to write a lightweight SVG rasterizer that is differentiable.
  • LARS — The LLM & Advanced Referencing Solution. LARS is an application that enables you to run LLMs (Large Language Models) locally on your device, upload your own documents, and engage in conversations wherein the LLM grounds its responses with your uploaded content.
  • Beyond the Basics of Retrieval for Augmenting Generation. The RAGatouille creator gave a great talk about ColBERT, some of the open issues, and how to significantly increase RAG performance.
  • TokenCost. TokenCost helps calculate the USD cost of using major Large Language Model (LLM) APIs by estimating the cost of prompts and completions (usage example after this list).
https://github.com/AgentOps-AI/tokencost
  • GaiaNet node. Install and run your own AI agent service.
  • Meta Chameleon. Chameleon is an early fusion model that processes images and text tokens concurrently. The team published the paper a few weeks ago and has now released model checkpoints along with inference code.
  • OGNI-DC: Robust Depth Completion with Optimization-Guided Neural Iterations. OGNI-DC is a new framework for depth completion that employs “Optimization-Guided Neural Iterations” (OGNI). This method refines a depth gradient field and incorporates the depth gradients into a depth map.
https://arxiv.org/pdf/2406.08100
  • Subobject-level Image Tokenization. Subobject tokenization is a novel approach for vision models to interpret images. Rather than dividing images into fixed square patches, this method allows models to analyze images by identifying meaningful segments, such as parts of objects.
  • Introduction to Granite Code Models. We introduce the Granite series of decoder-only code models for code generative tasks (e.g., fixing bugs, explaining code, documenting code), trained with code written in 116 programming languages. A comprehensive evaluation of the Granite Code model family on diverse tasks demonstrates that our models consistently reach state-of-the-art performance among available open-source code LLMs.
  • FireFunction V2: Fireworks Function Calling Model. An open model, trained on top of Llama 3 70B, that matches GPT-4o on function-calling benchmarks.
https://github.com/facebookresearch/chameleon
  • Argilla. For AI developers and subject-matter experts who need complete data ownership, high-quality outputs, and overall efficiency, Argilla offers a collaboration platform.
  • TroL: Traversal of Layers for Large Language and Vision Models. TroL is a new family of efficient large language and vision models (LLVMs) at 1.8B, 3.8B, and 7B parameters.
  • Dot. A standalone open-source program designed to make it simple to use local LLMs, and specifically RAG, to interact with files and documents, in a manner similar to Nvidia’s Chat with RTX.
  • WebCanvas: Benchmarking Web Agents in Online Environments. WebCanvas is a pioneering online evaluation framework designed to address the dynamic nature of web interactions. It provides a realistic assessment of autonomous web agents by utilizing live web environments and emphasizing task completion through the identification of key nodes.
https://arxiv.org/pdf/2406.07138
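As referenced in the MCTSr item above, here is a toy Python sketch of a Monte Carlo self-refine loop. `llm_refine(question, answer)` and `llm_score(question, answer)` are hypothetical callables standing in for LLM calls; the real algorithm maintains a full search tree with more elaborate reward and backpropagation rules.

```python
import math

def mctsr(question, initial_answer, llm_refine, llm_score,
          iterations=8, c=1.4):
    """Toy MCT Self-Refine loop: each node is a candidate answer,
    expansion asks the LLM to critique-and-rewrite it, and UCB1
    balances revisiting strong answers against exploring rewrites."""
    nodes = [{"answer": initial_answer, "visits": 1,
              "value": llm_score(question, initial_answer)}]
    for _ in range(iterations):
        total = sum(n["visits"] for n in nodes)
        # Selection: UCB1 over the current candidate answers.
        node = max(nodes, key=lambda n: n["value"] / n["visits"]
                   + c * math.sqrt(math.log(total) / n["visits"]))
        # Expansion + simulation: self-refine, then score the rewrite.
        refined = llm_refine(question, node["answer"])
        reward = llm_score(question, refined)
        nodes.append({"answer": refined, "visits": 1, "value": reward})
        # Backpropagation (flattened to the parent here for brevity).
        node["visits"] += 1
        node["value"] += reward
    return max(nodes, key=lambda n: n["value"] / n["visits"])["answer"]
```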
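And the TokenCost usage example promised above; the function names follow the project's README at the time of writing and may change.

```python
# Assumes `pip install tokencost`; model pricing tables ship with the library.
from tokencost import calculate_prompt_cost, calculate_completion_cost

prompt = "Summarize this week's AI news in one paragraph."
completion = "This week, Nvidia became the world's most valuable company..."

prompt_cost = calculate_prompt_cost(prompt, model="gpt-4o")
completion_cost = calculate_completion_cost(completion, model="gpt-4o")
print(f"Estimated cost: ${prompt_cost + completion_cost}")
```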

Perspectives

  • Computer says yes: how AI is changing our romantic lives. Artificial intelligence is creating companions who can be our confidants, friends, therapists and even lovers. But are they an answer to loneliness or merely another way for big tech to make money?
  • Nvidia’s New Sales Booster: The Global Push for National AI Champions. Governments everywhere are increasing spending to entice corporations and multinationals to build new data centers and renovate existing ones so that AI can be developed locally and large language models can be trained in local languages on data from their citizens. According to Nvidia, these sovereign AI initiatives should generate over $10 billion in revenue this year. Several governments are concerned about the potential economic effects of generative AI; they want sovereign clouds for their sensitive data and AI infrastructure, and US tech companies are happy to build them.
https://arxiv.org/pdf/2406.09297v1
https://arxiv.org/pdf/2406.09136v1
  • Apple’s Intelligent Strategy. Apple showed off an impressive strategic edge in the AI arms race, but some might have missed that the company hints at using its biggest weakness as a formidable weapon against competitors.
  • How to Fix “AI’s Original Sin”. The copyright issues raised by AI models trained on protected content without authorization are discussed in this article. It advises AI developers to adhere to copyright signals, put in place safeguards to stop producing content that violates intellectual property rights and design business plans that guarantee just compensation for content creators. These strategies include retrieval-augmented generation (RAG) and the development of collaborative AI content ecosystems.
https://github.com/byungkwanlee/trol
  • Takeaways from OpenAI and Google’s May announcements. With the introduction of sophisticated AI models by OpenAI and Google, real-time multimodal understanding and responses are now possible, promising enhanced AI assistants and advances in speech agents. Google’s Gemini 1.5 Flash offers a notable reduction in latency and cost, while OpenAI’s GPT-4o promises double the speed and half the cost of its predecessor. Both giants are incorporating AI into their ecosystems, with OpenAI focusing on consumer markets through partnerships and products that could reach up to a billion users.
  • Collection of AI Side Business Money-Making Information. There are some respectable AI projects on this list that even beginners can work on.
  • paramount. Paramount lets your expert agents evaluate AI chats.

Meme of the week

What do you think? Did any of this news capture your attention? Let me know in the comments.

If you have found this interesting:

You can look for my other articles, and you can also connect with or reach me on LinkedIn, where I am open to collaborations and projects. Check this repository containing weekly updated ML & AI news. You can also subscribe for free to get notified when I publish a new story.

Here is the link to my GitHub repository, where I am collecting code and many resources related to machine learning, artificial intelligence, and more.

or you may be interested in one of my recent articles:


Salvatore Raieli

Senior data scientist | about science, machine learning, and AI. Top writer in Artificial Intelligence