WEEKLY AI NEWS: RESEARCH, NEWS, RESOURCES, AND PERSPECTIVES
AI & ML news: Week 17–23 June
Ilya Sutskever creates a new company, Nvidia is the most valuable company, and much more
The most interesting news, repositories, articles, and resources of the week
Check and star this repository where the news will be collected and indexed:
You will find the news first in GitHub. Single posts are also collected here:
Research
- Discovering Preference Optimization Algorithms with and for Large Language Models. suggests an LLM-driven objective discovery procedure in which an LLM is prompted to propose and implement preference optimization loss functions based on previously assessed performance metrics, eliminating the need for human intervention; the search discovers a state-of-the-art algorithm that adaptively combines logistic and exponential losses.
- SelfGoal: Your Language Agents Already Know How to Achieve High-level Goals. a framework to increase the high-level goal-achieving capabilities of an LLM-based agent; during interaction with the environment, the framework adaptively decomposes a high-level goal into a tree structure of useful subgoals; enhances performance on a variety of tasks, including cooperative, competitive, and deferred feedback environments.
- Mixture-of-Agents Enhances Large Language Model Capabilities. a strategy that beats GPT-4o on AlpacaEval 2.0, MT-Bench, and FLASK by combining the strengths of several LLMs through a Mixture-of-Agents methodology; layers are constructed with numerous LLM agents, and each agent builds on the outputs of the agents in the previous layer (a minimal sketch appears after this list).
- Transformers meet Neural Algorithmic Reasoners. Tokens in the LLM can now cross-attend to node embeddings from a GNN-based neural algorithmic reasoner (NAR) thanks to a new hybrid design; the resulting model, named TransNAR, shows gains in OOD reasoning across algorithmic challenges.
- Self-Tuning: Instructing LLMs to Acquire New Knowledge through Self-Teaching Effectively. increases an LLM’s capacity to learn new information from raw documents through self-teaching; the process consists of three steps: 1) a self-teaching component that enhances documents with a series of knowledge-intensive tasks emphasizing comprehension, memorization, and self-reflection; 2) the model is configured to continuously learn using only the new documents, aiding in the thorough acquisition of new knowledge; and 3) the deployed model is used to learn new information from new documents while evaluating its QA skills.
- Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models. a framework that gives a multimodal LLM access to a visual sketchpad and drawing tools; it lets a model such as GPT-4 create intermediate sketches to reason over complex tasks; it outperforms strong base models that lack sketching on many tasks, and GPT-4 equipped with Sketchpad sets a new state of the art on all the tasks tested.
- Mixture of Memory Experts. claims to enable scaling to a high number of parameters while keeping the inference cost fixed. It suggests a method to significantly reduce hallucination (10x) by tuning millions of expert adapters (e.g., LoRAs) to learn exact facts and retrieve them from an index at inference time. The memory experts are specialized to ensure faithful and factual accuracy on the data they were tuned on.
- Multimodal Table Understanding. presents Table-LLaVa 7B, a multimodal LLM for multimodal table understanding; it introduces a large-scale dataset, MMTab, comprising table images, instructions, and tasks; it is comparable to GPT-4V and greatly outperforms existing MLLMs on numerous benchmarks.
- Never Miss A Beat: An Efficient Recipe for Context Window Extension of Large Language Models with Consistent “Middle” Enhancement. suggests a training-efficient way to extend LLMs to longer context lengths (e.g., 4K -> 256K); it uses a truncated Gaussian to encourage sampling from the middle part of the context during fine-tuning, which helps alleviate the so-called “Lost-in-the-Middle” problem and tunes the LLM to effectively use information from the middle of its context (a sampling sketch appears after this list).
- Simple and Effective Masked Diffusion Language Models. A simple masked diffusion model for language. It performs reasonably well and can generate tokens out of order.
- MLKV: Multi-Layer Key-Value Heads for Memory Efficient Transformer Decoding. A novel technique that dramatically lowers memory consumption during auto-regressive inference in transformers is called Multi-Layer Key-Value (MLKV) sharing.
- Understanding Hallucinations in Diffusion Models through Mode Interpolation. This study looks into the reasons behind “hallucinations” — images that never were in the training set — that are produced by diffusion-based picture generation models.
- Chain of Preference Optimization: Improving Chain-of-Thought Reasoning in LLMs. Chain of Preference Optimization (CPO) helps large language models (LLMs) become more adept at logical reasoning; it fine-tunes LLMs on preference pairs harvested from Tree-of-Thought (ToT) search trees so that ordinary Chain-of-Thought (CoT) decoding follows ToT’s best paths (a toy sketch of the pair construction appears after this list).
- Language Modeling with Editable External Knowledge. ERASE is a novel approach to updating language models. Unlike conventional methods that emphasize enhancing retrieval during prediction, ERASE incrementally deletes or rewrites entries in the knowledge base as new documents are incorporated.
- Duoduo CLIP: Efficient 3D Understanding with Multi-View Images. Duoduo CLIP is a 3D representation learning model utilizing multi-view images rather than point-clouds for training and analysis.
- CAMixerSR: Only Details Need More “Attention”. CAMixerSR enhances image resolution by intelligently applying convolution to simpler areas and using deformable window attention for intricate textures.
- ‘Fighting fire with fire’ — using LLMs to combat LLM hallucinations. The number of errors produced by an LLM can be reduced by grouping its outputs into semantically similar clusters. Remarkably, this clustering can be performed by a second LLM, and the method’s efficacy can be evaluated by a third (a sketch of the clustering step appears after this list). The associated article is here
- Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks. Microsoft has published a collection of tiny VLMs under an MIT license that perform noticeably better in captioning, bounding, and classification than much larger models.
- Logit Prisms: Decomposing Transformer Outputs for Mechanistic Interpretability. The logit lens approach has been improved by decomposing logit outputs into contributions from different model components, which aids in understanding how transformer models reach decisions. The method applies “prisms” to residual streams, attention layers, and MLP layers, showing how each component affects predictions and offering insights into how the gemma-2b model performs tasks such as factual retrieval and arithmetic (a small numerical sketch appears after this list).
- PlanRAG: A Plan-then-Retrieval Augmented Generation for Generative Large Language Models as Decision Makers. Introduces Decision QA, a new task in which LLMs identify optimal decisions through sophisticated data analysis, and proposes a plan-then-retrieval augmented generation technique for it.
- ChangeViT: Unleashing Plain Vision Transformers for Change Detection. A methodology called ChangeViT makes use of vision transformers (ViTs) to identify significant environmental changes in remote sensing photos.
- LayerMerge: Neural Network Depth Compression through Layer Pruning and Merging. LayerMerge is a novel technique that simultaneously prunes activation functions and convolution layers to increase neural network efficiency.
- Adversarial Attacks on Multimodal Agents. Vision-enabled language models (VLMs) such as Gemini and GPT-4o enable autonomous agents to perform tasks like code editing and making purchases. This investigation demonstrates how susceptible these agents are to adversarial attacks.
- TimeSieve: Extracting Temporal Dynamics through Information Bottlenecks. A novel model called TimeSieve was created to address typical problems in time series forecasting.
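A minimal sketch of the Mixture-of-Agents layering described above. The call() stub, the model names, and the aggregation prompt are placeholders standing in for real LLM APIs, not the paper’s implementation:

```python
# Toy Mixture-of-Agents: each layer of agents answers the question while
# seeing the previous layer's outputs; a final aggregator synthesizes them.
def call(model: str, prompt: str) -> str:
    # Stub standing in for a real LLM API call.
    return f"[{model}] answer to: {prompt[:40]}..."

LAYERS = [
    ["model-a", "model-b", "model-c"],  # layer 1: independent proposers
    ["model-a", "model-b", "model-c"],  # layer 2: refine using layer-1 outputs
]
AGGREGATOR = "model-d"

def mixture_of_agents(question: str) -> str:
    answers: list[str] = []
    for layer in LAYERS:
        prompt = question
        if answers:  # later layers build on the previous layer's responses
            prompt += "\n\nPrevious responses:\n" + "\n".join(answers)
        answers = [call(m, prompt) for m in layer]
    final = "Synthesize the best single answer from:\n" + "\n".join(answers)
    return call(AGGREGATOR, f"{question}\n\n{final}")

print(mixture_of_agents("Why is the sky blue?"))
```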
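For the context-extension recipe above, a sketch of the core sampling idea, assuming the goal is to bias training positions toward the middle of the window; the window size and spread are illustrative, not the paper’s values:

```python
import numpy as np
from scipy.stats import truncnorm

ctx_len = 256_000                      # target context window
mean, std = ctx_len / 2, ctx_len / 6   # center the probability mass mid-context
a, b = (0 - mean) / std, (ctx_len - mean) / std  # truncation bounds in sigma units

# Draw placement offsets biased toward the middle of the context, so the
# model is fine-tuned to attend to information it would otherwise "lose".
positions = truncnorm.rvs(a, b, loc=mean, scale=std, size=5, random_state=0)
print(positions.astype(int))
```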
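A toy illustration of how CPO-style preference pairs could be assembled from a ToT search tree. The tree format and field names here are hypothetical; the point is that the step on the winning path becomes “chosen” and its siblings “rejected”:

```python
# Turn a Tree-of-Thought search tree into step-level preference pairs
# suitable for DPO-style fine-tuning.
tree = {
    "prompt": "Q: If 3x + 2 = 11, what is x?",
    "steps": [
        {"candidates": ["Subtract 2 from both sides.", "Guess x = 5."], "best": 0},
        {"candidates": ["Divide both sides by 3.", "Add 3 to both sides."], "best": 0},
    ],
}

pairs, prefix = [], tree["prompt"]
for node in tree["steps"]:
    best = node["candidates"][node["best"]]
    for i, cand in enumerate(node["candidates"]):
        if i != node["best"]:
            pairs.append({"prompt": prefix, "chosen": best, "rejected": cand})
    prefix += "\n" + best  # continue along the winning path

for p in pairs:
    print(f"chosen: {p['chosen']!r}  rejected: {p['rejected']!r}")
```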
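For the hallucination item, a sketch of grouping sampled answers into semantically similar clusters. The naive entails() stand-in below is a placeholder; in the described method a second LLM acts as the entailment judge:

```python
def entails(a: str, b: str) -> bool:
    # Stand-in judge: naive containment. Replace with an LLM call that
    # decides whether answer `a` semantically entails answer `b`.
    return a.lower() in b.lower() or b.lower() in a.lower()

def semantic_clusters(answers: list[str]) -> list[list[str]]:
    clusters: list[list[str]] = []
    for ans in answers:
        for cluster in clusters:
            # Bidirectional entailment => same meaning => same cluster.
            if entails(ans, cluster[0]) and entails(cluster[0], ans):
                cluster.append(ans)
                break
        else:
            clusters.append([ans])
    return clusters

samples = ["Paris", "paris", "The capital is Paris", "Lyon"]
clusters = semantic_clusters(samples)
print(clusters)  # few large clusters suggest the model answers consistently
print("confidence:", max(map(len, clusters)) / len(samples))
```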
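And a small numpy sketch of the decomposition behind logit prisms: because the final logits are a linear function of the residual stream, each component’s contribution can be projected through the unembedding separately. All values here are random stand-ins, not real model weights:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, vocab = 32, 100
W_U = rng.normal(size=(d_model, vocab))  # unembedding matrix

# Hypothetical additive residual-stream contributions from each component.
parts = {name: rng.normal(size=d_model)
         for name in ["embed", "attn_0", "mlp_0", "attn_1", "mlp_1"]}
residual = sum(parts.values())

final_logits = residual @ W_U
top = final_logits.argmax()

# Decompose: per-component logits sum exactly to the final logits (linearity).
for name, vec in parts.items():
    print(f"{name:7s} pushes token {top} by {(vec @ W_U)[top]:+.2f}")
print("sum check:", np.allclose(sum(v @ W_U for v in parts.values()), final_logits))
```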
News
- Apple to ‘Pay’ OpenAI for ChatGPT Through Distribution, Not Cash. The collaboration between Apple and OpenAI isn’t anticipated to bring in a significant amount of money for either company, at least not right away. Apple is not paying OpenAI as part of the agreement because it feels that integrating OpenAI’s technology and brand into its products is as valuable as or more valuable than financial compensation. The agreement isn’t exclusive; Apple is already talking about providing additional chatbot choices. In the long run, Apple intends to profit from AI by entering into revenue-sharing contracts with AI partners.
- AI will make money sooner than you’d think, says Cohere CEO Aidan Gomez. Enterprise is the pathway to profit, Gomez says, but maybe don’t ask it to do medicine quite yet.
- Fake beauty queens charm judges at the Miss AI pageant. An AI model from Romania named Aiyana Rainbow is a finalist in the first Miss AI pageant, which showcases AI-generated models on social media. The event is part of “The FanVue World AI Creator Awards,” organized by FanVue, and highlights the talent of AI creators who can create captivating content without having to be the face of the work. The $5,000 prize package for Miss AI will include mentorship and support from the public relations community. Results will be announced at the end of June.
- Elon Musk reconsiders phone project after Apple Intelligence OpenAI integration. Elon Musk threatened to ban Apple devices from the properties of his firms in response to Apple integrating OpenAI’s ChatGPT into some of its devices.
- Microsoft’s star AI chief peers into OpenAI’s code, highlighting an unusual rivalry. OpenAI was originally established as a counterweight to DeepMind, the AI startup that Google purchased in 2014. Yet Mustafa Suleyman, a co-founder of DeepMind, has recently been taking on a once-unimaginable task: delving into OpenAI’s crown jewels, the proprietary algorithms that power foundation models like GPT-4, according to people familiar with the situation. That is because Suleyman is now Microsoft’s head of AI initiatives, and as part of Microsoft’s multibillion-dollar investment in OpenAI, the corporation holds intellectual property rights to its software.
- Amazon says it’ll spend $230 million on generative AI startups. Amazon says that it will commit up to $230 million to startups building generative AI-powered applications.
- McDonald’s ends AI drive-thru trial as fast-food industry tests automation. Companies have touted AI as the future of the industry, but the technology has also produced viral videos of botched orders.
- Balance effects of AI with profits tax and green levy says IMF. Governments faced with economic upheaval caused by artificial intelligence should consider fiscal policies including taxes on excess profits and a green levy to atone for AI-related carbon emissions, according to the International Monetary Fund.
- Introducing Gen-3 Alpha. Runway has developed a brand-new, incredibly potent video generation model. Many of the current functions on its platform will be powered by it. You can find examples at the given URL.
- DeepMind’s new AI generates soundtracks and dialogue for videos. V2A is an AI system that DeepMind is developing to create synchronized soundtracks for videos. It generates music, sound effects, and dialogue using diffusion models trained on audio, dialogue transcripts, and video clips.
- Giant Chips Give Supercomputers a Run for Their Money. The California-based company Cerebras has shown in molecular dynamics calculations that its second-generation wafer-scale engine outperforms the world’s fastest supercomputer by a large margin. It can also run inference on sparse large language models at one-third the energy cost of a full model with no loss of accuracy. Both feats are made possible by the fast memory access and interconnects of Cerebras hardware. Cerebras aims to expand its wafer-scale engine to a broader range of problems, such as airflow models around cars and molecular dynamics simulations of biological processes.
- Nvidia becomes world’s most valuable company amid AI boom. Chipmaker dethrones Microsoft and Apple as stock market surge boosts valuation above $3.34tn
- The ‘Godfather of AI’ quit Google a year ago. Now he’s emerged out of stealth to back a startup promising to use AI for carbon capture. Renowned AI researchers Geoff Hinton and Max Welling have gathered a talented team to develop AI systems aimed at advancing materials science for carbon capture.
- Nvidia Conquers Latest AI Tests. Nvidia’s Hopper architecture-based systems excelled in two recent MLPerf AI benchmark tests, which assess the fine-tuning of large language models and the training of graph neural networks.
- Perplexity AI searches for users in Japan, via SoftBank deal. Perplexity is capitalizing on its strategic partnership with SoftBank to broaden its presence in Japan. As part of this initiative, it is providing a free year of its premium AI-powered search engine, Perplexity Pro. SoftBank’s goal is to draw users by offering AI services without creating internal solutions. With a valuation of $1 billion, Perplexity is expanding its funding and investor base, which features prominent tech leaders and venture firms.
- Introducing Local III. The open-source local agent, Open Interpreter, has recently received a significant upgrade. It now has the capability to control the computer seamlessly and operates entirely offline and locally.
- Introducing the Property Graph Index: A Powerful New Way to Build Knowledge Graphs with LLMs. LlamaIndex has launched the Property Graph Index, significantly improving knowledge-graph capabilities with enhanced modeling, storage, and querying features. The new index enables flexible graph construction and supports schema-guided, implicit, and free-form entity extraction. It also integrates with vector databases for hybrid search and offers querying via keyword expansion, vector similarity, Cypher queries, and custom traversal (a usage sketch appears after this list).
- Decagon launches with $35m raised from Accel and a16z. Decagon is developing human-like AI agents for customer support and has recently secured $30 million in Series A funding from Accel, along with $5 million in seed funding from a16z. Decagon’s product manages global support for companies such as Eventbrite, Rippling, Webflow, BILT, and Substack.
- London premiere of movie with AI-generated script cancelled after backlash. Plans to show The Last Screenwriter, whose script is credited to ‘ChatGPT 4.0’, prompted complaints although the film-makers insist the feature is ‘a contribution to the cause’
- OpenAI’s former chief scientist is starting a new AI company. Ilya Sutskever is launching Safe Superintelligence Inc., an AI startup that will prioritize safety over ‘commercial pressures.’
- Claude 3.5 Sonnet. At a fifth of the cost, Claude 3.5 Sonnet outperforms Opus, and it is currently the best vision model available. This demonstrates how quickly the frontier models are progressing.
- Apple researchers add 20 more open-source models to improve text and image AI. Apple has added 20 Core ML models to the Hugging Face open-source AI repository, broadening its selection of public models with improved image classification and depth segmentation. These contributions follow Apple’s earlier releases this year of the four OpenELM models and the Ferret large language model on Hugging Face. The move shows Apple’s dedication to developing AI capabilities and its growing involvement with the AI research community.
- Factory Raises $15M Series A from Sequoia. Led by Sequoia Capital, Factory has raised $15 million in Series A funding to grow its workforce and improve its Droids software development toolset, which leverages artificial intelligence. Its products are rapidly expanding its customer base and setting new benchmarks on the SWE-bench AI coding benchmark. With Factory, software engineering will be increasingly automated, cutting down on laborious processes and speeding up development cycles.
- Optimizing AI Inference at Character.AI. Character.AI serves about 20,000 queries per second (roughly 20% of Google Search’s request volume) and does so efficiently thanks to several inference optimizations.
- Apple delays launch of AI-powered features in Europe, blaming EU rules. Apple says competition rules that require functionality with rival products would compromise privacy and security
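A usage sketch of the new Property Graph Index based on the launch post; class and module names reflect the announcement and may have shifted since, so treat this as a starting point rather than the definitive API:

```python
from llama_index.core import PropertyGraphIndex, SimpleDirectoryReader
from llama_index.core.indices.property_graph import SimpleLLMPathExtractor

# Build a property graph with free-form (subject, predicate, object)
# extraction; schema-guided and implicit extractors can be mixed in too.
documents = SimpleDirectoryReader("./data").load_data()
index = PropertyGraphIndex.from_documents(
    documents,
    kg_extractors=[SimpleLLMPathExtractor()],
)

# Hybrid querying combines keyword/synonym expansion with vector similarity.
query_engine = index.as_query_engine()
print(query_engine.query("Who founded the company?"))
```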
Resources
- Nemotron-4 340B. offers an instruct model to generate high-quality synthetic data and a reward model to filter data on multiple quality attributes; exhibits impressive results on widely used benchmarks such as MMLU and GSM8K, and competes with GPT-4 on a number of tasks, including scoring highly in multi-turn chat; preference data are released alongside the base model.
- Bag of Tricks: Benchmarking of Jailbreak Attacks on LLMs. a systematic benchmark of jailbreak attacks on LLMs showing that implementation details (“tricks”) on both the attack and defense sides substantially change measured attack success rates, and providing a standardized framework for evaluating jailbreak robustness.
- MCTSr: Mathematic as a Blackbox for LLM. The MCT Self-Refine (MCTSr) algorithm integrates Large Language Models (LLMs) with Monte Carlo Tree Search (MCTS) to enhance performance on complex mathematical reasoning tasks through systematic exploration and heuristic self-refinement. Extensive experiments show that MCTSr significantly improves success rates on Olympiad-level mathematical problems, advancing the application of LLMs to strategic reasoning and decision-making (a toy rendering of the loop appears after this list).
- VideoGPT+. To improve video understanding, the VideoGPT+ model combines image and video encoders: image encoders capture finely detailed spatial information, while video encoders provide temporal context.
- Scene Graph Generation in Large-Size VHR Satellite Imagery: A Large-Scale Dataset and A Context-Aware Approach. To enhance Scene Graph Generation (SGG) for very-high-resolution satellite imagery (VHR SAI), this research introduces a new dataset and methodology.
- LLM.Mojo. This project is a port of Andrej Karpathy’s llm.c to Mojo, currently in beta and subject to change.
- Depth Anything V2. The new Depth Anything model was trained with synthetic data, and its performance on intricate scenes has improved significantly.
- DeepSeek-Coder-V2. The robust DeepSeek-Coder-V2 achieves scores of 90+ on HumanEval and matches GPT-4 Turbo on numerous other difficult benchmarks. It is free for commercial use and accessible via an API.
- HelpSteer2: Open-source dataset for training top-performing reward models. Nvidia has released a dataset and training procedure, along with an excellent paper, for building reward models that align model output with human preferences.
- Differentiable rasterization. Given a program that produces a vector representation of an image (think SVG), rasterization turns it into a pixel representation (think PNG). To optimize through that step, the whole pipeline ought to be differentiable; this article explains how to make a simple SVG rasterizer differentiable (a toy soft-rasterization sketch appears after this list).
- LARS — The LLM & Advanced Referencing Solution. LARS is an application that enables you to run LLMs (Large Language Models) locally on your device, upload your own documents, and engage in conversations wherein the LLM grounds its responses with your uploaded content.
- Beyond the Basics of Retrieval for Augmenting Generation. The RAGatouille creator delivered a great talk about ColBERT, some of the open issues, and how to significantly increase RAG performance.
- TokenCost. Tokencost helps calculate the USD cost of using major Large Language Model (LLM) APIs by estimating the cost of prompts and completions (a usage sketch appears after this list).
- GaiaNet node. Install and run your own AI agent service.
- Meta Chameleon. Chameleon is an early fusion model that processes images and text tokens concurrently. The team published the paper a few weeks ago and has now released model checkpoints along with inference code.
- OGNI-DC: Robust Depth Completion with Optimization-Guided Neural Iterations. OGNI-DC is a new framework for depth completion that employs “Optimization-Guided Neural Iterations” (OGNI). This method refines a depth gradient field and incorporates the depth gradients into a depth map.
- Subobject-level Image Tokenization. Subobject tokenization is a novel approach for vision models to interpret images. Rather than dividing images into fixed square patches, this method allows models to analyze images by identifying meaningful segments, such as parts of objects.
- Introduction to Granite Code Models. We introduce the Granite series of decoder-only code models for code generative tasks (e.g., fixing bugs, explaining code, documenting code), trained with code written in 116 programming languages. A comprehensive evaluation of the Granite Code model family on diverse tasks demonstrates that our models consistently reach state-of-the-art performance among available open-source code LLMs.
- FireFunction V2: Fireworks Function Calling Model. An open model, trained on top of Llama 3 70B, that matches GPT-4o on function-calling benchmarks.
- Argilla. For AI developers and subject matter experts who need complete data ownership, high-quality outputs, and overall efficiency, Argilla offers a platform for cooperation.
- TroL: Traversal of Layers for Large Language and Vision Models. The new TroL family of efficient large language and vision models (LLVMs) comes in 1.8B, 3.8B, and 7B parameter sizes.
- Dot. A stand-alone open-source program designed to be simple to use for local LLMs, and specifically RAG, to interact with files and documents in a manner similar to Nvidia’s Chat with RTX.
- WebCanvas: Benchmarking Web Agents in Online Environments. WebCanvas is a pioneering online evaluation framework designed to address the dynamic nature of web interactions. It provides a realistic assessment of autonomous web agents by utilizing live web environments and emphasizing task completion through the identification of key nodes.
- CIFAR-10 Airbench. CIFAR-10 is a standard benchmark for image classification. This project provides a training setup that reaches strong performance in a remarkably short time.
- Cost Of Self Hosting Llama-3 8B-Instruct. Self-hosting an LLM such as Llama-3 8B-Instruct can be much more expensive than using ChatGPT: roughly $17 per million tokens on rented hardware versus about $1 per million tokens for ChatGPT. Buying the hardware can push the cost below $0.01 per million tokens, but it would take about 5.5 years for the initial investment to pay for itself (a worked break-even example appears after this list).
- GeoBench: Benchmarking and Analyzing Monocular Geometry Estimation Models. A new benchmark for assessing modern depth and surface-normal estimation models.
- An Empirical Study of Mamba-based Language Models. The Nvidia study of Mamba-based language models, including the hybrid Mamba model it previously showcased, is now available.
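A toy rendering of the MCTSr loop from above, with stand-in “LLM” calls (random scoring here) to show the select, self-refine, self-evaluate, and backpropagate cycle rather than the paper’s actual implementation:

```python
import math
import random

random.seed(0)

def llm_refine(answer: str) -> str:
    # Placeholder: the real system prompts an LLM to critique and rewrite.
    return answer + "'"

def llm_score(answer: str) -> float:
    # Placeholder: the real system asks the LLM to grade its own answer.
    return random.random()

class Node:
    def __init__(self, answer, parent=None):
        self.answer, self.parent = answer, parent
        self.children, self.visits, self.value = [], 0, 0.0

    def ucb(self, c=1.4):
        if self.visits == 0:
            return float("inf")
        return (self.value / self.visits
                + c * math.sqrt(math.log(self.parent.visits) / self.visits))

root = Node("initial draft answer")
for _ in range(20):
    node = root
    while node.children:                          # selection by UCB
        node = max(node.children, key=Node.ucb)
    child = Node(llm_refine(node.answer), node)   # expansion via self-refine
    node.children.append(child)
    reward = llm_score(child.answer)              # self-evaluation
    while child is not None:                      # backpropagation
        child.visits += 1
        child.value += reward
        child = child.parent

best = max(root.children, key=lambda n: n.value / n.visits)
print("best refined answer:", best.answer)
```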
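A minimal illustration of the soft-rasterization idea from the differentiable rasterization article: pixel coverage is a smooth function of shape parameters, so gradients exist. This is a toy, not the article’s code:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def raster_circle(cx, cy, r, size=32, sharpness=4.0):
    # Soft coverage: sigmoid of signed distance to the circle's boundary,
    # instead of a hard (non-differentiable) inside/outside test.
    ys, xs = np.mgrid[0:size, 0:size]
    dist = np.sqrt((xs - cx) ** 2 + (ys - cy) ** 2) - r
    return sigmoid(-sharpness * dist)

# Because coverage varies smoothly, derivatives w.r.t. the parameters are
# nonzero, so an optimizer (or autodiff) can fit shapes to a target image.
eps = 1e-4
base = raster_circle(16.0, 16.0, 8.0).mean()
grad_r = (raster_circle(16.0, 16.0, 8.0 + eps).mean() - base) / eps
print(f"d(mean coverage)/d(radius) = {grad_r:.4f}")  # positive: bigger circle, more ink
```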
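A short usage sketch of TokenCost based on its README; function names are from the project as of this writing, so double-check against the current docs:

```python
from tokencost import calculate_prompt_cost, calculate_completion_cost

model = "gpt-3.5-turbo"
prompt = [{"role": "user", "content": "Hello world"}]
completion = "How may I assist you today?"

# Estimated USD cost of the prompt and the completion for the chosen model.
prompt_cost = calculate_prompt_cost(prompt, model)
completion_cost = calculate_completion_cost(completion, model)
print(f"${prompt_cost} + ${completion_cost} = ${prompt_cost + completion_cost}")
```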
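To make the self-hosting trade-off concrete, a back-of-envelope break-even calculation; the hardware price and monthly volume below are illustrative assumptions, not figures from the article:

```python
# Illustrative numbers only: when does owned hardware beat API pricing?
hardware_usd = 10_000     # assumed up-front GPU purchase
owned_cost = 0.01         # $ per 1M tokens once the hardware is owned
api_cost = 1.00           # $ per 1M tokens via the API (article's ChatGPT figure)
monthly_tokens_m = 150    # assumed workload: 150M tokens per month

savings_per_month = monthly_tokens_m * (api_cost - owned_cost)
years = hardware_usd / savings_per_month / 12
print(f"break-even after {years:.1f} years")  # about 5.6 years with these assumptions
```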
Perspectives
- Computer says yes: how AI is changing our romantic lives. Artificial intelligence is creating companions who can be our confidants, friends, therapists and even lovers. But are they an answer to loneliness or merely another way for big tech to make money?
- Nvidia’s New Sales Booster: The Global Push for National AI Champions. Governments everywhere are increasing spending to entice corporations and multinationals to build new data centers and renovate existing ones so that AI can be developed locally and large language models can be trained in native languages on data from their citizens. According to Nvidia, these sovereign AI initiatives should generate over $10 billion in revenue this year. The potential economic effects of generative AI concern several governments; they want sovereign clouds for their sensitive data and AI infrastructure, and US IT companies are happy to build them.
- General Intelligence (2024). What is lacking, and what would it take, to create a generally intelligent agent? This essay argues we may get there within a few years and examines three ideas required to build such an agent. The author is an OpenAI researcher.
- Human neuroscience is entering a new era — it mustn’t forget its human dimension. The field is taking a leap forward thanks to innovative technologies, such as artificial intelligence. Researchers must improve consent procedures and public involvement.
- AI and Euro 2024: VAR is shaking up football — and it’s not going away. Sports physicist Eric Goff explains how updates to the technology can help referees make the toughest calls.
- How cutting-edge computer chips are speeding up the AI revolution. Engineers are harnessing the powers of graphics processing units (GPUs) and more, with a bevy of tricks to meet the computational demands of artificial intelligence
- Apple’s Intelligent Strategy. Apple showed off an incredible strategic edge in the AI arms race — but some might have missed that the company hints at using its biggest weakness as a formidable weapon against competitors.
- How to Fix “AI’s Original Sin”. The copyright issues raised by AI models trained on protected content without authorization are discussed in this article. It advises AI developers to respect copyright signals, put safeguards in place to avoid producing content that violates intellectual property rights, and design business models that ensure fair compensation for content creators. Suggested approaches include retrieval-augmented generation (RAG) and collaborative AI content ecosystems.
- Takeaways from OpenAI and Google’s May announcements. With the introduction of sophisticated AI models by OpenAI and Google, real-time multimodal understanding and response are now possible, promising enhanced AI assistants and advances in speech agents. Google’s Gemini 1.5 Flash offers a notable reduction in latency and cost, while OpenAI’s GPT-4o promises double the speed and half the cost of its predecessor. Both giants are incorporating AI into their ecosystems, with OpenAI focusing on consumer markets through partnerships and products that could reach up to a billion users.
- Collection of AI Side Business Money-Making Information. There are some respectable AI projects on this list that even beginners can work on.
- paramount. Paramount lets your expert agents evaluate AI chats.
Meme of the week
What do you think about it? Did any news capture your attention? Let me know in the comments.
If you have found this interesting:
You can look for my other articles, and you can also connect or reach me on LinkedIn, where I am open to collaborations and projects. Check this repository containing weekly updated ML & AI news. You can also subscribe for free to get notified when I publish a new story.
Here is the link to my GitHub repository, where I am collecting code and many resources related to machine learning, artificial intelligence, and more.
or you may be interested in one of my recent articles: