WEEKLY AI NEWS: RESEARCH, NEWS, RESOURCES, AND PERSPECTIVES

ML news: Week 11–17 March

Salvatore Raieli
18 min read · Mar 18, 2024

Devin arrives, Google is set to revolutionize search against spam, and much more

Photo by Javy Luzania on Unsplash

The most interesting news, repository, articles, and resources of the week

Check and star this repository where the news will be collected and indexed:

You will find the news first on GitHub. Single posts are also collected here:

Weekly AI and ML news - each week the best of the field


Research

https://deepmind.google/discover/blog/sima-generalist-ai-agent-for-3d-virtual-environments/
  • Plum: Prompt Learning using Metaheuristic. This work presents metaheuristics, a broad class of more than 100 discrete optimization techniques, as a potent tool for prompt learning in large language models (a minimal search sketch follows this list).
  • ViewFusion: Towards Multi-View Consistency via Interpolated Denoising. A new technique called ViewFusion aims to improve how diffusion models generate images from novel viewpoints while keeping the generated views consistent with one another.
  • Functional Benchmarks for Robust Evaluation of Reasoning Performance, and the Reasoning Gap. Reveals a reasoning gap, ranging from 58.35% to 80.31%, between current models' performance on static benchmarks and on the proposed functional benchmarks for evaluating LLM reasoning. The authors note, however, that these gaps could be narrowed with more advanced prompting techniques.
  • Can Large Language Models Reason and Plan? A recent position paper examines reasoning and planning in LLMs. The author's conclusion, in brief: "In summary, I don't have any strong evidence from anything I've read, checked, or done to suggest that LLMs engage in typical reasoning or planning. Instead, they use web-scale training to perform a type of universal approximate retrieval, which is sometimes confused for reasoning abilities, as I have explained."
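To make the metaheuristic idea in Plum concrete, below is a minimal hill-climbing sketch over a discrete space of prompt components. It is a generic illustration, not the paper's algorithm, and evaluate_prompt is a hypothetical stand-in for whatever dev-set metric you would actually score candidates with.

```python
import random

# Toy pools of discrete prompt components; in practice these would be
# edits proposed over a seed instruction (paraphrases, added demos, etc.).
PREFIXES = ["", "You are an expert assistant. ", "Think step by step. "]
INSTRUCTIONS = ["Answer the question.", "Answer concisely.", "Explain, then answer."]

def evaluate_prompt(prompt: str) -> float:
    """Hypothetical scorer: replace with, e.g., LLM accuracy on a dev set."""
    return float(-abs(len(prompt) - 40))  # dummy objective for the sketch

def neighbors(state):
    """All states differing from `state` in exactly one component."""
    p, i = state
    return [(q, i) for q in range(len(PREFIXES)) if q != p] + \
           [(p, j) for j in range(len(INSTRUCTIONS)) if j != i]

def hill_climb(steps: int = 50):
    state = (random.randrange(len(PREFIXES)), random.randrange(len(INSTRUCTIONS)))
    best = evaluate_prompt(PREFIXES[state[0]] + INSTRUCTIONS[state[1]])
    for _ in range(steps):
        # Greedily move to the best neighboring prompt, stop at a local optimum.
        cand = max(neighbors(state),
                   key=lambda s: evaluate_prompt(PREFIXES[s[0]] + INSTRUCTIONS[s[1]]))
        score = evaluate_prompt(PREFIXES[cand[0]] + INSTRUCTIONS[cand[1]])
        if score <= best:
            break
        state, best = cand, score
    return PREFIXES[state[0]] + INSTRUCTIONS[state[1]], best

print(hill_climb())
```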
https://arxiv.org/pdf/2403.04652.pdf
https://arxiv.org/pdf/2311.08364v1.pdf
https://heheyas.github.io/V3D/
https://sglab.kaist.ac.kr/SemCity/

News

https://caduceus-dna.github.io/
  • So long and thanks for all the pixels: Nvidia reportedly retiring the GTX brand for good. Nvidia has stopped producing GPUs based on its Turing architecture, the last of which included the GTX 1660, 1650, and 1630 series. Once remaining stocks sell, they will be gone, and with them the "GTX" brand itself, leaving all Nvidia gaming graphics cards as "RTX" models.
  • Google’s upcoming Tensor G4 Chip set to rival Snapdragon 8 Gen 4 and Apple A18 Pro. Let’s say you’re a smartphone manufacturer aiming to develop a new model. You have two options: partner with an established chipmaker like Qualcomm or MediaTek or follow the path of Apple by designing your own custom chipset. Google has taken a similar approach, developing its in-house Tensor processors. Recent information suggests the Pixel 9 will feature the Tensor G4 chipset, promising improved heat and power management for an enhanced user experience.
https://arxiv.org/pdf/2403.09611.pdf
https://www.theverge.com/2024/3/12/24098728/perplexity-chatbot-yelp-suggestions-data-ai
  • Building Meta’s GenAI Infrastructure. This Meta blog article describes the Llama 3 training infrastructure, covering networking, storage, PyTorch, NCCL, and many other enhancements. It lays the groundwork for Meta to bring its H100 clusters online over the rest of the year.
  • Physical Intelligence Raises $70M to Build AI-Powered Robots for Any Application. Pi differentiates itself by aiming to create software that can be applied across a wide range of robotics hardware.
  • Researchers create AI worms that can spread from one system to another. In a demonstration of the risks of connected, autonomous AI ecosystems, a group of researchers has created what they claim is one of the first generative AI worms, which can spread from one system to another, potentially stealing data or deploying malware along the way.
  • Perplexity brings Yelp data to its chatbot. Perplexity’s responses can source multiple Yelp reviews for that cafe you were considering, along with location data and other information.
  • Gemini now lets you tune and modify responses with a prompt. Google is launching “a more precise way for you to tune Gemini’s responses” on the web app. When selecting (by highlighting) a part of Gemini’s response to your prompt, a pencil/sparkle icon appears to “Modify selected text.” This opens a box with Regenerate, Shorter, Longer, and Remove options, as well as an open text field.
https://every.to/napkin-math/claude-3-is-the-most-human-ai-yet
https://openai.com/blog/global-news-partnerships-le-monde-and-prisa-media
  • World’s first major act to regulate AI passed by European lawmakers. The European Union’s parliament on Wednesday approved the world’s first major set of regulatory ground rules to govern the artificial intelligence at the forefront of tech investment.
  • Figure 01 can now have full conversations with people. Figure’s robots can now hold in-depth conversations with humans thanks to the integration of OpenAI’s technology: OpenAI’s models provide high-level visual and linguistic intelligence, while Figure’s neural networks deliver fast, low-level dexterous robot actions. The X post includes a video of a human conversing with a Figure robot, teaching it how to complete tasks, asking it to explain the rationale behind them, and getting a self-assessment of how effectively it performed them.
  • Claude 3 Is The Most Human AI Yet. Claude 3, Anthropic’s latest AI model, is distinguished by its “warmth,” which makes it a reliable collaborator on creative writing assignments. More human-feeling and lifelike, it is said to balance deep contemplation with delightful, natural expression. Although technical benchmarks do not fully capture this subtlety, Claude 3 could change how we work with AI in creative processes.

Resources

https://arxiv.org/pdf/2311.11855v1.pdf
  • SaulLM-7B: A pioneering Large Language Model for Law. With 7 billion parameters, SaulLM-7B is the first LLM designed explicitly for legal text comprehension and generation. Leveraging the Mistral 7B architecture as its foundation, SaulLM-7B is trained on an English legal corpus of over 30 billion tokens.
  • A Practical Guide to RAG Pipeline Evaluation (Part 1: Retrieval). Retrieval is a critical and complex subsystem of RAG pipelines. After all, unless your app relies solely on the LLM's training data, the LLM's output is only as good as the information you provide it. The core of measuring retrieval is assessing whether each retrieved result is relevant to a given query (a minimal sketch of such metrics follows this list).
  • C4AI Command-R. C4AI Command-R is a research release of a highly performant 35-billion-parameter generative model. Command-R is a large language model with open weights, optimized for a variety of use cases including reasoning, summarization, and question answering. It supports multilingual generation, evaluated in 10 languages, and offers highly performant RAG capabilities.
  • Artificial Intelligence Controller Interface (AICI). The Artificial Intelligence Controller Interface (AICI) lets you build Controllers that constrain and direct the output of a Large Language Model (LLM) in real time. Controllers are flexible programs capable of implementing constrained decoding, dynamically editing prompts and generated text, and coordinating execution across multiple parallel generations (a toy illustration of constrained decoding follows this list).
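As a concrete illustration of the retrieval metrics mentioned above, here is a minimal sketch of precision@k and hit rate@k computed from per-query relevance judgments. It is a generic example, not code from the guide, and the query and document IDs are made up.

```python
def precision_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k retrieved documents that are relevant."""
    return sum(doc in relevant for doc in retrieved[:k]) / k

def hit_rate_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """1.0 if at least one relevant document appears in the top k, else 0.0."""
    return float(any(doc in relevant for doc in retrieved[:k]))

# Hypothetical judgments: for each query, the doc IDs the retriever returned
# (in rank order) and the set of docs that were actually relevant.
runs = {
    "q1": (["d3", "d7", "d1", "d9"], {"d1", "d4"}),
    "q2": (["d2", "d5", "d8", "d6"], {"d5"}),
}

k = 3
print("mean precision@%d:" % k,
      sum(precision_at_k(r, rel, k) for r, rel in runs.values()) / len(runs))
print("mean hit rate@%d:" % k,
      sum(hit_rate_at_k(r, rel, k) for r, rel in runs.values()) / len(runs))
```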
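The constrained-decoding idea behind controllers like AICI can be shown with a toy, library-free sketch: before each token is sampled, a controller masks out any token that would violate a constraint. This is not AICI's actual API, just an illustration of the mechanism.

```python
import math, random

VOCAB = ["yes", "no", "maybe", "{", "}", ":", "true", "false"]

def allowed_mask(generated: list[str]) -> list[bool]:
    """Toy controller: only allow 'true' or 'false' right after a ':'."""
    if generated and generated[-1] == ":":
        return [tok in ("true", "false") for tok in VOCAB]
    return [True] * len(VOCAB)

def sample(logits: list[float], mask: list[bool]) -> str:
    """Softmax-sample from the vocabulary with disallowed tokens zeroed out."""
    exps = [math.exp(l) if ok else 0.0 for l, ok in zip(logits, mask)]
    total = sum(exps)
    r, acc = random.random() * total, 0.0
    for tok, e in zip(VOCAB, exps):
        acc += e
        if r <= acc:
            return tok
    return VOCAB[-1]

generated = ["{", "yes", ":"]
fake_logits = [random.gauss(0, 1) for _ in VOCAB]  # stand-in for model output
print(sample(fake_logits, allowed_mask(generated)))  # always 'true' or 'false'
```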
https://arxiv.org/pdf/2403.07816.pdf
  • US Public Domain Books (English). This dataset contains more than 650,000 English books (~ 61 billion words) presumed to be in the public domain in the US which were digitized by the Internet Archive and cataloged as part of the Open Library project.
  • transformer-debugger. Transformer Debugger (TDB) is a tool developed by OpenAI’s Superalignment team to support investigations into specific behaviors of small language models. The tool combines automated interpretability techniques with sparse autoencoders (a minimal sparse-autoencoder sketch follows this list).
  • VideoMamba. VideoMamba efficiently manages global dependencies and local redundancy to tackle the challenges of video understanding.
  • FastV. FastV is a plug-and-play inference acceleration method for large vision-language models that rely on visual tokens. It can reach a 45% theoretical FLOP reduction without harming performance by pruning redundant visual tokens in deep layers (a simplified pruning sketch follows this list).
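Since the Transformer Debugger relies on sparse autoencoders, here is a minimal PyTorch sketch of one: a single hidden layer trained to reconstruct activations, with an L1 penalty that pushes most features toward zero. This is a generic SAE on random data, not OpenAI's implementation, and the dimensions are made up.

```python
import torch
from torch import nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_hidden)
        self.decoder = nn.Linear(d_hidden, d_model)

    def forward(self, x):
        features = torch.relu(self.encoder(x))  # sparse feature activations
        recon = self.decoder(features)
        return recon, features

# Toy training loop on random "activations" standing in for residual-stream data.
d_model, d_hidden = 64, 512
sae = SparseAutoencoder(d_model, d_hidden)
opt = torch.optim.Adam(sae.parameters(), lr=1e-3)
l1_coeff = 1e-3

for step in range(100):
    acts = torch.randn(256, d_model)  # batch of activations
    recon, feats = sae(acts)
    # Reconstruction error plus L1 sparsity penalty on the features.
    loss = ((recon - acts) ** 2).mean() + l1_coeff * feats.abs().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

print("final loss:", loss.item())
```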
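And to illustrate the kind of pruning FastV describes, the sketch below drops the visual tokens that receive the least attention at a deep layer. It is a simplified illustration under assumed tensor shapes, not FastV's actual implementation.

```python
import torch

def prune_visual_tokens(hidden, attn, visual_idx, keep_ratio=0.55):
    """
    hidden:     (seq_len, d_model) hidden states at some deep layer
    attn:       (seq_len, seq_len) attention weights averaged over heads
    visual_idx: indices of the visual tokens within the sequence
    Returns hidden states with the least-attended visual tokens removed.
    """
    # Average attention each visual token receives from all query positions.
    received = attn[:, visual_idx].mean(dim=0)                # (n_visual,)
    n_keep = max(1, int(keep_ratio * len(visual_idx)))
    keep_visual = visual_idx[received.topk(n_keep).indices]   # most-attended ones
    text_idx = torch.tensor([i for i in range(hidden.size(0))
                             if i not in set(visual_idx.tolist())])
    keep = torch.sort(torch.cat([text_idx, keep_visual])).values
    return hidden[keep]

# Toy shapes: 8 text tokens followed by 16 visual tokens, d_model = 32.
seq, d = 24, 32
hidden = torch.randn(seq, d)
attn = torch.softmax(torch.randn(seq, seq), dim=-1)
visual_idx = torch.arange(8, 24)
print(prune_visual_tokens(hidden, attn, visual_idx).shape)  # fewer tokens kept
```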
https://arxiv.org/pdf/2403.07711v1.pdf
  • Maximizing training throughput using PyTorch FSDP. Teams from IBM and Meta have together achieved 57% MFU while training large models in parallel on big A100 and H100 clusters (a sketch of how MFU is computed follows this list).
  • MoAI. MoAI is a new large language and vision model that integrates auxiliary visual information from specialized computer vision tasks to improve upon existing models.
  • superopenai: logging and caching superpowers for the openai SDK. superopenai is a minimal convenience library for logging and caching LLM requests and responses, for visibility and rapid iteration during development.
  • TripoSR. TripoSR, a state-of-the-art open-source model for fast feedforward 3D reconstruction from a single image, was collaboratively developed by Tripo AI and Stability AI.
  • Exploring Alternative UX Patterns for GenAI Interfaces. In the rapidly evolving landscape of GenAI interfaces, it is crucial to venture beyond the established norms. The current dominance of Quick Actions and Multi-Turn engagement patterns in these interfaces, while effective in many cases, should not limit our imagination or hinder the potential for innovation.
  • rerankers. Rerankers are an important part of any retrieval architecture, but they are often more obscure than other parts of the pipeline. rerankers addresses this by providing a simple API for all popular rerankers, whatever the underlying architecture (a minimal cross-encoder reranking example follows this list).
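For context on the FSDP item, MFU is usually computed by dividing achieved model FLOPs per second by the hardware's peak. A minimal sketch, using the common 6 * parameters * tokens approximation for transformer training FLOPs and hypothetical throughput numbers:

```python
def model_flops_utilization(n_params: float, tokens_per_sec: float,
                            peak_flops_per_sec: float) -> float:
    """MFU = achieved training FLOPs/s divided by hardware peak FLOPs/s.
    Uses the standard ~6 * N * D approximation for a decoder-only transformer."""
    achieved = 6.0 * n_params * tokens_per_sec
    return achieved / peak_flops_per_sec

# Hypothetical numbers: a 7B-parameter model, 3,000 tokens/s per GPU,
# and an A100's ~312 TFLOP/s dense BF16 peak.
print(f"MFU: {model_flops_utilization(7e9, 3_000, 312e12):.1%}")
```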
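Finally, to show what a reranker adds to a retrieval pipeline, here is a minimal cross-encoder reranking example with sentence-transformers. It is a generic illustration rather than the rerankers library's own API; the model name and documents are only examples.

```python
from sentence_transformers import CrossEncoder

# A small publicly available cross-encoder; any (query, passage) scorer works.
model = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

query = "how do I cache LLM responses during development?"
candidates = [
    "Caching identical prompts avoids repeated API calls while iterating.",
    "The GTX 1660 was based on Nvidia's Turing architecture.",
    "Store responses keyed by a hash of the request parameters.",
]

# Score every (query, candidate) pair, then sort candidates by score.
scores = model.predict([(query, doc) for doc in candidates])
reranked = sorted(zip(candidates, scores), key=lambda x: x[1], reverse=True)
for doc, score in reranked:
    print(f"{score:.3f}  {doc}")
```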
https://mapooon.github.io/Face2DiffusionPage/

Perspectives

https://github.com/opengvlab/videomamba
  • Embrace AI to break down barriers in publishing for people who aren’t fluent in English. E. M. Wolkovich describes having a paper rejected over an unfounded accusation that ChatGPT was used to write it; the piece argues that both the rejection and the broader bias against the use of artificial intelligence (AI) in scientific writing are misguided.
  • Why scientists trust AI too much — and what to do about it. Some researchers see superhuman qualities in artificial intelligence. All scientists need to be alert to the risks this creates.
  • The Future of Poetry. Poems were shown to 38 AI experts and 39 English experts, who were asked whether each was written by a human or by artificial intelligence (AI). The human poet ranked first, followed by Bard, ChatGPT-4, and Claude, both for writing quality and for the ability to convince respondents that the poetry was human-written. The fact that the English specialists were far better at identifying which poems were composed by AI suggests that they should be more involved in the development of upcoming AI systems.
https://arxiv.org/pdf/2403.05056v1.pdf
  • Barack Obama on AI, free speech, and the future of the internet. The former president joined The Verge’s Decoder podcast to discuss AI regulation, the First Amendment, and, of course, what apps he has on his home screen.
  • AI startups require new strategies: This time it’s actually different. The typical dynamics between startups and incumbents do not apply in AI as they did in previous technology revolutions like mobile and the Internet. Ignore this at your peril.
  • Top AIs still fail IQ tests — When asked to read image-based questions. According to recent testing, sophisticated AI models such as ChatGPT-4 and Google’s “Gemini Advanced” do poorly on visual IQ tests, receiving lower-than-average scores. Although ChatGPT-4 exhibits mediocre pattern recognition abilities, it misidentifies objects visually and makes logical mistakes, indicating a considerable difference in comparison to human intellect. These results suggest that the development of universally intelligent AI systems may still be some way off.
  • The Top 100 Gen AI Consumer Apps. Over 40% of the top web products are new, having entered the top 50 in the last six months, according to Andreessen Horowitz’s most recent consumer analysis on the top 100 Gen AI consumer apps.
https://arxiv.org/pdf/2403.03101.pdf
  • This Nvidia Cofounder Could Have Been Worth $70 Billion. Instead, He Lives Off The Grid. If Curtis Priem, Nvidia’s first CTO, had held onto all his stock, he’d be the 16th richest person in America. Instead, he sold out years ago and gave most of his fortune to his alma mater Rensselaer Polytechnic Institute.
  • How to thrive in a crowded enterprise AI market. At a Lightspeed event, Arvind Jain, CEO of Glean, spoke about the difficulties facing enterprise AI startups and how to address them. He emphasized the need to deliver genuine business value, be tenacious in hiring, and prioritize product quality over speed and cost. Jain also noted how privacy and security concerns have slowed the deployment of generative AI tools in businesses. Glean aims to become a widely used workplace AI platform that transforms how people work by becoming firmly integrated into organizational operations.
  • As AI tools get smarter, they’re growing more covertly racist, experts find. ChatGPT and Gemini discriminate against those who speak African American Vernacular English, a report shows.

Medium articles

A list of the Medium articles I have read and found the most interesting this week:

Meme of the week

What do you think? Did any news capture your attention? Let me know in the comments.


Salvatore Raieli

Senior data scientist | about science, machine learning, and AI. Top writer in Artificial Intelligence