WEEKLY AI NEWS: RESEARCH, NEWS, RESOURCES, AND PERSPECTIVES

ML news: Week 19–25 February

Salvatore Raieli
15 min read · Feb 26, 2024

Google steps into open-source LLMs, Stable Diffusion 3, and much more this week

Photo by Hayden Walker on Unsplash

The most interesting news, repositories, articles, and resources of the week

Check and star this repository, where the news is collected and indexed:

You will find the news first on GitHub. Single posts are also collected here:

Weekly AI and ML news - each week the best of the field


Research

https://arxiv.org/pdf/2402.04833.pdf
https://arxiv.org/pdf/2402.08934v1.pdf
https://arxiv.org/pdf/2402.10176v1.pdf
https://arxiv.org/pdf/2401.18079.pdf
https://arxiv.org/pdf/2402.10422v1.pdf

News

  • Anthropic takes steps to prevent election misinformation. Called Prompt Shield, the technology, which relies on a combination of AI detection models and rules, shows a pop-up if a U.S.-based user of Claude, Anthropic’s chatbot, asks for voting information. The pop-up offers to redirect the user to TurboVote, a resource from the nonpartisan organization Democracy Works, where they can find up-to-date, accurate voting information.
  • OpenAI’s next AI product could be after your job (again). OpenAI is said to be developing AI agents that automate even more complex tasks, though their launch timeline remains unknown. One AI agent is said to take over the customer’s device to perform tasks like transferring data from a document to a spreadsheet, filling out expense reports, and entering them into accounting software. The other AI agent is said to perform more research-oriented, web-based tasks, such as creating itineraries and booking flight tickets.
https://arxiv.org/pdf/2402.10376v1.pdf
https://arxiv.org/pdf/2402.09257v1.pdf
https://arxiv.org/pdf/2402.12354v1.pdf
  • Gemma: Introducing new state-of-the-art open models. We’re releasing model weights in two sizes: Gemma 2B and Gemma 7B. Each size is released with pre-trained and instruction-tuned variants. Ready-to-use Colab and Kaggle notebooks, alongside integration with popular tools such as Hugging Face, MaxText, NVIDIA NeMo, and TensorRT-LLM, make it easy to get started with Gemma.
  • Reddit has a new AI training deal to sell user content. Over a decade of valuable user content is now for sale as Reddit preps to go public.
  • Apple Developing AI Tool to Help Developers Write Code for Apps. Apple is working on an updated version of Xcode that will include an AI tool for generating code, reports Bloomberg. The AI tool will be similar to GitHub Copilot from Microsoft, which can generate code based on natural language requests and convert code from one programming language to another.
https://github.com/GaussianObject/GaussianObject?tab=readme-ov-file

Resources

  • minbpe. Minimal, clean code for the (byte-level) Byte Pair Encoding (BPE) algorithm commonly used in LLM tokenization. The BPE algorithm is “byte-level” because it runs on UTF-8 encoded strings.
  • GPTScript. GPTScript is a new scripting language for automating your interactions with a Large Language Model (LLM), currently OpenAI's models. The ultimate goal is to create a fully natural language-based programming experience. The syntax of GPTScript is largely natural language, making it very easy to learn and use.
  • QWEN. We open-source our Qwen series, now including Qwen, the base language models, namely Qwen-1.8B, Qwen-7B, Qwen-14B, and Qwen-72B, as well as Qwen-Chat, the chat models, namely Qwen-1.8B-Chat, Qwen-7B-Chat, Qwen-14B-Chat, and Qwen-72B-Chat.
https://blog.google/technology/ai/google-gemini-next-generation-model-february-2024/#sundar-note
  • Sora Reference Papers. A collection of all papers referenced in OpenAI’s “Video generation models as world simulators”
  • repeng. Control vectors are a low-cost means of controlling the output of semantic generation. Compared to LoRA, they are less expensive to train yet can still be fairly powerful. This library makes them simpler to use.
  • OpenRLHF. A Ray-based implementation of RLHF for Mistral and other Llama-style models. Several PPO stabilization techniques are included to enhance performance.
  • 3D Diffuser Actor: Policy Diffusion with 3D Scene Representations. To enhance robot manipulation, the 3D Diffuser Actor blends 3D scene representations with diffusion strategies. Robots are better able to comprehend and engage with their surroundings thanks to this AI-driven method.
https://stability.ai/news/stable-diffusion-3
  • How to jointly tune learning rate and weight decay for AdamW. AdamW is often considered a method that decouples weight decay and learning rate. In this blog post, we show that this is not true for the specific way AdamW is implemented in PyTorch. We also show how to adapt the tuning strategy to fix this: when doubling the learning rate, the weight decay should be halved.
  • OpenLLMetry-JS. OpenLLMetry-JS is a set of extensions built on top of OpenTelemetry that gives you complete observability over your LLM application. Because it uses OpenTelemetry under the hood, it can be connected to your existing observability solutions — Datadog, Honeycomb, and others.
  • List of GPU clusters for rent. A list of entire GPU clusters that can be rented on an hourly basis.
  • Mamba: The Hard Way. A detailed description of how Mamba works.
  • A new benchmark for large language models. A collection of nearly 100 tests the author extracted from their actual conversation history with various LLMs.
https://github.com/sunsmarterjie/chatterbox
  • BoCoEL. Bayesian Optimization as a Coverage Tool for Evaluating LLMs. The authors claim accurate evaluation (benchmarking) up to 10 times faster, with just a few lines of modular code.
  • FiT: Flexible Vision Transformer for Diffusion Model. This repo contains PyTorch model definitions, pre-trained weights, and sampling code for our flexible vision transformer (FiT). FiT is a diffusion transformer-based model that can generate images at unrestricted resolutions and aspect ratios.
  • RobustVLM. To defend multi-modal models like OpenFlamingo and LLaVA against visual adversarial assaults, a novel technique is presented in this study. The authors successfully defend these models against manipulative picture assaults by fine-tuning the CLIP visual encoder in an unsupervised way, increasing the models’ dependability and security in practical applications without requiring complete model retraining.
  • HELM Instruct: A Multidimensional Instruction Following Evaluation Framework with Absolute Ratings. The Stanford language modeling group released the popular Holistic Evaluation of Language Models (HELM) benchmark. They have now created HELM-Instruct, a version for instruction following; it is absolute, open-ended, and multifaceted.
  • LoRA Land: Fine-Tuned Open-Source LLMs that Outperform GPT-4. We’re excited to release LoRA Land, a collection of 25 fine-tuned Mistral-7b models that consistently outperform base models by 70% and GPT-4 by 4–15%, depending on the task. This collection of specialized fine-tuned models–all trained with the same base model–offers a blueprint for teams seeking to efficiently and cost-effectively deploy highly performant AI systems.
  • Multimodal LLM’s Ability to Understand Visual Data. A new tool called ChartX is designed to assess how well multi-modal large language models (MLLMs) can understand and make sense of visual charts.
  • A Critical Evaluation of AI Feedback for Aligning Language Models. This work questions the efficacy of combining reinforcement learning with supervised fine-tuning during training. The more involved two-step technique can be outperformed by simply fine-tuning on data from a more sophisticated model, such as GPT-4.
  • MMCSG Dataset. The MMCSG (Multi-Modal Conversations in Smart Glasses) dataset comprises two-sided conversations recorded using Aria glasses, featuring multi-modal data such as multi-channel audio, video, accelerometer, and gyroscope measurements. This dataset is suitable for research in areas like automatic speech recognition, activity detection, and speaker diarization.
  • MultiLora inference server. One base model can have many LoRAs hot-swapped onto it using the Lorax inference server. This allows a large variety of model tunes to be supported with a significant reduction in RAM use.
  • GTBench: Uncovering the Strategic Reasoning Limitations of LLMs via Game-Theoretic Evaluations. GTBench is a language-driven environment that evaluates the strategic reasoning limitations of LLMs through game-theoretic tasks. GTBench is built on top of OpenSpiel and supports 10 widely recognized games.
  • CrewAI. A library called CrewAI is available for creating and managing AI agents that make use of Replit and LangChain. It offers an easy-to-integrate modular setup comprising tasks, agents, crews, and tools for a variety of applications. LangSmith improves performance insights into non-deterministic LLM calls while streamlining the debugging process.
  • gemma.cpp. gemma.cpp is a lightweight, standalone C++ inference engine for the Gemma foundation models from Google.
  • MMedLM. The official codes for “Towards Building Multilingual Language Model for Medicine”.
  • LLM Evaluation Metrics for Labeled Data. How to measure the performance of LLM applications with ground truth data.
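The byte-level BPE algorithm behind minbpe can be sketched in a few lines: start from the raw UTF-8 bytes of the text, then repeatedly merge the most frequent adjacent pair of token ids into a new id. This is a minimal illustration in the same spirit as minbpe; the function names here are my own, not minbpe's actual API.

```python
# Minimal byte-level BPE training sketch (illustrative; not minbpe's API).
from collections import Counter

def get_pair_counts(ids):
    """Count occurrences of each adjacent pair of token ids."""
    return Counter(zip(ids, ids[1:]))

def merge(ids, pair, new_id):
    """Replace every occurrence of `pair` in `ids` with `new_id`."""
    out, i = [], 0
    while i < len(ids):
        if i < len(ids) - 1 and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out

def train_bpe(text, num_merges):
    """Learn `num_merges` merges over the UTF-8 bytes of `text`."""
    ids = list(text.encode("utf-8"))  # byte-level: start from raw bytes 0..255
    merges = {}
    for n in range(num_merges):
        counts = get_pair_counts(ids)
        if not counts:
            break
        pair = max(counts, key=counts.get)  # most frequent adjacent pair
        new_id = 256 + n                    # new token ids start after the 256 bytes
        ids = merge(ids, pair, new_id)
        merges[pair] = new_id
    return ids, merges
```

Running `train_bpe("aaabdaaabac", 3)` reproduces the classic BPE example: the pair `(97, 97)` (two `a` bytes) is merged first, and the sequence shrinks from 11 byte tokens to 5 tokens after three merges.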
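The control-vector idea that repeng implements boils down to steering a model by adding a fixed direction to a hidden state at inference time, scaled by a chosen strength. A toy sketch of that core operation (my own illustration, not repeng's API) might look like:

```python
# Toy sketch of the control-vector operation: shift a hidden state along a
# learned direction. In a real model this is applied to transformer
# hidden states; here plain lists stand in for those activations.

def apply_control_vector(hidden, vector, strength):
    """Return `hidden` shifted by `strength` times the control direction."""
    assert len(hidden) == len(vector)
    return [h + strength * v for h, v in zip(hidden, vector)]

# Positive strength pushes generation toward the concept the vector encodes,
# negative strength pushes away from it:
steered = apply_control_vector([1.0, 2.0], [0.5, -0.5], strength=2.0)
```

Unlike LoRA, no weight matrices are trained; only the direction vector is learned, which is why the post describes control vectors as low-cost.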
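The AdamW tuning point above can be made concrete: in PyTorch's implementation, the decoupled decay step multiplies the parameter by (1 − lr · weight_decay), so the effective decay strength is the product lr · weight_decay rather than weight_decay alone. A minimal sketch (my own, not the blog's code; the Adam gradient term is omitted since it does not involve the decay):

```python
# PyTorch-style decoupled weight decay: the per-step shrinkage depends on
# the product lr * weight_decay, not on weight_decay alone.

def adamw_decay_step(param, lr, weight_decay):
    """One PyTorch-style decay step, ignoring the (decay-independent)
    Adam gradient update."""
    return param * (1.0 - lr * weight_decay)

# Keeping lr * weight_decay constant preserves the decay schedule:
a = adamw_decay_step(1.0, lr=1e-3, weight_decay=0.1)
b = adamw_decay_step(1.0, lr=2e-3, weight_decay=0.05)  # lr doubled, wd halved
# a and b shrink the parameter by the same amount
```

This is exactly why the post recommends halving the weight decay when doubling the learning rate: the two runs then apply identical decay per step.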

Perspectives

  • The data revolution in venture capital. Investors, data scientists, and tool builders leading the data-driven future of venture capital.
  • The Three C’s: Creativity, Collaboration, and Communication. The way we communicate, work together, and complete creative projects has changed significantly since the invention of computing. With AI, we’re beginning to witness the commencement of another significant change, and we underestimate how significant it will be. Businesses that integrate artificial intelligence (AI) into their products from the start will have a significant edge over those that add it later to already-existing products.
  • Inside OpenAI Logan Kilpatrick (head of developer relations). Have you ever wondered how OpenAI develops and innovates so quickly? The head of developer relations at OpenAI, Logan Kilpatrick, talks about the company’s decision-making structure for product launches, high agency and urgency, and OpenAI’s distinct culture in this podcast.
  • Mind-reading devices are revealing the brain’s secrets. Implants and other technologies that decode neural activity can restore people’s abilities to move and speak — and help researchers understand how the brain works.
  • Generative AI’s environmental costs are soaring — and mostly secret. First-of-its-kind US bill would address the environmental costs of the technology, but there’s a long way to go.
  • Strategies for an Accelerating Future. Recent advances such as Google’s Gemini, with a context window of over a million tokens, and Groq’s hardware, which enables almost instantaneous responses from GPT-3.5-class models, represent a significant step in practical AI applications and highlight the pressing need for leaders to understand and adapt to the rapidly changing AI landscape.
https://github.com/kaiyuyue/nxtp?tab=readme-ov-file
  • How to lose at Generative AI! Despite the excitement around it, generative AI is likely to let most startups down, since it benefits established players with data advantages, established workflows, and the capacity to integrate AI without requiring significant system changes. A difficult road lies ahead for startups hoping to make a significant impact in the generative AI space, despite venture capital flooding in. By concentrating on prompt engineering and UX improvements at the workflow layer, these startups are essentially preparing the market for incumbents, who can readily adopt and integrate AI innovations into their dominant platforms.
  • Stockholm declaration on AI ethics: why others should sign. The use of artificial intelligence (AI) in science has the potential to do both harm and good. As a step towards preventing harm, we have prepared the Stockholm Declaration on AI for Science.
  • This is why the idea that AI will just augment jobs, never replace them, is a lie! AI will automate labor in certain areas. The response thus far has been divided: would increased efficiency allow for more human workers to accomplish the same duties, or will fewer workers be needed? This article compares and contrasts the effects of technology on manufacturing, agriculture, and the contemporary knowledge worker.
  • LLM evaluation at scale with the NeurIPS Large Language Model Efficiency Challenge. After a year of breakneck innovation and hype in the AI space, we have now moved sufficiently beyond the peak of the hype cycle to start asking a critical question: are LLMs good enough yet to solve all of the business and societal challenges we are setting them up for?

Meme of the week

What do you think about it? Any news that captured your attention? Let me know in the comments.

If you have found this interesting:

You can look for my other articles, and you can also connect with or reach me on LinkedIn. Check this repository containing weekly updated ML & AI news. I am open to collaborations and projects.

Here is the link to my GitHub repository, where I am collecting code and many resources related to machine learning, artificial intelligence, and more.


Salvatore Raieli

Senior data scientist | about science, machine learning, and AI. Top writer in Artificial Intelligence