WEEKLY AI NEWS: RESEARCH, NEWS, RESOURCES, AND PERSPECTIVES

ML news: Week 1–7 January

Salvatore Raieli
11 min read · Jan 8, 2024

The OpenAI store is arriving soon, Google is working on a paid version of Bard, and other news

Photo by Filip Mishevski on Unsplash

The most interesting news, repositories, articles, and resources of the week

Check and star this repository where the news will be collected and indexed:

You will find the news first on GitHub. Single posts are also collected here:

Weekly AI and ML news - each week the best of the field


Research

  • MosaicBERT: A Bidirectional Encoder Optimized for Fast Pretraining. MosaicBERT is a custom BERT architecture optimized for fast pretraining; this study motivated many of the architecture choices behind MosaicML’s MPT-7B and MPT-30B models. The main architectural modifications are FlashAttention, ALiBi, Gated Linear Units, and low-precision LayerNorm (a minimal ALiBi sketch follows this list).
  • Improving Text Embeddings with Large Language Models. Microsoft researchers trained a decoder-only transformer, based on Mistral, for embeddings using synthetic data. It is the best model in its class. Remarkably, they create the synthetic retrieval training data with GPT-4 and a two-step prompting technique.
  • Images altered to trick machine vision can influence humans too. New research shows that even subtle changes to digital images, designed to confuse computer vision systems, can also affect human perception.
  • Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models. One of the main drawbacks of existing language models is that they need very expensive human preference data to perform well, so whether a model can improve itself through self-play, without collecting such data, has become a major research question. Using only SFT data, a new technique called SPIN makes significant progress in that direction: the model generates its own training data from previous iterations and refines its policy by distinguishing its generated responses from the human-annotated ones. This substantially improves a base model across a variety of tasks and even outperforms models trained via DPO with GPT-4 preference data (a schematic of the loop follows this list).
  • Boundary Attention: Learning to Find Faint Boundaries at Any Resolution. Identifying edges and curves in images is a classic computer vision challenge, yet many existing approaches perform poorly when noise, quality changes, or out-of-distribution instances are introduced. With just 207k parameters, this new approach works very well even on raw sensor readings; it significantly advances the state of the art and uses a two-stage training procedure.
  • 3D-Aware Visual Question Answering about Parts, Poses and Occlusions. Although there has been progress in Visual Question Answering (VQA), most models focus primarily on 2D reasoning and ignore the intricacy of 3D visual settings. This study introduces 3D-aware VQA.
  • DocLLM. We present DocLLM, a lightweight extension to traditional large language models (LLMs) for reasoning over visual documents that takes into account both textual semantics and spatial layout.
  • GPT-4V(ision) is a Generalist Web Agent. In this work, we explore the potential of LMMs like GPT-4V as a generalist web agent that can follow natural language instructions to complete tasks on any given website.
  • Fast Inference of Mixture-of-Experts Language Models with Offloading. With the widespread adoption of Large Language Models (LLMs), many deep learning practitioners are looking for strategies to run these models more efficiently. One such strategy is a sparse mixture of experts (MoE). In this work, we study the problem of running large MoE language models on consumer hardware with limited accelerator memory (a toy expert cache is sketched after this list).
  • Bracketing is All You Need: Unifying Image Restoration and Enhancement Tasks with Multi-Exposure Images.
  • LLM Augmented LLMs: Expanding Capabilities through Composition. Investigates combining specialized models with existing foundation models to expand their capabilities, introducing cross-attention between the models so their representations can be combined to enable new abilities (a minimal adapter sketch follows this list). For instance, a PaLM2-S model augmented with a smaller model trained on low-resource languages improved English translation and arithmetic reasoning for those languages; the same approach with a code-specific model yielded a 40% improvement on code generation and explanation tasks compared to the base code model.
  • LLaMA Pro. Proposes a post-pretraining technique to extend an LLM’s knowledge without catastrophic forgetting: the inherited blocks are frozen and only newly added identity blocks are tuned, using only the new corpus (sketched after this list). The authors train LLaMA Pro-8.3B, initialized from Llama2–7B, on code and math data; these models outperform the base models on a variety of benchmarks while maintaining the original general capabilities.
  • Auffusion: Leveraging the Power of Diffusion and Large Language Models for Text-to-Audio Generation.
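Of the MosaicBERT modifications listed above, ALiBi is the simplest to show in code. Below is a minimal PyTorch sketch, under my own naming and shapes rather than MosaicML’s implementation, of how ALiBi replaces position embeddings with a distance-proportional penalty added to the attention logits.

```python
import torch

def alibi_bias(n_heads: int, seq_len: int) -> torch.Tensor:
    # Per-head slopes form a geometric sequence, as in the ALiBi paper.
    slopes = torch.tensor([2.0 ** (-8.0 * (h + 1) / n_heads) for h in range(n_heads)])
    pos = torch.arange(seq_len)
    # distance[i, j] = -|i - j|: tokens further apart get a larger penalty.
    distance = -(pos[None, :] - pos[:, None]).abs()
    return slopes[:, None, None] * distance          # (n_heads, seq_len, seq_len)

def attention_weights(q: torch.Tensor, k: torch.Tensor) -> torch.Tensor:
    # q, k: (batch, n_heads, seq_len, head_dim); the bias broadcasts over batch.
    scores = q @ k.transpose(-2, -1) / q.size(-1) ** 0.5
    return torch.softmax(scores + alibi_bias(q.size(1), q.size(2)), dim=-1)
```

Because the bias depends only on token distance, the model can be evaluated at sequence lengths longer than those seen in pretraining, which is part of ALiBi’s appeal for fast pretraining recipes.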
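The SPIN loop itself is easy to sketch. In the schematic below, `generate` and `train_pairwise` are hypothetical placeholders for sampling from the model and for the paper’s DPO-style pairwise objective; only the overall self-play structure is meant to be faithful.

```python
import copy

def spin(model, sft_data, n_iters: int = 3):
    opponent = copy.deepcopy(model)                   # frozen previous iterate
    for _ in range(n_iters):
        pairs = []
        for prompt, human_response in sft_data:
            synthetic = generate(opponent, prompt)    # hypothetical sampler
            pairs.append((prompt, human_response, synthetic))
        # Train the main player to prefer the human responses over the
        # opponent's self-generated ones (hypothetical pairwise-loss trainer).
        train_pairwise(model, ref_model=opponent, pairs=pairs)
        opponent = copy.deepcopy(model)               # next round's opponent
    return model
```

The key point is that no new human annotations are collected after the initial SFT set; each round’s “negative” data is manufactured by the previous round’s model.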
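One ingredient of MoE offloading can be illustrated with a toy LRU cache that keeps only a few experts on the accelerator and pulls the rest from CPU RAM when the router selects them. This is a simplification under my own assumptions, not the paper’s implementation, which combines caching with further tricks such as quantization.

```python
from collections import OrderedDict

class ExpertCache:
    """Toy LRU cache over MoE expert modules (illustrative only)."""
    def __init__(self, experts, capacity: int = 4, device: str = "cuda"):
        self.experts = experts                 # list of nn.Modules, kept on CPU
        self.capacity = capacity
        self.device = device
        self.on_gpu = OrderedDict()            # expert index -> module on GPU

    def get(self, idx: int):
        if idx in self.on_gpu:
            self.on_gpu.move_to_end(idx)       # mark as most recently used
            return self.on_gpu[idx]
        if len(self.on_gpu) >= self.capacity:  # evict least recently used
            _, old = self.on_gpu.popitem(last=False)
            old.to("cpu")
        self.on_gpu[idx] = self.experts[idx].to(self.device)
        return self.on_gpu[idx]
```

Since MoE routers activate only a couple of experts per token, a small cache like this can keep the frequently used experts resident while the rest live in host memory.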
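What “cross-attention between models” can look like is sketched below, assuming hidden states can be read from both models; the dimensions, layer placement, and names are illustrative assumptions, not the paper’s exact architecture.

```python
import torch.nn as nn

class CrossModelAdapter(nn.Module):
    """Inject a frozen augmenting model's states into a base model (sketch)."""
    def __init__(self, base_dim: int, aug_dim: int, n_heads: int = 8):
        super().__init__()
        self.proj = nn.Linear(aug_dim, base_dim)   # map augmenting states over
        self.attn = nn.MultiheadAttention(base_dim, n_heads, batch_first=True)

    def forward(self, base_hidden, aug_hidden):
        # Base hidden states attend over the (projected) augmenting model's
        # states; the result is added residually, so only the adapter is new.
        ctx = self.proj(aug_hidden)
        out, _ = self.attn(query=base_hidden, key=ctx, value=ctx)
        return base_hidden + out
```

Only adapters like this are trained; both underlying models stay frozen, which is what makes the composition cheap relative to retraining either model.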
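Finally, the block-expansion idea behind LLaMA Pro can be sketched as follows. `attn_out` and `mlp_out` are placeholder names for a block’s output projections (Llama-style blocks use bias-free linears); zeroing them makes each copied block start as an identity map through its residual connections, and only those copies receive gradients.

```python
import copy
import torch.nn as nn

def expand_blocks(blocks: nn.ModuleList, every: int = 4) -> nn.ModuleList:
    expanded = []
    for i, block in enumerate(blocks):
        block.requires_grad_(False)              # freeze inherited blocks
        expanded.append(block)
        if (i + 1) % every == 0:                 # interleave new blocks
            new = copy.deepcopy(block)
            nn.init.zeros_(new.attn_out.weight)  # zeroed output projections:
            nn.init.zeros_(new.mlp_out.weight)   # the block adds nothing at init
            new.requires_grad_(True)             # tune only the new blocks
            expanded.append(new)
    return nn.ModuleList(expanded)
```

Training then proceeds on the new corpus only; because the original blocks never change, the base model’s general capabilities are preserved by construction.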

News

  • Microsoft’s Copilot app is now available on iOS. The Microsoft Copilot app lets you ask questions, draft text, and generate images using AI.
  • Stuff we figured out about AI in 2023. This piece summarizes the major advances in AI research over the course of 2023. It covers a number of topics, including LLM applications, the problem of gullibility, model fine-tuning, and how to run LLMs on personal devices. Used appropriately, LLMs can significantly improve the quality of life of their users. Although they are surprisingly simple to build, they remain unreliable in many applications, and there is still plenty to learn about them.
  • Gemini in Reasoning: Unveiling Commonsense in Multimodal Large Language Models. This study conducts a thorough evaluation of Gemini Pro’s efficacy in commonsense reasoning tasks, employing a diverse array of datasets that span both language-based and multimodal scenarios.

Resources

  • llm-course. Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.
  • Bash One-Liners for LLMs. Llamafile is a project that combines a model and its inference code into a single portable executable. This blog post explains how to run Llamafiles from the shell and further process their command-line output.
  • pykoi: RLHF/RLAIF in one unified interface. pykoi is an open-source Python library for improving LLMs with RLHF. We provide a unified interface including RLHF/RLAIF data and feedback collection, finetuning with reinforcement learning and reward modeling, and LLM comparisons.
  • AI-created “virtual influencers” are stealing business from humans.
  • gpt-fast. Simple and efficient PyTorch-native transformer text generation.
  • TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones.
  • sbs-generator. This repository contains a framework for converting monocular videos into side-by-side (SBS) 3D videos. It uses a combination of image processing techniques and depth map predictions to generate separate views for each eye, creating a 3D effect when viewed with appropriate hardware (a toy version of the view synthesis is sketched after this list).
  • ColBERTv2: Indexing & Search Notebook. ColBERT is a state-of-the-art retrieval technique. To help readers get up to speed and experiment with it, the authors have included a notebook.
  • intel-extension-for-transformers. An Innovative Transformer-based Toolkit to Accelerate GenAI/LLM Everywhere
  • aMUSEd: An Open MUSE Reproduction. We present aMUSEd, an open-source, lightweight masked image model (MIM) for text-to-image generation based on MUSE. With 10 percent of MUSE’s parameters, aMUSEd is focused on fast image generation.
  • RAGatouille. Easily use and train state-of-the-art retrieval methods, such as ColBERT, in any RAG pipeline. Designed for modularity and ease of use, backed by research (a minimal indexing-and-search snippet follows this list).
  • ODTrack. ODTrack is a simple, flexible, and effective video-level tracking pipeline, which densely associates the contextual relationships of video frames in an online token propagation manner.
  • ARLib. An open-source framework for conducting data poisoning attacks on recommendation systems, designed to assist researchers and practitioners.
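Tying two of the entries above together: RAGatouille wraps ColBERTv2 indexing and search in a few lines. The snippet follows the repository’s README at the time of writing; check the repo for the current API.

```python
from ragatouille import RAGPretrainedModel

# Load a pretrained ColBERTv2 checkpoint, build an index, and query it.
RAG = RAGPretrainedModel.from_pretrained("colbert-ir/colbertv2.0")
RAG.index(
    collection=["ColBERT scores documents via late interaction over token embeddings."],
    index_name="demo_index",
)
results = RAG.search(query="How does ColBERT score documents?", k=1)
```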
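And the depth-based view synthesis behind SBS conversion can be illustrated with a toy NumPy version: shift each pixel horizontally in proportion to its predicted depth to fake the second eye’s view, then place the two views side by side. This is my own simplification, with no occlusion filling, not the sbs-generator implementation.

```python
import numpy as np

def to_sbs(frame: np.ndarray, depth: np.ndarray, max_disp: int = 12) -> np.ndarray:
    # frame: (H, W, 3) image; depth: (H, W) predicted depth map.
    h, w, _ = frame.shape
    # Horizontal disparity proportional to depth (sign/scale conventions vary).
    disp = (depth / max(float(depth.max()), 1e-6) * max_disp).astype(int)
    right = np.empty_like(frame)
    cols = np.arange(w)
    for y in range(h):
        shifted = np.clip(cols - disp[y], 0, w - 1)  # resample each row
        right[y] = frame[y, shifted]
    return np.concatenate([frame, right], axis=1)    # left | right
```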

Perspectives

  • Revealing the ‘Clever Hans Effect’ in AI-Driven Drug Discovery. In a landmark study at the University of Bonn, a team led by Prof. Dr. Jürgen Bajorath has revealed a significant finding about the role of artificial intelligence (AI) in pharmaceutical research: a ‘Clever Hans effect’ in which models can arrive at correct predictions for the wrong reasons.
  • What We Learned About AI and Education in 2023. From Disruption to Integration: AI Responsive Education in 2023
  • The AI trust crisis. Dropbox’s new AI features led users to worry that their data might be used to train OpenAI’s models, even though Dropbox has denied this and its policy requires customer consent for such usage. The episode highlights a broader crisis of confidence around AI and data privacy, underscoring the need for companies to communicate clearly and be transparent about how they use data.
  • The official OpenAI prompt engineering guide. A thorough, step-by-step manual outlining methods and tactics for getting better results from large language models such as GPT-4 (one of its tactics is illustrated below).
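As a taste of the guide’s tactics (“write clear instructions”, “provide reference text”), here is a minimal example using the OpenAI Python SDK; the model name and prompt wording are placeholders of mine, not text from the guide.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        # System message: explicit instructions plus a constraint on sources.
        {"role": "system",
         "content": "Answer using only the provided article. "
                    "If the answer is not in the article, say you cannot find it."},
        # User message: the reference text, then the question.
        {"role": "user",
         "content": "Article:\n<article text>\n\nQuestion: <your question>"},
    ],
)
print(response.choices[0].message.content)
```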

Meme of the week

What do you think about it? Did any of this news catch your attention? Let me know in the comments.

If you have found this interesting:

You can look for my other articles, or connect with me on LinkedIn, where I am open to collaborations and projects. Check this repository containing weekly updated ML & AI news.

Here is the link to my GitHub repository, where I am collecting code and many resources related to machine learning, artificial intelligence, and more.

or you may be interested in one of my recent articles:


Salvatore Raieli

Senior data scientist | about science, machine learning, and AI. Top writer in Artificial Intelligence