WEEKLY AI NEWS: RESEARCH, NEWS, RESOURCES, AND PERSPECTIVES

ML news: Week 1–7 January

Salvatore Raieli
11 min read · Jan 8, 2024

The OpenAI store is arriving soon, Google is working on a paid version of Bard, and other news

Photo by Filip Mishevski on Unsplash

The most interesting news, repositories, articles, and resources of the week

Check and star this repository where the news will be collected and indexed:

You will find the news first on GitHub. Single posts are also collected here:

Weekly AI and ML news - each week the best of the field


Research

  • MosaicBERT: A Bidirectional Encoder Optimized for Fast Pretraining. MosaicBERT is a custom BERT architecture optimized for fast pretraining; this study motivated many of the architecture choices behind MosaicML’s MPT-7B and MPT-30B models. The main architectural modifications are FlashAttention, ALiBi, Gated Linear Units, and low-precision LayerNorm (a minimal ALiBi sketch follows this list).
  • Improving Text Embeddings with Large Language Models. Microsoft researchers trained a decoder-only transformer, based on Mistral, for embeddings using synthetic data. It is the best model in its class. Remarkably, they create the synthetic retrieval training data with GPT-4 and a two-step prompting technique.
  • Images altered to trick machine vision can influence humans too. New research shows that even subtle changes to digital images, designed to confuse computer vision systems, can also affect human perception.
  • Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models. One of the main drawbacks of existing language models is that they need very expensive human preference data to perform well, so whether a model can improve itself through self-play, without collecting such data, has become a major research question. Using only SFT data, a new technique called SPIN makes significant progress in that direction: the model generates its own training data from previous iterations and refines its policy by distinguishing its generated responses from the human-annotated ones. This substantially improves a base model across a variety of tasks and even outperforms models trained via DPO with GPT-4 preference data (a schematic of the loop follows this list).
  • Boundary Attention: Learning to Find Faint Boundaries at Any Resolution. Identifying edges and curves in images is a classic computer vision challenge, yet many existing approaches perform poorly when noise, quality changes, or out-of-distribution instances are introduced. With just 207k parameters, this new approach works very well even on raw sensor readings; it significantly advances the state of the art and uses a two-stage training procedure.
  • 3D-Aware Visual Question Answering about Parts, Poses and Occlusions. Although there has been progress in Visual Question Answering (VQA), most models focus primarily on 2D reasoning and ignore the intricacy of 3D visual settings. This study introduces 3D-aware VQA.
  • DocLLM. We present DocLLM, a lightweight extension to traditional large language models (LLMs) for reasoning over visual documents that takes into account both textual semantics and spatial layout.
  • GPT-4V(ision) is a Generalist Web Agent. In this work, we explore the potential of LMMs like GPT-4V as a generalist web agent that can follow natural language instructions to complete tasks on any given website.
  • Fast Inference of Mixture-of-Experts Language Models with Offloading. With the widespread adoption of Large Language Models (LLMs), many deep learning practitioners are looking for strategies to run these models more efficiently. One such strategy is a sparse mixture of experts (MoE). In this work, we study the problem of running large MoE language models on consumer hardware with limited accelerator memory (a toy expert cache is sketched after this list).
  • Bracketing is All You Need: Unifying Image Restoration and Enhancement Tasks with Multi-Exposure Images.
  • LLM Augmented LLMs: Expanding Capabilities through Composition. Investigates combining specialized models with existing foundation models to expand their capabilities, introducing cross-attention between the models so their representations can be combined to enable new abilities (a minimal adapter sketch follows this list). For instance, a PaLM2-S model augmented with a smaller model trained on low-resource languages improved English translation and arithmetic reasoning for those languages; the same approach with a code-specific model yielded a 40% improvement on code generation and explanation tasks compared to the base code model.
  • LLaMA Pro. Proposes a post-pretraining technique to extend an LLM’s knowledge without catastrophic forgetting: the inherited blocks are frozen and only newly added identity blocks are tuned, using only the new corpus (sketched after this list). The authors train LLaMA Pro-8.3B, initialized from Llama2–7B, on code and math data; these models outperform the base models on a variety of benchmarks while maintaining the original general capabilities.
  • Auffusion: Leveraging the Power of Diffusion and Large Language Models for Text-to-Audio Generation.
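Of the MosaicBERT modifications listed above, ALiBi is the simplest to show in code. Below is a minimal PyTorch sketch, under my own naming and shapes rather than MosaicML’s implementation, of how ALiBi replaces position embeddings with a distance-proportional penalty added to the attention logits.

```python
import torch

def alibi_bias(n_heads: int, seq_len: int) -> torch.Tensor:
    # Per-head slopes form a geometric sequence, as in the ALiBi paper.
    slopes = torch.tensor([2.0 ** (-8.0 * (h + 1) / n_heads) for h in range(n_heads)])
    pos = torch.arange(seq_len)
    # distance[i, j] = -|i - j|: tokens further apart get a larger penalty.
    distance = -(pos[None, :] - pos[:, None]).abs()
    return slopes[:, None, None] * distance          # (n_heads, seq_len, seq_len)

def attention_weights(q: torch.Tensor, k: torch.Tensor) -> torch.Tensor:
    # q, k: (batch, n_heads, seq_len, head_dim); the bias broadcasts over batch.
    scores = q @ k.transpose(-2, -1) / q.size(-1) ** 0.5
    return torch.softmax(scores + alibi_bias(q.size(1), q.size(2)), dim=-1)
```

Because the bias depends only on token distance, the model can be evaluated at sequence lengths longer than those seen in pretraining, which is part of ALiBi’s appeal for fast pretraining recipes.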
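The SPIN loop itself is easy to sketch. In the schematic below, `generate` and `train_pairwise` are hypothetical placeholders for sampling from the model and for the paper’s DPO-style pairwise objective; only the overall self-play structure is meant to be faithful.

```python
import copy

def spin(model, sft_data, n_iters: int = 3):
    opponent = copy.deepcopy(model)                   # frozen previous iterate
    for _ in range(n_iters):
        pairs = []
        for prompt, human_response in sft_data:
            synthetic = generate(opponent, prompt)    # hypothetical sampler
            pairs.append((prompt, human_response, synthetic))
        # Train the main player to prefer the human responses over the
        # opponent's self-generated ones (hypothetical pairwise-loss trainer).
        train_pairwise(model, ref_model=opponent, pairs=pairs)
        opponent = copy.deepcopy(model)               # next round's opponent
    return model
```

The key point is that no new human annotations are collected after the initial SFT set; each round’s “negative” data is manufactured by the previous round’s model.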
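One ingredient of MoE offloading can be illustrated with a toy LRU cache that keeps only a few experts on the accelerator and pulls the rest from CPU RAM when the router selects them. This is a simplification under my own assumptions, not the paper’s implementation, which combines caching with further tricks such as quantization.

```python
from collections import OrderedDict

class ExpertCache:
    """Toy LRU cache over MoE expert modules (illustrative only)."""
    def __init__(self, experts, capacity: int = 4, device: str = "cuda"):
        self.experts = experts                 # list of nn.Modules, kept on CPU
        self.capacity = capacity
        self.device = device
        self.on_gpu = OrderedDict()            # expert index -> module on GPU

    def get(self, idx: int):
        if idx in self.on_gpu:
            self.on_gpu.move_to_end(idx)       # mark as most recently used
            return self.on_gpu[idx]
        if len(self.on_gpu) >= self.capacity:  # evict least recently used
            _, old = self.on_gpu.popitem(last=False)
            old.to("cpu")
        self.on_gpu[idx] = self.experts[idx].to(self.device)
        return self.on_gpu[idx]
```

Since MoE routers activate only a couple of experts per token, a small cache like this can keep the frequently used experts resident while the rest live in host memory.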
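What “cross-attention between models” can look like is sketched below, assuming hidden states can be read from both models; the dimensions, layer placement, and names are illustrative assumptions, not the paper’s exact architecture.

```python
import torch.nn as nn

class CrossModelAdapter(nn.Module):
    """Inject a frozen augmenting model's states into a base model (sketch)."""
    def __init__(self, base_dim: int, aug_dim: int, n_heads: int = 8):
        super().__init__()
        self.proj = nn.Linear(aug_dim, base_dim)   # map augmenting states over
        self.attn = nn.MultiheadAttention(base_dim, n_heads, batch_first=True)

    def forward(self, base_hidden, aug_hidden):
        # Base hidden states attend over the (projected) augmenting model's
        # states; the result is added residually, so only the adapter is new.
        ctx = self.proj(aug_hidden)
        out, _ = self.attn(query=base_hidden, key=ctx, value=ctx)
        return base_hidden + out
```

Only adapters like this are trained; both underlying models stay frozen, which is what makes the composition cheap relative to retraining either model.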
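Finally, the block-expansion idea behind LLaMA Pro can be sketched as follows. `attn_out` and `mlp_out` are placeholder names for a block’s output projections (Llama-style blocks use bias-free linears); zeroing them makes each copied block start as an identity map through its residual connections, and only those copies receive gradients.

```python
import copy
import torch.nn as nn

def expand_blocks(blocks: nn.ModuleList, every: int = 4) -> nn.ModuleList:
    expanded = []
    for i, block in enumerate(blocks):
        block.requires_grad_(False)              # freeze inherited blocks
        expanded.append(block)
        if (i + 1) % every == 0:                 # interleave new blocks
            new = copy.deepcopy(block)
            nn.init.zeros_(new.attn_out.weight)  # zeroed output projections:
            nn.init.zeros_(new.mlp_out.weight)   # the block adds nothing at init
            new.requires_grad_(True)             # tune only the new blocks
            expanded.append(new)
    return nn.ModuleList(expanded)
```

Training then proceeds on the new corpus only; because the original blocks never change, the base model’s general capabilities are preserved by construction.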

News

  • Microsoft’s Copilot app is now available on iOS. The Microsoft Copilot app lets you ask questions, draft text, and generate images using AI.
  • Stuff we figured out about AI in 2023. This piece summarizes the major advances in AI research over the course of 2023. It covers a number of topics, including LLM applications, the problem of gullibility, model fine-tuning, and how to run LLMs on personal devices. Used appropriately, LLMs can significantly improve the quality of life of their users. Although they are surprisingly simple to build, they remain unreliable in many applications, and there is still plenty to learn about them.
  • Gemini in Reasoning: Unveiling Commonsense in Multimodal Large Language Models. This study conducts a thorough evaluation of Gemini Pro’s efficacy in commonsense reasoning tasks, employing a diverse array of datasets that span both language-based and multimodal scenarios.

Resources

  • llm-course. Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks.
  • Bash One-Liners for LLMs. Llamafile is a project that combines a model and its inference code into a single portable executable. This blog post explains how to run Llamafiles from the shell and further process their command-line output.
  • pykoi: RLHF/RLAIF in one unified interface. pykoi is an open-source Python library for improving LLMs with RLHF. We provide a unified interface including RLHF/RLAIF data and feedback collection, finetuning with reinforcement learning and reward modeling, and LLM comparisons.
  • AI-created “virtual influencers” are stealing business from humans.
  • gpt-fast. Simple and efficient PyTorch-native transformer text generation.
  • TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones.
  • sbs-generator. This repository contains a framework for converting monocular videos into side-by-side (SBS) 3D videos. It uses a combination of image processing techniques and depth map predictions to generate separate views for each eye, creating a 3D effect when viewed with appropriate hardware (a toy version of the view synthesis is sketched after this list).
  • ColBERTv2: Indexing & Search Notebook. ColBERT is a state-of-the-art retrieval technique. To help readers get up to speed and experiment with it, the authors have included a notebook.
  • intel-extension-for-transformers. An Innovative Transformer-based Toolkit to Accelerate GenAI/LLM Everywhere
  • aMUSEd: An Open MUSE Reproduction. We present aMUSEd, an open-source, lightweight masked image model (MIM) for text-to-image generation based on MUSE. With 10 percent of MUSE’s parameters, aMUSEd is focused on fast image generation.
  • RAGatouille. Easily use and train state-of-the-art retrieval methods, such as ColBERT, in any RAG pipeline. Designed for modularity and ease of use, backed by research (a minimal indexing-and-search snippet follows this list).
  • ODTrack. ODTrack is a simple, flexible, and effective video-level tracking pipeline, which densely associates the contextual relationships of video frames in an online token propagation manner.
  • ARLib. An open-source framework for conducting data poisoning attacks on recommendation systems, designed to assist researchers and practitioners.
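Tying two of the entries above together: RAGatouille wraps ColBERTv2 indexing and search in a few lines. The snippet follows the repository’s README at the time of writing; check the repo for the current API.

```python
from ragatouille import RAGPretrainedModel

# Load a pretrained ColBERTv2 checkpoint, build an index, and query it.
RAG = RAGPretrainedModel.from_pretrained("colbert-ir/colbertv2.0")
RAG.index(
    collection=["ColBERT scores documents via late interaction over token embeddings."],
    index_name="demo_index",
)
results = RAG.search(query="How does ColBERT score documents?", k=1)
```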
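And the depth-based view synthesis behind SBS conversion can be illustrated with a toy NumPy version: shift each pixel horizontally in proportion to its predicted depth to fake the second eye’s view, then place the two views side by side. This is my own simplification, with no occlusion filling, not the sbs-generator implementation.

```python
import numpy as np

def to_sbs(frame: np.ndarray, depth: np.ndarray, max_disp: int = 12) -> np.ndarray:
    # frame: (H, W, 3) image; depth: (H, W) predicted depth map.
    h, w, _ = frame.shape
    # Horizontal disparity proportional to depth (sign/scale conventions vary).
    disp = (depth / max(float(depth.max()), 1e-6) * max_disp).astype(int)
    right = np.empty_like(frame)
    cols = np.arange(w)
    for y in range(h):
        shifted = np.clip(cols - disp[y], 0, w - 1)  # resample each row
        right[y] = frame[y, shifted]
    return np.concatenate([frame, right], axis=1)    # left | right
```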

Perspectives

  • Revealing the ‘Clever Hans Effect’ in AI-Driven Drug Discovery. In a landmark study at the University of Bonn, a team led by Prof. Dr. Jürgen Bajorath has revealed a significant finding about the role of artificial intelligence (AI) in pharmaceutical research: a ‘Clever Hans effect’ in which models can arrive at correct predictions for the wrong reasons.
  • What We Learned About AI and Education in 2023. From Disruption to Integration: AI Responsive Education in 2023
  • The AI trust crisis. Dropbox’s new AI features led users to worry that their data might be used to train OpenAI’s models, even though Dropbox has denied this and its policy requires customer consent for such usage. The episode highlights a broader crisis of confidence around AI and data privacy, underscoring the need for companies to communicate clearly and be transparent about how they use data.
  • The official OpenAI prompt engineering guide. A thorough, step-by-step manual outlining methods and tactics for getting better results from large language models such as GPT-4 (one of its tactics is illustrated below).
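As a taste of the guide’s tactics (“write clear instructions”, “provide reference text”), here is a minimal example using the OpenAI Python SDK; the model name and prompt wording are placeholders of mine, not text from the guide.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        # System message: explicit instructions plus a constraint on sources.
        {"role": "system",
         "content": "Answer using only the provided article. "
                    "If the answer is not in the article, say you cannot find it."},
        # User message: the reference text, then the question.
        {"role": "user",
         "content": "Article:\n<article text>\n\nQuestion: <your question>"},
    ],
)
print(response.choices[0].message.content)
```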

Meme of the week

What do you think about it? Did any of this news catch your attention? Let me know in the comments.

If you have found this interesting:

You can look for my other articles, or connect with me on LinkedIn, where I am open to collaborations and projects. Check this repository containing weekly updated ML & AI news.

Here is the link to my GitHub repository, where I am collecting code and many resources related to machine learning, artificial intelligence, and more.

or you may be interested in one of my recent articles:


Salvatore Raieli

Senior data scientist | about science, machine learning, and AI. Top writer in Artificial Intelligence