WEEKLY AI NEWS: RESEARCH, NEWS, RESOURCES, AND PERSPECTIVES

ML news: Week 8–14 January

Salvatore Raieli
15 min read · Jan 17, 2024

Introducing the ChatGPT store, Google launches new AI services for retailers, and much more


The most interesting news, repositories, articles, and resources of the week

Check and star this repository where the news will be collected and indexed:

You will find the news first on GitHub. Single posts are also collected here:

Weekly AI and ML news - each week the best of the field


Research

  • V*: Guided Visual Search as a Core Mechanism in Multimodal LLMs. We introduce V*, an LLM-guided visual search mechanism that employs the world knowledge in LLMs for efficient visual querying. When combined with an MLLM, this mechanism enhances collaborative reasoning, contextual understanding, and precise targeting of specific visual elements.
  • DeepSeek LLM: Scaling Open-Source Language Models with Longtermism. DeepSeek LLM was one of the strongest open coding models of last year. On several benchmarks it comes close to GPT-3.5 (despite probably being about three times larger). A technical report has now been released with details on model training, token counts, model architecture, and more.
  • Denoising Vision Transformers. Vision Transformers (ViTs) have taken over the vision community, yet their embeddings occasionally exhibit grid-like artifacts, which makes practitioners reluctant to use them for downstream tasks. This study proposes a positional-embedding fix that removes the artifacts and yields a 25%+ performance gain on downstream vision tasks.
  • FED-NeRF: Achieve High 3D Consistency and Temporal Coherence for Face Video Editing on Dynamic NeRF. Researchers combine GAN-NeRF technology for 3D consistency with a new stabilizer for smooth temporal coherence in a face-video-editing architecture. The technique suits video editing because it keeps viewpoints consistent and makes frame transitions smooth.
  • A Minimaximalist Approach to Reinforcement Learning from Human Feedback. Self-Play Preference Optimization (SPO), a less complex alignment method than conventional RLHF, has been presented by Google researchers. Using game theory, the researchers were able to develop single-player self-play dynamics that provide good performance and are resilient to noisy preferences.
  • Mixtral of Experts. We introduce Mixtral 8x7B, a Sparse Mixture of Experts (SMoE) language model. Mixtral has the same architecture as Mistral 7B, with the difference that each layer is composed of 8 feedforward blocks (i.e. experts), of which a router selects two per token (see the sketch after this list).
  • GPT-4V(ision) is a Human-Aligned Evaluator for Text-to-3D Generation. Researchers address the constraints of existing single-criterion measures with a new assessment metric for text-to-3D generative models. The technique uses GPT-4V to generate prompts and compare 3D objects. It aligns closely with human preferences and offers flexibility by adapting to different user-specified criteria.
  • Self-emerging Token Labeling. Using a novel self-emerging token labeling (STL) framework, researchers have made a substantial advance for Vision Transformers (ViTs) by improving the resilience of Fully Attentional Network (FAN) models. In this method, a FAN token labeler is first trained to produce relevant patch token labels, and a FAN student model is then trained on them.
  • MedAgents: Large Language Models as Collaborators for Zero-shot Medical Reasoning. We propose a Multi-disciplinary Collaboration (MC) framework. The framework works in five stages: (i) expert gathering: gathering experts from distinct disciplines according to the clinical question; (ii) analysis proposition: domain experts put forward their own analyses with their expertise; (iii) report summarization: compose a summarized report on the basis of the previous series of analyses; (iv) collaborative consultation: engage the experts in discussions over the summarized report, revising it iteratively until all experts agree; (v) decision making: derive a final decision from the unanimous report. (A compressed sketch of this loop appears after the list.)
  • DiffBody: Diffusion-based Pose and Shape Editing of Human Images. This study presents a one-shot approach to human image editing that allows substantial body-shape and pose modifications without compromising the subject’s identity.
  • LLaMA Beyond English: An Empirical Study on Language Capability Transfer. Our evaluation results demonstrate that comparable performance to state-of-the-art transfer models can be achieved with less than 1% of the pretraining data, both in terms of knowledge alignment and response quality.
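
To make the Mixtral item above concrete, here is a minimal sketch of a sparse MoE feed-forward layer with top-2 routing over 8 experts. The dimensions and expert MLPs are simplified (no SwiGLU, no load balancing), so this illustrates the routing idea rather than Mixtral’s actual implementation.

```python
# A minimal sparse MoE layer: a router picks 2 of 8 expert MLPs per token.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoELayer(nn.Module):
    def __init__(self, d_model=512, d_ff=2048, n_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.SiLU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                          # x: (tokens, d_model)
        gates, idx = self.router(x).topk(self.top_k, dim=-1)
        gates = F.softmax(gates, dim=-1)           # renormalize over the chosen 2
        out = torch.zeros_like(x)
        for slot in range(self.top_k):             # send each token to its k experts
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e
                if mask.any():
                    out[mask] += gates[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

y = SparseMoELayer()(torch.randn(4, 512))          # 4 tokens through the layer
```

Only the two selected experts run per token, which is how Mixtral keeps inference cost near that of a much smaller dense model.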
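The MedAgents pipeline is straightforward to express as a loop over LLM calls. The sketch below assumes a hypothetical ask(prompt) helper wrapping whatever chat model you use; the prompts and the consensus check are my own illustrations, not the authors’ code.

```python
# A compressed sketch of the five MC stages; `ask` is a hypothetical LLM wrapper.
def ask(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM client here")

def medagents(question: str, max_rounds: int = 3) -> str:
    # (i) expert gathering
    experts = ask(f"List the medical specialties relevant to: {question}").splitlines()
    # (ii) analysis proposition
    analyses = [ask(f"As a {e} specialist, analyze: {question}") for e in experts]
    # (iii) report summarization
    report = ask("Summarize these analyses into one report:\n" + "\n".join(analyses))
    # (iv) collaborative consultation, revised until every expert agrees
    for _ in range(max_rounds):
        votes = [ask(f"As a {e} specialist, reply AGREE or object:\n{report}") for e in experts]
        if all(v.strip().upper().startswith("AGREE") for v in votes):
            break
        report = ask("Revise the report to address these objections:\n" + "\n".join(votes))
    # (v) decision making
    return ask(f"Given this agreed report, answer the question: {question}\n{report}")
```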

News


Resources

  • Steering Llama-2 with contrastive activation additions. By just adding e.g. a “sycophancy vector” to one bias term, we outperform supervised fine-tuning and few-shot prompting at steering completions to be more or less sycophantic. Furthermore, these techniques are complementary: we show evidence that we can get all three benefits at once! (A minimal steering sketch appears after this list.)
  • DiffusionEdge. DiffusionEdge is an innovative edge detection model that works better than current techniques. Through the integration of a diffusion probabilistic model, DiffusionEdge produces resource-efficient edge maps that are more precise and clean.
  • Transformers From Scratch. In this blog we’re going to walk through creating and training a transformer from scratch. We’ll go through each foundational element step by step and explain what is happening along the way. (A tiny attention block in this spirit appears after the list.)
  • Merge Large Language Models with mergekit. Model merging is a technique that combines two or more LLMs into a single model. It’s a relatively new and experimental method to create new models for cheap (no GPU required). Model merging works surprisingly well and has produced many state-of-the-art models on the Open LLM Leaderboard. In this tutorial, we will implement it using the mergekit library. (A bare-bones merge sketch follows this list.)
  • Mathematical Introduction to Deep Learning: Methods, Implementations, and Theory. This book aims to provide an introduction to the topic of deep learning algorithms. We review essential components of deep learning algorithms in full mathematical detail, including different artificial neural network (ANN) architectures and different optimization algorithms.
  • Portkey’s AI Gateway. The interface between your app and hosted LLMs. It streamlines API requests to OpenAI, Anthropic, Mistral, LLama2, Anyscale, Google Gemini, and more with a unified API.
  • act-plus-plus. Imitation learning algorithms and co-training for Mobile ALOHA.
  • crewAI. A cutting-edge framework for orchestrating role-playing, autonomous AI agents. By fostering collaborative intelligence, CrewAI empowers agents to work together seamlessly, tackling complex tasks.
  • Integrating CLIP and SAM for Enhanced Image Segmentation. This research presents Open-Vocabulary SAM, a framework that combines the strengths of the CLIP and SAM models to enhance image segmentation and recognition.
  • Diffusion Models for Reinforcement Learning: A Survey. A survey of diffusion models’ contributions to RL. This repository categorizes their applications and points to open interdisciplinary research opportunities.
  • tinygrad. A very simple implementation of inference for the new Mixtral MoE model using the tinygrad library.
  • YouTube Transcripts → Knowledge Graphs for RAG Applications. How to scrape YouTube video transcripts into a knowledge graph for Retrieval-Augmented Generation (RAG) applications.
  • AI Toolkit. AI Toolkit is a header-only C++ library that provides tools for building the brain of your game’s NPCs.
  • SpeechAgents. SpeechAgents is a multi-modal AI system that can mimic human speech with striking realism. Backed by a multi-modal LLM, it can manage up to 25 agents. Its ability to imitate human dialogue, with consistent content, realistic rhythms, and emotive expression, suggests promise for use in plays and audiobooks.
  • Model Card for Switch Transformers C — 2048 experts (1.6T parameters for 3.1 TB). Google’s Switch Transformer was among the first Mixture-of-Experts models to achieve success. It is now available on the Hugging Face Hub with code.
  • Make LLM Fine-tuning 2x faster with Unsloth and 🤗 TRL. Pulling your hair out because LLM fine-tuning is taking forever? In this post, we introduce a lightweight tool developed by the community to make LLM fine-tuning go super fast!
  • distilabel Orca Pairs for DPO. A novel technique for filtering high-quality preference pairs for alignment; it significantly improves performance over the baseline model.
  • Chatbot UI. The open-source AI chat app for everyone.
  • explain-then-translate. We propose a 2-stage Chain-of-Thought (CoT) prompting technique for program translation: we ask models to explain the source programs first before translating them. (A minimal prompt sketch follows this list.)
  • WhiteRabbitNeo-33B-v1. This model has been trained on both offensive and defensive security material. It is a general-purpose coding model that can help with cybersecurity tasks, meaning you can use it to learn how to defend against various attacks and vulnerabilities and to safeguard your networks.
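
The contrastive-activation-addition item is the most hands-on in this list, so here is a minimal sketch of the idea on a Hugging Face Llama-2 checkpoint: derive a steering vector from a contrastive pair of texts and add it to one decoder layer’s output during generation. The layer index (13), scale (4.0), and the contrastive prompts are illustrative assumptions, not the paper’s settings.

```python
# Sketch: steer a causal LM by adding a contrastive activation vector.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "meta-llama/Llama-2-7b-chat-hf"           # any Llama-2 chat checkpoint
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

@torch.no_grad()
def hidden_at(text, layer=13):                    # last-token activation at one layer
    ids = tok(text, return_tensors="pt").input_ids
    return model(ids, output_hidden_states=True).hidden_states[layer][0, -1]

# Steering vector = difference between a sycophantic and a blunt completion.
vec = hidden_at("Yes, you are completely right!") - hidden_at("No, that is wrong.")

def add_vec(module, inputs, output):              # add the vector to the residual stream
    if isinstance(output, tuple):
        return (output[0] + 4.0 * vec,) + output[1:]
    return output + 4.0 * vec

hook = model.model.layers[13].register_forward_hook(add_vec)
ids = tok("Is my plan any good?", return_tensors="pt").input_ids
print(tok.decode(model.generate(ids, max_new_tokens=40)[0]))
hook.remove()                                     # restore the unsteered model
```

Negating the scale should push completions the other way, which is the “more or less sycophantic” knob the post describes.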
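In the spirit of the from-scratch transformer walkthrough, a single-head scaled dot-product attention block fits in a few lines. Shapes are simplified (no masking, no multi-head split), so treat it as the kernel of the tutorial rather than a full layer.

```python
# Single-head scaled dot-product attention: softmax(QK^T / sqrt(d)) V.
import math
import torch

def attention(x, wq, wk, wv):
    q, k, v = x @ wq, x @ wk, x @ wv                      # project tokens
    scores = q @ k.transpose(-2, -1) / math.sqrt(k.size(-1))
    return torch.softmax(scores, dim=-1) @ v              # weighted mix of values

x = torch.randn(10, 64)                                   # 10 tokens, 64-dim
wq, wk, wv = (torch.randn(64, 64) for _ in range(3))
out = attention(x, wq, wk, wv)                            # shape (10, 64)
```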
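Model merging, in its simplest form, reduces to interpolating two same-architecture checkpoints weight by weight. The sketch below shows a uniform linear merge with placeholder model names; mergekit itself implements richer methods (SLERP, TIES, passthrough), so this only conveys the core idea.

```python
# Bare-bones linear merge of two same-architecture checkpoints.
import torch
from transformers import AutoModelForCausalLM

a = AutoModelForCausalLM.from_pretrained("org/model-a")   # placeholder names
b = AutoModelForCausalLM.from_pretrained("org/model-b")   # same architecture

merged = a.state_dict()
for name, wb in b.state_dict().items():
    if torch.is_floating_point(wb):                       # skip integer buffers
        merged[name] = 0.5 * merged[name] + 0.5 * wb      # uniform interpolation

a.load_state_dict(merged)
a.save_pretrained("model-merged")
```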
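Finally, the explain-then-translate item is just a two-stage prompt. A minimal rendering, again assuming a hypothetical ask(prompt) wrapper around any chat LLM (the prompt wording is mine, not the paper’s):

```python
# Two-stage CoT prompting for program translation: explain, then translate.
def ask(prompt: str) -> str:
    raise NotImplementedError("plug in your LLM client here")

def translate(src_code: str, src="Python", dst="JavaScript") -> str:
    # Stage 1: ask the model to explain the source program.
    explanation = ask(f"Explain, step by step, what this {src} program does:\n{src_code}")
    # Stage 2: translate with the explanation as context.
    return ask(f"Using this explanation:\n{explanation}\n"
               f"Translate the {src} program below into {dst}:\n{src_code}")
```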

Perspectives

  • How to Build a Thinking AI. This article provides an analytical framework for how to simulate human-like thought processes within a computer. It describes how attention and memory should be structured, updated, and utilized to search for associative additions to the stream of thought.
  • The New York Times’ AI Opportunity. In its case against OpenAI and Microsoft, the New York Times alleges that the companies’ AI technologies — ChatGPT among them — were trained on millions of copyrighted articles from the newspaper, resulting in outputs that are directly competitive with the Times’ services. The lawsuit challenges the legality of AI training practices and the effects of AI on traditional content creators, claiming that this amounts to copyright infringement and jeopardizes the newspaper’s investment in journalism. It also demands the destruction of AI models and data that used Times content, along with billions of dollars in damages.
  • Does AI risk “other” the AIs? This essay analyzes Robin Hanson’s critique of AI-risk discourse, focusing on the idea of “othering” AIs and the moral ramifications of regulating or changing future AI and its values. According to Hanson, fearing AI as an “other” is a bias; his view, though, may undervalue the dangers of unchecked AI growth and the difficulty of aligning future AI values with human ethics.
  • Part One: One-Year Anniversary of ChatGPT. Has AI Become the New Tech Platform? This article introduces the “Anatomy Framework”, a tool for evaluating the disruptive potential of any breakthrough, including artificial intelligence. It examines innovation from five perspectives: apps, tools, core platform, underlying infrastructure, and ecosystem facilitators. It also covers the role of innovators, both new and established, and the innovation medium (hardware vs. software).
  • AI and the Future of SaaS. Today, let’s look into the crystal ball and see a few opportunities, challenges, and threats that AI systems may pose for software entrepreneurs and creators.
  • Benchmarking GPT-4 Turbo — A Cautionary Tale. GPT-4 successfully finished 70% of the programming tasks, while GPT-4 Turbo came in slightly behind at 68.8%. Interestingly, GPT-4 Turbo needed more attempts than GPT-4, which may indicate that it lacks GPT-4’s memory of its training data; a further test supported this.
  • Unraveling spectral properties of kernel matrices. This article examines how the eigenvalues of various kernel matrices vary and what that implies for learning properties.
  • NVIDIA’s CEO on Leading Through the A.I. Revolution. In this podcast, NVIDIA CEO and co-founder Jensen Huang shares his thoughts on how he steers his company through rapidly changing times and offers advice to other entrepreneurs on how to stay competitive by incorporating AI into their operations.
  • It’s Humans All the Way Down. People believe AI will replace a lot of jobs because everyone assumes everyone else’s work is simple; the desire to take humans out of the loop is rooted in that ignorance. Even in the wildest ideas, people still matter: humans want to be seen and understood by other humans.

Meme of the week

What do you think? Did any of this news catch your attention? Let me know in the comments.

If you have found this interesting:

You can look for my other articles, or connect with me on LinkedIn (I am open to collaborations and projects). Check this repository containing weekly updated ML & AI news.

Here is the link to my GitHub repository, where I am collecting code and many resources related to machine learning, artificial intelligence, and more.

or you may be interested in one of my recent articles:


Salvatore Raieli

Senior data scientist | about science, machine learning, and AI. Top writer in Artificial Intelligence