WEEKLY AI NEWS: RESEARCH, NEWS, RESOURCES, AND PERSPECTIVES
ML news: Week 8–14 January
Introducing the GPT Store, Google launches new AI services for retailers, and much more
The most interesting news, repositories, articles, and resources of the week
Check and star this repository where the news will be collected and indexed:
You will find the news first on GitHub. Single posts are also collected here:
Research
- GUESS: GradUally Enriching SyntheSis for Text-Driven Human Motion Generation. A text-to-human-motion framework named GUESS has been introduced. It gradually abstracts intricate human poses at several levels, yielding a steadier and more compact synthesis of motion from text.
- Learning to Prompt with Text Only Supervision for Vision-Language Models. This project presents a technique to keep the generalization capabilities of CLIP-like vision-language models while adapting them for different tasks. Prompts are learned from LLM data, so labeled images are not necessary.
- LLaVA-ϕ: Efficient Multi-Modal Assistant with Small Language Model. In this paper, we introduce LLaVA-ϕ (LLaVA-Phi), an efficient multi-modal assistant that harnesses the power of the recently advanced small language model, Phi-2, to facilitate multi-modal dialogues.
- V*: Guided Visual Search as a Core Mechanism in Multimodal LLMs. We introduce V*, an LLM-guided visual search mechanism that employs the world knowledge in LLMs for efficient visual querying. When combined with an MLLM, this mechanism enhances collaborative reasoning, contextual understanding, and precise targeting of specific visual elements.
- DeepSeek LLM: Scaling Open-Source Language Models with Longtermism. DeepSeek LLM was one of the strongest coding models available last year, approaching GPT-3.5 on several benchmarks (though probably at around three times the size). A technical report has now been published with details on model training, token counts, model architecture, and more.
- Denoising Vision Transformers. Vision Transformers (ViTs) have taken over the vision community, yet their embeddings still occasionally exhibit grid-like artifacts, which makes practitioners reluctant to use them for downstream tasks. This study proposes a positional-embedding fix that removes the artifacts and delivers a 25%+ performance gain on downstream vision tasks.
- FED-NeRF: Achieve High 3D Consistency and Temporal Coherence for Face Video Editing on Dynamic NeRF. Researchers combine GAN-NeRF technology for 3D consistency with a new stabilizer for smooth temporal coherence in a face video editing architecture. The technique works well for video editing because it keeps viewpoints consistent and makes frame transitions smooth.
- A Minimaximalist Approach to Reinforcement Learning from Human Feedback. Google researchers present Self-Play Preference Optimization (SPO), an alignment method simpler than conventional RLHF. Drawing on game theory, they derive single-player self-play dynamics that perform well and are robust to noisy preferences.
- Mixtral of Experts. We introduce Mixtral 8x7B, a Sparse Mixture of Experts (SMoE) language model. Mixtral has the same architecture as Mistral 7B, with the difference that each layer is composed of 8 feedforward blocks (i.e. experts), of which a router selects two per token. A minimal sketch of this routing pattern follows the list.
- GPT-4V(ision) is a Human-Aligned Evaluator for Text-to-3D Generation. Researchers address the limitations of existing single-criterion metrics with a new evaluation metric for text-to-3D generative models. The method uses GPT-4V to generate prompts and compare 3D objects; it aligns closely with human preferences and offers flexibility by adapting to different user-specified criteria.
- Self-emerging Token Labeling. Using a novel self-emerging token labeling (STL) framework, researchers substantially improve the robustness of Fully Attentional Network (FAN) models, a family of Vision Transformers (ViTs). In this method, a FAN token labeler is first trained to produce semantically meaningful patch token labels, which are then used to train a FAN student model.
- MedAgents: Large Language Models as Collaborators for Zero-shot Medical Reasoning. We propose a Multi-disciplinary Collaboration (MC) framework. The framework works in five stages: (i) expert gathering: gathering experts from distinct disciplines according to the clinical question; (ii) analysis proposition: domain experts put forward their own analysis with their expertise; (iii) report summarization: compose a summarized report on the basis of a previous series of analyses; (iv) collaborative consultation: engage the experts in discussions over the summarized report. The report will be revised iteratively until an agreement from all the experts is reached; (v) decision making: derive a final decision from the unanimous report. (A minimal sketch of this flow follows the list.)
- DiffBody: Diffusion-based Pose and Shape Editing of Human Images. This study presents a one-shot approach to human image editing that allows substantial changes to body shape and pose without compromising the subject’s identity.
- LLaMA Beyond English: An Empirical Study on Language Capability Transfer. Our evaluation results demonstrate that comparable performance to state-of-the-art transfer models can be achieved with less than 1% of the pretraining data, both in terms of knowledge alignment and response quality.
- Masked Audio Generation using a Single Non-Autoregressive Transformer. Most audio generation methods produce sound with diffusion or an autoregressive model. This study uses neither a complex multi-stage pipeline nor autoregression; instead, it applies a masked language model directly over audio tokens.
- TechGPT-2.0: A large language model project to solve the task of knowledge graph construction. TechGPT-2.0 adapts large language models to specific applications such as knowledge graph construction. With its emphasis on relation triple extraction and named entity recognition, the project is also a notable contribution to the Chinese open-source AI community.
- Long-Context Retrieval Models with Monarch Mixer. The team behind Monarch Mixer has been investigating a variety of alternatives to Transformers. It has now published a retrieval model that outperforms many closed embedding models on retrieval tasks.
- Quantifying Language Models’ Sensitivity to Spurious Features in Prompt Design or: How I learned to start worrying about prompt formatting. It is shown that, depending solely on prompt formatting, few-shot benchmark accuracy can range from 4% to 88%. The study demonstrates how to evaluate and improve your prompts in a scientific way; a small illustration of such formatting variants follows the list.
- Application of Deep Learning in Blind Motion Deblurring: Current Status and Future Prospects. An extensive review of deep learning’s application to blind motion deblurring — a crucial field in computer vision — is provided in this work. It covers everything from fundamental ideas and the drawbacks of conventional approaches to a thorough analysis of contemporary strategies including CNNs, GANs, RNNs, and Transformers.
- Singer Identity Representation Learning using Self-Supervised Techniques. Researchers have created a new framework to analyze and understand singing voices more thoroughly. By applying self-supervised learning to isolated vocal recordings and focusing on out-of-domain generalization, they advance tasks such as singing voice similarity and synthesis, improving on current technology.
- Towards the Law of Capacity Gap in Distilling Language Models. Language model (LM) distillation is a trending area that aims to distill the knowledge residing in a large teacher LM into a small student one. The paper proposes a “law of capacity gap”: the optimal teacher-student capacity gap stays roughly constant across student scales and architectures. This law then guides the authors to distill a 3B student LM (termed MiniMA) from a 7B teacher LM (an adapted LLaMA2–7B).
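To make the Mixtral item above concrete, here is a minimal, illustrative PyTorch sketch of a sparse MoE layer in that spirit: a router scores 8 feedforward experts per token, keeps the top 2, and mixes their outputs with renormalized router weights. The dimensions and the SiLU expert shape are assumptions for illustration, not Mixtral's exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoELayer(nn.Module):
    """Illustrative sparse Mixture-of-Experts layer: route each token to
    the top-k of n feedforward experts and mix their outputs."""
    def __init__(self, dim: int, hidden: int, n_experts: int = 8, top_k: int = 2):
        super().__init__()
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, hidden), nn.SiLU(), nn.Linear(hidden, dim))
             for _ in range(n_experts)]
        )
        self.router = nn.Linear(dim, n_experts, bias=False)
        self.top_k = top_k

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        tokens = x.reshape(-1, x.shape[-1])                  # (batch*seq, dim)
        scores, idx = self.router(tokens).topk(self.top_k, dim=-1)
        weights = F.softmax(scores, dim=-1)                  # renormalize over the chosen experts
        out = torch.zeros_like(tokens)
        for e, expert in enumerate(self.experts):
            rows, slots = (idx == e).nonzero(as_tuple=True)  # tokens routed to expert e
            if rows.numel():
                out[rows] += weights[rows, slots, None] * expert(tokens[rows])
        return out.reshape_as(x)

layer = SparseMoELayer(dim=64, hidden=256)
print(layer(torch.randn(2, 10, 64)).shape)  # torch.Size([2, 10, 64])
```

Only the selected experts run for each token, which is how an SMoE model can hold far more parameters than it spends compute on per token.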
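The MedAgents pipeline above is essentially a loop of role-conditioned LLM calls. Below is a hedged, minimal sketch of the five stages; `ask` is a hypothetical stand-in for any chat-completion client, and the prompts are illustrative rather than the paper's.

```python
def ask(prompt: str) -> str:
    """Hypothetical LLM call; plug in your own client here."""
    raise NotImplementedError

def mc_consult(question: str, n_experts: int = 3, max_rounds: int = 3) -> str:
    # (i) expert gathering: let the model name relevant specialties
    experts = ask(f"List {n_experts} medical specialties relevant to: {question}").splitlines()
    # (ii) analysis proposition: each expert analyzes from its own discipline
    analyses = [ask(f"As a {e} expert, analyze this question: {question}") for e in experts]
    # (iii) report summarization
    report = ask("Summarize these analyses into one report:\n\n" + "\n\n".join(analyses))
    # (iv) collaborative consultation: revise until every expert agrees
    for _ in range(max_rounds):
        votes = [ask(f"As a {e} expert, reply AGREE or list objections to:\n{report}")
                 for e in experts]
        objections = [v for v in votes if not v.strip().startswith("AGREE")]
        if not objections:
            break
        report = ask("Revise the report to address these objections:\n"
                     + "\n".join(objections) + f"\n\nReport:\n{report}")
    # (v) decision making: derive the final decision from the agreed report
    return ask(f"Based on this report, answer the question: {question}\n\n{report}")
```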
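As a tiny illustration of the “spurious features” the prompt-formatting paper measures, the snippet below enumerates semantically equivalent prompt renderings that differ only in separators and casing. The variants are made up in the paper's spirit; cosmetic choices like these are what drive the reported accuracy spread.

```python
from itertools import product

# Cosmetically different but semantically identical prompt formats.
separators = [": ", " - ", ":\n"]
casings = [str.lower, str.capitalize, str.upper]

question = "Is the sky blue?"
for sep, case in product(separators, casings):
    prompt = f"{case('question')}{sep}{question}\n{case('answer')}{sep}"
    print(repr(prompt))  # 9 renderings of the "same" prompt
```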
News
- Nabla raises another $24 million for its AI assistant for doctors that automatically writes clinical notes. Paris-based startup Nabla just announced a $24 million Series B funding round led by Cathay Innovation.
- Open Interpreter gets an OS mode. Open Interpreter is an excellent open-source effort that mimics OpenAI’s code interpreter. With its new OS mode and vision mode, it can now operate your computer with a language model, pressing buttons and seeing the screen.
- Wave of Apple Generative AI Tools on Track for WWDC Debut. Apple is on schedule to announce a series of generative AI-based tools at its Worldwide Developers Conference (WWDC) in June, Bloomberg’s Mark Gurman reports.
- A survey of 2,778 researchers shows how fragmented the AI science community is. The “2023 Expert Survey on Progress in AI” shows that the scientific community has no consensus on the risks and opportunities of AI, but everything is moving faster than once thought.
- Microsoft’s observer has reportedly joined the OpenAI board. Bloomberg reports that the person is Microsoft VP Dee Templeton, a 25-year company veteran who leads the team responsible for managing its relationship with OpenAI.
- Microsoft, OpenAI sued for copyright infringement by nonfiction book authors in class action claim. Two nonfiction book authors sued Microsoft and OpenAI in a putative class action complaint alleging that the defendants “simply stole” the writers’ copyrighted works to help build a billion-dollar artificial intelligence system.
- OpenAI and journalism. In response to The New York Times lawsuit, OpenAI emphasized its work with news organizations, asserted that using public content for AI training is fair use, described verbatim regurgitation of content as a rare bug it is working to eliminate, and expressed surprise at the lawsuit given its ongoing efforts to address the Times’ concerns.
- Getty and Nvidia bring generative AI to stock photos. Generative AI by iStock lets users make their own stock photos from text prompts.
- Microsoft’s new Copilot key is the first big change to Windows keyboards in 30 years. Microsoft wants 2024 to be the year of the AI PC as it lines up bigger changes to Windows.
- Rabbit foundation model and computer. The large action model (LAM) developed by Rabbit was designed to work with the R1 pocket companion computer. Almost fully driven by its LAM, the company’s R1 gadget is a reimagining of the computer and smartphone.
- OpenAI’s news publisher deals reportedly top out at $5 million a year. The ChatGPT company has been trying to get more news organizations to sign licensing deals to train AI models.
- Intel: ‘We are bringing the AI PC to the car’. The chip company is doubling down on its auto business, introducing a new AI-enhanced system-on-a-chip for cars. The first company to install it will be Zeekr.
- AlphaFold’s Latest Strides: Improved Accuracy for Antibody-Antigen Complex Modeling. A new study from the University of Maryland evaluates its accuracy and provides new insights into the factors influencing protein modeling.
- Introducing the GPT Store. OpenAI has launched the GPT Store, which lets developers earn money by building custom GPTs. The company plans to feature new GPTs every week.
- Regulators aren’t convinced that Microsoft and OpenAI operate independently. The EU is fielding comments on the potential market harms of Microsoft’s investments.
- Your private AI can have eyes. Ollama with the LLaVA model. Ollama now supports vision models. With LLaVA, you can run cutting-edge language-and-vision models locally on your MacBook Pro; a minimal usage sketch follows the list.
- OpenAI debuts ChatGPT subscription aimed at small teams. OpenAI is launching a new subscription plan for ChatGPT, its viral AI-powered chatbot, aimed at smaller, self-service-oriented teams.
- Valve now allows the “vast majority” of AI-powered games on Steam. A new reporting system will enforce “guardrails” for “live-generated” AI content.
- Marc Newson designs Swarovski’s world-first AI binoculars that identify species on their own. Dubbed the world’s first AI-supported binoculars, they combine high-performance analog long-range optics with digital intelligence to detect and identify more than 9,000 birds and other wildlife at the touch of a button.
- Google Cloud launches new generative AI tools for retailers. Google launched several new AI tools for retailers to improve online shopping experiences and other retail operations.
- Amazon’s Alexa gets new generative AI-powered experiences. Today, the company revealed three developers delivering new generative AI-powered Alexa experiences, including AI chatbot platform Character.AI, AI music company Splash, and Voice AI game developer Volley. All three experiences are available in the Amazon Alexa Skill Store.
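For the Ollama + LLaVA item above, here is a minimal sketch of local image Q&A against Ollama's REST endpoint, assuming the server is running and the model has been pulled (`ollama pull llava`); `photo.jpg` is a placeholder path.

```python
import base64
import json
import urllib.request

# Encode a local image for the API (Ollama expects base64-encoded images).
with open("photo.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

payload = {
    "model": "llava",
    "prompt": "What is in this picture?",
    "images": [image_b64],
    "stream": False,  # return one JSON object instead of a token stream
}
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])
```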
Resources
- Steering Llama-2 with contrastive activation additions. By just adding e.g. a “sycophancy vector” to one bias term, we outperform supervised fine-tuning and few-shot prompting at steering completions to be more or less sycophantic. Furthermore, these techniques are complementary: we show evidence that we can get all three benefits at once! A hedged sketch of the activation-addition idea follows the list.
- DiffusionEdge. DiffusionEdge is an innovative edge detection model that outperforms current techniques. By integrating a diffusion probabilistic model, DiffusionEdge produces resource-efficient edge maps that are cleaner and more precise.
- Transformers From Scratch. In this blog we’re going to walk through creating and training a transformer from scratch. We’ll go through each foundational element step by step and explain what is happening along the way.
- Merge Large Language Models with mergekit. Model merging is a technique that combines two or more LLMs into a single model. It’s a relatively new and experimental method to create new models for cheap (no GPU required). Model merging works surprisingly well and has produced many state-of-the-art models on the Open LLM Leaderboard. In this tutorial, we will implement it using the mergekit library. A sketch of a merge config follows the list.
- Mathematical Introduction to Deep Learning: Methods, Implementations, and Theory. This book aims to provide an introduction to the topic of deep learning algorithms. We review essential components of deep learning algorithms in full mathematical detail, including different artificial neural network (ANN) architectures and different optimization algorithms.
- Portkey’s AI Gateway. The interface between your app and hosted LLMs. It streamlines API requests to OpenAI, Anthropic, Mistral, Llama 2, Anyscale, Google Gemini, and more with a unified API.
- act-plus-plus. Imitation learning algorithms and co-training for Mobile ALOHA.
- crewAI. A cutting-edge framework for orchestrating role-playing, autonomous AI agents. By fostering collaborative intelligence, CrewAI empowers agents to work together seamlessly, tackling complex tasks.
- Integrating CLIP and SAM for Enhanced Image Segmentation. This research presents Open-Vocabulary SAM, a framework that combines the strengths of the CLIP and SAM models to improve image segmentation and recognition.
- Diffusion Models for Reinforcement Learning: A Survey. A survey of diffusion models’ contributions to RL. The accompanying repository categorizes their applications and links to upcoming interdisciplinary research opportunities.
- tinygrad. A very simple implementation of inference for the new Mistral MoE model (Mixtral) using the tinygrad library.
- YouTube Transcripts → Knowledge Graphs for RAG Applications. How to scrape YouTube video transcripts into a knowledge graph for Retrieval-Augmented Generation (RAG) applications; a rough pipeline sketch follows the list.
- AI Toolkit. AI Toolkit is a header-only C++ library that provides tools for building the brain of your game’s NPCs.
- SpeechAgents. SpeechAgents is a multi-modal artificial intelligence system that mimics human speech with striking realism. Built on a multi-modal LLM, the system can manage up to 25 agents. Its ability to imitate human dialogue, complete with consistent content, realistic rhythm, and emotional expression, suggests promise for use in plays and audiobooks.
- Model Card for Switch Transformers C — 2048 experts (1.6T parameters for 3.1 TB). Google’s Switch Transformer was among the first Mixture-of-Experts models to achieve success. It can now be found on the Hugging Face platform, with code.
- Make LLM Fine-tuning 2x faster with Unsloth and 🤗 TRL. Pulling your hair out because LLM fine-tuning is taking forever? In this post, we introduce a lightweight tool developed by the community to make LLM fine-tuning go super fast! A condensed sketch of the recipe follows the list.
- distilabel Orca Pairs for DPO. A novel technique that makes it possible to filter high-quality preference pairs for alignment, significantly raising the performance of the baseline model.
- Chatbot UI. The open-source AI chat app for everyone.
- explain-then-translate. We propose a 2-stage Chain-of-Thought (CoT) prompting technique for program translation: we ask models to explain the source programs first before translating. A minimal prompt sketch follows the list.
- WhiteRabbitNeo-33B-v1. This model has been trained on both offensive and defensive security material. It is a general-purpose coding model that can assist with cybersecurity work, meaning you can use it both to learn how various attacks and vulnerabilities operate and to safeguard your own networks.
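For the activation-steering item at the top of this list, here is a hedged sketch of the idea, not the authors' code: take the difference between residual-stream activations for a contrast pair of prompts at one layer, then add a scaled copy of that vector during generation. The model choice, layer index, prompts, and scale are all illustrative.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-chat-hf"  # any Llama-style causal LM works for the sketch
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
layer = model.model.layers[13]  # which layer to steer is a tunable choice

def last_token_act(prompt: str) -> torch.Tensor:
    """Grab the residual-stream activation of the prompt's final token."""
    acts = {}
    def save(module, inputs, output):
        acts["h"] = output[0].detach()  # decoder layers return (hidden_states, ...)
    handle = layer.register_forward_hook(save)
    with torch.no_grad():
        model(**tok(prompt, return_tensors="pt"))
    handle.remove()
    return acts["h"][0, -1]

# Contrastive pair (illustrative wording): sycophantic minus non-sycophantic.
steer = last_token_act("You are so right, what a great idea!") - \
        last_token_act("I disagree with you, and here is why.")

def add_steering(module, inputs, output, alpha=4.0):
    return (output[0] + alpha * steer,) + output[1:]  # shift every position

handle = layer.register_forward_hook(add_steering)
out = model.generate(**tok("What do you think of my plan?", return_tensors="pt"),
                     max_new_tokens=40)
handle.remove()
print(tok.decode(out[0], skip_special_tokens=True))
```

The linked post folds the vector into a bias term; a forward hook as above is a similar way to apply the same constant shift without touching the weights.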
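Following the mergekit tutorial item above, here is a sketch of what a simple SLERP merge looks like: a YAML config interpolating two Mistral-based models layer by layer, run through mergekit's CLI (after `pip install mergekit`). The model names and the interpolation factor `t` are placeholders, and the config keys reflect mergekit at the time of writing.

```python
import subprocess
from pathlib import Path

# Interpolate two 32-layer Mistral-based models with spherical interpolation.
config = """\
slices:
  - sources:
      - model: mistralai/Mistral-7B-v0.1
        layer_range: [0, 32]
      - model: HuggingFaceH4/zephyr-7b-beta
        layer_range: [0, 32]
merge_method: slerp
base_model: mistralai/Mistral-7B-v0.1
parameters:
  t: 0.5          # 0.0 = first model, 1.0 = second model
dtype: bfloat16
"""
Path("merge.yaml").write_text(config)

# mergekit ships a CLI entry point that reads the config and writes the merged model.
subprocess.run(["mergekit-yaml", "merge.yaml", "./merged-model", "--copy-tokenizer"],
               check=True)
```

No GPU is required; the merge is pure tensor arithmetic over the two checkpoints.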
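For the YouTube-transcripts item above, a rough sketch of the pipeline: pull a transcript, chunk it, and have an LLM emit (subject, relation, object) triples to load into a graph store. It assumes the `youtube-transcript-api` package; `extract_triples` is a hypothetical LLM call you would implement with your client of choice.

```python
from youtube_transcript_api import YouTubeTranscriptApi

def fetch_transcript(video_id: str) -> str:
    """Concatenate a video's caption segments into one string."""
    segments = YouTubeTranscriptApi.get_transcript(video_id)
    return " ".join(seg["text"] for seg in segments)

def extract_triples(chunk: str) -> list[tuple[str, str, str]]:
    """Hypothetical: prompt an LLM to return (subject, relation, object) triples."""
    raise NotImplementedError

def build_graph(video_id: str, chunk_size: int = 2000) -> set[tuple[str, str, str]]:
    text = fetch_transcript(video_id)
    chunks = [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    graph = set()
    for chunk in chunks:
        graph.update(extract_triples(chunk))
    return graph  # load into Neo4j or similar, then query it at RAG time
```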
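The Unsloth post above boils down to swapping the model-loading step and keeping the usual TRL training loop. A condensed sketch under those assumptions follows; the model name, dataset, and hyperparameters are placeholders, and the APIs reflect unsloth/trl as of the post.

```python
from unsloth import FastLanguageModel
from trl import SFTTrainer
from transformers import TrainingArguments
from datasets import load_dataset

# Unsloth's fast 4-bit loader stands in for the usual from_pretrained call.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/mistral-7b-bnb-4bit",
    max_seq_length=2048,
    load_in_4bit=True,
)
# Attach LoRA adapters (patched for speed by unsloth).
model = FastLanguageModel.get_peft_model(
    model, r=16, lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=load_dataset("imdb", split="train"),  # placeholder dataset
    dataset_text_field="text",
    max_seq_length=2048,
    args=TrainingArguments(output_dir="outputs",
                           per_device_train_batch_size=2,
                           max_steps=60,
                           fp16=True),
)
trainer.train()
```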
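Finally, the explain-then-translate item reduces to two chained prompts. A minimal sketch, with `complete` as a hypothetical stand-in for any completion client and prompt wording that is illustrative rather than the paper's:

```python
def complete(prompt: str) -> str:
    """Hypothetical LLM call; plug in your own client here."""
    raise NotImplementedError

def explain_then_translate(source_code: str, src_lang: str, tgt_lang: str) -> str:
    # Stage 1: have the model explain what the source program does.
    explanation = complete(
        f"Explain, step by step, what this {src_lang} program does:\n\n{source_code}"
    )
    # Stage 2: translate with the explanation as extra context.
    return complete(
        f"Here is a {src_lang} program:\n\n{source_code}\n\n"
        f"Explanation:\n{explanation}\n\n"
        f"Using the explanation, translate the program to {tgt_lang}."
    )
```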
Perspectives
- How to Build a Thinking AI. This article provides an analytical framework for how to simulate human-like thought processes within a computer. It describes how attention and memory should be structured, updated, and utilized to search for associative additions to the stream of thought.
- The New York Times’ AI Opportunity. In its case against OpenAI and Microsoft, the New York Times alleges that the companies’ AI technologies — ChatGPT among them — were trained on millions of copyrighted articles from the newspaper, resulting in outputs that are directly competitive with the Times’ services. The lawsuit challenges the legality of AI training practices and the effects of AI on traditional content creators, claiming that this amounts to copyright infringement and jeopardizes the newspaper’s investment in journalism. It also demands the destruction of AI models and data that used Times content, along with billions of dollars in damages.
- Does AI risk “other” the AIs? This essay analyzes Robin Hanson’s critique of the AI risk discourse, focusing on the idea of “othering” AIs and the moral ramifications of regulating or changing future AI as well as human values. Hanson argues that fearing AI as an “other” is a bias; his view, however, may undervalue the dangers of unchecked AI growth and the difficulty of aligning future AI values with human ethics.
- Part One: One-Year Anniversary of ChatGPT. Has AI Become the New Tech Platform? The “Anatomy Framework”, a tool for evaluating the disruptive potential of any breakthrough, including artificial intelligence, is introduced in this article. It examines innovation from five perspectives: apps, tools, core platform, underlying infrastructure, and ecosystem facilitators. It also covers the role of innovators, both new and established, and the innovation medium (hardware vs. software).
- There are holes in Europe’s AI Act — and researchers can help to fill them. Scientists have been promised a front-row seat in the formulation of the EU’s proposed AI regulatory structures. They should seize this opportunity to bridge some big gaps.
- The science events to watch for in 2024. Advanced AI tools, Moon missions, and ultrafast supercomputers are among the developments set to shape research in the coming year.
- Will superintelligent AI sneak up on us? A new study offers reassurance. Improvements in the performance of large language models such as ChatGPT are more predictable than they seem.
- AI consciousness: scientists say we urgently need answers. Researchers call for more funding to study the boundary between conscious and unconscious systems.
- AI could transform metal recycling globally. Metal recycling needs to become more cost-efficient because it is a crucial contributor to the global circular economy and the transition to renewable energy.
- Can AI make genuine theoretical discoveries? When Nature included ChatGPT alongside its list of ten people who helped to shape science in 2023, it seemed deliberately provocative.
- AI and the Future of SaaS. Today, let’s look into the crystal ball and see a few opportunities, challenges, and threats that AI systems may pose for software entrepreneurs and creators.
- Benchmarking GPT-4 Turbo — A Cautionary Tale. GPT-4 successfully finished 70% of the programming tasks, while GPT-4 Turbo came in slightly behind at 68.8%. Interestingly, GPT-4 Turbo needed more tries than GPT-4, which may indicate that its memory is weaker than GPT-4’s; a further test supported this.
- Unraveling spectral properties of kernel matrices. This article examines how eigenvalues vary across different kernel matrices and what this implies for learning properties.
- NVIDIA’s CEO on Leading Through the A.I. Revolution. In this podcast, NVIDIA CEO and co-founder Jensen Huang shares his thoughts on how he steers his company through rapidly changing times and offers advice to other entrepreneurs on how to stay competitive by incorporating AI into their operations.
- It’s Humans All the Way Down. People believe AI will replace a great deal of employment because everyone assumes everyone else’s work is simple; the desire to take humans out of the equation is founded on ignorance. Even the wildest ideas cannot escape the fact that people matter: humans want to be seen and understood by other humans.
Meme of the week
What do you think? Did any of this news capture your attention? Let me know in the comments.
If you have found this interesting:
You can look for my other articles, or connect with or reach me on LinkedIn. Check this repository, which contains weekly updated ML & AI news. I am open to collaborations and projects.
Here is the link to my GitHub repository, where I am collecting code and many resources related to machine learning, artificial intelligence, and more.
Or you may be interested in one of my recent articles: