WEEKLY AI NEWS: RESEARCH, NEWS, RESOURCES, AND PERSPECTIVES

ML news: Week 6–12 November

OpenAI dev, TopicGPT, new chips, and much more

Salvatore Raieli

14 min read · Nov 13, 2023

ML news — Photo by Tim Mossholder on Unsplash

The most interesting news, repositories, articles, and resources of the week

Check and star this repository where the news will be collected and indexed:

GitHub: SalvatoreRa/ML-news-of-the-week. A collection of the best ML news every week (research, news, resources).

github.com

You will find the news first on GitHub. Single posts are also collected here:

Salvatore Raieli

Weekly AI and ML news - each week the best of the field

Research

RT-Sketch: Goal-Conditioned Imitation Learning from Hand-Drawn Sketches. Hand-drawn sketches as a modality for goal specification in visual imitation learning. You sketch and the robot executes; in other words, you can communicate with the robot through a sketch. Here is the official article.

Cheating Depth: Enhancing 3D Surface Anomaly Detection via Depth Simulation. RGB-based surface anomaly detection methods have advanced significantly. However, certain surface anomalies remain practically invisible in RGB alone, necessitating the incorporation of 3D information. This new approach, which combines 3D data with RGB, outperforms traditional methods for surface anomaly detection. official code.
Gaussian Mixture Solvers for Diffusion Models. Recently, diffusion models have achieved great success in generative tasks. Gaussian mixture solvers improve the model in both speed and quality. official code.
PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis. This paper introduces PixArt-α, a Transformer-based T2I diffusion model whose image generation quality is competitive with state-of-the-art image generators. The model uses three elements: T5 text encodings, cross-attention, and a diffusion transformer.

Language Models are Super Mario: Absorbing Abilities from Homologous Models as a Free Lunch. In this paper, we uncover that Language Models (LMs), either encoder- or decoder-based, can obtain new capabilities by assimilating the parameters of homologous models without retraining or GPUs. official code.
An Efficient Self-Supervised Cross-View Training For Sentence Embedding. Self-supervised Cross-View Training (SCT) allows efficient sentence embedding with small language models. official code.
A Systematic Review of Deep Graph Neural Networks: Challenges, Classification, Architectures, Applications & Potential Utility in Bioinformatics. Apart from presenting all existing GNN models, mathematical analysis and comparison of the variants of all types of GNN have been highlighted in this survey. Graph neural networks are investigated for their potential real-world applications in various fields, focusing on Bioinformatics.
How AI could lead to a better understanding of the brain. Early machine-learning systems were inspired by neural networks; now AI might allow neuroscientists to get to grips with the brain’s unique complexities.
How AI can help to save endangered species. Scientists are using artificial intelligence to fight biodiversity loss by analyzing vast amounts of data, monitoring ecosystems, and spotting trends over time.

Pretraining Data Mixtures Enable Narrow Model Selection Capabilities in Transformer Models. An article from Google providing experimental evidence that the transformer (and therefore LLMs) cannot generalize beyond the training data. This is an indication that the transformer will not be the architecture leading us to artificial general intelligence (AGI).
RobustMat: Neural Diffusion for Street Landmark Patch Matching under Challenging Environments. For autonomous vehicles (AVs), visual perception techniques based on sensors like cameras play crucial roles in information acquisition and processing. In various computer perception tasks for AVs, it may be helpful to match landmark patches taken by an onboard camera with other landmark patches captured at a different time or saved in a street scene image database. Using spatial information and neural differential equations, the authors have created an approach to improve landmark matching. official code.
I2VGen-XL: High-Quality Image-to-Video Synthesis via Cascaded Diffusion Models. Video synthesis has recently made remarkable strides benefiting from the rapid development of diffusion models. However, it still encounters challenges in terms of semantic accuracy, clarity, and spatio-temporal continuity. This new approach is composed of two stages: the first preserves the static image’s content, and the second refines details and resolution.
Rethinking Benchmark and Contamination for Language Models with Rephrased Samples. We know that better data improves LLM training; here is a better way to clean your data. The authors have published the decontaminator tool here.
Hallucination in LLMs. We begin with an innovative taxonomy of LLM hallucinations, then delve into the factors contributing to hallucinations. Subsequently, we present a comprehensive overview of hallucination detection methods and benchmarks.
Simplifying Transformer Blocks. Combining signal propagation theory and empirical observations, we motivate modifications that allow many block components to be removed with no loss of training speed, including skip connections, projection or value parameters, sequential sub-blocks, and normalization layers. official code.
LLaVA-Med: Large Language and Vision Assistant for BioMedicine. LLaVA-Med was initialized with the general-domain LLaVA and then continuously trained in a curriculum learning fashion (first biomedical concept alignment then full-blown instruction-tuning). We evaluated LLaVA-Med on standard visual conversation and question-answering tasks. official repository.

Teaching is Hard: How to Train Small Models and Outperforming Large Counterparts

Distilling the knowledge of a large model is complex but a new method shows incredible performances

towardsdatascience.com

News

Google Research scholar program. The Research Scholar Program aims to support early-career professors who are pursuing research in fields relevant to Google.
OpenAI DevDay Buzz Includes Alleged Leak Of New ChatGPT Prototype. Highlights: OpenAI could introduce major updates for developers, making it cheaper and faster to build AI-based applications. A rumored “Team” plan for ChatGPT could offer unlimited high-speed GPT-4, advanced data analysis, and more.
Google is extending its Vulnerability Rewards Program (VRP) to include generative AI. Today, we’re expanding our VRP to reward attack scenarios specific to generative AI. As part of expanding VRP for AI, we’re taking a fresh look at how bugs should be categorized and reported.
Paper Digest: NeurIPS 2023 Highlights. Paper Digest has analyzed more than 500/3500 papers. Interesting, but many of the articles have already been out for a while.
HelixNet. HelixNet is a Deep Learning architecture consisting of 3 x Mistral-7B LLMs. It has an actor, a critic, and a regenerator. The authors also used AI-generated synthetic data. This approach showed impressive results. The model is available on HuggingFace.

ChatGPT Plus members can upload and analyze files in the latest beta. ChatGPT Plus members can also use modes like Browse with Bing without manually switching, letting the chatbot decide when to use them.
OpenAI DevDay recap. A recap by OpenAI: a new GPT-4 Turbo model that is more capable, cheaper, and supports a 128K context window; a new Assistants API that makes it easier for developers to build their own assistive AI apps; and new multimodal capabilities in the platform, including vision, image creation (DALL·E 3), and text-to-speech (TTS).
xAI PromptIDE. An integrated development environment for prompt engineering and interpretability research, released by xAI.
ChatGPT continues to be one of the fastest-growing services ever. In less than a year, it’s hit 100 million weekly users, and over 2 million developers are currently building on the company’s API, including the majority of Fortune 500 companies.
Xbox partners with Inworld AI to build AI tools for game development. Microsoft’s Xbox and Inworld AI have partnered to create AI-powered game development tools for narrative and character creation.
Nvidia Is Piloting a Generative AI for Its Engineers. ChipNeMo summarizes bug reports, gives advice, and writes design-tool scripts.
YouTube to test generative AI features. Users may test out a new conversational tool that utilizes artificial intelligence (AI) to respond to inquiries about YouTube content and provide suggestions, as well as a new feature that summarizes subjects in video comments, as part of the premium package offered to paying subscribers.
Google Announces Expansion of AI Partnership with Anthropic. The partnership includes important new collaborations on AI safety standards, committing to the highest standards of AI security, and the use of TPU v5e accelerators for AI inference.
Cohere Introduced Embed v3. Embed v3 offers state-of-the-art performance per the trusted MTEB and BEIR benchmarks. It is multilingual (100+ languages) and works well with noisy data, retrieval-augmented generation (RAG) systems, and both single-language and cross-language search.
Microsoft has over a million paying GitHub Copilot users. “We have over 1 million paid Copilot users in more than 37,000 organizations that subscribe to Copilot for Business,” said Nadella, “with significant traction outside the United States.”
Meta’s audiocraft can also generate stereo music. Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor/tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.
Hugging Face has a two-person team developing ChatGPT-like AI models. Hugging Face’s H4 team is focused on developing open-source alternatives to ChatGPT.
Samsung is joining the AI arms race, too. Samsung’s live translate feature, which the company is calling “AI Live Translate Call,” will be built into the company’s native phone app. Samsung says “audio and text translations will appear in real-time as you speak” and that the translations will happen on the device.
Introducing Adept Experiments. Adept is building AI agents and is now opening access so that people can test them.
Introducing GPTs. You can now create custom versions of ChatGPT that combine instructions, extra knowledge, and any combination of skills. Highlight: Starting today, you can create GPTs and share them publicly. Later this month, we’re launching the GPT Store, featuring creations by verified builders. Once in the store, GPTs become searchable and may climb the leaderboards. We will also spotlight the most useful and delightful GPTs we come across in categories like productivity, education, and “just for fun”. In the coming months, you’ll also be able to earn money based on how many people are using your GPT.
Google Cloud demonstrates the world’s largest distributed training job for large language models across 50000+ TPU v5e chips. Google Cloud TPU Multislice Training was built from the ground up to address the challenges of distributed ML training in orchestration, compilation, and end-to-end optimization. We demonstrated the benefits of Cloud TPU Multislice Training with what we believe is the largest publicly disclosed LLM distributed training job in the world (in terms of the number of chips used for training) on a compute cluster of 50,944 Cloud TPU v5e chips on the JAX ML framework, utilizing both BF16 and INT8 quantized training.
OpenAI Data Partnerships. We’re interested in large-scale datasets that reflect human society and that are not already easily accessible online to the public today. We can work with any modality, including text, images, audio, or video. We’re particularly looking for data that expresses human intention (e.g. long-form writing or conversations rather than disconnected snippets), across any language, topic, and format.

Speak Only About What You Have Read: Can LLMs Generalize Beyond Their Pretraining Data?

Unveiling the Limits and Wonders of In-Context Learning in Large Language Models

pub.towardsai.net

Resources

DeepSpeed-FastGen: High-throughput Text Generation for LLMs via MII and DeepSpeed-Inference. New software that competes with vLLM and text-generation-inference for the fast serving of language models.
qdrant. Qdrant (read: quadrant) is a vector similarity search engine and vector database. It provides a production-ready service with a convenient API to store, search, and manage points (vectors with an additional payload). Qdrant is tailored to extended filtering support.
Video2Music. Video2Music: Suitable Music Generation from Videos using an Affective Multimodal Transformer model. official article.
Hacking Google Bard: From Prompt Injection to Data Exfiltration. A great post that explains the novel risks that come with generative AI plugins.

RedPajama-Data-v2. A new version of the RedPajama dataset, with 30 trillion filtered and deduplicated tokens (100+ trillion raw) from 84 CommonCrawl dumps covering 5 languages, along with 40+ pre-computed data quality annotations that can be used for further filtering and weighting. A dataset reportedly bigger than the one used for GPT-4, and already preprocessed.
LLM4Rec. The proposed CLLM4Rec is the first recommender system that tightly combines the ID-based paradigm and LLM-based paradigm and leverages the advantages of both worlds.
consistencydecoder. OpenAI has released improved decoding for Stable Diffusion VAEs. The consistency decoder reaches state-of-the-art results, and it is nice that they also released it for Stable Diffusion.
TopicGPT. We introduce TopicGPT, a prompt-based framework that uses large language models (LLMs) to uncover latent topics within a provided text collection. TopicGPT produces topics that align better with human categorizations compared to competing methods. official article.
FACTOR. An effective tool to detect deepfakes even without training. FACTOR leverages the discrepancy between false facts and their imperfect synthesis within deepfakes. By quantifying the similarity using the truth score, computed via cosine similarity, FACTOR effectively distinguishes between real and fake media, enabling robust detection of zero-day deepfake attacks.
CogVLM. CogVLM is a powerful open-source visual language model (VLM). CogVLM-17B has 10 billion vision parameters and 7 billion language parameters.
langroid. Langroid is an intuitive, lightweight, extensible, and principled Python framework to easily build LLM-powered applications. You set up Agents, equip them with optional components (LLM, vector store, and methods), assign them tasks, and have them collaboratively solve a problem by exchanging messages.
OVIR-3D. 3D object retrieval from text prompts using 2D image fusion. This work provides a straightforward yet effective solution for open-vocabulary 3D instance retrieval, which returns a ranked set of 3D instance segments given a 3D point cloud reconstructed from an RGB-D video and a language query.
JaSPICE: Automatic Evaluation Metric Using Predicate-Argument Structures for Image Captioning Models. An automatic evaluation metric called JaSPICE evaluates Japanese captions based on scene graphs. There is a gap between the performance of models for English captioning and for other languages, and this clever approach promises to reduce it.
awesome-openai-vision-api-experiments. A set of examples showing how to use the OpenAI vision API to run inference on images, video files, and webcam streams.
punica. Low-rank adaptation (LoRA) is a parameter-efficient way to add new knowledge to a pre-trained LLM. Although the pre-trained LLM takes 100s of GB storage, a LoRA finetuned model only adds 1% storage and memory overhead. Punica enables running multiple LoRA finetuned models at the cost of running one.
LongQLoRA. LongQLoRA is a memory-efficient and effective method to extend the context length of Large Language Models with fewer training GPUs. On a single 32GB V100 GPU, LongQLoRA can extend the context length of LLaMA2 7B and 13B from 4096 to 8192 and even to 12k.
Lidar-Annotation-is-All-You-Need. A smarter method for self-driving cars to recognize roads by using lidar technology.
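
The storage claim in the punica entry above (a LoRA adapter adding roughly 1% overhead) can be sanity-checked with simple arithmetic. The sketch below is illustrative only: the 4096 x 4096 layer size and the rank of 16 are assumed values, not figures taken from the Punica repository, and the function names are hypothetical.

```python
# Back-of-the-envelope LoRA overhead for a single weight matrix.
# Assumption: LoRA learns the update to a d_out x d_in matrix W as the
# product of two low-rank factors, B (d_out x rank) and A (rank x d_in).
# The layer size and rank used below are illustrative, not from Punica.


def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Number of extra parameters stored by the two low-rank factors."""
    return rank * (d_in + d_out)


def lora_overhead(d_in: int, d_out: int, rank: int) -> float:
    """Extra parameters as a fraction of the frozen base matrix."""
    return lora_param_count(d_in, d_out, rank) / (d_in * d_out)


if __name__ == "__main__":
    d_in, d_out, rank = 4096, 4096, 16  # hypothetical layer size and rank
    extra = lora_param_count(d_in, d_out, rank)
    print(f"extra parameters: {extra}")
    print(f"overhead: {lora_overhead(d_in, d_out, rank):.2%}")
```

Under these assumed sizes the adapter stays below 1% of the base matrix, which is in line with the figure quoted in the entry. Real overhead depends on which layers are adapted and on the chosen rank.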

LM4VisualEncoding. Pretrained transformers from LLMs, despite being trained solely on textual data, are surprisingly strong encoders for purely visual tasks in the absence of language. Our exploration shows the potential of LLMs as general-purpose encoders for visual data, as opposed to the previous usages of either pure encoders for text embeddings or decoders for tokenized outputs. official article.
vimGPT. Browse the web with GPT-4V and Vimium. Vimium is a Chrome extension that lets you navigate the web with only your keyboard. You could use Vimium to give the model a way to interact with the web.
Announcing a New Way to Create AI Employees. The first platform that lets you build a team of AI employees working together to perform any task. The idea is to build an agent that you can call and ask to perform a task.
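
The FACTOR entry above describes a truth score computed via cosine similarity between a claimed fact and the media itself. A minimal sketch of that idea follows. It assumes both sides have already been turned into embedding vectors by some encoder; the vectors, the function names, and the 0.5 threshold are hypothetical placeholders, not FACTOR's actual components.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def truth_score(claim_embedding: Sequence[float],
                media_embedding: Sequence[float]) -> float:
    """How well the media agrees with the fact it claims to show."""
    return cosine_similarity(claim_embedding, media_embedding)


def looks_fake(claim_embedding: Sequence[float],
               media_embedding: Sequence[float],
               threshold: float = 0.5) -> bool:
    """Flag media whose truth score falls below an assumed threshold."""
    return truth_score(claim_embedding, media_embedding) < threshold


if __name__ == "__main__":
    claim = [1.0, 0.0, 0.0]            # hypothetical embedding of the claimed fact
    consistent_media = [0.9, 0.1, 0.0]  # media that agrees with the claim
    mismatched_media = [0.1, 0.9, 0.2]  # media that contradicts the claim
    print(looks_fake(claim, consistent_media))
    print(looks_fake(claim, mismatched_media))
```

Because the check only compares embeddings, it needs no training on examples of fakes, which matches the training-free framing of the entry. The choice of encoder and threshold would decide how well it works in practice.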

Neural Ensemble: what’s Better than a Neural Network? A group of them

Neural ensemble: how to combine different neural networks in a powerful model

levelup.gitconnected.com

Perspectives

Data Pipeline Attacks. An excerpt from Secure Intelligent Machines. In the future, attacks will be focused on poisoning data or other components of the data pipeline. This blog post describes this issue and potential mitigations.
Could Cruise be the Theranos of AI? And is there a dark secret at the core of the entire driverless car industry? Cruise is a driverless car company bought by General Motors. However, it seems that remote human interventions are needed in many cases

Will generative AI transform business? Industries expect demand for quality control and human oversight of AI-generated content to grow.
A minor ChatGPT update is a warning to founders: Big Tech can blow up your startup at any time. Wrapping ChatGPT as a core business is not a great idea. ChatGPT can now interact with PDFs and let you ask questions about them, which is wiping out the business of small start-ups. It’s a bleak reminder that swift rule changes by Big Tech firms can wreak havoc on smaller players.
Pixel Perfect: How AI Unlocks Creativity. AI and creators are gaining momentum, and using the right tactics can increase it.
Almost an Agent: What GPTs can do. GPTs are almost agents, but what can an agent actually do? For instance, write a scientific article by itself.
Are language models good at making predictions? It seems so. The article suggests GPT-4 really is better at making predictions for politics than for science or technology, even once the hardness of the questions is accounted for.
OpenAI Is A Lot More Vulnerable Than You Think. All the press, money, and awards in the world won’t protect OpenAI from the cold reality of competition.
ChatGPT use shows that the grant-application system is broken. The fact that artificial intelligence can do much of the work makes a mockery of the process. It’s time to make it easier for scientists to ask for research funding.
The world’s week on AI safety: powerful computing efforts launched to boost research. UK and US governments have established efforts to democratize access to supercomputers that will aid studies on AI systems.
Is AI the Next Crypto? Insights from 2M HN comments. Both crypto and AI have been heavily debated on Hacker News, with discussions going back years. By looking at trends in HN commenter opinions we might find interesting similarities and differences.
AI companies have all kinds of arguments against paying for copyrighted content. The biggest companies in AI aren’t interested in paying to use copyrighted material as training data.
AI could cause ‘catastrophic’ financial crisis, says Yuval Noah Harari. Historian and Sapiens author says the sophistication of technology makes it difficult to forecast its dangers.
Nvidia Envy: understanding the GPU gold rush. In 2023, thousands of companies and countries begged Nvidia to let them purchase more GPUs. Can the exponential demand endure?

AI is about to completely change how you use computers. Bill Gates in his blog (yes, he has a blog) discusses how AI will revolutionize software interaction.
Self Supervised Learning Market Size Thrives with AI Systems That Discover Patterns and Insights Independently. Self-Supervised Learning market growth surges due to AI’s ability to autonomously learn from unlabelled data, enhancing efficiency and innovation.
Yoko Taro Foresees the End of Video Games as We Know Them. Yoko Taro says the rise of AI will give birth to a new era of video games in which the line between developer and player is blurred into nonexistence.
How Generative AI Will Transform Knowledge Work. Generative AI can be a boon for knowledge work, but only if you use it in the right way. New generative AI-enabled tools are rapidly emerging to assist and transform knowledge work in industries ranging from education and finance to law and medicine.

Meme of the week

If you have found this interesting:

You can look for my other articles, and you can also connect with or reach me on LinkedIn. Check this repository containing weekly updated ML & AI news. I am open to collaborations and projects.

Here is the link to my GitHub repository, where I am collecting code and many resources related to machine learning, artificial intelligence, and more.

GitHub — SalvatoreRa/tutorial: Tutorials on machine learning, artificial intelligence, data science…

Tutorials on machine learning, artificial intelligence, data science with math explanation and reusable code (in python…