WEEKLY AI NEWS: RESEARCH, NEWS, RESOURCES, AND PERSPECTIVES
AI & ML news: Week 24–31 March
Inflection is absorbed by Microsoft, Amazon invests in Anthropic, and much more
The most interesting news, repositories, articles, and resources of the week
Check and star this repository where the news will be collected and indexed:
You will find the news first on GitHub. Single posts are also collected here:
Research
- Mora: Enabling Generalist Video Generation via A Multi-Agent Framework. This paper introduces Mora, a new multi-agent framework designed to close the gap in the field of generalist video generation, mimicking the capabilities of the leading model, Sora, across a range of tasks including text-to-video and video editing. Despite achieving performance close to Sora in various tasks, Mora still faces a holistic performance gap, marking a step towards future advancements in collaborative AI agents for video generation.
- Open-Vocabulary Attention Maps with Token Optimization for Semantic Segmentation in Diffusion Models. Open-Vocabulary Attention Maps (OVAM) extend text-to-image diffusion models such as Stable Diffusion, overcoming earlier restrictions by enabling the creation of attention maps for any word.
- HETAL: Efficient Privacy-preserving Transfer Learning with Homomorphic Encryption. Securing data privacy with Homomorphic Encryption, HETAL’s novel method of transfer learning represents a major advancement in safe AI training.
- HAC: Hash-grid Assisted Context for 3D Gaussian Splatting Compression. This paper presents the Hash-grid Assisted Context (HAC) framework, which outperforms existing standards by achieving over 75X compression of 3D Gaussian Splatting (3DGS) data.
- Shadow Generation for Composite Image Using Diffusion model. This work overcomes earlier difficulties with form and intensity accuracy to present a novel approach to producing realistic shadows in picture composition. The addition of intensity modulation modules to ControlNet and the expansion of the DESOBA dataset allowed the researchers to achieve a considerable improvement in shadow production in pictures.
- View-decoupled Transformer for Person Re-identification under Aerial-ground Camera Network. The View-Decoupled Transformer (VDT) was created by researchers to address the problem of detecting subjects from disparate camera perspectives, such as those obtained from ground and aerial cameras.
- ElasticDiffusion: Training-free Arbitrary Size Image Generation. Text-to-image diffusion models can now generate images in different sizes and aspect ratios without the need for extra training thanks to ElasticDiffusion, an inventive decoding technique.
- PSALM: Pixelwise SegmentAtion with Large Multi-Modal Model. PSALM extends the Large Multi-modal Model (LMM) with a mask decoder and a flexible input schema to perform well across a range of image segmentation tasks. This method not only overcomes the drawbacks of text-only outputs but also enables the model to comprehend and categorize complicated images with ease.
- Compositional Inversion for Stable Diffusion Models. To solve overfitting problems, researchers have devised a novel technique to enhance the way AI generates personalized visuals. This method ensures that the concepts are represented in the images in a more varied and balanced manner.
- Residual Dense Swin Transformer for Continuous Depth-Independent Ultrasound Imaging. With arbitrary-scale super-resolution, RDSTN is a novel network that addresses the trade-off between field-of-view and picture quality in ultrasound imaging.
- UFineBench: Towards Text-based Person Retrieval with Ultra-fine Granularity. A new standard for text-based person retrieval is UFineBench. To aid AI in comprehending and locating persons in photos, it makes use of thorough descriptions.
- SegRefiner: Towards Model-Agnostic Segmentation Refinement with Discrete Diffusion Process. By understanding refinement as a data creation process, SegRefiner is a novel model-agnostic approach that enhances object mask quality in a variety of segmentation applications. Through the use of a discrete diffusion method, it fine-tunes coarse masks pixel by pixel, improving border metrics and segmentation.
- VMRNN: Integrating Vision Mamba and LSTM for Efficient and Accurate Spatiotemporal Forecasting. The authors propose the VMRNN cell, a novel recurrent unit that combines the advantages of LSTM and Vision Mamba blocks. Their comprehensive tests demonstrate that the approach achieves competitive results on a range of standard benchmarks while retaining a smaller model size.
- Salience-DETR: Enhancing Detection Transformer with Hierarchical Salience Filtering Refinement. To balance computational efficiency and accuracy, this research presents Salience DETR, which uses hierarchical salience filtering to improve query selection in object detection.
- Universal Cell Embeddings: A Foundation Model for Cell Biology. We present the Universal Cell Embedding (UCE) foundation model. UCE was trained on a corpus of cell atlas data from humans and other species in a completely self-supervised way without any data annotations. UCE offers a unified biological latent space that can represent any cell, regardless of tissue or species. This universal cell embedding captures important biological variation despite the presence of experimental noise across diverse datasets.
- AniPortrait: Audio-Driven Synthesis of Photorealistic Portrait Animation. Using just one reference image and voice input, the AniPortrait framework can produce realistic animated portraits. This technique creates animations that are exceptional in terms of authentic facial expressions, a variety of poses, and great visual quality by first converting audio into 3D representations and then mapping them onto 2D facial landmarks.
- PAID: (Prompt-guided) Attention Interpolation of Text-to-Image. Two methods, AID and its prompt-guided variant PAID, are intended to enhance image interpolation through the incorporation of text and pose conditions. Without the need for further training, these techniques produce images with improved consistency, smoothness, and fidelity.
- The Need for Speed: Pruning Transformers with One Recipe. With the help of the OPTIN framework, transformer-based AI models can now be more effective across a range of domains without requiring retraining. Through the use of an intermediate feature distillation technique, OPTIN is able to compress networks under certain conditions with minimal impact on accuracy.
- Long-form factuality in large language models. Language models can be used to produce factual information. Google has released benchmarks and a dataset that show how each model performs. This research demonstrates that language models outperform human annotators in most situations and offers advice on how to enhance a model’s factuality.
- CoDA: Instructive Chain-of-Domain Adaptation with Severity-Aware Visual Prompt Tuning. A novel method for Unsupervised Domain Adaptation (UDA) is called CoDA. It learns from variances at both the scene and image levels, which aids AI models in becoming more adaptive to unlabeled, difficult settings.
- Backtracing: Retrieving the Cause of the Query. This method finds the precise content — from lectures to news articles — that prompts users to ask questions online. Backtracing is a technique that seeks to assist content producers in improving their work by locating and comprehending the reasons for misunderstandings, inquisitiveness, or emotional responses.
- CT-CLIP. A foundation model utilizing chest CT volumes and radiology reports for supervised-level zero-shot detection of abnormalities.
News
- Stability AI CEO resigns to ‘pursue decentralized AI’. Emad Mostaque’s resignation comes after key departures at the AI startup. And here is the Company announcement.
- GTC Wrap-Up: ‘We Created a Processor for the Generative AI Era,’ NVIDIA CEO Says. Kicking off the biggest GTC conference yet, NVIDIA founder and CEO Jensen Huang unveils NVIDIA Blackwell, NIM microservices, Omniverse Cloud APIs, and more.
- After raising $1.3B, Inflection is eaten alive by its biggest investor, Microsoft. In June 2023, Inflection announced it had raised $1.3 billion to build what it called “more personal AI.” The lead investor was Microsoft. Today, less than a year later, Microsoft announced that it was feasting on Inflection’s body and sucking the marrow from the bones (though I think they phrased it differently).
- OpenAI is pitching Sora to Hollywood. The AI company is scheduled to meet with a number of studios, talent agencies, and media executives in Los Angeles next week to discuss partnerships, sources familiar with the matter told Bloomberg.
- GitHub’s latest AI tool can automatically fix code vulnerabilities. It’s a bad day for bugs. Earlier today, Sentry announced its AI Autofix feature for debugging production code and now, a few hours later, GitHub is launching the first beta of its code-scanning autofix feature for finding and fixing security vulnerabilities during the coding process.
- Researchers gave AI an ‘inner monologue’ and it massively improved its performance. Scientists trained an AI system to think before speaking with a technique called QuietSTaR. The inner monologue improved common sense reasoning and doubled math performance.
- A California city is training AI to spot homeless encampments. For the last several months, a city at the heart of Silicon Valley has been training artificial intelligence to recognize tents and cars with people living inside in what experts believe is the first experiment of its kind in the United States.
- Sora: First Impressions. A compilation of Sora content generated by visual artists, designers, creative directors, and filmmakers.
- Open Interpreter O1 Light. The 01 Light is a portable speech interface that controls your home computer. It can use your applications, view your screen, and pick up new abilities. The open-source 01 serves as the basis for a new generation of AI devices.
- Character Voice For Everyone. Character Voice is a set of capabilities that elevates the Character.AI experience by enabling users to hear Characters conversing with them one-on-one. The company’s bigger goal is to create a multimodal interface that will enable more smooth, simple, and interesting interactions. This is the first step toward that goal.
- Cerebras Systems Unveils World’s Fastest AI Chip with Whopping 4 Trillion Transistors. Cerebras’ new wafer-scale chip can train language models with up to 24T parameters and natively supports PyTorch.
- The GPT-4 barrier has finally been broken. Four weeks ago, GPT-4 remained the undisputed champion: consistently at the top of every key benchmark, but more importantly the clear winner in terms of “vibes”. Today that barrier has finally been smashed. We have four new models, all released to the public in the last four weeks, that are benchmarking near or even above GPT-4.
- China puts trust in AI to maintain the largest high-speed rail network on Earth. The railway system is in better condition than when it was first built, according to a peer-reviewed paper. Vast amounts of real-time data are processed by an artificial intelligence system in Beijing to identify problems before they arise, the engineers say.
- Microsoft to hold a special Windows and Surface AI event in May. Ahead of Build 2024, Microsoft CEO Satya Nadella will share the company’s ‘AI vision’ for both software and hardware.
- AI ‘apocalypse’ could take away almost 8m jobs in the UK, says report. Women, younger workers, and the lower-paid are most at risk from artificial intelligence, says the IPPR thinktank.
- Elon Musk says all Premium subscribers on X will gain access to AI chatbot Grok this week. Following Elon Musk’s xAI’s move to open source its Grok large language model earlier in March, the X owner on Tuesday said that the company formerly known as Twitter will soon offer the Grok chatbot to more paying subscribers.
- OpenAI’s chatbot store is filling up with spam. TechCrunch found that the GPT Store, OpenAI’s official marketplace for GPTs, is flooded with bizarre, potentially copyright-infringing GPTs that imply a light touch where it concerns OpenAI’s moderation efforts.
- Apple’s big WWDC 2024 announcement may be an AI App Store. Apple’s AI strategy may not necessarily be to only offer the best AI apps it can produce, but instead deliver an enhanced AI App Store that may debut at WWDC.
- Mathematicians use AI to identify emerging COVID-19 variants. Scientists at The Universities of Manchester and Oxford have developed an AI framework that can identify and track new and concerning COVID-19 variants and could help with other infections in the future.
- iOS 18 Reportedly Won’t Feature Apple’s Own ChatGPT-Like Chatbot. Bloomberg’s Mark Gurman today reported that Apple is not planning to debut its own generative AI chatbot with its next major software updates, including iOS 18 for the iPhone. Instead, he reiterated that Apple has held discussions with companies such as Google, OpenAI, and Baidu about potential generative AI partnerships.
- Introducing DBRX: A New State-of-the-Art Open LLM. DBRX is an open, general-purpose LLM created by Databricks. Across a range of standard benchmarks, DBRX sets a new state-of-the-art for established open LLMs.
- Amazon invests another $2.75B in Anthropic — reportedly ‘largest’ in company history. Today, Amazon announced it has finalized that investment at the full planned amount, putting in another $2.75 billion atop the $1.25 billion it originally committed last year. According to CNBC, it is Amazon’s “largest venture investment yet.”
- OpenAI Is Starting To Test GPT Earnings Sharing. OpenAI is partnering with a small group of US builders to test usage-based GPT earnings. The goal is to create a vibrant ecosystem where builders are rewarded for their creativity and impact, and the company looks forward to collaborating with builders on the best approach to get there.
- Nvidia Tops MLPerf’s Inferencing Tests. Now that we’re firmly in the age of massive generative AI, it’s time to add two such behemoths, Llama 2 70B and Stable Diffusion XL, to MLPerf’s inferencing tests. Version 4.0 of the benchmark received more than 8,500 results from 23 submitting organizations. As has been the case from the beginning, computers with Nvidia GPUs came out on top, particularly those with its H200 processor. But AI accelerators from Intel and Qualcomm were in the mix as well.
- AI21 releases Jamba Language Model. The Mamba architecture is designed to outperform Transformers in efficiency while maintaining performance parity. Jamba is a new variant that adds MoE layers. With a context length of 128k tokens, it can operate at 1.6k tokens per second, and it scores 67% on the MMLU benchmark. Weights are available.
- Hume introduces Empathic Voice Interface. Meet Hume’s Empathic Voice Interface (EVI), the first conversational AI with emotional intelligence.
- Google starts testing AI overviews from SGE in main Google search interface. Google is now testing AI overviews in the main Google Search results, even if you have not opted into the Google Search Generative Experience labs feature. Google said this is an experience on a “subset of queries, on a small percentage of search traffic in the U.S.,” a Google spokesperson told Search Engine Land.
- LLaVA-HR: High-Resolution Large Language-Vision Assistant. This repository contains the implementation of LLaVA-HR, a strong and efficient MLLM powered by a mixture-of-resolution adaptation.
- Meta is adding AI to its Ray-Ban smart glasses next month. The Ray-Ban Meta Smart Glasses can do things like identify objects, monuments, and animals, as well as translate text.
- Google bringing Gemini Nano to Pixel 8 with next Feature Drop. The Pixel 8 will get Gemini Nano, in developer preview, to power Summarize in Recorder and Gboard Smart Reply. The latter allows for “higher-quality smart replies” that have “conversational awareness” and should be generated faster. On the Pixel 8 Pro, it works with WhatsApp, Line, and KakaoTalk. Meanwhile, Summarize can take a recording and generate bullet points.
Resources
- Building and testing C extensions for SQLite with ChatGPT Code Interpreter. This essay goes into great detail on how to create code in a foreign language for a difficult task using ChatGPT (or any other language model). Its creator writes, compiles, and downloads new bindings for the well-known database SQLite using ChatGPT’s code interpreter.
- Official Mistral Fine-tuning Code. Mistral recently organized a hackathon. The company also published code for fine-tuning its language models along with version 0.2 of the 7B model. The code is clear and easy to read.
- Scalable Optimal Transport. A curated list of research works and resources on optimal transport in machine learning.
- AdaIR: Adaptive All-in-One Image Restoration via Frequency Mining and Modulation. AdaIR presents an all-in-one image restoration network that addresses several types of picture deterioration such as noise, blur, and haze by using frequency mining and modulation.
- Turbocharged Training: Optimizing the Databricks Mosaic AI stack with FP8. The group at Databricks Mosaic has persisted in advancing language model training. They talk about the fp8 training stack and the potential advantages of decreasing precision in this post.
- Low-latency Generative AI Model Serving with Ray, NVIDIA Triton Inference Server, and NVIDIA TensorRT-LLM. A new collaboration between Anyscale and NVIDIA will allow users to scale generative AI models into production. Customers can enhance resource management, observability, and autoscaling by utilizing the combined capabilities of Anyscale’s managed runtime environment and Ray through this integration.
- Discover The Best AI Websites & Tools. 11006 AIs and 233 categories in the best AI tools directory. AI tools list & GPTs store are updated daily by ChatGPT.
- codel. Fully autonomous AI Agent that can perform complicated tasks and projects using a terminal, browser, and editor.
- binary vector search is better than your FP32 vectors. A crucial component of RAG pipelines is searching over embedding vectors. You can retain performance while reducing memory needs by roughly 30x by substituting a single 0 or 1 for each fp32 number, running a KNN search over the binary codes, and then reranking the shortlist at full precision.
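The quantize-search-rerank recipe described above can be sketched in a few lines of NumPy. This is a minimal illustration with random embeddings, not any particular library's implementation; production systems would typically use something like FAISS with a binary index.

```python
import numpy as np

def binarize(vectors: np.ndarray) -> np.ndarray:
    """Quantize fp32 vectors to 1 bit per dimension (1 where positive)."""
    return np.packbits(vectors > 0, axis=-1)

def hamming_search(query_bits: np.ndarray, db_bits: np.ndarray, k: int) -> np.ndarray:
    # XOR then popcount gives the Hamming distance to every stored code
    dists = np.unpackbits(query_bits ^ db_bits, axis=-1).sum(axis=-1)
    return np.argsort(dists)[:k]

def rerank(query: np.ndarray, db: np.ndarray, candidate_ids: np.ndarray, k: int) -> np.ndarray:
    # Re-score only the shortlisted candidates with full-precision cosine similarity
    cands = db[candidate_ids]
    sims = cands @ query / (np.linalg.norm(cands, axis=1) * np.linalg.norm(query))
    return candidate_ids[np.argsort(-sims)[:k]]

rng = np.random.default_rng(0)
db = rng.standard_normal((10_000, 256)).astype(np.float32)   # fake embeddings
query = db[42] + 0.1 * rng.standard_normal(256).astype(np.float32)

db_bits = binarize(db)          # 256 floats (1 KB) -> 32 bytes per vector, a 32x saving
shortlist = hamming_search(binarize(query), db_bits, k=100)
top = rerank(query, db, shortlist, k=5)
```

The binary pass is cheap and memory-light; the expensive fp32 comparison only runs on the 100-item shortlist, which is where the quality is recovered.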
- Deepfake Generation and Detection: A Benchmark and Survey. This thorough analysis explores the developments and difficulties around deepfake technology and its detection, emphasizing the arms race between those who produce deepfakes and those who are creating systems to identify them.
- Evaluate LLMs in real-time with Street Fighter III. Make LLMs fight each other in real-time in Street Fighter III. Each player is controlled by an LLM. We send to the LLM a text description of the screen. The LLM decides on the next moves its character will make. The next moves depend on its previous moves, the moves of its opponents, its power, and health bars.
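The evaluation loop the project describes (describe the screen as text, ask the model for a move, apply it) can be sketched as follows. This is a hypothetical stand-in, not the project's actual code: `llm_choose_move` is a stub where a real implementation would prompt an LLM with the state description.

```python
import random

MOVES = ["move_left", "move_right", "punch", "kick", "block"]

def llm_choose_move(state_description: str) -> str:
    # Stub: a real implementation would send state_description to an LLM
    # and parse the move name out of its reply.
    return random.choice(MOVES)

def describe(state: dict) -> str:
    return (f"Your HP: {state['p1_hp']}, opponent HP: {state['p2_hp']}, "
            f"distance: {state['distance']}. Choose one move from {MOVES}.")

def fight(turns: int = 10) -> dict:
    state = {"p1_hp": 100, "p2_hp": 100, "distance": 3}
    log = []
    for _ in range(turns):
        move = llm_choose_move(describe(state))   # text in, move out
        log.append(move)
        if move in ("punch", "kick") and state["distance"] <= 1:
            state["p2_hp"] -= 10                  # attacks land only at close range
        elif move == "move_right":
            state["distance"] = max(0, state["distance"] - 1)
        elif move == "move_left":
            state["distance"] += 1
    state["log"] = log
    return state
```

Pitting two models against each other is then just running two such policies against a shared game state and comparing health bars at the end.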
- Superpipe. Superpipe is a lightweight framework to build, evaluate, and optimize LLM pipelines for structured outputs: data labeling, extraction, classification, and tagging. Evaluate pipelines on your own data and optimize models, prompts, and other parameters for the best accuracy, cost, and speed.
Perspectives
- How People Are Really Using GenAI. There are many use cases for generative AI, spanning a vast number of areas of domestic and work life. Looking through thousands of comments on sites such as Reddit and Quora, the author’s team found that the use of this technology is as wide-ranging as the problems we encounter in our lives. The 100 categories they identified can be divided into six top-level themes, which give an immediate sense of what generative AI is being used for: Technical Assistance & Troubleshooting (23%), Content Creation & Editing (22%), Personal & Professional Support (17%), Learning & Education (15%), Creativity & Recreation (13%), Research, Analysis & Decision Making (10%).
- Untangling concerns about consolidation in AI. Microsoft’s recent acquisition of Inflection’s talent sparked discussions about the largest tech giants having too much influence over AI research and development. Although they have the resources to work quickly on foundational language models, there are legitimate concerns that the concentration of power could stifle transparency and innovation. This article examines the intricate trade-offs that arise as artificial intelligence becomes more widely used.
- ‘A landmark moment’: scientists use AI to design antibodies from scratch. Modified protein-design tool could make it easier to tackle challenging drug targets — but AI antibodies are still a long way from reaching the clinic.
- TechScape: Is the US calling time on Apple’s smartphone domination? The tech giant fights regulators on both sides of the Atlantic, as the US government launches a grab-bag of accusations. Plus, Elon Musk’s bad day in court.
- Go, Python, Rust, and production AI applications. The roles of Python, Go, and Rust in developing AI applications are covered in this article: Go is used for larger-scale production, Python is used for developing AI models, and Rust is used for tasks requiring high performance. It highlights the significance of choosing the appropriate language for the task based on the ecosystem and tool fit, speculating that Go may replace Python as the production language. The author promotes connecting the Go and Python communities to improve the development of AI applications.
- Trends in Synthetic Biology & AI in Drug Discovery in 2024. 2024 promises to be a historic year for artificial intelligence in drug discovery, with significant progress being made in synthetic biology. The synthesis of modular biological components and the impact of generative AI on research are two prominent themes that are highlighted in this article. The entry of Insilico Medicine’s AI-powered candidate into Phase II clinical trials demonstrates how the combination of artificial intelligence and synthetic biology is speeding up the drug discovery process.
- LLMs have special intelligence, not general, and that’s plenty. In sophisticated cognitive tests, Anthropic’s new AI model Claude-3 performs better than other models, including GPT-4, and above the average human IQ. Even with this success, Claude-3 still finds it difficult to solve simple puzzles and other basic tasks that people take for granted. Rather than having general intelligence like that of humans, LLMs may have “Special Intelligence”: they creatively reflect back to us what they know.
- AI SaaS Companies Will Be More Profitable. The deflationary impacts of AI in marketing, sales, operations, and software development could mean that while AI software companies may initially incur higher costs, they could end up being more profitable than traditional SaaS companies.
- AI image generators often give racist and sexist results: can they be fixed? Researchers are tracing sources of racial and gender bias in images generated by artificial intelligence, and making efforts to fix them.
- How AI is improving climate forecasts. Researchers are using various machine-learning strategies to speed up climate modelling, reduce its energy costs and hopefully improve accuracy.
- Here’s why AI search engines really can’t kill Google. The AI search tools are getting better — but they don’t yet understand what a search engine really is and how we really use them.
- Inside the shadowy global battle to tame the world’s most dangerous technology. The problem of controlling AI is one that the world is now facing. Global leaders, tech executives, and legislators convened many high-profile meetings and conferences that exposed disagreements and differences over how to regulate this game-changing technology.
- Hackers can read private AI-assistant chats even though they’re encrypted. All non-Google AI chatbots are affected by a side channel that leaks responses sent to users.
- Towards 1-bit Machine Learning Models. Recent works on extreme low-bit quantization such as BitNet and 1.58 bit have attracted a lot of attention in the machine learning community. The main idea is that matrix multiplication with quantized weights can be implemented without multiplications, which can potentially be a game-changer in terms of compute efficiency of large machine learning models.
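The multiplication-free trick mentioned above is easy to see with the ternary (1.58-bit) case: when every weight is -1, 0, or +1, a matrix-vector product reduces to signed sums of the inputs. The sketch below is an illustration of the idea, not the BitNet implementation; note that one scalar rescale per layer still remains.

```python
import numpy as np

def ternary_quantize(w: np.ndarray) -> tuple[np.ndarray, float]:
    """Round weights to {-1, 0, +1} with a single per-tensor scale."""
    scale = np.mean(np.abs(w)) + 1e-8
    return np.clip(np.round(w / scale), -1, 1), scale

def matmul_no_multiply(wq: np.ndarray, x: np.ndarray) -> np.ndarray:
    # Each output element is a signed sum of inputs:
    # add x[j] where the weight is +1, subtract it where the weight is -1.
    out = np.zeros(wq.shape[0])
    for i, row in enumerate(wq):
        out[i] = x[row == 1].sum() - x[row == -1].sum()
    return out

rng = np.random.default_rng(1)
w = rng.standard_normal((4, 8))
x = rng.standard_normal(8)

wq, scale = ternary_quantize(w)
y = scale * matmul_no_multiply(wq, x)   # one scalar multiply per layer, none per weight
```

On real hardware this turns the dominant multiply-accumulate cost of large models into additions, which is the potential game-changer the papers point to.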
- AI escape velocity. The law of accelerating returns, which holds that progress is made at an exponential pace over time, was formulated by AI futurist Ray Kurzweil. In a recent talk, Kurzweil covered a wide range of subjects, such as prospects that are only going to get better, the future of the AI economy, human relationships with AIs, longevity escape velocity, and much more.
- Plentiful, high-paying jobs in the age of AI. Experts in AI are investigating automating human functions, raising fears about job losses and declining wages. The belief that advances in AI would eventually render human labor obsolete, however, may not be accurate. Constraints like computer power and opportunity costs may mean that humans will still have jobs in an AI-dominated future, but this is not a given.
Medium articles
A list of the Medium articles I have read and found the most interesting this week:
- Andrea D'Agostino, Extract any entity from text with GLiNER
- Marcel Moosbrugger, Random Walks Are Strange and Beautiful
- Mike Young, Up to 17% of AI conference reviews now written by AI
- Séverin Bruhat, You Will Love This Free Generative AI Tool
- Sharad Joshi, Step by step guide to automate browsing using large language models
- Thomas Czerny, Prompting to Extract Structured Data From Unstructured Data
- Tabrez Syed, Decoding the Linguistic Matrix: From Ones and Zeros to Contextual Meaning
Meme of the week
What do you think about it? Some news that captured your attention? Let me know in the comments
If you have found this interesting:
You can look for my other articles, and you can also connect or reach me on LinkedIn. Check this repository containing weekly updated ML & AI news. I am open to collaborations and projects and you can reach me on LinkedIn. You can also subscribe for free to get notified when I publish a new story.
Here is the link to my GitHub repository, where I am collecting code and many resources related to machine learning, artificial intelligence, and more.
Or you may be interested in one of my recent articles: