WEEKLY AI NEWS: RESEARCH, NEWS, RESOURCES, AND PERSPECTIVES
AI & ML news: Week 15–21 April
Meta Llama 3 is here, Adobe is working on generative AI video, the Humane Ai Pin is a failure, and much more
The most interesting news, repositories, articles, and resources of the week
Check and star this repository where the news will be collected and indexed:
You will find the news first on GitHub. Single posts are also collected here:
Research
- DGMamba: Domain Generalization via Generalized State Space Model. DGMamba is a new framework that leverages the novel state space model Mamba to address domain generalization problems.
- Manipulating Large Language Models to Increase Product Visibility. Large language models used by search engines can be manipulated into promoting specific products by adding strategic text sequences to product descriptions.
- MindBridge: A Cross-Subject Brain Decoding Framework. MindBridge is a single model that can interpret brain activity from several subjects.
- Taming Stable Diffusion for Text to 360° Panorama Image Generation. With the help of text prompts, this project presents PanFusion, a dual-branch diffusion model that creates 360-degree panoramic images. To minimize visual distortion, the technique combines the Stable Diffusion approach with a customized panoramic branch, which is further improved by a special cross-attention mechanism.
- The Physics of Language Models. Scaling laws describe the relationship between the size of language models and their capabilities. Unlike prior studies that evaluate a model’s capability via loss or benchmarks, we estimate the number of knowledge bits a model stores.
- The Influence Between NLP and Other Fields. Attempts to measure the level of influence that NLP has over 23 different fields of study; the cross-field engagement of NLP has decreased from 0.58 in 1980 to 0.31 in 2022; the study also reveals that CS dominates NLP citations, accounting for over 80% of them, with a focus on information retrieval, AI, and ML; in general, NLP is becoming more isolated, with a rise in intra-field citations and a fall in multidisciplinary works.
- EventEgo3D: 3D Human Motion Capture from Egocentric Event Streams. Researchers present a unique technique utilizing a fisheye event camera to address the difficulties in monocular egocentric 3D human motion capture, particularly in challenging lighting conditions and with rapid motions.
- MPPE-DST: Mixture of Prefix Prompt Experts for LLM in Zero-Shot Dialogue State Tracking. Mixture of Prefix Prompt Experts (MPPE) is a novel approach designed to improve zero-shot dialogue state tracking. It allows knowledge to be transferred to new domains without requiring additional dataset annotations.
- Any2Point: Empowering Any-modality Large Models for Efficient 3D Understanding. A novel technique called Any2Point effectively transfers vision, language, and audio model capabilities into the 3D space while preserving spatial geometries.
- Google’s new technique gives LLMs infinite context. A new paper by researchers at Google claims to give large language models (LLMs) the ability to work with text of infinite length. The paper introduces Infini-attention, a technique that configures language models in a way that extends their “context window” while keeping memory and compute requirements constant.
- Compression Represents Intelligence Linearly. The concept of compressing a training dataset into a model is the foundation of most contemporary AI: the better the compression, the better the model. This research thoroughly demonstrates that relationship, establishing a strong linear correlation between benchmark scores and a model's capacity to compress novel text.
- TransformerFAM: Feedback attention is working memory. TransformerFAM's feedback mechanism lets Transformers attend to their own latent representations. In theory, this recurrence could allow the model to process extremely long inputs in context.
- Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length. Another long-context paper, this one about a new architecture that uses two cutting-edge weight-updating techniques. It outperforms Llama 2 at the same training token count (2T) and, at inference time, scales to an effectively unlimited context length.
- STORM: Synthesis of Topic Outlines through Retrieval and Multi-perspective Question Asking. Stanford's innovative research system, STORM, uses retrieval-guided language models to generate reports on particular subjects.
- Homography Guided Temporal Fusion for Road Line and Marking Segmentation. Road lines and markings must be accurately segmented for autonomous driving, but this is difficult because of sunlight, shadows, and car occlusions. The Homography Guided Fusion (HomoFusion) module employs a pixel-by-pixel attention mechanism and a unique surface-normal estimator to recognize and classify obscured road lines from video frames.
- LaSagnA: vLLM-based Segmentation Assistant for Complex Queries. Vision Language Models (vLLMs) sometimes face difficulties in distinguishing absent objects and handling many queries per image. To address these problems, this work presents a novel question style and integrates semantic segmentation into the training procedure.
- A collective AI via lifelong learning and sharing at the edge. Here we review recent machine learning advances converging towards creating a collective machine-learned intelligence. We propose that the convergence of such scientific and technological advances will lead to the emergence of new types of scalable, resilient, and sustainable AI systems.
- Challenges and opportunities in translating ethical AI principles into practice for children. This Perspective first maps the current global landscape of existing ethics guidelines for AI and analyses their correlation with children.
- Mixtral 8x22B Report and Instruction Model. Mixtral 8x22B is our latest open model. It sets a new standard for performance and efficiency within the AI community. It is a sparse Mixture-of-Experts (SMoE) model that uses only 39B active parameters out of 141B, offering unparalleled cost efficiency for its size.
- Long-form music generation with latent diffusion. Stability AI's diffusion transformer model for audio synthesis.
- LaDiC: A Diffusion-based Image Captioning Model. The use of diffusion models for image-to-text generation is revisited in this work. It presents the LaDiC architecture, which improves diffusion models' performance on image captioning tasks.
- LINGO-2: Driving with Natural Language. This blog introduces LINGO-2, a driving model that links vision, language, and action to explain and determine driving behavior, opening up a new dimension of control and customization for an autonomous driving experience. LINGO-2 is the first closed-loop vision-language-action driving model (VLAM) tested on public roads.
- Towards a general-purpose foundation model for computational pathology. We introduce UNI, a general-purpose self-supervised model for pathology, pre-trained using more than 100 million images from over 100,000 diagnostic H&E-stained WSIs (>77 TB of data) across 20 major tissue types.
- A visual-language foundation model for computational pathology. We introduce CONtrastive learning from Captions for Histopathology (CONCH), a visual-language foundation model developed using diverse sources of histopathology images, biomedical text, and, notably, over 1.17 million image–caption pairs through task-agnostic pretraining.
- FedPFT: Federated Proxy Fine-Tuning of Foundation Models. Federated Proxy Fine-Tuning (FedPFT), a novel technique created by researchers, enhances foundation models’ ability to adjust for certain tasks while maintaining data privacy.
- In-Context Learning State Vector with Inner and Momentum Optimization. This research presents a novel method for improving In-Context Learning (ICL) in large language models such as GPT-J and Llama-2. The authors introduce a novel optimization technique that enhances compressed representations of the model's knowledge, referred to as "state vectors."
- Decomposing and Editing Predictions by Modeling Model Computation. To determine each component’s precise contribution to the final result, component modeling dissects a model’s prediction process into its most fundamental parts, such as attention heads and convolution filters.
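The "Compression Represents Intelligence Linearly" entry above rests on a simple metric: bits of compressed output per byte of raw text, where lower means better compression. As a rough illustration of the metric itself (using stdlib `zlib` as a stand-in for a language model's code length, which is what the paper actually measures):

```python
import zlib

def bits_per_byte(text: str) -> float:
    """Compressed bits per raw byte: lower means the text was more compressible."""
    raw = text.encode("utf-8")
    return 8 * len(zlib.compress(raw, level=9)) / len(raw)

# Predictable text compresses far better than high-entropy text; a strong
# language model achieves low bits/byte on text it "understands".
predictable = bits_per_byte("the cat sat on the mat. " * 200)
entropic = bits_per_byte("q7#fZ!p9@kL2$wXc8&vB4*mN6yT1^hJ3")
print(f"{predictable:.2f} vs {entropic:.2f} bits/byte")
```

The paper's finding is that this code-length number, computed with an LLM as the compressor, tracks benchmark scores almost linearly.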
News
- Grok-1.5 Vision Preview. Introducing Grok-1.5V, our first-generation multimodal model. In addition to its strong text capabilities, Grok can now process a wide variety of visual information, including documents, diagrams, charts, screenshots, and photographs. Grok-1.5V will be available soon to our early testers and existing Grok users.
- Google’s new chips look to challenge Nvidia, Microsoft, and Amazon. Google’s new AI chip rivals Nvidia’s, and its Arm-based CPU will compete with Microsoft’s and Amazon’s.
- OpenAI Fires Researchers For Leaking Information. After months of leaks, OpenAI has apparently fired two researchers who are said to be linked to company secrets going public.
- BabyLM Challenge. The goal of this shared task is to incentivize researchers with an interest in pretraining or cognitive modeling to focus their efforts on optimizing pretraining given data limitations inspired by human development. Additionally, we hope to democratize research on pretraining — which is typically thought to be practical only for large industry groups — by drawing attention to open problems that can be addressed on a university budget.
- Dr. Andrew Ng was appointed to Amazon’s Board of Directors. Dr. Andrew Ng is currently the Managing General Partner of AI Fund and is joining Amazon’s Board of Directors.
- Creating sexually explicit deepfake images to be made an offence in the UK. Offenders could face jail if the image is widely shared, under a proposed amendment to the criminal justice bill.
- Leisure centers scrap biometric systems to keep tabs on staff amid UK data watchdog clampdown. Firms such as Serco and Virgin Active pull facial recognition and fingerprint scan systems used to monitor staff attendance
- Introducing OpenAI Japan. We are excited to announce our first office in Asia and we’re releasing a GPT-4 custom model optimized for the Japanese language.
- Adobe’s working on generative video, too. Adobe says it’s building an AI model to generate video. But it’s not revealing when this model will launch, exactly — or much about it besides the fact that it exists.
- OpenAI and Meta Reportedly Preparing New AI Models Capable of Reasoning. OpenAI and Meta are on the verge of releasing the next versions of their AI models that will supposedly be capable of reasoning and planning, the Financial Times reports. But, as with any hype coming out of big tech, take it all with a grain of salt.
- Humane’s Ai Pin Isn’t Ready to Replace Your Phone, But One Day It Might. The AI-powered Humane Ai Pin has numerous technical problems, ranging from AI assistant glitches to music streaming issues. Though future software updates are promised, the first-generation gadget lacks crucial functions and has performance gaps despite its goal of creating an ambient computing experience. The Ai Pin is positioned as a companion device for a more present, less screen-focused lifestyle, but despite its meticulous design it struggles to replace a conventional smartphone.
- TikTok may add AI avatars that can make ads. The new feature will let advertisers and TikTok Shop sellers generate scripts for a virtual influencer to read.
- Google launches Code Assist, its latest challenger to GitHub’s Copilot. At its Cloud Next conference, Google on Tuesday unveiled Gemini Code Assist, its enterprise-focused AI code completion and assistance tool.
- AI traces mysterious metastatic cancers to their source. An algorithm examines images of metastatic cells to identify the location of the primary tumor. Some stealthy cancers remain undetected until they have spread from their source to distant organs. Now scientists have developed an artificial intelligence (AI) tool that outperforms pathologists at identifying the origins of metastatic cancer cells that circulate in the body.
- Apple’s iOS 18 AI will be on-device preserving privacy, and not server-side. Apple’s AI push in iOS 18 is rumored to focus on privacy with processing done directly on the iPhone, which won’t connect to cloud services.
- Introducing ALOHA Unleashed. Google DeepMind’s ALOHA Unleashed is a program that pushes the boundaries of dexterity with low-cost robots and AI.
- France’s Mistral AI seeks funding at $5 bln valuation, The Information reports. French tech startup Mistral AI has been speaking to investors about raising several hundred million dollars at a valuation of $5 billion, The Information reported on Tuesday.
- Stability AI is giving more developers access to its next-gen text-to-image generator. Developers can now access the API for the latest version of Stability AI’s text-to-image model.
- European car manufacturer will pilot Sanctuary AI’s humanoid robot. Sanctuary AI announced that it will be delivering its humanoid robot to a Magna manufacturing facility. Based in Canada, with auto manufacturing facilities in Austria, Magna manufactures and assembles cars for several of Europe’s top automakers, including Mercedes, Jaguar, and BMW. As is often the nature of these deals, the parties have not disclosed how many of Sanctuary AI’s robots will be deployed.
- Google Maps will use AI to help you find out-of-the-way EV chargers. The company will use AI to summarize directions to EV chargers as well as reliability and wait times.
- Introducing Meta Llama 3: The most capable openly available LLM to date. Today, we’re introducing Meta Llama 3, the next generation of our state-of-the-art open-source large language model. Llama 3 models will soon be available on AWS, Databricks, Google Cloud, Hugging Face, Kaggle, IBM WatsonX, Microsoft Azure, NVIDIA NIM, and Snowflake, and with support from hardware platforms offered by AMD, AWS, Dell, Intel, NVIDIA, and Qualcomm.
- Google’s DeepMind AI can help engineers predict “catastrophic failure”. AI and a popular card game can help engineers predict catastrophic failure by finding the absence of a pattern.
- OpenAI winds down AI image generator that blew minds and forged friendships in 2022. When OpenAI’s DALL-E 2 debuted on April 6, 2022, the idea that a computer could create relatively photorealistic images on demand based on just text descriptions caught a lot of people off guard. The launch began an innovative and tumultuous period in AI history, marked by a sense of wonder and a polarizing ethical debate that reverberates in the AI space to this day. Last week, OpenAI turned off the ability for new customers to purchase generation credits for the web version of DALL-E 2, effectively killing it.
- Stability AI lays off roughly 10 percent of its workforce. Stability AI laid off 20 employees just a day after announcing the expansion of access to its new flagship model. This comes after weeks of upheaval that saw its founding CEO leave the company.
- The Humane AI Pin is lost in translation. Though the Humane AI Pin has a lot of drawbacks, its translation feature might be the worst.
Resources
- LLM-friendly HTML conversion. Reader converts any URL to an LLM-friendly input with a simple prefix https://r.jina.ai/. Get improved output for your agent and RAG systems at no cost.
- Minimal Implementation of a D3PM in PyTorch. This is a minimal (~400 LOC) but fully faithful PyTorch implementation of a D3PM (Structured Denoising Diffusion Models in Discrete State-Spaces).
- Cerule, A Tiny Mighty Vision Model. We train and release “Cerule”, a tiny yet powerful Vision Language Model based on Google’s newly released Gemma-2b and SigLIP.
- Diffusion Models for Video Generation. This article looks at adapting image models, training diffusion models to produce video, and even producing video directly from an image model without further training.
- Pile-T5. T5 is a workhorse of contemporary AI. Eleuther retrained it with a more recent tokenizer and a longer training run, producing a significantly improved base model for encoding tasks.
- GitHub Repository to File Converter. This Python script allows you to download and process files from a GitHub repository, making it easier to share code with chatbots that have large context capabilities but don’t automatically download code from GitHub.
- AI Index Report. The 2024 Index is our most comprehensive to date and arrives at an important moment when AI’s influence on society has never been more pronounced. This year, we have broadened our scope to more extensively cover essential trends such as technical advancements in AI, public perceptions of the technology, and the geopolitical dynamics surrounding its development.
- Accelerating AI: Harnessing Intel(R) Gaudi(R) 3 with Ray 2.10. Ray 2.10, the most recent version from Anyscale, now supports Intel Gaudi 3. In addition to provisioning Ray Core Task and Actors on a Gaudi fleet directly through Ray Core APIs, developers can now spin up and manage their own Ray Clusters. For an enhanced experience, they can also utilize Ray Serve on Gaudi via Ray Serve APIs and set up Intel Gaudi accelerator infrastructure for use at the Ray Train layer.
- Code with CodeQwen1.5. Dominant coding assistants like GitHub Copilot, built upon proprietary LLMs, pose notable challenges in terms of cost, privacy, security, and potential copyright infringement. CodeQwen1.5-7B, a new member of the Qwen1.5 open-source family, is a specialized code LLM built upon the Qwen1.5 language model. It has been pre-trained on around 3 trillion tokens of code-related data, supports an extensive repertoire of 92 programming languages, and exhibits exceptional long-context understanding and generation, processing inputs of up to 64K tokens.
- OLMo 1.7–7B: A 24-point improvement on MMLU. Today, we’ve released an updated version of our 7 billion parameter Open Language Model, OLMo 1.7–7B. This model scores 52 on MMLU, sitting above Llama 2–7B and approaching Llama 2–13B, and outperforms Llama 2–13B on GSM8K.
- Effort. The Effort library lets you adjust, in real time, how many calculations are performed during LLM inference, which can significantly increase performance while maintaining a high level of quality. Initial findings indicate it can greatly increase inference speed while preserving quality, even with modest implementation overhead. The author invites others to test the 0.0.1B version and offer feedback to further improve the library.
- luminal. Luminal is a deep-learning library that uses composable compilers to achieve high performance.
- SoccerNet Game State Reconstruction: End-to-End Athlete Tracking and Identification on a Minimap. A new dataset called SoccerNet-GSR aims to improve game state reconstruction from football video footage captured by a single camera.
- AI Gateway. Gateway streamlines requests to 100+ open and closed source models with a unified API. It is also production-ready with support for caching, fallbacks, retries, timeouts, and load balancing, and can be edge-deployed for minimal latency.
- moondream. A tiny vision language model that kicks ass and runs anywhere.
- Sentence Embeddings. An introduction to sentence embeddings. This series aims to demystify embeddings and show you how to use them in your projects. This first post covers how to use and scale up open-source embedding models, the criteria for picking an existing model, current evaluation methods, and the state of the ecosystem.
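The "GitHub Repository to File Converter" listed above packs a repository into one chatbot-pasteable text file. The core idea fits in a few lines; this is a minimal sketch of that idea (the actual script also handles downloading the repo from GitHub, which is omitted here, and the header format is my own assumption):

```python
import os

def repo_to_text(root: str, extensions=(".py", ".md")) -> str:
    """Concatenate matching files under `root` into one LLM-friendly string.

    Each file is prefixed with a path header so a chatbot with a large
    context window can keep the files apart.
    """
    parts = []
    for dirpath, _, filenames in os.walk(root):
        for name in sorted(filenames):
            if name.endswith(extensions):
                path = os.path.join(dirpath, name)
                with open(path, encoding="utf-8", errors="replace") as f:
                    rel = os.path.relpath(path, root)
                    parts.append(f"# ===== {rel} =====\n{f.read()}")
    return "\n\n".join(parts)
```

Filtering by extension keeps binaries and build artifacts out of the model's context.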
Perspectives
- Does AI need a “body” to become truly intelligent? Meta researchers think so. AIs that can generate videos, quickly translate languages, or write new computer code could be world-changing, but can they ever be truly intelligent? Not according to the embodiment hypothesis, which argues that human-level intelligence can only emerge if intelligence is able to sense and navigate a physical environment, the same way babies can.
- Micromanaging AI. Currently, working with AI means micromanaging it: people must define tasks, review work frequently, and guide development at each stage, much like managing high-school interns whose motivation is high but whose competence is rather low.
- ‘Eat the future, pay with your face’: my dystopian trip to an AI burger joint. If the experience of robot-served fast food dining is any indication, the future of sex robots is going to be very unpleasant
- AI now beats humans at basic tasks; new benchmarks are needed, says a major report. Stanford University’s 2024 AI Index charts the meteoric rise of artificial intelligence tools. Artificial intelligence (AI) systems, such as the chatbot ChatGPT, have become so advanced that they now very nearly match or exceed human performance in tasks including reading comprehension, image classification, and competition-level mathematics, according to the report.
- Lethal dust storms blanket Asia every spring — now AI could help predict them. As the annual phenomenon once again strikes East Asia, scientists are hard at work to better predict how it will affect people.
- From boom to burst, the AI bubble is only heading in one direction. No one should be surprised that artificial intelligence is following a well-worn and entirely predictable financial arc
- You can’t build a moat with AI. Differentiating AI is difficult, but the secret is in the unique data that is supplied into these models — not in the AI models themselves, which are becoming commodity-like. Take LLMs, for example. The performance of AI is strongly impacted by effective data engineering since applications need to integrate customer-specific data to respond accurately. Thus, rather than the AI technology itself, gaining a competitive edge in AI applications depends on creative data utilization.
- Towards 1-bit Machine Learning Models. Recent works on extreme low-bit quantization such as BitNet and 1.58 bit have attracted a lot of attention in the machine learning community. The main idea is that matrix multiplication with quantized weights can be implemented without multiplications, which can potentially be a game-changer in terms of compute efficiency of large machine learning models.
- From Idea to Integration: Four Steps for Founders Integrating AI. There is currently a great deal of pressure to incorporate AI into existing products. This brief, step-by-step guide will help you make the first move.
- Use game theory for climate models that really help reach net zero goals. Many countries and companies have committed to eliminating their greenhouse gas emissions by the middle of the century. Yet most of these pledges lack a clear policy pathway.
- A step along the path towards AlphaFold — 50 years ago. Paring down the astronomical complexity of the protein-folding problem
- The democratization of global AI governance and the role of tech companies. Can non-state multinational tech companies counteract the potential democratic deficit in the emerging global governance of AI? We argue that although they may strengthen core values of democracy such as accountability and transparency, they currently lack the right kind of authority to democratize global AI governance.
- The new NeuroAI. After several decades of developments in AI, has the inspiration that can be drawn from neuroscience been exhausted? Recent initiatives make the case for taking a fresh look at the intersection between the two fields.
- Connecting molecular properties with plain language. AI tools such as ChatGPT can provide responses to queries on any topic, but can such large language models accurately ‘write’ molecules as output to our specification? Results now show that models trained on general text can be tweaked with small amounts of chemical data to predict molecular properties, or to design molecules based on a target feature.
- MLOps vs. Eng: Misaligned Incentives and Failure to Launch? An in-depth discussion on the difficulties and solutions associated with implementing AI models in production, as well as how MLOps varies from traditional engineering, with industry experts. They talk about how to focus as a company to truly launch and why so few ML ideas ever reach production.
- Is Attention All You Need? In order to overcome Transformers’ shortcomings in long-context learning, generation, and inference speed, researchers are creating alternative designs that exhibit competitive quality at smaller scales but questionable scalability. Because of the quick development in this area, the Pareto frontier will likely keep growing, opening up more opportunities for lengthier context modeling and higher throughput inference, which will ultimately lead to a bigger variety of AI use cases.
- The Shifting Dynamics And Meta-Moats Of AI. Managing complex short-, mid-, and long-term dynamics while retaining elite speed and execution, owning more of the stack, obtaining unique data, and utilizing synthetic data production are all necessary for building a successful AI business. As the AI sector develops, businesses will need to adjust to changing labor dynamics, comprehend the machine they are creating, and recognize the competitive axes on which they are based in order to forge long-lasting moats and differentiate themselves from the crowd.
- Integration of AI in healthcare requires an interoperable digital data ecosystem. Electronic health information, including electronic health records, is needed to develop AI tools for health, but the seamless flow of data will require standards and interoperability.
- To do no harm — and the most good — with AI in health care. Drawing from real-life scenarios and insights shared at the RAISE (Responsible AI for Social and Ethical Healthcare) conference, we highlight the critical need for AI in health care (AIH) to primarily benefit patients and address current shortcomings in healthcare systems such as medical errors and access disparities.
- How to support the transition to AI-powered healthcare. To make health systems more sustainable in the long-term, incentivize artificial intelligence (AI) and digital technologies that are grounded on careful testing and real-world validation.
- The increasing potential and challenges of digital twins. This issue of Nature Computational Science includes a Focus that highlights recent advancements, challenges, and opportunities in the development and use of digital twins across different domains.
- The Space Of Possible Minds. Sophisticated AIs are stretching the boundaries of our understanding of what it is to be human and forcing us to consider how we embody agency and true understanding in a spectrum of intelligent beings. Creating mutually beneficial relationships between radically different entities, recognizing the similarities and differences among various forms of intelligence, and developing principled frameworks for scaling our moral concern to the essential qualities of being are all necessary to navigate this new terrain.
- CUDA is Still a Giant Moat for NVIDIA. NVIDIA’s proprietary interconnects and CUDA software environment, in addition to its hardware, continue to solidify the company’s leadership in the AI market. The ease of use and performance optimization of CUDA makes it superior to alternatives like AMD’s ROCM, guaranteeing that NVIDIA’s GPUs continue to be the go-to option for AI tasks. NVIDIA’s dominance in AI computing is strengthened by its investments in the CUDA ecosystem and community education.
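The "Towards 1-bit Machine Learning Models" piece above hinges on one trick: with weights restricted to {-1, 0, +1}, a dot product needs no multiplications at all, only additions and subtractions. A toy sketch of that idea (not BitNet's actual implementation, and the dead-zone threshold here is a common heuristic, not the paper's exact rule):

```python
import random

def ternary_quantize(row):
    """Quantize a weight row to {-1, 0, +1}: keep the sign, zero out
    weights smaller than half the mean absolute value."""
    threshold = 0.5 * sum(abs(w) for w in row) / len(row)
    return [0 if abs(w) < threshold else (1 if w > 0 else -1) for w in row]

def dot_no_multiplies(qrow, x):
    """Dot product with ternary weights using only adds and subtracts:
    +1 weights add the input element, -1 weights subtract it, 0s are skipped."""
    total = 0.0
    for q, xi in zip(qrow, x):
        if q == 1:
            total += xi
        elif q == -1:
            total -= xi
    return total

random.seed(0)
row = [random.gauss(0, 1) for _ in range(8)]
x = [random.gauss(0, 1) for _ in range(8)]
q = ternary_quantize(row)
# The addition-only result matches an ordinary multiply-accumulate
# over the same quantized weights.
reference = sum(qi * xi for qi, xi in zip(q, x))
assert abs(dot_no_multiplies(q, x) - reference) < 1e-12
```

Removing the multiplier from the inner loop is what makes extreme low-bit quantization a potential game-changer for compute efficiency.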
Medium articles
A list of the Medium articles I have read and found the most interesting this week:
- Steve Jones, OpenAI explains why GPT understands nothing, link
- Daniel Warfield, AGI is Not Possible, link
- Tim Cvetko, Build Your Own Liquid Neural Network with PyTorch, link
- Aaron 0928, OpenAI’s Sora is doomed to die?, link
- Enrique Dans, Students are going to use generative algorithms, so let’s make sure they do so properly, link
- Oluwafemidiakhoa, Designing the Future: How Algorithmic Innovation is Transforming Protein Design and Drug Development, link
- Moli Ma, AI Copyright, a Brief Case Study, link
- Gianpiero Andrenacci, PyRefactor: The Definitive Choice to Refactor Python Code, link
- Deepa Ramachandra, The Dichotomy With AI for Climate Change, link
Meme of the week
What do you think? Did any news capture your attention? Let me know in the comments
If you have found this interesting:
You can look for my other articles, and you can also connect with or reach me on LinkedIn. Check this repository containing weekly updated ML & AI news. I am open to collaborations and projects. You can also subscribe for free to get notified when I publish a new story.
Here is the link to my GitHub repository, where I am collecting code and many resources related to machine learning, artificial intelligence, and more.
or you may be interested in one of my recent articles: