WEEKLY AI NEWS: RESEARCH, NEWS, RESOURCES, AND PERSPECTIVES

AI & ML news: Week 15-21 April

Meta LLaMA 3 is here, Adobe is working on generative AI video, the Humane Ai Pin is a failure, and much more

Salvatore Raieli


Photo by Denny Müller on Unsplash

The most interesting news, repositories, articles, and resources of the week

Check and star this repository where the news will be collected and indexed:

You will find the news first on GitHub. Single posts are also collected here:

Weekly AI and ML news - each week the best of the field


Research

https://arxiv.org/pdf/2404.07794v1.pdf
  • Taming Stable Diffusion for Text to 360° Panorama Image Generation. This project presents PanFusion, a dual-branch diffusion model that generates 360-degree panoramic images from text prompts. To minimize visual distortion, the method pairs the standard Stable Diffusion branch with a dedicated panoramic branch, further refined by a custom cross-attention mechanism.
  • The Physics of Language Models. Scaling laws describe the relationship between the size of language models and their capabilities. Unlike prior studies that evaluate a model’s capability via loss or benchmarks, this work estimates the number of bits of knowledge a model stores.
https://x.ai/blog/grok-1.5v
  • The Influence Between NLP and Other Fields. This study attempts to measure how much influence NLP has on 23 other fields of study. NLP’s cross-field engagement has fallen from 0.58 in 1980 to 0.31 in 2022, and CS dominates NLP citations, accounting for over 80% of them, with a focus on information retrieval, AI, and ML. Overall, NLP is becoming more insular, with a rise in intra-field citations and a decline in multidisciplinary work.
  • EventEgo3D: 3D Human Motion Capture from Egocentric Event Streams. Researchers present a technique that uses a fisheye event camera to tackle monocular egocentric 3D human motion capture, particularly under challenging lighting and rapid motion.
  • MPPE-DST: Mixture of Prefix Prompt Experts for LLM in Zero-Shot Dialogue State Tracking. Researchers propose Mixture of Prefix Prompt Experts (MPPE), a novel approach to zero-shot dialogue state tracking that transfers knowledge to new domains without requiring additional dataset annotations.
https://arxiv.org/pdf/2404.07981v1.pdf
https://babylm.github.io/
  • Compression Represents Intelligence Linearly. The idea of compressing a training corpus into a model underlies most contemporary AI: the better the compression, the better the model. This research demonstrates that relationship thoroughly, establishing a strong linear correlation between benchmark scores and a model’s ability to compress novel text (a minimal sketch of such a compression metric appears just below).
  • TransformerFAM: Feedback attention is working memory. TransformerFAM’s feedback mechanism lets a Transformer attend to its own latent representations. In theory, this recurrence could allow the model to process extremely long inputs in context (a toy sketch appears just below).
  • Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length. Another long-context paper, this one proposing a new architecture built on two novel weight-update techniques. Trained on the same 2T-token budget, it outperforms Llama 2, and at inference time it scales to unlimited context length.
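To make the compression metric concrete: a model can be scored by the bits per byte it assigns to unseen text (lower means better compression). A minimal sketch, assuming a Hugging Face causal LM; the model id and sample text are placeholders, not the paper’s exact setup.

```python
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def bits_per_byte(model_name: str, text: str) -> float:
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        # mean cross-entropy in nats over the (len - 1) predicted tokens
        loss = model(ids, labels=ids).loss.item()
    total_nats = loss * (ids.shape[1] - 1)
    return total_nats / math.log(2) / len(text.encode("utf-8"))

# Placeholder model and text; the paper evaluates on large held-out corpora.
print(bits_per_byte("gpt2", "The quick brown fox jumps over the lazy dog."))
```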
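And a toy sketch of the feedback-attention idea: a fixed-size memory is attended to alongside each segment, and its updated state is fed back when processing the next segment. This is a simplified illustration of the mechanism, not the paper’s exact architecture.

```python
import torch
import torch.nn as nn

class FeedbackAttention(nn.Module):
    def __init__(self, dim=64, heads=4, mem_len=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.init_mem = nn.Parameter(torch.zeros(1, mem_len, dim))  # learned initial memory

    def forward(self, segments):
        """segments: list of (1, seg_len, dim) chunks of one long input."""
        mem, outputs = self.init_mem, []
        for seg in segments:
            x = torch.cat([mem, seg], dim=1)   # memory acts as extra context for the segment
            y, _ = self.attn(x, x, x)
            mem = y[:, : mem.shape[1]]         # feedback: updated memory for the next segment
            outputs.append(y[:, mem.shape[1]:])
        return torch.cat(outputs, dim=1)

layer = FeedbackAttention()
segs = [torch.randn(1, 16, 64) for _ in range(4)]  # 64 tokens split into 4 segments
print(layer(segs).shape)                           # torch.Size([1, 64, 64])
```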
https://littlepure2333.github.io/MindBridge/
https://openai.com/blog/introducing-openai-japan
https://chengzhag.github.io/publication/panfusion/
  • LINGO-2: Driving with Natural Language. This blog introduces LINGO-2, a driving model that links vision, language, and action to explain and determine driving behavior, opening up a new dimension of control and customization for an autonomous driving experience. LINGO-2 is the first closed-loop vision-language-action driving model (VLAM) tested on public roads.
  • Towards a general-purpose foundation model for computational pathology. We introduce UNI, a general-purpose self-supervised model for pathology, pre-trained using more than 100 million images from over 100,000 diagnostic H&E-stained WSIs (>77 TB of data) across 20 major tissue types.
https://aclanthology.org/2023.emnlp-main.797.pdf

News

https://github.com/ttw1018/mope-dst
  • BabyLM Challenge. The goal of this shared task is to incentivize researchers with an interest in pretraining or cognitive modeling to focus their efforts on optimizing pretraining given data limitations inspired by human development. Additionally, we hope to democratize research on pretraining — which is typically thought to be practical only for large industry groups — by drawing attention to open problems that can be addressed on a university budget.
  • Dr. Andrew Ng was appointed to Amazon’s Board of Directors. Dr. Andrew Ng, currently the Managing General Partner of AI Fund, is joining Amazon’s Board of Directors.
  • Creating sexually explicit deepfake images to be made an offence in the UK. Under a proposed amendment to the criminal justice bill, offenders could face jail if the image is widely shared.
https://github.com/ivan-tang-3d/any2point
  • Humane’s Ai Pin Isn’t Ready to Replace Your Phone, But One Day It Might. Humane’s AI-powered wearable suffers from numerous technical problems, ranging from AI-assistant glitches to music-streaming issues. Future software updates are promised, but the first-generation device lacks crucial functions and shows performance gaps despite its goal of delivering an ambient computing experience. Positioned as a companion for a more present, less screen-focused lifestyle, it still struggles to replace a conventional smartphone, meticulous design notwithstanding.
  • TikTok may add AI avatars that can make ads. The new feature will let advertisers and TikTok Shop sellers generate scripts for a virtual influencer to read.
https://www.theverge.com/2024/4/17/24132254/google-maps-ev-charging-directions-ai-summaries
https://arxiv.org/pdf/2404.07143.pdf
https://ai.meta.com/blog/meta-llama-3/
https://arxiv.org/pdf/2404.09937.pdf
  • OpenAI winds down AI image generator that blew minds and forged friendships in 2022. When OpenAI’s DALL-E 2 debuted on April 6, 2022, the idea that a computer could create relatively photorealistic images on demand based on just text descriptions caught a lot of people off guard. The launch began an innovative and tumultuous period in AI history, marked by a sense of wonder and a polarizing ethical debate that reverberates in the AI space to this day. Last week, OpenAI turned off the ability for new customers to purchase generation credits for the web version of DALL-E 2, effectively killing it.
  • Stability AI lays off roughly 10 percent of its workforce. Stability AI laid off 20 employees just a day after announcing the expansion of access to its new flagship model. This comes after weeks of upheaval that saw its founding CEO leave the company.
  • The Humane AI Pin is lost in translation. Though the Humane AI Pin has a lot of drawbacks, its translation feature might be the worst.

Resources

https://arxiv.org/pdf/2404.09173.pdf
  • Diffusion Models for Video Generation. This article surveys how video is generated with diffusion: adapting image models, training diffusion models to produce video directly, and even producing video from an image model without further training (a toy sketch of one common adaptation appears after this group of items).
  • Pile-T5. T5 is a workhorse of contemporary AI. EleutherAI retrained it with a more recent tokenizer and a longer training run, yielding a significantly stronger base model for encoding tasks.
  • GitHub Repository to File Converter. This Python script downloads and processes files from a GitHub repository, making it easier to share code with chatbots that have large context windows but don’t automatically fetch code from GitHub (a rough sketch of the idea appears after this group of items).
  • AI Index Report. The 2024 Index is our most comprehensive to date and arrives at an important moment when AI’s influence on society has never been more pronounced. This year, we have broadened our scope to more extensively cover essential trends such as technical advancements in AI, public perceptions of the technology, and the geopolitical dynamics surrounding its development.
  • Accelerating AI: Harnessing Intel(R) Gaudi(R) 3 with Ray 2.10. Ray 2.10, the latest release from Anyscale, now supports Intel Gaudi 3. Developers can spin up and manage their own Ray Clusters, provision Ray Core Tasks and Actors on a Gaudi fleet directly through the Ray Core APIs, serve models on Gaudi via the Ray Serve APIs, and set up Intel Gaudi accelerator infrastructure at the Ray Train layer.
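To make the video-diffusion adaptation concrete, here is a toy sketch of the factorized spatio-temporal attention pattern many video models use: the image model’s spatial attention runs within each frame, and a newly added temporal attention runs across frames. Dimensions and module names are illustrative, not taken from any specific model.

```python
import torch
import torch.nn as nn

class SpatioTemporalAttention(nn.Module):
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.spatial = nn.MultiheadAttention(dim, heads, batch_first=True)   # per frame
        self.temporal = nn.MultiheadAttention(dim, heads, batch_first=True)  # across frames

    def forward(self, x):                        # x: (batch, frames, tokens, dim)
        b, f, t, d = x.shape
        s = x.reshape(b * f, t, d)               # attend over tokens within each frame
        s = self.spatial(s, s, s)[0].reshape(b, f, t, d)
        s = s.transpose(1, 2).reshape(b * t, f, d)   # attend over frames per token
        s = self.temporal(s, s, s)[0]
        return s.reshape(b, t, f, d).transpose(1, 2)

x = torch.randn(2, 8, 16, 64)                    # 8 frames of 16 tokens each
print(SpatioTemporalAttention()(x).shape)        # torch.Size([2, 8, 16, 64])
```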
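And a rough sketch of what a repo-to-single-file converter does; the function below is my own illustration, not the linked project’s code: download a repository archive and concatenate its text files into one chatbot-friendly document.

```python
import io
import urllib.request
import zipfile

def repo_to_text(owner: str, repo: str, branch: str = "main") -> str:
    """Fetch a GitHub repo as a zip and merge its text files into one string."""
    url = f"https://github.com/{owner}/{repo}/archive/refs/heads/{branch}.zip"
    data = urllib.request.urlopen(url).read()
    parts = []
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for name in zf.namelist():
            if name.endswith((".py", ".md", ".txt", ".toml")):  # text files only
                body = zf.read(name).decode("utf-8", errors="replace")
                parts.append(f"===== {name} =====\n{body}")
    return "\n\n".join(parts)

print(repo_to_text("karpathy", "minGPT", branch="master")[:500])
```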
https://github.com/cloneofsimo/d3pm
  • Code with CodeQwen1.5. Dominant coding assistants like GitHub Copilot, built on proprietary LLMs, pose notable challenges in terms of cost, privacy, security, and potential copyright infringement. CodeQwen1.5-7B, a new member of the Qwen1.5 open-source family, is a specialized code LLM built on the Qwen1.5 language model. It has been pre-trained on around 3 trillion tokens of code-related data, supports an extensive repertoire of 92 programming languages, and exhibits exceptional long-context understanding and generation, processing up to 64K tokens.
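A hedged usage sketch for trying CodeQwen1.5 with the transformers library; the Hub repo id below is an assumption to verify against the release notes.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/CodeQwen1.5-7B-Chat"  # assumed Hugging Face Hub id
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "Write a Python quicksort."}]
prompt = tok.apply_chat_template(messages, tokenize=False,
                                 add_generation_prompt=True)
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=256)
# Decode only the newly generated tokens, not the prompt.
print(tok.decode(out[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```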
https://github.com/stanford-oval/storm
  • OLMo 1.7-7B: A 24-point improvement on MMLU. Today, we’ve released an updated version of our 7 billion parameter Open Language Model, OLMo 1.7-7B. This model scores 52 on MMLU, sitting above Llama 2-7B and approaching Llama 2-13B, and outperforms Llama 2-13B on GSM8K.
  • Effort. The Effort library lets you adjust, in real time, how many calculations are performed while inferring an LLM, which can significantly increase speed while maintaining a high level of quality. Initial findings indicate that it can greatly accelerate LLM inference while preserving quality, even with modest implementation overhead. The author invites others to test the 0.0.1B version and offer feedback (an illustrative sketch of the compute-quality trade-off appears below).
  • luminal. Luminal is a deep-learning library that uses composable compilers to achieve high performance.
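To build intuition for the compute-quality dial that Effort exposes, here is an illustrative numpy sketch, not the library’s actual algorithm (which is more sophisticated): perform only the fraction of multiplications matched to the largest-magnitude activations.

```python
import numpy as np

def approx_matvec(W: np.ndarray, x: np.ndarray, effort: float = 0.5) -> np.ndarray:
    k = max(1, int(effort * x.size))   # fraction of columns actually multiplied
    idx = np.argsort(np.abs(x))[-k:]   # keep the largest-magnitude activations
    return W[:, idx] @ x[idx]

rng = np.random.default_rng(0)
W, x = rng.normal(size=(256, 1024)), rng.normal(size=1024)
exact = W @ x
for effort in (1.0, 0.5, 0.25):
    approx = approx_matvec(W, x, effort)
    err = np.linalg.norm(approx - exact) / np.linalg.norm(exact)
    print(f"effort={effort:.2f}  relative error={err:.3f}")
```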
https://huggingface.co/Tensoic/Cerule-v0.1
https://github.com/congvvc/lasagna
  • Sentence Embeddings. An introduction to sentence embeddings. This series aims to demystify embeddings and show you how to use them in your projects. The first post covers how to use and scale up open-source embedding models: the criteria for picking an existing model, current evaluation methods, and the state of the ecosystem.
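As a taste of the workflow the series covers, a minimal example with the sentence-transformers library and a small off-the-shelf model:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")   # small open-source embedding model
sentences = ["How do I bake bread?",
             "Bread baking instructions",
             "The stock market fell today."]
emb = model.encode(sentences, normalize_embeddings=True)
# Cosine similarity matrix: the two bread sentences should score highest.
print(util.cos_sim(emb, emb))
```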

Perspectives

  • Does AI need a “body” to become truly intelligent? Meta researchers think so. AIs that can generate videos, quickly translate languages, or write new computer code could be world-changing, but can they ever be truly intelligent? Not according to the embodiment hypothesis, which argues that human-level intelligence can only emerge if intelligence is able to sense and navigate a physical environment, the same way babies can.
https://www.nature.com/articles/s42256-024-00800-2
https://lilianweng.github.io/posts/2024-04-12-diffusion-video/
  • Lethal dust storms blanket Asia every spring — now AI could help predict them. As the annual phenomenon once again strikes East Asia, scientists are hard at work to better predict how it will affect people.
  • From boom to burst, the AI bubble is only heading in one direction. No one should be surprised that artificial intelligence is following a well-worn and entirely predictable financial arc.
  • You can’t build a moat with AI. Differentiating with AI is difficult, and the secret lies in the unique data fed into these models, not in the models themselves, which are becoming commodities. With LLMs, for example, performance depends heavily on effective data engineering, since applications must integrate customer-specific data to respond accurately. A competitive edge in AI applications therefore comes from creative data utilization rather than from the AI technology itself.
https://mistral.ai/news/mixtral-8x22b/
  • Towards 1-bit Machine Learning Models. Recent work on extreme low-bit quantization, such as BitNet and 1.58-bit models, has attracted a lot of attention in the machine learning community. The main idea is that matrix multiplication with quantized weights can be implemented without multiplications, which could be a game-changer for the compute efficiency of large machine learning models (a minimal illustration appears below).
  • From Idea to Integration: Four Steps for Founders Integrating AI. There is currently a strong push to incorporate AI into existing products. This brief, step-by-step guide will help you make the first move.
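A minimal illustration of why extreme quantization removes multiplications: with weights restricted to {-1, 0, +1} (the "1.58-bit" case), a matrix-vector product reduces to additions and subtractions.

```python
import numpy as np

def ternary_matvec(W: np.ndarray, x: np.ndarray) -> np.ndarray:
    # For each output, add the inputs where w = +1 and subtract where w = -1.
    return np.array([x[row == 1].sum() - x[row == -1].sum() for row in W])

rng = np.random.default_rng(0)
W = rng.integers(-1, 2, size=(4, 8))              # ternary weights in {-1, 0, +1}
x = rng.normal(size=8)
print(np.allclose(ternary_matvec(W, x), W @ x))   # True: same result, no multiplies
```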
https://hai.stanford.edu/research/ai-index-report
https://arxiv.org/pdf/2404.10301.pdf
  • The new NeuroAI. After several decades of developments in AI, has the inspiration that can be drawn from neuroscience been exhausted? Recent initiatives make the case for taking a fresh look at the intersection between the two fields.
  • Connecting molecular properties with plain language. AI tools such as ChatGPT can provide responses to queries on any topic, but can such large language models accurately ‘write’ molecules as output to our specification? Results now show that models trained on general text can be tweaked with small amounts of chemical data to predict molecular properties, or to design molecules based on a target feature.
  • MLOps vs. Eng: Misaligned Incentives and Failure to Launch? An in-depth discussion with industry experts on the difficulties of getting AI models into production, the possible solutions, and how MLOps differs from traditional engineering. They talk about how a company should focus to truly launch, and why so few ML ideas ever reach production.
https://qwenlm.github.io/blog/codeqwen1.5/
  • Is Attention All You Need? To overcome Transformers’ shortcomings in long-context learning, generation, and inference speed, researchers are creating alternative architectures that exhibit competitive quality at smaller scales but unproven scalability. Given the rapid progress in this area, the Pareto frontier will likely keep expanding, opening up longer-context modeling and higher-throughput inference, and ultimately a wider variety of AI use cases.
  • The Shifting Dynamics And Meta-Moats Of AI. Managing complex short-, mid-, and long-term dynamics while retaining elite speed and execution, owning more of the stack, obtaining unique data, and utilizing synthetic data production are all necessary for building a successful AI business. As the AI sector develops, businesses will need to adjust to changing labor dynamics, comprehend the machine they are creating, and recognize the competitive axes on which they are based in order to forge long-lasting moats and differentiate themselves from the crowd.
  • Integration of AI in healthcare requires an interoperable digital data ecosystem. Electronic health information, including electronic health records, is needed to develop AI tools for health, but the seamless flow of data will require standards and interoperability.
https://github.com/wangyuchi369/ladic
  • To do no harm — and the most good — with AI in health care. Drawing from real-life scenarios and insights shared at the RAISE (Responsible AI for Social and Ethical Healthcare) conference, we highlight the critical need for AI in health care (AIH) to primarily benefit patients and address current shortcomings in healthcare systems such as medical errors and access disparities.
  • How to support the transition to AI-powered healthcare. To make health systems more sustainable in the long term, incentivize artificial intelligence (AI) and digital technologies that are grounded in careful testing and real-world validation.
  • The increasing potential and challenges of digital twins. This issue of Nature Computational Science includes a Focus that highlights recent advancements, challenges, and opportunities in the development and use of digital twins across different domains.
https://arxiv.org/pdf/2404.11225v1.pdf
  • The Space Of Possible Minds. Sophisticated AIs are stretching the boundaries of our understanding of what it is to be human and forcing us to consider how we embody agency and true understanding in a spectrum of intelligent beings. Creating mutually beneficial relationships between radically different entities, recognizing the similarities and differences among various forms of intelligence, and developing principled frameworks for scaling our moral concern to the essential qualities of being are all necessary to navigate this new terrain.
  • CUDA is Still a Giant Moat for NVIDIA. NVIDIA’s proprietary interconnects and CUDA software environment, in addition to its hardware, continue to solidify the company’s leadership in the AI market. CUDA’s ease of use and performance optimization make it superior to alternatives like AMD’s ROCm, ensuring that NVIDIA’s GPUs remain the go-to option for AI workloads. NVIDIA’s dominance in AI computing is reinforced by its investments in the CUDA ecosystem and community education.

Medium articles

A list of the Medium articles I have read and found the most interesting this week:

Meme of the week

What do you think about it? Did any news capture your attention? Let me know in the comments.

If you have found this interesting:

You can look for my other articles or connect with me on LinkedIn, where I am open to collaborations and projects. Check this repository containing weekly updated ML & AI news, and subscribe for free to get notified when I publish a new story.

Here is the link to my GitHub repository, where I am collecting code and many resources related to machine learning, artificial intelligence, and more.

Or you may be interested in one of my recent articles:


Salvatore Raieli

Senior data scientist | about science, machine learning, and AI. Top writer in Artificial Intelligence