WEEKLY AI NEWS: RESEARCH, NEWS, RESOURCES, AND PERSPECTIVES
AI & ML news: Week 24–30 June
OpenAI on an acquisition spree, Anthropic's new model, Amazon developing its own LLM, and much more
The most interesting news, repositories, articles, and resources of the week
Check and star this repository where the news will be collected and indexed:
You will find the news first on GitHub. Single posts are also collected here:
Research
- Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More? First presents a benchmark of real-world tasks requiring 1M-token context; performs a thorough performance analysis of long-context LLMs on in-context retrieval and reasoning; reports that long-context LLMs can compete with state-of-the-art retrieval and RAG systems without explicit training on these tasks; finds that compositional reasoning (needed in SQL-like tasks) remains challenging for these models; and encourages further research on advanced prompting strategies.
- PlanRAG: A Plan-then-Retrieval Augmented Generation for Generative Large Language Models as Decision Makers. Improves decision-making with the iterative plan-then-RAG (PlanRAG) technique, which consists of two steps: 1) an LM creates a plan for decision-making by reviewing the question and data schema, and 2) the retriever generates the queries for data analysis. A final phase then determines whether a new plan for further analysis is required, either repeating the earlier steps or making a decision based on the data. PlanRAG is found to outperform iterative RAG on the proposed Decision QA tasks (a minimal sketch of the loop appears after this list).
- Be like a Goldfish, Don’t Memorize! Mitigating Memorization in Generative LLMs. Introduces the goldfish loss, a modification of the next-token prediction objective that mitigates verbatim generation of memorized training data: a simple technique that excludes a pseudorandom subset of training tokens from the loss at training time (see the sketch after this list). The paper demonstrates that the goldfish loss resists memorization while keeping the model useful, though models may need to train longer to learn as effectively from the training data.
- Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMa-3 8B. Reports an approach that combines LLMs with Monte Carlo Tree Search to reach GPT-4-level performance on mathematical Olympiad problems. The approach improves mathematical reasoning by enabling systematic exploration, self-refinement, and self-evaluation.
- From RAGs to rich parameters: Probing how language models utilize external knowledge over parametric information for factual queries. Aims to better understand how LLMs use external knowledge in place of parametric information when responding to factual queries. It finds that in a RAG pipeline, LLMs take a “shortcut,” exhibiting a strong bias toward using the context information over their parametric memory to answer the question.
- Tree Search for Language Model Agents. Proposes an inference-time tree search technique that lets LM agents explore and perform multi-step reasoning (a best-first sketch appears after this list). Applied to GPT-4o and tested in interactive web environments, it dramatically improves performance, and the gains scale with increased test-time compute.
- Evidence of a log scaling law for political persuasion with large language models. “Superpersuasion” is the worry that models will become noticeably more persuasive as they get bigger. This study finds little evidence for it: persuasiveness appears to scale logarithmically, so larger models are not significantly more compelling than smaller ones, though they might still be tuned to be more convincing.
- MacroHFT: Memory Augmented Context-aware Reinforcement Learning On High Frequency Trading. Reinforcement learning is used in MacroHFT, a novel method of high-frequency trading (HFT) in cryptocurrency markets, to enhance profitability and decision-making.
- Soft-QMIX: Integrating Maximum Entropy For Monotonic Value Function Factorization. Researchers incorporate a local Q-value learning method within a maximum entropy framework to enhance QMIX, a popular multi-agent reinforcement learning technique.
- ReaL: Efficient RLHF Training for LLMs with Parameter Reallocation. ReaLHF is a novel method that optimizes parallelization during training and dynamically redistributes parameters to improve reinforcement learning from human feedback (RLHF).
- AlphaFold2 structures guide prospective ligand discovery. AlphaFold2 (AF2) models have had a wide impact but mixed success in retrospective ligand recognition. The authors prospectively docked large libraries against unrefined AF2 models of the σ2 and serotonin 2A (5-HT2A) receptors, testing hundreds of new molecules.
- GPTs are GPTs: Labor market impact potential of LLMs. Proposes a framework for evaluating the potential impacts of large language models (LLMs) and associated technologies on work by considering their relevance to the tasks workers perform in their jobs. When accounting for current and likely future software developments that complement LLM capabilities, the share of affected work jumps to just over 46% of jobs.
- Leveraging Passage Embeddings for Efficient Listwise Reranking with Large Language Models. PE-Rank is a novel passage-ranking method that compresses context by representing each passage as a single embedding, making listwise reranking with LLMs more efficient.
- MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression. By tailoring a distinct sparse attention configuration to each head and layer, the Mixture of Attention (MoA) method automatically compresses large language models (see the mask sketch after this list).
- GeoMFormer: A General Architecture for Geometric Molecular Representation Learning. A new Transformer-based model called GeoMFormer learns both equivariant and invariant properties to enhance molecular modeling.
- Making my local LLM voice assistant faster and more scalable with RAG. The author classified commands, precomputed embeddings, and dynamically generated examples to make an LLM voice assistant more efficient and scalable (a retrieval sketch appears after this list).
- Retrieval Augmented Instruction Tuning for Open NER with Large Language Models. Retrieval Augmented Instruction Tuning (RA-IT) improves information extraction with large language models.
- Data curation via joint example selection further accelerates multimodal learning. Actively choosing the next best batch during pre-training is a difficult and open problem. This research from DeepMind shows that jointly selecting batches and hard-mining negative samples can match SOTA on a variety of tasks while using only 10% of the FLOPs.
- Director3D: Real-world Camera Trajectory and 3D Scene Generation from Text. A system called Director3D was created to improve camera trajectory modeling and 3D scene production in the real world. Director3D creates lifelike 3D scenes from text descriptions by using a Multi-view Latent Diffusion Model and a Trajectory Diffusion Transformer.
- Prompt Engineering Tool. An excellent prompting toolset for evaluating the effectiveness of various prompts, written almost entirely by Sonnet 3.5.
- Meta Large Language Model Compiler: Foundation Models of Compiler Optimization. Meta has released two language models that can compile code to assembly and decompile it back to LLVM IR. Trained on 546 billion tokens of high-quality data and then further fine-tuned, they achieve 77% of the optimizing potential of an autotuning search and a 45% disassembly round-trip rate.
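For readers who want the shape of the PlanRAG loop described above, here is a minimal Python sketch. The `llm` and `retriever` callables, the prompt strings, and the `DECIDE` convention are illustrative assumptions rather than the paper's actual interface.

```python
# Minimal sketch of the PlanRAG plan/retrieve/re-plan loop (illustrative only).
# `llm(prompt) -> str` and `retriever(query) -> str` are callables you supply.

def plan_rag(question, schema, llm, retriever, max_rounds=5):
    # Planning step: the LM drafts an analysis plan from the question and schema.
    plan = llm(f"Question: {question}\nSchema: {schema}\n"
               "Write a step-by-step data-analysis plan.")
    evidence = []
    for _ in range(max_rounds):
        # Retrieval step: turn the current plan into concrete queries.
        queries = llm(f"Plan: {plan}\nList the queries needed, one per line.")
        evidence += [retriever(q) for q in queries.splitlines() if q.strip()]
        # Re-planning step: decide whether more analysis is needed.
        verdict = llm(f"Question: {question}\nEvidence: {evidence}\n"
                      "Reply DECIDE if the evidence suffices; otherwise write a revised plan.")
        if verdict.strip().startswith("DECIDE"):
            break
        plan = verdict  # iterate with the revised plan
    return llm(f"Question: {question}\nEvidence: {evidence}\nMake the final decision.")
```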
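The goldfish loss is simple enough to sketch in a few lines. This hedged version drops a pseudorandom ~1/k subset of tokens from the next-token loss; the paper's actual masking rule (for instance, hashing the local context to choose dropped tokens) may differ.

```python
import torch
import torch.nn.functional as F

def goldfish_loss(logits, labels, k=4, seed=0):
    # Standard causal-LM shift: positions < t predict token t.
    logits, labels = logits[:, :-1], labels[:, 1:]
    per_token = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        labels.reshape(-1),
        reduction="none",
    ).view(labels.shape)
    # Pseudorandom mask: exclude roughly one token in k from the loss, so the
    # model never receives supervision on the full verbatim sequence.
    g = torch.Generator().manual_seed(seed)
    keep = (torch.randint(0, k, labels.shape, generator=g) != 0).to(per_token.device)
    return (per_token * keep).sum() / keep.sum()
```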
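The tree-search paper above amounts to spending extra test-time compute exploring alternative action sequences. Here is a best-first variant under stated assumptions: `step`, `propose`, and `value` are placeholder callables, and the paper's exact algorithm differs in its details.

```python
import heapq

def tree_search(initial_state, step, propose, value, budget=50, branch=3):
    # Best-first search over LM-proposed actions: repeatedly expand the most
    # promising state, ask the LM for `branch` candidate actions, and keep
    # the best-scoring state seen within the compute budget.
    best_state, best_score = initial_state, value(initial_state)
    frontier = [(-best_score, 0, initial_state)]  # max-heap via negated scores
    pushes = 0
    for _ in range(budget):
        if not frontier:
            break
        _, _, state = heapq.heappop(frontier)
        for action in propose(state, branch):  # LM proposes candidate actions
            child = step(state, action)        # environment transition
            score = value(child)               # LM or learned evaluator
            if score > best_score:
                best_state, best_score = child, score
            pushes += 1                        # tiebreaker avoids comparing states
            heapq.heappush(frontier, (-score, pushes, child))
    return best_state
```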
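To make the MoA entry concrete, this sketch builds per-head causal attention masks with different local window sizes, the kind of heterogeneous sparsity MoA searches over automatically; the window sizes here are arbitrary illustrations, not configurations from the paper.

```python
import torch

def per_head_window_masks(seq_len, windows):
    # One boolean (seq, seq) mask per head: causal, and restricted to the
    # last `w` key positions. Different heads get different window sizes.
    i = torch.arange(seq_len).unsqueeze(1)  # query positions
    j = torch.arange(seq_len).unsqueeze(0)  # key positions
    causal = j <= i
    return torch.stack([causal & ((i - j) < w) for w in windows])

# e.g. three heads with growing local windows; result has shape (3, 8, 8)
masks = per_head_window_masks(8, windows=[2, 4, 8])
```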
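Finally, for the voice-assistant post above, the core trick of precomputing embeddings reduces each query to a single matrix product. A minimal sketch, assuming an `embed` function that returns a 1-D vector:

```python
import numpy as np

def build_index(examples, embed):
    # Embed the example pool once, offline, and L2-normalize the vectors.
    vecs = np.stack([embed(e) for e in examples])
    return examples, vecs / np.linalg.norm(vecs, axis=1, keepdims=True)

def retrieve(query, index, embed, k=3):
    examples, vecs = index
    q = embed(query)
    sims = vecs @ (q / np.linalg.norm(q))  # cosine similarity in one matmul
    return [examples[i] for i in np.argsort(-sims)[:k]]
```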
News
- Geologists raise concerns over possible censorship and bias in Chinese chatbot. GeoGPT was developed as part of a Chinese-funded earth sciences program aimed at researchers in the global south.
- OpenAI acquires Rockset. Rockset is a robust database that supports both indexing and querying. The startup was acquired by OpenAI in order to enhance its infrastructure for retrieval.
- Snapchat AI turns prompts into new lens. Snapchat’s upcoming on-device AI model could transform your background — and your clothing — in real time.
- HeyGen Raises $60M Series A to Scale Visual Storytelling for Businesses. HeyGen, an AI video-generating platform, has raised $60 million in Series A funding to improve its studio-quality video creation and localization capabilities quickly and affordably. HeyGen, which just generated $35 million in ARR, strives to democratize visual storytelling for companies of all sizes.
- AI candidate running for Parliament in the U.K. says AI can humanize politics. Voters can talk to AI Steve, whose name will be on the ballot for the U.K.’s general election next month, to ask policy questions or raise concerns.
- Anthropic has a fast new AI model — and a clever new way to interact with chatbots. Claude 3.5 Sonnet is apparently Anthropic’s smartest, fastest, and most personable model yet.
- AIs are coming for social networks. An app called Butterflies puts a new spin on how we interact with AI. With Meta and others making similar moves, social media is about to get a lot weirder.
- OpenAI walks back controversial stock sale policies, will treat current and former employees the same. OpenAI has changed its policies toward secondary share sales to allow current and former employees to participate equally in its annual tender offers, CNBC has learned. All current and former staffers “will have the same sales limit” and be able to participate at the same time, OpenAI said in documents shared with stakeholders.
- Report: Amazon developing AI chatbot that would compete with ChatGPT and others. Amazon is developing its own consumer-focused AI chatbot that would compete with OpenAI’s ChatGPT and could be revealed later this year, according to a report from Business Insider.
- Multi is joining OpenAI. OpenAI continues its acquisition spree, picking up more desktop-related infrastructure.
- Artificial Marketing Intelligence at your fingertips: MarTech startup Ability AI secures $1.1M pre-seed round funding to automate the process. Ability AI, a martech startup specializing in full-cycle paid marketing automation with the help of autonomous AI agents, announced today that it has raised $1.1 million in pre-seed funding from SMRK VC as a lead investor, with the participation of other funds and angels.
- Claude 3.5 suggests AI’s looming ubiquity could be a good thing. If you don’t like chatbots popping up everywhere, get ready to be peeved. But the latest version of Anthropic’s Claude shows AI is becoming more useful — and, crucially, more affordable.
- Apple was found in breach of EU competition rules. The European Commission finds the iPhone maker broke new laws designed to protect smaller competitors against big tech platforms.
- Etched is building an AI chip that only runs one type of model. Etched is among the many, many alternative chip companies vying for a seat at the table — but it’s also among the most intriguing.
- Stability AI Secures Significant New Investment. Stability AI obtained a “significant infusion of capital” from both new and existing investors, in addition to hiring a new CEO.
- Training a 70B model from scratch: open-source tools, evaluation datasets, and learnings. Earlier this year, we pre-trained and fine-tuned a 70B-parameter model that outperforms GPT-4o zero-shot on a range of reasoning and coding-related benchmarks and datasets. Our fine-tuned model, pre-trained on 2T tokens, roughly matches a fine-tuned Llama 3 70B, which was pre-trained on more than seven times as much data.
- OpenAI Pushes Back Voice Mode. The sophisticated Voice Mode that OpenAI showcased in its Spring Update will go live in alpha form in late July for a limited group of ChatGPT Plus subscribers.
- Meta’s AI translation model embraces overlooked languages. More than 7,000 languages are in use throughout the world, but popular translation tools cannot deal with most of them. A translation model that was tested on under-represented languages takes a key step towards a solution.
- Researchers fool university markers with AI-generated exam papers. A University of Reading project poses questions for the integrity of coursework and take-home student assignments.
- YouTube tries convincing record labels to license music for AI song generator. Video site needs labels’ content to legally train AI song generators.
- Evolutionary Scale Raises $142m Series A. A biology startup called Evolutionary Scale has come out of stealth with significant funding. It also announced the release of ESM3, its 98B-parameter foundation model, trained for 10²⁴ FLOPs on 771B biological tokens. Using the model, it found a new fluorescent green protein that does not occur in nature.
- Waymo One is now open to everyone in San Francisco. Anybody in San Francisco can now hail a driverless ride with Waymo One. After providing tens of thousands of trips per week, the company is expanding. Its all-electric fleet supports its sustainability goals and boosts the local economy, and Waymo claims its cars are far less likely to be involved in collisions than human-driven vehicles.
- ChatGPT on your desktop. Users can now download the ChatGPT desktop software for macOS.
- AI will be help rather than hindrance in hitting climate targets, Bill Gates says. The Microsoft co-founder says efficiencies for technology and electricity grids will outweigh the energy use of data centers.
- Snap Lens Studio 5.0. The GenAI suite, which Snap introduced with Lens Studio 5.0, is a fantastic development and a huge help for creating augmented reality apps.
- Instagram Launching An AI Studio. Instagram’s “AI Studio” enables creators to build AI chatbot versions of themselves. An early test is currently underway in the US.
- Dust raises $16m Series A. Dust, one of the first modern-day LLM chaining and agent companies, raised more money after surpassing $1 million in annual revenue.
- ElevenLabs launches iOS app that turns ‘any’ text into audio narration with AI. “ElevenLabs Reader: AI Audio,” the company’s debut iOS app, enables users to listen on the go by turning text files or web links into audio narration.
Resources
- Open-Sora 1.2 Report. A 1.1B-parameter model trained on over 30 million data points, this open-source video generation model can produce 16-second 720p videos. It also features an improved diffusion model and a video compression network that compresses both temporally and spatially, which lowers training costs and improves the controllability of generations.
- LLM101n: Let’s build a Storyteller. An outline for a new course that Andrej Karpathy is working on can be found in a new repository. It entails creating a narrative-capable aligned language model. Code, video lectures, and other learning resources are included in the course.
- AutoCodeRover: Autonomous Program Improvement. AutoCodeRover is a new technique that combines sophisticated code-search methods with large language models to automate software improvements such as feature additions and bug fixes.
- NLUX. NLUX is a React and JavaScript open-source library for building conversational AI interfaces. It makes it super simple to build web applications powered by Large Language Models (LLMs) and AI. With just a few lines of code, you can add conversational AI capabilities and interact with your favorite AI models.
- Claudette. Claudette is a higher-level and easier-to-use way to interact with Claude.
- Top CVPR 2024 papers. Computer Vision and Pattern Recognition is a massive conference. In 2024 alone, 11,532 papers were submitted and 2,719 were accepted. I created this repository to help you search for the crème de la crème of CVPR publications.
- TTS in 7000 Languages. Toucan recently published a collection of new text-to-speech models that now cover all languages in the ISO 639-3 standard.
- ParaLLM: 1300+ tok/s on a MacBook. Implementing batch-parallel KV caching in MLX significantly speeds up inference for synthetic data generation and batched model completions.
- Train vision models in TRL. TRL is a Hugging Face library for training transformers with reinforcement learning. This example shows how to apply the same procedure to vision-language models such as LLaVA.
- Rethinking Remote Sensing Change Detection With A Mask View. Two new models for remote sensing change detection — CDMask and CDMaskFormer — are presented in this study.
- llama.ttf. This project shows how a font file can be used to run a small Llama language model.
- june. June is a local voice chatbot that combines the power of Ollama (for language model capabilities), Hugging Face Transformers (for speech recognition), and the Coqui TTS Toolkit (for text-to-speech synthesis). It provides a flexible, privacy-focused solution for voice-assisted interactions on your local machine, ensuring that no data is sent to external servers.
- Building a personalized code assistant with open-source LLMs using RAG Fine-tuning. Together AI and Morph Labs collaborated on an excellent blog post about optimizing models for retrieval-augmented generation. They also demonstrate a few applications of generated data.
- EvalAlign: Evaluating Text-to-Image Models through Precision Alignment of Multimodal Large Models with Supervised Fine-Tuning to Human Annotations. A novel metric called EvalAlign was created to enhance the assessment of generative models that convert text to images. EvalAlign provides fine-grained accuracy and stability in contrast to current measures. It emphasizes text-image alignment and image faithfulness.
- Fine-tuning Florence-2 — Microsoft’s Cutting-edge Vision Language Models. Florence-2, released by Microsoft in June 2024, is a foundation vision-language model. It is very attractive because of its small size (0.2B and 0.7B) and strong performance on a variety of computer vision and vision-language tasks. Florence-2 supports many tasks out of the box: captioning, object detection, OCR, and more.
- Accelerating Neural Network Training with Semi-Structured (2:4) Sparsity. The PyTorch team has written specially designed kernels that exploit sparse tensor cores, which are typically used exclusively for inference, to accelerate training (a conversion sketch appears after this list).
- FreeTraj: Tuning-Free Trajectory Control in Video Diffusion Models. Diffusion models are used in FreeTraj, a tuning-free technique for controlling motion trajectories in video creation. To direct the generated content, it adjusts the attention mechanisms and noise sampling.
- OpenGlass — Open Source Smart Glasses. Turn any glasses into hackable smart glasses with less than $25 of off-the-shelf components. Record your life, remember people you meet, identify objects, translate text, and more.
- An Intuitive Explanation of Sparse Autoencoders for LLM Interpretability. Golden Gate Claude was a potent illustration of how SAEs can be used to steer and evaluate models. This post includes sample code for training these models and an easy-to-understand explanation of how they operate (a toy sketch appears after this list).
- RES-Q. A new benchmark called RES-Q is designed to evaluate how well large language models can modify code repositories from natural-language instructions.
- Balancing Old Tricks with New Feats: AI-Powered Conversion From Enzyme to React Testing Library at Slack. Using a hybrid method, Slack developers combined large language models with Abstract Syntax Tree transformations to automate the conversion of more than 15,000 unit tests from Enzyme to React Testing Library. The team used Anthropic’s Claude 2.1 together with DOM-tree capture for React components to achieve an 80% success rate in automatic conversions, part of ongoing efforts to improve developer productivity and stay ahead of the ever-changing frontend landscape.
- R2R. R2R was designed to bridge the gap between local LLM experimentation and scalable, production-ready Retrieval-Augmented Generation (RAG). R2R provides a comprehensive and SOTA RAG system for developers, built around a RESTful API for ease of use.
- Internist.ai 7b. Internist.ai 7b is a medical domain large language model trained by medical doctors to demonstrate the benefits of a physician-in-the-loop approach. The training data was carefully curated by medical doctors to ensure clinical relevance and required quality for clinical practice.
- Finding GPT-4’s mistakes with GPT-4. CriticGPT, a model based on GPT-4, writes critiques of ChatGPT responses to help human trainers spot mistakes during RLHF.
- ALPBench: A Benchmark for Active Learning Pipelines on Tabular Data. ALPBench standardizes the benchmarking of active learning pipelines on tabular data.
- Introducing AuraSR — An open reproduction of the GigaGAN Upscaler. FAL recently open-sourced AuraSR, a high-resolution image upscaler. It can upscale by 4x in a single forward pass, even with repeated applications, and it performs admirably on generated images.
- Point-SAM: Promptable 3D Segmentation Model for Point Clouds. Point-SAM, a transformer-based 3D segmentation model, has been introduced by researchers in response to the increasing demand for comprehensive 3D data.
- GenIR-Survey. This survey explores generative information retrieval (GenIR), a novel approach to information retrieval that shifts from conventional search techniques to ones that generate results dynamically.
- Gemma 2. Gemma is a family of lightweight, state-of-the-art open models from Google, built from the same research and technology used to create the Gemini models.
- MatText: Do Language Models Need More than Text & Scale for Materials Modeling? MatText is a collection of benchmarking tools and datasets intended to assess the effectiveness of language models in the field of materials science.
- mamba2. A quick implementation of Mamba 2.
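As a companion to the semi-structured (2:4) sparsity entry above, here is a minimal sketch of converting one linear layer with PyTorch's prototype API, assuming PyTorch ≥ 2.1 and an Ampere-or-newer GPU; the blog's full recipe also covers the training-side kernels.

```python
import torch
from torch.sparse import to_sparse_semi_structured  # prototype API, PyTorch >= 2.1

linear = torch.nn.Linear(4096, 4096, bias=False).half().cuda()

# Prune to a 2:4 pattern: zero the 2 smallest-magnitude weights in each group of 4.
w = linear.weight.detach().view(-1, 4)
drop = w.abs().argsort(dim=1)[:, :2]
mask = torch.ones_like(w).scatter_(1, drop, 0.0)

# Store the weight in the semi-structured format so matmuls hit sparse tensor cores.
linear.weight = torch.nn.Parameter(
    to_sparse_semi_structured((w * mask).view(4096, 4096))
)

x = torch.rand(64, 4096, dtype=torch.float16, device="cuda")
y = linear(x)  # dispatched to the accelerated sparse kernels
```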
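To accompany the sparse-autoencoder explainer above, here is a toy SAE in PyTorch: reconstruct a model's residual-stream activations through an overcomplete ReLU bottleneck, with an L1 penalty that encourages few active features. A sketch of the general recipe, not the post's exact code.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model, d_hidden, l1_coeff=1e-3):
        super().__init__()
        self.enc = nn.Linear(d_model, d_hidden)  # d_hidden >> d_model (overcomplete)
        self.dec = nn.Linear(d_hidden, d_model)
        self.l1_coeff = l1_coeff

    def forward(self, x):
        f = torch.relu(self.enc(x))            # sparse feature activations
        x_hat = self.dec(f)                    # reconstructed activation
        recon = (x - x_hat).pow(2).mean()      # fidelity term
        sparsity = f.abs().sum(dim=-1).mean()  # L1 term: few features fire at once
        return x_hat, recon + self.l1_coeff * sparsity
```

In practice you would train this on activations collected from one fixed layer of the LLM, then inspect which inputs light up each learned feature.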
Perspectives
- The Long View on AI. AI has the potential to cause tremendous growth rates and technological improvements, according to historical statistics. Society will probably be able to adjust to these rapid changes just as it has in the past.
- AI’s Hidden Opportunities: Shawn “swyx” Wang on New Use Cases and Career. Well-known developer Shawn “swyx” Wang discusses the untapped potential for conventional software professionals wishing to move into artificial intelligence, examining in particular how to enhance existing tools, apply AI to summarization, and more.
- Apple Intelligence. Rather than developing stand-alone AI products, Apple has incorporated generative AI into its core apps, improving services like Mail classification, Safari summaries, and Siri’s functionality. This demonstrates the company’s focus on user control and privacy.
- Apple intelligence and AI maximalism. Apple has shown a bunch of cool ideas for generative AI, but much more, it is pointing to most of the big questions and proposing a different answer — that LLMs are commodity infrastructure, not platforms or products.
- How To Solve LLM Hallucinations. Lamini has created Memory Tuning, which effectively embeds particular facts into models without sacrificing general knowledge and reduces hallucinations by 95%.
- AI machine translation tools must be taught cultural differences too. But to successfully preserve or revitalize minority languages, the scope of large-language-model (LLM) training needs to be broadened.
- Misinformation might sway elections — but not in the way that you think. Rampant deepfakes and false news are often blamed for swaying votes. Research suggests it’s hard to change people’s political opinions, but easier to nudge their behaviour.
- How I’m using AI tools to help universities maximize research impacts. Artificial intelligence algorithms could identify scientists who need support with translating their work into real-world applications and more. Leaders must step up.
- The Future of LLM-Based Agents: Making the Boxes Bigger. This post discusses two essential strategies that help move agents from the playground into the real world: long-term planning and system-level resilience. These introduce higher-level plans for agents, allowing mid-episode adaptability, along with systems techniques for intelligently orchestrating models, resulting in increased performance and accuracy.
- Apple, Microsoft Shrink AI Models to Improve Them. Large language models are becoming less dominant as tech companies shift their focus to more efficient small language models (SLMs). Apple and Microsoft have introduced models with far fewer parameters that nonetheless perform comparably, or even better, on benchmarks. According to OpenAI’s CEO, we’re past the era of ever-larger models; SLMs offer benefits including greater accessibility for smaller entities, on-device operation, and potential insights into human language acquisition. Even though SLMs are narrower in scope, their performance is enhanced by training on high-quality, “textbook-quality” data.
- Are Tech-Enabled Vertical Roll-Ups the Future or the Past? The ability to generate excess cash flows through operational efficiencies is a prerequisite for roll-up strategies, and the development of AI may offer a new lever that fully unlocks them. Are roll-ups for SMBs and verticals the future? This post presents two different perspectives on the question.
Meme of the week
What do you think about it? Did any news capture your attention? Let me know in the comments.
If you have found this interesting:
You can look for my other articles, and you can also connect with or reach me on LinkedIn. Check this repository containing weekly updated ML & AI news. I am open to collaborations and projects, and you can also subscribe for free to get notified when I publish a new story.
Here is the link to my GitHub repository, where I am collecting code and many resources related to machine learning, artificial intelligence, and more.
or you may be interested in one of my recent articles: