WEEKLY AI NEWS: RESEARCH, NEWS, RESOURCES, AND PERSPECTIVES
AI & ML news: Week 15–21 July
New models from OpenAI and Mistral, Andrej Karpathy’s new company, and much more
The most interesting news, repositories, articles, and resources of the week
Check and star this repository where the news will be collected and indexed:
You will find the news first on GitHub. Single posts are also collected here:
Research
- RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs. Demonstrates how Llama3-RankRAG significantly outperforms Llama3-ChatQA-1.5 and GPT-4 models on nine knowledge-intensive benchmarks. It also introduces a new instruction fine-tuning framework that performs effective context ranking and answer generation to enhance an LLM’s RAG capabilities. The framework makes use of a small ranking dataset to outperform existing expert ranking models.
- Mixture of A Million Experts. Aims to decouple computational cost from parameter count by efficiently routing to a large number of tiny experts through a learned index structure. Introduces a parameter-efficient expert retrieval mechanism that uses the product-key technique for sparse retrieval from a million tiny experts, and shows superior efficiency compared to dense FFW layers, coarse-grained MoEs, and Product Key Memory (PKM) layers. (A minimal sketch of product-key retrieval follows this list.)
- Reasoning in Large Language Models: A Geometric Perspective. Investigates the reasoning of LLMs from a geometric perspective; establishes a relationship between the expressive power of LLMs and the density of their self-attention graphs. The analysis shows that the density of these graphs defines the intrinsic dimension of the inputs to the MLP blocks, and reports that a higher intrinsic dimension implies greater expressive capacity of the LLM.
- Lookback Lens: Detecting and Mitigating Contextual Hallucinations in Large Language Models Using Only Attention Maps. Presents a novel approach that both detects and reduces contextual hallucinations in LLMs (e.g., by 10% on the XSum summarization task). It builds a hallucination detection model whose input features are, for each attention head, the ratio of attention weights on the context versus the newly generated tokens; the underlying intuition is that contextual hallucinations relate to how much an LLM attends to the provided context. The authors also propose a decoding strategy that mitigates contextual hallucinations based on their detection method, and it can be applied to other models without retraining. (A minimal sketch of the lookback ratio follows this list.)
- RouteLLM. Proposes router models that dynamically choose between a stronger and a weaker LLM during inference to balance cost and performance. The training framework uses human preference data and data augmentation techniques, reducing costs by more than two times in some cases while maintaining response quality.
- Learning to (Learn at Test Time): RNNs with Expressive Hidden States. Proposes new sequence-modeling layers with linear complexity and an expressive hidden state, where the hidden state is itself an ML model that is updated even at test time. A two-layer MLP-based hidden state and a linear-model hidden state are found to match or outperform baselines such as Mamba, Transformers, and contemporary RNNs; the linear variant is faster than Mamba in wall-clock time and matches the Transformer at 8k context.
- Physicochemical graph neural network for learning protein-ligand interaction fingerprints from sequence data. Predicting the binding affinity between small-molecule ligands and proteins is a key task in drug discovery; however, sequence-based methods are often less accurate than structure-based ones. Koh et al. develop a graph neural network using physicochemical constraints that discovers interactions between small molecules and proteins directly from sequence data and that can achieve state-of-the-art performance without the need for costly, experimental 3D structures.
- Generic protein-ligand interaction scoring by integrating physical prior knowledge and data augmentation modeling. Machine learning can improve scoring methods to evaluate protein-ligand interactions, but achieving good generalization is an outstanding challenge. Cao et al. introduce EquiScore, which is based on a graph neural network that integrates physical knowledge and is shown to have robust capabilities when applied to unseen protein targets.
- MARS: Mixture of Auto-Regressive Models for Fine-grained Text-to-image Synthesis. MARS is a novel text-to-image (T2I) generation framework built around a Semantic Vision-Language Integration Expert (SemVIE).
- OpenDiLoCo. Prime Intellect has reproduced DeepMind’s Distributed Low-Communication (DiLoCo) training technique. It keeps GPUs well utilized while enabling training across data centers.
- gpu.cpp. Answer AI has launched a new lightweight, portable library for low-level GPU computation based on WebGPU. It makes it possible to write kernels that run across GPUs, with portable instructions.
- ViTime: A Visual Intelligence-based Foundation Model for Time Series Forecasting. ViTime is a foundation model for time series forecasting (TSF) that relies on visual intelligence rather than conventional numerical data fitting.
- Gradient Boosting Reinforcement Learning. Gradient-Boosting RL (GBRL) brings the benefits of gradient-boosted trees (GBT) to reinforcement learning.
- SpreadsheetLLM: Encoding Spreadsheets for Large Language Models. An excellent study of how to convert a spreadsheet into a representation suitable for a contemporary LLM, enabling Q/A, formatting, and other data operations. (A simplified encoding sketch follows this list.)
- LAPT: Label-driven Automated Prompt Tuning for OOD Detection with Vision-Language Models. Label-driven Automated Prompt Tuning (LAPT) is a novel technique for out-of-distribution (OOD) detection in vision-language models such as CLIP.
- Prover-Verifier Games improve legibility of language model outputs. OpenAI trained a strong model to produce text legible enough for a weak model to grade reliably, and found that this improved the readability of the strong model’s outputs overall.
- Temporally Consistent Stereo Matching. Researchers present a novel technique for video stereo matching that improves depth estimation by enforcing temporal consistency.
- Patch-Level Training for Large Language Models. Researchers propose patch-level training to increase training efficiency for large language models. (A minimal sketch follows this list.)
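As a companion to the “Mixture of A Million Experts” item above, here is a minimal sketch of product-key expert retrieval. Every dimension and name below is an illustrative assumption rather than the paper’s code; the point is that scoring roughly a million experts only requires two top-k searches over 1,024 sub-keys each.

```python
import torch

# Illustrative dimensions only: n_sub**2 ≈ 1M experts.
d, n_sub, k = 256, 1024, 8
sub_keys1 = torch.randn(n_sub, d // 2)
sub_keys2 = torch.randn(n_sub, d // 2)

def retrieve_experts(query):
    """Score ~1M experts with two top-k searches over n_sub sub-keys each."""
    q1, q2 = query[: d // 2], query[d // 2 :]
    s1, i1 = (sub_keys1 @ q1).topk(k)   # best matches in sub-key set 1
    s2, i2 = (sub_keys2 @ q2).topk(k)   # best matches in sub-key set 2
    # Expert (a, b) scores s1[a] + s2[b]; only k*k candidate pairs to check.
    pair_scores = (s1[:, None] + s2[None, :]).flatten()
    scores, flat = pair_scores.topk(k)
    expert_ids = i1[flat // k] * n_sub + i2[flat % k]
    return expert_ids, torch.softmax(scores, dim=-1)

ids, weights = retrieve_experts(torch.randn(d))
print(ids, weights)  # indices of the selected tiny experts + routing weights
```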
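Next, a minimal sketch of the lookback-ratio feature behind the hallucination-detection item above: for each attention head, the fraction of attention mass that newly generated tokens place on the context. The shapes and the toy tensor are assumptions for illustration, not the authors’ implementation.

```python
import torch

def lookback_ratios(attn, n_context):
    """attn: (layers, heads, new_tokens, seq_len) attention weights for the
    newly generated tokens. Returns one feature per (layer, head): the mean
    fraction of attention mass placed on the context."""
    on_context = attn[..., :n_context].sum(-1)   # mass on the prompt/context
    on_new = attn[..., n_context:].sum(-1)       # mass on generated tokens
    ratio = on_context / (on_context + on_new + 1e-9)
    return ratio.mean(-1)                        # average over new positions

# Toy tensor: 2 layers, 4 heads, 5 new tokens over 20 positions (12 = context).
attn = torch.rand(2, 4, 5, 20).softmax(dim=-1)
features = lookback_ratios(attn, n_context=12).flatten()
print(features.shape)  # torch.Size([8]) -> input to a simple linear detector
```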
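For the SpreadsheetLLM item, a simplified sketch of serializing a sheet into compact “address,value” lines while dropping empty cells. This is only in the spirit of the paper; the actual SpreadsheetLLM encoding adds structural anchors and format-aware compression on top of this kind of baseline.

```python
def encode_sheet(grid):
    """Serialize a spreadsheet grid into 'address,value' lines, skipping
    empty cells -- a simplified, assumed take on spreadsheet encoding."""
    def col_name(j):
        name = ""
        j += 1
        while j:
            j, r = divmod(j - 1, 26)
            name = chr(ord("A") + r) + name
        return name

    lines = []
    for i, row in enumerate(grid):
        for j, cell in enumerate(row):
            if cell not in (None, ""):
                lines.append(f"{col_name(j)}{i + 1},{cell}")
    return "\n".join(lines)

grid = [["Region", "Q1", "Q2"], ["EMEA", 120, None], [None, None, 95]]
print(encode_sheet(grid))
# A1,Region  B1,Q1  C1,Q2  A2,EMEA  B2,120  C3,95  (one per line)
```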
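And a minimal sketch of the patch-level training idea from the last item: compress every K consecutive token embeddings into one patch position so the model sees K times fewer positions during the cheap first training stage. The averaging below is my assumed reading; the paper’s exact aggregation and training recipe may differ.

```python
import torch

def to_patches(token_embeds, patch_size=4):
    """Average every `patch_size` consecutive token embeddings into one
    patch position. token_embeds: (seq_len, d)."""
    seq_len, d = token_embeds.shape
    usable = seq_len - seq_len % patch_size          # drop the ragged tail
    return token_embeds[:usable].view(-1, patch_size, d).mean(dim=1)

embeds = torch.randn(128, 64)
print(to_patches(embeds).shape)  # torch.Size([32, 64]): 4x fewer positions
```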
News
- Elon Musk promises ‘battle in court’ over EU’s crackdown on X’s blue checks. Regulators’ findings suggest the social network breached the Digital Services Act and could be fined 6% of global turnover
- AI prompts can boost writers’ creativity but result in similar stories, study finds. Ideas generated by ChatGPT can help writers who lack inherent flair but may mean there are fewer unique ideas
- OpenAI is reportedly working on more advanced AI models capable of reasoning and ‘deep research’. The secret project is code-named ‘Strawberry,’ according to a Reuters report.
- Meet the AI Agent Engineer. Bret Taylor, chairman of OpenAI’s board, has created a new position at his company Sierra: the Agent Engineer. One of the first people in the role recently wrote a blog post describing the Sierra team’s view of agent engineering as a new discipline within AI engineering.
- OpenAI Revenue. OpenAI earns an estimated $3.4 billion in revenue from its ChatGPT services.
- Taming the tail utilization of ads inference at Meta scale. Meta’s machine learning inference services saw a two-thirds decrease in failure rates, a 35% increase in computing efficiency, and a halving of p99 latency thanks to changes made to tail utilization. These improvements ensure Meta’s ad-delivery systems can handle growing workloads without requiring more resources while upholding service-level agreements. Continuous-improvement techniques include predictive scaling and managing the machine learning model lifecycle with IPnext, Meta’s unified platform.
- Meta to reportedly launch largest Llama 3 model on July 23. Meta Platforms will release its largest Llama 3 model on July 23, The Information reported on Friday, citing an employee of the company. The new model, boasting 405 billion parameters, will be multimodal and capable of understanding and generating both images and text.
- Quora’s Poe now lets users create and share web apps. Poe, Quora’s subscription-based, cross-platform aggregator for AI-powered chatbots like Anthropic’s Claude and OpenAI’s GPT-4o, has launched a feature called Previews that lets people create interactive apps directly in chats with chatbots.
- Microsoft CTO Kevin Scott thinks LLM “scaling laws” will hold despite criticism. Will LLMs keep improving if we throw more compute at them? OpenAI dealmaker thinks so.
- OpenAI says there are 5 ‘levels’ for AI to reach human intelligence — it’s already almost at level 2. The company shared a five-level system it developed to track its artificial general intelligence, or AGI, progress with employees this week, an OpenAI spokesperson told Bloomberg. The levels go from the currently available conversational AI to AI that can perform the same amount of work as an organization.
- AI startup Hebbia raised $130M at a $700M valuation on $13 million of profitable revenue. Hebbia, a startup that uses generative AI to search large documents and answer complex questions, has raised a $130 million Series B at a roughly $700 million valuation led by Andreessen Horowitz, with participation from Index Ventures, Google Ventures and Peter Thiel.
- Pixel 9 Pro might come with 1-year of Gemini Advanced. With less than a month until Made by Google 2024, the latest leak suggests that the Pixel 9 Pro will come with 1 year of Gemini Advanced.
- Company Abandons Plans to Give AI Workers “Rights” and Add Them to Org Chart After Outcry From Human Employees. After announcing that it would give AI algorithms “rights” and integrate them into its product as “digital workers” with managers and performance evaluations, HR software provider Lattice faced a backlash and dropped the plan.
- Want to know how AI will affect government and politics? The bots have the answers. Tony Blair’s powerful think tank asked ChatGPT how AI might affect public sector jobs. Critics say the results were … wonky
- Andrej Karpathy’s new company. Eureka Labs, a new AI startup focused on education, aims to transform the way we acquire new knowledge.
- Whistleblowers accuse OpenAI of ‘illegally restrictive’ NDAs. Whistleblowers have accused OpenAI of placing illegal restrictions on how employees can communicate with government regulators, according to a letter obtained by The Washington Post.
- Apple, Nvidia, Anthropic Used Thousands of Swiped YouTube Videos to Train AI. AI companies are generally secretive about their sources of training data, but an investigation by Proof News found some of the wealthiest AI companies in the world have used material from thousands of YouTube videos to train AI. Companies did so despite YouTube’s rules against harvesting materials from the platform without permission.
- SciCode: A Research Coding Benchmark Curated by Scientists. HumanEval has long been the target benchmark for coding models, and it is now essentially solved. SciCode is the next step forward: difficult scientific programming problems curated by scientists.
- SmolLM — blazingly fast and remarkably powerful. This blog post introduces SmolLM, a family of state-of-the-art small models with 135M, 360M, and 1.7B parameters, trained on a new high-quality dataset. It covers data curation, model evaluation, and usage.
- Benchmarking results for vector databases. Redis has released updated information on the best vector databases, measuring throughput and latency with the help of the industry-recognized Qdrant framework. Key findings include Redis achieving much higher queries per second and lower latency than Qdrant, Milvus, and Weaviate, and outperforming competitors by 62% for low-complexity datasets and by 21% for high-dimensional datasets.
- Announcing the launch of Gray Swan. Gray Swan AI specializes in building tools that help businesses assess the risks of their AI systems and protect their AI deployments from misuse.
- Anthropic releases Claude app for Android. Anthropic launched its Claude Android app on Tuesday to bring its AI chatbot to more users. This is Anthropic’s latest effort to convince users to ditch ChatGPT by making Claude available in more places.
- AI tool can pinpoint dementia’s cause — from stroke to Alzheimer’s. An algorithm that distinguishes among a host of underlying causes of dementia could be used for diagnosis in hospitals and clinics.
- Portal needed for victims to report AI deep fakes, federal police union says. Parliamentary inquiry told police forced to ‘cobble together’ laws to prosecute a man who allegedly spread deep fake images of women
- Meta Won’t Offer Future Multimodal AI Models In The EU. Due to regulatory uncertainty, Meta will not offer its future multimodal AI models to customers in the EU; Llama 3 will still be available there, but in text-only form.
- Anthropic teams up with venture capital firm to kickstart $100M AI startup fund. Recipients of six-digit investments aren’t required to use Claude
- Anthropic doubles output token limit. Anthropic has doubled the maximum output token limit for Claude 3.5 Sonnet from 4096 to 8192 in the Anthropic API. (A hedged usage sketch follows this list.)
- AI-powered video creation for work. Google Vids is an AI-powered video creation tool for the workplace, tightly integrated with the Workspace suite.
- aiXplain Secures $6.5M pre-Series A to Universalize AI Agent Development. Wa’ed Ventures, the venture arm of Saudi Aramco (a global top-10 company by market cap), has announced a $6.5 million pre-Series A round for aiXplain.
- Meta pulls plug on the release of advanced AI model in EU. ‘Unpredictable’ privacy regulations prompt the Facebook owner to scrap regional plans for multimodal Llama
- Mistral NeMo. Mistral NeMo is a multilingual 12B model trained with a novel tokenizer; it shows strong English and multilingual performance and supports a 128k context window.
- OpenAI is releasing a cheaper, smarter model. OpenAI is releasing a lighter, cheaper model for developers to tinker with called GPT-4o Mini. It costs significantly less than full-sized models and is said to be more capable than GPT-3.5.
- Cohere and Fujitsu Announce Strategic Partnership To Provide Japanese Enterprise AI Services. Cohere and Fujitsu have partnered strategically to create and offer enterprise AI services that have the best Japanese language capabilities in the market. These services, which will provide private cloud deployments to businesses in highly regulated sectors including financial institutions, the public sector, and research and development units, will be developed with security and data privacy as their primary goals.
- OpenAI And Broadcom Held Discussions About Producing An AI Chip. OpenAI and Broadcom have discussed developing a new artificial intelligence server processor.
- Flow Studio. Flow Studio creates fully produced 3-minute films with a believable story, consistent characters, and automatically synced sound effects and background music.
- Slow recovery from IT outage begins as experts warn of future risks. A fault in CrowdStrike software caused airports, businesses and healthcare services to languish in the ‘largest outage in history’
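For the doubled output limit above, here is a sketch of requesting the higher cap with the Python anthropic SDK. At launch, the 8192-token limit required an opt-in beta header; the header value below is the one announced at the time and is an assumption on my part (it may have changed or become unnecessary since), so check the current Anthropic docs.

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

message = client.messages.create(
    model="claude-3-5-sonnet-20240620",
    max_tokens=8192,  # double the previous 4096 cap
    messages=[{"role": "user", "content": "Draft a long design doc outline."}],
    # Beta opt-in header as announced in July 2024 (assumption: verify
    # against current docs; newer API versions may not require it).
    extra_headers={"anthropic-beta": "max-tokens-3-5-sonnet-2024-07-15"},
)
print(message.content[0].text)
```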
Resources
- A Survey on Mixture of Experts. A survey on Mixture of Experts (MoE) covering technical specifications, open-source implementations, evaluation methods, and practical applications.
- Internet of Agents: Weaving a Web of Heterogeneous Agents for Collaborative Intelligence. A new framework that addresses several limitations of multi-agent systems, such as integrating diverse third-party agents and adapting to dynamic task requirements; it introduces an agent-integration protocol, an instant-messaging architecture, and dynamic mechanisms for effective collaboration among heterogeneous agents.
- Meta 3D Gen. A new pipeline that can generate 3D assets from text in under a minute, end to end. It incorporates cutting-edge components such as AssetGen and TextureGen to represent objects in three spaces (view space, volumetric space, and UV space), and achieves a 68% win rate over the single-stage model.
- Challenges, evaluation and opportunities for open-world learning. The authors argue that designing machine intelligence that can operate in open worlds, including detecting, characterizing, and adapting to structurally unexpected environmental changes, is a critical goal on the path to building systems that can solve complex and relatively under-determined problems.
- Machine learning-aided generative molecular design. Data-driven generative methods have the potential to greatly facilitate molecular design tasks for drug design.
- Introducing AuraFlow v0.1, an Open Exploration of Large Rectified Flow Models. Fal has trained a new open model called AuraFlow, a large rectified-flow model with 5.8B parameters trained with muP.
- Lynx: State-of-the-Art Open Source Hallucination Detection Model. A model that detects hallucinations in language model generations, performing noticeably better than the previous state of the art.
- Hyper-3DG: Text-to-3D Gaussian Generation via Hypergraph. Hyper-3DG enhances text-to-3D model creation by emphasizing the intricate connections between texture and geometry.
- LightenDiffusion. LightenDiffusion enhances low-light photos by combining diffusion models with Retinex theory.
- ProDepth. ProDepth is a novel framework for monocular depth estimation that addresses problems caused by moving objects in dynamic scenes. It identifies and corrects inconsistencies in depth estimates using a probabilistic approach.
- Open-Canopy. Open-Canopy is a publicly available, high-resolution (1.5 m) dataset for estimating canopy height across France.
- crawlee-python. Crawlee: a web scraping and browser automation library for Python to build reliable crawlers. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with BeautifulSoup, Playwright, and raw HTTP. Both headful and headless modes. With proxy rotation.
- Mathstral. Mistral’s newest math model performs well on various benchmarks.
- Codestral Mamba. Codestral Mamba is a Mamba2 language model specialized in code generation, available under an Apache 2.0 license.
- exo. Run your own AI cluster at home on everyday devices.
- Refuse Whenever You Feel Unsafe: Improving Safety in LLMs via Decoupled Refusal Training. Decoupled Refusal Training (DeRTa), a novel method, improves safety tuning in large language models by addressing refusal position bias.
- PID: Physics-Informed Diffusion Model for Infrared Image Generation. Researchers have created a Physics-Informed Diffusion (PID) model that improves the translation of RGB images to infrared images by integrating physical laws into the conversion process.
- What happened to BERT & T5? On Transformer Encoders, PrefixLM, and Denoising Objectives. An excellent post by Yi Tay of Reka (formerly Google) on encoders, PrefixLM, denoising objectives, and other contemporary language-modeling techniques.
- LiDAR Semantic Segmentation. SFPNet is a novel technique designed to be universal across various LiDAR technology types. Instead of employing window attention as in prior work, SFPNet uses sparse focus point modulation to extract and dynamically aggregate multi-level contexts.
- Praison AI. Using prior agent frameworks as a springboard, Praison AI is a low-code, centralized framework with customizable features and human-agent interaction that makes it easier to create and manage multi-agent systems for a range of LLM applications.
- Video Object Segmentation with World Knowledge. Reasoning Video Object Segmentation (ReasonVOS) is a new task that uses implicit text queries to generate segmentation masks. It requires complex reasoning and world knowledge.
- Enhancing Class Learning Without Forgetting. This project presents a background-class separation framework for enhancing Class-Incremental Semantic Segmentation (CISS).
- Leapfrogging traditional vector-based RAG with language maps. Retrieval plays a major role when building a chat application over data, but systems are frequently brittle to the format of the data being accessed. Building a language map (e.g., a Wikipedia-style entry) of the material and using it for retrieval greatly improves chat performance; this is how mutable.ai handles code-based question answering. (A toy sketch follows this list.)
- Removing Inappropriate Content from Diffusion Models. A novel technique called Reliable and Efficient Concept Erasure (RECE) removes inappropriate content from diffusion models in only three seconds, without requiring additional fine-tuning.
- LLM2sh. LLM2sh is a command-line tool that uses LLMs to convert requests written in plain English into shell commands.
- GraphMuse. GraphMuse is a Python library for graph deep learning on symbolic music. It provides graph deep learning techniques and models applied specifically to music scores.
- E5-V: Universal Embeddings with Multimodal Large Language Models. E5-V is a novel framework that adapts Multimodal Large Language Models (MLLMs) to produce universal multimodal embeddings. Using prompts, it bridges the gap between different input formats and achieves remarkable results on multimodal tasks without fine-tuning.
- Strategizing Your Preparation for Machine Learning Interviews. Machine learning interviews can be difficult. Understanding the range of machine learning roles and tailoring your preparation to specific job duties and specializations can greatly improve your chances. To approach interviews with confidence, focus on mastering the fundamentals, researching the company’s particular technology, and regularly tracking your progress.
- Uncensor Any LLM With Abliteration. Llama models are heavily safety-restricted, which reduces their versatility. The “abliteration” technique uncensors them by identifying and removing the refusal mechanism, enabling the models to respond to all prompts without retraining. (A minimal sketch follows this list.)
- SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers. SPIQA is a question-answering dataset designed to help users quickly find answers in scientific research papers by interpreting complex figures and tables.
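A toy sketch of the language-map idea from the mutable.ai item above: index natural-language descriptions of files rather than raw code chunks, and retrieve against those. The hashing “embedding” and the hand-written descriptions are stand-ins; a real system would generate the entries with an LLM and use a proper embedding model.

```python
import numpy as np

def embed(text, dim=256):
    """Toy bag-of-words hashing embedding; a real system would call an
    embedding model here."""
    v = np.zeros(dim)
    for w in text.lower().split():
        v[hash(w) % dim] += 1.0
    return v / (np.linalg.norm(v) + 1e-9)

# The "language map": one wiki-style description per file (hand-written
# here; an LLM would generate these in practice).
language_map = {
    "auth/session.py": "Handles login sessions, token refresh and expiry.",
    "billing/invoice.py": "Generates invoices and applies tax rules.",
}
vecs = {path: embed(desc) for path, desc in language_map.items()}

def retrieve(question, k=1):
    q = embed(question)
    ranked = sorted(vecs, key=lambda p: float(vecs[p] @ q), reverse=True)
    return ranked[:k]  # files whose *descriptions* match the question

print(retrieve("How do we refresh an expired login token?"))
```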
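And a minimal sketch of the abliteration recipe described above, under the usual assumed setup: estimate a “refusal direction” as the difference of mean residual-stream activations on harmful versus harmless prompts, then orthogonalize weight matrices against it. Shapes and random tensors are illustrative only; they stand in for real model activations and weights.

```python
import torch

def refusal_direction(h_harmful, h_harmless):
    """h_*: (n_prompts, d) residual-stream activations at a chosen layer.
    The refusal direction is the normalized difference of means."""
    d = h_harmful.mean(0) - h_harmless.mean(0)
    return d / d.norm()

def orthogonalize(weight, direction):
    """Remove the refusal direction from a weight matrix's output space,
    so the model can no longer write along it (weight: (d_out, d_in))."""
    return weight - torch.outer(direction, direction @ weight)

d_model = 64  # toy size
r = refusal_direction(torch.randn(8, d_model), torch.randn(8, d_model))
W = torch.randn(d_model, d_model)
W_abl = orthogonalize(W, r)
print((r @ W_abl).norm())  # ~0: outputs no longer carry the refusal direction
```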
Perspectives
- AI’s ‘Oppenheimer moment’: autonomous weapons enter the battlefield. The military use of AI-enabled weapons is growing, and the industry that provides them is booming
- Will generative AI transform robotics? In the current wave of excitement about applying large vision–language models and generative AI to robotics, expectations are running high, but conquering real-world complexities remains challenging for robots.
- Introducing: The Managed-Service-as-Software (M-SaS) Startup. AI-driven, service-oriented firms are building Managed-Service-as-Software (M-SaS) businesses, following a new business-model blueprint. To use AI rather than sell it, startups need a fundamentally different mindset: these firms start out labor-intensive with low gross margins, then use automation and artificial intelligence (AI) to progressively reach more SaaS-like gross margins.
- Could AIs become conscious? Right now, we have no way to tell. Opinions diverge on whether advances in machine learning and neuromorphic computing can produce sentient computers, and the debate over AI potentially gaining consciousness is heating up. Integrated Information Theory holds that current hardware limits make AI consciousness implausible, while computational functionalist theories such as Global Neuronal Workspace Theory and Attention Schema Theory suggest AI consciousness is all but inevitable. Neuroscience is seeking a unified theory of consciousness to better understand how it might manifest in AI.
- Generative AI makes for better scientific writing — but beware the pitfalls. As researchers who have sometimes struggled to articulate intricate concepts, we find the author’s suggestions for using ChatGPT to improve the clarity and coherence of academic papers compelling. But potential pitfalls warrant further discussion.
- My trip to the frontier of AI education. First Avenue Elementary School in Newark is using Khanmigo, an AI-powered tutor and teaching assistant created by Khan Academy, to bring AI tools into education. Teachers can use it to customize instruction and cut down on busywork, and making it more responsive and inclusive is an ongoing effort. Through increased teacher-student involvement, this Gates Foundation-backed project seeks to level the playing field in education.
- AI-Driven Behavior Change Could Transform Health Care. Thrive AI Health is being funded by OpenAI and Thrive Global to create a customized AI health coach that addresses everyday health-related behaviors like nutrition and sleep. AI’s hyper-personalization powers the mobile app and corporate solution by fusing individual data with peer-reviewed science. The project intends to manage chronic diseases, democratize healthy behavior modification, and show how effectively AI can be integrated into healthcare while maintaining robust privacy protections.
- GraphRAG Analysis, Part 1: How Indexing Elevates Knowledge Graph Performance in RAG. An analysis of Microsoft’s GraphRAG research suggests that knowledge-graph stores like Neo4j may not significantly beat FAISS in context retrieval for RAG applications. While Neo4j without its indexing can reach better answer relevancy, the minor advantages may not justify the cost given ROI limits. Neo4j’s indexing, on the other hand, significantly improves answer faithfulness, lowering the risk of false information.
- How Taiwan secured semiconductor supremacy — and why it won’t give it up. Trump has accused Taiwan of ‘taking’ the US chip sector, but Taipei has been at the forefront of the industry for decades, and its future could depend on it
- Overcoming The Limits Of Current LLMs. Large language models (LLMs) have been all the rage for quite some time now. Looking beyond the hype, though, they have severe limitations: hallucinations, lack of confidence estimates, and lack of citations.
Meme of the week
What do you think? Did any of this news capture your attention? Let me know in the comments
If you have found this interesting:
You can look for my other articles, and you can also connect with or reach me on LinkedIn. Check this repository containing weekly updated ML & AI news. I am open to collaborations and projects, and you can also subscribe for free to get notified when I publish a new story.
Here is the link to my GitHub repository, where I am collecting code and many resources related to machine learning, artificial intelligence, and more.
or you may be interested in one of my recent articles: