WEEKLY AI NEWS: RESEARCH, NEWS, RESOURCES, AND PERSPECTIVES
AI & ML news: Week 5–11 August
Microsoft Says OpenAI is Now a Competitor in AI and Search, Google Broke the Law to Maintain Online Search Monopoly, US Judge Rules, and much more
The most interesting news, repositories, articles, and resources of the week
Check and star this repository where the news will be collected and indexed:
You will find the news first on GitHub. Single posts are also collected here:
Research
- Meta-Rewarding Language Models: Self-Improving Alignment with LLM-as-a-Meta-Judge. Proposes meta-rewarding LLMs, a self-improving alignment technique with no human supervision in which the same model plays three roles: acting (producing responses), judging (scoring those responses), and meta-judging (evaluating its own judgments). Simple self-improvement aimed only at producing better responses saturates quickly, so this work also trains the model's ability to judge itself, using the meta-judge's feedback to sharpen its judging skills and avoid issues like reward hacking.
- MindSearch: Mimicking Human Minds Elicits Deep AI Searcher. MindSearch is an LLM-based multi-agent framework for complex web-information seeking and integration tasks. A web planner efficiently breaks down complex queries, while a web searcher performs hierarchical information retrieval on the Internet to improve the relevance of what is retrieved. The planner builds a graph iteratively to better model complex problem-solving, and delegating retrieval and reasoning to specialized agents makes the framework better suited to long-context problems.
- Improving Retrieval Augmented Language Model with Self-Reasoning. An end-to-end self-reasoning framework that uses reasoning trajectories produced by the LLM itself to improve the reliability and traceability of RAG systems. The LLM carries out three procedures: 1) relevance-aware: evaluate the relevance between the retrieved documents and the question; 2) evidence-aware selective: select and cite relevant documents, then automatically pick key sentence snippets from them as evidence; and 3) trajectory analysis: generate a concise analysis of the self-reasoning trajectories from the previous two steps and give the final inferred answer. This makes the model more selective at distinguishing relevant from irrelevant documents, improving the accuracy of the RAG system as a whole. Using only 2,000 training examples (generated by GPT-4), the framework outperforms GPT-4; a minimal pipeline sketch appears after this list.
- Concise Thoughts: Impact of Output Length on LLM Reasoning and Cost. Constrained-CoT (CCoT), a prompting approach that limits the length of the reasoning output without compromising performance, shows that capping LLaMA2-70B's reasoning at 100 words raises GSM8K accuracy from 36.01% (plain CoT) to 41.07% (CCoT) while cutting the average output length by 28 words; a prompt-level sketch appears after this list.
- ThinK: Thinner Key Cache by Query-Driven Pruning. ThinK targets long-context inference and the inefficiency of KV cache memory consumption: it proposes a query-dependent KV cache pruning method that selectively removes the least important channels while minimizing the loss in attention weights.
- Large Language Monkeys: Scaling Inference Compute with Repeated Sampling. The authors find that, given adequate coverage and a tool to verify candidate answers, repeatedly sampling from a small model can significantly improve benchmark performance at roughly 3x lower cost than using a larger model; a sketch of the sample-and-verify loop appears after this list.
- Boosting Audio Visual Question Answering via Key Semantic-Aware Cues. Researchers propose a Temporal-Spatial Perception Model (TSPM) that improves the ability to answer questions about the auditory and visual signals in videos.
- No learning rates needed: Introducing SALSA — Stable Armijo Line Search Adaptation. This work presents enhancements to line search strategies that improve the efficiency of stochastic gradient descent systems.
- Automated Review Generation Method Based on Large Language Models. Utilizing LLMs, researchers have created an automated approach for generating reviews to assist in managing the massive amount of scientific material.
- CLEFT: Language-Image Contrastive Learning with Efficient Large Language Model and Prompt Fine-Tuning. CLEFT is a Contrastive Learning technique meant for medical imaging that aims to overcome the drawbacks of current, resource-intensive CLIP-like methods.
- Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters. There is strong demand for leveraging computation at inference time to boost model performance. This paper explores the trade-offs between various approaches and presents several useful ones, pointing to a broader trend of getting more performance out of smaller models.
- An Object is Worth 64x64 Pixels: Generating 3D Object via Image Diffusion. By treating 3D objects as UV-wrapped 64x64 images, a DiT model can readily generate novel objects from textual inputs.
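Below is a minimal sketch of the three self-reasoning stages from the RAG entry above. The llm and retrieve helpers are hypothetical stand-ins (not the paper's code) and the prompts are illustrative only.

```python
def llm(prompt: str) -> str:
    """Hypothetical stand-in for a call to the self-reasoning LLM."""
    return "...model output..."

def retrieve(question: str, k: int = 5) -> list[str]:
    """Hypothetical stand-in for a retriever (e.g. BM25 or a dense index)."""
    return [f"document {i} related to: {question}" for i in range(k)]

def self_reasoning_rag(question: str) -> str:
    docs = retrieve(question)
    # 1) Relevance-aware: judge whether each retrieved document is relevant.
    relevance = llm(
        f"Question: {question}\nDocuments: {docs}\n"
        "For each document, state whether it is relevant to the question and why."
    )
    # 2) Evidence-aware selective: cite relevant documents and quote key sentences.
    evidence = llm(
        f"Question: {question}\nRelevance notes: {relevance}\n"
        "Select the relevant documents and quote key sentence snippets as evidence."
    )
    # 3) Trajectory analysis: summarize the reasoning trajectory and answer.
    return llm(
        f"Question: {question}\nEvidence: {evidence}\n"
        "Write a concise analysis of the reasoning above, then give the final answer."
    )

print(self_reasoning_rag("Who proposed the self-reasoning RAG framework?"))
```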
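Next, a prompt-level sketch of Constrained-CoT: the only change versus plain chain-of-thought prompting is an explicit word cap in the instruction. The wording and the call_llm helper are illustrative assumptions, not the paper's exact prompt.

```python
def build_ccot_prompt(question: str, max_words: int = 100) -> str:
    # Plain CoT asks the model to reason step by step; Constrained-CoT
    # additionally caps the length of that reasoning.
    return (
        f"{question}\n"
        f"Let's think step by step and limit the answer length to {max_words} words."
    )

def call_llm(prompt: str) -> str:
    """Illustrative stand-in for a call to LLaMA2-70B or any other model."""
    return "...model output..."

answer = call_llm(build_ccot_prompt("If a train travels 60 km in 45 minutes, what is its speed in km/h?"))
print(answer)
```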
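And a toy sketch of the repeated-sampling idea from Large Language Monkeys: draw many cheap samples and keep the first one a task-specific verifier accepts. The generate and verify functions here are toy stand-ins, not the paper's implementation.

```python
import random

def generate(prompt: str) -> str:
    """Toy stand-in for sampling a small LLM at non-zero temperature."""
    return str(random.randint(0, 100))

def verify(candidate: str, expected: str) -> bool:
    """Task-specific verifier, e.g. unit tests or an exact-match grader."""
    return candidate.strip() == expected

def repeated_sampling(prompt: str, expected: str, budget: int = 200) -> str | None:
    # Coverage (the chance that at least one sample is correct) grows with the
    # budget, which is why many samples from a small model can beat a single
    # sample from a much larger model at lower total cost.
    for _ in range(budget):
        candidate = generate(prompt)
        if verify(candidate, expected):
            return candidate
    return None  # no verified answer within the budget

print(repeated_sampling("What is 6 * 7?", expected="42"))
```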
News
- Character.AI CEO Noam Shazeer returns to Google. In a big move, Character.AI co-founder and CEO Noam Shazeer is returning to Google after leaving the company in October 2021 to found the a16z-backed chatbot startup.
- Three New Additions To Gemma 2. Google is expanding the Gemma 2 family of models with a new 2B-parameter model, a safety content classifier, and a model interpretability tool.
- Microsoft says OpenAI is now a competitor in AI and search. Microsoft’s annually updated list of competitors now includes OpenAI, a long-term strategic partner. The change comes days after OpenAI announced a prototype of a search engine. Microsoft has reportedly invested $13 billion into OpenAI.
- Introducing GitHub Models. We are launching GitHub Models, enabling our more than 100 million developers to become AI engineers and build industry-leading AI models.
- Reddit CEO says Microsoft needs to pay to search the site. In an interview, Steve Huffman calls out Microsoft’s Bing, Anthropic, and Perplexity for scraping Reddit’s data without permission. ‘It has been a real pain in the ass to block these companies.’
- Elon Musk sues OpenAI again, alleging ‘deceit of Shakespearean proportions’. Tesla CEO alleges his former partners, including CEO Sam Altman, manipulated him into co-founding the company
- Google broke the law to maintain online search monopoly, US judge rules. White House calls decision — that could have major implications for web use — ‘victory for the American people’
- Secretaries of state called on Musk to fix chatbot over election misinformation. X’s Grok AI chatbot falsely told users ‘ballot deadline has passed for several states’
- Groq Raises $640M To Meet Soaring Demand for Fast AI Inference. Groq, the startup building chips for extremely fast AI inference, is raising a significant amount of funding to meet demand for large language model inference.
- Elon Musk sues OpenAI, Sam Altman for making a "fool" out of him. Having been promised that OpenAI's technology would remain open-source and prioritize the public good, Elon Musk has revived his lawsuit against the company and its CEO, Sam Altman. He claims that by turning OpenAI into a for-profit venture with ties to Microsoft, they fraudulently obtained the roughly $44 million in seed funding he contributed, betraying the original mission and causing irreparable harm to both his interests and the public.
- OpenAI Co-Founders Schulman and Brockman Step Back. John Schulman has joined Anthropic as an individual contributor, while Greg Brockman is taking an extended sabbatical.
- Llama 3.1 Impact Grants. Meta has announced a program awarding $2M to groups using its models for good, helping them develop tools for economically and socially impactful projects.
- BYU engineering research finds key to quicker nuclear power: artificial intelligence. Matt Memmott, a professor of chemical engineering at BYU, has developed an AI algorithm that could cut as much as a decade off the design and licensing of nuclear reactors and drastically lower costs. According to his team's study, AI can solve difficult nuclear design challenges far more quickly than conventional techniques; in one case, the design process was shortened from six months to just two days. The findings aim to keep electricity costs low while meeting growing energy demands by speeding up the development of nuclear power.
- OpenAI tempers expectations with less bombastic, GPT-5-less DevDay this fall. According to OpenAI, this year’s DevDay conference will no longer be a large event but rather a series of smaller, mobile developer sessions that will concentrate on upgrades to developer services and APIs rather than the introduction of a new flagship model.
- Tezi raises $9M to launch Max: the first fully autonomous AI recruiter. To build Max, an AI-driven recruiting agent that conducts hiring procedures from beginning to end on its own, Tezi raised $9 million in seed funding, with the lead investors being 8VC and Audacious Ventures.
- Apple Intelligence rollout timetable won’t delay iPhone 16. Apple Intelligence capabilities will be added to iOS 18 after launch; initial access will be available to iPhone 15 Pro models exclusively in iOS 18.1.
- Figure redesigns its humanoid robot from the ground up for slick new F.02. California-based robotics outfit Figure has today announced its second-generation humanoid robot, which is initially being aimed at production lines in commercial settings, but the company is promising a bipedal butler in our homes shortly.
- Structured Outputs in OpenAI API. Getting reliably structured output, such as JSON, from language models has been difficult. This new functionality in OpenAI's API lets generations conform to a developer-supplied schema, producing structured output that deterministic downstream applications can consume; a usage sketch appears after this list.
- Meta is reportedly offering millions to use Hollywood voices in AI projects. Meta is negotiating to use the voices of well-known actors such as Awkwafina and Judi Dench for its AI digital assistant, seeking broad usage rights across all of its platforms. If a deal is reached, the actors could receive millions of dollars in compensation, with SAG-AFTRA protecting likenesses created by AI. The company recently canceled a celebrity voice chatbot project and now plans to showcase these AI technologies at its Connect conference in September.
- With Smugglers and Front Companies, China Is Skirting American A.I. Bans. A thriving underground market persists despite U.S. sanctions meant to stop the transfer of AI chips to China, facilitating large transactions such as the $103 million purchase using Nvidia processors. In an attempt to get around prohibitions, new businesses are founded, delivery methods are deceitful, and international distribution gaps are exploited. The ongoing illicit commerce has sparked discussions about the efficacy of American export regulations and how they affect US tech companies in comparison to their Chinese rivals.
- Nvidia Blackwell GPUs allegedly delayed due to design flaws — launch expected to be pushed back by three months or more. Microsoft, Meta, Google, and xAI will have to wait a few more months to receive their massive GPU orders.
- OpenAI says it’s taking a ‘deliberate approach’ to releasing tools that can detect writing from ChatGPT. OpenAI has built a tool that could potentially catch students who cheat by asking ChatGPT to write their assignments — but according to The Wall Street Journal, the company is debating whether to release it.
- Zuckerberg touts Meta’s latest video vision AI with Nvidia CEO Jensen Huang. Meta had a palpable hit last year with Segment Anything, a machine learning model that could quickly and reliably identify and outline just about anything in an image. The sequel, which CEO Mark Zuckerberg debuted on stage Monday at SIGGRAPH, takes the model to the video domain, showing how fast the field is moving.
- Gemini intelligence is coming to Google Home. Google Assistant is getting a major upgrade on Nest smart speakers and displays, and Nest cameras will soon be able to tell as well as show, as Google Home gets a powerful AI infusion
- Zuckerberg says Meta will need 10x more computing power to train Llama 4 than Llama 3. Meta, which develops one of the biggest foundational open source large language models, Llama, believes it will need significantly more computing power to train models in the future.
- AMD is becoming an AI chip company, just like Nvidia. AMD’s AI GPU sales just went from a billion dollars cumulatively to a billion dollars quarterly.
- Microsoft Is Losing a Staggering Amount of Money on AI. Microsoft's capital spending, driven by data centers for AI capabilities, jumped to $19 billion in the most recent quarter, yet significant AI revenue has yet to materialize.
- Taco Bell’s drive-thru AI might take your next order. Taco Bell’s parent company aims to bring its ‘Voice AI’ technology to hundreds of stores in the US by the end of 2024.
- OpenAI invests in a webcam company turned AI startup. OpenAI is leading a $60 million funding round for Opal, the same company behind the high-end Tadpole webcam, according to a report from The Information.
- UK regulator to examine $4bn Amazon investment in AI startup Anthropic. Move is the latest of a string of CMA investigations into technology tie-ups
- Hugging Face acquires XetHub. Most of the data Hugging Face stores and serves lives in Git LFS; XetHub has built a much more scalable alternative for Git repositories.
- Humane’s daily returns are outpacing sales. The company is scrambling to stabilize as it hits $1 million in total returns against $9 million in sales.
- GPT-4o System Card. Shipping a voice system safely is difficult. This system card highlights the ongoing efforts to guarantee the safety and usefulness of the multimodal model.
- Fully-automatic robot dentist performs world’s first human procedure. In a historic moment for the dental profession, an AI-controlled autonomous robot has performed an entire procedure on a human patient for the first time, about eight times faster than a human dentist could do it.
- Microsoft launches GitHub Models, offering 100 million developers easy access to leading AI tools. Microsoft has introduced “GitHub Models,” a new platform that enables over 100 million developers to integrate AI into their software projects by providing access to a variety of AI models. This includes popular models like Llama 3.1, GPT-4o, and Mistral Large 2, among others. Developers can explore these models for free through a built-in model playground on GitHub, where they can experiment with different prompts and model parameters.
- Google brings Gemini-powered search history and Lens to Chrome desktop. Google said Thursday that it is introducing new Gemini-powered features for Chrome's desktop version, including Lens for desktop, tab compare for shopping assistance, and natural-language search of browsing history.
- Apple changes EU App Store rules after commission charges. Change in policy means developers will be able to communicate with customers outside App Store
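Regarding the Structured Outputs item above, here is a hedged usage sketch assuming the openai Python SDK and the json_schema response format described in OpenAI's announcement; the model name, schema, and exact field names are assumptions, so check the current API reference before relying on them.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Ask the model to return JSON conforming to an explicit schema so that a
# deterministic downstream application can parse it safely.
response = client.chat.completions.create(
    model="gpt-4o-2024-08-06",
    messages=[
        {"role": "user", "content": "Extract the event: dinner with Ana on Friday at 7pm."}
    ],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "calendar_event",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "title": {"type": "string"},
                    "day": {"type": "string"},
                    "time": {"type": "string"},
                },
                "required": ["title", "day", "time"],
                "additionalProperties": False,
            },
        },
    },
)
print(response.choices[0].message.content)  # JSON matching the schema
```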
Resources
- Adaptive Retrieval-Augmented Generation for Conversational Systems. Adaptive RAG for Conversational Systems develops a gating model that predicts whether a conversational system needs RAG to improve its responses, and demonstrates that RAG-based conversational systems can produce high-quality responses with high generation confidence. It further finds a correlation between the relevance of the augmented knowledge and the generation's degree of confidence.
- ShieldGemma: Generative AI Content Moderation Based on Gemma. Built on Gemma 2, ShieldGemma provides a full suite of LLM-based safety content moderation models, including classifiers for key harm categories such as toxicity, hate speech, and dangerous content.
- PersonaGym: Evaluating Persona Agents and LLMs. Assessing Persona Agents: This study suggests a standard for assessing persona agent skills in LLMs; it discovers that, while being a somewhat more sophisticated model, Claude 3.5 Sonnet only shows a 2.97% relative improvement in PersonaScore when compared to GPT 3.5.
- The Art of Refusal: A Survey of Abstention in Large Language Models. A survey of the approaches currently used to achieve refusal in LLMs, along with the metrics and benchmarks used to evaluate abstention.
- XHand: Real-time Expressive Hand Avatar. XHand is a new hand avatar designed for real-time rendering in virtual worlds and video games. In contrast to earlier models, XHand focuses on producing detailed hand morphology, appearance, and deformation.
- Prompt Poet. Character AI's prompt construction library, which serves millions of conversations, has been made available to the public.
- NAVIX: minigrid in JAX. A popular testing bed for RL has been accelerated in JAX.
- SynthVLM: High-Efficiency and High-Quality Synthetic Data for Vision Language Models. A novel data synthesis pipeline for Vision Large Language Models (VLLMs) is called SynthVLM. Rather than captioning photos directly, SynthVLM leverages sophisticated diffusion models to produce high-resolution images from captions.
- Networks that compress themselves. By including the network's size in the loss function, you can train a more accurate, self-quantizing model that shrinks as it trains; a minimal sketch appears after this list.
- Video Tracking with Language Embeddings. A novel technique that leverages language embeddings to enhance point tracking in lengthy video sequences has been developed by researchers.
- Boosting Efficiency in Vision-Language Model Training. This effort addresses the imbalance brought about by different data distributions and model architectures by introducing a technique to balance computational burdens during large-scale 3D simultaneous training of vision-language models.
- TexGen: Text-Guided 3D Texture Generation with Multi-view Sampling and Resampling. High-quality generation of textures on 3D models with diffusion.
- MeshAnything V2: Artist-Created Mesh Generation with Adjacent Mesh Tokenization. This work uses textual, 2D, or 3D input to create artistic meshes. To sample effectively, it takes advantage of neighboring tokens and enhancements to the vertex representation.
- CogVideo. A text-to-video model available for free that performs nearly as well as closed video creation technologies.
- MiniCPM-V. Amazing vision language model with near real-time performance. It performs better on certain benchmarks than closed models.
- RecDiffusion: Rectangle for Image Stitching with Diffusion Models. RecDiffusion is a framework that improves the aesthetic appeal of stitched photos without requiring any cropping or distortion.
- LLaVA-OneVision: Easy Visual Task Transfer. In visual language models, there has been an effort to make them versatile and easy to tune. This reminds me of computer vision from ten years ago. Crucially, LLaVA-OneVision demonstrates how meticulous data curation and architecture upgrades may do this.
- ABC Invariance. Use muP to migrate your hyperparameters from smaller to larger models. This GitHub gist demonstrates in practice a fantastic theorem showing that you can vary where you apply the output scaling without affecting final transfer performance.
- XLabs-AI/flux-controlnet-canny. XLabs has released the first Flux-Dev ControlNet, which allows generation conditioned on Canny edge inputs.
- HARMONIC: Harnessing LLMs for Tabular Data Synthesis and Privacy Protection. HARMONIC is a framework for generating and evaluating synthetic tabular data using large language models.
- Introducing Qwen2-Math. A 72B math model developed by the Qwen team beats all other open and closed models on MATH. Additionally, it beats Llama-3.1–405B on some measures related to reasoning. Only English is available at this time; multilingual models will be available soon.
- SAM2-PATH: A better segment anything model for semantic segmentation in digital pathology. A novel approach called SAM2-PATH aims to improve semantic segmentation in digital pathology.
- Speech-MASSIVE: A Multilingual Speech Dataset for SLU and Beyond. Speech-MASSIVE is a new multilingual Spoken Language Understanding (SLU) dataset that provides a speech-based counterpart to the MASSIVE text corpus.
- PyTorch FlexAttention. A new PyTorch API makes it possible to define almost any attention variant and compile it to efficient Triton kernels, improving portability, performance, and research velocity on attention variants; a hedged usage sketch appears after this list.
- A Language Model with Quick Pre-Training. The "1.5-Pints" language model offers a compute-efficient approach to pre-training: by curating a high-quality dataset of 57 billion tokens, it outperforms Apple's OpenELM and Microsoft's Phi on instruction-following tasks as measured by MT-Bench.
- lighthouse. Lighthouse is a user-friendly library for reproducible and accessible research on video moment retrieval (VMR) and highlight detection (HD). It supports six VMR-HD models, three feature types, and five datasets for reproducible VMR-HD.
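A minimal sketch of the "networks that compress themselves" idea above, in PyTorch. The size term here (mean absolute weight magnitude as a rough proxy for how many bits the weights need) is an assumption for illustration; the original write-up's exact size term may differ.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 2))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()
size_weight = 1e-3  # trade-off between task loss and model size

def size_penalty(m: nn.Module) -> torch.Tensor:
    # Differentiable proxy for model size: pushing weight magnitudes down
    # means they can later be quantized to fewer effective bits.
    return sum(p.abs().mean() for p in m.parameters())

for step in range(100):  # toy training loop on random data
    x, y = torch.randn(32, 16), torch.randint(0, 2, (32,))
    loss = criterion(model(x), y) + size_weight * size_penalty(model)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```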
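And a hedged sketch of the FlexAttention API mentioned above, assuming a recent PyTorch build (2.5+/nightly) where torch.nn.attention.flex_attention is available. The score_mod below adds a simple distance-based bias as one example of a user-defined attention variant; wrapping the call in torch.compile produces the fused Triton kernels.

```python
import torch
from torch.nn.attention.flex_attention import flex_attention

B, H, S, D = 2, 4, 128, 64  # batch, heads, sequence length, head dim
q, k, v = (torch.randn(B, H, S, D) for _ in range(3))

def distance_bias(score, b, h, q_idx, kv_idx):
    # score_mod is applied to every attention score; here we subtract a
    # penalty proportional to the query/key distance (an ALiBi-like bias).
    return score - 0.1 * (q_idx - kv_idx).abs()

out = flex_attention(q, k, v, score_mod=distance_bias)
print(out.shape)  # torch.Size([2, 4, 128, 64])
```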
Perspectives
- existential risk probabilities are too unreliable to inform policy. The use of AI existential risk probability estimates for policymaking is criticized in this essay, which contends that these estimates are excessively erratic and lack a strong inductive or deductive foundation, frequently approximating educated guesses rather than fact-based projections. The authors argue against the validity of using these projections to inform public policy, particularly when they are connected to expensive or restricting measures, and they support an evidence-based strategy that takes AI development uncertainty into account. They advise against utilizing speculative existential risk probability in high-impact decisions and instead suggest concentrating on specified AI milestones for more significant policy choices.
- Is AI judging the future of gymnastics or just a surveillance tool? The International Gymnastics Federation (FIG) and Fujitsu have partnered on an AI-assisted judging support system at the World Gymnastics Championships to make scoring more equitable and transparent. The Judging Support System (JSS), which will not replace judges, provides 3D-model-based second opinions in challenging cases and scoring inquiries, with room for future development and wider uses. Despite worries that it may eventually displace human judgment, the JSS could improve scoring accuracy and consistency, which matters in a sport where even small point differences have a significant impact on standings and athletes' careers.
- Why AI's Tom Cruise problem means it is 'doomed to fail'. LLMs' 'reversal curse' leads them to fail at drawing relationships between simple facts. It's a problem that could prove fatal.
- Sound clashes are a thrilling reggae tradition. Will AI ruin them? The use of fake AI vocals — including those of Donald Trump — is sending shockwaves through this historic scene. At a Montego Bay clash, performers debate their culture's future.
- Replacing my Right Hand with AI. An Anthropic scientist broke their hand while riding a bike. They stayed incredibly productive by leaning on Claude and voice input.
- TPU transformation: A look back at 10 years of our AI-specialized chips. Because it has invested in bespoke TPU chips, Google is one of the only companies training massive models without being dependent on Nvidia.
- I'm Switching Into AI Safety. Alex Irpan left Google's robotics team after eight years to join Google DeepMind's AI safety team, motivated by a personal desire to address safety concerns as AI systems approach superhuman capability. Though the area is difficult and fraught with controversy, he voices concerns about the effectiveness of current AI safety measures and the growing risks of unmanaged AI growth, and describes his commitment to contributing to AI safety.
- As Regulators Close In, Nvidia Scrambles for a Response. With a 90 percent share of the A.I. chip market, the company is facing antitrust investigations into the possibility that it could lock in customers or hurt competitors.
- How GitHub harnesses AI to transform customer feedback into action. GitHub is using AI and machine learning to compile and evaluate user input at scale, providing useful insights that drive feature prioritization and product enhancements. This automated method improves responsiveness to developer needs by facilitating the collection of multilingual input and promoting data-driven decision-making. The project demonstrates GitHub’s dedication to utilizing AI to uphold a developer-centric approach to product development.
- How Does OpenAI Survive? The piece expresses strong doubt about OpenAI's sustainability, given the exorbitant costs of building and running huge language models and the absence of broad business utility for generative AI. The author questions OpenAI's long-term viability absent substantial technological breakthroughs or persistent, extraordinary fundraising. Even though OpenAI has had a significant impact on the AI sector, the business still faces problems with profitability, high operational burn rates, and reliance on key alliances, most notably with Microsoft.
- How neurons make a memory. Loosely packaged DNA might make these nerve cells better able to encode memories.
- DeepMind hits milestone in solving maths problems — AI’s next grand challenge. AlphaProof showed its prowess on questions from this year’s Mathematical Olympiad — a step in the race to create substantial proofs with artificial intelligence.
- Dirty talk: how AI is being used in the bedroom — and beyond. Analysis of more than 200,000 chatbot conversations shows how the new tech is actually being used. Turns out quite a lot of it is ‘racy role play’
- Scientists are falling victim to deepfake AI video scams — here’s how to fight back. Cybercriminals are increasingly singling out researchers, alongside politicians and celebrities. Targeted scientists share tips on how to silence them.
- What lies beneath: the growing threat to the hidden network of cables that power the internet. Last month large parts of Tonga were left without internet when an undersea cable was broken. It’s a scenario that is far more common than is understood
- Why AI hasn’t shown up in the GDP statistics yet. Even though LLMs have made remarkable strides in handling complicated tasks, they are still unable to reliably complete activities at a scale comparable to that of humans. As a result, their current potential as direct human substitutes in processes is limited. LLMs require comprehensive prompt engineering and iteration to reach acceptable accuracy. The latest JSON output control and cost reduction enhancements from OpenAI may help with certain problems, but the subtle integration needed for LLMs in corporate settings points to gradual productivity increases rather than a sudden economic revolution.
- AI Is Coming for India’s Famous Tech Hub. AI integration is posing a danger to employment, particularly in routine operations like contact centers, which has caused a sea change in India’s technology outsourcing sector. While recruiting is slowing down, companies are finding it difficult to move up the value chain. However, some are optimistic that AI technologies may open up new opportunities in fields like programming. Higher-order cognitive abilities will be necessary in the sector going forward as automation continues to reshape traditional employment.
- Inside the company that gathers ‘human data’ for every major AI company. Advances in AI pre-training have made it possible for models to handle large amounts of online data and supervised fine-tuning with specialists afterward aids in the models’ ability to become more specialized and general. The goal of Turing’s method is to improve AI reasoning capabilities by leveraging “input and output pairs” created by subject-matter experts. These models, foreseeing the “agentic” future of artificial intelligence, might integrate specialized knowledge across areas to accomplish complicated tasks independently.
Meme of the week
What do you think about it? Some news that captured your attention? Let me know in the comments
If you have found this interesting:
You can look for my other articles, and you can also connect with or reach me on LinkedIn. Check this repository containing weekly updated ML & AI news. I am open to collaborations and projects, and you can also subscribe for free to get notified when I publish a new story.
Here is the link to my GitHub repository, where I am collecting code and many resources related to machine learning, artificial intelligence, and more.
or you may be interested in one of my recent articles: