WEEKLY AI NEWS: RESEARCH, NEWS, RESOURCES, AND PERSPECTIVES

AI & ML news: Week 8–14 July

Apple M5 chip, new Google and Meta models, xAI ends its deal with Oracle, and much more

Salvatore Raieli
17 min read · Jul 15, 2024

The most interesting news, repository, articles, and resources of the week

Check and star this repository where the news will be collected and indexed:

You will find the news on GitHub first. Single posts are also collected here:

Weekly AI and ML news - each week the best of the field


Research

https://arxiv.org/pdf/2402.14905
https://www.nytimes.com/2024/07/04/technology/openai-hack.html?unlocked_article_code=1.4k0.8GOs.WFzxVAjkpQLt&smid=url-share
  • APIGen: Automated Pipeline for Generating Verifiable and Diverse Function-Calling Datasets. APIGen presents an automated data-generation pipeline that synthesizes high-quality datasets for function-calling applications; 7B models trained on the curated data outperform GPT-4 and other state-of-the-art models on the Berkeley Function-Calling Benchmark. A dataset with 60K entries is also released to aid research on function-calling-enabled agents.
  • Searching for Best Practices in Retrieval-Augmented Generation. Outlines best practices for building efficient RAG workflows and suggests performance- and efficiency-focused tactics, including newly developed multimodal retrieval tools.
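The core RAG loop the paper studies (retrieve the most relevant documents, then build a prompt around them) can be sketched in a few lines. This is a toy illustration, not the paper's pipeline: the bag-of-words `embed` function is a stand-in for a real embedding model, and the documents are made up.

```python
from collections import Counter
from math import sqrt

def embed(text):
    """Toy bag-of-words 'embedding' standing in for a real encoder."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, docs, k=2):
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

docs = [
    "Rectified flow transformers scale diffusion models.",
    "Sparse autoencoders recover interpretable features.",
    "Multimodal retrieval tools improve RAG pipelines.",
]
context = retrieve("How do retrieval tools help RAG?", docs, k=1)
prompt = f"Answer using only this context:\n{context[0]}\n\nQuestion: ..."
```

A production workflow would swap in a dense embedding model and a reranker, which is exactly the kind of component choice the paper benchmarks.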
  • Self-Evaluation as a Defense Against Adversarial Attacks on LLMs. Proposes self-evaluation as a defense against adversarial attacks: a pre-trained LLM is used as a dedicated evaluator, which significantly lowers attack success rates and proves more effective than fine-tuned models, dedicated safety LLMs, and enterprise moderation APIs. The paper evaluates several settings, including attacks on the generator alone and on the generator plus evaluator combined.
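The generator-plus-evaluator setup can be sketched as a wrapper that screens both the input and the output before anything is returned. Everything here is a hypothetical stub: a real system would prompt a pre-trained LLM to judge harmfulness rather than match a keyword list.

```python
def generate(prompt):
    """Stand-in for the generator LLM (hypothetical stub)."""
    return f"Response to: {prompt}"

def evaluate(prompt, response):
    """Stand-in for the evaluator LLM: return True if safe.
    A real evaluator would be a pre-trained model prompted to
    judge whether the exchange is harmful."""
    banned = {"build a weapon", "malware"}
    text = (prompt + " " + response).lower()
    return not any(b in text for b in banned)

def guarded_generate(prompt, refusal="I can't help with that."):
    # Defense: the evaluator screens the user input first, then the
    # generator's output, before anything reaches the user.
    if not evaluate(prompt, ""):
        return refusal
    response = generate(prompt)
    return response if evaluate(prompt, response) else refusal
```

The paper's "generator + evaluator" attack setting corresponds to an adversary who must fool both checks at once, which is why the combined defense is harder to break.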
https://arxiv.org/pdf/2407.02911v1
  • Adaptable Logical Control for Large Language Models. Adaptable Logical Control for LLMs presents the Ctrl-G framework, which combines LLMs with Hidden Markov Models to enforce logical constraints represented as deterministic finite automata. Ctrl-G achieves an over 30% higher satisfaction rate in human evaluation compared to GPT-4.
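A deterministic finite automaton is just a transition table plus accepting states, and checking whether generated text satisfies the constraint is a linear scan. This sketch shows only the acceptance check with a toy constraint; in Ctrl-G the HMM additionally steers generation toward accepting states, which is not shown here.

```python
def dfa_accepts(string, transitions, start, accepting):
    """Check whether a string satisfies a constraint expressed as a DFA.
    transitions maps (state, symbol) -> next_state; missing entries reject."""
    state = start
    for ch in string:
        key = (state, ch)
        if key not in transitions:
            return False
        state = transitions[key]
    return state in accepting

# Toy constraint: strings over {a, b} with an even number of 'b' characters.
transitions = {
    ("even", "a"): "even", ("even", "b"): "odd",
    ("odd", "a"): "odd",   ("odd", "b"): "even",
}
```

For text generation the alphabet would be the tokenizer's vocabulary instead of single characters, but the automaton machinery is the same.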
  • LLM See, LLM Do: Guiding Data Generation to Target Non-Differentiable Objectives. In LLM See, LLM Do, the effectiveness and effects of synthetic data are examined in detail, along with how they affect a model’s internal biases, calibration, attributes, and preferences. It is discovered that LLMs are sensitive to certain attributes even when the prompts from the synthetic data seem neutral, indicating that it is possible to influence the generation profiles of models to reflect desirable attributes.
https://github.com/cloneofsimo/minRF
https://arxiv.org/pdf/2407.02918v1
https://fun-audio-llm.github.io/

News

https://arxiv.org/pdf/2406.18518
  • Smart Paste for context-aware adjustments to pasted code. Google presents Smart Paste, an internal tool that streamlines the code-authoring workflow by automating adjustments to pasted code, and describes key insights from the UX and model-preparation work that led to high performance and successful adoption among Google developers.
  • Apple M5 Chip’s Dual-Use Design Will Power Future Macs and AI Servers. Apple will reportedly use a more advanced SoIC packaging technology for its M5 chips, as part of a two-pronged strategy to meet its growing need for silicon that can power consumer Macs and enhance the performance of its data centers and future AI tools that rely on the cloud.
https://research.google/blog/smart-paste-for-context-aware-adjustments-to-pasted-code/
https://arxiv.org/pdf/2407.06135
  • Meta drops AI bombshell: Multi-token prediction models now open for research. Meta has thrown down the gauntlet in the race for more efficient artificial intelligence. The tech giant released pre-trained models on Wednesday that leverage a novel multi-token prediction approach, potentially changing how large language models (LLMs) are developed and deployed.
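The architectural idea behind multi-token prediction is that one shared trunk feeds several output heads, each predicting a different future offset (token t+1, t+2, and so on). This is a toy NumPy illustration of that shape, not Meta's released implementation; all sizes here are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, d_model, n_heads = 10, 8, 4  # predict the next 4 tokens at once

trunk = rng.normal(size=(d_model,))                 # shared hidden state at position t
heads = rng.normal(size=(n_heads, vocab, d_model))  # one output head per future offset

def multi_token_logits(hidden, heads):
    """Each head maps the same hidden state to logits for token t+1 .. t+n."""
    return np.array([W @ hidden for W in heads])    # shape (n_heads, vocab)

logits = multi_token_logits(trunk, heads)
predicted = logits.argmax(axis=-1)  # n_heads predicted future tokens
```

During training the losses of all heads are summed; at inference the extra heads can be dropped (keeping standard next-token decoding) or used for speculative decoding to speed generation up.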
  • Google DeepMind’s AI Rat Brains Could Make Robots Scurry Like the Real Thing. To investigate the brain circuits underlying complex motor skills, DeepMind and Harvard University created a virtual rat driven by artificial neural networks trained on real rat motions and neural patterns. Because it can transfer acquired movement skills to new settings, this bio-inspired AI could advance robotics and offer new insights into brain function. The study shows that brain activity associated with various behaviors can be accurately mimicked and decoded by digital simulations.
  • Microsoft drops observer seat on OpenAI board amid regulator scrutiny. The startup’s new approach means Apple will no longer be able to appoint an executive to a similar role.
https://arxiv.org/pdf/2407.01219
https://arxiv.org/pdf/2407.03234
https://openai.com/index/openai-and-los-alamos-national-laboratory-work-together/
  • Here’s how Qualcomm’s new laptop chips really stack up to Apple, Intel, and AMD. The Snapdragon X Elite and X Plus chips from Qualcomm are making Windows on Arm a competitive platform, roughly matching the performance and battery life of AMD Ryzen, Apple’s M3 chip, and Intel Core Ultra. The Snapdragon chips excel in multi-core performance and power efficiency, even though they don’t lead in GPU performance. The latest generation of laptops with Snapdragon processors is a more affordable option than MacBooks and conventional Intel- or AMD-based devices.
https://arxiv.org/pdf/2406.13892

Resources

https://arxiv.org/pdf/2407.04604v1
  • Quality Prompts. QualityPrompts implements 58 prompting techniques explained in this survey from OpenAI, Microsoft, et al.
  • Summary of a Haystack: A Challenge to Long-Context LLMs and RAG Systems. Describes a new task, SummHay, that evaluates a model’s capacity to process a Haystack and produce a summary that highlights the key insights and references the original documents; finds that RAG components improve performance on the benchmark, making it a feasible choice for holistic RAG evaluation. Long-context LLMs score only 20% on the benchmark, well below the human-performance estimate of 56%.
  • AI Agents That Matter. AI Agents That Matter examines existing agent evaluation procedures and identifies flaws that could prevent practical deployment; it also suggests a framework to prevent overfitting agents and an implementation that jointly optimizes accuracy and cost.
  • An Extremely Opinionated Annotated List of My Favourite Mechanistic Interpretability Papers v2. A post by Neel Nanda, a Research Engineer at Google DeepMind, about his favorite papers to read in Mechanistic Interpretability.
  • SAE. This library trains k-sparse autoencoders (SAEs) on the residual stream activations of HuggingFace language models, roughly following the recipe detailed in Scaling and Evaluating Sparse Autoencoders (Gao et al., 2024).
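The defining trick of a k-sparse autoencoder is the TopK activation: keep only the k largest latent activations and zero the rest, so each input is explained by a handful of features. A minimal forward pass, with made-up dimensions and untrained random weights rather than the library's actual setup:

```python
import numpy as np

def topk_activation(z, k):
    """Keep the k largest activations, zero the rest (the k-sparse constraint)."""
    out = np.zeros_like(z)
    idx = np.argsort(z)[-k:]   # indices of the k largest entries
    out[idx] = z[idx]
    return out

rng = np.random.default_rng(0)
d_in, d_latent, k = 16, 64, 4  # overcomplete latent space, toy sizes
W_enc = rng.normal(scale=0.1, size=(d_latent, d_in))
W_dec = rng.normal(scale=0.1, size=(d_in, d_latent))

def sae_forward(x):
    z = topk_activation(W_enc @ x, k)  # sparse latent code
    return W_dec @ z, z                # reconstruction and code

x = rng.normal(size=(d_in,))
x_hat, z = sae_forward(x)
```

Training minimizes the reconstruction error between `x` and `x_hat`; because the sparsity is enforced structurally by TopK, no L1 penalty term is needed.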
  • MInference. To speed up long-context LLM inference, MInference computes attention with approximate, dynamic sparsity, reducing inference latency by up to 10x for pre-filling on an A100 while maintaining accuracy.
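The core idea of dynamic sparse attention is that each query attends only to the small subset of keys with the highest scores instead of the whole sequence. A toy NumPy sketch of that subset-softmax (not MInference's actual kernels, which identify sparse patterns without scoring every key):

```python
import numpy as np

def sparse_attention(q, K, V, k):
    """Approximate attention for one query: score all keys, keep only the
    top-k, and softmax over that subset (dynamic sparsity per query)."""
    scores = K @ q / np.sqrt(q.shape[0])
    idx = np.argsort(scores)[-k:]            # the k best-matching keys
    w = np.exp(scores[idx] - scores[idx].max())
    w /= w.sum()
    return w @ V[idx]

rng = np.random.default_rng(0)
seq, d = 1024, 32
q = rng.normal(size=(d,))
K = rng.normal(size=(seq, d))
V = rng.normal(size=(seq, d))
out = sparse_attention(q, K, V, k=64)  # only 64 of 1024 keys contribute
```

Because attention weights are dominated by a few large scores, dropping the rest changes the output very little while cutting the pre-filling cost dramatically at long context lengths.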
https://arxiv.org/pdf/2407.04363v1
  • micro-agent. An AI agent that writes and fixes code for you.
  • AnySR. A novel method for improving efficiency and scalability in single-image super-resolution (SISR). Unlike previous techniques, AnySR supports an ‘Any-Scale, Any-Resource’ implementation, reducing resource requirements at smaller scales without the need for extra parameters.
  • Unsupervised Learning of Category-Level 3D Pose from Object-Centric Videos. Researchers have created a novel method for estimating category-level 3D poses from casual, object-centric videos without human supervision.
https://huggingface.co/Kwai-Kolors/Kolors
  • SenseVoice. A speech foundation model with a variety of speech-understanding functions, including audio event detection, spoken-language identification, automatic speech recognition, and speech emotion recognition.
  • Boosting Large Vision Language Models with Self-Training. A novel method called Video Self-Training with Augmented Reasoning (Video-STaR) aims to enhance Large Vision Language Models (LVLMs).
  • GraphRAG. GraphRAG lets you use language models to analyze unstructured text. It runs on Azure, so the quick start is simple to spin up.
  • iLLM-TSC. Researchers have created a novel framework that blends reinforcement learning with a large language model to enhance traffic signal control systems.
https://cdn-uploads.huggingface.co/production/uploads/6200d0a443eb0913fa2df7cc/NyhBs_gzg40iwL995DO9L.png
https://github.com/sarthakrastogi/quality-prompts
https://arxiv.org/pdf/2407.06190v1
  • Paints-Undo. Paints-Undo is a system in which a model generates the strokes used to reconstruct an image. It comes from the creators of ControlNet, IC-Light, and many other image generation systems. Remarkably, unlike earlier stroke systems, this model can undo strokes and frequently reevaluates its strategy completely halfway through, much like a human artist would.
  • minRF. A rudimentary implementation of the scalable rectified flow transformers partially used in Stable Diffusion 3, along with sweeps of the muP hyperparameters.
  • RouteLLM. RouteLLM is a framework for serving and evaluating LLM routers.
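An LLM router sends each query to a cheap model when it looks easy and to a strong model when it looks hard, trading cost against quality. This sketch uses a hypothetical keyword-based difficulty score purely for illustration; RouteLLM itself trains learned routers rather than anything like this heuristic, and the model names below are placeholders.

```python
def difficulty(query):
    """Hypothetical stand-in for a learned router score in [0, 1]."""
    hard_markers = ("prove", "derive", "multi-step", "debug")
    return 0.9 if any(m in query.lower() for m in hard_markers) else 0.2

def route(query, threshold=0.5):
    """Pick a model tier: strong (expensive) or cheap, per query."""
    return "strong-llm" if difficulty(query) >= threshold else "cheap-llm"
```

Evaluating a router then amounts to sweeping the threshold and measuring the quality/cost frontier, which is the kind of analysis the framework is built to serve.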
https://arxiv.org/pdf/2407.01370
https://arxiv.org/pdf/2407.07726

Perspectives

https://github.com/microsoft/MInference
  • Superintelligence — 10 years later. Ten years after the publication of Nick Bostrom’s seminal book “Superintelligence,” advances in AI have raised awareness of the potential for AGI and its associated risks. The AI research community now takes AI safety seriously, with 2024 shaping up as a turning point for ensuring control and alignment with human values. With AI technologies advancing so quickly, the field faces safety and ethical concerns that were previously thought to be purely theoretical.
https://haoosz.github.io/ConceptExpress/
  • How Good Is ChatGPT at Coding, Really? Depending on the task difficulty and programming language, OpenAI’s ChatGPT may generate code with success rates anywhere from less than 1% to 89%.
  • TechScape: Can AI really help fix a healthcare system in crisis? Artificial intelligence is heralded as helping the NHS fight cancer. But some warn it is a distraction from more urgent challenges.
  • Pop Culture. In a critical 31-page report titled “Gen AI: Too Much Spend, Too Little Benefit?”, Goldman Sachs argues that generative AI’s power consumption will sharply raise utility spending while delivering limited productivity gains and returns. The report questions AI’s potential to transform industries, highlighting its high cost, the strain on electrical infrastructure, and the failure so far to produce appreciable increases in productivity or revenue. Without significant technological advances, it paints a dim outlook for the field.
https://github.com/BuilderIO/micro-agent
  • The AI summer. ChatGPT’s rapid adoption, reaching 100 million users in just two months, is remarkable compared with other tech innovations like the iPhone and e-commerce, which took years to take hold. Yet despite the initial excitement, few users have found ChatGPT useful in the long run, and enterprise adoption of large language models remains limited. This suggests more work is needed to establish solid product-market fit and long-term value.
  • A Deep Dive on AI Inference Startups. AI’s “picks and shovels,” such as model fine-tuning, observability, and inference, are a popular area for venture capital investment. VCs are betting that when businesses integrate AI into their products, they won’t want to build these capabilities themselves. But the TAM for AI inference is currently limited, so these investments can only be profitable if VCs’ faith in significant TAM expansion pays off. And although AI inference platforms benefit startups in the short run, over the long run they hurt them.
https://github.com/orrzohar/Video-STaR
  • Cyclists can’t decide whether to fear or love self-driving cars. San Francisco cyclists have reported near misses and safety concerns with self-driving cars from Waymo and Cruise. Almost 200 complaints about these self-driving cars’ unpredictable behavior and near-misses have been filed with the California DMV. Despite the manufacturers’ claims that their cars had improved safety features, the events cast doubt on the vehicles’ suitability for widespread use in the face of heightened regulatory scrutiny.
  • Augmenting Intelligence. This essay promotes a practical approach to employing AI as an enhancement to human intelligence and explores bridging the divide between techno-optimists and pessimists. It discusses AI’s role in education, its effects on creativity and the arts, and its ethical application. The essay argues that AI is a tool that augments human capabilities rather than a threat, suggesting that “augmented intelligence” is the more realistic term.

Meme of the week

What do you think? Did any news capture your attention? Let me know in the comments.

If you have found this interesting:

You can look for my other articles, and you can also connect with or reach me on LinkedIn. Check this repository containing weekly updated ML & AI news. I am open to collaborations and projects, and you can subscribe for free to get notified when I publish a new story.

Here is the link to my GitHub repository, where I am collecting code and many resources related to machine learning, artificial intelligence, and more.

Or you may be interested in one of my recent articles:


Salvatore Raieli

Senior data scientist | about science, machine learning, and AI. Top writer in Artificial Intelligence