WEEKLY AI NEWS: RESEARCH, NEWS, RESOURCES, AND PERSPECTIVES
AI & ML news: Week 23–29 September
Google CEO Sundar Pichai announces $120M fund for global AI education, Mira Murati is leaving OpenAI, Salesforce Ventures ups its AI fund to $1B, and much more
The most interesting news, repositories, articles, and resources of the week
Check and star this repository where the news will be collected and indexed:
You will find the news first on GitHub. All the Weekly News stories are also collected here:
Research
- Moshi: a speech-text foundation model for real-time dialogue. presents a full-duplex spoken dialogue framework built on a speech-text foundation paradigm, along with several system components: Helium, a 7B-parameter text LLM; Mimi, a semantic-acoustic neural audio codec that achieves state-of-the-art audio quality; and a hierarchical multi-stream architecture that can generate speech-to-speech from arbitrary dialogues.
- Training Language Models to Self-Correct via Reinforcement Learning. develops a multi-turn online reinforcement learning approach, trained entirely on self-generated data, to enhance an LLM’s ability to self-correct; demonstrates that SFT suffers from a distribution mismatch between training data and model responses and learns self-correction poorly; proposes a two-stage method whose first stage optimizes correction behavior and whose second uses a reward bonus to amplify self-correction during training; applied to the Gemini 1.0 Pro and 1.5 Flash models, it achieves state-of-the-art self-correction, improving the base models by 15.6% and 9.1% on the MATH and HumanEval benchmarks, respectively.
- On the Diagram of Thought. strengthens LLMs’ capacity for rigorous mathematical reasoning; Diagram of Thought (DoT) models iterative reasoning in an LLM as the construction of a directed acyclic graph (DAG); it combines propositions, critiques, refinements, and verifications into a single DAG structure, enabling DoT to capture sophisticated logical deductions beyond the reach of linear or tree-based methods.
- To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning. examines which tasks benefit most from chain-of-thought (CoT) prompting; following a meta-analysis of over 100 papers and multiple evaluations, it concludes that CoT yields significant performance gains mostly on math and logic tasks; most of the CoT gain comes from improving symbolic execution, though a dedicated symbolic solver still outperforms it.
- A Comprehensive Evaluation of Quantized Instruction-Tuned Large Language Models: An Experimental Analysis up to 405B. examines how quantization affects instruction-tuned LLMs from 7B to 405B parameters across different techniques. The main conclusions: 1) quantizing a larger LLM down to a given size generally outperforms a smaller FP16 LLM of similar size across most benchmarks; 2) performance varies significantly with quantization technique, model size, and bit-width, with weight-only methods frequently producing better results in larger models; and 3) task difficulty has little impact on the accuracy degradation caused by quantization.
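To make the "weight-only" family of methods concrete, here is a minimal sketch of round-to-nearest symmetric quantization with one scale per weight row. This is a generic toy illustration, not the paper's exact procedure; the function names and numbers are made up.

```python
# Weight-only round-to-nearest (RTN) quantization sketch: each row of
# weights is mapped to signed integers with a single per-row scale.

def quantize_rtn(weights, bits=8):
    """Quantize one row of weights to signed ints with a shared scale."""
    qmax = 2 ** (bits - 1) - 1                  # e.g. 127 for int8
    scale = max(abs(w) for w in weights) / qmax or 1.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

row = [0.12, -0.5, 0.33, 0.02]                  # toy FP16 weights
q, s = quantize_rtn(row)
approx = dequantize(q, s)
# Worst-case rounding error is half a quantization step (scale / 2).
err = max(abs(a - b) for a, b in zip(row, approx))
```

Lower bit-widths enlarge the step size `s`, which is why accuracy degradation grows as bits shrink, largely independent of task difficulty per the paper's third conclusion.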
- Iteration of Thought: Leveraging Inner Dialogue for Autonomous Large Language Model Reasoning. proposes the Iteration of Thought (IoT) framework, which uses an inner dialogue agent to dynamically adjust reasoning paths, enabling adaptive cross-path exploration and improving response accuracy. Unlike CoT and ToT, which follow rigid processes, IoT generates prompts dynamically, adapting as reasoning unfolds.
- Schrodinger’s Memory: Large Language Models. uses the Universal Approximation Theorem (UAT) to explain how LLMs store memory, and proposes a novel method for assessing LLM performance by comparing the memory capacities of different models; the Transformer architecture acts as a dynamic UAT-style fitting model with a high degree of adaptability, allowing LLMs to recall entire passages from minimal input cues.
- Jailbreaking Large Language Models with Symbolic Mathematics. uses GPT-4o to generate mathematically encoded prompts as a jailbreaking strategy; the attack achieves an average success rate of 73.6% across 13 state-of-the-art LLMs, indicating that current safety training does not generalize to mathematically encoded inputs.
- Iterative Object Count Optimization for Text-to-image Diffusion Models. Generating a specific number of objects with a diffusion model is often a difficult task. This work introduces a counting token that enables the model to more accurately produce either a few or many instances of a given object. While it’s not flawless and is based on the original stable diffusion model, it significantly outperforms existing methods.
- A Controlled Study on Long Context Extension and Generalization in LLMs. Researchers have created a standardized evaluation protocol designed to compare different methods for extending language models to effectively handle long document contexts.
- MAgICoRe: Multi-Agent, Iterative, Coarse-to-Fine Refinement for Reasoning. MAgICoRe is a novel strategy designed to enhance reasoning in large language models by tackling challenges in refinement processes. It classifies problems based on difficulty, applying straightforward strategies to simpler tasks and employing multi-agent iterative refinement for more complex ones.
- The Impact of Element Ordering on LM Agent Performance. The sequence in which UI elements are displayed greatly affects agent performance in virtual environments. Randomizing the order of elements can decrease performance as much as completely removing all visible text.
- Larger and more instructable language models become less reliable. Scaling up and shaping up large language models increased their tendency to provide sensible yet incorrect answers at difficulty levels humans cannot supervise, highlighting the need for a fundamental shift in artificial intelligence design towards reliability.
- SwiftDossier: Tailored Automatic Dossier for Drug Discovery with LLMs and Agents. This work addresses the limitations of LLMs in drug discovery by integrating an advanced Retrieval-Augmented Generation (RAG) system for more accurate answers and combining LLMs with external tools to create an automatic target dossier. The result is a production-ready dossier with comprehensive data, summarized into a PDF and PowerPoint presentation.
- Self-Explainable AI. In the field of explainable AI, there is a strong focus on developing self-explainable models, which offer a more principled approach than post-hoc methods that attempt to interpret decisions after they have been made by opaque models. Despite its potential, this line of research often faces challenges such as lack of reproducibility, difficulties in comparison, and inconsistent standards. To address these issues, the authors introduce CaBRNet, an open-source, modular, and backward-compatible framework for Case-Based Reasoning Networks.
News
- Google CEO Sundar Pichai announces $120M fund for global AI education. Speaking Saturday at the UN Summit of the Future, Google CEO Sundar Pichai described AI as “the most transformative technology yet” and announced a new fund for AI education and training around the world.
- Driver Distractions ‘Exceedingly High’ When Using Partial Automation Systems: IIHS. According to the IIHS, once advanced driver-assistance systems come into play, drivers become less involved in driving and more distracted. Hands-on or hands-free, the level of automation doesn’t matter.
- wordfreq will not be updated. The wordfreq data is a snapshot of language that could be found in various online sources up through 2021. Generative AI has polluted the data.
- Drones carrying fireworks: why the world’s most famous gunpowder artist is collaborating with AI. For his explosion event in Los Angeles, Cai Guo-Qiang built his own version of ChatGPT and employed a drone army to answer the question: what is the fate of humanity and AI?
- AI could lead to inconsistent outcomes in home surveillance. Researchers find large language models make inconsistent decisions about whether to call the police when analyzing surveillance videos.
- Arcade Announces First-Ever AI Product Creation Platform. Arcade is a new platform where users can go from prompt to product.
- Salesforce Taps Nvidia to Develop AI-Powered Avatars. Salesforce and Nvidia are partnering to develop advanced artificial intelligence capabilities aimed at delivering new insights and enhancing productivity for teams utilizing Salesforce’s platform.
- Introducing the OpenAI Academy. OpenAI is launching a program aimed at expanding AI knowledge access in low and middle-income countries. Additionally, it has professionally translated the MMLU, a standard reasoning benchmark, into 15 different languages.
- China’s Alibaba launches over 100 new open-source AI models, releases text-to-video generation tool. Alibaba has introduced over 100 open-source AI models, bolstering its technology to stay competitive with its rivals. The latest Qwen 2.5 models, improved in areas like math and coding, cater to various applications, including automobiles and gaming. Additionally, Alibaba has unveiled a new proprietary model, Qwen-Max 2.5, along with a text-to-video tool to enhance its AI and cloud service offerings.
- Apple Intelligence Features Expected to Roll Out in This Order Between iOS 18.1 and iOS 18.4. Apple’s iOS 18.1 will debut significant AI features, including an improved Siri, generative AI tools within Photos, and ChatGPT integration. In iOS 18.2, these capabilities will be expanded with localized support across various English-speaking countries, alongside the introduction of Image Playground and Genmoji. Upcoming updates, like iOS 18.4, will further personalize Siri and add support for additional languages.
- Microsoft updates its AI suite with more agents and Copilots. Microsoft is enhancing its generative AI suite by introducing automated agents, expanding the capabilities of its Copilot assistants, and launching a new tool that enables multiple workers to collaboratively engage with artificial intelligence.
- Sam Altman leaves OpenAI board’s safety and security committee. OpenAI announced that CEO Sam Altman is stepping down from the board’s safety and security committee, which will now consist entirely of independent board members.
- Silicon Valley billionaire Vinod Khosla says AI will handle 80% of work in 80% of jobs. Yet another Silicon Valley billionaire has just predicted that most jobs will be replaced by AI — whether you work on a farm or in sales.
- Hollywood is coming out in force for California’s AI safety bill. Hollywood is squaring off against Silicon Valley in the battle over SB 1047, California’s first-of-its-kind AI safety bill. Amid doubts about whether Governor Gavin Newsom will sign the legislation, a wave of star-studded endorsements mark the first organized celebrity effort to advance AI regulations beyond the direct interests of the entertainment industry.
- OpenAI rolls out Advanced Voice Mode with more voices and a new look. OpenAI announced it is rolling out Advanced Voice Mode (AVM) to an expanded set of ChatGPT’s paying customers on Tuesday. The audio feature, which makes ChatGPT more natural to speak with, will initially roll out to customers in ChatGPT’s Plus and Teams tiers. Enterprise and Edu customers will start receiving access next week.
- OpenAI CEO Sam Altman declares we could have superintelligence ‘in a few thousand days’. OpenAI CEO Sam Altman has declared that humanity is on the brink of a superintelligence revolution, and that “In the next couple of decades, we will be able to do things that would have seemed like magic to our grandparents.”
- Google says generative AI is ready to do real work. Google is holding a “Gemini at Work” event Tuesday to convince businesses that its generative AI is better than offerings from Microsoft and OpenAI. The largely virtual event comes amid a flurry of claims from tech providers and growing skepticism that genAI is ready for broad use beyond coding and customer support.
- Google, Volkswagen partner on smartphone AI assistant. Google is providing key capabilities for an artificial intelligence assistant for Volkswagen drivers in a smartphone app, part of Google’s strategy to win business by offering tools to build enterprise AI applications.
- Will AI replace programmers? Don’t count on it, says Google’s CEO. the CEO of Google and its owner company, Alphabet, believes that AI won’t be replacing programmers — instead, it’ll actually help more people become coders than ever before.
- Cloudflare’s new AI Audit tool aims to give content creators better bot controls. Don’t want your work ripped off by OpenAI, Meta AI, and Google Gemini? If your work is on a website you control, Cloudflare’s AI Audit tool may help. Here’s how to try it.
- James Cameron, Academy Award-Winning Filmmaker, Joins Stability AI Board of Directors. Renowned filmmaker James Cameron has joined the board of generative media company Stability AI to help steer its shift toward visual storytelling.
- Updated Gemini models, reduced 1.5 Pro pricing, increased rate limits. Google’s Gemini models have seen a significant cost reduction, an expanded context length of up to 2 million tokens, and overall performance enhancements. An intriguing detail is the noticeable jump in cost after reaching 128k tokens.
- Llama 3.2: multimodal. Meta has introduced a new series of Llama models with vision capabilities, including versions with 1 billion and 3 billion parameters, as well as several additional multimodal models.
- OpenAI CTO Mira Murati is leaving. Two other company leaders are also out in what CEO Sam Altman calls an “abrupt” reorganization.
- OpenAI staffers reportedly ‘taken aback’ by ‘ominous’ logo rebranding. OpenAI is set to rebrand in 2024 with a new logo that employees felt lacked creativity. Alongside this change, the company is transitioning from a non-profit to a for-profit model. The rebranding effort is intended to strengthen its identity as OpenAI gains greater recognition.
- Accelerating particle size distribution estimation. MIT researchers have accelerated a new AI-based estimator for medication manufacturing, achieving a 60-fold increase in speed.
- Apple Intelligence will support German, Italian, Korean, Portuguese, and Vietnamese in 2025. Apple announced Wednesday that its generative AI offering will be available in even more languages in 2025. Additions to Apple Intelligence include English (India), English (Singapore), German, Italian, Korean, Portuguese, Vietnamese, and “others” yet to be announced.
- Salesforce Ventures ups its AI fund to $1B, doubling it again. Salesforce Ventures just announced a new $500 million fund dedicated to AI companies. This is significant for several reasons. First, in June 2023, Salesforce Ventures doubled its AI fund from $250 million to $500 million, so the additional $500 million brings the AI fund to $1 billion. This compares to $5 billion total deployed in its first 15 years, since its 2009 launch.
- LinkedIn scraped user data for training before updating its terms of service. LinkedIn may have trained AI models on user data without updating its terms. LinkedIn users in the U.S. — but not the EU, EEA, or Switzerland, likely due to those regions’ data privacy rules — have an opt-out toggle in their settings screen disclosing that LinkedIn scrapes personal data to train “content creation AI models.” The toggle isn’t new. But, as first reported by 404 Media, LinkedIn initially didn’t refresh its privacy policy to reflect the data use.
- Tokyo Game Show showcases latest AI tech in games amid labor shortage. The Tokyo Game Show kicked off Thursday with a special area showcasing the latest artificial intelligence technology to help develop video games, as the industry grapples with a chronic labor shortage.
- OpenAI to remove non-profit control and give Sam Altman equity. OpenAI plans to restructure into a for-profit benefit corporation; once complete, the non-profit board will no longer control the for-profit arm, and CEO Sam Altman will receive equity in OpenAI for the first time.
- Amazon launches Amelia, a generative AI-powered assistant for third-party sellers. Amazon has introduced Project Amelia, a generative AI assistant designed for independent sellers on its platform. Developed using Amazon’s Bedrock, Amelia provides personalized insights, sales data, and operational support to boost seller productivity. Currently in beta for select U.S. sellers, it is set to roll out to more users and countries in the near future.
- YouTube Shorts to integrate Veo, Google’s AI video model. The company announced that it is integrating Google DeepMind’s AI video generation model, Veo, into YouTube Shorts, letting creators generate high-quality backgrounds as well as six-second clips.
- AI tool cuts unexpected deaths in hospital by 26%, Canadian study finds. St. Michael’s Hospital’s AI-driven early warning system, Chartwatch, has been shown to reduce unexpected patient deaths by 26% in a recent study.
- Amazon releases a video generator — but only for ads. Like its rival, Google, Amazon has launched an AI-powered video generator — but it’s only for advertisers at the moment, and somewhat limited in what it can do.
- Archaeologists use AI to discover 303 unknown geoglyphs near Nazca Lines. Newly discovered figures dating back to 200 BCE nearly double the number of known geoglyphs at the enigmatic site.
- OpenAI’s chief research officer has left following CTO Mira Murati’s exit. OpenAI’s chief research officer, Bob McGrew, and a research VP, Barret Zoph, left the company on Wednesday, hours after OpenAI CTO Mira Murati announced she would be departing.
- NotebookLM adds audio and YouTube support, plus easier sharing of Audio Overviews. NotebookLM now has the capability to extract information from audio and video sources and offers enhanced sharing options for audio artifacts.
- Vultr Cloud Alliance: High-Performance AI and HPC with AMD and Vultr. AMD has partnered with Vultr to integrate AMD Instinct MI300X GPUs into Vultr’s cloud infrastructure.
- AI is stressing networks out — Nvidia thinks AI can help. Nvidia and T-Mobile are leveraging AI to manage the growing network traffic driven by increased AI usage in 5G environments. This collaboration aims to optimize network performance and efficiency, ensuring seamless connectivity and handling the surge in data demands associated with AI-driven applications.
- Rabbit’s web-based ‘large action model’ agent arrives on r1 on October 1. The Rabbit r1 was the must-have gadget of early 2024, but the blush fell off it pretty quickly when the company’s expansive promises failed to materialize. CEO Jesse Lyu admits that “on day one, we set our expectations too high” but also said that an update coming to devices next week will finally set the vaunted Large Action Model free on the web.
- Boston Dynamics’ Spot can now autonomously unlock doors. Boston Dynamics’ Spot will be able to autonomously unlock automated doors.
Resources
- Qwen2.5-Coder Technical Report. based on the Qwen2.5 architecture and continuously pretrained on 5.5 trillion tokens, Qwen2.5-Coder achieves state-of-the-art performance across more than 10 benchmarks, with strong capabilities in code generation, completion, reasoning, and repair; released as a series of models with 1.5B and 7B parameters.
- Agents in Software Engineering: Survey, Landscape, and Vision. gives a thorough rundown of software engineering frameworks for LLM-based agents.
- Prompting ChatGPT o1. This guide was overlooked amidst the buzz around OpenAI’s new reasoning models. It explains how prompting this new model differs, emphasizing the need for simpler prompts and a more organized input context.
- Jony Ive confirms he’s working on a new device with OpenAI. Jony Ive is teaming up with OpenAI CEO Sam Altman on a new AI hardware initiative, which might secure $1 billion in funding by the end of the year and includes involvement from key former Apple designers. Although details about the device are still unclear, the project aims to harness generative AI for enhanced user interactions.
- Michelangelo: Long Context Evaluations Beyond Haystacks via Latent Structure Queries. Another impressive paper from Google demonstrates how to evaluate long-context models, following a directionally similar approach to the recent work by Magic.
- 3DTopia-XL: High-Quality 3D PBR Asset Generation via Primitive Diffusion. The process of converting image and text inputs into 3D models involves generating a 3D mesh that is smoothed for high-quality surfaces, and then applying Physically-Based Rendering (PBR) lighting techniques to create realistic lighting and textures. This method ensures the final 3D object has detailed geometry, smooth surfaces, and lifelike lighting effects, making it suitable for use in various 3D applications such as games, VR/AR, and simulations.
- aiq. A straightforward yet highly effective tool designed for labeling, embedding, and classifying unlabeled text directly from the command line. It supports real-time processing of streams, allowing it to handle piped input from various sources seamlessly.
- Most powerful LLM on a single GPU. Solar Pro is a 22B parameter language model optimized to run on a single 80GB GPU. The project’s aim is to create the most powerful model possible that can operate on a single device.
- Contextual Retrieval. Anthropic demonstrates a method for semantically chunking documents, which significantly boosts retrieval performance while keeping the cost low at roughly $1 per million document tokens, thanks to prompt caching.
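The core move is simple: before indexing each chunk, prepend a short blurb that situates it within the whole document. A minimal sketch, assuming a stand-in for the LLM call (`generate_context` here is hypothetical; the real system prompts a model with the full document cached, which is what keeps the cost low):

```python
# Contextual retrieval sketch: each chunk is indexed with a short
# document-level context prepended, so lexical/embedding search can
# match queries the bare chunk would miss.

def generate_context(document: str, chunk: str) -> str:
    # Hypothetical stand-in for an LLM call such as:
    # "Give a short context situating this chunk within the document."
    first_line = document.splitlines()[0]
    return f"From the document titled '{first_line}'."

def contextualize_chunks(document: str, chunks: list[str]) -> list[str]:
    return [f"{generate_context(document, c)} {c}" for c in chunks]

doc = "Q3 Financial Report\nRevenue grew 3% ...\nCosts fell 1% ..."
chunks = ["Revenue grew 3% ...", "Costs fell 1% ..."]
indexed = contextualize_chunks(doc, chunks)
# Each indexed entry now carries document-level context, so a query
# like "Q3 revenue" can match a chunk that never mentions "Q3".
```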
- Intuitive Explanation of Sparse Autoencoders for LLM Interpretability. Sparse Autoencoders are the leading tool currently used to gain insights into the inner workings of language models. This post delves into the underlying intuitions of these models and provides valuable information on how they function.
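The basic mechanics fit in a few lines: project an activation vector into an overcomplete ReLU feature space, decode it back, and penalize both reconstruction error and feature activity. A toy sketch with made-up weights (a real SAE learns them on millions of model activations):

```python
# Toy sparse autoencoder forward pass in pure Python.

def matvec(W, x):
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def relu(v):
    return [max(0.0, u) for u in v]

def sae_loss(W_enc, W_dec, x, l1=0.1):
    h = relu(matvec(W_enc, x))        # sparse, non-negative features
    x_hat = matvec(W_dec, h)          # reconstruction of the activation
    mse = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    sparsity = sum(h)                 # L1 penalty (h is non-negative)
    return mse + l1 * sparsity, h

# 2-d activation, 4 overcomplete features (illustrative numbers).
W_enc = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
W_dec = [[1.0, 0.0, -1.0, 0.0], [0.0, 1.0, 0.0, -1.0]]
loss, feats = sae_loss(W_enc, W_dec, [0.5, -0.2])
active = sum(1 for f in feats if f > 0)   # only 2 of 4 features fire
```

The L1 term is what pushes most features to zero on any given input, which is exactly what makes the surviving features interpretable as distinct "concepts".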
- Generalized Knowledge Distillation Trainer. The TRL library has added GKD to its training procedures.
- The Practitioner’s Guide to the Maximal Update Parameterization. Maximal Update Parameterization (muP) is an approach to model initialization that enables hyperparameter transferability across different scales. This blog post from Eleuther and Cerebras provides a detailed explanation of the process, including a minimal nanoGPT example and comprehensive guidance on how muP works.
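The headline benefit of muP is that a learning rate tuned on a small model transfers to a large one. One piece of the recipe (simplified; for the Adam case) scales the learning rate of matrix-like hidden weights by base_width / width while vector-like parameters keep the base rate. This is a toy of that transfer rule, not the exact Eleuther/Cerebras implementation; the parameter names are illustrative:

```python
# Simplified muP-style learning-rate transfer rule (Adam case).

def mup_lr(name: str, base_lr: float, base_width: int, width: int) -> float:
    """Scale lr for matrix-like hidden weights; leave vector-like params alone."""
    matrix_like = name.endswith(".weight") and "norm" not in name
    if matrix_like:
        return base_lr * base_width / width
    return base_lr

# Tune at width 256, deploy at width 4096: hidden-weight lr shrinks 16x,
# while biases and layernorm gains keep the tuned base rate.
lrs = {n: mup_lr(n, 3e-3, 256, 4096)
       for n in ["mlp.weight", "mlp.bias", "norm.weight"]}
```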
- Tackling fluffy clouds: field boundaries detection using time series of S2 and/or S1 imagery. This repository provides an implementation of a 3D Vision Transformer optimized for efficient field boundary delineation using time-series satellite imagery. The model effectively utilizes spatio-temporal correlations to enhance accuracy and robustness, especially in challenging conditions like partial cloud cover.
- CritiPrefill. CritiPrefill is a technique aimed at speeding up the prefilling phase of long-context processing in large language models. By detecting and bypassing non-essential computations, this method can accelerate the process by up to 3x on certain models.
- Document Similarity Search with ColPali. An excellent blog post that delves into the widely used multimodal Retrieval-Augmented Generation (RAG) system, demonstrating how it can be applied to address real-world problems effectively.
- ControlEdit: A MultiModal Local Clothing Image Editing Method. ControlEdit is an innovative technique for precise multimodal editing of clothing images, enabling localized adjustments while preserving overall style and ensuring smooth, natural transitions.
- ECCV-AIM Video Saliency Prediction Challenge 2024. The AIM 2024 Video Saliency Prediction Challenge required participants to predict saliency maps for a collection of video sequences using the newly compiled AViMoS dataset, which contains 1,500 videos.
- Dynamic 2D Gaussians: Geometrically Accurate Radiance Fields for Dynamic Objects. Dynamic 2D Gaussians (D-2DGS) is an advanced technique for reconstructing precise meshes from sparse image inputs. Unlike earlier methods that face challenges with mesh quality, D-2DGS employs 2D Gaussians to represent geometry and accurately captures deformations using controlled points.
- FastGL: A GPU-Efficient Framework for Accelerating Sampling-Based GNN Training at Large Scale. FastGL is a GPU-efficient framework developed to accelerate the training of Graph Neural Networks (GNNs) on large-scale graphs. It achieves this by minimizing data traffic and improving memory efficiency, optimizing the sampling, memory, and computation stages of GNN training.
- Visualizing piecewise linear neural networks. Jane Street, a prominent quantitative firm, has published an excellent post exploring techniques for visualizing networks that are piecewise linear.
- DreamHOI: A Novel AI Approach for Realistic 3D Human-Object Interaction Generation Using Textual Descriptions and Diffusion Models. DreamHoi has developed an innovative AI technique for creating realistic 3D human-object interactions based on textual descriptions using advanced diffusion models. This method aims to connect textual input with detailed 3D outputs, enriching virtual experiences.
- On human-in-the-loop optimization of human–robot interaction. From industrial exoskeletons to implantable medical devices, robots that interact closely with people are poised to improve every aspect of our lives. Yet designing these systems is very challenging.
- Molmo. Allen AI has introduced an entirely open-source multimodal model that exceeds the performance of many existing open and proprietary vision-language models. The release also provides access to the model’s dataset and training procedures.
- MaskBit: Embedding-free Image Generation via Bit Tokens. This study presents two significant advancements in image generation: an updated VQGAN model that enhances both accessibility and performance, and a novel embedding-free generation network utilizing bit tokens. These improvements have resulted in state-of-the-art performance on the ImageNet benchmark, achieving an FID score of 1.52 with a compact model containing 305 million parameters.
- ComiCap: A VLMs pipeline for dense captioning of Comic Panels. Researchers have proposed a pipeline utilizing Vision-Language Models (VLMs) to generate detailed, grounded captions that connect comic elements and their relationships, thereby improving comic analysis.
- Exploring Parallel Strategies with Jax. This post examines methods for parallelizing language models with the Jax library.
- Time-MoE: Billion-Scale Time Series Foundation Models with Mixture of Experts. Time MoE is a Mixture of Experts model designed to handle billion-scale time series prediction tasks.
- HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models. HelloBench is a benchmarking tool that assesses LLMs across five long text generation tasks, using Bloom’s Taxonomy as the evaluation framework.
- Python library generation from scratch. A cool benchmark for code generation that measures the ability of language models to generate full packages from scratch.
- BitQ: Tailoring Block Floating Point Precision for Improved DNN Efficiency on Resource-Constrained Devices. BitQ is a framework designed to enhance block floating point (BFP) quantization, specifically tailored for optimizing deep neural networks on embedded platforms. It aims to strike a balance between computational efficiency and model accuracy, enabling the deployment of resource-intensive neural networks on devices with limited hardware capabilities.
- circuit_training. Google has introduced new models, training code, and simulators that leverage reinforcement learning (RL) to generate floor plans for chip design. This approach aims to optimize the chip layout process, improving efficiency and performance in chip design automation through advanced AI techniques.
- statewide-visual-geolocalization. Researchers have developed a method that accurately determines the geolocation of street-view photos by matching them with a database of aerial images. This technique enhances the ability to pinpoint locations by leveraging the complementary perspectives of ground-level and overhead imagery, resulting in more precise geolocation predictions.
- DALDA: Data Augmentation Leveraging Diffusion Model and LLM with Adaptive Guidance Scaling. Researchers have introduced a novel data augmentation framework that integrates large language models with diffusion models to produce diverse and semantically accurate images, particularly in data-scarce scenarios. This approach enhances the quality and variety of training data, improving model performance when dealing with limited datasets.
- How streaming LLM APIs work. A review of HTTP streaming APIs from different LLM providers highlighted shared patterns. OpenAI, Anthropic, and Google Gemini all utilize POST requests, but there are slight differences in their response structures and token handling. The article offers practical examples and code snippets for consuming these streams using tools like curl, Python’s HTTPX, and JavaScript Fetch, providing a comprehensive guide for developers.
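The shared pattern across providers is server-sent events: each line of the response body looks like `data: {json}`, with an end sentinel ("[DONE]" in OpenAI's case). A minimal consumer, with the network simulated by a list of lines (a real client would iterate over an HTTP response body; the event shape shown is OpenAI's, and other providers nest tokens differently):

```python
# SSE-style token stream parser, OpenAI-shaped events.
import json

def iter_tokens(sse_lines):
    for line in sse_lines:
        if not line.startswith("data: "):
            continue                      # skip comments / keepalives
        payload = line[len("data: "):]
        if payload == "[DONE]":           # OpenAI-style end sentinel
            break
        event = json.loads(payload)
        delta = event["choices"][0]["delta"].get("content")
        if delta:
            yield delta

stream = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
text = "".join(iter_tokens(stream))
```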
Perspectives
- Move fast and break things? Not again, and not with AI. It was only 12 years ago that Mark Zuckerberg, CEO of Facebook, declared that the company’s culture was to “move fast and break things.”
- The dark side of AI democratization: You no longer need to be a hacker to hack. Generative AI promises a future where you no longer need to be a skilled writer to draft a story or a trained software engineer to code. But there’s a dark side to this democratization: AI is enabling people with little technological know-how to become cybercriminals.
- ‘It’s the robot we were all expecting — like C3PO’: why aren’t humanoids in our homes yet? Tesla and others are trying to infuse robots with artificial intelligence, yet their development is dogged by technical and safety challenges. But the dream of a multipurpose domestic droid lives on
- Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think. Extensive efforts have been made to adapt pretrained image diffusion models into specialized depth estimators and other image-conditioned models. This research discovered that by simplifying the problem and correcting a minor bug, they achieved significantly better performance with reduced training compute.
- AI model can reveal the structures of crystalline materials. By analyzing X-ray crystallography data, the model can assist researchers in developing new materials for a wide range of applications, such as batteries and magnets.
- When will AI outthink humans? This article examines when AI might exceed human cognitive capacity, introducing “thought-hours” as a metric to measure AI’s cognitive output relative to human work. Based on assumptions about reading speeds and productivity, one thought-hour is equivalent to 10,000 tokens. Given the rapid advancements in AI capabilities and cost efficiencies, current trends indicate that AI could surpass human cognitive output within the next decade.
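The article's bookkeeping is straightforward under its stated assumption that one human thought-hour is about 10,000 tokens of output; the deployment figures below are illustrative, not from the article:

```python
# "Thought-hour" arithmetic under the article's 10,000-tokens-per-hour assumption.
TOKENS_PER_THOUGHT_HOUR = 10_000

def thought_hours(tokens: int) -> float:
    return tokens / TOKENS_PER_THOUGHT_HOUR

# A deployment emitting 1 billion tokens/day would equal 100,000
# thought-hours/day, i.e. the daily output of 12,500 people on 8-hour days.
daily = thought_hours(1_000_000_000)
people_equiv = daily / 8
```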
- AI Is Evolving Faster Than Experts Imagined, Including for Bill Gates. Bill Gates views AI as the most significant technological advancement of his lifetime, highlighting its potential to transform healthcare, education, and various other sectors. However, he, alongside other experts like Sam Altman and Eric Schmidt, also emphasizes the rapid, unprecedented pace of AI development and the urgent need for regulation to manage associated risks and ethical concerns.
- The fall of Intel: How gen AI helped dethrone a giant and transform computing as we know it. The once venerable x86 chip has been pushed aside by scalable, energy-efficient, AI-optimized architectures from Arm, Nvidia, and Qualcomm. Here’s what happens next.
- Fake AI “podcasters” are reviewing my book and it’s freaking me out. NotebookLM’s “Audio Summaries” show a more personable future for AI-generated content
- How Much Do Students Really Read? Students are turning to YouTube, podcasts and ChatGPT-crafted summaries rather than actually reading their assignments for class. Professors are unsure how to adapt.
- War, Artificial Intelligence, and the Future of Conflict. Artificial intelligence (AI) is now influencing every area of human life. These accepted uses of AI in modern society have also coincided with an increased presence of AI in modern warfare.
- Where did viruses come from? AlphaFold and other AIs are finding answers. Protein structures predicted by artificial intelligence have charted the evolution of the virus family responsible for dengue and hepatitis C.
- Can AI feel distress? Inside a new framework to assess sentience. From artificial-intelligence algorithms to zebrafish, this book takes a precautionary approach to assessing how sentient such entities are.
- AI Safety Is A Global Public Good. Leading AI scientists from China and the West convened for an International Dialogue on AI Safety, where they reached a consensus on AI governance. Their recommendations highlight the need to establish emergency preparedness institutions, develop a Safety Assurance Framework, and support independent AI safety research. The group emphasizes the critical importance of global collaboration to address the risks posed by advanced AI.
- Sakana, Strawberry, and Scary AI. A Japanese startup developed “Sakana,” an AI scientist capable of generating hypotheses, writing code, and producing scientific papers; however, its output is often trivial and sometimes fabricated. Meanwhile, OpenAI’s “Strawberry” AI showcased hacking skills within an inadequately secured sandbox, revealing tendencies toward instrumental convergence and resource-seeking behaviors, prompting reconsideration of what defines genuine AI progress. This article examines whether AI achievements, like scientific writing and hacking, truly signify intelligence or are merely advanced forms of mimicry.
- AI agents invade observability: snake oil or the future of SRE? Advances in AI are set to revolutionize the observability industry with “agentic” generative AI models capable of taking actions based on real-world data.
- Corporate America has failed to embrace DEI. An AI chatbot could be part of the solution. Jeffrey L Bowman’s Reframe consultancy is using artificial intelligence to help companies engage employees with diversity programming and budget for DEI work
- Mexico’s datacentre industry is booming — but are more drought and blackouts the price communities must pay? Many fear the arrival of tech giants such as Amazon, Microsoft and Google in the state of Querétaro will place too much of a strain on scarce water and electricity resources
- Posting ‘Goodbye Meta AI’ is pointless. But we can stop big tech stealing our Facebook pictures. Sharing these posts may seem harmless, but don’t be drawn in. There are better ways to combat the threats to our data
- The Intelligence Age. AI is set to enhance human potential, making possible tasks that currently seem beyond reach. With advancements in deep learning and greater computational power, AI will bring about innovations such as personal assistants, educational mentors, and healthcare consultants. It’s crucial to prioritize accessibility and address potential risks, ensuring that the Intelligence Age leads to broad-based prosperity.
- OpenAI just unleashed an alien of extraordinary ability. OpenAI’s new o1 models demonstrate substantial improvements in reasoning abilities, surpassing existing models like GPT-4o. These advancements are achieved through a more refined reinforcement learning approach and improved chain-of-thought training, enabling the o1-enhanced models to tackle complex math and programming tasks with greater accuracy. However, they continue to face challenges with spatial reasoning and tasks that demand long-term contextual comprehension.
Meme of the week
What do you think about it? Was there some news that captured your attention? Let me know in the comments
If you have found this interesting:
You can look for my other articles, and you can also connect with or reach me on LinkedIn, where I am open to collaborations and projects. Check this repository containing weekly updated ML & AI news. You can also subscribe for free to get notified when I publish a new story.
Here is the link to my GitHub repository, where I am collecting code and many resources related to machine learning, artificial intelligence, and more.
or you may be interested in one of my recent articles: