WEEKLY AI NEWS: RESEARCH, NEWS, RESOURCES, AND PERSPECTIVES
AI & ML news: Week 21–27 October
Amazon Introduces AI-Generated Audio Ads, Claude AI Introduces New Capabilities, and much more
The most interesting news, repository, articles, and resources of the week
Check and star this repository where the news will be collected and indexed:
You will find the news first in GitHub. All the Weekly News stories are also collected here:
Research
- Thinking LLMs: General Instruction Following with Thought Generation. The proposed training method aims to enhance LLMs with thinking capabilities for general instruction-following without relying on human-annotated data. It employs an iterative search and optimization process to facilitate thought generation, allowing the model to learn without direct supervision. For each user instruction, potential thoughts are evaluated using a judge model, which scores only the responses to identify the best and worst options. The resulting full outputs are then used as selected and rejected pairs for DPO (termed Thought Preference Optimization in this paper). This approach demonstrates superior performance on AlpacaEval and Arena-Hard.
- Model Swarms: Collaborative Search to Adapt LLM Experts via Swarm Intelligence. A new collaborative search algorithm is proposed to adapt LLMs using swarm intelligence, where a group of LLM experts collaboratively navigates the weight space to optimize a utility function that reflects various adaptation objectives. Experiments show that Model Swarms can effectively adjust LLM experts for a single task, multi-task domains, reward models, and a range of human interests. This approach outperforms 12 model composition baselines by up to 21.0% across different tasks and contexts.
- First-Person Fairness in Chatbots. This study explores first-person fairness, focusing on the fairness of interactions between users and ChatGPT, particularly examining any biases related to users’ names. It utilizes a model powered by GPT-4o to analyze patterns and name sensitivity in the chatbot’s responses based on different user names. The findings suggest that post-training significantly reduces harmful stereotypes overall. However, in areas such as entertainment and art, especially with open-ended tasks, the study reveals a higher level of bias, indicating a tendency to create narratives featuring protagonists whose gender aligns with the gender inferred from the user’s name.
- Looking Inward: Language Models Can Learn About Themselves by Introspection. The report indicates that LLMs can gain knowledge through introspection that is not directly derivable from their training data. It suggests that these models possess privileged information about themselves, which could contribute to creating more interpretable and controllable systems. However, it also notes that this introspective ability has limitations, as models often struggle to predict their own behavior on tasks that require reasoning over extended outputs.
- Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation. This proposal introduces a unified autoregressive framework for multimodal understanding and generation, which decouples visual encoding into independent pathways. Utilizing a single transformer architecture enhances flexibility and performance in both visual understanding and generation tasks. The framework claims to mitigate the trade-offs typically associated with vision tasks found in methods relying on a single visual encoder. As a result, it outperforms previous unified models and matches or exceeds the performance of task-specific models.
- Inference Scaling for Long-Context Retrieval Augmented Generation. This study employs two strategies to explore scaling laws for Retrieval-Augmented Generation (RAG): in-context learning (DRAG) and iterative prompting (IterDRAG). It finds that RAG performance improves steadily as the effective context length increases, provided configurations are optimized. Additionally, under optimal conditions, increasing inference computation yields nearly linear improvements in long-context RAG performance. This insight leads to a computation allocation model designed to offer practical guidance for optimal computation distribution in long-context RAG settings.
- Agent S: An Open Agentic Framework that Uses Computers Like a Human. A novel open agentic framework has been developed to facilitate autonomous interactions with computers via a graphical user interface (GUI). Named Agent S, this framework addresses challenges such as knowledge acquisition, long-horizon planning, and managing dynamic interfaces. It introduces experience-augmented hierarchical planning that combines search and retrieval methods. Additionally, it utilizes an agent-computer interface to enable reasoning and control over GUI agents. Evaluation on the OSWorld benchmark demonstrates that Agent S surpasses the baseline by 9.37% in success rate, representing an 83.6% relative improvement, and sets a new state-of-the-art performance.
- Exploring Model Kinship for Merging Large Language Models. The study introduces the concept of model kinship to assess the similarity between LLMs. This measure is utilized to develop a model merging strategy called Top-k Greedy Merging with Model Kinship, which enhances performance. The authors discover this new criterion allows for effective and continuous model merging.
- On The Planning Abilities of OpenAI’s o1 Models: Feasibility, Optimality, and Generalizability. The report highlights that the o1-preview model excels in self-evaluation and constraint-following. However, it also points out that these o1 models exhibit bottlenecks in decision-making and memory management, particularly in the context of spatial reasoning. Specifically, the models tend to generate redundant actions and face challenges in generalizing across spatially complex tasks.
- Sabotage evaluations for frontier models. Anthropic has conducted several innovative evaluations to identify vulnerabilities and assess misalignment in large, powerful models.
- Mini-Omni2: Towards Open-source GPT-4o with Vision, Speech and Duplex Capabilities. A powerful open-source initiative aimed at replicating GPT-4o’s speech capabilities has emerged. This model was trained by aligning multiple modalities using pre-trained audio and speech encoders, allowing it to achieve advanced speech recognition and generation functionalities.
- Automatically Interpreting Millions of Features in Large Language Models. Interpreting SAE features on a large scale can be difficult. To address this, Eleuther has introduced a set of automatic interpreter features designed to help understand the meaning of elements within their context.
- Mitigating Object Hallucination via Concentric Causal Attention. Object hallucination in vision-language models has been associated with Rotary Position Encoding (RoPE), which faces challenges in managing long-term dependencies between visual and textual inputs. To overcome this, the authors introduce Concentric Causal Attention (CCA), a novel positional alignment method that enhances the interaction between visual elements and instruction tokens.
- Simplifying, stabilizing, and scaling continuous-time consistency models. OpenAI has published work focusing on enhancing consistency models, which operate in two steps rather than the 1,000 steps typically used in diffusion models. While these models still depend on distillation from an existing diffusion model, the research seeks to improve their performance and stability as they scale.
- All you need are 32 tokens to represent video. Salesforce’s new approach introduces a novel video encoder that significantly reduces the number of tokens needed for accurate representation. While similar attempts in the past have seen limited success, the breakthrough appears to come from combining an explicit temporal encoder with a spatial encoder, enabling more efficient video processing.
- CoPS: Empowering LLM Agents with Provable Cross-Task Experience Sharing. CoPS is a novel algorithm that improves agents’ sequential reasoning by allowing them to share experiences across various tasks, enhancing their overall learning and adaptability.
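To make the Thought Preference Optimization item above more concrete, here is a minimal sketch of how one preference pair could be assembled for a single instruction. It only illustrates the idea described in the summary (the judge scores the response alone, while the full output including the thought goes into the pair); it is not the authors' code. The `Sample` class, `build_preference_pair`, and the word-counting `toy_judge` are hypothetical names introduced for illustration, and a real setup would call a judge model instead.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Sample:
    """One sampled output for an instruction (hypothetical container)."""
    thought: str   # internal reasoning generated before the answer
    response: str  # the answer shown to the user

    def full_output(self) -> str:
        # The full output (thought plus response) is what enters the pair.
        return f"{self.thought}\n{self.response}"


def build_preference_pair(
    samples: List[Sample],
    judge: Callable[[str], float],
) -> Tuple[str, str]:
    """Return (chosen, rejected) full outputs for one instruction.

    The judge only ever sees the response part, never the thought.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples to form a pair")
    best = max(samples, key=lambda s: judge(s.response))
    worst = min(samples, key=lambda s: judge(s.response))
    return best.full_output(), worst.full_output()


def toy_judge(response: str) -> float:
    # Placeholder scoring rule (assumption): count the words in the response.
    return float(len(response.split()))


samples = [
    Sample("Recall the capital.", "Paris."),
    Sample("List facts, then answer.", "The capital of France is Paris."),
    Sample("Guess quickly.", "It is Paris, France."),
]
chosen, rejected = build_preference_pair(samples, toy_judge)
print("chosen:", chosen)
print("rejected:", rejected)
```

The resulting `chosen` and `rejected` strings are the kind of pair a DPO-style trainer would consume, repeated over many instructions and several training iterations.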
News
- US investigates 2.4m Tesla self-driving vehicles after reported collisions. Road safety agency opens evaluation over reported collisions in low visibility
- Anthropic just made it harder for AI to go rogue with its updated safety policy. Anthropic has revised its Responsible Scaling Policy to incorporate Capability Thresholds for AI models that present substantial risks, including bioweapons and autonomous AI research. This policy is designed to establish industry standards by introducing AI Safety Levels, which mandate stricter safeguards according to the model’s capabilities. By transparently sharing safety practices and appointing a Responsible Scaling Officer, Anthropic aims to take a leadership role in AI governance and encourage similar initiatives across the industry.
- Sam Altman’s Worldcoin becomes World and shows new iris-scanning Orb to prove your humanity. The World project, co-founded by Sam Altman, seeks to authenticate human identity online through iris-scanning technology, addressing privacy issues and ongoing investigations in the EU. The initiative plans to integrate human verification into AI platforms and may redistribute the wealth generated by AI through Worldcoins. Recent updates include the launch of a new blockchain, an app, and tools such as Deep Face to help combat deepfakes.
- Google — Gemini Long Context. The Gemini team has set aside $100,000 for the most effective applications of their long context model capabilities.
- Unleashing System 2 Thinking? AlphaCodium Outperforms Direct Prompting of OpenAI o1. OpenAI’s o1 model, demonstrating System 1.5 thinking, exhibits improved reasoning abilities compared to earlier LLMs but still lacks the comprehensive problem-solving capabilities of full System 2 thinking. AlphaCodium enhances o1’s coding performance by offering a structured framework that supports reasoning and iterative refinement, resulting in greater accuracy on Codeforces benchmarks. Although the combination of o1 and AlphaCodium shows potential for advancing AI toward more profound reasoning, significant effort is still needed to incorporate complete System 2 thinking in AI models.
- Amazon’s AI Generator Tool Can Now Create Audio Ads. Soon, you’ll hear more audio ads on Amazon’s properties that were created with generative AI.
- Google Shopping is getting a ‘for you’ feed of products. Google Shopping is rolling out a personalized feed that shows you a stream of products you might like. The new feature, which is coming to mobile and desktop devices, shows up when you head to shopping.google.com.
- TikTok owner sacks intern for allegedly sabotaging AI project. ByteDance dismissed person in August it says ‘maliciously interfered’ with training of artificial intelligence models
- AlphaFold reveals how sperm and egg hook up in intimate detail. Three sperm proteins work together as matchmakers to enable fertilization in vertebrates.
- xAI, Elon Musk’s AI startup, launches an API. In August, Elon Musk’s xAI promised to make Grok, the company’s flagship generative AI model powering a number of features on X, available via an API. Now, that API has arrived — albeit a bit bare-bones at the moment.
- Jane Street Real-Time Market Data Forecasting. This competition, hosted by Jane Street, challenges participants to build models using real-world data from production systems. The goal is to provide insights into the complexities of financial markets, requiring participants to apply their skills in data analysis and modeling to navigate the dynamic nature of market behavior.
- OCP Summit 2024: The open future of networking hardware for AI. At OCP 2024, Meta unveiled a next-generation disaggregated network fabric and new network hardware specifically designed for AI clusters. The company introduced the Disaggregated Scheduled Fabric (DSF), aimed at improving scalability and performance in AI training systems. Both the newly developed and existing hardware are optimized for high throughput and efficiency, providing open, vendor-agnostic solutions to support advanced AI applications.
- Serve confirms delivery by robot expansion plans with Gen3 rollout. Serve Robotics’ third-generation delivery robot is equipped with NVIDIA’s Jetson Orin module, significantly boosting its AI processing capabilities. This upgrade allows the robot to make faster, real-time autonomous navigation decisions, improving its efficiency and performance in delivery tasks.
- Boston Dynamics teams with TRI to bring AI smarts to Atlas humanoid robot. Boston Dynamics and Toyota Research Institute are partnering to integrate advanced AI and large behavior models into the electric Atlas humanoid robot. This collaboration aims to enhance the robot’s capabilities, enabling more sophisticated and autonomous behaviors in tasks that require human-like movement and decision-making.
- Microsoft introduces ‘AI employees’ that can handle client queries. US company gives customers the ability to build own virtual agents as well as releasing 10 off-the-shelf bots
- Thom Yorke and Julianne Moore join thousands of creatives in AI warning. Statement comes as tech firms try to use creative professionals’ work to train AI models
- Claude AI tool can now carry out jobs such as filling forms and booking trips, says the creator. Anthropic says model is able to carry out computer tasks — as fears mount such technology will replace workers
- Introducing computer use, a new Claude 3.5 Sonnet, and Claude 3.5 Haiku. Anthropic has enhanced Claude 3.5 Sonnet’s capabilities and introduced Claude 3.5 Haiku, a more affordable model that delivers the same performance as the previous Claude 3 Opus. Furthermore, Claude 3.5 Sonnet has been trained with screen recordings, enabling it to operate computers and interact with user interfaces.
- ChatGPT has a Windows app now. The app, which is currently in testing, is only available to ChatGPT subscribers for now.
- Adobe’s new image rotation tool is one of the most impressive AI concepts we’ve seen. Adobe’s Project Turntable leverages AI to rotate 2D vector art in 3D, allowing the artwork to be viewed from various angles while preserving its 2D look and design integrity. This innovative technique ensures that the visual style remains consistent, even as the artwork is transformed in three-dimensional space.
- Perplexity lets you search your internal enterprise files and the web. Enterprises can use their Perplexity dashboards to search for internal information and combine it with knowledge from the internet, but this will only be limited to specific files they deem important.
- OpenAI, Microsoft reportedly hire banks to renegotiate partnership terms. OpenAI and Microsoft are in discussions regarding the terms of their partnership, with Microsoft aiming to acquire a substantial stake in OpenAI following its restructuring.
- Former OpenAI CTO Mira Murati is reportedly fundraising for a new AI startup. This startup will reportedly focus on building AI products based on proprietary models and could raise more than $100 million in this round.
- Midjourney plans to let anyone on the web edit images with AI. Midjourney is planning to release an upgraded web tool that’ll let users edit any uploaded images from the web using Midjourney’s generative AI.
- Intel wins lengthy EU legal battle over £880m competition fine. Chipmaker disputed 2009 decision that it abused its market position in case dating back two decades
- Cohere’s multilingual model’s dramatic improvement. The Aya project, a standout initiative in multilingual language model training, has made impressive strides since its launch earlier this year. Much of its performance improvement is attributed to effective post-training strategies. Additionally, Aya can handle audio input and create images, all from non-English sources.
- Introducing the analysis tool in Claude.ai. Claude can now write and execute code as part of artifacts.
- Gurman: Apple internally believes that it’s at least two years behind in AI development. According to the latest edition of Mark Gurman’s Power On newsletter, some employees at Apple believe that the company is around two years behind in artificial intelligence development.
- Perplexity is reportedly looking to fundraise at an $8B valuation. AI search engine Perplexity is in fundraising talks and hopes to raise around $500 million at an $8 billion valuation, according to The Wall Street Journal.
- Chinese humanoid robot is the ‘fastest in the world’ thanks to its trusty pair of sneakers. The STAR1 robot can reach a top speed of 8 mph with the added help of a pair of sneakers.
- From Rupert Murdoch to Thom Yorke: the growing backlash to AI. Media mogul and leading artists join the fight to stop tech firms using creative works for free as training data
- Talk to your plants? Now the first AI-powered garden will allow them to talk back. Collaboration between leading garden designer and Microsoft to go on display at Chelsea Flower Show 2025
Resources
- CoTracker3: Simpler and Better Point Tracking by Pseudo-Labelling Real Videos. This proposal introduces a new point-tracking model along with a semi-supervised training recipe that allows for the use of real videos without annotations during training. It generates pseudo-labels using readily available teacher models. This approach simplifies the architecture and training scheme, resulting in improved outcomes while utilizing 1000 times less data.
- Meta’s latest open source releases. Meta has introduced a significant array of valuable research tools, including a speech-to-speech model, enhancements to SAM, and numerous other intriguing developments.
- One-Step Diffusion via Shortcut Models. Shortcut models represent a new category of consistency models that can produce continuous signals with minimal inference steps.
- Zero-Shot 3D Visual Grounding. VLM-Grounder is a novel approach to 3D visual grounding that addresses the shortcomings of conventional methods by leveraging vision-language models (VLMs) and 2D images.
- DeepSeek’s natively Multimodal model. DeepSeek has developed and launched a powerful 1.3 billion parameter model capable of processing interleaved text and images for both generation and comprehension.
- Meta Lingua. Meta has developed an easy-to-use and research-friendly codebase that can replicate Llama 2 7B within 24 hours.
- Embedding an Ethical Mind: Aligning Text-to-Image Synthesis via Lightweight Value Optimization. LiVO (Lightweight Value Optimization) is an innovative approach designed to align Text-to-Image models with human values.
- Easily hackable vision language model. A simple and performant VLM implementation in pure PyTorch
- Anthropic Quickstarts. Anthropic Quickstarts provides developers with projects like a customer support agent and a financial data analyst to help them swiftly utilize the Anthropic API. These projects leverage Claude for natural language processing and incorporate interactive data visualization. Each quickstart comes with setup instructions and encourages contributions from the community.
- BiGR: Harnessing Binary Latent Codes for Image Generation and Improved Visual Representation Capabilities. BiGR is an innovative image generation model that leverages compact binary latent codes to enhance both its generation and representation capabilities. It is the first model to integrate both generative and discriminative tasks within a unified framework. Key features of the model include binary tokenization and a distinctive entropy-ordered sampling technique, which contribute to its improved performance.
- LongPiBench. LongPiBench is a benchmark created to evaluate positional biases in large language models (LLMs) when handling long contexts. It focuses on identifying biases that stem from the spacing between multiple relevant pieces of information, providing a targeted way to assess how well models handle long-range dependencies in text.
- CLaMP 2: Multimodal Music Information Retrieval Across 101 Languages Using Large Language Models. CLaMP 2 is a contrastive model designed for aligning music and text. It uses contrastive learning techniques to match and relate musical elements with corresponding textual descriptions, enhancing the ability to process and generate music-related text in alignment with audio.
- bitnet.cpp. Microsoft has released an inference repository for its 1.58-bit models, which, when properly trained, are capable of running efficiently on consumer hardware. This development allows for more accessible deployment of advanced AI models without requiring high-end computational resources.
- Montessori-Instruct: Generate Influential Training Data Tailored for Student Learning. Montessori-Instruct is a novel framework designed to generate synthetic data that aligns with a student language model’s learning process. It adapts the data produced by the teacher model to fit the student’s learning preferences by leveraging local data influence and Direct Preference Optimization (DPO), optimizing the training experience for the student model.
- Stable Diffusion 3.5. Stability AI has launched a new series of models featuring enhanced performance and faster speeds. These models come with built-in Diffusers support, allowing for immediate training capabilities.
- 3D-GANTex: 3D Face Reconstruction with StyleGAN3-based Multi-View Images and 3DDFA based Mesh Generation. This paper presents a novel approach for estimating face texture and geometry from a single image by combining StyleGAN with 3D Morphable Models.
- Moonshine. Moonshine is a family of speech-to-text models optimized for fast and accurate automatic speech recognition (ASR) on resource-constrained devices. It is well-suited to real-time, on-device applications like live transcription and voice command recognition.
- PocketPal AI. PocketPal AI is a pocket-sized AI assistant powered by small language models (SLMs) that run directly on your phone. Designed for both iOS and Android, PocketPal AI lets you interact with various SLMs without the need for an internet connection.
- Introducing the prompt() Function: Use the Power of LLMs with SQL! The costs of operating LLMs have dropped considerably, making it feasible to incorporate smaller models like GPT-4o-mini into SQL functions. MotherDuck’s PROMPT() function simplifies tasks such as text generation, summarization, and structured data extraction using OpenAI models. It provides flexibility in balancing cost and performance, while also supporting bulk operations with improved concurrency for more efficient processing.
- Anthropic Computer Use Demo. A quick example of Claude 3.5 Sonnet’s new computer use capabilities.
- Introducing SynthID Text. SynthID is a method for statistically watermarking generated text. It employs a pseudorandom function after the top-k and top-p sampling steps to embed a mark within the text. A probabilistic Bayesian approach is then used to detect whether the text has been watermarked, indicating it was produced by a language model.
- Transformers.js v3: WebGPU Support, New Models & Tasks, and More…. Transformers.js is a JavaScript library designed to run machine learning models, and it now supports WebGPU, offering up to 100x faster performance than WASM in some cases. The latest version provides access to over 1,200 models, making it well-suited for edge and browser-based applications.
- Pangea: A Fully Open Multilingual Multimodal LLM for 39 Languages. We present Pangea-7B, an open multilingual multimodal language model (MLLM) developed to address multilingual and multicultural challenges in visual understanding tasks. Pangea-7B is trained on PangeaIns, a comprehensive dataset consisting of 6 million instructions across 39 languages.
- SAM2Long: Enhancing SAM 2 for Long Video Segmentation with a Training-Free Memory Tree. SAM2Long solves the “error accumulation” problem found in SAM 2’s memory design by implementing a training-free strategy for video object segmentation.
- Agent.exe. A convenient wrapper for Anthropic’s computer use system simplifies its usage and execution, making it more user-friendly and accessible.
- TALoS: Enhancing Semantic Scene Completion via Test-time Adaptation on the Line of Sight. TALoS is a method that enhances scene completion for autonomous vehicles by leveraging observations from different time points as supervision for making more accurate predictions.
- OmniParser for Pure Vision Based GUI Agent. Screenshot parsing tool for models to use digital interfaces.
- Introducing quantized Llama models with increased speed and a reduced memory footprint. Meta has optimized its 1B and 3B language models by applying quantization, achieving a 2–4x speed increase and reducing the model size by over 50% with minimal quality loss. This improvement is made possible by its quantization-aware training setup, allowing the models to adapt to lower precision effectively.
- Joint Point Cloud Upsampling and Cleaning with Octree-based CNNs. An effective and straightforward approach for upsampling and refining point clouds utilizes a modified octree-based 3D U-Net, known as OUNet.
- ExecuTorch. ExecuTorch supports on-device inference across mobile and edge devices, including wearables, embedded systems, and microcontrollers. It facilitates the efficient deployment of PyTorch models to edge environments and is compatible with various computing platforms, leveraging hardware capabilities like CPUs, NPUs, and DSPs. Comprehensive tutorials provide guidance on using ExecuTorch step-by-step.
- Federated Transformer (FeT). The Federated Transformer (FeT) is a novel framework aimed at enhancing both performance and privacy in Vertical Federated Learning (VFL) across multiple collaborating parties.
- ADEM-VL. ADEM-VL is an innovative vision-language model created to address hardware constraints found in current models.
- Predicting Weight Loss with Machine Learning. The author utilized a straightforward feedforward DNN model to monitor and forecast weight loss on a ketogenic diet. This model effectively captured the non-linear weight loss trends, fit a predictive function to the data, and visualized calorie metrics. For added insights, the Harris-Benedict Equation was applied to compare estimated calorie needs with actual weight loss.
- Video scraping: extracting JSON data from a 35 second screen capture for less than 1/10th of a cent. Google Gemini’s AI Studio can accurately extract numerical data from video screen recordings of emails. This process leverages the cost-effective Gemini 1.5 Flash model, resulting in minimal expense. This innovative “video scraping” technique provides a practical alternative to conventional data extraction methods.
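For the weight-loss item above, the Harris-Benedict comparison can be sketched in a few lines. This is an illustrative example, not the author's code: it uses the original 1919 coefficients (the article may use a revised version), and the sedentary activity factor of 1.2 and the 7,700 kcal per kilogram conversion are common rules of thumb assumed here, not values taken from the article.

```python
def harris_benedict_bmr(weight_kg: float, height_cm: float,
                        age_years: float, sex: str) -> float:
    """Basal metabolic rate in kcal/day (original 1919 Harris-Benedict form)."""
    if sex == "male":
        return 66.4730 + 13.7516 * weight_kg + 5.0033 * height_cm - 6.7550 * age_years
    if sex == "female":
        return 655.0955 + 9.5634 * weight_kg + 1.8496 * height_cm - 4.6756 * age_years
    raise ValueError("sex must be 'male' or 'female'")


def expected_weight_loss_kg(maintenance_kcal: float, intake_kcal: float,
                            days: int, kcal_per_kg: float = 7700.0) -> float:
    """Weight loss implied by a constant daily calorie deficit.

    kcal_per_kg is an assumed rule-of-thumb conversion, not a measured value.
    """
    total_deficit = (maintenance_kcal - intake_kcal) * days
    return total_deficit / kcal_per_kg


# Hypothetical person and diet, for illustration only.
bmr = harris_benedict_bmr(weight_kg=80, height_cm=180, age_years=30, sex="male")
maintenance = bmr * 1.2  # assumed sedentary activity factor
predicted = expected_weight_loss_kg(maintenance, intake_kcal=1800, days=28)
print(f"BMR: {bmr:.0f} kcal/day, maintenance: {maintenance:.0f} kcal/day")
print(f"Predicted loss over 28 days: {predicted:.2f} kg")
```

Comparing `predicted` with the weight change actually recorded over the same period gives the kind of check the article describes: a large gap suggests that the calorie estimates or the simple deficit model do not capture the observed trend.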
Perspectives
- Duolingo CEO Luis von Ahn wants you addicted to learning. Duolingo’s CEO, Luis von Ahn, talks about utilizing AI and gamification to improve language learning through features such as chat interactions with AI avatars and AI-generated video game-like adventures. The company has recently launched Duolingo Max, a premium subscription plan that provides AI-driven conversation practice, capitalizing on the lower costs and faster development associated with AI-generated content. Although AI has limitations in engagement, Duolingo prioritizes maintaining user motivation by balancing effective learning with gamified, entertaining experiences.
- State of AI Report 2024. The 2024 State of AI Report notes that foundational models are increasingly being integrated into practical applications, with OpenAI leading the way in significant revenue generation. Key developments include the alignment of performance among leading research labs, a growing emphasis on planning and reasoning in large language model (LLM) research, and extending foundational models into multimodal domains. Despite facing regulatory hurdles, AI companies have seen a surge in valuation, though questions about their long-term sustainability remain.
- How gen AI can help doctors and nurses ease their administrative workloads. Doctors and nurses spend nearly 28 hours a week on administrative tasks.
- Elon Musk’s global political goals. Over the weekend, Musk pledged to give away $1m a day to registered voters in battleground states in the US who sign his Pac’s petition in support of the First and Second Amendments. He awarded the first prize, a novelty check the size of a kitchen island, at a Pennsylvania rally on Saturday and the second on Sunday in Pittsburgh. He says he’ll keep doing it until the election on 5 November. Experts say that the stunt is potentially illegal.
- The Second $100B AI Company. This article forecasts that by 2034, emerging AI companies fueled by advancements in AI applications, particularly in consumer AI, will join OpenAI in exceeding a $100B market cap. While established tech giants currently dominate the AI infrastructure and model layers, the application layer offers significant potential for innovation and expansion, providing fertile ground for consumer AI to flourish. The prospects for large-scale success in consumer AI, especially in areas such as video creation, online shopping, and gaming, resemble the transformative impact seen in past tech revolutions like cloud computing and mobile technology.
- Use Prolog to improve LLM’s reasoning. Current methods such as Chain-of-Thought (CoT) reasoning and the integration of programming languages like Prolog can enhance the reasoning abilities of LLMs, helping to mitigate the limitations of autoregressive models. The paper “Reliable Reasoning Beyond Natural Language” introduces a neurosymbolic approach that employs Prolog to translate requests into symbolic logic, enhancing both explainability and problem-solving capabilities. ProSLM, the model developed in this research, has shown substantial improvements in various datasets, highlighting the potential of combining Prolog with LLMs for tackling complex reasoning tasks.
- AI watermarking must be watertight to be effective. Scientists are closing in on a tool that can reliably identify AI-generated text without affecting the user’s experience. But the technology’s robustness remains a challenge.
- AI scans RNA ‘dark matter’ and uncovers 70,000 new viruses. Many are bizarre and live in salt lakes, hydrothermal vents, and other extreme environments.
- Build an international AI ‘telescope’ to curb the power of big tech companies. Artificial intelligence (AI) technologies have reached a crucial juncture. The vast computing clusters required to train the most advanced generative AI systems are available only to a few large corporations.
- Was the Nobel prize for physics? Yes — not that it matters. The award of the 2024 Nobel Prize in Physics to John Hopfield and Geoffrey Hinton for their groundbreaking research on artificial neural networks has caused consternation in some quarters. Surely this is computer science, not physics?
- How I peer into the geometry behind computer vision. Minh Ha Quang’s work at a Japanese AI research center aims to understand how machines extract image data from the real world.
- AI Dreams: Microsoft @ 50, Chapter 1. Microsoft’s research on AI robustness led the company to invest billions in AI infrastructure, driving breakthroughs with partners such as OpenAI. This investment has played a key role in Microsoft’s rapid growth in AI-powered products, highlighted by the success of GitHub Copilot. Despite facing competition and balancing sustainability goals, Microsoft remains committed to AI, with record capital expenditures on its AI and cloud infrastructure.
- Future of Internet in the age of AI. In this article, Cloudflare CEO Matthew Prince explores AI’s influence on Internet infrastructure, emphasizing the need for AI-capable edge computing and local inference to minimize network latency. He underscores the significance of regionalization in AI services to address regulatory challenges and outlines Cloudflare’s strategy of developing a connectivity-focused network. Cloudflare’s goal is to enhance internet connectivity by making it faster, more secure, and more efficient, closely aligning its efforts with advancements in AI technologies.
- How Jacob Collier helped shape the new MusicFX DJ. Grammy-winning musician Jacob Collier has partnered with Google DeepMind and Google Labs to develop MusicFX DJ, an AI-driven music tool. The tool’s interface has been revamped to foster creativity, making it easy for users to tap into a “flow state” of artistic inspiration. MusicFX DJ is now available, featuring user-friendly controls suitable for all experience levels.
- The AI Investment Boom. The AI boom is spurring substantial US investments in data centers, computing infrastructure, and advanced hardware, with annual data center construction reaching an unprecedented $28.6 billion. This growth is driven by rising demand for high-powered computing resources essential for training and deploying sophisticated AI models. Although tech sector revenue is recovering, job growth is primarily centered on semiconductor manufacturing and infrastructure, shifting attention away from traditional programming roles.
Meme of the week
What do you think about it? Did any news capture your attention? Let me know in the comments.
If you have found this interesting:
You can look for my other articles, and you can also connect with or reach me on LinkedIn, where I am open to collaborations and projects. Check this repository containing weekly updated ML & AI news. You can also subscribe for free to get notified when I publish a new story.
Here is the link to my GitHub repository, where I am collecting code and many resources related to machine learning, artificial intelligence, and more.
or you may be interested in one of my recent articles: