WEEKLY AI NEWS: RESEARCH, NEWS, RESOURCES, AND PERSPECTIVES
ML news: Week 13–19 November
OpenAI’s CEO Sam Altman found another job (Microsoft), new models, and much more
The most interesting news, repositories, articles, and resources of the week
Check and star this repository, where the news will be collected and indexed:
You will find the news first on GitHub. Single posts are also collected here:
Research
- 3DStyle-Diffusion: Pursuing Fine-grained Text-driven 3D Stylization with 2D Diffusion Models. To provide more control over appearance and geometry, this research integrates 2D diffusion models into 3DStyle-Diffusion, a technique for fine-grained stylization of 3D meshes. It first employs implicit MLP networks to parameterize the texture of a 3D mesh into reflectance and illumination; a pre-trained 2D diffusion model is then used to maintain geometric consistency and match the generated images to the text prompt. official code.
- Cross-modal Prompts: Adapting Large Pre-trained Models for Audio-Visual Downstream Task. A Dual-Guided Spatial-Channel-Temporal (DG-SCT) attention mechanism that enhances pre-trained audio-visual models on multi-modal tasks.
- Generalized Biomolecular Modeling and Design with RoseTTAFold All-Atom. RoseTTAFold All-Atom (RFAA) is a deep network that addresses the limitations of current protein structure modeling tools by accurately representing complete biological assemblies, including covalent modifications and interactions with small molecules. RFAA matches AlphaFold2's accuracy in protein structure prediction, excels in flexible small-molecule docking, and predicts covalent modifications and assemblies involving nucleic acids and small molecules. The authors also present RFdiffusion All-Atom (RFdiffusionAA), a fine-tuned model for generating binding pockets around small and non-protein molecules, with experimental validation of proteins binding therapeutic, enzymatic, and optically active molecules.
- FinGPT: Large Generative Models for a Small Language. This study tackles the challenges of creating large language models (LLMs) for Finnish, a language spoken by less than 0.1% of the world population.
- Watermarking Vision-Language Pre-trained Models for Multi-modal Embedding as a Service. VLPMarker is a secure and robust backdoor-based embedding watermarking method for vision-language pre-trained models (VLPs). It injects triggers into VLPs without modifying model parameters, providing high-quality copyright verification with minimal impact on performance, and it improves resilience against various attacks through a collaborative verification strategy based on both backdoor triggers and the embedding distribution.
- Visualizing the Diversity of Representations Learned by Bayesian Neural Networks. A look at explainable-AI methods and their application to Bayesian neural networks.
- MonoDiffusion: Self-Supervised Monocular Depth Estimation Using Diffusion Model. This work presents MonoDiffusion, a novel framework for self-supervised monocular depth estimation that takes a fresh approach by treating depth estimation as an iterative denoising process. Instead of using ground-truth depth for training, it relies on a pseudo ground-truth diffusion process guided by a pre-trained teacher model. official code.
- Knowledgeable Preference Alignment for LLMs in Domain-specific Question Answering. The paper discusses the deployment challenges of large language models (LLMs) in real-world scenarios, particularly in domain-specific question answering (QA) with the integration of domain knowledge graphs. The authors introduce KnowPAT, a novel pipeline that employs style and knowledge preference sets, coupled with a new alignment objective, to improve LLMs for practical use in domain-specific QA, as evidenced by superior performance in experiments against 15 baseline methods. official code.
- DeepMind AI accurately forecasts weather — on a desktop computer. The machine-learning model takes less than a minute to predict future weather worldwide more precisely than other approaches. original article
- Role play with large language models. Casting dialogue-agent behavior in terms of role-play allows us to draw on familiar folk psychological terms, without ascribing human characteristics to language models that they in fact lack.
- Fine-tuning Language Models for Factuality. ChatGPT’s widespread adoption was made possible by a breakthrough in preference-based model optimization. Similar techniques can improve factual accuracy, here leading to a 50% reduction in medical recall errors.
- Ultra-Efficient On-Device Object Detection on AI-Integrated Smart Glasses with TinyissimoYOLO. This group trained an ultra-small YOLO computer vision model and developed new RISC-V hardware specifically for vision, allowing for real-time object identification at very low latency and low power consumption.
- SentAlign: Accurate and Scalable Sentence Alignment. An accurate sentence alignment tool designed to handle very large parallel document pairs, efficiently processing tens of thousands of sentences. official code.
- Large Language Models are Temporal and Causal Reasoners for Video Question Answering. LLMs make errors in video question answering when they focus too much on language priors and ignore the video content; this paper aims to solve that problem. official code.
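The preference-based optimization mentioned in the factuality item is commonly implemented with a DPO-style objective. A minimal sketch with made-up log-probabilities (the function and numbers are illustrative, not from the paper):

```python
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Direct Preference Optimization loss for one (chosen, rejected) pair.

    logp_*     : policy log-prob of the chosen (w) / rejected (l) response
    ref_logp_* : same quantities under the frozen reference model
    """
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # -log(sigmoid(margin)); small when the policy prefers the chosen answer
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Toy numbers: the policy assigns higher probability to the preferred answer
loss_good = dpo_loss(logp_w=-5.0, logp_l=-9.0, ref_logp_w=-6.0, ref_logp_l=-6.0)
loss_bad = dpo_loss(logp_w=-9.0, logp_l=-5.0, ref_logp_w=-6.0, ref_logp_l=-6.0)
assert loss_good < loss_bad  # preferring the chosen answer lowers the loss
```

The loss only depends on how much more the policy prefers the chosen answer relative to the reference model, which is what keeps the fine-tuned model from drifting too far.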
News
- Google in talks to invest ‘hundreds of millions’ into AI startup Character.AI. Character.AI’s chatbots, with various roles and tones to choose from, have appealed to users ages 18 to 24, who contributed about 60% of its website traffic.
- Introducing AI to FigJam. FigJam, Figma’s digital whiteboard application, now incorporates AI support to help streamline and improve design interactions.
- LLaVA-Plus: Large Language and Vision Assistants that Plug and Learn to Use Skills. LLaVA is an open-source model that integrates language and vision. The updated version gives the instruction-tuned model access to tools for creating and editing images, among other things.
- ai-town-rwkv-proxy. Hundreds of agents in AI Town, an incredible experiment, go about their everyday lives as prompt states in language models. Compared to typical Transformers, the RWKV model is a linear language model that uses fewer resources. This repository runs AI Town on your local computer using this less expensive model.
- Nvidia is launching a new must-have AI chip — as customers still scramble for its last one. The new class-leading H200 has more memory capacity and bandwidth, speeding up its work with generative AI and LLMs.
- OpenAI reveals new details about its AI development roadmap and fundraising plans. OpenAI LP is working on GPT-5 and plans to raise more capital from Microsoft Corp. to support its development efforts, Chief Executive Officer Sam Altman has disclosed in a new interview.
- Xbox partners with Inworld AI to build generative AI tools for game development. Xbox and Inworld AI are working together to build AI-driven technologies that will enhance game developers’ narratives and character creation features. As part of the collaboration, an AI character runtime engine and an AI design copilot will be created to help game creators create immersive gaming experiences. They believe these technologies will accelerate game creation, improve immersion, and encourage boundless innovation.
- New techniques efficiently accelerate sparse tensors for massive AI models. Complementary approaches — “HighLight” and “Tailors and Swiftiles” — could boost the performance of demanding machine-learning tasks.
- OpenAI’s six-member board will decide ‘when we’ve attained AGI’. According to OpenAI, the six members of its nonprofit board of directors will determine when the company has “attained AGI”.
- Giant AI Platform Introduces ‘Bounties’ for Deepfakes of Real People. Users of the contentious “bounties” function of Civitai, an AI model-sharing site, may now commission and profit from the production of AI-generated photographs.
- You.com launches new APIs to connect LLMs to the web. When OpenAI connected ChatGPT to the internet, it supercharged the AI chatbot’s capabilities. Now, the search engine You.com wants to do the same for every large language model (LLM) out there.
- Microsoft and OpenAI partnership unveils new AI opportunities. Microsoft said at OpenAI’s DevDay that it will launch the new GPT-4 Turbo on Azure OpenAI Service before year’s end, offering more control and cost savings. Businesses’ AI capabilities will be enhanced by OpenAI’s Custom Models initiative, which will integrate easily with Microsoft’s ecosystem.
- Nous-Capybara-34B V1.9. Trained on the Yi-34B model with 200K context length, for 3 epochs on the Capybara dataset (multi-turn data with more than 1,000 tokens per conversation).
- AI writes summaries of preprints in bioRxiv trial. A large language model creates synopses of papers aimed at various reading levels to help scientists sift through the literature.
- Catch me if you can! How to beat GPT-4 with a 13B model. Announcing Llama-rephraser: 13B models reaching GPT-4 performance in the major benchmark. What’s the trick behind it? Well, rephrasing the test set is all you need!
- IBM debuts $500 million enterprise AI venture fund. IBM is dedicating $500 million to invest in generative AI startups focused on business customers.
- Microsoft is finally making custom chips — and they’re all about AI. The Azure Maia 100 and Cobalt 100 chips are the first two custom silicon chips designed by Microsoft for its cloud infrastructure
- Google’s AI-powered search feature goes global with a 120-country expansion. The SGE update includes additional language support for Spanish, Portuguese, Korean and Indonesian.
- Universe 2023: Copilot transforms GitHub into the AI-powered developer platform. GitHub is announcing the general availability of GitHub Copilot Chat and previews of the new GitHub Copilot Enterprise offering, new AI-powered security features, and the GitHub Copilot Partner Program.
- DeepMind’s animation gallery. Google DeepMind has made a variety of animations and artwork available to help people understand various AI systems. The animations are visually stunning, but also a little strange.
- DeepMind announces music generation model. In partnership with YouTube, Google DeepMind announced Lyria, its most advanced AI music generation model to date. Any content published by the Lyria model will be watermarked with SynthID.
- Meta introduces Emu Video and Emu Edit, its latest generative AI research milestones. A generative model frequently produces an image that isn’t exactly what you were hoping for, yet editing that image with the same model is really difficult. Meta made a crucial discovery: editing capabilities can emerge when all generations are treated as instructions. This is a really good improvement, especially combined with the model architecture’s newfound simplicity.
- Microsoft launches a deepfakes creator at Ignite 2023 event. One of the more unexpected products to launch out of the Microsoft Ignite 2023 event is a tool that can create a photorealistic avatar of a person and animate that avatar saying things that the person didn’t necessarily say.
- YouTube will show labels on videos that use AI. YouTube now requires creators to mark videos made using AI, and the platform will show labels to viewers.
- Sam Altman fired as CEO of OpenAI. In a sudden move, Altman is leaving after the company’s board determined that he ‘was not consistently candid in his communications.’ President and co-founder Greg Brockman has also quit. The board reportedly asked Altman to return, but he has since been hired by Microsoft.
- Google delays launch of AI model Gemini, a potential rival to OpenAI’s GPT-4. Google is delaying the launch of its new large language model, Gemini, a potential rival to AI models from Microsoft (MSFT)-backed OpenAI.
- The Escalating AI Arm Race: Inside the High-Stakes Talent Wars with OpenAI and Google. OpenAI recruiters are pitching annual compensation packages of around $5–10 million for senior researchers who jump ship depending on their role and expertise.
- Meta disbanded its Responsible AI team. A new report says the members of Meta’s Responsible AI team have been redistributed across other AI teams.
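The Llama-rephraser item above is essentially a benchmark-contamination story: lightly rephrased test examples slip past exact-match decontamination. A minimal sketch of the standard n-gram overlap check (function names are mine) showing why rephrasing defeats it:

```python
def ngrams(text, n=3):
    """Set of word-level n-grams for a string."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def contaminated(train_doc, test_example, n=3, threshold=0.5):
    """Flag a test example if most of its n-grams appear in the training doc."""
    test_ngrams = ngrams(test_example, n)
    if not test_ngrams:
        return False
    overlap = len(test_ngrams & ngrams(train_doc, n)) / len(test_ngrams)
    return overlap >= threshold

test_q = "what is the capital city of france"
verbatim_leak = "quiz dump: what is the capital city of france answer paris"
rephrased_leak = "quiz dump: which city serves as the french capital answer paris"

print(contaminated(verbatim_leak, test_q))   # True: an exact copy is caught
print(contaminated(rephrased_leak, test_q))  # False: rephrasing evades the filter
```

Because a paraphrase shares almost no exact n-grams with the original, a model trained on rephrased test data looks clean to this filter while still having effectively memorized the benchmark.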
Resources
- The Alignment Handbook. Hugging Face’s Alignment Handbook provides the community with a series of robust training recipes spanning the whole alignment pipeline.
- versatile_audio_super_resolution. Pass your audio in, AudioSR will make it high fidelity!
- tarsier. Vision utilities for web-interaction agents. Thanks to powerful new vision models, a number of teams are building agents that interact with web elements through vision. Tarsier introduces a standard toolset (e.g., element tagging): any vision system can use it to navigate a website and take actions, and it also provides browsing utilities for language models without vision.
- Extra-fast Bark for generating long texts. In this notebook, we’ll show you how to generate very long texts very quickly using Bark, Flash Attention 2, and batching.
- OpenGPTs. This is an open-source effort to create a similar experience to OpenAI’s GPTs. It builds upon LangChain, LangServe and LangSmith.
- Tamil-Llama: A Family of LLaMA-based LLMs focused on Tamil Language. This repository contains the code and models for “Tamil-Llama”, a project focused on enhancing the performance of language models for the Tamil language.
- GPT4V-AD-Exploration. In our report, we explore the revolutionary GPT-4V, a visionary in the field of autonomous driving.
- BestGPTs. Top-ranked OpenAI GPTs
- Hallucination Leaderboard. This evaluates how often an LLM introduces hallucinations when summarizing a document.
- draw-a-ui. An app that uses tldraw and the gpt-4-vision api to generate HTML based on a wireframe you draw.
- AMBER: An Automated Multi-dimensional Benchmark for Multi-modal Hallucination Evaluation. A new benchmark designed to assess and reduce hallucinations in multi-modal large language models (MLLMs).
- instructor. Structured extraction in Python, powered by OpenAI’s function calling API, designed for simplicity, transparency, and control.
- GPU-Accelerated LLM on a $100 Orange Pi. This post shows GPU-accelerated LLM running smoothly on an embedded device at a reasonable speed. Additionally, we are able to run a Llama-2 13b model at 1.5 tok/sec on a 16GB version of the Orange Pi 5+ under $150.
- LLM Sherpa. LLM Sherpa provides strategic APIs to accelerate large language model (LLM) use cases.
- The Developer’s Guide to Production-Grade LLM Apps. Advanced techniques for maximizing LLM performance.
- Accelerating Generative AI with PyTorch: Segment Anything, Fast. This blog shows how to make Meta’s SAM 8x faster using only PyTorch features: quantization, nested tensors, and Triton.
- ai-exploits. This repository, ai-exploits, is a collection of exploits and scanning templates for responsibly disclosed vulnerabilities affecting machine learning tools.
- Music ControlNet. ControlNet was an innovative approach to giving image synthesis models fine-grained control. There is now a fairly similar model for music generation that lets you control attributes such as melody, dynamics, and rhythm.
- GPT-4 Turbo Note Taker. Fast and simple, Tactiq’s AI Note Taker with GPT-4 Turbo lets you turn your meetings into actionable notes — so that you’re always taking the right action and getting more out of your meetings.
- Chroma. Chroma is a generative model for designing proteins programmatically.
- A Survey on Language Models for Code. Gives a summary of LLMs for code, covering 500 relevant works, more than 30 evaluation tasks, and more than 50 models.
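Quantization, one of the PyTorch features behind the SAM speedup above, can be illustrated with a toy symmetric int8 round-trip (a simplification for intuition, not PyTorch's actual implementation):

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor int8 quantization: x is approximated by scale * q."""
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

weights = np.array([0.31, -1.27, 0.05, 0.92], dtype=np.float32)
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# int8 storage is 4x smaller than float32; round-off error is bounded by scale/2
assert np.max(np.abs(weights - restored)) <= scale / 2 + 1e-6
```

The 4x memory saving (and faster integer arithmetic) is where the speedup comes from; the cost is the bounded rounding error shown in the final assertion.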
Perspectives
- Adversarial Attacks on LLMs. This blog post surveys the many new attacks facing language-model systems, with good detail on several attack types as well as some mitigations teams have found effective.
- AI and Open Source in 2023. A comprehensive review of the major developments in the AI research, industry, and open-source space that happened in 2023.
- How do investors see your startup? A general partner at Angular Ventures divides the application concepts we are seeing into three major categories, in an attempt to make sense of all the nascent AI firms. This examines only application-layer businesses, ignoring model-layer companies.
- Retool’s State of AI 2023. Retool surveyed 1,500 tech workers.
- Language models can use steganography to hide their reasoning, study finds. Large language models (LLMs) can master “encoded reasoning,” a form of steganography. This intriguing phenomenon allows LLMs to subtly embed intermediate reasoning steps within their generated text in a way that is undecipherable to human readers.
- Why teachers should explore ChatGPT’s potential — despite the risks. Many students now use AI chatbots to help with their assignments. Educators need to study how to include these tools in teaching and learning — and minimize pitfalls.
- The future is quantum: universities look to train engineers for an emerging industry. With quantum technologies heading for the mainstream, undergraduate courses are preparing the workforce of the future.
- The Future of Music: How Generative AI Is Transforming the Music Industry. AI-generated music has the potential to become our primary source of music in the future and influence our listening preferences. This might mark the beginning of music’s “Midjourney moment.”
- AI Doomers Are Finally Getting Some Long Overdue Blowback. Those who predicted AI would bring about our collective extinction are now having to reconsider their claims. The “AI doom” narrative mainly benefited the large players, and there are plenty of opportunities for the open-source AI movement.
- There’s a model for democratizing AI. OpenAI’s request for recommendations on integrating democratic procedures into AI decision-making comes across as restrictive, handling delicate political matters without accepting accountability, which could limit the application and efficacy of democracy in AI governance.
- Copilot is an Incumbent Business Model. The Copilot AI business model improves existing workflows for efficiency without generating new markets or upending the low end; its real disruptive potential lies in redesigning workflows, a challenge that might open substantially larger market opportunities.
Meme of the week
If you have found this interesting:
You can look for my other articles and connect with me on LinkedIn. Check this repository, which contains weekly updated ML & AI news. I am open to collaborations and projects.
Here is the link to my GitHub repository, where I am collecting code and many resources related to machine learning, artificial intelligence, and more.