WEEKLY AI NEWS: RESEARCH, NEWS, RESOURCES, AND PERSPECTIVES

ML news: Week 6–12 November

OpenAI dev, TopicGPT, new chips, and much more

Salvatore Raieli
14 min read · Nov 13, 2023
Photo by Tim Mossholder on Unsplash

The most interesting news, repositories, articles, and resources of the week

Check and star this repository where the news will be collected and indexed:

You will find the news first on GitHub. Single posts are also collected here:

Weekly AI and ML news - each week the best of the field


Research

RT-Sketch
PixArt-α
How AI can help to save endangered species; Photo by Dieny Portinanni on Unsplash

News

  • Google Research scholar program. The Research Scholar Program aims to support early-career professors who are pursuing research in fields relevant to Google.
  • OpenAI DevDay Buzz Includes Alleged Leak Of New ChatGPT Prototype. Highlights: “OpenAI could introduce major updates for developers, making it cheaper and faster to build AI-based applications. A rumored ‘Team’ plan for ChatGPT could offer unlimited high-speed GPT-4, advanced data analysis, and more.”
  • Google is extending its Vulnerability Rewards Program (VRP) to include generative AI. Today, we’re expanding our VRP to reward attack scenarios specific to generative AI. As part of expanding VRP for AI, we’re taking a fresh look at how bugs should be categorized and reported.
  • Paper Digest: NeurIPS 2023 Highlights. Paper Digest has analyzed more than 500/3500 papers. Interesting, but many of the papers have already been public for a while.
  • HelixNet. HelixNet is a deep learning architecture consisting of 3 x Mistral-7B LLMs. It has an actor, a critic, and a regenerator. The authors also used AI-generated synthetic data. This approach showed impressive results. The model is available on Hugging Face.
HelixNet
  • ChatGPT Plus members can upload and analyze files in the latest beta. ChatGPT Plus members can also use modes like Browse with Bing without manually switching, letting the chatbot decide when to use them.
  • OpenAI DevDay recap. A recap by OpenAI: a new GPT-4 Turbo model that is more capable, cheaper, and supports a 128K context window; a new Assistants API that makes it easier for developers to build their own assistive AI apps; and new multimodal capabilities in the platform, including vision, image creation (DALL·E 3), and text-to-speech (TTS).
  • xAI PromptIDE. An integrated development environment for prompt engineering and interpretability research, released by xAI.
  • ChatGPT continues to be one of the fastest-growing services ever. In less than a year, it’s hit 100 million weekly users, and over 2 million developers are currently building on the company’s API, including the majority of Fortune 500 companies.
  • Xbox partners with Inworld AI to build AI tools for game development. Microsoft’s Xbox and Inworld AI have partnered to create AI-powered game development tools for narrative and character creation.
  • Nvidia Is Piloting a Generative AI for Its Engineers. ChipNeMo summarizes bug reports, gives advice, and writes design-tool scripts.
  • YouTube to test generative AI features. As part of the premium package offered to paying subscribers, users may test a new conversational tool that uses artificial intelligence (AI) to answer questions about YouTube content and provide suggestions, as well as a new feature that summarizes topics in video comments.
  • Google Announces Expansion of AI Partnership with Anthropic. The partnership includes important new collaborations on AI safety standards, committing to the highest standards of AI security, and the use of TPU v5e accelerators for AI inference.
  • Cohere introduced Embed v3. Embed v3 offers state-of-the-art performance per the trusted MTEB and BEIR benchmarks. It is multilingual (100+ languages) and works well with noisy data, retrieval-augmented generation (RAG) systems, and both single-language and cross-language search.
  • Microsoft has over a million paying GitHub Copilot users. “We have over 1 million paid Copilot users in more than 37,000 organizations that subscribe to Copilot for Business,” said Nadella, “with significant traction outside the United States.”
  • Meta’s Audiocraft can also generate stereo music. Audiocraft is a library for audio processing and generation with deep learning. It features the state-of-the-art EnCodec audio compressor/tokenizer, along with MusicGen, a simple and controllable music generation LM with textual and melodic conditioning.
  • Hugging Face has a two-person team developing ChatGPT-like AI models. Hugging Face’s H4 team is focused on developing open-source ChatGPT-like models.
  • Samsung is joining the AI arms race, too. Samsung’s live translate feature, which the company is calling “AI Live Translate Call,” will be built into the company’s native phone app. Samsung says “audio and text translations will appear in real-time as you speak” and that the translations will happen on the device.
  • Introducing Adept Experiments. Adept is building AI agents and is now opening access so that people can test them.
  • Introducing GPTs. You can now create custom versions of ChatGPT that combine instructions, extra knowledge, and any combination of skills. Highlight: Starting today, you can create GPTs and share them publicly. Later this month, we’re launching the GPT Store, featuring creations by verified builders. Once in the store, GPTs become searchable and may climb the leaderboards. We will also spotlight the most useful and delightful GPTs we come across in categories like productivity, education, and “just for fun”. In the coming months, you’ll also be able to earn money based on how many people are using your GPT.
  • Google Cloud demonstrates the world’s largest distributed training job for large language models across 50000+ TPU v5e chips. Google Cloud TPU Multislice Training was built from the ground up to address the challenges of distributed ML training in orchestration, compilation, and end-to-end optimization. We demonstrated the benefits of Cloud TPU Multislice Training with what we believe is the largest publicly disclosed LLM distributed training job in the world (in terms of the number of chips used for training) on a compute cluster of 50,944 Cloud TPU v5e chips on the JAX ML framework, utilizing both BF16 and INT8 quantized training.
  • OpenAI Data Partnerships. We’re interested in large-scale datasets that reflect human society and that are not already easily accessible online to the public today. We can work with any modality, including text, images, audio, or video. We’re particularly looking for data that expresses human intention (e.g. long-form writing or conversations rather than disconnected snippets), across any language, topic, and format.

Resources

  • RedPajama-Data-v2. A new version of the RedPajama dataset, with 30 trillion filtered and deduplicated tokens (100+ trillion raw) from 84 CommonCrawl dumps covering 5 languages, along with 40+ pre-computed data quality annotations that can be used for further filtering and weighting. A dataset bigger than the one used for GPT-4 and already preprocessed.
  • LLM4Rec. The proposed CLLM4Rec is the first recommender system that tightly combines the ID-based paradigm and LLM-based paradigm and leverages the advantages of both worlds.
  • consistencydecoder. OpenAI has released an improved decoder for Stable Diffusion VAEs. The consistency decoder has reached the SOTA, and it is nice that they also released it for Stable Diffusion.
  • TopicGPT. We introduce TopicGPT, a prompt-based framework that uses large language models (LLMs) to uncover latent topics within a provided text collection. TopicGPT produces topics that align better with human categorizations compared to competing methods. official article.
  • FACTOR. An effective tool to detect deepfakes even without training. FACTOR leverages the discrepancy between false facts and their imperfect synthesis within deepfakes. By quantifying the similarity using the truth score, computed via cosine similarity, FACTOR effectively distinguishes between real and fake media, enabling robust detection of zero-day deepfake attacks.
  • CogVLM. CogVLM is a powerful open-source visual language model (VLM). CogVLM-17B has 10 billion vision parameters and 7 billion language parameters.
  • langroid. Langroid is an intuitive, lightweight, extensible, and principled Python framework to easily build LLM-powered applications. You set up Agents, equip them with optional components (LLM, vector store, and methods), assign them tasks, and have them collaboratively solve a problem by exchanging messages.
  • OVIR-3D. 3D object retrieval from text prompts using 2D image fusion. This work provides a straightforward yet effective solution for open-vocabulary 3D instance retrieval, which returns a ranked set of 3D instance segments given a 3D point cloud reconstructed from an RGB-D video and a language query.
  • JaSPICE: Automatic Evaluation Metric Using Predicate-Argument Structures for Image Captioning Models. JaSPICE is an automatic evaluation metric that evaluates Japanese captions based on scene graphs. There is a gap between the performance of captioning models for English and for other languages; this clever approach promises to reduce it.
  • awesome-openai-vision-api-experiments. A set of examples showing how to use the OpenAI vision API to run inference on images, video files, and webcam streams.
  • punica. Low-rank adaptation (LoRA) is a parameter-efficient way to add new knowledge to a pre-trained LLM. Although the pre-trained LLM takes 100s of GB storage, a LoRA finetuned model only adds 1% storage and memory overhead. Punica enables running multiple LoRA finetuned models at the cost of running one.
  • LongQLoRA. LongQLoRA is a memory-efficient and effective method to extend the context length of Large Language Models with fewer training GPUs. On a single 32GB V100 GPU, LongQLoRA can extend the context length of LLaMA2 7B and 13B from 4096 to 8192 and even to 12k.
  • Lidar-Annotation-is-All-You-Need. A smarter method for self-driving cars to recognize roads by using lidar technology.
Lidar
  • LM4VisualEncoding. Pretrained transformers from LLMs, despite being trained solely on textual data, are surprisingly strong encoders for purely visual tasks in the absence of language. Our exploration shows the potential of LLMs as general-purpose encoders for visual data, as opposed to the previous usages of either pure encoders for text embeddings or decoders for tokenized outputs. official article.
  • vimGPT. Browse the web with GPT-4V and Vimium. Vimium is a Chrome extension that lets you navigate the web with only your keyboard. You could use Vimium to give the model a way to interact with the web.
  • Announcing a New Way to Create AI Employees. The first platform that lets you build a team of AI employees working together to perform any task. The idea is to build an agent that you can call and ask to perform a task.
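
For readers curious about the FACTOR entry in the list above: the description says the truth score is computed via cosine similarity. Here is a minimal sketch of that idea, not the authors' implementation. The function names, the toy 3-dimensional embeddings, and the 0.5 threshold are all assumptions made for illustration; a real system would take its embeddings from pretrained encoders.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def truth_score(media_embedding, claimed_fact_embedding):
    """Hypothetical helper: how well the media agrees with the claimed fact."""
    return cosine_similarity(media_embedding, claimed_fact_embedding)


def looks_fake(media_embedding, claimed_fact_embedding, threshold=0.5):
    """Flag the media as suspicious when the truth score is below the threshold.

    The threshold value is an arbitrary assumption for this sketch.
    """
    return truth_score(media_embedding, claimed_fact_embedding) < threshold


# Toy embeddings, made up for illustration only.
claimed_identity = [1.0, 0.0, 0.0]
consistent_media = [0.9, 0.1, 0.0]    # points in almost the same direction
inconsistent_media = [0.0, 1.0, 0.0]  # orthogonal to the claimed identity

print(looks_fake(consistent_media, claimed_identity))
print(looks_fake(inconsistent_media, claimed_identity))
```

The design point is that nothing here is trained on fake examples: the score only measures agreement between the media and the claimed fact, which is why the approach can, in principle, handle deepfake types it has never seen.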

Perspectives

FY24/25 estimates based on CapitalIQ research

Meme of the week

If you have found this interesting:

You can look for my other articles, and you can also connect with or reach me on LinkedIn. Check this repository containing weekly updated ML & AI news. I am open to collaborations and projects.

Here is the link to my GitHub repository, where I am collecting code and many resources related to machine learning, artificial intelligence, and more.


Written by Salvatore Raieli

Senior data scientist | about science, machine learning, and AI. Top writer in Artificial Intelligence
