What You May Have Missed #32
Research (OpenLLaMA, GPT-4 MIT) / Business & Products (Mistral AI, Function calling on OpenAI's API) / Articles (Terence Tao on GPT-4, EU AI Act) / Miscellanea (AI church service)
Research
OpenLLaMA: An Open Reproduction of LLaMA (GitHub, Hugging Face): “TL;DR: we are releasing our public preview of OpenLLaMA, a permissively licensed open source reproduction of Meta AI’s LLaMA 7B and 13B trained on the RedPajama dataset. Our model weights can serve as the drop in replacement of LLaMA in existing implementations. We also provide a smaller 3B variant of LLaMA model.”
Exploring the MIT Mathematics and EECS Curriculum Using Large Language Models (arXiv, Twitter): “LLM Passes MIT Math & Computer Science.
- 4,550 questions from the 30 MIT Math & CS courses required for a degree
- New benchmark likely not in any training data
- On test set excluding image Qs, w/ prompt engineering:
  - GPT-3.5 solves 33%
  - GPT-4 solves 100%”
But under scrutiny, the results look rather different (yoav, Raunak Chowdhuri, Margaret Mitchell). Update from Chowdhuri: “No, GPT4 can’t ace MIT,” with comments from Hardmaru and Gary Marcus. Here's Ethan Mollick’s correction. This is an example of bad science dressed up as good science: sloppy testing, non-peer-reviewed claims presented as established results, and AI/ML people sharing them on Twitter before any careful fact-checking.
Introducing Voicebox: The first generative AI model for speech to generalize across tasks with state-of-the-art performance (Meta AI): “Like generative systems for images and text, Voicebox creates outputs in a vast variety of styles, and it can create outputs from scratch as well as modify a sample it’s given. But instead of creating a picture or a passage of text, Voicebox produces high-quality audio clips … because of the potential risks of misuse, we are not making the Voicebox model or code publicly available at this time.”
Survey reveals AI’s impact on the developer experience (Inbal Shani on GitHub): “We surveyed 500 U.S.-based developers at companies with 1,000-plus employees about how managers should consider developer productivity, collaboration, and AI coding tools … 92% of U.S.-based developers are already using AI coding tools both in and outside of work … 70% say AI coding tools will offer them an advantage at work and cite better code quality, completion time, and resolving incidents as some of the top anticipated benefits.” Grady Booch has a different reading of the results.
Demystifying GPT Self-Repair for Code Generation (Hugging Face): “Large Language Models (LLMs) have shown remarkable aptitude in code generation but still struggle on challenging programming tasks. Self-repair -- in which the model debugs and fixes mistakes in its own code -- has recently become a popular way to boost performance in these settings … we find that the effectiveness of self-repair is only seen in GPT-4.”