30 Comments
Oct 16, 2022 · edited Oct 16, 2022 · Liked by Alberto Romero

It feels like things have exploded this year (especially the past few months) when it comes to text-to-image / text-to-video consumer-level releases and announcements.

Question: Is this likely to be an odd blip in an otherwise more gradual pace of AI progress (i.e. a few similar competing models being completed at roughly the same time, making it seem like an avalanche of new stuff)? Or is this an accelerating trend that might hold/increase exponentially as more companies get in the game across more disciplines?

I realize this is a very speculative question, but as someone who is quite removed from all the ongoing behind-the-scenes AI work, it is sure hard to keep track or project where we're heading.

Thank you for your always fascinating takes!

author

Awesome question, Daniel! If I had to guess blindly, I'd say it's a bit of both.

On the one hand, text-to-image and text-to-video models are super small compared to LLMs, for instance, and therefore easier to train and deploy. In this sense, this avalanche, as you say, may be difficult to replicate in other areas. Another factor is the ease with which you can find high-quality image data on the internet.

But, on the other hand, some of the advances that are facilitating the explosion in R&D are transversal: Open-sourcing tendencies, better hardware, a better understanding of model architecture and the underlying software, more interest by companies, researchers, and investors, etc.

So, a bit of both.

But, even if that's true, I don't think this fast-paced progress, even if partially transferable to other areas of AI, will last long. In the end, progress requires resources and resources are limited. There are only so many engineers who can work on this, only so many investors interested in powering AI companies, etc.

And, as results increase, expectations do, too. Hopefully, we don't raise them too much or we'll be facing an AI winter in no time (as has happened historically when AI people promised too much).

Finally, to challenge your premise, we'd have to assess how much of this fast-paced progress is illusory and how much is real. Published papers don't necessarily correlate with meaningful progress, as I wrote recently.


Thanks for your detailed and thorough answer Alberto!

You just about hit the nail on the head with your last paragraph for me. Thing is, it all *seems* like an avalanche to a layman like myself exactly *because* we suddenly got access to lots of AI tools that we can directly interact with and experience for ourselves.

But that mainstream "consumer level" explosion isn't necessarily representative of the field at large. I realize that what the average person like myself sees is just a tiny slice (or tip of the iceberg, to use a tired analogy) of all the AI and machine learning progress that may or may not be happening. Which is why it's great to have a take from someone much more "in the know."

So thanks a lot for giving me a bit more clarity.

author

You just pinpointed two key insights:

1. "we suddenly got access to lots of AI tools" Super important, because I realize that being closer to the guts of the AI field biases me a lot. First, in the sense that I take for granted things that aren't yet well known to the majority. Second, because I may underestimate the impact that those things I take for granted will have once they are out there in the world.

2. 'that mainstream "consumer level" explosion isn't necessarily representative of the field at large.' That's exactly right. Our perception is completely biased by our capacity (or lack thereof) to see the whole picture. On the one hand, I'm biased as I stated above. On the other hand, you and many more people could be biased because of this "consumer level explosion."

This means both insiders and outsiders may have a hard time understanding the true impact of all this!

Oct 16, 2022 · Liked by Alberto Romero

Anything to achieve anomaly detection, even with little training data, predictive maintenance and autonomous operation.

When you follow what is shared about AI, it is mainly about the fields you mentioned (language, images, social...).

I know that industry has difficulty sharing data, which prevents the kind of rapid growth seen in AI trained on images or language.

I'm keen to learn if there is anything on the horizon that could change this.

Oct 19, 2022 · Liked by Alberto Romero

Hi Stefan, I've come across several companies using acoustic and vibration data to detect equipment failure, such as for wind turbines. From a machine learning perspective, this is about learning a case-specific data distribution and detecting out-of-distribution samples. It is an unsupervised learning problem that doesn't get as much attention in the research literature, for a number of reasons.

Current deep learning methods have a problem: they don't know when they don't know. In general they don't output calibrated confidences, and don't have a flag to say "hey I don't know about this kind of input". For machine learning to become more widely applicable, out-of-distribution detection needs to come as a standard feature, and the output confidences need to be more like probabilities.
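To make the "learn the normal distribution, then flag what falls outside it" idea concrete, here is a minimal sketch. It is only an illustration under stated assumptions: the two features (vibration RMS amplitude and dominant frequency), the synthetic "healthy" data, and the threshold of 4 standard deviations are all hypothetical, and a simple per-feature z-score stands in for whatever model a real system would use.

```python
import random
import statistics


def fit_normal_profile(healthy_readings):
    """Estimate per-feature mean and standard deviation from healthy data only."""
    columns = list(zip(*healthy_readings))
    means = [statistics.fmean(col) for col in columns]
    stdevs = [statistics.stdev(col) for col in columns]
    return means, stdevs


def anomaly_score(reading, profile):
    """Largest absolute z-score across features; higher means further from healthy data."""
    means, stdevs = profile
    return max(abs(x - m) / s for x, m, s in zip(reading, means, stdevs))


def is_out_of_distribution(reading, profile, threshold=4.0):
    """Flag a reading whose score exceeds the (assumed) threshold."""
    return anomaly_score(reading, profile) > threshold


rng = random.Random(0)
# Hypothetical features: (vibration RMS amplitude, dominant frequency in Hz).
# No failure examples are used for fitting, only healthy operation.
healthy = [(rng.gauss(1.0, 0.1), rng.gauss(50.0, 2.0)) for _ in range(500)]
profile = fit_normal_profile(healthy)

typical = (1.05, 51.0)  # close to the healthy data
faulty = (2.5, 80.0)    # far from anything seen during fitting

print(is_out_of_distribution(typical, profile))
print(is_out_of_distribution(faulty, profile))
```

The score here is a distance from the healthy data, not a calibrated probability, which is exactly the gap described above: the threshold has to be chosen by hand for each piece of equipment.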


If you want anomaly detection for vision, check out the company below. They are widely used and trusted in the industry. They train on good images, and anything that isn't "learned" will appear to the operator as an anomaly for classification.

https://www.mvtec.com/

author

I'd say there are a few applications for anomaly detection. About autonomous operation, you'd find more in robotics than AI, probably. Idk about predictive maintenance.

Those use cases are less interesting for the majority, so that's why no one talks about them. But I'm quite sure you can find info (sadly, I have no idea where). My recommendation is to avoid mainstream sources because you'll mostly find mainstream topics there.


I owe you an answer (we got some private challenges to solve).

As there are new kids on the block (ChatGPT etc.), I assume AI will enter certain closed domains through the backdoor, when AI tools become part of the daily office toolset.

I wish there were more easily accessible discussion (like this one) to learn about the use and success of AI in the "old" industry, which doesn't have a direct connection to "big tech".

So far the only option is to follow the research literature, which is often very case-specific.

Oct 16, 2022 · Liked by Alberto Romero

First of all - love your work!

I have a question regarding writing! When did you start writing? Was it always Substack or did you start somewhere else? And how long did it take to build a loyal follower base for your newsletter?

author

Hi Arnold, thanks for the first non-AI question!

I started writing publicly in January 2021 on Medium (I'm still there, but the platform has gotten worse so I'm focusing more on Substack now). But I've been writing more generally since 2016 or so--you know, thoughts and reflections (like a journal).

I started on Medium because it's better for growing an audience faster *initially* and you can write about different topics across publications. Once you have both the audience and the niche, Substack is definitely better (no algorithm between you and your readers and owning your audience are critical differences).

The 4 months I've been on Substack have been amazing growth-wise. I came here with 700 subscribers that I gathered after 1.5 years on Medium (a bit less because I didn't start the mailing list right away), and in just 4 months I've grown that number 5X to 3,500. I'm very happy!

Oct 16, 2022 · Liked by Alberto Romero

In your opinion what is the most accurate speech recognition system for dictation and transcription currently available in Spanish for a journalist? Can you share some comparative data?

author

Hi JB! Do you want to capture speech in Spanish and transcribe it in Spanish? (So, no English involved?)

Oct 16, 2022 · Liked by Alberto Romero

With respect to level 5 autonomous driving, do you think it could be solved with more data on edge cases, or do you think we need some breakthroughs in AI itself?

Also, how useful do you think training in simulation would be? Are there any limitations to training in simulation?

author

Hi YH, thanks for joining!

I'd say that if we had more data on edge cases they'd stop being edge cases! Can we get rid of edge cases in the sense that every case has been documented enough times so that no situation is even slightly new for the car? I don't think so, no. There's simply too much variability in the real world.

Can simulations help with this? Yes, for sure. Simulations can help generate data for those cases that barely ever happen. Still, the same response applies: there are too many degrees of freedom in reality.

I'd say current techniques can improve self-driving beyond what we've seen so far, but that's only because we've been trying for only a few years with such techniques. (You already saw how Tesla is applying large language models for self-driving--which is a new interesting approach!)

In case we need a breakthrough, I don't have any idea of what it could be besides AI having the abilities humans have on improvisation, super-fast decision-making, etc. At least a world model is needed for that--meaning, how things work in the real world beyond language or vision.

How do we make AI learn world models like we do? If I knew, I'd already be famous!

Oct 21, 2022 · Liked by Alberto Romero

I’ve always had this doubt about simulations. While we can incorporate physics into simulations, we don’t seem to be able to fully incorporate stochasticity into them. So the model trained might not be tuned to handle the stochastic scenarios that are very common in the real world.
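One common partial answer to this concern is to randomize the simulation's parameters on every run (often called domain randomization), so that training data covers a spread of conditions rather than a single deterministic world. The sketch below is a toy illustration, not a real driving simulator: the braking-distance formula is standard physics, but the parameter ranges for friction, reaction time, and speed are assumptions chosen only for the example.

```python
import random


def braking_distance(speed_mps, friction, reaction_time_s, g=9.81):
    """Stopping distance = reaction distance + v^2 / (2 * mu * g)."""
    return speed_mps * reaction_time_s + speed_mps ** 2 / (2 * friction * g)


def sample_scenarios(n, seed=0):
    """Run the deterministic physics under randomly drawn conditions."""
    rng = random.Random(seed)
    distances = []
    for _ in range(n):
        friction = rng.uniform(0.3, 0.9)          # assumed range: wet to dry road
        reaction = max(0.1, rng.gauss(1.0, 0.3))  # assumed reaction time in seconds
        speed = rng.uniform(10.0, 35.0)           # assumed speed range in m/s
        distances.append(braking_distance(speed, friction, reaction))
    return distances


distances = sample_scenarios(1000)
print(min(distances), max(distances))
```

This only covers the randomness someone thought to model. Variability that is missing from the simulator's parameters stays missing however many scenarios are sampled, which is the limitation the comment points at.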

Oct 16, 2022 · Liked by Alberto Romero

Do you see any efforts toward openness in industrialised applications of ML/AI? I'm looking for a Hugging Face for the industry (including training datasets).

author

Hi Stefan, thanks for your question!

What do you mean by "industrialized applications"?


Where do you see the state of the art in AI in the next 10-20 years? Worst and best case.

author

Hi Clinton, your question is tricky, I like it. On the one hand, it's easy to throw out some random predictions that have undefined accuracy but can satisfy everyone. On the other hand, it's super hard to give you detailed and reasoned predictions with their corresponding arguments.

Let's see. Because I'm not a scale maximalist, I don't see existential risks as something anywhere near urgent, so the worst scenarios I see are: super high job replacement (and lack of adequate countermeasures) and irreversible damage to the value/credibility of digital information. The first one is a threat to what we provide to and receive from society, and the second is a threat to how we understand the world. Both are already here, yes. The difference would be scale.

The best scenarios are: AI is being used by most people as an "enhancer" instead of a "replacer". Also, I hope that by then we've learned to separate the concept of AI ("true AI") from products that are powered by fancy statistics (i.e. we stop calling everything AI). And also, I'd love to see AI taking more inspiration from other fields (neuroscience) and accepting help from the humanities.

Hope that answered your question!


It did. I have a similar perspective. I think with any new significant technology there is always a disruption to work or jobs. I feel that everyone is looking at ML and "AI" as a hammer for all of our problems, because the market hasn't found, or tested the limits of, deep learning coupled with the cheap cost of data storage. It is a powerful tool, but there will be a limit until some other significant innovation is found (maybe ML will help us figure it out).

I hope as well that the ease of use continues to improve. Stable Diffusion is a great example of disruption to a market, but it is also getting more artists into the field to create new digital content.

I am interested to see how the robotics field changes in the coming years. There need to be a lot of breakthroughs in many areas of algorithms, sensors, and hardware, but I think we are on the cusp of a significant robotics boom. I tend to track the population sizes of student bodies majoring in a field as a leading indicator of technology changes. I never bet against focused, smart people collocating.

author

"I never bet against focused, smart people collocating." That's likely to be a good predictor!

I agree with you, robotics will grow in attention and importance in the next few years. AI is going for multimodality and one important part of that is sensorimotor information--purely digital AIs can't get that.

Dec 2, 2022·edited Dec 2, 2022

Alberto writes...

"Because I'm not a scale maximalist, I don't see existential risks as something anywhere near urgent, so the worst scenarios I see are: super high job replacement (and lack of adequate countermeasures) and irreversible damage to the value/credibility of digital information."

If we define existential risk as a risk to the stability of our civilization (not an extinction threat), then job replacement and the loss of information credibility can get us to that risk, something like this:

The overall pace of knowledge driven change (further accelerated by AI) is correctly perceived as personally threatening by large swaths of the population. These people then become susceptible to unrealistic promises offered by hyper confident charismatic con men offering fantasy feel good slogans like "make America great again". That is, this population wishes to return to a time when they knew what was going on, and thus felt secure.

The con man is elected to high office, and proceeds to try to steal the government for himself. Other than that, the con man really has no experience, training, or any idea what he's doing, so he accidentally hits the wrong geo-political button, resulting in a nuclear conflict between the major powers.

AI => Knowledge Driven Change => Political Madness => Game over.

Did AI cause the crash? Not specifically. Did it accelerate the social instability which led to the crash? Yes.

Oct 16, 2022 · edited Oct 16, 2022 · Liked by Alberto Romero

Where are the open-source projects for building/improving training datasets for large language models (LLMs)? As far as I can see, The Pile (EleutherAI) is SotA and it appears to have been frozen at the beginning of 2021. I keep hearing how important the training datasets are for these models, yet I can't find any projects that are focused on building/improving what is available. Do you know of any projects, or where people discuss this topic? I've spent a short amount of time on EleutherAI's Discord, but all discussion appears to be on the model code and none on The Pile. I may have missed it, though.

author

Hi Kevin, that's a good question. As far as I know, yes: the Pile is the best open-source dataset for language models there is right now. Most companies (OpenAI, Google) don't reveal the sources of their training data, possibly to avoid copycats and maybe also legal problems.

Idk if any other groups are working on this. I know BigScience tried to improve the training dataset for BLOOM ethically speaking--but I've read they failed to make the model performant because of bad data curation.

So, in essence, the answer to your question depends on what you mean by "improving". Is it about making more performant models? Making them less biased/toxic? Something else?

I feel I didn't answer your question satisfactorily, so I'm sorry for that! If I find something I'll email you.


Thank you! I've looked a bit and suspect that there is not something like this out there yet, something like a big Linux-like or Wikipedia-like effort to create a massive open-source dataset.

You pose a good question about what I mean by "better": to me it's a naïve view of improving the trained model's performance on standardized benchmarks. Ideally, The One Dataset To Rule Them All would support different dimensions of "better", explicitly letting the consumers choose their tradeoffs. Or, at least, being able to choose from different branches with differing tradeoffs.

author

'The One Dataset To Rule Them All would support different dimensions of "better"'. That's a great way to see it!

Now we only have to engineer The One Dataset To Rule Them All from that idea, ha!

Oct 20, 2022·edited Oct 20, 2022

What can you say about NLP? GPT-3 was huge at the time, but it seems it does not perform equally well on languages other than English. My goals involve domain-specific corpora, so fine-tuning is a must. On the other side of large models, we are seeing newcomers; Pathways Language Model (PaLM) is one of them. I do not know much about it, but it seems interesting enough. I have invested quite a few hours in GPT-3. Since my target language is not English, should I move on?


What do you think is a bigger bottleneck to a general-use home robot (e.g. Optimus) - the robotics side of things or the AI side of things? Any guess when we'll get one?

-Alex
