To Avoid AI Hype, Look for the Training Data and the Objective Function

I’ve got to admit, ChatGPT is pretty amazing. And I’ve spent a few nights, as a technical professional and teacher, wondering how it might complicate my life. And I’m sure it will. But I think it’s easy to look at this text-generating monster and get a little unhinged about the prospects for AI in the near future.

One thing these models have convinced me of is that if there is training data for a task, then I expect contemporary AI methods to eventually automate it. The internet is a giant pile of text, and what we’ve seen is that a surprisingly small model, just billions of parameters, is sufficient to generate coherent text conditioned on some previous text. While it’s surprising how many tasks come down to just generating coherent text given a prompt, it’s also clear that the ability to do so doesn’t constitute general intelligence, except the narrow kind expressed specifically in generating plausible text.

Four things have enabled these large language models: advances in computer technology, advances in the architecture of the neural networks that underlie them, and, most critically, the availability of a large training data set and of a clear objective function. AI applications have succeeded wildly when these last two conditions have been met and continue to struggle where they cannot be. If you’re wondering what area might be disrupted next, look for the places where the objective function and the data are available. Despite the ability of these models to write code, for example, there isn’t a lot of training data out there on how to debug subtle problems coupled to a given company’s problem domain. There are many areas of human endeavor where the material to train a machine to do the work simply isn’t available in any accessible form, and where devising an objective function that covers a large number of cases is difficult.
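To make that concrete, here is a minimal sketch, assuming PyTorch, of what that clear objective function looks like for a language model: predict the next token, score the prediction with cross-entropy, and descend the gradient on that one number. The names and shapes are illustrative stand-ins, not any particular model’s code.

```python
# Minimal sketch of the next-token objective behind large language models.
# Assumes PyTorch; all shapes and names are illustrative.
import torch
import torch.nn.functional as F

vocab_size, seq_len, batch = 50_000, 128, 4

# Stand-ins: real logits come from a neural network; real tokens come from
# the giant pile of internet text.
logits = torch.randn(batch, seq_len, vocab_size, requires_grad=True)
tokens = torch.randint(vocab_size, (batch, seq_len))

# At each position, score the model's prediction against the token that
# actually came next. The entire "objective function" is this one scalar.
loss = F.cross_entropy(
    logits[:, :-1].reshape(-1, vocab_size),  # predictions at positions 0..n-2
    tokens[:, 1:].reshape(-1),               # targets: the tokens that follow
)
loss.backward()  # gradient descent on this number is the whole training signal
```

That the training signal reduces to one well-defined scalar, computed over data that already exists in bulk, is exactly the pair of conditions the paragraph above describes.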

[Generated illustration. Image prompt: a laughing robot shoveling an enormous pile of paper into its gaping mouth like a salad with a fork “soviet art” abstract “line drawing”]

In the long term I don’t have any illusions. Human beings have general intelligence and are physically realized beings who manage to use that intelligence without any easily described global objective function. I’m sure one day we’ll figure this trick out or just steal it from nature. But large language models are, in a sense, just more of the same: statistical models that work because of a well-specified objective function and voluminous training data. I don’t expect those requirements to change for the foreseeable future.

Of course, the length of the foreseeable future gets shorter all the time.
