OpenAI and the Loss of Control
Close-reading OpenAI's safety statement
My book Meganets is about our ongoing loss of control over our large networked systems. AI is only one example, but it is probably the most visible and extreme example because its unpredictability is so often blatantly on display.
Companies aren’t keen on acknowledging a lack of control over their products. I point out a few instances from Facebook and Microsoft in the book, and ultimately I think we’ll be seeing more and more undeniable admissions of the limits of companies’ control over their own products.
The recent OpenAI safety statement is more candid than most in admitting their lack of fine-grained control over ChatGPT and its kin. They bluntly admit early on:
I want to point out just how remarkable this statement is. This is not just saying that their invention may be put to unexpected nefarious use. It is more akin to saying that the possibilities of AI behavior are too voluminous to test prior to release, and so the exploration of those limits, whether safe or not, will inevitably come in the only testbed that is sufficient to accomplish such testing: the real world.
From now on, we are all beta testers.
So it comes off a tad disingenuous to then claim that they don’t permit ChatGPT to be used for abusive purposes—when they’ve just admitted they can’t predict all the purposes that ChatGPT could potentially fulfill.
And those relative percentages always raise eyebrows. “82% less likely,” in the absence of absolute numbers and baselines, seems more like a cause for concern than a reassuring statistic. Ironically, there’s no doubt in my mind that OpenAI spent a great deal of time ironing out many such cases over the last few years, since compared to Microsoft’s Sydney, ChatGPT does remain resolutely anodyne far more of the time. It’s just that:

1. They haven’t made it foolproof.
2. They can’t make it foolproof.
3. They know (1) and (2).
Hence the tension in this statement between their wanting to promote their genuine efforts, versus their knowledge that it is still insufficient.
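To make the complaint about relative percentages concrete: the same “82% less likely” describes wildly different outcomes depending on the baseline. A toy calculation (both baseline rates below are invented for illustration; OpenAI published neither, and the function name is mine):

```python
# The identical relative-reduction claim is compatible with very
# different absolute risks. Both baselines here are hypothetical.
def absolute_rate(baseline: float, relative_reduction: float) -> float:
    """Absolute rate remaining after a stated relative reduction."""
    return baseline * (1 - relative_reduction)

# If disallowed responses occurred 1 in 1,000 times, an 82% reduction
# leaves roughly 1 in 5,500 -- probably fine.
low = absolute_rate(0.001, 0.82)

# If they occurred 1 in 10 times, the identical claim still leaves
# nearly 1 in 55 responses problematic.
high = absolute_rate(0.10, 0.82)

print(f"{low:.5f} vs {high:.3f}")
```

Without the baseline, a reader cannot tell which of these two worlds the statistic describes.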
Sure enough, the statement soon flips back to hedging.
No doubt they have made significant effort, and child endangerment in particular is an area that’s more tractable than most, since the content is more readily identifiable and considerably less common than other categories of harmful content. Hateful and harassing content, on the other hand, is rampant and much harder to distinguish.
And yet “significant effort to minimize the potential” is an incredible hedge, offered without any metric for success and with the unnerving implication that “minimization” is not “elimination.” To be honest, I’m surprised that OpenAI was so reticent on this point in particular, and I am at a loss as to exactly why.
There’s less mystery to the statement about personal data. Bluntly put, tons and tons of personal information—publicly available personal information—fuels ChatGPT and many other AIs, and thoroughly filtering it out is a humanly impossible task. So when talking about personal data, OpenAI has a get out of jail free card—see if you can find it.
Yes, the get out of jail free card is “where feasible.” I don’t doubt OpenAI’s general wish not to cause uncomfortable violations of privacy, but the mass of data fed into any general-purpose AI is going to include lots of public data whose personal content would surprise many people, whether purchases or offhand public statements of private details or demographic data or you name it.
The amount of data needed to train any large-scale AI today—10 million examples minimum for a particular feature identification, according to one textbook—is so great that it would require either mass crowdsourcing (unworkable) or AI itself (circular) to filter. Unless the input data is already restricted to information that couldn’t possibly violate privacy—information not tied to any individual in the first place—AI creators can’t give assurance of a lack of sensitive information in the training data.
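A back-of-envelope version of that impossibility claim (the review time and working year are my assumptions; the 10 million figure is the textbook minimum cited above):

```python
# Rough arithmetic on hand-filtering training data. The per-example
# review time and the reviewer-year are assumed, not measured.
EXAMPLES = 10_000_000                  # textbook minimum for one feature
SECONDS_PER_REVIEW = 30                # assumed time to vet one example
WORK_SECONDS_PER_YEAR = 2_000 * 3_600  # one full-time reviewer-year

person_years = EXAMPLES * SECONDS_PER_REVIEW / WORK_SECONDS_PER_YEAR
print(f"{person_years:.0f} person-years per feature")
```

That works out to roughly 42 person-years for a single feature at the textbook minimum, and the full corpora behind models like ChatGPT are orders of magnitude larger still.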
What they can do, however, is be very careful about what the system releases. They can train the AI not to output statements that look personal. They won’t succeed 100 percent of the time, but they’ll at least prevent a fair amount of what’s inside the system from escaping. This is a mixed blessing, but it is better than nothing.
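The crudest version of that output-side approach is pattern-based scrubbing. A minimal sketch (my illustration, not OpenAI’s actual method, which is far more involved):

```python
import re

# Scrub text that *looks* personal before it reaches the user.
# Patterns like these catch common formats but, as noted above,
# can never catch everything -- hence "minimize," not "eliminate."
PII_PATTERNS = [
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),            # US SSN format
    (re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"), "[EMAIL]"),
    (re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"), "[PHONE]"),
]

def scrub(text: str) -> str:
    for pattern, label in PII_PATTERNS:
        text = pattern.sub(label, text)
    return text

print(scrub("Reach Jane at jane.doe@example.com or 555-867-5309."))
# -> Reach Jane at [EMAIL] or [PHONE].
```

Note what slips through: the name “Jane” survives, because a name carries no identifiable format. That is the mixed blessing in miniature.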
Finally there is the matter of factual accuracy. Once again, OpenAI hauls out a relative percentage.
“40% more likely” is an ironic statistic to use because the better GPT-3.5 was, the less impressive GPT-4 is. Here OpenAI links to their GPT-4 System Card, which shows the improvement to be substantial—but also that in adversarial questions, even a fine-tuned GPT-4 (with human-intervention tuning included) only reaches 60% accuracy.
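Run the arithmetic backwards to see the irony (the ~60% adversarial figure is from the System Card; reading “40% more likely” as a simple relative gain is my interpretation, and the function name is mine):

```python
# If "40% more likely to produce factual responses" is a simple
# relative gain, the GPT-3.5 baseline falls out directly.
def implied_baseline(new_accuracy: float, relative_gain: float) -> float:
    """Recover the old accuracy if new = old * (1 + gain)."""
    return new_accuracy / (1 + relative_gain)

# The System Card's ~60% adversarial accuracy for fine-tuned GPT-4
# would put GPT-3.5 somewhere around 43% on the same questions.
print(f"{implied_baseline(0.60, 0.40):.2f}")
```

The impressive-sounding relative gain, in other words, is partly a function of how low the starting point was.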
Note also that most of the improvement came from RLHF—reinforcement learning from human feedback. In other words, the greater part of the improvement in GPT-4 came from individual examples provided by an army of human testers—not from the initial system or training.
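To give a feel for what that human-feedback step does, here is a toy preference model of my own construction (a linear Bradley-Terry reward model on made-up two-dimensional features—nothing like OpenAI’s scale or pipeline, but the same basic idea of learning from pairwise human choices):

```python
import math

def reward(w, x):
    """Linear reward: a learned score for a candidate response."""
    return sum(wi * xi for wi, xi in zip(w, x))

def train_reward_model(pairs, dim, lr=0.1, steps=500):
    """Bradley-Terry: P(preferred beats rejected) = sigmoid(r_p - r_r)."""
    w = [0.0] * dim
    for _ in range(steps):
        for preferred, rejected in pairs:
            margin = reward(w, preferred) - reward(w, rejected)
            p = 1 / (1 + math.exp(-margin))
            grad = 1 - p  # push the margin up when the model is unsure
            for i in range(dim):
                w[i] += lr * grad * (preferred[i] - rejected[i])
    return w

# Hypothetical features: (factuality score, fluency score). In each
# pair, the human tester preferred the more factual answer.
pairs = [((0.9, 0.5), (0.2, 0.9)),
         ((0.8, 0.3), (0.3, 0.8)),
         ((0.7, 0.6), (0.1, 0.7))]
w = train_reward_model(pairs, dim=2)

# The learned reward now favors factuality when ranking new candidates.
candidates = [(0.95, 0.4), (0.3, 0.95)]
best = max(candidates, key=lambda x: reward(w, x))
print(best)
```

The point of the toy: the system’s preference for factual answers lives in those human-provided comparisons, not in anything the base model worked out on its own—which is exactly why the improvement credits the testers.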
This is the state of the art. The advances are remarkable, but they still leave the end result lacking, and in this statement, OpenAI signals several times that they don’t have quick fixes for the deficits. It is not a lack of time but a lack of ability. The path from 60% to 80% accuracy is likely to be longer than the path from 40% to 60%.
The source of that difficulty, as I write in my book Meganets, lies less in AI itself than in the training data—and more specifically in our lack of fine-grained control over this data. If we can’t perfectly control what’s going into ChatGPT and its brethren, we are very unlikely to be able to control what’s coming out of it.