2023-02-09 Pillar II · Architecture over tooling 5 min read Industrial & Engineering

The excitement, the fallacy and the big steal: thoughts on ChatGPT

ChatGPT is a fantastic time saver, but anyone expecting genuine novelty from it is making a category error. A note on creation, originality and what these engines cannot do.

Thoughts on creation and originality in an automating world.

ChatGPT has taken center stage with much excitement. Children and adults alike have flocked to the command prompt and tried all kinds of queries. Imaginations run wild, the possibilities seem endless, and the horizon of what could be achieved looks boundless.

Watching my kids and their friends playfully use ChatGPT to answer questions of knowledge, and then watching Yusuf Mehdi present the new features that Bing unlocks in its latest AI-powered iteration, I’m struck by one big pink elephant in the room.

The example of comparing the Q3 statements of GAP Inc. and Lululemon, after first summarizing the GAP Inc. quarterly statement, is a showcase of how this new machine can be a fantastic time saver. At the same time, it is a prime example of the major limitation of ChatGPT.

After all the elation, I want to focus on some unmentioned and underexposed problems that rarely make it into the excited news articles, the lazy coder posts, or the SEO marketer pieces. Yes, your SEO can now be handled easily. Coding moves from craft to knowing which question to ask. I get it.

The fallacy

With abundant training data, how often the engine composes a wrong answer to a given question is largely a matter of quantity: more data, higher accuracy. The web contains petabytes of data, so abundance is all around us. Even so, mistakes, or “hallucinations”, are a real occurrence. Work is in progress on “hallucination” detection to catch those instances. Until then, you had better check the texts yourself.

The biggest fallacy, however, is believing you can find new insight here. My conjecture: the current iteration and paradigm of machine learning and AI are intrinsically bound to what is already known, and as such offer no entropy, no information, no surprise. Users of ChatGPT output (and that of similar engines) are utterly mistaken when they take something that reads as new to be new.

The big steal

The above conjecture is problematic in three profound ways.

First, creation. We go to school, we learn, we read what is known. Yet the capacity to generate new problem statements, ones that in turn produce new, previously unknown solutions, designs or texts, stems from our human faculty to be amazed, scared, inspired, to love, and out of that to create something new. Unsolicited. Not in a million years will ChatGPT ever create. The command prompt will wait diligently forever. The ChatGPT engine will never look with amazement upon its own writings and be inspired to start anew and create something of its own accord. It cannot be inspired. It will not be inspired by its own hallucinations. (Unfortunately? Fortunately?)

Second, plagiarism. Allowing ChatGPT and other data scavengers to dissociate the output, the “composition”, from its origin is a clear case of plagiarism. Any output of these models is created from pre-existing, published knowledge. Every country that has and honors copyright law treats mass plagiarism as unacceptable. Having Bing summarize a PDF for private use is not a problem. Having ChatGPT construct a recipe or itinerary is plain plagiarism, because the result originates from a source that goes uncredited. Where the old Bing and Google search results pointed to the origin and maintained the link, that link is now hidden.

Third, and most profound, is the risk that output generated by these engines gets reintegrated into the dataset, either directly or after being reworked by some human hand. Facebook timelines were the first to be recognized for serving self-referential ads that pulled the user ever deeper into a stale, static rabbit hole of limited variety, and TikTok takes this to the extreme. The same dynamic applies here. If the work of SEO specialists, information analysts and business analysts gets augmented and sped up by machine labor, more content will be produced by a very small number of re-hashing algorithms, potentially turning a diverse environment into a static one. The big steal is the theft of originality and surprise.
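
The feedback loop described above can be made concrete with a toy simulation. This is only a sketch under loud assumptions: the "engine" here is a hypothetical stand-in that does nothing but resample an existing corpus, every name in it (`rehash`, `simulate`, `shannon_entropy`) is invented for illustration, and it says nothing about how ChatGPT or Bing are actually trained. It merely shows what happens to variety, measured as the number of distinct ideas and as Shannon entropy, when each new generation of content is composed only from the previous one.

```python
import math
import random
from collections import Counter


def shannon_entropy(corpus):
    """Shannon entropy, in bits, of the empirical distribution of items."""
    counts = Counter(corpus)
    total = len(corpus)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def rehash(corpus, rng):
    """Hypothetical 'engine': builds a new corpus purely by resampling the old one."""
    return [rng.choice(corpus) for _ in corpus]


def simulate(num_ideas=200, generations=30, seed=0):
    """Feed each generation's output back in as the next generation's dataset."""
    rng = random.Random(seed)
    corpus = list(range(num_ideas))  # assumption: every starting item is a distinct idea
    history = [(len(set(corpus)), shannon_entropy(corpus))]
    for _ in range(generations):
        corpus = rehash(corpus, rng)
        history.append((len(set(corpus)), shannon_entropy(corpus)))
    return history


if __name__ == "__main__":
    for generation, (distinct, entropy) in enumerate(simulate()):
        if generation % 5 == 0:
            print(f"gen {generation:2d}: {distinct:3d} distinct ideas, {entropy:.2f} bits")
```

Because the resampling step can only pick from what is already there, the number of distinct ideas can never go up from one generation to the next; it can only stay level or shrink. Real engines and real authors are of course far richer than this, which is exactly the point of the argument: the variety has to come from somewhere outside the loop.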

The light

That all sounds very dark, and some of it is genuinely disconcerting. Luckily I’m also an optimist. Throughout history, humans have shown that the propensity to scale brings forth the next step in mechanization and industrialization. Maximizing output and optimizing production have, time and again, decimated the art of creating at each step. The light I have to offer: each time, the creators who understood how to turn these new capabilities to their advantage built a new normal. There are great opportunities out there, and I have started to integrate these new engines into my own projects. Part of the human condition is that it is hard not to create.
