The Danger of AI Comes Down to an Obvious Problem
Like robot-ified human comfort in the Pixar movie, LLM-ified human understanding is not future-proof
Note: “AI” and “LLM” are used interchangeably in this post.
Proposal for a fatal (but ultimately mundane) flaw in LLM accuracy
I tend not to be interested in topics unless I can problematize them. My interest in the current frontier of “AI” — Large Language Model text output platforms — was sparked by a gas pump infotainment video.
I believe the critical problem I perceived when watching the video must already be understood by AI experts and insiders, and ChatGPT itself claimed as much (truthfully or not) in a test conversation. However, I was not able to locate discussion of the problem in a cursory trawl of recent articles on LLM accuracy obstacles. Rather than allege that the LLM industry is trying to swindle humanity at large, I would guess that this oversight is due to more immediate-seeming accuracy issues dominating the industry spotlight.
Edit, June 20: Commenter “Orlando” highlighted a recent VentureBeat article discussing the same problem, which was modeled and described by AI researchers at the end of last month. Those researchers apply the term “Model Collapse” to what this post describes, and they predict the same obstacles to ameliorating the problem in real-world use. My follow-up reviewing that paper is linked at the bottom of this post.
i. The problem: No plan for training LLMs on tomorrow’s knowledge, yet we are replacing today’s human-based knowledge resources with LLMs
You may have seen the same gas pump infotainment video yourself by now. The premise (or something like it): “We asked an AI chatbot to recommend where to see fireworks this year.” It recommended two obvious answers and an off-the-beaten-path answer, reflecting, first, that countless rounds of human feedback had taught the neural network that two obvious answers plus an off-the-beaten-path answer make a desirable travel recommendation, and, second, that its training material contained accurate knowledge about fireworks shows.
The key problem that this example reveals is the phrase this year.
What happens when the same question is asked again next year? Or 10 years from now?
While the top two popular answers might not change for decades, a satisfactory off-the-beaten-path travel recommendation today might become stale tomorrow — relegating a currently true answer to future false status. For this reason, an AI-generated answer next year or 10 years from now, in order to retain user value, will need to be trained on future knowledge.
But all future knowledge will soon be “polluted” with AI-generated text outputs. Soon, in fact, just as the internet severely atrophied local news, LLMs will replace and thereby disengage organic, human knowledge-generation and distribution in broader (i.e., national and global) public and private spheres. To keep the example centered in the US: however pointless-seeming much of American public and private office work is, the switch to using LLMs to do the (pointless-seeming) work of “generating emails and presentations” means that specific, individual humans in America will no longer be doing that work.
AI will both ameliorate currently atrophied, highly local human information-dispersal systems (such as by making local news more accessible) and, simultaneously, expand the atrophy of human information dispersal into broader markets and systems where, up to now, it has still been commercially viable to pay humans to know things and put them into emails.
Just who will be writing the future travel guides that are used to generate a prompted suggestion of an off-the-beaten-path fireworks show? There will be no “who” if all the humans currently employed in writing travel guides are replaced by AI. Let us suppose, more realistically, that the future is a blend of replacement and supplement, with some travel writing merely AI-assisted. Consulting an LLM to help write up an in-person visit to a new attraction may, if anything, improve accuracy, by prompting a clearer and fuller report of the experience.
However, economic incentives will clearly promote the multiplication of purely AI-generated content about the same new attractions over AI-assisted or wholly human-generated content, magnifying the task of distinguishing future “hallucinations” from authentic human input in updated learning sets going forward.
This problem will manifest in different ways in different systems. Travel guides might become inaccurate and useless for reflecting current prevailing experiences; customer service might largely degenerate to AI trying to explain the justification of AI-developed policies and products that bear little human value and are evaluated on AI-generated surveys; while more guild-like institutions of imperfect knowledge and rampant measurement (medicine, sports) might remain only as flawed as they are, or greatly improve due to the expanded realization of unperceived complex truths.
AI will expand the atrophy of human knowledge-distribution as far as it can be tolerated, and in so doing degrade the ability to verify that future training material meets any accuracy standards. No one will be able to check “written knowledge” for accuracy in fields where human knowledge-distribution is atrophied; in these realms, AI-generated text will simply have to be accepted as the only authority that exists.
The difference between current accuracy problems and Iterative LLM-inaccuracy
Currently prioritized accuracy problems center on the tendency of the language generator to produce right-sounding text that factually contradicts the training text. The training text can contain Roger Federer and Roger Federer’s date of birth, but those conditions alone aren’t sufficient to ensure that the generator never produces a sentence saying Roger Federer was born in 1563. This problem may be fatal for LLMs in high-stakes information systems, or at least prevent displacement of the human overseers who maintain active knowledge bases. But the bigger danger is that this problem gets fixed, lulling human users into a sense of reliability and functionality that is propped up not only by jury-rigged anti-hallucination heuristics but by the absence of previously AI-generated text in the training text, an absence that will become impossible to maintain as soon as LLMs are widely used in any given system.
Current accuracy hurdle:
Force the text generator to report its learning material accurately.
Future accuracy hurdle:
Learning material becomes increasingly text-generated over time, with no way to measure its accuracy.
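To make the compounding concrete, here is a minimal Python sketch of the dynamic described above. Every rate and share in it (human error rate, hallucination rate, staleness rate, and the fraction of each year’s training text that is itself LLM-generated) is an illustrative assumption, not a measurement; the only point is that once model output feeds back into the training corpus, errors accumulate across retraining generations instead of washing out.

```python
# Toy model of "Iterative LLM-inaccuracy" (a.k.a. model collapse).
# All rates below are illustrative assumptions, not measurements.

HUMAN_ERROR = 0.05     # assumed error rate of purely human-written text
HALLUCINATION = 0.03   # assumed new errors a model adds to what it repeats
STALENESS = 0.02       # assumed share of facts that quietly go stale each year
YEARS = 10

def simulate(llm_share: float, llm_share_growth: float) -> list[float]:
    """Return the model's error rate for each yearly retraining."""
    error = HUMAN_ERROR          # generation 0: trained on human text only
    history = [error]
    for _ in range(YEARS):
        # LLM-sourced training text repeats last year's errors, adds new
        # hallucinations, and misses facts that went stale; human-sourced
        # text stays at its (imperfect) baseline.
        llm_error = min(1.0, error + HALLUCINATION + STALENESS)
        error = llm_share * llm_error + (1 - llm_share) * HUMAN_ERROR
        history.append(error)
        llm_share = min(1.0, llm_share + llm_share_growth)
    return history

if __name__ == "__main__":
    for label, share, growth in [("mostly human corpus", 0.10, 0.00),
                                 ("LLMs displace writers", 0.50, 0.05)]:
        errors = simulate(share, growth)
        print(f"{label:>22}: " + " ".join(f"{e:.2f}" for e in errors))
```

Under these toy assumptions, the first scenario plateaus just above the human baseline, while the second climbs steadily toward pure noise. None of the numbers are realistic; the direction of the feedback is the argument of this post.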
ii. Example: The current conditions of the learning text, per ChatGPT
As noted above, I believe this problem is so obvious that it must already be understood within the industry, and must be discussed somewhere, even if it does not surface in recent media coverage of accuracy problems. (Again, see the bottom of this post for my follow-up reviewing the recent paper by AI researchers that discusses this problem.)
Since I do not have industry knowledge-havers at my disposal, I asked ChatGPT to describe how the problem is currently understood. The response reveals little about ChatGPT’s current training sources other than that the cutoff is September 2021:
What types of text sources does ChatGPT use to build knowledge?
ChatGPT, based on the GPT-3.5 architecture, does not have direct access to its training data or know where it came from. However, I can tell you that the model has been trained on a diverse range of internet text sources, including books, articles, websites, and other publicly available written material. These sources cover a wide array of topics and domains, allowing the model to acquire knowledge on various subjects. It's important to note that while ChatGPT strives to provide accurate and up-to-date information, it's possible that some of the details or specific facts may be outdated or incomplete, as my training only goes up until September 2021.
When asked about excluding future AI-generated text from updated training material, ChatGPT provided a response that sounded hallucinatory — it could tell I was proposing a problem with AI-generated text feeding the training material, and constructed plausible countermeasures.
These plausible countermeasures were described as being “typically in place,” even though the problem of widespread LLM-generated content does not yet exist in the training material currently in use; but this error could have resulted from constructing a response out of discussions of the future problem (if such discussions exist in the training material).
Since ChatGPT is prohibited from citing its own training material, it was necessarily evasive when asked to prove that its answer was true (i.e., that the training text actually contained anything about “filtering out AI-generated text”).
Suppose a decision is later made to include content from a date later than 2022. Do you know if there is a specific requirement or plan for preventing trained-content, i.e. AI-generated text, from entering your training data? I know you do not know the sources, but you may have knowledge of the general strategy for sourcing.
As an AI language model, I don't have access to information about specific decisions or plans made by OpenAI regarding the inclusion of content from a date later than my knowledge cutoff in September 2021. However, I can provide some general information about data handling practices.
To mitigate the risk of AI-generated text contaminating training data, there are typically measures in place to prevent such content from being included. These measures may involve careful data selection and preprocessing steps, which can include filtering out AI-generated text or taking steps to identify and remove it. Additionally, human reviewers and content curators play an important role in the training process, helping to ensure the quality and relevance of the training data.
[…]
Are you able to point me to any sources discussing the hypothetical topic you mentioned, "These measures may involve careful data selection and preprocessing steps, which can include filtering out AI-generated text or taking steps to identify and remove it."
[Short answer: No. Go look it up yourself.]
Implications: Iterative LLM-inaccuracy is inescapable
It goes almost without saying that “filtering out” AI-generated text from future training materials will necessarily be impossible on any source-based or stylistic basis.
Regarding sources, no verifiably LLM-free knowledge-generation systems exist any longer, as individual humans within current, organic systems can already enlist ChatGPT to produce written work.
Regarding stylistic detection, LLM neural networks are trained by receiving positive feedback from humans for sounding like humans, not for sounding like AIs. A neural network trained to look for AI-generated text would by definition have to receive positive feedback for identifying the same thing, i.e., text that sounds like a human wrote it. This paradox is as inescapable as it is banal. Resorting to deliberate inefficiencies in human communication (e.g., coded language) will fail for the same reason people litter at a beach (because it is efficient), and it wouldn’t fix the problem with LLM accuracy anyway, since deliberately inefficient language couldn’t be introduced into the training material without training the LLM to mimic the code. No matter what is attempted, the LLM, free of human flaws and insecurities but trained simply on positive feedback from human readers, will always sound “more human than human.”
Filtering would therefore have to be accuracy-based. This, however, raises at least two problems.
First, such an approach can effectively “seal” written knowledge up to the present to preserve it from iterative LLM-inaccuracy (by discarding any future text that contradicts current training material), but there will be no proven-LLM-free reference against which to verify text descriptions of future facts.
Second, “sealing” knowledge of the past is undesirable, as (human-generated) text descriptions of facts up to the present are full of inaccuracies; it is always the work of future knowledge-generators to correct the past.
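A minimal sketch of the first problem, using an invented corpus and hypothetical function names: an accuracy-based filter can only check candidate text against the material it already trusts, so it can reject contradictions of sealed facts but has nothing to check genuinely new facts against.

```python
# Hypothetical sketch of an accuracy-based filter that "seals" knowledge
# at a training cutoff. The corpus and names are invented for illustration.

SEALED_CORPUS = {
    # (subject, attribute) -> value trusted as of the cutoff
    ("roger federer", "year of birth"): "1981",
    ("best fireworks near smallville", "2023"): "riverside park",
}

def filter_candidate(subject: str, attribute: str, claimed_value: str) -> str:
    """Classify a candidate training claim against the sealed corpus."""
    key = (subject.lower(), attribute.lower())
    if key not in SEALED_CORPUS:
        # A claim about anything after the cutoff: there is no LLM-free
        # reference to check it against, so its accuracy cannot be measured.
        return "unverifiable"
    return "keep" if SEALED_CORPUS[key] == claimed_value else "discard"

print(filter_candidate("Roger Federer", "year of birth", "1563"))             # discard
print(filter_candidate("Best fireworks near Smallville", "2024", "anywhere")) # unverifiable
```

Everything the filter can confidently keep is, by construction, already known; everything new lands in the “unverifiable” bucket, which is exactly where future LLM-generated inaccuracies will live.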
Downvoting to the rescue?
Is it not trivial for future users simply to downvote their way out of Iterative LLM-inaccuracies? If readers given a travel recommendation are displeased with the result, they can submit a negative review.
However, this would not shift incentives away from relying on LLM-generated inaccurate information. The user of the LLM’s output is the content generator seeking readers (who are themselves seeking travel recommendations), and for that generator accuracy is not necessarily an asset. This disconnect between media-consumer incentive gradients and the idealistic value of “truth” is already a rampant problem, not just with bot-generated web content but increasingly with organic, human-generated content, where posing in front of a camera and giving naïve, useless advice has recently proven to be a whole industry. Again, AI would only be expanding the atrophy of human knowledge-distribution wrought by the internet, here into formal publications.
So while “downvoting” seems like the obvious program to implement to solve this problem, it can be predicted to have little corrective force, as surely as it doesn’t stop nonsense advice from dominating TikTok feeds.
The largest and most successful crowdsourced knowledge resource, Wikipedia, depends heavily on precisely the same human, organic knowledge-distribution systems (schools, books, industry experience) that AI is currently spring-loaded to invade; and this same site must in practice constantly be insulated from, not over-exposed to, popular input, as when specific pages are locked whenever their topics attract widespread attention.
Other successful applications of downvoting likewise depend on selecting niche users who are highly interested in, and therefore knowledgeable about, specific topics. Anywhere an LLM can find an application, therefore, it will atrophy the same human knowledge bases that would provide a basis for crowdsourcing future corrections to inaccuracies.
iii. What’s the difference between human iterative inaccuracy and Iterative LLM-inaccuracy?
As acknowledged above, even current, LLM-free human knowledge is already rife with inaccurate beliefs. Suppose updated LLM training material comes to contain LLM-generated inaccuracies, producing a constant game of “telephone” within the LLM knowledge base of any given system and degrading overall accuracy from the baseline achieved by the first generation. How is this any different from human knowledge systems? Don’t we already live in a post-truth world; have we ever really not done so?
The first answer to this question is, why should we want to find out? The current status quo seems to work well enough to keep the lights on, even if much of the time this seems like an impossible feat. The price of understanding specifically why Iterative LLM-inaccuracy is more detrimental to economic and social function than human iterative inaccuracy would be, obviously, detriment to economic and social function.
A second answer is that trading one condition of epistemic helplessness for another is inherently unpalatable. Maybe knowledge is largely impossible; but in practice we feel that we know and understand most of the things we work on from day-to-day. Even if that feeling could persist in a future where we have no idea whether the surrogate knowledge-base in the latest LLM training docket is less accurate than a human-generated feed would be, it is uncomfortable to realize that we really, probably will not know if we are worse off.
It might be preferable that the detriment is obvious, even if that won’t do much to fix the incentives that make rebuilding human knowledge bases in invaded systems impossible.
However, the third and I think most likely answer is that AI knowledge-systems must iteratively degrade into inaccurate, untethered nonsense in the full course of time, and that rebuilding the human knowledge-systems we discard today will be economically and politically impossible as long as the internet still exists.
LLM-fed LLM training data will be irrevocably “mutagenic,” with iterative inaccuracies regarding future events growing more quickly than they can be replaced, and with decreased discovery of inaccuracies in the human knowledge base that preceded the advent of LLMs.
In nature, the force that “proves” the fidelity of copied genetic information is survival — no gene can reproduce that does not function in reality. In human information systems, there is no such force, but information is “proved” by experience with the outside world, leading — when an individual holding a given collective belief is “mugged by reality” — to perpetual corrections. When these systems are displaced by LLM-generated knowledge bases, with not reality but consumer demand as the only force available to rate the output, the mean of human knowledge will eventually consist of the equivalent of a TikTok about sniffing chocolate to lose weight.
Appendix: Reviewing Shumailov et al. on “Model Collapse,” a.k.a. Iterative LLM-inaccuracy
If you derived value from this post, please drop a few coins in your fact-barista’s tip jar.
I think this is starting to quietly hit the mainstream. Researchers found that AI model collapse happens incredibly quickly once they start training on AI-generated content.
https://venturebeat.com/ai/the-ai-feedback-loop-researchers-warn-of-model-collapse-as-ai-trains-on-ai-generated-content/
Maybe the most lucrative use of LLMs would be cranking out trash novels.