So, it turns out that the internet is still functional at the neighbor’s. But as he overshares, his network is semi-unusable. Full on tragedy of the commons mode. Forgive my continued lower output. The repair visit is scheduled for Wednesday (it’s not clear why a visit is required). For concerned readers, I have fixed the illustrations in Sunday’s post to feature the correct years and number of dots for the historical control. Other than that, I am still very much in the dark on the latest controversies. Did the Olympics Virus get released? Are we all dead now?
Anyway, if anyone is still out there, have a visit to my mental workshop…
Whose Lysine is it Anyway?
I am continuing to find my accidental cloistering productive. I haven’t met any short elderly green men but I am still honing my relationship with this war’s analog to “the force,” which is the language of genetics. I have been quite delinquent in my study of the subject until now. Before Omicron, it seemed “safe” to subtract the virus’s genetic origin story entirely, and focus on the mRNA transfections. The light saber fights of genetic forensics RE the Lab Leak™ and the Furin Cleavage Site™ seemed like a distraction, or something designed to overwhelm the reader and render them powerless. “I guess the TV and government are lying to me, but it’s not like I can figure out what’s real on my own - I’ll just take the shot” being the thought silently expressed in the deep dark recesses of the Professional Managerial Class-member’s mind.
With Omicron, there suddenly was no clear opposition to the mainstream narrative - and even the mainstream narrative was suddenly scrambling for words. All at once, things were interesting, and I regretted my procrastination in researching genetics.
A caveat: Most genes are not scripts, however much they are made out to be in public school education. So the puzzle of the language of coding genes - the triplets that correspond to the amino acids that are used to assemble proteins - is only one aspect of the machinery of genes. The following only applies to that aspect.
It’s pretty obvious that most of the in-the-know people talking about the genetic forensics of the virus are trees-before-forest types. Calculations are forwarded for the likelihood that a sequence of codons of a given length could arrive randomly. What a bunch of nonsense. Coding genes are speaking a language that corresponds to physical objects that serve functions. So, right off the bat, we can assume reasons why patterns and resemblances would emerge non-randomly:
Functional logic, i.e. ergonomics: Why do so many of the structures that humans create resemble the shapes we can make with the underside of our hands? Because that makes it easier to handle them. It makes sense that biology would be full of such structures - wings resemble fins, tails resemble tentacles. None of this implies that cats or jellyfish were made in a lab. If proteins are themselves both a language and a structure, it makes sense that motifs would arise spontaneously without implying anything about phenotypic lineage.
Linguistic constraints: Both proteins and the mRNA ribbons that code for them are physical structures, rather than pure abstractions. Think of a roll of paper going through a player piano. There might be limitations to how closely notes can be spaced, if all notes are X, for example. For coding genes, the ribbon of mRNA is the music; the ribosome and the tRNA molecules that zipper along interpreting three-letter words (codons) are collectively the piano. Or think of verbal language. There is a limit to how much a sentence can crowd together similar sounds; these limits are collected as tongue-twisters. In fact, the capacity for consonant codons and other patterns to prompt ribosomes to stop or slip is well understood and thoroughly theorized. It is only in the language of the in-the-know people talking about the genetic forensics of SARS-CoV-2 that it suddenly no longer makes sense that coding mRNA mutations gravitate toward certain patterns to maintain or optimize harmony with the “piano.”
So, when for example Ethical Skeptic resorts to graphs and mathematical computations to deliver his proof that Omicron is a descendant of a pre-Wuhan strain of the virus, it sets off my alarms. When Thomas Jefferson writes 57 words, you don’t scramble to compute the likelihood of those words arising by chance and reverse-calculate how many years it would take for them to evolve naturally. You just figure he’s grandpa-clouding at Phillis Wheatley again.
Coding genes are speaking a language. It should be possible to discuss the logic and coherence of changes to the language in a logical and coherent way; without constant resort to the mathematics of random chance. Patterns will emerge. Moreover, if an optimal alternate version of the text lies along a gradient of likely errors, it will arise more quickly. This is all simple and obvious and, again, well-understood.
My voyage to gain fluency in the language thus consists of retreading well-worn ground. Rediscovering the obvious. The language of coding genes contains a built-in puzzle, and I have begun to develop some answers; these answers are likely several decades old. Though that doesn’t mean they haven’t gone on to be forgotten and ignored, as so many other biological discoveries concerning genes have done.
I recommend the puzzle to anyone. My partial solution, below, will be preceded by a spoiler alert. First, here is the puzzle.
There are four letters in mRNA: A, C, G, U
Every “word” (codon) in coding mRNA consists of three letters.
This means there are 64 possible codons.
However, there are only 21 actual words. Most letter-combinations are redundant.
Typically, if a letter of a codon is redundant, it is the third one. The theory goes that the “wobbly” tRNA molecules that transport the amino acids corresponding to words have a lower resolution for the third letter.
Typically, if the third letter is relevant, it will still only mean two things (whereas when the third letter is redundant, all four letters mean the same thing). So if the start of a codon is CC, the tRNA and ribosome can essentially “check out” before seeing what follows. If the start of a codon is AA, the third letter must be read to distinguish between the words. The words are the 20 amino acids, which are themselves represented by letters, or the stop instruction. But, again, typically, in the situation of the third letter mattering, it only matters whether it is A/G or C/U.
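For anyone who would rather let a computer do the bookkeeping, here is a minimal Python sketch of the counts described so far. The `AMINO` string is the standard genetic code written out in U, C, A, G order with `*` for stop; the names `BASES`, `AMINO`, `CODE`, and `classify` are just labels chosen for this illustration. It only sorts the sixteen two-letter prefixes by how the third letter behaves, and says nothing about why.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon, with the first codon letter
# varying slowest and the third fastest (U, C, A, G order). "*" marks a stop.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify(prefix):
    """Describe how the third letter behaves after a two-letter prefix."""
    meaning = {b: CODE[prefix + b] for b in BASES}
    if len(set(meaning.values())) == 1:
        return "redundant"   # all four third letters give the same word
    if meaning["A"] == meaning["G"] and meaning["C"] == meaning["U"]:
        return "A/G vs C/U"  # only the purine/pyrimidine class matters
    return "other"           # anything that fits neither pattern


boxes = {a + b: classify(a + b) for a, b in product(BASES, repeat=2)}
tally = Counter(boxes.values())

print(len(CODE), "codons,", len(set(CODE.values())), "words")
for kind, n in tally.items():
    print(kind, n, sorted(p for p, k in boxes.items() if k == kind))
```

Run as written, this should report 64 codons and 21 words, with CC landing in the "redundant" group and AA in the "A/G vs C/U" group, matching the two examples above.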
A and G are purines. C and U are pyrimidines. Molecularly, A and G look like “oo” and C and U look like “o.” So, you can kind of get that there is a hidden binary code within the four-digit A / C / G / U code, and that our alphabetical ordering distorts that. Additionally, genetic mutations are sometimes more likely to substitute a given “oo” for its twin “oo” and a “o” for its twin “o” (this is called transition) than to swap a “o” for a “oo” or vice-versa (this is called transversion), even though any given letter only has one twin and two non-twins. Whether this likelihood is an artifact of coding RNA not storing as much meaning in transitions (changing between AAA and AAG still means the same thing), or whether that stability is a safety-feature built around the likelihood, has probably been well-theorized and researched.
Sometimes the “oo” / “o” binary for the third letter is flouted. Namely, if the first two letters are AU or UG, the meanings for an “oo” in the third position are no longer self-synonymous. And both AUoo and UGoo are involved in starting or stopping, interestingly - though stopping is also expressed by UAoo.
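The "how much meaning is stored in transitions" half of that question can at least be counted. The following is a rough Python sketch, assuming the standard genetic code (written as the string `AMINO` in U, C, A, G order, `*` for stop); `is_transition` and `tally` are names made up for this illustration. It counts, for each letter position, how many single-letter swaps leave the word unchanged. It says nothing about how often either kind of mutation actually happens in a real genome.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in U, C, A, G order; "*" marks a stop.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
PURINES = set("AG")


def is_transition(x, y):
    """True when x and y are different letters of the same class (A<->G, C<->U)."""
    return x != y and (x in PURINES) == (y in PURINES)


def tally(position):
    """Count single-letter swaps at one codon position (0, 1 or 2).

    Returns {"transition": [unchanged, total],
             "transversion": [unchanged, total]}.
    """
    counts = {"transition": [0, 0], "transversion": [0, 0]}
    for codon, new in product(CODE, BASES):
        old = codon[position]
        if new == old:
            continue
        mutant = codon[:position] + new + codon[position + 1:]
        kind = "transition" if is_transition(old, new) else "transversion"
        counts[kind][0] += CODE[mutant] == CODE[codon]  # True counts as 1
        counts[kind][1] += 1
    return counts


for pos in range(3):
    print("letter", pos + 1, tally(pos))
```

For the third letter, this should come out to 60 of 64 transitions and 68 of 128 transversions leaving the word unchanged, which is at least consistent with transitions carrying less meaning there.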
This mess captures how strange the puzzle appears when viewed through normal alphabetical and sequential logic:
So, there is the puzzle. Why is the third letter usually redundant, but not always? Why are some words (R, S, L, and stop) expressed by multiple two-letter precedents, which only adds to how many two-letter precedents have to be shared between words (if UUoo did not also mean L, then UU could code for F exclusively)? What effects does this insanely ancient design have on mutation dynamics - which words can turn into others, and how likely are such changes to be stable or disruptive?
The mutational dynamics side is quite elaborate, but also intriguingly stable. Most words can only turn into a few others, and many changes are superfluous. Only one of the amino acids above - Glycine - represents a “naked” lego block. The implicit signal to noise ratio of a protein is incredibly low. We can infer this, for example, by the hundreds of mutations dispersed across the multiple variants of SARS-CoV-2: You can change a lot without changing anything.
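The "which words can turn into which" question lends itself to brute force. Below is a small Python sketch, again assuming the standard genetic code (the `AMINO` string, U, C, A, G order, `*` for stop), with `neighbors`, `reachable`, and `silent` as illustrative names. It lists, for every word, the other words reachable by changing exactly one letter, and counts how many of all possible one-letter changes are silent. It treats every change as equally likely, which real mutation does not.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in U, C, A, G order; "*" marks a stop.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def neighbors(codon):
    """All nine codons that differ from `codon` by exactly one letter."""
    return [codon[:i] + b + codon[i + 1:]
            for i in range(3) for b in BASES if b != codon[i]]


# For each word, the set of *other* words one letter-change away.
reachable = {word: set() for word in set(CODE.values())}
for codon, word in CODE.items():
    for mutant in neighbors(codon):
        if CODE[mutant] != word:
            reachable[word].add(CODE[mutant])

# One-letter changes that do not change the word at all.
silent = sum(CODE[m] == CODE[c] for c in CODE for m in neighbors(c))
total = 9 * len(CODE)

for word in sorted(reachable, key=lambda w: len(reachable[w])):
    print(word, "->", "".join(sorted(reachable[word])))
print(silent, "of", total, "one-letter changes are silent")
```

As a spot check, W (written only as UGG) should be able to reach R, G, S, L, C, and stop, and nothing else.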
But merely by writing the code down and playing with the elements of the puzzle described above, you can infer quite a bit about the design of the piano. Again, this has all likely been figured out decades ago; but the joy of the puzzle is working it out for yourself.
And that is where the spoiler alert comes in. Do not look at the diagram below if you want to figure out the puzzle for yourself. Even if this is only one tiny sliver of the solutions the puzzle has to offer, it is a solution more satisfyingly discovered than described. So, just stop reading this post here.
..
..
..
..
..
..
..
..
A purine in the second letter favors more careful reading of the third. This is plausibly because it more strongly attracts the tRNA molecule (“oo” in the second position is akin to a downbeat, and only becomes more attractive if preceded by a “o”), slowing down translation. Hence why both stop precedents (UA and UG) have a purine in the second position, preceded by a pyrimidine. “-o-” second letters are fast, “-oo-” second letters are slow, “o-oo-” first-second pairs are the slowest.
A more rational diagram of the genetic code uses the second letter as the key element of design. Further, ordering by o/oo sequence allows for the rational presentation of most likely mutations for any given letter:
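The pattern itself (though not the proposed mechanism) is easy to check by counting. Here is a minimal Python sketch, assuming the standard genetic code (the `AMINO` string, U, C, A, G order, `*` for stop); `third_letter_ignored` and `summary` are names invented for this illustration. It groups the sixteen two-letter starts by whether the second letter is a purine or a pyrimidine, and reports how many in each group let the third letter be ignored entirely. Anything about attraction or translation speed remains a hypothesis that this count cannot test.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code in U, C, A, G order; "*" marks a stop.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
PURINES = set("AG")


def third_letter_ignored(prefix):
    """True when all four codons starting with `prefix` mean the same word."""
    return len({CODE[prefix + b] for b in BASES}) == 1


summary = {"pyrimidine": {"ignored": [], "read": []},
           "purine": {"ignored": [], "read": []}}
for first, second in product(BASES, repeat=2):
    second_class = "purine" if second in PURINES else "pyrimidine"
    outcome = "ignored" if third_letter_ignored(first + second) else "read"
    summary[second_class][outcome].append(first + second)

for second_class, groups in summary.items():
    print("second letter is a", second_class)
    for outcome, prefixes in groups.items():
        print("  third letter", outcome + ":", len(prefixes), prefixes)
```

With a pyrimidine in the second position, six of the eight starts should let the third letter be ignored; with a purine, only two (CG and GG) should.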
In this (edited and updated on February 17) diagram, changes within any row are more likely (they are transitions) (see further discussion in my follow-up post). You can cross-reference this with the amino acid change list for Omicron, for example, to see for yourself whether most of the alterations (Ethical Skeptic would say negative changes) are a big deal.
And now, back to the swamp-moon training session. Wipe cut please.
Elegant. You've used your offline time well. Kits are available for doing gene editing at home. That's on my list. Maybe I can make myself smarter.
In the cmRNA vaccines, Uracil is replaced 100% by Pseudouridine, an alien code, which facilitates the emergence of misfolded proteins (heat shock proteins, isomeric antibodies, dextrorotatory prions/beta sheet prions). cmRNA = chemically modified RNA or modRNA. Misfolded proteins/prions = Wilhelm Reich's T-bacilli.