Edit: I have fixed the wantonly inaccurate use of the word “codon” as originally posted. Forgive the error; besides my normal looseness with terminology, I am not at my best when working outside in the cold!
Wednesday has come and gone, and still no internet. At least the rain has stopped, so I can go back to using the neighbor’s internet by sitting outside at night. That should allow for a topical post tomorrow: something to show for the month so far besides the “OAS is not real” rants and last week’s dispatches from exile.
Meanwhile… one more dispatch from exile-slash-crackpot-corner.
The following will not make sense without reading Monday’s post first.
I won’t go into too much detail about why I alternated the order of the first-codon-letter mapping in my debut of the CUGA chart. Various intuitions led me to think “o” should be above “oo” when before “o,” and I knew for sure that “o” should be below “oo” when before “oo.” A critic could accuse me of intentionally trying to obscure some otherwise obvious first-letter patterns, namely that A-x-x and U-x-x are high resolution (correspond to third-letter reads) just as often as oo-oo and o-oo. This becomes super-duper clear when a consistent first-letter mapping scheme is used. So, here is upside-down fractal CUGA (Edit, May 13: A and G have now been swapped, though later versions in this post retain the original for now):
Here, the first-letter mapping is the reverse of the second-letter mapping, yet because it is internally consistent, it makes comparison between our four camps much easier. It also makes mutations much easier to reference, so let’s examine that first.
For any given word, third-letter mutations stay within the same spot (and any that change the word are transversions, i.e. switches between “o” and “oo,” except for the “oo”-space I/M and stop/W mutations); first-letter mutations stay within the same quadrant; and second-letter mutations land on the “same position in other quadrants.” For either first- or second-letter mutations, the mutation in the same row is the transition and the two mutations in the other row are the transversions. This allows the user to visualize the “available moves” in the game of one-letter-at-a-time mutations. If the current word spells “E,” then the possible mutations are:
So, CUGA turns genetics into a board game. This should make the various novelties of SARS-CoV-2 variants seem less exciting. E turned into K. Well, that was one of the seven “simple” moves (not requiring a well-placed three-letter deletion, which can turn two adjacent words into an unrelated new, single word) available to it; so why not? E can change into seven other words, one of which is the likely fatal “stop” command.
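For readers who would rather check the board mechanically, here is a minimal Python sketch. It assumes the standard genetic code (NCBI translation table 1) and reads “o” as the one-ring letters (C, U) and “oo” as the two-ring letters (A, G); the helper names are illustrative, not from any library. It lists the moves open to E and searches for third-letter transitions that change the word, which should turn up only the I/M and stop/W pairs.

```python
from itertools import product

# Assumption: the standard genetic code (NCBI translation table 1).
# Letters are enumerated in the order U, C, A, G; "*" marks a stop codon.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# Assumption: "oo" = two-ring letters (A, G); "o" = one-ring letters (C, U).
TWO_RING = set("AG")


def kind(old: str, new: str) -> str:
    """Transition if both letters sit on the same side of the o/oo divide."""
    return "transition" if (old in TWO_RING) == (new in TWO_RING) else "transversion"


def point_mutants(codon: str):
    """Yield (position, mutant_codon, kind) for all nine single-letter changes."""
    for i, old in enumerate(codon):
        for new in BASES:
            if new != old:
                yield i + 1, codon[:i] + new + codon[i + 1:], kind(old, new)


def moves(aa: str) -> set:
    """Other words reachable in one substitution from any codon spelling `aa`."""
    reachable = {
        CODE[mutant]
        for codon, word in CODE.items()
        if word == aa
        for _, mutant, _ in point_mutants(codon)
    }
    return reachable - {aa}


def meaning_changing_third_letter_transitions() -> list:
    """Third-letter transitions (o<->o or oo<->oo) that change the word."""
    return sorted(
        (codon, mutant)
        for codon in CODE
        for position, mutant, k in point_mutants(codon)
        if position == 3 and k == "transition"
        and CODE[codon] != CODE[mutant] and codon < mutant
    )


print(sorted(moves("E")))  # ['*', 'A', 'D', 'G', 'K', 'Q', 'V']
print(meaning_changing_third_letter_transitions())  # [('AUA', 'AUG'), ('UGA', 'UGG')]
```

The seven entries for E match the seven “simple” moves, stop included; the second line is the I/M and stop/W exception and nothing else.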
Enough on mutations for now. Back to comparing our camps. Clearly, second-letter U and G are more similar than presented on Monday. In fact, to say that (second letter) o and oo are “different” is to elide the fact that for two of the four letters (U and G), o and oo are the same. So why base a theory of everything on (second letter) o/oo rather than (first letter) A/U vs C/G? Well, reasons.
First of all, the theory of everything must account for compromises. If -oo- does indeed increase read accuracy for the third letter, there will still be other “considerations” that create exceptions to the order, particularly if we imagine that in some primordial era there were only two, four or sixteen “words,” and that the ability to store multiple words under the same first-two-letter set was a gradual, ad-hoc development. We could expect accidental patterns alongside meaningful ones. Either theory could assert that the other is an accident of such compromises; or both could be accidents; or the pattern could be multidimensional and allow for both theories. I have the benefit of not knowing all the excellent organic-chemical arguments that likely exist for the latter.
At all events, there were, as I said, strong intuitions at work, and patterns which had already started to make themselves clear but which I did not describe or illustrate at the time. So, drum-roll, here is a first draft of upside-down fractal CUGA with annotations. It is still a draft because I didn’t open enough tabs the other day to allow for extensive cross-reference of the amino acid properties listed by Wikipedia, or to incorporate “fold clades,” i.e. which amino acids are typically found together in common protein structures. So I wasn’t really able to illustrate anything meaningful about proline, for example, even though it is a very interesting creature. But merely by adding either molecular weight or usage frequency, it should become really clear why the first-letter mapping is the reverse of the second-letter mapping.
Here, with molecular mass:
Light amino acids at the top; heavy amino acids at the bottom, again alongside the potentially hazardous stop codons. Again U is being a bit transgressive, but now at least G is being a bit more like A than C, and the row signal is stronger. The “o-oo-” row can be thought of as the “rare earth” amino acids. Just how rare? Let’s swap mass for usage percentages. Asterisks indicate amino acids that are spelled by multiple prefixes (again, apologies for the extremely limited cross-referencing of the Wikipedia values):
Bearing in mind the mutation scheme, we can see right off the bat that nothing is more radical than a second-letter transversion. Changing the second letter from U (o) to A or G (oo) will often reverse every physical property at the same time (if it doesn’t land on a stop codon), whereas changing the first letter keeps the amino acid in “hydrophobia corner,” and a transition on the second letter (U to C) represents a loss of hydrophobia but a smaller alteration otherwise.
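A sketch of “hydrophobia corner” under the same assumption of the standard code (the helper names `column` and `moves_at` are hypothetical): grouping words by second letter shows that every second-letter-U codon spells F, L, I, M or V, so a first-letter change from one of them can only land on another of those five, while a second-letter change reaches the C, A or G columns, stop codons included.

```python
from itertools import product

# Assumption: standard genetic code (NCBI table 1), letters ordered U, C, A, G.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def column(second: str) -> set:
    """Every word spelled by a codon with the given second letter."""
    return {aa for codon, aa in CODE.items() if codon[1] == second}


def moves_at(codon: str, position: int) -> set:
    """Words reachable by changing only the letter at `position` (1, 2 or 3)."""
    i = position - 1
    return {CODE[codon[:i] + b + codon[i + 1:]] for b in BASES if b != codon[i]}


for letter in BASES:
    print(letter, sorted(column(letter)))
# U ['F', 'I', 'L', 'M', 'V']
# C ['A', 'P', 'S', 'T']
# A ['*', 'D', 'E', 'H', 'K', 'N', 'Q', 'Y']
# G ['*', 'C', 'G', 'R', 'S', 'W']

# First-letter changes from a second-letter-U codon never leave the U column.
print(all(moves_at(c, 1) <= column("U") for c in CODE if c[1] == "U"))  # True

# A second-letter change from UUA (L) can hit a stop codon.
print(sorted(moves_at("UUA", 2)))  # ['*', 'S']
```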
I have a million thoughts on the significance of these mutational contours. But the reader can probably think ahead of me to a certain extent, so I won’t drone on. The basic gist is that for any given indirect word change (requiring multiple consecutive mutations, as opposed to being possible with one mutation), the “game” must allow a contextually safe set of intermediary changes. Most indirect words are available in two changes, but some might require a change to all three letters.
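The direct/indirect distinction can be made concrete as the fewest single-letter substitutions separating any codon of one word from any codon of the other. A minimal sketch, assuming the standard code (`codons` and `distance` are illustrative names); it only counts changes and says nothing about whether the intermediate words would be tolerated.

```python
from collections import Counter
from itertools import combinations, product

# Assumption: standard genetic code (NCBI table 1), letters ordered U, C, A, G.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def codons(aa: str) -> list:
    """All codons that spell the word `aa`."""
    return [c for c, a in CODE.items() if a == aa]


def distance(aa1: str, aa2: str) -> int:
    """Fewest single-letter substitutions from any codon of aa1 to any codon of aa2."""
    return min(
        sum(x != y for x, y in zip(c1, c2))
        for c1 in codons(aa1)
        for c2 in codons(aa2)
    )


print(distance("E", "K"), distance("P", "Y"), distance("F", "Q"))  # 1 2 3

# How many pairs of the 21 words (20 amino acids plus stop) sit 1, 2 or 3 changes apart.
words = sorted(set(AMINO))
print(Counter(distance(a, b) for a, b in combinations(words, 2)))
```

F to Q (UUY versus CAR) is one of the pairs that needs all three letters changed.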
Here then, we can imagine a P playing the genetic board game, “needing” to get to Y, which is not available in a single mutation.
P is visualized as being in a state of both “-o” and “-oo” in the third letter, since it can mutate between either without a change in meaning (though potentially a change in how the mRNA molecule “performs” while being translated into a protein). With this simplification, we can visualize that P needs to perform a transition to the right corner (C to U) and a transversion to the bottom quad (C to A), and the intermediary word is defined by which order it “chooses” for that attack:
P can either get to Y via these two moves, or take some convoluted long route. The point is that CUGA lets us visualize that S might be tolerated within the current structure of the protein, but H might not be (you don’t want to suddenly start trapping metal in random places!). Perhaps S might even be a medium-optimal change, in which case the arrival at Y is only a matter of time.
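The two candidate routes can be enumerated by taking every P codon and Y codon that differ in exactly two letters and applying one of the two changes at a time. A sketch, assuming the standard code and considering only the shortest routes (the “convoluted long route” is ignored); `two_step_intermediaries` is a hypothetical helper.

```python
from itertools import product

# Assumption: standard genetic code (NCBI table 1), letters ordered U, C, A, G.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def hamming(a: str, b: str) -> int:
    """Number of positions at which two codons differ."""
    return sum(x != y for x, y in zip(a, b))


def two_step_intermediaries(src: str, dst: str) -> set:
    """Words passed through on every two-substitution route from src to dst."""
    found = set()
    for c1, a in CODE.items():
        for c2, b in CODE.items():
            if a == src and b == dst and hamming(c1, c2) == 2:
                for i in range(3):
                    if c1[i] != c2[i]:
                        # Apply only one of the two required changes.
                        found.add(CODE[c1[:i] + c2[i] + c1[i + 1:]])
    return found


print(sorted(two_step_intermediaries("P", "Y")))  # ['H', 'S']
```

Doing the first-letter transition first (CCY to UCY) passes through S; doing the second-letter transversion first (CCY to CAY) passes through H.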
As a final tangent, CUGA also allows for visualization of how the two multiple-first-letter, same second-letter words (R and L) can be thought of as existing in a cloud of potential mutations. L and R can both safely drift between their different forms without changing meaning, allowing for a constant rotation of potential changes. Overlapping potential changes thus calibrate likely potential changes. For L, V is always possible, the other mutations only sometimes. Is this some form of genetic mutation calibration? Again, the potential dynamics at play are all part of the mystery of the puzzle:
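The “always possible” versus “only sometimes” distinction can be computed by intersecting the one-step neighbors of every codon that spells a word. A sketch under the same standard-code assumption, with hypothetical helper names; for L the intersection is V alone, and the union minus V gives the “sometimes” moves.

```python
from itertools import product

# Assumption: standard genetic code (NCBI table 1), letters ordered U, C, A, G.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def neighbours(codon: str) -> set:
    """Words reachable from one codon by a single-letter substitution."""
    return {
        CODE[codon[:i] + b + codon[i + 1:]]
        for i in range(3)
        for b in BASES
        if b != codon[i]
    }


def reach(aa: str):
    """Split one-step moves into those open from every codon of `aa` and the rest."""
    per_codon = [neighbours(c) - {aa} for c, a in CODE.items() if a == aa]
    always = set.intersection(*per_codon)
    sometimes = set.union(*per_codon) - always
    return always, sometimes


always, sometimes = reach("L")
print(sorted(always))  # ['V']
print(sorted(sometimes))
```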
In biology, words will often find themselves contextually near each other, and the same potentially optimal indirect co-mutations will arise again and again. The fluidity of the language (some changes might be sub-optimal but still conservative, in that the protein won’t literally break) will lead to the same most-likely paths between word set A and contextually optimal word set B.
So if, for example, R and E are adjacent, and there is some contextual advantage to changing both to W (this is likely a stupid example), forming a tryptophan tryptophylquinone bond and birthing a totally novel (to a given life-form) structure, then two separate life-forms will likely arrive at the WW set using the same set of conservative intermediary steps. (A-)R will safely 1v to (C-)R; then E will safely 2s to G, without disrupting anything because this E is part of a surface area with multiple negative charges and frequent mutational calibration; then (C-)R goes to W; and finally G goes to W. Nature will solve the same puzzle the same way, much as chess games feature recurring patterns of coordinated movement.
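The route above can be checked step by step. A sketch, assuming the standard code, reading “1v” as a first-letter transversion and “2s” as a second-letter transition, and assuming the R starts as AGG and the E as GAG, since W is spelled only by UGG; `step_label` and `describe` are hypothetical helpers.

```python
from itertools import product

# Assumption: standard genetic code (NCBI table 1), letters ordered U, C, A, G.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
TWO_RING = set("AG")  # assumed "oo" letters; C and U are "o"


def step_label(before: str, after: str) -> str:
    """Label one substitution as <position><s|v>, e.g. '1v' or '2s'."""
    changed = [i for i in range(3) if before[i] != after[i]]
    if len(changed) != 1:
        raise ValueError(f"{before} -> {after} is not a single substitution")
    i = changed[0]
    same_side = (before[i] in TWO_RING) == (after[i] in TWO_RING)
    return f"{i + 1}{'s' if same_side else 'v'}"


def describe(path: list) -> list:
    """(word_before, word_after, label) for each step along a codon path."""
    return [(CODE[a], CODE[b], step_label(a, b)) for a, b in zip(path, path[1:])]


print(describe(["AGG", "CGG", "UGG"]))  # [('R', 'R', '1v'), ('R', 'W', '1s')]
print(describe(["GAG", "GGG", "UGG"]))  # [('E', 'G', '2s'), ('G', 'W', '1v')]
```

The same labels apply to the hazard raised next: `describe(["GGA", "UGA"])` gives `[('G', '*', '1v')]`.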
But this is also a bit unsatisfactory. If arriving at indirect novel optimals requires transit through tolerated sub-optimals, there will be more reversions back to E than conversions from G(-oo) to W (especially since half of G(-oo)’s 1vs to U result in a stop codon, and likely death for the cell or offspring).
Ah, if only this “nature” thing could create some sort of being whose job, within any given genome, was to constantly innovate receptor proteins. Like, this being’s survival would depend on constantly altering the amino acids that are used to accomplish the same job of binding to a given receptor, so that regression from G back to E would result in extinction. And if, I don’t know, just spit-balling, the genes for those proteins could occasionally re-incorporate into the host’s genome, potentially delivering a novel optimal to the host.
Ah, who am I kidding? That’ll never happen…
Hey Brian. Please try to put a summary in your articles explaining the main idea of what you are claiming. For example, ‘Liquid Cancer’ was an informative title, while this one is not. Not everyone has the time to read a technical article whose argument is not clear from the title or a summary. You should also rethink who you are writing for. If it is scientists, they also need a summary; if it is people without full knowledge of the domain, then you need a summary and explanations. This is dense, and I had to skim it a couple of times before deciding whether I needed to read it. Thanks!
I should be so productive when my internet is down. 😄