Evolution is increasing randomness of nucleotides in coding sequences

Fumihiko Takeuchi, Kenji Yamamoto
Research Institute, International Medical Center of Japan, 162-8655, Japan

Abstract.

Nucleotides in genomes are ordered fairly, but not completely, at random. Several statistical regularities are known, for example a periodicity of three. The nucleotides cannot be completely "random", because they encode a vast amount of information (information for development, body function, sex, etc.), and each coding sequence encodes a protein with a specific function.
Our purpose here is to compare this randomness of nucleotides among species from an evolutionary viewpoint. We measure randomness by the difference between two kinds of amino acid frequencies. The real frequency of an amino acid in a coding sequence is its frequency in the translated protein. The theoretical frequency of an amino acid in a coding sequence is the expected frequency calculated from the ratio of nucleotides, that is, the frequency it would have if nucleotides occurred independently in the observed proportions.
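The two frequencies can be illustrated with a short sketch. It assumes the standard genetic code, a single in-frame coding sequence, nucleotide composition pooled over all codon positions, and stop codons excluded with the remaining probabilities renormalized; these choices, and the function names `real_frequencies` and `theoretical_frequencies`, are illustrative assumptions, not the authors' stated procedure.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code (NCBI table 1); codons enumerated in TCAG order
# for positions 1, 2, 3. '*' marks stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def real_frequencies(seq):
    """Amino acid frequencies in the protein translated from an in-frame CDS."""
    seq = seq.upper()
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    # Skip stop codons and codons containing non-ACGT symbols (assumption).
    aas = [CODON_TABLE[c] for c in codons
           if c in CODON_TABLE and CODON_TABLE[c] != "*"]
    counts = Counter(aas)
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}


def theoretical_frequencies(seq):
    """Expected amino acid frequencies if nucleotides were independent
    with the composition observed in the sequence."""
    seq = seq.upper()
    base_counts = Counter(b for b in seq if b in BASES)
    n = sum(base_counts.values())
    p = {b: base_counts[b] / n for b in BASES}
    expected = Counter()
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            # Probability of the codon is the product of its base probabilities.
            expected[aa] += p[codon[0]] * p[codon[1]] * p[codon[2]]
    total = sum(expected.values())  # renormalize after dropping stop codons
    return {aa: v / total for aa, v in expected.items()}
```

For example, with equal base composition every sense codon has probability 1/61 after renormalization, so leucine (six codons) gets a theoretical frequency of 6/61 regardless of what the translated protein actually contains.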
If the nucleotides in coding sequences were in random order, the real and theoretical frequencies would coincide for every amino acid. Nonrandomness is therefore the discrepancy between the real and theoretical frequencies, and randomness is their agreement. We analyzed the real and theoretical frequencies for 27 species and computed an index of nonrandomness.
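The abstract does not give the formula for the index of nonrandomness. As a hypothetical stand-in, the sketch below uses the total variation distance between the real and theoretical amino acid distributions: it is 0 when the two coincide (random order) and grows toward 1 as they diverge. The function name and the choice of distance are assumptions for illustration only.

```python
def nonrandomness_index(real, theoretical):
    """Hypothetical nonrandomness index: total variation distance between
    two amino acid frequency distributions given as {amino_acid: frequency}.
    0.0 means real and theoretical frequencies coincide; 1.0 means no overlap."""
    amino_acids = set(real) | set(theoretical)
    # Amino acids missing from one distribution count as frequency 0.
    return 0.5 * sum(abs(real.get(aa, 0.0) - theoretical.get(aa, 0.0))
                     for aa in amino_acids)
```

For instance, `nonrandomness_index({"A": 0.5, "G": 0.5}, {"A": 0.25, "G": 0.75})` gives 0.25. Other discrepancy measures (a chi-square statistic, Kullback-Leibler divergence) would fit the same description equally well.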
Eukaryotes have smaller nonrandomness (i.e., are more random) than Prokaryotes. However, among Prokaryotes, the two subgroups Archaea and Bacteria appear indistinguishable by this measure. GC content also does not appear to affect nonrandomness.