Evolution is increasing randomness of nucleotides in
coding sequences
Fumihiko Takeuchi, Kenji Yamamoto
Research Institute, International Medical Center of Japan,
162-8655, Japan
Abstract.
The order of nucleotides in genomes is fairly, but not completely,
random. Several statistical regularities are known, for example,
repetition with periodicity three. The nucleotides cannot be
completely "random", because they encode a vast amount of
information (for development, body function, sex, etc.),
and each coding sequence encodes a protein with a specific
function. Our purpose here is to compare this randomness of
nucleotides among species from the viewpoint of evolution. We
measure randomness by the difference between two kinds
of amino acid frequencies. The real frequency of an amino
acid in a coding sequence is its frequency in the translated
protein. The theoretical frequency of an amino acid in a
coding sequence is the expected frequency calculated from the ratio
of nucleotides.
If the nucleotides were in random order in the coding
sequences, the real and theoretical frequencies should coincide for
each amino acid. Nonrandomness is the discrepancy between the
real and theoretical frequencies; randomness is the
coincidence between them. We have
analyzed the real and theoretical frequencies for 27 species and
computed an index of nonrandomness. Eukaryotes have
smaller nonrandomness (i.e., are more random) than
Prokaryotes. However, among Prokaryotes, the two
subgroups Archaea and Bacteria seem to
be indistinguishable by this nonrandomness. It can also be seen
that GC content does not affect nonrandomness.
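
The two kinds of frequency defined in the abstract can be illustrated with a minimal Python sketch. Several points here are assumptions made for illustration only and are not taken from the paper: the standard genetic code is used, stop codons are excluded and the remaining probabilities are renormalized, and the function names (`real_frequencies`, `theoretical_frequencies`, `discrepancy`) are hypothetical. In particular, the summed absolute difference used below is only a stand-in for a discrepancy measure; the abstract does not define the index of nonrandomness, and this sketch does not claim to reproduce it.

```python
from collections import Counter
from itertools import product

# Standard genetic code (assumed), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def real_frequencies(cds):
    """Amino acid frequencies in the protein translated from `cds`."""
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(CODON_TABLE[c] for c in codons)
    counts.pop("*", None)  # assumption: stop codons are not counted
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()}


def theoretical_frequencies(cds):
    """Expected amino acid frequencies if nucleotides were in random order.

    Each codon's probability is the product of its three base frequencies;
    an amino acid's probability is the sum over its codons.
    """
    base_freq = {b: cds.count(b) / len(cds) for b in BASES}
    expected = Counter()
    for codon, aa in CODON_TABLE.items():
        if aa != "*":  # assumption: drop stop codons, renormalize below
            p = base_freq[codon[0]] * base_freq[codon[1]] * base_freq[codon[2]]
            expected[aa] += p
    total = sum(expected.values())
    return {aa: p / total for aa, p in expected.items()}


def discrepancy(cds):
    """Illustrative stand-in: sum of |real - theoretical| over amino acids."""
    real = real_frequencies(cds)
    theo = theoretical_frequencies(cds)
    return sum(abs(real.get(aa, 0.0) - theo.get(aa, 0.0))
               for aa in set(real) | set(theo))


if __name__ == "__main__":
    example = "ATGGCCAAATTTGGG"  # toy sequence, not real data
    print(real_frequencies(example))
    print(discrepancy(example))
```

Under these assumptions, a sequence whose codons are drawn independently from its own base composition would give a discrepancy near zero for long sequences, while constraints on the encoded protein push the value up.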