Evolution of protein superfamilies and bacterial genome size.
J MOL BIOL
871 - 887.
We present the structural annotation of 56 different bacterial species based on the assignment of genes to 816 evolutionary superfamilies in the CATH domain structure database. These assignments have enabled us to analyse the recurrence of specific superfamilies within and across the genomes. We have selected the superfamilies that have a very broad representation and therefore appear to be universally distributed in a significant number of bacterial lineages. Occurrence profiles of these universally distributed superfamilies are compared with genome size in order to estimate the correlation between superfamily duplication and the increase in proteome size. This distinguishes between size-dependent superfamilies, whose frequency of occurrence is highly correlated with increase in genome size, and size-independent superfamilies, for which no correlation is observed. Consideration of the size correlation and the ratio between the mean and the standard deviation for all the superfamily profiles allows more detailed subdivision and classification of superfamilies. For example, within the size-independent superfamilies, we distinguished a group that is distributed evenly amongst all the genomes. Within the size-dependent superfamilies we differentiated two groups: linearly distributed and non-linearly distributed. Functional annotation using the COG database was performed for all superfamilies in each of these groups, and this revealed significant differences amongst the three sets of superfamilies. Evenly distributed, size-independent domains are shown to be involved primarily in protein translation and biosynthesis. For the size-dependent superfamilies, linearly distributed superfamilies are involved mainly in metabolism, and non-linearly distributed superfamily domains are involved principally in gene regulation. (C) 2003 Elsevier Ltd. All rights reserved.
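The classification idea described in the abstract can be illustrated with a minimal sketch. It assumes each superfamily has an occurrence profile (one count per genome) and that genome size is given as a number of genes. The function names, the cut-off values `r_cut` and `ratio_cut`, and the toy data are all hypothetical and are not taken from the paper. The sketch also does not attempt the further split of size-dependent superfamilies into linear and non-linear groups, since the abstract does not say how that split was made.

```python
from statistics import mean, pstdev


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant profile has no defined correlation; treat as none
    return cov / (vx * vy) ** 0.5


def classify(counts, genome_sizes, r_cut=0.7, ratio_cut=2.0):
    """Label one superfamily profile.

    r_cut and ratio_cut are illustrative thresholds, not values from the paper.
    """
    if pearson(genome_sizes, counts) >= r_cut:
        return "size-dependent"
    sd = pstdev(counts)
    # A high mean/SD ratio means the counts barely vary between genomes.
    ratio = float("inf") if sd == 0 else mean(counts) / sd
    if ratio >= ratio_cut:
        return "size-independent, evenly distributed"
    return "size-independent"


# Toy data: four hypothetical genomes and three hypothetical superfamilies.
genome_sizes = [1000, 2000, 4000, 6000]
profiles = {
    "grows_with_genome": [5, 10, 20, 30],
    "near_constant": [10, 11, 10, 11],
    "erratic": [0, 9, 1, 3],
}
labels = {name: classify(c, genome_sizes) for name, c in profiles.items()}
```

With these toy profiles, the first superfamily is labelled size-dependent, the second size-independent and evenly distributed, and the third size-independent without the even distribution.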
|Keywords:||protein family, three-dimensional structure, genome size, bacteria, domain distribution, STRUCTURAL GENOMICS, N-ACETYLTRANSFERASES, SIGNAL-TRANSDUCTION, GENE-TRANSFER, PSI-BLAST, DATABASE, SEQUENCES, CLASSIFICATION, CATH, SCOP|
|UCL classification:||UCL > School of Life and Medical Sciences > Faculty of Life Sciences > Biosciences (Division of) > Structural and Molecular Biology; UCL > School of BEAMS > Faculty of Engineering Science > Computer Science|