{"id":579,"date":"2023-03-13T12:56:20","date_gmt":"2023-03-13T05:56:20","guid":{"rendered":"https:\/\/conf.icgbio.ru\/bgrs98\/?page_id=579"},"modified":"2023-09-04T15:52:33","modified_gmt":"2023-09-04T08:52:33","slug":"039_computer-analysis-of-transcription-regulatory-patterns-in-completely-sequenced-bacterial-genomes","status":"publish","type":"page","link":"https:\/\/conf.icgbio.ru\/bgrs98\/abstracts\/abstract-list\/039_computer-analysis-of-transcription-regulatory-patterns-in-completely-sequenced-bacterial-genomes\/","title":{"rendered":"COMPUTER ANALYSIS OF TRANSCRIPTION REGULATORY PATTERNS IN COMPLETELY SEQUENCED BACTERIAL GENOMES"},"content":{"rendered":"<p><a href=\"https:\/\/conf.icgbio.ru\/bgrs98\/abstracts\/authors-index\/#gelfand\">GELFAND M.S.<\/a><sup>1*<\/sup>,\u00a0<a href=\"https:\/\/conf.icgbio.ru\/bgrs98\/abstracts\/authors-index\/#mironov\">MIRONOV A.A.<\/a><sup>2<\/sup><\/p>\n<p><sup>1<\/sup>Institute of Protein Research, Russian Acad. Sci., Pushchino, 142292, Russia;\u00a0misha@imb.imb.ac.ru<\/p>\n<p><sup>2<\/sup>State Center of Biotechnology NIIGenetika, Moscow, 113545, Russia;\u00a0mir@vnigen.msk.su<\/p>\n<p><sup>*<\/sup>Corresponding author<\/p>\n<p><a href=\"https:\/\/conf.icgbio.ru\/bgrs98\/abstracts\/keywords-index\/\">Keywords<\/a>: transcription regulatory patterns, bacterial genomes, site recognition, escherichia coli, haemophilus influenzae, purine and arginine regulons<\/p>\n<p>&nbsp;<\/p>\n<p>Recognition of transcription regulation sites is one of the most difficult problems of computational molecular biology. In most cases, small sample size and a low degree of sequence conservation do not allow for construction of reliable recognition rules. We suggest a new approach to this problem based on simultaneous analysis of several related genomes. We assume that groups of genes subject to some specific regulation (&#8220;regulons&#8221;) are evolutionarily stable. 
Thus, in each genome we select genes that have candidate sites in regulatory regions. Then all comparisons between the selected genes are performed and groups of homologous genes are determined. In order to distinguish between paralogs and orthologs, the selected genes from each set are compared with the total gene complements of the other genomes.<\/p>\n<p>We applied this technique to the analysis of purine (PurR), arginine (ArgR) and aromatic amino acid (TrpR and TyrR) regulons of\u00a0<i>Escherichia coli<\/i>\u00a0and\u00a0<i>Haemophilus influenzae<\/i>. Candidate binding sites in regulatory regions of\u00a0<i>H.influenzae<\/i>\u00a0were found, a new family of purine transport proteins subject to PurR regulation was described, ArgR regulation of arginine transport was demonstrated, and differences in regulation of some\u00a0<i>E.coli<\/i>\u00a0and\u00a0<i>H.influenzae<\/i>\u00a0genes were discovered.<\/p>\n<p><b>1. Data and Algorithms<\/b><\/p>\n<p>Three regulons were analyzed (the purine\/PurR and arginine\/ArgR regulons were considered separately, the TrpR and TyrR regulons were combined, as some genes from the aromatic amino acid regulon are subject to regulation by both of these factors). Genes belonging to the\u00a0<i>E.coli<\/i>\u00a0regulons were collected from the literature [1] and their orthologs in\u00a0<i>H.influenzae<\/i>\u00a0were identified. Known\u00a0<i>E.coli<\/i>\u00a0transcription factor binding sites were collected and positional nucleotide weight matrices (profiles) were derived. 
The positional nucleotide weights are defined by<br \/>\n<a href=\"https:\/\/conf.icgbio.ru\/bgrs98\/wp-content\/uploads\/sites\/111\/2023\/03\/Thesis39_Image1.gif\" target=\"_blank\" rel=\"noopener\"><img loading=\"lazy\" class=\"alignnone wp-image-580 size-full\" src=\"https:\/\/conf.icgbio.ru\/bgrs98\/wp-content\/uploads\/sites\/111\/2023\/03\/Thesis39_Image1.gif\" alt=\"\" width=\"391\" height=\"26\" \/><\/a><br \/>\nwhere N(b,k) is the number of occurrences of nucleotide b at position k. The site score is the sum of the respective positional nucleotide weights. The base of the logarithm is chosen so that the standard deviation of the site score on random Bernoulli sequences equals 1.<\/p>\n<p>Candidate sites (PUR, ARG, TRP and TYR boxes) were selected in regions upstream of annotated genes of\u00a0<i>E.coli<\/i>\u00a0and\u00a0<i>H.influenzae<\/i>. Thresholds and region boundaries in each case were selected so as to lose none of the known sites. Sets of potentially co-regulated genes were constructed. They consisted of genes having candidate sites in the upstream regions and genes downstream of those, if they were transcribed in the same direction and the intergenic distances did not exceed some threshold (usually 100 nucleotides).<\/p>\n<p>Pairwise alignment of all genes from the\u00a0<i>E.coli<\/i>\u00a0and\u00a0<i>H.influenzae<\/i>\u00a0sets was performed. Pairs of genes having strong similarity were retained for further analysis. This included comparison of the genes with the total gene complements of\u00a0<i>E.coli<\/i>\u00a0and\u00a0<i>H.influenzae<\/i>\u00a0in order to distinguish orthologs (that can be assumed to have the same role in the cell) from paralogs. 
Some genes with strong sites and a known, potentially relevant function were also compared with GenBank, and their close homologs were analyzed for the presence of candidate sites in their upstream regions.<\/p>\n<p>All analysis was performed using the programs DNA-SUN [2] and GENOME (A.M., unpublished).<\/p>\n<p><b>2. Results<\/b><\/p>\n<p><b>2.1. Transport proteins in the purine and arginine regulons<\/b><\/p>\n<p>Analysis of the PurR regulon resulted in identification of a family of transport proteins that has representatives in\u00a0<i>E.coli<\/i>\u00a0and\u00a0<i>H.influenzae<\/i>, as well as a number of other bacteria. The family consists of two subfamilies. The known members of one subfamily are uracil and xanthine transporters [3], whereas the other subfamily has no proteins with known specificity.\u00a0<i>E.coli<\/i>\u00a0has representatives in both subfamilies, and they happen to form pairs of close paralogs (<i>yicO<\/i>\u00a0and\u00a0<i>yieG<\/i>,\u00a0<i>yjcD<\/i>\u00a0and\u00a0<i>ygfQ\/R<\/i>,\u00a0<i>yicE<\/i>\u00a0and\u00a0<i>ygfO<\/i>). In each case the first member of a pair has a strong PUR box and thus is likely to be regulated by PurR, whereas the second member has no PUR boxes. All close relatives of the\u00a0<i>yicE-ygfO<\/i>\u00a0pair and one more gene with a PUR box,\u00a0<i>ygfU<\/i>, are H+\/purine(xanthine) symporters, and thus purine transport is a very likely function for these genes. 
The two other pairs,\u00a0<i>yicO-yieG<\/i>\u00a0and\u00a0<i>yjcD-ygfQ\/R<\/i>, as well as the\u00a0<i>H.influenzae<\/i>\u00a0gene\u00a0<i>HI0125<\/i>, which is an ortholog of the latter pair, can be ascribed only an unspecified transport function.<\/p>\n<p>In addition, PUR boxes were found upstream of the gene\u00a0<i>tsx<\/i>\u00a0encoding an outer membrane nucleoside-specific channel in\u00a0<i>E.coli<\/i>,\u00a0<i>Enterobacter aerogenes<\/i>,\u00a0<i>Klebsiella pneumoniae<\/i>, and\u00a0<i>Salmonella typhimurium<\/i>\u00a0[4, 4A]<i>.<\/i><\/p>\n<p>Analysis of the ArgR regulon allowed us to identify ARG boxes upstream of operons encoding arginine-specific ABC transport systems (<i>artPIQM<\/i>\u00a0and\u00a0<i>artJ<\/i>\u00a0from\u00a0<i>E.coli<\/i>,\u00a0<i>HI1180-HI1177<\/i>\u00a0from\u00a0<i>H.influenzae<\/i>) and thus to place these operons in the arginine regulon.<\/p>\n<p><b>2.2. Changes in operon structure with retained regulation<\/b><\/p>\n<p>There are two main types of differences between\u00a0<i>E.coli<\/i>\u00a0and\u00a0<i>H.influenzae<\/i>\u00a0operons subject to the same regulation. First, genes can be absent in an operon. The gene\u00a0<i>HI0811<\/i>, which is a candidate member of the\u00a0<i>H.influenzae<\/i>\u00a0ArgR regulon, is an ortholog of the last gene of the\u00a0<i>E.coli<\/i>\u00a0<i>argCBH<\/i>\u00a0operon, whereas the first two genes have no orthologs in\u00a0<i>H.influenzae<\/i>. Similarly, the presumably TyrR-regulated gene\u00a0<i>HI1290<\/i>\u00a0is an ortholog of\u00a0<i>tyrA<\/i>, whereas the first gene of the\u00a0<i>aroFtyrA<\/i>\u00a0operon of\u00a0<i>E.coli<\/i>\u00a0has no orthologs in\u00a0<i>H.influenzae<\/i>. 
Finally,\u00a0<i>purB<\/i>\u00a0of\u00a0<i>E.coli<\/i>\u00a0is an ortholog of\u00a0<i>H.influenzae<\/i>\u00a0gene\u00a0<i>HI0639<\/i>, whereas the PUR box is upstream of the first gene in the operon-like gene string\u00a0<i>HI0638-HI0639<\/i>.<\/p>\n<p>The second type of change is breaking of an operon into two parts with retained regulation. Two\u00a0<i>E.coli\u00a0<\/i>operons\u00a0<i>purHD<\/i>\u00a0and\u00a0<i>glyA<\/i>, both regulated by PurR, correspond to a single\u00a0<i>H.influenzae<\/i>\u00a0gene string\u00a0<i>HI0887-HI0889<\/i>, and a PUR box is found upstream of\u00a0<i>HI0887<\/i>.<\/p>\n<p>Both types of differences occur in the tryptophan operon(s) regulated by TrpR and having TRP boxes in upstream regions. There is a single operon in enterobacteria (<i>trpLEDCBA<\/i>\u00a0in\u00a0<i>E.coli<\/i>,\u00a0<i>trpEGDC\/FB<\/i>\u00a0in\u00a0<i>Vibrio parahaemolyticus<\/i>) and two operons in\u00a0<i>H.influenzae<\/i>:\u00a0<i>HI1387-1389.1<\/i>\u00a0(<i>trpEDDC<\/i>) and\u00a0<i>HI1430-HI1432<\/i>\u00a0(<i>ydfGtrpBA<\/i>). On the other hand,\u00a0<i>HI1430<\/i>\u00a0(<i>ydfG<\/i>) is a hypothetical oxidoreductase that is absent in the\u00a0<i>trpBA<\/i>\u00a0operon of\u00a0<i>Pasteurella multocida<\/i>, a close relative of\u00a0<i>H.influenzae<\/i>.<\/p>\n<p><b>2.3. Changes of regulation<\/b><\/p>\n<p>In some cases regulation patterns seem to be changed. The simplest case is the loss of regulation; the most interesting example of this type is the absence of PUR boxes in the upstream region of\u00a0<i>HI1632<\/i>\u00a0(<i>H.influenzae\u00a0<\/i>ortholog of\u00a0<i>purR<\/i>). This means that unlike its\u00a0<i>E.coli<\/i>\u00a0counterpart [5,6], this gene is not autoregulated. 
A more subtle case is the change of the regulation mechanism:\u00a0<i>purB<\/i>\u00a0is regulated by PurR via the roadblock mechanism [7], which explains an unusual location of the PUR box in the coding region of this gene (around codon 60), whereas the position of the PUR box in the corresponding\u00a0<i>H.influenzae<\/i>\u00a0operon\u00a0<i>HI0638-HI0639<\/i>\u00a0is similar to the position of PUR boxes in other operons.<\/p>\n<p>The most interesting situation seems to be that of the unique\u00a0<i>H.influenzae<\/i>\u00a0DAHP-synthase (there are three DAHP-synthases in\u00a0<i>E.coli<\/i>\u00a0encoded by\u00a0<i>aroH<\/i>,\u00a0<i>aroG<\/i>\u00a0and\u00a0<i>aroF<\/i>\u00a0and feedback repressed by tryptophan, phenylalanine and tyrosine, respectively [1]). The gene\u00a0<i>HI1547<\/i>\u00a0is an ortholog of\u00a0<i>aroG<\/i>\u00a0and thus encodes DAHP-synthase-PHE (E.Koonin, personal communication). However, unlike\u00a0<i>aroG<\/i>, regulated by TyrR (with phenylalanine and tryptophan acting as co-repressors), it has a TRP box, but no TYR boxes, similarly to the tryptophan-regulated gene\u00a0<i>aroH<\/i>\u00a0coding for DAHP-synthase-TRP. Thus either the regulation of this gene has changed, or a very subtle non-orthologous displacement has taken place [8]. There seems to be no computational way to resolve this ambiguity, which should thus be subject to experimental analysis.<\/p>\n<p><b>3. Discussion<\/b><\/p>\n<p>Computer analysis has been used for prediction of bacterial transcription signals for more than 15 years (reviewed, in particular, in [9]), and in many cases it served as a basis for further experimental work (e.g. [10]). However, this study represents the first attempt to completely characterize regulons in newly sequenced genomes using large-scale genomic comparison.<\/p>\n<p>There are three main components in our approach: prediction of transcription factor sites, analysis of protein homologies, and consideration of protein function. 
The use of complete genomes allows us to identify orthologs, and thus to use sequence similarity to draw conclusions about similar cellular roles of proteins. However, a good supplement to our technique is analysis of homologous genes in all related bacterial species using similarity search in GenBank. Thus the approach is flexible, yet sufficiently robust to make non-trivial predictions even when the operon structure and regulatory interactions are not stable (cf. [11]).<\/p>\n<p>An important prerequisite for this type of analysis is conservation of the regulatory protein itself. Thus, there are no strong PUR boxes in the\u00a0<i>Helicobacter pylori<\/i>\u00a0genome that does not contain a gene for PurR. Similarly, although there is a purine repressor in\u00a0<i>Bacillus subtilis<\/i>, it is unrelated to PurR of\u00a0<i>E.coli<\/i>\u00a0and indeed, the type of regulation (mostly by attenuation) and regulatory sites (in a few genes regulated on the transcription level) of the\u00a0<i>B.subtilis<\/i>\u00a0purine regulon differ from those of\u00a0<i>E.coli<\/i>. On the other hand, if the regulatory protein is conserved, the regulatory signals tend to be conserved as well. There are only three known genes in the arginine regulon of\u00a0<i>H.influenzae<\/i>, including the repressor ArgR itself (not counting the transport proteins predicted to belong to the arginine regulon in this work), but the ARG boxes are conserved. Preliminary results show that the\u00a0<i>E.coli<\/i>\u00a0ARG box recognition matrix can recover the relevant signals even in the distantly related\u00a0<i>B.subtilis<\/i>\u00a0genome.<\/p>\n<p>This study allowed us to make a number of interesting predictions that can be checked by rather simple experimental techniques. 
One group of such predictions includes inferences about change of regulation patterns: loss of autoregulation in the\u00a0<i>H.influenzae<\/i>\u00a0homolog of\u00a0<i>PurR<\/i>, different mode of repression of\u00a0<i>purB<\/i>, and change of regulation of\u00a0<i>aroG<\/i>. The second group is formed by predictions that extend the existing purine and arginine regulons both in\u00a0<i>E.coli<\/i>\u00a0and\u00a0<i>H.influenzae<\/i>\u00a0by inclusion of transport proteins (purine and arginine transporters). It is somewhat surprising that these systems, especially the large family of H+\/purine symporters, were not detected by genetic analysis. A possible explanation is that all PurR-regulated genes from this family have close non-regulated paralogs, and thus the influence of mutation in these genes would be weak or expressed in very specific conditions.<\/p>\n<p>Our further plans involve analysis of global regulatory systems (SOS, CRP, Fur, Fnr regulons) and multiple interacting systems (e.g. interaction of purine\/pyrimidine regulation), comparisons with more distant genomes (in particular,\u00a0<i>E.coli<\/i>\u00a0and\u00a0<i>B.subtilis<\/i>), development of multiple local alignment \/ signal definition algorithms that would allow analysis of functionally related regulons with non-homologous regulators, more detailed analysis of interaction between proteins and their binding sites from both structural and evolutionary points of view, and, as a distant goal, development of techniques for automated characterization of regulatory pathways in newly sequenced genomes.<\/p>\n<p><b>Acknowledgements<\/b><\/p>\n<p>This work was partially supported by grants from the Russian Fund of Fundamental Research and the US Department of Energy. We are grateful to Mikhail Roytberg and Eugene Koonin for discussions and assistance.<\/p>\n<p><b>References<\/b><\/p>\n<ol>\n<li>F.C. Neidhardt, Ed. \u201cEscherichia coli and Salmonella. 
Cellular and Molecular Biology\u201d (ASM Press, Washington, 1996)<\/li>\n<li>A.A. Mironov et al., Comput. Appl. Biosci. 11 (1995)<\/li>\n<li>G. Diallinas, L. Gorfinkel, H.N. Arst Jr., G. Cecchetto, C. Scazzocchio. J. Biol. Chem. 270, 8610-8622 (1995)<\/li>\n<li>E. Bremer, A. Middendorf, J. Martinussen, P. Valentin-Hansen. Gene 96, 59-65 (1990)<\/li>\n<li>4A. A. Nieweg, E. Bremer. Microbiology 143, 603-615 (1997)<\/li>\n<li>R.F. Rolfes, H. Zalkin. J. Bacteriol. 172, 5758-5766 (1990)<\/li>\n<li>L.M. Meng, P. Nygaard. Mol. Microbiol. 4, 2187-2192 (1990)<\/li>\n<li>B. He, H. Zalkin. J. Bacteriol. 174, 7121-7127 (1992)<\/li>\n<li>E.V. Koonin, A.R. Mushegian, P. Bork. Trends in Genetics 12, 334-336 (1996)<\/li>\n<li>M.S. Gelfand. J. Comput. Biol. 2, 87-117 (1995)<\/li>\n<li>B. He, K.Y. Choi, H. Zalkin. J. Bacteriol. 175, 3598-3606 (1993)<\/li>\n<li>M.Y. Galperin, E.V. Koonin. In Silico Biol. 1, 0007 (1998) &lt;http:\/\/www.bioinfo.de\/isb\/1998\/01\/007\/&gt;<\/li>\n<\/ol>\n","protected":false},"excerpt":{"rendered":"<p>GELFAND M.S.1*,\u00a0MIRONOV A.A.2 1Institute of Protein Research, Russian Acad. 
Sci., Pushchino, 142292, Russia;\u00a0misha@imb.imb.ac.ru 2State Center of Biotechnology NIIGenetika, Moscow, 113545, Russia;\u00a0mir@vnigen.msk.su *Corresponding author Keywords: transcription regulatory patterns, bacterial genomes, site recognition, escherichia coli, haemophilus influenzae, purine and arginine regulons &nbsp; &hellip; <a href=\"https:\/\/conf.icgbio.ru\/bgrs98\/abstracts\/abstract-list\/039_computer-analysis-of-transcription-regulatory-patterns-in-completely-sequenced-bacterial-genomes\/\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":13,"featured_media":0,"parent":97,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":[],"_links":{"self":[{"href":"https:\/\/conf.icgbio.ru\/bgrs98\/wp-json\/wp\/v2\/pages\/579"}],"collection":[{"href":"https:\/\/conf.icgbio.ru\/bgrs98\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/conf.icgbio.ru\/bgrs98\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/conf.icgbio.ru\/bgrs98\/wp-json\/wp\/v2\/users\/13"}],"replies":[{"embeddable":true,"href":"https:\/\/conf.icgbio.ru\/bgrs98\/wp-json\/wp\/v2\/comments?post=579"}],"version-history":[{"count":5,"href":"https:\/\/conf.icgbio.ru\/bgrs98\/wp-json\/wp\/v2\/pages\/579\/revisions"}],"predecessor-version":[{"id":1518,"href":"https:\/\/conf.icgbio.ru\/bgrs98\/wp-json\/wp\/v2\/pages\/579\/revisions\/1518"}],"up":[{"embeddable":true,"href":"https:\/\/conf.icgbio.ru\/bgrs98\/wp-json\/wp\/v2\/pages\/97"}],"wp:attachment":[{"href":"https:\/\/conf.icgbio.ru\/bgrs98\/wp-json\/wp\/v2\/media?parent=579"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}