{"id":550,"date":"2023-03-13T12:09:23","date_gmt":"2023-03-13T05:09:23","guid":{"rendered":"https:\/\/conf.icgbio.ru\/bgrs98\/?page_id=550"},"modified":"2023-04-11T13:54:41","modified_gmt":"2023-04-11T06:54:41","slug":"036_a-system-for-activation-of-the-trrd-database-further-development-of-geneexpress","status":"publish","type":"page","link":"https:\/\/conf.icgbio.ru\/bgrs98\/abstracts\/abstract-list\/036_a-system-for-activation-of-the-trrd-database-further-development-of-geneexpress\/","title":{"rendered":"A SYSTEM FOR ACTIVATION OF THE TRRD DATABASE: FURTHER DEVELOPMENT OF GENEEXPRESS"},"content":{"rendered":"<p><a href=\"https:\/\/conf.icgbio.ru\/bgrs98\/abstracts\/authors-index\/#frolov\">FROLOV A.S.<\/a><sup>#<\/sup>,\u00a0<a href=\"https:\/\/conf.icgbio.ru\/bgrs98\/abstracts\/authors-index\/#lavryushev\">LAVRYUSHEV S.V.<\/a>,\u00a0<a href=\"https:\/\/conf.icgbio.ru\/bgrs98\/abstracts\/authors-index\/#vorobiev\">VOROBIEV D.G.<\/a>,\u00a0<a href=\"https:\/\/conf.icgbio.ru\/bgrs98\/abstracts\/authors-index\/#grigorovich\">GRIGOROVICH D.A.<\/a><\/p>\n<p>Laboratory of Theoretical Genetics, Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Sciences, Novosibirsk, 630090 Russia<\/p>\n<p>#Corresponding author e-mail:\u00a0<a href=\"mailto:fas@bionet.nsc.ru\" target=\"_blank\" rel=\"noopener\">fas@bionet.nsc.ru<\/a><\/p>\n<p><a href=\"https:\/\/conf.icgbio.ru\/bgrs98\/abstracts\/keywords-index\/\">Keywords<\/a>: promoter recognition, database activation, transcription regulation, Internet-based recognition<\/p>\n<p>&nbsp;<\/p>\n<p>Data on the regulation of eukaryotic gene expression are accumulating rapidly. Such well-known databases as EPD [Peter, 1998], TRANSFAC [Wingender, 1996], TRRD [Kel A., 1997], COMPEL [Kel O., 1995b], and EpoDB [Salas, 1998] have been developed.<\/p>\n<p>A great number of WWW servers containing similar databases and programs for molecular genetic studies are available worldwide. 
However, these servers offer a list of resources that can be used only independently of one another, whereas a number of molecular genetic problems demand simultaneous use of several databases and sequential or simultaneous running of several programs. For example, genome annotation requires a homology search in the GenBank and EMBL databases and\/or recognition of functional sites by their patterns using databases on transcription regulation. It is evident that a simple hypertext-based integration would not solve this problem.<\/p>\n<p>Positive experience of information integration has been accumulated during the development and use of the Sequence Retrieval System (SRS) [Etzold, 1993], a system for working with molecular biological databases whose data are accessible via the Internet.<\/p>\n<p>Such databases as GenBank, EMBL, TRANSFAC, and TRRD are now accessible under SRS. This allows these resources to be used as initial data for creating programs for recognition, homology search, etc.<\/p>\n<p>We proposed the system GeneExpress [Kolchanov et al., 1998] as a first step in realization of the integrated approach to analysis of nucleotide sequences.<\/p>\n<p>The system GeneExpress has been designed to integrate description, annotation and recognition of eukaryotic regulatory sequences. 
The system contains the following basic units: (1)\u00a0<b>GeneNet<\/b>\u00a0contains an object-oriented database for accumulation of data on gene networks and signal transduction pathways and a Java-based viewer that allows exploration and visualization of the information on gene networks; (2)<b>\u00a0Transcription Regulation\u00a0<\/b>combines<b>\u00a0<\/b>the database on transcription regulatory regions of eukaryotic genes (<i><b>TRRD<\/b><\/i>) and\u00a0<b>TRRD Viewer;<i>\u00a0<\/i><\/b>(3)\u00a0<b>Transcription Factor Binding Site Recognition\u00a0<\/b>contains a compilation of transcription factor binding sites (<b>TFBSC<\/b>) and programs for their analysis and recognition; (4)<b>\u00a0mRNA Translation<\/b>\u00a0is designed for analysis of structural and contextual properties of mRNA 5&#8217;UTRs and prediction of their translation efficiency; and<b>\u00a0<\/b>(5)\u00a0<b>ACTIVITY<\/b>\u00a0is<b>\u00a0<\/b>the module for analysis and site activity prediction of a given nucleotide sequence. Integration of these databases in\u00a0<b>GeneExpress<\/b>\u00a0is based on the Sequence Retrieval System (<b>SRS<\/b>) created at the European Bioinformatics Institute.<\/p>\n<p>The next step in this direction is a superstructure based on GeneExpress that searches for promoters of a given type using the data accumulated in the TRRD database [Kel&#8217; et al., 1997] and recognition methods built on sequence sets from the SAMPLES database.<\/p>\n<p>The basic idea of the proposed approach is to use the information compiled in TRRD as ready-to-use scenarios for promoter recognition. 
In the simplest case, it allows regions similar to a given promoter to be detected in an arbitrary sequence.<\/p>\n<p>Let&#8217;s consider how the system operates in this simplest case.<\/p>\n<p>Take a promoter\u00a0<b><i>P={p<sub>i<\/sub>}<\/i>,\u00a0<\/b>for which TRRD records\u00a0<b>N<\/b>\u00a0known transcription factor binding sites\u00a0<i><b>{(a<sub>n<\/sub>,b<sub>n<\/sub>)}\u00a0<\/b><\/i>with the site boundaries\u00a0<b>a<sub>n<\/sub>\u00a0<\/b>and<b>\u00a0b<sub>n<\/sub><\/b>\u00a0relative to the transcription start (here\u00a0<i><b>p<sub>i<\/sub>\u2208{A,T,G,C}, 1&lt;=n&lt;=N<\/b><\/i>).<\/p>\n<p><a href=\"https:\/\/conf.icgbio.ru\/bgrs98\/wp-content\/uploads\/sites\/111\/2023\/03\/Thesis36_Image1.gif\" target=\"_blank\" rel=\"noopener\"><img loading=\"lazy\" class=\"alignnone wp-image-551 size-full\" src=\"https:\/\/conf.icgbio.ru\/bgrs98\/wp-content\/uploads\/sites\/111\/2023\/03\/Thesis36_Image1.gif\" alt=\"\" width=\"478\" height=\"248\" \/><\/a><\/p>\n<p>Fig. 1. 
Illustration of\u00a0<i><b>Score<sub>n<\/sub>(i<\/b>)\u00a0<\/i>and<i>\u00a0<b>Score(i)<\/b><\/i>\u00a0calculation.<\/p>\n<p>&nbsp;<\/p>\n<p>The nucleotide sequence\u00a0<i><b>S={s<sub>i<\/sub>}<\/b><\/i>\u00a0with the length\u00a0<i><b>L\u00a0<\/b><\/i>(here<b>\u00a0<i>s<sub>i<\/sub>\u2208{A,T,G,C}, 1&lt;=i&lt;=<\/i><\/b><i>L<\/i>) is analyzed to construct the set of\u00a0<i><b>N<\/b><\/i>\u00a0similarity profiles\u00a0<b>{<i>Score<sub>n<\/sub>(i)<\/i>}<\/b>, one for each\u00a0<b>n<\/b>th binding site of this promoter:<\/p>\n<table border=\"0\" width=\"100%\" cellspacing=\"0\" cellpadding=\"0\">\n<tbody>\n<tr>\n<td width=\"50%\"><a href=\"https:\/\/conf.icgbio.ru\/bgrs98\/wp-content\/uploads\/sites\/111\/2023\/03\/Thesis36_Image2.gif\" target=\"_blank\" rel=\"noopener\"><img loading=\"lazy\" class=\"alignnone wp-image-552 size-full\" src=\"https:\/\/conf.icgbio.ru\/bgrs98\/wp-content\/uploads\/sites\/111\/2023\/03\/Thesis36_Image2.gif\" alt=\"\" width=\"183\" height=\"55\" \/><\/a><\/td>\n<td width=\"10%\">(1)<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>where <a href=\"https:\/\/conf.icgbio.ru\/bgrs98\/wp-content\/uploads\/sites\/111\/2023\/03\/Thesis36_Image3.gif\" target=\"_blank\" rel=\"noopener\"><img loading=\"lazy\" class=\"alignnone wp-image-553 size-full\" src=\"https:\/\/conf.icgbio.ru\/bgrs98\/wp-content\/uploads\/sites\/111\/2023\/03\/Thesis36_Image3.gif\" alt=\"\" width=\"207\" height=\"48\" \/><\/a><\/p>\n<p>&nbsp;<\/p>\n<p>Equation 1 assigns to the\u00a0<b>i<\/b>th position of the sequence\u00a0<b>S<\/b>\u00a0the number\u00a0<b>{<i>Score<sub>n<\/sub>(i)<\/i>}\u00a0<\/b>of matches between its region with the boundaries\u00a0<b>(<i>i-a<sub>n<\/sub>, a<sub>i<\/sub>-b<sub>n<\/sub><\/i>)<\/b>\u00a0and the region of the promoter with the boundaries\u00a0<b>(<i>a<sub>n<\/sub>,b<sub>n<\/sub><\/i>),<\/b>\u00a0that is, the binding site of the\u00a0<b>n<\/b>th transcription factor. 
Then the integral similarity profile\u00a0<b>{<i>Score(i)<\/i>}<\/b>\u00a0of this sequence and entire promoter is constructed:<\/p>\n<table border=\"0\" width=\"100%\" cellspacing=\"0\" cellpadding=\"0\">\n<tbody>\n<tr>\n<td width=\"50%\"><a href=\"https:\/\/conf.icgbio.ru\/bgrs98\/wp-content\/uploads\/sites\/111\/2023\/03\/Thesis36_Image4.gif\" target=\"_blank\" rel=\"noopener\"><img loading=\"lazy\" class=\"alignnone wp-image-554 size-full\" src=\"https:\/\/conf.icgbio.ru\/bgrs98\/wp-content\/uploads\/sites\/111\/2023\/03\/Thesis36_Image4.gif\" alt=\"\" width=\"219\" height=\"87\" \/><\/a><\/td>\n<td width=\"10%\">(2)<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>&nbsp;<\/p>\n<p>Equation 2 ascribes to the\u00a0<b>i<\/b>th position of the sequence\u00a0<b>S<\/b>\u00a0the value of the similarity to the transcription start of the promoter\u00a0<b>P\u00a0<\/b>considered: the greater is the integral similarity of each of the considered\u00a0<b>n<\/b>th binding sites of this promoter to this region of this sequence, the greater is the ascribed value.<\/p>\n<p>Note that in the course of calculations,\u00a0<b>Score<sub>n<\/sub>(i)<\/b>\u00a0is &#8220;shifted&#8221; along the sequence by a length\u00a0<b>(a<sub>n<\/sub>+b<sub>n<\/sub>)\/2<\/b>\u00a0relative to the\u00a0<b>n<\/b>th binding site, so that its maximal value coincides with the transcription start (bold horizontal arrows in Fig. 1). Thus, it is not necessary to consider the positions of concrete binding sites while calculating\u00a0<b>Score<sub>n<\/sub>(i)<\/b>.<\/p>\n<p>The\u00a0<b>Score<sub>n<\/sub>(i)<\/b>\u00a0is used to predict the potential transcription starts in the sequence\u00a0<b>S<\/b>\u00a0as follows. 
The mean value<b>\u00a0M<\/b>\u00a0and standard deviation\u00a0<b>s<\/b>\u00a0are calculated and used to find the region with the borders\u00a0<i><b>{c,d}<\/b><\/i>\u00a0within which the value\u00a0<b>Score<sub>n<\/sub>(i)\u00a0<\/b>exceeds the threshold value\u00a0<b>M+3*s,<\/b>\u00a0corresponding to the significance level\u00a0<b>\u03b1<\/b>~0.01 of Student&#8217;s test with the number of degrees of freedom &gt;&gt;100. This region contains the maximal value\u00a0<i><b>Score(t)<\/b>,<\/i>\u00a0and the position\u00a0<i><b>t<\/b><\/i>\u00a0is predicted as a potential transcription start\u00a0<i><b>T<\/b><\/i>\u00a0of the sequence\u00a0<i><b>S<\/b><\/i>. When\u00a0<i><b>K<\/b><\/i>\u00a0such regions\u00a0<i><b>{c<sub>k<\/sub>,d<sub>k<\/sub>}<\/b><\/i>\u00a0are found,\u00a0<i><b>K<\/b><\/i>\u00a0potential transcription starts<b>\u00a0{t<sub>k<\/sub>}<\/b>\u00a0are predicted (here\u00a0<i><b>1&lt;=k&lt;=K<\/b><\/i>).<\/p>\n<p>The system is available at\u00a0<a href=\"http:\/\/wwwmgs.bionet.nsc.ru\/Programs\/SeqAnn\/\" target=\"_blank\" rel=\"noopener\">http:\/\/wwwmgs.bionet.nsc.ru\/Programs\/SeqAnn\/<\/a><\/p>\n<p><a href=\"https:\/\/conf.icgbio.ru\/bgrs98\/wp-content\/uploads\/sites\/111\/2023\/03\/Thesis36_Image5.gif\" target=\"_blank\" rel=\"noopener\"><img loading=\"lazy\" class=\"alignnone wp-image-555 size-full\" src=\"https:\/\/conf.icgbio.ru\/bgrs98\/wp-content\/uploads\/sites\/111\/2023\/03\/Thesis36_Image5.gif\" alt=\"\" width=\"600\" height=\"300\" \/><\/a><\/p>\n<p>Fig. 2. A fragment of the profile\u00a0<i><b>Score(i)<\/b><\/i>\u00a0obtained by searching the sequence extracted from EMBL by AC=X73839 for the promoter extracted from the TRRD database by ID=Hs:PBGD, with the located transcription start.<\/p>\n<p>The results of applying Equations 1 and 2 to a sequence extracted from EMBL by AC=X73839 (<i>A. thaliana<\/i>\u00a0gene for hemC) and the promoter of porphobilinogen deaminase extracted from TRRD by ID=Hs:PBGD are shown in Fig. 2. 
Note that the algorithm described above predicted one potential transcription start at position 1638 in this sequence. According to the information contained in the field FT, this sequence has the transcription start at position 1603, that is, 35 positions away from the prediction.<\/p>\n<p>Further development of this approach will employ the construction of\u00a0<i><b>Score<sub>n<\/sub>(i)<\/b><\/i>\u00a0using homology search, weight matrices, perceptrons, and other methods. Construction of control and test samples will be refined methodologically on the basis of extraction of experimental footprints, local Gibbs alignment, and construction of recognition groups.<\/p>\n<p>The TRRD database itself will be integrated with the system described above through cross-hyperreferences and automatic generation of the recognition programs for a promoter described in TRRD and intended to be searched for in a user&#8217;s sequence. We also plan to supplement the system with modules for analysis of functional site activity based on the di- and trinucleotide composition.<\/p>\n<p>The work was supported by the Russian Foundation for Basic Research (97-04-90309) and the Russian Human Genome Project.<\/p>\n<p><b>References<\/b><\/p>\n<ol>\n<li>Etzold, T. and Argos, P. (1993) SRS&#8211;an indexing and retrieval tool for flat file data libraries. CABIOS. 9, 49-57.<\/li>\n<li>Kel, A.E., Kolchanov, N.A., Kel&#8217;, O.V., Romashchenko, A.G., Anan&#8217;ko, E.A., Ignatieva, E.V., Merkulova, T.I., Podkolodnaya, O.A., Stepanenko, I.L., Kochetov, A.V., Kolpakov, F.A., Podkolodny, N.L., and Naumochkin A.N. (1997) TRRD: database on transcription regulatory regions of eukaryotic genes. Mol. Biol. (Msk) 31, 521-530.<\/li>\n<li>Kel, O.V., Romashchenko, A.G., Kel, A.E., Wingender, E., and Kolchanov, N.A. (1995b) A compilation of composite regulatory elements affecting gene transcription in vertebrates. Nucl. Acids Res. 23, 4097-4103.<\/li>\n<li>N.A. Kolchanov, M.P. Ponomarenko, A.E. Kel, Y.V. Kondrakhin, A.S. Frolov, F.A. Kolpakov, O.V. Kel, E.A. 
Ananko, E.V. Ignatieva, O.A. Podkolodnaya, I.L. Stepanenko, T.I. Merkulova, V.N. Babenko, D.G. Vorobiev, S.V. Lavryushev, Y.V. Ponomarenko, A.V. Kochetov, G.B. Kolesov, N.L. Podkolodny, L. Milanesi, E. Wingender, T. Heinemeyer, V.V. Solovyev &#8220;GeneExpress: A Computer System for Description, Analysis and Recognition of Regulatory Sequences in Eukaryotic Genome&#8221; ISMB&#8217;98, in press (1998)<\/li>\n<li>Peter, R.C., Juner, T., and Bucher, P. (1998) The eukaryotic promoter database EPD. Nucl. Acids Res. 26, 353-357.<\/li>\n<li>Salas, F., Haas, J., Brunk, B., Stoeckert Jr, C.J., and Overton, G.C. (1998) EpoDB: a database of genes expressed during vertebrate erythropoiesis. Nucl. Acids Res. 26, 290-292.<\/li>\n<li>Wingender, E., Dietze, P., Karas, H., and Kneuppel, R. (1996) TRANSFAC: a database on transcription factors and their DNA binding sites. Nucl. Acids Res. 24, 238-241.<\/li>\n<\/ol>\n","protected":false},"excerpt":{"rendered":"<p>FROLOV A.S.#,\u00a0LAVRYUSHEV S.V.,\u00a0VOROBIEV D.G.,\u00a0GRIGOROVICH D.A. 
Laboratory of Theoretical Genetics, Institute of Cytology and Genetics, Siberian Branch of the Russian Academy of Sciences, Novosibirsk, 630090 Russia #Corresponding author e-mail:\u00a0fas@bionet.nsc.ru Keywords: promoter recognition, database activation, transcription regulation, Internet-based recognition &nbsp; The data &hellip; <a href=\"https:\/\/conf.icgbio.ru\/bgrs98\/abstracts\/abstract-list\/036_a-system-for-activation-of-the-trrd-database-further-development-of-geneexpress\/\">Continue reading <span class=\"meta-nav\">&rarr;<\/span><\/a><\/p>\n","protected":false},"author":13,"featured_media":0,"parent":97,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":[],"_links":{"self":[{"href":"https:\/\/conf.icgbio.ru\/bgrs98\/wp-json\/wp\/v2\/pages\/550"}],"collection":[{"href":"https:\/\/conf.icgbio.ru\/bgrs98\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/conf.icgbio.ru\/bgrs98\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/conf.icgbio.ru\/bgrs98\/wp-json\/wp\/v2\/users\/13"}],"replies":[{"embeddable":true,"href":"https:\/\/conf.icgbio.ru\/bgrs98\/wp-json\/wp\/v2\/comments?post=550"}],"version-history":[{"count":3,"href":"https:\/\/conf.icgbio.ru\/bgrs98\/wp-json\/wp\/v2\/pages\/550\/revisions"}],"predecessor-version":[{"id":1370,"href":"https:\/\/conf.icgbio.ru\/bgrs98\/wp-json\/wp\/v2\/pages\/550\/revisions\/1370"}],"up":[{"embeddable":true,"href":"https:\/\/conf.icgbio.ru\/bgrs98\/wp-json\/wp\/v2\/pages\/97"}],"wp:attachment":[{"href":"https:\/\/conf.icgbio.ru\/bgrs98\/wp-json\/wp\/v2\/media?parent=550"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}