IGM, university Marne la Vallee – 2 rue de la Butte Verte – 93166 Noisy le Grand Cedex – FRANCE
e-mail:tolila@genethon.fr
Keywords: formal language, grammar, automata, cell signaling, transduction pathway
The integration of the various types of cellular activities in a multicellular organism is mainly performed by the nervous system, the endocrine system and the immune system.
Although DNA is indispensable for the life of the cell, it is basically inert outside the context of the living cell and the intercellular communications network. Memory exists at two levels: the memory of the species, which consists of the unchangeable DNA, and the active or short-term memory of each cell, which is its metabolic state at each instant. Enzymes constitute the short-term memory of the cell, its identity, and the network of indicators of what is going to happen. The same holds for the intercellular communications network.
DNA is indeed indispensable to the life of the cell: during apoptosis, or programmed cell death, endonuclease enzymes which digest the cell’s own DNA are activated, rendering the DNA unusable. DNA is necessary for life; it is a matrix which is read and interpreted by the cell according to its own identity, its own biochemical context and its environment. The identity of a differentiated cell is maintained by its metabolism; a cell which loses control of its regulation dedifferentiates and loses its identity. Enzymes activate or inhibit the metabolic and differentiation pathways in which the cell may engage (Figure 1): some enzymes regulate the expression of genes and the synthesis of proteins from the DNA template (both structural proteins and the enzymes themselves), while other enzymes catalyse metabolic reactions, either anabolic, in which products required by the cell are synthesized, or catabolic, in which substrates are degraded to elementary subunits with production of energy.
Intercellular communications are mainly conducted by secreted proteins (e.g. hormones, growth factors, cytokines, antibodies) and exogenous ligands (e.g. antigens). These circulating substances transmit a signal to competent cells using trans-receptor proteins as intermediaries; the signal is then relayed to the interior of the cell by transduction through an enzyme cascade. The signal may be transmitted to the nucleus by transcription factors which provoke the expression or repression of a gene, or to the cytoplasm, where a metabolic pathway may be activated or inactivated (Figure 1). Without the enzymes to promote a given pathway, reactions would occur, but would proceed so slowly that the products of a given reaction might be degraded before they could serve as substrates for the next reaction of the pathway. The sequence of enzymes that catalyse a metabolic pathway forms a code which switches the pathway on or off; this sequence is the code for the metabolic pathway, or a word of the language.
Molecules interact by contact and chemical interactions; a binding between two molecules may produce activation or inhibition of the catalytic site of one of the two molecules. This usually involves allosteric alteration or covalent modification by phosphorylation. An enzyme can be described by two sites: the catalytic site and the allosteric site. A molecule may bind to the catalytic site and block it (isosteric modification, inhibition of the catalytic site by an analog of the substrate), or bind to the allosteric site and cause a change in conformation which activates the catalytic site. An enzyme which catalyses the covalent modification (interconversion) of another enzyme, usually by phosphorylation or dephosphorylation, may cause activation or inactivation of the latter.
In a previous paper (Bentolila, 1996) we described a context-sensitive grammar which models the four main types of gene regulation. The proposed model considers two types of objects: transcriptional units on DNA, and the regulatory or structural proteins which are synthesized from them; regulatory proteins are themselves destined to activate or repress other transcriptional units in a later phase. A transcriptional unit is described by the list of its active sites (operator, promoter, binding sites for transcription factors). A regulatory protein is described by the list of its active sites (binding domain, activation domain, binding domain for ligand). The DNA sites and the protein domains are the terminal symbols of the proposed grammar. The interaction of these proteins with the DNA, and in certain cases preliminary interactions between proteins, leads to one of two antagonistic actions: expression or repression of the transcriptional unit. These protein-protein and protein-DNA interactions are grouped into syntactic categories (induction, inhibition, initiation complex, repressor complex, activation complex) which are called biological binding operators. The expression/repression actions are described by grammar rules which provide the chain of execution by biological binding operators for the four activable/repressible regulatory systems modulated by positive/negative co-factors.
If we suppose that the semantics of biological binding operators is already implemented (using a database), it is sufficient to write a context-free grammar which describes the order of application of biological binding operators, similar to the context-free grammar of arithmetic operators. In the case of arithmetic operators, the semantics of the operations is known and implemented in the compiler, whereas in the case of biological binding operators the semantics, i.e. the result of the operation, cannot be obtained by computation; a table of meaningful bindings and their results is necessary as an extension.
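The contrast with arithmetic can be illustrated with a minimal Python sketch. It assumes a hypothetical in-memory table `BINDING_TABLE` and a hypothetical helper `apply_operator`; neither name comes from the original model, which keeps this information in a database. The two entries are copied from Table 1.

```python
from typing import Optional, Tuple

# Hypothetical table of meaningful bindings (an assumption for illustration):
# (first operand, second operand, operator) -> (resulting molecule, resulting state).
BINDING_TABLE = {
    ("insulin receptor", "insulin", "transduction"): ("glycogen-synthase", "activated"),
    ("glucagon receptor", "glucagon", "transduction"): ("glycogen-phosphorylase", "activated"),
}


def apply_operator(first: str, second: str, operator: str) -> Optional[Tuple[str, str]]:
    """Look up the result of a binding; None means the binding has no tabulated meaning."""
    return BINDING_TABLE.get((first, second, operator))


# An arithmetic operator is evaluated by computation...
arithmetic_result = 2 + 3
# ...whereas a biological binding operator is evaluated by table lookup.
known = apply_operator("insulin receptor", "insulin", "transduction")
unknown = apply_operator("insulin receptor", "glucagon", "transduction")
print(arithmetic_result, known, unknown)
```

A binding that is absent from the table simply yields no result, which is one way to read "meaningful bindings" in the paragraph above.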
We have extended this model to intercellular communication pathways, or signal pathways. The grammar that we have developed describes the series of operations that leads to either the activation or inactivation of an anabolic / catabolic pathway (e.g. glycogenesis / glycogenolysis) or the expression / repression of a protein (e.g. antibody, cytokines). We have applied this model to two examples: the key enzymes involved in the regulation of sugar metabolism in the liver (Table 1), which is under hormonal control; and a simplified model of the immune response (Table 2).
An SQL simulator based on a relational database
The simulation models the binding of two molecules: two molecules in the same cell, a receptor and a circulating molecule, or two circulating molecules (or protein complexes).
Some operators are not detailed; reference is made instead to a sub-automaton, such as the sub-automaton “transduction”, which cannot be detailed in some cases because not all of the binding elements of the pathway are yet known, or the sub-automaton Expression / Repression, which was previously described (Bentolila, 1996).
Each cell in the model stands for a population of cells.
The object of this model is the observation of a cell in a given state for a given process.
The knowledge database, which describes the process studied, contains the list of binding operators (1st operand, 2nd operand, operator, result), with each operand described by (cell, molecule, type, state) (Tables 1 and 2).
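As a rough illustration of such a knowledge database, the sketch below stores two rows of Table 1 in an in-memory SQLite database through Python's standard `sqlite3` module. The table name `binding_operator`, the column names, and the choice of SQLite are all assumptions made for this example; the text does not specify the schema or the database engine.

```python
import sqlite3


def build_knowledge_base() -> sqlite3.Connection:
    """Create a hypothetical knowledge base holding binding operators."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE binding_operator (
            cell1 TEXT, molecule1 TEXT, type1 TEXT,      -- 1st operand
            cell2 TEXT, molecule2 TEXT, type2 TEXT,      -- 2nd operand
            operator TEXT,
            res_cell TEXT, res_molecule TEXT, res_type TEXT, res_state TEXT
        )
        """
    )
    # Two rows taken from Table 1; None stands for a cell left empty in the table.
    rows = [
        ("liver", "insulin receptor", "receptor", None, "insulin", "circulating",
         "transduction", "liver", "glycogen-synthase", "cytoplasmic", "activated"),
        ("liver", "glucagon receptor", "receptor", None, "glucagon", "circulating",
         "transduction", "liver", "glycogen-phosphorylase", "cytoplasmic", "activated"),
    ]
    conn.executemany(
        "INSERT INTO binding_operator VALUES (?,?,?,?,?,?,?,?,?,?,?)", rows
    )
    return conn


conn = build_knowledge_base()
# Which molecules does circulating insulin activate, according to the stored rows?
activated = conn.execute(
    "SELECT res_molecule FROM binding_operator "
    "WHERE molecule2 = ? AND res_state = 'activated'",
    ("insulin",),
).fetchall()
print(activated)
```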
The current state of the simulation starts as the initial state of the simulation: the list of cells, each described by the list of its receptors and its cytoplasmic proteins, as well as the list of circulating molecules.
In each cycle, the simulation considers the permitted bindings based on the current state and applies the resulting actions to the current state, such as: addition of a circulating / cytoplasmic / receptor protein (expression of a gene); change in state of a protein (activation / inactivation); addition of a cell (cell division); removal of a cell (cell destruction).
The simulation uses SQL, which provides a better simulation of parallelism.
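One way to picture a cycle is sketched below, again with Python's `sqlite3`. It is a simplified illustration and not the original simulator: the tables `current_state` and `rule`, their columns, and the function `run_cycle` are hypothetical, operands are matched on molecule name only, and only the "change in state" action is covered. All permitted bindings are read in a single query before any update is applied, so every binding of the cycle sees the same state; this is the sense in which a set-based query can stand in for parallelism here.

```python
import sqlite3


def make_simulation() -> sqlite3.Connection:
    """Build a hypothetical initial state and two rules taken from Table 1."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE current_state (cell TEXT, molecule TEXT, type TEXT, state TEXT);
        CREATE TABLE rule (molecule1 TEXT, molecule2 TEXT, operator TEXT,
                           res_cell TEXT, res_molecule TEXT, res_type TEXT, res_state TEXT);
        """
    )
    conn.executemany("INSERT INTO current_state VALUES (?,?,?,?)", [
        ("liver", "insulin receptor", "receptor", None),
        (None, "insulin", "circulating", None),
        ("liver", "glycogen-synthase", "cytoplasmic", "inactivated"),
        ("liver", "glycogen-phosphorylase", "cytoplasmic", "activated"),
    ])
    conn.executemany("INSERT INTO rule VALUES (?,?,?,?,?,?,?)", [
        ("insulin receptor", "insulin", "transduction",
         "liver", "glycogen-synthase", "cytoplasmic", "activated"),
        ("insulin receptor", "insulin", "transduction",
         "liver", "glycogen-phosphorylase", "cytoplasmic", "inactivated"),
    ])
    return conn


def run_cycle(conn: sqlite3.Connection) -> list:
    """Apply every rule whose two operands are present in the current state."""
    # Step 1: collect all permitted bindings from one snapshot of the state.
    results = conn.execute(
        """
        SELECT DISTINCT r.res_cell, r.res_molecule, r.res_type, r.res_state
        FROM rule AS r
        JOIN current_state AS a ON a.molecule = r.molecule1
        JOIN current_state AS b ON b.molecule = r.molecule2
        """
    ).fetchall()
    # Step 2: apply the resulting actions (update the state, or add the molecule).
    for cell, molecule, mol_type, state in results:
        updated = conn.execute(
            "UPDATE current_state SET state = ? WHERE cell IS ? AND molecule = ?",
            (state, cell, molecule),
        ).rowcount
        if updated == 0:
            conn.execute(
                "INSERT INTO current_state VALUES (?,?,?,?)",
                (cell, molecule, mol_type, state),
            )
    return results


conn = make_simulation()
run_cycle(conn)
final = dict(conn.execute(
    "SELECT molecule, state FROM current_state WHERE type = 'cytoplasmic'"
).fetchall())
print(final)
```

In this sketch both insulin rules fire in the same cycle, so glycogen-synthase is activated and glycogen-phosphorylase is inactivated together. Cell division and cell destruction would need additional actions that are not shown.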
Figure 1 A model of a signal pathway
Figure 2 Key enzymes in regulation of sugar in the liver
Table 1 Signal pathway in the regulation of sugar metabolism in the liver
| 1st operand cell | 1st operand molecule | 1st operand type | 2nd operand cell | 2nd operand molecule | 2nd operand type | Operator | Result cell | Result molecule | Result type | Result state |
| | glucose | | | | | Expression | | insulin | circulating | |
| liver | insulin receptor | receptor | | insulin | circulating | transduction | liver | glycogen-synthase | cytoplasmic | activated |
| liver | glycogen-synthase | | liver | | | metabolic pathway | liver | glycogen | cytoplasmic | |
| liver | insulin receptor | receptor | | insulin | circulating | transduction | liver | glycolysis key enzymes | cytoplasmic | activated |
| liver | glycolysis key enzymes | | liver | | | metabolic pathway | liver | pyruvate | cytoplasmic | |
| liver | insulin receptor | receptor | | insulin | circulating | transduction | liver | gluconeogenesis key enzymes | cytoplasmic | inactivated |
| liver | insulin receptor | receptor | | insulin | circulating | transduction | liver | glycogen-phosphorylase | cytoplasmic | inactivated |
| | | | | | | Expression | | glucagon | circulating | |
| liver | glucagon receptor | receptor | | glucagon | circulating | transduction | liver | gluconeogenesis key enzymes | cytoplasmic | activated |
| liver | gluconeogenesis key enzymes | | liver | | | metabolic pathway | liver | glucose | circulating | |
| liver | glucagon receptor | receptor | | glucagon | circulating | transduction | liver | glycogen-phosphorylase | cytoplasmic | activated |
| liver | glycogen-phosphorylase | | liver | | | metabolic pathway | liver | glucose | circulating | |
| liver | glucagon receptor | receptor | | glucagon | circulating | transduction | liver | glycolysis key enzymes | cytoplasmic | inactivated |
| liver | glucagon receptor | receptor | | glucagon | circulating | transduction | liver | glycogen-synthase | cytoplasmic | inactivated |
Detail
| glycogenesis key enzymes | glycogen synthase |
| glycogenolysis key enzymes | glycogen phosphorylase |
| gluconeogenesis key enzymes | pyruvate carboxylase, PEP carboxykinase, fructose 1,6-bisphosphatase, glucose 6-phosphatase |
| glycolysis key enzymes | hexokinase, 6-phosphofructokinase, pyruvate kinase |
Detail of glucagon transduction pathway
| 1st operand cell | 1st operand molecule | 1st operand type | 2nd operand cell | 2nd operand molecule | 2nd operand type | Operator | Result cell | Result molecule | Result type | Result state |
| liver | glucagon | circulating | liver | glucagon receptor | receptor | binding | liver | glucagon receptor | receptor | activated |
| liver | glucagon receptor | receptor | liver | G protein | cytoplasmic | binding | liver | G protein | cytoplasmic | activated |
| liver | G protein | cytoplasmic | liver | adenylate cyclase | cytoplasmic | binding | liver | adenylate cyclase | cytoplasmic | activated |
| liver | adenylate cyclase | cytoplasmic | liver | ATP | cytoplasmic | binding | liver | cAMP | cytoplasmic | activated |
| liver | cAMP | cytoplasmic | liver | protein kinase A | cytoplasmic | binding | liver | protein kinase A | cytoplasmic | activated |
| liver | protein kinase A | cytoplasmic | liver | phosphorylase-kinase | cytoplasmic | binding | liver | phosphorylase-kinase | cytoplasmic | activated |
| liver | phosphorylase-kinase | cytoplasmic | liver | glycogen phosphorylase | cytoplasmic | binding | liver | glycogen phosphorylase | cytoplasmic | activated |
| liver | phosphorylase-kinase | cytoplasmic | liver | glycogen synthase | cytoplasmic | binding | liver | glycogen synthase | cytoplasmic | inactivated |
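The rows of this detail table can be read as a chain in which each activated result becomes the first operand of a later binding. The Python sketch below follows that chain; the list `CASCADE`, the function `transduce`, and the rule that a step fires only when its first operand is activated and its second operand is present are assumptions made for illustration, not part of the original simulator. Cyclic AMP is written `cAMP` in the code.

```python
# Each step: (first operand, second operand, resulting molecule, resulting state),
# transcribed from the glucagon transduction table (all molecules are in the liver cell).
CASCADE = [
    ("glucagon", "glucagon receptor", "glucagon receptor", "activated"),
    ("glucagon receptor", "G protein", "G protein", "activated"),
    ("G protein", "adenylate cyclase", "adenylate cyclase", "activated"),
    ("adenylate cyclase", "ATP", "cAMP", "activated"),
    ("cAMP", "protein kinase A", "protein kinase A", "activated"),
    ("protein kinase A", "phosphorylase-kinase", "phosphorylase-kinase", "activated"),
    ("phosphorylase-kinase", "glycogen phosphorylase", "glycogen phosphorylase", "activated"),
    ("phosphorylase-kinase", "glycogen synthase", "glycogen synthase", "inactivated"),
]


def transduce(signal: str, present: set) -> dict:
    """Propagate a signal through CASCADE and return the state of each molecule reached."""
    states = {signal: "activated"}
    changed = True
    while changed:  # repeat until a full pass produces no new result
        changed = False
        for first, second, result, state in CASCADE:
            fires = states.get(first) == "activated" and second in present
            if fires and result not in states:
                states[result] = state
                changed = True
    return states


liver = {"glucagon receptor", "G protein", "adenylate cyclase", "ATP",
         "protein kinase A", "phosphorylase-kinase",
         "glycogen phosphorylase", "glycogen synthase"}
full = transduce("glucagon", liver)
print(full["glycogen phosphorylase"], full["glycogen synthase"])

# If one link is missing (here the G protein), the signal stops at the receptor.
blocked = transduce("glucagon", liver - {"G protein"})
print(sorted(blocked))
```

This is how a "transduction" operator in Table 1 could be expanded into its sub-automaton when all binding elements of the pathway are known.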
Table 2 Simplified model of the immune response
| 1st operand cell | 1st operand molecule | 1st operand type | 2nd operand cell | 2nd operand molecule | 2nd operand type | Operator | Result cell | Result molecule | Result type |
| Target cell | cell receptor | receptor | | antigen | circulating | Expression | infected cell | MHCI + antigen | receptor |
| Target cell | cell receptor | receptor | | antigen | circulating | | infected cell | antigen | receptor |
| B | antibody Ig | receptor | | antigen | circulating | Expression | B | MHCII + antigen | receptor |
| Th | T receptor | receptor | B | MHCII + antigen | receptor | Expression | Th | cytokine | circulating |
| B | cytokine receptor | receptor | | cytokine | circulating | Cell division | B | | |
| B | cytokine receptor | receptor | | cytokine | circulating | Expression | B | antibody | circulating |
| | antigen | circulating | | antibody | circulating | | | antigen + antibody | circulating |
| infected cell | antigen | circulating | | antibody | circulating | | infected cell | antigen + antibody | receptor |
| macrophage | M receptor | receptor | | antigen | circulating | Expression | macrophage | MHCII + antigen | receptor |
| macrophage | Fc receptor | receptor | | antigen + antibody | circulating | destruction antigen + antibody | macrophage | | |
| Th | T receptor | receptor | macrophage | MHCII + antigen | receptor | Expression | Th | cytokine | circulating |
| Th | cytokine receptor | receptor | | cytokine | circulating | Cell division | | | |
| Tc | cytokine receptor | receptor | | cytokine | circulating | Cell division | | | |
| Tc | T receptor | receptor | infected cell | MHCI + antigen | receptor | Cell destruction | | | |
| K | Fc receptor | receptor | infected cell | antigen + antibody | receptor | Cell destruction | | | |
References
- Atlan H (1990) The cellular computer DNA: program or data. Bull Math Biol 52(3), 335-348
- Bentolila S (1996) A grammar describing “biological binding operators” to model gene regulation. Biochimie 78, 335-350
- Chomsky N (1957) Syntactic Structures. Mouton
- Collado-Vides J (1991) The Search for a Grammatical Theory of Gene Regulation Is Formally Justified by Showing the Inadequacy of Context-free Grammars. Comput Applic Biosci 7(3), 321-326
- Collado-Vides J (1991) A syntactic Representation of Units of genetic information. J Theor Biol 148, 401-429
- Collado-Vides J (1993) A linguistic representation of the regulation of transcription initiation. BioSystems 29, 87-128
- Hofestadt R (1993) A simulation shell to model metabolic pathways. J Syst Analysis Modeling Simulation 11, 253-262
- Hofestadt R, Meineke F (1995) Interactive modelling and simulation of Biochemical networks. Comput Biol Med 25(3), 321-334
- Hopcroft J E, Ullman J D (1979) Introduction to automata theory, languages and computation. Addison Wesley
- Searls D B, Dong S (1993) in Proceedings of the 2nd International Conference on Bioinformatics, Supercomputing, and Complex Genome Analysis (Lim H A et al, eds) World Scientific, 89-101
- Trifonov E N (1993) DNA as a language, in Proceedings of the 2nd International Conference on Bioinformatics, Supercomputing, and Complex Genome Analysis (Lim H A et al, eds) World Scientific, 103-110