5.1 Digital Chromosome
1. Biological Genes
Organic living systems like mammals are made up of billions of cells, and at the center of each of those cells is a nucleus that contains a number of chromosomes. Each chromosome is a string of DNA which contains the genes that control the machinery of life. There are roughly 250 different kinds of cells in a human being. There are liver cells, neural cells, pancreas cells and so on. What determines whether a cell is a pancreas cell that produces insulin as opposed to a neural cell that generates electrical signals is differential gene expression. All of the cells have some basic genes that are turned on. But a liver cell will have a set of genes turned on that a neural cell will not, and vice versa. So what are genes and how do they work this miraculous feat of engineering? Of all of the code in human DNA only a small fraction, on the order of a few percent, codes for functioning genes. Much of the rest has no use that we have been able to clearly determine. So basically DNA has tiny sections in it that code for genes, surrounded by vast strings of apparent junk. Each of these genes can be transcribed and later translated to make proteins. It is these proteins that are nature's little nanomachines. They are the workers that keep the cell alive and determine its behavior. So a single gene usually codes for a single protein. However, genes are not always on. They can be flipped on and off. This flipping on and off is what causes the differential gene expression that makes a liver cell different from a neuron. In nature genes are flipped completely off by several different methods, but two of the most common are to either wrap the gene up so tightly that it can no longer be transcribed, or to methylate it. Methylation basically adds a methyl group onto the bases of the nucleotides so that they can no longer be read by the transcription machinery. To flip the gene back on you would unwind the gene or un-methylate it.
Gene expression rates can be regulated as well. In one instance you might want to really crank up expression of a gene, and in others you might want to reduce it back down to normal or temporarily inhibit expression altogether. This is done via transcription factors and enhancer regions. Enhancer sites are places around a gene where special proteins called transcription factors can bind. When they attach they can either enhance or inhibit the transcription of that gene. So a simple definition of a gene is that it is a portion of a chromosome that produces a protein, and that can often be regulated or switched.
2. Digital Genes
The digital genes in this system attempt to keep the basic principles of natural genes from organic systems while simplifying
them enough that they can be processed in a reasonable time frame on current computer systems. Each chromosome is a
string of binary numbers. It is also split up into a number of different genes. There is no fixed number of genes per
chromosome and there is no fixed binary size for any one gene or protein. It is possible to have one chromosome with
10 genes that is 123 bytes long and another chromosome with 15 genes that is 90 bytes long. Unlike organic genes, though,
there are no junk sequences between genes. Every bit in a digital chromosome is part of a gene. Each gene is composed of
a number of different pieces.
The enhancer count tells how many enhancers there are for this gene. Next is the list of enhancers. If the enhancer count is zero then the list of enhancers is empty. After that come the controller count and the list of controllers. Controllers and enhancers will be discussed in more detail in later sections. After the gene regulation sites comes the protein type. This is a value that determines what type of protein this gene expresses and also allows for the loading of that specific type of protein. Finally there is the data that specifies the expressed protein itself. The size of the data needed to describe a protein differs from one protein type to another. But since we already know the type of protein that is being loaded, and each protein type knows how to load itself, this is not a problem here. However, having differently sized genes and chromosome strings will have an effect on how evolutionary operations like crossover work, and these problems will be discussed later. After the expressed protein is defined, a new gene begins. If at any time during the loading of a gene the end of the chromosome is reached, then that gene is discarded and the chromosome is ended.
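The loading order described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the loader used by this system: the one-byte field widths, the `PROTEIN_DATA_SIZE` table, the modulo folding of the protein type, and all class and function names are hypothetical. The text only fixes the order of the fields and the rule that a gene cut off by the end of the chromosome is discarded.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical table: protein type -> number of data bytes that type consumes.
# The text only says that each protein type knows how to load itself.
PROTEIN_DATA_SIZE = {0: 2, 1: 4, 2: 1}


@dataclass
class Gene:
    enhancers: List[int]
    controllers: List[int]
    protein_type: int
    protein_data: bytes


class EndOfChromosome(Exception):
    """Raised when a read would run past the end of the chromosome."""


class Reader:
    """Sequential reader over the chromosome bytes."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0

    def take(self, n: int) -> bytes:
        if self.pos + n > len(self.data):
            raise EndOfChromosome
        chunk = self.data[self.pos:self.pos + n]
        self.pos += n
        return chunk

    def byte(self) -> int:
        return self.take(1)[0]


def load_gene(r: Reader) -> Gene:
    enhancers = list(r.take(r.byte()))    # enhancer count, then the enhancer list
    controllers = list(r.take(r.byte()))  # controller count, then the controller list
    # Assumption: any byte value is folded onto a known protein type.
    protein_type = r.byte() % len(PROTEIN_DATA_SIZE)
    protein_data = r.take(PROTEIN_DATA_SIZE[protein_type])
    return Gene(enhancers, controllers, protein_type, protein_data)


def load_chromosome(data: bytes) -> List[Gene]:
    r = Reader(data)
    genes = []
    while r.pos < len(data):
        try:
            genes.append(load_gene(r))
        except EndOfChromosome:
            break  # the partial gene is discarded and the chromosome ends
    return genes


# Two complete genes followed by a truncated third one.
chromosome = bytes([2, 0xA1, 0xB2, 0, 1, 9, 9, 9, 9,
                    0, 1, 0x07, 0, 5, 5,
                    1, 0xFF])
genes = load_chromosome(chromosome)
```

Because the protein type is read before the protein data, the loader always knows how many bytes to consume next, which is why variable gene sizes cause no trouble at load time.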
One of the key concepts used in this system is that of the BindingID. This is basically a binary tag that is used to match up two interlocking units. A good example of this is the interaction between a transcription factor and an enhancer. Transcription factors will be discussed in more detail later. What is important to understand at this time is that each protein, including a transcription factor, has a binding ID site. Also, genes have a list of enhancers associated with them. If the binding ID of a transcription factor matches the binding ID of an enhancer, then the transcription factor binds to the gene and affects its regulation in some way. Currently the two binding IDs have to match exactly. Even a one-bit difference will keep the factor from affecting the regulation of the gene. One thing that should be obvious, though, is that a given factor with a binding ID can affect a number of different genes. And a number of different proteins can have the same binding ID but different effects. You might have two factors with the same binding ID where one of them up-regulates the gene and the other down-regulates it. This allows for very complex and non-linear behavior.
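The exact-match rule can be illustrated with a small Python sketch. The names `Factor`, `RegulatedGene`, `binds` and `net_regulation` are hypothetical, and summing signed effects is just one simple way to combine competing factors; the text does not say how this system combines them.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Factor:
    binding_id: int
    effect: float  # assumption: > 0 up-regulates, < 0 down-regulates


@dataclass
class RegulatedGene:
    name: str
    enhancers: List[int]  # binding IDs of this gene's enhancer sites


def binds(factor: Factor, gene: RegulatedGene) -> bool:
    # Exact match only: a single differing bit means no binding.
    return factor.binding_id in gene.enhancers


def net_regulation(factors: List[Factor],
                   genes: List[RegulatedGene]) -> Dict[str, float]:
    # Assumption: effects of all bound factors simply add up.
    return {g.name: sum(f.effect for f in factors if binds(f, g)) for g in genes}


factors = [
    Factor(0b1011, +1.0),  # up-regulator
    Factor(0b1011, -0.5),  # same binding ID, opposite effect
    Factor(0b1010, +2.0),  # one bit off, so it binds nowhere below
]
genes = [
    RegulatedGene("g1", [0b1011]),
    RegulatedGene("g2", [0b1011, 0b0001]),
    RegulatedGene("g3", [0b0110]),
]
result = net_regulation(factors, genes)
```

Here the first two factors both bind `g1` and `g2` and partly cancel each other, while the third factor, which differs by a single bit, regulates nothing.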
4. Genetic Modularity
Chromosomes and genes are at the heart of this developmental system. The guiding principle behind all of this work is that I believe it is possible to evolve chromosomes that direct the growth of neural networks which control the behavior of simulated, and some day real, robotic systems. These chromosomes will not be evolved as complete systems. The idea is to evolve specific, modular sets of genes that perform some function and then begin combining them in a guided way so that they work together to produce a neural network. An example of this is pattern formation like the achaete-scute gene system discussed later on this site. This is a module of genes that could be controlled with a simple switch. Once that switch is thrown in a given area of the developing brain it will produce that pattern. Then, depending on previous differentiation events, the resulting pattern can be used in the next step of formation. But that same pattern can be reused by different systems for different purposes even though the module contains the same basic genes. In software this is called modular reusability, and genes of this type are well suited to this function. Others have attempted to use evolution and genetic algorithms to grow neural networks. However, in virtually all of those cases the structure of the neural network was explicitly laid out in the genetic code: neuron A connects to B with a defined strength of 10, and so on. But this is the exact opposite of having reusable modules, and it is not how nature performs the task. A genetic encoding scheme of this type will not allow you to reuse any module and forces you to explicitly define every connection. That means that as the network increases in complexity you get a corresponding increase in the size and complexity of the chromosome needed to grow it. But nature's solution is much more elegant and does not require this.
In the achaete-scute system mentioned above, the developing brain does not care if that module is turned on for a 10x10x10 grid or a 100x100x100 grid. It will still produce the same pattern regardless. All that is changed is the size of the area where those genes are flipped on. And since the following steps in development are based on the results of previous elastic steps and the results of the newly formed pattern, they will also work correctly with the larger area. This also demonstrates two other important points about using genes in this manner instead of explicitly encoding the network structure. Those principles are resiliency and scalability. This type of genetic encoding is more resilient because mutations can be more easily adapted to. For example, let's say we have a section of a brain where the pattern formation needed for the next step is dependent on the diffusion of some ligand. But that ligand is mutated so that it degrades faster than before and cannot diffuse as far now. This mutation does not break the system and will usually have very little effect on the development of the brain. It will simply alter the formation of the pattern, making it more compact than before. This more compact pattern may in fact prove to be more beneficial. If so, then that is evolution at work and it will probably be kept. The system is more scalable because it is easier to go from 10 cells to 100 or 1000. It is the preceding pattern formation that is crucial to determining what the future actions will be. The network does not care how many total cells are involved, only that the overall relations between sections stay the same.
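The ligand example can be made concrete with a toy model. The sketch below is not taken from this system: it assumes a one-dimensional row of cells, a ligand source at position 0, and the textbook steady-state gradient for diffusion with linear degradation, c(x) = c0 * exp(-x / sqrt(D / k)). The function names and parameter values are hypothetical.

```python
import math
from typing import List


def pattern_extent(source_conc: float, threshold: float,
                   diffusion: float, degradation: float) -> float:
    """Distance from the source at which the ligand drops to the threshold."""
    if source_conc <= threshold:
        return 0.0
    length_scale = math.sqrt(diffusion / degradation)
    return length_scale * math.log(source_conc / threshold)


def expressed_cells(n_cells: int, extent: float) -> List[bool]:
    """Which cells in a row of n_cells see enough ligand to switch on."""
    return [x <= extent for x in range(n_cells)]


# Original ligand versus a mutant that degrades four times faster.
normal = pattern_extent(1.0, 0.1, diffusion=1.0, degradation=1.0)
mutant = pattern_extent(1.0, 0.1, diffusion=1.0, degradation=4.0)

# The same parameters applied to a small and a large row of cells.
small_row = expressed_cells(10, normal)
large_row = expressed_cells(100, normal)
```

In this toy model the faster-degrading mutant yields a pattern half as wide rather than no pattern at all, and the number of switched-on cells is the same in the 10-cell row as in the 100-cell row, since nothing in the rule depends on the total cell count.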
5. Chromosome Overview
Chromosomes and genes that use the same kinds of epigenetic processes as natural development are one of the cornerstones of this entire endeavor. Neural network systems are so complex that actually engineering one from scratch is simply beyond the abilities of the human mind. Yes, simple systems can be built by hand that contain a hundred or so nodes and a number of interconnections. But these simple systems are orders of magnitude less complex than even the brain of a simple insect like the ant. We need to be able to harness the power of nature's own search engine if we are ever to be able to build networks of sufficient complexity. But as discussed above, simple genetic encoding schemes are doomed to failure as well. Again we need to learn from nature. And nature uses a highly dynamic system of development that eliminates those problems. This system is a first step in the direction of trying to emulate nature. Hopefully, by evolving gene modules that can perform certain developmental tasks, and then combining them so that they work together to produce more complex systems, we can begin to produce systems that at least approach the level of intelligence of insects.