Visualizing genomes in genetic programming sheds light on how information is shared
Genetic programming (GP) is premised on the idea that sharing the building blocks of good solutions across a population is an effective way to search the space of candidate solutions for a given problem. Most of our understanding of how building blocks are shared comes from schema theory or from macro-level analysis of populations across several GP trials. In addition to these approaches, a lot can be learned by simply visualizing the genomes in single runs of GP and looking at how information is shared. As I show here, it can be especially helpful to visualize different methods side by side to gain a qualitative feel for how they shape population dynamics. In this case I look at three methods: tournament selection, deterministic crowding, and age-fitness Pareto optimization.
We can think of the spread of information through an evolving population as a diffusion process, so I used Paraview, a visualization tool that is mostly used to animate fluids, to look at GP populations. I used the Tower problem as an example problem. The visualizations are single runs with a population of 250, using ellen with uniform alternation and mutation.
To visualize the population, each gene (program building block) is assigned a color. The programs are represented as linear genotypes (post-fix notation equations) with a head (the top or root of the program) and a tail (the end of the program). In the following videos, each vertical line is a single program, and the programs are sorted by fitness, with fitter programs to the right.
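The encoding described above can be sketched in a few lines. This is a minimal illustration, not the code used to produce the videos: the function name `population_to_grid`, the padding value, and the convention that a lower fitness value means a better program are all assumptions made for the example. It turns a population into a grid of integer gene ids (one column per program, best program rightmost) that any plotting tool could then map to colors.

```python
def population_to_grid(population, fitnesses, pad=-1):
    """Return (grid, gene_ids); grid[row][col] holds the gene id at that position.

    Hypothetical helper. Assumes lower fitness values are better (an error measure).
    """
    # Sort worst-to-best so the fittest program lands in the rightmost column.
    order = sorted(range(len(population)), key=lambda i: fitnesses[i], reverse=True)

    # Give every distinct gene a stable integer id; a colormap turns ids into colors.
    gene_ids = {}
    for program in population:
        for gene in program:
            gene_ids.setdefault(gene, len(gene_ids))

    # Programs differ in length, so pad shorter ones to the longest program.
    height = max(len(program) for program in population)
    columns = []
    for i in order:
        column = [gene_ids[gene] for gene in population[i]]
        column += [pad] * (height - len(column))
        columns.append(column)

    # Transpose columns into rows so the result can be drawn as an image.
    grid = [list(row) for row in zip(*columns)]
    return grid, gene_ids


population = [("x", "y", "+"), ("x", "x", "*", "y", "+")]
fitnesses = [0.5, 0.1]  # the second program is fitter, so it goes on the right
grid, gene_ids = population_to_grid(population, fitnesses)
```

Rows follow the stored order of each genotype; which end is drawn at the bottom of the plot is a display choice left to the plotting step.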
There are a few things we should expect to see. First, the population should become more homogeneous over several generations, as it converges in solution space. Second, fitter programs should diffuse their building blocks to the rest of the population. Third, from previous research we know that the heads (roots) of programs in a population tend to converge on similar forms before the tails do, so we expect to see more homology (similarity) at the bottoms of the graphs.
Tournament Selection (size 2)
Each generation, for each parent selection, two individuals are randomly chosen from the population. The fitter individual becomes a parent. This repeats until all of the parents for the next generation are chosen.
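The selection loop just described can be sketched as follows. This is a minimal example under stated assumptions: the function name `tournament_select` is made up for illustration, contestants are drawn without replacement, and a lower fitness value is taken to mean a fitter individual.

```python
import random


def tournament_select(population, fitnesses, n_parents, size=2, rng=random):
    """Pick n_parents individuals, each the winner of a random tournament.

    Hypothetical helper. Assumes lower fitness values are better.
    """
    parents = []
    for _ in range(n_parents):
        # Draw `size` distinct contestants at random.
        contestants = rng.sample(range(len(population)), size)
        # The fittest contestant becomes a parent.
        winner = min(contestants, key=lambda i: fitnesses[i])
        parents.append(population[winner])
    return parents


rng = random.Random(0)
population = ["a", "b", "c", "d"]
fitnesses = [3.0, 2.0, 1.0, 0.0]
parents = tournament_select(population, fitnesses, n_parents=10, size=2, rng=rng)
```

With size 2 and sampling without replacement, the single worst individual can never win a tournament, while every other individual has some chance of becoming a parent. That mild selection pressure is one reason building blocks from fit programs spread steadily rather than instantly.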
Tournament selection is the most common type of selection method used in GP. In this run, we see several things. First, the heads of the programs become similar quickly, across the entire population. Homology continues to increase as the generations increase, with genes and combinations of genes diffusing from the fit individuals on the right to the less fit individuals on the left. The sizes of the programs become quite similar by the end as well.
Deterministic Crowding
Each generation, parents are chosen at random to produce children. If a child is fitter than the parent it is most similar to genetically, it replaces that parent. Otherwise the child is discarded.
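One mating event under this scheme might look like the sketch below. Several pieces are assumptions for illustration only: the position-wise `distance` measure, the `vary` callback standing in for crossover and mutation, the convention that lower fitness is better, and the pairing rule (children are matched to parents so that the total parent-child distance is smallest, as in the common formulation of deterministic crowding).

```python
import random


def distance(a, b):
    """Assumed similarity measure: number of positions where two genomes differ."""
    shared = sum(1 for x, y in zip(a, b) if x == y)
    return max(len(a), len(b)) - shared


def crowding_step(population, fitness_fn, vary, rng=random):
    """Run one mating event, modifying `population` in place.

    Hypothetical helper. `vary(p1, p2)` must return two children.
    Assumes lower fitness values are better.
    """
    i, j = rng.sample(range(len(population)), 2)
    child_a, child_b = vary(population[i], population[j])

    # Match each child with the parent it resembles most.
    straight = distance(population[i], child_a) + distance(population[j], child_b)
    crossed = distance(population[i], child_b) + distance(population[j], child_a)
    if straight <= crossed:
        pairs = [(i, child_a), (j, child_b)]
    else:
        pairs = [(i, child_b), (j, child_a)]

    # A child replaces its matched parent only if it is strictly fitter.
    for slot, child in pairs:
        if fitness_fn(child) < fitness_fn(population[slot]):
            population[slot] = child


population = [(9, 9), (0, 0)]
crowding_step(
    population,
    fitness_fn=sum,
    vary=lambda p1, p2: ((9, 8), (5, 5)),  # fixed children, for demonstration
    rng=random.Random(0),
)
```

Because a slot is only ever overwritten by a fitter and similar child, the fitness in each slot can never get worse, and distinct niches are not overrun by a single dominant lineage.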
Deterministic crowding is a diversity preservation technique that only allows new individuals to compete with the parent they are most similar to (thus forming niches). It only replaces parents with fitter children, and this is clear from the video, in which individuals appear to move to the right. In fact, these are improved offspring replacing their parents! It is also clear that the updating process slows considerably towards the end of the run, which indicates that fitter individuals are becoming more difficult to find with the search operators in place. Overall, this method produces much less homogeneity throughout the population.
Age-fitness Pareto Optimization
This one is a bit more complicated. Individuals are assigned an age based on the number of generations since their oldest ancestor. Individuals survive based on youth and fitness. A new individual is introduced each generation.
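Survival on two criteria is usually expressed with Pareto dominance: one individual dominates another if it is no worse on both age and fitness and strictly better on at least one. The sketch below is a simplification for illustration, not the exact survival procedure used in the runs shown here. The `Individual` type, the function names, and the use of an error value (lower is better) are assumptions, and it simply drops every dominated individual rather than trimming the population to a fixed size.

```python
from typing import NamedTuple


class Individual(NamedTuple):
    """Hypothetical record for one program in the population."""
    genome: tuple
    age: int      # generations since the oldest ancestor
    error: float  # lower is better


def dominates(a, b):
    """True if `a` is at least as young and as fit as `b`, and better on one."""
    no_worse = a.age <= b.age and a.error <= b.error
    strictly_better = a.age < b.age or a.error < b.error
    return no_worse and strictly_better


def survivors(population):
    """Keep only individuals that nobody else dominates."""
    return [p for p in population if not any(dominates(q, p) for q in population)]


population = [
    Individual(("a",), age=0, error=9.0),   # brand-new random program
    Individual(("b",), age=5, error=2.0),
    Individual(("c",), age=5, error=3.0),   # dominated by b (same age, worse error)
    Individual(("d",), age=10, error=1.0),
    Individual(("e",), age=10, error=2.0),  # dominated by b (older, no better)
]
kept = survivors(population)
```

Note that the new age-0 individual survives despite its poor error, because nothing in the population is both at least as young and at least as fit. That protection is what gives fresh genetic material time to improve before it has to compete with older lineages.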
Age-fitness Pareto optimization is based on the idea of having individuals compete with other individuals that have had the same amount of time to evolve, so individuals can only be kicked out of the population by other individuals that dominate them on the metrics of age and fitness. By introducing a new individual each generation, randomized genetic material in the form of a new program is continually added. These individuals can be seen in the video arriving on the left side of the population. Here we see quite a bit more homology than with deterministic crowding, but notice that it appears in columns or sections with sharp divisions. These divisions are the effect of stratifying the population by age. The video below makes this clearer by including an axis for age when visualizing the same run.
3D Age-Fitness Pareto Optimization
Same as above but with an age axis.
From this video it is clear that building blocks are shared across individuals of similar age. The dynamics of this method are quite interesting: information in the form of older, evolved building blocks diffuses to the left from older programs, while new, diverse programs are pumped into the population from the left, resulting in a balance between diversity preservation and highly homogenized pockets of similarly-aged individuals. Perhaps this combination helps explain why age has made such a strong improvement to genetic programming.