Evol is a clear DSL for composable evolutionary algorithms, optimised for joy.

Overview

Evol is a clear DSL for composable evolutionary algorithms, optimised for joy.

Installation

We currently support Python 3.6 and Python 3.7, and you can install it via pip:

pip install evol

Documentation

For more details you can read the docs, but we advise everyone to get started by first checking out the examples in the /examples directory. These standalone examples should show the spirit of usage better than the docs.

The Gist

The main idea is that you should be able to define a complex algorithm in a composable way. To explain what we mean by this, let's consider two evolutionary algorithms for travelling salesman problems.

The first approach takes a collection of solutions and applies:

  1. a survival step where only the top 50% of solutions survive
  2. a reproduction step where the population reproduces using a crossover of genes
  3. a mutation step where certain members mutate
  4. repeat this, maybe 1000 times or more!

We can also think of another approach:

  1. pick the best solution of the population
  2. make random changes to this parent and generate new solutions
  3. repeat this, maybe 1000 times or more!

One could even combine the two algorithms into a new one:

  1. run algorithm 1 50 times
  2. run algorithm 2 10 times
  3. repeat this, maybe 1000 times or more!

You might notice that many parts of these algorithms are similar, and it is the goal of this library to automate those parts. We hope to provide an API that is fun to use and makes it easy to tweak your heuristics.

A working example of something similar to what is depicted above is shown below. You can also find this code as an example in /examples/simple_nonlinear.py.

import random
from evol import Population, Evolution

random.seed(42)

def random_start():
    """
    This function generates a random (x,y) coordinate
    """
    return (random.random() - 0.5) * 20, (random.random() - 0.5) * 20

def func_to_optimise(xy):
    """
    This is the function we want to optimise (maximize)
    """
    x, y = xy
    return -(1-x)**2 - 2*(2-x**2)**2

def pick_random_parents(pop):
    """
    This is how we are going to select parents from the population
    """
    mom = random.choice(pop)
    dad = random.choice(pop)
    return mom, dad

def make_child(mom, dad):
    """
    This function describes how two candidates combine into a new candidate
    Note that the output is a tuple, just like the output of `random_start`
    We leave it to the developer to ensure that chromosomes are of the same type
    """
    child_x = (mom[0] + dad[0])/2
    child_y = (mom[1] + dad[1])/2
    return child_x, child_y

def add_noise(chromosome, sigma):
    """
    This is a function that will add some noise to the chromosome.
    """
    new_x = chromosome[0] + (random.random()-0.5) * sigma
    new_y = chromosome[1] + (random.random()-0.5) * sigma
    return new_x, new_y

# We start by defining a population with candidates.
pop = Population(chromosomes=[random_start() for _ in range(200)],
                 eval_function=func_to_optimise, maximize=True)

# We define a sequence of steps to change these candidates
evo1 = (Evolution()
       .survive(fraction=0.5)
       .breed(parent_picker=pick_random_parents, combiner=make_child)
       .mutate(func=add_noise, sigma=1))

# We define another sequence of steps to change these candidates
evo2 = (Evolution()
       .survive(n=1)
       .breed(parent_picker=pick_random_parents, combiner=make_child)
       .mutate(func=add_noise, sigma=0.2))

# We are combining two evolutions into a third one. You don't have to
# but this approach demonstrates the flexibility of the library.
evo3 = (Evolution()
       .repeat(evo1, n=50)
       .repeat(evo2, n=10)
       .evaluate())

# In this step we are telling evol to apply the evolutions
# to the population of candidates.
pop = pop.evolve(evo3, n=5)
print(f"the best score found: {max([i.fitness for i in pop])}")

Getting Started

The best place to get started is the /examples folder on GitHub. This folder contains self-contained examples that work out of the box.

How does it compare to ...

  • ... deap? We think our library is more composable and Pythonic while not removing any functionality. Our library may be a bit slower, though.
  • ... hyperopt? Since we force the user to write the actual algorithm, we are less of a black box. Hyperopt is meant for hyperparameter tuning in machine learning and has better support for search in scikit-learn.
  • ... inspyred? That library offers a simple way to get started, but the project seems less actively maintained than ours.
Comments
  • How to set up evolution steps

    The evolution will contain a chain of steps, which can be applied in sequence to a population. In principle these steps need to be no more than functions - we could simply do

    for step in self.chain:
        population = step(population)
    

    These step functions would simply look like:

    def step(population):
        return population.evaluate()
    

    but we shouldn't replace them with lambdas, for the sake of pickling.

    In order to provide arguments to the methods of the population inside such functions, we could re-define the functions with the proper arguments each time we use them.

    Now, on the other hand, it would be nice to be able to give a name to such a step, for example to identify the step in logging or to make it easier to debug. We could achieve that by creating a namedtuple with name and function fields. Once we have a namedtuple, we could decide not to bake the arguments into the function, but to put them in their own field. The application of the steps would then look like

    for step in self.chain:
        population = step.function(population, **step.kwargs)
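
    For concreteness, a minimal sketch of such a namedtuple-based step (the names and the lazy kwarg here are illustrative only, not a proposed final API):

    from collections import namedtuple

    Step = namedtuple("Step", ["name", "function", "kwargs"])

    def evaluate_population(population, **kwargs):
        # a named function rather than a lambda, so the step stays picklable
        return population.evaluate(**kwargs)

    step = Step(name="evaluate", function=evaluate_population, kwargs={"lazy": False})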
    

    We can take it one step further by defining an EvolutionStep class and making classes for each of the step types that inherit from that class. This would look like:

    class EvolutionStep:
    
        def __init__(self, name, **kwargs):
            self.name = name
            self.kwargs = kwargs
    
    
    class EvaluationStep(EvolutionStep):
    
        def __init__(self, name, lazy=False):
            EvolutionStep.__init__(self, name=name, lazy=lazy)
    
        def apply(self, population):
            return population.evaluate(**self.kwargs)
    

    which has the very nice property that the arguments are well taken care of, as the function is only defined once (as a method on the step class). In this case the Evolution.evaluate method can look as simple as

    def evaluate(self, name=None, lazy=False):
        result = copy(self)  # assumes `from copy import copy`
        result.chain.append(EvaluationStep(name=name, lazy=lazy))
        return result
    

    On the other hand we will need to define a separate Step class for each operation.

    @koaning, any preference?

    opened by rogiervandergeer 16
  • Islands

    I've created a pull request for the islands (#37), but we still need to do some work before we can merge this. In this issue I'll explain some of the changes I've made, and I have a few design decisions to make.

    Let's start at the beginning; we have a Population, which is a collection of Individuals on which we can perform a multitude of operations (evaluate, mutate, etc). The Population can also take an Evolution object and perform a series of such operations.

    Now we want to introduce islands, where the population is split into several subgroups, on each of which we can perform the same operations as on the Population. So we add two (or three, see below) operations: split and combine, with which we can split a population into multiple islands and then merge these islands back into a single population.

    I have assumed that we will always want the islands to go through the same evolution. I.e. if we split a population into two islands and then apply an evolution on those islands, the evolution will be the same for both islands. Note that we can pass an island-id to the combiner, picker and mutator functions such that these behave differently on each island - but we assume that the order of the steps is the same.

    Question 1: is that a reasonable assumption?

    If we always apply an operation to all islands at the same time, then the interface to the islands will be the same as to the Population: calling mutate on a group of islands will simply call mutate on each island. The same goes for all other operations (except split and combine). Hence a group of islands behaves exactly the same as a Population, and we can apply the same Evolution to each (as long as we do not use split or combine). Since the interface to a group of islands is so close to that of a Population, it makes sense to name the group of islands an IslandPopulation.

    Now we have Population, which is a collection of Individuals, and IslandPopulation, which has the same interface as Population (except, again, for split and combine) and which is a collection of Populations. Since Population and IslandPopulation have mostly the same interface it makes sense for them to be of the same type, but as IslandPopulation contains Populations, they cannot inherit from each other. Therefore I've created an abstract base class called PopulationBase of which both the Population and the IslandPopulation inherit. (And since ContestPopulation inherits from Population, also that inherits from PopulationBase.) The great thing about this is that IslandPopulation no longer has to be a collection of Population objects per se, but can be a collection of PopulationBase objects - making it a collection of any type of population.
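
    To make the delegation idea concrete, here is a minimal, self-contained sketch (hypothetical names, not the evol API): a group of islands exposes the same kind of interface as a single population and simply forwards each operation to every island.

    class IslandGroup:
        def __init__(self, islands):
            # each island is represented here as a plain list of chromosomes
            self.islands = islands

        def mutate(self, func, **kwargs):
            # apply the same mutation to every island, as a Population would apply it to itself
            self.islands = [[func(c, **kwargs) for c in island] for island in self.islands]
            return self

        def combine(self):
            # merge the islands back into one flat collection of chromosomes
            return [c for island in self.islands for c in island]

    group = IslandGroup([[1.0, 2.0], [3.0, 4.0]])
    merged = group.mutate(lambda c, delta: c + delta, delta=0.1).combine()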

    The above means that operations like mutate and evaluate can be performed on any object of type PopulationBase. But I've had to make exceptions for split and combine, since these transform a Population into an IslandPopulation and vice versa. For example, calling split on a Population should clearly result in an IslandPopulation, while calling combine on an IslandPopulation should result in a Population (or ContestPopulation if it was a collection of those). Naturally, Population.combine should not exist (or throw an error, or do nothing). And now comes the difficult part:

    Question 2: what should IslandPopulation.split do? Should it

    1. Refuse to work, just like you cannot combine a Population.
    2. Pass the split to each island, returning the same IslandPopulation, now containing not Populations but IslandPopulations. E.g. A[1, 2, 3] -> A[1[a, b], 2[c, d], 3[e, f]]
    3. Split itself, such that it returns a new IslandPopulation, containing islands which contain the original Populations. E.g. A[1, 2, 3, 4] -> q[A[1, 2], B[3, 4]]
    4. Allow both options 2 and 3 using for example a flag pushdown or level.
    5. Split the internal populations such that the original IslandPopulation now contains more islands. E.g. A[1, 2] -> A[1a, 1b, 2a, 2b].

    If we choose one of options 2-4 we will have to deal with many layers of sub-islands. This makes functions like combine much more complicated, since we would have to specify which layer to combine (or always push down as far as possible) and verify that that layer exists. Also, if we allow sub-islands, we need to make sure that the depth of each branch stays the same. If someone manipulates an IslandPopulation to look like this: A[1, 2[c, d]], then what do we do if they request the second layer to be combined?

    Next, we need to decide the workings of split. Intuitively I find that split should split the population into several groups, such that each individual ends up in only one of them. I noticed that @koaning expects each individual to end up in each group, effectively duplicating the original Population. Since these are two fundamentally different operations, I propose we make both split() and duplicate() available.

    Question 3: do you agree?

    This does mean that we have to keep track of the original intended_size somehow. For example, suppose I have a Population of size 100. If I duplicate this into four groups, I end up with an IslandPopulation consisting of four islands with 400 individuals total. However, if I split it into four groups, I get an IslandPopulation consisting of four islands with 25 individuals each, and hence 100 total. If we recombine these, we want to obtain a Population with intended_size == 100. Of course, when we combine the duplicated islands we will have an initial population of size 400, but this should be reduced back to 100 (either immediately or after the first survive and breed steps).

    For the duplicated islands, it is clear that the intended_size of each island is 100. For the split islands, since their initial size is 25, this is not so clear.

    Question 4: which is the best option:

    1. When using split, the intended size of each island is a fraction of the original (25 in the example), and when we use duplicate the intended size of each island is the same as the original (100 in the example). We keep track of the original intended size of the whole population and use that when we combine.
    2. When using split or duplicate, the intended size of the islands is a fraction of the original (25 in the example). In the case of duplicate, each island will initially be much larger than intended.
    3. Same as 2, but in the case of duplicate, we first use survive to cut the original population back to 25 before we duplicate.
    4. When using split or duplicate, the intended size of the islands is the same as the original (100). In the case of split each island will initially be much smaller than intended.
    5. Same as 4, but in the case of split we call breed immediately after the split such that all populations are of the right size.

    Of course we may allow the user to override the intended size at any point.

    @koaning I'd love to hear your opinion

    opened by rogiervandergeer 13
  • Thoughts on Parallelism

    We already apply some performance tricks with the .evaluate() mechanic, but we may be able to add some form of parallelism/queueing to perhaps make things even more performant.

    In terms of easy wins: it seems like .map (and thus mutate) can be run in parallel in general. The same would hold for .evaluate() in the BasePopulation.
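
    As a rough sketch (not something evol does today), the evaluation could be farmed out with the standard library, assuming the evaluation function and the chromosomes are picklable:

    from multiprocessing import Pool

    def parallel_scores(chromosomes, eval_function, processes=4):
        # evaluate each chromosome in a separate worker process
        with Pool(processes=processes) as pool:
            return pool.map(eval_function, chromosomes)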

    Do we want to explore this?

    opened by koaning 11
  • Keeping track of the fittest individual

    In addition to logging (as discussed in #15) and checkpointing (as discussed in #33) I think we need to keep track of the best-performing individual. I think we need to do this separately from logging for two reasons:

    • the best individual is something you always want to find, regardless of whether you want to log,
    • if logging is going to happen at an interval other than once every iteration, there is a chance that you do not log the best performing individual when it mutates before the logging moment.

    We could implement this by storing a single individual inside the Population. I would suggest making this part of the evaluate call, and always (whether the call is lazy or not) replacing the current best by an individual when it is evaluated and its fitness score is better than the current best.
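
    As a small sketch of that bookkeeping (a hypothetical helper, not existing evol code), with an individual represented as a (chromosome, fitness) pair:

    def update_historical_best(best, candidate, maximize=True):
        # keep whichever of the two evaluated individuals has the better fitness
        if best is None:
            return candidate
        if maximize:
            return candidate if candidate[1] > best[1] else best
        return candidate if candidate[1] < best[1] else best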

    Naturally the result of this is nonsense for the ContestPopulation, as there the fitness depends on the rest of the population too. In this case it would only make sense to store the entire population together with the fittest individual, otherwise it would be impossible to recompute the result. Of course we cannot store populations inside populations, and this would be a typical case to be solved by logging.

    Even in a normal Population, for stochastic evaluation functions the 'best individual' may of course be a lucky shot; but I think this is not a problem for us to solve.

    Technically speaking, to me it feels like we want to store the best individual in a variable named something like best or historical_best. Currently we have min_individual and max_individual - and I don't recall why we didn't implement a current_best (or best). @koaning do you remember the reason?

    opened by rogiervandergeer 9
  • Cost functions

    Would we want to have problem instances featured in our library as well? It may help people to get started - say, cost functions for the following (a small illustrative sketch follows the list):

    • easy/hard continuous search spaces
    • basic TSP problems
    • 8-queen-like problems
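
    For illustration, a tiny TSP-style cost function (hypothetical, just to show the shape such a problem instance could take), where a chromosome is a closed tour over a list of (x, y) cities and lower is better:

    def tour_length(tour):
        # sum of the euclidean distances along the closed tour
        total = 0.0
        for (x1, y1), (x2, y2) in zip(tour, tour[1:] + tour[:1]):
            total += ((x1 - x2) ** 2 + (y1 - y2) ** 2) ** 0.5
        return total

    print(tour_length([(0, 0), (0, 1), (1, 1), (1, 0)]))  # 4.0
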
    opened by koaning 8
  • Logging

    @rogiervandergeer I am creating this pull request merely to get your attention and to shoot at some things. The point is that I think parts are correct, but a discussion may be helpful now.

    I've stumbled on some things that need discussion along with logging.

    1. I am proposing that we pass an evol.Logger object to the population upon initialisation. If you do not pass anything in, then we will use the BaseLogger.
    2. If you call .log() via the population/evolution, then the logger that was passed in at initialisation will determine what and how to log.
    3. You can see in Travis what the output of a BaseLogger is; the current idea is to print to standard out if no file is given. It will log to a file instead if a file is given.
    4. The evol.Logger object is split into two verbs: .log() and .handle(). The idea is that the former deals with what to log and the latter with how to handle it/where to put it (see the sketch after this list). This should leave it free for anybody to write a custom logger, but allows us to come up with a few basic ones that are general enough. We can have a BaseLogger that just logs the population performance, but offer a PerformanceLogger that logs the difference in time between two log moments.
    5. I am currently logging the datetime, an id for the population, an id for the individual, the fitness of every individual and the chromosome. Is this enough?
    6. There is a todo at line 202 in population.py -> do we still want to keep track of a population's/individual's age? I think we don't need it; it may be nicer to keep track of a generation via the logging object (i.e. we keep track of how often .log() is called and refer to that as a generation).
    7. We can try to use Python's logging framework here, but I am wondering if we need it. Opinions?
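
    A hypothetical sketch of the log/handle split from point 4 (class name and fields are illustrative only):

    import sys
    from datetime import datetime

    class SketchLogger:
        def __init__(self, file=None):
            self.file = file

        def log(self, population_id, individuals):
            # 'what' to log: one line per (chromosome, fitness) pair
            for chromosome, fitness in individuals:
                self.handle(f"{datetime.now()},{population_id},{fitness},{chromosome}")

        def handle(self, line):
            # 'where' it goes: standard out by default, a file if one was given
            if self.file is None:
                print(line, file=sys.stdout)
            else:
                with open(self.file, "a") as handle:
                    handle.write(line + "\n")
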
    opened by koaning 7
  • Rebranding it as MapReproduce? ;)

    I know it's kind of arrogant to be some random person hinting at renaming a repo... but I still think it is a good idea. (At PyData Warsaw I chuckled a few times. Yet, 3 days later I still think it is worth doing.)

    As it is essentially a functional (map, filter, kind-of-reduce) approach to evolutionary algorithms, MapReproduce would capture its spirit in its name.

    (No, I don't mean renaming any parts of API, at least not for this reason.)

    opened by stared 7
  • Python version

    Currently we only support Python 3.6, since we use f'{string}' syntax. On many platforms Python 3.6 isn't readily available (e.g. the Raspberry Pi, on which I haven't even been able to compile it so far) - don't we want to support 3.4 too?

    opened by rogiervandergeer 7
  • examples won't run, import error

    I am having trouble making sense of this one.

    ➜  evol git:(master) ✗ pip uninstall evol; pip install .
    Uninstalling evol-0.1:
      /anaconda/lib/python3.6/site-packages/evol-0.1-py3.6.egg-info
    Proceed (y/n)? y
      Successfully uninstalled evol-0.1
    Processing /Users/code/Development/evol
    Requirement already satisfied: pytest in /anaconda/lib/python3.6/site-packages (from evol==0.1)
    Requirement already satisfied: py>=1.4.29 in /anaconda/lib/python3.6/site-packages (from pytest->evol==0.1)
    Requirement already satisfied: setuptools in /anaconda/lib/python3.6/site-packages (from pytest->evol==0.1)
    Installing collected packages: evol
      Running setup.py install for evol ... done
    Successfully installed evol-0.1
    ➜  evol git:(master) ✗ python examples/example_pheromone.py
    Traceback (most recent call last):
      File "examples/example_pheromone.py", line 13, in <module>
        from evol import Population, Evolution
      File "/anaconda/lib/python3.6/site-packages/evol/__init__.py", line 2, in <module>
        from .evolution import Evolution
      File "/anaconda/lib/python3.6/site-packages/evol/evolution.py", line 3, in <module>
        from .population import Population
      File "/anaconda/lib/python3.6/site-packages/evol/population.py", line 2, in <module>
        from evol.helpers.utils import select_arguments
      File "/anaconda/lib/python3.6/site-packages/evol/helpers.py", line 6, in <module>
        from evol.population import Population
    ImportError: cannot import name 'Population'
    

    Did you get something similar? I am wondering if this is my system that is acting up.

    opened by koaning 6
  • how to do a grid search

    Depending on the problem you're dealing with, you may want to supply the population with a predefined list of chromosomes instead of a function that generates a single individual. This way the end user could specify a grid to start from.

    We could accommodate this flow by adding an init_chromosomes parameter to Population.__init__, where we would check that either init_func or init_chromosomes is used.

    Any thoughts for/against this? After thinking it over, it seems like the only way to do this in our current API is to first generate a population and then overwrite all the individuals.
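
    For illustration, a predefined grid of (x, y) chromosomes that an end user might want to start from instead of random initialisation (such a list is what init_chromosomes would receive):

    from itertools import product

    # a 21 x 21 grid spanning [-1, 1] x [-1, 1]
    grid_chromosomes = [(x / 10, y / 10) for x, y in product(range(-10, 11), repeat=2)]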

    opened by koaning 6
  • how to deal with kwargs

    def init_func():
        return 1
    
    
    def eval_func(x):
        return x
    
    
    def pick_two_random_parents(population):
        return random.choices(population, k=2)
    
    
    def pick_n_random_parents(population, n_parents=2):
        return random.choices(population, k=n_parents)
    
    
    def combine_two_parents(mom, dad):
        return (mom+dad)/2
    
    
    def general_combiner(*parents):
        return sum(parents)/len(parents)
    
    pop1 = Population(init_function=init_func, eval_function=eval_func, size=200)
    pop1.survive(n=50).breed(parent_picker=pick_two_random_parents, combiner=combine_two_parents)
    
    pop2 = Population(init_function=init_func, eval_function=eval_func, size=200)
    pop2.survive(n=50).breed(parent_picker=pick_n_random_parents, combiner=general_combiner)
    

    What is a nice way to change the n_parents parameter? Something like:

    pop2.survive(n=50).breed(parent_picker=pick_n_random_parents, combiner=general_combiner, n_parents=3)
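
    One possible alternative, just as a sketch (not necessarily what was chosen for evol): bake the extra argument in with functools.partial, so the breed signature stays unchanged:

    from functools import partial

    pick_three_random_parents = partial(pick_n_random_parents, n_parents=3)
    # pop2.survive(n=50).breed(parent_picker=pick_three_random_parents, combiner=general_combiner)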
    
    opened by koaning 6
  • Is it possible to implement elitism?

    I wrote this function and used it as a callback to attempt elitist selection, without success:

    from copy import deepcopy

    best_chromosome = None
    
    def save_best(population):
        global best_chromosome
        
        population_best = population.current_best
        
        if best_chromosome is None or population_best.fitness > best_chromosome.fitness:
            best_chromosome = deepcopy(population_best)
    

    Is there a way to do it?

    opened by ELC 16
  • CI Fix

    I've made the repo flake8-compatible again and I've changed the Makefile. Flake8 raised this warning without the change:

    WARNING: flake8 setuptools integration is deprecated and scheduled for removal in 4.x.  For more information, see https://gitlab.com/pycqa/flake8/issues/544
    
    opened by koaning 0
  • WIP: Change the signature of pick_random

    Parent pickers are no longer passed any kwargs. pick_random must now be initialised before use, with the number of parents passed to it upon initialisation. In addition, pickers must always return a sequence of picked parents - even if it contains only one.

    These changes make it much easier to implement more complex picking algorithms, and in addition they remove the requirement for the select_arguments() decorator, which hurts my eyes.
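
    A hypothetical picker following that shape (illustrative only, not the merged implementation): it is constructed with the number of parents and always returns a tuple of picks.

    import random

    class PickRandom:
        def __init__(self, n_parents=2):
            self.n_parents = n_parents

        def __call__(self, population):
            # always return a sequence, even when n_parents == 1
            return tuple(random.choice(population) for _ in range(self.n_parents))

    picker = PickRandom(n_parents=3)
    parents = picker([1, 2, 3, 4, 5])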

    opened by rogiervandergeer 7
  • DiceDots: a problem for contest population

    I heard of a cool puzzle.

    Suppose that you can take a cube and draw 18 dots on it, such that you can make a custom die with a custom number of eyes on each side. You can determine which die beats which other dice, so the question is: what is the most balanced die?

    This might be a fun problem to explain the ContestPopulation with, and it might deserve to make an appearance as a problem instance.
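
    For illustration, a tiny helper (hypothetical, not part of evol) that scores one 18-dot die against another by counting wins over all 36 face pairings:

    from itertools import product

    def score(die_a, die_b):
        # each die is a list of 6 face values summing to 18; positive means die_a tends to win
        wins = sum(a > b for a, b in product(die_a, die_b))
        losses = sum(a < b for a, b in product(die_a, die_b))
        return wins - losses

    print(score([3, 3, 3, 3, 3, 3], [1, 1, 1, 5, 5, 5]))  # 0: these two are evenly matched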

    opened by koaning 0