- [x] Documentation
- [x] Tests added and passed
To calculate causal effects by segment, a quantile-based approach is currently used to create the segments. However, we've seen that this is usually not the ideal way to create them, since other methods produce segments that are more distinguishable from one another.
One such example is the Fisher-Jenks algorithm. A user could create their own partitioner with this algorithm like this:
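```python
import pandas as pd
from jenkspy import jenks_breaks
from toolz import curry
from typing import List


@curry
def fisher_jenks_partitioner(series: pd.Series, segments: int) -> List:
    # Natural-breaks bin edges computed by jenkspy
    bins = jenks_breaks(series, n_classes=segments)
    # Open the outer edges so that every value falls inside some bin
    bins[0] = -float("inf")
    bins[-1] = float("inf")
    return bins
```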
And use it in `effect_by_segment`:
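```python
from fklearn.causal.effects import linear_effect
from fklearn.causal.validation.curves import effect_by_segment

df = pd.DataFrame(dict(
    t=[1, 1, 1, 2, 2, 2, 3, 3, 3],
    x=[1, 2, 3, 1, 2, 3, 1, 2, 3],
    y=[1, 1, 1, 2, 3, 4, 3, 5, 7],
))

# Assumption: t is the treatment, y the outcome, x the prediction, with one segment per distinct x
result = effect_by_segment(
    df, treatment="t", outcome="y", prediction="x", segments=3,
    effect_fn=linear_effect, partitioner=fisher_jenks_partitioner,
)
```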
Or use another custom partitioner such as:
```python
def bin_partitioner(series: pd.Series, segments: int = 1) -> List:
    return [1, 4, 5]
```
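
To make the role of a partitioner concrete, here is a minimal, dependency-free sketch of how a list of bin edges could drive a per-segment effect estimate. It is not fklearn's implementation: `assign_segment`, `slope`, and `effect_by_edges` are hypothetical helpers, plain lists stand in for pandas Series, intervals are assumed to be right-closed, the effect is assumed to be the least-squares slope of the outcome on the treatment, and the edges are chosen by hand for illustration.

```python
from bisect import bisect_left
from collections import defaultdict
from typing import Dict, Sequence


def assign_segment(value: float, edges: Sequence[float]) -> int:
    """Index i of the right-closed interval (edges[i], edges[i + 1]] holding value."""
    return bisect_left(edges, value) - 1


def slope(t: Sequence[float], y: Sequence[float]) -> float:
    """Least-squares slope of y on t, i.e. cov(t, y) / var(t)."""
    t_mean = sum(t) / len(t)
    y_mean = sum(y) / len(y)
    cov = sum((a - t_mean) * (b - y_mean) for a, b in zip(t, y))
    var = sum((a - t_mean) ** 2 for a in t)
    return cov / var


def effect_by_edges(t, x, y, edges) -> Dict[int, float]:
    """Group rows by the segment of x, then estimate one slope per segment."""
    groups = defaultdict(lambda: ([], []))
    for ti, xi, yi in zip(t, x, y):
        seg = assign_segment(xi, edges)
        groups[seg][0].append(ti)
        groups[seg][1].append(yi)
    return {seg: slope(ts, ys) for seg, (ts, ys) in sorted(groups.items())}


# Same toy data as in the description; edges are an illustrative assumption.
t = [1, 1, 1, 2, 2, 2, 3, 3, 3]
x = [1, 2, 3, 1, 2, 3, 1, 2, 3]
y = [1, 1, 1, 2, 3, 4, 3, 5, 7]
edges = [-float("inf"), 1, 2, float("inf")]
effects = effect_by_edges(t, x, y, edges)
```

With these edges each distinct value of `x` lands in its own segment, so the estimated effect can differ from one segment to the next. Different edges would group the same rows differently, which is why the choice of partitioner matters.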
Description of the changes proposed in the pull request
- an argument to the `effect_by_segment` function so a user can define the way the segments are created. It defaults to `quantile_partitioner` so the default behavior of `effect_by_segment` is maintained.
- a new
- tests for
- documentation for the new
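
The backward-compatibility idea in the list above can be illustrated with a small sketch: the partitioner is a keyword argument whose default produces quantile-based edges, so callers that never pass it see no change. Everything here is hypothetical (`quantile_edges`, `make_edges`, `fixed_edges`), uses only the standard library rather than pandas, and should not be read as fklearn's actual signature or code.

```python
from statistics import quantiles
from typing import Callable, List, Sequence

# A partitioner maps (values, number of segments) to a list of bin edges.
Partitioner = Callable[[Sequence[float], int], List[float]]


def quantile_edges(values: Sequence[float], segments: int) -> List[float]:
    """Hypothetical default: the minimum, the interior quantile cut points, the maximum."""
    cuts = quantiles(values, n=segments, method="inclusive")
    return [min(values), *cuts, max(values)]


def make_edges(
    values: Sequence[float],
    segments: int = 10,
    partitioner: Partitioner = quantile_edges,
) -> List[float]:
    """Callers that never pass `partitioner` keep getting quantile edges."""
    return partitioner(values, segments)


def fixed_edges(values: Sequence[float], segments: int) -> List[float]:
    """A user-defined partitioner that ignores the data, like bin_partitioner."""
    return [1, 4, 5]


data = [0, 1, 2, 3, 4]
default_edges = make_edges(data, segments=2)               # quantile-based path
custom_edges = make_edges(data, partitioner=fixed_edges)   # user-defined path
```

Keeping the old behavior as the default value means existing call sites do not need to change, while any callable with the same two-argument shape can be swapped in.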
Where should the reviewer start?
At the modifications we made in `effect_by_segment` and then to the
Remaining problems or questions
We are not adding partitioners beyond the ones used by default, because doing so would require more complex definitions or imports from new libraries (such as jenkspy for the Fisher-Jenks algorithm).