AB-test-analyzer
Python class to perform AB test analysis
Overview
This repo contains a Python class for analyzing A/B/C… tests on proportion-based metrics, including a posthoc test. In practice, the class can be paired with any appropriate RDBMS retrieval tool (e.g. the `google.cloud.bigquery` module for BigQuery) to form an end-to-end analysis process: from querying the experiment data stored in SQL to arriving at the complete analysis results.
ABTest Class
The class is named `ABTest`. It is written on top of several well-known libraries (`numpy`, `pandas`, `scipy`, and `statsmodels`). Its main functionality is to consume an experiment results dataframe (`experiment_df`), metric information (`nominator_metric`, `denominator_metric`), and meta-information about the platform being experimented on (`platform`) to perform two layers of statistical tests.
First, it performs a Chi-square test on the aggregated data. If this test is significant, the class proceeds to a posthoc test that tests each pair of experimental groups and reports their adjusted p-values, along with confidence intervals for their absolute lift (difference). The class also has a method to calculate the statistical power of the experiment.
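The first layer can be sketched as follows. This is a minimal illustration, not the class's actual implementation: it assumes the aggregate test is a standard Chi-square test of independence on a successes/failures contingency table, and it reuses the sample counts shown elsewhere in this README. The variable names (`contingency`, `run_posthoc`) are made up for the example.

```python
import pandas as pd
from scipy.stats import chi2_contingency

# Aggregated counts taken from the sample data in this README.
reporting_df = pd.DataFrame({
    "experiment_group": ["control", "variant1", "variant2", "variant3"],
    "targeted": [8333, 8002, 8251, 8275],
    "redeemed": [1062, 825, 1289, 1228],
})

# Contingency table: one row per group, columns = [successes, failures].
successes = reporting_df["redeemed"]
failures = reporting_df["targeted"] - reporting_df["redeemed"]
contingency = pd.concat([successes, failures], axis=1).to_numpy()

chi2_stat, p_value, dof, _expected = chi2_contingency(contingency)

alpha = 0.05
# Assumption: the posthoc layer only runs when the aggregate test is significant.
run_posthoc = p_value < alpha
print(f"chi2={chi2_stat:.2f}, dof={dof}, p={p_value:.3g}, run_posthoc={run_posthoc}")
```

Gating the pairwise tests on the aggregate result keeps the number of comparisons down when there is no evidence that any group differs.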
Class Init
To create an instance of the `ABTest` class, pass the following parameters, which also become the instance attributes:
- `experiment_df`: pandas dataframe that contains the experiment data to be analyzed. The data must form a proportion-based metric (`nominator_metric/denominator_metric <= 1`). More on this parameter can be found in the Data Format section.
- `nominator_metric`: string, the name of the numerator metric, one constituent of the proportion-based metric in `experiment_df`, e.g. `"transaction"`
- `denominator_metric`: string, the name of the denominator metric, the other constituent of the proportion-based metric in `experiment_df`, e.g. `"visit"`
- `platform`: string, the platform represented by the experiment data, e.g. `"android"`, `"ios"`
Methods
get_reporting_df
This method has one parameter, `metric_level` (string, default `None`), that specifies the metric level of the experiment data whose reporting dataframe is to be derived. Two common values for this parameter are `"user"` and `"event"`.
Below is an example of the output from calling `self.get_reporting_df(metric_level='user')`:
| | experiment_group | metric_level | targeted | redeemed | conversion |
|---:|:-------------------|:---------------|-----------:|-----------:|-------------:|
| 0 | control | user | 8333 | 1062 | 0.127445 |
| 1 | variant1 | user | 8002 | 825 | 0.103099 |
| 2 | variant2 | user | 8251 | 1289 | 0.156223 |
| 3 | variant3 | user | 8275 | 1228 | 0.148399 |
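A reporting dataframe of this shape could be derived roughly as below. This is a sketch under assumptions, not the method's actual code: `build_reporting_df` is a hypothetical standalone helper, and it assumes the conversion column is simply the numerator divided by the denominator after summing per experiment group.

```python
import pandas as pd

experiment_df = pd.DataFrame({
    "experiment_group": ["control", "variant1", "variant2", "variant3"],
    "metric_level": ["user"] * 4,
    "targeted": [8333, 8002, 8251, 8275],
    "redeemed": [1062, 825, 1289, 1228],
})


def build_reporting_df(df, nominator_metric, denominator_metric, metric_level=None):
    """Hypothetical helper: aggregate per group and add a conversion column."""
    group_cols = ["experiment_group"]
    if metric_level is not None:
        # Keep only the requested metric level and carry it into the output.
        df = df[df["metric_level"] == metric_level]
        group_cols.append("metric_level")
    out = df.groupby(group_cols, as_index=False)[
        [denominator_metric, nominator_metric]
    ].sum()
    out["conversion"] = out[nominator_metric] / out[denominator_metric]
    return out


reporting_df = build_reporting_df(
    experiment_df, "redeemed", "targeted", metric_level="user"
)
print(reporting_df)
```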
posthoc_test
This method is the engine under the hood of the `analyze` method. It has three parameters:
- `reporting_df`: pandas dataframe, the output of the `get_reporting_df` method
- `metric_level`: string, the metric level of the experiment data whose reporting dataframe is to be derived
- `alpha`: float, the significance level used in the analysis
analyze
The main method for analyzing the AB test. It has two parameters:
- `metric_level`: string, the metric level of the experiment data whose reporting dataframe is to be derived (default value is `None`). Two common values for this parameter are `"user"` and `"event"`
- `alpha`: float, the significance level used in the analysis (default value is `0.05`)
The output of this method is a pandas dataframe with the following columns:
- `metric_level`: optional, present only if the `metric_level` parameter is not `None`
- `pair`: the segment pair being individually tested using the z-proportion test
- `raw_p_value`: the raw p-value from the individual z-proportion test
- `adj_p_value`: the adjusted p-value (using the Benjamini-Hochberg method) from the z-proportion tests. Significant results are marked with `*`
- `mean_ci`: the mean (center value) of the metric delta confidence interval at `1-alpha`
- `lower_ci`: the lower bound of the metric delta confidence interval at `1-alpha`
- `upper_ci`: the upper bound of the metric delta confidence interval at `1-alpha`
Sample output:
| | metric_level | pair | raw_p_value | adj_p_value | mean_ci | lower_ci | upper_ci |
|---:|:---------------|:---------------------|--------------:|:------------------------|------------:|------------:|------------:|
| 0 | user | control vs variant1 | 1.13731e-06 | 1.592240591875927e-06* | -0.0243459 | -0.0341516 | -0.0145402 |
| 1 | user | control vs variant2 | 1.08192e-07 | 1.8933619380632198e-07* | 0.0287784 | 0.0181608 | 0.0393959 |
| 2 | user | control vs variant3 | 9.00223e-05 | 0.00010502606726165857* | 0.0209537 | 0.0104664 | 0.031441 |
| 3 | user | variant1 vs variant2 | 7.82096e-24 | 2.737334684573585e-23* | 0.0531243 | 0.0427802 | 0.0634683 |
| 4 | user | variant1 vs variant3 | 3.23786e-18 | 7.554997289146693e-18* | 0.0452996 | 0.0350976 | 0.0555015 |
| 5 | user | variant2 vs variant1 | 7.82096e-24 | 2.737334684573585e-23* | -0.0531243 | -0.0634683 | -0.0427802 |
| 6 | user | variant2 vs variant3 | 0.161595 | 0.16159493454321772 | nan | nan | nan |
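The pairwise layer can be approximated with `statsmodels` as in the sketch below. It is illustrative only and makes several assumptions that the class may not share: pairs are enumerated once each with `itertools.combinations`, the confidence interval is an unpooled Wald interval for the difference in proportions, and intervals are computed for every pair rather than only the significant ones. Exact numbers may therefore differ from the sample output above.

```python
from itertools import combinations

import numpy as np
import pandas as pd
from scipy.stats import norm
from statsmodels.stats.multitest import multipletests
from statsmodels.stats.proportion import proportions_ztest

reporting_df = pd.DataFrame({
    "experiment_group": ["control", "variant1", "variant2", "variant3"],
    "targeted": [8333, 8002, 8251, 8275],
    "redeemed": [1062, 825, 1289, 1228],
})

alpha = 0.05
z_crit = norm.ppf(1 - alpha / 2)  # two-sided critical value

rows = []
for a, b in combinations(reporting_df.to_dict("records"), 2):
    count = np.array([a["redeemed"], b["redeemed"]])
    nobs = np.array([a["targeted"], b["targeted"]])

    # Two-sided z-test for equality of the two proportions.
    _, raw_p = proportions_ztest(count, nobs)

    # Absolute lift of the second group over the first, with a Wald interval
    # (assumed interval type, see the note above).
    p_a, p_b = count / nobs
    delta = p_b - p_a
    se = np.sqrt(p_a * (1 - p_a) / nobs[0] + p_b * (1 - p_b) / nobs[1])

    rows.append({
        "pair": f"{a['experiment_group']} vs {b['experiment_group']}",
        "raw_p_value": raw_p,
        "mean_ci": delta,
        "lower_ci": delta - z_crit * se,
        "upper_ci": delta + z_crit * se,
    })

posthoc_df = pd.DataFrame(rows)

# Benjamini-Hochberg adjustment across all pairwise tests.
reject, adj_p, _, _ = multipletests(
    posthoc_df["raw_p_value"], alpha=alpha, method="fdr_bh"
)
posthoc_df["adj_p_value"] = adj_p
posthoc_df["significant"] = reject
print(posthoc_df)
```

Adjusting the p-values controls the false discovery rate across the pairwise comparisons, which matters more as the number of variants grows.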
calculate_power
This method calculates the experiment's statistical power for the supplied `experiment_df`. It has three parameters:
- `practical_lift`: float, the metric lift that is perceived as meaningful
- `alpha`: float, the significance level used in the analysis (default value is `0.05`)
- `metric_level`: string, the metric level of the experiment data whose reporting dataframe is to be derived (default value is `None`). Two common values for this parameter are `"user"` and `"event"`
Sample output:
The experiment's statistical power is 0.2680540196528648
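For intuition, a power calculation for a two-proportion comparison might look like the sketch below. This README does not say how `calculate_power` computes its result, so everything here is an assumption: `practical_lift` is treated as an absolute lift over the control conversion, only the control and one variant are compared, and the calculation uses `statsmodels`' normal-approximation power for two independent samples. The value `0.01` is an arbitrary example input.

```python
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

# Control conversion and sample sizes from the sample data in this README.
n_control, n_variant = 8333, 8002
baseline = 1062 / n_control

practical_lift = 0.01  # assumed to be an absolute lift (hypothetical value)
alpha = 0.05

# Cohen's h effect size between the lifted and the baseline proportion.
effect_size = proportion_effectsize(baseline + practical_lift, baseline)

power = NormalIndPower().solve_power(
    effect_size=effect_size,
    nobs1=n_control,
    alpha=alpha,
    ratio=n_variant / n_control,  # nobs2 = nobs1 * ratio
    alternative="two-sided",
)
print(f"The experiment's statistical power is {power}")
```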
Data Format
This section explains the format of `experiment_df`, i.e. the main data supply for the `ABTest` class.
`experiment_df` must have at least three columns with the following names:
- `experiment_group`: self-explanatory
- `denominator_metric`: the name of the denominator metric, one constituent of the proportion-based metric in `experiment_df`, e.g. `"visit"`
- `nominator_metric`: the name of the numerator metric, the other constituent of the proportion-based metric in `experiment_df`, e.g. `"transaction"`
- (optional) `metric_level`: the metric level of the data (usually either `"user"` or `"event"`)
In practice, this dataframe is derived by querying SQL tables using an appropriate retrieval tool.
Sample experiment_df
| | experiment_group | metric_level | targeted | redeemed |
|---:|:-------------------|:---------------|-----------:|-----------:|
| 0 | control | user | 8333 | 1062 |
| 1 | variant1 | user | 8002 | 825 |
| 2 | variant2 | user | 8251 | 1289 |
| 3 | variant3 | user | 8275 | 1228 |
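Before passing a dataframe to the class, it can help to check that it satisfies the requirements above. The sketch below is a hypothetical pre-flight check, not part of the `ABTest` class: `validate_experiment_df` is a made-up helper that verifies the required columns exist and that the data forms a valid proportion (numerator never exceeds denominator).

```python
import pandas as pd


def validate_experiment_df(df, nominator_metric, denominator_metric):
    """Hypothetical check that df matches the documented data format."""
    required = {"experiment_group", nominator_metric, denominator_metric}
    missing = required - set(df.columns)
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    if (df[denominator_metric] <= 0).any():
        raise ValueError("denominator metric must be positive")
    if (df[nominator_metric] > df[denominator_metric]).any():
        raise ValueError("nominator_metric/denominator_metric must be <= 1")
    return True


experiment_df = pd.DataFrame({
    "experiment_group": ["control", "variant1", "variant2", "variant3"],
    "metric_level": ["user"] * 4,
    "targeted": [8333, 8002, 8251, 8275],
    "redeemed": [1062, 825, 1289, 1228],
})

is_valid = validate_experiment_df(experiment_df, "redeemed", "targeted")
print(is_valid)
```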
Usage Guideline
The general steps:
- Prepare `experiment_df` (via any tool you prefer)
- Create an `ABTest` class instance
- To get the reporting dataframe, call the `get_reporting_df` method
- To analyze end-to-end, call the `analyze` method
- To calculate the experiment's statistical power, call the `calculate_power` method
See the sample usage notebook for more details.