ForecastGA is a Python tool to forecast Google Analytics data using several popular time series models.

JR Oakes

Last update: Jan 3, 2023


ForecastGA

A Python tool to forecast GA data using several popular time series models.

About

Welcome to ForecastGA

ForecastGA is a tool that combines a couple of popular libraries, Atspy and googleanalytics, with a few enhancements.

Tool logic is kept separate from model training and prediction, which makes models easier to upgrade and add.
When calling am.forecast_insample(), any kwargs included (e.g. learning_rate) are passed to the train method of the model.
Google Analytics profiles are specified by simply passing the URL (e.g. https://analytics.google.com/analytics/web/?authuser=2#/report-home/aXXXXXwXXXXXpXXXXXX).
You can provide a data dict with GA config options or a Pandas Series as the input data.
Multiple log levels.
Auto GPU detection (via Torch).
List all available models, with descriptions, by calling forecastga.print_model_info().
Google API info can be passed in the data dict or uploaded as a JSON file named identity.json.
Created a companion Google Colab notebook to easily run on GPU.
A handy plot function for Colab, forecastga.plot_colab(forecast_in, title="Insample Forecast", dark_mode=True) that formats nicely and also handles Dark Mode!
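The profile URL carries its identifiers in the trailing aXXXXXwXXXXXpXXXXXX fragment. As a minimal sketch, assuming the three digit groups are the account, web property, and profile (view) IDs, they could be pulled out with a regular expression. The helper name parse_ga_url is hypothetical and is not part of ForecastGA's API; the IDs in the example URL are made up.

```python
import re
from typing import Dict

# Hypothetical helper for illustration only; ForecastGA's own parsing may differ.
# Assumes the fragment looks like ".../a<account>w<webproperty>p<profile>".
GA_ID_PATTERN = re.compile(
    r"/a(?P<account>\d+)w(?P<webproperty>\d+)p(?P<profile>\d+)"
)


def parse_ga_url(url: str) -> Dict[str, str]:
    """Return the account, web property, and profile IDs found in a GA URL."""
    match = GA_ID_PATTERN.search(url)
    if match is None:
        raise ValueError(f"No aXXXwXXXpXXX identifier found in: {url}")
    return match.groupdict()


# Example URL with made-up IDs.
example_url = (
    "https://analytics.google.com/analytics/web/"
    "?authuser=2#/report-home/a12345w67890p13579"
)
ids = parse_ga_url(example_url)
print(ids)
```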

Models Available

ARIMA : Automated ARIMA Modelling
Prophet : Modeling Multiple Seasonality With Linear or Non-linear Growth
ProphetBC : Prophet Model with Box-Cox transform of the data
HWAAS : Exponential Smoothing With Additive Trend and Additive Seasonality
HWAMS : Exponential Smoothing with Additive Trend and Multiplicative Seasonality
NBEATS : Neural basis expansion analysis (now fixed at 20 Epochs)
Gluonts : RNN-based Model (now fixed at 20 Epochs)
TATS : Seasonal and Trend no Box Cox
TBAT : Trend and Box Cox
TBATS1 : Trend, Seasonal (one), and Box Cox
TBATP1 : TBATS1 but Seasonal Inference is Hardcoded by Periodicity
TBATS2 : TBATS1 With Two Seasonal Periods

How To Use

Find Model Info:

import forecastga

forecastga.print_model_info()

Initialize Model:

Google Analytics:

data = { 'client_id': '',
         'client_secret': '',
         'identity': '',
         'ga_start_date': '2018-01-01',
         'ga_end_date': '2019-12-31',
         'ga_metric': 'sessions',
         'ga_segment': 'organic traffic',
         'ga_url': 'https://analytics.google.com/analytics/web/?authuser=2#/report-home/aXXXXXwXXXXXpXXXXXX',
         'omit_values_over': 2000000
        }

model_list = ["TATS", "TBATS1", "TBATP1", "TBATS2", "ARIMA"]
am = forecastga.AutomatedModel(data, model_list=model_list, forecast_len=30)

Pandas DataFrame:

import pandas as pd

# CSV with columns: Date and Sessions
df = pd.read_csv('ga_sessions.csv')
df.Date = pd.to_datetime(df.Date)
df = df.set_index("Date")
# Pass the Sessions column (a Pandas Series) as the input data
data = df.Sessions

model_list = ["TATS", "TBATS1", "TBATP1", "TBATS2", "ARIMA"]
am = forecastga.AutomatedModel(data, model_list=model_list, forecast_len=30)

Forecast Insample:

forecast_in, performance = am.forecast_insample()
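An in-sample forecast is one that can be scored against values you already have. The sketch below illustrates that idea under stated assumptions: the last forecast_len points are held out, a seasonal naive model stands in for a real one, and RMSE is used as the score. The function names are hypothetical, and the contents of the performance object returned by ForecastGA may differ.

```python
import math
from typing import List, Sequence, Tuple


def split_holdout(series: Sequence[float], forecast_len: int) -> Tuple[List[float], List[float]]:
    """Split a series into a training part and a held-out tail."""
    if not 0 < forecast_len < len(series):
        raise ValueError("forecast_len must be between 1 and len(series) - 1")
    return list(series[:-forecast_len]), list(series[-forecast_len:])


def seasonal_naive(train: Sequence[float], horizon: int, period: int = 7) -> List[float]:
    """Forecast by repeating the last full season of the training data."""
    if len(train) < period:
        raise ValueError("need at least one full season of training data")
    last_season = list(train[-period:])
    return [last_season[i % period] for i in range(horizon)]


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error between two equally long sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Synthetic daily sessions with a pure weekly pattern (illustrative data).
sessions = [100 + 10 * (day % 7) for day in range(56)]
train, test = split_holdout(sessions, forecast_len=14)
predicted = seasonal_naive(train, horizon=len(test))
score = rmse(test, predicted)
print(f"holdout RMSE: {score:.2f}")
```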

Forecast Outsample:

forecast_out = am.forecast_outsample()

Ensemble Performance:

all_ensemble_in, all_ensemble_out, all_performance = am.ensemble(forecast_in, forecast_out)
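One common way to ensemble forecasts is to average the predictions of several models and keep the combination with the lowest error. The sketch below shows that approach only as an illustration; it is not necessarily how am.ensemble works internally. The function names are hypothetical and the forecast numbers are invented.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error between two equally long sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def average_forecasts(forecasts: Sequence[Sequence[float]]) -> List[float]:
    """Average several forecasts step by step."""
    return [sum(step) / len(step) for step in zip(*forecasts)]


def rank_ensembles(
    model_forecasts: Dict[str, Sequence[float]], actual: Sequence[float]
) -> List[Tuple[Tuple[str, ...], float]]:
    """Score every combination of two or more models, best (lowest RMSE) first."""
    results = []
    names = sorted(model_forecasts)
    for size in range(2, len(names) + 1):
        for combo in combinations(names, size):
            blended = average_forecasts([model_forecasts[n] for n in combo])
            results.append((combo, rmse(actual, blended)))
    return sorted(results, key=lambda item: item[1])


# Invented in-sample forecasts and actual values, for illustration only.
actual = [12, 14, 16]
model_forecasts = {
    "ARIMA": [10, 12, 14],
    "TATS": [14, 16, 18],
    "TBATS1": [20, 20, 20],
}
ranked = rank_ensembles(model_forecasts, actual)
best_combo, best_rmse = ranked[0]
print(best_combo, best_rmse)
```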

Pretty Plot in Google Colab

forecastga.plot_colab(forecast_in, title="Insample Forecast", dark_mode=True)

Installation

Windows users may need to manually install the two items below via conda, then install ForecastGA from GitHub:

conda install pystan
conda install pytorch -c pytorch
!pip install --upgrade git+https://github.com/jroakes/ForecastGA.git

Otherwise, install with pip:

pip install --upgrade forecastga

This repo supports GPU training. Below are a few libraries that may have to be manually installed to support it.

pip install --upgrade mxnet-cu101
pip install --upgrade torch==1.7.0+cu101
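After installing the GPU builds, it can help to confirm that Torch actually sees a GPU. This is a minimal sketch using torch.cuda.is_available(). The wrapper name gpu_available is hypothetical, and this is not necessarily the exact check ForecastGA performs.

```python
def gpu_available() -> bool:
    """Return True if PyTorch is installed and reports a usable CUDA device."""
    try:
        import torch  # imported lazily so the check also works without torch
    except ImportError:
        return False
    return bool(torch.cuda.is_available())


has_gpu = gpu_available()
print("GPU available:", has_gpu)
```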

Acknowledgements

The majority of the forecasting code was taken from https://github.com/firmai/atspy and heavily refactored.
Google Analytics access is based on https://github.com/debrouwere/google-analytics
Thanks to richardfergie for the addition of the Prophet Box-Cox model to control negative predictions.

Contribute

The goal of this repo is to grow the list of available models to test. If you would like to contribute one, please read on. Feel free to have fun naming your models.

Fork the repo.
In the /src/forecastga/models folder there is a model called template.py. You can use this as a template for creating your new model. All available variables are there. ForecastGA ensures each model has the right data and calls only the train and forecast methods for each model. Feel free to add additional methods that your model requires.
Edit the /src/forecastga/models/__init__.py file to add your model's information. Follow the format of the other entries. ForecastGA relies on loc to find the model and class to find the class to use.
Edit requirements.txt with any additional libraries needed to run your model. Keep in mind that this repo should support GPU training if available, and some libraries have separate GPU-enabled versions.
Issue a pull request.
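The steps above can be sketched as a small, self-contained example. Everything here is hypothetical: the constructor arguments, the MeanModel class, the registry layout, and the run_model helper are stand-ins that only mirror what is described above (a train method that receives kwargs, a forecast method, and loc and class entries). Check template.py and models/__init__.py in the repo for the real structure.

```python
from typing import Any, Dict, List, Sequence


class MeanModel:
    """Hypothetical toy model: forecasts the mean of the most recent values."""

    def __init__(self, train_values: Sequence[float], forecast_len: int) -> None:
        self.train_values = list(train_values)
        self.forecast_len = forecast_len
        self.mean_ = 0.0

    def train(self, **kwargs: Any) -> None:
        # Extra keyword arguments arrive here, e.g. window=2.
        window = kwargs.get("window", len(self.train_values))
        recent = self.train_values[-window:]
        self.mean_ = sum(recent) / len(recent)

    def forecast(self) -> List[float]:
        return [self.mean_] * self.forecast_len


# Hypothetical registry entry with "loc" and "class" keys, as described above.
MODEL_REGISTRY: Dict[str, Dict[str, str]] = {
    "MEAN": {
        "name": "MEAN",
        "description": "Mean of the most recent values",
        "loc": "forecastga.models.mean",  # assumed module path
        "class": "MeanModel",
    }
}

# Stand-in for importing the class from the module named in "loc".
_CLASSES = {"MeanModel": MeanModel}


def run_model(name: str, values: Sequence[float], forecast_len: int, **kwargs: Any) -> List[float]:
    """Look up a model, then call only its train and forecast methods."""
    entry = MODEL_REGISTRY[name]
    model = _CLASSES[entry["class"]](values, forecast_len)
    model.train(**kwargs)
    return model.forecast()


result = run_model("MEAN", [1, 2, 3, 4], forecast_len=2, window=2)
print(result)
```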

If you enjoyed this tool consider buying me some beer at: Paypalme

Comments

Add model that combines Prophet and Box-Cox transform

The Box-Cox transform generates a parameter lambda during training that needs to be passed to the forecasting stage so that the correct inverse can be applied. I've done this as self.lam, but there might be a cleaner interface for it; I am very inexperienced with OO programming.

I'm really sorry, but I have hardly tested this code at all because I'm a bit short on time. I know you wanted something this week, so I thought I'd better commit and pull request now rather than when I've got everything 100% working.
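To illustrate the idea of carrying lambda from training to forecasting, here is a minimal sketch in plain Python. It is not the code from this pull request: the class name, the grid search used to pick lambda, and the mean forecast that stands in for Prophet are all assumptions made for the example.

```python
import math
from typing import List, Sequence


def boxcox(values: Sequence[float], lam: float) -> List[float]:
    """Box-Cox transform of strictly positive values."""
    if abs(lam) < 1e-9:
        return [math.log(v) for v in values]
    return [(v ** lam - 1) / lam for v in values]


def inv_boxcox(values: Sequence[float], lam: float) -> List[float]:
    """Inverse Box-Cox transform; must use the same lambda as the forward pass."""
    if abs(lam) < 1e-9:
        return [math.exp(v) for v in values]
    return [(lam * v + 1) ** (1 / lam) for v in values]


def _log_likelihood(values: Sequence[float], lam: float) -> float:
    """Box-Cox profile log-likelihood, used to choose lambda."""
    transformed = boxcox(values, lam)
    mean = sum(transformed) / len(transformed)
    variance = sum((t - mean) ** 2 for t in transformed) / len(transformed)
    if variance <= 0:
        return float("-inf")
    log_sum = sum(math.log(v) for v in values)
    return (lam - 1) * log_sum - len(values) / 2 * math.log(variance)


class BoxCoxMeanModel:
    """Hypothetical model: fit lambda in train(), reuse it in forecast()."""

    def __init__(self, train_values: Sequence[float], forecast_len: int) -> None:
        self.train_values = list(train_values)
        self.forecast_len = forecast_len

    def train(self) -> None:
        grid = [i / 10 for i in range(-20, 21)]  # candidate lambdas, -2.0 to 2.0
        self.lam = max(grid, key=lambda lam: _log_likelihood(self.train_values, lam))
        transformed = boxcox(self.train_values, self.lam)
        self.level = sum(transformed) / len(transformed)

    def forecast(self) -> List[float]:
        # Forecast in transformed space, then invert with the stored lambda.
        return inv_boxcox([self.level] * self.forecast_len, self.lam)


values = [120, 135, 160, 150, 210, 260, 240, 310]
model = BoxCoxMeanModel(values, forecast_len=3)
model.train()
predictions = model.forecast()
print(model.lam, predictions)
```

Because the forecast is made in the transformed space and then inverted, the predictions stay positive, which is the reason for combining Box-Cox with a model that could otherwise predict negative values.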

Opened by richardfergie

