PyEmits is a Python package for easy manipulation of time-series data. Time-series data is very common in real life, for example in:

- Engineering
- FSI industry (Financial Services Industry)
- FMCG (Fast Moving Consumer Good)

A data scientist's work consists of:

- forecasting
- prediction/simulation
- data preparation
- cleansing
- anomaly detection
- descriptive data analysis/exploratory data analysis

Each new business unit has to rebuild the following wheels again and again:

- data pipeline
  - extraction
  - transformation
    - cleansing
    - feature engineering
    - remove outliers
    - AI landing for prediction and forecasting
  - write results back to the database
- ML framework
  - multiple model training
  - multiple model prediction
  - k-fold validation
  - anomaly detection
  - forecasting
  - deep learning models made easy
  - ensemble modelling
- exploratory data analysis
- descriptive data analysis
- ...

That's why I created this project (also for fun, haha).

This project is under active development and free to use (Apache 2.0). I am happy to see anyone contribute to advancing its features.

# Install

`pip install pyemits`

# Feature highlights

- Easy training

```python
import numpy as np
from pyemits.core.ml.regression.trainer import RegTrainer, RegressionDataModel

# 1000 samples with 10 features and a single target
X = np.random.randint(1, 100, size=(1000, 10))
y = np.random.randint(1, 100, size=(1000, 1))
raw_data_model = RegressionDataModel(X, y)
# algorithms and configs are paired by position; None means no custom config
trainer = RegTrainer(['XGBoost'], [None], raw_data_model)
trainer.fit()
```

- Accepts neural networks as models

```python
import numpy as np
from pyemits.core.ml.regression.trainer import RegTrainer, RegressionDataModel
from pyemits.core.ml.regression.nn import KerasWrapper

# 1000 sequences of 10 time steps x 10 features, with 4 targets each
X = np.random.randint(1, 100, size=(1000, 10, 10))
y = np.random.randint(1, 100, size=(1000, 4))
# prebuilt LSTM: input shape (timesteps, features) = (10, 10), 4 outputs
keras_lstm_model = KerasWrapper.from_simple_lstm_model((10, 10), 4)
raw_data_model = RegressionDataModel(X, y)
trainer = RegTrainer([keras_lstm_model], [None], raw_data_model)
trainer.fit()
```

It also keeps the flexibility to use a customized model:

```python
import numpy as np
from keras import Sequential
from keras.layers import Dense, Dropout, LSTM
from pyemits.core.ml.regression.trainer import RegTrainer, RegressionDataModel
from pyemits.core.ml.regression.nn import KerasWrapper

X = np.random.randint(1, 100, size=(1000, 10, 10))
y = np.random.randint(1, 100, size=(1000, 4))

# build and compile your own Keras model
model = Sequential()
model.add(LSTM(128,
               activation='tanh',  # Keras' default LSTM activation; softmax does not suit regression
               input_shape=(10, 10)))
model.add(Dropout(0.1))
model.add(Dense(4))  # one output unit per target
model.compile(loss='mse', optimizer='adam', metrics=['mse'])

# wrap the model so the trainer can handle it like a built-in one
keras_lstm_model = KerasWrapper(model, nickname='LSTM')
raw_data_model = RegressionDataModel(X, y)
trainer = RegTrainer([keras_lstm_model], [None], raw_data_model)
trainer.fit()
```

Or attach the layers through the algorithm config:

```python
import numpy as np
from keras.layers import Dense, Dropout, LSTM
from pyemits.core.ml.regression.trainer import RegTrainer, RegressionDataModel
from pyemits.core.ml.regression.nn import KerasWrapper
from pyemits.common.config_model import KerasSequentialConfig

X = np.random.randint(1, 100, size=(1000, 10, 10))
y = np.random.randint(1, 100, size=(1000, 4))

# an empty wrapper; the layers and compile arguments come from the config
keras_lstm_model = KerasWrapper(nickname='LSTM')
config = KerasSequentialConfig(layer=[LSTM(128,
                                           activation='tanh',
                                           input_shape=(10, 10)),
                                      Dropout(0.1),
                                      Dense(4)],
                               compile=dict(loss='mse', optimizer='adam', metrics=['mse']))
raw_data_model = RegressionDataModel(X, y)
trainer = RegTrainer([keras_lstm_model],
                     [config],
                     raw_data_model,
                     # Keras fit() arguments, one dict per model
                     {'fit_config': [dict(epochs=10, batch_size=32)]})
trainer.fit()
```

PyTorch and MXNet support is under development; leave me a message if you want to contribute.

- MultiOutput training

```python
import numpy as np
from pyemits.core.ml.regression.trainer import RegressionDataModel, MultiOutputRegTrainer
from pyemits.core.preprocessing.splitting import SlidingWindowSplitter

X = np.random.randint(1, 100, size=(10000, 1))
y = np.random.randint(1, 100, size=(10000, 1))
# for auto-regressive style models such as MultiOutput, set ravel=True
# set ravel=False for models that accept multi-dimensional input, such as LSTM
splitter = SlidingWindowSplitter(24, 24, ravel=True)
X, y = splitter.split(X, y)
raw_data_model = RegressionDataModel(X, y)
trainer = MultiOutputRegTrainer(['XGBoost'], [None], raw_data_model)
trainer.fit()
```
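
The `SlidingWindowSplitter` call above turns one long series into pairs of (input window, future targets). The NumPy sketch below illustrates that idea only; `sliding_window_split` is a hypothetical helper rather than PyEmits code, and it assumes the two numeric arguments are the input window length and the forecast horizon, a stride of one time step, and that `ravel=True` flattens each window into a 1-D row.

```python
import numpy as np


def sliding_window_split(X, y, window, horizon, ravel=True):
    """Hypothetical helper: pair each input window with the next `horizon` targets.

    X has shape (n_steps, n_features) and y has shape (n_steps, n_targets).
    Assumes a stride of one time step.
    """
    X = np.asarray(X)
    y = np.asarray(y)
    n_samples = len(X) - window - horizon + 1
    if n_samples <= 0:
        raise ValueError("series is too short for this window/horizon")
    # Window i covers X[i : i + window]; its targets are y[i + window : i + window + horizon]
    X_win = np.stack([X[i:i + window] for i in range(n_samples)])
    y_win = np.stack([y[i + window:i + window + horizon] for i in range(n_samples)])
    if ravel:
        # Flatten each window to 1-D so tabular models (e.g. XGBoost) can consume it
        X_win = X_win.reshape(n_samples, -1)
        y_win = y_win.reshape(n_samples, -1)
    return X_win, y_win


series = np.arange(10).reshape(-1, 1)
X_win, y_win = sliding_window_split(series, series, window=3, horizon=2)
print(X_win.shape, y_win.shape)  # (6, 3) (6, 2)
print(X_win[0], y_win[0])        # [0 1 2] [3 4]
```

With `ravel=False` the windows keep their `(window, n_features)` shape, which is the 3-D input an LSTM expects.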

- Parallel training
  - provides fast training by running jobs in parallel
  - uses RegTrainer as its base, but adds parallel execution

```python
import numpy as np
from pyemits.core.ml.regression.trainer import RegressionDataModel, ParallelRegTrainer

X = np.random.randint(1, 100, size=(10000, 1))
y = np.random.randint(1, 100, size=(10000, 1))
raw_data_model = RegressionDataModel(X, y)
# XGBoost and LightGBM are trained in parallel jobs
trainer = ParallelRegTrainer(['XGBoost', 'LightGBM'], [None, None], raw_data_model)
trainer.fit()
```

Alternatively, you can use RegTrainer for multiple models, but they are not trained in parallel:

```python
import numpy as np
from pyemits.core.ml.regression.trainer import RegressionDataModel, RegTrainer

X = np.random.randint(1, 100, size=(10000, 1))
y = np.random.randint(1, 100, size=(10000, 1))
raw_data_model = RegressionDataModel(X, y)
# the two models are trained one after another
trainer = RegTrainer(['XGBoost', 'LightGBM'], [None, None], raw_data_model)
trainer.fit()
```
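
Both trainers fit the same list of models; the difference is whether the per-model fits run one after another or concurrently. The sketch below shows that difference with the standard library's `concurrent.futures`. It is not how PyEmits schedules its jobs internally, and `fit_mean`/`fit_linear` are hypothetical stand-ins (a mean baseline and a least-squares line) for models such as XGBoost and LightGBM.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def fit_mean(X, y):
    # Baseline "model": always predict the training mean
    mean = float(np.mean(y))
    return lambda X_new: np.full(len(X_new), mean)


def fit_linear(X, y):
    # Ordinary least squares with an intercept column
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y.ravel(), rcond=None)
    return lambda X_new: np.column_stack([np.ones(len(X_new)), X_new]) @ coef


def train_sequential(fitters, X, y):
    # One model after another, like RegTrainer with several algorithms
    return {name: fit(X, y) for name, fit in fitters.items()}


def train_parallel(fitters, X, y, max_workers=None):
    # Submit every fit at once, then collect each fitted model
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {name: pool.submit(fit, X, y) for name, fit in fitters.items()}
        return {name: future.result() for name, future in futures.items()}


rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(1000, 1))
y = 3 * X + 5 + rng.normal(0, 0.1, size=(1000, 1))
fitters = {"mean": fit_mean, "linear": fit_linear}

models = train_parallel(fitters, X, y)
print(sorted(models))  # ['linear', 'mean']
print(models["linear"](np.array([[2.0]])))
```

A thread pool keeps the sketch simple and portable; for CPU-bound training in CPython, a `ProcessPoolExecutor` (with top-level, picklable fit functions and models) is the usual way to get real parallel speed-ups.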

- K-fold training
  - KFoldConfig is a global config that applies to all models

```python
import numpy as np
from pyemits.core.ml.regression.trainer import RegressionDataModel, KFoldCVTrainer
from pyemits.common.config_model import KFoldConfig

X = np.random.randint(1, 100, size=(10000, 1))
y = np.random.randint(1, 100, size=(10000, 1))
raw_data_model = RegressionDataModel(X, y)
# the same 10-fold setting is applied to both XGBoost and LightGBM
trainer = KFoldCVTrainer(['XGBoost', 'LightGBM'],
                         [None, None],
                         raw_data_model,
                         {'kfold_config': KFoldConfig(n_splits=10)})
trainer.fit()
```
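
`KFoldConfig(n_splits=10)` sets up k-fold cross-validation for every model in the list. The index logic behind a standard k-fold split is sketched below in NumPy; `kfold_indices` is a hypothetical helper, and it assumes contiguous, unshuffled folds (the same default as scikit-learn's `KFold`), which may differ from what PyEmits does.

```python
import numpy as np


def kfold_indices(n_samples, n_splits):
    """Hypothetical helper: yield (train_idx, val_idx) for contiguous folds.

    The first n_samples % n_splits folds get one extra sample, so fold
    sizes differ by at most one.
    """
    if not 2 <= n_splits <= n_samples:
        raise ValueError("need 2 <= n_splits <= n_samples")
    indices = np.arange(n_samples)
    fold_sizes = np.full(n_splits, n_samples // n_splits)
    fold_sizes[: n_samples % n_splits] += 1
    start = 0
    for size in fold_sizes:
        val_idx = indices[start:start + size]
        # Every sample outside the validation fold is used for training
        train_idx = np.concatenate([indices[:start], indices[start + size:]])
        yield train_idx, val_idx
        start += size


# Each model is fitted n_splits times, once per held-out fold
for train_idx, val_idx in kfold_indices(10, n_splits=3):
    print(len(train_idx), val_idx.tolist())
# 6 [0, 1, 2, 3]
# 7 [4, 5, 6]
# 7 [7, 8, 9]
```

For strongly time-dependent data, training folds that lie after the validation fold leak future information; a forward-chaining (walk-forward) split is the usual alternative.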

- Easy prediction

```python
import numpy as np
from pyemits.core.ml.regression.trainer import RegressionDataModel, RegTrainer
from pyemits.core.ml.regression.predictor import RegPredictor

X = np.random.randint(1, 100, size=(10000, 1))
y = np.random.randint(1, 100, size=(10000, 1))
raw_data_model = RegressionDataModel(X, y)
trainer = RegTrainer(['XGBoost', 'LightGBM'], [None, None], raw_data_model)
trainer.fit()
# pass the fitted models and the name of the trainer that produced them
predictor = RegPredictor(trainer.clf_models, 'RegTrainer')
# prediction only needs features, so the data model is built from X alone
predictor.predict(RegressionDataModel(X))
```

- Forecast at scale
  - see the example notebook: `forecast at scale.ipynb`

- Data Model

```python
import numpy as np
from pyemits.common.data_model import RegressionDataModel

X = np.random.randint(1, 100, size=(1000, 10, 10))
y = np.random.randint(1, 100, size=(1000, 1))
data_model = RegressionDataModel(X, y)

# set a variable on the data model, then read it back as an attribute
data_model._update_variable('X_shape', (1000, 10, 10))
data_model.X_shape

# store the same information as meta data instead
data_model.add_meta_data('X_shape', (1000, 10, 10))
data_model.meta_data
```

- Anomaly detection (under development)
  - see module: anomaly detection
  - Kalman filter
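
Since the anomaly detection module is still under development, the following is only an illustrative sketch of how a Kalman filter is commonly used for this task, not PyEmits code. A 1-D local-level (random walk plus noise) filter tracks the series, and points whose innovation is large relative to its predicted standard deviation are flagged. The noise variances `q` and `r` and the 3-sigma threshold are assumptions chosen for the example.

```python
import numpy as np


def kalman_anomalies(series, q=1e-3, r=1.0, threshold=3.0):
    """Local-level Kalman filter; flag points whose standardized innovation exceeds `threshold`.

    q: process noise variance (how fast the level may drift)
    r: measurement noise variance
    """
    series = np.asarray(series, dtype=float)
    level, p = series[0], 1.0          # initial state estimate and its variance
    flags = np.zeros(len(series), dtype=bool)
    for t in range(1, len(series)):
        # Predict: the level follows a random walk, so only its variance grows
        p_pred = p + q
        # Innovation = observation minus prediction, with variance s
        innovation = series[t] - level
        s = p_pred + r
        if abs(innovation) / np.sqrt(s) > threshold:
            flags[t] = True
            # Skip the update so the outlier does not drag the level estimate
            p = p_pred
            continue
        # Update: blend prediction and observation using the Kalman gain
        k = p_pred / s
        level = level + k * innovation
        p = (1 - k) * p_pred
    return flags


rng = np.random.default_rng(42)
data = 10 + rng.normal(0, 1, size=200)
data[[50, 120]] += 8                    # inject two spikes
print(np.flatnonzero(kalman_anomalies(data)))
```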

- Evaluation (under development)
  - see module: evaluation
  - backtesting
  - model evaluation
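
Backtesting is listed but not yet documented here. A common approach, sketched below under the assumption that the evaluation module will follow it, is an expanding-window (walk-forward) backtest: fit on everything up to a cutoff, forecast the next `horizon` steps, score them, then move the cutoff forward. The naive last-value forecaster and the mean absolute error (MAE) metric are placeholders.

```python
import numpy as np


def naive_forecast(history, horizon):
    # Placeholder model: repeat the last observed value
    return np.repeat(history[-1], horizon)


def walk_forward_backtest(series, forecaster, initial, horizon, step=None):
    """Expanding-window backtest; returns the MAE of each forecast window."""
    series = np.asarray(series, dtype=float)
    step = step or horizon
    errors = []
    cutoff = initial
    while cutoff + horizon <= len(series):
        history = series[:cutoff]                  # only data before the cutoff
        actual = series[cutoff:cutoff + horizon]   # the unseen future window
        forecast = forecaster(history, horizon)
        errors.append(float(np.mean(np.abs(actual - forecast))))
        cutoff += step
    return errors


series = np.arange(20, dtype=float)   # a simple upward trend
maes = walk_forward_backtest(series, naive_forecast, initial=10, horizon=5)
print(maes)  # [3.0, 3.0]
```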

- Ensemble (under development)
  - blending
  - stacking
  - voting
  - by the combo package
    - moa
    - aom
    - average
    - median
    - maximization
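
The combination rules listed under the combo package act on a score matrix with one row per sample and one column per model. The sketch below is a simplified NumPy re-implementation that shows what each rule computes; it is not combo's own code, combo's functions expose further options, and splitting the models into fixed, contiguous buckets for `aom` (average of maximum) and `moa` (maximum of average) is an assumption of this sketch.

```python
import numpy as np


def average(scores):
    return scores.mean(axis=1)


def median(scores):
    return np.median(scores, axis=1)


def maximization(scores):
    return scores.max(axis=1)


def _buckets(scores, n_buckets):
    # Split the model columns into contiguous groups (an assumption of this sketch)
    return np.array_split(scores, n_buckets, axis=1)


def aom(scores, n_buckets):
    # Average of Maximum: max inside each bucket, then average the bucket maxima
    return np.mean([b.max(axis=1) for b in _buckets(scores, n_buckets)], axis=0)


def moa(scores, n_buckets):
    # Maximum of Average: average inside each bucket, then take the largest bucket average
    return np.max([b.mean(axis=1) for b in _buckets(scores, n_buckets)], axis=0)


# 2 samples scored by 4 models
scores = np.array([[1.0, 3.0, 2.0, 6.0],
                   [4.0, 0.0, 2.0, 2.0]])
print(average(scores))       # [3. 2.]
print(median(scores))        # [2.5 2. ]
print(maximization(scores))  # [6. 4.]
print(aom(scores, 2))        # [4.5 3. ]
print(moa(scores, 2))        # [4. 2.]
```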

- IO
  - db connection
  - local
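
The IO module's API is not documented here. As a sketch of the step of writing results back to a database, listed among the recurring wheels at the top, the standard library's `sqlite3` is enough to persist forecasts locally. The `write_forecasts` helper and the `forecasts` table with its columns are hypothetical; a production setup would use its own database driver and schema.

```python
import sqlite3

import numpy as np


def write_forecasts(conn, model_name, timestamps, values):
    # Hypothetical schema: one row per (model, timestamp) forecast
    conn.execute(
        """CREATE TABLE IF NOT EXISTS forecasts (
               model TEXT NOT NULL,
               ts TEXT NOT NULL,
               value REAL NOT NULL,
               PRIMARY KEY (model, ts)
           )"""
    )
    rows = [(model_name, ts, float(v)) for ts, v in zip(timestamps, values)]
    # INSERT OR REPLACE makes re-running the same forecast idempotent;
    # the `with conn` block commits the transaction
    with conn:
        conn.executemany("INSERT OR REPLACE INTO forecasts VALUES (?, ?, ?)", rows)


conn = sqlite3.connect(":memory:")  # use a file path for a local database
preds = np.array([12.5, 13.0, 13.4])
write_forecasts(conn, "XGBoost", ["2024-01-01", "2024-01-02", "2024-01-03"], preds)
print(conn.execute("SELECT COUNT(*) FROM forecasts").fetchone())  # (3,)
```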

- Dashboard (?)
- Other miscellaneous features
  - continuous evaluation
  - aggregation
  - dimensionality reduction
  - data profile (intensive data overview)

- To be confirmed
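
Dimensionality reduction is only planned. One standard technique it could cover is principal component analysis (PCA); a minimal NumPy version via the singular value decomposition is sketched below as an illustration, not as the planned PyEmits implementation. The `pca` helper name is hypothetical.

```python
import numpy as np


def pca(X, n_components):
    """Project X onto its top principal components using SVD.

    Returns (projected data, explained variance ratio per kept component).
    """
    X = np.asarray(X, dtype=float)
    X_centered = X - X.mean(axis=0)          # PCA works on zero-mean features
    # Rows of vt are the principal axes, sorted by descending singular value
    _, s, vt = np.linalg.svd(X_centered, full_matrices=False)
    components = vt[:n_components]
    explained = (s ** 2) / np.sum(s ** 2)
    return X_centered @ components.T, explained[:n_components]


rng = np.random.default_rng(0)
base = rng.normal(size=(500, 1))
# Three features that are noisy copies of one underlying signal
X = np.hstack([base, 2 * base, -base]) + rng.normal(0, 0.05, size=(500, 3))
Z, ratio = pca(X, n_components=1)
print(Z.shape, ratio.round(3))
```

Because the three features share one signal, a single component keeps almost all of the variance.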

# References

The following libraries gave me ideas and insights:

- greykite
  - changepoint detection
  - model summary
  - seasonality

- pytorch-forecasting
- darts
- pyaf
- orbit
- kats / prophet by Facebook
- sktime
- GluonTS
- tslearn
- pyts
- luminaire
- tods
- autots
- pyodds
- scikit-hts