OHLC Average Prediction of Apple Inc. Using LSTM Recurrent Neural Network

Overview

Stock Price Prediction of Apple Inc. Using Recurrent Neural Network

Dataset:

The dataset was downloaded from Yahoo Finance in CSV format. It contains the Open, High, Low and Closing prices of Apple Inc. stock from 3rd January 2011 to 13th August 2017, 1664 rows in total.

Price Indicator:

Stock traders mainly use three indicators for prediction: the OHLC average (average of the Open, High, Low and Closing prices), the HLC average (average of the High, Low and Closing prices) and the Closing price. In this project, the OHLC average is used.
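As a minimal sketch of the three indicators described above, the averages can be computed row-wise with pandas. The column names `Open`, `High`, `Low` and `Close` and the two sample rows are illustrative assumptions, not values taken from the project's CSV file.

```python
import pandas as pd

# Two made-up rows standing in for the Yahoo Finance CSV (assumed column names).
prices = pd.DataFrame(
    {
        "Open": [100.0, 102.0],
        "High": [104.0, 106.0],
        "Low": [98.0, 100.0],
        "Close": [102.0, 104.0],
    }
)

# Row-wise means give one value per trading day.
ohlc_avg = prices[["Open", "High", "Low", "Close"]].mean(axis=1)
hlc_avg = prices[["High", "Low", "Close"]].mean(axis=1)
close_val = prices["Close"]

print(ohlc_avg.tolist())
print(hlc_avg.round(4).tolist())
```

On real data the three series tend to track each other closely, which is why plotting them together is a quick way to check whether the choice of indicator matters.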

Data Pre-processing:

Converting the dataset to the OHLC average leaves a single column of data. This column is then converted into a two-column time series: the first column holds the stock price at time t and the second column the price at time t+1. All values are normalized to the range 0 to 1.
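The pre-processing described above can be sketched with NumPy alone. The helper names `min_max_scale` and `make_supervised` are hypothetical, the sample values are made up, and the sketch assumes the series is not constant (otherwise the scaling would divide by zero).

```python
import numpy as np

def min_max_scale(series):
    """Rescale a 1-D series linearly so its minimum maps to 0 and its maximum to 1."""
    series = np.asarray(series, dtype=float)
    lo, hi = series.min(), series.max()
    return (series - lo) / (hi - lo)

def make_supervised(series, step_size=1):
    """Pair each window of `step_size` values (time t) with the value that follows it (time t+1)."""
    X, y = [], []
    for i in range(len(series) - step_size):
        X.append(series[i:i + step_size])
        y.append(series[i + step_size])
    return np.array(X), np.array(y)

# Made-up OHLC averages for five consecutive days.
scaled = min_max_scale([10.0, 12.0, 11.0, 15.0, 14.0])
X, y = make_supervised(scaled, step_size=1)

print(scaled)     # normalized series
print(X.ravel())  # inputs: price at time t
print(y)          # targets: price at time t+1
```

With `step_size=1`, a series of five values yields four (t, t+1) pairs. Before being passed to an LSTM layer, `X` would still need to be reshaped to three dimensions (samples, time steps, features).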

Model:

The RNN model is built with the Keras deep learning library: two stacked LSTM layers followed by one dense layer. Since this is a regression task, a 'linear' activation is used in the final layer.
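A sketch of such a model is shown here. The layer sizes (32 and 16 units) and the input shape of one time step with one feature follow the script quoted in the comments further down. The function name `build_model` is hypothetical, and the code is written against the `tf.keras` API of a recent TensorFlow release, which is an assumption and differs from the Python 2.7 setup described here.

```python
import numpy as np
from tensorflow import keras

def build_model(step_size=1):
    """Two stacked LSTM layers followed by a single linear output unit."""
    model = keras.Sequential(
        [
            keras.Input(shape=(1, step_size)),
            # return_sequences=True passes the full sequence on to the second LSTM layer.
            keras.layers.LSTM(32, return_sequences=True),
            keras.layers.LSTM(16),
            # Linear activation: the output is an unbounded real value (regression).
            keras.layers.Dense(1, activation="linear"),
        ]
    )
    model.compile(loss="mean_squared_error", optimizer="adagrad")
    return model

model = build_model()

# Untrained forward pass on two dummy samples, only to check the shapes.
dummy_batch = np.zeros((2, 1, 1), dtype="float32")
prediction = model.predict(dummy_batch, verbose=0)
print(prediction.shape)
```

The linear activation applies only to the output layer. The LSTM layers keep their default non-linear activations, so the network as a whole is still non-linear.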

Version:

Python 2.7 and the latest versions (at the time of writing) of all libraries, including the deep learning libraries Keras and TensorFlow.

Training:

75% of the data is used for training. The Adagrad (adaptive gradient algorithm) optimizer is used for faster convergence. Once training starts, the output looks like this:

[Image: tt3]
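The 75% split mentioned above can be sketched as a chronological cut, which is also what the script quoted in the comments further down does. The helper name `chronological_split` is hypothetical, and the array of row indices merely stands in for the 1664 OHLC averages.

```python
import numpy as np

def chronological_split(series, train_fraction=0.75):
    """Use the earliest part of the series for training and the rest for testing."""
    split_at = int(len(series) * train_fraction)
    return series[:split_at], series[split_at:]

# Placeholder for the 1664 daily values.
series = np.arange(1664, dtype=float)
train, test = chronological_split(series)

print(len(train), len(test))  # 1248 416
```

Splitting by position rather than at random keeps every test observation later in time than every training observation.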

Test:

The test accuracy metric is the root mean square error (RMSE).
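For reference, RMSE is the square root of the mean squared difference between the actual and predicted values. A minimal NumPy sketch follows; the function name `rmse` and the sample numbers are illustrative assumptions.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error between two equally long sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Errors are -1, 1 and -2, so the mean squared error is (1 + 1 + 4) / 3 = 2.
score = rmse([100.0, 102.0, 104.0], [101.0, 101.0, 106.0])
print(round(score, 4))  # 1.4142
```

RMSE is expressed in the same unit as the data, so it only reads as a price error once the predictions have been converted back from the normalized 0 to 1 range.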

Results:

Comparison of the OHLC average, HLC average and Closing price:

[Image: ttt1]

After training, the fitted curve plotted against the original stock price:

[Image: tt2]

Observation and Conclusion:

Since the difference among the OHLC average, HLC average and closing value is not significant, only the OHLC average is used to build the model and make predictions. The training and testing RMSE are 1.24 and 1.37 respectively, which is pretty good for predicting future stock values. The stock price on the last day of the dataset was 158.8745. The model predicted 160.3230 and 160.9240 for the next two days, while the actual values on 14th and 15th August 2017 were 159.2075 and 159.8325 according to Yahoo Finance. Future values for any time period can be predicted using this model.
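The figures quoted above can be checked with a few lines of arithmetic. The comparison against a naive "tomorrow equals today" forecast is an added sanity check and an assumption of this sketch, not something the project itself reports.

```python
# Values quoted in the paragraph above.
last_known = 158.8745
predicted = [160.3230, 160.9240]
actual = [159.2075, 159.8325]

# Absolute error of the model on each of the two days.
model_errors = [abs(p - a) for p, a in zip(predicted, actual)]

# Naive one-step baseline: forecast each day with the previous day's actual value.
previous_values = [last_known] + actual[:-1]
naive_errors = [abs(prev - a) for prev, a in zip(previous_values, actual)]

print([round(e, 4) for e in model_errors])  # [1.1155, 1.0915]
print([round(e, 4) for e in naive_errors])  # [0.333, 0.625]
```

Two data points are far too few to draw conclusions from, but a comparison of this kind over the whole test set would show whether the model improves on simply repeating the last known price.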

Finally, this work can greatly help quantitative traders make decisions.

Comments
  • Fix me if i wrong

    1. I think your predictions are delayed by one step; if you zoom in on your plot, you'll see this.
    2. The predictions themselves are just the data of the previous step with some lag.

    I don't think this is a correct model, or maybe it's just an oddity of the plotting...

    [Image: figure_1] Please correct me if I'm wrong.

    opened by hackitdown 6
  • This model no longer produces accurate predictions.

    I believe this may be an issue with keras / tensorflow updating, and not a 'code' problem.

    absl-py==0.11.0
    astunparse==1.6.3
    cachetools==4.2.1
    certifi==2020.12.5
    chardet==4.0.0
    cycler==0.10.0
    flatbuffers==1.12
    gast==0.3.3
    google-auth==1.27.0
    google-auth-oauthlib==0.4.2
    google-pasta==0.2.0
    grpcio==1.32.0
    h5py==2.10.0
    idna==3.1
    joblib==1.0.1
    Keras==2.4.3
    Keras-Preprocessing==1.1.2
    kiwisolver==1.3.1
    Markdown==3.3.4
    matplotlib==3.3.4
    mplfinance==0.12.7a7
    numpy==1.19.5
    oauthlib==3.1.0
    opt-einsum==3.3.0
    pandas==1.2.2
    Pillow==8.1.0
    protobuf==3.15.3
    pyasn1==0.4.8
    pyasn1-modules==0.2.8
    pyparsing==2.4.7
    python-dateutil==2.8.1
    pytz==2021.1
    PyYAML==5.4.1
    requests==2.25.1
    requests-oauthlib==1.3.0
    rsa==4.7.2
    scikit-learn==0.24.1
    scipy==1.6.1
    six==1.15.0
    tensorboard==2.4.1
    tensorboard-plugin-wit==1.8.0
    tensorflow==2.4.1
    tensorflow-estimator==2.4.0
    termcolor==1.1.0
    threadpoolctl==2.1.0
    typing-extensions==3.7.4.3
    urllib3==1.26.3
    Werkzeug==1.0.1
    wrapt==1.12.1

    [Image]

    No errors of any kind, please advise.

    opened by HourGlss 4
  • Hi there can i ask a question?

    First of all, thank you for sharing your code. I was looking for exactly this type of code (predicting future prices). I am testing it now, but I get wrong predicted values: all predicted values increase. Why?

    opened by munkh-erdene 4
  • Mistake in Code

    There is a big mistake in your code. In StockPrediction.py, model.add('linear') is not right. We all know that the activation function has to be non-linear; a linear activation will make the prediction wrong.

    opened by YuQuankun 1
  • Issues with preprocessing

    Hello,

    I am facing some issues with preprocessing. When I run the preprocessing section, this is what I get:

    AttributeError: module 'sklearn.preprocessing' has no attribute 'new_dataset'

    Here is the code of yours. Am I missing any steps?

    #Edit Author: Ray

    # IMPORTING IMPORTANT LIBRARIES
    import pandas as pd
    import matplotlib.pyplot as plt
    import numpy as np
    import math
    from sklearn.preprocessing import MinMaxScaler
    from sklearn.metrics import mean_squared_error
    from keras.models import Sequential
    from keras.layers import Dense, Activation
    from keras.layers import LSTM
    from sklearn import preprocessing  # how to import preprocessing
    import sklearn.preprocessing
    import numpy as np

    # FOR REPRODUCIBILITY
    np.random.seed(7)

    # IMPORTING DATASET
    dataset = pd.read_csv('C:/Users/ray/Documents/Python Scripts/LSTM-Stock-prediction-master/apple_share_price.csv', usecols=[1,2,3,4])
    dataset = dataset.reindex(index = dataset.index[::-1])

    # CREATING OWN INDEX FOR FLEXIBILITY
    obs = np.arange(1, len(dataset) + 1, 1)

    # TAKING DIFFERENT INDICATORS FOR PREDICTION
    OHLC_avg = dataset.mean(axis = 1)
    HLC_avg = dataset[['High', 'Low', 'Close']].mean(axis = 1)
    close_val = dataset[['Close']]

    # PLOTTING ALL INDICATORS IN ONE PLOT
    plt.plot(obs, OHLC_avg, 'r', label = 'OHLC avg')
    plt.plot(obs, HLC_avg, 'b', label = 'HLC avg')
    plt.plot(obs, close_val, 'g', label = 'Closing price')
    plt.legend(loc = 'upper right')
    plt.show()

    # PREPARATION OF TIME SERIES DATASET
    OHLC_avg = np.reshape(OHLC_avg.values, (len(OHLC_avg),1))  # 1664
    scaler = MinMaxScaler(feature_range=(0, 1))
    OHLC_avg = scaler.fit_transform(OHLC_avg)

    # TRAIN-TEST SPLIT
    train_OHLC = int(len(OHLC_avg) * 0.75)
    test_OHLC = len(OHLC_avg) - train_OHLC
    train_OHLC, test_OHLC = OHLC_avg[0:train_OHLC,:], OHLC_avg[train_OHLC:len(OHLC_avg),:]

    +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
    step_size = 1

    # FUNCTION TO CREATE 1D DATA INTO TIME SERIES DATASET
    def new_dataset(dataset, step_size):
        trainX, trainY = [], []
        for i in range(len(dataset)-step_size-1):
            a = dataset[i:(i+step_size), 0]
            trainX.append(a)
            trainY.append(dataset[i + step_size, 0])
        return np.array(trainX), np.array(trainY)

    # TIME-SERIES DATASET (FOR TIME T, VALUES FOR TIME T+1)
    trainX, trainY = sklearn.preprocessing.new_dataset(train_OHLC, 1)
    testX, testY = sklearn.preprocessing.new_dataset(test_OHLC, 1)
    +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

    # RESHAPING TRAIN AND TEST DATA
    trainX = np.reshape(train_OHLC, (train_OHLC.shape[0], 1, train_OHLC.shape[1]))
    testX = np.reshape(testX, (testX.shape[0], 1, testX.shape[1]))
    step_size = 1

    # LSTM MODEL
    model = Sequential()
    model.add(LSTM(32, input_shape=(1, step_size), return_sequences = True))
    model.add(LSTM(16))
    model.add(Dense(1))
    model.add(Activation('linear'))

    # MODEL COMPILING AND TRAINING
    model.compile(loss='mean_squared_error', optimizer='adagrad')  # Try SGD, adam, adagrad and compare!!!
    model.fit(trainX, trainY, epochs=5, batch_size=1, verbose=2)

    # PREDICTION
    trainPredict = model.predict(trainX)
    testPredict = model.predict(testX)

    # DE-NORMALIZING FOR PLOTTING
    trainPredict = scaler.inverse_transform(trainPredict)
    trainY = scaler.inverse_transform([trainY])
    testPredict = scaler.inverse_transform(testPredict)
    testY = scaler.inverse_transform([testY])

    # TRAINING RMSE
    trainScore = math.sqrt(mean_squared_error(trainY[0], trainPredict[:,0]))
    print('Train RMSE: %.2f' % (trainScore))

    # TEST RMSE
    testScore = math.sqrt(mean_squared_error(testY[0], testPredict[:,0]))
    print('Test RMSE: %.2f' % (testScore))

    # CREATING SIMILAR DATASET TO PLOT TRAINING PREDICTIONS
    trainPredictPlot = np.empty_like(OHLC_avg)
    trainPredictPlot[:, :] = np.nan
    trainPredictPlot[step_size:len(trainPredict)+step_size, :] = trainPredict

    # CREATING SIMILAR DATASSET TO PLOT TEST PREDICTIONS
    testPredictPlot = np.empty_like(OHLC_avg)
    testPredictPlot[:, :] = np.nan
    testPredictPlot[len(trainPredict)+(step_size*2)+1:len(OHLC_avg)-1, :] = testPredict

    # DE-NORMALIZING MAIN DATASET
    OHLC_avg = scaler.inverse_transform(OHLC_avg)

    # PLOT OF MAIN OHLC VALUES, TRAIN PREDICTIONS AND TEST PREDICTIONS
    plt.plot(OHLC_avg, 'g', label = 'original dataset')
    plt.plot(trainPredictPlot, 'r', label = 'training set')
    plt.plot(testPredictPlot, 'b', label = 'predicted stock price/test set')
    plt.legend(loc = 'upper right')
    plt.xlabel('Time in Days')
    plt.ylabel('OHLC Value of Apple Stocks')
    plt.show()

    # PREDICT FUTURE VALUES
    last_val = testPredict[-1]
    last_val_scaled = last_val/last_val
    next_val = model.predict(np.reshape(last_val_scaled, (1,1,1)))
    print "Last Day Value:", np.asscalar(last_val)
    print "Next Day Value:", np.asscalar(last_val*next_val)

    print np.append(last_val, next_val)

    opened by rayislam 1
  • Wrong Prediction

    Hi,

    Nice post, but the results are wrong.

    You are not predicting several days ahead, only one day ahead at a time. You are not feeding your prediction back in as the input for your next prediction; you are using the actual value.

    This results in a lagged copy of the actual signal: all your network has to do is produce a value similar to the last input price.

    If you took your prediction as the input for the next prediction, you would see that the results are quite bad…

    I see lots of LSTM price prediction examples, but they all seem to be wrong, and I don't think it is possible to accurately predict the next prices.

    opened by leodennis 1
  • Measuring error

    Nice work! I have a suggestion. You are testing the model with RMSE this way:

    testScore = math.sqrt(mean_squared_error(testY[0], testPredict[:,0]))

    which is not a good metric for quantifying either the accuracy or the performance of the model. Much more important than RMSE is the ability of the model to predict stock movements. So I would like to see some metrics that take into account how many ups and downs are correctly predicted. I suggest implementing recall and precision measurements.

    opened by fedecaccia 1
  • Add More Features?

    This is awesome and helpful. I have been playing with it and have the RMSE on the test data down to under 1. If I am reading the output right, it displays the previous day's average you are using and then the next day's prediction? Is there a way to look ahead more steps?

    Also, you are taking the OHLC values and turning them into a single-value input. Is there a way to use this model with more inputs, such as additional features on top of the OHLC value?

    opened by boxxa 1
  • Update StockPricePrediction.py

    I tried the initial code, but there were some problems in the CSV file. I tried with another CSV file from yahoo. I also found an anomaly that the ohlc_avg was in millions, so I removed 'volume' column from the table for making the average. And I did remove 'dataset = dataset.reindex(index = dataset.index[::-1])' because due to it, the price was falling, whereas it should have increased with time.

    I tried the code in this notebook, you can review it too. https://colab.research.google.com/drive/1v6FF9zHMYgBy8QD1kuSL3q3ln-pD7lQw?usp=sharing

    And thanks to you for sharing this code, so that I could learn LSTM. You can contact me for any clarifications at [email protected] or +918617781030(WhatsApp & Call)

    opened by Samar-080301 0
  • Logic behind the code

    Hello there, I would like to know the logic behind the future prediction of the code snippets.

    last_val = testPredict[-1]
    last_val_scaled = last_val/last_val
    next_val = model.predict(np.reshape(last_val_scaled, (1,1,1)))
    print ("Last Day Value:", np.asscalar(last_val))
    print ("Next Day Value:", np.asscalar(last_val*next_val))

    opened by 4emkay 0
  • PLEASE EXPLAIN THE REASON FOR IMPORTING PREPROCESSING

    Hi. Thank you for this project. Although it is written in Python 2.7, there is a package named preprocessing (used for text preprocessing) that cannot be installed in Python 2. What would be a suitable alternative to preprocessing.new_data(), preferably in scikit-learn? Thank you.

    opened by KhaaQ 0
Owner
Nouroz Rahman
Data Scientist at Pathao. Interests: Deep Learning, Data Science, Financial Mathematics, Bayesian Statistics.