Hands-On Machine Learning for Algorithmic Trading, published by Packt

Overview

This is the code repository for Hands-On Machine Learning for Algorithmic Trading, published by Packt.

Design and implement investment strategies based on smart algorithms that learn from data using Python

What is this book about?

The explosive growth of digital data has boosted the demand for expertise in trading strategies that use machine learning (ML). This book enables you to use a broad range of supervised and unsupervised algorithms to extract signals from a wide variety of data sources and create powerful investment strategies.

This book covers the following exciting features:

  • Implement machine learning techniques to solve investment and trading problems
  • Leverage market, fundamental, and alternative data to research alpha factors
  • Design and fine-tune supervised, unsupervised, and reinforcement learning models
  • Optimize portfolio risk and performance using pandas, NumPy, and scikit-learn
  • Integrate machine learning models into a live trading strategy on Quantopian

If you feel this book is for you, get your copy today!

https://www.packtpub.com/

Instructions and Navigations

All of the code is organized into folders. For example, Chapter02.

The code will look like the following:

interesting_times = extract_interesting_date_ranges(returns=returns)
interesting_times['Fall2015'].to_frame('pf') \
    .join(benchmark_rets) \
    .add(1).cumprod().sub(1) \
    .plot(lw=2, figsize=(14, 6), title='Post-Brexit Turmoil')
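The chained add(1).cumprod().sub(1) pattern compounds simple per-period returns into a cumulative-return curve before plotting (extract_interesting_date_ranges is a pyfolio helper that slices returns into notable market episodes; returns and benchmark_rets are pandas Series of simple returns). The compounding step on its own, as a minimal self-contained sketch with toy data:

```python
import pandas as pd

def cumulative_returns(returns: pd.Series) -> pd.Series:
    """Compound simple per-period returns into a cumulative-return curve."""
    return returns.add(1).cumprod().sub(1)

# Toy daily returns of +1%, -2%, +3%
pf = pd.Series([0.01, -0.02, 0.03], name='pf')
curve = cumulative_returns(pf)
# Final value of the curve is (1.01 * 0.98 * 1.03) - 1
```

Joining the benchmark before compounding, as the snippet does, lets both curves be plotted on the same compounded scale.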

Following is what you need for this book: Hands-On Machine Learning for Algorithmic Trading is for data analysts, data scientists, and Python developers, as well as investment analysts and portfolio managers working in the finance and investment industry. If you want to perform efficient algorithmic trading by developing smart investing strategies using machine learning algorithms, this is the book for you. Some understanding of Python and machine learning techniques is required.

With the following software and hardware list, you can run all code files present in the book (Chapters 1-15).

Software and Hardware List

Chapter   Software required                          OS required
2-20      Python 2.7/3.5, SciPy 0.18,                Windows, Mac OS X, and Linux (any)
          NumPy 1.11+, Matplotlib 2.0,
          scikit-learn 0.18+,
          Gensim, Keras 2+

We also provide a PDF file that has color images of the screenshots/diagrams used in this book. Click here to download it.

Get to Know the Author

Stefan Jansen, CFA, is Founder and Lead Data Scientist at Applied AI, where he advises Fortune 500 companies and startups across industries on translating business goals into a data and AI strategy, builds data science teams, and develops ML solutions. Before his current venture, he was Managing Partner and Lead Data Scientist at an international investment firm, where he built the predictive analytics and investment research practice. He was also an executive at a global fintech startup operating in 15 markets, worked for the World Bank, advised central banks in emerging markets, and has worked in six languages on four continents. Stefan holds Master's degrees from Harvard and Berlin University and teaches data science at General Assembly and DataCamp.

Suggestions and Feedback

Click here if you have any feedback or suggestions.

Comments
  • ResolvePackageNotFound: error

    Hello. I recently bought the book and am enjoying it. Thank you. I'm working through the code now.

    While trying to set up my environment with 'conda env create -f environment.yml', the following error occurred:

    What should I do? (I am a Windows user.)

    `ResolvePackageNotFound:

    • binutils_impl_linux-64=2.28.1
    • gxx_impl_linux-64=7.2.0
    • gxx_linux-64=7.2.0
    • libgcc-ng=8.2.0
    • libstdcxx-ng=8.2.0
    • readline=7.0
    • gcc_linux-64=7.2.0
    • gmp=6.1.2
    • libuuid=1.0.3
    • gstreamer=1.14.0
    • graphviz=2.40.1
    • dbus=1.13.2
    • binutils_linux-64=7.2.0
    • expat=2.2.6
    • libgfortran-ng=7.3.0
    • gcc_impl_linux-64=7.2.0
    • ncurses=6.1
    • gst-plugins-base=1.14.0
    • libedit=3.1.20170329`
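    The packages listed are Linux-specific toolchain and runtime builds, which conda cannot resolve on Windows. One common workaround (a sketch only, not an official fix; the prefix list below is an assumption derived from the error above) is to strip those entries out of environment.yml before creating the environment:

    ```python
    # Strip Linux-only packages (compiler toolchains, glibc runtimes,
    # GStreamer/D-Bus plumbing) from an environment.yml so that conda
    # can attempt to solve it on Windows.
    LINUX_ONLY_PREFIXES = (
        'binutils_', 'gcc_', 'gxx_', 'libgcc-ng', 'libstdcxx-ng',
        'libgfortran-ng', 'gstreamer', 'gst-plugins-base', 'dbus',
        'libuuid', 'readline', 'ncurses', 'libedit', 'gmp',
    )

    def filter_linux_only(yml_text: str) -> str:
        """Drop dependency lines whose package name matches a Linux-only prefix."""
        kept = []
        for line in yml_text.splitlines():
            pkg = line.strip().lstrip('- ')
            if line.strip().startswith('-') and pkg.startswith(LINUX_ONLY_PREFIXES):
                continue
            kept.append(line)
        return '\n'.join(kept)
    ```

    After writing the filtered text back to a new yml file, `conda env create -f` on that file has a chance of solving; the remaining version pins may still need relaxing.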
    opened by silent0506 4
  • Error when creating ml4t environment using the yml file

    Hello, I have tried to create an environment using the provided environment.yml file, but I get the following error:

    Solving environment: failed

    ResolvePackageNotFound:

    • gcc_linux-64=7.2.0
    • binutils_impl_linux-64=2.28.1
    • gxx_linux-64=7.2.0
    • gst-plugins-base=1.14.0
    • gstreamer=1.14.0
    • gmp=6.1.2
    • pango=1.42.4
    • dbus=1.13.2
    • gcc_impl_linux-64=7.2.0
    • binutils_linux-64=7.2.0
    • gxx_impl_linux-64=7.2.0
    • ncurses=6.1
    • libgcc-ng=8.2.0
    • libstdcxx-ng=8.2.0
    • libuuid=1.0.3
    • readline=7.0
    • expat=2.2.6
    • fribidi=1.0.5
    • libgfortran-ng=7.3.0
    • graphviz=2.40.1
    • libedit=3.1.20170329

    Is there any way I can fix this? Thanks for the help.

    opened by ssilverac 4
  • Missing "import seaborn as sns" in Chapter02/01.../01_build_itch_order_book.ipynb

    The section "Buy-Sell Order Distribution" uses "sns.distplot" which fails due to a missing 'import seaborn as sns' up at the very top under "Imports". Once I added this and re-ran both it passes without error...

    opened by eakoskela 1
  • Memory Error (and a few updates to original code)

    Hey!

    Love the work so far! I've noticed a couple of changes to make, though. In the first Jupyter notebook in Chapter 2 ("01_build_itch_order_book.ipynb"), Seaborn was not imported, so I ran into an error in the "Buy-Sell Order Distribution" section; add:

    import seaborn as sns

    ...to the top. Also, I was having an issue in the Download/Unzip section at the beginning. Even though I had already downloaded and unzipped the sample file, when I went to run the notebook again, it started to unzip the .gz file all over again, so I also changed the line:

    unzipped = data_path / (filename.stem + '.bin')

    ...to:

    unzipped = data_path / (os.path.splitext(SOURCE_FILE)[0] + '.bin')

    ...and you also need to add:

    import os

    ...to the top. This will then properly see that there's already an unzipped file there and won't start to unzip the .gz file again.
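    The fix above amounts to deriving the unzipped filename deterministically from the source file and checking for it before decompressing. A minimal self-contained sketch of that idempotent-unzip logic (names here are illustrative, not the notebook's exact code):

    ```python
    import gzip
    import shutil
    from pathlib import Path

    def maybe_unzip(source_file: str, data_path: Path) -> Path:
        """Decompress data_path/source_file once; skip if output already exists."""
        # 'sample.bin.gz' -> 'sample.bin'; .stem drops only the final suffix.
        unzipped = data_path / Path(source_file).stem
        if not unzipped.exists():
            with gzip.open(data_path / source_file, 'rb') as fin, \
                    unzipped.open('wb') as fout:
                shutil.copyfileobj(fin, fout)
        return unzipped
    ```

    Re-running the notebook then finds the existing .bin and returns immediately instead of decompressing the multi-gigabyte file again.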

    Now, I'm running into a memory error and I'm wondering if an external hard drive is actually the solution, or if Windows is just running out of memory from trying to work with such a huge file? My traceback leading up to the memory error looks like this:

    Empty DataFrame
    Columns: [Message Type, # Trades]
    Index: []

    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 2010099 entries, 0 to 2010098
    Data columns (total 9 columns):
    timestamp             2010099 non-null datetime64[ns]
    buy_sell_indicator    1873082 non-null float64
    shares                1995581 non-null float64
    price                 1995581 non-null float64
    type                  2010099 non-null object
    executed_shares       54956 non-null float64
    execution_price       500 non-null float64
    shares_replaced       14159 non-null float64
    price_replaced        14159 non-null float64
    dtypes: datetime64[ns](1), float64(7), object(1)
    memory usage: 138.0+ MB

    <class 'pandas.io.pytables.HDFStore'>
    File path: data\order_book.h5
    /AAPL/buy       frame_table (typ->appendable,nrows->177108242,ncols->2,indexers->[index],dc->[])
    /AAPL/messages  frame       (shape->[2010099,9])
    /AAPL/sell      frame_table (typ->appendable,nrows->183264614,ncols->2,indexers->[index],dc->[])
    /AAPL/trades    frame       (shape->[59796,3])

    <class 'pandas.core.frame.DataFrame'>
    DatetimeIndex: 59796 entries, 2019-03-27 04:00:56.459428646 to 2019-03-27 19:54:05.600648466
    Data columns (total 3 columns):
    shares    59796 non-null int32
    price     59796 non-null int32
    cross     59796 non-null int32
    dtypes: int32(3)
    memory usage: 1.1 MB
    None

    100,000   0:01:02.811120
    200,000   0:01:27.826051
    300,000   0:01:28.564991
    400,000   0:01:27.096437
    500,000   0:01:29.712940
    600,000   0:01:29.556783
    700,000   0:01:31.360356
    800,000   0:01:35.391212
    900,000   0:01:35.037438
    1,000,000 0:01:56.868902
    1,100,000 0:02:12.290003
    1,200,000 0:01:42.614183
    1,300,000 0:01:51.042871
    1,400,000 0:01:41.064018
    1,500,000 0:01:48.513128
    1,600,000 0:01:41.310528
    1,700,000 0:01:52.142803
    1,800,000 0:01:44.541379
    1,900,000 0:01:48.800505
    2,000,000 0:01:48.460462

    A    924117
    D    869968
    X      2789
    E     52299
    P      6995
    F      2282
    U     14159
    C       473
    dtype: int64

    <class 'pandas.io.pytables.HDFStore'>
    File path: data\order_book.h5
    /AAPL/buy       frame_table (typ->appendable,nrows->265662363,ncols->2,indexers->[index],dc->[])
    /AAPL/messages  frame       (shape->[2010099,9])
    /AAPL/sell      frame_table (typ->appendable,nrows->274896921,ncols->2,indexers->[index],dc->[])
    /AAPL/trades    frame       (shape->[59796,3])

    Traceback (most recent call last):
      File "01_build_itch_order_book.py", line 464, in <module>
        sell = store['{}/sell'.format(stock)].reset_index().drop_duplicates()
      File "C:\Users\windowshopr\AppData\Local\Programs\Python\Python36\lib\site-packages\pandas\core\frame.py", line 4630, in drop_duplicates
        duplicated = self.duplicated(subset, keep=keep)
      File "C:\Users\windowshopr\AppData\Local\Programs\Python\Python36\lib\site-packages\pandas\core\frame.py", line 4687, in duplicated
        labels, shape = map(list, zip(*map(f, vals)))
      File "C:\Users\windowshopr\AppData\Local\Programs\Python\Python36\lib\site-packages\pandas\core\frame.py", line 4668, in f
        vals, size_hint=min(len(self), _SIZE_HINT_LIMIT))
      File "C:\Users\windowshopr\AppData\Local\Programs\Python\Python36\lib\site-packages\pandas\util\_decorators.py", line 188, in wrapper
        return func(*args, **kwargs)
      File "C:\Users\windowshopr\AppData\Local\Programs\Python\Python36\lib\site-packages\pandas\core\algorithms.py", line 613, in factorize
        na_value=na_value)
      File "C:\Users\windowshopr\AppData\Local\Programs\Python\Python36\lib\site-packages\pandas\core\algorithms.py", line 460, in _factorize_array
        na_value=na_value)
      File "pandas\_libs\hashtable_class_helper.pxi", line 1209, in pandas._libs.hashtable.Int64HashTable.factorize
      File "pandas\_libs\hashtable_class_helper.pxi", line 1104, in pandas._libs.hashtable.Int64HashTable._unique
    MemoryError

    So I'm assuming it's because every time I try to run the program, it has to chew up a lot of working memory. Is there a way to work around this, work with a smaller ITCH sample, or anything else you'd recommend? I'd love to be able to keep working with this. My laptop has plenty of hard drive space, but only 8 GB of RAM.

    Thanks for your input!

    opened by windowshopr 1
  • 550 error on Chapter 2 notebook 01_build_itch_order_book

    550 error on Chapter 2 notebook 01_build_itch_order_book

    I have installed all packages from the updated version from PacktPublishing and I am getting an error:

    Downloading... ftp://emi.nasdaq.com/ITCH/Nasdaq_ITCH/03272019.NASDAQ_ITCH50.gz
    ---------------------------------------------------------------------------
    error_perm                                Traceback (most recent call last)
    ~\anaconda3\envs\ml4trading\lib\urllib\request.py in ftp_open(self, req)
       1564         try:
    -> 1565             fw = self.connect_ftp(user, passwd, host, port, dirs, req.timeout)
       1566             type = file and 'I' or 'D'
    
    ~\anaconda3\envs\ml4trading\lib\urllib\request.py in connect_ftp(self, user, passwd, host, port, dirs, timeout)
       1586         return ftpwrapper(user, passwd, host, port, dirs, timeout,
    -> 1587                           persistent=False)
       1588 
    
    ~\anaconda3\envs\ml4trading\lib\urllib\request.py in __init__(self, user, passwd, host, port, dirs, timeout, persistent)
       2407         try:
    -> 2408             self.init()
       2409         except:
    
    ~\anaconda3\envs\ml4trading\lib\urllib\request.py in init(self)
       2419         _target = '/'.join(self.dirs)
    -> 2420         self.ftp.cwd(_target)
       2421 
    
    ~\anaconda3\envs\ml4trading\lib\ftplib.py in cwd(self, dirname)
        630         cmd = 'CWD ' + dirname
    --> 631         return self.voidcmd(cmd)
        632 
    
    ~\anaconda3\envs\ml4trading\lib\ftplib.py in voidcmd(self, cmd)
        277         self.putcmd(cmd)
    --> 278         return self.voidresp()
        279 
    
    ~\anaconda3\envs\ml4trading\lib\ftplib.py in voidresp(self)
        250         """Expect a response beginning with '2'."""
    --> 251         resp = self.getresp()
        252         if resp[:1] != '2':
    
    ~\anaconda3\envs\ml4trading\lib\ftplib.py in getresp(self)
        245         if c == '5':
    --> 246             raise error_perm(resp)
        247         raise error_proto(resp)
    
    error_perm: 550 The system cannot find the file specified. 
    
    During handling of the above exception, another exception occurred:
    
    URLError                                  Traceback (most recent call last)
    <ipython-input-6-8ab12e3a4ec2> in <module>
    ----> 1 file_name = may_be_download(urljoin(FTP_URL, SOURCE_FILE))
          2 date = file_name.name.split('.')[0]
    
    <ipython-input-4-4c71a35e1865> in may_be_download(url)
          7     if not filename.exists():
          8         print('Downloading...', url)
    ----> 9         urlretrieve(url, filename)
         10     unzipped = data_path / (filename.stem + '.bin')
         11     if not (data_path / unzipped).exists():
    
    ~\anaconda3\envs\ml4trading\lib\urllib\request.py in urlretrieve(url, filename, reporthook, data)
        246     url_type, path = splittype(url)
        247 
    --> 248     with contextlib.closing(urlopen(url, data)) as fp:
        249         headers = fp.info()
        250 
    
    ~\anaconda3\envs\ml4trading\lib\urllib\request.py in urlopen(url, data, timeout, cafile, capath, cadefault, context)
        221     else:
        222         opener = _opener
    --> 223     return opener.open(url, data, timeout)
        224 
        225 def install_opener(opener):
    
    ~\anaconda3\envs\ml4trading\lib\urllib\request.py in open(self, fullurl, data, timeout)
        524             req = meth(req)
        525 
    --> 526         response = self._open(req, data)
        527 
        528         # post-process response
    
    ~\anaconda3\envs\ml4trading\lib\urllib\request.py in _open(self, req, data)
        542         protocol = req.type
        543         result = self._call_chain(self.handle_open, protocol, protocol +
    --> 544                                   '_open', req)
        545         if result:
        546             return result
    
    ~\anaconda3\envs\ml4trading\lib\urllib\request.py in _call_chain(self, chain, kind, meth_name, *args)
        502         for handler in handlers:
        503             func = getattr(handler, meth_name)
    --> 504             result = func(*args)
        505             if result is not None:
        506                 return result
    
    ~\anaconda3\envs\ml4trading\lib\urllib\request.py in ftp_open(self, req)
       1581         except ftplib.all_errors as exp:
       1582             exc = URLError('ftp error: %r' % exp)
    -> 1583             raise exc.with_traceback(sys.exc_info()[2])
       1584 
       1585     def connect_ftp(self, user, passwd, host, port, dirs, timeout):
    
    ~\anaconda3\envs\ml4trading\lib\urllib\request.py in ftp_open(self, req)
       1563             dirs = dirs[1:]
       1564         try:
    -> 1565             fw = self.connect_ftp(user, passwd, host, port, dirs, req.timeout)
       1566             type = file and 'I' or 'D'
       1567             for attr in attrs:
    
    ~\anaconda3\envs\ml4trading\lib\urllib\request.py in connect_ftp(self, user, passwd, host, port, dirs, timeout)
       1585     def connect_ftp(self, user, passwd, host, port, dirs, timeout):
       1586         return ftpwrapper(user, passwd, host, port, dirs, timeout,
    -> 1587                           persistent=False)
       1588 
       1589 class CacheFTPHandler(FTPHandler):
    
    ~\anaconda3\envs\ml4trading\lib\urllib\request.py in __init__(self, user, passwd, host, port, dirs, timeout, persistent)
       2406         self.keepalive = persistent
       2407         try:
    -> 2408             self.init()
       2409         except:
       2410             self.close()
    
    ~\anaconda3\envs\ml4trading\lib\urllib\request.py in init(self)
       2418         self.ftp.login(self.user, self.passwd)
       2419         _target = '/'.join(self.dirs)
    -> 2420         self.ftp.cwd(_target)
       2421 
       2422     def retrfile(self, file, type):
    
    ~\anaconda3\envs\ml4trading\lib\ftplib.py in cwd(self, dirname)
        629             dirname = '.'  # does nothing, but could return error
        630         cmd = 'CWD ' + dirname
    --> 631         return self.voidcmd(cmd)
        632 
        633     def size(self, filename):
    
    ~\anaconda3\envs\ml4trading\lib\ftplib.py in voidcmd(self, cmd)
        276         """Send a command and expect a response beginning with '2'."""
        277         self.putcmd(cmd)
    --> 278         return self.voidresp()
        279 
        280     def sendport(self, host, port):
    
    ~\anaconda3\envs\ml4trading\lib\ftplib.py in voidresp(self)
        249     def voidresp(self):
        250         """Expect a response beginning with '2'."""
    --> 251         resp = self.getresp()
        252         if resp[:1] != '2':
        253             raise error_reply(resp)
    
    ~\anaconda3\envs\ml4trading\lib\ftplib.py in getresp(self)
        244             raise error_temp(resp)
        245         if c == '5':
    --> 246             raise error_perm(resp)
        247         raise error_proto(resp)
        248 
    
    URLError: <urlopen error ftp error: error_perm('550 The system cannot find the file specified. ',)>
    

    It seems like the URL changed, but I am not sure where to get the updated one.

    opened by yahoyoungho 2
  • Unsupported parameter in pandas read_excel

    Unsupported parameter in pandas read_excel

    The Chapter 2 01_build_itch_order_book notebook attempts to read message type definitions from an xlsx file using pandas' read_excel method.

    One of the parameters passed is the file's encoding. However, the current version of pandas doesn't support the encoding argument, causing it to throw an error.

    Happily, simply removing the parameter allows the file to be loaded without problems, though I don't know whether there might be backwards-compatibility issues for people running older versions of pandas.
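    A version-tolerant pattern (a sketch, not the book's code; the helper name is mine) is to filter keyword arguments against the callee's signature, so an argument the installed pandas no longer accepts is dropped instead of raising TypeError:

    ```python
    import inspect

    def supported_kwargs(func, kwargs):
        """Keep only the kwargs that func's signature accepts.

        If func itself takes **kwargs, pass everything through unchanged.
        """
        params = inspect.signature(func).parameters
        if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
            return dict(kwargs)
        return {k: v for k, v in kwargs.items() if k in params}

    # Hypothetical usage against the notebook's read:
    #   message_types = pd.read_excel(
    #       'message_types.xlsx',
    #       **supported_kwargs(pd.read_excel, {'sheet_name': 0,
    #                                          'encoding': 'latin1'}))
    ```

    Whether the indirection is worth it depends on how many pandas versions need supporting; for a single pinned version, just deleting the argument is simpler.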

    opened by petercoles 0
  • package conflicts

    package conflicts

    Following the directions for creating the environment in installation.md, I get conflicts.

    conda env create -f environment_linux.yml
    

    I figured I would try to update a few of these and submit a pull request with the updates; however, I failed miserably. I tried the suggestion from @TheStoneMX in #9 and followed the directions in the post he gave. I appended '37' to the name to signify Python 3.7, as I surmised this might be the issue:

    conda create --name ml4t37 python=3.7
    conda activate ml4t37
    conda env update --file environment_linux.yml
    

    In either case, I get an amazing number of package conflicts (9,266 lines of conflict errors, which I am attaching).

    error.log

    This enormous number of conflicts leads me to believe that I have done something wrong, as it seems excessive. I have reproduced this on both an Arch Linux machine and a Bionic Beaver (Ubuntu 18.04) machine. Any suggestions on what I might have done wrong?

    opened by joshuacox 5
  • Typing error in chapter2/dataproviders

    Typing error in chapter2/dataproviders

    https://github.com/PacktPublishing/Hands-On-Machine-Learning-for-Algorithmic-Trading/tree/master/Chapter02/02_data_providers

    Some 'U' characters in the titles are missing.

    opened by rageSpin 1