Calling Julia from Python - an experiment on data loading
See the slides.
TLDR
After reading Patrick's blog post, we decided to try replacing the C++ part with Julia to check:
- How easy/hard it is
- How much improvement can be gained with a basic version
- How much improvement can be gained with an optimized version
The basic version is already an improvement over the pure Python version, and the optimized version was faster than the C++ version.
Reproduction
- Follow Patrick's blog post to install the C++ part.
- Install Julia (we've used Julia 1.6.3).
  - I recommend using Jill.
  - We'll refer to this Julia as path/to/julia.
- Install Python.
  - Ideally, use one that is dynamically linked to libpython.
    - To test it, run ldd path/to/python and look for libpython3.9. It should appear for a shared build.
    - If it doesn't, look into the workarounds here.
  - Tip: Arch Linux's system Python is dynamically linked.
  - We've used Python 3.9.7 from Arch Linux.
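- Open Julia and enter the following commands:
    ENV["PYTHON"] = "path/to/python"
    using Pkg
    Pkg.add("PyCall")
  - Setting ENV["PYTHON"] before adding PyCall makes PyCall build against that Python, so the packages we install all use the same interpreter. If PyCall is already installed, run Pkg.build("PyCall") after setting ENV["PYTHON"] instead.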
- Install PyJulia with path/to/python -m pip install julia.
- Run path/to/python and enter:
    import julia
    julia.install(julia="path/to/julia")
- Download the dataset and store it in the gen-data folder:
- Run scalability_test.py. It should take over 10 hours and consume a moderate amount of memory.
- Run scalability_analysis.py.
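The "Install Python" step checks for a shared libpython with ldd. A minimal sketch of the same check from inside the interpreter, assuming CPython on Linux: the standard sysconfig module reports whether Python was built with --enable-shared and which library it links against. The helper name is_shared_libpython is hypothetical and not part of this repo.

```python
import sys
import sysconfig


def is_shared_libpython() -> bool:
    """Return True if this CPython was built with --enable-shared.

    Hypothetical helper: it asks sysconfig instead of running
    `ldd path/to/python` on the binary.
    """
    # Py_ENABLE_SHARED is 1 for shared builds and 0 for static ones.
    # It may be None on platforms that don't define it (e.g. Windows),
    # in which case this returns False and the result is not meaningful.
    return bool(sysconfig.get_config_var("Py_ENABLE_SHARED"))


if __name__ == "__main__":
    print("Python:", sys.executable, sys.version.split()[0])
    # LDLIBRARY names the library the interpreter uses, typically a .so
    # file for shared builds and a .a file for static ones.
    print("LDLIBRARY:", sysconfig.get_config_var("LDLIBRARY"))
    print("Shared libpython:", is_shared_libpython())
```

Run it with path/to/python. If it reports False, the workarounds mentioned in the "Install Python" step apply.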