Book on Julia for Data Science

Julia Data Science

Last update: Dec 25, 2022

Related tags

Science data-science data book julia julia-language data-visualization data-manipulation

Overview

Julia Data Science

Open source and open access book for data science in Julia.

You can read the full book on https://juliadatascience.io.

This book is also published at Amazon.com.

LICENSE

This book is licensed under Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International.

Comments

Overview of Plots.jl Chapter
Here is an opinionated version. Feel free to criticize:

Subsections for Plots

Brief overview of the JuliaPlots ecosystem

JuliaPlots Organization

Plots.jl (what is it good for and what are its limitations)

Makie.jl (what is it good for and what are its limitations)

AlgebraOfGraphics.jl (what is it good for and what are its limitations) (version 2.0)

What is Plots.jl?

plot vs plot!

input data

Series types and functions (e.g. line, line!, heatmap, heatmap!)

How to save a plot

Attributes

Overview (what attributes are, and the whole symbol system, e.g. :xticks)

Series attributes

Plot Attributes

Footnote about the extra_kwargs stuff

Examples using the most common things that you want to do in a visualization. This could be inserted right after you introduce a specific attribute.

Color and Palettes

I think we should cover ColorBrewer and the palettes from matplotlib (inferno, viridis, magma). We should cover some material from Claus Wilke's Fundamentals of Data Visualization book (Chapters 4 and 19). We should also cover the three types of color usage:

sequential: continuous stuff, e.g. :blues (only blue)

diverging: continuous stuff, e.g. :RdBu (from red to blue)

distinguishable: discrete stuff, e.g. :Set1_5

I have a very strong positive bias towards colorbrewer Sets (e.g. palette=:Set1_5).

We should also mention that readers should use colorblind-friendly palettes or colors. Maybe we should include an official statistic on the prevalence of colorblindness or other color-vision difficulties in the population. I remember seeing somewhere that it was around 5% of people.

Layouts

Overview on several ways to do layouts

the layout argument, also cover the grid

the @layout macro

specific measures with the Plots.PlotMeasures submodule

adding subplots incrementally: define p1, p2, p3, then call plot(p1, p2, p3; layout=l)

writing
opened by storopoli 13
Add full cover

This PR adds code to generate the full cover. See https://github.com/JuliaDataScience/JuliaDataScience/issues/17#issuecomment-927185934 for more information about the dimensions.

I don't consider the cover finished or perfect yet. It is meant as a starting point for discussing its appearance.

Preview

EDIT: This wasn't working on Amazon. It was turned into

So, instead, using the Amazon cover editor:

which is turned into

opened by rikhuijzer 10

makie.jl: ERROR: LoadError: UndefVarError: Downloads not defined

Hi,

I get an

ERROR: LoadError: UndefVarError: Downloads not defined
Stacktrace:
 [1] top-level scope
   @ o:\Julia\makie.jl:604
in expression starting at o:\Julia\makie.jl:604

Commented out.
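The `UndefVarError` above is what Julia reports when code refers to a module that has not been loaded in the current session. A likely fix (an assumption, since the script itself is not shown) is to load the `Downloads` standard library before line 604 instead of commenting the line out. The `fetch_once` helper below is hypothetical and only illustrates the call; it is not part of the book's code.

```julia
# `Downloads` is a standard library bundled with Julia (since 1.6), but it is
# not loaded automatically: without this line, `Downloads.download(...)`
# fails with an `UndefVarError` like the one reported here.
using Downloads

# Hypothetical helper: download `url` to `path` only if the file does not
# exist yet, and return the local path either way.
function fetch_once(url::AbstractString, path::AbstractString)
    isfile(path) || Downloads.download(url, path)
    return path
end
```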

And when executing any of the demo functions:

ERROR: UndefVarError: Options not defined
Stacktrace:
 [1] custom_plot()
   @ Main o:\Julia\makie.jl:16
 [2] top-level scope
   @ REPL[12]:1

Commented out.

On executing

custom_plot()

there is no error, but no plot appears.

opened by bardo84 10

Chapter 7 Link broken to Makie Docs

There is a broken link in Chapter 7 (datavisMakie.md):

The link in "See Makie’s documentation for more." redirects to http://makie.juliaplots.org/stable/backends_and_output.html#Backends-and-Output, which is broken.

cc @lazarusA
bug

opened by storopoli 9
Notation discussion points
Notation discussion points from #20:

Always using : before the start of a code block.

Mentioning functions like DataFrame as DataFrame() or DataFrame(...).

My suggestion: only Julia objects between backticks and filenames and extension names between quotation marks (like Julia's strings).
opened by rikhuijzer 9
[dataframes_select] not the same selection

The lines

https://github.com/JuliaDataScience/JuliaDataScience/blob/b5582d29a9afa300e0a124ad2820389c386c04cc/contents/dataframes_select.md#L48-L56

don't give the same result as the previous example (where :id is not shown).

I think you need to rephrase the text.

opened by Mo-Gul 7
Fix pipe alignment in front cover

This should fix some alignment issues with the pipes. I also printed the previous version, and the lack of grid lines makes the whole cover look a bit dull.

opened by lazarusA 7
Julia cannot reproduce the rand

I have posted a question on Stack Overflow about an example from this book. Could you please explain? https://stackoverflow.com/questions/70321085/julia-cannot-reproduce-the-rand

Many thanks.

Shixiang
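The linked Stack Overflow thread is where the question itself is answered. As general background only, here is a minimal sketch of seeding in Julia. Re-seeding with the same value reproduces the same draws within one Julia version, but the exact stream produced by a given seed is not guaranteed to stay the same across Julia versions (for example, Julia 1.7 switched the default generator to Xoshiro256++), so numbers printed in the book may differ from what another Julia version prints. The seed `2021` and the variable names are arbitrary.

```julia
using Random

# Re-seeding the global RNG with the same value replays the same numbers
# (within one Julia version).
Random.seed!(2021)
first_draw = rand(3)
Random.seed!(2021)
second_draw = rand(3)

# Passing an explicit RNG object keeps the seed next to the code that uses it.
# `Xoshiro` is available from Julia 1.7 on; older versions have `MersenneTwister`.
rng_a = Xoshiro(2021)
rng_b = Xoshiro(2021)
draw_a = rand(rng_a, 3)
draw_b = rand(rng_b, 3)
```

If identical numbers across Julia versions matter, the StableRNGs.jl package is designed for that purpose.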

opened by ShixiangWang 6
Improve section numbering
See issue #221 for a discussion of why it is beneficial to have unnumbered sections when there are no other sections on that level, i.e. when there is no "section x.2".

There is only one such instance,

https://github.com/JuliaDataScience/JuliaDataScience/blob/2c750e092b7aa932cf10d7fc12f1d0f7ba7ae909/contents/julia_basics.md#L671

which cannot be made unnumbered, because it is referenced at

https://github.com/JuliaDataScience/JuliaDataScience/blob/2c750e092b7aa932cf10d7fc12f1d0f7ba7ae909/contents/dataframes_performance.md#L9

My suggestion is to bring this one level up, i.e.

- #### Functions with a bang `!` {#sec:function_bang}
+ ### Functions with a bang `!` {#sec:function_bang}

I think that also fits better with the previous section header, under which I (with my current n00bie understanding) don't see the "bang operator" fitting in.

Do you agree? If yes, I would prepare another commit. Otherwise you can merge the PR directly.
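For context on the heading discussed here: in Julia, a trailing `!` is a naming convention rather than an operator. It signals that the function modifies (at least) one of its arguments in place. A minimal sketch using only Base functions; the variable names are arbitrary:

```julia
v = [3, 1, 2]

# `sort` returns a new, sorted vector and leaves `v` untouched.
w = sort(v)
v_before = copy(v)   # still [3, 1, 2]

# `sort!` sorts `v` itself; `push!` appends to it in place.
sort!(v)
push!(v, 4)
```

The `plot` vs `plot!` pair in the Plots.jl outline follows the same convention.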
opened by Mo-Gul 6
Logo for Julia Data Science Organization
We need a logo. I will talk to someone at UNINOVE who can do that for me. Any thoughts, @rikhuijzer? We should move away from anything stats/Bayesian so as not to confuse it with future endeavors.

Maybe something with Tabular Data or Line Plots. We should definitely use Julia colors.

[x] Update JuliaDataScience GitHub Organization Logo

[x] Update JuliaDataScience Book favicon site icon

enhancement
opened by storopoli 6
improve typographical stuff

As I already stated in e.g. https://github.com/JuliaDataScience/JuliaDataScience/pull/215/commits/e8e1d11ed6386f8bcd552abbfd9ab058c3176b51, it would be nice to make some sections unnumbered when there is no second section on that level.

If I have understood this correctly, this should be possible by appending {-} to the section entry, as can be seen e.g. in

https://github.com/JuliaDataScience/JuliaDataScience/blob/56151f0f6d69ad4c62945aa1deb3169af90ef9ad/contents/index.md#L1

So if you think it would be nice to have, I'll redo the suggestions in a new PR, especially if it is that easily doable.

PS:
I am not sure whether any of you noticed my latest comments in #217, since I added them after the PR was merged. Thank you for your comments!

opened by Mo-Gul 5
cheatSheet_cairo.jl improvements
Some suggested changes to the CairoMakie cheatsheet, some for consistency and some to make it easier to understand what the function actually does (I think that is the main use of this figure: once the purpose of a plotting function is clear, the user can always check the documentation for the different ways to call the function). The biggest change is for linesegments: it now uses the linesegments(x, y) signature with the same data as the previous plots, to help understanding what's going on.

Full list of changes:

change range of first plots to have even number of points (for linesegments)

change linesegments to use same data as previous plots

use uniform parameter names in plot titles

use variable heights in crossbar

fix title of violin plot

more explicit title for mesh
opened by knuesel 0
4.1.2 Excel - failure

Hi,

Using Julia 1.8.1, VS Code notebooks

Entering the code from 4.1.2 Excel, I tried running:

path = write_grades_xlsx()
xf = readxlsx(path)

which gave:

MethodError: objects of type Vector{String} are not callable
Use square brackets [] for indexing an Array.

Stacktrace:
 [1] write_xlsx(name::String, df::DataFrame)
   @ Main ~/julia-test/juliadatascience-dataframes.ipynb:4
 [2] write_grades_xlsx()
   @ Main ~/julia-test/juliadatascience-dataframes.ipynb:3
 [3] top-level scope
   @ ~/julia-test/juliadatascience-dataframes.ipynb:1

Here's the function:

function write_xlsx(name, df::DataFrame)
    path = "$name.xlsx"
    data = collect(eachcol(df))
    cols = names(df)
    writetable(path, data, cols)
end

I found that because you had defined a "names" variable earlier in the chapter, this clobbered the "names()" function. When I changed this to use Base.names(), everything worked properly. (Somewhat ironically, you mentioned the global variable problem just after defining "names". ;-))

I'd recommend just renaming "names" to something less ambiguous, and then it won't break the code below.
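A minimal sketch of the pitfall described above, reproduced without DataFrames or XLSX so it runs on its own: a global variable named `names` shadows `Base.names` in the module where it is defined, so any later `names(...)` call there tries to call a vector. The module names `Shadowed` and `Renamed` and the example strings are illustrative only, not taken from the book.

```julia
# A global called `names` hides `Base.names` inside this module.
module Shadowed
names = ["Sally", "Bob", "Alice", "Hank"]  # illustrative values

# Calling `names(...)` now tries to call a Vector{String}, which throws
# "MethodError: objects of type Vector{String} are not callable".
function call_unqualified()
    try
        names(Shadowed)
        return :ok
    catch err
        return err isa MethodError ? :method_error : :other_error
    end
end

# Workaround 1: qualify the call so it always reaches the Base function.
call_qualified() = Base.names(Shadowed)
end

# Workaround 2 (the suggestion in this report): use a less ambiguous name.
module Renamed
student_names = ["Sally", "Bob", "Alice", "Hank"]
call_unqualified() = names(Renamed)  # `names` still means Base.names here
end
```

Renaming the variable is the more robust option, since every later unqualified `names(df)` call in the chapter keeps working.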

Thanks for the great work! Ari

opened by arimeyer 0
new book format

After this first experience of making and printing the book, I still feel that the margins (text at the edges) and the overall book size do not make for an appropriate layout. Thoughts?
version-2

opened by lazarusA 2

Releases (edition-1)

edition-1 (Oct 31, 2021)

First edition published as paperback on Amazon
Source code (tar.gz)
Source code (zip)
juliadatascience.pdf (6.59 MB)

Owner

Julia Data Science

Julia Data Science Book

GitHub https://juliadatascience.io
