Book on Julia for Data Science

Comments
  • Overview of Plots.jl Chapter

    Here is an opinionated version. Feel free to criticize:

    Subsections for Plots

    Brief overview of the JuliaPlots ecosystem

    • JuliaPlots Organization
    • Plots.jl (what is it good for and what are its limitations)
    • Makie.jl (what is it good for and what are its limitations)
    • AlgebraOfGraphics.jl (what is it good for and what are its limitations) (version 2.0)
    1. What is Plots.jl?

      • plot vs plot!
      • input data
      • Series types and functions (e.g. line, line!, heatmap, heatmap!)
      • How to save a plot
    2. Attributes

      • Overview (What are attributes, and the whole symbol e.g. :xticks system)
      • Series attributes
      • Plot Attributes
      • Footnote about the extra_kwargs stuff
      • Examples using the most common things that you want to do in a visualization. This could be inserted right after you introduce a specific attribute.
    3. Color and Palettes

      I think we should cover colorbrewer and the ones from matplotlib (inferno, viridis, magma). We should cover some material from Claus Wilke's Fundamentals of Data Visualization book (Chapters 4 and 19). We should also cover the three types of color usage:

      1. sequential: continuous stuff, e.g. :blues (only blue)
      2. diverging: continuous stuff, e.g. :RdBu (from red to blue)
      3. distinguishable: discrete stuff, e.g. :Set1_5

      I have a very strong positive bias towards colorbrewer Sets (e.g. palette=:Set1_5).

      We should also mention that the reader should use a colorblind-friendly palette or colors. Maybe we should include an official statistic on the prevalence of colorblindness or other color-vision difficulties in the population. I remember seeing somewhere that it was around 5% of people.

    4. Layouts

      • Overview on several ways to do layouts
      • the layout argument, also cover the grid
      • the @layout macro
      • specific measures with the Plots.PlotMeasures submodule
      • adding subplots incrementally: define p1, p2, p3; then call plot(p1, p2, p3; layout=l)
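A minimal sketch of how the outlined topics fit together, assuming Plots.jl is installed with its default backend. The data, the variable names, and the output filename `overview.png` are hypothetical and only serve as illustration; the palette and gradient names are the ones mentioned in the outline.

```julia
using Plots  # assumes Plots.jl is installed

x = range(0, 2π; length=100)

# plot creates a new plot; plot! mutates an existing one by adding a series.
p1 = plot(x, sin.(x); label="sin", xlabel="x", title="plot vs plot!")
plot!(p1, x, cos.(x); label="cos", linestyle=:dash)

# Distinguishable colors for discrete groups via a colorbrewer palette.
p2 = plot(x, [sin.(k .* x) for k in 1:5]; palette=:Set1_5, legend=false)

# Sequential color gradient for continuous values.
p3 = heatmap(rand(10, 10); color=:viridis)

# Compose the subplots: one wide panel on top, two panels below.
l = @layout [a; b c]
fig = plot(p1, p2, p3; layout=l, size=(800, 600))

# Save the combined figure (hypothetical filename).
savefig(fig, "overview.png")
```

Attributes such as `label`, `linestyle`, and `palette` are passed as keyword arguments, which is the symbol-based attribute system the Attributes subsection would explain.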
    writing 
    opened by storopoli 13
  • Add full cover

    This PR adds code to generate the full cover. See https://github.com/JuliaDataScience/JuliaDataScience/issues/17#issuecomment-927185934 for more information about the dimensions.

    I don't consider the cover done and perfect now. It is meant as a place where we can start discussing the appearance.

    Preview

    [image]

    EDIT: This didn't work on Amazon. It was turned into

    [image]

    So, instead, I used the Amazon cover editor:

    [image]

    which is turned into

    [image]

    opened by rikhuijzer 10
  • makie.jl: ERROR: LoadError: UndefVarError: Downloads not defined

    Hi,

    I get an

    ERROR: LoadError: UndefVarError: Downloads not defined
    Stacktrace:
     [1] top-level scope
       @ o:\Julia\makie.jl:604
    in expression starting at o:\Julia\makie.jl:604
    

    Commented out.

    And on execution for all demo functions:

    ERROR: UndefVarError: Options not defined
    Stacktrace:
     [1] custom_plot()
       @ Main o:\Julia\makie.jl:16
     [2] top-level scope
       @ REPL[12]:1
    

    Commented out.

    On execution,

    custom_plot()

    no error, no plot appears.

    opened by bardo84 10
  • Chapter 7 Link broken to Makie Docs

    There is a broken link in Chapter 7 (datavisMakie.md): the link in "See Makie’s documentation for more." redirects to http://makie.juliaplots.org/stable/backends_and_output.html#Backends-and-Output, which is broken.

    cc @lazarusA

    bug 
    opened by storopoli 9
  • Notation discussion points

    Notation discussion points from #20:

    1. Always using : before the start of a code block.
    2. Mentioning functions like DataFrame as DataFrame() or DataFrame(...).
    3. My suggestion: only Julia objects between backticks and filenames and extension names between quotation marks (like Julia's strings).
    opened by rikhuijzer 9
  • [dataframes_select] not the same selection

    The lines

    https://github.com/JuliaDataScience/JuliaDataScience/blob/b5582d29a9afa300e0a124ad2820389c386c04cc/contents/dataframes_select.md#L48-L56

    don't give the same result as the previous example ... (where :id is not shown)

    I think you need to rephrase the text.

    opened by Mo-Gul 7
  • Fix pipe alignment in front cover

    This should fix some alignment issues with the pipes. Additionally, I printed the previous version, and the lack of grid lines makes the whole cover a little dull.

    opened by lazarusA 7
  • Julia cannot reproduce the rand

    I have posted a question on Stack Overflow about an example from this book. Could you please explain? https://stackoverflow.com/questions/70321085/julia-cannot-reproduce-the-rand

    Many thanks.

    Shixiang

    opened by ShixiangWang 6
  • Improve section numbering

    See issue #221 for a discussion of why it is beneficial to leave a section unnumbered when there are no other sections on that level, i.e. when there is no "section x.2".

    There is only one instance,

    https://github.com/JuliaDataScience/JuliaDataScience/blob/2c750e092b7aa932cf10d7fc12f1d0f7ba7ae909/contents/julia_basics.md#L671

    which cannot be made unnumbered, because it is referenced at

    https://github.com/JuliaDataScience/JuliaDataScience/blob/2c750e092b7aa932cf10d7fc12f1d0f7ba7ae909/contents/dataframes_performance.md#L9

    My suggestion is to bring this one level up, i.e.

    - #### Functions with a bang `!` {#sec:function_bang}
    + ### Functions with a bang `!` {#sec:function_bang}
    

    I think this also fits better, because I (with my current n00bie understanding) don't see the "bang operator" fitting under the previous section header.

    Do you agree? If yes, I would prepare another commit. Otherwise you can merge the PR directly.

    opened by Mo-Gul 6
  • Logo for Julia Data Science Organization

    We need a logo. I will talk to someone at UNINOVE who can do that for me. Any thoughts, @rikhuijzer? We should leave out anything stats/Bayesian so as not to confuse it with future endeavors.

    Maybe something with Tabular Data or Line Plots. We should definitely use Julia colors.

    • [x] Update JuliaDataScience GitHub Organization Logo
    • [x] Update JuliaDataScience Book favicon site icon
    enhancement 
    opened by storopoli 6
  • improve typographical stuff

    As I already stated in e.g. https://github.com/JuliaDataScience/JuliaDataScience/pull/215/commits/e8e1d11ed6386f8bcd552abbfd9ab058c3176b51, it would be nice to make some sections unnumbered when there is no second section on that level.

    If I have understood this correctly, this should be possible by appending {-} to the section heading, as can be seen e.g. in

    https://github.com/JuliaDataScience/JuliaDataScience/blob/56151f0f6d69ad4c62945aa1deb3169af90ef9ad/contents/index.md#L1

    So if you think that would be nice to have, I'll redo the suggestions in a new PR, especially if it is that easy to do.


    PS:
    I am not sure if my newest comments in #217 have been noticed by one of you, since I have added them after the PR was merged. Thank you for your comments!

    opened by Mo-Gul 5
  • cheatSheet_cairo.jl improvements

    Some suggested changes to the CairoMakie cheatsheet, some for consistency and some to make it easier to understand what each function actually does (I think that is the main use of this figure: once the purpose of a plotting function is clear, the user can always check the documentation for the different ways to call the function). The biggest change is for linesegments: it now uses the linesegments(x, y) signature with the same data as the previous plots, to make it easier to understand what's going on.

    Full list of changes:

    • change range of first plots to have even number of points (for linesegments)
    • change linesegments to use same data as previous plots
    • uniformize parameter names in plot titles
    • use variable heights in crossbar
    • fix title of violin plot
    • more explicit title for mesh
    opened by knuesel 0
  • 4.1.2 Excel - failure

    Hi,

    Using Julia 1.8.1, VS Code notebooks

    Entering the code from 4.1.2 Excel, I tried running:

    path = write_grades_xlsx()
    xf = readxlsx(path)

    which gave:

    MethodError: objects of type Vector{String} are not callable
    Use square brackets [] for indexing an Array.

    Stacktrace:
     [1] write_xlsx(name::String, df::DataFrame)
       @ Main ~/julia-test/juliadatascience-dataframes.ipynb:4
     [2] write_grades_xlsx()
       @ Main ~/julia-test/juliadatascience-dataframes.ipynb:3
     [3] top-level scope
       @ ~/julia-test/juliadatascience-dataframes.ipynb:1

    Here's the function:

    function write_xlsx(name, df::DataFrame)
        path = "$name.xlsx"
        data = collect(eachcol(df))
        cols = names(df)
        writetable(path, data, cols)
    end

    I found that because you had defined a "names" variable earlier in the chapter, this clobbered the "names()" function. When I changed this to use Base.names(), everything worked properly. (Somewhat ironically, you mentioned the global variable problem just after defining "names". ;-))

    I'd recommend just renaming "names" to something less ambiguous, and then it won't break the code below.
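The shadowing described above can be reproduced in a few lines. This is a minimal sketch, assuming DataFrames.jl is installed and that `names` is assigned in `Main` before the function has been called there; the data values are hypothetical.

```julia
using DataFrames  # assumes DataFrames.jl is installed

# Assigning to `names` in Main before the function has been used there
# creates a global variable that shadows the exported function.
names = ["Sally", "Bob", "Alice"]

df = DataFrame(name=names, grade=[1.0, 5.0, 8.5])

# Calling the variable as a function fails: a Vector{String} is not callable.
shadowed = try
    names(df)
    false
catch err
    err isa MethodError
end

# A module-qualified call bypasses the shadowing variable.
cols = Base.names(df)
```

Renaming the variable, as suggested, avoids the need for the qualified call altogether.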

    Thanks for the great work! Ari

    opened by arimeyer 0
  • new book format

    After this first experience of making and printing the book, I still feel that the margins [text at the edges] and the overall book size are not an appropriate layout. Thoughts?

    version-2 
    opened by lazarusA 2
Releases: edition-1
Owner: Julia Data Science (Julia Data Science Book)