CS 506 - Computational Tools for Data Science
Code, slides, and notes for Boston University CS506 Fall 2021
The Final Project Repository can be found here
Code, slides, and notes for Boston University CS506 Fall 2021
The Final Project Repository can be found here
Mayavi: 3D visualization of scientific data in Python Mayavi docs: http://docs.enthought.com/mayavi/mayavi/ TVTK docs: http://docs.enthought.com/mayav
Orange Data Mining Orange is a data mining and visualization toolbox for novice and expert alike. To explore data with Orange, one requires no program
It is mainly for data visualization projects of Atmospheric Science, Marine Science, Environmental Science or other majors.
CS 506 - Computational Tools for Data Science Code, slides, and notes for Boston
ReproZip ReproZip is a tool aimed at simplifying the process of creating reproducible experiments from command-line executions, a frequently-used comm
bartpy Python bindings for BART. Overview This repo contains code to generate an updated Python wrapper for the Berkeley Advance Reconstruction Toolbo
Rosetta Tools for data science with a focus on text processing. Focuses on "medium data", i.e. data too big to fit into memory but too small to necess
This tutorial's purpose is to introduce Pythonistas to methods for scaling their data science and machine learning work to larger datasets and larger models, using the tools and APIs they know and love from the PyData stack (such as numpy, pandas, and scikit-learn).
CKAN: The Open Source Data Portal Software CKAN is the world’s leading open-source data portal platform. CKAN makes it easy to publish, share and work
XL-Sum This repository contains the code, data, and models of the paper titled "XL-Sum: Large-Scale Multilingual Abstractive Summarization for 44 Lang
XL-Sum This repository contains the code, data, and models of the paper titled "XL-Sum: Large-Scale Multilingual Abstractive Summarization for 44 Lang
MaD GUI Machine Learning and Data Analytics Graphical User Interface MaD GUI is a basis for graphical annotation and computational analysis of time se
Modeling High-Frequency Limit Order Book Dynamics Using Machine Learning Framework to capture the dynamics of high-frequency limit order books. Overvi
Tuplex is a parallel big data processing framework that runs data science pipelines written in Python at the speed of compiled code. Tuplex has similar Python APIs to Apache Spark or Dask, but rather than invoking the Python interpreter, Tuplex generates optimized LLVM bytecode for the given pipeline and input data set.
time-series-kafka-demo Mock stream producer for time series data using Kafka. I walk through this tutorial and others here on GitHub and on my Medium
duality Data science, Data manipulation and Machine learning package. Use permitted according to the terms of use and conditions set by the attached l
Continuous Machine Learning project integration with DVC Data Version Control or DVC is an open-source tool for data science and machine learning proj
Data Scientist Learning Plan Demonstrate the breadth and depth of your data science skills by earning all of the Databricks Data Scientist credentials
Date created February 10, 2022 Project Title Explore US Bikeshare Data Descripti
CKAN: The Open Source Data Portal Software CKAN is the world’s leading open-source data portal platform. CKAN makes it easy to publish, share and work
The latest information about Galaxy can be found on the Galaxy Community Hub. Community support is available at Galaxy Help. Galaxy Quickstart Galaxy
A Python framework for creating reproducible, maintainable and modular data science code.
Cookiecutter Data Science A logical, reasonably standardized, but flexible project structure for doing and sharing data science work. Project homepage
Beehive A framework for feature exploration in Data Science Background What do we do when we finish one episode of feature exploration in a jupyter no
collection of interesting Computer Science resources
PsychoPy is an open-source package for creating experiments in behavioral science. It aims to provide a single package that is: precise enoug
Bioinformatics This is a repository of all the algorithms covered in the Bioinformatics Course part of the Cambridge Computer Science Tripos Algorithm
Why efficient Python? Because using Python more efficiently will make your code more readable and run more efficiently.
an interactive explorer for single-cell transcriptomics data cellxgene (pronounced "cell-by-gene") is an interactive data explorer for single-cell tra