Hitchhikers-guide - The Hitchhiker's Guide to Data Science for Social Good

Data Science for Social Good

Last update: Jan 1, 2023

Overview

Welcome to the Hitchhiker's Guide to Data Science for Social Good.

What is the Data Science for Social Good Fellowship?

The Data Science for Social Good Fellowship (DSSG) is a hands-on and project-based summer program that launched in 2013 at the University of Chicago and has now expanded to multiple locations globally. It brings data science fellows, typically graduate students, from across the world to work on machine learning, artificial intelligence, and data science projects that have a social impact. From a pool of typically over 800 applicants, 20-40 fellows are selected from diverse computational and quantitative disciplines including computer science, statistics, math, engineering, psychology, sociology, economics, and public policy.

The fellows work in small, cross-disciplinary teams on social good projects spanning education, health, energy, transportation, criminal justice, social services, economic development, and international development, in collaboration with global government agencies and non-profits. This work is done under close, hands-on mentorship from full-time, dedicated data science mentors and dedicated project managers with industry experience. The result is highly trained fellows, improved data science capacity at the social good organization, and a high-quality data science project that is ready for field trial and implementation at the end of the program.

In addition to hands-on project-based training, the summer program also consists of workshops, tutorials, and ethics discussion groups based on our data science for social good curriculum designed to train the fellows in doing practical data science and artificial intelligence for social impact.

Who is this guide for?

The primary audience for this guide is the set of fellows coming to DSSG, but we want everything we create to be open and accessible to the larger world. We hope this is useful to people beyond the summer fellows coming to DSSG.

If you are applying to the program or have been accepted as a fellow, check out the manual to see how you can prepare before arriving, what orientation and training will cover, and what to expect from the summer.

If you are interested in learning at home, check out the tutorials and teach-outs developed by our staff and fellows throughout the summer, and feel free to suggest or contribute additional resources.

Another one of our goals is to encourage collaborations. We invite anyone interested in doing this type of work, or in starting a DSSG program, to build on what we've learned by using and contributing to these resources.

What is in this guide?

Our number one priority at DSSG is to train fellows to do data science for social good work. This curriculum includes many things you'd find in a data science course or bootcamp, but with an emphasis on solving problems with social impact, integrating data science with the social sciences, and discussing the ethical implications of the work, including privacy and confidentiality issues.

We have spent many (sort of) early mornings waxing existential over Dunkin' Donuts while trying to define what makes a "data scientist for social good," that enigmatic breed combining one part data scientist, one part consultant, one part educator, and one part bleeding heart idealist. We've come to a rough working definition in the form of the skills and knowledge one would need, which we categorize as follows:

Programming, because you'll need to tell your computer what to do, usually by writing code.
Computer science, because you'll need to understand how your data is - and should be - structured, as well as the algorithms you use to analyze it.
Math and stats, because everything else in life is just applied math, and numerical results are meaningless without some measure of uncertainty.
Machine learning, because you'll want to build predictive or descriptive models that can learn, evolve, and improve over time.
Social science, because you'll need to know how to design experiments to validate your models in the field, understand when correlation can plausibly suggest causation, and sometimes even do causal inference.
Problem and Project Scoping, because you'll need to be able to go from a vague and fuzzy project description to a problem you can solve, understand the goals of the project, the interventions you are informing, the data you have and need, and the analysis that needs to be done.
Project management, because you'll need to make progress as a team, work effectively with your project partner, and make that useful solution actually happen.
Privacy and security, because data is people and needs to be kept secure and confidential.
Ethics, fairness, bias, and transparency, because your work has the potential to be misused or have a negative impact on people's lives, so you have to consider the biases in your data and analyses, the ethical and fairness implications, and how to make your work interpretable and transparent to the users and to the people impacted by it.
Communications, because you'll need to be able to explain to a broad audience why what you're doing matters and which methods you're using.
Social issues, because you're doing this work to help people, and you don't live or work in a vacuum, so you need to understand the context and history surrounding the people, places and issues you want to impact.

All material is licensed under CC-BY 4.0

The links below will help you find things quickly.

DSSG Manual

Summer Overview

This section covers general information on projects, working with partners, presentations, orientation information, and the following schedules:

High level summer plan: details what the goals are for each week of the program
Sample Orientation schedule: sample detailed schedule for the first two weeks of the program (from 2016)

Conduct, Culture, and Communications

This section details the DSSG anti-harassment policy, goals of the fellowship, what we hope fellows get out of the experience, the expectations of the fellows, and the DSSG environment. A slideshow version of this can also be found here.

Curriculum

This section details the various topics we will be covering throughout the summer. This includes:

Wiki

In the wiki, you will find information and instructions that people have found helpful along the way. It covers topics like:

Accessing S3 from the command line
Creating an alias to make Python 3 your default (rather than Python 2)
Installing RStudio on your EC2
Killing your query
Creating a custom Jupyter setup
Mounting Box from Ubuntu
Pretty Print psql and less output
Remotely editing text files in your favorite text editor
SQL Server to Postgres
Using rpy2
VNC Viewer

Comments

Git advanced
New section on Advanced git

contains gitflow, branching, merging, PRs

contains useful commands on push, pull, discard changes and diff

contains a suggested teaching agenda

Replaces the old "git_continued" session. Any additional content from that file has been integrated here.

enhancement
opened by Maren-Eckhoff 2
Deleted Google Doc link - update or remove?

Source link on bottom of the page refers to a Google Doc that no longer exists: https://github.com/dssg/hitchhikers-guide/blob/master/curriculum/project-management/README.md

Need to either find correct link or remove link.

opened by joan-wang 1
Multiple SSH keys

I have two machines I would like to use during dssg. Do I need to make two different SSH keys and if so what should I name them? Can I use the same SSH key on both machines?

opened by t-davidson 1
Broken/missing link to blank project scoping worksheet

👋 everyone! I find the project scoping guide really helpful and refer to it a lot.

Your link to the Blank Project Scoping Worksheet in sources/curriculum/scoping/overview.md is broken.

This might be the same file as the one on the DSSG website. If so, it would be helpful for this to be in an accessible and/or editable format (i.e. not pdf), to make it easier to use.

opened by harrietrs 0
Bump ujson from 5.2.0 to 5.4.0
Bumps ujson from 5.2.0 to 5.4.0.

Release notes

Sourced from ujson's releases.

5.4.0

Added

Add support for arbitrary size integers (#548) @JustAnotherArchivist

Fixed

CVE-2022-31116:

Replace wchar_t string decoding implementation with a uint32_t-based one (#555) @JustAnotherArchivist

Fix handling of surrogates on decoding (#550) @JustAnotherArchivist

CVE-2022-31117: Potential double free of buffer during string decoding @JustAnotherArchivist

Fix memory leak on encoding errors when the buffer was resized (#549) @JustAnotherArchivist

Integer parsing: always detect overflows (#544) @NaN-git

Fix handling of surrogates on encoding (#530) @JustAnotherArchivist

5.3.0

Added

Test Python 3.11 beta (#539) @hugovk

Changed

Benchmark refactor - argparse CLI (#533) @Erotemic

Fixed

Fix segmentation faults when errors occur while handling unserialisable objects (#531) @JustAnotherArchivist

Fix segmentation fault when an exception is raised while converting a dict key to a string (#526) @JustAnotherArchivist

Fix memory leak dumping on non-string dict keys (#521) @JustAnotherArchivist

Fix ref counting on repeated default function calls (#524) @JustAnotherArchivist

Remove redundant wheel dependency from pyproject.toml (#535) @hugovk

Commits

9c20de0 Merge pull request from GHSA-fm67-cv37-96ff

b21da40 Fix double free on string decoding if realloc fails

67ec071 Merge pull request #555 from JustAnotherArchivist/fix-decode-surrogates-2

bc7bdff Replace wchar_t string decoding implementation with a uint32_t-based one

cc70119 Merge pull request #548 from JustAnotherArchivist/arbitrary-ints

4b5cccc Merge pull request #553 from bwoodsend/pypy-ci

abe26fc Merge pull request #551 from bwoodsend/bye-bye-travis

3efb5cc Delete old TravisCI workflow and references.

404de1a xfail test_decode_surrogate_characters() on Windows PyPy.

f7e66dc Switch to musl docker base images.

Additional commits viewable in compare view

dependencies
opened by dependabot[bot] 0
Change the Python and SQL process to discourage SQL injection

The training in sources/curriculum/software/python_sql.md doesn't say anything about potential SQL injection issues, and is training folks to write potentially unsafe code. There should at least be a mention of SQL injection attacks, or the training should be rewritten to use bound parameters.

opened by jdcc 0
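
To illustrate the SQL injection concern raised in the issue above, here is a minimal sketch contrasting string-built SQL with bound parameters. It is not taken from the curriculum file: it uses Python's standard-library sqlite3 module with an in-memory database so it runs anywhere, and the `students` table and both helper functions are hypothetical names made up for this example. The same idea should carry over to a Postgres driver such as psycopg2, which uses `%s` placeholders where sqlite3 uses `?`.

```python
import sqlite3


def find_students_unsafe(conn, name):
    # Anti-pattern: the user-supplied value is spliced into the SQL text,
    # so quotes inside `name` can change the meaning of the query.
    query = f"SELECT name FROM students WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def find_students_safe(conn, name):
    # Bound parameter: the value is passed separately from the SQL text,
    # so it is always treated as data, never as SQL.
    query = "SELECT name FROM students WHERE name = ?"
    return conn.execute(query, (name,)).fetchall()


# Hypothetical table and rows, used only for this demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT)")
conn.executemany(
    "INSERT INTO students (name) VALUES (?)", [("alice",), ("bob",)]
)

# A crafted input that turns the WHERE clause into an always-true condition.
malicious = "nobody' OR '1'='1"

unsafe_rows = find_students_unsafe(conn, malicious)  # matches every row
safe_rows = find_students_safe(conn, malicious)      # matches no row

print("unsafe:", unsafe_rows)
print("safe:", safe_rows)
```

With the crafted input, the string-built query matches every row in the table, while the parameterized query looks for a student literally named `nobody' OR '1'='1` and finds none.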
:bug: Incorrect instructions on the `Python` section in `Tech Setup`
In the Python section we use an ssh tunnel to connect to jupyter lab on the training VM. This command:

$ ssh localhost:{YYYY}:localhost:{XXXX} [email protected]

actually requires the following flags:

$ ssh -i ~/.ssh/id_rsa -N -L localhost:{YYYY}:localhost:{XXXX} [email protected]
opened by ChristinaLast 0
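
As a rough illustration of what the flags in the corrected command above do, the sketch below assembles the same ssh invocation as an argument list in Python. The function name `build_tunnel_command`, the port numbers, and the destination `user@example.org` are hypothetical placeholders, not values from the guide; the code only builds the command and does not open a connection.

```python
import os


def build_tunnel_command(local_port, remote_port, destination,
                         key_path="~/.ssh/id_rsa"):
    """Build the argument list for an SSH local port forward."""
    return [
        "ssh",
        # -i: private key (identity file) used to authenticate.
        # "~" is expanded here because no shell will expand it
        # when the list is passed directly to subprocess.
        "-i", os.path.expanduser(key_path),
        # -N: do not run a remote command, only forward ports.
        "-N",
        # -L: forward localhost:<local_port> on this machine to
        # localhost:<remote_port> as seen from the remote host.
        "-L", f"localhost:{local_port}:localhost:{remote_port}",
        destination,
    ]


# Hypothetical ports and destination, for illustration only.
command = build_tunnel_command(8888, 8889, "user@example.org")
print(" ".join(command))
```

Passing `command` to `subprocess.run` would start the tunnel. Without `-L`, ssh has no forwarding specification and treats the first positional argument as the host to connect to, which is presumably why the original command in the issue fails.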
