ProPublica's collaborative tip-gathering framework. Import and manage CSV, Google Sheets and Screendoor data with ease.

ProPublica

Last update: Oct 18, 2022

Related tags

Organization django google-sheets journalism csv-import screendoor

Overview

Collaborate

This is a web application for managing and building stories based on tips solicited from the public. This project is meant to be easy to set up for non-programmers, intuitive to use and highly extensible.

Here are a few use cases:

Collecting data from various sources (Google Forms via Google Sheets, Screendoor, private Google Sheets)
Providing an easy-to-set-up data entry system
Organizing data from multiple sources and allowing many users to view and annotate it

The project is broken up into several components:

A system for transforming CSV files into managed database records
A default and automatic Django admin panel built for rapid and easy editing, managing and browsing of data
Customizable fields for tagging, querying, annotating and tracking tips
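
As a rough illustration of the first component, the sketch below turns CSV text into a list of dictionaries keyed by normalized column names, which is the general shape data needs before it can be stored as database records. This is not Collaborate's actual importer: the names slugify_header and csv_to_records are hypothetical, a header row is assumed, and only the Python standard library is used.

```python
import csv
import io
import re


def slugify_header(header):
    """Turn a free-form column header into a safe field name (hypothetical helper)."""
    slug = re.sub(r"[^0-9a-zA-Z]+", "_", header.strip().lower()).strip("_")
    # Field names should not be empty or start with a digit.
    if not slug or slug[0].isdigit():
        slug = "field_" + slug
    return slug


def csv_to_records(csv_text):
    """Parse CSV text (first row = headers) into a list of dicts."""
    reader = csv.reader(io.StringIO(csv_text))
    headers = [slugify_header(h) for h in next(reader)]
    # Note: duplicate headers would overwrite each other in this sketch.
    return [dict(zip(headers, row)) for row in reader]


sample = "Timestamp,What is your name?\n2019-01-01,Ada\n"
records = csv_to_records(sample)
print(records)
```

A real importer would also need to handle duplicate or empty headers and decide on a type for each column; this sketch keeps every value as a string.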

This is a project of ProPublica, supported by the Google News Initiative.

Documentation

We have a GitBook with a full user guide that covers running Collaborate, importing and refining data, and setting up Google services. You can read the documentation here.

Deploy it

Collaborate has built-in support for one-click installs on both Google Cloud and Heroku. During the setup process for either deployment, make sure to fill in the email, username and password fields so you can log in.

Heroku

The Heroku deploy button will create a small, "free-tier" Collaborate system. This consists of a small web server and a database that supports between 10k and 10M records (depending on data size). The deploy also automatically configures scheduled data re-importing.

Google Cloud

The Google Cloud Run button launches Collaborate into the Google Cloud environment. This deploy requires you to set up a Google Cloud project, enable Google Cloud billing and enable the Cloud Run API. Full setup instructions are here.

This deploy does not automatically configure scheduled re-importing, but you can add it via Cloud Scheduler by following these instructions.

Once you've deployed your Cloud Run instance, you can manage your running instance from the Google Developer's Console.

Getting Started (Local Testing/Development)

Getting the system set up and running locally begins with cloning this repository and installing the Python dependencies. Python 3.6 or 3.7 and Django 2.2 are assumed here.

# virtual environment is recommended
mkvirtualenv -p /path/to/python3.7 collaborative
# install python dependencies
pip install -r requirements.txt
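
If the later steps fail in unexpected ways, it can help to confirm that the active environment matches the versions assumed above. The following is a minimal sketch of such a check; it is not part of the repository, and the function name environment_problems is hypothetical.

```python
import sys

# Versions the README says are assumed.
SUPPORTED_PYTHON = {(3, 6), (3, 7)}
SUPPORTED_DJANGO = (2, 2)


def environment_problems(python_version, django_version):
    """Return a list of warnings; empty when both versions match the README."""
    problems = []
    py = tuple(python_version[:2])
    dj = tuple(django_version[:2])
    if py not in SUPPORTED_PYTHON:
        problems.append("Python %d.%d is untested" % py)
    if dj != SUPPORTED_DJANGO:
        problems.append("Django %d.%d is untested" % dj)
    return problems


if __name__ == "__main__":
    try:
        import django
        django_version = django.VERSION
    except ImportError:
        # Django is not installed in this environment.
        django_version = (0, 0)
    for problem in environment_problems(sys.version_info, django_version):
        print("warning:", problem)
```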

Assuming everything worked, let's bootstrap and then start the local server:

# get the database ready
python manage.py migrate

# create a default admin account
python manage.py createsuperuser

# gather up django and collaborate assets
python manage.py collectstatic --noinput

# start the local application
python manage.py runserver

You can then access the application at http://localhost:8000 and log in with the credentials you selected in the createsuperuser step (above). Logging in will bring you to a configuration wizard where you will connect your first Google Sheet and import its contents.

Production Deploy (Nginx/Docker)

If you want to deploy this to a production environment, we've included configuration templates and scripts for Docker and Nginx.

A Collaborate Dockerfile (the same one used by the Google Cloud Run deploy) can be found here:

deploy/google-cloud/Dockerfile

This creates a basic production environment with nginx and gunicorn. By default, it uses SQLite3, but you can configure the database by adding a DATABASE_URL environment variable. You can read more about the format for this variable here.
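
To give a feel for what a DATABASE_URL value carries, the sketch below splits one into the pieces a Django DATABASES entry needs. It assumes the common scheme://user:password@host:port/name layout and uses only the standard library; it is not how Collaborate itself reads the variable, and parse_database_url and the ENGINES mapping are hypothetical names chosen for illustration.

```python
from urllib.parse import unquote, urlparse

# Assumed mapping from URL scheme to Django database backend.
ENGINES = {
    "postgres": "django.db.backends.postgresql",
    "postgresql": "django.db.backends.postgresql",
    "mysql": "django.db.backends.mysql",
    "sqlite": "django.db.backends.sqlite3",
}


def parse_database_url(url):
    """Split a database URL into a Django-style settings dict."""
    parts = urlparse(url)
    config = {"ENGINE": ENGINES[parts.scheme]}
    if parts.scheme == "sqlite":
        # sqlite:///db.sqlite3 -> "db.sqlite3" (path without the leading slash)
        config["NAME"] = parts.path[1:]
        return config
    config.update({
        "NAME": unquote(parts.path[1:]),
        "USER": unquote(parts.username or ""),
        "PASSWORD": unquote(parts.password or ""),
        "HOST": parts.hostname or "",
        "PORT": str(parts.port or ""),
    })
    return config


print(parse_database_url("postgres://user:s3cret@db.example.com:5432/collaborate"))
```

The example URL, host and credentials are placeholders; substitute the values for your own database.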

We have also included a configuration file for plain Nginx deploys here:

deploy/google-cloud/django_nginx.conf

This can be copied to your main Nginx sites configuration directory (e.g., /etc/nginx/sites-available/).

In order to get auto-updating data sources, make sure to add a cron job that runs the following manage.py command:

manage.py refresh_data_sources

There's an example cron file that, when added to your /etc/crontab, will update data every 15 minutes:

./deploy/cron/refresh_data_sources

Note that if you use the example above, you will probably want to set up logrotate for the log file that the cron configuration writes. You can find the logrotate script here (add it to /etc/logrotate.d/refresh_data_sources):

./deploy/logrotate/refresh_data_sources
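
A scheduled refresh that fails can easily go unnoticed. One option is to call the refresh command through a small wrapper that exits non-zero and prints the command's output on failure. The sketch below is an assumption-laden illustration, not something shipped with Collaborate: run_refresh and main are hypothetical names, and main assumes it is started from the directory containing manage.py.

```python
import subprocess
import sys


def run_refresh(command):
    """Run a command and return (succeeded, combined stdout/stderr text)."""
    result = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    return result.returncode == 0, result.stdout


def main():
    """Entry point a cron job could call instead of manage.py directly."""
    ok, output = run_refresh([sys.executable, "manage.py", "refresh_data_sources"])
    if not ok:
        # Write the failure to stderr so it lands wherever cron output goes.
        sys.stderr.write("refresh_data_sources failed:\n" + output)
        sys.exit(1)
```

To use it, save the code as a script next to manage.py, call main() at the bottom, and point the cron entry at that script.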

Comments

Add to existing dataset

Public Integrity is using JotForm and manually uploading responses each day. Is there a way we can add to our existing dataset without creating a new group?

opened by kristinecpi 27
Memory leak?

Every day, Heroku throws hundreds of Error R14 (Memory quota exceeded) errors, within an hour or so of rebooting. I'm not sure how to start to diagnose this, but wanted to file an issue in case other users are seeing this or if you have advice to resolve it.

opened by tommeagher 17
Can't re-import new Screendoor write-ins

Hey all,

I'm working on the "Debthospitalslrn" project in Collaborate, and was trying to re-import new responses from Screendoor during Brandon's latest fix.

Now, when I try to re-import Screendoor responses I get directed to an error page. Here's a screenshot of the top of that page:

Feel free to reach me at [email protected] if you want to talk through further, or if I can send any additional information to make troubleshooting easier.

Thanks,

Maya

opened by mayatmiller 11
Cloud-run Dockerfile installs Django 3, errors
Built the project using a modified Dockerfile from deploy/google-cloud.

Got this error on deploy, when running the manage command:

ImportError: cannot import name 'six' from 'django.utils' (/usr/local/lib/python3.8/site-packages/django/utils/__init__.py)

Traced it to the Dockerfile cloning from the cloud-run repo branch, which installs Django >=2.2.2. Currently that's Django 3. Django 3 removed six.
opened by chriszs 9
Collaborate is not importing name and email from Screendoor

Hi,

I have a project which is pulling data from Screendoor, but for some reason the names and email addresses are not coming through to the Collaborate portal (these are compulsory data fields in Screendoor).

Is there something I'm missing?

opened by BluClare 7
No Google authentication on main page for fresh install

Per @rachelgli , I'm filing this ticket:

I've got a fairly fresh install of Collaborate running on Heroku. So fresh it has no data, which may be part of the problem. But the base page at / does not have a way to authenticate with Google:

Clicking on the Collaborate icon leads to /admin/ , which redirects to /admin/login/?next=/admin/ , where I do get Google authentication:

If this is your biggest problem you're in really good shape.

opened by stucka 6
Can't Re-Import Google Sheet Response

Hi Brandon,

Maya here -- hope you're doing well. I'm trying to re-import the Google sheet attached to "Longtermcare ctp" data source, and am getting a Server Error (500). Could you help de-bug?

Thanks,

Maya

opened by maya-miller-engagement 4
Error while deploying
Hi, thank you for the great tool! When I tried to deploy to GCP, I got these errors.

Cloud Run error: Container failed to start. Failed to start and then listen on the port defined by the PORT environment variable. Logs for this revision might contain more information.

ImportError: cannot import name 'six' from 'django.utils' (/usr/local/lib/python3.8/site-packages/django/utils/__init__.py)

Do you know how to solve these problems?
opened by n1n9-jp 3
User permissions aren't working properly

Unless you check the superuser box, users can't see any projects. Adding groups/user permissions don't work. Once you check the superuser box, the user can access all the projects.

opened by rachelgli 3
Responses not populating in Collaborate via Google Sheet import
Hey Brandon! Having another small bug in the project "longtermcare ctp". A handful of fields are not populating in Collaborate, including the following:

"Please explain how you know this."

"Please provide us with as much detail as you're comfortable with about the person and the circumstances around their death."

And a handful of others. Here's the Google Sheet that is feeding the form, which you're shared on: [redacted!]

Let me know if you want to chat through this on the phone -- I'm around if need be! Thanks,

Maya
opened by maya-miller-engagement 2
Getting server error 500 when uploading Screendoor project

I get a server error 500 error when trying to upload a new project from Screendoor. It has numerous responses, and there are no duplicate columns. I tested this on our external Collaborate and on your server, and I'm getting the same result. So I'm not sure why it's not working. This happened with another Screendoor project in December and I couldn't figure out the cause of the error. (For later reference: it's the Oregon timber callout)

I'm hoping we can get you into Screendoor to look in there to see what the issue might be.

opened by riogringa 2
Bump certifi from 2019.3.9 to 2022.12.7
Bumps certifi from 2019.3.9 to 2022.12.7.

Commits

9e9e840 2022.12.07

b81bdb2 2022.09.24

939a28f 2022.09.14

aca828a 2022.06.15.2

de0eae1 Only use importlib.resources's new files() / Traversable API on Python ≥3.11 ...

b8eb5e9 2022.06.15.1

47fb7ab Fix deprecation warning on Python 3.11 (#199)

b0b48e0 fixes #198 -- update link in license

9d514b4 2022.06.15

4151e88 Add py.typed to MANIFEST.in to package in sdist (#196)

Additional commits viewable in compare view


dependencies
opened by dependabot[bot] 0
PyPI

Would it be possible to add a setup.py/pyproject.toml and release the code components onto PyPI, at least to reserve the name, but also so it is installable by other projects which want to utilise the code in their own project.

opened by jayvdb 0
Import Screendoor response dates

Putting this here for house keeping: we need to import the date field from Screendoor responses.

In Google Forms/Sheets, this is automatically handed to us in a column, but in SD we have to pull this manually from a characteristically odd location in the response data structure.

opened by brandonrobertz 0
Collaborate won't import new responses in large dataset from Screendoor

We added a project with several thousand responses from Screendoor yesterday. I was getting time out error messages when I reimported the data throughout the day, but when I went back into the project, the data would eventually update.

However, today, when I was trying to update some 3,000+ new responses, I'm getting the time out error and it's not actually updating the data.

The number of records is at 6,049; it should be over 9,100.

opened by riogringa 1
Add error notification to auto-import

Auto-importing (via a cron job) works, but there are a few issues with it. Also, we need to handle errors gracefully and alert the user to any issues. Currently, if there's an error, the automatic re-import will just silently fail.

opened by brandonrobertz 0

Owner

ProPublica

Journalism in the Public Interest

GitHub

