ProPublica's collaborative tip-gathering framework. Import and manage CSV, Google Sheets and Screendoor data with ease.

Overview

Collaborate

This is a web application for managing and building stories based on tips solicited from the public. The project is meant to be easy for non-programmers to set up, intuitive to use and highly extensible.

Here are a few use cases:

  • Collecting data from various sources (Google Forms via Google Sheets, Screendoor, private Google Sheets)
  • An easy-to-set-up data entry system
  • Organizing data from multiple sources and allowing many users to view and annotate it

The project is broken up into several components:

  • A system for transforming CSV files into managed database records
  • A default and automatic Django admin panel built for rapid and easy editing, managing and browsing of data
  • Customizable fields for tagging, querying, annotating and tracking tips
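
To make the first component more concrete, here is a minimal sketch of how CSV rows might be turned into records keyed by safe field names. It is an illustration only and not Collaborate's actual importer: the function names `to_field_name` and `csv_to_records` and the sample data are hypothetical.

```python
import csv
import io
import re


def to_field_name(header):
    """Turn a free-text column header into a safe, lowercase identifier."""
    name = re.sub(r"[^0-9a-zA-Z]+", "_", header).strip("_").lower()
    # Identifiers cannot start with a digit (or be empty), so prefix those.
    if not name or name[0].isdigit():
        name = "field_" + name
    return name


def csv_to_records(csv_text):
    """Parse CSV text into a list of dicts keyed by sanitized field names."""
    reader = csv.DictReader(io.StringIO(csv_text))
    records = []
    for row in reader:
        # Keys are None when a row has more cells than the header; skip those.
        records.append(
            {to_field_name(k): v for k, v in row.items() if k is not None}
        )
    return records


# Hypothetical form export with two free-text column headers.
sample = "Your name,What happened?\nAda,Billing error\n"
print(csv_to_records(sample))
```

In a real system each dict would then be saved as a database row; this sketch stops at the parsing step.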

This is a project of ProPublica, supported by the Google News Initiative.

Documentation

We have a GitBook with a full user guide that covers running Collaborate, importing and refining data, and setting up Google services. You can read the documentation here.

Deploy it

Collaborate has built-in support for one-click installs on both Google Cloud and Heroku. During the setup process for either deployment, make sure to fill in the email, username and password fields so you can log in.

Heroku

The Heroku deploy button will create a small, "free-tier" Collaborate system. It consists of a small web server and a database that supports between 10k and 10M records (depending on data size), and it automatically configures scheduled data re-importing.

Google Cloud

The Google Cloud Run button launches Collaborate into the Google Cloud environment. This deploy requires you to set up a Google Project, enable Google Cloud billing and enable the Cloud Run API. Full setup instructions are here.

This deploy does not automatically configure scheduled re-importing, but you can add it via Cloud Scheduler by following these instructions.

Once you've deployed your Cloud Run instance, you can manage your running instance from the Google Developer's Console.

Getting Started (Local Testing/Development)

Getting the system set up and running locally begins with cloning this repository and installing the Python dependencies. Python 3.6 or 3.7 and Django 2.2 are assumed here.

# virtual environment is recommended
mkvirtualenv -p /path/to/python3.7 collaborative
# install python dependencies
pip install -r requirements.txt

Assuming everything worked, let's bootstrap and then start the local server:

# get the database ready
python manage.py migrate

# create a default admin account
python manage.py createsuperuser

# gather up django and collaborate assets
python manage.py collectstatic --noinput

# start the local application
python manage.py runserver

You can then access the application at http://localhost:8000 and log in with the credentials you chose in the createsuperuser step above. Logging in will bring you to a configuration wizard where you will connect your first Google Sheet and import its contents.

Production Deploy (Nginx/Docker)

If you want to deploy this to a production environment, we've included configuration templates and scripts for Docker and Nginx.

A Collaborate Dockerfile (the same one used by the Google Cloud Run deploy) can be found here:

deploy/google-cloud/Dockerfile

This creates a basic production environment with nginx and gunicorn. By default, it uses SQLite3, but you can configure the database by adding a DATABASE_URL environment variable. You can read more about the format for this variable here.
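
As a rough illustration of what such a variable usually contains, the sketch below splits a database URL into its parts using only the Python standard library. It assumes the common scheme://user:password@host:port/name layout; the linked documentation remains the authority on the exact format. The function name `describe_database_url` and all values in the example URL are hypothetical.

```python
from urllib.parse import urlsplit


def describe_database_url(url):
    """Split a database URL into the pieces a settings file typically needs."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,          # database backend, e.g. "postgres"
        "user": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,              # int, or None if not given
        "name": parts.path.lstrip("/"),  # database name
    }


# Hypothetical credentials and host, for illustration only.
example = "postgres://collab_user:s3cret@db.example.org:5432/collaborate"
print(describe_database_url(example))
```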

We also included a configuration script for plain Nginx deploys here:

deploy/google-cloud/django_nginx.conf

This can be copied to your main Nginx sites configuration directory (e.g., /etc/nginx/sites-available/).

In order to get auto-updating data sources, make sure to add a cron job that runs the following manage.py command:

manage.py refresh_data_sources
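
If you would rather have cron call a small wrapper that reports failures, something like the following sketch could work. It only assumes that the refresh_data_sources command shown above exits with a non-zero status on error; the helper names `refresh_command` and `run_refresh` are hypothetical and not part of Collaborate.

```python
import subprocess
import sys


def refresh_command(manage_py="manage.py", python=sys.executable):
    """Build the argument list for the refresh_data_sources command."""
    return [python, manage_py, "refresh_data_sources"]


def run_refresh(manage_py="manage.py"):
    """Run the refresh and report failures instead of failing silently."""
    result = subprocess.run(
        refresh_command(manage_py),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode output as text (works on Python 3.6)
    )
    if result.returncode != 0:
        # Write the error output to stderr so cron can mail or log it.
        print("refresh_data_sources failed:", result.stderr, file=sys.stderr)
    return result.returncode
```

Calling `run_refresh("/path/to/manage.py")` from a script scheduled by cron would return the command's exit status, which the script can pass to `sys.exit`.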

There's an example cron file that, when added to your /etc/crontab, will update data every 15 minutes:

./deploy/cron/refresh_data_sources

If you use the example above, you will probably want to configure logrotate for the log file that the cron job writes. The logrotate script is here (add it to /etc/logrotate.d/refresh_data_sources):

./deploy/logrotate/refresh_data_sources

Comments

  • Add to existing dataset

    Public Integrity is using JotForm and manually uploading responses each day. Is there a way we can add to our existing dataset without creating a new group?

    opened by kristinecpi 27
  • Memory leak?

    Every day, Heroku throws hundreds of Error R14 (Memory quota exceeded) errors, within an hour or so of rebooting. I'm not sure how to start to diagnose this, but wanted to file an issue in case other users are seeing this or if you have advice to resolve it.

    opened by tommeagher 17
  • Can't re-import new Screendoor write-ins

    Hey all,

    I'm working on the "Debthospitalslrn" project in Collaborate, and was trying to re-import new responses from Screendoor during Brandon's latest fix.

    Now, when I try to re-import Screendoor responses I get directed to an error page. Here's a screenshot of the top of that page:

    [Screenshot of the error page]

    Feel free to reach me at [email protected] if you want to talk through further, or if I can send any additional information to make troubleshooting easier.

    Thanks,

    Maya

    opened by mayatmiller 11
  • Cloud-run Dockerfile installs Django 3, errors

    Built the project using a modified Dockerfile from deploy/google-cloud.

    Got this error on deploy, when running the manage command:

    ImportError: cannot import name 'six' from 'django.utils' (/usr/local/lib/python3.8/site-packages/django/utils/__init__.py)
    

    Traced it to the Dockerfile cloning from the cloud-run repo branch, which installs Django >=2.2.2. Currently that's Django 3. Django 3 removed six.

    opened by chriszs 9
  • Collaborate is not importing name and email from Screendoor

    Hi,

    I have a project which is pulling data from Screendoor, but for some reason the names and email addresses are not coming through to the Collaborate portal (these are compulsory data fields in Screendoor).

    Is there something I'm missing?

    opened by BluClare 7
  • No Google authentication on main page for fresh install

    Per @rachelgli , I'm filing this ticket:

    I've got a fairly fresh install of Collaborate running on Heroku. So fresh it has no data, which may be part of the problem. But the base page at / does not have a way to authenticate with Google: [screenshot of the base page]

    Clicking on the Collaborate icon leads to /admin/, which redirects to /admin/login/?next=/admin/, where I do get Google authentication: [screenshot of the admin login page]

    If this is your biggest problem you're in really good shape.

    opened by stucka 6
  • Can't Re-Import Google Sheet Response

    Hi Brandon,

    Maya here -- hope you're doing well. I'm trying to re-import the Google sheet attached to "Longtermcare ctp" data source, and am getting a Server Error (500). Could you help de-bug?

    Thanks,

    Maya

    opened by maya-miller-engagement 4
  • Error while deploying

    Hi, thank you for the great tool! When I tried to deploy to GCP, I got these errors.

    • Cloud Run error: Container failed to start. Failed to start and then listen on the port defined by the PORT environment variable. Logs for this revision might contain more information.

    • ImportError: cannot import name 'six' from 'django.utils' (/usr/local/lib/python3.8/site-packages/django/utils/__init__.py)

    Do you know how to solve these problems?

    opened by n1n9-jp 3
  • User permissions aren't working properly

    [Screenshot]

    Unless you check the superuser box, users can't see any projects. Adding groups/user permissions don't work. Once you check the superuser box, the user can access all the projects.

    opened by rachelgli 3
  • Responses not populating in Collaborate via Google Sheet import

    Hey Brandon! Having another small bug in the project "longtermcare ctp". A handful of fields are not populating in Collaborate, including the following:

    • "Please explain how you know this."
    • "Please provide us with as much detail as you're comfortable with about the person and the circumstances around their death."

    And a handful of others. Here's the Google Sheet that is feeding the form, which you're shared on: [redacted!]

    Let me know if you want to chat through this on the phone -- I'm around if need be! Thanks,

    Maya

    opened by maya-miller-engagement 2
  • Getting server error 500 when uploading Screendoor project

    I get a server error 500 error when trying to upload a new project from Screendoor. It has numerous responses, and there are no duplicate columns. I tested this on our external Collaborate and on your server, and I'm getting the same result. So I'm not sure why it's not working. This happened with another Screendoor project in December and I couldn't figure out the cause of the error. (For later reference: it's the Oregon timber callout)

    I'm hoping we can get you into Screendoor to look in there to see what the issue might be.

    opened by riogringa 2
  • Bump certifi from 2019.3.9 to 2022.12.7

    Bumps certifi from 2019.3.9 to 2022.12.7.

    opened by dependabot[bot] 0
  • PyPI

    Would it be possible to add a setup.py/pyproject.toml and release the code components onto PyPI, at least to reserve the name, but also so it is installable by other projects which want to utilise the code in their own project.

    opened by jayvdb 0
  • Import Screendoor response dates

    Putting this here for house keeping: we need to import the date field from Screendoor responses.

    In Google Forms/Sheets, this is automatically handed to us in a column, but in SD we have to pull this manually from a characteristically odd location in the response data structure.

    opened by brandonrobertz 0
  • Collaborate won't import new responses in large dataset from Screendoor

    We added a project with several thousand responses from Screendoor yesterday. I was getting time out error messages when I reimported the data throughout the day, but when I went back into the project, the data would eventually update.

    However, today, when I was trying to update some 3,000+ new responses, I'm getting the time out error and it's not actually updating the data.

    The number of records is at 6,049; it should be over 9,100.

    [Screenshot]

    opened by riogringa 1
  • Add error notification to auto-import

    Auto-importing (via a cron job) works, but there are a few issues with it. Also, we need to handle errors gracefully and alert the user to any issues. Currently, if there's an error, the automatic re-import will just silently fail.

    opened by brandonrobertz 0