Makes Google's political ad database actually useful

Overview

Making Google's political ad transparency library suck less

This is a series of scripts that takes Google's political ad transparency data and makes the ad content searchable. Ironically, the world's most powerful search company does not make its own ad data searchable.

It can also take the ad targeting information and map it to electorates, but this only works for postcodes at the moment, so it isn't in the main group of scripts yet.
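As a rough illustration of the postcode-to-electorate idea, here is a minimal sketch. It assumes the targeting string contains four-digit Australian postcodes; the lookup table and the function name `electorates_for_targeting` are hypothetical and not taken from the project's scripts.

```python
import re

# Hypothetical lookup table (placeholder values for illustration only).
# A postcode can overlap more than one electorate, so each value is a list.
POSTCODE_TO_ELECTORATES = {
    "2000": ["Sydney"],
    "3000": ["Melbourne"],
}

# Matches standalone four-digit numbers, the shape of an Australian postcode.
POSTCODE_RE = re.compile(r"\b\d{4}\b")


def electorates_for_targeting(geo_targeting):
    """Return the sorted electorates for every known postcode in the string."""
    found = set()
    for postcode in POSTCODE_RE.findall(geo_targeting):
        # Postcodes missing from the table are skipped silently.
        found.update(POSTCODE_TO_ELECTORATES.get(postcode, []))
    return sorted(found)
```

Targeting that names a state or city rather than a postcode yields an empty list here, which mirrors the limitation described above.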

It is aimed at Australian content, but most of the scripts could be applied to all ad content if you'd like to use them elsewhere.

Get the data

The current output is a work-in-progress, but you can find the latest file here as gzipped CSV or JSON
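Once you have the gzipped CSV, a keyword search needs only the standard library. The sketch below is one possible approach, not part of the project; the column name `ad_text` is an assumption, so check the header of the file you download and pass the real name via `text_column`.

```python
import csv
import gzip
import io


def search_ads(source, term, text_column="ad_text"):
    """Yield rows whose text column contains `term`, ignoring case.

    `source` is a path to a gzipped CSV or a binary file object.
    `text_column` is an assumed column name, not confirmed by the project.
    """
    needle = term.lower()
    with gzip.open(source, mode="rt", encoding="utf-8", newline="") as handle:
        for row in csv.DictReader(handle):
            # Treat a missing or empty column as an empty string.
            if needle in (row.get(text_column) or "").lower():
                yield row
```

For example, `list(search_ads("ads.csv.gz", "climate"))` would collect every matching row, where `ads.csv.gz` stands in for wherever you saved the download.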

What it does:

  • Gets the text content from text ads
  • Gets the YouTube title for YouTube ads
  • Gets the YouTube transcript for YouTube ads if it is available
  • Gets the image URL for image ads
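Looking up a YouTube title or transcript starts from a video ID. The sketch below shows one way to pull that ID out of a URL using only the standard library. It assumes the video URL is already known; how the project's scripts obtain it is not shown here, and the function name `youtube_video_id` is hypothetical.

```python
from urllib.parse import parse_qs, urlparse


def youtube_video_id(url):
    """Return the video ID from a common YouTube URL form, or None."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()

    # Short links: https://youtu.be/<id>
    if host == "youtu.be":
        return parsed.path.lstrip("/").split("/")[0] or None

    if host == "youtube.com" or host.endswith(".youtube.com"):
        # Watch pages: https://www.youtube.com/watch?v=<id>
        if parsed.path == "/watch":
            return parse_qs(parsed.query).get("v", [None])[0]
        # Embedded players: https://www.youtube.com/embed/<id>
        for prefix in ("/embed/", "/v/"):
            if parsed.path.startswith(prefix):
                return parsed.path[len(prefix):].split("/")[0] or None

    return None
```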

Still to do:

  • OCR images and put the text in the database
  • Figure out if there's a good way to get text from animated HTML ads
  • Run the non-YouTube video ads through speech-to-text and put the text in the database
  • Archive ad content to S3, as Google removes it entirely when an ad is removed (e.g. UAP ads)
  • Archive ad database daily to S3 with a timestamp
  • Add electorates for Australian ads

Comments
  • added parsing for html images and all supporting code

    What does this change?

    This adds support for HTML image parsing, which has been tested with PNG images but should work with all image formats. Changes have been integrated across the necessary scripts, imageParser.py and ocrImages. A full functionality test still needs to be run.

    Unfortunately, Tesseract is not picking up text from the parsed HTML images. This may be because they have transparent backgrounds, and a possible solution could involve adding a white background.

    opened by foroveralls 0
  • Missing col in sqlite table?

    Looks like scraperwiki is looking for a "video_type" col - which doesn't automatically exist in the scraperwiki.sqlite file (at least not mine)

    (manually adding that field allows script to run btw)

    sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) no such column: video_type
    [SQL: select * from aus_ads where Ad_Type='Video' AND video_type IS NULL]
    
    

    Cols I do automatically have:

    Ad_ID Ad_URL Ad_Type Regions Advertiser_ID Advertiser_Name Ad_Campaigns_List Date_Range_Start Date_Range_End Num_of_Days Impressions Spend_USD First_Served_Timestamp Last_Served_Timestamp Age_Targeting Gender_Targeting Geo_Targeting_Included Geo_Targeting_Excluded Spend_Range_Min_USD Spend_Range_Max_USD Spend_Range_Min_AUD Spend_Range_Max_AUD

    opened by dylanjmcconnell 0
  • requirements

    Need to install some non-standard (at least for me) libraries

    • youtube_transcript_api
    • scraperwiki

    (...others I can see include pandas, simplejson, requests, pytesseract, PIL, cv2)

    opened by dylanjmcconnell 0
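For the missing `video_type` column reported above, the manual fix can be scripted. This is a minimal sketch using Python's built-in `sqlite3` module, not the project's own code. The helper name `ensure_column` and the `TEXT` column type are assumptions; the table name `aus_ads` comes from the error message in the report.

```python
import sqlite3


def ensure_column(conn, table, column, col_type="TEXT"):
    """Add `column` to `table` if it is missing. Return True if it was added.

    Table and column names are interpolated into the SQL directly,
    so only pass trusted, hard-coded identifiers.
    """
    # Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk).
    existing = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    if column in existing:
        return False
    conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {col_type}")
    conn.commit()
    return True
```

Calling `ensure_column(sqlite3.connect("scraperwiki.sqlite"), "aus_ads", "video_type")` before the video step should let the query in the report run, since a newly added column is NULL for every existing row.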
Owner
The Guardian
The source code of the world's leading liberal voice