Media-Cloud-Outlet-Filtering
Using the ABYZ and Media-Bias Fact-Check outlet databases, I've provided an outlet CSV file for each, along with scripts intended to match Media Cloud article archives to their respective outlets.
Provided Files:
- abyz_outlets.csv: CSV file containing information on outlets provided by the ABYZ dataset.
  Information included: index, greater region, sub-region, whether the outlet is local, national, or foreign, name, media type, media focus, and language.
- mbfc_outlets.csv: CSV file containing information on outlets provided by the Media-Bias Fact-Check dataset.
  Information included: name, link, and perceived bias.
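As a rough illustration of how an outlet file like mbfc_outlets.csv could be prepared for matching, the sketch below indexes outlets by the domain of their link. It is not taken from the provided scripts: the header names (`name`, `link`, `bias`), the helper names `domain_of` and `index_outlets`, and the sample rows are all assumptions made for the example.

```python
import csv
import io
from urllib.parse import urlparse


def domain_of(url):
    """Return the lowercased host of a URL without a leading 'www.'."""
    # urlparse only recognises the host when a scheme or '//' is present
    if "//" not in url:
        url = "//" + url
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def index_outlets(csv_file, link_column="link"):
    """Map each outlet's domain to its CSV row (a dict keyed by header name)."""
    outlets = {}
    for row in csv.DictReader(csv_file):
        domain = domain_of(row[link_column])
        if domain:
            outlets[domain] = row
    return outlets


# Made-up rows standing in for mbfc_outlets.csv (assumed column names)
sample = io.StringIO(
    "name,link,bias\n"
    "Example Daily,https://www.example.com/,center\n"
    "Sample Post,http://news.sample.org,left\n"
)
outlets = index_outlets(sample)
print(sorted(outlets))  # ['example.com', 'news.sample.org']
```

With a real file, `index_outlets(open("mbfc_outlets.csv", newline="", encoding="utf-8"))` would be the equivalent call, provided the header actually contains a `link` column.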
Scripts:
- match_mbfc.py: Python script intended to match a tar.xz file containing Media Cloud articles to the Media-Bias Fact-Check outlets listed in mbfc_outlets.csv. To run this script from the command line, use the template command: "python match_mbfc.py {TAR.XZ FILE}"
  output: a CSV file including all matched articles with the corresponding MBFC outlet information
  example: To match all articles in the articles/pl.tar.xz collection, I might run the command below:
  python match_mbfc.py articles/pl.tar.xz
- match_abyz.py: Python script intended to match a tar.xz file containing Media Cloud articles to the ABYZ outlets listed in abyz_outlets.csv. To run this script from the command line, use the template command: "python match_abyz.py {TAR.XZ FILE}"
  output: a CSV file including all matched articles with the corresponding ABYZ outlet information
  example: To match all articles in the articles/pl.tar.xz collection, I might run the command below:
  python match_abyz.py articles/pl.tar.xz
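For readers curious about what this kind of matching involves, here is a minimal sketch of the general idea: read articles out of a tar.xz archive, look up each article's domain in an outlet table, and write the matches to CSV. It is not the code in match_mbfc.py or match_abyz.py. It assumes each archive member is a JSON file with `url` and `title` fields, and the outlet fields (`outlet_name`, `bias`) and helper names are hypothetical; the real Media Cloud archives may be laid out differently.

```python
import csv
import io
import json
import tarfile
from urllib.parse import urlparse


def iter_articles(archive):
    """Yield one parsed JSON article per regular file in a tar.xz archive."""
    with tarfile.open(fileobj=archive, mode="r:xz") as tar:
        for member in tar:
            if member.isfile():
                yield json.load(tar.extractfile(member))


def match_articles(articles, outlets):
    """Yield article fields merged with the outlet whose domain matches."""
    for article in articles:
        host = urlparse(article["url"]).hostname or ""
        if host.startswith("www."):
            host = host[4:]
        outlet = outlets.get(host)
        if outlet is not None:
            yield {"url": article["url"], "title": article["title"], **outlet}


def build_demo_archive(articles):
    """Build an in-memory tar.xz so the example runs without real data."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:xz") as tar:
        for i, article in enumerate(articles):
            data = json.dumps(article).encode("utf-8")
            info = tarfile.TarInfo(name=f"{i}.json")
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    return buf


# Hypothetical outlet table keyed by domain (assumed field names)
outlets = {"example.com": {"outlet_name": "Example Daily", "bias": "center"}}
archive = build_demo_archive([
    {"url": "https://www.example.com/a", "title": "Matched story"},
    {"url": "https://unknown.invalid/b", "title": "Unmatched story"},
])
matched = list(match_articles(iter_articles(archive), outlets))

# Write the matched rows as CSV, mirroring the scripts' described output
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["url", "title", "outlet_name", "bias"])
writer.writeheader()
writer.writerows(matched)
print(out.getvalue())
```

To try the same approach on a file on disk, `iter_articles(open("articles/pl.tar.xz", "rb"))` would replace the in-memory demo archive, as long as the assumption about the member format holds.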