Parsing Course Info for NIT Kurukshetra
Overview
This repository houses a small Python script that converts the course info found here into a JSON file that can be used wherever it is needed (provided all the PDFs have been converted to text beforehand).
The code is fairly messy and needs to be cleaned up, and the output itself is unreliable because the PDF to text conversion is not perfect. The source material is also inconsistent in the keywords it uses, sometimes substituting synonyms (for example, "suggested books" instead of "reference books").
Hence, take care when using the tool; it is strongly recommended to go through the output at least once.
The tool can automatically detect certain unnecessary lines in the input (lines with just a newline character, ones with just ---, etc.), but this detection is not perfect.
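As an illustration of the synonym problem described above, here is a minimal sketch of how inconsistent headings could be mapped to one canonical key. It is not taken from savetojson.py: the HEADING_SYNONYMS table and the canonical_heading function are hypothetical names, and the only synonym pair that comes from the source material is "suggested books" for "reference books".

```python
# Hypothetical synonym table; only the "suggested books" entry is taken
# from the course PDFs, and the canonical names are illustrative.
HEADING_SYNONYMS = {
    "reference books": "reference books",
    "suggested books": "reference books",
}


def canonical_heading(line):
    """Return the canonical heading for a line, or None if it is not a known heading."""
    # Lowercase, collapse runs of whitespace, and drop a trailing colon or spaces.
    key = " ".join(line.lower().split()).rstrip(": ")
    return HEADING_SYNONYMS.get(key)
```

Any heading missing from the table returns None, so new synonyms still have to be added by hand after reviewing the output.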
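The kind of filtering described above can be sketched roughly as follows. This is an assumption about the approach rather than the script's actual code: the pattern only covers blank lines and lines made up of dashes, and the names is_noise and drop_noise are hypothetical.

```python
import re

# Assumed noise patterns: a line that is only whitespace (including a bare
# newline), or only dashes such as "---", optionally padded with whitespace.
_NOISE = re.compile(r"^\s*-*\s*$")


def is_noise(line):
    """Return True if the line carries no course information."""
    return _NOISE.match(line) is not None


def drop_noise(lines):
    """Return the lines with blank and dash-only lines removed."""
    return [line for line in lines if not is_noise(line)]
```

Lines that merely start with a dash, such as list items, are kept, since they may carry real content.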
Requirements
No external modules are used; the Python standard library is enough.
Usage
python savetojson.py path/to/txtfile.txt
Once a JSON file has been generated, use the checkjson.py script to get a list of all the empty entries in each file. This list is printed to stderr, from where you can redirect it to a file and use it as a checklist. A helpful summary is also printed to stdout.
To use this script, run:
python checkjson.py path/to/json/file.json path/to/another/json/file.json
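For readers curious how such a check might work, below is a minimal sketch of the idea: walk the parsed JSON, write the location of each empty entry to stderr, and write a one-line summary to stdout. It is not the code of checkjson.py. The layout of the generated JSON is not documented here, so the sketch makes no assumption about it and simply treats null, empty strings, empty lists, and empty objects as "empty"; find_empty and report are hypothetical names.

```python
import json
import sys


def find_empty(value, path="$"):
    """Yield the path of every null, empty string, empty list, or empty dict."""
    if value is None or value == "" or value == [] or value == {}:
        yield path
    elif isinstance(value, dict):
        for key, child in value.items():
            yield from find_empty(child, f"{path}.{key}")
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from find_empty(child, f"{path}[{index}]")


def report(filename, out=sys.stdout, err=sys.stderr):
    """Print each empty entry to err and a summary line to out."""
    with open(filename, encoding="utf-8") as handle:
        data = json.load(handle)
    empty = list(find_empty(data))
    for path in empty:
        print(f"{filename}: {path}", file=err)
    print(f"{filename}: {len(empty)} empty entries", file=out)
    return empty


if __name__ == "__main__":
    # Accept several JSON files, mirroring the usage shown above.
    for name in sys.argv[1:]:
        report(name)
```

Keeping the checklist on stderr and the summary on stdout means the two can be redirected separately.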