Automate the case review on legal case documents and find the most critical cases using network analysis

Yi Yin

Last update: Dec 28, 2022

Related tags

Data Visualization python lexical-analysis network-analysis igraph pdfminer pdf-parser

Overview

Automation on Legal Court Cases Review

This project automates the review of legal case documents and uses network analysis to identify the most critical cases.

Short write-up

Affiliation: Institute for Social and Economic Research and Policy, Columbia University

Project Information:

Keywords: Automation, PDF Parsing, String Extraction, Network Analysis

Software:

Python: pdfminer, LexNLP, nltk, sklearn
R: igraph

Scope:

Parse court documents and extract citations from the raw text.
Build a citation network and identify the important cases in it.
Extract the judges' opinion text and metadata, including the opinion author, court, and decision.
Train a model to predict the court decision from the opinion text.
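For a sense of what "extract citations from raw text" involves, here is a minimal sketch that assumes citations follow the common U.S. reporter pattern `<volume> <reporter> <page>` (e.g. `415 F.3d 1303`). It uses a plain regular expression as a stand-in; the project itself lists `LexNLP` for extraction, and the reporter list and the `extract_citations` name are illustrative assumptions.

```python
import re

# Illustrative reporter abbreviations only (an assumption, far from exhaustive):
# U.S., S. Ct., F., F.2d, F.3d, F.4th, F. Supp., F. Supp. 2d, F. Supp. 3d
REPORTER = r"U\.S\.|S\. ?Ct\.|F\.(?: ?Supp\.)?(?: ?(?:2d|3d|4th))?"

# <volume> <reporter> <first page>, e.g. "415 F.3d 1303"
CITATION_RE = re.compile(rf"\b(\d{{1,4}}) ({REPORTER}) (\d{{1,5}})\b")


def extract_citations(raw_text):
    """Return reporter citations in order of appearance."""
    # PDF extraction often breaks a citation across lines, so collapse whitespace first
    text = re.sub(r"\s+", " ", raw_text)
    return [" ".join(m.groups()) for m in CITATION_RE.finditer(text)]


sample = ("See Markman v. Westview Instruments, 517 U.S. 370 (1996); "
          "Phillips v. AWH Corp., 415 F.3d\n1303 (Fed. Cir. 2005).")
print(extract_citations(sample))
# ['517 U.S. 370', '415 F.3d 1303']
```

Normalizing whitespace before matching matters for PDF output, where a line break can fall between the reporter and the page number.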

Pilot Study on 159 Legal Court Documents (in `pilot_159` folder)

1. Process PDF documents using `Python`

IPython Notebook	Description
`1.Extraction by LexNLP.ipynb`	Extract metadata using the `LexNLP` package.
`2.Layer Analysis on Sigle File. ipynb`	Use `pdfminer` to extract the raw text and the paragraph segmentation of the PDF document.
`3.Patent Position by Layer.ipynb`	Identify the position of the patent number within the layers extracted from the PDF.
`4.Opinion and Author by Layer.ipynb`	Extract the opinion text, author, and decision from the layers list.
`5.Wrap up to Meta Data.ipynb`	Store the extracted metadata as `.json` or `.csv`.
`6.Visualize citation frequency.ipynb`	Bar plot of the citation frequencies
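The "patent position by layer" step can be pictured as a search over the list of text boxes (layers) extracted from one PDF. The sketch below assumes `layers` is a plain list of strings in reading order and that patent numbers appear as "Patent No. 5,123,456". Both the format and the `patent_positions` helper are hypothetical, and reissue or design patent numbers are not handled.

```python
import re

# Assumption: utility patent numbers written as "Patent No. 5,123,456"
PATENT_RE = re.compile(r"Patent\s+No\.\s*(\d{1,2},\d{3},\d{3})")


def patent_positions(layers):
    """Map each patent number to the indices of the layers it appears in."""
    positions = {}
    for idx, layer in enumerate(layers):
        for number in PATENT_RE.findall(layer):
            seen = positions.setdefault(number, [])
            if idx not in seen:
                seen.append(idx)
    return positions


# Hypothetical layers: text boxes in reading order for one PDF
layers = [
    "United States Court of Appeals for the Federal Circuit",
    "Appeal concerning U.S. Patent No. 5,123,456.",
    "The '456 patent claims a method.",
    "We construe claim 1 of U.S. Patent No. 5,123,456 and Patent No. 6,000,001.",
]
print(patent_positions(layers))
# {'5,123,456': [1, 3], '6,000,001': [3]}
```

Knowing which layer first mentions the patent gives a landmark for slicing the remaining layers into sections such as the opinion text.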

2. Data: Parse PDF documents via `Python`

These datasets are NOT included in this public repository due to intellectual property and privacy concerns.

File	Description
`pdf2text159.json`	A dictionary of three lists: `file_name`, `raw_text`, `layers`.
`cite_edge159.csv`	Edge list of the citation network
`cite_node159.csv`	Metadata for each case: `case_number`, `court`, `dates`
`reference_extract.csv`	List of cited cases for every case (untidy format, used for analysis)
`citation159.csv`	File-citation pairs (tidy format, used for calculation)
`regulation159.csv`	File-regulation pairs (tidy format, used for calculation)
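The move from the untidy `reference_extract.csv` (a list of citations per case) to tidy file-citation pairs, and from there to citation frequencies, can be sketched as follows. The in-memory dictionary shape and the CSV column names are assumptions for illustration, not the actual file layout.

```python
import csv
import io
from collections import Counter

# Hypothetical untidy input: a list of cited cases per file (the real layout
# of reference_extract.csv may differ).
reference_extract = {
    "case_001.pdf": ["517 U.S. 370", "415 F.3d 1303"],
    "case_002.pdf": ["415 F.3d 1303"],
    "case_003.pdf": ["415 F.3d 1303", "517 U.S. 370", "415 F.3d 1303"],
}


def to_tidy_pairs(refs):
    """One (file, citation) row per distinct citation within each file."""
    pairs = []
    for file_name, cited in refs.items():
        for citation in dict.fromkeys(cited):  # de-duplicate, keep first-seen order
            pairs.append((file_name, citation))
    return pairs


pairs = to_tidy_pairs(reference_extract)

# Tidy CSV with one pair per row; the column names are illustrative
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["file_name", "citation"])
writer.writerows(pairs)

# Citation frequency = number of files that cite each case
frequency = Counter(citation for _, citation in pairs)
print(frequency.most_common())
# [('415 F.3d 1303', 3), ('517 U.S. 370', 2)]
```

De-duplicating within each file means the frequency counts citing documents rather than raw mentions. Whether the original analysis does the same is an open assumption.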

3. Analyze and Visualize using `R`

File	Description
`Calculate Citation Frequency.Rmd`	Analyze `reference_extract.csv`
`Citation Network.Rmd`	Analyze `cite_edge159`
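The R notebooks rely on igraph to find important cases. To show one common importance measure independently of that code, here is a pure-Python PageRank computed by power iteration over a `(citing, cited)` edge list. It is a swapped-in illustration, not the notebooks' method, and the toy graph is hypothetical. In it, C and D are each cited twice, but D ranks higher because one of its citations comes from C, which is itself well cited.

```python
from collections import Counter


def pagerank(edges, damping=0.85, iterations=100):
    """PageRank by power iteration on a directed citation graph.

    edges: (citing_case, cited_case) pairs, ideally de-duplicated.
    Cases that cite nothing (dangling nodes) spread their score evenly.
    """
    out_links = {}
    nodes = set()
    for src, dst in edges:
        out_links.setdefault(src, []).append(dst)
        nodes.update((src, dst))
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        dangling = sum(rank[v] for v in nodes if v not in out_links)
        # Teleport term plus the redistributed dangling mass
        new_rank = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for src, targets in out_links.items():
            share = damping * rank[src] / len(targets)
            for dst in targets:
                new_rank[dst] += share
        rank = new_rank
    return rank


edges = [("A", "C"), ("B", "C"), ("C", "D"), ("B", "D")]
in_degree = Counter(dst for _, dst in edges)  # raw citation counts
scores = pagerank(edges)
for case in sorted(scores, key=scores.get, reverse=True):
    print(case, in_degree[case], round(scores[case], 3))
```

In-degree (how often a case is cited) and PageRank (how often it is cited by well-cited cases) often agree, but PageRank breaks ties like the one between C and D.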

4. Visualization Chart Samples

Citation Frequency

Citation Network

Network Visualization and Predictive Modeling on 854 Legal Court Cases (in `Extraction_Modelling` folder)

1. Extract opinion and meta information from raw text data

`.ipynb` notebook	Description
`Full Dataset Merge.ipynb`	Merge the 854-case dataset
`Edge and Node List.ipynb`	Create the edge and node lists
`Full Extractions.ipynb`	Extract the author, judge panel, and opinion text
`Clean Opinion Text.ipynb`	Remove references and special characters from the opinion text
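A minimal sketch of what extracting the author, judge panel, and result might look like with regular expressions. The patterns assume phrasing modelled on Federal Circuit opinions ("Before ..., Circuit Judges.", "Opinion for the court filed by Circuit Judge ...", an all-caps disposition line at the end). These patterns, the helper names, and the sample text are hypothetical and do not reproduce the notebook's logic.

```python
import re

# Assumed phrasing, modelled on Federal Circuit opinions; other courts differ.
AUTHOR_RE = re.compile(
    r"Opinion for the court filed by (?:Chief )?(?:Circuit )?Judge ([A-Z][A-Za-z'\-]+)"
)
PANEL_RE = re.compile(r"Before\s+([^.]+?),\s+Circuit\s+Judges")
DISPOSITIONS = ("AFFIRMED", "REVERSED", "VACATED", "REMANDED", "DISMISSED")


def extract_panel(text):
    """Upper-case judge names listed in the 'Before ..., Circuit Judges' line."""
    match = PANEL_RE.search(text)
    if not match:
        return []
    parts = re.split(r",|\band\b", match.group(1))
    # Titles such as "Chief Judge" are mixed case, so isupper() filters them out
    return [p.strip() for p in parts if p.strip().isupper()]


def extract_decision(text):
    """Last all-caps line that contains a disposition keyword."""
    for line in reversed(text.splitlines()):
        line = line.strip().rstrip(".")
        if line.isupper() and any(word in line for word in DISPOSITIONS):
            return line
    return None


def extract_meta(text):
    author = AUTHOR_RE.search(text)
    return {
        "author": author.group(1) if author else None,
        "judge_panel": extract_panel(text),
        "result": extract_decision(text),
    }


# Hypothetical opinion excerpt with placeholder judge names
sample = """UNITED STATES COURT OF APPEALS FOR THE FEDERAL CIRCUIT
Before SMITH, Chief Judge, JONES and LEE, Circuit Judges.
Opinion for the court filed by Circuit Judge JONES.
The district court did not err in its claim construction.
AFFIRMED-IN-PART, VACATED-IN-PART, AND REMANDED."""
print(extract_meta(sample))
```

Per curiam opinions or unusual layouts simply return `None` or an empty panel here, which makes gaps easy to spot when the results are tabulated.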

2. Datasets

These datasets are NOT included in this public repository due to intellectual property and privacy concerns.

Dataset	Description
`amy_cases.json`	Large dictionary {file name: raw text} for the 854 cases, produced by Lilian's PDF parsing
`full_name_text.json`	`amy_cases.json` converted from key-value pairs into two lists: `file_name`, `raw_text`
`cite_edge.csv`	Edge list of the citation network
`cite_node.csv`	Node list containing `case_code`, `case_name`, `court_from`, `court_type`
`extraction854.csv`	Full extractions, including `case_code`, `case_name`, `court_from`, `court_type`, `result`, `author`, `judge_panel`
`decision_text.json`	JSON file including `author`, `decision` (result of the case), `opinion` (opinion text), `cleaned_text` (cleaned opinion text)
`cleaned_text.csv`	CSV file containing all the cleaned text
`predict_data.csv`	Cleaned dataset for NLP modeling to predict the court decision
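The conversion from `amy_cases.json` to `full_name_text.json` is a reshaping from one dictionary into two parallel lists. A minimal sketch with a hypothetical two-case stand-in for the real file:

```python
import json

# Hypothetical stand-in for amy_cases.json: {file name: raw text}
# (with the real file: amy_cases = json.load(open("amy_cases.json")))
amy_cases = {
    "case_001.pdf": "raw text of case 1",
    "case_002.pdf": "raw text of case 2",
}

# Same content as two parallel lists, the layout described for full_name_text.json
full_name_text = {
    "file_name": list(amy_cases.keys()),
    "raw_text": list(amy_cases.values()),
}

# The lists stay aligned because dicts preserve insertion order (Python 3.7+)
assert len(full_name_text["file_name"]) == len(full_name_text["raw_text"])
print(json.dumps(full_name_text, indent=2))
```

The parallel-list layout loads directly into a two-column table, and `dict(zip(file_name, raw_text))` recovers the original mapping.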

3. Visualization using R

R Markdown file	Description
`Full Network Graph.Rmd`	Draw the full citation network
`Citation Betwwen Nodes.Rmd`	Draw the citations among all available cases
`Clean Data For Predictive Modelling.rmd`	Clean the text data for predictive modeling
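The two graphs differ in which nodes they keep. The full network includes every cited case, while "citation between available cases" can be read as the subgraph induced by the cases whose documents are in the dataset, keeping only edges whose two endpoints are both available. A minimal Python sketch of that filter, with hypothetical case identifiers (the R notebooks presumably do this in igraph):

```python
# Hypothetical edge list (citing_case, cited_case) and the cases we hold documents for
edges = [
    ("case_001", "case_002"),
    ("case_001", "415 F.3d 1303"),
    ("case_003", "case_001"),
    ("case_003", "517 U.S. 370"),
]
available = {"case_001", "case_002", "case_003"}

# Full network: every case that cites or is cited
all_nodes = {case for edge in edges for case in edge}

# Induced subgraph: keep a citation only if both endpoints are available cases
internal_edges = [(s, t) for s, t in edges if s in available and t in available]
print(len(all_nodes), internal_edges)
```

Dropping external cited cases makes the second graph much smaller and easier to read, at the cost of hiding which outside precedents the dataset leans on.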

Interactive Graph

Play with Interactive Graph

Full Citation Network (all cases and cited cases)

Citation Between Available Cases

4. Predictive Modeling using Python

`.ipynb` notebook	Description
`NLP Predictive Modeling.ipynb`	Try different preprocessing steps and build a logistic regression model to predict the court decision.

Visualization of the bigrams (word pairs) with the strongest coefficients
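With `sklearn`, the usual route would be a `CountVectorizer(ngram_range=(2, 2))` feeding a `LogisticRegression`, whose `coef_` gives one weight per bigram. That is an assumption about the notebook, not a description of it. To show the mechanics without dependencies, here is a pure-Python logistic regression trained by batch gradient descent on bigram counts, followed by a ranking of bigrams by coefficient. The four toy "opinions" and their labels are hypothetical.

```python
import math
from collections import Counter


def bigrams(text):
    """Bag of bigrams: counts of adjacent lower-cased word pairs."""
    tokens = text.lower().split()
    return Counter(zip(tokens, tokens[1:]))


def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(docs, labels, epochs=200, lr=0.5):
    """Logistic regression on bigram counts via batch gradient descent (labels 0/1)."""
    features = [bigrams(d) for d in docs]
    weights = Counter()  # unseen bigrams read as weight 0
    bias = 0.0
    n = len(docs)
    for _ in range(epochs):
        grad_w = Counter()
        grad_b = 0.0
        for feats, y in zip(features, labels):
            z = bias + sum(weights[b] * c for b, c in feats.items())
            err = sigmoid(z) - y  # derivative of the log-loss w.r.t. z
            for b, c in feats.items():
                grad_w[b] += err * c
            grad_b += err
        for b, g in grad_w.items():
            weights[b] -= lr * g / n
        bias -= lr * grad_b / n
    return weights, bias


def predict_proba(weights, bias, text):
    return sigmoid(bias + sum(weights[b] * c for b, c in bigrams(text).items()))


# Hypothetical toy opinions; 1 = affirmed, 0 = reversed
docs = [
    "the district court correctly construed the claim",
    "the district court correctly granted summary judgment",
    "the district court erred in construing the claim",
    "the district court erred in granting summary judgment",
]
labels = [1, 1, 0, 0]

weights, bias = train_logreg(docs, labels)
ranked = sorted(weights.items(), key=lambda kv: kv[1])
print("most negative:", ranked[:2])
print("most positive:", ranked[-2:])
```

Bigrams shared evenly by both outcomes (such as "district court") end up near zero, while discriminative pairs ("court correctly", "court erred") get the largest positive and negative weights. Those extremes are the bigrams a coefficient chart highlights.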
