Automate the case review on legal case documents and find the most critical cases using network analysis

Overview

Automation on Legal Court Cases Review

This project is to automate the case review on legal case documents and find the most critical cases using network analysis.

Short write-up

Affiliation: Institute for Social and Economic Research and Policy, Columbia University

Project Information:

Keywords: Automation, PDF parse, String Extraction, Network Analysis

Software:

  • Python : pdfminer, LexNLP, nltk sklearn
  • R: igraph

Scope:

  1. Parse court documents, extract citations from raw text.
  2. Build citation network, identify important cases in the network.
  3. Extract judge's opinion text and meta information including opinion author, court, decision.
  4. Model training to predict court decision based on opinion text.

Polit Study on 159 Legal Court Documents (in pilot_159 folder)

1. Process PDF documents using Python

Ipython Notebook Description
1.Extraction by LexNLP.ipynb Extract meta inforation use LexNLP package.
2.Layer Analysis on Sigle File. ipynb Use pdfminer to extract the raw text and the paragraph segamentation in the PDF document.
3.Patent Position by Layer.ipynb Identify the position of patent number in extracted layers from PDF.
4.Opinion and Author by Layer.ipynb Extract opinion text, author, decisions from the layers list.
5.Wrap up to Meta Data.ipynb Store extracted meta data to .json or .csv
6.Visualize citation frequency.ipynb Bar plot of the citation frequencies

2. Data: Parse PDF documents via Python

These datasets are NOT included in this public repository for intellectual property and privacy concern

File
pdf2text159.json A dictionary of 3 list: file_name, raw_text, layers.
cite_edge159.csv Edge list of citation network
cite_node159.csv Meta information of each case: case_number, court, dates
reference_extract.csv cited cases in a list for every case, untidy format for analysis
citation159.csv file citation pair, tidy format for calculation
regulation159.csv file regulation pair, tidy format for calculation

3. Analyze and Visualize using R

File
Calculate Citation Frequency.Rmd Analyze reference_extract.csv
Citation Network.Rmd Analyze cite_edge159

4. Visulization Chart Sample

Citation Frequencycase_freq

Citation Networkcitation_net

Network Visulization and Predictive Modeling on 854 Legal Court Cases (in Extraction_Modelling folder)

1. Extract opinion and meta information from raw text data

.ipynb notebook Description
Full Dataset Merge.ipynb Merge the 854 cases dataset
Edge and Node List.ipynb Create edge and node list
Full Extractions.ipynb Extract author, judge panel, opinion text
Clean Opinion Text.ipynb Remove references and special characters in opinion text

2. Datasets

These datasets are NOT included in this public repository for intellectual property and privacy concern

Dataset Description
amy_cases.json large dictionary {file name: raw text} for 854 cases, from Lilian's PDF parsing
full_name_text.json convert amy_cases.json key value pair to two list: file_name, raw_text
cite_edge.csv edge list of citation
cite_node.csv node list contains case_code, case_name, court_from, court_type
extraction854.csv full extractions include case_code, case_name, court_from, court_type, result, author, judge_panel
decision_text.json json file include author, decision(result of the case), opinion (opinion text), cleaned_text (cleaned opinion text)
cleaned_text.csv csv file contains allt the cleaned text
predict_data.csv cleaned dataset for NLP modeling predict court decision

3. Visulization using R

R markdown file
Full Network Graph.Rmd draw the full citation network
Citation Betwwen Nodes.Rmd draw citation between all the available cases
Clean Data For Predictive Modelling.rmd clean text data for predictive modeling

Interactive Graph

Play with Interactive Graph

Full Citation Network (all cases and cited cases)

Citation Between Available Cases

4. Predictive Modeling using Python

ipynb notebook
NLP Predictive Modeling.ipynb Try different preprocessing, and build a logistic regression to predict court decision.

Visulization of the Bi-gram (words) with the strongest coefficient

Bigram

You might also like...
Domain Connectivity Analysis Tools to analyze aggregate connectivity patterns across a set of domains during security investigations
Domain Connectivity Analysis Tools to analyze aggregate connectivity patterns across a set of domains during security investigations

DomainCAT (Domain Connectivity Analysis Tool) Domain Connectivity Analysis Tool is used to analyze aggregate connectivity patterns across a set of dom

Histogramming for analysis powered by boost-histogram
Histogramming for analysis powered by boost-histogram

Hist Hist is an analyst-friendly front-end for boost-histogram, designed for Python 3.7+ (3.6 users get version 2.4). See what's new. Installation You

Runtime analysis of code with plotting

Runtime analysis of code with plotting A quick comparison among Python, Cython, and the C languages A Programming Assignment regarding the Programming

AB-test-analyzer - Python class to perform AB test analysis

AB-test-analyzer Python class to perform AB test analysis Overview This repo con

A System Metrics Monitoring Tool Built using Python3 , rabbitmq,Grafana and InfluxDB. Setup using docker compose. Use to monitor system performance with graphical interface of grafana , storage of influxdb and message queuing of rabbitmq Simple, realtime visualization of neural network training performance.
Simple, realtime visualization of neural network training performance.

pastalog Simple, realtime visualization server for training neural networks. Use with Lasagne, Keras, Tensorflow, Torch, Theano, and basically everyth

Generate a roam research like Network Graph view from your Notion pages.
Generate a roam research like Network Graph view from your Notion pages.

Notion Graph View Export Notion pages to a Roam Research like graph view.

ICS-Visualizer is an interactive Industrial Control Systems (ICS) network graph that contains up-to-date ICS metadata
ICS-Visualizer is an interactive Industrial Control Systems (ICS) network graph that contains up-to-date ICS metadata

ICS-Visualizer is an interactive Industrial Control Systems (ICS) network graph that contains up-to-date ICS metadata (Name, company, port, user manua

The Timescale NFT Starter Kit is a step-by-step guide to get up and running with collecting, storing, analyzing and visualizing NFT data from OpenSea, using PostgreSQL and TimescaleDB.

Timescale NFT Starter Kit The Timescale NFT Starter Kit is a step-by-step guide to get up and running with collecting, storing, analyzing and visualiz

Owner
Yi Yin
Tech & Business Alignment @ Wolfram Research, Social Sciences Research @ Columbia University
Yi Yin
Analysis and plotting for motor/prop/ESC characterization, thrust vs RPM and torque vs thrust

esc_test This is a Python package used to plot and analyze data collected for the purpose of characterizing a particular propeller, motor, and ESC con

Alex Spitzer 1 Dec 28, 2021
3D plotting and mesh analysis through a streamlined interface for the Visualization Toolkit (VTK)

PyVista Deployment Build Status Metrics Citation License Community 3D plotting and mesh analysis through a streamlined interface for the Visualization

PyVista 1.6k Jan 8, 2023
3D plotting and mesh analysis through a streamlined interface for the Visualization Toolkit (VTK)

PyVista Deployment Build Status Metrics Citation License Community 3D plotting and mesh analysis through a streamlined interface for the Visualization

PyVista 692 Feb 18, 2021
Python package for hypergraph analysis and visualization.

The HyperNetX library provides classes and methods for the analysis and visualization of complex network data. HyperNetX uses data structures designed to represent set systems containing nested data and/or multi-way relationships. The library generalizes traditional graph metrics to hypergraphs.

Pacific Northwest National Laboratory 304 Dec 27, 2022
Squidpy is a tool for the analysis and visualization of spatial molecular data.

Squidpy is a tool for the analysis and visualization of spatial molecular data. It builds on top of scanpy and anndata, from which it inherits modularity and scalability. It provides analysis tools that leverages the spatial coordinates of the data, as well as tissue images if available.

Theis Lab 251 Dec 19, 2022
📊📈 Serves up Pandas dataframes via the Django REST Framework for use in client-side (i.e. d3.js) visualizations and offline analysis (e.g. Excel)

???? Serves up Pandas dataframes via the Django REST Framework for use in client-side (i.e. d3.js) visualizations and offline analysis (e.g. Excel)

wq framework 1.2k Jan 1, 2023
Political elections, appointment, analysis and visualization in Python

Political elections, appointment, analysis and visualization in Python poli-sci-kit is a Python package for political science appointment and election

Andrew Tavis McAllister 9 Dec 1, 2022
Sentiment Analysis application created with Python and Dash, hosted at socialsentiment.net

Social Sentiment Dash Application Live-streaming sentiment analysis application created with Python and Dash, hosted at SocialSentiment.net. Dash Tuto

Harrison 456 Dec 25, 2022
Python package for the analysis and visualisation of finite-difference fields.

discretisedfield Marijan Beg1,2, Martin Lang2, Samuel Holt3, Ryan A. Pepper4, Hans Fangohr2,5,6 1 Department of Earth Science and Engineering, Imperia

ubermag 12 Dec 14, 2022
Tools for exploratory data analysis in Python

Dora Exploratory data analysis toolkit for Python. Contents Summary Setup Usage Reading Data & Configuration Cleaning Feature Selection & Extraction V

Nathan Epstein 599 Dec 25, 2022