LegalQA using SentenceKoBART
An implementation of a legal QA system based on SentenceKoBART
- How to train SentenceKoBART
- Built on the Jina neural search engine
- Provides Korean legal QA data (1,830 pairs)
Setup
# Install Git LFS first: https://github.com/git-lfs/git-lfs/wiki/Installation
curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | sudo bash
sudo apt install git-lfs
git clone https://github.com/haven-jeon/LegalQA.git
cd LegalQA
git lfs pull
pip install -r requirements.txt
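After `git lfs pull`, it can be useful to check that the QA data in data/legalqa.jsonlines (see License) was actually downloaded rather than left as a Git LFS pointer file. The following is a minimal sketch of a JSON Lines reader; the helper name `load_jsonlines` is hypothetical and not part of this repository, and it makes no assumption about which fields each record contains.

```python
import json
from pathlib import Path


def load_jsonlines(path):
    """Read a JSON Lines file: one JSON value per non-empty line.

    Hypothetical helper for inspection only; not part of the LegalQA code.
    """
    records = []
    with Path(path).open(encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                # Skip blank lines instead of failing on them.
                continue
            try:
                records.append(json.loads(line))
            except json.JSONDecodeError as error:
                # Report the offending line; an LFS pointer file fails here.
                raise ValueError(
                    f"{path}:{line_number}: invalid JSON ({error})"
                ) from error
    return records
```

Calling `load_jsonlines("data/legalqa.jsonlines")` from the repository root and printing the length of the result and its first record gives a quick look at the data. A `ValueError` on line 1 most likely means the file is still a Git LFS pointer.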
Index
python app.py -t index
GPU-based indexing is available as an option: set on_gpu: true in pods/encoder.yml.
Search
With REST API
To start the Jina server for REST API:
python app.py -t query_restful
Then use a client to query:
curl --request POST -d '{"top_k": 1, "mode": "search", "data": ["상속 관련 문의"]}' -H 'Content-Type: application/json' 'http://0.0.0.0:1234/api/search'
Or use Jinabox with endpoint http://127.0.0.1:1234/api/search
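The same query can also be sent from Python. This is a minimal sketch using only the standard library; it mirrors the payload and endpoint of the curl example above, and the function names `build_search_request` and `search` are hypothetical. The response is returned as decoded JSON without assuming anything about its structure.

```python
import json
import urllib.request

# Endpoint as given in this README; adjust host and port if your server differs.
SEARCH_URL = "http://127.0.0.1:1234/api/search"


def build_search_request(question, top_k=1, url=SEARCH_URL):
    """Build the same POST request that the curl example sends."""
    payload = {"top_k": top_k, "mode": "search", "data": [question]}
    # ensure_ascii=False keeps Korean text readable in the request body.
    body = json.dumps(payload, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search(question, top_k=1, url=SEARCH_URL, timeout=30):
    """Send the query and return the decoded JSON response unchanged."""
    request = build_search_request(question, top_k=top_k, url=url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server started via `python app.py -t query_restful`, calling `search("상속 관련 문의", top_k=1)` should return the server's JSON reply as Python objects. Building the request separately from sending it makes the payload easy to inspect without a running server.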
From the terminal
python app.py -t query
Demo
Citation
Model training, data crawling, and demo system were all supported by the AWS Hero program.
@misc{heewon2021,
author = {Heewon Jeon},
title = {LegalQA using SentenceKoBART},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/haven-jeon/LegalQA}}
}
License
- The QA data in data/legalqa.jsonlines was crawled from www.freelawfirm.co.kr in accordance with its robots.txt. Commercial use is prohibited; the data is for academic use only.
- We are not responsible for any legal decisions you make based on the resources provided here.