ThaiPersonalCardExtract
A library for extracting information from Thai personal identity cards, implemented with easyocr and tesseract.
New Feature v1.3.2
- Increase performance.
- Support Thai Government Lottery: extracts data from lottery tickets. Works well with images produced by a scanner (16 Aug. 2021).
- Refactor Output Structure.
- Support Thai Driving License (Beta): can extract data from photos of driving licenses in some formats. The Department of Land Transport issues cards in many different formats, and each format places its data in different positions, so performance is low.
Examples
Example image file.
warpPerspective image crop.
Keypoints detected in the image.
Results of the library's region-of-interest extraction:
Identification Number | FullNameTH |
---|---|
NameEN | LastNameEN |
BirthdayTH | BirthdayEN |
Religion | Address |
DateOfIssueTH | DateOfIssueEN |
DateOfExpiryTH | DateOfExpiryEN |
Recommendations
- The image should be at least 600x350 pixels.
- Use images with minimal reflections for good results.
- The identity card should fill about 75% of the image. If it does not, crop the image so that only the identity card area remains.
- For faster processing, resize the image and use a CUDA GPU.
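The size recommendations above can be checked before running OCR. The following is a minimal sketch and not part of the library: the helper names meets_minimum_size and scaled_size, and the 1200-pixel target width, are assumptions chosen for illustration. Only the 600x350 minimum comes from the recommendations.

```python
from typing import Tuple

MIN_WIDTH, MIN_HEIGHT = 600, 350  # minimum size from the recommendations


def meets_minimum_size(width: int, height: int) -> bool:
    """Return True if the image is at least 600x350 pixels."""
    return width >= MIN_WIDTH and height >= MIN_HEIGHT


def scaled_size(width: int, height: int, target_width: int = 1200) -> Tuple[int, int]:
    """Return (width, height) scaled down to target_width, keeping the aspect ratio.

    Images already no wider than target_width are returned unchanged.
    target_width is an assumed value and is never allowed below MIN_WIDTH.
    """
    target_width = max(target_width, MIN_WIDTH)
    if width <= target_width:
        return width, height
    scale = target_width / width
    return target_width, max(1, round(height * scale))
```

The computed size could then be handed to an image library's resize function, for example cv2.resize(image, (new_width, new_height)) if OpenCV is used to load the image.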
Installation
Install using pip
For the stable release:
pip install thai-personal-card-extract
For the latest development release:
pip install git+https://github.com/ggafiled/ThaiPersonalCardExtract.git
Note 1: On Windows, install tesseract first by following the instructions in this Medium article: https://medium.com/@navapat.tpb/734dae2fb4d3. Make sure the setup is complete before using the library.
Note 2: On Linux, install tesseract by following the official instructions at https://github.com/tesseract-ocr/tesseract
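To confirm that tesseract is reachable before constructing a reader, a small lookup like the one below may help. It is a sketch using only the Python standard library. The function name find_tesseract is hypothetical, and trying an added ".exe" suffix is an assumption about typical Windows installs.

```python
import os
import shutil
from typing import Optional


def find_tesseract(explicit_path: Optional[str] = None) -> Optional[str]:
    """Return a path to the tesseract command, or None if it cannot be found.

    explicit_path (for example a Windows install location) is checked first,
    with and without an ".exe" suffix. Otherwise the system PATH is searched.
    """
    if explicit_path:
        for candidate in (explicit_path, explicit_path + ".exe"):
            if os.path.isfile(candidate):
                return candidate
    return shutil.which("tesseract")
```

A non-None result is the kind of value the tesseract_cmd parameter in the Usage section expects. None suggests that the installation notes above still need to be followed.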
Usage
# With built-in config options.
import ThaiPersonalCardExtract as card
reader = card.PersonalCard(
    lang=card.THAI,
    provider=card.DEFAULT,
    tesseract_cmd="D:/Program Files/Tesseract-OCR/tesseract",
    save_extract_result=True,
    path_to_save="D:/dev/ThaiPersonalCardExtract/examples/extract")
result = reader.extractInfo('examples/card.jpg')
print(result)
# Free-style: example of using the PersonalCard class to extract data from a Thai national ID card.
from ThaiPersonalCardExtract import PersonalCard
reader = PersonalCard(lang="mix", tesseract_cmd="D:/Program Files/Tesseract-OCR/tesseract") # On Windows, pass tesseract_cmd to set your tesseract command path.
result = reader.extractInfo('examples/card.jpg')
print(result)
# Free-style: example of using the DrivingLicense class to extract data from a driving license.
from ThaiPersonalCardExtract import DrivingLicense
reader = DrivingLicense(lang="mix", tesseract_cmd="D:/Program Files/Tesseract-OCR/tesseract") # On Windows, pass tesseract_cmd to set your tesseract command path.
result = reader.extractInfo('examples/card.jpg')
print(result)
# Free-style: example of using the ThaiGovernmentLottery class to extract data from a lottery ticket.
from ThaiPersonalCardExtract import ThaiGovernmentLottery
reader = ThaiGovernmentLottery(save_extract_result=True, path_to_save="D:/dev/ThaiPersonalCardExtract/examples/extract/thai_government_lottery") # On Windows, pass the tesseract_cmd parameter to set your tesseract command path.
result = reader.extractInfo("../examples/card7.jpg")
print(result)
The output is a namedtuple. Each field holds the corresponding value the library was able to extract. See the Python documentation for collections.namedtuple for details on working with this type.
# Output of PersonalCard
Card(Identification_Number='9999999999999', FullNameTH='āļāļēāļĒ āļāļēāļĒāļļāļĄāļšāļĄāļļāļĢāļēāđāļŠāļ°', PrefixTH='นาย', NameTH='āļāļēāļĒāļļāļĄāļšāļĄāļļāļĢāļēāđāļŠāļ°', LastNameTH='āļāļēāļĒāļļāļĄāļšāļĄāļļāļĢāļēāđāļŠāļ°', PrefixEN='.Mr.Shoyo', NameEN='', LastNameEN='Hinatao', BirthdayTH='21 มี.ย. 2539', BirthdayEN='21 Jun..1996', Religion='พุทธ', Address='āļ8āļāļš` 99/1 āļĄāļīāļāļĩāđāļŪāļ° āđāļāļāļŪāļēāļāļēāļĄāļīāļāļēāļ§āļē āļāļģāđāļ āļāļāļīāļ', DateOfIssueTH='11 ส.ค. 2554', DateOfIssueEN='11 Ang. 2021', DateOfExpiryTH='11 ส.ค. 2574', DateOfExpiryEN='11 Aug. 2031,')
# Output of DrivingLicense
Card(License_Number='98765432', IssueDateTH='āļāļąāļāļāļēāļāļĄ', ExpiryDateTH='', IssueDateEN='14 August 2664', ExpiryDateEN='14 August 2574', NameTH='āļē? āđāļāļāļāļ° āđāļāļāļĩ', NameEN='MRONOREAUMANE', BirthDayTH='', BirthDayEN='wa hs OKRA', Identity_Number='', Province='āļāļāļēāļĢāļēāļāļĻāļĩāļĄāļē')
# Output of ThaiGovernmentLottery
Lottery(LotteryNumber='424603', LessonNumber='08', SetNumber='23', Year='2564') # a namedtuple
You can access the fields as follows:
print(result.LotteryNumber)
print(result.LessonNumber)
print(result.SetNumber)
print(result.Year)
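Because the result is a namedtuple, the standard namedtuple helpers apply as well. The sketch below rebuilds a stand-in Lottery value with the fields shown above instead of calling the library, so it assumes the library returns an ordinary collections.namedtuple.

```python
from collections import namedtuple

# Stand-in with the same fields as the Lottery output shown above.
# In real use this object would come from reader.extractInfo(...).
Lottery = namedtuple("Lottery", ["LotteryNumber", "LessonNumber", "SetNumber", "Year"])
result = Lottery(LotteryNumber="424603", LessonNumber="08", SetNumber="23", Year="2564")

# Convert to a plain mapping, for example before JSON serialisation.
as_dict = result._asdict()

# Unpack all fields in declaration order.
number, lesson, set_number, year = result

# List the available field names.
field_names = result._fields
```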
To set the lang attribute to tha:
from ThaiPersonalCardExtract import PersonalCard
reader = PersonalCard(lang="tha", tesseract_cmd="D:/Program Files/Tesseract-OCR/tesseract") # On Windows, pass tesseract_cmd to set your tesseract command path.
result = reader.extractInfo('examples/card.jpg')
print(result)
With lang set to tha, the output contains only the identification number and the Thai-language fields the library was able to extract:
{
"Identification_Number": "9999999999999",
"FullNameTH": "āļāļēāļĒ āļāļēāļĒāļļāļĄāļšāļĄāļļāļĢāļēāđāļŠāļ°",
"PrefixTH": "นาย",
"NameTH": "āļāļēāļĒāļļāļĄāļšāļĄāļļāļĢāļēāđāļŠāļ°",
"LastNameTH": "āļāļēāļĒāļļāļĄāļšāļĄāļļāļĢāļēāđāļŠāļ°",
"BirthdayTH": "21 มี.ย. 2539",
"Religion": "พุทธ",
"Address": "āļāđ 99/1 āļĄāļīāļāļĩāđāļŪāļ° āđāļāļāļŪāļēāļāļēāļĄāļīāļāļēāļ§āļē āļāļģāđāļ āļāļāļīāļ;",
"DateOfIssueTH": "11 ส.ค. 2554",
"DateOfExpiryTH": "11 ส.ค. 2574"
}
You can also set the OCR provider to one of the following:
- default: uses both easyocr and tesseract (recommended)
- easyocr
- tesseract
from ThaiPersonalCardExtract import PersonalCard
reader = PersonalCard(lang="tha", provider="default", tesseract_cmd="D:/Program Files/Tesseract-OCR/tesseract") # On Windows, pass tesseract_cmd to set your tesseract command path.
result = reader.extractInfo('examples/card.jpg')
print(result)
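When several images need processing, the extractInfo calls from the usage examples can be wrapped in a loop that keeps going if one image fails. This is a sketch only: extract_many and StubReader are hypothetical names, StubReader stands in for a real reader so the code runs without the library, and which exceptions the library raises on a bad image is not documented here, so the broad except is an assumption.

```python
from typing import Any, Dict, Iterable, Tuple


def extract_many(reader: Any, image_paths: Iterable[str]) -> Tuple[Dict[str, Any], Dict[str, Exception]]:
    """Call reader.extractInfo on each path, collecting results and failures separately."""
    results: Dict[str, Any] = {}
    failures: Dict[str, Exception] = {}
    for path in image_paths:
        try:
            results[path] = reader.extractInfo(path)
        except Exception as error:  # assumed: any error means this image failed
            failures[path] = error
    return results, failures


class StubReader:
    """Hypothetical stand-in so the sketch runs without the library installed."""

    def extractInfo(self, path: str) -> str:
        if not path.endswith(".jpg"):
            raise ValueError(f"unsupported file: {path}")
        return f"extracted from {path}"


results, failures = extract_many(StubReader(), ["a.jpg", "b.txt"])
```

In real use, a PersonalCard, DrivingLicense, or ThaiGovernmentLottery instance would take the place of StubReader.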
Config Options
You can configure an instance with the following keyword arguments:
Parameter name | Value Type | Description |
---|---|---|
lang | String | Language of the expected results: mix (all fields, both Thai and English), tha, or eng. Default is 'mix'. Applies to DrivingLicense and PersonalCard. |
provider | String | OCR provider: default (uses both easyocr and tesseract, recommended), easyocr, or tesseract. Default is 'default'. Applies to DrivingLicense and PersonalCard. |
template_threshold | Float | Threshold used when calculating template similarity. Default is 0.7. |
sift_rate | Int | Feature keypoint rate. Default is 25,000. |
tesseract_cmd | String | Path to your tesseract command. Windows only. |
save_extract_result | Boolean | Set to True to save the extracted images. Default is False. |
path_to_save | String | Directory in which the extracted images are saved when save_extract_result=True. |
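The options in the table can be collected and checked in one place before a reader is created. The helper below is a hypothetical sketch, not part of the library: build_options is an invented name, the defaults are copied from the table, and only lang and provider are validated because they are the only options with a documented set of values.

```python
from typing import Any, Dict, Optional

# Values documented in the Config Options table.
VALID_LANGS = ("mix", "tha", "eng")
VALID_PROVIDERS = ("default", "easyocr", "tesseract")


def build_options(
    lang: str = "mix",
    provider: str = "default",
    template_threshold: float = 0.7,
    sift_rate: int = 25000,
    tesseract_cmd: Optional[str] = None,
    save_extract_result: bool = False,
    path_to_save: Optional[str] = None,
) -> Dict[str, Any]:
    """Validate lang and provider, then return the options as keyword arguments.

    Options left as None are omitted so that the library's own handling applies.
    """
    if lang not in VALID_LANGS:
        raise ValueError(f"lang must be one of {VALID_LANGS}, got {lang!r}")
    if provider not in VALID_PROVIDERS:
        raise ValueError(f"provider must be one of {VALID_PROVIDERS}, got {provider!r}")
    options = {
        "lang": lang,
        "provider": provider,
        "template_threshold": template_threshold,
        "sift_rate": sift_rate,
        "tesseract_cmd": tesseract_cmd,
        "save_extract_result": save_extract_result,
        "path_to_save": path_to_save,
    }
    return {name: value for name, value in options.items() if value is not None}
```

The resulting dict could be unpacked into a constructor, for example PersonalCard(**build_options(lang="tha")), assuming the constructor accepts the keywords listed in the table.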