Go package for OCR (Optical Character Recognition), by using Tesseract C++ library

Overview

gosseract OCR

Runtime Test codecov Go Report Card License: MIT Go Reference

Golang OCR package, by using Tesseract C++ library.

OCR Server

Do you just want OCR server, or see the working example of this package? Yes, there is already-made server application, which is seriously easy to deploy!

馃憠 https://github.com/otiai10/ocrserver

Example

package main

import (
	"fmt"
	"github.com/otiai10/gosseract/v2"
)

func main() {
	client := gosseract.NewClient()
	defer client.Close()
	client.SetImage("path/to/image.png")
	text, _ := client.Text()
	fmt.Println(text)
	// Hello, World!
}

Install

  1. tesseract-ocr, including library and headers
  2. go get -t github.com/otiai10/gosseract

Check Dockerfile for more detail of installation, or you can just try by docker run -it --rm otiai10/gosseract.

Test

In case you have tesseract-ocr on your local, you can just hit

% go test .

Otherwise, if you DON'T want to install tesseract-ocr on your local, kick ./test/runtime which is using Docker and Vagrant to test the source code on some runtimes.

% ./test/runtime --driver docker
% ./test/runtime --driver vagrant

Check ./test/runtimes for more information about runtime tests.

Issues

Comments
  • Installation Failure on Windows 7

    Installation Failure on Windows 7

    Summary

    Installation Failure on Windows 7 位 go get -t github.com/otiai10/gosseract

    github.com/otiai10/gosseract

    tessbridge.cpp:5:10: fatal error: tesseract/baseapi.h: No such file or directory #include <tesseract/baseapi.h> ^~~~~~~~~~~~~~~~~~~~~ compilation terminated.

    Reproducibility

    Yes

    Reproducility Frequency

    • 100%

    Reproducible Dockerfile

    FROM your-os:your-version
    # Describe how to reproduce your problem
    # on your environment
    

    Otherwise, describe how to reproduce

    1. Install GO lan
    2. Install GCC (64 Bit Compiler)
    3. Install GIT
    4. Install Tesseract from this site https://digi.bib.uni-mannheim.de/tesseract/tesseract-ocr-setup-3.05.02-20180621.exe
    5. Execute 位 go get -t github.com/otiai10/gosseract

    github.com/otiai10/gosseract

    tessbridge.cpp:5:10: fatal error: tesseract/baseapi.h: No such file or directory #include <tesseract/baseapi.h> ^~~~~~~~~~~~~~~~~~~~~ compilation terminated.

    Environment

    Windows 7

    uname -a
    
    go env
    

    C:\Users\33133 位 go env set GOARCH=amd64 set GOBIN= set GOCACHE=C:\Users\33133\AppData\Local\go-build set GOEXE=.exe set GOHOSTARCH=amd64 set GOHOSTOS=windows set GOOS=windows set GOPATH=C:\Users\33133\go set GORACE= set GOROOT=C:\Go set GOTMPDIR= set GOTOOLDIR=C:\Go\pkg\tool\windows_amd64 set GCCGO=gccgo set CC=gcc set CXX=g++ set CGO_ENABLED=1 set CGO_CFLAGS=-g -O2 set CGO_CPPFLAGS= set CGO_CXXFLAGS=-g -O2 set CGO_FFLAGS=-g -O2 set CGO_LDFLAGS=-g -O2 set PKG_CONFIG=pkg-config set GOGCCFLAGS=-m64 -mthreads -fmessage-length=0 -fdebug-prefix-map=C:\Users\33133\AppData\Local\Temp\go-build238513982=/tmp/go-build -gno-record-gcc-switches

    C:\Users\33133 位

    go version
    

    位 go version go version go1.10.3 windows/amd64

    tesseract --version
    

    C:\Program Files (x86)\Tesseract-OCR>tesseract --version tesseract 3.05.02 leptonica-1.75.3 libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.5.3) : libpng 1.6.34 : libtiff 4.0. 9 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.2.0

    C:\Program Files (x86)\Tesseract-OCR>

    opened by vsarin 27
  • 'tesseract/baseapi.h' file not found

    'tesseract/baseapi.h' file not found

    % go test ./...
    # github.com/otiai10/gosseract/tesseract
    tesseract/tess.cpp:1:10: fatal error: 'tesseract/baseapi.h' file not found
    FAIL    github.com/otiai10/gosseract [build failed]
    
    question 
    opened by otiai10 25
  • Keep printing a blank with no error

    Keep printing a blank with no error

    package main

    import ( "fmt" "github.com/otiai10/gosseract" )

    func main() { client := gosseract.NewClient() defer client.Close() client.SetImage("path/to/image.png") text, _ := client.Text() fmt.Println(text) // Hello, World! }

    I am using this code and run it in docker but still getting a blank without error

    need more information 
    opened by gradygabriel10 10
  • tessbridge.cpp:5:10: fatal error: leptonica/allheaders.h: No such file or directory

    tessbridge.cpp:5:10: fatal error: leptonica/allheaders.h: No such file or directory

    Summary

    Installation failed on win 10 x64 by go get -t github.com/otiai10/gosseract $ go get -t github.com/otiai10/gosseract

    github.com/otiai10/gosseract

    tessbridge.cpp:5:10: fatal error: leptonica/allheaders.h: No such file or directory #include <leptonica/allheaders.h> ^~~~~~~~~~~~~~~~~~~~~~~~ compilation terminated.

    I don't know about header files,. How do I install them on windows? leptonica/allheaders.h This is header files?

    Reproducibility

    Reproducibility Frequency

    • 100%

    Reproducible Dockerfile

    FROM your-os:your-version
    # Describe how to reproduce your problem
    # on your environment
    

    Otherwise, describe how to reproduce

    Install GO lan Install GCC (64 Bit Compiler) Install GIT Install Tesseract from this site https://digi.bib.uni-mannheim.de/tesseract/tesseract-ocr-setup-3.05.02-20180621.exe go get -t github.com/otiai10/gosseract

    Environment

    
    
    go env
    

    $ go env set GO111MODULE= set GOARCH=amd64 set GOBIN= set GOCACHE=C:\Users\VULCAN\AppData\Local\go-build set GOENV=C:\Users\VULCAN\AppData\Roaming\go\env set GOEXE=.exe set GOFLAGS= set GOHOSTARCH=amd64 set GOHOSTOS=windows set GONOPROXY= set GONOSUMDB= set GOOS=windows set GOPATH=C:\Go\go\bin set GOPRIVATE= set GOPROXY=https://proxy.golang.org,direct set GOROOT=C:\Go set GOSUMDB=sum.golang.org set GOTMPDIR= set GOTOOLDIR=C:\Go\pkg\tool\windows_amd64 set GCCGO=gccgo set AR=ar set CC=gcc set CXX=g++ set CGO_ENABLED=1 set GOMOD= set CGO_CFLAGS=-g -O2 set CGO_CPPFLAGS= set CGO_CXXFLAGS=-g -O2 set CGO_FFLAGS=-g -O2 set CGO_LDFLAGS=-g -O2 set PKG_CONFIG=pkg-config set GOGCCFLAGS=-m64 -mthreads -fmessage-length=0 -fdebug-prefix-map=C:\Users\VULCAN\AppData\Local\Temp\go-build610875910=/tmp/go-build -gno-record-gcc-switches

    go version
    

    $ go version go version go1.13.5 windows/amd64

    tesseract --version
    ```$ tesseract --version
    tesseract 3.05.02
     leptonica-1.75.3
      libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.5.3) : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.2.0
    
    
    opened by QQ3544291 10
  • Leaks /dev/ttysXXX file handles even when Close() is manually called

    Leaks /dev/ttysXXX file handles even when Close() is manually called

    Summary

    This is a real issue for me as I am capturing images via security camera and every frame runs through Tesseract in real time.

    While running in a loop over files, OpenCV video frame, or as a web service, gosseract is opening a new /dev/ttys002 (or /dev/pts/004) every time an image is parsed. This eventually leads to a situation of running out of allowed file handlers.

    I have attached the example Go projects that have the issue and a C++ version that does not.

    lsof screenshot

    Reproducibility

    Always

    Environment

    macOS 10.14, Tesseract 4.1.0 Ubuntu 19.01, Tesseract 4.0

    GO111MODULE=""
    GOARCH="amd64"
    GOBIN=""
    GOCACHE="/Users/tbruno/Library/Caches/go-build"
    GOENV="/Users/tbruno/Library/Application Support/go/env"
    GOEXE=""
    GOFLAGS=""
    GOHOSTARCH="amd64"
    GOHOSTOS="darwin"
    GONOPROXY=""
    GONOSUMDB=""
    GOOS="darwin"
    GOPATH="/Users/tbruno/Projects/GolandProjects/go"
    GOPRIVATE=""
    GOPROXY="https://proxy.golang.org,direct"
    GOROOT="/usr/local/Cellar/go/1.13.4/libexec"
    GOSUMDB="sum.golang.org"
    GOTMPDIR=""
    GOTOOLDIR="/usr/local/Cellar/go/1.13.4/libexec/pkg/tool/darwin_amd64"
    GCCGO="gccgo"
    AR="ar"
    CC="clang"
    CXX="clang++"
    CGO_ENABLED="1"
    GOMOD=""
    CGO_CFLAGS="-g -O2"
    CGO_CPPFLAGS=""
    CGO_CXXFLAGS="-g -O2"
    CGO_FFLAGS="-g -O2"
    CGO_LDFLAGS="-g -O2"
    PKG_CONFIG="pkg-config"
    GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/2r/vy9wb4w90snd6wwv06ts6rth0000gn/T/go-build449183315=/tmp/go-build -gno-record-gcc-switches -fno-common"
    
    go version go1.13.4 darwin/amd64
    
    tesseract 4.1.0
     leptonica-1.78.0
      libgif 5.1.4 : libjpeg 9c : libpng 1.6.37 : libtiff 4.1.0 : zlib 1.2.11 : libwebp 1.0.3 : libopenjp2 2.3.1
     Found AVX512BW
     Found AVX512F
     Found AVX2
     Found AVX
     Found SSE
    

    Examples (I'm using the same jpg file, but this happens even if a new file is opened and also happens with SetImageFromBytes)

    func main() {
    	client := gosseract.NewClient()
    	for {
    		client.SetImage("/Users/tbruno/test.jpg")
    		text, _ := client.Text()
    		fmt.Println(text)
    	}
    	client.Close()
    }
    
    func main() {
    	for {
    		client := gosseract.NewClient()
    		client.SetImage("/Users/tbruno/test.jpg")
    		text, _ := client.Text()
    		fmt.Println(text)
    		client.Close()
    	}
    }
    

    TessTester-ClientInLoopGo.zip TessTester-SingleClientGo.zip TessApi-NoLeakCpp.zip

    bug 
    opened by tebruno99 10
  • Fix/add tessdata prefix

    Fix/add tessdata prefix

    Hi.

    I've added an ability to provide different TessdataPrefix directly from go code with default value equal to environment TESSDATA_PREFIX. Requesting for a review, thanks.

    Seems like my solution only works with latest tesseract and only on linux (different was not tested). We should somehow define default directory for models for different tesseract versions.

    opened by awskii 9
  • Init only when required (perfs)

    Init only when required (perfs)

    I benchmarked my app and seen that 90% of the CPU time is lost in "init()".

    With this code I keep the instance open and perform multiple recognition on it, if a configuration change requires to init again, I flag the instance to rerun init

    What do you think about it?

    Details

    This is a typical use of gosseract to extract text, in a sample program (profiling included):

    package main
    
    import (
        "bytes"
        "image/png"
    
        "gocv.io/x/gocv"
        "github.com/openrm/gosseract"
        "github.com/pkg/profile"
    )
    
    func GetTextFromImage(img *gocv.Mat, client *gosseract.Client) (string, error) {
        buf := new(bytes.Buffer)
        finalImage, err := img.ToImage()
        png.Encode(buf, finalImage)
    
        client.SetImageFromBytes(buf.Bytes())
        client.SetPageSegMode(gosseract.PSM_SINGLE_BLOCK)
    
        out, err := client.Text()
    
        if err != nil {
          return "", err
        }
    
        return out, nil
    }
    
    func main() {
        defer profile.Start().Stop()
    
        client := gosseract.NewClient()
        defer client.Close()
    
        client.Languages = []string{"jpn"}
    
        img := gocv.IMRead("1.png", gocv.IMReadColor)
    
        for i := 0; i < 20; i++ {
            GetTextFromImage(&img, client)
        }
    }
    

    With the code above, I get the following result with go profiling: result1

    As you can see, over the 12 seconds spent in the program, 11 are caused by repeated calls to init.

    With the proposed changes in this PR, the profiling is now like this: result2

    Notes

    • SetConfigFile and SetLanguage cause the program to init again
    • SetWhitelist, SetBlacklist, DisabledOutput and SetVariable make internal call to setVariablesToInitializedAPI if init has already been called
    opened by PuKoren 9
  • cannot find package

    cannot find package "github.com/otiai10/gosseract/v2"

    Hello, I have installed the package using go get github.com/otiai10/gosseract and imported it in my package: "github.com/otiai10/gosseract/v2" as per instructions.

    Summary

    I get this compile time error:

    vendor/app/shared/spamcheck/spamcheck.go:12:2: cannot find package "github.com/otiai10/gosseract/v2" in any of: /home/me/go/src/myapp/vendor/github.com/otiai10/gosseract/v2 (vendor tree) /usr/local/go/src/github.com/otiai10/gosseract/v2 (from $GOROOT) /home/me/go/src/github.com/otiai10/gosseract/v2 (from $GOPATH)

    Environment

    Ubuntu 18.08

    uname -a
    

    Linux pc5 5.3.0-28-generic #30~18.04.1-Ubuntu SMP Fri Jan 17 06:14:09 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

    go env
    

    GO111MODULE="" GOARCH="amd64" GOBIN="" GOCACHE="/home/me/.cache/go-build" GOENV="/home/me/.config/go/env" GOEXE="" GOFLAGS="" GOHOSTARCH="amd64" GOHOSTOS="linux" GONOPROXY="" GONOSUMDB="" GOOS="linux" GOPATH="/home/me/go" GOPRIVATE="" GOPROXY="https://proxy.golang.org,direct" GOROOT="/usr/local/go" GOSUMDB="sum.golang.org" GOTMPDIR="" GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64" GCCGO="gccgo" AR="ar" CC="gcc" CXX="g++" CGO_ENABLED="1" GOMOD="" CGO_CFLAGS="-g -O2" CGO_CPPFLAGS="" CGO_CXXFLAGS="-g -O2" CGO_FFLAGS="-g -O2" CGO_LDFLAGS="-g -O2" PKG_CONFIG="pkg-config" GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build702738196=/tmp/go-build -gno-record-gcc-switches"

    go version
    

    go1.13.6 linux/amd64

    tesseract --version
    

    tesseract 4.0.0-beta.1 leptonica-1.75.3 libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.5.2) : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.0

    Found AVX2 Found AVX Found SSE

    Appreciate your help to fix this.

    opened by themrkumar 8
  • macOS compile freebsd binary file failed

    macOS compile freebsd binary file failed

    Summary

    I'm using macOS 10.14.6 and going to compile binary file for FreeBSD 11.3, and build failed, show message: undefined: gosseract.NewClient.

    Reproducibility

    Reproducibility Frequency

    100%

    Environment

    Darwin Kernel Version 18.7.0
    
    GO111MODULE=""
    GOARCH="amd64"
    GOBIN=""
    GOCACHE="/Users/frankb/Library/Caches/go-build"
    GOENV="/Users/frankb/Library/Application Support/go/env"
    GOEXE=""
    GOFLAGS=""
    GOHOSTARCH="amd64"
    GOHOSTOS="darwin"
    GONOPROXY=""
    GONOSUMDB=""
    GOOS="darwin"
    GOPATH="/Volumes/home/Development files/Go files/"
    GOPRIVATE=""
    GOPROXY="https://proxy.golang.org,direct"
    GOROOT="/usr/local/go"
    GOSUMDB="sum.golang.org"
    GOTMPDIR=""
    GOTOOLDIR="/usr/local/go/pkg/tool/darwin_amd64"
    GCCGO="gccgo"
    AR="ar"
    CC="clang"
    CXX="clang++"
    CGO_ENABLED="1"
    GOMOD=""
    CGO_CFLAGS="-g -O2"
    CGO_CPPFLAGS=""
    CGO_CXXFLAGS="-g -O2"
    CGO_FFLAGS="-g -O2"
    CGO_LDFLAGS="-g -O2"
    PKG_CONFIG="pkg-config"
    GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/0b/ntd9fcqx6xn6nt3gpv4yndm40000gn/T/go-build712170545=/tmp/go-build -gno-record-gcc-switches -fno-common"
    
    go version go1.13 darwin/amd64
    
    tesseract 4.1.0
     leptonica-1.78.0
      libgif 5.1.4 : libjpeg 9c : libpng 1.6.37 : libtiff 4.0.10 : zlib 1.2.11 : libwebp 1.0.3 : libopenjp2 2.3.1
     Found AVX2
     Found AVX
     Found SSE
    

    Source

    package main
    
    import (
    	"fmt"
    	"github.com/otiai10/gosseract"
    )
    
    func main() {
    	client := gosseract.NewClient()
    	defer client.Close()
    	client.SetLanguage("deu");
    	client.SetImage("test.png")
    	text, _ := client.Text()
    	fmt.Println(text)
    }
    

    Compile command

    env GOOS=freebsd GOARCH=amd64 go build ocrtest.go
    

    Error Message

    # command-line-arguments
    ./ocrtest.go:9:12: undefined: gosseract.NewClient
    
    opened by frankble 8
  • macOS complie linux binary file failed

    macOS complie linux binary file failed

    Summary

    I'm using macOS and going to compile binary file for centos7.1, and build failed, show message: undefined: gosseract.NewClient. Thanks.

    Reproducibility

    Reproducility Frequency

    100%

    Environment

    uname -a
    

    Darwin AllenChen-MacBookPro.local 18.2.0 Darwin Kernel Version 18.2.0: Fri Oct 5 19:41:49 PDT 2018; root:xnu-4903.221.2~2/RELEASE_X86_64 x86_64

    go env
    

    GOARCH="amd64" GOBIN="/Users/allen/Documents/go/bin" GOCACHE="/Users/allen/Library/Caches/go-build" GOEXE="" GOFLAGS="" GOHOSTARCH="amd64" GOHOSTOS="darwin" GOOS="darwin" GOPATH="/Users/allen/Documents/go" GOPROXY="" GORACE="" GOROOT="/usr/local/go" GOTMPDIR="" GOTOOLDIR="/usr/local/go/pkg/tool/darwin_amd64" GCCGO="gccgo" CC="clang" CXX="clang++" CGO_ENABLED="1" GOMOD="" CGO_CFLAGS="-g -O2" CGO_CPPFLAGS="" CGO_CXXFLAGS="-g -O2" CGO_FFLAGS="-g -O2" CGO_LDFLAGS="-g -O2" PKG_CONFIG="pkg-config" GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/s6/mqbnh2yn52b5glrvhk53jyxh0000gn/T/go-build032873897=/tmp/go-build -gno-record-gcc-switches -fno-common"

    go version
    

    go version go1.11 darwin/amd64

    tesseract --version
    

    tesseract 4.0.0 leptonica-1.76.0 libjpeg 9c : libpng 1.6.35 : libtiff 4.0.9 : zlib 1.2.11 Found AVX Found SSE

    opened by czh0318 8
  • tesseract/baseapi.h: No such file or directory

    tesseract/baseapi.h: No such file or directory

    I'd like to use tesseract with go on Windows 7.

    During the installation process, as stated in the docs I execute

    c:\go\src\proj>go get github.com/otiai10/gosseract
    # github.com/otiai10/gosseract/tesseract
    C:\go\src\github.com\otiai10\gosseract\tesseract\tess.cpp:1:31: fatal error: tesseract/baseapi.h: No such file or directory
     #include <tesseract/baseapi.h>
                                   ^
    compilation terminated.
    

    And by searching the file system for the header file baseapi.h, I cannot find it.

    How can I solve this? Thank you

    question 
    opened by tobiassoltermann 8
  • gosseract finds no text where tesseract does

    gosseract finds no text where tesseract does

    Summary

    I am running tesseract and gosseract on the same image, a single line of text. Tesseract finds the text, gosseract does not.

    Reproducibility

    Reproducibility Frequency

    • 100%
    1. Run tesseract d2.pbm - --psm 13 and it will show the output
    2. Run go run main.go and it will not show any output

    go.mod:

    module gosstest
    
    go 1.19
    
    require github.com/otiai10/gosseract/v2 v2.4.0
    

    main.go:

    package main
    
    import (
    	"fmt"
    	"os"
    
    	"github.com/otiai10/gosseract/v2"
    )
    
    func main() {
    	const (
    		want     = "BPJAZGAP"
    		filename = "d2.pbm"
    	)
    	buf, err := os.ReadFile("d2.pbm")
    	if err != nil {
    		fmt.Fprintf(os.Stderr, "error reading %q: %v\n", filename, err)
    	}
    
    	fmt.Fprintln(os.Stderr, gosseract.Version())
    
    	ocr := gosseract.NewClient()
    	defer ocr.Close()
    	ocr.SetPageSegMode(gosseract.PSM_RAW_LINE) // --psm 13
    	ocr.SetImageFromBytes(buf)
    	got, err := ocr.Text()
    	if err != nil {
    		fmt.Fprintf(os.Stderr, "%v\n", err)
    	}
    
    	if want != got {
    		fmt.Fprintf(os.Stderr, "want %q but got %q", want, got)
    	}
    
    	fmt.Println(got)
    }
    

    d2.pbm:

    P1 42 8
    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    0 1 1 1 0 0 1 1 1 0 0 0 0 1 1 0 0 1 1 0 0 1 1 1 1 0 0 1 1 0 0 0 1 1 0 0 1 1 1 0 0 0
    0 1 0 0 1 0 1 0 0 1 0 0 0 0 1 0 1 0 0 1 0 0 0 0 1 0 1 0 0 1 0 1 0 0 1 0 1 0 0 1 0 0
    0 1 1 1 0 0 1 0 0 1 0 0 0 0 1 0 1 0 0 1 0 0 0 1 0 0 1 0 0 0 0 1 0 0 1 0 1 0 0 1 0 0
    0 1 0 0 1 0 1 1 1 0 0 0 0 0 1 0 1 1 1 1 0 0 1 0 0 0 1 0 1 1 0 1 1 1 1 0 1 1 1 0 0 0
    0 1 0 0 1 0 1 0 0 0 0 1 0 0 1 0 1 0 0 1 0 1 0 0 0 0 1 0 0 1 0 1 0 0 1 0 1 0 0 0 0 0
    0 1 1 1 0 0 1 0 0 0 0 0 1 1 0 0 1 0 0 1 0 1 1 1 1 0 0 1 1 1 0 1 0 0 1 0 1 0 0 0 0 0
    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
    

    Environment

    Linux chieftec 6.0.15-300.fc37.x86_64 #1 SMP PREEMPT_DYNAMIC Wed Dec 21 18:33:23 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
    
    GO111MODULE=""
    GOARCH="amd64"
    GOBIN=""
    GOCACHE="/home/jot/.cache/go-build"
    GOENV="/home/jot/.config/go/env"
    GOEXE=""
    GOEXPERIMENT=""
    GOFLAGS=""
    GOHOSTARCH="amd64"
    GOHOSTOS="linux"
    GOINSECURE=""
    GOMODCACHE="/home/jot/go/pkg/mod"
    GONOPROXY=""
    GONOSUMDB=""
    GOOS="linux"
    GOPATH="/home/jot/go"
    [project.zip](https://github.com/otiai10/gosseract/files/10347142/project.zip)
    
    GOPRIVATE=""
    GOPROXY="direct"
    GOROOT="/usr/lib/golang"
    GOSUMDB="off"
    GOTMPDIR=""
    GOTOOLDIR="/usr/lib/golang/pkg/tool/linux_amd64"
    GOVCS=""
    GOVERSION="go1.19.4"
    GCCGO="gccgo"
    GOAMD64="v1"
    AR="ar"
    CC="gcc"
    CXX="g++"
    CGO_ENABLED="1"
    GOMOD="/home/jot/work/gosseract/go.mod"
    GOWORK=""
    CGO_CFLAGS="-g -O2"
    CGO_CPPFLAGS=""
    CGO_CXXFLAGS="-g -O2"
    CGO_FFLAGS="-g -O2"
    CGO_LDFLAGS="-g -O2"
    PKG_CONFIG="pkg-config"
    GOGCCFLAGS="-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build1563688176=/tmp/go-build -gno-record-gcc-switches"
    
    go version go1.19.4 linux/amd64
    
    tesseract 5.2.0
     leptonica-1.82.0
      libgif 5.2.1 : libjpeg 6b (libjpeg-turbo 2.1.3) : libpng 1.6.37 : libtiff 4.4.0 : zlib 1.2.12 : libwebp 1.2.4
     Found AVX2
     Found AVX
     Found FMA
     Found SSE4.1
    
    opened by jhinrichsen 1
  • CI on Windows

    CI on Windows

    • https://github.com/otiai10/gosseract/issues/251
    • https://github.com/otiai10/gosseract/issues/200
    • https://github.com/otiai10/gosseract/issues/240
    • https://github.com/otiai10/gosseract/issues/199
    • https://github.com/otiai10/gosseract/issues/132
    • https://github.com/otiai10/gosseract/issues/234
    • https://github.com/otiai10/gosseract/issues/215
    • https://github.com/otiai10/gosseract/issues/233
    • https://github.com/otiai10/gosseract/issues/223
    • https://github.com/otiai10/gosseract/issues/226
    • and more
    opened by otiai10 0
  • Win11 compiler error

    Win11 compiler error

    This text is generated based on ISSUE_TEMPLATE.md. The issue reporter must read and remove this block before submitting.

    Summary

    Go Compilation Error (in tessbridge.cpp:5): fatal error: leptonica/allheaders.h: No such file or directory

    Reproducibility

    Reproducibility Frequency

    • XX%

    Reproducible Dockerfile

    FROM your-os:your-version
    # Describe how to reproduce your problem
    # on your environment
    

    Otherwise, describe how to reproduce

    1. foo bar
    2. spam ham
    3. hoge fuga

    Environment

    uname -a
    

    Windows 11

    go env
    

    set GO111MODULE=on set GOARCH=amd64 set GOBIN= set GOCACHE=C:\Users\Administrator\AppData\Local\go-build set GOENV=C:\Users\Administrator\AppData\Roaming\go\env set GOEXE=.exe set GOEXPERIMENT= set GOFLAGS= set GOHOSTARCH=amd64 set GOHOSTOS=windows set GOINSECURE= set GOMODCACHE=E:\Go\pkg\mod set GONOPROXY= set GONOSUMDB= set GOOS=windows set GOPATH=E:\Go set GOPRIVATE= set GOPROXY=https://goproxy.cn,direct set GOROOT=D:\Program Files\Go set GOSUMDB=sum.golang.org set GOTMPDIR= set GOTOOLDIR=D:\Program Files\Go\pkg\tool\windows_amd64 set GOVCS= set GOVERSION=go1.18.3 set GCCGO=gccgo set GOAMD64=v1 set AR=ar set CC=gcc set CXX=g++ set CGO_ENABLED=1 set GOMOD=E:\Go\src\ERMS\go.mod set GOWORK= set CGO_CFLAGS=-g -O2 set CGO_CPPFLAGS= set CGO_CXXFLAGS=-g -O2 set CGO_FFLAGS=-g -O2 set CGO_LDFLAGS=-g -O2 set PKG_CONFIG=pkg-config set GOGCCFLAGS=-m64 -mthreads -fmessage-length=0 -fdebug-prefix-map=C:\Users\ADMINI~1\AppData\Local\Temp\go-build2580212505=/tmp/go-build -gno-record-gcc-switches

    go version
    

    go version go1.18.3 windows/amd64

    tesseract --version
    

    tesseract v5.1.0.20220510 leptonica-1.78.0 libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.5.3) : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.3.0 Found AVX512BW Found AVX512F Found AVX2 Found AVX Found FMA Found SSE4.1 Found libarchive 3.5.0 zlib/1.2.11 liblzma/5.2.3 bz2lib/1.0.6 liblz4/1.7.5 libzstd/1.4.5 Found libcurl/7.77.0-DEV Schannel zlib/1.2.11 zstd/1.4.5 libidn2/2.0.4 nghttp2/1.31.0

    opened by zzdboy 0
  • add a finalizer to close the client

    add a finalizer to close the client

    If the developer forgets to call the close method after creating the client, it will cause a memory leak.

    To avoid this, I refer to the method in os.File. By adding a finalizer, the Close method will be called when the client is unreachable and the developer haven't call the Close method neither.

    Test

    client.go

    // NewClient construct new Client. It's due to caller to Close this client.
    func NewClient() *Client {
    	client := &Client{
    		api:        C.Create(),
    		Variables:  map[SettableVariable]string{},
    		Trim:       true,
    		shouldInit: true,
    		Languages:  []string{"eng"},
    	}
    	// set a finalizer to close the client when it's unused and not closed by the user
    	runtime.SetFinalizer(client, (*Client).Close)
    	return client
    }
    
    // Close frees allocated API. This MUST be called for ANY client constructed by "NewClient" function.
    func (client *Client) Close() (err error) {
    	// defer func() {
    	// 	if e := recover(); e != nil {
    	// 		err = fmt.Errorf("%v", e)
    	// 	}
    	// }()
    	fmt.Println("Closed")
    	C.Clear(client.api)
    	C.Free(client.api)
    	if client.pixImage != nil {
    		C.DestroyPixImage(client.pixImage)
    		client.pixImage = nil
    	}
    	// no need for a finalizer anymore
    	runtime.SetFinalizer(client, nil)
    	return err
    }
    

    test code

    func main() {
    	runGgosseract()
    	runtime.GC() // run a garbage collection
    	time.Sleep(2 * time.Second)
    	// see "Close" before "exit"
    	fmt.Println("exit")
    }
    
    func runGgosseract() {
    	client := gosseract.NewClient()
    	client.SetImage("path/to/image.png")
    	text, _ := client.Text()
    	fmt.Println(text)
    }
    
    opened by yin1999 1
  • Fix the docker build to download project source files

    Fix the docker build to download project source files

    The gosseract source files were not being downloaded during the Docker build process so the go test step was failing. Setting the environment variable fixes the issue and allows correct building of the docker image.

    opened by mrisher23 1
  • failed to initialize TessBaseAPI with code -1:

    failed to initialize TessBaseAPI with code -1:

    This text is generated based on ISSUE_TEMPLATE.md. The issue reporter must read and remove this block before submitting.

    Summary

    • I install gosseract at win10 . use MSYS2 with Mingw64 to install tesseract , leptonica module銆 and finally install gosseract use command 'go get -t go get -t github.com/otiai10/gosseract/v2 '

    when i build my test project success , eventually throw a exception : `failed to initialize TessBaseAPI with code -1: '

    then I go to the install directory at go path , run the go test , also get the same error 锛 all_test.go:144 Expected to be <nil> But actual failed to initialize TessBaseAPI with code -1:

    I can do nothing , becasue there are haven't any message with that code . pls help ! 3q !

    Reproducibility

    Reproducibility Frequency

    • 100%

    Reproducible Dockerfile

    FROM your-os:your-version
    # Describe how to reproduce your problem
    # on your environment
    

    Otherwise, describe how to reproduce

    Environment

    win10

    uname -a

    
    

    go env

    go env

    set GO111MODULE=auto set GOARCH=amd64 set GOBIN= set GOCACHE=C:\Users\langxli\AppData\Local\go-build set GOENV=C:\Users\langxli\AppData\Roaming\go\env set GOEXE=.exe set GOEXPERIMENT= set GOFLAGS= set GOHOSTARCH=amd64 set GOHOSTOS=windows set GOINSECURE= set GOMODCACHE=C:\Users\langxli\go\pkg\mod set GONOPROXY= set GONOSUMDB= set GOOS=windows set GOPATH=C:\Users\langxli\go;D:\work\brick\brick_app_project set GOPRIVATE= set GOPROXY=https://proxy.golang.org,direct set GOROOT=C:\Program Files\Go set GOSUMDB=sum.golang.org set GOTMPDIR= set GOTOOLDIR=C:\Program Files\Go\pkg\tool\windows_amd64 set GOVCS= set GOVERSION=go1.17 set GCCGO=gccgo set AR=ar set CC=gcc set CXX=g++ set CGO_ENABLED=1 set GOMOD=C:\Users\langxli\go\pkg\mod\github.com\otiai10\gosseract\[email protected]\go.mod set CGO_CFLAGS=-g -O2 set CGO_CPPFLAGS= set CGO_CXXFLAGS=-g -O2 set CGO_FFLAGS=-g -O2 set CGO_LDFLAGS=-g -O2 set PKG_CONFIG=pkg-config set GOGCCFLAGS=-m64 -mthreads -fmessage-length=0 -fdebug-prefix-map=D:\msys64\tmp\go-build2794601340=/tmp/go-build -gno-record-gcc-switches

    go version
    # go version
    go version go1.17 windows/amd64
    
    
    

    tesseract --version

    tesseract --version

    tesseract 4.1.1 leptonica-1.81.1 libgif 5.2.1 : libjpeg 8d (libjpeg-turbo 2.0.6) : libpng 1.6.37 : libtiff 4.3.0 : zlib 1.2.11 : libwebp 1.2.2 : libopenjp2 2.4.0

    opened by langxlm 2
Releases(v2.3.1)
Owner
Hiromu OCHIAI
馃檵 鉂わ笍 馃崳
Hiromu OCHIAI
It is a image ocr tool using the Tesseract-OCR engine with the pytesseract package and has a GUI.

OCR-Tool It is a image ocr tool made in Python using the Tesseract-OCR engine with the pytesseract package and has a GUI. This is my second ever pytho

Khant Htet Aung 4 Jul 11, 2022
ISI's Optical Character Recognition (OCR) software for machine-print and handwriting data

VistaOCR ISI's Optical Character Recognition (OCR) software for machine-print and handwriting data Publications "How to Efficiently Increase Resolutio

ISI Center for Vision, Image, Speech, and Text Analytics 21 Dec 8, 2021
Provides OCR (Optical Character Recognition) services through web applications

OCR4all As suggested by the name one of the main goals of OCR4all is to allow basically any given user to independently perform OCR on a wide variety

null 174 Dec 31, 2022
A collection of resources (including the papers and datasets) of OCR (Optical Character Recognition).

OCR Resources This repository contains a collection of resources (including the papers and datasets) of OCR (Optical Character Recognition). Contents

Zuming Huang 363 Jan 3, 2023
Indonesian ID Card OCR using tesseract OCR

KTP OCR Indonesian ID Card OCR using tesseract OCR KTP OCR is python-flask with tesseract web application to convert Indonesian ID Card to text / JSON

Revan Muhammad Dafa 5 Dec 6, 2021
Programa que viabiliza a OCR (Optical Character Reading - leitura 贸ptica de caracteres) de um PDF.

Este programa tem o intuito de ser um modificador de arquivos PDF. Os arquivos PDFs podem ser 3: PDFs verdadeiros - em que podem ser selecionados o ti

Daniel Soares Saldanha 2 Oct 11, 2021
A curated list of resources for text detection/recognition (optical character recognition ) with deep learning methods.

awesome-deep-text-detection-recognition A curated list of awesome deep learning based papers on text detection and recognition. Text Detection Papers

null 2.4k Jan 8, 2023
Text recognition (optical character recognition) with deep learning methods.

What Is Wrong With Scene Text Recognition Model Comparisons? Dataset and Model Analysis | paper | training and evaluation data | failure cases and cle

Clova AI Research 3.2k Jan 4, 2023
Extract tables from scanned image PDFs using Optical Character Recognition.

ocr-table This project aims to extract tables from scanned image PDFs using Optical Character Recognition. Install Requirements Tesseract OCR sudo apt

Abhijeet Singh 209 Dec 6, 2022
A Screen Translator/OCR Translator made by using Python and Tesseract, the user interface are made using Tkinter. All code written in python.

About An OCR translator tool. Made by me by utilizing Tesseract, compiled to .exe using pyinstaller. I made this program to learn more about python. I

Fauzan F A 41 Dec 30, 2022
This is a GUI for scrapping PDFs with the help of optical character recognition making easier than ever to scrape PDFs.

pdf-scraper-with-ocr With this tool I am aiming to facilitate the work of those who need to scrape PDFs either by hand or using tools that doesn't imp

Jacobo Jos茅 Guijarro Villalba 75 Oct 21, 2022
Optical character recognition for Japanese text, with the main focus being Japanese manga

Manga OCR Optical character recognition for Japanese text, with the main focus being Japanese manga. It uses a custom end-to-end model built with Tran

Maciej Budy艣 327 Jan 1, 2023
python ocr using tesseract/ with EAST opencv detector

pytextractor python ocr using tesseract/ with EAST opencv text detector Uses the EAST opencv detector defined here with pytesseract to extract text(de

Danny Crasto 38 Dec 5, 2022
A bot that extract text from images using the Tesseract OCR.

Text from image (OCR) @ocr_text_bot A simple bot to extract text from images. Usage What do I need? A AWS key configured locally, see here. NodeJS. I

Weverton Marques 4 Aug 6, 2021
This pyhton script converts a pdf to Image then using tesseract as OCR engine converts Image to Text

Script_Convertir_PDF_IMG_TXT Este script de pyhton convierte un pdf en Imagen luego utilizando tesseract como motor OCR convierte la Imagen a Texto. p

alebogado 1 Jan 27, 2022
A Python wrapper for the tesseract-ocr API

tesserocr A simple, Pillow-friendly, wrapper around the tesseract-ocr API for Optical Character Recognition (OCR). tesserocr integrates directly with

Fayez 1.7k Dec 31, 2022
Run tesseract with the tesserocr bindings with @OCR-D's interfaces

ocrd_tesserocr Crop, deskew, segment into regions / tables / lines / words, or recognize with tesserocr Introduction This package offers OCR-D complia

OCR-D 38 Oct 14, 2022
Tesseract Open Source OCR Engine (main repository)

Tesseract OCR About This package contains an OCR engine - libtesseract and a command line program - tesseract. Tesseract 4 adds a new neural net (LSTM

null 48.4k Jan 9, 2023
Awesome multilingual OCR toolkits based on PaddlePaddle 锛坧ractical ultra lightweight OCR system, provide data annotation and synthesis tools, support training and deployment among server, mobile, embedded and IoT devices锛

English | 绠浣撲腑鏂 Introduction PaddleOCR aims to create multilingual, awesome, leading, and practical OCR tools that help users train better models and a

null 27.5k Jan 8, 2023