Ocular

Ocular is a state-of-the-art historical OCR system.

Its primary features are:

  • Unsupervised learning of unknown fonts: requires only document images and a corpus of text.
  • Ability to handle noisy documents: inconsistent inking, spacing, vertical alignment, etc.
  • Support for multilingual documents, including those that have considerable word-level code-switching.
  • Unsupervised learning of orthographic variation patterns including archaic spellings and printer shorthand.
  • Simultaneous, joint transcription into both diplomatic (literal) and normalized forms.

It is described in the following publications:

Unsupervised Transcription of Historical Documents [pdf]
Taylor Berg-Kirkpatrick, Greg Durrett, and Dan Klein
ACL 2013

Improved Typesetting Models for Historical OCR [pdf]
Taylor Berg-Kirkpatrick and Dan Klein
ACL 2014

Unsupervised Code-Switching for Multilingual Historical Document Transcription [pdf] [data]
Dan Garrette, Hannah Alpert-Abrams, Taylor Berg-Kirkpatrick, and Dan Klein
NAACL 2015

An Unsupervised Model of Orthographic Variation for Historical Document Transcription [pdf] [data]
Dan Garrette and Hannah Alpert-Abrams
NAACL 2016

Continued development of Ocular is supported in part by a Digital Humanities Implementation Grant from the National Endowment for the Humanities for the project Reading the First Books: Multilingual, Early-Modern OCR for Primeros Libros.

Contents of this README

  1. Quick-Start Guide
  2. All Command-Line Options

1. Quick-Start Guide

Obtaining Ocular

The easiest way to get the Ocular software is to download the self-contained jar from http://www.dhgarrette.com/maven-repository/snapshots/edu/berkeley/cs/nlp/ocular/0.3-SNAPSHOT/ocular-0.3-SNAPSHOT-with_dependencies.jar

Once you have this jar, you will be able to run Ocular according to the instructions below in the Using Ocular section; the code in this repository is not a requirement if all you'd like to do is run the software.

The jar is executable, so to use Ocular you will run it following this template (where [MAIN-CLASS] specifies which program to run, as detailed in the Using Ocular section below):

java -Done-jar.main.class=[MAIN-CLASS] -mx7g -jar ocular-0.3-SNAPSHOT-with_dependencies.jar [options...]

This jar includes all the necessary dependencies, so you should be able to move it to, and run it from, wherever you like.
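
When scripting several runs, the template above can also be assembled programmatically. The following is a minimal Python sketch and is not part of Ocular: the helper name ocular_command and its parameters are hypothetical, and it only builds the argument list shown in the template.

```python
import shlex

JAR = "ocular-0.3-SNAPSHOT-with_dependencies.jar"  # jar file name used in this README


def ocular_command(main_class, options, jar=JAR, memory="7g"):
    """Build the argument list for one Ocular run (hypothetical helper).

    `options` maps option names, without the leading dash, to their values.
    """
    cmd = ["java", f"-Done-jar.main.class={main_class}", f"-mx{memory}", "-jar", jar]
    for name, value in options.items():
        if isinstance(value, bool):
            value = "true" if value else "false"  # this README writes booleans in lowercase
        cmd += [f"-{name}", str(value)]
    return cmd


cmd = ocular_command(
    "edu.berkeley.cs.nlp.ocular.main.InitializeLanguageModel",
    {"inputTextPath": "texts/pg2600.txt", "outputLmPath": "lm/english.lmser"},
)
print(shlex.join(cmd))  # shell-quoted form, for review before running
```

The resulting list can be handed to subprocess.run(cmd, check=True) once the printed command looks right.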

Optional: Building Ocular from source code

Clone this repository, and compile the project into a jar:

git clone https://github.com/tberg12/ocular.git
cd ocular
./make_jar.sh

This creates the same ocular-0.3-SNAPSHOT-with_dependencies.jar file discussed above, so it is sufficient for running Ocular according to the detailed instructions in the Using Ocular section below.

As above, since this jar includes all the necessary dependencies, you should be able to move it wherever you like, without the rest of the contents of this repository.

Compiling to an executable script instead of jar

Alternatively, if you do not wish to create the entire jar, you can run make_run_script.sh, which compiles the code and generates an executable script target/start. This script can be used directly, in lieu of the jar file. Thus to run Ocular, it is sufficient to run the make_run_script.sh script and then use the following template instead of the template given above:

export JAVA_OPTS="-mx7g"     # Increase the available memory
target/start [MAIN-CLASS] [options...]

Optional: Obtaining Ocular via a dependency management system

To incorporate Ocular into a larger project, you may use a dependency management system like Maven or SBT with the following information:

Repository location: http://www.dhgarrette.com/maven-repository/snapshots
Group ID: edu.berkeley.cs.nlp
Artifact ID: ocular
Version: 0.3-SNAPSHOT

Using Ocular

  1. Initialize a language model:

Acquire some files with text written in the language(s) of your documents. For example, download a book in English. The path specified by -inputTextPath should point to a text file or directory or directory hierarchy of text files; the path will be searched recursively for files. Use -outputLmPath to specify where the trained LM should be written.

  java -Done-jar.main.class=edu.berkeley.cs.nlp.ocular.main.InitializeLanguageModel -mx7g -jar ocular-0.3-SNAPSHOT-with_dependencies.jar \
    -inputTextPath texts/pg2600.txt \
    -outputLmPath lm/english.lmser

For a multilingual (code-switching) model, specify multiple -inputTextPath entries, each composed of a language name and a path to files containing text in that language. For example, a combined Spanish/Latin/Nahuatl model might be trained as follows:

  java -Done-jar.main.class=edu.berkeley.cs.nlp.ocular.main.InitializeLanguageModel -mx7g -jar ocular-0.3-SNAPSHOT-with_dependencies.jar \
    -inputTextPath "spanish->texts/sp/,latin->texts/la/,nahuatl->texts/na/" \
    -outputLmPath lm/trilingual.lmser

This program will work with any languages, and any number of languages; simply add an entry for every relevant language. The set of languages chosen should match the set of languages found in the documents that are to be transcribed.
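
The "language->path" value for -inputTextPath can be generated from a mapping when the set of languages is kept in a script or configuration file. The sketch below is illustrative only: the function name language_map is hypothetical, and rejecting names or values that contain the separators is an assumption made here for safety, not documented Ocular behavior.

```python
def language_map(entries):
    """Join {language: value} pairs into the "lang->value,lang->value" form."""
    for language, value in entries.items():
        # Assumption: "," and "->" are separators, so they cannot appear in an entry.
        if "," in language or "->" in language or "," in str(value):
            raise ValueError(f"entry cannot be encoded: {language!r} -> {value!r}")
    return ",".join(f"{language}->{value}" for language, value in entries.items())


text_paths = language_map(
    {"spanish": "texts/sp/", "latin": "texts/la/", "nahuatl": "texts/na/"}
)
print(text_paths)
```

In the shell commands above, the surrounding quotes keep the shell from treating > as a redirection. When the value is passed as a single element of an argument list (for example through Python's subprocess module), no extra quoting is needed. The same form is used by other options that take language/value pairs, such as -charNgramLength.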

More details on the various command-line options can be found below.

  2. Initialize a font:

Before a font can be trained from texts, a font model consisting of a "guess" for each character must be initialized based on the fonts on your computer. Use -outputFontPath to specify where the initialized font should be written. Since different languages use different character sets, a language model must be given in order for the system to know what characters to initialize (-inputLmPath).

  java -Done-jar.main.class=edu.berkeley.cs.nlp.ocular.main.InitializeFont -mx7g -jar ocular-0.3-SNAPSHOT-with_dependencies.jar \
    -inputLmPath lm/trilingual.lmser \
    -outputFontPath font/trilingual-init.fontser

  3. Train a font:

To train a font, a set of document pages must be given (-inputDocPath), along with the paths to the language model and initial font model. Use -outputFontPath to specify where the trained font model should be written, and -outputPath to specify where transcriptions and (optional) evaluation metrics should be written. The path specified by -inputDocPath should point to a pdf or image file or directory or directory hierarchy of such files. The value given by -inputDocPath will be searched recursively for non-.txt files; the transcriptions written to the -outputPath will maintain the same directory hierarchy.

  java -Done-jar.main.class=edu.berkeley.cs.nlp.ocular.main.TrainFont -mx7g -jar ocular-0.3-SNAPSHOT-with_dependencies.jar \
    -inputFontPath font/trilingual-init.fontser \
    -inputLmPath lm/trilingual.lmser \
    -inputDocPath sample_images/advertencias \
    -numDocs 10 \
    -outputFontPath font/advertencias/trained.fontser \
    -outputPath train_output

Since the font trainer takes in a font model (-inputFontPath) and outputs a new and improved font model (-outputFontPath), TrainFont can be run multiple times, passing the output of one round back in as the input of the next, to continue making improvements.
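
Chaining several TrainFont runs means that each round's -outputFontPath becomes the next round's -inputFontPath. The following Python sketch only computes those path pairs; the function name training_rounds and the round-numbered file names are hypothetical choices for illustration.

```python
def training_rounds(initial_font, output_dir, rounds):
    """Return (input_font, output_font) path pairs, one per TrainFont run.

    Each round's output font is the next round's input font. The
    "trained-roundN.fontser" names are a hypothetical naming scheme.
    """
    pairs = []
    current = initial_font
    for i in range(1, rounds + 1):
        trained = f"{output_dir}/trained-round{i}.fontser"
        pairs.append((current, trained))
        current = trained
    return pairs


for font_in, font_out in training_rounds(
    "font/trilingual-init.fontser", "font/advertencias", 3
):
    print(f"-inputFontPath {font_in} -outputFontPath {font_out}")
```

Writing each round to its own file, rather than overwriting a single trained.fontser, keeps earlier models available if a later round turns out worse.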

Many more command-line options, including several that affect speed and accuracy, can be found below.

Optional: Glyph substitution modeling for variable orthography

Ocular has the optional ability to learn, unsupervised, a mapping from archaic orthography to the orthography reflected in the trained language model. We call this a "glyph substitution model" (GSM). To train a GSM, add the -allowGlyphSubstitution, -updateGsm and -outputGsmPath options. If no -inputGsmPath is given, a new GSM will be created and then trained along with the font.

  java -Done-jar.main.class=edu.berkeley.cs.nlp.ocular.main.TrainFont -mx7g -jar ocular-0.3-SNAPSHOT-with_dependencies.jar \
    -inputFontPath font/trilingual-init.fontser \
    -inputLmPath lm/trilingual.lmser \
    -inputDocPath sample_images/advertencias \
    -numDocs 10 \
    -outputFontPath font/advertencias/trained.fontser \
    -outputPath train_output \
    -allowGlyphSubstitution true \
    -updateGsm true \
    -outputGsmPath gsm/advertencias/trained.gsmser

If -allowGlyphSubstitution is set to true, Ocular will produce simultaneous dual transcriptions: one diplomatic (literal) and one normalized to match the LM training data's orthography.

  4. Transcribe some pages:

To transcribe pages, -inputFontPath should point to the newly-trained font model (the -outputFontPath from the training step, instead of the "initial" font model used during font training).

  java -Done-jar.main.class=edu.berkeley.cs.nlp.ocular.main.Transcribe -mx7g -jar ocular-0.3-SNAPSHOT-with_dependencies.jar \
    -inputDocPath sample_images/advertencias \
    -inputLmPath lm/trilingual.lmser \
    -inputFontPath font/advertencias/trained.fontser \
    -outputPath transcribe_output 

As above, if -allowGlyphSubstitution is set to true and the -inputGsmPath is given, Ocular will produce simultaneous dual transcriptions: one diplomatic (literal) and one normalized to match the LM training data's orthography.

Many more command-line options, including several that affect speed and accuracy, can be found below. Among these, -skipAlreadyTranscribedDocs might be particularly useful.

Optional: Continued model improvements during transcription

Since training a model is done in an unsupervised fashion (it requires no gold transcriptions), the operation of transcribing is actually a subset of EM font training. Because of this, it is possible to make further improvements to the models during transcription, without having to make multiple iterations over the documents. This can be done by setting -updateFont to true, and -updateDocBatchSize to a reasonable number of training documents:

  java -Done-jar.main.class=edu.berkeley.cs.nlp.ocular.main.Transcribe -mx7g -jar ocular-0.3-SNAPSHOT-with_dependencies.jar \
    -inputDocPath sample_images/advertencias \
    -inputLmPath lm/trilingual.lmser \
    -inputFontPath font/advertencias/trained.fontser \
    -outputPath transcribe_output \
    -updateFont true \
    -updateDocBatchSize 50 \
    -outputFontPath font/advertencias/trained.fontser

The same can be done to update the glyph substitution model by passing in the previously-trained model (-inputGsmPath) and setting -updateGsm to true. For example, the following options can be appended to the command above:

    -allowGlyphSubstitution true \
    -inputGsmPath gsm/advertencias/trained.gsmser \
    -updateGsm true \
    -outputGsmPath gsm/advertencias/trained.gsmser

Optional: Checking accuracy with a gold transcription

If a gold standard transcription is available for a file, it should be written in a .txt file in the same directory as the corresponding image, and given the same filename (but with a different extension). These files will be used to evaluate the accuracy of the transcription (during either training or testing). Likewise, if a gold normalized transcription is available, it should be given the same filename, but with _normalized appended. For example:

  path/to/some/image_001.jpg              # document image
  path/to/some/image_001.txt              # corresponding transcription
  path/to/some/image_001_normalized.txt   # corresponding normalized transcription

For pdf files, the transcription filename is based on both the pdf filename and the relevant page number (as a 5-digit number):

  path/to/some/filename.pdf                            # document image
  path/to/some/filename_pdf_page00001.txt              # transcription of the document's first page
  path/to/some/filename_pdf_page00001_normalized.txt   # corresponding normalized transcription
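
The naming convention above can be expressed as a small helper, for example to check which documents already have gold files before an evaluation run. This is a Python sketch under the conventions stated in this section; the function name gold_paths is hypothetical and not part of Ocular.

```python
from pathlib import PurePosixPath


def gold_paths(document, page=None):
    """Return (transcription, normalized transcription) paths for a document.

    For image files `page` is omitted. For pdf files `page` is the
    1-based page number, written as five digits in the file name.
    """
    doc = PurePosixPath(document)
    if doc.suffix.lower() == ".pdf":
        if page is None or page < 1:
            raise ValueError("pdf documents need a page number of 1 or more")
        stem = f"{doc.stem}_pdf_page{page:05d}"
    else:
        stem = doc.stem
    plain = doc.with_name(stem + ".txt")
    normalized = doc.with_name(stem + "_normalized.txt")
    return str(plain), str(normalized)


print(gold_paths("path/to/some/image_001.jpg"))
print(gold_paths("path/to/some/filename.pdf", page=1))
```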

2. All Command-Line Options

InitializeLanguageModel

Required
  • -inputTextPath: Path to the text files (or directory hierarchies) for training the LM. For each entry, the entire directory will be recursively searched for any files whose names do not start with a period (.). For a multilingual (code-switching) model, give multiple comma-separated language/path pairs: "english->texts/english/,spanish->texts/spanish/,french->texts/french/". Be sure to wrap the whole string with "quotes". Required.

  • -outputLmPath: Output LM file path. Required.

Additional Options
  • -minCharCount: Number of times the character must be seen in order to be included. Default: 10

  • -insertLongS: Automatically insert "long s" characters into the language model training data? Default: false

  • -charNgramLength: LM character n-gram length. If just one language is used, or if all languages should use the same value, just give an integer. If languages can have different values, give them as comma-separated language/integer pairs: "english->6,spanish->4,french->4"; be sure to wrap the whole string with "quotes". Default: 6

  • -alternateSpellingReplacementPaths: Paths to Alternate Spelling Replacement files. If just a simple path is given, the replacements will be applied to all languages. For language-specific replacements, give multiple comma-separated language/path pairs: "english->rules/en.txt,spanish->rules/sp.txt,french->rules/fr.txt". Be sure to wrap the whole string with "quotes". Any languages for which no replacements are needed can be safely omitted. Default: No alternate spelling replacements.

Rarely Used Options
  • -removeDiacritics: Remove diacritics? Default: false

  • -pKeepSameLanguage: Prior probability of sticking with the same language when moving between words in a code-switching transition model. (Only relevant if multiple languages used.) Default: 0.999999

  • -languagePriors: Prior probability of each language; ignore for uniform priors. Give multiple comma-separated language/prior pairs: "english->0.7,spanish->0.2,french->0.1". Be sure to wrap the whole string with "quotes". (Only relevant if multiple languages used.) Default: Uniform priors.

  • -lmPower: Exponent on LM scores. Default: 4.0

  • -explicitCharacterSet: A set of valid characters. If a character with a diacritic is found but not in this set, the diacritic will be dropped. Other excluded characters will simply be dropped. Ignore to allow all characters. Default: Allow all characters.

  • -lmCharCount: Number of characters to use for training the LM. Use 0 to indicate that the full training data should be used. Default: Use all documents in full.

InitializeFont

Required
  • -inputLmPath: Path to the language model file (so that it knows which characters to create images for). Required.

  • -outputFontPath: Output font file path. Required.

Additional Options
  • -allowedFontsPath: Path to a file that contains a custom list of font names that may be used to initialize the font. The file should contain one font name per line. Default: Use all valid fonts found on the computer.

Rarely Used Options
  • -numFontInitThreads: Number of threads to use. Default: 8

  • -spaceMaxWidthFraction: Max space template width as fraction of text line height. Default: 1.0

  • -spaceMinWidthFraction: Min space template width as fraction of text line height. Default: 0.0

  • -templateMaxWidthFraction: Max template width as fraction of text line height. Default: 1.0

  • -templateMinWidthFraction: Min template width as fraction of text line height. Default: 0.0

TrainFont

Main Options
  • -inputDocPath: Path to the directory that contains the input document images. The entire directory will be searched recursively for any files that do not end in .txt (and that do not start with .). Files will be processed in lexicographical order. Default: Either inputDocPath or inputDocListPath is required.

  • -inputDocListPath: Path to a file that contains a list of paths to image files that should be used. The file should contain one path per line. Each path may point to either a file or a directory, which will be searched recursively for any files that do not end in .txt (and that do not start with .). Paths will be processed in the order given in the file, and each path will be searched in lexicographical order. Default: Either inputDocPath or inputDocListPath is required.

  • -inputFontPath: Path of the input font file. Required.

  • -inputLmPath: Path to the input language model file. Required.

  • -numDocs: Number of documents (pages) to use, counting alphabetically. Ignore or use 0 to use all documents. Default: Use all documents.

  • -numDocsToSkip: Number of training documents (pages) to skip over, counting alphabetically. Useful, in combination with -numDocs, if you want to break a directory of documents into several chunks. Default: 0

  • -numEMIters: Number of iterations of EM to use for font learning. Default: 3

  • -continueFromLastCompleteIteration: If true, the font trainer will find the latest completed iteration in the outputPath and load it in order to pick up training from that point. Convenient if a training run crashes when only partially completed. Default: false

  • -outputPath: Path of the directory that will contain output transcriptions. Required.

  • -outputFormats: Output formats to be generated. Choose from one or multiple of {dipl,norm,normlines,comp,html,alto}, comma-separated. dipl = diplomatic, norm = normalized (lines joined), normlines = normalized (separate lines), comp = comparisons. Default: dipl,norm if -allowGlyphSubstitution=true; dipl otherwise.

  • -outputFontPath: Path to write the learned font file to. Required if updateFont is set to true, otherwise ignored.
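
As noted for -numDocsToSkip, combining it with -numDocs lets a directory of documents be processed in several chunks. The Python sketch below computes the option pairs for each chunk; the function name chunk_options is hypothetical, and the total page count is assumed to be known in advance.

```python
def chunk_options(total_docs, chunk_size):
    """Return (numDocsToSkip, numDocs) pairs that together cover `total_docs` pages."""
    if total_docs < 1 or chunk_size < 1:
        raise ValueError("total_docs and chunk_size must be positive")
    return [
        (skip, min(chunk_size, total_docs - skip))
        for skip in range(0, total_docs, chunk_size)
    ]


for skip, num in chunk_options(25, 10):
    print(f"-numDocsToSkip {skip} -numDocs {num}")
# prints:
# -numDocsToSkip 0 -numDocs 10
# -numDocsToSkip 10 -numDocs 10
# -numDocsToSkip 20 -numDocs 5
```

The last chunk is given its exact remaining size, so the helper never emits -numDocs 0, which would mean "use all documents".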

Additional Options
  • -extractedLinesPath: Path of the directory where the line-extraction images should be read/written. If the line files exist here, they will be used; if not, they will be extracted and then written here. Useful if: 1) you plan to run Ocular on the same documents multiple times and you want to save some time by not re-extracting the lines, or 2) you use an alternate line extractor (such as Tesseract) to pre-process the document. If ignored, the document will simply be read from the original document image file, and no line images will be written. Default: Don't read or write line image files.

  • -updateDocBatchSize: Number of documents to process for each parameter update. This is useful if you are transcribing a large number of documents, and want to have Ocular slowly improve the model as it goes, which you would achieve with updateFont=true. Default: Update only after each full pass over the document set.

These options affect the speed of font training

  • -emissionEngine: Engine to use for inner loop of emission cache computation. DEFAULT: Uses Java on CPU, which works on any machine but is the slowest method. OPENCL: Faster engine that uses either the CPU or integrated GPU (depending on processor) and requires OpenCL installation. CUDA: Fastest method, but requires a discrete NVIDIA GPU and CUDA installation. Default: DEFAULT

  • -beamSize: Size of beam for Viterbi inference. (Usually in range 10-50. Increasing beam size can improve accuracy, but will reduce speed.) Default: 10

  • -markovVerticalOffset: Use Markov chain to generate vertical offsets. (Slower, but more accurate. Turning on Markov offsets may require a larger beam size for good results.) Default: false

Glyph Substitution Model Options

Glyph substitution is the feature that allows Ocular to use a probabilistic mapping from modern orthography (as used in the language model training text) to the orthography seen in the documents. If the glyph substitution feature is used, Ocular will jointly produce dual transcriptions: one that is an exact transcription of the document, and one that is a normalized version of the text.

  • -allowGlyphSubstitution: Should the model allow glyph substitutions? This includes substituted letters as well as letter elisions. Default: false

  • -inputGsmPath: Path to the input glyph substitution model file. (Only relevant if allowGlyphSubstitution is set to true.) Default: Don't use a pre-initialized GSM. (Learn one from scratch).

  • -updateGsm: Should the glyph substitution model be trained (or updated) along with the font? (Only relevant if allowGlyphSubstitution is set to true.) Default: false

  • -outputGsmPath: Path to write the retrained glyph substitution model file to. Required if updateGsm is set to true, otherwise ignored.

Language Model Training Options
  • -updateLM: Should the language model be updated along with the font? Default: false

  • -outputLmPath: Path to write the retrained language model file to. Required if updateLM is set to true, otherwise ignored.
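
Several options in this listing are described as "Required if ... is set to true": -outputFontPath with updateFont, -outputGsmPath with updateGsm, and -outputLmPath with updateLM. A script that assembles option sets could check these pairings before launching a long run. The following is a Python sketch of such a check; the function name missing_output_paths is hypothetical, and it only looks at flags that are explicitly present in the given options.

```python
# Each "update" flag and the output path that the option listing says it requires.
REQUIRED_OUTPUT = {
    "updateFont": "outputFontPath",
    "updateGsm": "outputGsmPath",
    "updateLM": "outputLmPath",
}


def missing_output_paths(options):
    """Return the output-path options that are required but not set."""
    missing = []
    for flag, output in REQUIRED_OUTPUT.items():
        # Flags that are absent are treated as not enabled (an assumption of this sketch).
        enabled = str(options.get(flag, "false")).lower() == "true"
        if enabled and not options.get(output):
            missing.append(output)
    return missing


options = {"allowGlyphSubstitution": "true", "updateGsm": "true"}
print(missing_output_paths(options))  # ['outputGsmPath']
```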

Line Extraction Options
  • -binarizeThreshold: Quantile to use for pixel value thresholding. (High values mean more black pixels.) Default: 0.12

  • -crop: Crop pages? Default: true

Evaluate During Training
  • -evalInputDocPath: When evaluation should be done during training (after each parameter update in EM), this is the path of the directory that contains the evaluation input document images. The entire directory will be recursively searched for any files that do not end in .txt (and that do not start with .). (Only relevant if updateFont is set to true.) Default: Do not evaluate during font training.

  • -evalNumDocs: When using -evalInputDocPath, this is the number of documents that will be evaluated on. Ignore or use 0 to use all documents. Default: Use all documents in the specified path.

  • -evalExtractedLinesPath: When using -evalInputDocPath, this is the path of the directory where the evaluation line-extraction images should be read/written. If the line files exist here, they will be used; if not, they will be extracted and then written here. Useful if: 1) you plan to run Ocular on the same documents multiple times and you want to save some time by not re-extracting the lines, or 2) you use an alternate line extractor (such as Tesseract) to pre-process the document. If ignored, the document will simply be read from the original document image file, and no line images will be written. Default: Don't read or write line image files.

  • -evalFreq: When using -evalInputDocPath, the font trainer will perform an evaluation every evalFreq iterations. Default: Evaluate only after all iterations have completed.

  • -evalBatches: When using -evalInputDocPath, on iterations in which we run the evaluation, should the evaluation be run after each batch, as determined by -updateDocBatchSize (in addition to after each iteration)? Default: false

Rarely Used Options
  • -allowLanguageSwitchOnPunct: Should the model be allowed to switch languages at punctuation? Default: true

  • -cudaDeviceID: GPU ID when using CUDA emission engine. Default: 0

  • -decodeBatchSize: Number of lines that compose a single decode batch. (Smaller batch size can reduce memory consumption.) Default: 32

  • -gsmElideAnything: Should the GSM be allowed to elide letters even without the presence of an elision-marking tilde? Default: false

  • -gsmElisionSmoothingCountMultiplier: gsmElisionSmoothingCountMultiplier. Default: 100.0

  • -gsmNoCharSubPrior: The prior probability of not-substituting the LM char. This includes substituted letters as well as letter elisions. Default: 0.9

  • -gsmPower: Exponent on GSM scores. Default: 4.0

  • -gsmSmoothingCount: The default number of counts that every glyph gets in order to smooth the glyph substitution model estimation. Default: 1.0

  • -paddingMaxWidth: Max horizontal padding between characters in pixels. (Best left at default value.) Default: 5

  • -paddingMinWidth: Min horizontal padding between characters in pixels. (Best left at default value.) Default: 1

  • -uniformLineHeight: Scale all lines to have the same height? Default: true

  • -numDecodeThreads: Number of threads to use for decoding. (More threads may increase speed, but may cause a loss of continuity across lines.) Default: 1

  • -numEmissionCacheThreads: Number of threads to use during emission cache computation. (Only has effect when emissionEngine is set to DEFAULT.) Default: 8

  • -numMstepThreads: Number of threads to use for L-BFGS during the M-step. Default: 8

Transcribe

Main Options
  • -inputDocPath: Path to the directory that contains the input document images. The entire directory will be searched recursively for any files that do not end in .txt (and that do not start with .). Files will be processed in lexicographical order. Default: Either inputDocPath or inputDocListPath is required.

  • -inputDocListPath: Path to a file that contains a list of paths to image files that should be used. The file should contain one path per line. Each path may point to either a file or a directory, which will be searched recursively for any files that do not end in .txt (and that do not start with .). Paths will be processed in the order given in the file, and each path will be searched in lexicographical order. Default: Either inputDocPath or inputDocListPath is required.

  • -inputFontPath: Path of the input font file. Required.

  • -inputLmPath: Path to the input language model file. Required.

  • -numDocs: Number of documents (pages) to use, counting alphabetically. Ignore or use 0 to use all documents. Default: Use all documents.

  • -numDocsToSkip: Number of training documents (pages) to skip over, counting alphabetically. Useful, in combination with -numDocs, if you want to break a directory of documents into several chunks. Default: 0

  • -skipAlreadyTranscribedDocs: If true, for each doc the outputPath will be checked for an existing transcription and if one is found then the document will be skipped. Default: false

  • -outputPath: Path of the directory that will contain output transcriptions. Required.

  • -outputFormats: Output formats to be generated. Choose from one or multiple of {dipl,norm,normlines,comp,html,alto}, comma-separated. dipl = diplomatic, norm = normalized (lines joined), normlines = normalized (separate lines), comp = comparisons. Default: dipl,norm if -allowGlyphSubstitution=true; dipl otherwise.

Additional Options
  • -extractedLinesPath: Path of the directory where the line-extraction images should be read/written. If the line files exist here, they will be used; if not, they will be extracted and then written here. Useful if: 1) you plan to run Ocular on the same documents multiple times and you want to save some time by not re-extracting the lines, or 2) you use an alternate line extractor (such as Tesseract) to pre-process the document. If ignored, the document will simply be read from the original document image file, and no line images will be written. Default: Don't read or write line image files.

  • -failIfAllDocsAlreadyTranscribed: If true, an exception will be thrown if all of the input documents have already been transcribed (and thus the job has nothing to do). Ignored unless -skipAlreadyTranscribedDocs=true. Default: false

These options affect the speed of transcription

  • -emissionEngine: Engine to use for inner loop of emission cache computation. DEFAULT: Uses Java on CPU, which works on any machine but is the slowest method. OPENCL: Faster engine that uses either the CPU or integrated GPU (depending on processor) and requires OpenCL installation. CUDA: Fastest method, but requires a discrete NVIDIA GPU and CUDA installation. Default: DEFAULT

  • -beamSize: Size of beam for Viterbi inference. (Usually in range 10-50. Increasing beam size can improve accuracy, but will reduce speed.) Default: 10

  • -markovVerticalOffset: Use Markov chain to generate vertical offsets. (Slower, but more accurate. Turning on Markov offsets may require a larger beam size for good results.) Default: false

Glyph Substitution Model Options

Glyph substitution is the feature that allows Ocular to use a probabilistic mapping from modern orthography (as used in the language model training text) to the orthography seen in the documents. If the glyph substitution feature is used, Ocular will jointly produce dual transcriptions: one that is an exact transcription of the document, and one that is a normalized version of the text.

  • -allowGlyphSubstitution: Should the model allow glyph substitutions? This includes substituted letters as well as letter elisions. Default: false

  • -inputGsmPath: Path to the input glyph substitution model file. (Only relevant if allowGlyphSubstitution is set to true.) Default: Don't use a pre-initialized GSM. (Learn one from scratch).

Model Updating Options
  • -updateDocBatchSize: Number of documents to process for each parameter update. This is useful if you are transcribing a large number of documents, and want to have Ocular slowly improve the model as it goes, which you would achieve with updateFont=true. Default: Update only after each full pass over the document set.

For updating the font model

  • -updateFont: Update the font during transcription based on the new input documents? Default: false

  • -outputFontPath: Path to write the learned font file to. Required if updateFont is set to true, otherwise ignored.

For updating the glyph substitution model

  • -updateGsm: Should the glyph substitution model be trained (or updated) along with the font? (Only relevant if allowGlyphSubstitution is set to true.) Default: false

  • -outputGsmPath: Path to write the retrained glyph substitution model file to. Required if updateGsm is set to true, otherwise ignored.

For updating the language model

  • -updateLM: Should the language model be updated along with the font? Default: false

  • -outputLmPath: Path to write the retrained language model file to. Required if updateLM is set to true, otherwise ignored.

Line Extraction Options
  • -binarizeThreshold: Quantile to use for pixel value thresholding. (High values mean more black pixels.) Default: 0.12

  • -crop: Crop pages? Default: true

Evaluate During Training
  • -evalInputDocPath: When evaluation should be done during training (after each parameter update in EM), this is the path of the directory that contains the evaluation input document images. The entire directory will be recursively searched for any files that do not end in .txt (and that do not start with .). (Only relevant if updateFont is set to true.) Default: Do not evaluate during font training.

  • -evalNumDocs: When using -evalInputDocPath, this is the number of documents that will be evaluated on. Ignore or use 0 to use all documents. Default: Use all documents in the specified path.

  • -evalBatches: When using -evalInputDocPath, on iterations in which we run the evaluation, should the evaluation be run after each batch, as determined by -updateDocBatchSize (in addition to after each iteration)? Default: false

  • -evalExtractedLinesPath: When using -evalInputDocPath, this is the path of the directory where the evaluation line-extraction images should be read/written. If the line files exist here, they will be used; if not, they will be extracted and then written here. Useful if: 1) you plan to run Ocular on the same documents multiple times and you want to save some time by not re-extracting the lines, or 2) you use an alternate line extractor (such as Tesseract) to pre-process the document. If ignored, the document will simply be read from the original document image file, and no line images will be written. Default: Don't read or write line image files.

Rarely Used Options
  • -allowLanguageSwitchOnPunct: Should the model be allowed to switch languages at punctuation? Default: true

  • -cudaDeviceID: GPU ID when using CUDA emission engine. Default: 0

  • -decodeBatchSize: Number of lines that compose a single decode batch. (Smaller batch size can reduce memory consumption.) Default: 32

  • -gsmElideAnything: Should the GSM be allowed to elide letters even without the presence of an elision-marking tilde? Default: false

  • -gsmElisionSmoothingCountMultiplier: gsmElisionSmoothingCountMultiplier. Default: 100.0

  • -gsmNoCharSubPrior: The prior probability of not substituting the LM character. Substitution here includes both letter substitutions and letter elisions. Default: 0.9

  • -gsmPower: Exponent on GSM scores. Default: 4.0

  • -gsmSmoothingCount: The default number of counts that every glyph gets in order to smooth the glyph substitution model estimation. Default: 1.0

  • -paddingMaxWidth: Max horizontal padding between characters in pixels. (Best left at default value.) Default: 5

  • -paddingMinWidth: Min horizontal padding between characters in pixels. (Best left at default value.) Default: 1

  • -uniformLineHeight: Scale all lines to have the same height? Default: true

  • -numDecodeThreads: Number of threads to use for decoding. (More threads may increase speed, but may cause a loss of continuity across lines.) Default: 1

  • -numEmissionCacheThreads: Number of threads to use during emission cache computation. (Only has effect when emissionEngine is set to DEFAULT.) Default: 8

  • -numMstepThreads: Number of threads to use for L-BFGS during the M-step. Default: 8
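
To illustrate what -decodeBatchSize controls, here is a minimal Java sketch that splits a list of extracted lines into batches of at most decodeBatchSize lines. The class DecodeBatcher and its batches method are hypothetical and are not Ocular's actual implementation; the sketch only shows why a smaller batch size means fewer lines are processed together at once.

```java
import java.util.ArrayList;
import java.util.List;

public class DecodeBatcher {
    /**
     * Hypothetical helper: partitions lines into consecutive batches, each
     * holding at most decodeBatchSize elements. The last batch may be smaller.
     */
    public static <T> List<List<T>> batches(List<T> lines, int decodeBatchSize) {
        if (decodeBatchSize <= 0) {
            throw new IllegalArgumentException("decodeBatchSize must be positive");
        }
        List<List<T>> result = new ArrayList<>();
        for (int start = 0; start < lines.size(); start += decodeBatchSize) {
            int end = Math.min(start + decodeBatchSize, lines.size());
            // Copy the slice so each batch is independent of the input list.
            result.add(new ArrayList<>(lines.subList(start, end)));
        }
        return result;
    }
}
```

With the default of 32, a page of 70 lines would be split into batches of 32, 32, and 6 lines in this sketch.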

Comments
  • Requirements / Performance

    Hi,

    For training the font model, what requirements are there on the system used to do the training and the images themselves? What resolution is recommended? Any pre-processing required?

    The reason I ask is that I have been trying to learn a model for 19th century English-language newspapers and it runs and is processing, but after more than 48h it was still on the first iteration of the first image "Extracting text line images".

    I can provide the image if that is any help.

    Best Regards, Mark

    opened by scmmmh 5
  • NullPointerException when writing language model

    Hi,

    I am using the precompiled jar to train a language model for Dutch. However, it fails to write the lm:

    writing LM to dut.lmser
    Exception in thread "main" java.lang.reflect.InvocationTargetException
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:483)
        at com.simontuffs.onejar.Boot.run(Boot.java:340)
        at com.simontuffs.onejar.Boot.main(Boot.java:166)
    Caused by: java.lang.RuntimeException: java.lang.NullPointerException
        at edu.berkeley.cs.nlp.ocular.main.InitializeLanguageModel.writeLM(InitializeLanguageModel.java:314)
        at edu.berkeley.cs.nlp.ocular.main.InitializeLanguageModel.run(InitializeLanguageModel.java:121)
        at edu.berkeley.cs.nlp.ocular.main.OcularRunnable.doMain(OcularRunnable.java:23)
        at edu.berkeley.cs.nlp.ocular.main.InitializeLanguageModel.main(InitializeLanguageModel.java:85)
        ... 6 more
    Caused by: java.lang.NullPointerException
        at edu.berkeley.cs.nlp.ocular.main.InitializeLanguageModel.writeLM(InitializeLanguageModel.java:310)
        ... 9 more

    Best, Matje

    opened by matjemeisje 3
  • Training for Finnish (containing also letters ä ö å Ä Ö Å)

    I created a model for Finnish by:

    1. initializing a language model for 1 language (Finnish),
    2. initializing a font (537939 words in training data), and
    3. training the font with only 3 training images, which results in the kind of text shown below.
    4. The results from the same 3 images look weird.

    What could be wrong?

    eval_diplomatic.txt

    Document: sample_images/finnish/1.jpg
    CER, keep punc: 0.974025974025974
    CER, keep punc, allow f->s: 0.974025974025974
    CER, remove punc: 0.9970326409495549
    CER, remove punc, allow f->s: 0.9970326409495549
    WER, keep punc: 1.0
    WER, keep punc, allow f->s: 1.0
    WER, remove punc: 1.0
    WER, remove punc, allow f->s: 1.0

    Document: sample_images/finnish/2.jpg
    CER, keep punc: 0.98094688221709
    CER, keep punc, allow f->s: 0.98094688221709
    CER, remove punc: 0.9940369707811568
    CER, remove punc, allow f->s: 0.9940369707811568
    WER, keep punc: 1.0
    WER, keep punc, allow f->s: 1.0
    WER, remove punc: 1.0
    WER, remove punc, allow f->s: 1.0

    Document: sample_images/finnish/3.jpg
    CER, keep punc: 0.9778783308195073
    CER, keep punc, allow f->s: 0.9778783308195073
    CER, remove punc: 0.9947451392538098
    CER, remove punc, allow f->s: 0.9947451392538098
    WER, keep punc: 1.0
    WER, keep punc, allow f->s: 1.0
    WER, remove punc: 0.996309963099631
    WER, remove punc, allow f->s: 0.996309963099631

    Macro-avg total eval:
    CER, keep punc: 0.9776170623541904
    CER, keep punc, allow f->s: 0.9776170623541904
    CER, remove punc: 0.9952715836615071
    CER, remove punc, allow f->s: 0.9952715836615071
    WER, keep punc: 1.0
    WER, keep punc, allow f->s: 1.0
    WER, remove punc: 0.998769987699877
    WER, remove punc, allow f->s: 0.998769987699877

    1_transcription.txt

    uºгp

    (

    П

    (
    (
    (

    (

    (


    m
    A
    (
    (
    (
    (
    (
    (

    tºг⅙

    • ˶


      v

    1_comparisons.txt
    MD: Model diplomatic transcription
    GD: Gold diplomatic transcription

    MD:
    GD: my"oten erinnyt, yksi osa seurannut Kamajokea pohjaseen p"ain,

    MD: .őд1
    GD: toinen Wolgajokea l"ansiluoteesen ja kolmas wasta nimitetty"a

    MD: ú
    GD: jokea Kaspiamereen p"ain. Ensimm"ainen osa, Karjalai- ....

    Thanks, Mika Koistinen

    opened by jmokoistinen 3
  • Bad link for self-contained JAR

    http://www.cs.utexas.edu/~dhg/maven-repository/snapshots/edu/berkeley/cs/nlp/ocular/0.3-SNAPSHOT/ocular-0.3-SNAPSHOT-with_dependencies.jar does not resolve.

    I get a Not Found error.

    opened by todrobbins 3
  • Change existingExtractionsPath path construction

    With this patch if I specify e.g. -inputPath images/pugna -existingExtractionsPath pre_ex, it will look in pre_ex/line_extract/pugna for existing extractions. It should also fall back to normal extraction if no extractions are found.

    I've also made a simple Tesseract line splitter whose output works with this option.

    opened by ryanfb 3
  • Hundreds of warnings

    Hello,

    I'm trying to use ocular to do OCR of a phototypic edition of a dictionary. Firstly, I wasn't able to run ocular on Windows at all, because it kept complaining it wasn't able to load main class:

    Error: Could not find or load main class main.class=edu.berkeley.cs.nlp.ocular.main.InitializeLanguageModel
    

    Then I was able to run ocular under WSL (Windows Subsystem for Linux). However, each run gives me hundreds of warnings like the ones quoted at the end of this post.

    I see that ocular runs, but I'm never comfortable with so many warnings.

    milos@MILOS-LENOVO:/mnt/c/zwamp/vdrive/ocular$ java -Done-jar.main.class=edu.berkeley.cs.nlp.ocular.main.InitializeFont -mx7g -jar ocular-0.3-SNAPSHOT-with_dependencies.jar -inputLmPath lm/serbian.lmser -outputFontPath font/serbian-init.fontser
    JarClassLoader: Warning: jcuda/jcublas/cublasAtomicsMode.class in lib/jcublas-0.8.0.jar is hidden by lib/jcublas-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/jcublas/cublasDiagType.class in lib/jcublas-0.8.0.jar is hidden by lib/jcublas-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/jcublas/cublasFillMode.class in lib/jcublas-0.8.0.jar is hidden by lib/jcublas-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/jcublas/cublasHandle.class in lib/jcublas-0.8.0.jar is hidden by lib/jcublas-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/jcublas/cublasOperation.class in lib/jcublas-0.8.0.jar is hidden by lib/jcublas-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/jcublas/cublasPointerMode.class in lib/jcublas-0.8.0.jar is hidden by lib/jcublas-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/jcublas/cublasSideMode.class in lib/jcublas-0.8.0.jar is hidden by lib/jcublas-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/jcublas/cublasStatus.class in lib/jcublas-0.8.0.jar is hidden by lib/jcublas-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/jcublas/JCublas.class in lib/jcublas-0.8.0.jar is hidden by lib/jcublas-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/jcublas/JCublas2.class in lib/jcublas-0.8.0.jar is hidden by lib/jcublas-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/BufferUtils.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/cuComplex.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/CudaException.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/cuDoubleComplex.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/driver/CUaddress_mode.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/driver/CUarray.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/driver/CUarray_cubemap_face.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/driver/CUarray_format.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/driver/CUcomputemode.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/driver/CUcontext.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/driver/CUctx_flags.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/driver/CUDA_ARRAY3D_DESCRIPTOR.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/driver/CUDA_ARRAY_DESCRIPTOR.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/driver/CUDA_MEMCPY2D.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/driver/CUDA_MEMCPY3D.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/driver/CUDA_MEMCPY3D_PEER.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/driver/CUDA_POINTER_ATTRIBUTE_P2P_TOKENS.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/driver/CUDA_RESOURCE_DESC.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/driver/CUDA_RESOURCE_VIEW_DESC.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/driver/CUDA_TEXTURE_DESC.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/driver/CUdevice.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/driver/CUdeviceptr.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/driver/CUdevice_attribute.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/driver/CUdevprop.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/driver/CUevent.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/driver/CUevent_flags.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/driver/CUfilter_mode.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/driver/CUfunction.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/driver/CUfunction_attribute.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/driver/CUfunc_cache.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/driver/CUGLDeviceList.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/driver/CUGLmap_flags.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/driver/CUgraphicsMapResourceFlags.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/driver/CUgraphicsRegisterFlags.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/driver/CUgraphicsResource.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/driver/CUipcEventHandle.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/driver/CUipcMemHandle.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/driver/CUipcMem_flags.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/driver/CUjitInputType.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/driver/CUjit_cacheMode.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/driver/CUjit_fallback.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/driver/CUjit_option.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/driver/CUjit_target.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/driver/CUlimit.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/driver/CUlinkState.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/driver/CUmemAttach_flags.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/driver/CUmemorytype.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/driver/CUmipmappedArray.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/driver/CUmodule.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/driver/CUoutput_mode.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/driver/CUpointer_attribute.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/driver/CUresourcetype.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/driver/CUresourceViewFormat.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/driver/CUresult.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/driver/CUsharedconfig.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/driver/CUstream.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/driver/CUstreamCallback.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/driver/CUstream_flags.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/driver/CUsurfObject.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/driver/CUsurfref.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/driver/CUtexObject.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/driver/CUtexref.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/driver/JCudaDriver$ConstantPointer.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/driver/JCudaDriver.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/driver/JITOptions.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/LibUtils$OSType.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/LibUtils.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/LogLevel.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/NativePointerObject.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/Pointer.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/runtime/cudaArray.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/runtime/cudaChannelFormatDesc.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/runtime/cudaChannelFormatKind.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/runtime/cudaComputeMode.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/runtime/cudaDeviceAttr.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/runtime/cudaDeviceProp.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/runtime/cudaError.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/runtime/cudaEvent_t.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/runtime/cudaExtent.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/runtime/cudaFuncAttributes.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/runtime/cudaFuncCache.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/runtime/cudaGLDeviceList.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/runtime/cudaGLMapFlags.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/runtime/cudaGraphicsCubeFace.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/runtime/cudaGraphicsMapFlags.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/runtime/cudaGraphicsRegisterFlags.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/runtime/cudaGraphicsResource.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/runtime/cudaIpcEventHandle.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/runtime/cudaIpcMemHandle.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/runtime/cudaLimit.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/runtime/cudaMemcpy3DParms.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/runtime/cudaMemcpy3DPeerParms.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/runtime/cudaMemcpyKind.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/runtime/cudaMemoryType.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/runtime/cudaMipmappedArray.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/runtime/cudaOutputMode.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/runtime/cudaPitchedPtr.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/runtime/cudaPointerAttributes.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/runtime/cudaPos.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/runtime/cudaResourceDesc.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/runtime/cudaResourceType.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/runtime/cudaResourceViewDesc.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/runtime/cudaResourceViewFormat.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/runtime/cudaSharedMemConfig.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/runtime/cudaStreamCallback.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/runtime/cudaStream_t.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/runtime/cudaSurfaceBoundaryMode.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/runtime/cudaSurfaceFormatMode.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/runtime/cudaSurfaceObject.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/runtime/cudaTextureAddressMode.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/runtime/cudaTextureDesc.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/runtime/cudaTextureFilterMode.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/runtime/cudaTextureObject.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/runtime/cudaTextureReadMode.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/runtime/dim3.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/runtime/JCuda.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/runtime/surfaceReference.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/runtime/textureReference.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/Sizeof.class in lib/jcuda-0.8.0.jar is hidden by lib/jcuda-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/jcufft/cufftCompatibility.class in lib/jcufft-0.8.0.jar is hidden by lib/jcufft-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/jcufft/cufftHandle.class in lib/jcufft-0.8.0.jar is hidden by lib/jcufft-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/jcufft/cufftResult.class in lib/jcufft-0.8.0.jar is hidden by lib/jcufft-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/jcufft/cufftType.class in lib/jcufft-0.8.0.jar is hidden by lib/jcufft-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/jcufft/JCufft.class in lib/jcufft-0.8.0.jar is hidden by lib/jcufft-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/jcurand/curandDirectionVectorSet.class in lib/jcurand-0.8.0.jar is hidden by lib/jcurand-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/jcurand/curandDiscreteDistribution.class in lib/jcurand-0.8.0.jar is hidden by lib/jcurand-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/jcurand/curandGenerator.class in lib/jcurand-0.8.0.jar is hidden by lib/jcurand-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/jcurand/curandOrdering.class in lib/jcurand-0.8.0.jar is hidden by lib/jcurand-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/jcurand/curandRngType.class in lib/jcurand-0.8.0.jar is hidden by lib/jcurand-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/jcurand/curandStatus.class in lib/jcurand-0.8.0.jar is hidden by lib/jcurand-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/jcurand/JCurand.class in lib/jcurand-0.8.0.jar is hidden by lib/jcurand-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/jcusparse/bsric02Info.class in lib/jcusparse-0.8.0.jar is hidden by lib/jcusparse-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/jcusparse/bsrilu02Info.class in lib/jcusparse-0.8.0.jar is hidden by lib/jcusparse-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/jcusparse/bsrsv2Info.class in lib/jcusparse-0.8.0.jar is hidden by lib/jcusparse-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/jcusparse/csric02Info.class in lib/jcusparse-0.8.0.jar is hidden by lib/jcusparse-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/jcusparse/csrilu02Info.class in lib/jcusparse-0.8.0.jar is hidden by lib/jcusparse-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/jcusparse/csrsv2Info.class in lib/jcusparse-0.8.0.jar is hidden by lib/jcusparse-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/jcusparse/cusparseAction.class in lib/jcusparse-0.8.0.jar is hidden by lib/jcusparse-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/jcusparse/cusparseDiagType.class in lib/jcusparse-0.8.0.jar is hidden by lib/jcusparse-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/jcusparse/cusparseDirection.class in lib/jcusparse-0.8.0.jar is hidden by lib/jcusparse-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/jcusparse/cusparseFillMode.class in lib/jcusparse-0.8.0.jar is hidden by lib/jcusparse-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/jcusparse/cusparseHandle.class in lib/jcusparse-0.8.0.jar is hidden by lib/jcusparse-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/jcusparse/cusparseHybMat.class in lib/jcusparse-0.8.0.jar is hidden by lib/jcusparse-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/jcusparse/cusparseHybPartition.class in lib/jcusparse-0.8.0.jar is hidden by lib/jcusparse-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/jcusparse/cusparseIndexBase.class in lib/jcusparse-0.8.0.jar is hidden by lib/jcusparse-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/jcusparse/cusparseMatDescr.class in lib/jcusparse-0.8.0.jar is hidden by lib/jcusparse-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/jcusparse/cusparseMatrixType.class in lib/jcusparse-0.8.0.jar is hidden by lib/jcusparse-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/jcusparse/cusparseOperation.class in lib/jcusparse-0.8.0.jar is hidden by lib/jcusparse-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/jcusparse/cusparsePointerMode.class in lib/jcusparse-0.8.0.jar is hidden by lib/jcusparse-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/jcusparse/cusparseSolveAnalysisInfo.class in lib/jcusparse-0.8.0.jar is hidden by lib/jcusparse-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/jcusparse/cusparseSolvePolicy.class in lib/jcusparse-0.8.0.jar is hidden by lib/jcusparse-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/jcusparse/cusparseStatus.class in lib/jcusparse-0.8.0.jar is hidden by lib/jcusparse-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: jcuda/jcusparse/JCusparse.class in lib/jcusparse-0.8.0.jar is hidden by lib/jcusparse-0.6.0.jar (with different bytecode)
    JarClassLoader: Warning: Null manifest from input stream associated with: lib/libJCublas-apple-x86_64.dylib
    JarClassLoader: Warning: Null manifest from input stream associated with: lib/libJCublas-linux-x86_64.so
    JarClassLoader: Warning: Null manifest from input stream associated with: lib/libJCublas2-apple-x86_64.dylib
    JarClassLoader: Warning: Null manifest from input stream associated with: lib/libJCublas2-linux-x86_64.so
    JarClassLoader: Warning: Null manifest from input stream associated with: lib/libJCudaDriver-apple-x86_64.dylib
    JarClassLoader: Warning: Null manifest from input stream associated with: lib/libJCudaDriver-linux-x86_64.so
    JarClassLoader: Warning: Null manifest from input stream associated with: lib/libJCudaRuntime-apple-x86_64.dylib
    JarClassLoader: Warning: Null manifest from input stream associated with: lib/libJCudaRuntime-linux-x86_64.so
    JarClassLoader: Warning: Null manifest from input stream associated with: lib/libJCufft-apple-x86_64.dylib
    JarClassLoader: Warning: Null manifest from input stream associated with: lib/libJCufft-linux-x86_64.so
    JarClassLoader: Warning: Null manifest from input stream associated with: lib/libJCurand-apple-x86_64.dylib
    JarClassLoader: Warning: Null manifest from input stream associated with: lib/libJCurand-linux-x86_64.so
    JarClassLoader: Warning: Null manifest from input stream associated with: lib/libJCusparse-apple-x86_64.dylib
    JarClassLoader: Warning: Null manifest from input stream associated with: lib/libJCusparse-linux-x86_64.so
    JarClassLoader: Warning: com/sun/pdfview/FullScreenWindow$PickMe.class in lib/PDFRenderer-0.9.1.jar is hidden by lib/pdf-renderer-1.0.5.jar (with different bytecode)
    JarClassLoader: Warning: com/sun/pdfview/FullScreenWindow.class in lib/PDFRenderer-0.9.1.jar is hidden by lib/pdf-renderer-1.0.5.jar (with different bytecode)
    JarClassLoader: Warning: com/sun/pdfview/PDFChangeStrokeCmd.class in lib/PDFRenderer-0.9.1.jar is hidden by lib/pdf-renderer-1.0.5.jar (with different bytecode)
    JarClassLoader: Warning: com/sun/pdfview/PDFDocCharsetEncoder.class in lib/PDFRenderer-0.9.1.jar is hidden by lib/pdf-renderer-1.0.5.jar (with different bytecode)
    JarClassLoader: Warning: com/sun/pdfview/PDFFile.class in lib/PDFRenderer-0.9.1.jar is hidden by lib/pdf-renderer-1.0.5.jar (with different bytecode)
    JarClassLoader: Warning: com/sun/pdfview/PDFFillAlphaCmd.class in lib/PDFRenderer-0.9.1.jar is hidden by lib/pdf-renderer-1.0.5.jar (with different bytecode)
    JarClassLoader: Warning: com/sun/pdfview/PDFFillPaintCmd.class in lib/PDFRenderer-0.9.1.jar is hidden by lib/pdf-renderer-1.0.5.jar (with different bytecode)
    JarClassLoader: Warning: com/sun/pdfview/PDFImage$DecodeComponentColorModel.class in lib/PDFRenderer-0.9.1.jar is hidden by lib/pdf-renderer-1.0.5.jar (with different bytecode)
    JarClassLoader: Warning: com/sun/pdfview/PDFImage.class in lib/PDFRenderer-0.9.1.jar is hidden by lib/pdf-renderer-1.0.5.jar (with different bytecode)
    JarClassLoader: Warning: com/sun/pdfview/PDFImageCmd.class in lib/PDFRenderer-0.9.1.jar is hidden by lib/pdf-renderer-1.0.5.jar (with different bytecode)
    JarClassLoader: Warning: com/sun/pdfview/PDFObject.class in lib/PDFRenderer-0.9.1.jar is hidden by lib/pdf-renderer-1.0.5.jar (with different bytecode)
    JarClassLoader: Warning: com/sun/pdfview/PDFPage.class in lib/PDFRenderer-0.9.1.jar is hidden by lib/pdf-renderer-1.0.5.jar (with different bytecode)
    JarClassLoader: Warning: com/sun/pdfview/PDFParser$ParserState.class in lib/PDFRenderer-0.9.1.jar is hidden by lib/pdf-renderer-1.0.5.jar (with different bytecode)
    JarClassLoader: Warning: com/sun/pdfview/PDFParser.class in lib/PDFRenderer-0.9.1.jar is hidden by lib/pdf-renderer-1.0.5.jar (with different bytecode)
    JarClassLoader: Warning: com/sun/pdfview/PDFPopCmd.class in lib/PDFRenderer-0.9.1.jar is hidden by lib/pdf-renderer-1.0.5.jar (with different bytecode)
    JarClassLoader: Warning: com/sun/pdfview/PDFPushCmd.class in lib/PDFRenderer-0.9.1.jar is hidden by lib/pdf-renderer-1.0.5.jar (with different bytecode)
    JarClassLoader: Warning: com/sun/pdfview/PDFRenderer$GraphicsState.class in lib/PDFRenderer-0.9.1.jar is hidden by lib/pdf-renderer-1.0.5.jar (with different bytecode)
    JarClassLoader: Warning: com/sun/pdfview/PDFRenderer.class in lib/PDFRenderer-0.9.1.jar is hidden by lib/pdf-renderer-1.0.5.jar (with different bytecode)
    JarClassLoader: Warning: com/sun/pdfview/PDFStrokeAlphaCmd.class in lib/PDFRenderer-0.9.1.jar is hidden by lib/pdf-renderer-1.0.5.jar (with different bytecode)
    JarClassLoader: Warning: com/sun/pdfview/PDFStrokePaintCmd.class in lib/PDFRenderer-0.9.1.jar is hidden by lib/pdf-renderer-1.0.5.jar (with different bytecode)
    JarClassLoader: Warning: com/sun/pdfview/PDFTextFormat.class in lib/PDFRenderer-0.9.1.jar is hidden by lib/pdf-renderer-1.0.5.jar (with different bytecode)
    JarClassLoader: Warning: com/sun/pdfview/PDFViewer$15.class in lib/PDFRenderer-0.9.1.jar is hidden by lib/pdf-renderer-1.0.5.jar (with different bytecode)
    JarClassLoader: Warning: com/sun/pdfview/PDFViewer.class in lib/PDFRenderer-0.9.1.jar is hidden by lib/pdf-renderer-1.0.5.jar (with different bytecode)
    JarClassLoader: Warning: com/sun/pdfview/PDFXformCmd.class in lib/PDFRenderer-0.9.1.jar is hidden by lib/pdf-renderer-1.0.5.jar (with different bytecode)
    JarClassLoader: Warning: com/sun/pdfview/PDFXref.class in lib/PDFRenderer-0.9.1.jar is hidden by lib/pdf-renderer-1.0.5.jar (with different bytecode)
    JarClassLoader: Warning: com/sun/pdfview/colorspace/CMYKColorSpace.class in lib/PDFRenderer-0.9.1.jar is hidden by lib/pdf-renderer-1.0.5.jar (with different bytecode)
    JarClassLoader: Warning: com/sun/pdfview/colorspace/PDFColorSpace.class in lib/PDFRenderer-0.9.1.jar is hidden by lib/pdf-renderer-1.0.5.jar (with different bytecode)
    JarClassLoader: Warning: com/sun/pdfview/decode/DCTDecode.class in lib/PDFRenderer-0.9.1.jar is hidden by lib/pdf-renderer-1.0.5.jar (with different bytecode)
    JarClassLoader: Warning: com/sun/pdfview/decode/MyTracker.class in lib/PDFRenderer-0.9.1.jar is hidden by lib/pdf-renderer-1.0.5.jar (with different bytecode)
    JarClassLoader: Warning: com/sun/pdfview/decode/PDFDecoder.class in lib/PDFRenderer-0.9.1.jar is hidden by lib/pdf-renderer-1.0.5.jar (with different bytecode)
    JarClassLoader: Warning: com/sun/pdfview/decode/Predictor.class in lib/PDFRenderer-0.9.1.jar is hidden by lib/pdf-renderer-1.0.5.jar (with different bytecode)
    JarClassLoader: Warning: com/sun/pdfview/decrypt/CryptFilterDecrypter.class in lib/PDFRenderer-0.9.1.jar is hidden by lib/pdf-renderer-1.0.5.jar (with different bytecode)
    JarClassLoader: Warning: com/sun/pdfview/decrypt/IdentityDecrypter.class in lib/PDFRenderer-0.9.1.jar is hidden by lib/pdf-renderer-1.0.5.jar (with different bytecode)
    JarClassLoader: Warning: com/sun/pdfview/decrypt/PDFDecrypter.class in lib/PDFRenderer-0.9.1.jar is hidden by lib/pdf-renderer-1.0.5.jar (with different bytecode)
    JarClassLoader: Warning: com/sun/pdfview/decrypt/StandardDecrypter.class in lib/PDFRenderer-0.9.1.jar is hidden by lib/pdf-renderer-1.0.5.jar (with different bytecode)
    JarClassLoader: Warning: com/sun/pdfview/font/BuiltinFont.class in lib/PDFRenderer-0.9.1.jar is hidden by lib/pdf-renderer-1.0.5.jar (with different bytecode)
    JarClassLoader: Warning: com/sun/pdfview/font/PDFFont.class in lib/PDFRenderer-0.9.1.jar is hidden by lib/pdf-renderer-1.0.5.jar (with different bytecode)
    JarClassLoader: Warning: com/sun/pdfview/font/PDFFontDescriptor.class in lib/PDFRenderer-0.9.1.jar is hidden by lib/pdf-renderer-1.0.5.jar (with different bytecode)
    JarClassLoader: Warning: com/sun/pdfview/font/TTFFont$PointRec.class in lib/PDFRenderer-0.9.1.jar is hidden by lib/pdf-renderer-1.0.5.jar (with different bytecode)
    JarClassLoader: Warning: com/sun/pdfview/font/TTFFont$RenderState.class in lib/PDFRenderer-0.9.1.jar is hidden by lib/pdf-renderer-1.0.5.jar (with different bytecode)
    JarClassLoader: Warning: com/sun/pdfview/font/TTFFont.class in lib/PDFRenderer-0.9.1.jar is hidden by lib/pdf-renderer-1.0.5.jar (with different bytecode)
    JarClassLoader: Warning: com/sun/pdfview/font/Type1CFont.class in lib/PDFRenderer-0.9.1.jar is hidden by lib/pdf-renderer-1.0.5.jar (with different bytecode)
    JarClassLoader: Warning: com/sun/pdfview/font/Type1Font$PSParser.class in lib/PDFRenderer-0.9.1.jar is hidden by lib/pdf-renderer-1.0.5.jar (with different bytecode)
    JarClassLoader: Warning: com/sun/pdfview/font/Type1Font.class in lib/PDFRenderer-0.9.1.jar is hidden by lib/pdf-renderer-1.0.5.jar (with different bytecode)
    JarClassLoader: Warning: com/sun/pdfview/font/Type3Font.class in lib/PDFRenderer-0.9.1.jar is hidden by lib/pdf-renderer-1.0.5.jar (with different bytecode)
    JarClassLoader: Warning: com/sun/pdfview/font/ttf/NameTable$NameRecord.class in lib/PDFRenderer-0.9.1.jar is hidden by lib/pdf-renderer-1.0.5.jar (with different bytecode)
    JarClassLoader: Warning: com/sun/pdfview/font/ttf/NameTable.class in lib/PDFRenderer-0.9.1.jar is hidden by lib/pdf-renderer-1.0.5.jar (with different bytecode)
    JarClassLoader: Warning: com/sun/pdfview/font/ttf/TrueTypeFont.class in lib/PDFRenderer-0.9.1.jar is hidden by lib/pdf-renderer-1.0.5.jar (with different bytecode)
    JarClassLoader: Warning: com/sun/pdfview/function/FunctionType3.class in lib/PDFRenderer-0.9.1.jar is hidden by lib/pdf-renderer-1.0.5.jar (with different bytecode)
    JarClassLoader: Warning: com/sun/pdfview/pattern/PDFShader.class in lib/PDFRenderer-0.9.1.jar is hidden by lib/pdf-renderer-1.0.5.jar (with different bytecode)
    JarClassLoader: Warning: com/sun/pdfview/pattern/PatternType1$1.class in lib/PDFRenderer-0.9.1.jar is hidden by lib/pdf-renderer-1.0.5.jar (with different bytecode)
    JarClassLoader: Warning: com/sun/pdfview/pattern/PatternType1$TilingPatternPaint.class in lib/PDFRenderer-0.9.1.jar is hidden by lib/pdf-renderer-1.0.5.jar (with different bytecode)
    JarClassLoader: Warning: com/sun/pdfview/pattern/PatternType1$Type1PaintContext.class in lib/PDFRenderer-0.9.1.jar is hidden by lib/pdf-renderer-1.0.5.jar (with different bytecode)
    JarClassLoader: Warning: com/sun/pdfview/pattern/PatternType1.class in lib/PDFRenderer-0.9.1.jar is hidden by lib/pdf-renderer-1.0.5.jar (with different bytecode)
    JarClassLoader: Warning: test/TTFTest.class in lib/PDFRenderer-0.9.1.jar is hidden by lib/pdf-renderer-1.0.5.jar (with different bytecode)
    JarClassLoader: Warning: test/TestType1CFont.class in lib/PDFRenderer-0.9.1.jar is hidden by lib/pdf-renderer-1.0.5.jar (with different bytecode)
    
    opened by MilosStanic 2
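
The log above contains two kinds of lines: "hidden by" warnings, where the same class file is bundled in two jars and one copy shadows the other, and "Null manifest" warnings for the native libraries. A minimal sketch for condensing such a log into one line per jar pair follows; the function name `summarize_hidden` and the sample text are illustrative assumptions, not part of Ocular.

```python
import re
from collections import Counter

# Pattern for the "hidden by" warnings shown in the log above.
HIDDEN = re.compile(
    r"JarClassLoader: Warning: (?P<cls>\S+) in (?P<hidden>\S+) "
    r"is hidden by (?P<winner>\S+) \(with different bytecode\)"
)

def summarize_hidden(log_text):
    """Count shadowed classes per (hidden jar, winning jar) pair."""
    pairs = Counter()
    for line in log_text.splitlines():
        m = HIDDEN.search(line)
        if m:
            pairs[(m.group("hidden"), m.group("winner"))] += 1
    return pairs

# A few lines copied from the log above, used here as sample input.
sample = """\
JarClassLoader: Warning: jcuda/jcusparse/JCusparse.class in lib/jcusparse-0.8.0.jar is hidden by lib/jcusparse-0.6.0.jar (with different bytecode)
JarClassLoader: Warning: Null manifest from input stream associated with: lib/libJCufft-linux-x86_64.so
JarClassLoader: Warning: com/sun/pdfview/PDFFile.class in lib/PDFRenderer-0.9.1.jar is hidden by lib/pdf-renderer-1.0.5.jar (with different bytecode)
JarClassLoader: Warning: com/sun/pdfview/PDFPage.class in lib/PDFRenderer-0.9.1.jar is hidden by lib/pdf-renderer-1.0.5.jar (with different bytecode)
"""

for (hidden, winner), n in sorted(summarize_hidden(sample).items()):
    print(f"{n} class(es) in {hidden} shadowed by {winner}")
```

Run against the full log, this would show at a glance which bundled jars overlap (here, two versions of jcusparse and two PDF renderer jars).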
  • InitializeFont gives NullPointerException

    I managed to train a language model, but I get an NPE when trying to initialize the font model (step 2). I am using the self-contained jar (0.3-SNAPSHOT, downloaded 2 May 2016).

    The error stack trace is attached: npe.txt

    opened by jbingel 2
  • Download Page for the self-contained jar is not found

    This page is not found: http://www.cs.utexas.edu/~dhg/maven-repository/snapshots/edu/berkeley/cs/nlp/ocular/0.3-SNAPSHOT/ocular-0.3-SNAPSHOT-with_dependencies.jar . Thanks

    opened by pakdanan 1
  • Can't handle JBIG2 pdf images

    When I run the training step on some .pdfs of historical Indian census files, I get the following error:

    Extracting text line images from ../data/district_reports/raw_pdfs/1981/27582_1981_MAI.pdf, page 3
    Error reading image
    com.sun.pdfview.PDFParseException: Unknown coding method:JBIG2Decode
    

    and then

    java.lang.NullPointerException
    	at com.sun.pdfview.font.TTFFont.getOutline(TTFFont.java:170)
    	at com.sun.pdfview.font.CIDFontType2.getOutline(CIDFontType2.java:270)
    	at com.sun.pdfview.font.OutlineFont.getGlyph(OutlineFont.java:130)
    	at com.sun.pdfview.font.PDFFont.getCachedGlyph(PDFFont.java:308)
    	at com.sun.pdfview.font.PDFFontEncoding.getGlyphFromCMap(PDFFontEncoding.java:155)
    	at com.sun.pdfview.font.PDFFontEncoding.getGlyphs(PDFFontEncoding.java:115)
    	at com.sun.pdfview.font.PDFFont.getGlyphs(PDFFont.java:274)
    	at com.sun.pdfview.PDFTextFormat.doText(PDFTextFormat.java:269)
    	at com.sun.pdfview.PDFParser.iterate(PDFParser.java:752)
    	at com.sun.pdfview.BaseWatchable.run(BaseWatchable.java:101)
    	at java.base/java.lang.Thread.run(Thread.java:834)
    

    I think what's going on here is that the PDF contains JBIG2-encoded images, which the program doesn't know how to read.

    opened by bholtdwyer 0
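
The error in the report above names `JBIG2Decode` as an unknown coding method. In a PDF, an image stream normally declares its compression as a filter name such as `/JBIG2Decode` in the stream dictionary, so a rough pre-check is to look for that name in the raw file bytes. The sketch below is only a heuristic under that assumption (the name must appear literally, not escaped); `uses_filter` and the file names are hypothetical.

```python
import tempfile
from pathlib import Path

def uses_filter(pdf_path, filter_name=b"/JBIG2Decode"):
    """Heuristic: True if the raw PDF bytes mention the given filter name."""
    return filter_name in Path(pdf_path).read_bytes()

with tempfile.TemporaryDirectory() as tmp:
    flagged = Path(tmp) / "scan_jbig2.pdf"
    plain = Path(tmp) / "scan_dct.pdf"
    # Not real PDFs: just image-dictionary fragments for illustration.
    flagged.write_bytes(b"<< /Type /XObject /Subtype /Image /Filter /JBIG2Decode >>")
    plain.write_bytes(b"<< /Type /XObject /Subtype /Image /Filter /DCTDecode >>")
    results = {p.name: uses_filter(p) for p in (flagged, plain)}

print(results)
```

Files flagged this way could be set aside and converted to ordinary page images with an external tool before being passed to the training step.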
  • Nasty warnings on make

    Hi! When I tried to make the jar, it gave me some warnings about "illegal" operations that "will be denied in a future release". Just FYI.

    git clone https://github.com/tberg12/ocular.git
    cd ocular
    ./make_jar.sh
    Cloning into 'ocular'...
    remote: Enumerating objects: 7062, done.
    remote: Total 7062 (delta 0), reused 0 (delta 0), pack-reused 7062
    Receiving objects: 100% (7062/7062), 239.38 MiB | 3.95 MiB/s, done.
    Resolving deltas: 100% (3000/3000), done.
    cp: lib/JCuda-All-0.6.0-bin-linux-x86_64/*: No such file or directory
    cp: lib/JCuda-All-0.6.0-bin-apple-x86_64/*: No such file or directory
    OpenJDK 64-Bit Server VM warning: Ignoring option MaxPermSize; support was removed in 8.0
    Getting org.scala-sbt sbt 0.13.8 ...
    WARNING: An illegal reflective access operation has occurred
    WARNING: Illegal reflective access by org.apache.ivy.util.url.IvyAuthenticator (file:/Users/holtdwyer/Dropbox/india_dams/code/ocular/sbt-launch-0.13.8.jar) to field java.net.Authenticator.theAuthenticator
    WARNING: Please consider reporting this to the maintainers of org.apache.ivy.util.url.IvyAuthenticator
    WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
    WARNING: All illegal access operations will be denied in a future release
    

    I then get a fatal error:

    [error] scala.reflect.internal.MissingRequirementError: object java.lang.Object in compiler mirror not found.
    [error] Use 'last' for the full log.
    Project loading failed: (r)etry, (q)uit, (l)ast, or (i)gnore? 
    error: error while loading package, Missing dependency 'object java.lang.Object in compiler mirror', required by /Users/holtdwyer/.sbt/boot/scala-2.10.4/lib/scala-library.jar(scala/package.class)
    error: error while loading package, Missing dependency 'object java.lang.Object in compiler mirror', required by /Users/holtdwyer/.sbt/boot/scala-2.10.4/lib/scala-library.jar(scala/runtime/package.class)
    scala.reflect.internal.MissingRequirementError: object java.lang.Object in compiler mirror not found.
    	at scala.reflect.internal.MissingRequirementError$.signal(MissingRequirementError.scala:16)
    	at scala.reflect.internal.MissingRequirementError$.notFound(MissingRequirementError.scala:17)
    	at scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:48)
    	at scala.reflect.internal.Mirrors$RootsBase.getModuleOrClass(Mirrors.scala:40)
    
    opened by bholtdwyer 0
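
The log in the report above shows sbt 0.13.8 with Scala 2.10.4, and the "illegal reflective access" warnings are printed by JDK 9 and later. The `MissingRequirementError` for `java.lang.Object` is commonly reported when such an old Scala toolchain runs on a JDK newer than 8, so that is a likely cause here, though not confirmed for this report. A minimal sketch for checking the Java major version before building follows; all function names are hypothetical.

```python
import re
import shutil
import subprocess

def java_major(version):
    """Return the major Java version from a string such as '1.8.0_292' or '11.0.2'."""
    parts = re.findall(r"\d+", version)
    if not parts:
        raise ValueError(f"unrecognized Java version: {version!r}")
    major = int(parts[0])
    # Java 8 and earlier report themselves as 1.x (for example 1.8.0_292).
    if major == 1 and len(parts) > 1:
        major = int(parts[1])
    return major

def likely_ok_for_old_sbt(version):
    """Assumption: the sbt 0.13 / Scala 2.10 build needs Java 8 or earlier."""
    return java_major(version) <= 8

def installed_java_version():
    """Read the version string from `java -version`, or None if unavailable."""
    if shutil.which("java") is None:
        return None
    # `java -version` writes its report to stderr.
    out = subprocess.run(["java", "-version"], capture_output=True, text=True).stderr
    m = re.search(r'version "([^"]+)"', out)
    return m.group(1) if m else None

for v in ("1.8.0_292", "11.0.2"):
    print(v, "->", java_major(v), "ok" if likely_ok_for_old_sbt(v) else "probably too new")
```

Calling `installed_java_version()` before `./make_jar.sh` and passing the result to `likely_ok_for_old_sbt` gives an early hint as to whether the build is likely to hit this error.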
  • Empty Array Error

    Hello, I am attempting to initialize a font, and while running, I get the following:

    Ignoring empty character rendering: Cursor, D
    Exception in thread "main" java.lang.reflect.InvocationTargetException
    	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    	at java.lang.reflect.Method.invoke(Method.java:498)
    	at com.simontuffs.onejar.Boot.run(Boot.java:340)
    	at com.simontuffs.onejar.Boot.main(Boot.java:166)
    Caused by: java.lang.ArrayIndexOutOfBoundsException: 0
    	at sun.font.CompositeStrike.getStrikeForSlot(CompositeStrike.java:79)
    	at sun.font.CompositeStrike.getFontMetrics(CompositeStrike.java:97)
    	at sun.font.FontDesignMetrics.initMatrixAndMetrics(FontDesignMetrics.java:359)
    	at sun.font.FontDesignMetrics.<init>(FontDesignMetrics.java:350)
    	at sun.font.FontDesignMetrics.getMetrics(FontDesignMetrics.java:302)
    	at sun.java2d.SunGraphics2D.getFontMetrics(SunGraphics2D.java:863)
    	at edu.berkeley.cs.nlp.ocular.image.FontRenderer.renderString(FontRenderer.java:377)
    	at edu.berkeley.cs.nlp.ocular.image.FontRenderer.getRenderedFont(FontRenderer.java:354)
    	at edu.berkeley.cs.nlp.ocular.main.InitializeFont.run(InitializeFont.java:75)
    	at edu.berkeley.cs.nlp.ocular.main.OcularRunnable.doMain(OcularRunnable.java:25)
    	at edu.berkeley.cs.nlp.ocular.main.InitializeFont.main(InitializeFont.java:61)
    	... 6 more

    What array is empty, and how can I overcome this, please? My text sample has letters, numbers, and punctuation, so I'm not sure what is missing.

    Thank you, David

    opened by bleckley 0
  • Error: Could not find or load main class .main.class=edu.berkeley.cs.nlp.ocular.main.InitializeLanguageModel

    Hello, I am trying to train an OCR model for some WW2 documents. I have no previous experience with Java. When I try to set up and run the program I encounter this error when initializing a language model for English. Does anyone know what is happening?

    PS F:\ocr\ocular>   java -Done-jar.main.class=edu.berkeley.cs.nlp.ocular.main.InitializeLanguageModel -mx7g -jar ocular-0.3-SNAPSHOT-with_dependencies.jar \
    >>     -inputTextPath mytext.txt \
    >>     -outputLmPath lm/english.lmser
    >>
    Error: Could not find or load main class .main.class=edu.berkeley.cs.nlp.ocular.main.InitializeLanguageModel
    -inputTextPath : The term '-inputTextPath' is not recognized as the name of a cmdlet, function, script file, or
    operable program. Check the spelling of the name, or if a path was included, verify that the path is correct and try
    again.
    At line:2 char:5
    +     -inputTextPath mytext.txt \
    +     ~~~~~~~~~~~~~~
        + CategoryInfo          : ObjectNotFound: (-inputTextPath:String) [], CommandNotFoundException
        + FullyQualifiedErrorId : CommandNotFoundException
    
    -outputLmPath : The term '-outputLmPath' is not recognized as the name of a cmdlet, function, script file, or operable
    program. Check the spelling of the name, or if a path was included, verify that the path is correct and try again.
    At line:3 char:5
    +     -outputLmPath lm/english.lmser
    +     ~~~~~~~~~~~~~
        + CategoryInfo          : ObjectNotFound: (-outputLmPath:String) [], CommandNotFoundException
        + FullyQualifiedErrorId : CommandNotFoundException
    
    opened by BlueSkyLT 0
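
The error text in the report above ("Could not find or load main class .main.class=...") suggests that PowerShell split the unquoted `-Done-jar.main.class=...` argument at the first dot. The trailing backslashes are bash line continuations, which PowerShell does not honor (it uses the backtick), which would explain why the following lines were run as separate commands. Quoting the `-D` argument and keeping the command on one line is the usual workaround. Another option, sketched here, is to launch Java with an explicit argument list so that no shell parsing is involved; the function name `build_lm_command` is hypothetical, and the paths are the ones from the command above.

```python
import subprocess

def build_lm_command(jar, input_text, output_lm, heap="7g"):
    """Build the InitializeLanguageModel command as a list of separate arguments."""
    return [
        "java",
        # Kept as a single list element, so nothing can split it at the dot.
        "-Done-jar.main.class=edu.berkeley.cs.nlp.ocular.main.InitializeLanguageModel",
        f"-mx{heap}",
        "-jar", jar,
        "-inputTextPath", input_text,
        "-outputLmPath", output_lm,
    ]

cmd = build_lm_command(
    "ocular-0.3-SNAPSHOT-with_dependencies.jar",
    "mytext.txt",
    "lm/english.lmser",
)
print(cmd)

# To actually run it (requires Java and the jar in the current directory):
# subprocess.run(cmd, check=True)
```

Because each argument is its own list element, the same script behaves identically from PowerShell, cmd.exe, or bash.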
  • Using the option -allowedFontsPath does not work as expected, please assist

    Hi there,

    I am experimenting with Ocular to train an OCR model for 18th-century Dutch print. The option to initialize a font using only a few of the fonts installed on my computer is interesting, and I would like to try it. In my case, however, including the option -allowFontsPath font_refs/dutch18thCE_fonts.txt does not seem to do anything. I have attached the dutch18thCE_fonts.txt file here. Is it wrong?

    Thanks for any feedback on this.

    Best, Marco (the Netherlands)

    dutch18thCE_fonts.txt

    opened by DutchPirate1966 9
  • NullPointerException constantly after 10th image

    Hi,

    I'm not sure if I'm allowed to post an issue since the last issue was so long ago. Maybe ocular is not in development anymore. Anyway, here are some details: WSL1 Ubuntu 18, tested with OpenJDK 8/9/11. Thanks in advance.

    nullpointer

    opened by filippesic 0