Convert scans of handwritten notes to beautiful, compact PDFs

Overview

noteshrink

Convert scans of handwritten notes to beautiful, compact PDFs -- see full writeup at https://mzucker.github.io/2016/09/20/noteshrink.html

Requirements

  • Python 2 or 3
  • NumPy 1.10 or later
  • SciPy
  • ImageMagick
  • Image module from PIL or Pillow

Usage

./noteshrink.py IMAGE1 [IMAGE2 ...]

Building the examples (already in example_output):

make

Packages

Packages are available for:

Derived works

Note: Projects listed here aren't necessarily tested or endorsed by me -- use with care!

Comments
  • SyntaxError: invalid syntax

    I downloaded release 0.1, installed the requirements via pip, and ran make.

    make
    mkdir -p example_output && \
    cd example_output && \
    ../noteshrink.py -O -g -w -s 20 -v 30 -o notesA.pdf ../examples/notesA*.jpg
      File "../noteshrink.py", line 159
        print '  running "{}"...'.format(cmd),
                                ^
    SyntaxError: invalid syntax
    makefile:9: recipe for target 'example_output/notesA.pdf' failed
    make: *** [example_output/notesA.pdf] Error 1
    
    opened by juh2 8
  • Plagiarism

    I just wanted to warn you that I have seen your code in another repo. Its noteshrink.py looks the same, except that line 7 reads "Created by Georgy Perevozchikov (gosha20777) 2018."

    opened by kpym 5
  • Warning: Error opening image

    warning: error opening C:\Users\Kartikey
    warning: error opening Kushwah\Downloads\Compressed\noteshrink-master\image.jpg
    running PDF command "convert output.pdf"...
    Invalid drive specification.
    warning: PDF command failed

    opened by 1stdevfriend 3
  • Added Python 3 Compatibility

    Python 3 compatibility is implemented by removing has_key and using __future__ for print function support. Also added ImageMagick as a requirement in README.md (since the convert command is part of ImageMagick).

    opened by tcyrus 3
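The two changes named in this pull request can be sketched as follows. This is a minimal illustration rather than the actual patch; the `tally` function and its `counts` dictionary are hypothetical names chosen for the example.

```python
from __future__ import print_function  # makes print a function on Python 2 too


def tally(words):
    """Count occurrences of each word using `in` instead of dict.has_key()."""
    counts = {}
    for word in words:
        # Python 2 only: counts.has_key(word)
        if word in counts:
            counts[word] += 1
        else:
            counts[word] = 1
    return counts


result = tally(["a", "b", "a"])
# The trailing comma of the old `print '...',` statement becomes end=' '.
print('  running "{}"...'.format("cmd"), end=' ')
print(result)
```

The same source then runs unchanged under Python 2.6+ and Python 3.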
  • Error with noteshrink: TypeError: unique() got an unexpected keyword argument 'return_counts'

    Hello,

    I wanted to try noteshrink. After installation, I tried it with the jpg files in the examples folder. Every file gives the same error: TypeError: unique() got an unexpected keyword argument 'return_counts'

    I don't know how to fix this.

    Thanks.


    steph@Pergolesi $ noteshrink notesA1.jpg

    opened notesA1.jpg
    getting palette...
    Traceback (most recent call last):
      File "/usr/local/bin/noteshrink", line 9, in <module>
        load_entry_point('noteshrink==0.1.0', 'console_scripts', 'noteshrink')()
      File "build/bdist.linux-x86_64/egg/noteshrink.py", line 582, in main
      File "build/bdist.linux-x86_64/egg/noteshrink.py", line 558, in notescan_main
      File "build/bdist.linux-x86_64/egg/noteshrink.py", line 381, in get_palette
      File "build/bdist.linux-x86_64/egg/noteshrink.py", line 106, in get_bg_color
    TypeError: unique() got an unexpected keyword argument 'return_counts'


    steph@Pergolesi $ python -V
    Python 2.7.6

    steph@Pergolesi $ cat /proc/version
    Linux version 3.19.0-32-generic (buildd@lgw01-43) (gcc version 4.8.2 (Ubuntu 4.8.2-19ubuntu1) ) #37~14.04.1-Ubuntu SMP Thu Oct 22 09:41:40 UTC 2015

    steph@Pergolesi $ pip list apt-xapian-index (0.45) apturl (0.4.1ubuntu4) argparse (1.2.1) BeautifulSoup (3.2.1) beautifulsoup4 (4.4.1) chardet (2.0.1) colorama (0.2.5) command-not-found (0.3) configglue (1.1.2) configobj (4.7.2) configparser (3.3.0r2) cssselect (0.9.1) cssutils (0.9.10) debtagshw (0.1) decorator (3.4.0) defer (1.0.6) deluge (1.3.6) dirspec (13.10) dnspython (1.11.1) duplicity (0.6.23) ecdsa (0.13) Electrum (2.0.2) eventlet (0.13.0) feedparser (5.1.3) googleplaydownloader (1.7) greenlet (0.4.2) html5lib (0.999) httplib2 (0.8) ipython (1.2.1) Jinja2 (2.7.2) kaa-base (0.99.1) kaa-metadata (0.7.8) lockfile (0.8) lxml (3.3.3) Mako (0.9.1) MarkupSafe (0.18) matplotlib (1.3.1) mechanize (0.2.5) Mirage (0.9.5.1) mysql (0.0.1) mysql-connector-python (2.0.4) MySQL-python (1.2.3) ndg-httpsclient (0.3.2) nemo-emblems (0.0.1) netifaces (0.8) nose (1.3.1) noteshrink (0.1.0) numpy (1.8.2) oauthlib (0.6.1) oneconf (0.3.7.14.04.1) PAM (0.4.2) pandas (0.13.1) paramiko (1.10.1) pbkdf2 (1.3) pdfshuffler (0.6.0) pexpect (3.1) Pillow (2.6.1) pip (1.5.4) piston-mini-client (0.7.5) protobuf (2.5.0) pyasn1 (0.1.7) pyasn1-modules (0.0.5) pychm (0.8.4) pycrypto (2.6.1) pycups (1.9.66) pycurl (7.19.3) pydns (2.3.6) pygobject (3.12.0) pyinotify (0.9.4) pyOpenSSL (0.13) pyparsing (2.0.1) pyPdf (1.13) pyserial (2.6) pysmbc (1.0.14.1) pysqlite (1.0.1) pysrt (1.0.1) python-apt (0.9.3.5ubuntu2) python-dateutil (1.5) python-debian (0.1.21-nmu2ubuntu2) python-epson-printer (1.3) python-escpos (1.0.1) python-libtorrent (0.16.13) pytz (2012c) pyusb (1.0.0b1) pyxdg (0.25) pyzmq (14.0.1) qrcode (5.1) reportlab (3.0) requests (2.2.1) requests-oauthlib (0.6.1) scipy (0.13.3) sessioninstaller (0.0.0) setuptools (3.3) simplegeneric (0.8.1) six (1.5.2) slowaes (0.1a1) sympy (0.7.4.1) system-service (0.1.6) tlslite (0.4.8) tornado (3.1.1) tweepy (3.5.0) Twisted-Core (13.2.0) Twisted-Names (13.2.0) Twisted-Web (13.2.0) urllib3 (1.7.1) uTidylib (0.2) vboxapi (1.0) wsgiref (0.1.2) wxPython 
(2.8.12.1) wxPython-common (2.8.12.1) zope.interface (4.0.5)

    opened by stephanova 3
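The `return_counts` argument of `numpy.unique` was added in NumPy 1.9, while the pip list in this report shows numpy 1.8.2, which is older than the NumPy 1.10 named under Requirements. The sketch below prints the installed version and makes a call of the same shape as the one that fails; `most_common_value` is a hypothetical helper, not a function from noteshrink.py.

```python
import numpy as np


def most_common_value(values):
    """Return the most frequent entry of a 1-D integer array."""
    # return_counts needs NumPy >= 1.9; older versions raise the TypeError above.
    unique, counts = np.unique(values, return_counts=True)
    return unique[counts.argmax()]


print("NumPy version:", np.__version__)
print(most_common_value(np.array([3, 7, 7, 1, 7, 3])))
```

If the call raises the same TypeError, upgrading NumPy (for example with pip) is the likely fix.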
  • Value error on reshaping image

    Python 2.7.12 (default, Jul 1 2016, 15:12:24)

    Traceback (most recent call last):
      File "./noteshrink.py", line 585, in <module>
        main()
      File "./noteshrink.py", line 582, in main
        notescan_main(options=get_argument_parser().parse_args())
      File "./noteshrink.py", line 557, in notescan_main
        samples = sample_pixels(img, options)
      File "./noteshrink.py", line 340, in sample_pixels
        pixels = img.reshape((-1, 3))
    ValueError: total size of new array must be unchanged
    

    I have installed all requirements except notescanm, which isn't pip-installable.

    opened by rachmadaniHaryono 3
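One plausible cause of this error, which the report does not confirm, is an input image that is not 3-channel RGB (for example RGBA or grayscale), so that its value count is not a multiple of 3. A minimal sketch of a workaround, assuming Pillow is used for loading; `load_rgb_pixels` is a hypothetical helper, not part of noteshrink.py.

```python
import numpy as np
from PIL import Image


def load_rgb_pixels(pil_img):
    """Return an (N, 3) array of RGB pixels for any Pillow image mode."""
    if pil_img.mode != 'RGB':
        pil_img = pil_img.convert('RGB')  # drops alpha, expands grayscale
    img = np.array(pil_img)
    return img.reshape((-1, 3))


# A 4x5 RGBA image holds 4*5*4 = 80 values, which cannot be reshaped to (-1, 3).
rgba = Image.new('RGBA', (4, 5), (255, 255, 255, 255))
pixels = load_rgb_pixels(rgba)
print(pixels.shape)
```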
  • Multi-version Support?

    Hi Matt, awesome work and a very enjoyable write-up, thank you.

    I've looked over the code briefly, and it's very neatly structured and compartmentalised. None of the dependencies look like no-gos for Python3, and the file-handling seems to be compartmentalised by PIL/Pillow.

    So, I'm wondering why this is Python2 only? If it's merely personal preference and all it would take are some modernisations from __future__ and some print function calls (or six, if it came to that), would you accept pull requests to make this work on both?

    opened by cathalgarvey 3
  • Unit testing

    I recently found noteshrink and I'm fond of its utility! Thanks for giving it to the world!

    In order to confidently make noteshrink compatible with Python 3 (per issue #4), introduction of unit tests would be beneficial. This will provide a baseline to compare behaviors as well as help with alteration of existing features/introduction of new features over time.

    I would like to help with this, but I want to feel this out before I get the horse too far ahead of the cart! The code structure seems amenable to unit testing for the most part.

    opened by daneah 2
  • Publish it on PyPI

    This way it would be easier for users to install your program (i.e. pip install noteshrink). I've added the setup.py (in this PR), so you'd just have to run python setup.py register (it would ask to authorize on PyPI the first time you run it) and python setup.py publish.

    opened by notpushkin 2
  • Licensing information

    opened by jakub-olczyk 2
  • PDF converter requirement

    By default it tries to convert the output image to PDF, using a command-line program called "convert". This doesn't seem to be one of the listed dependencies - can someone explain what it is and how to install it?

    BTW on macOS I can get it working with sips using this parameter: -c 'sips %i -s format pdf --out %o'

    opened by davidread 1
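The `convert` program is part of ImageMagick, which is listed under Requirements. A quick way to see which converter is on the PATH is sketched below; the candidate list and the `find_pdf_converter` name are assumptions made for this example (`magick` is the ImageMagick 7 entry point, `sips` is the macOS tool mentioned in this comment).

```python
import shutil


def find_pdf_converter(candidates=("convert", "magick", "sips")):
    """Return (name, path) of the first candidate found on PATH, else None."""
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return name, path
    return None


found = find_pdf_converter()
if found is None:
    print("No converter found; install ImageMagick or pass your own command with -c")
else:
    print("Found {} at {}".format(*found))
```

Note that `shutil.which` requires Python 3.3 or later.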
  • Doc photos taken by cameras

    Hello,

    First of all, gotta say that this work is insanely good. I've been trying to apply it to pictures taken by cameras instead, but as this work is mainly aimed at scanned documents, I've been getting fairly noisy results (images below). Do you have any tips for handling documents with fairly uneven lighting (not as uniform as the light from a scanner)? Would it be possible to get a clean white background on the image?

    Thank you in advance

    Original: [image: Saved_file copy 3]

    With noteshrink: [image: page0000]

    opened by zqngetsu96 4
  • ValueError: a must be greater than 0 unless no samples are taken

    I have got this error:

      File "E:\projects\noteshrink\noteshrink.py", line 590, in <module>
        main()
      File "E:\projects\noteshrink\noteshrink.py", line 586, in main
        notescan_main(options=get_argument_parser().parse_args())
      File "E:\projects\noteshrink\noteshrink.py", line 561, in notescan_main
        palette = get_palette(samples, options)
      File "E:\projects\noteshrink\noteshrink.py", line 387, in get_palette
        centers, _ = kmeans(samples[fg_mask].astype(np.float32),
      File "C:\Users\bebag\AppData\Local\Programs\Python\Python39\lib\site-packages\scipy\cluster\vq.py", line 454, in kmeans
        guess = _kpoints(obs, k)
      File "C:\Users\bebag\AppData\Local\Programs\Python\Python39\lib\site-packages\scipy\cluster\vq.py", line 480, in _kpoints
        idx = np.random.choice(data.shape[0], size=k, replace=False)
      File "mtrand.pyx", line 902, in numpy.random.mtrand.RandomState.choice
    ValueError: a must be greater than 0 unless no samples are taken
    opened by VaskMykola 1
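The last frames of this traceback show `np.random.choice` being asked to pick initial centers from zero rows, which suggests that `samples[fg_mask]` was empty, i.e. no sampled pixel was classified as foreground (a blank page would do this). A minimal sketch of a guard, assuming the calling code can cope with an empty result; `foreground_centers` is a hypothetical wrapper, not the author's function.

```python
import numpy as np
from scipy.cluster.vq import kmeans


def foreground_centers(samples, fg_mask, num_colors):
    """Cluster foreground samples, or return an empty array if there are none."""
    fg = samples[fg_mask].astype(np.float32)
    if fg.shape[0] == 0:
        # Nothing to cluster: kmeans would fail when picking initial centers.
        return np.empty((0, samples.shape[1]), dtype=np.float32)
    k = min(num_colors, fg.shape[0])  # cannot ask for more centers than points
    centers, _ = kmeans(fg, k)
    return centers


samples = np.full((10, 3), 255, dtype=np.uint8)  # a blank white page
fg_mask = np.zeros(10, dtype=bool)               # no foreground detected
print(foreground_centers(samples, fg_mask, 7).shape)
```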
  • ValueError: a must be greater than 0 unless no samples are taken

    I've got this error:

    Traceback (most recent call last):
      File "E:\projects\noteshrink\noteshrink.py", line 590, in <module>
        main()
      File "E:\projects\noteshrink\noteshrink.py", line 586, in main
        notescan_main(options=get_argument_parser().parse_args())
      File "E:\projects\noteshrink\noteshrink.py", line 561, in notescan_main
        palette = get_palette(samples, options)
      File "E:\projects\noteshrink\noteshrink.py", line 387, in get_palette
        centers, _ = kmeans(samples[fg_mask].astype(np.float32),
      File "C:\Users\bebag\AppData\Local\Programs\Python\Python39\lib\site-packages\scipy\cluster\vq.py", line 454, in kmeans
        guess = _kpoints(obs, k)
      File "C:\Users\bebag\AppData\Local\Programs\Python\Python39\lib\site-packages\scipy\cluster\vq.py", line 480, in _kpoints
        idx = np.random.choice(data.shape[0], size=k, replace=False)
      File "mtrand.pyx", line 902, in numpy.random.mtrand.RandomState.choice
    ValueError: a must be greater than 0 unless no samples are taken

    opened by VaskMykola 0
  • Add C# and VB.Net project to derived works in your readme

    Hello,

    FDNCRED and I have rewritten your project in C# and VB.Net with the Accord.NET Framework. Please add a link to your readme under derived works. https://github.com/Phreak87/NoteShrink

    opened by Phreak87 0
  • Convert the result to vector graphics

    Hi,

    Depending on the input, the output may be shrunk even more when converted to vector graphics, as you also noticed in https://mzucker.github.io/2018/05/14/maptrace.html :)

    Here is a quick and dirty example: ./noteshrink.py -e ".pnm" -P "convert %i %o" -c "potrace -b pdf -o %o %i" examples/tree.jpg

    Maybe the example can be added to the README?

    Unfortunately potrace only supports binary images.

    Ciao, Antonio

    opened by ao2 1
This is used to convert a string to an Image with Handwritten Characters.

Text-to-Handwriting-using-python This is used to convert a string to an Image with Handwritten Characters. text_to_handwriting(string: str, save_to: s

Akashdeep Mahata 3 Aug 15, 2022
This can be used to convert text in a file to handwritten text.

TextToHandwriting This can be used to convert text to handwriting. Clone this project or download the code. Run TextToImage.py give the filename of th

Ashutosh Mahapatra 2 Feb 6, 2022
Extract tables from scanned image PDFs using Optical Character Recognition.

ocr-table This project aims to extract tables from scanned image PDFs using Optical Character Recognition. Install Requirements Tesseract OCR sudo apt

Abhijeet Singh 209 Dec 6, 2022
Python library to extract tabular data from images and scanned PDFs

Overview ExtractTable - API to extract tabular data from images and scanned PDFs The motivation is to make it easy for developers to extract tabular d

Org. Account 165 Dec 31, 2022
Detect text blocks and OCR poorly scanned PDFs in bulk. Python module available via pip.

doc2text doc2text extracts higher quality text by fixing common scan errors Developing text corpora can be a massive pain in the butt. Much of the tex

Joe Sutherland 1.3k Jan 4, 2023
Detect handwritten words in a text-line (classic image processing method).

Word segmentation Implementation of scale space technique for word segmentation as proposed by R. Manmatha and N. Srimal. Even though the paper is fro

Harald Scheidl 190 Jan 3, 2023
Handwritten Text Recognition (HTR) using TensorFlow 2.x

Handwritten Text Recognition (HTR) system implemented using TensorFlow 2.x and trained on the Bentham/IAM/Rimes/Saint Gall/Washington offline HTR data

Arthur Flôr 160 Dec 21, 2022
Handwritten Number Recognition using CNN and Character Segmentation

Handwritten-Number-Recognition-With-Image-Segmentation Info About this repository This Repository is aimed at reading handwritten images of numbers an

Sparsha Saha 17 Aug 25, 2022
Handwritten Text Recognition (HTR) system implemented with TensorFlow.

Handwritten Text Recognition with TensorFlow Update 2021: more robust model, faster dataloader, word beam search decoder also available for Windows Up

Harald Scheidl 1.5k Jan 7, 2023
OCR software for recognition of handwritten text

Handwriting OCR The project tries to create software for recognition of a handwritten text from photos (also for Czech language). It uses computer vis

Břetislav Hájek 562 Jan 3, 2023
Use Convolutional Recurrent Neural Network to recognize the Handwritten line text image without pre segmentation into words or characters. Use CTC loss Function to train.

Handwritten Line Text Recognition using Deep Learning with Tensorflow Description Use Convolutional Recurrent Neural Network to recognize the Handwrit

sushant097 224 Jan 7, 2023
Handwritten Text Recognition (HTR) system implemented with TensorFlow (TF) and trained on the IAM off-line HTR dataset. This Neural Network (NN) model recognizes the text contained in the images of segmented words.

Handwritten-Text-Recognition Handwritten Text Recognition (HTR) system implemented with TensorFlow (TF) and trained on the IAM off-line HTR dataset. T

null 27 Jan 8, 2023
Apply different text recognition services to images of handwritten documents.

Handprint The Handwritten Page Recognition Test is a command-line program that invokes HTR (handwritten text recognition) services on images of docume

Caltech Library 117 Jan 2, 2023
This repo contains a script that allows us to find range of colors in images using openCV, and then convert them into geo vectors.

Vectorizing color range This repo contains a script that allows us to find range of colors in images using openCV, and then convert them into geo vect

Development Seed 9 Jul 27, 2022
Convert Text-to Handwriting Using Python

Convert Text-to Handwriting Using Python Description In this project we'll use python library that's "pywhatkit" for converting text to handwriting. t

null 8 Nov 19, 2022
This tool will help you convert your text to handwriting xD

So your teacher asked you to upload written assignments? Hate writing assignments? This tool will help you convert your text to handwriting xD

Saurabh Daware 4.2k Jan 7, 2023
MeshToGeotiff - A fast Python algorithm to convert a 3D mesh into a GeoTIFF

MeshToGeotiff - A fast Python algorithm to convert a 3D mesh into a GeoTIFF Python class for converting (very fast) 3D Meshes/Surfaces to Raster DEMs

null 8 Sep 10, 2022
Convert PDF/Image to TXT using EasyOcr - the best OCR engine available!

PDFImage2TXT - DOWNLOAD INSTALLER HERE What can you do with it? Convert scanned PDFs to TXT. Convert scanned Documents to TXT. No coding required!! In

Hans Alemão 2 Feb 22, 2022
This is a GUI for scraping PDFs with the help of optical character recognition, making it easier than ever to scrape PDFs.

pdf-scraper-with-ocr With this tool I am aiming to facilitate the work of those who need to scrape PDFs either by hand or using tools that doesn't imp

Jacobo José Guijarro Villalba 75 Oct 21, 2022
Scans pdfs for links written in plaintext and checks if they are active or returns an error code.

Scans pdfs for links written in plaintext and checks if they are active or returns an error code. It then generates a report of its findings. Extract references (pdf, url, doi, arxiv) and metadata from a PDF.

Marshal Miller 22 Nov 21, 2022