A document format conversion service based on Pandoc.

David Lougheed

Last update: Jul 18, 2022


reformed


Usage

The API specification for the Reformed server is as follows:

`GET /api/v1/formats`: Lists available input and output formats for documents

Response

```json
{
  "input": {
    "commonmark": {
      "mime": "text/markdown",
      "ext": "md",
      "detail": "CommonMark Markdown"
    },
    "docx": {
      "mime": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
      "ext": "docx",
      "detail": "Word docx"
    },
    // ...
  },
  "output": {
    "commonmark": {
      "mime": "text/markdown",
      "ext": "md",
      "detail": "CommonMark Markdown"
    },
    "docx": {
      "mime": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
      "ext": "docx",
      "detail": "Word docx"
    },
    // ...
    "latex": {
      "mime": "text/x-tex",
      "ext": "tex",
      "detail": "LaTeX"
    },
    // ...
  }
}
```
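
A client can use this listing to check that a conversion pair is supported before uploading a document. The following is a minimal Python sketch, assuming the response has already been fetched and decoded; the helpers `supports_conversion` and `output_extension` are hypothetical and not part of Reformed, and the sample is a trimmed-down copy of the response above.

```python
import json


def supports_conversion(formats: dict, source: str, target: str) -> bool:
    """True if `source` is a listed input format and `target` a listed output format."""
    return source in formats.get("input", {}) and target in formats.get("output", {})


def output_extension(formats: dict, target: str) -> str:
    """File extension the server reports for an output format (e.g. "tex" for latex)."""
    return formats["output"][target]["ext"]


# Trimmed sample of the documented response; a real listing contains many more formats.
sample = json.loads("""
{
  "input": {
    "docx": {
      "mime": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
      "ext": "docx",
      "detail": "Word docx"
    }
  },
  "output": {
    "latex": {"mime": "text/x-tex", "ext": "tex", "detail": "LaTeX"}
  }
}
""")
```

Checking the pair client-side saves an upload that the server would reject anyway, and the `ext` field gives a sensible file name for the downloaded result.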

`POST /api/v1/from/[input format]/to/[output format]`: Converts a document from one format to another

Request

The request should be made with the `multipart/form-data` encoding.

Parameters

The request parameters are as follows:

File `document`

Document to convert. For example, to convert a docx file to a pdf file, the following cURL command will work:

```sh
curl -X POST -F 'document=@test.docx' http://localhost:8000/api/v1/from/docx/to/pdf > test.pdf
```
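
For clients that cannot shell out to cURL, an equivalent request can be built with Python's standard library. This is a sketch, not part of Reformed: the helper `build_convert_request` is hypothetical, the host is the one from the cURL example, and the multipart body is assembled by hand only to avoid third-party dependencies.

```python
import urllib.request
import uuid


def build_convert_request(base_url, source, target, filename, content, fields=None):
    """Build (but do not send) a multipart/form-data POST like the cURL example.

    `content` is the raw file bytes; `fields` holds extra form values such as
    {"bundle": "1"}. Hypothetical helper, not part of Reformed.
    """
    boundary = uuid.uuid4().hex
    parts = []
    # Plain form fields (e.g. bundle, Pandoc flags, integer options).
    for name, value in (fields or {}).items():
        parts.append(
            (
                f"--{boundary}\r\n"
                f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                f"{value}\r\n"
            ).encode()
        )
    # The file itself goes in the `document` field.
    parts.append(
        (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="document"; filename="{filename}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        ).encode()
        + content
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return urllib.request.Request(
        f"{base_url}/api/v1/from/{source}/to/{target}",
        data=b"".join(parts),
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


# Usage against a locally running server (not executed here):
# with open("test.docx", "rb") as f:
#     req = build_convert_request("http://localhost:8000", "docx", "pdf", "test.docx", f.read())
# with urllib.request.urlopen(req) as resp, open("test.pdf", "wb") as out:
#     out.write(resp.read())
```

In practice a library such as `requests` handles the multipart encoding for you; the hand-built body here just makes the wire format explicit.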

Boolean `bundle`

Whether to bundle the created document and any media (extracted pictures from e.g. a .docx file) together in a .zip archive.

If the form value for this option is anything except a blank string, it will be treated as True.

If this option is set but no media is generated, the response is a .zip archive containing only the converted document.

If media is generated but this option is not set, the extracted media is discarded and only the document is returned.
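
When `bundle` is set, a client has to unpack the returned archive itself. A minimal Python sketch follows; the helper `unpack_bundle` is hypothetical, and because the archive's internal layout (file names, media folder) is not documented here, it makes no assumption about it and simply returns every file it contains.

```python
import io
import zipfile


def unpack_bundle(archive: bytes) -> dict:
    """Return {member name: file bytes} for every file in a bundled .zip response."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        return {
            name: zf.read(name)
            for name in zf.namelist()
            if not name.endswith("/")  # skip directory entries
        }
```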

Boolean Pandoc flags

This endpoint supports the following Pandoc standalone flags: ascii, gladtex, html-q-tags, incremental, listings, mathml, no-highlight, number-sections, preserve-tabs, reference-links, section-divs, standalone, strip-comments, toc.

If the form value for a given flag is anything except a blank string, it will be added to the Pandoc call.

See the Pandoc manual for more information on these flags' effects.
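
The rule above can be sketched as a hypothetical helper `boolean_flag_args` that turns form values into Pandoc command-line switches. It treats only the empty string as blank, which is an assumption (the server might also strip whitespace); note that under this rule even a value such as `false` switches a flag on.

```python
BOOLEAN_FLAGS = (
    "ascii", "gladtex", "html-q-tags", "incremental", "listings", "mathml",
    "no-highlight", "number-sections", "preserve-tabs", "reference-links",
    "section-divs", "standalone", "strip-comments", "toc",
)


def boolean_flag_args(form: dict) -> list:
    """Map form values to Pandoc switches: any non-empty value enables the flag."""
    return [f"--{flag}" for flag in BOOLEAN_FLAGS if form.get(flag, "") != ""]
```

For example, a form with `toc=1`, `standalone=false` and an empty `mathml` yields `--standalone` and `--toc`, so clients should omit a field entirely (or send it blank) to leave a flag off.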

Pandoc flags with choices

This endpoint supports the following Pandoc flags which have specific choices: eol, markdown-headings, reference-location, top-level-division, track-changes, wrap.

If the form value for a given flag is one of that flag's accepted choices, it will be added to the Pandoc call.

See the Pandoc manual for more information on these flags' effects.
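
A sketch of this validation, using the choices the Pandoc manual lists for each flag. The table `CHOICE_FLAGS` and the helper `choice_flag_args` are hypothetical; Reformed's own list of accepted values is not shown here and may differ by Pandoc version. Invalid values are simply skipped in this sketch; whether the server ignores them or reports an error is not stated above.

```python
# Accepted values per flag, taken from the Pandoc manual (may vary across Pandoc versions).
CHOICE_FLAGS = {
    "eol": {"crlf", "lf", "native"},
    "markdown-headings": {"setext", "atx"},
    "reference-location": {"block", "section", "document"},
    "top-level-division": {"default", "section", "chapter", "part"},
    "track-changes": {"accept", "reject", "all"},
    "wrap": {"auto", "none", "preserve"},
}


def choice_flag_args(form: dict) -> list:
    """Emit --flag=value only for values in the flag's set of accepted choices."""
    args = []
    for flag, choices in CHOICE_FLAGS.items():
        value = form.get(flag)
        if value in choices:
            args.append(f"--{flag}={value}")
    return args
```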

Integer `columns` (Pandoc option)

If specified and a valid integer, this will add the --columns=XX option to the Pandoc call. The value is bounded to 1 <= columns <= 300 by Reformed.

See the Pandoc manual's description for more.

Integer `dpi` (Pandoc option)

If specified and a valid integer, this will add the --dpi=XX option to the Pandoc call. The value is bounded to 36 <= dpi <= 600 by Reformed.

See the Pandoc manual's description for more.

Integer `toc-depth` (Pandoc option)

If specified and a valid integer, this will add the --toc-depth=XX option to the Pandoc call. The value is bounded to 1 <= toc-depth <= 6 by Reformed.

See the Pandoc manual's description for more.
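
The three integer options share the same pattern, sketched below as a hypothetical helper `integer_option_args`. It reads "bounded" as clamping out-of-range values into the documented range and silently omits values that are not integers; both behaviours are assumptions, since the server might instead reject such input.

```python
# Documented bounds (inclusive) for each integer option.
INTEGER_BOUNDS = {"columns": (1, 300), "dpi": (36, 600), "toc-depth": (1, 6)}


def integer_option_args(form: dict) -> list:
    """Turn integer form values into Pandoc options, clamped to the documented bounds."""
    args = []
    for name, (low, high) in INTEGER_BOUNDS.items():
        raw = form.get(name)
        if raw is None:
            continue  # option not specified
        try:
            value = int(raw)
        except ValueError:
            continue  # not a valid integer: option omitted
        value = min(max(value, low), high)  # clamp into [low, high]
        args.append(f"--{name}={value}")
    return args
```

Under this reading, `columns=500` becomes `--columns=300`, `toc-depth=0` becomes `--toc-depth=1`, and `dpi=abc` is dropped.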

Response

A binary stream with the MIME type listed for the output format in `GET /api/v1/formats`. The `Content-Disposition` header is always set to `attachment`, so browsers download the file instead of rendering it.

If an error is encountered, this will instead be a JSON response with an error key specifying what went wrong.
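
A client therefore has to tell a converted document apart from an error payload. The sketch below, with hypothetical names `ConversionError` and `read_conversion_response`, does this by content type. Because a conversion to Pandoc's own `json` output format would also return JSON, it raises only when the decoded body actually has an `error` key. HTTP status codes are not documented above, so the sketch does not rely on them.

```python
import json


class ConversionError(Exception):
    """Raised when the server answers with a JSON error payload."""


def read_conversion_response(content_type: str, body: bytes) -> bytes:
    """Return the converted document, or raise ConversionError on an error payload."""
    if content_type.split(";")[0].strip() == "application/json":
        payload = json.loads(body)
        if isinstance(payload, dict) and "error" in payload:
            raise ConversionError(payload["error"])
    return body
```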

Configuration

A few configuration environment variables are available for the Reformed server, listed here with their default values:

```sh
# Maximum buffer size for requests, in bytes - mostly useful for controlling file uploads
# Defaults to 25 MiB
REFORMED_MAX_BUFFER_SIZE=26214400

# Port to accept requests on
REFORMED_PORT=8000

# Number of worker processes to start
REFORMED_WORKERS=2
```
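
The following sketch shows one way these variables and their defaults could be read; `load_config` is a hypothetical helper that mirrors the documented values, not Reformed's actual configuration code. It also spells out where 26214400 comes from: 25 × 1024 × 1024 bytes, i.e. 25 MiB.

```python
import os

DEFAULTS = {
    "REFORMED_MAX_BUFFER_SIZE": 25 * 1024 * 1024,  # 26214400 bytes = 25 MiB
    "REFORMED_PORT": 8000,
    "REFORMED_WORKERS": 2,
}


def load_config(environ=os.environ) -> dict:
    """Read each setting from the environment, falling back to its default."""
    return {key: int(environ.get(key, default)) for key, default in DEFAULTS.items()}
```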

Deploying

Main-branch and tagged releases are both automatically published as Docker images to the GitHub Container Registry. These images can be run in the standard fashion as a daemon, and expose a Tornado HTTP server on port 8000.

See the package listing for more information on pulling the image.

Developing and Testing

The development requirements are specified in `requirements.dev.txt`.

To test with coverage, use the following command:

```sh
coverage run -m unittest -v
```

To run the linter, use the following command:

```sh
flake8 reformed
```


Releases

v0.1.0 (Aug 14, 2021)

Initial release, missing a few things (mostly bibliography-related).
