This repository provides a set of functions to extract paragraphs from AWS Textract responses.

Juan Anzola

Last update: Jan 26, 2022

Overview

extract-paragraphs-with-aws-textract

Since AWS Textract (the AWS OCR service) does not have a native function to extract paragraphs, this repository provides a set of Python 3.X functions built on top of the AWS Python SDK (boto3) to extract paragraphs from AWS Textract responses.
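As a rough illustration of how a Textract response for a PDF can be obtained with boto3, here is a minimal sketch. It is not the code in this repository: the function name `fetch_textract_blocks` and its parameters are hypothetical, and it assumes the PDF is already in S3 and that the client passed in was created with `boto3.client("textract")`. It starts an asynchronous text-detection job, polls until the job finishes, and follows `NextToken` to collect every block.

```python
import time


def fetch_textract_blocks(client, bucket, key, poll_seconds=5.0, sleep=time.sleep):
    """Run an asynchronous text-detection job and return all of its blocks.

    `client` is assumed to be a boto3 Textract client; this helper is a
    hypothetical sketch, not part of the repository.
    """
    job = client.start_document_text_detection(
        DocumentLocation={"S3Object": {"Bucket": bucket, "Name": key}}
    )
    job_id = job["JobId"]

    # Poll until the job leaves the IN_PROGRESS state.
    while True:
        page = client.get_document_text_detection(JobId=job_id)
        status = page["JobStatus"]
        if status != "IN_PROGRESS":
            break
        sleep(poll_seconds)

    if status not in ("SUCCEEDED", "PARTIAL_SUCCESS"):
        raise RuntimeError(f"Textract job {job_id} ended with status {status}")

    # Results are paginated; follow NextToken until it is no longer returned.
    blocks = list(page.get("Blocks", []))
    while "NextToken" in page:
        page = client.get_document_text_detection(
            JobId=job_id, NextToken=page["NextToken"]
        )
        blocks.extend(page.get("Blocks", []))
    return blocks
```

Passing the client in as an argument keeps the sketch independent of how credentials are configured. For long documents, an SNS notification channel may be preferable to polling.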

PLEASE NOTE THAT:

It is assumed that your client has the necessary IAM permissions to access the AWS resources required.
Since AWS Textract analyzes PDF files by running asynchronous operations, the current version assumes that you've already created an S3 bucket and that the PDF files are already stored there. If not, see the boto3 docs for how to create a bucket and upload files.
The paragraph_constructor is an ad hoc function for my use case. You may have to adapt it based on the space between lines in your data.
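To make the line-spacing idea from the notes above concrete, here is a minimal sketch of one way to group Textract LINE blocks into paragraphs. It is not the repository's paragraph_constructor: the function name `group_lines_into_paragraphs` and the `gap_factor` parameter are hypothetical, and it assumes a single-column layout. A new paragraph starts when the vertical gap between two consecutive lines exceeds `gap_factor` times the median line height, or when the page changes.

```python
from statistics import median


def group_lines_into_paragraphs(blocks, gap_factor=1.0):
    """Join Textract LINE blocks into paragraphs using vertical spacing.

    Hypothetical sketch: `gap_factor` is an assumed tuning knob that will
    likely need adjusting for the line spacing in your documents.
    """
    lines = [b for b in blocks if b.get("BlockType") == "LINE"]
    if not lines:
        return []

    # Order lines top to bottom within each page (single-column assumption).
    lines.sort(key=lambda b: (b.get("Page", 1), b["Geometry"]["BoundingBox"]["Top"]))

    # Bounding-box values are ratios of the page size, so the threshold is too.
    typical_height = median(b["Geometry"]["BoundingBox"]["Height"] for b in lines)
    threshold = gap_factor * typical_height

    paragraphs = []
    current = [lines[0]["Text"]]
    for prev, line in zip(lines, lines[1:]):
        prev_box = prev["Geometry"]["BoundingBox"]
        box = line["Geometry"]["BoundingBox"]
        # Distance from the bottom of the previous line to the top of this one.
        gap = box["Top"] - (prev_box["Top"] + prev_box["Height"])
        new_page = line.get("Page", 1) != prev.get("Page", 1)
        if new_page or gap > threshold:
            paragraphs.append(" ".join(current))
            current = []
        current.append(line["Text"])
    paragraphs.append(" ".join(current))
    return paragraphs
```

Using the median line height rather than a fixed number keeps the threshold relative to the document's font size. Multi-column pages would need the lines split by their `Left` coordinate first.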

UPCOMING FEATURES:

Address abstract cases with the paragraph_constructor function.
Export data in different formats.
AWS CloudFormation template for a serverless architecture that runs the functions when a new object is uploaded to your S3 bucket.

Please feel free to suggest new features or improvements to the current code. <3

