AWS Blog post code for running feature-extraction on images using AWS Batch and Cloud Development Kit (CDK).

AWS Samples

Last update: Oct 18, 2022

Related tags

Third-party APIs Wrappers python aws machine-learning computer-vision deep-learning aws-batch cdk

Overview

Batch processing with AWS Batch and CDK

Welcome

This repository demonstrates how to provision the infrastructure needed to run a job on AWS Batch using the Cloud Development Kit (CDK). The AWS Batch job reads images from an S3 bucket, runs inference with an image-to-vector computer vision model, and stores the results in DynamoDB. The code can be modified to fit other batch transformations you might want to perform.
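As a rough illustration of the final "store the results" step, the sketch below turns one embedding vector into an item that could be written to DynamoDB. The attribute names (`ImageKey`, `Embedding`) and both helper names are assumptions made for this example and are not taken from the repository. The conversion matters because the boto3 DynamoDB resource API does not accept Python floats and expects `Decimal` values instead.

```python
from decimal import Decimal
from typing import Any, Dict, Sequence


def embedding_to_item(image_key: str, embedding: Sequence[float]) -> Dict[str, Any]:
    """Build a DynamoDB-ready item for one image (hypothetical schema)."""
    return {
        "ImageKey": image_key,
        # Go through str() so each Decimal keeps the short float representation.
        "Embedding": [Decimal(str(value)) for value in embedding],
    }


def store_embedding(table: Any, image_key: str, embedding: Sequence[float]) -> None:
    """Write one item; `table` is expected to behave like a boto3 Table resource."""
    table.put_item(Item=embedding_to_item(image_key, embedding))
```

In a real job, `table` would come from `boto3.resource("dynamodb").Table(...)` and the embedding from the model's output for each image read from S3.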

This code repository accompanies the AWS DevOps Blog post "Deep learning image vector embeddings at scale using AWS Batch and CDK".

Pre-requisites

Create and activate a Python virtualenv (macOS and Linux), then install the Python dependencies:

$ python3 -m venv .env
$ source .env/bin/activate
$ pip install -r requirements.txt

Install the latest version of the AWS CDK CLI:

$ npm i -g aws-cdk
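Before deploying, it can help to confirm that the command-line tools this project relies on are actually on your PATH. The sketch below is one way to check; the tool list is an assumption based on the commands used in this README (`python3`, `npm`, `cdk`) plus Docker, which the usage steps also require.

```python
import shutil
from typing import Dict, Iterable, List, Optional

# Assumed tool names; adjust if your system uses different executables.
REQUIRED_TOOLS = ("python3", "npm", "cdk", "docker")


def find_tools(tools: Iterable[str] = REQUIRED_TOOLS) -> Dict[str, Optional[str]]:
    """Map each tool name to its resolved path, or None when it is not on PATH."""
    return {tool: shutil.which(tool) for tool in tools}


def missing_tools(found: Dict[str, Optional[str]]) -> List[str]:
    """Return the sorted names of tools that could not be located."""
    return sorted(name for name, path in found.items() if path is None)


if __name__ == "__main__":
    missing = missing_tools(find_tools())
    if missing:
        print("Missing from PATH: " + ", ".join(missing))
    else:
        print("All required tools were found.")
```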

Usage

The current code creates the AWS Batch infrastructure, an S3 bucket to read the data from, and a DynamoDB table to write the batch operation results to. Once the infrastructure is provisioned through AWS CDK, upload the images you want to process to the created S3 bucket. Then go to the created AWS Lambda function and submit a job. This triggers a job execution on AWS Batch, and you should see the results in the created DynamoDB table.
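The upload and job-submission steps described above can also be scripted. The following is a minimal sketch using boto3, not the repository's own tooling: the bucket name, Lambda function name, and local directory are values you would take from your own deployment, and the helper names are hypothetical. Stripping slashes from the prefixes is an assumption made so that a path such as `/images` matches the `images` form used in the Lambda event.

```python
import json
from pathlib import Path
from typing import Iterable


def build_payload(prefixes: Iterable[str]) -> str:
    """Serialize the Lambda event: a "Paths" list of S3 prefixes."""
    return json.dumps({"Paths": [prefix.strip("/") for prefix in prefixes]})


def upload_images(bucket: str, local_dir: str, prefix: str = "images") -> int:
    """Upload every file in local_dir under the given prefix; return the count."""
    import boto3  # imported lazily so build_payload works without boto3

    s3 = boto3.client("s3")
    count = 0
    for path in sorted(Path(local_dir).iterdir()):
        if path.is_file():
            s3.upload_file(str(path), bucket, f"{prefix}/{path.name}")
            count += 1
    return count


def submit_job(function_name: str, prefixes: Iterable[str]) -> int:
    """Invoke the deployed Lambda function and return the HTTP status code."""
    import boto3

    response = boto3.client("lambda").invoke(
        FunctionName=function_name,
        Payload=build_payload(prefixes).encode("utf-8"),
    )
    return response["StatusCode"]
```

A typical run would call `upload_images(...)` once and then `submit_job(...)` with the same prefix.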

To deploy and run the batch inference, follow these steps:

Make sure AWS CDK is installed and working, that all the dependencies of this project defined in the requirements.txt file are installed, and that Docker is installed and configured in your environment;
Set the CDK_DEPLOY_ACCOUNT environment variable to the name of the AWS account you want to use (pre-defined with the AWS CLI);
Set the CDK_DEPLOY_REGION environment variable to the name of the region you want to deploy the infrastructure in (e.g. 'us-west-2');
Run cdk deploy in the root of this project and wait for the deployment to finish successfully;
Upload the images you need to process to the newly created S3 bucket under an S3 path (e.g. /images). Use this path in the next step;
Go to the created AWS Lambda function and execute it with the following JSON:

{
  "Paths": [
    "images"
  ]
}

In the AWS console, go to AWS Batch and make sure the jobs are submitted and running successfully;
Open the created DynamoDB table and validate that the results are there;
You can now use a DynamoDB client to read and consume the results.
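For the last step, one possible way to consume the results is a paginated scan of the table. This is a sketch under assumptions: the table name and region come from your own deployment, the helper names are hypothetical, and a full scan is only reasonable for small result sets.

```python
from typing import Any, Dict, Iterator


def iter_items(table: Any) -> Iterator[Dict[str, Any]]:
    """Yield every item in the table, following LastEvaluatedKey pagination."""
    kwargs: Dict[str, Any] = {}
    while True:
        page = table.scan(**kwargs)
        yield from page.get("Items", [])
        last_key = page.get("LastEvaluatedKey")
        if last_key is None:
            break
        # Resume the scan where the previous page stopped.
        kwargs["ExclusiveStartKey"] = last_key


def open_table(table_name: str, region: str) -> Any:
    """Return a boto3 Table resource for the deployed results table."""
    import boto3  # imported lazily so iter_items can be tested without boto3

    return boto3.resource("dynamodb", region_name=region).Table(table_name)
```

For example, `for item in iter_items(open_table(name, "us-west-2")): print(item)` would print each stored result, where `name` is the table created by the stack.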

License

This library is licensed under the MIT-0 License. See the LICENSE file.


Comments

Requirements, server and performance
Thank you for the great example!

Could you give some pointers on a few open questions:

How are the requirements for the processing code defined? The requirements.txt in the project root only lists CDK libraries.

Which virtual server did you test the demo on, and what performance (say, images per second) did you achieve? We're working on a similar project and want to check what resources to use.
opened by karelin 2
Automm cv benchmark stack

Issue #, if available:

Description of changes: This PR adds an AWS Batch-based benchmarking infrastructure for AutoGluon MultiModal.

Note: this won't work out of the box because of the rendering issue with shared_memory_size. The README will be updated with instructions for a workaround.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

opened by suzhoum 1
User access policy issues

Unfortunately, the README does not describe how to set up the user and access rights needed to run the example :(

Just creating an IAM user with sufficient policies does not work. In my case, after some study and setting the permissions, I got stuck on this error:

AccessDeniedException: User: arn:aws:iam::XXXXXXXXXXXX:user/YYYYYYYY is not authorized to perform: ecr:CreateRepository on resource: arn:aws:ecr:eu-central-1:XXXXXXXXXXXX:repository/aws-cdk/assets because no identity-based policy allows the ecr:CreateRepository action

Help/Step-by-step description for non-DevOps people will be greatly appreciated!

opened by karelin 1

