AWS Blog post code for running feature-extraction on images using AWS Batch and Cloud Development Kit (CDK).

Overview

Batch processing with AWS Batch and CDK

Welcome

This repository demostrates provisioning the necessary infrastructure for running a job on AWS Batch using Cloud Development Kit (CDK). The AWS Batch job reads images from an S3 bucket, runs inference over image-to-vector computer vision model, and stores the results in DynamoDB. Code can be easily modified to fit other batch job transformations you might want to perform.

This code repository is part of the Deep learning image vector embeddings at scale using AWS Batch and CDK AWS DevOps Blog post.

Pre-requisites

  1. Create and source a Python virtualenv on MacOS and Linux, and install python dependencies:
$ python3 -m venv .env
$ source .env/bin/activate
$ pip install -r requirements.txt
  1. Install the latest version of the AWS CDK CLI:
$ npm i -g aws-cdk

Usage

Current code creates a the AWS Batch infrastructure, S3 Bucket for reading the data from, a DynamoDB table to write te batch operation results. Once the infrastructure is provisioned trough AWS CDK, you need to upload the images you want to process to the created S3 bucket. Once you've done that, go to the created AWS Lambda and submit a job. This will trigger a job execution on AWS Batch and you should see the results in the created DynamoDB table.

To deploy and run the batch inference, follow the following steps:

  1. Make sure you have AWS CDK installed and working, all the dependencies of this project defiend in the requirements.txt file, as well as having an installed and configured Docker in your environment;
  2. Set the CDK_DEPLOY_ACCOUNT ENV variable to the name of the AWS account you want to use (pre-defined with AWS CLI);
  3. Set the CDK_DEPLOY_REGION ENV variable to the name of the region you want to deploy the infra in (e.g. 'us-west-2');
  4. Run cdk deploy in the root of this project and wait for the deployment to finish successfully;
  5. Upload the images you need to proccess to the newly created S3 bucket under a S3 bucket path (e.g. /images). Use this path in the next step;
  6. Go to the created AWS Lambda and execute the lambda function with the following JSON:
{
"Paths": [
    "images"
   ]
}
  1. In the AWS console, go to AWS batch and make sure the jobs are submitted and are running successfully;
  2. Open the created DynamoDB table and validate the results are there;
  3. You can now use a DynamoDB client to read and consume the results;

License

This library is licensed under the MIT-0 License. See the LICENSE file.

You might also like...
Elemeno.ai standard development kit in Python

Overview A set of glue code and utilities to make using elemeno AI platform a smooth experience Free software: Apache Software License 2.0 Installatio

💻  A fully functional local AWS cloud stack. Develop and test your cloud & Serverless apps offline!
💻 A fully functional local AWS cloud stack. Develop and test your cloud & Serverless apps offline!

LocalStack - A fully functional local AWS cloud stack LocalStack provides an easy-to-use test/mocking framework for developing Cloud applications. Cur

Multi-Branch CI/CD Pipeline using CDK Pipelines.
Multi-Branch CI/CD Pipeline using CDK Pipelines.

Using AWS CDK Pipelines and AWS Lambda for multi-branch pipeline management and infrastructure deployment. This project shows how to use the AWS CDK P

 Using AWS Batch jobs to bulk copy/sync files in S3
Using AWS Batch jobs to bulk copy/sync files in S3

Using AWS Batch jobs to bulk copy/sync files in S3

Implement backup and recovery with AWS Backup across your AWS Organizations using a CI/CD pipeline (AWS CodePipeline).
Implement backup and recovery with AWS Backup across your AWS Organizations using a CI/CD pipeline (AWS CodePipeline).

Backup and Recovery with AWS Backup This repository provides you with a management and deployment solution for implementing Backup and Recovery with A

AHA is an incident management & communication framework to provide real-time alert customers when there are active AWS event(s). For customers with AWS Organizations, customers can get aggregated active account level events of all the accounts in the Organization. Customers not using AWS Organizations still benefit alerting at the account level.
Create CDK projects with projen

The Projenator: I'll be back! Description This is a CDKv2 project that takes the grind out of setting up new cdk projects/implementations by using aut

Automated AWS account hardening with AWS Control Tower and AWS Step Functions
Automated AWS account hardening with AWS Control Tower and AWS Step Functions

Automate activities in Control Tower provisioned AWS accounts Table of contents Introduction Architecture Prerequisites Tools and services Usage Clean

Dante, my discord bot. Open source project in development and not optimized for other filesystems, install and setup script in development

DanteMode (In private development for ~6 months) Dante, my discord bot. Open source project in development and not optimized for other filesystems, in

Comments
  • Requirements, server and performance

    Requirements, server and performance

    Thank you for great example!

    Could you give some clues on a few missing questions:

    1. How the requirements for processing code are defined? The requirements.txt in the project root only lists CDK libraries.
    2. What virtual server have you tested the demo and which performance (say, images per second) did you achieved? We're working on the similar project and wanted to check what resources to use.
    opened by karelin 2
  • Automm cv benchmark stack

    Automm cv benchmark stack

    Issue #, if available:

    Description of changes: This PR adds a AWS batch-based benchmarking infra for AutoGluon MultiModal.

    Note: this won't work out of the box because of the rendering issue on the shared_memory_size. README on instructions for workaround to be updated.

    By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

    opened by suzhoum 1
  • User access policy issues

    User access policy issues

    Unfortunately, README does not describe setting up user and access rights to run the example :(

    Just creating an IAM user with sufficient policies is not working, in my case after some study and setting the permissions, I've stuck with an error

    AccessDeniedException: User: arn:aws:iam::XXXXXXXXXXXX:user/YYYYYYYY is not authorized to perform: ecr:CreateRepository on resource: arn:aws:ecr:eu-central-1:XXXXXXXXXXXX:repository/aws-cdk/assets because no identity-based policy allows the ecr:CreateRepository action

    Help/Step-by-step description for non-DevOps people will be greatly appreciated!

    opened by karelin 1
Cdk-python-crud-app - CDK Python CRUD App

Welcome to your CDK Python project! You should explore the contents of this proj

Shapon Sheikh 1 Jan 12, 2022
Project template for using aws-cdk, Chalice and React in concert, including RDS Postgresql and AWS Cognito

What is This? This repository is an opinonated project template for using aws-cdk, Chalice and React in concert. Where aws-cdk and Chalice are in Pyth

Rasmus Jones 4 Nov 7, 2022
Infrastructure template and Jupyter notebooks for running RoseTTAFold on AWS Batch.

AWS RoseTTAFold Infrastructure template and Jupyter notebooks for running RoseTTAFold on AWS Batch. Overview Proteins are large biomolecules that play

AWS Samples 20 May 10, 2022
Deploy a STAC API and a dynamic mosaic tiler API using AWS CDK.

Earth Observation API Deploy a STAC API and a dynamic mosaic tiler API using AWS CDK.

Development Seed 39 Oct 30, 2022
This solution helps you deploy Data Lake Infrastructure on AWS using CDK Pipelines.

CDK Pipelines for Data Lake Infrastructure Deployment This solution helps you deploy data lake infrastructure on AWS using CDK Pipelines. This is base

AWS Samples 66 Nov 23, 2022
Davide Gallitelli 3 Dec 21, 2021
Recommended AWS CDK project structure for Python applications

Recommended AWS CDK project structure for Python applications The project implements a user management backend component that uses Amazon API Gateway,

AWS Samples 110 Jan 6, 2023
Criando Lambda Functions para Ingerir Dados de APIs com AWS CDK

LIVE001 - AWS Lambda para Ingerir Dados de APIs Fazer o deploy de uma função lambda com infraestrutura como código Lambda vai numa API externa e extra

Andre Sionek 12 Nov 20, 2022
DIAL(Did I Alert Lambda?) is a centralised security misconfiguration detection framework which completely runs on AWS Managed services like AWS API Gateway, AWS Event Bridge & AWS Lambda

DIAL(Did I Alert Lambda?) is a centralised security misconfiguration detection framework which completely runs on AWS Managed services like AWS API Gateway, AWS Event Bridge & AWS Lambda

CRED 71 Dec 29, 2022