This solution helps you deploy Data Lake Infrastructure on AWS using CDK Pipelines.

Overview

CDK Pipelines for Data Lake Infrastructure Deployment

This solution helps you deploy data lake infrastructure on AWS using CDK Pipelines. It is based on the AWS blog post Deploy data lake ETL jobs using CDK Pipelines. We recommend reading the blog post before you proceed with the solution.

CDK Pipelines is a construct library module for painless continuous delivery of CDK applications. CDK stands for Cloud Development Kit, an open-source software development framework for defining your cloud application resources using familiar programming languages.

This solution helps you:

  1. deploy data lake infrastructure on AWS using CDK Pipelines
  2. leverage the self-mutating feature of CDK Pipelines. For example, whenever you check your CDK app's source code in to your version control system, CDK Pipelines can automatically build, test, and deploy the new version
  3. increase the speed of prototyping, testing, and deploying new ETL workloads

Data lake

In this section, we discuss the data lake architecture and its infrastructure.


Architecture

To level set, let us design a data lake. As shown in the figure below, we use Amazon S3 for storage, with three S3 buckets: 1) a raw bucket to store raw data in its original format, 2) a conformed bucket to store data that meets the quality requirements of the lake, and 3) a purpose-built bucket to store data that is used by analysts and data consumers of the lake.

The data lake has one producer, which ingests files into the raw bucket. We use AWS Lambda and AWS Step Functions for orchestration and scheduling of ETL workloads.

We use AWS Glue for ETL and data cataloging, and Amazon Athena for interactive queries and analysis. We use various AWS services for logging, monitoring, security, authentication, authorization, notification, build, and deployment.

Note: AWS Lake Formation is a service that makes it easy to set up a secure data lake in days. Amazon QuickSight is a scalable, serverless, embeddable, machine learning-powered business intelligence (BI) service built for the cloud. These two services are not used in this solution.

Conceptual Data Lake
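To make the three-zone bucket layout concrete, here is a minimal CDK sketch in Python (CDK v2). The construct ids and bucket settings are illustrative assumptions, not the solution's exact code; the real definitions live in s3_bucket_zones_stack.py.

    import aws_cdk as cdk
    from aws_cdk import aws_s3 as s3

    class BucketZonesSketch(cdk.Stack):
        """Illustrative stack with one bucket per data lake zone."""

        def __init__(self, scope, construct_id, **kwargs):
            super().__init__(scope, construct_id, **kwargs)
            for zone in ("raw", "conformed", "purpose-built"):
                s3.Bucket(
                    self,
                    f"{zone}-bucket",  # construct ids are assumptions
                    encryption=s3.BucketEncryption.KMS_MANAGED,  # server-side encryption
                    block_public_access=s3.BlockPublicAccess.BLOCK_ALL,
                    versioned=True,
                )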


Infrastructure

Now that we have the data lake design, let's deploy its infrastructure. It includes the following resources:

  1. Amazon Virtual Private Cloud (VPC)
  2. Subnets
  3. Security Groups
  4. Route Table(s)
  5. VPC Endpoints
  6. Amazon S3 buckets for:
    1. raw data
    2. conformed data
    3. purpose-built data
  7. Amazon DynamoDB table for ETL jobs auditing

The figure below represents the infrastructure resources we provision for the data lake.

Data Lake Infrastructure Architecture
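As a rough sketch of how these resources map to CDK constructs (CDK v2 Python; the ids, the audit table key, and the CIDR below are assumptions for illustration - the project's real definitions live in vpc_stack.py and related stacks):

    import aws_cdk as cdk
    from aws_cdk import aws_dynamodb as dynamodb, aws_ec2 as ec2

    class InfrastructureSketch(cdk.Stack):
        def __init__(self, scope, construct_id, **kwargs):
            super().__init__(scope, construct_id, **kwargs)

            # VPC; CDK derives subnets, route tables, and security groups from it
            vpc = ec2.Vpc(self, "DataLakeVpc",
                          ip_addresses=ec2.IpAddresses.cidr("10.20.0.0/24"))

            # VPC endpoints: gateway type for S3, interface type for Glue
            vpc.add_gateway_endpoint("S3Endpoint",
                                     service=ec2.GatewayVpcEndpointAwsService.S3)
            vpc.add_interface_endpoint("GlueEndpoint",
                                       service=ec2.InterfaceVpcEndpointAwsService.GLUE)

            # DynamoDB table for ETL job auditing
            dynamodb.Table(self, "EtlAuditTable",
                           partition_key=dynamodb.Attribute(
                               name="job_run_id",  # key name is an assumption
                               type=dynamodb.AttributeType.STRING),
                           billing_mode=dynamodb.BillingMode.PAY_PER_REQUEST)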


The solution

We use a centralized deployment model to deploy data lake infrastructure across dev, test, and prod environments.


Centralized deployment

To demonstrate this solution, we need four AWS accounts, as follows:

  1. Central deployment account to create CDK pipelines
  2. Dev account for dev data lake
  3. Test account for test data lake
  4. Prod account for production data lake

The figure below represents the centralized deployment model.

Centralized Deployment Model

There are a few interesting details to point out here:

  1. The data lake infrastructure source code is organized into three branches - dev, test, and production
  2. Each branch is mapped to a CDK pipeline, which in turn is mapped to a target environment. This way, code changes made to a branch are deployed iteratively to its respective target environment
  3. From a CDK perspective, we apply the following bootstrapping principles
    1. the central deployment account will utilize a standard bootstrap
    2. each target account will require a cross-account trust policy to allow access from the central deployment account
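To make the branch-to-pipeline mapping concrete, here is a minimal CDK Pipelines sketch (CDK v2 Python). The repository string and stage wiring are assumptions for illustration; the secret name matches the one configured later in this README.

    import aws_cdk as cdk
    from aws_cdk import pipelines

    class PipelineSketch(cdk.Stack):
        def __init__(self, scope, construct_id, *, branch: str, **kwargs):
            super().__init__(scope, construct_id, **kwargs)

            pipeline = pipelines.CodePipeline(
                self,
                "Pipeline",
                cross_account_keys=True,  # required for cross-account deployments
                self_mutation=True,       # the pipeline updates itself on code changes
                synth=pipelines.ShellStep(
                    "Synth",
                    input=pipelines.CodePipelineSource.git_hub(
                        "aws-samples/aws-cdk-pipelines-datalake-infrastructure",
                        branch,  # dev, test, or prod
                        authentication=cdk.SecretValue.secrets_manager("/DataLake/GitHubToken"),
                    ),
                    commands=["pip install -r requirements.txt", "npx cdk synth"],
                ),
            )
            # Each pipeline deploys exactly one target environment (a cdk.Stage
            # subclass); the stage class name below is hypothetical.
            # pipeline.add_stage(PipelineDeployStage(self, "Dev", env=cdk.Environment(...)))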

Continuous delivery of data lake infrastructure using CDK Pipelines

The figure below illustrates the continuous delivery of data lake infrastructure.

Continuous Delivery of Data Lake Infrastructure

There are a few interesting details to point out here:

  1. The DevOps administrator checks in the code to the repository.
  2. The DevOps administrator (with elevated access) facilitates a one-time manual deployment on a target environment. Elevated access includes administrative privileges on the central deployment account and target AWS environments.
  3. CodePipeline periodically listens for commit events on the source code repositories. This is the self-mutating nature of CDK Pipelines: the pipeline is able to update itself according to the provided definition before deploying application changes.
  4. Code changes made to the main branch of the repo are automatically deployed to the dev environment of the data lake.
  5. Code changes to the test branch of the repo are automatically deployed to the test environment.
  6. Code changes to the prod branch of the repo are automatically deployed to the prod environment.

Source code structure

The table below explains how this source code is structured:

File / Folder | Description
app.py | Application entry point.
pipeline_stack.py | Pipeline stack entry point.
pipeline_deploy_stage.py | Pipeline deploy stage entry point.
s3_bucket_zones_stack.py | Stack that creates the S3 buckets - raw, conformed, and purpose-built. It also creates an S3 bucket for server access logging and an AWS KMS key to enable server-side encryption for all buckets.
tagging.py | Program to tag all provisioned resources.
vpc_stack.py | Contains all resources related to the VPC used by the data lake infrastructure and services. This includes the VPC, security groups, and VPC endpoints (both gateway and interface types).
resources | This folder has static resources such as architecture diagrams and the developer guide.
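For orientation, the app.py wiring might look roughly like the following (the import path, parameter names, and tag values are assumptions based on the table above; cdk.Tags is the mechanism a tagging module like tagging.py typically uses):

    #!/usr/bin/env python3
    import aws_cdk as cdk

    from lib.pipeline_stack import PipelineStack  # import path is an assumption

    app = cdk.App()

    for target in ("Dev", "Test", "Prod"):
        stack = PipelineStack(
            app,
            f"{target}DataLakeCDKBlogInfrastructurePipeline",
            target_environment=target,  # parameter name is an assumption
            env=cdk.Environment(account="deployment_account_id", region="us-east-2"),
        )
        # Apply common tags to every resource provisioned under the stack
        cdk.Tags.of(stack).add("Application", "DataLakeInfrastructure")

    app.synth()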

Automation scripts

This repository has the following automation scripts to complete steps before the deployment:

# | Script | Purpose
1 | bootstrap_deployment_account.sh | Used to bootstrap the deployment account.
2 | bootstrap_target_account.sh | Used to bootstrap target environments, for example dev, test, and prod.
3 | configure_account_secrets.py | Used to configure account secrets, for example the GitHub access token.

Prerequisites

This section has various steps you need to perform before you deploy data lake resources on AWS.


Software installation

  1. AWS CLI - make sure you have the AWS CLI configured on your system. If not, refer to Configuring the AWS CLI for more details.

  2. AWS CDK - install a compatible AWS CDK version

    npm install -g aws-cdk
  3. Python - make sure you have Python installed on your system. We recommend Python 3.7 or above.

  4. GitHub Fork - we recommend you fork the repository so that you are in control of the deployed resources.

Logistical requirements

  1. Four AWS accounts. One of them acts as the central deployment account. The other three are the dev, test, and prod accounts. Optional: to test this solution with a central deployment account and one target environment (for example, dev), refer to developer_instructions.md for detailed instructions.

  2. Number of branches on your GitHub repo - you need to start with at least one branch (for example, main) to start using this solution. The test and prod branches can be added at the beginning or after the deployment of the data lake infrastructure on the dev environment.

  3. Administrator privileges - you need administrator privileges to bootstrap your AWS environments and complete the initial deployment. Usually, these steps can be performed by a DevOps administrator of your team. After these steps, you can revoke administrative privileges. Subsequent deployments rely on the self-mutating nature of CDK Pipelines.

  4. AWS Region selection - we recommend using the same AWS Region (e.g. us-east-2) for the deployment, dev, test, and prod accounts for simplicity. However, this is not a hard requirement.


AWS environment bootstrapping

Environment bootstrapping is a standard CDK process that prepares an AWS environment for deployment. Follow these steps:

  1. Go to the project root directory, where the app.py file exists

  2. Create Python virtual environment. This is a one-time activity.

    python3 -m venv .venv
  3. Expected output: you will see a folder named .venv created in the project root folder. You can run ls -lart .venv/ to see its contents:

    total 8
    drwxr-xr-x   2 user_id  staff   64 Jun 23 15:25 include
    drwxr-xr-x   3 user_id  staff   96 Jun 23 15:25 lib
    drwxr-xr-x   6 user_id  staff  192 Jun 23 15:25 .
    -rw-r--r--   1 user_id  staff  114 Jun 23 15:25 pyvenv.cfg
    drwxr-xr-x  16 user_id  staff  512 Jun 23 15:27 bin
    drwxr-xr-x  21 user_id  staff  672 Jun 23 15:28 ..
  4. Activate Python virtual environment

    source .venv/bin/activate
  5. Install dependencies

    pip install -r requirements.txt
  6. Expected output: run the command below and verify all dependencies are installed

    ls -lart .venv/lib/python3.9/site-packages/
  7. Enable execute permissions for scripts

    chmod 700 ./lib/prerequisites/bootstrap_deployment_account.sh
    chmod 700 ./lib/prerequisites/bootstrap_target_account.sh
  8. Before you bootstrap the central deployment account, set the environment variable

    export AWS_PROFILE=replace_it_with_deployment_account_profile_name_b4_running

    Important:

    1. This command is based on the Named Profiles feature.
    2. If you want to use an alternative option, refer to Configuring the AWS CLI and Environment variables to configure the AWS CLI for details. Be sure to follow those steps for each configuration step moving forward.
  9. Bootstrap central deployment account

    ./lib/prerequisites/bootstrap_deployment_account.sh
  10. When you see the following text, enter y, and press enter/return

    Are you sure you want to bootstrap {
       "UserId": "user_id",
       "Account": "deployment_account_id",
       "Arn": "arn:aws:iam::deployment_account_id:user/user_id"
    }? (y/n)y
  11. Expected outputs:

    1. In your terminal, you see Environment aws://deployment_account_id/us-east-2 bootstrapped.

    2. You see a stack created in your deployment account as follows

      bootstrap_central_deployment_account

    3. You see an S3 bucket created in the central deployment account. The name is like cdk-hnb659fds-assets-<deployment_account_id>-us-east-2

  12. Before you bootstrap the dev account, set the environment variable

    export AWS_PROFILE=replace_it_with_dev_account_profile_name_b4_running
  13. Bootstrap dev account

    Important: Your configured environment must target the Dev account

    ./lib/prerequisites/bootstrap_target_account.sh <central_deployment_account_id> arn:aws:iam::aws:policy/AdministratorAccess

    When you see the following text, enter y, and press enter/return

    Are you sure you want to bootstrap {
     "UserId": "user_id",
     "Account": "dev_account_id",
     "Arn": "arn:aws:iam::dev_account_id:user/user_id"
    } providing a trust relationship to: deployment_account_id using policy arn:aws:iam::aws:policy/AdministratorAccess? (y/n)
  14. Expected outputs:

    1. In your terminal, you see Environment aws://dev_account_id/us-east-2 bootstrapped.

    2. You see a stack created in your dev account as follows

      bootstrap_central_deployment_account

    3. You see an S3 bucket created in the dev account. The name is like cdk-hnb659fds-assets-<dev_account_id>-us-east-2

  15. Before you bootstrap the test account, set the environment variable

    export AWS_PROFILE=replace_it_with_test_account_profile_name_b4_running
  16. Bootstrap test account

    Important: Your configured environment must target the Test account

    ./lib/prerequisites/bootstrap_target_account.sh <central_deployment_account_id> arn:aws:iam::aws:policy/AdministratorAccess

    When you see the following text, enter y, and press enter/return

    Are you sure you want to bootstrap {
       "UserId": "user_id",
       "Account": "test_account_id",
       "Arn": "arn:aws:iam::test_account_id:user/user_id"
    } providing a trust relationship to: deployment_account_id using policy arn:aws:iam::aws:policy/AdministratorAccess? (y/n)
  17. Expected outputs:

    1. In your terminal, you see Environment aws://test_account_id/us-east-2 bootstrapped.

    2. You see a stack created in your test account as follows

      bootstrap_central_deployment_account

    3. You see an S3 bucket created in the test account. The name is like cdk-hnb659fds-assets-<test_account_id>-us-east-2

  18. Before you bootstrap the prod account, set the environment variable

    export AWS_PROFILE=replace_it_with_prod_account_profile_name_b4_running
  19. Bootstrap Prod account

    Important: Your configured environment must target the Prod account

    ./lib/prerequisites/bootstrap_target_account.sh <central_deployment_account_id> arn:aws:iam::aws:policy/AdministratorAccess

    When you see the following text, enter y, and press enter/return

    Are you sure you want to bootstrap {
       "UserId": "user_id",
       "Account": "prod_account_id",
       "Arn": "arn:aws:iam::prod_account_id:user/user_id"
    } providing a trust relationship to: deployment_account_id using policy arn:aws:iam::aws:policy/AdministratorAccess? (y/n)
  20. Expected outputs:

    1. In your terminal, you see Environment aws://prod_account_id/us-east-2 bootstrapped.

    2. You see a stack created in your prod account as follows

      bootstrap_central_deployment_account

    3. You see an S3 bucket created in the prod account. The name is like cdk-hnb659fds-assets-<prod_account_id>-us-east-2


Application configuration

Before we deploy our resources, we must provide the manual variables; upon deployment, CDK Pipelines will programmatically export outputs for managed resources. Follow the steps below to set up your custom configuration:

  1. Note: You can safely commit these values to your repository

  2. Go to configuration.py and fill in values under local_mapping dictionary within the function get_local_configuration as desired.

    Example:

    local_mapping = {
        DEPLOYMENT: {
            ACCOUNT_ID: 'add_your_deployment_account_id_here',
            REGION: 'us-east-2',
            # If you use GitHub / GitHub Enterprise, this will be the organization name
            GITHUB_REPOSITORY_OWNER_NAME: 'aws-samples',
            # Use your forked repo here!
            # This is used in the Logical Id of CloudFormation resources
            # We recommend capital case for consistency. e.g. DataLakeCdkBlog
            GITHUB_REPOSITORY_NAME: 'aws-cdk-pipelines-datalake-infrastructure',
            LOGICAL_ID_PREFIX: 'DataLakeCDKBlog',
            # This is used in resources that must be globally unique!
            # It may only contain alphanumeric characters, hyphens, and cannot contain trailing hyphens
            # E.g. unique-identifier-data-lake
            RESOURCE_NAME_PREFIX: 'cdkblog-e2e',
        },
        DEV: {
            ACCOUNT_ID: 'add_your_dev_account_id_here',
            REGION: 'us-east-2',
            VPC_CIDR: '10.20.0.0/24'
        },
        TEST: {
            ACCOUNT_ID: 'add_your_test_account_id_here',
            REGION: 'us-east-2',
            VPC_CIDR: '10.10.0.0/24'
        },
        PROD: {
            ACCOUNT_ID: 'add_your_prod_account_id_here',
            REGION: 'us-east-2',
            VPC_CIDR: '10.0.0.0/24'
        }
    }
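The rest of the application reads this mapping through get_local_configuration. A brief usage sketch (the import path is an assumption; the constants match the example above):

    from lib.configuration import DEV, VPC_CIDR, get_local_configuration  # path assumed

    dev_config = get_local_configuration(DEV)
    print(dev_config[VPC_CIDR])  # -> '10.20.0.0/24' with the example values above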

AWS CodePipeline and GitHub integration

Integration between AWS CodePipeline and GitHub requires a personal access token. This access token is stored in Secrets Manager. This is a one-time setup that applies to all target AWS environments and all repositories created under the organization on GitHub.com. Follow the steps below:

  1. Note: Do NOT commit these values to your repository

  2. Create a personal access token in GitHub. Refer to Creating a personal access token for details

  3. Go to configure_account_secrets.py and fill in the value for attribute MY_GITHUB_TOKEN

  4. Run the below command

    python3 ./lib/prerequisites/configure_account_secrets.py
  5. Expected output 1:

    Pushing secret: /DataLake/GitHubToken
  6. Expected output 2: A secret is added to AWS Secrets Manager with name /DataLake/GitHubToken
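Under the hood, the script stores the token in AWS Secrets Manager. A minimal sketch of the idea with boto3 (the actual script's structure may differ; the secret name matches the expected output above):

    import boto3

    MY_GITHUB_TOKEN = "replace_with_your_github_personal_access_token"  # do NOT commit

    client = boto3.client("secretsmanager")
    # create_secret fails if the secret already exists; the real script may
    # handle updates differently
    client.create_secret(
        Name="/DataLake/GitHubToken",
        SecretString=MY_GITHUB_TOKEN,
    )
    print("Pushing secret: /DataLake/GitHubToken")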


Deployment


Deploying for the first time

Configure your AWS profile to target the central Deployment account as an Administrator and perform the following steps:

  1. Open command line (terminal)

  2. Go to the project root directory, where cdk.json and app.py exist

  3. Run the command cdk ls

  4. Expected output: It lists CDK Pipelines and target account stacks on the console. A sample is below:

    DevDataLakeCDKBlogInfrastructurePipeline
    ProdDataLakeCDKBlogInfrastructurePipeline
    TestDataLakeCDKBlogInfrastructurePipeline
    DevDataLakeCDKBlogInfrastructurePipeline/Dev/DevDataLakeCDKBlogInfrastructureIam
    DevDataLakeCDKBlogInfrastructurePipeline/Dev/DevDataLakeCDKBlogInfrastructureS3BucketZones
    DevDataLakeCDKBlogInfrastructurePipeline/Dev/DevDataLakeCDKBlogInfrastructureVpc
    ProdDataLakeCDKBlogInfrastructurePipeline/Prod/ProdDataLakeCDKBlogInfrastructureIam
    ProdDataLakeCDKBlogInfrastructurePipeline/Prod/ProdDataLakeCDKBlogInfrastructureS3BucketZones
    ProdDataLakeCDKBlogInfrastructurePipeline/Prod/ProdDataLakeCDKBlogInfrastructureVpc
    TestDataLakeCDKBlogInfrastructurePipeline/Test/TestDataLakeCDKBlogInfrastructureIam
    TestDataLakeCDKBlogInfrastructurePipeline/Test/TestDataLakeCDKBlogInfrastructureS3BucketZones
    TestDataLakeCDKBlogInfrastructurePipeline/Test/TestDataLakeCDKBlogInfrastructureVpc

    Note:

    1. Here, the string literal DataLakeCDKBlog is the value of LOGICAL_ID_PREFIX configured in configuration.py
    2. The first three stacks represent the CDK pipeline stacks, which will be created in the deployment account. For each target environment, there will be three stacks.
  5. Set your environment variable back to the deployment account

    export AWS_PROFILE=deployment_account_profile_name_here
  6. Run the command cdk deploy --all

  7. Expected outputs:

    1. In the deployment account's CloudFormation console, you will see the following CloudFormation stacks created

      CloudFormation_stacks_in_deployment_account

    2. In the deployment account's CodePipeline console, you will see the following Pipeline triggered

      CloudFormation_stacks_in_deployment_account

    3. In the dev data lake account's CloudFormation console, you will see the following stacks are completed successfully

      cdk_deploy_output_deployment_account_cfn_stacks


Iterative Deployment

The pipeline you have created using the CDK Pipelines module is self-mutating. That means code checked in to a GitHub repository branch will kick off the CDK pipeline mapped to that branch.


Data lake ETL jobs

You can use the data lake infrastructure to deploy ETL jobs. We provided AWS CDK Pipelines for Data Lake ETL Deployment to help you accomplish this task.


Additional resources

In this section, we provide some additional resources.


Clean up

  1. Delete stacks using the command cdk destroy --all. When you see the following text, enter y, and press enter/return.

    Are you sure you want to delete: TestDataLakeCDKBlogInfrastructurePipeline, ProdDataLakeCDKBlogInfrastructurePipeline, DevDataLakeCDKBlogInfrastructurePipeline (y/n)?

    Note: This operation deletes stacks only in the central deployment account

  2. To delete stacks in the development account, log onto the Dev account, go to the AWS CloudFormation console, and delete the following stacks:

    1. Dev-DevDataLakeCDKBlogInfrastructureVpc
    2. Dev-DevDataLakeCDKBlogInfrastructureS3BucketZones
    3. Dev-DevDataLakeCDKBlogInfrastructureIam

    Note:

    1. Deletion of Dev-DevDataLakeCDKBlogInfrastructureS3BucketZones will delete the S3 buckets (raw, conformed, and purpose-built). This behavior can be changed by modifying the retention policy in s3_bucket_zones_stack.py
  3. To delete stacks in the test account, log onto the Test account, go to the AWS CloudFormation console, and delete the following stacks:

    1. Test-TestDataLakeCDKBlogInfrastructureVpc
    2. Test-TestDataLakeCDKBlogInfrastructureS3BucketZones
    3. Test-TestDataLakeCDKBlogInfrastructureIam

    Note:

    1. The S3 buckets (raw, conformed, and purpose-built) have retention policies attached and must be removed manually when they are no longer needed.
  4. To delete stacks in the prod account, log onto the Prod account, go to the AWS CloudFormation console, and delete the following stacks:

    1. Prod-ProdDataLakeCDKBlogInfrastructureVpc
    2. Prod-ProdDataLakeCDKBlogInfrastructureS3BucketZones
    3. Prod-ProdDataLakeCDKBlogInfrastructureIam

    Note:

    1. The S3 buckets (raw, conformed, and purpose-built) have retention policies attached and must be removed manually when they are no longer needed.
  5. Optional:

    1. If you are not using AWS CDK for other purposes, you can also remove the CDKToolkit stack in each target account.

    2. Note: The asset S3 bucket has a retention policy and must be removed manually.

  6. For more details refer to AWS CDK Toolkit


AWS CDK

Refer to CDK Instructions for detailed instructions


Developer guide

Refer to Developer guide for more details of this project.


Authors and reviewers

The following people are involved in the design, architecture, development, and testing of this solution:

  1. Isaiah Grant, Cloud Consultant, 2nd Watch, Inc.
  2. Ravi Itha, Senior Data Architect, Amazon Web Services Inc.
  3. Muhammad Zahid Ali, Data Architect, Amazon Web Services Inc.

The following people are involved in the reviews:

  1. Mike Apted, Principal Solutions Architect, Amazon Web Services Inc.
  2. Nikunj Vaidya, Senior DevOps Specialist, Amazon Web Services Inc.

License Summary

This sample code is made available under the MIT-0 license. See the LICENSE file.

Comments
  • Infrastructure Pipeline fails for CodePipeline using Source Action Provider as GitHub Apps to access the repositories

    The Infrastructure Pipeline fails when the source action is changed from using a personal access token to a GitHub App to access the repositories. AWS CodePipeline recommends using GitHub Apps to access repositories.

    opened by AditModi 2
  • initial lmd commit

    Issue #, if available:

    Description of changes:

    By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

    opened by BranfordTGbieor 1
  • Feature/cdk pipelines init

    Issue #, if available:

    Description of changes:

    CDK synth is parameterized

    By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

    opened by itharavi 1
  • init

    Issue #, if available: N/A

    Description of changes: CDK Application containing foundational infrastructure for the Data Lake utilizing CDK Pipelines for a central deployment strategy. Also includes bash scripts to assist with prerequisites.

    By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

    opened by igrant-2ndWatch 1
  • [cdkv2] deploy failed: Policy contains a statement with one or more invalid principals. (Service: Kms, Status Code: 400)

    I tried the steps in the prerequisites section, and they went OK. Then I tried cdk deploy --all (I accepted the IAM policy changes), and it returned the error below:

    ProdBaotranIdInfrastructurePipeline: creating CloudFormation changeset...
    1:32:31 PM | CREATE_FAILED        | AWS::KMS::Key               | ProdBaotranIdDatal...ryptionKeyCAA0B3FD
    Resource handler returned message: "Policy contains a statement with one or more invalid principals. (Service: Kms, Status Code: 400, Request ID: de419785-3167-48ee-9245-c0baf15efa86)" (RequestToken: c55417e7-027f-370e-2f92-401a1a5a4539,
    HandlerErrorCode: InvalidRequest)
    
    
     ❌  ProdBaotranIdInfrastructurePipeline failed: Error: The stack named ProdBaotranIdInfrastructurePipeline failed creation, it may need to be manually deleted from the AWS console: ROLLBACK_COMPLETE: Resource handler returned message: "Policy contains a statement with one or more invalid principals. (Service: Kms, Status Code: 400, Request ID: de419785-3167-48ee-9245-c0baf15efa86)" (RequestToken: c55417e7-027f-370e-2f92-401a1a5a4539, HandlerErrorCode: InvalidRequest)
        at FullCloudFormationDeployment.monitorDeployment (/Users/baotran/.npm-global/lib/node_modules/aws-cdk/lib/api/deploy-stack.ts:496:13)
        at processTicksAndRejections (internal/process/task_queues.js:95:5)
        at deployStack2 (/Users/baotran/.npm-global/lib/node_modules/aws-cdk/lib/cdk-toolkit.ts:241:24)
        at /Users/baotran/.npm-global/lib/node_modules/aws-cdk/lib/deploy.ts:39:11
        at run (/Users/baotran/.npm-global/lib/node_modules/p-queue/dist/index.js:163:29)
    
     ❌ Deployment failed: Error: Stack Deployments Failed: Error: The stack named ProdBaotranIdInfrastructurePipeline failed creation, it may need to be manually deleted from the AWS console: ROLLBACK_COMPLETE: Resource handler returned message: "Policy contains a statement with one or more invalid principals. (Service: Kms, Status Code: 400, Request ID: de419785-3167-48ee-9245-c0baf15efa86)" (RequestToken: c55417e7-027f-370e-2f92-401a1a5a4539, HandlerErrorCode: InvalidRequest)
        at deployStacks (/Users/baotran/.npm-global/lib/node_modules/aws-cdk/lib/deploy.ts:61:11)
        at runMicrotasks (<anonymous>)
        at processTicksAndRejections (internal/process/task_queues.js:95:5)
        at CdkToolkit.deploy (/Users/baotran/.npm-global/lib/node_modules/aws-cdk/lib/cdk-toolkit.ts:314:7)
        at initCommandLine (/Users/baotran/.npm-global/lib/node_modules/aws-cdk/lib/cli.ts:357:12)
    
    Stack Deployments Failed: Error: The stack named ProdBaotranIdInfrastructurePipeline failed creation, it may need to be manually deleted from the AWS console: ROLLBACK_COMPLETE: Resource handler returned message: "Policy contains a statement with one or more invalid principals. (Service: Kms, Status Code: 400, Request ID: de419785-3167-48ee-9245-c0baf15efa86)" (RequestToken: c55417e7-027f-370e-2f92-401a1a5a4539, HandlerErrorCode: InvalidRequest)
    

    I tried again with verbose output; it seems to be a bug with the KMS key policies.

    Branch: cdkv2, Python: 3.9.8, CDK: 2.46, OS: Mac M1

    opened by baotran2207 0
  • bootstrapping failed with error InvalidClientTokenId

    After filling in the configuration.py file, I tried to run ./lib/prerequisites/bootstrap_deployment_account.sh, but it failed with the error below

    Branch: cdkv2, OS: Mac, Python: 3.9.8, CDK: 2.46.0 (build 5a0595e)

    ****858: Development account
    ****858: Dev account (same as the development account)
    ****223: Test account
    ****615: Prod account

    ⏳  Bootstrapping environment aws://*****858/ap-east-1...
     ⏳  Bootstrapping environment aws://*****615/ap-east-1...
     ⏳  Bootstrapping environment aws://*****223/ap-east-1...
     ❌  Environment aws://*****223/ap-east-1 failed bootstrapping: Error: Need to perform AWS calls for account ******223, but the current credentials are for *******858
        at SdkProvider.forEnvironment (/Users/baotran/.npm-global/lib/node_modules/aws-cdk/lib/api/aws-auth/sdk-provider.ts:184:60)
        at Function.lookup (/Users/baotran/.npm-global/lib/node_modules/aws-cdk/lib/api/bootstrap/deploy-bootstrap.ts:31:18)
        at Bootstrapper.modernBootstrap (/Users/baotran/.npm-global/lib/node_modules/aws-cdk/lib/api/bootstrap/bootstrap-environment.ts:81:21)
        at /Users/baotran/.npm-global/lib/node_modules/aws-cdk/lib/cdk-toolkit.ts:626:24
        at async Promise.all (index 2)
        at CdkToolkit.bootstrap (/Users/baotran/.npm-global/lib/node_modules/aws-cdk/lib/cdk-toolkit.ts:623:5)
        at initCommandLine (/Users/baotran/.npm-global/lib/node_modules/aws-cdk/lib/cli.ts:357:12)
     ❌  Environment aws://****615/ap-east-1 failed bootstrapping: Error: Need to perform AWS calls for account ****6615, but the current credentials are for ******858
        at SdkProvider.forEnvironment (/Users/baotran/.npm-global/lib/node_modules/aws-cdk/lib/api/aws-auth/sdk-provider.ts:184:60)
        at Function.lookup (/Users/baotran/.npm-global/lib/node_modules/aws-cdk/lib/api/bootstrap/deploy-bootstrap.ts:31:18)
        at Bootstrapper.modernBootstrap (/Users/baotran/.npm-global/lib/node_modules/aws-cdk/lib/api/bootstrap/bootstrap-environment.ts:81:21)
        at /Users/baotran/.npm-global/lib/node_modules/aws-cdk/lib/cdk-toolkit.ts:626:24
        at async Promise.all (index 1)
    
    Need to perform AWS calls for account *******223, but the current credentials are for *****858
     ❌  Environment aws://****858/ap-east-1 failed bootstrapping: InvalidClientTokenId: The security token included in the request is invalid.
        at Request.extractError (/Users/baotran/.npm-global/lib/node_modules/aws-sdk/lib/protocol/query.js:50:29)
        at Request.callListeners (/Users/baotran/.npm-global/lib/node_modules/aws-sdk/lib/sequential_executor.js:106:20)
        at Request.emit (/Users/baotran/.npm-global/lib/node_modules/aws-sdk/lib/sequential_executor.js:78:10)
        at Request.emit (/Users/baotran/.npm-global/lib/node_modules/aws-sdk/lib/request.js:686:14)
        at Request.transition (/Users/baotran/.npm-global/lib/node_modules/aws-sdk/lib/request.js:22:10)
        at AcceptorStateMachine.runTo (/Users/baotran/.npm-global/lib/node_modules/aws-sdk/lib/state_machine.js:14:12)
        at /Users/baotran/.npm-global/lib/node_modules/aws-sdk/lib/state_machine.js:26:10
        at Request.<anonymous> (/Users/baotran/.npm-global/lib/node_modules/aws-sdk/lib/request.js:38:9)
        at Request.<anonymous> (/Users/baotran/.npm-global/lib/node_modules/aws-sdk/lib/request.js:688:12)
        at Request.callListeners (/Users/baotran/.npm-global/lib/node_modules/aws-sdk/lib/sequential_executor.js:116:18) {
      code: 'InvalidClientTokenId',
      time: 2022-10-14T04:11:56.215Z,
      requestId: '62a87b65-2421-42a7-bd2d-7bfd05f2f6d5',
      statusCode: 403,
      retryable: false,
      retryDelay: 25.6356976364817
    

    I thought this bootstrap script would only access the development account; why does it try to access the other environment accounts?

    opened by baotran2207 0
  • Update configuration.py

    Issue #, if available:

    Description of changes:

    By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

    opened by minasys 0
  • Cdk testing

    Issue #, if available:

    Description of changes:

    By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

    opened by itharavi 0
  • Fix markdown formatting

    Issue #, if available:

    Description of changes: Fix markdown formatting

    By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

    opened by igrant-2ndWatch 0
  • Feature/remove ssm usage

    Issue #, if available: N/A

    Description of changes: Flatten CloudFormation exports for ingestion due to tokenization.

    By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

    opened by igrant-2ndWatch 0
  • Changes per end-to-end testing

    Issue #, if available:

    Description of changes: Minor code changes. Updated README per end-to-end testing.

    By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

    opened by itharavi 0
  • Cleanup issue (cdk destroy --all)

    Branch: cdkv2 OS: mac

    When I run cdk destroy --all, CDK destroys only the InfrastructurePipeline stacks, but the stacks in the target accounts still exist.

    This is a CloudFormation stack in the test account, not the development account (which is central and hosts the InfrastructurePipeline).

    So I checked the document again; it says the VPC and S3 bucket zones stacks should be visible in the development account, but in my case they are in the target account (e.g. the test account). Am I doing something wrong, or is the document not updated for cdkv2?

    opened by baotran2207 0