AWS Quick Start Team

Overview

EKS CDK Quick Start (in Python)

DEVELOPER PREVIEW NOTE: This project is currently available as a preview and should not be considered for production use at this time.

This Quick Start is a reference architecture and implementation of how you can use the Cloud Development Kit (CDK) to orchestrate the Elastic Kubernetes Service (EKS) to quickly deploy a more complete and "production ready" Kubernetes environment on AWS.

What does this Quick Start create for you:

  1. An appropriate VPC (/22 CIDR with 1024 IPs by default, though you can edit this in cluster-bootstrap/cdk.json) with public and private subnets across three availability zones.
    1. Alternatively, just flip create_new_vpc to False and then specify the name of your VPC under existing_vpc_name in cluster-bootstrap/cdk.json to use an existing VPC. CDK will automatically work out which subnets are public and which are private and deploy to the private ones.
      1. Note that if you do this you'll also have to tag your subnets as per https://aws.amazon.com/premiumsupport/knowledge-center/eks-vpc-subnet-discovery/
  2. A new EKS cluster with:
    1. A dedicated new IAM role to create it from. The role that creates the cluster is a permanent, and rather hidden, full admin role that doesn't appear in, nor is subject to, the aws-auth config map. So you want a dedicated role explicitly for that purpose, like the one CDK creates for you here, and you can then restrict who may assume it to the occasions when you need it (e.g. you lock yourself out of the cluster by making a mistake in the aws-auth configmap).
      1. Alternatively, you can make an existing role the administrator by setting create_new_cluster_admin_role to False and then putting its ARN in existing_admin_role_arn in cluster-bootstrap/cdk.json.
    2. A new Managed Node Group with 3 x m5.large instances spread across 3 Availability Zones.
      1. You can change the instance type and quantity by changing eks_node_quantity and/or eks_node_instance_type in cluster-bootstrap/eks-cluster.py.
    3. All control plane logging to CloudWatch Logs enabled (defaulting to 1 month's retention within CloudWatch Logs).
  3. The AWS Load Balancer Controller (https://kubernetes-sigs.github.io/aws-load-balancer-controller) to allow you to seamlessly use ALBs for Ingress and NLB for Services.
  4. External DNS (https://github.com/kubernetes-sigs/external-dns) to allow you to automatically create/update Route53 entries to point your 'real' names at your Ingresses and Services.
  5. A new managed Amazon Elasticsearch Domain behind a private VPC endpoint as well as an aws-for-fluent-bit DaemonSet (https://github.com/aws/aws-for-fluent-bit) to ship all your container logs there - including enriching them with the Kubernetes metadata using the kubernetes fluent-bit filter.
    1. Note that this provisions a single node 10GB managed Elasticsearch Domain suitable for a proof of concept. To use this in production you'll likely need to edit the es_capacity section of cluster-bootstrap/cdk.json to scale this out from a capacity and availability perspective. For more information see https://docs.aws.amazon.com/elasticsearch-service/latest/developerguide/sizing-domains.html.
    2. Note that this provisions an Elasticsearch and Kibana that do not have a login/password configured. They are secured instead at the network level, by sitting in a private subnet and behind a security group. While this is acceptable for a Proof of Concept (POC) environment, for production use you'd want to consider implementing Cognito to control user access to Kibana - https://docs.aws.amazon.com/elasticsearch-service/latest/developerguide/fgac.html#fgac-walkthrough-iam
  6. (Temporarily until the AWS Managed Prometheus/Grafana are available) The kube-prometheus Operator (https://github.com/prometheus-operator/kube-prometheus) which deploys you a Prometheus on your cluster that will collect all your cluster metrics as well as a Grafana to visualise them.
    1. You can adjust the disk size of these in cluster-bootstrap/cdk.json
    2. TODO: Add some initial alerts for sensible common items in the cluster via Prometheus/Alertmanager
  7. The AWS EBS CSI Driver (https://github.com/kubernetes-sigs/aws-ebs-csi-driver). Note that new development on EBS functionality has moved out of the Kubernetes mainline to this externalised CSI driver.
  8. The AWS EFS CSI Driver (https://docs.aws.amazon.com/eks/latest/userguide/efs-csi.html). Note that new development on EFS functionality has moved out of the Kubernetes mainline to this externalised CSI driver.
  9. An OPA Gatekeeper to enforce preventative security and operational policies (https://github.com/open-policy-agent/gatekeeper). A set of example policies is provided as well - see gatekeeper-policies/README.md
  10. The cluster autoscaler (CA) (https://github.com/kubernetes/autoscaler). This will scale your EC2 instances to ensure you have enough capacity to launch all of your Pods as they are deployed/scaled.
  11. The metrics-server (required for the Horizontal Pod Autoscaler (HPA)) (https://github.com/kubernetes-sigs/metrics-server)
  12. The Calico Network Policy Provider (https://docs.aws.amazon.com/eks/latest/userguide/calico.html). This enforces any NetworkPolicies that you specify.
  13. The AWS Systems Manager (SSM) agent. This allows for various management activities (e.g. Inventory, Patching, Session Manager, etc.) of your Instances/Nodes by AWS Systems Manager.
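
As a quick check on the default VPC sizing in item 1, the arithmetic can be sketched with Python's standard ipaddress module. The 10.0.0.0 base address and the six-way /25 split are assumptions made purely for illustration; the source only states that the default is a /22 with 1024 IPs across three availability zones.

```python
import ipaddress

# Hypothetical address range; only the /22 size comes from the text above.
vpc = ipaddress.ip_network("10.0.0.0/22")
print(vpc.num_addresses)

# Assumed layout for illustration: six equal subnets (3 public + 3 private).
# A /22 divides into eight /25 blocks, so six fit with two left spare.
subnets = list(vpc.subnets(new_prefix=25))[:6]

# AWS reserves five addresses in every subnet.
AWS_RESERVED_PER_SUBNET = 5

for subnet in subnets:
    usable = subnet.num_addresses - AWS_RESERVED_PER_SUBNET
    print(subnet, usable)
```

If the usable count per subnet looks too small for the number of Pods you expect, that is a sign to widen the CIDR in cluster-bootstrap/cdk.json before deploying.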

All of the add-ons are optional, and you control whether you get them by flipping variables in cluster-bootstrap/cdk.json to True/False.
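
To make the True/False switches concrete, here is a minimal sketch of how such flags could be read. It assumes the flags live under the "context" key (the usual CDK convention for cdk.json) and that they are stored as the strings "True" and "False"; the values shown are illustrative, not the shipped defaults, and flag_enabled is a hypothetical helper rather than part of this project.

```python
import json

# Illustrative cdk.json fragment; the values are assumptions.
CDK_JSON = """
{
  "context": {
    "create_new_vpc": "True",
    "existing_vpc_name": "",
    "deploy_bastion": "False",
    "deploy_vpn": "False"
  }
}
"""


def flag_enabled(context: dict, key: str) -> bool:
    """Treat "True", "true" or True as on; anything else (or a missing key) as off."""
    return str(context.get(key, "False")).strip().lower() == "true"


context = json.loads(CDK_JSON)["context"]
for key in ("create_new_vpc", "deploy_bastion", "deploy_vpn"):
    print(key, flag_enabled(context, key))
```

Normalising the value in one place avoids the common mistake of treating the non-empty string "False" as truthy.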

Why Cloud Development Kit (CDK)?

The Cloud Development Kit (CDK) is a tool that lets you write infrastructure-as-code in 'actual' code (TypeScript, Python, C#, and Java). It takes code in these languages and 'compiles' it into a CloudFormation template, which the AWS CloudFormation engine then deploys and manages as stacks.

When you develop and deploy infrastructure with the CDK you don't edit the intermediate CloudFormation but, instead, let CDK regenerate it in response to changes in the upstream CDK code.

What makes CDK uniquely good when it comes to our EKS Quickstart is:

  • It handles the IAM Roles for Service Accounts (IRSA) rather elegantly and creates the IAM Roles and Policies, creates the Kubernetes service accounts, and then maps them to each other.
  • It has implemented custom CloudFormation resources with Lambda invoking kubectl and helm to deploy manifests and charts as part of the cluster provisioning.
    • Until we have Managed Add-Ons for the common things with EKS like the above this can fill the gap and provision us a complete cluster with all the add-ons we need.
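
To show what the IRSA mapping involves, the sketch below builds the kind of IAM trust policy that ties a role to a single Kubernetes service account through the cluster's OIDC provider. It is not the CDK's own code, and the policy the CDK generates may differ in detail; irsa_trust_policy is a hypothetical helper, and the account number, issuer ID and service account name are placeholders.

```python
import json


def irsa_trust_policy(account_id, oidc_issuer, namespace, service_account):
    """Build a trust policy letting one Kubernetes service account assume a role.

    oidc_issuer is the cluster's OIDC issuer URL without the https:// prefix.
    """
    provider_arn = f"arn:aws:iam::{account_id}:oidc-provider/{oidc_issuer}"
    subject = f"system:serviceaccount:{namespace}:{service_account}"
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Federated": provider_arn},
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {"StringEquals": {
                f"{oidc_issuer}:aud": "sts.amazonaws.com",
                f"{oidc_issuer}:sub": subject,
            }},
        }],
    }


# Placeholder values: the issuer ID and service account name are made up.
policy = irsa_trust_policy(
    "123456789123",
    "oidc.eks.ap-southeast-2.amazonaws.com/id/EXAMPLE",
    "kube-system",
    "external-dns",
)
print(json.dumps(policy, indent=2))
```

The Kubernetes side of the mapping is an annotation on the service account that names the role's ARN. Creating the role, the policy and the annotated service account together is the work the CDK takes off your hands.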

Getting started

You can either deploy this from your machine or leverage CodeBuild. The advantage of using CodeBuild is that it also sets up a 'GitOps' approach: when you merge changes to the cluster-bootstrap folder, it will (re)run cdk deploy for you. This means that to update the cluster you just change the files in that folder and merge.

Deploy from CodeBuild

To use the CodeBuild CloudFormation Template:

  1. Generate a personal access token on GitHub - https://docs.github.com/en/github/authenticating-to-github/creating-a-personal-access-token
  2. Edit cluster-codebuild/EKSCodeBuildStack.template.json to change Location to your GitHub repo/path
  3. Run aws codebuild import-source-credentials --server-type GITHUB --auth-type PERSONAL_ACCESS_TOKEN --token <your-token> (replacing <your-token> with the token you generated) to provide your token to CodeBuild
  4. Deploy cluster-codebuild/EKSCodeBuildStack.template.json
  5. Go to the CodeBuild console, click on the Build project that starts with EKSCodeBuild, and then click the Start build button.
  6. (Optional) You can click the Tail logs button to follow along with the build process

NOTE: This also enables a GitOps pattern where changes to the cluster-bootstrap folder on the branch mentioned (main by default) will re-trigger this CodeBuild to do another cdk deploy via web hook.

Deploy from your laptop

Alternatively, you can deploy from any machine (your laptop, a bastion EC2 instance, etc.).

There are some prerequisites you likely will need to install on the machine doing your environment bootstrapping including Node, Python, the AWS CLI, the CDK, fluxctl and Helm

Pre-requisites - Ubuntu 20.04.2 LTS (including via Windows 10's WSL)

Run sudo ./ubuntu-prepreqs.sh

Pre-requisites - Mac

  1. Install Homebrew (https://brew.sh/)
  2. Run ./mac-prereqs.sh
  3. Edit your ~/.zprofile and/or your ~/.bash_profile to put $(brew --prefix)/opt/python/libexec/bin at the start of your PATH so that the tools installed by brew take precedence over the built-in, often outdated, options like python2. You can do this with an export such as export PATH=/opt/homebrew/opt/python/libexec/bin:$PATH (this example assumes brew --prefix is /opt/homebrew).

Deploy from CDK locally

  1. Make sure that you have your AWS CLI configured with administrative access to the AWS account in question (e.g. an aws s3 ls works)
    1. This can be via setting your access key and secret in your .aws folder via aws configure or in your environment variables by copy and pasting from AWS SSO etc.
  2. Run cd quickstart-eks-cdk-python/cluster-bootstrap
  3. Run sudo npm install --upgrade -g aws-cdk to ensure your CDK is up to date
  4. Run pip install --upgrade -r requirements.txt to install the required Python bits of the CDK
  5. Run export CDK_DEPLOY_REGION=ap-southeast-2 replacing ap-southeast-2 with your region of choice
  6. Run export CDK_DEPLOY_ACCOUNT=123456789123 replacing 123456789123 with your AWS account number
  7. (Optional) If you want to make an existing IAM User or Role the cluster admin rather than creating a new one then edit cluster-bootstrap/cdk.json and comment out the current cluster_admin_role and uncomment the one beneath it and fill in the ARN of the User/Role you'd like there.
  8. (Only required the first time you use the CDK in this account) Run cdk bootstrap to create the S3 bucket where the CDK puts its artifacts
  9. (Only required the first time ES in VPC mode is used in this account) Run aws iam create-service-linked-role --aws-service-name es.amazonaws.com
  10. Run cdk deploy --require-approval never
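
Steps 5 and 6 are easy to get wrong, so a small pre-flight check before running cdk deploy can help. The sketch below is a hypothetical helper, not part of this project: it only checks the shape of CDK_DEPLOY_REGION and CDK_DEPLOY_ACCOUNT, and does not confirm that the region or account actually exists.

```python
import re


def check_deploy_env(env):
    """Return a list of problems with the CDK deployment variables in env."""
    problems = []
    region = env.get("CDK_DEPLOY_REGION", "")
    account = env.get("CDK_DEPLOY_ACCOUNT", "")
    # Loose shape check only (e.g. ap-southeast-2); not a list of real regions.
    if not re.fullmatch(r"[a-z]{2}(-[a-z]+)+-\d", region):
        problems.append(f"CDK_DEPLOY_REGION looks wrong: {region!r}")
    # AWS account IDs are 12 digits.
    if not re.fullmatch(r"\d{12}", account):
        problems.append(f"CDK_DEPLOY_ACCOUNT should be 12 digits: {account!r}")
    return problems


# The example values used in the steps above.
example = {"CDK_DEPLOY_REGION": "ap-southeast-2", "CDK_DEPLOY_ACCOUNT": "123456789123"}
print(check_deploy_env(example))  # []
```

In real use you would pass os.environ instead of the example dictionary and stop if the returned list is not empty.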

Deploy Open Policy Agent (OPA) Gatekeeper and the policies

By default Gatekeeper and the policies are set to "False". Because CDK deploys all the add-ons in parallel, adding an admission controller introduced intermittent issues that prevented the template from deploying. So you should deploy the cluster first, then set these to "True" in cluster-bootstrap/cdk.json and re-run cdk deploy once the environment is up to introduce Gatekeeper.

Then, for the sample policies, we've deployed Flux to deploy them via GitOps. In order for that to work, though, we'll need to get the SSH key that Flux generated and add it to GitHub to give us the required access.

fluxctl and the required access is set up on the Bastion - if you have deployed that:

  1. Connect to the Bastion via Systems Manager Session Manager or code-server
  2. Run fluxctl identity --k8s-fwd-ns kube-system
  3. Take the SSH key that has been output and add it to GitHub by following these instructions - https://docs.github.com/en/github/authenticating-to-github/adding-a-new-ssh-key-to-your-github-account
  4. (Optional) If you don't want to wait up to 5 minutes for Flux to sync you can run fluxctl sync --k8s-fwd-ns kube-system

TODO: Update this to use Flux v2 which should GA soon.

Deploy and set up a Bastion based on an EC2 instance accessed securely via Systems Manager's Session Manager

If you set deploy_bastion to True in cluster-bootstrap/cdk.json then the template will deploy an EC2 instance with all the tools to manage your cluster.

To access this bastion:

  1. Go to the Systems Manager service in the AWS Console
  2. Go to Managed Instances on the left hand navigation pane
  3. Select the instance with the name EKSClusterStack/CodeServerInstance
  4. Under the Instance Actions menu on the upper right choose Start Session
  5. You need to run sudo bash to get to root's profile where we've set up kubectl
  6. Run kubectl get nodes to see that all the tools are there and set up for you.

Set up your Client VPN to access the environment

If you set deploy_vpn to True in cluster-bootstrap/cdk.json then the template will deploy a Client VPN so that you can securely access the cluster's private VPC subnets from any machine. By default you'll need this to reach Kibana for your logs and Grafana for your metrics (unless you are using an existing VPC where you have already arranged such connectivity).

Note that you'll also need to create client and server certificates and upload them to ACM by following these instructions - https://docs.aws.amazon.com/vpn/latest/clientvpn-admin/client-authentication.html#mutual - and update ekscluster.py with the certificate ARNs for this to work.

Once it has created your VPN you then need to configure the client:

  1. Open the AWS VPC Console and go to the Client VPN Endpoints on the left panel
  2. Click the Download Client Configuration button
  3. Edit the downloaded file and add:
    1. A section at the bottom for the server cert in between <cert> and </cert>
    2. Then under that another section for the client private key between <key> and </key>
  4. Install the AWS Client VPN Client - https://aws.amazon.com/vpn/client-vpn-download/
  5. Create a new profile pointing it at that configuration file
  6. Connect to the VPN
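
Step 3 can also be scripted. The sketch below assumes the sections in question are the inline <cert> and <key> blocks of the OpenVPN client configuration format, which is what the mutual authentication instructions linked above describe. add_mutual_auth is a hypothetical helper, and the configuration and PEM contents are placeholders.

```python
def add_mutual_auth(config_text, cert_pem, key_pem):
    """Append inline <cert> and <key> sections to an OpenVPN client config."""
    return (
        config_text.rstrip("\n")
        + "\n\n<cert>\n" + cert_pem.strip() + "\n</cert>\n"
        + "\n<key>\n" + key_pem.strip() + "\n</key>\n"
    )


# Placeholder inputs; real values come from the downloaded configuration
# file and from your own certificate and private key files.
config = "client\ndev tun\nproto udp\n"
cert = "-----BEGIN CERTIFICATE-----\n...\n-----END CERTIFICATE-----"
key = "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----"
print(add_mutual_auth(config, cert, key))
```

Write the result to a new file rather than overwriting the download, so you can start again from a clean copy if the client rejects the profile.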

Once you are connected it is a split tunnel - meaning only the addresses in your EKS VPC will get routed through the VPN tunnel.

You then need to add the EKS cluster to your local kubeconfig by running the command in the clusterConfigCommand Output of the EKSClusterStack.

Then you should be able to run a kubectl get all -A and see everything running on your cluster.

How access to Elasticsearch and Kibana is secured

We put the Elasticsearch domain both in the VPC (i.e. not on the Internet) and in its own Security Group. By default that Security Group allows access only from our EKS cluster's SG (so that it can ship the logs to it) and from our (optional) Client VPN's Security Group (to allow us to access Kibana when on the VPN).

Since this Elasticsearch domain can only be reached if you are both within the private VPC network and allowed by this Security Group, it is low risk to allow 'open access' to it, especially in a Proof of Concept (POC) environment. As such, we've configured its default access policy so that no login and password are required, choosing to control access to it from a network perspective instead.

For production use, though, you'd likely want to consider implementing Cognito to facilitate authentication/authorisation for user access to Kibana - https://docs.aws.amazon.com/elasticsearch-service/latest/developerguide/fgac.html#fgac-walkthrough-iam

Connect to Kibana and do initial setup

  1. Once that new access policy has applied click on the Kibana link on the Elasticsearch Domain's Overview Tab
  2. Click "Explore on my own" in the Welcome page
  3. Click "Connect to your Elasticsearch index" under "Use Elasticsearch Data"
  4. Close the About index patterns box
  5. Click the Create Index Pattern button
  6. In the Index pattern name box enter fluent-bit* and click Next step
  7. Pick @timestamp from the dropdown box and click Create index pattern
  8. Then go back Home and click Discover

TODO: Walk through how to do a few basic things in Kibana with searching and dashboarding your logs.

Checking out Grafana and the out-of-the-box metrics dashboards

We have deployed an in-VPC private Network Load Balancer (NLB) to access your Grafana service to visualise the metrics from the Prometheus we've deployed onto the cluster.

To access this, run kubectl get service grafana-nlb --namespace=kube-system and find its address under EXTERNAL-IP. Alternatively, you can find the Grafana NLB in the AWS EC2 console and get its address from there.
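
If you would rather pull the address out in a script, the same service can be fetched as JSON by adding -o json to the kubectl command and read from status.loadBalancer.ingress. The sketch below parses such output; load_balancer_address is a hypothetical helper and the sample document is a trimmed, made-up example.

```python
import json


def load_balancer_address(service_json):
    """Return the first load balancer hostname or IP of a Service, or None."""
    service = json.loads(service_json)
    ingress = service.get("status", {}).get("loadBalancer", {}).get("ingress") or []
    if not ingress:
        return None  # the load balancer has not been provisioned yet
    return ingress[0].get("hostname") or ingress[0].get("ip")


# Trimmed, made-up example of the JSON for a Service of type LoadBalancer.
sample = json.dumps({
    "metadata": {"name": "grafana-nlb", "namespace": "kube-system"},
    "status": {"loadBalancer": {"ingress": [
        {"hostname": "example-1234.elb.ap-southeast-2.amazonaws.com"}
    ]}},
})
print(load_balancer_address(sample))
```

A None result usually just means the NLB is still being created, so it is worth retrying after a minute or two.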

Once you go to that page the default login/password is admin/prom-operator.

There are some default dashboards that ship with this which you can see by going to Home on top. This will take you to a list view of the available dashboards. Some good ones to check out include:

  • Kubernetes / Compute Resources / Cluster
    • This gives you a whole cluster view
  • Kubernetes / Compute Resources / Namespace (Pods)
    • There is a namespace dropdown at the top and it'll show you the graphs including the consumption in that namespace broken down by Pod
  • Kubernetes / Compute Resources / Namespace (Workloads)
    • Similar to the Pod view but instead focuses on Deployment, StatefulSet and DaemonSet views

Within all of these dashboards you can click on names as links and it'll drill down to show you details relevant to that item.

Deploy some sample apps to explore our new Kubernetes environment and its features

TODO: Walk through deploying some apps that show off some of the cluster add-ons we've installed

Upgrading your cluster

Since we are explicit about both the EKS Control Plane version and the Managed Node Group AMI version, upgrading them is simply a matter of incrementing these versions, saving cluster-bootstrap/cdk.json and then running cdk deploy.

As per the EKS Upgrade Instructions you start by upgrading the control plane, then any required add-on versions and then the worker nodes.

Upgrade the control plane by changing eks_version in cluster-bootstrap/cdk.json. You can see what to put there by looking at the CDK documentation for KubernetesVersion. Then run cdk deploy - or let the CodeBuild GitOps provided in cluster-codebuild do it for you.

Upgrade the worker nodes by updating eks_node_ami_version in cluster-bootstrap/cdk.json with the new version. You can find the version to enter there in the EKS documentation.
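
The two-step order (control plane first, worker nodes second) can be sketched as two separate edits to the configuration, each followed by its own cdk deploy. The sketch assumes eks_version and eks_node_ami_version sit under the "context" key of cdk.json; set_versions is a hypothetical helper and the version strings are made up for illustration.

```python
import copy
import json


def set_versions(cdk_config, eks_version=None, node_ami_version=None):
    """Return a copy of the cdk.json contents with the requested versions changed."""
    updated = copy.deepcopy(cdk_config)
    context = updated.setdefault("context", {})
    if eks_version is not None:
        context["eks_version"] = eks_version
    if node_ami_version is not None:
        context["eks_node_ami_version"] = node_ami_version
    return updated


# Made-up starting values, for illustration only.
current = {"context": {"eks_version": "1.20", "eks_node_ami_version": "1.20.4-20210519"}}

# Step 1: change the control plane version only, then run cdk deploy.
step1 = set_versions(current, eks_version="1.21")
# Step 2: once add-ons are checked, change the node AMI version and deploy again.
step2 = set_versions(step1, node_ami_version="1.21.2-20210722")
print(json.dumps(step2, indent=2))
```

Keeping the two changes in separate commits also suits the CodeBuild GitOps flow, since each merge then triggers exactly one stage of the upgrade.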

Upgrading an add-on

Each of our add-ons is deployed via a Helm chart and is explicit about the chart version being deployed. In the comment above each chart version we link to the GitHub repo for that chart, where you can see what the current chart version is and what changes may have been rolled in since the one cited in the template.

To upgrade a chart, update its version to the upstream version you see there, save the file and then run cdk deploy.

NOTE: We considered parameterising the chart versions within cluster-bootstrap/cdk.json, but as chart versions change, the values you have to specify might also change. As such, we have not done so, as a reminder that this change might require a bit of research and testing rather than just popping in a new version number and expecting it to work.
