Pyspark sam - Analyze Big Sequence Alignments with PySpark in AWS EMR

Overview

pyspark_sam

This repo hosts my code for the article "Analyze Big Sequence Alignments with PySpark in AWS EMR".

Prerequisite

  1. Spark

  2. AWS CLI

  3. AWS Account

Run

Follow the instruction in the article. Once you have uploaded the files into your S3 bucket, run

aws emr create-cluster --name "Spark_step_pip" \
    --release-label emr-6.5.0 \
    --applications Name=Spark \
    --log-uri s3://[your_S3_bucket]/logs/ \
    --instance-type m5.xlarge \
    --instance-count 3 \
    --bootstrap-actions Path=s3://[your_S3_bucket]/emr_bootstrap.sh \
    --use-default-roles --auto-terminate \
    --steps "Type=Spark,Name=SparkProgram,ActionOnFailure=CONTINUE,Args=[--deploy-mode,cluster,--master,yarn,--py-files,s3://[your_S3_bucket]/helper_function.py,s3://[your_S3_bucket]/spark_3mer.py,s3://[your_S3_bucket]/test.sam,[your_S3_bucket],sankey.json]" 

When the job finishes, download the sankey.json. And run this command to visualize:

python sankey.py sankey.json

Authors

  • Sixing Huang - Concept and Coding

License

This project is licensed under the MIT License - see the LICENSE file for details

You might also like...
aws-lambda-scheduler lets you call any existing AWS Lambda Function you have in a future time.

aws-lambda-scheduler aws-lambda-scheduler lets you call any existing AWS Lambda Function you have in the future. This functionality is achieved by dyn

Project template for using aws-cdk, Chalice and React in concert, including RDS Postgresql and AWS Cognito

What is This? This repository is an opinonated project template for using aws-cdk, Chalice and React in concert. Where aws-cdk and Chalice are in Pyth

POC de uma AWS lambda que executa a consulta de preços de criptomoedas, e é implantada na AWS usando Github actions.
POC de uma AWS lambda que executa a consulta de preços de criptomoedas, e é implantada na AWS usando Github actions.

Cryptocurrency Prices Overview Instalação Repositório Configuração CI/CD Roadmap Testes Overview A ideia deste projeto é aplicar o conteúdo estudado s

Python + AWS Lambda Hands OnPython + AWS Lambda Hands On
Python + AWS Lambda Hands OnPython + AWS Lambda Hands On

Python + AWS Lambda Hands On Python Criada em 1990, por Guido Van Rossum. "Bala de prata" (quase). Muito utilizado em: Automatizações - Selenium, Beau

Unauthenticated enumeration of services, roles, and users in an AWS account or in every AWS account in existence.

Quiet Riot 🎶 C'mon, Feel The Noise 🎶 An enumeration tool for scalable, unauthenticated validation of AWS principals; including AWS Acccount IDs, roo

AWS Blog post code for running feature-extraction on images using AWS Batch and Cloud Development Kit (CDK).

Batch processing with AWS Batch and CDK Welcome This repository demostrates provisioning the necessary infrastructure for running a job on AWS Batch u

Aws-lambda-requests-wrapper - Request/Response wrapper for AWS Lambda with API Gateway

AWS Lambda Requests Wrapper Request/Response wrapper for AWS Lambda with API Gat

AWS-serverless-starter - AWS Lambda serverless stack via Serverless framework
AWS-serverless-starter - AWS Lambda serverless stack via Serverless framework

Serverless app via AWS Lambda, ApiGateway and Serverless framework Configuration

Aws-cidr-finder - A Python CLI tool for finding unused CIDR blocks in AWS VPCs

aws-cidr-finder Overview An Example Installation Configuration Contributing Over

Owner
Sixing Huang
A triple Neo4j certified data scientist. I am currently working at BGI in Shenzhen.
Sixing Huang
DIAL(Did I Alert Lambda?) is a centralised security misconfiguration detection framework which completely runs on AWS Managed services like AWS API Gateway, AWS Event Bridge & AWS Lambda

DIAL(Did I Alert Lambda?) is a centralised security misconfiguration detection framework which completely runs on AWS Managed services like AWS API Gateway, AWS Event Bridge & AWS Lambda

CRED 71 Dec 29, 2022
Cookiecutter templates for Serverless applications using AWS SAM and the Rust programming language.

Cookiecutter SAM template for Lambda functions in Rust This is a Cookiecutter template to create a serverless application based on the Serverless Appl

AWS Samples 24 Nov 11, 2022
Automated AWS account hardening with AWS Control Tower and AWS Step Functions

Automate activities in Control Tower provisioned AWS accounts Table of contents Introduction Architecture Prerequisites Tools and services Usage Clean

AWS Samples 20 Dec 7, 2022
Implement backup and recovery with AWS Backup across your AWS Organizations using a CI/CD pipeline (AWS CodePipeline).

Backup and Recovery with AWS Backup This repository provides you with a management and deployment solution for implementing Backup and Recovery with A

AWS Samples 8 Nov 22, 2022
AWS Glue PySpark - Apache Hudi Quick Start Guide

AWS Glue PySpark - Apache Hudi Quick Start Guide Disclaimer: This is a quick start guide for the Apache Hudi Python Spark connector, running on AWS Gl

Gabriel Amazonas Mesquita 8 Nov 14, 2022
Automatically compile an AWS Service Control Policy that ONLY allows AWS services that are compliant with your preferred compliance frameworks.

aws-allowlister Automatically compile an AWS Service Control Policy that ONLY allows AWS services that are compliant with your preferred compliance fr

Salesforce 189 Dec 8, 2022
SSH-Restricted deploys an SSH compliance rule (AWS Config) with auto-remediation via AWS Lambda if SSH access is public.

SSH-Restricted SSH-Restricted deploys an SSH compliance rule with auto-remediation via AWS Lambda if SSH access is public. SSH-Auto-Restricted checks

Adrian Hornsby 30 Nov 8, 2022
AWS Auto Inventory allows you to quickly and easily generate inventory reports of your AWS resources.

Photo by Denny Müller on Unsplash AWS Automated Inventory ( aws-auto-inventory ) Automates creation of detailed inventories from AWS resources. Table

AWS Samples 123 Dec 26, 2022
A suite of utilities for AWS Lambda Functions that makes tracing with AWS X-Ray, structured logging and creating custom metrics asynchronously easier

A suite of utilities for AWS Lambda Functions that makes tracing with AWS X-Ray, structured logging and creating custom metrics asynchronously easier

Amazon Web Services - Labs 1.9k Jan 7, 2023