Open-Source Python CLI package for copying DynamoDB tables and items in parallel batch processing + query natural & Global Secondary Indexes (GSIs)

Overview

dynamo-cmdline

Python Command-Line Interface Package to copy Dynamodb data in parallel batch processing + query natural & Global Secondary Indexes (GSIs).

Author: Simon Ryu

This packaging distribution is published on PyPI, found here.

What is DynamoDB?

Dynamo is a NoSQL database. A table in Dynamo is defined by its Partition Key, which uniquely identifies a list of records. The lists are ordered by the Sort Key, an optional key that along with Partition Key, form a primary key referred to as a composite primary key. This gives an additional flexibility when querying data. Each partition (Partition Key) can be thought of as a filing cabinet drawer, containing a bunch of related records which may or may not be sorted (Sort Key) depending on your need. Accounting for this optionality, the DynamodbTable class in module dynamodb_table.py can represent tables with either simple (Partition Key) or composite (Partition Key + Sort Key) primary key.

Dynamo vs Relational database

Dynamo differs from traditional, relational databases in that tables cannot be queried by random fields. Because it is structured to guarantee fast and scalable queries, tables also cannot be joined, grouped or unioned. A specific item can be found by specifiying a partition key and sort key, or a range of values within a partition, filtered by the sort key. Although filtering by other fields (attributes) is possible, it is highly discouraged as AWS charges the user based on how much data is read, and non-key filters occur after the reads happen. Querying, and especially copying large amount of data across different AWS environments must be done with extreme care (double the query operations!), hence this CLI package was distributed.

GSI

Since querying is limited to the table's primary key, how can we address many different access patterns? The answer is global secondary indexes, or GSIs. A GSI allows you to essentially re-declare your table with a new key schema. When an item is written into the table, the index will update automatically, so managing dual-writing is not a concern. Most importantly, the GSI can be queried directly just like the natural table, just as fast.

Libraries Used

  • AWS SDK for Python - Boto3
  • multiprocessing
You might also like...
[WIP]An ani-cli like cli tool for movies and webseries

mov-cli A cli to browse and watch movies. Installation This project is a work in progress. However, you can try it out python git clone https://github

flora-dev-cli (fd-cli) is command line interface software to interact with flora blockchain.

Install git clone https://github.com/Flora-Network/fd-cli.git cd fd-cli python3 -m venv venv source venv/bin/activate pip install -e . --extra-index-u

AWS Interactive CLI - Allows you to execute a complex AWS commands by chaining one or more other AWS CLI dependency

AWS Interactive CLI - Allows you to execute a complex AWS commands by chaining one or more other AWS CLI dependency

PdpCLI is a pandas DataFrame processing CLI tool which enables you to build a pandas pipeline from a configuration file.

PdpCLI Quick Links Introduction Installation Tutorial Basic Usage Data Reader / Writer Plugins Introduction PdpCLI is a pandas DataFrame processing CL

A Python package for a basic CLI and GUI user interface
A Python package for a basic CLI and GUI user interface

Bun Bun (Basic user interface) is a small Python package for a basic user interface. Table of contents Introduction Installation Usage Known issues an

Python package with library and CLI tool for analyzing SeaFlow data

Seaflowpy A Python package for SeaFlow flow cytometer data. Table of Contents Install Read EVT/OPP/VCT Files Command-line Interface Configuration Inte

Dead simple CLI tool to try Python packages - It's never been easier! :package:
Dead simple CLI tool to try Python packages - It's never been easier! :package:

try - It's never been easier to try Python packages try is an easy-to-use cli tool to try out Python packages. Features Install specific package versi

Zero-config CLI for TypeScript package development
Zero-config CLI for TypeScript package development

Despite all the recent hype, setting up a new TypeScript (x React) library can be tough. Between Rollup, Jest, tsconfig, Yarn resolutions, ESLint, and

Owner
null
Unofficial Open Corporates CLI: OpenCorporates is a website that shares data on corporations under the copyleft Open Database License. This is an unofficial open corporates python command line tool.

Unofficial Open Corporates CLI OpenCorporates is a website that shares data on corporations under the copyleft Open Database License. This is an unoff

Richard Mwewa 30 Sep 8, 2022
CryptoCo-py is a Python CLI application that uses CoinGecko API to allow the user to query cryptocurrency information by typing simple commands.

CryptoCo-py is a Python CLI application that uses CoinGecko API to allow the user to query cryptocurrency information by typing simple com

null 1 Jan 10, 2022
lazy_table - a python-tabulate wrapper for producing tables from generators

A python-tabulate wrapper for producing tables from generators. Motivation lazy_table is useful when (i) each row of your table is generated by a poss

Parsiad Azimzadeh 52 Nov 12, 2022
A command line tool to query source code from your current Python env

wxc wxc (pronounced "which") allows you to inspect source code in your Python environment from the command line. It is based on the inspect module fro

Clément Robert 13 Nov 8, 2022
An open-source CLI tool for backing up RDS(PostgreSQL) Locally or to Amazon S3 bucket

An open-source CLI tool for backing up RDS(PostgreSQL) Locally or to Amazon S3 bucket

null 1 Oct 30, 2021
Sink is a CLI tool that allows users to synchronize their local folders to their Google Drives. It is similar to the Git CLI and allows fast and reliable syncs with the drive.

Sink is a CLI synchronisation tool that enables a user to synchronise local system files and folders with their Google Drives. It follows a git C

Yash Thakre 16 May 29, 2022
Python-Stock-Info-CLI: Get stock info through CLI by passing stock ticker.

Python-Stock-Info-CLI Get stock info through CLI by passing stock ticker. Installation Use the following command to install the required modules at on

Ayush Soni 1 Nov 5, 2021
Yts-cli-streamer - A CLI movie streaming client which works on yts.mx API written in python

YTSP It is a CLI movie streaming client which works on yts.mx API written in pyt

null 1 Feb 5, 2022
gget is a free and open-source command-line tool and Python package that enables efficient querying of genomic databases.

gget is a free and open-source command-line tool and Python package that enables efficient querying of genomic databases. gget consists of a collection of separate but interoperable modules, each designed to facilitate one type of database querying in a single line of code.

Pachter Lab 570 Dec 29, 2022
A simple CLI based any Download Tool, that find files and let you stream or download thorugh WebTorrent CLI or Aria or any command tool

Privateer A simple CLI based any Download Tool, that find files and let you stream or download thorugh WebTorrent CLI or Aria or any command tool How

Shreyash Chavan 2 Apr 4, 2022