Visualize the size of directories and S3 buckets.

Overview

Dir Sizer

This is a work in progress. For now, consider it alpha or proof-of-concept quality.

dir_sizer is a utility to visualize the size of a directory, or a directory-like thing. Right now it produces webpages that show the relative size of folders, making it easy to see which areas of a folder are using the most space.

It can scan and produce reports for AWS S3 buckets and directories in the local file system. Future versions are planned to support other sources, including Google Cloud Storage buckets, Azure Blob Storage, remote filesystems via SSH, and perhaps others.

To run it, you currently need a recent version of Python with the boto3 package installed. Then you can run a command like:

python dir_sizer.py --s3 --bucket example-bucket --output example.html

This produces a file called "example.html" showing details of the "example-bucket" AWS S3 bucket.
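
For the local file system case, the idea of attributing file sizes to the folders that contain them can be sketched roughly as follows. This is a minimal illustration and not dir_sizer's actual implementation; the function name `folder_sizes` is hypothetical, and it assumes a file's size should count toward its own folder and every ancestor folder up to the scan root.

```python
import os
from collections import defaultdict


def folder_sizes(root):
    """Return {relative folder path: total bytes of all files beneath it}.

    Hypothetical helper for illustration only; not part of dir_sizer.
    """
    totals = defaultdict(int)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                # Skip files that vanish or cannot be read during the scan.
                continue
            # Credit the size to the file's folder and each ancestor up to root.
            rel = os.path.relpath(dirpath, root)
            while True:
                totals[rel] += size
                if rel == ".":
                    break
                rel = os.path.dirname(rel) or "."
    return dict(totals)
```

Sorting the resulting dictionary by value, largest first, gives a quick text-only view of which folders use the most space.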

You can track the current outstanding work items. Questions? Feedback? E-mail me.

Comments
  • Does --inventory require some specific metadata fields to be present?

    I'm trying to use --inventory since my bucket has around 22,800 objects, and running just the regular --s3 --bucket name --inventory --output name.html only generates up to 1000 objects (can this be overridden, by the way? aws s3api list-object-versions can show all of the objects). But upon issuing ./dir_sizer.py --s3 --bucket 90poe-athens-go-mod-proxy --inventory --output ~/s3-cost/90poe-athens-go-mod-proxy.html I'm greeted with:

    Using S3 Inventory report "test-inventory" generated 2022-08-25 03:00:00...
    Scanning, gathered 1000 totaling 175.90 MiB...Traceback (most recent call last):
      File "/Users/alexk/src/dir_sizer/./dir_sizer.py", line 264, in <module>
        main()
      File "/Users/alexk/src/dir_sizer/./dir_sizer.py", line 190, in main
        for filename, size in load_files(opts, abstraction):
      File "/Users/alexk/src/dir_sizer/./dir_sizer.py", line 256, in load_files
        for filename, size in abstraction.scan_folder(opts):
      File "/Users/alexk/src/dir_sizer/s3_abstraction.py", line 310, in scan_folder
        for i, cur in enumerate(s3_list_objects(msg, opts, s3)):
      File "/Users/alexk/src/dir_sizer/s3_abstraction.py", line 274, in s3_list_objects
        'Size': int(row['Size']),
    ValueError: invalid literal for int() with base 10: ''
    

    What am I doing wrong here?

    Thanks for the awesome project otherwise!

    opened by theodrim 4
  • Does not work anymore.

    Hello, thank you for an awesome project. It was working fine for me a few months ago, but when I tested it yesterday I received the following stack trace:

    Scanning...Traceback (most recent call last):
      File "/Users/alexk/src/dir_sizer/dir_sizer.py", line 264, in <module>
        main()
      File "/Users/alexk/src/dir_sizer/dir_sizer.py", line 190, in main
        for filename, size in load_files(opts, abstraction):
      File "/Users/alexk/src/dir_sizer/dir_sizer.py", line 256, in load_files
        for filename, size in abstraction.scan_folder(opts):
      File "/Users/alexk/src/dir_sizer/s3_abstraction.py", line 313, in scan_folder
        for i, cur in enumerate(s3_list_objects(msg, opts, s3)):
      File "/Users/alexk/src/dir_sizer/s3_abstraction.py", line 272, in s3_list_objects
        for row in get_bucket_inventory(msg, s3, opts['s3_bucket'], required_fields=required_fields, prefix=prefix):
      File "/Users/alexk/src/dir_sizer/s3_abstraction.py", line 206, in get_bucket_inventory
        inv_prefix = config['Destination']['S3BucketDestination']['Prefix']
    KeyError: 'Prefix'
    
    

    Environment: Python 3.9.13, macOS, fresh virtual env with boto3 and Pillow installed. Inventory file: 94990aad-1014-4013-809c-82a266e40f14.csv.gz. S3 inventory config: [image]

    I've additionally tested inside a fresh Linux VM, as well as a Docker container (with a minimal Dockerfile), but got the same error.

    Should I still define a default prefix for scanning? Please let me know if I should provide any additional information, and once again, thanks for the help and the awesome tool!

    opened by theodrim 3
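
Both tracebacks above fail while reading fields that were absent or empty: an inventory row with an empty `Size` value, and an inventory configuration without a `Prefix` key. A defensive way to read such fields might look like the sketch below. It is not the project's actual fix; the helper names `parse_size` and `inventory_prefix` are hypothetical, and treating an empty size as 0 and a missing prefix as an empty string are assumptions.

```python
def parse_size(row):
    """Return the row's size in bytes, treating a missing or empty field as 0.

    Hypothetical helper; the default of 0 is an assumption.
    """
    raw = (row.get("Size") or "").strip()
    return int(raw) if raw else 0


def inventory_prefix(config):
    """Return the inventory destination prefix, or "" if none is configured.

    Hypothetical helper; the empty-string default is an assumption.
    """
    dest = config.get("Destination", {}).get("S3BucketDestination", {})
    return dest.get("Prefix", "")
```

Whether 0 and an empty prefix are the right defaults depends on how the rest of the scanner uses those values.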
Owner
Scott Seligman
I write code, sometimes it does what I want.