Shared utility scripts for AI for Earth projects and team members

Overview

Shared utilities developed by the Microsoft AI for Earth team

The general convention in this repo is that users who want to consume these utilities will add the top-level path of the repo to their Python path, so it's okay to assume that other packages/modules within the repo are available. The "scrap" directory can be used for standalone, one-time-use scripts that you might otherwise have emailed to someone.
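As a rough illustration of that convention, a consuming script might put the repo root on `sys.path` before importing a module. The checkout location and the helper name `add_repo_to_path` below are hypothetical placeholders; setting the `PYTHONPATH` environment variable achieves the same result without code changes.

```python
import os
import sys

# Hypothetical location of a local ai4eutils checkout; adjust to your machine.
AI4EUTILS_ROOT = os.path.expanduser('~/git/ai4eutils')


def add_repo_to_path(repo_root):
    """Prepend repo_root to sys.path (once) so its top-level modules import."""
    repo_root = os.path.abspath(repo_root)
    if repo_root not in sys.path:
        sys.path.insert(0, repo_root)
    return repo_root


add_repo_to_path(AI4EUTILS_ROOT)
# After this call, top-level modules in the checkout can be imported, e.g.:
# import path_utils
```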

Contents

  • path_utils.py: Miscellaneous useful utils for path manipulation, things that could almost be in os.path, but aren't.

  • matlab_porting_tools.py: A few ported Matlab functions that make it easier to port other, larger Matlab functions to Python.

  • write_html_image_list.py: Given a list of image file names, writes an HTML file that shows all those images, with optional one-line headers above each.

  • sas_blob_utils.py: Helper functions for dealing with Shared Access Signature (SAS) tokens for Azure Blob Storage.

  • TF_OD_API: A Dockerfile and a script to prepare a Docker image for use with the TensorFlow Object Detection API.

  • gDrive_download.py: Semi-automatic script for bulk download from shared Google Drives using the gDrive Python SDK.

  • azure-sdk-calc-storage-size: Script for recursively computing the size of all blobs and files in an Azure subscription.

  • azure-metrics-calc-storage-size: Script for computing the total size of all storage accounts in an Azure subscription (using Azure Metrics).

  • ai4e_azure_utils.py: Functions for interacting with the Azure Storage SDK.

  • ai4e_web_utils.py: Functions for interacting with HTTP requests.

  • geospatial: Classes and utility functions for processing geospatial data for machine learning applications.
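To make the write_html_image_list.py entry more concrete, here is a minimal sketch of the idea it describes: turning a list of image file names into an HTML page with optional one-line headers. The function name `write_image_page` and its parameters are hypothetical and are not the module's actual interface.

```python
import html


def write_image_page(image_paths, output_path, titles=None):
    """Write a bare-bones HTML page showing each image, with an optional
    one-line header above it. Illustrative only; not the repo's function."""
    if titles is None:
        titles = [None] * len(image_paths)
    if len(titles) != len(image_paths):
        raise ValueError('titles must have one entry per image')

    lines = ['<html><body>']
    for path, title in zip(image_paths, titles):
        if title:
            # Escape the header text so it cannot break the markup.
            lines.append('<p>{}</p>'.format(html.escape(title)))
        lines.append('<img src="{}"><br/>'.format(html.escape(path, quote=True)))
    lines.append('</body></html>')

    with open(output_path, 'w', encoding='utf-8') as f:
        f.write('\n'.join(lines))
    return output_path
```

A call such as `write_image_page(['a.jpg', 'b.jpg'], 'index.html', titles=['First image', None])` would write a page with a header above the first image only.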

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.microsoft.com.

When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.

Comments
  • Added dockerfile and sh script to install TFODAPI (TF v 1.15) on Azure Ubuntu 16.04 DSVM

    I added two files that allow easy installation of TFODAPI w/ TF 1.15 on an Azure Ubuntu 16.04 DSVM. Both of the files are modified copies of the existing TFODAPI w/ TF 1.12 dockerfile and sh scripts.

    I found that TF 1.15 is a much smoother experience. I was hitting memory errors on computationally expensive models using TF v1.2, even using batch sizes of 1 and small images. I haven't had any issues since making the switch to TF 1.15. My choice of TF 1.15 came from a lot of research, with many recommendations from issues such as this one.

    This may not be super useful in the long run. I see that Azure has 18.04 DSVMs out for testing. Also, I read somewhere that TFODAPI will eventually move to TF 2.0. However, in the meantime, it may help someone!

    opened by rosswin 2
  • This repo is missing important files

    There are important files that all Microsoft projects should have that are not present in this repository. A pull request has been opened to add the missing file(s). When the PR is merged, this issue will be closed automatically.

    Microsoft teams can learn more about this effort and share feedback within the open source guidance available internally.

    Merge this pull request

    opened by microsoft-github-policy-service[bot] 0
  • Adding Microsoft SECURITY.MD

    Please accept this contribution adding the standard Microsoft SECURITY.MD file to help the community understand the security policy and how to safely report security issues. GitHub uses the presence of this file to light up security reminders and a link to the file. This pull request commits the latest official SECURITY.MD file from https://github.com/microsoft/repo-templates/blob/main/shared/SECURITY.md.

    Microsoft teams can learn more about this effort and share feedback within the open source guidance available internally.

    opened by microsoft-github-policy-service[bot] 0
  • Created a class for interfacing with the NAIP data on Azure

    I made the class from the NAIP example notebook (https://azure.microsoft.com/en-us/services/open-datasets/catalog/naip/) a little more portable as I've copied/modified it in several new projects now.

    opened by calebrob6 0
  • Update to azure-storage-blob v12

    @Siyu: I updated sas_blob_utils to use the new Azure Blob Storage Python SDK v12.

    There are several places where I changed functionality and/or the interface:

    1. In check_blob_existence(), I removed the following check because it would unnecessarily raise an error on public Azure blobs such as: https://lilablobssc.blob.core.windows.net/nacti-unzipped/part0/sub000/2010_Unit150_Ivan097_img0003.jpg

        if SasBlob.get_resource_type_from_uri(sas_uri) != 'blob':
            raise ValueError('The SAS token provided is not for a blob.')

    2. I replaced create_blob_from_bytes(), create_blob_from_text(), and create_blob_from_stream() with a single method upload_blob() because the new v12 API no longer distinguishes between the different upload types.

    3. I renamed the 1st parameter of generate_blob_sas_uri() from sas_uri to container_sas_uri.

    4. I sort the list of blob names from list_blobs_in_container() for determinism.

    I manually tested each of the methods without issue:

    from sas_blob_utils import SasBlob
    
    from azure.core.exceptions import HttpResponseError
    
    
    PUBLIC_BLOB_URI = 'https://lilablobssc.blob.core.windows.net/nacti-unzipped/part0/sub000/2010_Unit150_Ivan097_img0003.jpg'
    PUBLIC_CONTAINER_URI = 'https://lilablobssc.blob.core.windows.net/nacti-unzipped'
    ANOTHER_PUBLIC_CONTAINER_URI = 'https://lilablobssc.blob.core.windows.net/wcs'
    
    CONTAINER_SAS = 'st=2020-01-01T00%3A00%3A00Z&se=2034-01-01T00%3A00%3A00Z&sp=rl&sv=2019-07-07&sr=c&sig=rsgUcvoniBu/Vjkjzubh6gliU3XGvpE2A30Y0XPW4Vc%3D'
    PUBLIC_CONTAINER_URI_WITH_SAS = PUBLIC_CONTAINER_URI + '?' + CONTAINER_SAS
    
    PRIVATE_BLOB_URI = **
    PRIVATE_CONTAINER_SAS = **
    PRIVATE_BLOB_URI_WITH_SAS = PRIVATE_BLOB_URI + '?' + PRIVATE_CONTAINER_SAS
    
    INVALID_BLOB_URI = "https://lilablobssc.blob.core.windows.net/nacti-unzipped/part0/sub000/2010_Unit150_Ivan000_img0003.jpg"
    
    PRIVATE_ACCOUNT_NAME = **
    PRIVATE_ACCOUNT_SAS = **
    PRIVATE_UPLOAD_CONTAINER_URI = **
    PRIVATE_UPLOAD_CONTAINER_SAS = **
    PRIVATE_UPLOAD_CONTAINER_URI_WITH_SAS = PRIVATE_UPLOAD_CONTAINER_URI + '?' + PRIVATE_UPLOAD_CONTAINER_SAS
    
    
    def test_get_account_from_uri():
        print('test_get_account_from_uri')
        assert SasBlob.get_account_from_uri(PUBLIC_BLOB_URI) == 'lilablobssc'
    
    
    def test_get_container_from_uri():
        print('test_get_container_from_uri')
        assert SasBlob.get_container_from_uri(PUBLIC_BLOB_URI) == 'nacti-unzipped'
    
    
    def test_get_blob_from_uri():
        print('test_get_blob_from_uri')
        expected_blobname = 'part0/sub000/2010_Unit150_Ivan097_img0003.jpg'
        assert SasBlob.get_blob_from_uri(PUBLIC_BLOB_URI) == expected_blobname
    
        assert SasBlob.get_blob_from_uri(PUBLIC_CONTAINER_URI) is None
    
    
    def test_get_sas_key_from_uri():
        print('test_get_sas_key_from_uri')
        assert SasBlob.get_sas_key_from_uri(PUBLIC_CONTAINER_URI) is None
        assert SasBlob.get_sas_key_from_uri(PUBLIC_CONTAINER_URI_WITH_SAS) == CONTAINER_SAS
    
    
    def test_check_blob_existence():
        print('test_check_blob_existence')
        print('- PUBLIC_BLOB_URI...')
        assert SasBlob.check_blob_existence(PUBLIC_BLOB_URI) == True
    
        print('- PRIVATE_BLOB_URI...')
        assert SasBlob.check_blob_existence(PRIVATE_BLOB_URI) == False
        print('- PUBLIC_CONTAINER_URI...')
        try:
            SasBlob.check_blob_existence(PUBLIC_CONTAINER_URI)
            assert False
        except ValueError:
            pass
        print('- PUBLIC_CONTAINER_URI with blob name...')
        assert SasBlob.check_blob_existence(
            PUBLIC_CONTAINER_URI,
            provided_blob_name='part0/sub000/2010_Unit150_Ivan097_img0003.jpg') == True
        print('- INVALID_BLOB_URI...')
        assert SasBlob.check_blob_existence(INVALID_BLOB_URI) == False
    
    
    def test_list_blobs_in_container():
        print('test_list_blobs_in_container')
        blobs_list = SasBlob.list_blobs_in_container(ANOTHER_PUBLIC_CONTAINER_URI,
            max_number_to_list=100)
        expected = sorted(['wcs_20200403_bboxes.json.zip', 'wcs_camera_traps.json.zip', 'wcs_camera_traps_00.zip', 'wcs_camera_traps_01.zip', 'wcs_camera_traps_02.zip', 'wcs_camera_traps_03.zip', 'wcs_camera_traps_04.zip', 'wcs_camera_traps_05.zip', 'wcs_camera_traps_06.zip', 'wcs_specieslist.csv', 'wcs_splits.json'])
        assert blobs_list == expected
    
    
    def test_generate_writable_container_sas():
        new_sas_uri = SasBlob.generate_writable_container_sas(
            account_name=PRIVATE_ACCOUNT_NAME,
            account_key=PRIVATE_ACCOUNT_SAS,
            container_name='christest',
            access_duration_hrs=1)
        print(new_sas_uri)
    
    
    def test_upload_blob():
        print('test_upload_blob')
        try:
            print('- testing upload to public container, should fail...')
            SasBlob.upload_blob(PUBLIC_CONTAINER_URI_WITH_SAS, 'failblob', data='fail')
            assert False
        except HttpResponseError:
            # HttpResponseError('This request is not authorized to perform this operation using this permission.')
            pass
    
        try:
            print('- testing upload to private container, should succeed...')
            blob_url = SasBlob.upload_blob(PRIVATE_UPLOAD_CONTAINER_URI_WITH_SAS, 'successblob', data='success')
            assert SasBlob.get_blob_from_uri(blob_url) == 'successblob'
        except Exception:
            assert False
    
    
    def test_get_blob_to_stream():
        output, props = SasBlob.get_blob_to_stream(PRIVATE_BLOB_URI_WITH_SAS)
        print(props)
        return
    
    
    if __name__ == '__main__':
        test_get_account_from_uri()
        test_get_container_from_uri()
        test_get_blob_from_uri()
        test_get_sas_key_from_uri()
        test_check_blob_existence()
        test_list_blobs_in_container()
        test_generate_writable_container_sas()
        test_upload_blob()
        test_get_blob_to_stream()
    
    opened by chrisyeh96 0
  • Hardcoded file separator

    https://github.com/microsoft/ai4eutils/blob/eedfca42ba1a2b4a0b7628c1ea9439d5720accf5/gDrive_download.py#L54

    https://github.com/microsoft/ai4eutils/blob/eedfca42ba1a2b4a0b7628c1ea9439d5720accf5/gDrive_download.py#L73

    Path separators differ between operating systems, so hardcoded file separators should not be used. Instead, use a platform-independent API from the standard library, such as os.path.join or os.sep.
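A small sketch of the suggested fix, using hypothetical helper and argument names rather than the actual lines in gDrive_download.py:

```python
import os


def local_download_path(output_dir, folder_name, file_name):
    """Build a destination path without hardcoding a separator."""
    return os.path.join(output_dir, folder_name, file_name)


def split_relative_path(relative_path):
    """Split a relative path into components using the platform's separator."""
    return os.path.normpath(relative_path).split(os.sep)
```

Because both helpers defer to `os.path` and `os.sep`, the same code yields backslash-separated paths on Windows and slash-separated paths elsewhere.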

    opened by QiAnXinCodeSafe 0
  • Add utility functions for dealing with Azure blob storage SAS tokens.

    (Reviews optional - just want to make you aware of this)

    Functions in this module parse the various parts of a SAS token and carry out actions on the blob store based on a SAS. This consolidates sas_blob.py from the AIforEarth-API-Development repo and adds a few new functions, for example to list all blobs in a container.
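For readers unfamiliar with the URI layout these helpers work on, the following standard-library sketch splits a blob or container URI into its account, container, blob name, and SAS query string. The function `split_blob_uri` is a hypothetical stand-in and is not the module's implementation.

```python
from urllib.parse import urlsplit


def split_blob_uri(uri):
    """Return (account, container, blob_name, sas_token) for a blob or
    container URI. blob_name and sas_token are None when absent."""
    parts = urlsplit(uri)
    # The storage account is the first label of the host name:
    # <account>.blob.core.windows.net
    account = parts.netloc.split('.')[0]
    # The path is /<container>[/<blob name>]; the blob name may contain slashes.
    container, _, blob_name = parts.path.lstrip('/').partition('/')
    # The SAS token, if any, is the raw (still URL-encoded) query string.
    sas_token = parts.query or None
    return account, container, blob_name or None, sas_token
```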

    Things needing review:

    • Should we use enumerations in get_resource_type_from_uri and get_permissions_from_uri?
    • list_blobs_in_container
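One way the enumeration question could play out is sketched below. It assumes the SAS query string carries the resource type in the `sr` field and the permissions as single letters in the `sp` field, and it covers only a subset of the possible values, so anything outside that subset raises `ValueError`. The class and function names are hypothetical, not the module's interface.

```python
from enum import Enum
from urllib.parse import parse_qs, urlsplit


class ResourceType(Enum):
    BLOB = 'b'
    CONTAINER = 'c'


class Permission(Enum):
    READ = 'r'
    ADD = 'a'
    CREATE = 'c'
    WRITE = 'w'
    DELETE = 'd'
    LIST = 'l'


def resource_type_from_uri(sas_uri):
    """Return the ResourceType named by the 'sr' field, or None if absent."""
    values = parse_qs(urlsplit(sas_uri).query).get('sr')
    return ResourceType(values[0]) if values else None


def permissions_from_uri(sas_uri):
    """Return the set of Permission members named by the 'sp' field."""
    values = parse_qs(urlsplit(sas_uri).query).get('sp')
    return {Permission(ch) for ch in values[0]} if values else set()
```

Returning `None` or an empty set for a URI without a SAS lets callers handle public blobs explicitly instead of comparing against magic strings.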

    Please test the functions against your expected output when calling them.

    opened by yangsiyu007 0
Owner
Microsoft
Open source projects and samples from Microsoft