Unified file system operation experience for different backends

MEGVII Research

Last update: Dec 14, 2022

Overview

megfile - Megvii FILE library

Docs: http://megvii-research.github.io/megfile

megfile provides a smooth, consistent operation experience across different backends (currently the local file system and OSS), which lets you focus on the logic of your own project instead of the question of "Which backend is used for this file?"

megfile provides:

A nearly unified file system operation experience. A target path can easily be moved from the local file system to OSS.
Thorough boundary case handling. megfile helps you handle even the most difficult boundary conditions, including ones you may not have thought of.
Full type hints and built-in documentation. You can enjoy the IDE's auto-completion and static checking.
Semantic versioning and an upgrade guide, which let you adopt the latest features easily.

megfile's advantages are:

smart_open can open resources that use various protocols, including fs, s3, http(s) and stdio. In particular, the s3 reader / writer in megfile is multi-threaded, which is faster than known competitors.
smart_glob is available on s3, and it supports zsh-style extended brace patterns ({}), e.g. s3://bucket/video.{mp4,avi}.
All-inclusive functions such as smart_exists / smart_stat / smart_sync. If you can't find the function you want, submit an issue.
Compatible with the pathlib.Path interface; see S3Path and SmartPath.
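
To make the brace pattern concrete, here is a minimal sketch in plain Python of how a pattern such as s3://bucket/video.{mp4,avi} expands into concrete patterns. The helper name expand_braces is hypothetical and this is not megfile's implementation; it only illustrates the expansion, and nested groups are not treated specially.

```python
import re

# Hypothetical helper, not part of megfile: matches one {a,b,...} group
# that contains no further braces.
_BRACE = re.compile(r"\{([^{}]*)\}")


def expand_braces(pattern):
    """Expand every {a,b,...} group in `pattern` into a list of patterns."""
    match = _BRACE.search(pattern)
    if match is None:
        return [pattern]
    head, tail = pattern[:match.start()], pattern[match.end():]
    expanded = []
    for option in match.group(1).split(","):
        # Recurse so that patterns with several groups are fully expanded
        expanded.extend(expand_braces(head + option + tail))
    return expanded


print(expand_braces("s3://bucket/video.{mp4,avi}"))
```

Each expanded pattern can then be matched like an ordinary glob, which is one way to think about what the brace syntax adds on top of ?, * and [].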

Quick Start

Here's an example of writing a file to OSS, syncing to local, reading and finally deleting it.

from megfile import smart_open, smart_exists, smart_sync, smart_remove, smart_glob
from megfile.smart_path import SmartPath

# open a file in s3 bucket
with smart_open('s3://playground/refile-test', 'w') as fp:
    fp.write('refile is not silver bullet')

# test whether the file exists in the s3 bucket
smart_exists('s3://playground/refile-test')

# copy files or directories
smart_sync('s3://playground/refile-test', '/tmp/playground')

# remove files or directories
smart_remove('s3://playground/refile-test')

# glob files or directories in s3 bucket
smart_glob('s3://playground/video-?.{mp4,avi}')

# or in local file system
smart_exists('/tmp/playground/refile-test')

# smart_open also supports protocols like http / https
smart_open('https://www.google.com')

# SmartPath interface
path = SmartPath('s3://playground/megfile-test')
if path.exists():
    # open in binary mode so that read() returns bytes
    with path.open('rb') as f:
        result = f.read(7)
        assert result == b'megfile'

Installation

PyPI

pip3 install megfile

You can also specify the megfile version:

pip3 install "megfile~=0.0"
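
The ~= operator is the "compatible release" specifier from PEP 440: megfile~=0.0 accepts any 0.x release at or above 0.0. The sketch below approximates that rule for plain dotted numeric versions only. The helper names parse and is_compatible are hypothetical, and pip's real logic also covers pre-releases and other version forms that are ignored here.

```python
def parse(version):
    """Turn '0.1.5' into (0, 1, 5); only plain numeric versions are supported."""
    return tuple(int(part) for part in version.split("."))


def is_compatible(candidate, spec):
    """Approximate `candidate ~= spec` for plain dotted numeric versions."""
    cand, base = parse(candidate), parse(spec)
    if len(base) < 2:
        raise ValueError("~= needs at least two version components")
    # Everything except the last component of the spec must match exactly
    prefix = base[:-1]
    # Pad with zeros so that 0.1 and 0.1.0 compare as equal
    width = max(len(cand), len(base))
    cand_padded = cand + (0,) * (width - len(cand))
    base_padded = base + (0,) * (width - len(base))
    return cand_padded >= base_padded and cand_padded[:len(prefix)] == prefix


print(is_compatible("0.1.5", "0.0"))
print(is_compatible("1.0.0", "0.0"))
```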

Build from Source

megfile can be installed from source

git clone git@github.com:megvii-research/megfile.git
cd megfile
pip3 install -U .

Development Environment

git clone git@github.com:megvii-research/megfile.git
cd megfile
sudo apt install libgl1-mesa-glx libfuse-dev fuse
pip3 install -r requirements.txt -r requirements-dev.txt

How to Contribute

We welcome everyone to contribute code to the megfile project. Contributed code should meet the following conditions as far as possible. You can still submit code that doesn't meet them; the project members will evaluate it and help you make the necessary changes.

- Code format: Your code needs to pass the code format check. megfile uses yapf as its lint tool, with the version locked at 0.27.0. The version lock may be removed in the future.
- Static check: Your code needs complete type hints. megfile uses pytype as its static check tool. If pytype fails, use # pytype: disable=XXX to disable the error, and please tell us why you disabled it.
  
  Note: Because pytype doesn't support variable type annotations, the variable type hint format introduced in Python 3.6 cannot be used.
  
  That is, variable: int is invalid; replace it with variable # type: int
- Test: Your code needs complete unit test coverage. megfile uses pyfakefs and moto to provide a virtual local file system and a virtual OSS environment in unit tests. Newly added code should come with complete unit tests to ensure its correctness.
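
The type-comment style required by the static check condition can be sketched as follows. The names retries, names and join_names are purely illustrative and do not come from megfile.

```python
from typing import List

# Type-comment style for variables, as the guidelines ask for
retries = 3  # type: int
names = []  # type: List[str]


# Function annotations are a separate, older syntax and are used normally
def join_names(items: List[str], sep: str = ", ") -> str:
    return sep.join(items)


# The form to avoid under the guidelines would be: retries: int = 3
names.append("megfile")
print(join_names(names), retries)
```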

You can help to improve megfile in many ways:
- Write code.
- Improve documentation.
- Report or investigate bugs and issues.
- If you find a problem or have a suggestion for improvement, submit a new issue as well. We will reply as soon as possible and evaluate whether to adopt it.
- Review pull requests.
- Star megfile repo.
- Recommend megfile to your friends.
- Any other form of contribution is welcomed.

Comments

feat request: support http/https download function

For example,

import megfile

path = "https://dl.fbaipublicfiles.com/detectron2/ImageNetPretrained/MSRA/R-50.pkl"
local_path = "/tmp/R-50.pkl"  # hypothetical local destination
megfile.smart_copy(path, local_path)
# or new interface like: megfile.smart_cache(path, local_path)

opened by FateScript 3
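
Independent of how megfile ends up supporting this, the requested behavior can be sketched with the standard library alone. The function name download_to is hypothetical and is not a megfile API; it streams the response body to a local file in chunks rather than loading it into memory.

```python
import shutil
import urllib.request


def download_to(url, local_path, chunk_size=1024 * 1024):
    """Stream the content at `url` into `local_path` and return the path."""
    with urllib.request.urlopen(url) as response, open(local_path, "wb") as out:
        # copyfileobj reads and writes chunk_size bytes at a time
        shutil.copyfileobj(response, out, chunk_size)
    return local_path
```

A call would look like download_to(path, local_path), using the two variables from the request above.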

Refactoring, phase one

https://github.com/megvii-research/megfile/issues/126

Swap the parameter order of fs_save_as, fs_symlink, s3_save_as, s3_symlink, smart_save_as and smart_symlink so that path comes first.
Remove the unused timeout parameter from get_http_session.

opened by LoveEatCandy 2

ValueError: unacceptable mode: 'w+b'

from megfile import smart_open
with smart_open('/tmp/test-open-w-p', 'w+b') as f:
    f.write(b"test")
# runs normally

with smart_open('s3://yl-share/tmp/tmp/test-open-w-p', 'w+b') as f:
    f.write(b"test")
# ValueError: unacceptable mode: 'w+b'

bug

opened by DIYer22 2

smart_open is not compatible with open() on pipes

After setting builtins.open = smart_open, I found that smart_open does not support pipes. A minimal reproduction:

import os, sys
from megfile import smart_open as open

print("The child will write text to a pipe and ")
print("the parent will read the text written by child...")

# file descriptors r and w, for reading and writing
r, w = os.pipe()

processid = os.fork()
if processid:
    # parent process
    # close the write descriptor w
    os.close(w)
    r = open(r)
    print("Parent reading")
    text = r.read()
    print("text =", text)
    # sys.exit(0)
else:
    # child process
    os.close(r)
    w = open(w, 'w')
    print("Child writing")
    w.write("Text written by child...")
    w.close()
    print("Child closing")
    sys.exit(0)

Traceback (most recent call last):

  File "/home/dl/megvii/project/ai_asrs/jinyu_data_code/analysis_domain_gap.py", line 26, in <module>
    r = open(r)

  File "/home/dl/mygit/megfile/megfile/smart.py", line 436, in smart_open
    return SmartPath(path).open(mode, **options)

  File "/home/dl/mygit/megfile/megfile/smart_path.py", line 37, in __init__
    pathlike = self._create_pathlike(path)

  File "/home/dl/mygit/megfile/megfile/smart_path.py", line 64, in _create_pathlike
    protocol, path_without_protocol = cls._extract_protocol(path)

  File "/home/dl/mygit/megfile/megfile/smart_path.py", line 59, in _extract_protocol
    raise ProtocolNotFoundError('protocol not found: %r' % path)

ProtocolNotFoundError: protocol not found: 67

opened by DIYer22 2
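
One possible workaround, sketched under the assumption that only integer file descriptors need special handling: wrap the opener so that descriptors go to the standard open and everything else goes to the path-based opener. The names make_fd_aware_open and path_opener are hypothetical; in practice path_opener would be smart_open. The sketch calls io.open rather than builtins.open so that it keeps working after builtins.open has been reassigned.

```python
import io
import os


def make_fd_aware_open(path_opener):
    """Return an open()-like function that handles file descriptors itself."""
    def fd_aware_open(file, mode="r", **options):
        if isinstance(file, int):
            # Integer descriptors (e.g. from os.pipe()) carry no protocol,
            # so they are handed to the standard open implementation
            return io.open(file, mode)
        return path_opener(file, mode, **options)
    return fd_aware_open


# Demonstration, with io.open standing in for smart_open
open_compat = make_fd_aware_open(io.open)
r, w = os.pipe()
with open_compat(w, "w") as writer:
    writer.write("Text written through a pipe")
with open_compat(r) as reader:
    print(reader.read())
```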

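Behavior of CLI cp

Currently, megfile cp src dst copies src to dst. If dst is a directory, it fails with an "is a directory" error, which differs from cp. The expected change, to match the behavior of cp:

If the destination is a directory, copy the file into it under its basename

Add -T / --no-target-directory to keep the current behavior

mv appears to behave the same way

opened by bbtfr 2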
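
The rule proposed in this issue can be sketched for local paths as follows. resolve_copy_target is a hypothetical helper, not megfile's CLI code, and it relies on os.path only; a real implementation would need each backend's own directory check.

```python
import os


def resolve_copy_target(src, dst, no_target_directory=False):
    """Return the path that a file copy from src to dst should write to."""
    if not no_target_directory and os.path.isdir(dst):
        # cp semantics: copy into the directory under the source's basename
        return os.path.join(dst, os.path.basename(src.rstrip("/")))
    # With -T / --no-target-directory, dst is always the target itself
    return dst
```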
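
megfile refactoring and optimization

[x] s3, fs, http, stdio: move the functions into classes, and auto-generate s3.py, fs.py, http.py and stdio.py

[ ] Auto-generate smart_path.py and smart.py; once smart knows which protocol is in use, raise an error when the arguments are invalid

[ ] Optimize s3 requests: add methods that take a cache parameter, so that the cache can be reused within a single method

optimization

opened by LoveEatCandy 1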

Owner

MEGVII Research

Power Human with AI. Continuous innovation expands the boundaries of cognition; extraordinary technology creates product value.

GitHub http://megvii-research.github.io/megfile
