Galaxy images labelled by morphology (shape). Aimed at ML development and teaching

Mike Walmsley

Last update: Nov 28, 2022

Overview

GalaxyMNIST

Galaxy images labelled by morphology (shape). Aimed at ML debugging and teaching.

Contains 10,000 images of galaxies (3x64x64), confidently labelled by Galaxy Zoo volunteers as belonging to one of four morphology classes.

Installation

git clone https://github.com/mwalmsley/galaxy_mnist
pip install -e galaxy_mnist

The only dependencies are pandas, scikit-learn, and h5py (for .hdf5 support). (py)torch is required but not specified as a dependency, because you likely already have it and may require a very specific version (e.g. from conda, AWS-optimised, etc).

Use

Use it just as you would MNIST:

from galaxy_mnist import GalaxyMNIST

dataset = GalaxyMNIST(
    root='/some/download/folder',
    download=True,
    train=True  # by default, or set False for test set
)

Access the images and labels - in a fixed "canonical" 80/20 train/test division - like so:

images, labels = dataset.data, dataset.targets

You can also divide the data according to your own preferences with load_custom_data:

(custom_train_images, custom_train_labels), (custom_test_images, custom_test_labels) = dataset.load_custom_data(test_size=0.8, stratify=True)

See load_in_pytorch.py for a working example.

Dataset Details

GalaxyMNIST has four classes: smooth and round, smooth and cigar-shaped, edge-on-disk, and unbarred spiral (you can retrieve this as a list with GalaxyMNIST.classes).

The galaxies are selected from Galaxy Zoo DECaLS Campaign A (GZD-A), which classified images taken by DECaLS and released in DR1 and DR2. The images are as shown to volunteers on Galaxy Zoo, except for a 75% crop followed by a resize to 64x64 pixels.
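As a rough illustration of the crop-then-resize step, the sketch below works out the crop box and the resulting downscale factor. It assumes a centred crop that keeps 75% of each side, and a 424x424 source image (the size a user reports in the comments on this page); neither assumption is confirmed by the text above. center_crop_box is a hypothetical helper, not part of galaxy_mnist.

```python
def center_crop_box(width, height, fraction):
    """Return (left, top, right, bottom) for a centred crop.

    `fraction` is the share of each side that is kept (assumed meaning of
    "75% crop"; the actual preprocessing may differ).
    """
    crop_w = round(width * fraction)
    crop_h = round(height * fraction)
    left = (width - crop_w) // 2
    top = (height - crop_h) // 2
    return left, top, left + crop_w, top + crop_h


SOURCE_SIZE = 424     # assumed original side length in pixels
CROP_FRACTION = 0.75  # "75% crop"
OUTPUT_SIZE = 64      # final side length in pixels

box = center_crop_box(SOURCE_SIZE, SOURCE_SIZE, CROP_FRACTION)
cropped_size = box[2] - box[0]
scale = cropped_size / OUTPUT_SIZE  # source pixels per output pixel

print(box)           # (53, 53, 371, 371)
print(cropped_size)  # 318
print(scale)         # 4.96875
```

Under these assumptions each output pixel covers roughly a 5x5 patch of the cropped source image.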

At least 17 people must have been asked the necessary questions, and at least half of them must have answered with the given class. The class labels are therefore much more confident than from, for example, simply labelling with the most common answer to some question.
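The two thresholds above can be sketched as a simple check. This is a simplified illustration for a single question, not the selection code from the reproduce folder; the function name and arguments are hypothetical.

```python
def is_confident(votes_for_class, total_asked, min_asked=17, min_fraction=0.5):
    """Return True if a galaxy passes both labelling thresholds.

    votes_for_class: volunteers who answered with the given class
    total_asked:     volunteers who were asked the necessary question
    """
    if total_asked < min_asked:
        # Too few volunteers were asked, however strongly they agree.
        return False
    return votes_for_class / total_asked >= min_fraction


print(is_confident(12, 20))  # True: 20 asked, 60% agree
print(is_confident(9, 20))   # False: only 45% agree
print(is_confident(10, 12))  # False: fewer than 17 asked
```

Simply labelling with the most common answer would accept the second case whenever no other answer got more than 9 votes, which is why the thresholded labels are more confident.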

The classes are balanced exactly equally across the whole dataset (2500 galaxies per class), but only approximately equally (by random sampling) in the canonical train/test split. For a split with exactly equal classes on both sides, use load_custom_data with stratify=True.
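To show why stratifying gives exactly equal classes on both sides, here is a minimal pure-Python sketch that splits each class separately. It uses synthetic labels (four classes of 2500) rather than the real dataset, and stratified_split is a hypothetical helper; load_custom_data may well be implemented differently.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, test_fraction, seed=0):
    """Split indices into train/test, sampling each class separately."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train_idx, test_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Same test share for every class, so class balance is preserved.
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return train_idx, test_idx


# Synthetic stand-in for the labels: 4 classes x 2500 galaxies.
labels = [c for c in range(4) for _ in range(2500)]

train_idx, test_idx = stratified_split(labels, test_fraction=0.2)
train_counts = Counter(labels[i] for i in train_idx)
test_counts = Counter(labels[i] for i in test_idx)

print(sorted(train_counts.items()))  # [(0, 2000), (1, 2000), (2, 2000), (3, 2000)]
print(sorted(test_counts.items()))   # [(0, 500), (1, 500), (2, 500), (3, 500)]
```

A plain random split of all 10,000 indices would instead give per-class counts that fluctuate around these values, which is the "approximately equal" behaviour described for the canonical split.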

You can see the exact choices made to select the galaxies and labels under the reproduce folder. This includes the notebook exploring and selecting choices for pruning the decision tree, and the script for saving the final dataset(s).

Citations and Further Reading

If you use this dataset, please cite Galaxy Zoo DECaLS, the data release paper from which the labels are drawn. Please also acknowledge the DECaLS survey (see the linked paper for an example).

You can find the original volunteer votes (and images) on Zenodo here.

Comments

Want To Reproduce The Dataset

Hello, I need a dataset with higher resolution, and I've found that the original size of the images was 3x424x424. I've read your "reproduce" folder and tried to modify the lines that resize the image. However, I failed because I don't have your processed parquet files. In "create_dataset.py", I found that I am missing "gz_decals_volunteers_1_and_2_internal.parquet" and "latest_labels.parquet" (see the pictures below), which I couldn't find on the website that the download link in your "README.md" points to. Could you please provide me with these two parquet files so that I don't need to process all the raw datasets again? Thanks a lot! yours, spdc-elm

opened by spdc-elm 8

AttributeError: 'GalaxyMNIST' object has no attribute 'data'

I ran the Python file "load_in_pytorch.py" with the following settings. (I have already finished downloading the file.)

However, there was a bug. I've checked the file galaxy_mnist.py and have no idea why this would happen. Can you help me? Thanks!

Environment: Ubuntu LTS 20.04, Python 3.7.11, PyTorch 1.11.0

opened by spdc-elm 3

Owner

Mike Walmsley
