Semantic Segmentation using UNET
This is the implementation of UNET on Carvana Image Masking Kaggle Challenge
About the Dataset
This dataset contains a large number of car images (as .jpg files). Each car has exactly 16 images, each one taken at different angles.
For the training set, you are provided a .gif file that contains the manually cutout mask for each image. Link to download the dataset: Here
UNET Architecture
The UNET CNN architecture may be divided into the Encoder, Bottleneck and Decoder blocks, followed by a final segmentation output layer.
- Encoder: There are 4 Encoder blocks, each consisting of a convolutional block followed by a Spatial Max Pooling layer.
- Bottleneck: The Bottleneck consists of a single convolutional block.
- Decoder: There are 4 Decoder blocks, each consisting of a deconvolution operation, followed by a convolutional block, along with skip connections.
Note: The convolutional block consists of 2 conv2d operations each followed by a BatchNorm2d, finally followed by a ReLU activation.
Implementation Details
- Image preprocessing included augmentations like HorizontalFlip, VerticalFlip, Rotate.
- Dataloader object was created for both training and test data
- Training process was carried out for 10 epochs, using the Adam Optimizer with a Learning Rate 1e-4.
- Validation was carried out using Dice Loss and Intersection over Union Loss.
Installation and Quick Start
To use the repo and run inferences, please follow the guidelines below
-
Cloning the Repository:
$ git clone https://github.com/anushka17agarwal/Segmentation_UNET
-
Entering the directory:
$ cd unet/
-
Setting up the Python Environment with dependencies:
$ pip install -r requirements.txt
-
Running the file for inference:
$ python3 train.py