DeepFashion2 Dataset

DeepFashion2 is a comprehensive fashion dataset. It contains 491K diverse images of 13 popular clothing categories from both commercial shopping stores and consumers. In total it has 801K clothing items, where each item in an image is labeled with scale, occlusion, zoom-in, viewpoint, category, style, bounding box, dense landmarks and per-pixel mask. There are also 873K Commercial-Consumer clothes pairs.
The dataset is split into a training set (391K images), a validation set (34k images), and a test set (67k images).
Examples of DeepFashion2 are shown in Figure 1.

Figure 1: Examples of DeepFashion2.

From (1) to (4), each row represents clothes images with different variations. In each row, we partition the images into two groups: the left three columns represent clothes from commercial stores, while the right three columns are from customers. In each group, the three images indicate three levels of difficulty with respect to the corresponding variation. Furthermore, in each row, the items in these two groups of images are from the same clothing identity but from two different domains, that is, commercial and customer. The items of the same identity may have different styles such as color and printing. Each item is annotated with landmarks and masks.

Announcements

2020-2-6 We are holding DeepFashion2 challenges in CVPR 2020 Workshop including Clothes Landmark Estimation and Clothes Retrieval. Detailed information is available in Third Workshop on Computer Vision for Fashion, Art and Design.
2019-9-6 Baseline of released DeepFashion2 dataset is available.
2019-8-1 DeepFashion2 Challenge in 2019 ICCV Workshop ended.
2019-7-12 Due to damage to the CodaLab database, we have republished our competitions in DeepFashion2 Challenge. If you are a participant in the DeepFashion2 Challenge, please recreate an account and upload your results again in Landmark Estimation or Clothes Retrieval.
2019-7-1 Test images of DeepFashion2 are released in DeepFashion2 dataset. (Password for unzipping test files is the same as that for unzipping training and validation files.)
2019-5-28 Links to DeepFashion2 challenges in ICCV 2019 Workshop are released. Detailed information is available in Second Workshop on Computer Vision for Fashion, Art and Design.
2019-5-27 ICCV 2019 Workshop website is released: Second Workshop on Computer Vision for Fashion, Art and Design. Links to challenges will be released soon.

Download the Data

The DeepFashion2 dataset is available in DeepFashion2 dataset. You need to fill in the form to get the password for unzipping files. Please refer to Data Description below for detailed information about the dataset.

Data Organization

source: a string, where 'shop' indicates that the image is from commercial store while 'user' indicates that the image is taken by users.
pair_id: a number. Images from the same shop and their corresponding consumer-taken images have the same pair id.
- item 1
  - category_name: a string which indicates the category of the item.
  - category_id: a number which corresponds to the category name. In category_id, 1 represents short sleeve top, 2 represents long sleeve top, 3 represents short sleeve outwear, 4 represents long sleeve outwear, 5 represents vest, 6 represents sling, 7 represents shorts, 8 represents trousers, 9 represents skirt, 10 represents short sleeve dress, 11 represents long sleeve dress, 12 represents vest dress and 13 represents sling dress.
  - style: a number to distinguish between clothing items from images with the same pair id. Clothing items with different style numbers from images with the same pair id have different styles such as color, printing, and logo. In this way, a clothing item from a shop image and a clothing item from a user image form a positive commercial-consumer pair if they have the same style number greater than 0 and they are from images with the same pair id. (If you are confused by style, please refer to issue#10.)
  - bounding_box: [x1,y1,x2,y2], where x1 and y1 represent the upper left point coordinate of the bounding box, and x2 and y2 represent the lower right point coordinate of the bounding box. (width=x2-x1; height=y2-y1)
  - landmarks: [x1,y1,v1,...,xn,yn,vn], where v represents the visibility: v=2 visible; v=1 occlusion; v=0 not labeled. We have different definitions of landmarks for different categories. The orders of landmark annotations are listed in Figure 2.
  - segmentation: [[x1,y1,...,xn,yn],[ ]], where [x1,y1,...,xn,yn] represents a polygon and a single clothing item may contain more than one polygon.
  - scale: a number, where 1 represents small scale, 2 represents modest scale and 3 represents large scale.
  - occlusion: a number, where 1 represents slight occlusion (including no occlusion), 2 represents medium occlusion and 3 represents heavy occlusion.
  - zoom_in: a number, where 1 represents no zoom-in, 2 represents medium zoom-in and 3 represents large zoom-in.
  - viewpoint: a number, where 1 represents no wear, 2 represents frontal viewpoint and 3 represents side or back viewpoint.
- item 2
  ...
- item n

Please note that 'pair_id' and 'source' are image-level labels. All clothing items in an image share the same 'pair_id' and 'source'.
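
The layout above can be read with a few lines of Python. This is a minimal sketch, not part of the released tools: the helper names `iter_items`, `split_landmarks` and `box_size` are illustrative, and the sample annotation is hand-written with a shortened landmark list. It assumes items are stored under keys such as `item1`, `item2` next to the image-level `source` and `pair_id`.

```python
import json


def iter_items(anno):
    """Yield (key, item) for every clothing item in one annotation dict."""
    for key, value in anno.items():
        # Image-level labels ('source', 'pair_id') are not dicts; items are.
        if key.startswith("item") and isinstance(value, dict):
            yield key, value


def split_landmarks(flat):
    """Turn [x1, y1, v1, ..., xn, yn, vn] into a list of (x, y, v) triples."""
    if len(flat) % 3 != 0:
        raise ValueError("landmark list length must be a multiple of 3")
    return [tuple(flat[i:i + 3]) for i in range(0, len(flat), 3)]


def box_size(box):
    """Return (width, height) of a [x1, y1, x2, y2] bounding box."""
    x1, y1, x2, y2 = box
    return x2 - x1, y2 - y1


# Hand-written sample (landmark list shortened), used only to run the helpers.
example = json.loads("""
{
  "source": "shop",
  "pair_id": 1811,
  "item1": {
    "category_id": 9,
    "category_name": "skirt",
    "style": 1,
    "bounding_box": [227, 86, 543, 455],
    "landmarks": [237, 160, 1, 378, 153, 2, 461, 92, 1]
  }
}
""")

for key, item in iter_items(example):
    # v == 2 means visible, v == 1 occluded, v == 0 not labeled.
    visible = [p for p in split_landmarks(item["landmarks"]) if p[2] == 2]
    print(key, item["category_name"], box_size(item["bounding_box"]), len(visible))
```

Every item yielded for one file shares that file's `pair_id` and `source`, since both are image-level labels.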

The definitions of landmarks and skeletons of the 13 categories are shown below. The numbers in the figure represent the order of landmark annotations of each category in the annotation file. A total of 294 landmarks covering 13 categories are defined.

Figure 2: Definitions of landmarks and skeletons.

We do not provide data in pairs. In the training dataset, images are organized with continuous 'pair_id', including images from consumers and images from shops. (For example: 000001.jpg (pair_id: 1; from consumer), 000002.jpg (pair_id: 1; from shop), 000003.jpg (pair_id: 2; from consumer), 000004.jpg (pair_id: 2; from consumer), 000005.jpg (pair_id: 2; from consumer), 000006.jpg (pair_id: 2; from consumer), 000007.jpg (pair_id: 2; from shop), 000008.jpg (pair_id: 2; from shop)...) A clothing item from a shop image and a clothing item from a consumer image form a positive commercial-consumer pair if they have the same style number, which must be greater than 0, and they are from images with the same pair id; otherwise they form a negative pair. In this way, you can construct positive and negative training pairs at the instance level.

As shown in the figure below, the first three images are from consumers and the last two images are from shops. These five images have the same 'pair_id'. Clothing items in orange bounding boxes have the same 'style': 1. Clothing items in green bounding boxes have the same 'style': 2. The 'style' of other clothing items whose bounding boxes are not drawn in the figure is 0, and they cannot form positive commercial-consumer pairs. One positive commercial-consumer pair is the annotated short sleeve top in the first image and the annotated short sleeve top in the last image. Our dataset makes it possible to construct instance-level pairs in a flexible way.
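
The pairing rule can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' code: `item_records`, `is_positive_pair` and `positive_pairs` are hypothetical helpers, and the three toy annotations are invented to exercise the rule (same 'pair_id', same 'style', 'style' greater than 0, one item from 'user' and one from 'shop').

```python
from itertools import product


def item_records(image_name, anno):
    """Flatten one annotation dict into per-item records."""
    records = []
    for key, value in anno.items():
        if key.startswith("item") and isinstance(value, dict):
            records.append({
                "image": image_name,
                "item": key,
                "source": anno["source"],    # image-level label
                "pair_id": anno["pair_id"],  # image-level label
                "style": value["style"],     # item-level label
            })
    return records


def is_positive_pair(user_item, shop_item):
    """Same pair_id, same style, and style > 0, across the two domains."""
    return (
        user_item["source"] == "user"
        and shop_item["source"] == "shop"
        and user_item["pair_id"] == shop_item["pair_id"]
        and user_item["style"] == shop_item["style"]
        and user_item["style"] > 0
    )


def positive_pairs(records):
    """Return every (user item, shop item) combination that is positive."""
    users = [r for r in records if r["source"] == "user"]
    shops = [r for r in records if r["source"] == "shop"]
    return [(u, s) for u, s in product(users, shops) if is_positive_pair(u, s)]


# Toy annotations (invented for illustration, only the needed fields).
annos = {
    "000001.jpg": {"source": "user", "pair_id": 1,
                   "item1": {"style": 1}, "item2": {"style": 0}},
    "000002.jpg": {"source": "shop", "pair_id": 1, "item1": {"style": 1}},
    "000003.jpg": {"source": "shop", "pair_id": 2, "item1": {"style": 1}},
}
records = [r for name, a in annos.items() for r in item_records(name, a)]
for u, s in positive_pairs(records):
    print(u["image"], u["item"], "<->", s["image"], s["item"])
```

Any user/shop combination that fails the check can be treated as a negative pair. Comparing all combinations is quadratic, so for the full training set it would be reasonable to group records by 'pair_id' first.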

Data Description

Training images: train/image
Training annotations: train/annos

Validation images: validation/image
Validation annotations: validation/annos

Test images: test/image

Each image in a separate image set has a unique six-digit number such as 000001.jpg. A corresponding annotation file in json format is provided in the annotation set, such as 000001.json. We provide code to generate COCO-type annotations from our dataset in deepfashion2_to_coco.py. Please note that during evaluation, image_id is the number in the image name. (For example, the image_id of image 000001.jpg is 1.) Json files in json_for_validation and json_for_test are generated based on the above rule using deepfashion2_to_coco.py. In this way, you can generate ground truth json files for evaluation of the clothes detection task and the clothes segmentation task, which are not listed in DeepFashion2 Challenge.
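
The naming rule can be illustrated with a short sketch. It is not the code in deepfashion2_to_coco.py; the function names are hypothetical. The box conversion assumes the standard COCO convention, in which a box is stored as [x, y, width, height] rather than as two corner points.

```python
from pathlib import Path


def image_id_from_name(file_name):
    """'000001.jpg' -> 1: the image_id is the number in the file name."""
    return int(Path(file_name).stem)


def annotation_name(file_name):
    """'000001.jpg' -> '000001.json' (same six-digit stem)."""
    return Path(file_name).with_suffix(".json").name


def to_coco_bbox(box):
    """Convert [x1, y1, x2, y2] to the COCO convention [x, y, width, height]."""
    x1, y1, x2, y2 = box
    return [x1, y1, x2 - x1, y2 - y1]


print(image_id_from_name("000001.jpg"))
print(annotation_name("000001.jpg"))
print(to_coco_bbox([227, 86, 543, 455]))
```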

In the validation set, we provide image-level information in keypoints_val_information.json, retrieval_val_consumer_information.json and retrieval_val_shop_information.json. (In the validation set, the first 10844 images are from consumers and the last 20681 images are from shops.) For the clothes detection task and the clothes segmentation task, which are not listed in DeepFashion2 Challenge, keypoints_val_information.json can also be used.

We provide keypoints_val_vis.json, keypoints_val_vis_and_occ.json, val_query.json and val_gallery.json for evaluation of validation set. You can get validation score locally using Evaluation Code and above json files. You can also submit your results to evaluation server in our DeepFashion2 Challenge.

In the test set, we provide image-level information in keypoints_test_information.json, retrieval_test_consumer_information.json and retrieval_test_shop_information.json. (In the test set, the first 20681 images are from consumers and the last 41948 images are from shops.) You need to submit your results to the evaluation server in our DeepFashion2 Challenge.

Dataset Statistics

Table 1 shows the statistics of images and annotations in DeepFashion2. (For statistics of released images and annotations, please refer to DeepFashion2 Challenge.)

Table 1: Statistics of DeepFashion2.

	Train	Validation	Test	Overall
images	390,884	33,669	67,342	491,895
bboxes	636,624	54,910	109,198	800,732
landmarks	636,624	54,910	109,198	800,732
masks	636,624	54,910	109,198	800,732
pairs	685,584	query: 12,550 gallery: 37,183	query: 24,402 gallery: 75,347	873,234

Figure 3 shows the statistics of different variations and the numbers of items of the 13 categories in DeepFashion2.

Figure 3: Statistics of DeepFashion2.

Benchmarks

Clothes Detection

This task detects clothes in an image by predicting a bounding box and a category label for each detected clothing item. The evaluation metrics are the bounding box average precisions ${AP}_{box}$, ${AP}_{box}^{IoU=0.50}$, and ${AP}_{box}^{IoU=0.75}$.
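
The IoU thresholds in these metrics refer to the overlap between a predicted box and a ground truth box. A minimal sketch of that overlap for boxes in the [x1,y1,x2,y2] format used by the annotations is given below. `box_iou` is an illustrative helper, not the evaluation code, and it treats coordinates as continuous values.

```python
def box_iou(a, b):
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    # Corners of the intersection rectangle.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


pred = [0, 0, 10, 10]
truth = [5, 0, 15, 10]
iou = box_iou(pred, truth)
# A prediction counts as a match at a given threshold only if IoU >= threshold.
print(round(iou, 3), iou >= 0.50, iou >= 0.75)
```

Assuming COCO-style evaluation, the metric without a superscript averages AP over several IoU thresholds, while the other two fix the threshold at 0.50 and 0.75.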

Table 2: Clothes detection trained with released DeepFashion2 Dataset evaluated on validation set.

AP	AP50	AP75
0.638	0.789	0.745

Table 3: Clothes detection on different validation subsets, including scale, occlusion, zoom-in, and viewpoint.

	Scale			Occlusion			Zoom-in			Viewpoint			Overall
	small	moderate	large	slight	medium	heavy	no	medium	large	no wear	frontal	side or back
AP	0.604	0.700	0.660	0.712	0.654	0.372	0.695	0.629	0.466	0.624	0.681	0.641	0.667
AP50	0.780	0.851	0.768	0.844	0.810	0.531	0.848	0.755	0.563	0.713	0.832	0.796	0.814
AP75	0.717	0.809	0.744	0.812	0.768	0.433	0.806	0.718	0.525	0.688	0.791	0.744	0.773

Landmark and Pose Estimation

This task aims to predict landmarks for each detected clothing item in each image. Similarly, we employ the evaluation metrics used by COCO for human pose estimation, calculating the average precision for keypoints ${AP}_{pt}$, ${AP}_{pt}^{OKS=0.50}$, and ${AP}_{pt}^{OKS=0.75}$, where OKS indicates the object landmark similarity.
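
For readers unfamiliar with OKS, the sketch below follows the COCO definition: each labeled landmark contributes exp(-d^2 / (2 * s^2 * k^2)), where d is the distance between prediction and ground truth, s^2 is the object area and k is a per-landmark constant, and the contributions are averaged over labeled landmarks. The function `oks` and the constants in the example are assumptions for illustration; the per-landmark constants used for DeepFashion2 are not given here.

```python
import math


def oks(pred, gt, area, k):
    """Object keypoint similarity in the COCO style.

    pred: list of (x, y) predicted landmarks
    gt:   list of (x, y, v) ground truth landmarks, v > 0 means labeled
    area: object scale squared (for example the ground truth item area)
    k:    per-landmark falloff constants (assumed values)
    """
    total, labeled = 0.0, 0
    for (px, py), (gx, gy, v), ki in zip(pred, gt, k):
        if v <= 0:
            continue  # unlabeled landmarks do not contribute
        d2 = (px - gx) ** 2 + (py - gy) ** 2
        total += math.exp(-d2 / (2.0 * area * ki ** 2))
        labeled += 1
    return total / labeled if labeled else 0.0


gt = [(10, 10, 2), (20, 20, 0)]   # second landmark is not labeled
pred = [(13, 14), (99, 99)]       # its prediction is therefore ignored
print(oks(pred, gt, area=100.0, k=[0.5, 0.5]))
```

To score visible landmarks only, one option is to keep just the ground truth entries with v == 2 before calling the function.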

Table 4: Landmark estimation trained with released DeepFashion2 Dataset evaluated on validation set.

	AP	AP50	AP75
vis	0.605	0.790	0.684
vis && hide	0.529	0.775	0.596

Table 5: Landmark estimation on different validation subsets, including scale, occlusion, zoom-in, and viewpoint. Each cell shows the result of evaluation on visible landmarks only, followed by the result of evaluation on both visible and occluded landmarks.

	Scale			Occlusion			Zoom-in			Viewpoint			Overall
	small	moderate	large	slight	medium	heavy	no	medium	large	no wear	frontal	side or back
AP	0.587 / 0.497	0.687 / 0.607	0.599 / 0.555	0.669 / 0.643	0.631 / 0.530	0.398 / 0.248	0.688 / 0.616	0.559 / 0.489	0.375 / 0.319	0.527 / 0.510	0.677 / 0.596	0.536 / 0.456	0.641 / 0.563
AP50	0.780 / 0.764	0.854 / 0.839	0.782 / 0.774	0.851 / 0.847	0.813 / 0.799	0.534 / 0.479	0.855 / 0.848	0.757 / 0.744	0.571 / 0.549	0.724 / 0.716	0.846 / 0.832	0.748 / 0.727	0.820 / 0.805
AP75	0.671 / 0.551	0.779 / 0.703	0.678 / 0.625	0.760 / 0.739	0.718 / 0.600	0.440 / 0.236	0.786 / 0.714	0.633 / 0.537	0.390 / 0.307	0.571 / 0.550	0.771 / 0.684	0.610 / 0.506	0.728 / 0.641

Figure 4 shows the results of landmark and pose estimation.

Figure 4: Results of landmark and pose estimation.

Clothes Segmentation

This task assigns a category label (including a background label) to each pixel in an item. The evaluation metric is the average precision computed over masks, including ${AP}_{mask}$, ${AP}_{mask}^{IoU=0.50}$, and ${AP}_{mask}^{IoU=0.75}$.

Table 6: Clothes segmentation trained with released DeepFashion2 Dataset evaluated on validation set.

AP	AP50	AP75
0.640	0.797	0.754

Table 7: Clothes Segmentation on different validation subsets, including scale, occlusion, zoom-in, and viewpoint.

	Scale			Occlusion			Zoom-in			Viewpoint			Overall
	small	moderate	large	slight	medium	heavy	no	medium	large	no wear	frontal	side or back
AP	0.634	0.703	0.666	0.720	0.656	0.381	0.701	0.637	0.478	0.664	0.689	0.635	0.674
AP50	0.811	0.865	0.798	0.863	0.824	0.543	0.861	0.791	0.591	0.757	0.849	0.811	0.834
AP75	0.752	0.826	0.773	0.836	0.780	0.444	0.823	0.751	0.559	0.737	0.810	0.755	0.793

Figure 5 shows the results of clothes segmentation.

Figure 5: Results of clothes segmentation.

Consumer-to-Shop Clothes Retrieval

Given a detected item from a consumer-taken photo, this task aims to search the commercial images in the gallery for the items that correspond to this detected item. In this task, top-k retrieval accuracy is employed as the evaluation metric. We emphasize the retrieval performance while still considering the influence of the detector. If a clothing item fails to be detected, this query item is counted as missed.
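
The metric described above can be sketched as follows. This is an illustration under assumptions, not the official evaluation code: `top_k_accuracy` and the query dictionary layout are hypothetical, and a query counts as a hit when at least one matching gallery item appears among its first k results.

```python
def top_k_accuracy(queries, k):
    """Fraction of query items with a matching gallery item in the top k.

    Each query is a dict with:
      'detected':  False if the detector missed the item (counted as a miss)
      'ranked':    gallery ids sorted from most to least similar
      'positives': set of gallery ids that match the query item
    """
    if not queries:
        return 0.0
    hits = 0
    for q in queries:
        if not q["detected"]:
            continue  # missed detection: stays in the denominator, no hit
        if any(g in q["positives"] for g in q["ranked"][:k]):
            hits += 1
    return hits / len(queries)


# Toy queries, invented for illustration.
queries = [
    {"detected": True, "ranked": ["a", "b", "c"], "positives": {"b"}},
    {"detected": True, "ranked": ["a", "b", "c"], "positives": {"z"}},
    {"detected": False, "ranked": [], "positives": {"a"}},
]
for k in (1, 2):
    print(k, top_k_accuracy(queries, k))
```

Keeping undetected queries in the denominator is what lets the score reflect detector quality as well as retrieval quality.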

Table 8: Consumer-to-Shop Clothes Retrieval trained with released DeepFashion2 Dataset using detected box evaluated on validation set.

	Top-1	Top-5	Top-10	Top-15	Top-20
class	0.079	0.198	0.273	0.329	0.366
keypoints	0.182	0.326	0.416	0.469	0.510
segmentation	0.135	0.271	0.350	0.407	0.447
class+keys	0.192	0.345	0.435	0.488	0.524
class+seg	0.152	0.295	0.379	0.435	0.477

Table 9: Consumer-to-Shop Clothes Retrieval on different subsets of some validation consumer-taken images. Each query item in these images has over 5 identical clothing items in validation commercial images. Each cell shows the result of evaluation on the ground truth box, followed by the result of evaluation on the detected box. The evaluation metric is top-20 accuracy.

	Scale			Occlusion			Zoom-in			Viewpoint			Overall
	small	moderate	large	slight	medium	heavy	no	medium	large	no wear	frontal	side or back	top-1	top-10	top-20
class	0.520 / 0.485	0.630 / 0.537	0.540 / 0.502	0.572 / 0.527	0.563 / 0.508	0.558 / 0.383	0.618 / 0.553	0.547 / 0.496	0.444 / 0.405	0.546 / 0.499	0.584 / 0.523	0.533 / 0.487	0.102 / 0.091	0.361 / 0.312	0.470 / 0.415
pose	0.721 / 0.637	0.778 / 0.702	0.735 / 0.691	0.756 / 0.710	0.737 / 0.670	0.728 / 0.580	0.775 / 0.710	0.751 / 0.701	0.621 / 0.560	0.731 / 0.690	0.763 / 0.700	0.711 / 0.645	0.264 / 0.243	0.562 / 0.497	0.654 / 0.588
mask	0.624 / 0.552	0.714 / 0.657	0.646 / 0.608	0.675 / 0.639	0.651 / 0.593	0.632 / 0.555	0.711 / 0.654	0.655 / 0.613	0.526 / 0.495	0.644 / 0.615	0.682 / 0.630	0.637 / 0.565	0.193 / 0.186	0.474 / 0.422	0.571 / 0.520
pose+class	0.752 / 0.691	0.786 / 0.730	0.733 / 0.705	0.754 / 0.725	0.750 / 0.706	0.728 / 0.605	0.789 / 0.746	0.750 / 0.709	0.620 / 0.582	0.726 / 0.699	0.771 / 0.723	0.719 / 0.684	0.268 / 0.244	0.574 / 0.522	0.665 / 0.617
mask+class	0.656 / 0.610	0.728 / 0.666	0.687 / 0.649	0.714 / 0.676	0.676 / 0.623	0.654 / 0.549	0.725 / 0.674	0.702 / 0.655	0.565 / 0.536	0.684 / 0.648	0.712 / 0.661	0.658 / 0.604	0.212 / 0.208	0.496 / 0.451	0.595 / 0.542

Figure 6 shows queries with top-5 retrieved clothing items. The first and the seventh columns are the images from customers with bounding boxes predicted by the detection module, and the second to the sixth columns and the eighth to the twelfth columns show the retrieval results from the store.

Figure 6: Results of clothes retrieval.

Citation

If you use the DeepFashion2 dataset in your work, please cite it as:

@article{DeepFashion2,
  author = {Yuying Ge and Ruimao Zhang and Lingyun Wu and Xiaogang Wang and Xiaoou Tang and Ping Luo},
  title={A Versatile Benchmark for Detection, Pose Estimation, Segmentation and Re-Identification of Clothing Images},
  journal={CVPR},
  year={2019}
}

I'm trying to work with the dataset. Let's say I have this annotation file:

{
  'item2': {
    'segmentation': [
      [1, 2, 1, 17, 94, 58, 128, 2, 163, 2, 180, 86, 203, 173, 370, 149, 490, 81, 463, 1, 1, 2],
      [1, 2, 1, 17, 94, 58, 128, 2, 1, 2]
    ],
    'scale': 2,
    'viewpoint': 2,
    'zoom_in': 3,
    'landmarks': [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 94, 58, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 180, 86, 2, 203, 173, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
    'style': 0,
    'bounding_box': [0, 0, 495, 179],
    'category_id': 1,
    'occlusion': 1,
    'category_name': 'short sleeve top'
  },
  'source': 'shop',
  'pair_id': 1811,
  'item1': {
    'segmentation': [
      [237, 160, 378, 153, 461, 92, 519, 214, 535, 348, 440, 428, 292, 420, 247, 309, 237, 160]
    ],
    'scale': 2,
    'viewpoint': 2,
    'zoom_in': 1,
    'landmarks': [237, 160, 1, 378, 153, 2, 461, 92, 1, 247, 309, 2, 292, 420, 2, 440, 428, 2, 535, 348, 2, 519, 214, 2],
    'style': 1,
    'bounding_box': [227, 86, 543, 455],
    'category_id': 9,
    'occlusion': 2,
    'category_name': 'skirt'
  }
}

Which pair_id belongs to which item? In many cases I see 5 or 6 items in an image, and it's really confusing to match the correct pair_id to the correct item.

Hope you can clarify this for me. Thanks

How to visualize with result json file?

Hi. Thanks for sharing such nice code.

I'm a student who studies CV, and I found this repository on paperswithcode.

I want to visualize the results, but I have no idea how.

I understand that the result json files contain the bbox, segm, etc. information.

But I don't know how to visualize them.

If you've already uploaded visualization code, then I'm sorry that I haven't read it in detail.

opened by seokhyeonSong 15
Which tool did you use to annotate the image?

Thanks for kindly sharing the dataset. Could you please tell me which tool you used to annotate the dataset? I want to add some custom data in the same format.

opened by asurada404 3
Different number of keypoints

Hi! Keypoint’s location is modeled as a one-hot mask. And it is done with Conv or ConvTranspose layer, where output_channel_size = num_keypoints. I see here: 1 2 But number of keypoints in DeepFashion2 is different for different classes (for example, in COCO dataset num_keypoints=17). How is this problem solved @geyuying ?

opened by amirassov 3

What is the pair_id for each item?

opened by lamhoangtung 3

Image number error?

The numbers of unzipped images in my train/validation/test sets are:

(191961, 32153, 62629)

which is smaller than described in the paper. Is that my fault? I wonder why.

opened by Isaver23 2
Discrepancy in the number of images in the zip vs. description

Just downloaded and unzipped the dataset zip files. The training set has 191961 images and the validation set has 32153. The description in the README.md says: The dataset is split into a training set (391K images), a validation set (34k images), and a test set (67k images).

Is there going to be a second release with the remainder of the images?

opened by twosatnams 2
Commercial Usage To Build Models

Hi,

In the Google Form received via email, it is mentioned that the dataset is available only for non-commercial research purposes. Is there any commercial license available for the dataset, if we need to use the data to create some models for a commercial product?

opened by drreddy 2
which features are fed into the matching network?

Hey guys, fantastic work.

I have a question about the paper. You feed the output of ROIAlign into the matching network. I'm having trouble understanding figure 4. How is the input for the matching network of a single image an NxNx256 tensor? N is the number of garment classes, correct? The output of ROIAlign is either 7x7x256 or 14x14x256 (depending on if you take the bbox stream or mask stream). How are you getting NxN?

Thanks!

opened by shaayaansayed 2
Interested in using this dataset for a larger project

My team is working on DARPA's Learning with Less Labeling program. We are responsible for collecting and making datasets available to about 15 teams from multiple organizations that will use them for development work, and would like to include the DeepFashion2 dataset. Unfortunately, your license currently restricts us from sharing this dataset outside of our organization. If you could contact me to discuss how we can share this dataset, I would really appreciate it. My email is on my github profile at https://github.com/wmburke.

opened by wmburke 1
How to visualize? (final)
Sorry for opening issues so frequently.

I had a similar issue before at https://github.com/switchablenorms/DeepFashion2/issues/20

and I thought that I had to train and visualize with Detectron.

I finally trained and visualized with Detectron. I got accurate bboxes, but the categories are wrong:

skirt -> boat, shirt -> person, and so on.

I realized that using Detectron this way was a mistake,

so I read your repository closely.

I think the steps are:

make a COCO-type json with tools/deepfashion2coco.py

train with main.py and make a color-splashed image (I think this is optional and is used for segmentation, right?)

visualize with lib/visualize.py, but I cannot find how to use visualize.py:

it has no arguments, init, or main.

To visualize with Detectron, it needs pkl-type weights and yaml-type configs,

but match_rcnn produces h5-type weights and its own configs.

Would you please explain how to visualize keypoints, segmentation, and bboxes? For example, which code can run visualize.py, or which code can convert the h5 weights and config into pkl and yaml?
opened by seokhyeonSong 1
Codalab competition website roll back

Hi,

Have you noticed that the competition website is broken and the new website is based on a backup from several months ago? Neither the DeepFashion2 challenge nor our test results are on the new website.

I believe it is necessary to recreate your challenge and maybe adjust the deadline.

You can refer to this issue https://github.com/codalab/codalab-competitions/issues/2636.

opened by Trueyellow 1
Invalid password for unzipping the dataset

I just filled in the form to get the password for unzipping the dataset, and it says that the password is incorrect. Any help is appreciated. Thank you.

opened by devashish-bhake 0
Deepfashion train or validation images were used in Deepfashion 2 test dataset?

I would like to know if there is any intersection between Deepfashion and Deepfashion 2.

To evaluate a model that was trained with the Deepfashion dataset using Deepfashion 2 test images, I need to be sure that no images from the Deepfashion 1 train and validation sets are equal to Deepfashion 2 test images.

The reason for using the Deepfashion 2 test dataset is its better quality and larger number of labels.

In the paper there is only this information:

“Raw data of DeepFashion2 are collected from two sources including DeepFashion [14] and online shopping websites. In particular, images of each consumer-to-shop pair in DeepFashion are included in DeepFashion2, while the other images are removed.” pag [3]

Thank you.

opened by Vinicius-ufsc 0
detect clothing color?

Is there any way to use this dataset to detect the color of clothing? I essentially want to run this through a dataset to find all the images containing "blue top" or "purple pants". It seems like this dataset might just tell you that different clothing of the same type has different "style", but not classify the style

style: a number to distinguish between clothing items from images with the same pair id. Clothing items with different style numbers from images with the same pair id have different styles such as color, printing, and logo. In this way, a clothing item from shop images and a clothing item from user image are positive commercial-consumer pair if they have the same style number greater than 0 and they are from images with the same pair id.(If you are confused with style, please refer to issue#10.)

opened by andykais 2
how to get bounding box annotations using deep fashion dataset?

Hi, I tried to train a YOLO model with the DeepFashion dataset, but the bounding box values show wrong annotations when I check with Roboflow. Does anybody know how to retrieve bounding box values from this dataset?

Thanks

opened by mohamednihal 0