Dept. of Robotics Science and Technology,
Chubu University


Team MC^2 : ARC2017 RGB-D Dataset

The Amazon Robotics Challenge (ARC) is a competition for automation of product distribution posed as a problem in which a robot must recognize and pick a specified item among items displayed on a storage shelf or in a tote. The dataset used by Team MC^2 for the ARC2017 is published on this page. The dataset consists of two types of data: RGB images and depth images acquired by Mitsubishi MELFA-3D Vision. For quantitative evaluation, a ground truth image labeled with a specified color and a text file including bounding box annotation are provided for each item. 3D models created by the Ortery Technologies, Inc. 3D MFP for each item are also available for the data augmentation.



Overview of Dataset

The 40 types of items used in the ARC2017 have various shapes and other attributes, including items packaged in boxes and vinyl, deformable items, etc.

Dataset structure

The dataset was created assuming that there are multiple items in the tote or one item in the tote. The dataset is divided into training data and test data assuming learning and reasoning of machine learning or deep learning.

* Test data: images with as many items in the tote as possible were shot twice (100 scenes × 2 shots = 200 images)
* Training data: images with the same scene as the test data (400 scenes × 2 shots = 800 images) and images with only 1 item (10 shots for each item = 410 images)

Folder structure of two types of scenes is shown below.
The folder is divided into test data and training data. Furthermore, the test data and training data are divided into bounding box data, RGB images, depth images, and labeling data.
For 3D model data, folders are divided for each item.


Data formats

The data formats are listed below.

RGB image
・File name “[test|train]/rgb/2017-XXX-XX.bmp”
・Size “1280×960 pixels”
Depth image
・File name “[test|train]/depth/2017-XXX-XX_d.bmp”
・Size “1280×960 pixels”
・Represents distance as values between 0 and 255 in a grayscale image
Bounding boxes data
・File name “[test|train]/boundingbox/2017-XXX-XX.txt”
・The item ID and the coordinates of the bounding box are written in the text file.
・The coordinates are described in the order of “[Box center X], [Box center Y], [Box width], [Box height]”. The bounding box data are also normalized from 0 to 1.


Label image (semantic segmentation ground truth)
・File name “[test|train]/segmentation/2017-XXX-XX_s.bmp”
・Size “1280×960 pixles”

A label image is created for each RGB image. For each item, a region corresponding to the item region is distinguished from the other regions by painting it with a specified color. The background region is painted black.
A number corresponding to each item of the labeling data is stored in the “ItemList.pdf” file, and the color code for each item is stored in the “SegmentationTeach” file. The color code is displayed in the table below, too.

Item ID R G B Color Item Image
BG 0 0 0
Item 1 85 0 0
Item 2 170 0 0
Item 3 255 0 0
Item 4 0 85 0
Item 5 85 85 0
Item 6 170 85 0
Item 7 255 85 0
Item 8 0 170 0
Item 9 85 170 0
Item ID R G B Color Item Image
Item 10 170 170 0
Item 11 255 170 0
Item 12 0 255 0
Item 13 85 255 0
Item 14 170 255 0
Item 15 255 255 0
Item 16 0 0 85
Item 17 85 0 85
Item 18 170 0 85
Item 19 255 0 85
Item ID R G B Color Item Image
Item 20 0 85 85
Item 21 85 85 85
Item 22 170 85 85
Item 23 255 85 85
Item 24 0 170 85
Item 25 85 170 85
Item 26 170 170 85
Item 27 255 170 85
Item 28 0 255 85
Item 29 85 255 85
Item ID R G B Color Item Image
Item 30 170 255 85
Item 31 255 255 85
Item 32 0 0 170
Item 33 85 0 170
Item 34 170 0 170
Item 35 255 0 170
Item 36 0 85 170
Item 37 85 85 170
Item 38 170 85 170
Item 39 255 85 170
Item 40 0 170 170

3D Models
・File name “3Dmodels/itemXX/itemXX.obj”
・Browsable by MeshLab, XCode, etc.
・Material data “itemXX.mtl” and texture data “itemXX_mapX.jpg” are attached.
The material data and texture data should be placed in the same directory as the obj file.

Evaluation method

For robot picking, estimation of the 6D pose of the item is ideal. However, this dataset includes non-rigid-body items and other items for which 6D pose estimation can be difficult.
We therefore used two kinds of evaluation methods: evaluation by overlap ratio (intersection over union (IoU)) of bounding box and evaluation by segmentation region.

  1. Evaluation of bounding box by IoU
    In the evaluation of the bounding box by IoU, we evaluate by comparing the overlap rate of the detected bounding box with that of the bounding box of the ground truth.
    For each detected bounding box, we compare it with the bounding boxes of all ground truth existing in the image with the IoU.
    The ground truth bounding box that has the highest IoU is assumed to be the estimated bounding box.
    When the class of this box and the class of the detected box are the same, the detection is successful, and if they are different, the detection has failed.
  2. Evaluation by segmentation regionEvaluation by segmentation region
    In evaluation by segmentation region, evaluation is performed by comparing the depth image segmentation results and the labeling data. For the segmentation evaluation, the segmentation results and the labeling data are compared in one-pixel units, and the detection rate and precision are calculated for all of the pixels in one image. From the calculated values, an F value is obtained. The segmentation is evaluated as successful if the F value is equal to or higher than a threshold (default: 0.5); otherwise, it is evaluated as failed.


All images are in bmp format, but because of their large size, we also prepared images converted to png format.
・Full Dataset: Team MC^2 : ARC2017 RGB-D Dataset (tar.gz, 4.30GiB)
・Full Dataset(png): Team MC^2 : ARC2017 RGB-D Dataset (png) (zip, 2.97GiB)
・Item list: ItemList.pdf (262KiB)
・Segmentation color code list: SegmentationTeach.pdf (310KiB) / SegmentationTeach.xlsx (331KiB)
・Evaluation code(Bounding Box): (9KiB)
・Evaluation code(Segmentation): Coming soon

How to use evaluation code

The code is different for each evaluation method.

Evaluation of bounding box by IoU

This evaluation code requires the libraries listed below.
・Python 2.x (recommended version >= 2.7.11)
・OpenCV 2.x
・numpy (recommended version >= 1.10)
・matplotlib (recommended version >= 1.4)

Preparetion of data
The evaluation code requires the data listed below.
・Put the test image in ./images/
・Put the detection results file (txt) in ./results/
・Put the ground truth data file (txt) in ./teach/

The test image and the ground truth of the bounding box can use the data in the “test” folder of the dataset.
The detection result should be in a format where each value is separated with a space. The evaluation code supports the following three formats.

1. Same format as ground truth

Class_ID Center_of_Box_X_axis Center_of_Box_Y_axis Width_of_Box Height_of_Box

The coordinates, width, and height must be normalized with 0 to 1 (divide by image size).

2. Format in which class name is described next to class ID

Class_ID Class_name Center_of_Box_X_axis Center_of_Box_Y_axis Width_of_Box Height_of_Box

The space character cannot be used in the class name. And the variable LABEL_FLAG of the program must be set to 1.

3. Format in which box information is described by coordinates of upper left and lower right

Class_ID Upper_left_of_box_X Upper_left_of_box_Y Lower_right_of_box_X Lower_right_of_box_Y

Set the program variable NORMALIZED to 0.
If the class name is included (as in format 2 above), you can deal with this by setting the program variable LABEL_FLAG to 1.

Program change
Please change “” as below.
IMAGE_EXT = “png” : Extension of test image( png, bmp )
WAITTIME = 0 : Number of milliseconds in OpenCV’s WaitKey() function
If you set this value to about 100, you can check the evaluation flow. If you set this to 0, the program does not check the evaluation flow and evaluates immediately.
LABEL_FLAG = 0 : Set to 1 if the detection result has a class name.
NORMALIZED = 1 : Set to 0 if the detection result is not normalized.

Please change “” as below.
THRESH = 0.55 : Threshold of IoU for a detection to be judged as successful

How to run
Please run “”; this program calculates the IoU of the detection results with all the the ground truth boxes included in this image. The ground truth box with the highest IoU is assumed to correspond to the box whose detection results we are trying to estimate. For the found box class and detected box, calculate the IoU of the two boxes. This calculates the results, prediction class, and ground truth class. Write it to a text file.
* If you will run this program again, please delete the whole folder of “./results/matchingResults/”.

Next, please run “”. This program calculates the confusion matrix using the text files obtained by “”. A successful detection is deemed to be when the IoU of the teacher signal and detection result exceeds a threshold value and the detected class is correct. When the calculation has finished, the program outputs the confusion matrix in the form of an image and the following 3 values.

* Recognition rate: the recognition rate excluding missed detections
* Miss rate: the rate of missed detections
* Mean IoU: average of all IoU values including false detections

Evaluation by segmentation region

Coming soon.

Evaluation of Team MC^2 recognition method

We used the code described above to evaluate the detection results of our method.

Evaluation Method
Single-Shot Multibox Detector(SSD): Detect object candidate regions and performs class estimation with a single network. It uses a “default box” to detect object candidate regions. The default box is applied to feature maps with multiple resolutions.
We propose a model that improves on SSD. This improved SSD uses estimations of “objectness” to facilitate the detection of item (i.e., classification into objects and non-objects) that are not included in the learning data (unknown class object). The architecture of SSD incorporating objectness estimations is quite simple and close to that of the original SSD. In addition to the object candidate region estimator and item classifier, we added an objectness discriminator. The objectness discriminator is exactly the same as the item classifier, and classifies images into two classes: object-like, and not object-like. Of these, the likelihood of something being classed as “object-like” is called its objectness, and if this exceeds a threshold value then it is judged to be an object. For a box that is judged to be object-like, the likelihood of it containing an item is checked, and the class with a high likelihood is output as the final detection result. At this time, if the likelihood of each class is extremely low, or if the likelihood of the background class is high, the item is deemed to be an unknown object.

Evaluation Result
Recognition rate: 0.890423572744
Miss rate: 0.256164383562
Mean IoU:0.792050506884

Confusion Matrix


*We would like to continue adding new evaluation results, so we welcome all contributions of evaluation results for this Dataset.


Although due care was taken in preparing this information on the published program and data, we make no guarantee concerning that information. The Machine Perception and Robotics Group, which created the program, the materials, and the Web site, accept no responsibility for any damage that may be caused by their use or by browsing this site. We also accept no responsibility for any damage that may arise from program revisions or redistribution. With agreement to these disclaimers, the published program may be used for research purposes only and under the responsibility of the user. For communication concerning commercial use, please contact the following person.