@article{,
title= {THE NORB DATASET, V1.0},
journal= {},
author= {Fu Jie Huang and Yann LeCun},
year= {},
url= {http://www.cs.nyu.edu/~ylclab/data/norb-v1.0/},
abstract= {This database is intended for experiments in 3D object recognition from shape. It contains images of 50 toys belonging to 5 generic categories: four-legged animals, human figures, airplanes, trucks, and cars. The objects were imaged by two cameras under 6 lighting conditions, 9 elevations (30 to 70 degrees every 5 degrees), and 18 azimuths (0 to 340 every 20 degrees).
The training set is composed of 5 instances of each category (instances 4, 6, 7, 8 and 9), and the test set of the remaining 5 instances (instances 0, 1, 2, 3, and 5).
CONTENT
The files are gzipped for download. Once uncompressed, they are in a simple binary matrix format with the file suffix ".mat". The file format is explained in a later section.
The "-dat" files store the image sequences. The "-cat" files store the corresponding category of the images. Each "-dat" file stores 29,160 image pairs (6 categories, 5 instances, 6 lightings, 9 elevations, and 18 azimuths). The sixth category is for images without objects, which can be used to train a system to reject images that belong to none of the 5 object categories. Each corresponding "-cat" file contains 29,160 category labels (0 for animal, 1 for human, 2 for plane, 3 for truck, 4 for car, 5 for blank).
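The figure of 29,160 follows directly from the factors listed above. As a quick arithmetic check in Python (the variable names are illustrative only):

```python
# Factors listed in the text for one "-dat" file.
categories = 6   # 5 object categories plus the blank category
instances = 5    # instances per category in one split
lightings = 6
elevations = 9
azimuths = 18

pairs_per_file = categories * instances * lightings * elevations * azimuths
print(pairs_per_file)  # 29160
```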
Each "-info" file stores 29,160 10-dimensional vectors, which contain additional information about the corresponding images. The first 4 elements in the vector are:
- 1. the instance in the category (0 to 9)
- 2. the elevation (0 to 8, meaning the cameras are 30, 35, 40, 45, 50, 55, 60, 65, 70 degrees from the horizontal, respectively)
- 3. the azimuth (0,2,4,...,34, multiply by 10 to get the azimuth in degrees)
- 4. the lighting condition (0 to 5)
and the next 6 elements describe the perturbations added to the object when superposed onto a cluttered background (see next section).
For regular training and testing, "-dat" and "-cat" files are sufficient. "-info" files are provided in case some other forms of classification or preprocessing are needed.
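As a minimal sketch of how the first four "-info" elements could be turned into readable values, the Python function below applies the conversions stated above (elevation index to degrees, azimuth code times 10). The function name `decode_pose` and the dictionary keys are hypothetical, not part of the dataset; the sample vector is the one shown in the Matlab session later in this document.

```python
def decode_pose(info):
    """Convert the first four "-info" fields into readable values."""
    instance, elevation_idx, azimuth_code, lighting = info[:4]
    return {
        "instance": instance,                      # 0 to 9
        "elevation_deg": 30 + 5 * elevation_idx,   # index 0..8 -> 30..70 degrees
        "azimuth_deg": 10 * azimuth_code,          # code 0,2,...,34 -> 0..340 degrees
        "lighting": lighting,                      # 0 to 5
    }

pose = decode_pose([8, 5, 10, 4, -3, 0, -6, 1, 0, -4])
print(pose["elevation_deg"], pose["azimuth_deg"])  # 55 100
```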
JITTERED OBJECTS AND CLUTTERED BACKGROUND
After capture, each image was processed so that the object is centered in the image (the center of mass of the object pixels is at the center of the image), scaled so that the bounding box is roughly 80x80 pixels, and placed on a uniform background, including the cast shadow.
Three sources of variation were then added to the data set:
- the objects are perturbed
- the objects are superposed onto a complex background
- distractor objects are added to the background
The objects are randomly perturbed in 5 ways. They are scaled by a factor between 0.78 and 1.0, rotated in-plane by -5 to +5 degrees, and shifted by -6 to +6 pixels horizontally and vertically. The image intensities (in the range 0 to 255) are offset by a random value between -20 and +20, and the image contrast is scaled by a factor between 0.8 and 1.3. The perturbations are stored in the last 6 elements of each vector in the "-info" files:
- 5. horizontal shift (-6 to +6)
- 6. vertical shift (-6 to +6)
- 7. luminance change (-20 to +20)
- 8. contrast (0.8 to 1.3)
- 9. object scale (0.78 to 1.0)
- 10. rotation (-5 to +5 degrees)
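The field order above can be captured in a small Python helper that labels elements 5 to 10 of an "-info" vector. This is only a sketch: the names `PERTURBATION_FIELDS` and `perturbation_fields` are hypothetical. The "-info" files are integer matrices, and this document does not say how fractional quantities such as contrast and scale are encoded, so the values are returned exactly as stored.

```python
# Hypothetical labels, in the order given by the list above.
PERTURBATION_FIELDS = (
    "horizontal_shift",
    "vertical_shift",
    "luminance_change",
    "contrast",
    "object_scale",
    "rotation_deg",
)

def perturbation_fields(info):
    """Label elements 5-10 (1-based) of a 10-element "-info" vector."""
    if len(info) != 10:
        raise ValueError("expected a 10-element info vector")
    # Values are passed through unchanged; no rescaling is assumed.
    return dict(zip(PERTURBATION_FIELDS, info[4:]))

# Sample vector taken from the Matlab session later in this document.
p = perturbation_fields([8, 5, 10, 4, -3, 0, -6, 1, 0, -4])
print(p["horizontal_shift"], p["rotation_deg"])  # -3 -4
```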
The complex background images are extracted from a subset of natural scene images from the Corel image library. The images contain scenes with large region contrasts, such as a lake against a mountain, and irregular region boundaries.
One distractor object is added to each image. The distractor is located toward the boundary of the image, but can clutter the main object in the center.
There are images with only background and distractor objects. These images belong to their own category, as indicated in the category files.
FILE FORMAT
The files are stored in the so-called "binary matrix" file format, which is a simple format for vectors and multidimensional matrices of various element types. Binary matrix files begin with a file header which describes the type and size of the matrix, and then comes the binary image of the matrix.
The header is best described by a C structure:
struct header {
    int magic;  // 4 bytes, little endian
    int ndim;   // 4 bytes, little endian
    int dim[3]; // 3 x 4 bytes, little endian
};
Note that when the matrix has fewer than 3 dimensions (say, it is a 1D vector), dim[1] and dim[2] are both 1. When the matrix has more than 3 dimensions, the header is followed by the sizes of the additional dimensions. After the header comes the matrix data, which is stored with the index of the last dimension changing the fastest.
The magic number encodes the element type of the matrix:
- 0x1E3D4C51 for a single precision matrix
- 0x1E3D4C52 for a packed matrix
- 0x1E3D4C53 for a double precision matrix
- 0x1E3D4C54 for an integer matrix
- 0x1E3D4C55 for a byte matrix
- 0x1E3D4C56 for a short matrix
Since the files were generated on an Intel machine, they use the little-endian scheme to encode the 4-byte integers. Take care when reading the files on big-endian machines.
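The header layout and magic numbers above could be parsed in Python roughly as follows. This is a sketch under stated assumptions, not a reference reader: `MAGIC_TO_CODE` and `parse_header` are hypothetical names, the mapping to `struct` type codes is an assumption (unsigned bytes, 2-byte shorts), and the packed type is omitted because its layout is not described here. The `<` prefix forces little-endian decoding, so the result does not depend on the host CPU.

```python
import struct

# Magic numbers from the list above; the struct codes are assumptions.
MAGIC_TO_CODE = {
    0x1E3D4C51: "f",  # single precision
    0x1E3D4C53: "d",  # double precision
    0x1E3D4C54: "i",  # 4-byte integer
    0x1E3D4C55: "B",  # byte (assumed unsigned)
    0x1E3D4C56: "h",  # short (assumed 2 bytes)
}

def parse_header(buf):
    """Return (struct_code, dims, header_size) for a binary matrix header."""
    magic, ndim = struct.unpack_from("<ii", buf, 0)
    if magic not in MAGIC_TO_CODE:
        raise ValueError("unsupported magic number: 0x%08X" % magic)
    stored = max(ndim, 3)  # three dimension slots are always present
    dims = struct.unpack_from("<%di" % stored, buf, 8)
    return MAGIC_TO_CODE[magic], dims[:ndim], 8 + 4 * stored

# Header bytes of a "-dat" file, as listed in the Matlab session in this document.
dat_header = bytes([85, 76, 61, 30, 4, 0, 0, 0, 232, 113, 0, 0,
                    2, 0, 0, 0, 108, 0, 0, 0, 108, 0, 0, 0])
print(parse_header(dat_header))  # ('B', (29160, 2, 108, 108), 24)
```

For a 1D file the unused slots dim[1] and dim[2] are skipped in the returned dimensions but still counted in the header size.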
- The "-dat" files store a 4D tensor of dimensions 29160x2x108x108.
- The "-cat" files store a 1D vector of dimension 29,160.
- The "-info" files store a 2D matrix of dimensions 29160x10.
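Because the data is stored with the last index changing fastest and a byte matrix uses one byte per element, the position of any single image in a "-dat" file can be computed without loading the whole tensor. The Python sketch below assumes the 24-byte header implied by the Matlab session in this document (six 4-byte groups precede the pixel data); `image_offset` and the constant names are hypothetical.

```python
HEADER_BYTES = 24  # magic + ndim + 4 dimension sizes, 4 bytes each
N_PAIRS, N_CAMERAS, HEIGHT, WIDTH = 29160, 2, 108, 108

def image_offset(pair_index, camera):
    """Byte offset of one 108x108 image inside a "-dat" file."""
    if not (0 <= pair_index < N_PAIRS and 0 <= camera < N_CAMERAS):
        raise IndexError("pair_index or camera out of range")
    # Last index varies fastest; one byte per pixel.
    return HEADER_BYTES + (pair_index * N_CAMERAS + camera) * HEIGHT * WIDTH

print(image_offset(0, 0))  # 24
print(image_offset(0, 1))  # 11688
```

With an open binary file `f`, calling `f.seek(image_offset(i, c))` followed by `f.read(108 * 108)` would then return the raw pixels of that image.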
Here is a piece of Matlab code showing how to read some example files (to avoid endian confusion, the header is read byte by byte):
>> fid=fopen('norb-5x46789x9x18x6x2x108x108-training-10-dat.mat','r');
>> fread(fid,4,'uchar'); % result = [85 76 61 30], it's a byte matrix
>> fread(fid,4,'uchar'); % result = [4 0 0 0], ndim = 4
>> fread(fid,4,'uchar'); % result = [232 113 0 0], dim0 = 29160 (=113*256+232)
>> fread(fid,4,'uchar'); % result = [2 0 0 0], dim1 = 2
>> fread(fid,4,'uchar'); % result = [108 0 0 0], dim2 = 108
>> fread(fid,4,'uchar'); % result = [108 0 0 0], dim3 = 108
>> imshow(transpose(reshape(fread(fid,108*108),108,108)),[0 255]); % show the first image
>> fid=fopen('norb-5x46789x9x18x6x2x108x108-training-10-cat.mat','r');
>> fread(fid,4,'uchar'); % [84 76 61 30], integer matrix
>> fread(fid,4,'uchar'); % [1 0 0 0] ndim = 1
>> fread(fid,4,'uchar'); % [232 113 0 0] dim0 = 29160 (=113*256+232)
>> fread(fid,4,'uchar'); % [1 0 0 0] (ignore this)
>> fread(fid,4,'uchar'); % [1 0 0 0] (ignore this)
>> fread(fid,10,'int'); % [0 1 2 3 4 5 0 1 2 3] (on little-endian CPU)
>> fid=fopen('norb-5x46789x9x18x6x2x108x108-training-10-info.mat','r');
>> fread(fid,4,'uchar'); % [84 76 61 30], integer matrix
>> fread(fid,4,'uchar'); % [2 0 0 0] ndim = 2
>> fread(fid,4,'uchar'); % [232 113 0 0] dim0 = 29160 (=113*256+232)
>> fread(fid,4,'uchar'); % [10 0 0 0] dim1 = 10
>> fread(fid,4,'uchar'); % [1 0 0 0] (ignore this)
>> fread(fid,10,'int'); % [8 5 10 4 -3 0 -6 1 0 -4] (on little-endian CPU)
},
keywords= {},
terms= {This database is provided for research purposes. It cannot be sold. Publications that include results obtained with this database should reference the following paper:
Y. LeCun, F.J. Huang, L. Bottou, Learning Methods for Generic Object Recognition with Invariance to Pose and Lighting. CVPR 2004.
}
}