ImperialAIEchocardiographyDataset_2020-12-05
Unity Imaging Collaborative

ImperialAIEchocardiographyDataset_2020-12-05 (2 files)
labels.zip 6.29MB
png-cache.zip 1.27GB
Type: Dataset

Bibtex:
@article{,
title= {ImperialAIEchocardiographyDataset_2020-12-05},
keywords= {echocardiography dataset, cardiac ultrasound, 2D echo images, echocardiography segmentation, Imperial AI Echo Dataset},
author= {Unity Imaging Collaborative},
abstract= {These are the latest versions of the datasets and code, which are added to continually. The code lives on GitHub.

Download 2020-12-05 release:

Unity Imaging Echocardiography Model Development Dataset Images: png-cache.zip
Unity Imaging Echocardiography Model Development Dataset Labels: labels.zip
Unity Imaging Code: https://github.com/UnityImaging
For reproducibility, specific snapshots of the datasets and code used for publication are below.




Images - png-cache.zip
1) We curate a collection of DICOM files that will contribute to a dataset.

2) Each DICOM file is assigned to a dataset class. There are currently two:

01 - development - training / tuning / internal validation images
02 - external validation images
3) Each DICOM file is given a 64-character hexadecimal code, e.g. 4d44413619e0161c5ab795bc1b899f7fb4bd0b2f5ab2efc881ecfc663d3bfb66

4) Each image within a DICOM file (for echo, typically an individual frame) is assigned a number zero-padded to 4 digits, running from 0000 to 9999.

5) Each image is extracted from the DICOM file, has any burnt-in metadata masked, and is saved as a PNG with its code as the filename, e.g. 01-4d44413619e0161c5ab795bc1b899f7fb4bd0b2f5ab2efc881ecfc663d3bfb66-0000.png

6) The individual images that make up a dataset for a paper are saved in a folder called png-cache, with subdirectories for the dataset class (e.g. /01) and then for each of the first two pairs of hexadecimal digits (e.g. /4d/44), i.e. /png-cache/01/4d/44/4d44413619e0161c5ab795bc1b899f7fb4bd0b2f5ab2efc881ecfc663d3bfb66-0000.png

7) This folder is then compressed to form png-cache.zip

Not every image has an associated label: for example, all the frames of a video may be included, but only a few of them may have expert labels.
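
The naming scheme in steps 2 to 6 can be sketched in Python as follows. This is a minimal illustration, not the Unity Imaging code: the function names and the prefix_in_filename flag are hypothetical. The flag exists because the example filename in step 5 carries the class prefix (01-) while the example path in step 6 does not, so check which form the archive actually uses.

```python
import re
from pathlib import PurePosixPath

# Matches image codes such as "01-<64 hex characters>-0000".
IMAGE_CODE = re.compile(r"^(?P<cls>\d{2})-(?P<hash>[0-9a-f]{64})-(?P<frame>\d{4})$")


def image_code(dataset_class: int, file_hash: str, frame: int) -> str:
    """Compose an image code from the class, the DICOM file code and the frame number."""
    if not 0 <= frame <= 9999:
        raise ValueError("frame number must fit in 4 digits")
    return f"{dataset_class:02d}-{file_hash}-{frame:04d}"


def cache_path(code: str, prefix_in_filename: bool = False) -> PurePosixPath:
    """Relative path of an image inside png-cache, following step 6.

    prefix_in_filename is a hypothetical switch: False reproduces the example
    path in step 6, True keeps the class prefix shown in step 5.
    """
    match = IMAGE_CODE.match(code)
    if match is None:
        raise ValueError(f"not a valid image code: {code!r}")
    cls, file_hash, frame = match.group("cls", "hash", "frame")
    stem = f"{file_hash}-{frame}"
    if prefix_in_filename:
        stem = f"{cls}-{stem}"
    # Class folder, then the first two pairs of hexadecimal digits.
    return PurePosixPath("png-cache", cls, file_hash[0:2], file_hash[2:4], f"{stem}.png")


example_hash = "4d44413619e0161c5ab795bc1b899f7fb4bd0b2f5ab2efc881ecfc663d3bfb66"
example_code = image_code(1, example_hash, 0)
example_path = cache_path(example_code)
```

Here example_path reproduces the relative path given in step 6, and passing prefix_in_filename=True yields the variant whose filename matches step 5.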

Labels - labels.zip
These are stored as JSON files. The development dataset (provided as labels-all.json) is divided up into:

labels-train.json - training
labels-tune.json - tuning
labels-ival.json - internal validation
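
If the three files are meant to partition labels-all.json (an assumption; the description only says the development set is "divided up"), a quick sanity check could look like the sketch below. It assumes each file is a dictionary keyed by image filename, and check_partition is a hypothetical helper name.

```python
def check_partition(all_keys, splits):
    """Return (duplicated, missing, unexpected) image keys for a set of splits.

    all_keys: iterable of image keys from the full development set.
    splits:   mapping from split name to an iterable of image keys.
    """
    seen = set()
    duplicated = set()
    for keys in splits.values():
        keys = set(keys)
        duplicated |= seen & keys  # keys that already appeared in an earlier split
        seen |= keys
    all_keys = set(all_keys)
    return duplicated, all_keys - seen, seen - all_keys


# Toy keys standing in for image filenames.
toy_splits = {"train": ["a.png", "b.png"], "tune": ["c.png"], "ival": ["d.png"]}
duplicated, missing, unexpected = check_partition(
    ["a.png", "b.png", "c.png", "d.png"], toy_splits
)
```

With the real data, the key sets could be taken from each file after json.load(), since iterating a dictionary yields its keys.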
For each image file (which acts as the key), there is a dictionary with an entry for every possible label. Each label for an image has one of the following types:

"off": the structure is definitely not in the image, i.e. the outputs would be expected to be all zeros
"blurred": the structure might be in the image, but there is no label available (either it was too blurry, or no one has tried to label it), i.e. the output would need to be masked from the loss function
"point": the structure is a single point, with the x and y coordinates given by the x and y keys
"curve": the structure is a curve, represented as a cubic spline, with the x and y coordinates of the control points given by the x and y keys
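
As a rough sketch of how these four types might be consumed, the Python below maps each label to a flag saying whether it should contribute to the loss, plus its coordinates. Several things here are assumptions rather than facts about the files: the type is read from a "type" key, the image key and label names in the sample are invented, and because the encoding of x and y is not specified above, parse_coords accepts a number, a list, or a whitespace-separated string.

```python
import json


def parse_coords(value):
    """Return a list of floats from a number, a list, or a whitespace-separated string.

    How x and y are encoded in the JSON is an assumption here, so several
    plausible forms are accepted.
    """
    if isinstance(value, str):
        return [float(v) for v in value.split()]
    if isinstance(value, (int, float)):
        return [float(value)]
    return [float(v) for v in value]


def interpret_label(label):
    """Map one label dictionary to (use_in_loss, points)."""
    kind = label["type"]
    if kind == "off":
        return True, []   # structure absent: target is all zeros, loss still applies
    if kind == "blurred":
        return False, []  # no ground truth: mask this output from the loss
    if kind in ("point", "curve"):
        xs = parse_coords(label["x"])
        ys = parse_coords(label["y"])
        if len(xs) != len(ys):
            raise ValueError("x and y must have the same number of values")
        return True, list(zip(xs, ys))
    raise ValueError(f"unknown label type: {kind!r}")


# Hypothetical entry: the image key and label names are made up for illustration.
sample = json.loads("""
{
  "example-image.png": {
    "example-point": {"type": "point", "x": 120.5, "y": 88.0},
    "example-curve": {"type": "curve", "x": "10 20 30", "y": "5 15 10"},
    "example-absent": {"type": "off"},
    "example-unlabelled": {"type": "blurred"}
  }
}
""")

results = {
    name: interpret_label(label)
    for name, label in sample["example-image.png"].items()
}
```

For a "curve", the returned points are only the spline control points; turning them into a dense curve or a heatmap target is left to the training code.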
For convenience, each of the .json files has an equivalent .txt file listing the images it contains.},
terms= {},
license= {https://creativecommons.org/licenses/by-nc-nd/4.0/},
superseded= {},
url= {https://data.unityimaging.net/}
}

