Data for KDD Cup 2008
Data
A breast cancer screen typically
consists of 4 X-ray images: 2 images of each breast, taken from different
directions (these views are called MLO and CC). Thus most (but not all)
patients have MLO and CC images of both breasts, giving a total of 4
images per patient. For the purposes of the KDD Cup, each image is
represented by several candidates (see stage 1 above). For each candidate,
we provide the image ID, the patient ID, the (x,y) location, several
features, and a class label indicating whether or not it is malignant. We
provide 117 features computed from several standard image processing
algorithms, but for confidentiality reasons we are unable to provide
some additional proprietary features. The labels indicate whether a
candidate is malignant or benign (based on a radiologist’s
interpretation, a biopsy, or both). Note that several candidates can
correspond to the same lesion. Thus, we also provide a unique lesion-ID for
the malignant lesions in the training data. However, lesion-ID information
will not be included in the test data.
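As an illustration of the candidate layout described above, here is a minimal Python sketch of one candidate record and of grouping malignant training candidates by lesion. The class name, the field names, and the choice to key lesions by (patient ID, lesion ID) are assumptions made for this example; they are not the column names or the file format of the distributed data.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Candidate:
    """One candidate region (hypothetical field names, not the official ones)."""
    patient_id: int
    image_id: int
    x: float
    y: float
    features: List[float]            # 117 values per candidate in the real data
    is_malignant: bool               # class label (training data only)
    lesion_id: Optional[int] = None  # given only for malignant training candidates


def group_by_lesion(
    candidates: List[Candidate],
) -> Dict[Tuple[int, int], List[Candidate]]:
    """Collect malignant candidates that point at the same lesion.

    Keying by (patient_id, lesion_id) is an assumption that stays safe
    whether lesion IDs are unique globally or only within a patient.
    """
    groups: Dict[Tuple[int, int], List[Candidate]] = defaultdict(list)
    for cand in candidates:
        if cand.is_malignant and cand.lesion_id is not None:
            groups[(cand.patient_id, cand.lesion_id)].append(cand)
    return dict(groups)


# Toy usage: two candidates on the same lesion, plus one benign candidate.
toy = [
    Candidate(1, 10, 5.0, 7.0, [0.1, 0.2], True, lesion_id=3),
    Candidate(1, 11, 6.0, 8.0, [0.3, 0.4], True, lesion_id=3),
    Candidate(2, 20, 1.0, 2.0, [0.5, 0.6], False),
]
lesions = group_by_lesion(toy)
```

Grouping this way makes explicit that detection is counted per lesion rather than per candidate, which matters because several candidates can correspond to the same lesion.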
Training Data:
To support this KDD Cup challenge,
training information is provided for a set of 118 malignant patients
(patients with at least one malignant mass lesion). We also include data
from 1594 normal patients, for whom all candidates are presumed to be benign.
The training set consists of a total of 102,294 candidate ROIs, each
described by 117 features, but only an extremely small fraction of these
102,294 candidates is actually malignant.
Test Data:
We provide data from over 1000
patients in the same format, except that no class label or lesion-ID is
provided.
Supporting software:
We will provide a software function
written in Matlab for plotting Free-Response Receiver Operating
Characteristic (FROC) curves. This function plots the sensitivity with which
malignant cancers are detected (on the y-axis) versus the average number of
false alarms (on the x-axis). A malignant cancer is correctly identified if
at least one of the candidates corresponding to the lesion is labeled as
malignant by the classification algorithm.
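The lesion-level counting rule above can be sketched in a few lines of Python. This is not the Matlab function provided for the challenge, only an illustrative approximation. It assumes each candidate has a real-valued score (higher means more suspicious), that false alarms are averaged per image, and that every malignant candidate carries a lesion key; the function name `froc_points` and all of its parameters are hypothetical.

```python
from typing import Hashable, List, Optional, Sequence, Tuple


def froc_points(
    scores: Sequence[float],
    is_malignant: Sequence[bool],
    lesion_keys: Sequence[Optional[Hashable]],
    n_images: int,
) -> List[Tuple[float, float]]:
    """Return (average false alarms per image, lesion sensitivity) pairs.

    One point is produced per distinct score threshold, sweeping from the
    strictest threshold to the loosest. A lesion counts as detected once
    any one of its candidates scores at or above the threshold.
    """
    all_lesions = {k for k, m in zip(lesion_keys, is_malignant) if m}
    if not all_lesions or n_images <= 0:
        raise ValueError("need at least one malignant lesion and one image")

    # Visit candidates from the highest score to the lowest.
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    detected = set()
    false_alarms = 0
    points = [(0.0, 0.0)]
    for pos, i in enumerate(order):
        if is_malignant[i]:
            detected.add(lesion_keys[i])  # repeat hits on a lesion count once
        else:
            false_alarms += 1
        # Candidates with tied scores share a threshold, so emit one point
        # only after the last candidate of each tie group.
        is_last = pos == len(order) - 1
        if is_last or scores[order[pos + 1]] != scores[i]:
            points.append(
                (false_alarms / n_images, len(detected) / len(all_lesions))
            )
    return points


# Toy usage: lesion "A" has two candidates, lesion "B" has one.
pts = froc_points(
    scores=[0.9, 0.8, 0.7, 0.6, 0.5],
    is_malignant=[True, False, True, False, True],
    lesion_keys=["A", None, "A", None, "B"],
    n_images=2,
)
```

In the toy data the second hit on lesion "A" does not raise the sensitivity, which is the behaviour the rule "at least one candidate per lesion" implies.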
NOTE: In order to better distinguish between the
participants’ entries, the training and testing data have been enriched
with some difficult cases; further, proprietary features are not included
in the dataset. The accuracy of the participants’ entries to KDD Cup 2008
should not be considered to be representative of the underlying Siemens CAD
software that generated the features.