Siemens Medical Solutions USA
KDD CUP 2008
Hints and Machine Learning ideas that may be useful for the challenge
The obvious approach to classification is to build classifiers that simply
label each candidate independently. Below we present a few ideas that
participants in the challenge may want to consider to improve on that
baseline.
1. Leverage two views of the same breast: A cancerous lesion is almost
always visible in both views (MLO, CC) of the breast, and radiologists
routinely try to correlate the two views while diagnosing the patient. In
rare cases, however, a lesion may be visible in only one view, especially
in certain areas of the breast. Negative candidates, by contrast, may be
present either in one view only (e.g., image artifacts) or in both views
(e.g., if generated by the presence of a benign cyst).
Unfortunately, since each view is a 2D image obtained from an orthogonal
direction, it is not possible to perfectly register (i.e., correlate
locations across) the X-ray images using simple algorithms such as affine
transformations. However, some of a lesion's features are typically
preserved across the two views: in particular, the distance of the lesion
from the nipple, and perhaps some of the features relating to the size of
the lesion, its texture, etc. Thus the first idea that may be useful for
this challenge is to develop algorithms that simultaneously classify
candidates from a pair of images of the same breast. These algorithms could
try to exploit correlations in classification decisions for the same region
of a breast. To support this, the training and testing data sets will
include features that identify the (x,y) location of the nipple as well as
the (x,y) location of the candidate.
2. Class Imbalance: Participants will be able to leverage ideas from
classifier design under extreme class imbalance (the vast majority of the
regions are normal, and only a small fraction are actually malignant) and
from feature selection (a large number of features are proposed, and
several of them may not be very useful for the task). The prevalence rate
(malignant patients as a fraction of all patients) may differ between the
training and testing sets.
3. Exploit correlations within an image: Participants may develop novel
algorithms that exploit potential correlations between the diagnoses of
suspicious regions within a single image (e.g., regions that are spatially
adjacent).
4. Optimize AUC only in narrow FP range: It may be useful to develop
training algorithms that maximize the area under the ROC curve (AUC) in a
clinically relevant false positive (FP) range, a problem that has not been
adequately addressed in the current machine learning/data-mining
literature.
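As an illustration of hint 1, the sketch below computes each candidate's distance to the nipple from the (x,y) locations and tentatively pairs MLO and CC candidates whose distances agree. The function names, the input layout, and the `tolerance` parameter are assumptions made for this example; they are not part of the challenge data format, and a real system would likely combine this cue with other features.

```python
import math

def nipple_distance(candidate_xy, nipple_xy):
    """Euclidean distance from a candidate to the nipple within one view."""
    return math.dist(candidate_xy, nipple_xy)

def pair_across_views(mlo, cc, tolerance):
    """Match each MLO candidate to the CC candidate with the closest
    nipple distance, provided the gap is within `tolerance`.

    mlo, cc: lists of (candidate_id, nipple_distance) for the same breast.
    Returns a dict mapping each MLO id to a CC id, or to None if no CC
    candidate is close enough (hypothetical convention for this sketch).
    """
    pairs = {}
    for mlo_id, d_mlo in mlo:
        best_id, best_gap = None, tolerance
        for cc_id, d_cc in cc:
            gap = abs(d_mlo - d_cc)
            if gap <= best_gap:
                best_id, best_gap = cc_id, gap
        pairs[mlo_id] = best_id
    return pairs

# Toy example with made-up distances (same units as the coordinates).
mlo_candidates = [("m1", 5.0), ("m2", 20.0)]
cc_candidates = [("c1", 5.5), ("c2", 12.0)]
matches = pair_across_views(mlo_candidates, cc_candidates, tolerance=1.0)
```

A matched pair could then be scored jointly, for example by feeding both candidates' features to a single classifier, while unmatched candidates fall back to single-view classification.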
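For hint 2, one common way to handle extreme class imbalance is to weight each class inversely to its frequency during training. The sketch below is only one option among many (resampling and threshold tuning are others), and the function name and label encoding (0 = normal, 1 = malignant) are assumptions for illustration.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count),
    so that rare classes receive proportionally larger weights."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}

# Toy example: 98 normal regions, 2 malignant regions.
labels = [0] * 98 + [1] * 2
weights = balanced_class_weights(labels)

# Per-candidate weights that a weighted learner could consume.
sample_weights = [weights[y] for y in labels]
```

With these weights, both classes contribute the same total weight to the training loss, regardless of how few malignant regions there are.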
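For hint 3, a very simple way to let spatially adjacent candidates influence each other is to blend each candidate's score with the mean score of its neighbours in the same image. This is a minimal sketch under assumed inputs; the `radius` and `alpha` parameters and the tuple layout are hypothetical, and more principled approaches (e.g., structured or collective classification) may work better.

```python
import math

def smooth_scores(candidates, radius, alpha=0.5):
    """Blend each candidate's score with its neighbours' mean score.

    candidates: list of (x, y, score) tuples, all from ONE image.
    A neighbour is any other candidate within `radius` of (x, y).
    Candidates without neighbours keep their original score.
    All blending uses the original scores, not already-smoothed ones.
    """
    smoothed = []
    for i, (xi, yi, si) in enumerate(candidates):
        neighbours = [
            s
            for j, (x, y, s) in enumerate(candidates)
            if j != i and math.dist((xi, yi), (x, y)) <= radius
        ]
        if neighbours:
            si = (1 - alpha) * si + alpha * sum(neighbours) / len(neighbours)
        smoothed.append(si)
    return smoothed

# Toy example: two adjacent candidates and one isolated candidate.
image_candidates = [(0, 0, 0.8), (1, 0, 0.4), (100, 100, 0.1)]
new_scores = smooth_scores(image_candidates, radius=2, alpha=0.5)
```

Here the two adjacent candidates are pulled toward each other's scores, while the isolated candidate is left unchanged.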
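For hint 4, it helps to be able to measure the area under the ROC curve restricted to a false positive range. The sketch below computes the unnormalized area for false positive rates in [0, `max_fpr`] using the trapezoidal rule. It is an evaluation helper only, not a training algorithm; the function names are assumptions, and labels are assumed to be 0/1 with both classes present.

```python
def roc_points(labels, scores):
    """Return ROC points (fpr, tpr), sweeping the threshold from high to low.
    Tied scores are grouped so they produce a single point."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        tp += label
        fp += 1 - label
        # Emit a point only after the last member of a tie group.
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / n_neg, tp / n_pos))
    return points

def partial_auc(labels, scores, max_fpr):
    """Area under the ROC curve for false positive rates in [0, max_fpr]."""
    area = 0.0
    pts = roc_points(labels, scores)
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:
            # Clip the segment at max_fpr by linear interpolation.
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2
    return area

# Toy example with made-up scores.
y_true = [1, 0, 1, 0]
y_score = [0.9, 0.8, 0.7, 0.1]
full_area = partial_auc(y_true, y_score, max_fpr=1.0)
narrow_area = partial_auc(y_true, y_score, max_fpr=0.25)
```

Two classifiers with the same full AUC can differ substantially on the narrow-range area, which is why optimizing the restricted quantity directly may be worthwhile.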