Siemens Medical Solutions USA
KDD CUP 2008
Background
Breast cancer is a disease in which malignant (cancer) cells form in the
tissues of the breast. It is the second leading cause of cancer deaths in
women today (after lung cancer) and, except for skin cancers, the most
common cancer among women. About 1.3 million women are expected to be
diagnosed with breast cancer annually worldwide, and about 465,000 will die
from the disease. In the United States alone, an estimated 240,510 women
were expected to be diagnosed with breast cancer in 2007, and 40,460 women
were expected to die from it.
Screening means looking for cancer in asymptomatic people, i.e., before a
person has any symptoms of the disease. Cancer screening can help find
cancer at an early stage. When abnormal tissue or cancer is found early, it
is often easier to treat. By the time symptoms appear, cancer may have begun
to spread. The good news is that breast cancer death rates have been
dropping steadily since 1990, because of both earlier detection via
screening and better treatments.
The most common breast cancer screening test is a mammogram. A mammogram is
an x-ray of the breast. The ability of a mammogram to find breast cancer may
depend on the size of the tumor, the density of the breast tissue, and the
skill of the radiologist. The mammogram is considered the standard of care
for most asymptomatic women. For instance, in the US, insurance companies
routinely reimburse an annual screening mammography examination for all
asymptomatic women over the age of 40. These exams are credited with
reducing the breast cancer death rate by approximately 30% since 1990.
However, the reading of screening mammograms is challenging. Findings on a
screening mammogram that lead to recall for further examination are
identified in approximately 5%-10% of patients, even though breast cancer is
ultimately confirmed in only three to ten of every 1,000 women screened.
Perhaps even more importantly, there is compelling evidence that many breast
cancers detected at screening mammography are, in retrospect, visible on the
previously obtained mammograms but were missed by the interpreting
radiologist in the prior year. Several factors contribute to such misses:
the complex radiographic structure of breast tissue, particularly in dense
breasts; the subtle nature of many mammographic characteristics of early
breast cancer; human oversight; poor-quality films; and even fatigue or
distraction.
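The recall figures above imply that only a small share of recalled patients
turn out to have cancer. The following is a minimal sketch of that
arithmetic, not part of the challenge material. It assumes, for
illustration only, that every confirmed cancer comes from a recalled
patient; the function name recall_yield is hypothetical.

```python
def recall_yield(recall_rate, cancers_per_1000, screened=1000):
    """Fraction of recalled patients in whom cancer is confirmed.

    Assumption (not stated in the text): every confirmed cancer
    belongs to a patient who was recalled.
    """
    recalled = recall_rate * screened
    cancers = cancers_per_1000 * screened / 1000
    return cancers / recalled


# Least favorable case: 10% recalled, 3 cancers per 1,000 screened.
low = recall_yield(0.10, 3)
# Most favorable case: 5% recalled, 10 cancers per 1,000 screened.
high = recall_yield(0.05, 10)

print(f"Share of recalls with confirmed cancer: {low:.0%} to {high:.0%}")
```

Under that assumption, somewhere between roughly 3 and 20 of every 100
recalled patients have cancer, so most recalls are false alarms.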
To overcome the known limitations of human observers, second (i.e., double)
reading of screening mammograms by another radiologist has been implemented
at many sites. Studies indicate a potential 4%-15% increase in the number of
cancers detected with double reading. A radiology practice that performs
10,000 screening examinations per year will generally detect between 30 and
100 cancers per year. Thus, double reading in this practice could contribute
to the diagnosis of 1-15 additional cancers per year. However, this approach
doubles the radiologist effort, so it is not financially viable.
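The range of 1-15 additional cancers follows from multiplying the ends of
the two ranges quoted above. A short sketch of that calculation, with a
hypothetical helper name chosen for illustration:

```python
def additional_cancers(baseline_cancers, relative_gain):
    """Extra cancers found per year if detection rises by relative_gain."""
    return baseline_cancers * relative_gain


# A practice with 10,000 screening exams per year detects 30 to 100 cancers;
# double reading is reported to add 4% to 15% to that number.
low = additional_cancers(30, 0.04)    # fewest baseline cancers, smallest gain
high = additional_cancers(100, 0.15)  # most baseline cancers, largest gain

print(f"Additional cancers per year: about {low:.1f} to {high:.1f}")
```

The lower end works out to slightly more than one cancer per year and the
upper end to fifteen, which matches the range given in the text.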
Rapid and continuing advances in computer technology, as well as the ready
adaptation of radiology images to digital formats, have increased the
interest in computer prompting to enable the attending radiologist to act as
his or her own second reader. One very promising adaptation of
computer-prompting technology is computer-aided detection (CAD) in screening
mammography. Current CAD systems demonstrate a high rate of detecting
cancerous features on mammograms, but further improvements in both
sensitivity and specificity would lead to tremendous benefits, both in terms
of lives saved each year and in terms of a reduction in the workload of
radiologists. Over the last 8-10 years, US insurance companies have begun to
provide additional reimbursement to mammographers who run CAD algorithms on
the mammograms. In other words, physicians are now reimbursed for running a
machine learning algorithm to help them better detect cancer.
In an almost universal paradigm, the CAD problem is addressed by a
four-stage system:
1. candidate generation, which identifies suspicious unhealthy candidate
regions of interest (candidate ROIs, or simply candidates) from a medical
image;
2. feature extraction, which computes descriptive features for each
candidate so that each candidate is represented by a vector x of numerical
values or attributes;
3. classification, which differentiates candidates that are malignant
cancers from the rest of the candidates based on x; and
4. visual presentation of CAD findings to the radiologist.
In this challenge, we focus on stage 3, learning the classifier to
differentiate malignant cancers from other candidates.
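To make stage 3 concrete, here is a minimal sketch of a classifier that
maps each candidate's feature vector x to a malignancy score. Everything in
it is an assumption made for illustration: the challenge does not prescribe
logistic regression, the two-feature synthetic data stands in for the real
candidate features, and the function names are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def score(w, b, x):
    # Estimated probability that the candidate with features x is malignant.
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train_logistic(X, y, lr=0.1, epochs=200):
    # Plain full-batch gradient descent on the logistic loss.
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, label in zip(X, y):
            err = score(w, b, x) - label
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Synthetic stand-in data: many benign candidates (label 0) and a few
# malignant ones (label 1), mimicking the class imbalance typical of CAD.
rng = random.Random(0)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
X += [[rng.gauss(2, 1), rng.gauss(2, 1)] for _ in range(20)]
y = [0] * 200 + [1] * 20

w, b = train_logistic(X, y)

# Rank candidates so the most suspicious ones would be shown first in stage 4.
ranked = sorted(zip(X, y), key=lambda pair: score(w, b, pair[0]), reverse=True)
print("Labels of the five highest-scoring candidates:",
      [label for _, label in ranked[:5]])
```

Producing a score rather than a hard label lets the operating point be
tuned afterward, trading sensitivity against the number of false marks
presented to the radiologist.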