Cohort Follow-up Studies: Cardiovascular Disease
Screening for Disease
A tenet of public health is that primary prevention of disease is the best approach. If all cases of disease cannot be prevented, however, the next best strategy is early detection of disease in asymptomatic, apparently healthy individuals. Screening is defined as the presumptive identification of unrecognized disease or defects by the application of tests, examinations, or other procedures that can be applied rapidly. The qualifier presumptive is included in the definition to emphasize the preliminary nature of screening: a positive result must be confirmed diagnostically, usually through a more thorough clinical examination and additional tests. As illustrations of screening, Figure 11–1 shows a mammography examination (part A) and a blood pressure screening event (part B).
Some screening programs are offered to interested and concerned individuals to detect specific health problems, such as hypertension, cervical cancer, or sickle-cell disease. An example of this type of program would be a free thyroid test (serum thyroxine level) offered to passersby in a shopping center or to members of a senior citizens center.5 Other screening programs are applied on a mass basis to almost all individuals in the population; an example is screening of all neonates for phenylketonuria (PKU).
FIGURE 11–1 Mammography (part A) and a blood pressure screening event (part B).
Source: Reproduced from Centers for Disease Control and Prevention. Public Health Image Library. Image numbers 8295 and 7874. Available at http://phil.cdc.gov/phil/. Accessed April 19, 2012.
Screening differs from diagnosis, which is the process of confirming an actual case of a disease.6,7 Once a diagnosis is made, medical intervention is initiated if appropriate. Diagnostic tests are used either to follow up positive screening test results (e.g., a phenylalanine loading test in children who screen positive for PKU) or directly for screening (e.g., fetal karyotyping in prenatal screening for Down syndrome). For example, if a thyroid test is administered to determine the exact cause of a patient’s illness, it is a diagnostic test.5 The same thyroid test could also serve as a screening test, however, as will be demonstrated subsequently.
Screening for three types of cancer (breast, cervical, and colorectal) could substantially reduce mortality from these malignancies. In a typical year, 350,000 persons in the United States are diagnosed with these forms of cancer, and 100,000 die from them. The U.S. Preventive Services Task Force (USPSTF) recommends screening for these three types of cancer to reduce morbidity and mortality from them. Healthy People 2020 has established national targets for population levels of participation in screening tests. Figure 11–2, based on data from the National Health Interview Survey (NHIS), shows the percentages of men and women who were up to date on screening for breast, cervical, or colorectal cancer between 2000 and 2010. Factors associated with higher screening participation included education, screening availability, use of health care, and length of U.S. residence. Participation was lower among Asians than among whites and blacks. In addition, persons of Hispanic descent were less likely than other groups to be screened for cervical and colorectal cancer.8
FIGURE 11–2 Percentage of men and women up to date on screening for breast, cervical, or colorectal cancer, by type of test, sex, and year, United States, 2000–2010.
Source: Reproduced from Centers for Disease Control and Prevention. Cancer screening–United States, 2010. MMWR. 2012; 61:42.
Multiphasic Screening
Although screening programs can be restricted to early detection of a single disease, a more cost-effective approach is to screen for more than one disease. Multiphasic screening is defined as the use of two or more screening tests together among large groups of people.9 The multiphasic screening examination may be administered as a pre-employment physical, and successfully passing the examination may be a necessary condition for employment in the organization. As an employee benefit, some companies repeat the screening examination on an annual basis and direct suggestive findings to the employee’s own physician while maintaining confidentiality of the results. Typical multiphasic screening programs assess risk factor status as well as individual and family history of illness, and they also collect physiologic and health measurements. Multiphasic screening also is a cornerstone of health maintenance organizations, such as Kaiser Permanente and Group Health Incorporated.
Mass Screening and Selective Screening
Mass screening (also known as population screening) refers to screening of entire population groups on a large scale, regardless of any a priori information about whether individuals belong to a high-risk subset of the population. Selective screening, sometimes referred to as targeted screening, is applied to subsets of the population at high risk for a disease or condition because of family history, age, or previous exposures. Selective screening is likely to produce the greatest yield of true cases per person screened and represents the most economical use of screening resources. For example, screening tests for Tay-Sachs disease might be applied to individuals of Jewish descent whose ancestors originated in Eastern Europe, because this group has a higher frequency of the genetic alteration.
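The economic advantage of selective screening can be illustrated with a little arithmetic. The Python sketch below uses entirely hypothetical numbers (a prevalence of 0.1% in the general population versus 2% in a high-risk subset, a test that detects 90% of true cases, and a cost of $20 per test); none of these figures comes from the text or refers to a specific disease. The expected number of true cases found is the number screened × prevalence × proportion of cases the test detects, and the cost per case found is the cost per test divided by the product of the last two.

```python
# All numbers are hypothetical assumptions chosen for illustration, not data.

def true_cases_found(n_screened: int, prevalence: float, detection_rate: float) -> float:
    """Expected number of true cases identified among n_screened people."""
    return n_screened * prevalence * detection_rate


def cost_per_case_found(cost_per_test: float, prevalence: float, detection_rate: float) -> float:
    """Screening cost spent for each true case identified."""
    return cost_per_test / (prevalence * detection_rate)


N_SCREENED = 10_000   # people screened in each scenario (assumed)
DETECTION = 0.90      # share of true cases the test picks up (assumed)
COST = 20.0           # dollars per test (assumed)

scenarios = [("mass screening (general population)", 0.001),
             ("selective screening (high-risk subset)", 0.02)]
for label, prevalence in scenarios:
    found = true_cases_found(N_SCREENED, prevalence, DETECTION)
    cost = cost_per_case_found(COST, prevalence, DETECTION)
    print(f"{label}: about {found:.0f} cases found, ${cost:,.0f} per case")
```

With the same test and the same number of people screened, the 20-fold higher prevalence in the selected group yields 20 times as many cases found and a correspondingly lower cost per case.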
Mass Health Examinations
Several other activities are similar to screening but differ in one or more critical respects. Population or epidemiologic surveys aim to elucidate the natural history, prevalence, incidence, and duration of health conditions in defined populations.9 Their purpose is to gain new knowledge about the distribution and determinants of diseases in carefully selected populations. They are not considered screening, because they offer no immediate health benefit to the participants.10
Epidemiologic surveillance aims to protect community health through case detection and intervention (e.g., tuberculosis control).11 It refers to the continuous observation of the trends and distribution of disease incidence in a community or other population over time in order to prevent disease or injury.12 Sources of surveillance data include morbidity and mortality reports, such as those published by the Centers for Disease Control and Prevention. In the early 1990s, surveillance activities detected increases in both tuberculosis and measles cases in the United States; measles was subsequently brought under control by stepped-up immunization of children. Surveillance programs are used to detect and control conditions ranging from infectious diseases to injuries to chronic diseases.
Case finding, also referred to as opportunistic screening, is the use of screening tests to detect conditions unrelated to the patient’s chief complaint.5,13 An example would be administering a colon cancer screening test to a patient who came to a physician complaining of pharyngitis.
Appropriate Situations for Screening Tests and Programs
A number of criteria (Exhibit 11–2) must be considered carefully before a decision is made to implement a screening program.9 Although ideally all of the criteria would be satisfied, numerous examples show that screening programs that fail to meet one or more of them can still be extremely valuable.
EXHIBIT 11–2 Appropriate Situations for Screening
Social: The health problem should be important for the individual and the community. Diagnostic follow-up and intervention should be available to all who require them. There should be a favorable cost-benefit ratio. Public acceptance must be high.
Scientific: The natural history of the condition should be adequately understood. Identification should occur during prepathogenesis with sufficient lead time (see text for definition of lead time). There should be a sound case definition as well as a policy on whom to treat as patients. A knowledge base should exist for the efficacy of prevention and the occurrence of side effects. The prevalence of the disease or condition should be high.
Ethical: The provider initiates the service and, therefore, should have evidence that the program can alter the natural history of the condition in a significant proportion of those screened. Suitable, acceptable tests for screening and diagnosis of the condition as well as acceptable, effective methods of prevention are available.
Source: Data are from Wilson JMG, Jungner G. Principles and practice of screening for disease, Public Health Papers, No. 34, World Health Organization, 1968; and from Cochrane AL, Holland WW. Validation of screening procedures. British Medical Bulletin, Vol 27, pp. 3–8, Churchill Livingstone; 1971.
Social
Of major importance is the magnitude of the health problem for which screening is being considered. Magnitude has several dimensions: community, economic, and medical. From the community perspective, the disease or outcome must be viewed as a major health problem; that is, there must be general consensus that the problem is of sufficiently high priority to justify committing resources to implement and carry out the program. Public acceptance of the program must also be high: an effective screening test for a major health problem will not necessarily result in an effective screening program if the public refuses to participate.
It is tempting to assume that screening programs are beneficial, but this must not be taken for granted. To be successful over the long run, early detection efforts must be cost-effective. Thus, one must weigh the costs of the test itself and of follow-up examinations against the costs of treatments avoided. The most clear-cut evidence of cost-effectiveness is when the cost of the program is more than offset by savings on more expensive treatment that would have been necessary had the condition advanced to a more serious stage. Often this is not the case, however, and one must then count as benefits improvements in quality of life and the value of years of life saved. Negative costs should also be considered: there are emotional costs to healthy individuals who are falsely labeled as ill by a screening test, and emotional costs to individuals (and their loved ones) who are diagnosed early and yet die quickly anyway.
An obvious determinant of the cost–benefit ratio of a screening program is the current cost to the medical community in the absence of screening. How much money is being spent to treat individuals with the disease? How many hospital beds are being utilized? What is the number of health personnel assigned to the problem? Diseases and conditions that are costly to treat may still be considered for early detection even if the scientific justification for screening is weaker than for a disease that represents less of a medical burden.
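The cost accounting described above can be sketched as follows. The Python example is a minimal illustration, assuming a simple model in which net cost equals the cost of all screening tests plus the cost of follow-up examinations minus the treatment costs avoided; the class name `ScreeningProgram` and all input figures are hypothetical. When net cost is zero or negative, the program pays for itself; otherwise the net cost can be expressed per life-year gained, one way of valuing the years of life saved. The sketch leaves out quality-of-life effects and the emotional costs mentioned in the text, which are harder to express in dollars.

```python
from dataclasses import dataclass


@dataclass
class ScreeningProgram:
    """Hypothetical inputs for a simple screening cost model (all values assumed)."""
    n_screened: int                  # people receiving the screening test
    cost_per_test: float             # dollars per screening test
    n_followed_up: int               # screen-positives sent for diagnostic follow-up
    cost_per_followup: float         # dollars per follow-up examination
    treatment_costs_avoided: float   # dollars of later, more expensive treatment avoided
    life_years_gained: float         # total years of life saved by early detection

    def net_cost(self) -> float:
        """Program costs minus savings; zero or negative means the program pays for itself."""
        program_costs = (self.n_screened * self.cost_per_test
                         + self.n_followed_up * self.cost_per_followup)
        return program_costs - self.treatment_costs_avoided

    def cost_per_life_year(self) -> float:
        """Net cost per life-year gained (requires a positive number of life-years)."""
        if self.life_years_gained <= 0:
            raise ValueError("life_years_gained must be positive")
        return self.net_cost() / self.life_years_gained


if __name__ == "__main__":
    program = ScreeningProgram(
        n_screened=10_000, cost_per_test=25.0,
        n_followed_up=500, cost_per_followup=300.0,
        treatment_costs_avoided=300_000.0, life_years_gained=40.0,
    )
    net = program.net_cost()
    if net <= 0:
        print(f"Cost-saving: net savings of ${-net:,.0f}")
    else:
        print(f"Net cost ${net:,.0f}; ${program.cost_per_life_year():,.0f} per life-year gained")
```

With these made-up inputs the program does not pay for itself (a net cost of $100,000), so the decision would rest on whether $2,500 per life-year gained is judged worthwhile.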
Scientific
Early detection efforts are most likely to be successful when the natural history of the disease is known. This knowledge permits identification of early stages of disease and appropriate biologic markers of progression. For example, it is known that individuals with high cholesterol and high blood pressure are at increased risk for coronary heart disease. Because these risk factors precede onset of an acute myocardial infarction, identification of such high-risk individuals may lead to medical intervention (changes in diet, exercise, weight loss, or use of drugs) to prevent the disease. This example illustrates that there also should be good tests (screening and diagnostic) to measure blood pressure and blood cholesterol and that effective treatment should be available.
Ethical
It is most desirable to implement screening programs for diseases whose natural history can be altered by early diagnosis, that is, diseases for which effective treatment is available. Note, however, that screening is sometimes done for diseases that cannot be cured. For example, there is still no cure for infection with the human immunodeficiency virus (HIV). Screening is nonetheless important to prevent spread of the infection from infected to uninfected individuals and to improve the prognosis of those who are infected by allowing appropriate treatment to begin. For diseases with effective treatments, it is important to consider the capacity of the medical community to handle the increased number of individuals requiring definitive diagnoses. Suppose a volunteer organization offers a free cholesterol screening at the local community center and 10,000 citizens attend. Suppose further that 1,000 of them (10%) are found to have high cholesterol. These individuals are mailed a letter informing them of their results and suggesting that they see their physician for further evaluation. Several ethical issues arise. What if physicians in the local medical community cannot accommodate the sudden increase in demand for their services? What if some of these individuals lack medical insurance and have no physician?
Characteristics of a Good Screening Test
A good screening test has five attributes: it should be simple, rapid, inexpensive, safe, and acceptable.9,10,14
1. Simple: The test should be easy to learn and perform. A test that can be administered by nonphysician medical personnel will generally cost less than one that requires years of medical training.
2. Rapid: The test should not take long to administer, and the results should be available soon. The time required to screen an individual is closely related to the success of the program: a screening test that takes only 5 minutes out of a person’s schedule is likely to be perceived as more valuable than one that requires an hour or more. Furthermore, a test that provides immediate feedback is preferable to one whose results may not be available for weeks or months. Results of a blood pressure screening are usually known immediately; results of a screen for high cholesterol must await laboratory analysis. Fortunately, much progress is being made in developing rapid screening tests for many conditions.
3. Inexpensive: As discussed earlier, the cost–benefit ratio is an important criterion to consider in the evaluation of screening programs. The lower the cost of a screening test, the more likely it is that the overall program will be cost beneficial.
4. Safe: The screening test should not carry potential harm to screenees.
5. Acceptable: The test should be acceptable to the target group. An effective protocol has been developed to screen for testicular cancer, but acceptance rates among men have not been as high as for a similar procedure, mammography, among women.
Evaluation of Screening Tests
Recall that the purpose of a screening test is to classify individuals according to whether they are likely to have the disease or to be disease-free. This classification requires a measuring instrument or combination of instruments, such as clinical laboratory tests, a fever thermometer, weighing scales, and standardized questionnaires. The preceding section did not address the important issue of how well the screening test should actually work. This complex subject requires the introduction of several new concepts, the first two of which are reliability and validity.
Reliability
Reliability, also known as precision, is the ability of a measuring instrument to give consistent results on repeated trials. According to Morrison, reliability of a test refers to “its capacity to give the same result—positive or negative, whether correct or incorrect—on repeated application in a person with a given level of disease. Reliability depends on the variability in the manifestation on which the test is based (e.g., short-term fluctuation in blood pressure), and on the variability in the method of measurement and the skill with which it is made.”1(p 10)
Repeated measurement reliability refers to the degree of consistency between or among repeated measurements of the same individual on more than one occasion. For example, if one were to measure the height of an adult at different times, one would expect to observe similar results. This is partly because a person’s true height is relatively constant (although we are actually slightly shorter at the end of the day than at the beginning!). There also may be slight errors in measurement from one occasion to another, however; some measurements overestimate and others underestimate the true value. Whereas height can be measured reliably, other measures, such as blood pressure, may be much less reliable. Technicians’ skill in measuring blood pressure, slight variations in the calibration of the sphygmomanometer, and variability in subjects’ true blood pressure from one occasion to another all affect the reliability of blood pressure measurements.
Internal consistency reliability evaluates the degree of agreement or homogeneity within a questionnaire measure of an attitude, personal characteristic, or psychological attribute. For example, a researcher interested in the relationship between general anxiety level and peptic ulcer may use a multi-item paper-and-pencil measure of general anxiety. The Kuder–Richardson reliability coefficient measures the internal consistency reliability of this type of measure.15 It is based on the average intercorrelation of a set of items in a multi-item index. Cronbach’s α coefficient is also used to measure internal consistency reliability; a value of 0.7 or greater is generally accepted as satisfactory and suggests that a set of items is measuring a common dimension.16 These two reliability measures are particularly applicable to epidemiologic research that uses survey measures, such as interviews or self-report questionnaires.
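One simple way to quantify repeated measurement reliability is to take several readings on each person and compute the within-person standard deviation, pooled across people; dividing it by the overall mean gives a coefficient of variation that lets measures on different scales be compared. The Python sketch below applies this to small, invented sets of height and systolic blood pressure readings (three readings on each of three people). The data and function names are hypothetical, the pooled estimate assumes every person has the same number of readings, and this is one common index of measurement consistency rather than the only one.

```python
from math import sqrt
from statistics import mean, variance


def within_person_sd(readings_by_person: list[list[float]]) -> float:
    """Pooled within-person SD: square root of the average per-person variance.
    Assumes every person has the same number (at least 2) of repeated readings."""
    return sqrt(mean(variance(r) for r in readings_by_person))


def coefficient_of_variation(readings_by_person: list[list[float]]) -> float:
    """Within-person SD expressed as a fraction of the grand mean of all readings."""
    grand_mean = mean(x for r in readings_by_person for x in r)
    return within_person_sd(readings_by_person) / grand_mean


# Invented repeated readings: one inner list per person
heights_cm = [[170.2, 170.0, 169.8], [158.5, 158.4, 158.7], [182.1, 181.9, 182.0]]
systolic_mmhg = [[128, 136, 121], [142, 131, 150], [118, 125, 112]]

for name, data in [("height", heights_cm), ("systolic BP", systolic_mmhg)]:
    print(f"{name}: within-person SD {within_person_sd(data):.2f}, "
          f"CV {coefficient_of_variation(data):.1%}")
```

In this made-up example the blood pressure readings vary far more relative to their mean than the height readings do, mirroring the point that blood pressure is the less reliable measure.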
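The two internal consistency coefficients can be computed directly from a respondents-by-items score matrix. Cronbach’s α is k/(k − 1) × (1 − sum of the item variances / variance of the total score), where k is the number of items; the Kuder–Richardson formula 20 (KR-20) is the same expression for items scored 0 or 1, with each item variance written as p(1 − p), where p is the proportion answering the item "yes." The Python sketch below is a minimal, hypothetical illustration: the six-respondent, four-item yes/no anxiety data are invented, and population variances are used throughout (using sample variances in both numerator and denominator gives the same ratio).

```python
from statistics import pvariance


def cronbach_alpha(scores: list[list[float]]) -> float:
    """Cronbach's alpha for a score matrix (rows = respondents, columns = items)."""
    k = len(scores[0])
    if k < 2:
        raise ValueError("need at least two items")
    item_variance_sum = sum(pvariance(item) for item in zip(*scores))
    total_variance = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


def kr20(scores: list[list[int]]) -> float:
    """Kuder-Richardson formula 20 for items scored 0 (no) or 1 (yes)."""
    n, k = len(scores), len(scores[0])
    if k < 2:
        raise ValueError("need at least two items")
    pq_sum = 0.0
    for item in zip(*scores):
        p = sum(item) / n          # proportion answering "yes"
        pq_sum += p * (1 - p)      # variance of a 0/1 item
    total_variance = pvariance([sum(row) for row in scores])
    return k / (k - 1) * (1 - pq_sum / total_variance)


# Hypothetical yes/no answers: 6 respondents x 4 anxiety items (invented data)
answers = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 0, 1, 1],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
    [1, 1, 0, 1],
]

alpha = cronbach_alpha(answers)
print(f"Cronbach's alpha = {alpha:.2f}, KR-20 = {kr20(answers):.2f}")
print("meets 0.7 rule of thumb" if alpha >= 0.7 else "below 0.7 rule of thumb")
```

For these invented data both functions return 2/3 (about 0.67), just under the 0.7 threshold mentioned in the text. On 0/1 items the two coefficients always agree, which is why Cronbach’s α is often described as a generalization of KR-20.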
Interjudge reliability refers to reliability assessments derived from agreement among trained experts. The ratings of psychiatrists in psychiatric research, for example, may be used to measure an individual’s degree of psychiatric