LECTURE NOTES
EPIDEMIOLOGY
MADE EASY
Anthony K Mbonye (PhD, FRCP)
Professor, School of Public Health, College of Health Sciences,
Makerere University & Professor, Department of Maternal and Child
Health, Save The Mothers Programme, Uganda Christian University.
Published by
Anthony K. Mbonye
Tel: +256-772411668
E-mail: akmbonye@musph.ac.ug
akmbonye@yahoo.com
P. O. Box 11853 Kampala, Uganda.
© Mbonye K. Anthony 2021
ISBN : 978-9970-9922-7-0
First Edition 2021
All rights reserved. No part of this publication may be copied or reproduced in any
form or by any means electronic, photocopying, or otherwise without the prior
written permission of the publisher.
This book can be cited as follows:
Mbonye AK. Lecture Notes on Epidemiology-Made Easy.
P.O Box 11853, Kampala Uganda, 2021.
Typesetting and production by Print Farm FZE, Dubai.
sales@printfarmdxb.com
Table of Contents
Preface
Lecture Series One: Epidemiology-definitions and Concepts
Lecture Series Two: Epidemiology and Research
Lecture Series Three: Analytical Epidemiology
Lecture Series Four: Experimental Epidemiology
Lecture Series Five: Measurement and Reporting of Outcomes
Lecture Series Six: Practical Steps in investigating a disease outbreak
Lecture Series Seven: Criteria for judging a good research report
Preface
This Lecture Series book was prompted by the need for a clear,
well-organized and structured way for readers, especially students, to
understand epidemiology and how to apply it in disease prevention
and control. Through my experience teaching undergraduate and
postgraduate studies, I found that many students studying epidemiology
and biostatistics lack strong foundations in mathematics and statistics.
Thus, for a long time I have been interested in making epidemiology
easier and more accessible to such an audience, by approaching it from
the perspective of daily experiences, where people encounter diseases
and health issues. This book is meant to be a concise guide for nurses,
medical and paramedical students, policy makers, program managers,
and social workers.
This lecture series was developed out of my experience as a lecturer,
health professional and a policy maker, who has contributed to the
control of infectious and non-infectious diseases for over two decades.
The lecture notes are divided into two sections: the theoretical and
the practical sessions. The practical component includes questions and
activity sections designed to facilitate active participation and acquisition
of practical, analytical and critical thinking skills in disease control and
prevention. The exercises also expose the reader to real life situations,
preparing them for what lies ahead.
I hope this book gives the readers adequate knowledge and skills to
confront the high disease burden in Uganda.
Lecture Notes - Series One
Epidemiology
Definitions and Concepts
Lecture Outline
1. Definitions and concepts of Epidemiology
2. Descriptive studies
3. Case reports
4. Cross-sectional studies
5. Surveillance
Expectations
After reading this Lecture Series, it is expected that you should clearly
understand the definitions and concepts of basic epidemiology and the
different types of descriptive studies. You will be introduced to a practical
session that you are encouraged to do. Through this learning, you should
master how to interpret research reports based on basic epidemiology
techniques and how to use these in public health and control of infections.
1.1 What is Epidemiology?
Epidemiology has been defined as ‘the study of the distribution and
determinants of health related states or events in specific populations,
and the application of these data in the control of health problems’ (Last,
1988). This definition emphasizes that epidemiologists are concerned
not only with death, illness and disability, but also with more positive
health states and with the means to improve health.
The target of a study in epidemiology is a human population. A population
can be defined in geographical or other terms; for example, a specific
group of hospital patients or factory workers could be the unit of study.
The most common population used in epidemiology is that which exists
in a given area or country at a given time. This forms the basis for
defining subgroups with respect to sex, age group, ethnicity, and so
on. The structures of populations vary between geographical areas and
time periods. Epidemiological analysis has to take such variations into
account.
1.2 Concepts of Epidemiology
Epidemiology has its origins in ideas first recorded over 2,000 years ago
by Hippocrates and other prominent thinkers of antiquity, who realised
that environmental factors can influence the occurrence of disease.
However, it was not until the nineteenth century that the distribution of
disease in specific human population groups was measured to any great
extent. This work marked not only the beginning of epidemiology, but
also some of its most spectacular achievements.
One example is John Snow’s finding that the risk of cholera in
London was related, among other things, to the drinking of water
supplied by a particular company. Snow’s epidemiological studies were
one aspect of a wide-ranging series of investigations that involved an
examination of physical, chemical, biological, sociological, and political
processes (Cameron and Jones 1983).
Snow located the home of each person who died from cholera in London
during 1848–49 and 1853–54, and noted an apparent association
between the source of drinking water and the deaths. He prepared a
statistical comparison of cholera deaths in districts with different water
supplies, and thereby showed that both the number of deaths and, more
importantly, the mortality rate were high among people supplied by
the Southwark company. On the basis of his meticulous research, Snow
constructed a theory about the communication of infectious diseases
in general, and suggested that cholera was spread by contaminated
water. He was thus able to encourage improvements in the water supply
long before the discovery of the organism responsible for cholera; his
research had a direct impact on public health policy.
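Snow's comparison rested on rates rather than raw counts, because districts served by different suppliers differ in size. The arithmetic can be illustrated with a short Python sketch. The supplier names, death counts and populations below are invented for illustration and are not Snow's data; the function name is likewise an assumption made for this example.

```python
# Hypothetical figures for illustration only; these are NOT Snow's data.
districts = {
    "Supplier A": {"deaths": 300, "population": 100_000},
    "Supplier B": {"deaths": 30, "population": 60_000},
}


def mortality_rate_per_10000(deaths, population):
    """Return deaths per 10,000 people over the study period."""
    if population <= 0:
        raise ValueError("population must be positive")
    return deaths * 10_000 / population


# Rates make districts of different sizes comparable.
rates = {
    name: mortality_rate_per_10000(d["deaths"], d["population"])
    for name, d in districts.items()
}

# Ratio of the two rates: how many times higher mortality is under Supplier A.
rate_ratio = rates["Supplier A"] / rates["Supplier B"]

for name, rate in rates.items():
    print(f"{name}: {rate:.1f} deaths per 10,000")
print(f"Rate ratio (A versus B): {rate_ratio:.1f}")
```

In this made-up example Supplier A has ten times as many deaths but only six times the mortality rate, which is why the rate, and not the count, is the more informative comparison.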
1.3 Uses of Epidemiology
In the broad field of public health, epidemiology is used in a number
of ways. Early studies in epidemiology were concerned with the causes
(aetiology) of communicable diseases, and such work remains essential
since it can lead to the identification of prevention methods. In this
sense, epidemiology is a basic medical science with the goal of improving
the health of a population.
The causation of some diseases can be linked exclusively to genetic
factors, as with sickle cell disease, but is more commonly the result
of an interaction between genetic and environmental factors. In this
context, environment is defined broadly to include any biological,
chemical, physical, psychological or other factors that can affect health.
Behaviour and lifestyle are of great importance in this connection and
epidemiology is increasingly used to study both their influence and
preventive intervention through health promotion.
Epidemiology is also concerned with the course and outcomes (natural
history) of diseases in individuals and groups. The application of
epidemiological principles and methods to problems encountered in the
practice of medicine with individual patients has led to the development
of clinical epidemiology.
Epidemiology is often used to describe the health status of population groups.
Knowledge of the disease burden in populations is essential for health
authorities, which seek to use limited resources to the best possible effect.
Epidemiology can be used to identify priority health programmes for
prevention and care. In some specialist areas, such as environmental
and occupational epidemiology, the emphasis is on studies of populations
with particular types of environmental exposure.
Recently, epidemiologists have become involved in evaluating the
effectiveness and efficiency of health services, by determining the
appropriate length of stay in hospital for specific conditions; the value
of treating high blood pressure; the efficiency of sanitation measures to
control diarrhoeal diseases; and the impact on public health of reducing
lead additives in petrol, and so on.
1.4 Descriptive Epidemiological Studies
The Five ‘W’ Questions
Traditional descriptive epidemiology has focused on several features:
person, place, time, agent, host, and environment. An alternative
approach is that of newspaper coverage. Good descriptive research
should answer five basic ‘W’ questions – who, what, when, why and
where – and an implicit sixth question, so what?
Who has the disease in question? Age and sex are usually described, but
other characteristics might be important too, including race, occupation,
or recreational activities. The risk of venous thromboembolism, for
example, increases exponentially with age. Only 1% of breast cancers
occur in men, but a family history of breast cancer increases their risk.
Commercial fishing remains a risky business, and having fun with an
all-terrain vehicle or snowmobile, especially when drunk, can be lethal.
What is the condition or disease being studied? Development of a clear,
specific, and measurable case definition is an essential step in descriptive
epidemiology. Without such a description, the reader cannot interpret
the report. Generally, stringent criteria for case definitions are desirable.
In the early years of the HIV/AIDS epidemic, expanding the case definition
of AIDS yielded a sudden surge in new cases.
Why did the condition or disease arise? Descriptive studies often provide
clues about cause that can be pursued with more sophisticated research
designs.
When is the condition common or rare? Time provides
important clues about health events. The prototype might be the
outbreak of gastroenteritis soon after ingestion of staphylococcal toxin.
Some temporal relations can be long: for example, vaginal adenosis and clear cell
carcinoma of the vagina appeared years after intrauterine exposure
to diethylstilboestrol. Furthermore, cervical and other epithelial cancers
develop decades after infection with human papillomavirus, and births
and deaths from pneumonia and influenza have regular seasonal patterns,
as might sperm counts.
1.5 Types of Descriptive Epidemiological Studies
Descriptive studies consist of two major groups: those that deal with
individuals and those that relate to populations. Studies that involve
individuals are case reports, case-series reports, cross-sectional studies,
and surveillance; whereas ecological correlation studies examine
populations.
1.5.1 Case Report
The case report is the least publishable unit in the medical literature.
Often, an observant clinician reports an unusual disease or association,
which prompts further investigations with more rigorous study designs.
For example, a clinician, among others, reported benign hepatocellular
adenomas, a rare tumour, in women who had taken oral contraceptives.
A large case-control study pursued this lead and confirmed a strong
association between long-term use of high-dose oral contraceptives and
this rare, but sometimes deadly, tumour. However, not all case reports
deal with serious health threats.
1.5.2 Case-Series Report
A case-series aggregates individual cases in one report. Sometimes the
appearance of several similar cases in a short period heralds an epidemic.
For example, a cluster of homosexual men in Los Angeles with a similar
clinical syndrome alerted the medical community to the AIDS epidemic
in North America. Whereas a report of a single unusual case might not
trigger further investigation, a case-series of several unusual cases (in
excess of what might be expected) adds to the concern. A convenient
feature of case-series reports is that they can constitute the case group
for a case-control study, which can then explore the causes of a disease.
1.5.3 Cross-sectional (Prevalence) Studies
Prevalence studies describe the health of populations. For example, in
Uganda, periodic surveys of the health status of the population are
done by the government, e.g., the Demographic and Health Survey and the
Uganda Population-based HIV Impact Survey (UPHIA). These studies
provide a snapshot of the population at a particular time.
Prevalence studies can be done in smaller populations as well. For
example, the results of a survey done in a Puerto Rican pharmaceutical
factory indicated an exceptionally high prevalence of gynaecomastia
among employees. This finding led to the hypothesis that exposure
to ambient oestrogen dust in the plant might be the cause; serum
concentrations of oestrogen lent support to the hypothesis. After
improvements in dust control in the factory, the epidemic disappeared.
Similar prevalence studies have linked gynaecomastia with feeding of
refugees and tainted food.
Since both exposure and outcome are ascertained at the same time (the
defining feature of a cross-sectional study), costs are small and loss to follow
up is not a problem. However, because exposure and outcome are identified
at one time point, the temporal sequence is often impossible to work out.
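Because a cross-sectional survey records exposure and outcome for each respondent at the same moment, prevalence is simply the proportion of respondents with the condition at that moment. The Python sketch below shows the calculation, overall and by exposure group. The `Respondent` record, the `prevalence` function and all the counts are hypothetical and serve only to illustrate the idea.

```python
from dataclasses import dataclass


@dataclass
class Respondent:
    exposed: bool        # exposure status at the time of the survey
    has_condition: bool  # outcome status at the same time


def prevalence(respondents):
    """Proportion of respondents who have the condition at survey time."""
    respondents = list(respondents)
    if not respondents:
        raise ValueError("no respondents in this group")
    return sum(r.has_condition for r in respondents) / len(respondents)


# Hypothetical survey of 200 people (invented numbers).
survey = (
    [Respondent(True, True)] * 10
    + [Respondent(True, False)] * 30
    + [Respondent(False, True)] * 8
    + [Respondent(False, False)] * 152
)

overall = prevalence(survey)
among_exposed = prevalence(r for r in survey if r.exposed)
among_unexposed = prevalence(r for r in survey if not r.exposed)

print(f"Overall prevalence: {overall:.1%}")
print(f"Prevalence among exposed: {among_exposed:.1%}")
print(f"Prevalence among unexposed: {among_unexposed:.1%}")
```

A higher prevalence among the exposed in such a survey shows an association only; the data cannot show whether the exposure came before the condition.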
1.5.4 Surveillance
Surveillance is another important type of descriptive study. Surveillance
can be thought of as watchfulness over a community. A more
formal definition is ‘the ongoing systematic collection, analysis and
interpretation of health data essential to the planning, implementation,
and evaluation of public health practice, closely integrated with the timely
dissemination of these data to those who need to know’. Prevention and
control of the problem are fundamental parts of the feedback loop.
Surveillance can be either active or passive. Passive surveillance relies
on data generally gathered through traditional channels, such as
death certificates. By contrast, active surveillance searches for cases.
The reporting of abortion-related deaths provides an example. By
comparison with official statistics, active surveillance identifies about
twice as many deaths. Similarly, underreporting of maternal deaths
remains an international problem.
Epidemiological surveillance has made important contributions
to health, but none more impressive than smallpox eradication.
Surveillance and containment were responsible for the elimination of
smallpox from the world, an extraordinary public-health achievement.
Whereas mass immunisation of the world’s population had failed,
the approach of identification of cases through surveillance and then
immunisation of susceptible persons in the surrounding communities
stopped transmission. Without a non-human vector, the virus died out.
Practical Session:
1. A new laboratory test has been designed to test for hepatitis B. It is
cheap ($0.50) and can easily be afforded by citizens in developing
countries like Uganda. However, it needs to be evaluated against
the Gold-Standard, the PCR test. Describe two epidemiological
parameters you will use to assess the performance of the new test.
2. List two uses of epidemiology and discuss how you can use
epidemiology to improve the health of children in your community.
Bibliography
1. Beaglehole R, Bonita R, Kjellström T. Basic epidemiology. Geneva:
World Health Organization; 1993 Jan.
2. Cameron D, Jones IG. ‘John Snow, the Broad Street pump and
modern epidemiology’. International Journal of Epidemiology. 1983
Jan 1;12(4):393-6.
3. Grimes DA, Schulz KF. ‘An overview of clinical research: the lay of
the land’. The Lancet. 2002 Jan 5;359(9300):57-61.
4. Grimes DA, Schulz KF. ‘Cohort studies: Marching towards
outcomes’. The Lancet. 2002 Jan 26;359(9304):341-45.
5. Last JM. ‘What is “clinical epidemiology”?’. Journal of Public Health
Policy. 1988 Jul 1;9(2):159-63.
6. Schulz KF, Grimes DA. ‘Descriptive studies: What they can and
cannot do’. The Lancet. 2002 Jan 12;359(9304):145-9.
7. Schulz KF, Grimes DA. ‘Case-Control studies: Research in Reverse’.
The Lancet. 2002 Feb 2;359(9304):431-34.
Lecture Notes-Series Two
Epidemiology and Research
Lecture Outline
1. Classification of research
2. What studies can and cannot do
3. Cross-Sectional Studies
4. Cohort Study
5. Case Control Study
6. Non-Randomised Trials
7. Randomised studies
8. Areas for further research
Expectations
After reading this Lecture Series, it is expected that you should clearly
understand the different types of research design, as well as what type of
research design is applied where and when. Through this learning, you
should master how to interpret research study findings and learn how
to rapidly identify key points and insights when reading study reports.
2.1 Classification of Research
Most research can be grouped into two broad categories: experimental
and observational research. Figure 2.1 shows that one can quickly decide
the type of research category by noting whether the investigators
assigned the exposure (e.g., treatments) or whether they observed
usual clinical practice or population behaviour and practices. For
experimental studies, one needs to distinguish whether the exposures
were assigned by a truly random technique (with concealment of the
upcoming assignment from those involved) or whether some other
allocation scheme was used, such as alternate assignment.
With observational studies, which dominate the literature, the next step
is to ascertain whether the study has a comparison or control group. If it
has a comparison or control group, the study is termed analytical. If not,
it is termed a descriptive study. If the study is analytical, the temporal
direction of the study needs to be identified.
If the study determines both exposures and outcomes at one time point,
it is termed cross-sectional. An example would be measurement of blood
pressure of men admitted to a hospital with acute onset chest pain versus
their next door neighbours. This type of study provides a snapshot of
the population of sick and well at one time point.
If the study begins with an exposure (e.g., condom use) and follows
men for a few years to measure outcomes, such as the occurrence of
sexually transmitted diseases (STDs), then it is deemed a cohort study.
Cohort studies can be either concurrent or non-concurrent.
Figure 2.1: Classification of types of clinical research
Did the investigator assign the exposure?
  Yes: Experimental study. Random allocation?
    Yes: Randomised controlled trial
    No: Non-randomised controlled trial
  No: Observational study. Comparison group?
    Yes: Analytical study. Direction?
      Exposure to outcome: Cohort study
      Outcome to exposure: Case-control study
      Exposure and outcome at the same time: Cross-sectional study
    No: Descriptive study
Source: Grimes & Schulz, 2002.
By contrast, if the analytical study begins with an outcome – e.g.,
prevalence of STDs – and looks back in time for an exposure, such as
condom use, then the study is a case-control study.
Studies without comparison groups are called descriptive studies. At the
bottom of the research hierarchy is the case report. When more than
one patient is described, it becomes a case–series report.
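The classification steps described in this section (and summarised in Figure 2.1) can be written as a short decision function. The Python sketch below is only an illustration of that logic; the function name, argument names and direction labels are assumptions made for this example and do not come from the source.

```python
from typing import Optional


def classify_study(
    assigned_exposure: bool,
    random_allocation: Optional[bool] = None,
    comparison_group: Optional[bool] = None,
    direction: Optional[str] = None,
) -> str:
    """Walk through the questions of Figure 2.1 and name the study design."""
    if assigned_exposure:
        # Experimental branch: the only remaining question is randomisation.
        if random_allocation is None:
            raise ValueError("random_allocation is required for experimental studies")
        if random_allocation:
            return "randomised controlled trial"
        return "non-randomised controlled trial"

    # Observational branch: is there a comparison (control) group?
    if comparison_group is None:
        raise ValueError("comparison_group is required for observational studies")
    if not comparison_group:
        return "descriptive study"

    # Analytical studies are separated by their temporal direction.
    designs = {
        "exposure_to_outcome": "cohort study",
        "outcome_to_exposure": "case-control study",
        "same_time": "cross-sectional study",
    }
    if direction not in designs:
        raise ValueError(f"direction must be one of {sorted(designs)}")
    return designs[direction]


# Example: men grouped by condom use and followed forward for STD outcomes.
print(classify_study(False, comparison_group=True, direction="exposure_to_outcome"))
```

Raising an error when a needed answer is missing mirrors the way a reader works through a report: the design cannot be named until each question along the chosen branch has been answered.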
2.2 What studies can and cannot do
Is the study design appropriate for the question?
Starting at the bottom of the research hierarchy, descriptive studies
are often the first foray into a new area of medicine. Investigators
do descriptive studies to describe the frequency, natural history, and
possible determinants of a condition. The results of these studies show
how many people develop a disease or a condition over time, describe
the characteristics of the disease and those affected, and generate
hypotheses about the cause of the disease. These hypotheses can be
assessed through more rigorous research, such as analytical studies or
randomised controlled trials. An example of a descriptive study would
be the early reports of hepatitis B disease and the syndrome of yellowing
of the eyes.
An important caveat (often forgotten or intentionally ignored) is that
descriptive studies, which do not have a comparison group, do not allow
assessment of associations. Only comparative studies (both analytical and
experimental) enable assessment of possible causal associations.
2.3 Cross-Sectional Studies
Sometimes termed frequency surveys or prevalence studies,
cross-sectional studies are done to examine the presence or absence of a
disease and the presence or absence of an exposure at a particular time.
Thus, prevalence is the focus. Since both
outcome and exposure are ascertained at the same time, the temporal
relation between the two might be unclear. For example, assume
that a cross-sectional study finds obesity to be more common among
women with arthritis compared to those without arthritis. Did the
extra weight load on joints lead to arthritis or did women with arthritis
become involuntarily inactive and then obese? This type of question is
unanswerable in a cross-sectional study.
2.4 Cohort Studies
Cohort studies proceed in a logical sequence: from exposure to outcome.
Hence, this type of research is easier to understand than case-control
studies. Investigators identify a group with an exposure of interest and
another group or groups without the exposure. The investigators then
follow the exposed and unexposed groups forward in time to determine
outcomes. If the exposed group develops a higher incidence of the
outcome than the unexposed, then the exposure is associated with an
increased risk of the outcome.
The cohort study has important strengths and weaknesses. Because
exposure is identified at the outset, one can assume that the exposure
preceded the outcome. Recall bias is less of a concern than in the
case-control study. The cohort study enables calculation of true incidence
rates, relative risks, and attributable risks. However, for the study of
rare events or events that take years to develop, this type of research
design can be slow to yield results and thus prohibitively expensive.
Nonetheless, several famous large cohort studies continue to provide
important information.
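The measures mentioned above follow directly from the counts a cohort study collects. The Python sketch below is a minimal illustration that uses cumulative incidence (risk) over the follow-up period; it does not handle person-time rates. The function name, the dictionary keys and all the numbers are hypothetical.

```python
def cohort_measures(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Compute risk in each group, relative risk and attributable risk.

    Risk here means cumulative incidence: new cases during follow-up
    divided by the number of people at risk at the start.
    """
    if exposed_total <= 0 or unexposed_total <= 0:
        raise ValueError("group sizes must be positive")
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    if risk_unexposed == 0:
        raise ValueError("relative risk is undefined when the unexposed risk is zero")
    return {
        "risk_exposed": risk_exposed,
        "risk_unexposed": risk_unexposed,
        # How many times more likely the outcome is among the exposed.
        "relative_risk": risk_exposed / risk_unexposed,
        # Excess risk among the exposed (risk difference).
        "attributable_risk": risk_exposed - risk_unexposed,
    }


# Hypothetical cohort: 1,000 exposed and 1,000 unexposed people followed up.
result = cohort_measures(
    exposed_cases=30, exposed_total=1000,
    unexposed_cases=10, unexposed_total=1000,
)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

These calculations are possible only because the cohort design supplies the denominators, that is, the number of people at risk in each group.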
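Figure 2.2: Temporal direction of three study designs
Cohort study: starts from the exposure and looks forward in time to the outcome.
Case-control study: starts from the outcome and looks back in time to the exposure.
Cross-sectional study: exposure and outcome are assessed at the same point in time.
Source: Grimes & Schulz, 2002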
2.5 Case Control Studies
Case-control studies work backwards. Because thinking in this direction
is not intuitive for clinicians, case-control studies are often
misunderstood. Starting with an outcome, such as a disease, this type of
study looks backward in time for exposures that might have caused the
outcome. As shown in figure 2.2, investigators define a group with an
outcome (for example, ovarian cancer) and a group without the outcome
(controls). Then, through chart reviews, interviews, or other means,
the investigators ascertain the prevalence (or amount) of exposure to a
risk factor (e.g., oral contraceptives or ovulation induction drugs) in both
groups. If the prevalence of the exposure is higher among cases than
among controls, then the exposure is associated with an increased risk
of the outcome.
Case-control studies are especially useful for outcomes that are rare
or that take a long time to develop, such as cardiovascular disease
and cancer. These studies often require less time, effort and money
than would cohort studies. The challenge with case-control studies
is choosing an appropriate control group. Controls should be similar
to cases in all important respects except for not having the outcome
in question. Inappropriate control groups have ruined many
case-control studies and caused much harm. Additionally, recall bias (better
recollection of exposures among the cases than among the controls)
is a persistent difficulty in studies that rely on memory. Because the
case-control study lacks denominators, investigators cannot calculate
incidence rates, relative risks or attributable risks. Instead, odds ratios
are the measure of association used; when the outcome is uncommon
(e.g., most cancers), the odds ratio provides a good proxy for the true
relative risk.
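The claim that the odds ratio approximates the relative risk when the outcome is uncommon can be checked with simple arithmetic. The Python sketch below uses two invented populations in which the full counts are known, so that both measures can be computed side by side; in a real case-control study only the odds ratio would be available. The function names and all counts are hypothetical.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio from a 2x2 table.

    a: exposed with outcome      b: exposed without outcome
    c: unexposed with outcome    d: unexposed without outcome
    """
    if b * c == 0:
        raise ValueError("odds ratio is undefined when b or c is zero")
    return (a * d) / (b * c)


def relative_risk(a, b, c, d):
    """Relative risk; needs full denominators, as in a cohort study."""
    if c == 0:
        raise ValueError("relative risk is undefined when the unexposed risk is zero")
    return (a / (a + b)) / (c / (c + d))


# Rare outcome: 1% of the exposed and 0.5% of the unexposed are affected.
rare = (100, 9900, 50, 9950)
# Common outcome: 60% of the exposed and 30% of the unexposed are affected.
common = (600, 400, 300, 700)

for label, table in (("rare", rare), ("common", common)):
    print(
        f"{label} outcome: odds ratio = {odds_ratio(*table):.2f}, "
        f"relative risk = {relative_risk(*table):.2f}"
    )
```

In both invented populations the relative risk is 2, but the odds ratio stays close to it only when the outcome is rare; with the common outcome it overstates the relative risk.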
Outbreaks of foodborne diseases act as a good prototype for
demonstrating the value of case-control studies. Those with vomiting
and diarrhoea are asked about food exposures, as is a sample of those
not ill. If a higher proportion of those ill report having eaten a food
than those well, the food becomes suspect. In this way, German potato
salad on a ship was linked with a serious outbreak of Shigella resistant to
several antibiotics.
2.6 Non-Randomised Trials
Some experimental trials do not randomly allocate participants to
exposures – e.g., treatments or prevention strategies. Instead of using
truly random techniques, investigators often use methods that fall
short of the mark – e.g., alternate assignment. The US Preventive
Services Task Force and Canadian Task Force on the Periodic Health
Examination designate this research design as class II-1, indicating less
scientific rigour than randomised trials but more than analytical studies.
After the investigators have assigned participants to treatment groups,
the way a non-randomised trial is done and analysed resembles that
of a cohort study. The exposed and unexposed are followed forward
in time to ascertain the frequency of outcomes. Advantages of a
non-randomised trial include use of a concurrent control group and uniform
ascertainment of outcomes for both groups. However, selection bias can
occur.
2.7 Randomised Controlled Trials
The randomised controlled trial is the only known way to avoid selection
and confounding biases in clinical research. This design approximates
the controlled experiment of basic science. It also resembles the cohort
study in several respects, with the important exception of randomisation
of participants to exposures (figure 2.2).
The hallmark of randomised controlled trials is assignment of
participants to exposures purely by the play of chance. Randomised
controlled trials reduce the likelihood of bias determining outcomes.
When properly implemented, random allocation precludes selection
bias. Trials feature uniform diagnostic criteria for outcomes and, often,
blinding of those involved to the exposure each participant is receiving,
both of which reduce information bias. A unique strength of this study design is that it
eliminates confounding bias, both known and unknown. Furthermore,
the trial tends to be statistically efficient. If properly designed and
done, a randomised controlled trial is likely to be free of bias and is
thus especially useful for examination of small or moderate effects. In
observational studies, bias might easily account for small to moderate
differences.
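Assignment "purely by the play of chance" can be illustrated with a few lines of Python. This is a sketch of simple randomisation only, under stated assumptions: the function name, arm labels and participant identifiers are hypothetical, and the sketch does not represent allocation concealment, blinding or balanced (block) designs that a real trial would need. The seed is used here only so that the illustration is reproducible.

```python
import random


def allocate(participant_ids, arms=("intervention", "control"), seed=None):
    """Assign each participant to an arm by simple randomisation.

    Every participant has the same chance of entering each arm, and
    neither the investigator nor the participant influences the choice.
    """
    rng = random.Random(seed)  # independent generator; seed only for repeatability
    return {pid: rng.choice(arms) for pid in participant_ids}


# Hypothetical participant identifiers.
participants = [f"P{i:03d}" for i in range(1, 21)]
allocation = allocate(participants, seed=2021)

for arm in ("intervention", "control"):
    members = [pid for pid, assigned in allocation.items() if assigned == arm]
    print(f"{arm}: {len(members)} participants")
```

Simple randomisation can leave the arms unequal in size in a small trial, which is one reason real trials often use other randomisation schemes.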
Randomised controlled trials have drawbacks as well, however. External
validity is one. Whereas the randomised controlled trial, if properly
done, has internal validity – i.e., it measures what it sets out to measure
– it might not have external validity. This term indicates the extent
to which results can be generalised to the broader community. Unlike
the observational study, the randomised controlled trial includes only
volunteers who pass through a screening process before inclusion.
Those who volunteer for trials tend to be different from those who do
not; for example, their health might be better. Another limitation is that
a randomised controlled trial cannot be used in some instances, since
intentional exposure to harmful substances – e.g., toxins, bacteria, or
other noxious exposures – would be unethical. As with cohort studies,
the randomised controlled trial can be prohibitively expensive. Indeed,
the cost of large trials runs into the tens of millions of US dollars.
Questions to stimulate further reading:
1. Uganda reported an outbreak of COVID-19 in March
2020. Several interventions have been implemented to stop
the spread of the disease. These include lockdowns, hand
washing and wearing masks. Other interventions have
been implemented at the level of health facilities to improve
treatment of COVID-19 patients.
a) What study design would you implement to assess the level of
knowledge of the population on COVID-19 prevention, and
why?
b) What study design would you implement to study
perceptions, opinions and behavioural practices towards
COVID-19 prevention?
2. Pfizer has developed a vaccine against COVID-19. What study
design would you implement to evaluate the efficacy of the
vaccine?
3. Discuss the merits and limitations of the study designs in 1
and 2 above.
Practical Session:
1. A disease outbreak has been reported in Entebbe town. Five patients
with a dry cough, fever, sore throat, hoarse voice and loss of smell,
and a history of recent travel abroad, have reported to health units
around Entebbe town in the last
2 days. Three of the patients were having difficulty in breathing and
needed to be given oxygen at Entebbe Grade B hospital.
a) Discuss three steps in your immediate plan to investigate the disease outbreak.
b) What study design is appropriate at this moment?
c) What further investigations are you likely to carry out?
2. Two students, a master’s student and a PhD student, were designing
studies as part of their degree programs. The master’s student wanted
to find out whether young girls exposed to contraceptives were at less risk
of having unwanted pregnancies and abortions. The PhD student
wanted to assess the effect of long-term contraceptive use on the
risk of ovarian cancer.
a) Discuss, with examples, three types of study designs the master’s
student could use for the study.
b) Discuss, with examples, two study designs the PhD student
could use to test his/her hypothesis(es).
Bibliography
1. Beaglehole R, Bonita R, Kjellström T. Basic epidemiology. Geneva:
World Health Organization; 1993 Jan.
2. Grimes DA, Schulz KF. ‘An overview of clinical research: the lay of
the land’. The Lancet. 2002 Jan 5;359(9300):57-61.
3. Grimes DA, Schulz KF. ‘Bias and causal associations in observational
research’. The Lancet. 2002 Jan 19;359(9302):248-52.
4. Schulz KF, Grimes DA. ‘Descriptive studies: What they can and
cannot do’. The Lancet. 2002 Jan 12;359(9304):145-9.
Lecture Notes-Series Three
Analytical Epidemiology
Lecture Outline
1. Case Control Studies
2. Cohort Studies
Expectations
After reading this Lecture Series, it is expected that you should clearly
understand what analytical epidemiology is and the different types of
analytical studies. You will be introduced to a practical session that you
are encouraged to do. Through this learning, you should master how to
interpret research reports based on analytical epidemiology and how to
use these in public health and control of infections.
3.1 Case-Control Studies
Case-control studies contribute greatly to the research toolbox of
an epidemiologist. They embody the strengths and weaknesses of
observational epidemiology. Moreover, epidemiologists use them to
study a huge variety of associations.
The strength of case-control studies can be appreciated in early research
done by investigators hoping to understand the cause of AIDS.
Case-control studies identified risk groups – e.g., homosexual men, intravenous
drug users, and blood transfusion recipients – and risk factors – e.g.,
multiple sex partners, receptive anal intercourse in homosexual men,
and not using condoms. Based on such studies, blood banks restricted
high risk individuals from donating blood and educational programmes
began to promote safer behaviours. As a result of these precautions, the
rate of HIV-1 transmission was greatly reduced, even before the virus
had been identified.
By comparison with other study designs, case-control studies can yield
important findings in a relatively short time, and with relatively little
money and effort deployed. This apparently quick road to research
results entices many newly trained epidemiologists. However, case-control
studies tend to be more susceptible to biases compared to other
analytical, epidemiological designs. Rothman et al (2008) comment
that: ‘because it need not be extremely expensive nor time consuming
to conduct a case-control study, many studies have been conducted by
would-be investigators who lack even a rudimentary appreciation for
epidemiologic principles. Occasionally such haphazard research can
produce fruitful or even extremely important results, but often the
results are wrong because basic research principles have been violated.’
3.1.2 A Case-Control study design
Case-control study designs might seem easy to understand, but many
clinicians stumble over them. Because this type of study runs backwards
by comparison with most other studies, it often confuses researchers
and readers alike. In cohort studies, for example, study groups are
defined by exposure. In case-control studies, however, study groups are
defined by outcome. To study the association between smoking and
lung cancer, therefore people with lung cancer are enrolled to form the
case group, and people without lung cancer are identified as controls.
Researchers then look back in time to ascertain each person’s exposure
status (smoking history), hence the retrospective nature of this study
design. Investigators compare the frequency of smoking exposure in the
case group with that in the control group, and calculate a measure of
association.
[Figure 3.1: Classification of types of clinical research: case-control
study design. Samples of cases (with outcome) and controls (no outcome)
are drawn in the present from their respective populations; investigators
then look back in time to classify each person as exposed or not exposed.
Source: Schulz & Grimes, 2002]
Unlike cohort studies, case-control studies cannot yield incidence rates.
Instead, they provide an odds ratio, derived from the proportion of
individuals exposed in each of the case and control groups. When the
incidence of a particular outcome in the population of interest is
low (under 5% in both the exposed and unexposed groups usually suffices),
the odds ratio from a case-control study is a good estimate of relative risk.
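The rare-outcome condition can be checked numerically. The following is a minimal Python sketch and not part of the original notes; the function name and the risk values are hypothetical, chosen only for illustration. It holds the relative risk fixed at 2 and shows how the odds ratio moves away from it as the outcome becomes more common.

```python
def odds_ratio_from_risks(risk_exposed, risk_unexposed):
    """Odds ratio implied by two risks (probabilities of the outcome)."""
    odds_exposed = risk_exposed / (1 - risk_exposed)
    odds_unexposed = risk_unexposed / (1 - risk_unexposed)
    return odds_exposed / odds_unexposed


TRUE_RELATIVE_RISK = 2.0  # hypothetical value, for illustration only

# Hypothetical risks of the outcome in the unexposed group.
for risk_unexposed in (0.01, 0.05, 0.20, 0.40):
    risk_exposed = TRUE_RELATIVE_RISK * risk_unexposed
    implied_or = odds_ratio_from_risks(risk_exposed, risk_unexposed)
    print(f"risk in unexposed {risk_unexposed:.2f}: "
          f"relative risk = {TRUE_RELATIVE_RISK:.2f}, odds ratio = {implied_or:.2f}")
```

With an unexposed risk of 1% the odds ratio stays close to 2, whereas with an unexposed risk of 40% it rises to 6, which is why the approximation is trusted only for uncommon outcomes.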
3.1.3 Advantages and Disadvantages of case-control designs
Epidemiologists often tout case-control studies as the most efficient
design in terms of time, money, and effort. This assertion makes sense
when the incidence rate of an outcome is low, since in a cohort design
the researchers would have to follow up many individuals to identify
one with the outcome. Case-control studies are also efficient in the
investigation of diseases that have a long latency period – e.g., cancer –
in which instance a cohort study would involve many years of follow-up
before the outcome became evident.
However, many methodological issues affect the validity of the results of
case-control studies, and two factors – i.e., choosing a control group and
obtaining exposure history – can greatly affect a study’s vulnerability to bias.
Selection of case and control groups:
3.1.4 Case Group
All the cases from a population could theoretically be included as
participants in a case-control study. For practical reasons, however, only
a sample is frequently studied. Investigators should, therefore, state how
the sample was selected, providing a clear definition of the outcome being
studied including, for example, clinical symptoms, laboratory results,
and diagnostic methods used. Furthermore, researchers should detail
eligibility criteria used for selection, such as age range and location (clinic,
hospital, population-based). Finally, they should gather data, preferably
from incident (new) rather than prevalent (both old and new) cases:
because diagnostic patterns change over time, recent diagnoses are likely to
be more consistent than those obtained from different periods.
3.1.5 Control Group
Controls should be free of the disease (outcome) being studied, but
should be representative of those individuals who would have been
selected as cases had they developed the disease; that is, they should
come from the same population at risk of becoming cases.
Selection of controls must be independent of the exposure being
investigated. When investigators consider potential control groups,
they must anticipate all the potential biases that could arise, making this
task one of the hardest in epidemiology.
Suppose investigators selected individuals with myocardial infarction
from the cardiology ward of a large city hospital that serves the entire
state as cases, but identified people without infarction from the emergency
medicine ward that serves only the city. Unfortunately, the exposure history for
patients from the city would not usually accurately reflect that of patients
statewide. For example, the exposure of interest – e.g., a new blood
pressure drug – might not be available to patients in outlying areas of the
state but be commonly prescribed in the city. In this example, therefore,
either the controls should be chosen from the entire state, like the cases,
or the investigators should exclude all individuals who lived outside the
local community served by the emergency medicine ward. Moreover,
controls should be selected independent of exposure. Assume that this new
antihypertensive drug causes drowsiness and slows reaction time. Such side
effects might lead to automobile accidents, with injured drivers entering the
emergency medicine department. Thus, the investigator’s control group
would include an abnormally high proportion of individuals exposed to the
new antihypertensive, a biased comparison with the case group.
3.2 Cohort Studies
The term cohort has military, not medical, roots. A cohort was a 300-600
man unit in the Roman army; ten cohorts formed a legion. Thus a
cohort study consists of bands or groups of persons marching forward
in time from an exposure to one or more outcomes.
This analogy might be helpful, since cohort studies have confusing
synonyms: incidence, longitudinal, forward-looking, follow-up,
concurrent, and prospective study. Although the terminology can seem
daunting, the cohort study is easy for clinicians to understand, since it
flows in a logical direction (unlike the case-control study).
3.2.1 Data Collection
A cohort study follows up two or more groups from exposure to
outcome. In its simplest form, a cohort study compares the experiences
of a group exposed to some factor with another group not exposed to
the factor. If the former group has a higher or lower frequency of an
outcome than the unexposed, then an association between exposure and
outcome is evident.
The defining characteristic of all cohort studies is that they track people
forward in time from exposure to outcome. Researchers doing this kind
of study must, therefore, either go forward in time from the present or
go back in time to choose
their cohorts. Either way, a cohort study moves in the same direction,
although gathering data might not. For example, an investigator who
wants to study the epidemic of multiple births stemming from assisted
reproductive technologies could begin a cohort study now. Women
exposed to these technologies and a similar group who conceived
naturally, could be tracked forward through their pregnancies to
monitor the frequency of multiple births (a concurrent cohort study).
Alternatively, the investigator might use existing medical records and go
back in time several years to identify women exposed and not exposed
to these technologies. The investigator would then track them forward
through records to note the birth outcomes. Again, the study moves
from exposure to outcome, though the data collection occurred after the
fact (a retrospective cohort study).
3.2.2 Advantages of Cohort Studies
Cohort studies have many appealing features. They are the best way
to ascertain both the incidence and natural history of a disease. The
temporal sequence between putative cause and outcome is usually clear:
the exposed and unexposed can often be seen to be free of the outcome
at the onset. By contrast, this chicken-and-egg question often frustrates
cross-sectional and case-control studies. For example, in a case-control
study, patients with chronic widespread pain were more likely to have
mental illness than controls. Do mood and anxiety disorders
increase this risk, or do patients with chronic pain develop mood and
anxiety disorders as a result of their disorder?
Cohort studies are useful for investigating multiple outcomes that might
arise after a single exposure. An illustrative case would be cigarette
smoking (the exposure) and stroke, emphysema, oral cancer and heart
disease (the outcomes). Although assessment of many outcomes is
often cited as a positive attribute of cohort studies, this feature can be
abused. For example, testing the associations between exposure and
many outcomes, but only reporting the significant ones, represents
misleading science. Investigators should preferably have planned
primary and secondary associations to examine (sometimes called
hypothesis confirmation). Although investigators can look at other
outcomes (hypothesis generation), they should report the findings of
all examinations, not just significant ones, so that readers can correctly
interpret the results.
The cohort design is also useful in the study of rare exposures: a
researcher can often recruit people with uncommon exposures – e.g.,
to ionising radiation or chemicals – in the workplace. A hospital or
factory might provide a large number of individuals with the exposure
of interest, which would be rare in the general population. Since the
investigator does not assign exposure, no ethical concerns arise.
Cohort studies also reduce the risk of survivor bias. Diseases that are
rapidly fatal are difficult to study because of this factor. For example, a
hospital-based case-control study of the link between snow-shovelling
and myocardial infarction would miss all those who died in the driveway.
A cohort study would be a less biased (but more cumbersome) approach:
compare rates of myocardial infarction among those who shovel and
those who do not shovel. Finally, cohort studies allow calculation of
incidence rates, relative risks, and confidence intervals.
Other outcome measures in cohort studies include life table rates,
survival curves, and hazard ratios. By contrast, case-control studies
cannot provide incidence rates; at best, their odds ratios approximate
relative risks when the outcome is uncommon.
3.2.3 Disadvantages of Cohort Studies
Cohort studies also have important limitations. Selection bias is built
into cohort studies. For example, in a cohort study investigating effects
of jogging on cardiovascular disease, those who choose to jog probably
differ in other important ways (such as diet and smoking) from those
who do not exercise. In theory, both groups should be the same in all
important respects, except for the exposure. The cohort design is not
optimum for
rare diseases or those that take a long time to develop – e.g., cancer.
However, several large (and thus expensive) cohort studies have made
landmark contributions to our knowledge of uncommon diseases. Loss
to follow-up can be a problem, even at 1 month, and particularly so with
longitudinal studies that continue for decades. Furthermore, differential
losses to follow-up between those exposed and unexposed can bias
results. Over time, the exposure status of study participants can change;
for example, a proportion of women who use oral contraceptives
will switch to an intrauterine device, and vice versa. In such events,
partitioning follow-up time by exposure might be needed to avoid a
blurring of exposure, sometimes
termed contamination.
3.2.4 What to Look for with Cohort Studies?
Who is at risk?
All participants (both exposed and unexposed) in a cohort study must be
at risk of developing the outcome. For example, since women who have
had a tubal sterilisation operation have almost no risk of salpingitis, they
should not be included in cohort studies of pelvic inflammatory disease.
Who is exposed?
Cohort studies need a clear, unambiguous definition of the exposed in the
cohort. This definition sometimes involves quantifying the exposure by
degree, rather than just yes or no. For example, the minimum exposure
might have to be 14 cigarettes per day or 3-6 months of oral
contraceptive use. Definition of exposure levels in this way can result in
more than two groups, e.g., non-smokers, light smokers and heavy
smokers.
Who is an appropriate control?
The key notion is that controls (the unexposed) should be similar to the
exposed in all the important respects, except for the lack of exposure. If
so, the unexposed group will reveal the background rate of the outcome
in the community.
The unexposed group can come from either internal (persons from the
same time and place, such as a hospital ward) or external sources. Internal
comparisons are most desirable. In a particular population, individuals
segregate themselves (or are segregated through medical interventions) into
exposure status – e.g., cigarette smoking, occupation, contraception. For
example, in a cohort study, 138 patients with HIV-1 associated Kaposi’s
sarcoma were divided into two groups: those with oral and those with
cutaneous lesions. The presence of oral lesions (the exposure) had a
poorer prognosis, with a median survival (the outcome) one-third that
of the other group.
If satisfactory internal controls are not available, researchers look
elsewhere (sometimes termed a double cohort study). In a study of an
occupational exposure, finding an adequate number of employees in
the factory without the exposure might be difficult. Hence, one might
choose workers in a similar factory in the same community. This choice
assumes that workers in the other factory have the same baseline risk
of the outcome in question, which might not be the case. Even less
desirable is use of population norms: disease-specific mortality rates are
an example. A researcher might compare lung cancer death rates among
workers in the factory with rates of persons of the same age and sex in the
population. Bias inevitably creeps into such comparisons because of the
healthy worker effect: those who work are healthier, in general, than those
who do not (or cannot) work. Additionally, the desire to reap economic
benefits from a certain outcome might further bias comparisons.
Assessment of outcomes
Outcomes must be defined in advance: they should be clear, specific
and measurable. Identification of outcomes should be comparable in
every way for the exposed and unexposed to avoid bias. Failure to define
outcomes leads to uninterpretable results. Keeping those who judge
outcomes unaware of the exposure status of participants (blinding) in a
cohort study is important for subjective outcomes, such as tenderness or
erythema. By contrast, blinding matters less with objective outcome
measures, such as death.
Outcome data can come from many sources. For mortality studies,
death certificates are often used. Although convenient, the validity of
the clinical information is highly variable. For non-fatal outcomes,
sources include hospital charts, insurance records, laboratory records,
disease registries, hospital discharge logs, and physical examination
and measurement of participants. Optimally, the person who judges
outcomes should be unaware of the exposure. When diagnoses vary
in their confidence, assignment of levels of assurance might be helpful,
such as definite, probable, and suspect.
3.2.5 Tracking participants over time
How to minimise loss to follow-up
Although loss of participants damages the power and precision of a study,
differential loss to follow-up is more problematic. If the likelihood of
loss to follow-up is related both to exposure and outcomes then bias can
result. For example, some participants given a new antibiotic might have
such poor outcomes that they are unable to complete questionnaires or
to return for examination. Their disappearance from the cohort would
make the new antibiotic look better than it is.
The best way of dealing with loss to follow-up is to avoid it. For example,
restrict participation to only those judged likely to complete the study.
Obtaining the names of several family members or friends who do not
live with the respondent is often helpful at the start of such studies. The
participants’ family doctor might also be helpful. Should the respondent
move, these contacts would probably know their new address. Motor
vehicle registration records can be useful in such instances too.
Furthermore, national vital statistics registries facilitate follow up.
Participants can be offered financial compensation for their time lost
from work as a result of the study. Diligent tracking of participants is
hard work, and might require hiring personnel for this task alone.
3.2.6 Reporting Cohort Studies
Many researchers who conduct cohort studies report their findings in
an unsatisfactory way. An investigator’s first challenge is to convince the
editor (and the readers) that the exposed and unexposed groups were indeed
similar in all important respects, except for the exposure. The first table
in reports of cohort studies customarily provides demographic and other
prognostic factors for both groups with hypothesis testing (P values) to
show the likelihood that observed differences could be due to chance.
For dichotomous outcome measures, such as being sick or feeling well,
the investigator should provide raw data sufficient for the reader to
confirm the results. For cumulative incidence, the investigator should
calculate the proportion who developed the outcome during the specified
study interval. For incidence rates, the value is expressed per unit of time.
Then relative risks and confidence intervals should be provided. P
values should not replace interval estimation (relative risk with confidence
intervals), and should only be used as supplementary information.
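The two frequency measures named above can be sketched in a few lines of Python. This is an illustrative sketch only: the cohort figures are hypothetical, the function names are not from the notes, and person-years are assumed as the unit of time for the incidence rate.

```python
def cumulative_incidence(new_cases, at_risk_at_start):
    """Proportion of those at risk who developed the outcome in the interval."""
    return new_cases / at_risk_at_start


def incidence_rate(new_cases, person_time):
    """New cases per unit of person-time at risk."""
    return new_cases / person_time


# Hypothetical cohort: 1,000 people at risk, 30 new cases,
# 2,850 person-years of follow-up in total.
ci = cumulative_incidence(30, 1000)
ir = incidence_rate(30, 2850)

print(f"Cumulative incidence: {ci:.1%} over the study interval")
print(f"Incidence rate: {ir * 1000:.1f} per 1,000 person-years")
```

Reporting the raw counts alongside both measures lets the reader confirm the results, as the text recommends.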
Like other observational studies, cohort studies have built-in biases.
Investigators should identify potential biases in their data and show
how these might have affected results. Whenever possible, confounding
factors should be discussed in detail.
3.3 Controlling for Confounding
Case-control studies need to address confounding bias. This type of
bias can be dealt with in the design phase by restriction or matching,
but researchers generally prefer to handle it in the analysis phase
with analytical techniques such as logistic regression or stratification
with Mantel–Haenszel approaches. If this second approach is used,
investigators should plan carefully in advance what potentially
confounding variables to obtain data for; irrespective of the analytical
approach used, researchers cannot control for the variable for which
they have no data.
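Stratification with a Mantel–Haenszel summary can be illustrated as follows. This is a minimal Python sketch under stated assumptions: the two strata and their counts are hypothetical, the function names are not from the notes, and each stratum is written as (a, b, c, d) = (exposed cases, exposed controls, unexposed cases, unexposed controls). The pooled estimate uses the standard Mantel–Haenszel form, the sum of ad/n divided by the sum of bc/n across strata.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for one 2 x 2 table: (a/b) / (c/d) = ad/bc."""
    return (a * d) / (b * c)


def mantel_haenszel_or(strata):
    """Mantel-Haenszel pooled odds ratio across strata of (a, b, c, d)."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


# Hypothetical data, stratified by a suspected confounder (e.g. age group).
strata = [
    (5, 45, 10, 90),   # stratum 1: stratum-specific odds ratio = 1.0
    (40, 20, 20, 10),  # stratum 2: stratum-specific odds ratio = 1.0
]

# Crude table: add the strata together cell by cell.
crude = odds_ratio(*[sum(cells) for cells in zip(*strata)])
adjusted = mantel_haenszel_or(strata)

print(f"Crude odds ratio: {crude:.2f}")
print(f"Mantel-Haenszel adjusted odds ratio: {adjusted:.2f}")
```

In this made-up example the crude odds ratio is about 2.3, while the adjusted odds ratio is 1.0: the apparent association comes entirely from the stratifying variable. In practice a statistical package would also supply a confidence interval.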
Calculating Odds Ratio and Relative risk
What is an Odds Ratio?
An odds ratio is defined as the (odds of the event in the exposed group)
divided by the (odds of the event in the non-exposed group). If the data
are set up in a 2 x 2 table as shown in table 3.3, page 33, with a and b
the exposed with and without the event, and c and d the non-exposed with
and without the event, then the odds ratio
is (a/b) / (c/d) = ad/bc.
Odds ratios are commonly used to report case-control studies. The odds
ratio helps identify how likely an exposure is to lead to a specific event.
The larger the odds ratio, the higher the odds that the event will occur with
exposure. Odds ratios smaller than one imply the event has lower odds
of happening with the exposure.
The following is an example to demonstrate calculation of the odds ratio
(OR):
Practical Session
If we have a hypothetical group of smokers (exposed) and non-smokers
(not exposed), then we can look for the rate of lung cancer (event). If
17 smokers have lung cancer, 83 smokers do not have lung cancer, one
non-smoker has lung cancer, and 99 non-smokers do not have lung
cancer, the odds ratio is calculated as follows.
First, we calculate the odds in the exposed group.
• Odds in exposed group = (smokers with lung cancer) / (smokers
without lung cancer) = 17/83 = 0.205
Next, we calculate the odds for the non-exposed group.
• Odds in not exposed group = (non-smokers with lung cancer) /
(non-smokers without lung cancer) = 1/99 = 0.0101
Finally, we can calculate the odds ratio.
• Odds ratio = ad/bc (Table 3.2) = (odds in exposed group) / (odds in
not exposed group) = 0.205 / 0.0101 = 20.3
Thus, using the odds ratio, this hypothetical group of smokers has
about 20 times the odds of having lung cancer compared with non-smokers.
The question then arises: is this significant?
Table 3.2: Risk of smoking and lung cancer
Exposure factor    Cases      Controls    Total
Smokers            a = 17     b = 83      100
Non-smokers        c = 1      d = 99      100
Total              18         182         200
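The calculation from Table 3.2 can be reproduced with a short Python sketch. The function name is illustrative and not from the notes; the cell labels a, b, c and d follow the table.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio from a 2 x 2 table: (a/b) / (c/d) = ad/bc.

    a, b: exposed with and without the event
    c, d: non-exposed with and without the event
    """
    return (a * d) / (b * c)


# Cell counts from Table 3.2 (smokers and non-smokers).
smoking_or = odds_ratio(a=17, b=83, c=1, d=99)
print(f"Odds ratio = {smoking_or:.1f}")
```

Working from the unrounded cell counts gives an odds ratio of about 20.3; rounding the intermediate odds before dividing can shift the last digit. The function fails with a division by zero if b or c is 0, so sparse tables need separate handling.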
Odds Ratio and Confidence Intervals
To examine whether this finding is significant, the confidence interval
needs to be calculated. The confidence interval gives an expected range
for the true odds ratio for the population to fall within. If estimating
the odds of lung cancer in smokers versus non-smokers of the general
population based on a smaller sample, the true population odds ratio
may be different than the odds ratio found in the sample. In order to
calculate the confidence interval, the alpha, or the level of significance,
is specified. An alpha of 0.05 gives a 95% (1 – alpha) confidence
interval: if the study were repeated many times, about 95% of the
intervals so calculated would contain the true odds ratio of the overall
population. A
confidence level of 95% is traditionally chosen in the medical literature
(but other confidence levels can be used). The formulae for calculating
confidence intervals are complex, and the calculations are usually done
with readily
available statistical computer packages.
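As an illustration of what such packages do, the sketch below applies one common large-sample approximation, the log (Woolf) method, to the smoking example. It is an assumption of this sketch that this method is used; the notes do not name a formula, and with a cell count as small as 1 an exact method from a statistical package would normally be preferred. The function name and the default z = 1.96 (for a 95% interval) are illustrative.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with an approximate confidence interval (log/Woolf method).

    The standard error of ln(OR) is sqrt(1/a + 1/b + 1/c + 1/d);
    z = 1.96 corresponds to a 95% confidence level.
    """
    estimate = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - z * se_log_or)
    upper = math.exp(math.log(estimate) + z * se_log_or)
    return estimate, lower, upper


# Smoking example: a = 17, b = 83, c = 1, d = 99.
estimate, lower, upper = odds_ratio_ci(17, 83, 1, 99)
print(f"OR = {estimate:.1f}, 95% CI {lower:.1f} to {upper:.1f}")
```

For these counts the approximate interval runs from roughly 2.6 to roughly 156. It is very wide because only one non-smoker had lung cancer, but it does not include 1.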
If the confidence interval for the odds ratio includes the number 1, then
the calculated odds ratio would not be considered statistically significant.
This can be seen from the interpretation of the odds ratio. An odds ratio
greater than 1 implies there are greater odds of the event happening in
the exposed versus the non-exposed group. An odds ratio of less than 1
implies the odds of the event happening in the exposed group are less
than in the non-exposed group. An odds ratio of exactly 1 means
the odds of the event happening are the same in the exposed and
the non-exposed groups. Thus, if the confidence interval includes 1 (e.g.,
[0.01-2], [0.99-1.01], or [0.99-100] all include 1),
then the expected true population odds ratio may be above or
below 1, so it is uncertain whether the exposure increases or decreases
the odds of the event happening at the specified level of confidence.
The odds ratio can be confused with relative risk. As stated above, the
odds ratio is a ratio of 2 odds. As odds of an event are always positive,
the odds ratio is always positive and ranges from zero to very large.
What is a relative risk?
This is a ratio of probabilities of the event occurring in all exposed
individuals versus the event occurring in all non-exposed individuals.
The relative risk is calculated thus: {a / (a+b)} / {c / (c+d)}.
If the disease condition (event) is rare, then the odds ratio and relative
risk may be comparable, but the odds ratio will overestimate the risk if
the disease is more common.
In such cases, the odds ratio should be avoided, and the relative risk
will be a more accurate estimation of risk. Commonly, odds ratios will
be reported in case-control studies, in which relative risks cannot be
calculated.
The relative risk for the above hypothetical example of smokers versus
non-smokers developing lung cancer is calculated as:
Relative Risk = (17/100) / (1/100) = 0.17 / 0.01 = 17
Thus in this example, the relative risk is 17: smokers are 17 times as
likely to have lung cancer as non-smokers.
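The relative risk formula can be checked against the odds ratio for the same table with a short Python sketch. The function names are illustrative and not from the notes.

```python
def relative_risk(a, b, c, d):
    """Relative risk from a 2 x 2 table: {a / (a + b)} / {c / (c + d)}."""
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a, b, c, d):
    """Odds ratio from the same table: ad/bc."""
    return (a * d) / (b * c)


# Smoking example: a = 17, b = 83, c = 1, d = 99.
rr = relative_risk(17, 83, 1, 99)
smoking_or = odds_ratio(17, 83, 1, 99)
print(f"Relative risk = {rr:.1f}, odds ratio = {smoking_or:.1f}")
```

Here the odds ratio (about 20.3) is larger than the relative risk (17) because lung cancer is not rare in the exposed group of this hypothetical sample (17 of 100 smokers).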
Practical Session:
1. A researcher wanted to find out the effect of drinking boiled water
on the occurrence of diarrhoea. He visited a nearby health centre IV and
looked for children aged < 5 years who had developed diarrhoea and
visited the facility for treatment in the previous one year. He took
history of hygiene practices, including drinking boiled water, in an
equal number of children who visited the facility for other illnesses
or immunisation in the same period.
2. The results turned out as follows: Among 200 households that
did not boil water, 60 children had diarrhoea; while among the
comparison group of 200 households that boiled water only 20
children had diarrhoea.
a) Using a 2 by 2 table, calculate the risk of diarrhoea among children due to unboiled water.
b) How do you interpret this?
c) What is the prevalence of diarrhoea at the health facility?
d) What was the relative risk of getting diarrhoea in the exposed
group?
Questions to stimulate further reading:
1. Discuss the merits and limitations of case control and cohort studies
2. List the uses of cohort studies in epidemiology and discuss
how you can use the study design to improve the health of
your community.
Bibliography:
1. Beaglehole R, Bonita R, Kjellström T. Basic epidemiology. Geneva:
World Health Organization; 1993 Jan.
2. Bonita R, Beaglehole R, Kjellstrom T. Basic epidemiology: World
Health Organization. Geneva, Switzerland. 2006.
3. Grimes DA, Schulz KF. ‘Cohort studies: Marching towards
outcomes’. The Lancet. 2002 Jan 26;359(9303):341-45.
4. Rothman KJ, Greenland S, Lash TL. Case–control studies. Encyclopaedia
of Quantitative Risk Analysis and Assessment. 2008 Sep 15;1.
5. Schulz KF, Grimes DA. ‘Case-Control studies: Research in Reverse’.
The Lancet. 2002 Feb 2;359(9304):431-34.
6. Schulz KF, Grimes DA. ‘The Lancet handbook of essential concepts
in clinical research’. The Lancet; 2006.
7. Tenny S and Hoffman MR. ‘Odds ratio (OR).’ University of Nebraska
Medical Center and SIU School of Medicine, (2017).
Answers to the practical session
Table 3.3: Calculating Odds ratio and Relative Risk
Exposure                            Cases          Controls          Total
                                    (Diarrhoea)    (No diarrhoea)
Households with no boiled water     a = 60         b = 140           200
Households with boiled water        c = 20         d = 180           200
Total                               80             320               400
a) The risk of diarrhoea expressed as an odds ratio is defined as the odds
of getting diarrhoea in the exposed group divided by the odds of getting
diarrhoea in the
non-exposed group = (a/b)/(c/d) = ad/bc = (60*180)/(140*20) = 3.9
b) The odds of diarrhoea among children in households that don’t boil
drinking water are about 3.9 times those among children in households
that boil water.
c) The prevalence of diarrhoea at this facility is 80/400 = 20%
d) Relative risk = {a / (a+b)}/{c / (c+d)} = (60/200)/ (20/200) = 0.3/0.1=3
Lecture Notes-Series Four
Experimental Epidemiology
Lecture Outline
1. Experimental Epidemiology
2. Randomised Controlled Trials
3. Randomisation
4. Blinding
5. Field trials
6. Community trials
Expectations
After reading this Lecture Series, you should clearly
understand what experimental epidemiology is, the different types
of studies, and the techniques used to make the studies robust. You
will be introduced to a practical session that you are encouraged to do.
Through this learning, you should master how to interpret research
reports based on experimental epidemiology and how to use these in
public health and control of infections.
4.1 Experimental Epidemiology
Intervention or experimental epidemiology involves attempting to
change a variable in one or more groups of people. This could mean the
elimination of a dietary factor thought to cause allergy, or testing a new
treatment on a selected group of patients. The effects of an intervention
are measured by comparing the outcome in the experimental group with
that in a control group. Since the interventions are strictly administered
according to a protocol, ethical considerations are of paramount
importance in the design of these studies. For example, no patient
should be denied appropriate treatment as a result of participation in an
experiment, and the treatment being tested must be acceptable in light
of current knowledge.
Experimental epidemiology can take one of three forms:
• Randomised controlled trials
• Field trials
• Community trials
4.2 Randomised Controlled Trials
A randomised controlled trial (or randomised clinical trial) is an
epidemiological experiment to study a new preventive or therapeutic
regimen. Subjects in a population are randomly allocated to groups,
usually called treatment and control groups, and the results are assessed
by comparing the outcome in the two or more groups. The outcome of
interest will vary but may be better treatment with a drug, or improved
practices in a hospital setting.
Figure 1: Design of a Randomised Controlled Trial
[Flow diagram: Study population → selection by defined criteria →
potential participants (those who do not meet the selection criteria are
non-participants) → invitation to participate → participants →
randomisation → treatment and control groups]
To ensure that the groups being compared are equivalent, patients
are allocated to them randomly, i.e., by chance. Within the limits of
chance, randomisation ensures that control and treatment groups
will be comparable at the start of the investigation; any difference
between groups are chance occurrences unaffected by the conscious or
unconscious biases of the investigators.
The intervention under test may be a new drug or a new regimen, such as
a new drug to prevent malaria in pregnancy. All subjects in the trial must
meet the specified criteria for the condition under investigation, and
other criteria are usually specified to ensure a reasonably homogeneous
group of subjects, e.g., only patients with long standing or mild disease.
Randomised controlled trials have been helpful in assessing the value of
new therapies to combat diseases. For example, a trial using rice-based or
glucose-based oral rehydration solution involved 342 patients with acute
watery diarrhoea during an epidemic of cholera in Bangladesh in 1983
(Molla et al., 1985). The patients were randomly assigned to treatment
with either glucose-based or rice-based oral rehydration solution. The
study showed that the glucose component of oral rehydration solution
could be replaced by rice powder with improved results, as indicated
by decreases in mean stool output and intake of solution. Studies such
as this have important implications for the efficient use of health care
resources in developing countries. Glucose is a costly manufactured
product and is not always available in countries where diarrhoeal
diseases are a major problem.
The details of a randomised controlled trial of early discharge from
hospital after myocardial infarction are shown in figure 2 below. The
study suggests that, for carefully selected patients with uncomplicated
myocardial infarction, discharge after three days does not harm the
patient.
Fewer patients in the early discharge group were readmitted or had
subsequent problems than in the
late discharge group. However, only a small proportion of all myocardial
infarction patients were included in the study, and its power was thus
limited because of the small sample size.
Figure 2: Randomised Controlled Trial of Early Hospital Discharge after
Myocardial Infarction
Myocardial infarction patients
  Complicated: excluded (329)
  Uncomplicated (179)
    Not included in study (99)
    Randomised (80)
      Early discharge (40)
      Late discharge (40)
Outcomes                  Early discharge    Late discharge
Deaths                    0                  0
Hospital readmission      6                  10
Patients with angina      3                  8
Re-infarctions            0                  5
Source: Topol et al 1988
4.3 Randomisation
In order to attribute a difference in outcome between the two trial arms
to the new treatment being tested, the characteristics of people should
be similar between the groups.
• Random allocation of subjects produces groups that are as
  similar as possible with regard to all characteristics except the
  trial interventions.
• The only systematic difference between the two arms should
  be the treatment given.
• Therefore, any differences in results observed at the end of
  the trial should be due to the effect of the new treatment, and
  not to any other factor.
Randomisation is a process for allocating subjects between the different
trial interventions. Each subject has the same chance of being allocated
to any group, which ensures similarity in characteristics between the
arms. This minimises the effect of both known and unknown confounders, and thus
has a distinct advantage over observational studies in which statistical
adjustments can only be made for known confounders. Although
randomisation is designed to produce groups with similar characteristics,
there will always be small differences because of chance variation. Thus
randomisation cannot produce identical groups.
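The idea of simple randomisation can be illustrated with a short Python sketch. It is not taken from any trial: the 80 subjects, their ages, the arm labels and the seed values are all hypothetical, chosen only to show that random allocation gives groups that are similar, but not identical, on a characteristic such as age.

```python
import random
from statistics import mean

def randomise(subject_ids, arms=("new treatment", "control"), seed=2021):
    """Simple randomisation: every subject has the same chance of each arm."""
    rng = random.Random(seed)  # fixed seed so the allocation can be reproduced
    return {subject: rng.choice(arms) for subject in subject_ids}

# Hypothetical trial: 80 subjects with made-up ages between 30 and 70 years.
age_rng = random.Random(1)
ages = {subject: age_rng.randint(30, 70) for subject in range(1, 81)}

allocation = randomise(ages)

for arm in ("new treatment", "control"):
    arm_ages = [ages[s] for s in allocation if allocation[s] == arm]
    average = mean(arm_ages) if arm_ages else float("nan")
    print(f"{arm}: n = {len(arm_ages)}, mean age = {average:.1f}")
```

With simple randomisation the two arms will usually differ a little in size and in mean age, which is the chance variation described above. In practice the allocation list would be generated and held away from the recruiting researchers.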
Randomisation also minimises bias. If either the researcher or trial
subject is allowed to decide which intervention is allocated, then subjects
with a certain characteristic, for example, those who are younger or
suffering less severe disease, could be over represented in one of the
trial arms. This could produce a bias which makes the new intervention
look effective when it really is not, or overestimate the treatment effect.
Selection bias can occur if choosing a particular subject for the trial is
influenced by knowing the next treatment allocation.
Allocation bias involves giving the trial treatment that the clinician
or subject feels might be most beneficial. Sometimes, the researcher
has access to the randomisation list, from which the next allocation
can be seen, possibly creating allocation bias. This can be avoided if
randomisation is done through a central office (for example, a clinical
trials unit) or a computer system, so that the researcher has no control
over either process (called allocation concealment).
4.4 Blinding
The randomisation process minimises the potential for bias, but the
benefit could be greater if the trial intervention given to each subject
is concealed. Subjects or researchers may have expectations associated
with a particular treatment, and knowing which was given can create
bias. This can affect how people respond to treatment, and how the
researcher manages or assesses the subject. In subjects, this bias is
specifically referred to as the placebo effect. Humans have a remarkable
psychological ability to affect their own health status. The effect of any
of these biases could result in subjects receiving the new intervention
appearing to do better than those in the control group, when the
difference is not in fact due to the action of the new treatment.
Clinical trials are described as double-blind if neither the subject nor
anyone involved in giving the treatment, or managing or assessing the
subject, is aware of which treatment was given. In single-blind trials,
usually only the subject is blind to the treatment they have received.
A placebo has no known active component. It is often referred to as
a ‘sugar pill’ because many treatment trials involve swallowing tablets.
However, a placebo could also be a saline injection, a sham surgical
procedure, sham medical device or any other intervention that is meant
to resemble the test intervention, but has no known effect on the disease
of interest, and no adverse effect.
Using a placebo needs to be fully justified in any clinical trial. While there
are some arguments against placebos such as sham surgery, these trials
can provide valuable evidence on the effectiveness of a new intervention.
They can be conducted as long as there is ethical approval, and patients
are fully aware that they may be assigned to the sham group.
When it is not possible to conceal the trial interventions, an outcome
measure that does not depend on the personal opinion of the subject or
researcher is best. For example, in a trial evaluating hypnotherapy for
smoking cessation, a subjective measure would be to ask the subjects
if they stopped smoking at, say, 1 year. However, there could be some
continuing smokers who misreport their smoking status. An objective
endpoint would be to measure serum or urinary nicotine, as a marker of
current smoking status, because this is specific to tobacco smoke inhalation
and so less prone to bias than a questionnaire on self-reported habits.
Summary Points:
• Clinical trials are essential for evaluating new methods of disease
  detection, prevention and treatment.
• Clinical trials, especially when randomised, are considered to
  provide the strongest evidence.
• Randomisation minimises the effect of confounding and bias, and
  blinding further reduces the potential for bias.
4.5 Field Trials
Field trials, in contrast to clinical trials, involve people who are disease
free but presumed to be at risk. Data collection takes place ‘in the field‘,
usually among non-institutionalised people in the general population.
Since the subjects are disease free and the purpose is to prevent the
occurrence of diseases that may occur with relatively low frequency,
field trials are often huge undertakings involving major logistic and
financial considerations. For example, one of the largest field trials
ever undertaken was that of the Salk vaccine for the prevention of
poliomyelitis, which involved over one million children. Even a study of
the prevention of coronary heart disease in high-risk middle-aged males
involved screening 360,000 men to identify 12,866 men eligible for the
trial. In each of these two examples, randomisation was used to allocate
participants to various treatment groups.
The field trial method can be used to evaluate interventions aimed at
reducing exposure, without necessarily measuring the occurrence of
health effects. For instance, different protective methods for pesticide
exposure have been tested in this way and measurement of blood lead
levels in children has shown the protection provided by elimination
of lead paint in the home environment. Such intervention studies can
often be carried out on a small scale at low cost.
4.6 Community Trials
In this form of experiment, the treatment groups are communities
rather than individuals. This is particularly appropriate for diseases that
have their origins in social conditions, which in turn can most easily
be influenced by intervention directed at group behaviour as well as
at individuals. Cardiovascular disease is a good example of a condition
appropriate for community trials (Farquhar et al., 1977), several of
which are under way in this field (Salonen et al., 1986). A limitation of
such studies is that only a small number of communities can be included,
and random allocation of communities is not practicable: other methods
are required to ensure that any differences found at the end of the
study can be attributed to the intervention rather than to inherent
differences between communities. Furthermore, it is difficult to isolate
the communities where intervention is taking place from general social
changes that may be occurring. Consequently, this type of study may
underestimate the effect of intervention.
Table 4.1: Application of Different Observational Study Designs

                                    Ecological  Cross-      Case-     Cohort
                                                sectional   control
Investigation of rare disease       ++++        -           +++++     -
Investigation of rare cause         ++          -           -         +++++
Testing multiple effects of cause   ++          -           -         +++++
Study of multiple exposures and
determinants                        ++          ++          -         +++++
Measurements of time relationship   ++          -           +b        +++++
Direct measurement of incidence     -           -           +c        +++++
Investigation of long latent
periods                             -           -           +++       -

Key:
+ ... +++++   indicates the degree of suitability
-             Not suitable
b             If prospective
c             If population based

Source: Beaglehole R, Bonita R, Kjellström T. Basic epidemiology. Geneva:
World Health Organization; 1993 Jan.
Table 4.2: Advantages and disadvantages of different observational study
designs

                             Ecological  Cross-      Case-     Cohort
                                         sectional   control
Probability of recall bias   NA          Medium      High      Low
Selection bias               NA          High        High      Low
Loss to follow-up            NA          NA          Low       High
Confounding                  High        Medium      Medium    Low
Time required                Low         Medium      Medium    High
Cost                         Low         Medium      Medium    High

Source: Beaglehole, Bonita & Kjellström, 1993
Practical session:
1. A new drug has been developed to treat Hepatitis B. It is cheap ($3.00
per dose) and can easily be afforded by developing countries like
Uganda. However, it needs to be evaluated for its effectiveness.
Describe an epidemiological study design you will use to assess the
drug efficacy.
2. Discuss the importance of randomisation in clinical studies.
3. Discuss the term confounding, its importance and how it can be
overcome.
4. Discuss why recall and selection bias are high in case-control
studies.
Bibliography
1. Beaglehole R, Bonita R, Kjellström T. Basic epidemiology. Geneva:
World Health Organization; 1993 Jan.
2. Farquhar JW. The Stanford cardiovascular disease prevention
programs. Annals of the New York Academy of Sciences. Apr.1991
3. Molla AM, Ahmed SM, Greenough 3rd WB. Rice-based oral
rehydration solution decreases the stool volume in acute diarrhoea.
Bulletin of the World Health Organization. 1985;63(4):751.
4. Salonen JT, Salonen R, Seppänen K, Rauramaa R, Tuomilehto J.
HDL, HDL2, and HDL3 subfractions, and the risk of acute myocardial
infarction. A prospective population study in eastern Finnish men.
Circulation. 1991 Jul;84(1):129-39.
5. Schulz KF, Grimes DA. ‘Allocation concealment in randomised
trials: defending against deciphering’. The Lancet. 2002 Feb
16;359(9306):614-8.
6. Schulz KF, Grimes DA. ‘Blinding in randomised trials: hiding who
got what’. The Lancet. 2002 Feb 23;359:696-700.
7. Schulz KF, Grimes DA. ‘Generation of allocation sequences in
randomised trials: chance, not choice’. The Lancet. 2002 Feb 9;359:
515-19.
Lecture Notes-Series Five
Measurement and
Reporting of Outcomes
Lecture Outline
1. Measurement of Outcomes
2. Sample size calculations
3. Errors and bias
4. Reporting of outcomes
Expectations
After reading this Lecture Series, it is expected that you should clearly
know how to measure and report epidemiological outcomes. You will also
understand how to calculate the sample size of a study; and you will be
introduced to epidemiological errors and biases and how to overcome
them. Later, you will be introduced to a practical session that you are
encouraged to do. Through this learning, you should master how to
measure and report epidemiological outcomes and how to use these in
public health and control of infections.
5.1 Types of Outcomes
Outcome measures fall into two basic categories: counting people
and taking measurements on people. There is a special case of ‘taking
measurements’ that is based on time-to-event data. It is useful to
distinguish between them because it helps to define the trial objectives,
and methods of sample size calculation and statistical analysis. First, the
unit of interest is determined, usually a person. Second, consider what
will be done to the unit of interest. The outcome measure will involve
either counting how many people have a particular characteristic (i.e.,
put them into mutually exclusive groups, such as ‘dead’ or ‘alive’), or
taking measurements on them. In some situations, taking a measurement
on someone involves counting something, but the unit of interest is still
a person. Box 1 below shows examples of outcome measures.
Having measured the endpoint for each trial subject, it is necessary to
summarise the data in a form that can be readily communicated to others.
Box 1: Outcome Measures when the Unit of Interest is a Person:
Counting people (Binary or Categorical data)
Dead or alive
Admitted to hospital (Yes or No)
Suffered a first heart attack (Yes or No)
Recovered from disease (Yes or No)
Severity of disease (Mild, Moderate, Severe)
Ability to perform household duties (none, a little, some, moderate, high)
Taking measurements on people (continuous data)
Blood pressure
Body weight
Cholesterol level
Size of tumour
Counting People
This type of outcome measure is easily summarised by calculating the
percentage of the population. For example, the effect of flu vaccine can
be examined by counting how many developed flu in the vaccinated
group, and dividing this number by the total number of patients in
that group. This proportion (or percentage) is the risk, i.e., the risk
of developing flu if vaccinated. The same calculation is made in the
unvaccinated group, i.e., the risk of developing flu if not vaccinated.
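The calculation can be sketched in a few lines of Python. The counts below are hypothetical and are used only to show the arithmetic; they do not come from any study, and the function name `risk` is just an illustrative choice.

```python
def risk(cases, total):
    """Proportion of a group that developed the outcome."""
    if total <= 0:
        raise ValueError("total must be positive")
    return cases / total

# Hypothetical counts, for illustration only.
risk_vaccinated = risk(cases=20, total=1000)    # 20 of 1000 vaccinated people got flu
risk_unvaccinated = risk(cases=80, total=1000)  # 80 of 1000 unvaccinated people got flu

print(f"Risk of flu if vaccinated:     {risk_vaccinated:.1%}")
print(f"Risk of flu if not vaccinated: {risk_unvaccinated:.1%}")
```

Each risk is a simple proportion, so it always lies between 0 and 1 (or 0% and 100%).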
Taking Measurements on People:
Table 5.1: Measuring levels of cholesterol.
 3.6    3.8    3.9    4.1    4.2    4.5    4.5    4.8    5.1    5.3
 5.4    5.4    5.6    5.8    5.9    6.0    6.1    6.1    6.2    6.3
 6.4    6.5    6.6    6.8    6.9    7.1    7.2    7.2    7.3    7.4
 7.5    7.7    8.0    8.1    8.1    8.2    8.3    9.0    9.1   10.0
Table 5.1 above shows levels of cholesterol (mmol/L) for 40 healthy men,
all aged 45 years (ranked in order of size). These data are summarised
by two parameters: the ‘average’ level of cholesterol and a measure of
spread or variability. The average, often referred to as a measure of
central tendency, can be described by either the mean or median. It is
where the middle of the distribution lies. The mean is more commonly
reported and often taken to be the same as the average. Another measure
of average is the mode – the most frequently occurring value – but there
are few instances where this is the best summary measure.
The mean is the sum of all the values divided by the number of
observations. In the example above, the mean is 256/40 = 6.4 mmol/L.
The median is the value that has half the observations above it and half
below. In the example, it is halfway between the 20th and 21st value;
median = (6.3 + 6.4)/2 = 6.35 mmol/L.
One measure of spread is the standard deviation. It quantifies the amount
of variability in a group of people, i.e., how much the data spread about
the mean. It is calculated as:
\[ \text{SD} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} \]
where x_i is each data value, \bar{x} is the mean, and n is the number of
data values.
In the example, the standard deviation is 1.57 mmol/L: the cholesterol
levels differ from the mean value of 6.4 by, on average, 1.57 mmol/L.
Another measure of spread is the interquartile range. This is the
difference between the 25th centile (the value that has a quarter of the
data below it and three quarters above it) and the 75th centile (the value
that has three-quarters of the data below it and a quarter above it).
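The mean, median and standard deviation quoted for the 40 cholesterol values can be checked with Python's standard `statistics` module. This is a minimal sketch rather than a prescribed method; the variable name `cholesterol` is an illustrative choice, and `stdev` is used because it divides by (number of values – 1), as in the formula above.

```python
from statistics import mean, median, stdev

# The 40 cholesterol measurements (mmol/L) from Table 5.1, in ranked order.
cholesterol = [
    3.6, 3.8, 3.9, 4.1, 4.2, 4.5, 4.5, 4.8, 5.1, 5.3,
    5.4, 5.4, 5.6, 5.8, 5.9, 6.0, 6.1, 6.1, 6.2, 6.3,
    6.4, 6.5, 6.6, 6.8, 6.9, 7.1, 7.2, 7.2, 7.3, 7.4,
    7.5, 7.7, 8.0, 8.1, 8.1, 8.2, 8.3, 9.0, 9.1, 10.0,
]

average = mean(cholesterol)    # sum of the values divided by 40
middle = median(cholesterol)   # halfway between the 20th and 21st values
spread = stdev(cholesterol)    # sample standard deviation (divides by n - 1)

print(f"mean   = {average:.2f} mmol/L")
print(f"median = {middle:.2f} mmol/L")
print(f"SD     = {spread:.2f} mmol/L")
```

Run as written, this should reproduce the figures given in the text: a mean of 6.4, a median of 6.35 and a standard deviation of 1.57 mmol/L.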
Table 5.2: Measuring the interquartile range

Cholesterol (mmol/l)    Number of men    Percentage
3.0 – 3.9                     3              7.5
4.0 – 4.9                     5             12.5
5.0 – 5.9                     7             17.5
6.0 – 6.9                    10             25.0
7.0 – 7.9                     7             17.5
8.0 – 8.9                     5             12.5
9.0 – 9.9                     2              5.0
10.0 – 10.9                   1              2.5
TOTAL                        40            100.0

Source: Hackshaw, 2009
In the example, there are 40 observations so the 25th centile is between
the 10th and 11th data points (i.e. 5.32 mmol/L) and the 75th centile is
between the 30th and 31st data points (i.e. 7.47 mmol/L). Sometimes, the
actual 25th and 75th centile are presented instead of the interquartile range.
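The centiles and the grouped frequencies in Table 5.2 can be reproduced with a short Python sketch. It assumes the interpolation rule used by `statistics.quantiles` with its default settings, which places the 25th centile a quarter of the way between the 10th and 11th values; other software may use slightly different rules and give slightly different answers.

```python
from collections import Counter
from statistics import quantiles

# The 40 cholesterol measurements (mmol/L) from Table 5.1, in ranked order.
cholesterol = [
    3.6, 3.8, 3.9, 4.1, 4.2, 4.5, 4.5, 4.8, 5.1, 5.3,
    5.4, 5.4, 5.6, 5.8, 5.9, 6.0, 6.1, 6.1, 6.2, 6.3,
    6.4, 6.5, 6.6, 6.8, 6.9, 7.1, 7.2, 7.2, 7.3, 7.4,
    7.5, 7.7, 8.0, 8.1, 8.1, 8.2, 8.3, 9.0, 9.1, 10.0,
]

# 25th, 50th and 75th centiles (the three cut points that split the data in four).
q1, q2, q3 = quantiles(cholesterol, n=4)
iqr = q3 - q1
print(f"25th centile = {q1:.3f}, 75th centile = {q3:.3f}, IQR = {iqr:.2f} mmol/L")

# Frequency distribution in 1 mmol/L bands, as in Table 5.2.
bands = Counter(int(value) for value in cholesterol)
for lower in sorted(bands):
    count = bands[lower]
    percentage = 100 * count / len(cholesterol)
    print(f"{lower}.0 - {lower}.9   {count:2d}   {percentage:5.1f}%")
```

Under this rule the 25th and 75th centiles come out as 5.325 and 7.475 mmol/L, which the text quotes as 5.32 and 7.47.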
Deciding which measures of average and spread to use depends on whether
the distribution is symmetric or not. To help determine this, the data
are grouped into categories of cholesterol levels and the frequency
distribution is examined. The shape is reasonably symmetric, indicating
that the distribution is Gaussian or Normal (‘N’ is capitalised to avoid
confusion with the usual meaning of normal, which can indicate
people without disease). This is more easily visualised by drawing a
curve around the histogram, which is said to be bell-shaped.
When data are normally distributed, the mean and median are similar.
The preferred measures of average and spread are the mean and
standard deviation, because they have useful mathematical properties
which underlie many statistical methods used to analyse this type of
data. When the data are not Normally distributed, the median and
interquartile range are better measures. To understand why, consider
the outcome measure ‘number of days in hospital’ for 20 patients. It is
clear that the distribution is not symmetric. It is skewed to the right
(this is where the tail end of the data is). When the tail is on the left
and most of the data are towards the right, the distribution is said to
be skewed to the left.
The summary statistics that describe this data are:
Mean                = 17 days
Median              = 9 days
Standard deviation  = 19 days
Interquartile range = 8 days
The middle of the data, and spread, are better represented by the median
and interquartile range. The mean and standard deviation are heavily
influenced by a few very high values.
When data are skewed, it is sometimes possible to transform them, usually
by taking logarithms or the square root. Many biological measurements
only have a Normal (symmetric) distribution after the logarithm is
taken, so using the log of the values would produce a histogram that
has a similar shape to the figure. The mean is calculated using the log
of the values, and the result is back-transformed to the original scale.
For example, if the mean of the transformed values is 0.81, using log
to the base 10, the calculation 10^0.81 = 6.5 produces the mean value
on the original scale. This is called a geometric mean. Sometimes no
transformation is possible that will turn a skewed distribution into a
Normal one. In these situations, the median and interquartile range
should be used.
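The back-transformation can be sketched in Python as follows. The five data values are hypothetical and chosen only so that the arithmetic is easy to follow; `statistics.geometric_mean` is used as a cross-check on the log-and-back-transform steps described above.

```python
import math
from statistics import geometric_mean, mean

# Hypothetical right-skewed measurements, for illustration only.
values = [2.0, 4.0, 8.0, 16.0, 32.0]

log_values = [math.log10(v) for v in values]  # transform to the log scale
mean_log = mean(log_values)                   # mean of the transformed values
back_transformed = 10 ** mean_log             # back to the original scale

print(f"arithmetic mean       = {mean(values):.2f}")
print(f"mean of log10 values  = {mean_log:.3f}")
print(f"back-transformed mean = {back_transformed:.2f}")
print(f"geometric_mean check  = {geometric_mean(values):.2f}")

# The worked figure from the text: a mean of 0.81 on the log10 scale.
print(f"10 ** 0.81            = {10 ** 0.81:.1f}")
```

For skewed data like these, the geometric mean is smaller than the arithmetic mean, because it is less affected by the few very high values.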
A probability (or centile) plot can be used to determine whether data
are Normally distributed or not. Many statistical software packages can
provide this. An example can be seen above which includes the 40
cholesterol measurements. If the observations lie reasonably along a
straight line, the data are Normally distributed. Another simple check is
to examine whether the mean ± 2 x standard deviation produces sensible
numbers. In the hospital-stay example above, with a mean of 17 days and
a standard deviation of 19 days, it would be 17 ± (2 x 19), i.e. -21 to
55 days; the lower limit of -21 days is clearly implausible.
5.3 Time–To-Event Data
A specific category of ‘taking measurements on people’ involves
examining the time taken until an event has occurred, based on the
difference between two calendar dates. An event could be defined in
many ways, and one of the simplest and most commonly used is ‘death’,
hence the term survival analysis, which is applied to this type of data. In
the following seven subjects, the endpoint is time from randomisation
until death (in years), and all have died.
4.5    6.1    6.7    8.3    9.1    9.4    10.0
The mean (7.7 years) or median (8.3 years) are easily calculated. In
another group of nine subjects, not all have died at the time of statistical
analysis.
Time (years)    Status
2.7             dead
2.9             dead
3.3             alive
4.7             dead
5.1             alive
6.8             alive
7.2             dead
7.8             dead
9.1             alive
The mean or median cannot be calculated in the usual way until all the
subjects have died, which could take many years, and it is incorrect to
ignore those still alive because the summary measure would be biased
downward. An alternative is to obtain the survival rate at a fixed time
point: at, say, 3 years, the survival rate is 7/9 = 78%. This is simply
an example of ‘counting people’. However, every subject needs to be
followed up for at least 3 years, unless they died beforehand, and the
outcome (dead or alive) must be known at that point for all of them. In
many studies this is not possible, particularly with long follow up,
because contact is lost with some subjects.
In 1958 a statistical method was developed that changed the way this
type of data was displayed and analysed. In the example above, the
time-to-event variable is treated as ‘time from randomisation until death
or last known to be alive’ (instead of ‘time from randomisation until
death’), and there is another variable with the values 0 or 1 to indicate
‘still alive’ or ‘dead’. A subject who is still alive, or last known to be alive
at a certain date, is said to be censored. The two variables are used in
a life-table from which it is possible to construct a Kaplan-Meier plot.
This approach uses the last available information on every subject and
how long they lived for, or have been in the study. It is therefore less of
a concern if contact with some subjects was lost, because having the date
when they were last known to be alive still provides information. The
table below is based on the group of nine subjects. The plot looks like a
series of steps. Every time a subject dies, the step drops down (the first
drop is at 2.7 years). When subjects are censored (four in the example),
they contribute no further information to the analysis after that date. In
large studies with many deaths, the plot looks smoother.
It is possible to estimate survival rates at specific time points, and the
median survival. For the 5-year survival rate, a vertical line is drawn
from the X-axis at ‘5’ and the corresponding Y-axis value is taken where
the line hits the curve: 65%. The median is the time at which half the
subjects have died. A horizontal line is drawn from the Y-axis at ‘50%’ and
the corresponding X-axis value is taken where the line hits the curve: 7.2
years. These estimates are more accurately obtained from the life-table.
Table 5.3: Life Table of the survival data of nine patients

Time Since       Censored     Number Of    Percentage Alive
Randomisation    (0=Yes,      Patients     (Survival Rate %)
(Years)          1=Dead)      at Risk
0                -            9            100
2.7              1            9            89
2.9              1            8            78
3.3              0            7            78
4.7              1            6            65
5.1              0            5            65
6.8              0            4            65
7.2              1            3            43
7.8              1            2            22
9.1              0            1            22

Source: Hackshaw, 2009
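The survival percentages in Table 5.3 can be reproduced with a short Python sketch of the Kaplan-Meier calculation. This is a simplified illustration, not a full implementation: it assumes no two subjects share the same follow-up time, and the function and variable names are hypothetical. At each death, the running survival estimate is multiplied by (number at risk – 1) / (number at risk); censored subjects leave the risk set without changing the estimate.

```python
def kaplan_meier(times, died):
    """Return (time, number at risk, survival estimate) for each subject.

    Simplified sketch: assumes every follow-up time is distinct.
    """
    subjects = sorted(zip(times, died))
    n_at_risk = len(subjects)
    survival = 1.0
    rows = []
    for time, event in subjects:
        if event:  # a death lowers the curve; censoring does not
            survival *= (n_at_risk - 1) / n_at_risk
        rows.append((time, n_at_risk, survival))
        n_at_risk -= 1  # dead or censored, the subject leaves the risk set
    return rows

# The nine subjects from the text: years of follow-up and status (1 = dead, 0 = censored).
times = [2.7, 2.9, 3.3, 4.7, 5.1, 6.8, 7.2, 7.8, 9.1]
died = [1, 1, 0, 1, 0, 0, 1, 1, 0]

life_table = kaplan_meier(times, died)
for time, n_at_risk, survival in life_table:
    print(f"{time:4.1f} years   at risk = {n_at_risk}   alive = {survival:.0%}")

# 5-year survival: the estimate at the last follow-up time up to 5 years.
survival_5yr = [s for t, _, s in life_table if t <= 5][-1]
# Median survival: the first time the estimate falls to 50% or below.
median_survival = next(t for t, _, s in life_table if s <= 0.5)
print(f"5-year survival = {survival_5yr:.0%}, median survival = {median_survival} years")
```

The printed values should match the table: 89%, 78%, 65%, 43% and 22% after successive deaths, a 5-year survival of 65% and a median survival of 7.2 years.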
When some subjects are censored, i.e., not all have died, the
Kaplan-Meier median survival is not the same as finding the median from a
ranked list of numbers (as in the example). The two are only identical when
every subject has died, which is rare in trials. The median is used instead
of the mean, because time-to-event data often have a skewed distribution.
The Kaplan-Meier plot starts off with every subject alive at time zero;
this is the most common form in the literature. This type of plot is
useful when deaths tend to occur early on. However, it is possible to
have a plot in which no subject has died at time zero.
5.4 Different Types of Time-To-Event Outcome Measures
In the section above, the ’event’ in the time-to-event data is ‘death’,
called overall survival because it relates to death from any cause. The
methods can apply to an endpoint that involves measuring the time
until a specified event has occurred; for example, time from entry to
a trial until the occurrence or recurrence of a disorder, such as severe
exacerbation of asthma, or any change in health status, such as time until
hospital discharge. Overall survival is simple because it only requires
the date of death. Cause-specific survival requires, in addition, accurate
confirmation of cause of death (such as pathology records), which is not
always available or reliably recorded. Also, cause-specific survival means
that deaths from causes other than that of interest are not counted as
an event (they are censored). This may be inappropriate when the
treatment has serious side-effects. A new therapy may reduce the lung
cancer death rate but increase the risk of dying from treatment-related
side effects, for example, cardiovascular disease. Here, overall survival is
probably more appropriate.
When an event is disease incidence, recurrence or progression, the
date when this occurs is required. However, obtaining accurate dates
is difficult unless subjects are examined regularly. The date is usually
when the disease was first discovered. This is either the date when the
subject was due to have one of the regular examinations specified in
the trial protocol, or after the subject developed symptoms and received
clinical confirmation. Subjects in the trial arms should therefore have
their regular examinations at a similar time.
When the measure is based on two or more event types and a subject
could have both events, such as disease occurrence followed by death, it
is usual to consider only the date of the first event in the analysis. This
is because the patient may be managed differently afterwards: the trial
treatment changes or stops, non-trial therapies are given, or patients
may be given the treatment from the other trial arm. When this occurs,
it is difficult to deal with subsequent events, and to attribute
differences in the endpoint to the trial treatments. Unlike overall
survival, disease-, progression- or event-free survival are unaffected
by subsequent treatments because only the first event matters in the
analysis.
Box 3: Time-To-Event Outcome Measures in Trials
For each endpoint, an event is defined as follows. All other subjects are
censored.

Overall survival
    Event: Death from any cause
    Comments: Easily defined. May mask the effect of an intervention if
    it only affects a specific disease.

Disease-free survival
    Event: First recurrence of the disease; death from any cause
    Comments: Useful when patients are thought to be free from disease
    after treatment, so patients have a good prognosis. Need date of
    recurrence.

Event-free survival
    Event: First recurrence of disease; first occurrence of other
    specified diseases; death from any cause
    Comments: Similar to disease-free survival.

Progression-free survival
    Event: First sign of disease progression; death from any cause
    Comments: Useful for advanced disease, where patients have not been
    ‘cured’ after treatment, and are expected to get worse in the near
    future. Needs date of progression.

Disease (or cause)-specific survival
    Event: Death from the disease of interest
    Comments: Useful when examining interventions that are not expected
    to have an effect on any disease apart from the one of interest. Need
    accurate recording and confirmation of cause of death. Assumes
    treatment is not associated with death from other causes.

Time-to-treatment failure
    Event: First sign of disease progression; death from any cause;
    stopped treatment
    Comments: Similar to progression-free survival.

Source: Hackshaw, 2009

Recurrence: there was no clinical evidence of the disease shortly after
treatment, but the disease returned later on.
Progression (or relapse): the patient still had the disease after treatment,
but it got worse later. Disease-free and event-free survival may be used
interchangeably, so it is useful to be clear about the precise definition.
5.6 Measurement of Outcomes:
Identification and quantification of outcomes is the core consideration
of research. However, slippery terminology often complicates matters
for investigators and readers alike. For example, the term rate (as in
maternal mortality rate) has been misused in textbooks and journal
articles for decades. Additionally, rate is often used interchangeably
with proportion and ratio. Figure 3 presents a simple approach to
classification of these common terms.
Figure 3: Distinguishing Rates, Proportions, and Ratios

Ratio
    Is numerator included in denominator?
        No:  Measure: Ratio (example: maternal mortality ratio)
        Yes: Is time included in denominator?
            Yes: Measure: Rate (example: incidence rate)
            No:  Measure: Proportion (example: prevalence rate)

Source: Grimes & Schulz, 2002
A ratio is a value obtained by dividing one number by another. These
two numbers can be either related or unrelated. This feature – i.e.,
relatedness of numerator and denominator – divides ratios into two
groups: those in which the numerator is included in the denominator –
e.g., rate and proportion – and those in which it is not.
A rate measures the frequency of an event in a population. As shown
in figure 3, the numerator (those with the outcome) of a rate must be
contained in the denominator (those at risk of the outcome). Although
all ratios feature a numerator and denominator, rates have two
distinguishing characteristics: time and a multiplier. Rates indicate the
time during which the outcomes occur and a multiplier, commonly to
a base ten, to yield whole numbers. An example would be an incidence
rate, indicating the number of new cases of disease in a population at
risk over a defined interval of time - e.g., 11 cases of tuberculosis per
100,000 persons per year.
Proportion is often used synonymously with rate, but the former
does not have a time component. Like a rate, a proportion must have
the numerator contained in the denominator. Since the numerator
and denominator have the same units, those divide out, leaving a
dimensionless quantity; a number without units. An example of a
proportion is prevalence – e.g., 27 of 100 at risk have malaria. This
number indicates how many of a population who are at risk have a
condition at a particular time (here, 27%); since documentation of new
cases over time is not involved, prevalence is more properly considered
a proportion than a rate.
Although all rates and proportions are ratios, the opposite is not true.
In some ratios, the numerator is not included in the denominator.
Perhaps the most common example is the maternal mortality ratio.
The definition includes women who die of pregnancy-related causes
in the numerator and women with live births (usually per 100,000) in the
denominator. However, not all those in the numerator are included in
the denominator – e.g., a woman who dies of an ectopic pregnancy
cannot be in the denominator of women with live births. Thus, this is
actually a ratio, not a rate, a fact only recently appreciated.
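The three kinds of measure can be contrasted in a small Python sketch. The prevalence figures (27 of 100) come from the text; the incidence and maternal mortality counts are hypothetical, picked only to give round answers, and the helper name `per` is an illustrative choice.

```python
def per(numerator, denominator, multiplier=1):
    """Divide one number by another and scale by an optional multiplier."""
    return multiplier * numerator / denominator

# Rate: new cases over person-time at risk (hypothetical counts).
# 22 new cases of tuberculosis during 200,000 person-years of follow-up.
incidence_rate = per(22, 200_000, multiplier=100_000)

# Proportion: numerator is part of the denominator, no time component.
# 27 of 100 people at risk have malaria at a particular time (from the text).
prevalence = per(27, 100)

# Ratio: numerator is not fully contained in the denominator (hypothetical counts).
# 50 pregnancy-related deaths and 20,000 live births.
maternal_mortality_ratio = per(50, 20_000, multiplier=100_000)

print(f"Incidence rate: {incidence_rate:.0f} per 100,000 persons per year")
print(f"Prevalence: {prevalence:.0%}")
print(f"Maternal mortality ratio: {maternal_mortality_ratio:.0f} per 100,000 live births")
```

The arithmetic is the same division in all three cases; what differs is whether the numerator is contained in the denominator and whether time is part of the denominator.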
5.7 Measures of Association
Relative risk (also termed the risk ratio) is another useful ratio: the
frequency of outcome in the exposed group divided by the frequency of
outcome in the unexposed. If the frequency of the outcome is the same
in both groups, then the ratio is 1.0, indicating no association between
exposure and outcome. By contrast, if the frequency of the outcome is
higher in the exposed group, the ratio will be greater than 1.0,
implying an increased risk associated with exposure. Conversely, if the
frequency of disease is less among the exposed, then the relative risk will
be less than 1.0, implying a protective effect.
The odds ratio has different meanings in different settings. In case
control studies, this measure is the usual measure of association. It
indicates the odds of the exposure among the case group divided by the
odds of the exposure among controls. If cases and controls have equal
odds of having the exposure, the odds ratio is 1.0, indicating no effect. If
the cases have a higher odds of exposure than the controls, then the ratio
is greater than 1.0, implying an increased risk associated with exposure.
Similarly, odds ratios less than 1.0 indicate a protective effect.
An odds ratio can also be calculated for cross-sectional, cohort, and
randomised controlled studies. Here, the disease-odds ratio is the
ratio of the odds in favour of disease in the exposed versus that in the
unexposed. In this context, the odds ratio has some appealing statistical
features when studies are aggregated in meta-analyses, but the odds
ratio does not approximate the relative risk when the proportion with the
outcome is greater than 5-10%; i.e., the term has little clinical relevance
or meaning at higher incidence rates.
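The gap between the odds ratio and the relative risk at higher outcome frequencies can be seen numerically. The sketch below assumes a hypothetical exposure that always doubles the risk (relative risk of 2.0) and recomputes the disease-odds ratio at several baseline risks; the function name and the chosen risks are illustrative only.

```python
def risk_ratio_and_odds_ratio(risk_exposed, risk_unexposed):
    """Return (relative risk, disease-odds ratio) for two outcome risks."""
    rr = risk_exposed / risk_unexposed
    odds_exposed = risk_exposed / (1 - risk_exposed)
    odds_unexposed = risk_unexposed / (1 - risk_unexposed)
    return rr, odds_exposed / odds_unexposed

# Hypothetical exposure that doubles the risk at every baseline level.
results = {}
for baseline in (0.01, 0.05, 0.10, 0.30):
    results[baseline] = risk_ratio_and_odds_ratio(2 * baseline, baseline)
    rr, odds_ratio = results[baseline]
    print(f"baseline risk {baseline:.2f}: RR = {rr:.2f}, OR = {odds_ratio:.2f}")
```

With a rare outcome the two measures nearly coincide; as the outcome becomes common, the odds ratio moves further from the relative risk.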
The confidence interval reflects the precision of study results. The
interval provides a range of values for a variable, such as a proportion,
relative risk, or odds ratio, that has a specified probability of containing
the true value for the entire population from which the study sample
was taken. Although 95% CIs are the most commonly used, others such
as 90%, are seen (and advocated). The wider the confidence interval,
the less precision exists in the result, and vice versa. For relative risks
and odds ratios, when the 95% CI does not include 1.0, the difference is
significant at the usual 0.05 level.
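To illustrate how precision shows up in the width of the interval, the sketch below computes an approximate 95% CI for a proportion using the simple normal-approximation (Wald) formula. The choice of this formula is an assumption made for illustration (other interval methods exist), and the larger sample is hypothetical; the 27 of 100 figure is the malaria prevalence example given earlier.

```python
import math

def proportion_ci(cases, n, z=1.96):
    """Approximate (Wald) confidence interval for a proportion; z=1.96 gives 95%."""
    p = cases / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width

small_study = proportion_ci(27, 100)     # 27 of 100 at risk have malaria
large_study = proportion_ci(270, 1000)   # same prevalence, hypothetical larger sample

for label, (low, high) in (("n=100", small_study), ("n=1000", large_study)):
    print(f"{label}: {100 * low:.1f}% to {100 * high:.1f}%")
```

The point estimate is 27% in both cases, but the larger study gives a narrower interval, i.e., a more precise result.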
5.8 Sample Size Calculations
The desirable size of a proposed study can be assessed using standard
formulae. Information on the following variables is required before the
formula can be employed:
• Required level of statistical significance for the expected result
• Acceptable chance of missing a real effect
• Magnitude of the effect under investigation
• Amount of disease in the population
• Relative sizes of the groups being compared
In reality, sample size is often determined by logistic and financial
considerations, and a compromise always has to be made between
sample size and costs. A practical guide to determining size in health
studies has been published by the WHO (Lwanga & Lemeshow, 1991).
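As an illustration of how the variables listed above enter a calculation, the sketch below uses one widely taught formula for comparing two proportions with groups of equal size. It is not necessarily the formula used in the WHO manual; the function name and the example proportions are hypothetical. Here alpha is the significance level, 1 - power is the chance of missing a real effect, p1 - p2 is the magnitude of the effect, and p2 reflects the amount of disease in the comparison group.

```python
import math
from statistics import NormalDist

def n_per_group(p1, p2, alpha=0.05, power=0.80):
    """Approximate sample size per group for comparing two proportions
    (two-sided test, equal group sizes assumed)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)

# Hypothetical outcome frequencies in the two groups.
n_large_effect = n_per_group(0.30, 0.20)
n_small_effect = n_per_group(0.30, 0.25)
n_higher_power = n_per_group(0.30, 0.20, power=0.90)

print(n_large_effect, n_small_effect, n_higher_power)
```

A smaller expected effect, or a smaller acceptable chance of missing it, pushes the required sample size up.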
The precision of a study can also be improved by ensuring that the groups
are of appropriate relative size. This is often an issue of concern in case–
control studies when a decision is required on the number of controls
to be chosen for each case. It is not possible to be definitive about the
ideal ratio of controls to cases, since this depends on the relative costs of
accumulating cases and controls. If cases are scarce and controls plentiful, it
is appropriate to increase the ratio of controls to cases. In general, however,
there may be little point in having more than four controls for each case.
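The diminishing return from adding controls can be sketched with a commonly quoted approximation: with k controls per case, the statistical efficiency relative to having unlimited controls is roughly k/(k+1). This approximation is an assumption brought in for illustration and is not stated in the text above.

```python
def relative_efficiency(controls_per_case):
    """Approximate efficiency relative to an unlimited number of controls."""
    return controls_per_case / (controls_per_case + 1)

efficiencies = {k: relative_efficiency(k) for k in range(1, 7)}
for k, eff in efficiencies.items():
    print(f"{k} control(s) per case: about {100 * eff:.0f}% efficiency")
```

Under this approximation, moving from one to four controls per case raises efficiency from about 50% to about 80%, while each further control adds only a few percentage points.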
It is important to ensure that there is sufficient similarity between cases
and controls when the data are to be analysed by, for example, age group
or social class; if most cases and only a few controls were in the older age
groups, the study would be inefficient and a wasted effort.
Summary Points
• Trials should have clearly defined outcome measures (endpoints).
• Secondary endpoints should be closely correlated with ‘primary’
endpoints and have been validated, especially if they are used as the
main trial endpoint.
• Outcome measures could involve ‘counting people’, ‘taking
measurements on people’ or ‘time-to-event’ data.
• Counting people: data are summarised by a percentage or
proportions.
• Taking measurements on people: data are summarised by average
and spread (mean and standard deviation if the data are Normally
distributed, median and interquartile range if the data are skewed).
• Time-to-event data: when not all patients have had the event of
interest, the data can be summarised using a Kaplan-Meier plot,
median value, or survival or event rate at a specific time point.
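For time-to-event data, the survival probabilities behind a Kaplan-Meier plot can be computed by hand. The following is a minimal sketch of the product-limit calculation; the function name and the six follow-up records are hypothetical, and real analyses would normally use a statistical package.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each time an event occurred.
    events[i] is True if the event happened at times[i], False if censored."""
    survival = 1.0
    curve = []
    at_risk = len(times)
    for t in sorted(set(times)):
        deaths = sum(1 for ti, e in zip(times, events) if ti == t and e)
        leaving = sum(1 for ti in times if ti == t)  # events plus censored at t
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving
    return curve

# Hypothetical follow-up times in months; False marks a patient who was
# still event-free when last seen (censored).
times = [2, 3, 3, 5, 6, 8]
events = [True, True, False, True, False, True]
curve = kaplan_meier(times, events)

for month, prob in curve:
    print(f"month {month}: estimated survival {prob:.2f}")
```

Censored patients contribute to the number at risk up to the time they were last seen, which is how the method uses information from patients who have not yet had the event.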
5.9 Potential errors in Epidemiological Studies
An important purpose of most epidemiological investigations is to
measure accurately the occurrence of disease (or other outcome).
Epidemiological measurement is, however, not easy and there are many
possible sources of errors in measurement. Much attention is devoted
to minimising errors and, since they can never be completely eliminated,
assessing their importance. Error can be either random or systematic.
Random Error
Random error is the divergence, due to chance alone, of an observation
on a sample from the true population value. It may lead to a lack of
precision in the measurement of an association.
Random error can never be completely eliminated since we can study
only a sample of the population, individual variation always occurs and
no measurement is perfectly accurate. Random error can be reduced
by the careful measurement of exposure and outcome thus making
individual measurements as precise as possible. Sampling error occurs
as part of the process of selecting study participants who are always a
sample of a larger population, and the best way to reduce it is to increase
the size of the study.
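The effect of study size on sampling error can be demonstrated with a small simulation. This is an illustrative sketch only: the true prevalence, the sample sizes, the number of repeated samples and the random seed are all hypothetical choices.

```python
import random
import statistics

def spread_of_estimates(true_prevalence, sample_size, replicates=500, seed=1):
    """Draw repeated random samples and return the standard deviation of the
    prevalence estimates, a measure of sampling error."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(replicates):
        positives = sum(rng.random() < true_prevalence for _ in range(sample_size))
        estimates.append(positives / sample_size)
    return statistics.stdev(estimates)

spread_small = spread_of_estimates(0.27, 50)    # many small studies
spread_large = spread_of_estimates(0.27, 800)   # many large studies

print(f"n=50:  estimates vary with SD {spread_small:.3f}")
print(f"n=800: estimates vary with SD {spread_large:.3f}")
```

The estimates from the larger samples cluster much more tightly around the true value, although the scatter never disappears entirely.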
Systematic Error
Systematic error (or bias) occurs in epidemiology when there is a
tendency to produce results that differ in a systematic manner from the
true values. A study with a small systematic error is said to have a high
accuracy.
Systematic error is a particular hazard because epidemiologists usually
have no control over participants in studies, unlike the situation in
laboratory experiments. Furthermore, it is often difficult to obtain
representative samples of source populations. Some variables of interest
in epidemiology are particularly difficult to measure, among them
personality type, alcohol consumption habits, and past exposures to
rapidly changing environmental conditions, and this difficulty may lead
to systematic error.
The possible sources of systematic error in epidemiology are many and
varied; indeed, over 30 specific types of bias have been identified. The
principal biases are:
• Selection bias
• Measurement (or classification) bias
Selection Bias
Selection bias occurs when there is a systematic difference between the
characteristics of the people selected for a study and the characteristics
of those who are not. An obvious source of selection bias occurs when
participants select themselves for a study, either because they are unwell
or because they are particularly worried about an exposure. It is well
known, for example, that people who respond to an invitation to
participate in a study on the effects of smoking differ in their smoking
habits from non-responders; the latter are usually heavier smokers. In
studies of children’s health, where parental cooperation is required,
selection bias may also occur. In a cohort study of new-born children
(Victoria et al., 1987), the proportion successfully followed up for 12
months varied according to income level of the parents. If individuals
entering or remaining in a study display different associations from
those who do not, a biased estimate of the association between exposure
and outcome is produced.
An important selection bias is introduced when the disease or factor
under investigation itself makes people unavailable for study. For
example, in a factory where workers are exposed to formaldehyde,
those who suffer most from eye irritation are likely to leave their jobs at
their own request or after medical advice. The remaining workers are
less affected and in a prevalence study conducted in the workplace, the
association between formaldehyde exposure and eye irritation may be
very misleading.
Measurement Bias
Measurement bias occurs when the individual measurement or
classification of disease or exposure is inaccurate (i.e., they don’t
measure correctly what they are supposed to measure). There are
many sources of measurement bias and their effects are of varying
importance. For instance, biochemical or physiological measurements
are never completely accurate and different laboratories often produce
different results on the same specimen. If the specimens of the exposed
and control groups are analysed randomly by different laboratories with
insufficient joint quality assurance procedures, the errors will be random
and less potentially serious for the epidemiological analysis, than in the
situation where all specimens from the exposed group are analysed
in one laboratory and all those from the control group are analysed
in another. If the laboratories produce systematically different results
when analysing the same specimen, the epidemiological evaluation
becomes biased.
A form of measurement bias of particular importance in retrospective
case-control studies is known as recall bias. This occurs when there is
a differential recall of information by cases and controls; for instance,
cases may be more likely to recall past exposure, especially if it is widely
known to be associated with the disease under study (for example, lack of
exercise and heart disease). Recall bias can either exaggerate the degree
of effect associated with the exposure (as with heart patients being more
likely to admit to a past lack of exercise) or underestimate it (if cases
are more likely than controls to deny past exposure).
If measurement bias occurs equally in the groups being compared
(non-differential bias), it almost always results in an underestimate of the
true strength of the relationship. This form of bias may account for some
apparent discrepancies between the results of different epidemiological
studies.
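The tendency of non-differential measurement error to weaken an association can be shown with expected counts. In this sketch the true case-control counts and the 80% sensitivity and specificity of the exposure measurement are hypothetical; the same error rates are applied to cases and controls.

```python
def odds_ratio(a, b, c, d):
    """a, b = exposed cases, exposed controls; c, d = unexposed cases, unexposed controls."""
    return (a * d) / (b * c)

def misclassify(exposed, unexposed, sensitivity, specificity):
    """Expected numbers classified as exposed and unexposed by an imperfect measure."""
    seen_exposed = sensitivity * exposed + (1 - specificity) * unexposed
    seen_unexposed = (exposed + unexposed) - seen_exposed
    return seen_exposed, seen_unexposed

# Hypothetical true exposure status: 60 of 100 cases and 40 of 100 controls exposed.
true_or = odds_ratio(60, 40, 40, 60)

# The same imperfect exposure measurement is applied to cases and controls.
case_exp, case_unexp = misclassify(60, 40, sensitivity=0.8, specificity=0.8)
ctrl_exp, ctrl_unexp = misclassify(40, 60, sensitivity=0.8, specificity=0.8)
observed_or = odds_ratio(case_exp, ctrl_exp, case_unexp, ctrl_unexp)

print(f"true OR = {true_or:.2f}, observed OR = {observed_or:.2f}")
```

The observed odds ratio lies between 1.0 and the true value, i.e., the strength of the relationship is underestimated.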
5.10 Confounding
In a study of the association between exposure to a cause (or risk factor)
and the occurrence of disease, confounding can occur when another
exposure exists in the population and is associated both with the disease
and the exposure being studied. A problem arises if this extraneous
factor – itself a risk factor of the health outcome – is unequally distributed
between the exposure subgroups.
Confounding occurs when the
effects of two exposures (risk factors) have not been separated and it is
therefore incorrectly concluded that the effect is due to one rather than
the other variable. For instance, in a study of the association between
tobacco smoking and lung cancer, age would be a confounding factor if
the average ages of the non-smoking and smoking groups in the study
population were very different, since lung cancer incidence increases
with age.
Confounding can have a very important influence, possibly even
changing the apparent direction of an association: a variable that
appears to be protective may, after control of confounding, be found to
be harmful. The most common concern over confounding is that it may
create the appearance of a cause-effect relationship that in reality does
not exist. For a variable to be a confounder, it must, in its own right, be a
determinant of the occurrence of disease (i.e., a risk factor) and be
associated with the exposure under investigation. Thus, in a study of
another exposure and lung cancer, smoking is not a confounder if the
smoking habits are identical in the exposed and control groups.
Age and social class are often confounders in epidemiological studies.
An association between high blood pressure and coronary heart disease
may in truth represent concomitant changes in the two variables that
occur with increasing age; the potential confounding effect of age has to
be considered, and when this is done it is seen that high blood pressure
indeed increases the risk of coronary heart disease.
Another example of confounding is shown in Figure 5.11 below.
Confounding may be the explanation for the relationship demonstrated
between coffee consumption and the risk of coronary heart disease,
since it is known that coffee consumption is associated with cigarette
smoking: people who drink coffee are more likely to smoke than people
who do not drink coffee. It is also well known that cigarette smoking is
a cause of coronary heart disease. It is thus possible that the relationship
between coffee consumption and coronary heart disease, merely reflects
the known causal association of smoking with the disease. In this
situation smoking confounds the apparent relationship between coffee
consumption and coronary heart disease.
Figure 5.11: Confounding: coffee drinking, cigarette smoking, and coronary heart disease
[Diagram: EXPOSURE (coffee consumption) linked to DISEASE (coronary heart disease), with the CONFOUNDING variable (cigarette smoking) associated with both.]
Control of confounding
Several methods are available to control confounding, either through
study design or during the analysis of results.
The methods commonly used to control confounding in the design of an
epidemiological study are:
• Randomisation
• Restriction
• Matching
At the analysis stage, confounding can be controlled by:
• Stratification
• Statistical modelling
Randomisation: Applicable only to experimental studies, this is the
ideal method for ensuring that potential confounding variables are
equally distributed among the groups being compared. The sample sizes
have to be sufficiently large to avoid random maldistribution of such
variables. Randomisation avoids the association between potentially
confounding variables and the exposure that is being considered.
Restriction: This can be used to limit the study to people who have particular
characteristics. For example, in a study on the effects of coffee on coronary
heart disease, participation in the study could be restricted to non-smokers,
thus removing any potential confounding by cigarette smoking.
Matching: If matching is used to control confounding, the study
participants are selected so as to ensure that potential confounding
variables are evenly distributed in the two groups being compared. For
example, in a case–control study of exercise and coronary heart disease,
each patient with heart disease can be matched with a control of the
same age group and sex to ensure that confounding by age and sex does
not occur. Matching has been used extensively in case–control studies,
but it can lead to problems in the selection of controls if the matching
criteria are too strict or too numerous; this is called overmatching.
Matching can be expensive and time consuming, but is particularly
useful if the danger exists of there being no overlap between cases and
controls, as where the cases are likely to be older than the controls.
Analysis: In large studies it is usually preferred to control for confounding
in the analytical phase rather than in the design phase. Confounding can
then be controlled by stratification, which involves the measurement
of the strength of associations in well-defined and homogeneous
categories (strata) of the confounding variable. If age is a confounder,
the association may be measured in, say, 10-year age groups; if sex or
ethnicity is a confounder, the association is measured separately in men
and women or in the different ethnic groups. Methods are available for
summarising the overall association by producing a weighted average of
the estimates calculated in each separate stratum.
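Stratification can be illustrated with the coffee, smoking and coronary heart disease example. The counts below are hypothetical and were chosen so that coffee has no association with disease within either smoking stratum. The Mantel-Haenszel odds ratio used here is one well-known way of producing a weighted summary across strata; the text does not say which method the author has in mind.

```python
def odds_ratio(a, b, c, d):
    """a, b = exposed cases, exposed controls; c, d = unexposed cases, unexposed controls."""
    return (a * d) / (b * c)

def mantel_haenszel_or(strata):
    """Weighted summary odds ratio across strata of the confounder."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator

# Hypothetical counts (coffee = exposure), given as (a, b, c, d) per stratum.
smokers = (80, 40, 20, 10)
non_smokers = (10, 30, 40, 120)

crude = odds_ratio(*(x + y for x, y in zip(smokers, non_smokers)))
within_smokers = odds_ratio(*smokers)
within_non_smokers = odds_ratio(*non_smokers)
adjusted = mantel_haenszel_or([smokers, non_smokers])

print(f"crude OR = {crude:.2f}")
print(f"stratum ORs = {within_smokers:.2f}, {within_non_smokers:.2f}")
print(f"Mantel-Haenszel OR = {adjusted:.2f}")
```

Ignoring smoking, coffee appears strongly associated with heart disease; within each smoking stratum, and in the weighted summary, the association disappears.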
Questions to stimulate further reading:
1. Discuss two types of errors commonly encountered in epidemiological studies.
2. How can the errors be minimised?
3. Discuss the term confounding and the different strategies to address it.
4. Uganda has a high burden of malaria, HIV/AIDS, hepatitis B,
measles, cholera, typhoid, and non-communicable diseases. Describe
how an epidemiologist can help policy makers to control these
diseases.
5. Describe several study designs they may use to provide policy
relevant data.
6. The Malaria Control Division in the Ministry of Health, together
with the Reproductive Health Division, want to evaluate a new
antimalarial drug called Atekin for malaria prevention in
pregnancy. It is hypothesized that Atekin has more beneficial effects
in reducing parasitemia and anemia in pregnancy.
a) What are the outcome indicators for this study?
b) What study design would best measure the outcome indicators?
c) What are the strengths and weaknesses of the design of
your choice?
7. Define the term surveillance.
a) Describe the types of surveillance systems commonly used in
epidemiology.
b) Describe how surveillance can help in the control of the frequent viral haemorrhagic fevers (like Ebola) in Uganda.
Practical session
1. Discuss important considerations when calculating sample size of a
study.
2. Why are outcome measures important in writing research grant
proposals?
3. Discuss the importance of time-to-event studies.
4. A Case Control study was conducted in Mukono district to
determine if children who live in households with dirty and old
latrines, experience more episodes of diarrhea compared to children
from households with new and improved latrines.
In total, 100 ‘cases’ (60 with old latrines and 40 with new latrines) and
400 ‘controls’ (150 with old latrines and 250 with new latrines) were
included in the study.
a. Calculate the risk of getting diarrhoea expressed as the Odds Ratio
(OR) using a 2 x 2 table. (10 marks).
Table 5.4: Cases of diarrhoea among children by type of latrine

                                  Children with        Controls     Total
                                  diarrhoea (cases)
Households with old latrines      a = 60               b = 150      210
(exposed)
Households with new latrines      c = 40               d = 250      290
(non-exposed)
Total                             100                  400          500
Answers:
Odds ratio is defined as (odds of the event in the exposed group)/(odds
of the event in the non-exposed group).
The formula to calculate the odds ratio is (a/b)/(c/d) = ad/bc =
(60 x 250)/(150 x 40) = 2.5
b. Interpret results of the Odds Ratio (5 marks)
The odds ratio of 2.5 implies that the odds of getting diarrhea among
children who reside in households with old latrines are 2.5 times those
among children in households with new latrines.
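The calculation can be checked, and a confidence interval attached, with a few lines of code. The 95% interval below uses the log-based (Woolf) method, which is an assumption added for illustration; the exercise itself asks only for the odds ratio.

```python
import math

def odds_ratio_with_ci(a, b, c, d, z=1.96):
    """Odds ratio ad/bc with an approximate (Woolf, log-based) confidence interval."""
    estimate = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - z * se_log_or)
    upper = math.exp(math.log(estimate) + z * se_log_or)
    return estimate, lower, upper

# Cell counts from Table 5.4: a = 60, b = 150, c = 40, d = 250.
estimate, lower, upper = odds_ratio_with_ci(60, 150, 40, 250)
print(f"OR = {estimate:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

Under this method the interval does not include 1.0, so the association would be regarded as significant at the usual 0.05 level.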
2. One thousand women in reproductive age visited a cervical cancer clinic and were tested for cervical cancer using CareHPV test. Pap smear was used as the ‘Gold standard’ for detecting cervical cancer.
Table 5.5: Calculating the sensitivity, specificity and predictive value of a test

                           Pap Smear (Positive)     Pap Smear (Negative)     TOTAL
                           Have disease             Have no disease
CAREHPV Test (Positive)    a = 160                  b = 80                   240
                                                    (False positives)
CAREHPV Test (Negative)    c = 40                   d = 720                  760
                           (False negatives)
TOTAL                      200                      800                      1,000
                           Total positive           Total negative           Total
                           with disease             with no disease          Population
a. Using information above, calculate the prevalence of HPV in this population.
Prevalence of the disease is: Total with disease/Total population x 100
= 200/1,000 x 100 = 20%
b. What was the sensitivity of the CareHPV test?
Sensitivity of the test is: True positives/Total with disease, or a/(a+c) x 100
= 160/200 x 100 = 80%
c. What was the Specificity of the CareHPV test?
Specificity of the test is: True negatives/Total with no disease, or d/(b+d) x 100
= 720/800 x 100 = 90%
d. What was the Predictive Value of a positive CareHPV test?
The positive predictive value of the test is: a/(a+b) x 100 = 160/(160+80) x 100 = 66.7%
e. What was the Predictive Value of a negative CareHPV test?
The negative predictive value of the test is: d/(d+c) x 100 = 720/(720+40) x 100 = 94.7%
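The same quantities can be computed directly from the four cells of Table 5.5 by applying the standard definitions. This is a minimal sketch; the function name and the dictionary keys are illustrative.

```python
def diagnostic_summary(a, b, c, d):
    """a = true positives, b = false positives, c = false negatives, d = true negatives."""
    total = a + b + c + d
    return {
        "prevalence": (a + c) / total,
        "sensitivity": a / (a + c),
        "specificity": d / (b + d),
        "positive predictive value": a / (a + b),
        "negative predictive value": d / (c + d),
    }

# Cell counts from Table 5.5.
summary = diagnostic_summary(a=160, b=80, c=40, d=720)
for name, value in summary.items():
    print(f"{name}: {100 * value:.1f}%")
```

Note that the positive predictive value uses the row total of positive tests (a+b), whereas sensitivity uses the column total of those with disease (a+c).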
3. Uganda has an increasing prevalence of cancer diseases, common
among them is cancer of the cervix, breast, lung, colon, prostate, etc.
Accordingly, cases in Kyadondo region are captured in the cancer
registry and there are efforts to capture all cancer cases in the country.
In follow up of cancer patients, the time of diagnosis of cancer and the
cure, relapse or death are usually recorded.
a) Describe the technique you would use to find out the survival
rate of women who get cancer of the cervix.
b) Why is this epidemiological parameter important?
c) If you wanted to find out the risk factors that expose women to
cancer of the cervix, what study design(s) would you prefer, and
why?
Bibliography
1. Beaglehole R, Bonita R, Kjellström T. Basic epidemiology. Geneva:
World Health Organization; 1993 Jan.
2. Grimes DA, Schulz KF. ‘Bias and causal associations in observational
research’. The Lancet. 2002 Jan 19;359(9302):248-52.
3. Grimes DA, Schulz KF. ‘Uses and abuses of tests’. The Lancet. 2002
March 9, 359:881-84.
4. Hackshaw A. A Concise Guide to Clinical Trials. Wiley-Blackwell,
BMJ Books, 2009.
5. Lwanga SK, Lemeshow S, World Health Organization. Sample size
determination in health studies: a practical manual. World Health
Organization; 1991.
6. Schulz KF, Grimes DA. ‘Sample size slippages in randomised trials:
exclusions and the lost and the way forward’. The Lancet. 2002, March
2;359: 781-85.
7. Schulz KF, Grimes DA. ‘Sample size calculations in randomised trials:
mandatory and mystical’. The Lancet. 2005 Apr 9;365(9467):1348-53.
8. Schulz KF, Grimes DA. The Lancet handbook of essential concepts in
clinical research. Lancet; 2006.
9. Schulz KF. ‘Randomised trials, human nature, and reporting
guidelines’. The Lancet. 1996 Aug 31;348(9027):596-8.
Lecture Notes-Series Six
Practical Steps in investigating a disease outbreak
Lecture Outline
1. Background
2. Key steps in disease investigation
3. Detailed explanation of the key steps for disease investigation
4. Practical Session
Expectations
After reading this Lecture Series, you should clearly
know the practical steps required to investigate a disease outbreak. Later,
you will be introduced to a practical session that you are encouraged
to do. Through this learning, you should master how to investigate a
disease outbreak and how to use the findings in designing prevention
and control measures.
6.1 Background:
Once a disease outbreak has been reported and a decision to conduct a
field investigation of that outbreak has been made, working quickly
is critical. Below are the essential practical steps necessary to
investigate a disease outbreak. These will then be explained and their
relevance emphasized in subsequent sections.
6.2 Key steps in disease investigation:
1. Prepare for field work
2. Establish the existence of an outbreak
3. Verify the diagnosis
4. Construct a working case definition
5. Find cases systematically and record the data
6. Perform descriptive epidemiological analyses
7. Develop working hypotheses
8. Evaluate the hypotheses
9. From time to time, review, refine, and re-evaluate the hypotheses
10. Compare and reconcile with laboratory and/or environmental
studies
11. Implement control and prevention measures
12. Initiate or maintain surveillance
13. Communicate the findings
6.3 Detailed explanation of the key steps in disease investigation
Step 1: Prepare for field work
Field preparations can be grouped into two categories: (a) scientific and
investigative issues, and (b) management and operational issues. Good
preparation in both categories is needed to facilitate a smooth field
investigation.
As a field investigator, you must have the appropriate scientific
knowledge, supplies, and equipment before departing for the field.
It is important to discuss the field work preparations with someone
knowledgeable about the disease and field investigations. It is also
essential to review the relevant literature.
Before leaving for a field investigation, consult laboratory staff to ensure
that you take the proper laboratory materials and know the proper
collection, storage, and transportation techniques. By talking with the
laboratory staff, you are also informing them about the outbreak, and
they can anticipate what type of laboratory resources will be needed.
You also need to know what supplies or equipment to bring to protect
yourself. Some outbreak investigations require no special equipment,
while an investigation of SARS or Ebola haemorrhagic fever may require
personal protective equipment such as masks, gowns, and gloves.
Finally, before departing, you should have a plan of action. What are the
objectives of this investigation, i.e., what are you trying to accomplish?
What will you do first, second, and third? Having a plan of action upon
which everyone agrees will allow you to ‘hit the ground running’ and
avoid delays resulting from misunderstandings.
A good field investigator must be a good manager and collaborator
as well as a good epidemiologist, because most investigations are
conducted by a team rather than just one individual. The team members
must be selected before departure and know their expected roles
and responsibilities. Does the team need a laboratory technician, a
veterinarian, translator/interpreter, computer specialist, entomologist,
or other specialists? What is the role of each? Who is in charge? If you
have been invited to participate but do not work for the local health
team, are you expected to lead the investigation, provide consultation to
the local staff who will conduct the investigation, or simply lend a hand
to the local staff? And who are your local contacts?
Depending on the type of outbreak, the number of involved agencies may
be quite large. The investigation of an outbreak from an animal source
may include the department of agriculture. If criminal or bioterrorist
intent is suspected, law enforcement agencies may be in charge, or at
least involved. In a natural disaster (hurricane or flood), the Ministry
of Disaster Management and Preparedness, and the Prime Minister’s
office may be involved. Staff from different agencies have different
perspectives, approaches, and priorities that must be reconciled.
For example, whereas the public health investigation may focus on
identifying a pathogen, source, and mode of transmission, a criminal
investigation is likely to focus on finding the perpetrator. Sorting out
roles and responsibilities in such multi-agency investigations is critical
to accomplishing the disparate objectives of the different agencies.
A communications plan must be established. The need for communicating
with the public and the community has long been acknowledged, but the
need for communicating quickly and effectively with elected officials
and the public is obvious during epidemics such as Ebola, Yellow
Fever, West Nile Virus encephalitis, SARS, anthrax, and COVID-19.
The plan should include how often and when to have conference calls
with involved agencies, who will be the designated spokesperson, who
will prepare health alerts and press releases, and the like. In addition,
operational and logistical details are important. Arrange to bring a
laptop computer, cell phone or phone card, camera, and other supplies.
If you are arriving from outside the area, you should arrange in advance
when and where you are to meet with local officials and contacts when
you arrive in the field. You must arrange travel, lodging, and local
transportation. Many agencies and organizations have strict approval
processes and budgetary limits that you must follow. If you are traveling
to another country, you will need a passport and often a visa. You should
also take care of personal matters before you leave, especially if the
investigation is likely to be lengthy.
Step 2: Establish the existence of a disease outbreak
Definitions:
• A disease outbreak or an epidemic is the occurrence of more cases
of disease than expected in a given area or among a specific group of
people over a particular period of time.
• Many epidemiologists use the terms outbreak and epidemic
interchangeably, but the public is more likely to think that an
epidemic implies a crisis situation.
• Some epidemiologists apply the term epidemic to situations
involving larger numbers of people over a wide geographic area.
Indeed, the Dictionary of Epidemiology defines outbreak as an
epidemic limited to a localized increase in the incidence of disease,
e.g., in a village, town, or closed institution.
One of the first tasks of the field investigator is to verify that a cluster of
cases is indeed an outbreak. Some clusters turn out to be true outbreaks
with a common cause, some are sporadic and unrelated cases of the same
disease, and others are unrelated cases of similar but unrelated diseases.
Even if the cases turn out to be the same disease, the number of cases may
not exceed what the health department normally sees in a comparable
time period. Here, as in other areas of epidemiology, the observed is
compared with the expected. The expected number is usually the
number from the previous few weeks or months, or from a comparable
period during the previous few years. For a notifiable disease, the
expected number is based on health department surveillance records.
For other diseases and conditions, the expected number may be based
on locally available data such as hospital discharge records, mortality
statistics, or cancer or birth defect registries. When local data are not
available, a health department may use rates from state or national data,
or, alternatively, conduct a telephone survey of physicians to determine
whether they are seeing more cases of the disease than usual. Finally, a
survey of the community may be conducted to establish the prevalence
of the disease.
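One simple way to put a number on the comparison of observed with expected cases is to ask how likely the observed count would be if cases arose at the usual rate. The sketch below assumes that case counts follow a Poisson distribution, which is an assumption added here and not part of the text; the expected and observed numbers are hypothetical.

```python
import math

def poisson_upper_tail(observed, expected):
    """P(X >= observed) when X is Poisson-distributed with mean `expected`."""
    below = sum(math.exp(-expected) * expected ** k / math.factorial(k)
                for k in range(observed))
    return 1 - below

# Hypothetical surveillance data: about 4 cases per month are usually reported.
p_excess = poisson_upper_tail(12, 4.0)   # 12 cases reported this month
p_usual = poisson_upper_tail(4, 4.0)     # 4 cases reported this month

print(f"P(12 or more cases | usual rate) = {p_excess:.4f}")
print(f"P(4 or more cases | usual rate)  = {p_usual:.2f}")
```

A very small probability suggests that the count is unlikely to be a chance fluctuation, but it does not by itself establish an outbreak, since reporting artefacts can also produce an excess.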
Even if the current number of reported cases exceeds the expected
number, the excess may not necessarily indicate an outbreak. Reporting
may rise because of changes in local reporting procedures, changes in the
case definition, increased interest because of local or national awareness,
or improvements in diagnostic procedures. A new physician, infection
control nurse, or healthcare facility may more consistently report cases,
when in fact there has been no change in the actual occurrence of the
disease. Some apparent increases are actually the result of misdiagnosis
or laboratory error. Finally, particularly in areas with sudden changes in
population size such as resort areas, college towns, and migrant farming
areas, changes in the numerator (number of reported cases) may simply
reflect changes in the denominator (size of the population).
Whether an apparent problem should be investigated further is not
strictly tied to verifying the existence of an epidemic (more cases than
expected). Sometimes, health agencies respond to small numbers of
cases, or even a single case of disease, that may not exceed the expected
or usual number of cases. As noted earlier, the severity of the illness,
the potential for spread, availability of control measures, political
considerations, public relations, available resources, and other factors
all influence the decision to launch a field investigation.
Step 3: Verify the diagnosis
The next step is to verify the diagnosis. This is closely linked to verifying
the existence of an outbreak. In fact, often these two steps are addressed
at the same time. Verifying the diagnosis is important: (a) to ensure
that the disease has been properly identified, since control measures are
often disease-specific; and (b) to rule out laboratory error as the basis for
the increase in reported cases.
First, review the clinical findings and laboratory results. If you have
questions about the laboratory findings (for example, if the laboratory
tests are inconsistent with the clinical and epidemiologic findings), ask
a qualified laboratory technician to review the laboratory techniques
being used. If you need specialized laboratory work such as confirmation
in a reference laboratory, other chemical or biological fingerprinting,
or polymerase chain reaction, you must secure a sufficient number of
appropriate specimens, isolates, and other laboratory materials as soon
as possible.
Second, many investigators — clinicians and non-clinicians — find it
useful to visit one or more patients with the disease. If you do not have the
clinical background to verify the diagnosis, bring a qualified clinician with
you. Talking directly with some patients gives you a better understanding
of the clinical features, and helps you to develop a mental image of the
disease and the patients affected by it. In addition, conversations with
patients are very useful in generating hypotheses about disease aetiology
and spread. They may be able to answer some critical questions: What
were their exposures before becoming ill? What do they think caused
their illness? Do they know anyone else with the disease? Do they have
anything in common with others who have the disease?
Third, summarize the clinical features using frequency distributions. Are
the clinical features consistent with the diagnosis? Frequency distributions
of the clinical features are useful in characterizing the spectrum of illness,
verifying the diagnosis, and developing case definitions. These clinical
frequency distributions are considered so important in establishing the
credibility of the diagnosis, that they are frequently presented in the first
table of an investigation’s report or manuscript.
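A frequency distribution of clinical features can be tabulated in a few lines. The sketch below is illustrative only: the list of case-patients and their symptoms is invented, and the function name `symptom_frequencies` is an assumption, not something from an established package.

```python
from collections import Counter

# Hypothetical line listing: the symptoms recorded for each case-patient.
cases = [
    {"fever", "diarrhoea", "vomiting"},
    {"fever", "diarrhoea"},
    {"diarrhoea", "abdominal cramps"},
    {"fever", "diarrhoea", "abdominal cramps"},
]


def symptom_frequencies(case_symptoms):
    """Return (symptom, count, percent of cases) rows, most common first."""
    counts = Counter(s for symptoms in case_symptoms for s in symptoms)
    total = len(case_symptoms)
    return [(s, n, 100 * n / total) for s, n in counts.most_common()]


for symptom, n, pct in symptom_frequencies(cases):
    print(f"{symptom:<20} {n:>3} {pct:5.1f}%")
```

A table of this kind, with the number and percentage of case-patients showing each feature, is what typically appears as the first table of an investigation report.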
Step 4: Construct a working case definition
A case definition is a standard set of criteria for deciding whether
an individual should be classified as having the health condition of
interest. It includes clinical criteria and — particularly in the setting
of an outbreak investigation — restrictions by time, place, and person.
The clinical criteria should be based on simple and objective measures
such as ‘fever ≥ 38.3°C (101°F),’ ‘three or more loose bowel movements
per day,’ or ‘myalgias (muscle pain) severe enough to limit the patient’s
usual activities’. The case definition may be restricted by time (for
example, to persons with onset of illness within the past 2 months), by
place (for example, to residents of the nine-county area or to employees
of a particular plant) and by person (for example, to persons with no
previous history of a positive tuberculin skin test, or to premenopausal
women). Whatever the criteria, they must be applied consistently to all
persons under investigation.
A case definition must not include the exposure or risk factor you are
interested in evaluating. This is a common mistake. For example, if one
of the hypotheses under consideration is that persons who worked in the
west wing were at greater risk of disease, do not define a case as ‘illness
among persons who worked in the west wing with onset between…’
Instead, define a case as ‘illness among persons who worked in the
facility with onset between…’ Then conduct the appropriate analysis to
determine whether those who worked in the west wing were at greater
risk than those who worked elsewhere.
Diagnoses may be uncertain, particularly early in an investigation. As a
result, investigators often create different categories of a case definition,
such as confirmed, probable, and possible or suspect, that allow for
uncertainty.
• To be classified as confirmed, a case usually must have laboratory
verification.
• A case classified as probable usually has typical clinical features of
the disease without laboratory confirmation.
• A case classified as possible usually has fewer of the typical clinical
features.
Case Definitions
• Suspected: A case that meets the clinical case definition.
• Probable: A suspected case as defined above, occurring during an ongoing
epidemic and with an epidemiological link to a confirmed case.
• Confirmed: A suspected or probable case with laboratory confirmation.
In the outbreak setting, the investigators would need to specify time
and place to complete the outbreak case definition. For example, if
investigating an epidemic of meningococcal meningitis in Moyo district,
Northern Uganda, the case definition might be the clinical features
with onset between January and April of that year among residents of and
visitors to Moyo district.
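The suspected, probable, and confirmed categories, together with the time and place restrictions, can be written as a simple decision rule. The sketch below is one possible reading, not a definitive implementation: the `Patient` fields, the year 2021, and the order of the checks are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Patient:
    meets_clinical_criteria: bool
    epi_link_to_confirmed_case: bool
    lab_confirmed: bool
    onset: date
    district: str  # district where the person lived or was visiting


# Hypothetical outbreak window and place, for illustration only.
WINDOW_START = date(2021, 1, 1)
WINDOW_END = date(2021, 4, 30)
PLACE = "Moyo"


def classify(p: Patient) -> str:
    """Return 'confirmed', 'probable', 'suspected' or 'not a case'."""
    in_scope = WINDOW_START <= p.onset <= WINDOW_END and p.district == PLACE
    if not (in_scope and p.meets_clinical_criteria):
        return "not a case"
    if p.lab_confirmed:
        return "confirmed"
    if p.epi_link_to_confirmed_case:
        return "probable"
    return "suspected"


print(classify(Patient(True, True, False, date(2021, 2, 10), "Moyo")))
```

Writing the rule down explicitly helps ensure that the same criteria are applied consistently to every person under investigation.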
Classifications such as confirmed-probable-possible are helpful because
they provide flexibility to the investigators. A case might be temporarily
classified as probable or possible while laboratory results are pending.
Alternatively, a case may be permanently classified as probable or
possible if the patient’s physician decided not to order the confirmatory
laboratory test because the test is expensive, difficult to obtain, or
unnecessary. For example, while investigating an outbreak of diarrhoea,
investigators usually try to identify the causative organism from stool
samples from a few afflicted persons. If the tests confirm that all of those
case-patients were infected with the same organism, the other persons
with compatible clinical illness are all presumed to be part of the same
outbreak and to be infected with the same organism.
A case definition is a tool for classifying someone as having or not having
the disease of interest, but few case definitions are 100% accurate in
their classifications. Some persons with mild illness may be missed, and
some persons with a similar but not identical illness may be included.
Generally, epidemiologists strive to ensure that a case definition includes
most if not all of the actual cases, but very few or no false-positive cases.
However, this ideal is not always met. For example, case definitions
often miss infected people who have mild or no symptoms, because they
have little reason to be tested.
Early in an investigation, investigators may use a ‘loose’ or sensitive
case definition that includes confirmed, probable, and possible cases to
characterize the extent of the problem, identify the populations affected,
and develop hypotheses about possible causes. The strategy of being
more inclusive early on is especially useful in investigations that require
travel to different hospitals, homes, or other sites to gather information,
because collecting extra data while you are there is more efficient than
having to return a second time. This illustrates an important axiom of
field epidemiology: get it while you can. Later on, when hypotheses
have come into sharper focus, the investigator may tighten the case
definition by dropping the ‘possible’ and sometimes the ‘probable’
category. In analytic epidemiology, inclusion of false-positive cases can
produce misleading results. Therefore, to test these hypotheses by using
analytic epidemiology (see Step 8), specific or tight case definitions are
recommended.
Other investigations, particularly those of a newly recognized disease
or syndrome, begin with a relatively specific or narrow case definition.
For example, acquired immunodeficiency syndrome (AIDS) and severe
acute respiratory syndrome (SARS) both began with relatively specific
case definitions. This ensures that persons whose illness meets the case
definition truly have the disease in question. As a result, investigators
could accurately characterize the typical clinical features of the illness, risk
factors for illness, and cause of the illness. After the cause was known and
diagnostic tests were developed, investigators could use the laboratory
test to learn about the true spectrum of illness, and broaden the case
definition to include those with early infection or mild symptoms.
Step 5: Find cases systematically and record information
Many outbreaks are brought to the attention of health authorities by
concerned healthcare providers or citizens. However, the cases that
prompt the concern are often only a small and unrepresentative fraction
of the total number of cases. Public health workers must therefore look
for additional cases to determine the true geographic extent of the
problem and the populations affected by it.
Usually, the first effort to identify cases is directed at healthcare
practitioners and facilities — physicians’ clinics, hospitals, and laboratories
— where a diagnosis is likely to be made. Investigators may conduct
what is sometimes called stimulated or enhanced passive surveillance, by
sending a letter describing the situation and asking for reports of similar
cases. Alternatively, they may conduct active surveillance by telephoning
or visiting the facilities to collect information on any additional cases.
In some outbreaks, public health officials may decide to alert the public
directly, usually through the local media. In other situations, the media
may have already spread the word.
If an outbreak affects a restricted population such as persons in a school,
or at a work site, and if many cases are mild or asymptomatic and
therefore undetected, a survey of the entire population is sometimes
conducted to determine the extent of infection. A questionnaire could
be distributed to determine the true occurrence of clinical symptoms,
or laboratory specimens could be collected to determine the number of
asymptomatic cases.
Finally, investigators should ask case-patients if they know anyone else
with the same condition. Frequently, one person with an illness knows
or hears of others with the same illness.
In some investigations, investigators develop a data collection form
tailored to the specific details of that outbreak. In others, investigators
use a generic case report form. Regardless of which form is used, the
data collection form should include the following types of information
about each case.
• Identifying information. A name, address, and telephone number
are essential if investigators need to contact patients for additional
questions, and to notify them of laboratory results and the outcome
of the investigation. Names also help in checking for duplicate
records, while the addresses allow for mapping the geographic
extent of the problem.
• Demographic information. Age, sex, race, occupation, etc., provide
the person characteristics of descriptive epidemiology needed to
characterize the populations at risk.
• Clinical information. Signs and symptoms allow investigators to
verify that the case definition has been met. Date of onset is needed
to chart the time course of the outbreak. Supplementary clinical
information, such as duration of illness and whether hospitalization
or death occurred, helps characterize the spectrum of illness.
• Risk factor information. This information must be tailored to the
specific disease in question. For example, since food and water are
common vehicles for hepatitis A but not hepatitis B, exposure to
food and water sources must be ascertained in an outbreak of the
former but not the latter.
• Source of information. The case report must include the source
of the report, usually a physician, clinic, hospital, or laboratory.
Investigators will sometimes need to contact the reporter, either to
seek additional clinical information or report back the results of the
investigation.
Traditionally, the information described above is collected on a standard
case report form, questionnaire, or data abstraction form.
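Where the case report form is kept electronically, the five types of information can be mirrored in a simple record. The sketch below is an assumption about how such a record might look: the field names and the helper `duplicate_names` are hypothetical and would need adapting to the outbreak at hand.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List


@dataclass
class CaseReport:
    # Identifying information
    name: str
    address: str
    telephone: str
    # Demographic information
    age: int
    sex: str
    occupation: str
    # Clinical information
    symptoms: List[str]
    onset: date
    hospitalized: bool = False
    died: bool = False
    # Risk factor information, tailored to the disease under investigation
    exposures: Dict[str, bool] = field(default_factory=dict)
    # Source of the report (physician, clinic, hospital, or laboratory)
    reported_by: str = ""


def duplicate_names(reports: List[CaseReport]) -> List[str]:
    """Return names (case and spacing ignored) found on more than one report."""
    counts = Counter(" ".join(r.name.lower().split()) for r in reports)
    return sorted(name for name, n in counts.items() if n > 1)
```

A name match is only a prompt to review the records by hand; two different patients may share a name.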
Step 6: Perform descriptive epidemiology
The next step after identifying and gathering basic data on the persons with
the disease, is to systematically describe some of the key characteristics
of those persons. This process, in which the outbreak is characterized
by time, place, and person, is called descriptive epidemiology. It may be
repeated several times during the course of an investigation as additional
cases are identified or as new data become available.
This step is critical for several reasons.
1. Summarizing data by key demographic variables provides a
comprehensive characterization of the outbreak — trends over
time, geographic distribution (place), and the populations (persons)
affected by the disease.
2. From this characterization you can identify or infer the population
at risk for the disease.
3. The characterization often provides clues about aetiology, source,
and modes of transmission that can be turned into testable
hypotheses.
4. Descriptive epidemiology describes the where and whom of the
disease, allowing you to begin intervention and prevention measures.
5. Early (and continuing) analysis of descriptive data helps you to
become familiar with those data, enabling you to identify and correct
errors and missing values.
Epidemic Curves
An epidemic curve shows the frequency of new cases over time, based
on the date of onset of a particular disease. The shape of the curve in
relation to the incubation period for a particular disease can give clues
about the source. Thus, there are three types of epidemic curves:
a) Point source outbreaks (epidemics) involve a common source, such as
contaminated food or an infected food handler, and all the exposures
tend to occur in a relatively brief period. Consequently, point source
outbreaks tend to have epidemic curves with a rapid increase in cases
followed by a somewhat slower decline, and all of the cases tend to fall
within one incubation period. In a point source epidemic of hepatitis A,
you would expect the rise and fall of new cases to occur within about a
30 day span of time, which is what is seen in the graph below.
Figure 6.1: A point source epidemic of hepatitis A
Source: LaMorte, 2007
b) Continuous common source epidemics may also rise to a peak and then
fall, but the cases do not all occur within the span of a single incubation
period. This implies that there is an ongoing source of contamination.
The down slope of the curve may be very sharp if the common source
is removed or gradual if the outbreak is allowed to exhaust itself. The
epidemic curve in Figure 6.2 below is from the cholera outbreak in the
Broad Street area of London in 1854 that was investigated by Dr. John
Snow. Cholera has an incubation period of 1-3 days, and even though
residents began to flee when the outbreak erupted, you can see that this
outbreak lasted for more than a single incubation period. This suggests
an ongoing source of infection, in this case the Broad Street water pump.
Figure 6.2: An epidemic curve of cholera outbreak in the Broad Street area of
London in 1854.
Source: Snow, 1936
c) Propagated (or progressive source) epidemics. The epidemic curve in Figure
6.3 below is from an outbreak of measles that began with a single
index case who infected a number of other individuals. (The incubation
period for measles averages 10 days, with a range of 7-18 days.) One
or more of the people infected in the initial wave infected a group of
people who became the second wave of infection. The transmission was
from person-to-person, rather than from a common source. Propagated
epidemic curves usually have a series of successively larger peaks, which
are one incubation period apart. The successive waves tend to involve
more and more people, until the pool of susceptible people is exhausted
or control measures are implemented. This is an ideal example, however;
in reality, most of these epidemics do not produce the classic pattern.
Figure 6.3: An epidemic curve of a measles outbreak
Source: LaMorte, 2007
For some outbreaks, the descriptive information is all that is needed to
figure out the source, and control measures can be undertaken rapidly.
In other cases, this descriptive information (person, place, and time)
helps generate hypotheses about the source, but it isn’t obvious what
the source is. When this occurs, it is necessary to test the hypotheses
by conducting an analytical study, i.e., either a case-control study or a
cohort study. This means collecting data and analyzing it in order to
identify the source. However, it is important to recognize that you can’t
test a hypothesis unless you have one to test. So, the descriptive studies
that generate hypotheses are essential.
Practical session
Figure 6.4 below shows the epidemic curve for a Salmonella outbreak
that occurred in Abim in 2009. Salmonella generally has an incubation
period of about 1-3 days. What kind of epidemic curve is this? What is
your justification?
Figure 6.4: An outbreak of Salmonella that occurred in Abim in 2009.
Source: MOH, 2009
Usefulness of epidemic curves
Epidemic curves are a basic investigative tool because they are so
informative. The epi-curve shows the magnitude of the epidemic over
time as a simple, easily understood visual. It permits the investigator to
distinguish an epidemic from an endemic disease. Potentially correlated
events can be noted on the graph.
• The shape of the epidemic curve may provide clues about the pattern
of spread in the population, e.g., point versus intermittent source
versus propagated.
• The curve shows where you are in the course of the epidemic — still
on the upswing, on the down slope, or after the epidemic has ended.
This information forms the basis for predicting whether more or
fewer cases will occur in the near future.
• The curve can be used for evaluation, answering questions like: how
long did it take for the health department to identify a problem? Are
intervention measures working?
• Outliers (cases that don’t fit into the body of the curve) may
provide important clues.
• If the disease and its incubation period are known, the epi-curve can
be used to deduce a probable time of exposure and help develop a
questionnaire focused on that time period.
Drawing an epidemic curve. To draw an epidemic curve, you first must
know the time of onset of illness for each case. For some diseases, date of
onset is sufficient. For other diseases, particularly those with a relatively
short incubation period, hour of onset may be more suitable.
Occasionally, you may be asked to draw an epidemic curve when you
don’t know either the disease or its incubation time. In that situation, it
may be useful to draw several epidemic curves with different units on
the x-axis to find one that best portrays the data.
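The counting behind an epidemic curve can be sketched as follows. This is a minimal illustration with invented onset dates. The function name `epidemic_curve` and the `bin_days` parameter are assumptions, and the text histogram stands in for a proper chart.

```python
from collections import Counter
from datetime import date, timedelta


def epidemic_curve(onsets, bin_days=1):
    """Count new cases per interval of `bin_days`, starting at the earliest onset.

    Returns a list of (interval_start, count), including empty intervals.
    """
    start = min(onsets)
    counts = Counter((d - start).days // bin_days for d in onsets)
    n_bins = max(counts) + 1
    return [(start + timedelta(days=i * bin_days), counts.get(i, 0))
            for i in range(n_bins)]


# Hypothetical dates of onset for seven case-patients.
onsets = [
    date(2021, 3, 1), date(2021, 3, 2), date(2021, 3, 2),
    date(2021, 3, 3), date(2021, 3, 3), date(2021, 3, 3),
    date(2021, 3, 5),
]

for interval_start, n in epidemic_curve(onsets, bin_days=1):
    print(interval_start.isoformat(), "#" * n)
```

Calling the function again with a different `bin_days` value is a quick way to try several units on the x-axis when the incubation period is unknown.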
Interpreting an epidemic curve. The first step in interpreting an epidemic
curve is to consider its overall shape. The shape of the epidemic curve
is determined by the epidemic pattern (for example, common source
versus propagated), the period of time over which susceptible persons
are exposed, and the minimum, average, and maximum incubation
periods for the disease.
An epidemic curve that has a steep upslope and a more gradual down
slope (a so-called log-normal curve) is characteristic of a point-source
epidemic, in which persons are exposed to the same source over a
relatively brief period. In fact, any sudden rise in the number of cases
suggests sudden exposure to a common source.
In a point-source epidemic, all the cases occur within one incubation
period. If the duration of exposure is prolonged, the epidemic is called
a continuous common-source epidemic, and the epidemic curve has a
plateau instead of a peak. An intermittent common-source epidemic
(in which exposure to the causative agent is sporadic over time)
usually produces an irregularly jagged epidemic curve reflecting the
intermittence and duration of exposure and the number of persons
exposed. In theory, a propagated epidemic (one spread from person-to-person
with increasing numbers of cases in each generation) should
have a series of progressively taller peaks one incubation period apart,
but in reality few produce this classic pattern.
The cases that stand apart may be just as informative as the overall
pattern. An early case may represent a background or unrelated case, a
source of the epidemic, or a person who was exposed earlier than most of
the cases (for example, the cook who tasted a dish hours before bringing
it to the big picnic). Similarly, late cases may represent unrelated cases,
cases with long incubation periods, secondary cases, or persons exposed
later than most others (for example, someone eating leftovers). On
the other hand, these outlying cases sometimes represent miscoded
or erroneous data. All outliers are worth examining carefully because
if they are part of the outbreak, they may have an easily identifiable
exposure that may point directly to the source.
In a point-source epidemic of a known disease with a known incubation
period, the epidemic curve can be used to identify a likely period of
exposure. Knowing the likely period of exposure allows you to ask
questions about the appropriate period of time so you can identify the
source of the epidemic.
To identify the likely period of exposure from an epidemic curve of an
apparent point source epidemic:
1. Look up the average and minimum incubation periods of the disease.
This information can be found on disease fact sheets available on the
Internet or in the Control of Communicable Diseases Manual.
2. Identify the peak of the outbreak or the median case and count back
on the x-axis one average incubation period. Note the date.
3. Start at the earliest case of the epidemic and count back the minimum
incubation period, and note this date as well.
Ideally, the two dates will be similar, and represent the probable period
of exposure. Since this technique is not precise, widen the probable
period of exposure by, say 20% to 50% on either side of these dates, and
then ask about exposures during this widened period in an attempt to
identify the source.
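The counting-back steps above can be sketched in code. The text does not say what the 20% to 50% widening is a percentage of, so the sketch assumes it is a fraction of the average incubation period, rounded up to whole days. The dates are hypothetical, and the incubation values (average 28 days, minimum 15 days) are assumed figures of the kind listed for hepatitis A that should be checked against a disease fact sheet.

```python
import math
from datetime import date, timedelta


def likely_exposure_period(peak_onset, earliest_onset,
                           avg_incubation_days, min_incubation_days,
                           widen_fraction=0.2):
    """Return (date_from_peak, date_from_earliest, window_start, window_end)."""
    # Step 2: count back one average incubation period from the peak.
    from_peak = peak_onset - timedelta(days=avg_incubation_days)
    # Step 3: count back the minimum incubation period from the earliest case.
    from_earliest = earliest_onset - timedelta(days=min_incubation_days)
    # Assumption: widen by a fraction of the average incubation period.
    pad = timedelta(days=math.ceil(widen_fraction * avg_incubation_days))
    window_start = min(from_peak, from_earliest) - pad
    window_end = max(from_peak, from_earliest) + pad
    return from_peak, from_earliest, window_start, window_end


result = likely_exposure_period(
    peak_onset=date(2021, 5, 29),
    earliest_onset=date(2021, 5, 16),
    avg_incubation_days=28,
    min_incubation_days=15,
)
print(result)
```

If the two counted-back dates are far apart, that is itself a signal to re-examine the assumption of a point source or the incubation values used.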
In a similar fashion, if the time of exposure and the times of onset of
illness are known but the cause has not yet been identified, the incubation
period can be estimated from the epidemic curve. Subtract the time of
exposure from the time of onset of the earliest case to estimate
the minimum incubation period. Then subtract the time of exposure from
the time of onset of the median case to estimate the median
incubation period. These incubation periods can be compared with a
list of incubation periods of known diseases to narrow the possibilities.
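The same arithmetic can be sketched in code: each case-patient's incubation period is the time of onset minus the time of exposure. The exposure time and onset times below are hypothetical, and the function name `estimate_incubation` is an assumption.

```python
from datetime import datetime
from statistics import median


def estimate_incubation(exposure, onsets):
    """Return (minimum, median) incubation periods in hours."""
    hours = sorted((t - exposure).total_seconds() / 3600 for t in onsets)
    return hours[0], median(hours)


# Hypothetical picnic lunch and the times at which five guests fell ill.
exposure = datetime(2021, 7, 10, 13, 0)
onsets = [
    datetime(2021, 7, 10, 18, 0),
    datetime(2021, 7, 10, 21, 0),
    datetime(2021, 7, 11, 1, 0),
    datetime(2021, 7, 11, 3, 0),
    datetime(2021, 7, 11, 9, 0),
]

minimum, med = estimate_incubation(exposure, onsets)
print(minimum, med)
```

The two estimates can then be compared with published incubation periods to narrow the list of candidate agents.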
Step 7: Develop a hypothesis
The next step in an investigation is formulating hypotheses, and in
reality, investigators usually begin to generate hypotheses at the time
of the initial telephone call. Depending on the outbreak, the hypotheses
may address the source of the agent, the mode (and vehicle or vector) of
transmission, and the exposures that caused the disease. The hypotheses
should be testable, since evaluating hypotheses is the next step in the
investigation. In an outbreak context, hypotheses are generated in a
variety of ways. First, consider what you know about the disease itself:
What is the agent’s usual reservoir? How is it usually transmitted? What
vehicles are commonly implicated? What are the known risk factors?
In other words, by being familiar with the disease, you can, at the very
least, ‘round up the usual suspects.’
Another useful way to generate hypotheses is to talk to a few of the
case-patients, as discussed in Step 3. The conversations about possible
exposures should be open-ended and wide-ranging, not necessarily
confined to the known sources and vehicles. In some challenging
investigations that yielded few clues, investigators have convened a
meeting of several case-patients to search for common exposures.
In addition, investigators have sometimes found it useful to visit the
homes of case-patients, and look through their refrigerators and shelves
for clues to an apparent foodborne outbreak.
Just as case-patients may have important insights into causes, so too
may the local health department staff. The local staff know the people
in the community and their practices, and often have hypotheses based
on their knowledge.
The descriptive epidemiology may provide useful clues that can be
turned into hypotheses. If the epidemic curve points to a narrow period
of exposure, what events occurred around that time? Why do the
people living in one particular area have the highest attack rate? Why
are some groups with particular age, sex, or other person characteristics
at greater risk than other groups with different person characteristics?
Such questions about the data may lead to hypotheses that can be tested
by appropriate analytic techniques.
Given recent concerns about bioterrorism, investigators should
consider intentional dissemination of an infectious or chemical agent
when trying to determine the cause of an outbreak.
Epidemiological clues to possible bioterrorism
1. Single case of disease caused by an uncommon agent (e.g., glanders,
smallpox, viral haemorrhagic fever, inhalational or cutaneous
anthrax) without adequate epidemiologic explanation
2. Unusual, atypical, genetically engineered strain of an agent (or
antibiotic-resistance pattern)
3. Higher morbidity and mortality in association with a common
disease or syndrome or failure of such patients to respond to usual
therapy
4. Unusual disease presentation (e.g., inhalational anthrax or
pneumonic plague)
5. Disease with an unusual geographic or seasonal distribution (e.g.,
influenza in the summer)
6. Stable endemic disease with an unexplained increase in incidence
(e.g., tularemia, plague)
7. Atypical disease transmission through aerosols, food, or water,
in a mode suggesting deliberate sabotage (i.e., no other physical
explanation)
8. No illness in persons who are not exposed to common ventilation
systems (have separate closed ventilation systems), when illness is seen
in persons in close proximity who have a common ventilation system
9. Several unusual or unexplained diseases coexisting in the same
patient without any other explanation
10. Unusual illness that affects a large population (e.g., respiratory
disease in a large population may suggest exposure to an inhalational
pathogen or chemical agent)
11. Illness that is unusual (or atypical) for a given population or age
group (e.g., outbreak of measles-like rash in adults)
12. Unusual pattern of death or illness among animals (which may be
unexplained or attributed to an agent of bioterrorism) that precedes
or accompanies illness or death in humans
13. Unusual pattern of death or illness among humans (which may be
unexplained or attributed to an agent of bioterrorism) that precedes
or accompanies illness or death in animals
14. Ill persons who seek treatment at about the same time (point source
with compressed epidemic curve)
15. Similar genetic type among agents isolated from temporally or
spatially distinct sources
16. Simultaneous clusters of similar illness in non-contiguous areas,
domestic or foreign
17. Large number of cases of unexplained diseases or deaths
Step 8: Evaluate hypotheses epidemiologically
After a hypothesis that might explain an outbreak has been developed,
the next step is to evaluate the plausibility of that hypothesis. Typically,
hypotheses in a field investigation are evaluated using a combination of
environmental evidence, laboratory science, and epidemiology. From
an epidemiologic point of view, hypotheses are evaluated in one of two
ways: either by comparing the hypotheses with the established facts; or
by using analytic epidemiology to quantify relationships and assess the
role of chance.
The first method is likely to be used when the clinical, laboratory,
environmental, and/or epidemiologic evidence so obviously supports the
hypotheses that formal hypothesis testing is unnecessary. For example,
in an outbreak of hypervitaminosis D that occurred in Massachusetts
in 1991, investigators found that all of the case-patients drank milk
delivered to their homes by a local dairy. Therefore, investigators
hypothesized that the dairy was the source and the milk was the vehicle.
When they visited the dairy, they quickly recognized that the dairy was
inadvertently adding far more than the recommended dose of vitamin
D to the milk. No analytic epidemiology was really necessary to evaluate
the basic hypothesis in this setting or to implement appropriate control
measures, although investigators did conduct additional studies to
identify additional risk factors.
In many other investigations, however, the circumstances are not as
straightforward, and information from the series of cases is not sufficiently
compelling or convincing. In such investigations, epidemiologists use
analytic epidemiology to test their hypotheses. The key feature of analytic
epidemiology is a comparison group. The comparison group allows
epidemiologists to compare the observed pattern among case-patients
or a group of exposed persons with the expected pattern among
non-cases or unexposed persons. By comparing the observed with expected
patterns, epidemiologists can determine whether the observed pattern
differs substantially from what should be expected and, if so, by what
degree. In other words, epidemiologists can use analytic epidemiology
with its hallmark comparison group to quantify relationships between
exposures and disease, and to test hypotheses about causal relationships.
The two most common types of analytic epidemiology studies used in
field investigations are retrospective cohort studies and case-control
studies, as described in the previous Lecture Series.
Retrospective cohort studies
A retrospective cohort study is the study of choice for an outbreak in a
small, well-defined population, such as an outbreak of gastroenteritis
among wedding guests for which a complete list of guests is available.
In a cohort study, the investigator contacts each member of the defined
population (e.g., wedding guests), determines each person’s exposure
to possible sources and vehicles (e.g., what food and drinks each guest
consumed), and notes whether the person later became ill with the
disease in question (e.g., gastroenteritis).
After collecting similar data from each attendee, the investigator
calculates an attack rate for those exposed to (e.g., who ate) a particular
item and an attack rate for those who were not exposed. Generally,
an exposure that has the following three characteristics or criteria is
considered a strong suspect:
1. The attack rate is high among those exposed to the item.
2. The attack rate is low among those not exposed, so the difference or
ratio between attack rates is high.
3. Most of the case-patients were exposed to the item, so that the
exposure could ‘explain’ or account for most, if not all, of the cases.
Commonly, the investigator compares the attack rate in the exposed
group to the attack rate in the unexposed group to measure the association
between the exposure (e.g., the food item) and disease. This ratio is called
the risk ratio or relative risk. When the attack rate for the exposed group is
the same as the attack rate for the unexposed group, the relative risk is equal to 1.0, and
the exposure is said not to be associated with disease. The greater the
difference in attack rates between the exposed and unexposed groups,
the larger the relative risk, and the stronger the association between
exposure and disease.
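The attack rate comparison can be sketched as follows. The wedding data are invented for illustration, and the function names are assumptions. The risk ratio is undefined when no unexposed person became ill, so that case would need separate handling.

```python
def attack_rate(ill: int, total: int) -> float:
    """Proportion of a group that became ill."""
    return ill / total


def risk_ratio(ill_exposed, total_exposed, ill_unexposed, total_unexposed):
    """Attack rate among the exposed divided by attack rate among the unexposed."""
    return (attack_rate(ill_exposed, total_exposed)
            / attack_rate(ill_unexposed, total_unexposed))


# Hypothetical wedding with 60 guests, 35 of whom became ill.
# item: (ill who ate, total who ate, ill who did not eat, total who did not eat)
items = {
    "chicken": (30, 40, 5, 20),
    "rice": (20, 35, 15, 25),
}

for item, (a, n_ate, c, n_not) in items.items():
    print(item,
          round(attack_rate(a, n_ate), 2),
          round(attack_rate(c, n_not), 2),
          round(risk_ratio(a, n_ate, c, n_not), 2))
```

In these invented data, chicken meets all three criteria for a strong suspect, while the attack rates for rice are similar in those who ate it and those who did not.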
Case-control studies
A cohort study is feasible only when the population is well defined and can
be followed over a period of time. However, in many outbreak settings, the
population is not well-defined and the speed of investigation is important.
In such settings, the case-control study becomes the study design of choice.
In a case-control study, the investigator asks both case-patients and a
comparison group of persons without disease (‘controls’) about their
exposures. Using the information about disease and exposure status, the
investigator then calculates an odds ratio to quantify the relationship
between exposure and disease. Finally, a p-value or confidence interval
is calculated to assess statistical significance.
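As a minimal sketch of this calculation, the Python below computes an odds ratio and an approximate 95% confidence interval from a 2 x 2 table. The counts are those given in Question 2 of the Practical Session at the end of this lecture series. The text does not name a method for the confidence interval, so the use of Woolf's logarithmic method here is an assumption, as is the function name `odds_ratio_ci`.

```python
import math


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio and approximate 95% CI (Woolf's log method, an assumed choice).

    a = exposed cases,   b = exposed controls,
    c = unexposed cases, d = unexposed controls.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive for this method")
    odds_ratio = (a * d) / (b * c)
    # Standard error of the natural log of the odds ratio
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Bush meat example: 60 exposed cases, 20 exposed controls,
# 40 unexposed cases, 80 unexposed controls
or_, lo, hi = odds_ratio_ci(60, 20, 40, 80)
print(f"Odds ratio = {or_:.2f}, 95% CI {lo:.2f} to {hi:.2f}")
```

A confidence interval that excludes 1.0 suggests that the association is statistically significant at the 5% level.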
Step 9: Refine and re-evaluate the hypotheses
Unfortunately, analytic studies sometimes are unrevealing. This is
particularly true if the hypotheses were not well founded at the outset.
In field epidemiology, if you cannot generate good hypotheses (for
example, by talking to some case-patients or local staff and examining
the descriptive epidemiology and outliers), then proceeding to analytic
epidemiology, such as a case-control study, is likely to be a waste of time.
When analytic epidemiology is unrevealing, rethink your hypotheses.
Consider convening a meeting of the case-patients to look for common
links or visit their homes to look at the products on their shelves.
Consider new vehicles or modes of transmission.
Even when an analytic study identifies an association between an
exposure and disease, the hypothesis may need to be honed.
Sometimes a more specific control group is needed to test a more specific
hypothesis. For example, in many hospital outbreaks, investigators use an
initial study to narrow their focus. They then conduct a second study, with
more closely matched controls, to identify a more specific exposure or vehicle.
Finally, recall that one reason to investigate outbreaks is research. An
outbreak may provide a ‘natural experiment’ that would be unethical
to set up deliberately, but from which the scientific community can
learn when it does happen to occur.
When an outbreak occurs, whether it is routine or unusual, consider
what questions remain unanswered about that particular disease and
what kind of study you might do in this setting to answer some of those
questions. The circumstances may allow you to learn more about the
disease, its modes of transmission, the characteristics of the agent, host
factors, and the like.
Step 10: Compare and reconcile with laboratory and environmental studies
While epidemiology can implicate vehicles and guide appropriate
public health action, laboratory evidence can confirm the findings.
Environmental studies are equally important in some settings, and they
are often helpful in explaining why an outbreak occurred. While you
may not be an expert in these other areas, you can help. Use a camera to
photograph the environmental conditions. Then, coordinate with the
laboratory, and bring back physical evidence to be analysed.
Step 11: Implement control and prevention measures
In most outbreak investigations, the primary goal is control of
the outbreak and prevention of additional cases. Indeed, although
implementing control and prevention measures is listed towards the end
of the conceptual sequence, in practice control and prevention activities
should be implemented as early as possible. The health department’s first
responsibility is to protect the public’s health, so if appropriate control
measures are known and available, they should be initiated even before
an epidemiologic investigation is launched. For example, a child with measles
in a community with other susceptible children may prompt a vaccination
campaign before an investigation of how that child became infected.
Confidentiality is an important issue in implementing control measures.
Healthcare workers need to be aware of the confidentiality issues relevant
to collection, management and sharing of data. If patient information
is disclosed to unauthorized persons without the patient’s permission,
the patient may be stigmatized or experience rejection from family and
friends, lose a job, or be evicted from housing. Moreover, the healthcare
worker may lose the trust of the patient, which can affect adherence
to treatment. Therefore, confidentiality, the responsibility to protect
a patient’s private information, is critical in disease control and many
other situations.
In general, control measures are directed against one or
more segments in the chain of transmission (agent, source, mode
of transmission, portal of entry, or host) that are susceptible to
intervention. For some diseases, the most appropriate intervention
may be directed at controlling or eliminating the agent at its source.
A patient with a communicable disease such as tuberculosis, whether
symptomatic or asymptomatic, may be treated with antibiotics both to
clear the infection and to reduce the risk of transmission to others. For
an environmental toxin or infectious agent that resides in soil, the soil
may be decontaminated or covered to prevent escape of the agent.
Some interventions are aimed at blocking the mode of transmission.
Interruption of direct transmission may be accomplished by isolation
of someone with infection, or counselling persons to avoid the specific
type of contact associated with transmission. Similarly, to control an
outbreak of influenza-like illness in a nursing home, affected residents
could be cohorted, that is, put together in a separate area to prevent
transmission to others. Vehicle-borne transmission may be interrupted
by elimination or decontamination of the vehicle. For example,
contaminated foods should be discarded, and surgical equipment is
routinely sterilized to prevent transmission. Efforts to prevent
faecal-oral transmission often focus on rearranging the environment to reduce
the risk of contamination in the future and on changing behaviours,
such as promoting hand washing. For airborne diseases, strategies may
be directed at modifying ventilation or air pressure, and filtering or
treating the air. To interrupt vector-borne transmission, measures may
be directed toward controlling the vector population, such as spraying
to reduce the mosquito population.
Some simple and effective strategies protect portals of entry. For
example, bed nets are used to protect sleeping persons from being bitten
by mosquitoes that may transmit malaria.
Some interventions aim to increase a host’s defences. Vaccinations
promote development of specific antibodies that protect against infection.
Similarly, prophylactic use of antimalarial drugs, recommended for
visitors to malaria-endemic areas, does not prevent exposure through
mosquito bites but does prevent infection from taking root.
Step 12: Initiate or maintain surveillance
Once control and prevention measures have been implemented, they
must continue to be monitored. If surveillance has not been ongoing,
now is the time to initiate active surveillance. If active surveillance was
initiated as part of case finding efforts, it should be continued. The
reasons for conducting active surveillance at this time are twofold. First,
you must continue to monitor the situation and determine whether the
prevention and control measures are working. Is the number of new
cases going down? Or are new cases continuing to occur? If so, where
are the new cases? Are they occurring throughout the area, indicating
that the interventions are generally ineffective, or are they occurring
only in pockets, indicating that the interventions may be effective but
that some areas were missed?
Second, you need to know whether the outbreak has spread outside its
original area or the area where the interventions were targeted. If so,
effective disease control and prevention measures must be implemented
in these new areas.
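The monitoring questions in this step (are new cases going down, and where are they occurring?) can be sketched as a simple tally over a line list. The records, area names and week numbers below are hypothetical, and the function names are assumptions made for illustration only.

```python
from collections import Counter

# Hypothetical line list: (week of onset, area of residence) for each new case
line_list = [
    (1, "Area A"), (1, "Area A"), (1, "Area B"), (1, "Area C"),
    (2, "Area A"), (2, "Area B"), (2, "Area C"),
    (3, "Area C"), (3, "Area C"),
]


def weekly_counts(records):
    """Number of new cases per week, in week order."""
    counts = Counter(week for week, _ in records)
    return dict(sorted(counts.items()))


def areas_still_reporting(records):
    """Areas with at least one new case in the most recent week."""
    latest = max(week for week, _ in records)
    return sorted({area for week, area in records if week == latest})


print("New cases per week:", weekly_counts(line_list))
print("Areas with cases in the latest week:", areas_still_reporting(line_list))
```

In this made-up example the weekly total is falling, but the remaining cases cluster in one area, the pattern the text describes as interventions that may be effective but missed a pocket.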
Step 13: Communicate the findings
Developing a communications plan, and communicating what is needed
to those who need to know during the investigation, is
critical. The final task is to summarize the investigation, its findings, and
the outcome in a report, and to communicate this report effectively.
This communication usually takes two forms:
1. An oral briefing for local authorities. If the field investigator is
responsible for the epidemiology but not disease control, then the
oral briefing should be attended by the local health authorities
and persons responsible for implementing control and prevention
measures. Often these persons are not epidemiologists, so findings
must be presented in a clear and convincing fashion with appropriate
and justifiable recommendations for action. The presentation is an
opportunity for the investigators to describe what they did, what
they found, and what they think should be done about it. They should
present their findings in a scientifically objective fashion, and they
should be able to defend their conclusions and recommendations.
2. A written report. Investigators should also prepare a written report
that follows the usual scientific format of introduction, background,
methods, results, discussion, and recommendations. By formally
presenting recommendations, the report provides a basis for
action. It also serves as a record of performance and a document for
potential legal issues, as well as a reference if the health department
encounters a similar situation in the future. Finally, a report that
finds its way into the public health literature serves the broader
purpose of contributing to the knowledge base of epidemiology and
public health.
Questions to stimulate further reading:
1. Discuss two epidemiological study designs helpful in disease
outbreak investigations.
2. Discuss three important attributes of an epidemic curve.
3. With examples, show how good leadership and effective
communication are important in controlling disease
outbreaks.
Practical Session
1. In September 2020, there were heavy rains in Bududa
district leading to landslides, displacing many households from the
mountain slopes down to the flooded flat lands. Roads were made
impassable and food crops were destroyed. Makeshift camps were
set up for the displaced population, while relatives and neighbours
donated food stuffs. Within a week, pregnant women and children
aged <5 years were reported to be particularly affected by a disease
presenting with fever, headaches and abdominal pains. There were
also frequent episodes of diarrhoea.
a) Identify and discuss the role of each relevant sector that you think
can contribute to the mitigation of the effects of this disaster.
b) With examples show how intersectoral collaboration is necessary
to control disease outbreaks and disaster situations.
c) What disease outbreak do you suspect is attacking the pregnant
women and children?
d) List the key steps in investigating the outbreak and suggest
possible control measures.
2. As a surveillance officer attached to Gulu district, you have been
notified by the district health officer that there is a strange disease in
the community that has killed two people. It is mentioned that they
both presented with bleeding tendencies before they died.
a) List the steps you would take to investigate the disease outbreak.
b) After investigations, preliminary data show that more people
have been reported sick with fever, cough, loss of appetite, extreme
weakness of body parts, and bleeding tendencies. Further data show
that people living in households that ate bush meat were more likely
to present with such symptoms and illnesses.
c) In total, 100 patients had been reported at local health units
(60 from households that had eaten bush meat and 40 with no
history of exposure to bush meat); meanwhile, 100 people in the
neighborhood who had not presented with any illness (20 who
ate bush meat and 80 who did not eat bush meat) were included
in the epidemiological analyses.
Describe the type of study design mentioned above and its relevance
to understanding disease epidemiology.
d) Using a 2 x 2 table, calculate the relative risk and the odds ratio.
e) How do you interpret the relative risk and the odds ratio?
f) Using a case definition for Ebola Hemorrhagic Fever, out of
150 confirmed cases, 120 patients died within 5 days of diagnosis.
Calculate the case fatality rate of the disease.
3. In a disease outbreak investigation, line-listing is an essential step.
Describe the type of variables usually recorded and explain how this
technique helps in understanding the epidemic.
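Table 6.0: Answers to Question 2

Exposure factor          Cases      Controls   Total
Ate bush meat            a = 60     b = 20     80
Did not eat bush meat    c = 40     d = 80     120
Total                    100        100        200

Relative risk = [a/(a+b)] / [c/(c+d)] = (60/80)/(40/120) = 0.75/0.333 = 2.25
Odds ratio = ad/bc = (60 x 80)/(20 x 40) = 4,800/800 = 6
Because this is a case-control study, in which the investigator fixes the
numbers of cases and controls, the odds ratio is the appropriate measure of
association; the relative risk is calculated here only as an exercise.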
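The final part of Question 2 asks for the case fatality rate, which the table above does not cover. A minimal sketch of that calculation, using the figures stated in the question (120 deaths among 150 confirmed cases), is given below. The function name `case_fatality_rate` is an assumption introduced for illustration.

```python
def case_fatality_rate(deaths: int, cases: int) -> float:
    """Deaths among confirmed cases, expressed as a percentage."""
    if cases <= 0:
        raise ValueError("number of cases must be positive")
    if not 0 <= deaths <= cases:
        raise ValueError("deaths must lie between 0 and the number of cases")
    return 100 * deaths / cases


# Figures from Question 2: 120 deaths among 150 confirmed cases
cfr = case_fatality_rate(120, 150)
print(f"Case fatality rate = {cfr:.1f}%")
```

With these figures the calculation is 120/150 x 100, that is, 80%.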
Lecture Notes-Series Seven
Criteria for judging a good
research report.
Lecture Outline
1. Background
2. Criteria for judging a good research report
3. Other techniques to evaluate research findings
4. Practical Session
Expectations
After reading this Lecture Series, you should clearly understand the
steps to take to critically review a research report. It is important for
you to identify one research report published in a peer reviewed journal
and attempt to use the criteria to evaluate the findings. In this way, you
will master the skills to interpret research study findings and be able
to make decisions on how to use them.
7.1 Background
Policy makers, program managers, students, and researchers often
encounter numerous research reports from which they seek to extract
knowledge and insights. Policy makers and program managers
usually want compelling evidence on which to base reviews of existing
policies and changes to interventions. They have two vested interests
that are paramount to the health of the population: efficacy and
cost-effectiveness. These help them when confronting politicians on
accountability issues or while requesting budgets to fund new policies and
new interventions. Researchers and students, on the other hand,
have vested interests in expanding the knowledge base and generating
evidence around new treatments and interventions.
In both worlds, compelling evidence is required and scrutiny of a
research report becomes very handy. Below are key steps to scrutinise
and evaluate a research report.
7.2 Criteria for judging a good research report
• When was the work published?
• Where was it published?
• Are the qualifications of the authors appropriate?
• Is the purpose of the study/objectives clearly stated?
• Are methods/experimental design clearly described and appropriate?
• Have all possible influences/confounders on the findings been
  identified and controls instituted?
• Has the sample been appropriately selected?
• Has the reliability of the scoring been appropriately set?
• Are the comparisons between groups appropriate?
• Is the investigation of sufficient duration?
• Is the statistical analysis appropriate to answer the research questions
  or hypotheses?
• Have the research questions or hypotheses been answered?
• Do the interpretations and conclusions logically follow the
  experimental findings?
• Is there a scientific basis for recommending a new therapy?
7.3 Other techniques to evaluate research findings
Ideally the above steps can be summarised into three critical areas:
1. Study conceptualisation: Why was the study conceptualised? Has a
thorough enough literature search been undertaken, to highlight
any major gaps in the existing knowledge or limitations with existing
interventions or new treatments? Are the objectives clearly stated?
Are they measurable? Are the research questions/hypotheses
posed in a way that ensures they can be answered?
2. Methods: Is the research design selected appropriate? Are study
subjects appropriately selected? Have the sample sizes been
calculated well? Is the implementation plan adequate? Have the data
been analysed appropriately to capture the major findings?
3. Results and conclusions: Have the data been summarised to show the
major findings? Have the hypotheses been tested and research
questions answered? Has there been a comparison with previous
findings? Have the strengths and limitations of the study been
discussed, and further areas for research proposed? Have policy
implications been presented? Are conclusions supported by the data?
Practical Session
1. With examples, discuss the type of research study design yielding
results which are likely to convince policy makers, program
managers and practitioners to change to a new treatment of a disease.
2. If you were to design a behavioural intervention to promote early
seeking of routine screening for cancer of the cervix, what study design
would you use and why?
3. Social science research is important in disease prevention and control.
Describe with examples the relevance of the above statement.
4. Retrieve the paper, Mbonye AK, Neema S, Magnussen P.
‘Treatment-seeking practices for malaria prevention in pregnancy
among rural women in Mukono district, Uganda’. J Biosoc Sci. 2006
Mar; 38(2):221-37.
a) Comment on the study design.
b) Basing on the major study findings, propose interventions to
prevent malaria in pregnancy.
c) Discuss the possible study designs to test and evaluate the above
interventions.
LECTURE NOTES
EPIDEMIOLOGY
MADE EASY
Lecture Notes on Epidemiology-Made Easy presents a practical approach
to understanding epidemiology techniques for disease prevention. It is a work
filled with practical insights accumulated from over 20 years of teaching and
professional experience, while controlling infectious and non-infectious
diseases in Uganda. The book is a handy tool for students, lecturers, policy
makers, program managers, and social workers who confront diseases and
health issues daily and would like a quick reference guide with facts on which
to base their decisions.
It is arranged in such a way that the theoretical part is presented
followed by questions and a practical session to stimulate critical thinking. In
this way, it helps to invigorate the practice of alternative thinking, but most
importantly it encourages discussions, preferably in teams to gain consensus
and facilitate problem solving.
Finally, for each topic, worked examples are presented. This is to make
it easy for a student or a researcher to hone their practical skills. It is hoped
that after going through the practical sessions, skills for aiding in
epidemiological research and practice will be developed and mastered.
The reader is encouraged to read more about basic and applied
epidemiology, of which there is a lot of literature, but can also benefit from
reading other books written by the author:
1. Uganda’s Health Sector through Turbulent Politics (1958-2018), 2018
2. How to get a Research Grant, Publish and Influence Policy, 2019
3. Religion, Politics and the Health System in Uganda, 2020
4. Lecture Notes on Health Systems, Policy and Maternal Health in Uganda, 2021
Anthony K Mbonye (PhD, FRCP)
Professor, School of Public Health, College of
Health Sciences, Makerere University & Professor,
Department of Maternal Child Health, Save The
Mothers Programme, Uganda Christian University.