Epidemiology Made Easy: Lecture Notes by Anthony K Mbonye

LECTURE NOTES
EPIDEMIOLOGY MADE EASY
Anthony K Mbonye (PhD, FRCP)
Professor, School of Public Health, College of Health Sciences, Makerere University & Professor, Department of Maternal and Child Health, Save The Mothers Programme, Uganda Christian University.

Published by Anthony K. Mbonye
Tel: +256-772411668
E-mail: akmbonye@musph.ac.ug, akmbonye@yahoo.com
P. O. Box 11853 Kampala, Uganda.

© Mbonye K. Anthony 2021
ISBN: 978-9970-9922-7-0
First Edition 2021

All rights reserved. No part of this publication may be copied or reproduced in any form or by any means electronic, photocopying, or otherwise without the prior written permission of the publisher.

This book can be cited as follows: Mbonye AK. Lecture Notes on Epidemiology-Made Easy. P.O Box 11853, Kampala Uganda, 2021.

Typesetting and production by Print Farm FZE, Dubai. sales@printfarmdxb.com

Table of Contents
Preface
Lecture Series One: Epidemiology-definitions and Concepts
Lecture Series Two: Epidemiology and Research
Lecture Series Three: Analytical Epidemiology
Lecture Series Four: Experimental Epidemiology
Lecture Series Five: Measurement and Reporting of Outcomes
Lecture Series Six: Practical Steps in investigating a disease outbreak
Lecture Series Seven: Criteria for judging a good research report

Preface
This Lecture Series book was prompted by the need for a clear, well organized and structured way for readers, especially students, to understand epidemiology and how to apply it in disease prevention and control. Through my experience teaching undergraduate and postgraduate studies, I found that many students studying epidemiology and biostatistics lack strong foundations in mathematics and statistics. Thus, for a long time I have been interested in making epidemiology easier and more accessible to such an audience, by approaching it from the perspective of daily experiences, where people encounter diseases and health issues.

This book is meant to be a concise guide for nurses, medical and paramedical students, policy makers, program managers, and social workers. This lecture series was developed out of my experience as a lecturer, health professional and a policy maker, who has contributed to the control of infectious and non-infectious diseases for over two decades.

The lecture notes are divided into two sections: the theoretical and the practical sessions. The practical component includes questions and activity sections designed to facilitate active participation and acquisition of practical, analytical and critical thinking skills in disease control and prevention. The exercises also expose the reader to real life situations, preparing them for what lies ahead. I hope this book gives the readers adequate knowledge and skills to confront the high disease burden in Uganda.

Lecture Notes - Series One
Epidemiology Definitions and Concepts

Lecture Outline
1. Definitions and concepts of Epidemiology
2. Descriptive studies
3. Case reports
4. Cross-sectional studies
5. Surveillance

Expectations
After reading this Lecture Series, it is expected that you should clearly understand the definitions and concepts of basic epidemiology and the different types of descriptive studies.
You will be introduced to a practical session that you are encouraged to do. Through this learning, you should master how to interpret research reports based on basic epidemiology techniques and how to use these in public health and control of infections.

1.1 What is Epidemiology?
Epidemiology has been defined as ‘the study of the distribution and determinants of health-related states or events in specific populations, and the application of these data in the control of health problems’ (Last, 1988). This definition emphasizes that epidemiologists are concerned not only with death, illness and disability, but also with more positive health states and with the means to improve health.

The target of a study in epidemiology is a human population. A population can be defined in geographical or other terms; for example, a specific group of hospital patients or factory workers could be the unit of study. The most common population used in epidemiology is that which exists in a given area or country at a given time. This forms the basis for defining subgroups with respect to sex, age group, ethnicity, and so on. The structures of populations vary between geographical areas and time periods. Epidemiological analysis has to take such variations into account.

1.2 Concepts of Epidemiology
Epidemiology has its origins in ideas first recorded over 2,000 years ago by Hippocrates and other prominent thinkers of antiquity, who realised that environmental factors can influence the occurrence of disease. However, it was not until the nineteenth century that the distribution of disease in specific human population groups was measured to any great extent. This work marked not only the beginning of epidemiology, but also some of its most spectacular achievements. For example, John Snow found that the risk of cholera in London was related, among other things, to the drinking of water supplied by a particular company.
Snow’s epidemiological studies were one aspect of a wide-ranging series of investigations that involved an examination of physical, chemical, biological, sociological, and political processes (Cameron and Jones 1983). Snow located the home of each person who died from cholera in London during 1848–49 and 1853–54, and noted an apparent association between the source of drinking water and the deaths. He prepared a statistical comparison of cholera deaths in districts with different water supplies, and thereby showed that both the number of deaths and, more importantly, the mortality rate were high among people supplied by the Southwark company. On the basis of his meticulous research, Snow constructed a theory about the communication of infectious diseases in general, and suggested that cholera was spread by contaminated water. He was thus able to encourage improvements in the water supply long before the discovery of the organism responsible for cholera; his research had a direct impact on public health policy.

1.3 Uses of Epidemiology
In the broad field of public health, epidemiology is used in a number of ways. Early studies in epidemiology were concerned with the causes (aetiology) of communicable diseases, and such work remains essential since it can lead to the identification of prevention methods. In this sense, epidemiology is a basic medical science with the goal of improving the health of a population.

The causation of some diseases can be linked exclusively to genetic factors, as with sickle cell disease, but is more commonly the result of an interaction between genetic and environmental factors. In this context, environment is defined broadly to include any biological, chemical, physical, psychological or other factors that can affect health. Behaviour and lifestyle are of great importance in this connection, and epidemiology is increasingly used to study both their influence and preventive intervention through health promotion.
Epidemiology is also concerned with the course and outcomes (natural history) of diseases in individuals and groups. The application of epidemiological principles and methods to problems encountered in the practice of medicine with individual patients has led to the development of clinical epidemiology.

Epidemiology is often used to describe the health status of population groups. Knowledge of the disease burden in populations is essential for health authorities, which seek to use limited resources to the best possible effect. Epidemiology can be used to identify priority health programmes for prevention and care. In some specialist areas, such as environmental and occupational epidemiology, the emphasis is on studies of populations with particular types of environmental exposure.

Recently, epidemiologists have become involved in evaluating the effectiveness and efficiency of health services, for example by determining the appropriate length of stay in hospital for specific conditions, the value of treating high blood pressure, the efficiency of sanitation measures to control diarrhoeal diseases, and the impact on public health of reducing lead additives in petrol.

1.4 Descriptive Epidemiological Studies

The Five ‘W’ Questions
Traditional descriptive epidemiology has focused on several features: person, place, time, agent, host, and environment. An alternative approach is that of newspaper coverage. Good descriptive research should answer five basic ‘W’ questions – who, what, when, why and where – and an implicit sixth question, so what?

Who has the disease in question? Age and sex are usually described, but other characteristics might be important too, including race, occupation, or recreational activities. The risk of venous thromboembolism, for example, increases exponentially with age. Only 1% of breast cancers occur in men, but a family history of breast cancer increases their risk.
Commercial fishing remains a risky business, and having fun with an all-terrain vehicle or snowmobile, especially when drunk, can be lethal.

What is the condition or disease being studied? Development of a clear, specific, and measurable case definition is an essential step in descriptive epidemiology. Without such a description, the reader cannot interpret the report. Generally, stringent criteria for case definitions are desirable. In the early years of the HIV/AIDS epidemic, expanding the case definition of AIDS yielded a sudden surge in new cases.

Why did the condition or disease arise? Descriptive studies often provide clues about cause that can be pursued with more sophisticated research designs.

When is the condition common or rare? Time provides important clues about health events. The prototype might be the outbreak of gastroenteritis soon after ingestion of staphylococcal toxin. Some temporal relations can be long – e.g., vaginal adenosis and clear cell carcinoma of the vagina appeared years after intrauterine exposure to diethylstilboestrol. Furthermore, cervical and other epithelial cancers develop decades after infection with human papillomavirus, and births and deaths from pneumonia and influenza have regular seasonal patterns, as might sperm counts.

1.5 Types of Descriptive Epidemiological Studies
Descriptive studies consist of two major groups: those that deal with individuals and those that relate to populations. Studies that involve individuals are case reports, case-series reports, cross-sectional studies, and surveillance; whereas ecological correlation studies examine populations.

1.5.1 Case Report
The case report is the least publishable unit in the medical literature. Often, an observant clinician reports an unusual disease or association, which prompts further investigations with more rigorous study designs.
For example, a clinician, among others, reported benign hepatocellular adenomas, a rare tumour, in women who had taken oral contraceptives. A large case-control study pursued this lead and confirmed a strong association between long-term use of high-dose oral contraceptives and this rare, but sometimes deadly, tumour. However, not all case reports deal with serious health threats.

1.5.2 Case-Series Report
A case-series aggregates individual cases in one report. Sometimes the appearance of several similar cases in a short period heralds an epidemic. For example, a cluster of homosexual men in Los Angeles with a similar clinical syndrome alerted the medical community to the AIDS epidemic in North America. Whereas a report of a single unusual case might not trigger further investigation, a case-series of several unusual cases (in excess of what might be expected) adds to the concern. A convenient feature of case-series reports is that they can constitute the case group for a case-control study, which can then explore the causes of a disease.

1.5.3 Cross-sectional (Prevalence) Studies
Prevalence studies describe the health of populations. For example, in Uganda, periodic surveys of the health status of the population are done by the government – e.g., the Demographic and Health Survey and the Uganda Population-based HIV Impact Survey (UPHIA). These studies provide a snapshot of the population at a particular time.

Prevalence studies can be done in smaller populations as well. For example, the results of a survey done in a Puerto Rican pharmaceutical factory indicated an exceptionally high prevalence of gynaecomastia among employees. This finding led to the hypothesis that exposure to ambient oestrogen dust in the plant might be the cause; serum concentrations of oestrogen lent support to the hypothesis. After improvements in dust control in the factory, the epidemic disappeared.
Similar prevalence studies have linked gynaecomastia with feeding of refugees and tainted food.

Since both exposure and outcome are ascertained at the same time (the defining feature of a cross-sectional study), costs are small and loss to follow-up is not a problem. However, because exposure and outcome are identified at one time point, the temporal sequence is often impossible to work out.

1.5.4 Surveillance
Surveillance is another important type of descriptive study. Surveillance can be thought of as watchfulness over a community. A more formal definition is ‘the ongoing systematic collection, analysis and interpretation of health data essential to the planning, implementation, and evaluation of public health practice, closely integrated with the timely dissemination of these data to those who need to know’. Prevention and control of the problem are fundamental parts of the feedback loop.

Surveillance can be either active or passive. Passive surveillance relies on data generally gathered through traditional channels, such as death certificates. By contrast, active surveillance searches for cases. The reporting of abortion-related deaths provides an example. By comparison with official statistics, active surveillance identifies about twice as many deaths. Similarly, underreporting of maternal deaths remains an international problem.

Epidemiological surveillance has made important contributions to health, but none more impressive than smallpox eradication. Surveillance and containment were responsible for the elimination of smallpox from the world, an extraordinary public-health achievement. Whereas mass immunisation of the world’s population had failed, the approach of identification of cases through surveillance and then immunisation of susceptible persons in the surrounding communities stopped transmission. Without a non-human vector, the virus died out.

Practical Session:
1. A new laboratory test has been designed to test for Hepatitis B.
It is cheap (0.5 $) and can easily be afforded by citizens in developing countries like Uganda. However, it needs to be evaluated against the Gold-Standard, the PCR test. Describe two epidemiological parameters you will use to assess the performance of the new test.
2. List two uses of epidemiology and discuss how you can use epidemiology to improve the health of children in your community.

Bibliography
1. Beaglehole R, Bonita R, Kjellström T. Basic epidemiology. Geneva: World Health Organization; 1993 Jan.
2. Cameron D, Jones IG. ‘John Snow, the Broad Street pump and modern epidemiology’. International Journal of Epidemiology. 1983 Jan 1;12(4):393-6.
3. Grimes DA, Schulz KF. ‘An overview of clinical research: the lay of the land’. The Lancet. 2002 Jan 5;359(9300):57-61.
4. Grimes DA, Schulz KF. ‘Cohort studies: Marching towards outcomes’. The Lancet. 2002 Jan 26;359(9304):341-45.
5. Last JM. ‘What is “clinical epidemiology?”’. Journal of Public Health Policy. 1988 Jul 1;9(2):159-63.
6. Schulz KF, Grimes DA. ‘Descriptive studies: What they can and cannot do’. The Lancet. 2002 Jan 12;359(9304):145-9.
7. Schulz KF, Grimes DA. ‘Case-Control studies: Research in Reverse’. The Lancet. 2002 Feb 2;359(9304):431-34.

Lecture Notes - Series Two
Epidemiology and Research

Lecture Outline
1. Classification of research
2. What studies can and cannot do
3. Cross-Section Studies
4. Cohort Study
5. Case Control Study
6. Non-Randomised Trials
7. Randomised studies
8. Areas for further research

Expectations
After reading this Lecture Series, it is expected that you should clearly understand the different types of research design, as well as what type of research design is applied where and when. Through this learning, you should master how to interpret research study findings and learn how to rapidly identify key points and insights when reading study reports.
2.1 Classification of Research
Most research can be grouped into two extensive categories: experimental and observational research. Figure 2.1 shows that one can quickly decide the type of research category by noting whether the investigators assigned the exposure – e.g., treatments – or whether they observed usual clinical practice or population behaviour and practices. For experimental studies, one needs to distinguish whether the exposures were assigned by a truly random technique (with concealment of the upcoming assignment from those involved) or whether some other allocation scheme was used, such as alternate assignment.

With observational studies, which dominate the literature, the next step is to ascertain whether the study has a comparison or control group. If it has a comparison or control group, the study is termed analytical. If not, it is termed a descriptive study. If the study is analytical, the temporal direction of the study needs to be identified.

If the study determines both exposures and outcomes at one time point, it is termed cross-sectional. An example would be measurement of blood pressure of men admitted to a hospital with acute onset chest pain versus their next door neighbours. This type of study provides a snapshot of the population of sick and well at one time point.

If the study begins with an exposure – e.g., condom use – and follows men for a few years to measure outcomes – e.g., occurrence of sexually transmitted diseases (STDs) – then it is deemed a cohort study. Cohort studies can be either concurrent or non-concurrent. By contrast, if the analytical study begins with an outcome – e.g., an STD – and looks back in time for an exposure, such as condom use, then the study is a case-control study.

[Figure 2.1: Classification of types of clinical research. Flow chart: Did the investigator assign the exposure? Yes: experimental study, then random allocation? (yes: randomised controlled trial; no: non-randomised controlled trial). No: observational study, then comparison group? (no: descriptive study; yes: analytical study, classified by direction: exposure to outcome, cohort study; outcome to exposure, case-control study; exposure and outcome at the same time, cross-sectional study). Source: Grimes & Schulz, 2002.]

Studies without comparison groups are called descriptive studies. At the bottom of the research hierarchy is the case report. When more than one patient is described, it becomes a case-series report.

2.2 What studies can and cannot do
Is the study design appropriate for the question? Starting at the bottom of the research hierarchy, descriptive studies are often the first foray into a new area of medicine. Investigators do descriptive studies to describe the frequency, natural history, and possible determinants of a condition. The results of these studies show how many people develop a disease or condition over time, describe the characteristics of the disease and those affected, and generate hypotheses about the cause of the disease. These hypotheses can be assessed through more rigorous research, such as analytical studies or randomised controlled trials. An example of a descriptive study would be the early reports of hepatitis B disease and the yellowing-of-eyes syndrome.

An important caveat (often forgotten or intentionally ignored) is that descriptive studies, which do not have a comparison group, do not allow assessment of associations. Only comparative studies (both analytical and experimental) enable assessment of possible causal associations.

2.3 Cross-Sectional Studies
Sometimes termed frequency surveys or prevalence studies, cross-sectional studies are done to examine the presence or absence of an outcome and of an exposure at a particular time. Thus, prevalence is the focus. Since both outcome and exposure are ascertained at the same time, the temporal relation between the two might be unclear.
For example, assume that a cross-sectional study finds obesity to be more common among women with arthritis compared to those without arthritis. Did the extra weight load on joints lead to arthritis, or did women with arthritis become involuntarily inactive and then obese? This type of question is unanswerable in a cross-sectional study.

2.4 Cohort Studies
Cohort studies proceed in a logical sequence: from exposure to outcome. Hence, this type of research is easier to understand than case-control studies. Investigators identify a group with an exposure of interest and another group or groups without the exposure. The investigators then follow the exposed and unexposed groups forward in time to determine outcomes. If the exposed group develops a higher incidence of the outcome than the unexposed, then the exposure is associated with an increased risk of the outcome.

The cohort study has important strengths and weaknesses. Because exposure is identified at the outset, one can assume that the exposure preceded the outcome. Recall bias is less of a concern than in the case-control study. The cohort study enables calculation of true incidence rates, relative risks, and attributable risks. However, for the study of rare events or events that take years to develop, this type of research design can be slow to yield results and thus prohibitively expensive. Nonetheless, several famous large cohort studies continue to provide important information.

[Figure 2.2: Temporal direction of three study designs. Along a time axis, the cohort study runs forward from exposure to outcome, the case-control study runs backward from outcome to exposure, and the cross-sectional study measures exposure and outcome at the same time. Source: Grimes & Schulz, 2002.]

2.5 Case Control Studies
Case-control studies work backwards. Because thinking in this direction is not intuitive for clinicians, case-control studies are often widely misunderstood.
Starting with an outcome, such as a disease, this type of study looks backward in time for exposures that might have caused the outcome. As shown in figure 2.2, investigators define a group with an outcome (for example, ovarian cancer) and a group without the outcome (controls). Then, through chart reviews, interviews, or other means, the investigators ascertain the prevalence (or amount) of exposure to a risk factor – e.g., oral contraceptives or ovulation induction drugs – in both groups. If the prevalence of the exposure is higher among cases than among controls, then the exposure is associated with an increased risk of the outcome.

Case-control studies are especially useful for outcomes that are rare or that take a long time to develop, such as cardiovascular disease and cancer. These studies often require less time, effort and money than would cohort studies. The challenge with case-control studies is choosing an appropriate control group. Controls should be similar to cases in all important respects except for not having the outcome in question. Inappropriate control groups have ruined many case-control studies and caused much harm. Additionally, recall bias (better recollection of exposures among the cases than among the controls) is a persistent difficulty in studies that rely on memory. Because the case-control study lacks denominators, investigators cannot calculate incidence rates, relative risks or attributable risks. Instead, odds ratios are the measure of association used; when the outcome is uncommon – e.g., most cancers – the odds ratio provides a good proxy for the true relative risk.

Outbreaks of food-borne diseases act as a good prototype for demonstrating the value of case-control studies. Those with vomiting and diarrhoea are asked about food exposures, as are a sample of those not ill. If a higher proportion of those ill report having eaten a food than those well, the food becomes suspect.
In this way, German potato salad on a ship was linked with a serious outbreak of shigella resistant to several antibiotics.

2.6 Non-Randomised Trials
Some experimental trials do not randomly allocate participants to exposures – e.g., treatments or prevention strategies. Instead of using truly random techniques, investigators often use methods that fall short of the mark – e.g., alternate assignment. The US Preventive Services Task Force and Canadian Task Force on the Periodic Health Examination designate this research design as class II-1, indicating less scientific rigour than randomised trials but more than analytical studies.

After the investigators have assigned participants to treatment groups, the way a non-randomised trial is done and analysed resembles that of a cohort study. The exposed and unexposed are followed forward in time to ascertain the frequency of outcomes. Advantages of a non-randomised trial include use of a concurrent control group and uniform ascertainment of outcomes for both groups. However, selection bias can occur.

2.7 Randomised Controlled Trials
The randomised controlled trial is the only known way to avoid selection and confounding biases in clinical research. This design approximates the controlled experiment of basic science. It also resembles the cohort study in several respects, with the important exception of randomisation of participants to exposures (figure 2.2).

The hallmark of randomised controlled trials is assignment of participants to exposures purely by the play of chance. Randomised controlled trials reduce the likelihood of bias determining outcomes. When properly implemented, random allocation precludes selection bias. Trials feature uniform diagnostic criteria for outcomes, and blinding those involved to the exposure each participant is receiving reduces information bias. A unique strength of this study design is that it eliminates confounding bias, both known and unknown.
Furthermore, the trial tends to be statistically efficient. If properly designed and done, a randomised controlled trial is likely to be free of bias and is thus especially useful for examination of small or moderate effects. In observational studies, bias might easily account for small to moderate differences. 18 Anthony K Mbonye Randomised controlled trials have drawbacks as well, however. External validity is one. Whereas the randomised controlled trial, if properly done, has internal validity – i.e., it measures what it sets out to measure – it might not have external validity. This term indicates the extent to which results can be generalised to the broader community. Unlike the observational study, the randomised controlled trial includes only volunteers who pass through a screening process before inclusion. Those who volunteer for trials tend to be different from those who do not; for example, their health might be better. Another limitation is that a randomised controlled trial cannot be used in some instances, since intentional exposure to harmful substances – e.g., toxins, bacteria, or other noxious exposures – would be unethical. As with cohort studies, the randomised controlled trial can be prohibitively expensive. Indeed, the cost of large trials runs into the tens of millions of US dollars. Questions to stimulate further reading: 1. Uganda reported an outbreak of COVID-19 in March 2020. Several interventions have been implemented to stop the spread of the disease. These include lockdowns, hand washing and wearing masks. Other interventions have been implemented at the level of health facilities to improve treatment of COVID-19 patients. a) What study design would you implement to assess the level knowledge of the population on COVID-19 prevention? And why? b) What study design would you implement to study perceptions, opinions and behavioural practices towards COVID-19 prevention? 2. 
Pfizer has developed a vaccine against COVID-19, what study design would you implement to evaluate the efficacy of the vaccine? 3. Discuss the merits and limitations of the study designs in 1 and 2 above. Epidemiology-Made Easy 19 Practical Session: 1. A disease outbreak has been reported in Entebbe town. Five patients with a dry cough, fever, sore throat, hoarse voice and lack of smell with a history of having recently returned from travel abroad, and they have reported to health units around Entebbe town in the last 2 days. Three of the patients were having difficulty in breathing and needed to be given oxygen at Entebbe Grade B hospital. a) Discuss three steps in your immediate plan to investigate the disease outbreak. b) What study design is appropriate at this moment? c) What further investigations are you likely to carry out? 2. Two students, a masters and PhD, were designing studies as part of their degree programs. The master’s student wanted to find out whether young girls exposed to contraceptives were at less risk of having unwanted pregnancies and abortions. The PhD student wanted to assess the effect of long term contraceptives use on the risk of ovarian cancer. a) Discuss with examples, three types of study designs the masters student could use for the study. b) Discuss with examples, two study designs the PhD student could use to test his/her hypothesis (s). Bibliography 1. Beaglehole R, Bonita R, Kjellström T. Basic epidemiology. Geneva: World Health Organization; 1993 Jan. 2. Grimes DA, Schulz KF. ‘An overview of clinical research: the lay of the land’. The Lancet. 2002 Jan 5;359(9300):57-61. 3. Grimes DA, Schulz KF. ‘Bias and causal associations in observational research’. The Lancet. 2002 Jan 19;359(9302):248-52. 4. Schulz KF, Grimes DA. ‘Descriptive studies: What they can and cannot do’. 2002 Jan 12;359(9304):145-9. 20 Anthony K Mbonye Lecture Notes-Series Three Analytical Epidemiology Lecture Outline 1. Case Control Studies 2. 
Cohort Control Studies Expectations After reading this Lecture Series, it is expected that you should clearly understand what is analytical epidemiology and the different types of analytical studies. You will be introduced to a practical session that you are encouraged to do. Through this learning, you should master how to interpret research reports based on analytical epidemiology and how to use these in public health and control of infections. 3.1 Case-Control Studies Case-control studies contribute greatly to the research toolbox of an epidemiologist. They embody the strengths and weaknesses of observational epidemiology. Moreover, epidemiologists use them to study a huge variety of associations. The strength of case-control studies can be appreciated in early research done by investigators hoping to understand the cause of AIDS. Casecontrol studies identified risk groups – e.g., homosexual men, intravenous drug users, and blood transfusion recipients – and risk factors – e.g., multiple sex partners, receptive anal intercourse in homosexual men, and not using condoms. Based on such studies, blood banks restricted high risk individuals from donating blood and educational programmes began to promote safer behaviours. As a result of these precautions, the rate of HIV-1 transmission was greatly reduced, even before the virus had been identified. By comparison with other study designs, case-control studies can yield important findings in a relatively short time, and with relatively little money and effort deployed. This apparently quick road to research results entices many newly trained epidemiologists. However, caseEpidemiology-Made Easy 21 control studies tend to be more susceptible to biases compared to other analytical, epidemiological designs. 
Rothman et al. (2008) comment that: ‘because it need not be extremely expensive nor time consuming to conduct a case-control study, many studies have been conducted by would-be investigators who lack even a rudimentary appreciation for epidemiologic principles. Occasionally such haphazard research can produce fruitful or even extremely important results, but often the results are wrong because basic research principles have been violated.’

3.1.2 A Case-Control study design
Case-control study designs might seem easy to understand, but many clinicians stumble over them. Because this type of study runs backwards by comparison with most other studies, it often confuses researchers and readers alike. In cohort studies, for example, study groups are defined by exposure. In case-control studies, however, study groups are defined by outcome. To study the association between smoking and lung cancer, therefore, people with lung cancer are enrolled to form the case group, and people without lung cancer are identified as controls. Researchers then look back in time to ascertain each person’s exposure status (smoking history), hence the retrospective nature of this study design.

[Figure 3.1: Case-control study design. In the present, a sample of cases is drawn from the population with the outcome and a sample of controls from the population without the outcome; looking back in time, each group is divided into those with and without past exposure. Source: Schulz & Grimes, 2002]

Investigators compare the frequency of smoking exposure in the case group with that in the control group, and calculate a measure of association. Unlike cohort studies, case-control studies cannot yield incidence rates. Instead, they provide an odds ratio, derived from the proportion of individuals exposed in each of the case and control groups.
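The comparison of exposure between cases and controls can be sketched in a few lines of Python. This is only an illustration and is not taken from the original notes: the function name `exposure_odds_ratio` and all counts (100 cases and 100 controls, with 60 and 30 smokers respectively) are hypothetical, chosen to make the arithmetic easy to follow.

```python
def exposure_odds_ratio(exposed_cases, unexposed_cases,
                        exposed_controls, unexposed_controls):
    """Odds of exposure among cases divided by odds of exposure among controls."""
    odds_in_cases = exposed_cases / unexposed_cases
    odds_in_controls = exposed_controls / unexposed_controls
    return odds_in_cases / odds_in_controls


# Hypothetical counts (assumed for illustration only).
cases_smokers, cases_nonsmokers = 60, 40        # people with lung cancer
controls_smokers, controls_nonsmokers = 30, 70  # people without lung cancer

# Proportion exposed in each group: the quantities a case-control study observes.
prop_cases = cases_smokers / (cases_smokers + cases_nonsmokers)
prop_controls = controls_smokers / (controls_smokers + controls_nonsmokers)

odds_ratio = exposure_odds_ratio(cases_smokers, cases_nonsmokers,
                                 controls_smokers, controls_nonsmokers)

print(f"Proportion exposed among cases:    {prop_cases:.2f}")
print(f"Proportion exposed among controls: {prop_controls:.2f}")
print(f"Odds ratio: {odds_ratio:.2f}")
```

Note that nothing in this sketch estimates incidence: the ratio of cases to controls is fixed by the investigator, not by how often the disease occurs in the population.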
When the incidence rate of a particular outcome in the population of interest is low (under 5% in both the exposed and the unexposed usually suffices), the odds ratio from a case-control study is a good estimate of the relative risk.

3.1.3 Advantages and Disadvantages of case-control designs
Epidemiologists often tout case-control studies as the most efficient design in terms of time, money, and effort. This assertion makes sense when the incidence rate of an outcome is low, since in a cohort design the researchers would have to follow up many individuals to identify one with the outcome. Case-control studies are also efficient in the investigation of diseases that have a long latency period (e.g., cancer), in which instance a cohort study would involve many years of follow-up before the outcome became evident. On the other hand, many methodological issues affect the validity of the results of case-control studies, and two factors in particular (choosing a control group and obtaining exposure history) can greatly affect a study’s vulnerability to bias.

Selection of case and control groups:

3.1.4 Case Group
All the cases from a population could theoretically be included as participants in a case-control study. For practical reasons, however, often only a sample is studied. Investigators should, therefore, state how the sample was selected, providing a clear definition of the outcome being studied including, for example, clinical symptoms, laboratory results, and diagnostic methods used. Furthermore, researchers should detail the eligibility criteria used for selection, such as age range and location (clinic, hospital, population-based). Finally, they should gather data preferably from incident (new) rather than prevalent (both old and new) cases: since diagnostic patterns change over time, recent diagnoses are likely to be more consistent than those obtained from different periods.
3.1.5 Control Group
Controls should be free of the disease (outcome) being studied, but should be representative of the population at risk of becoming cases, that is, of those individuals who would have been selected as cases had they developed the disease. Selection of controls must be independent of the exposure being investigated. When investigators consider potential control groups, they must anticipate all the potential biases that could arise, making this task one of the hardest in epidemiology.

Suppose investigators selected, as cases, individuals with myocardial infarction from the cardiology ward of a large city hospital that serves the entire state, but identified people without infarction from the emergency medicine ward, which serves only the city. Unfortunately, the exposure history of patients from the city would not usually reflect accurately that of patients statewide. For example, the exposure of interest (e.g., a new blood pressure drug) might not be available to patients in outlying areas of the state but be commonly prescribed in the city. In this example, therefore, either the controls should be chosen from the entire state, like the cases, or the investigators should exclude all individuals who lived outside the local community served by the emergency medicine ward.

Moreover, controls should be selected independently of exposure. Assume that this new antihypertensive drug causes drowsiness and slows reaction time. Such side effects might lead to automobile accidents, with injured drivers entering the emergency medicine department. Thus, the investigator’s control group would include an abnormally high proportion of individuals exposed to the new antihypertensive, giving a biased comparison with the case group.

3.2 Cohort Studies
The term cohort has military, not medical, roots. A cohort was a 300-600 man unit in the Roman army; ten cohorts formed a legion.
Thus a cohort study consists of bands or groups of persons marching forward in time from an exposure to one or more outcomes. This analogy might be helpful, since cohort studies have confusing synonyms: incidence, longitudinal, forward-looking, follow-up, concurrent, and prospective study. Although the terminology can seem daunting, the cohort study is easy for clinicians to understand, since it flows in a logical direction (unlike the case-control study).

3.2.1 Data Collection
A cohort study follows up two or more groups from exposure to outcome. In its simplest form, a cohort study compares the experience of a group exposed to some factor with that of another group not exposed to the factor. If the exposed group has a higher or lower frequency of an outcome than the unexposed group, then an association between exposure and outcome is evident.

The defining characteristic of all cohort studies is that they track people forward in time from exposure to outcome. Researchers doing this kind of study must, therefore, either go forward in time from the present or go back in time to choose their cohorts. Either way, a cohort study moves in the same direction, although gathering data might not. For example, an investigator who wants to study the epidemic of multiple births stemming from assisted reproductive technologies could begin a cohort study now. Women exposed to these technologies, and a similar group who conceived naturally, could be tracked forward through their pregnancies to monitor the frequency of multiple births (a concurrent cohort study). Alternatively, the investigator might use existing medical records and go back in time several years to identify women exposed and not exposed to these technologies. The investigator would then track them forward through records to note the birth outcomes. Again, the study moves from exposure to outcome, though the data are collected after the events have occurred (a retrospective cohort study).

3.2.2 Advantages of Cohort Studies
Cohort studies have many appealing features.
They are the best way to ascertain both the incidence and the natural history of a disease. The temporal sequence between putative cause and outcome is usually clear: the exposed and unexposed can often be seen to be free of the outcome at the outset. By contrast, this chicken-and-egg question often frustrates cross-sectional and case-control studies. For example, in a case-control study, patients with chronic widespread pain were more likely to have mental illness than controls. Do mood and anxiety disorders increase the risk of chronic pain, or do patients with chronic pain develop mood and anxiety disorders as a result of their disorder?

Cohort studies are useful for investigating multiple outcomes that might arise after a single exposure. An illustrative case would be cigarette smoking (the exposure) and stroke, emphysema, oral cancer and heart disease (the outcomes). Although assessment of many outcomes is often cited as a positive attribute of cohort studies, this feature can be abused. For example, testing the associations between an exposure and many outcomes, but reporting only the significant ones, is misleading science. Investigators should preferably have planned primary and secondary associations to examine (sometimes called hypothesis confirmation). Although investigators can look at other outcomes (hypothesis generation), they should report the findings of all examinations, not just the significant ones, so that readers can correctly interpret the results.

The cohort design is also useful in the study of rare exposures: a researcher can often recruit people with uncommon exposures (e.g., to ionising radiation or chemicals) in the workplace. A hospital or factory might provide a large number of individuals with the exposure of interest, which would be rare in the general population. Since the investigator does not assign the exposure, the ethical concerns of deliberately exposing people do not arise.

Cohort studies also reduce the risk of survivor bias.
Diseases that are rapidly fatal are difficult to study because of this factor. For example, a hospital-based case-control study of the link between snow-shovelling and myocardial infarction would miss all those who died in the driveway. A cohort study would be a less biased (but more cumbersome) approach: compare rates of myocardial infarction among those who shovel and those who do not shovel.

Finally, cohort studies allow calculation of incidence rates, relative risks, and confidence intervals. Other outcome measures in cohort studies include life table rates, survival curves, and hazard ratios. By contrast, case-control studies cannot provide incidence rates; their odds ratios approximate the relative risk only when the outcome is uncommon.

3.2.3 Disadvantages of Cohort Studies
Cohort studies also have important limitations. Selection bias is built into cohort studies. For example, in a cohort study investigating the effects of jogging on cardiovascular disease, those who choose to jog probably differ in other important ways (such as diet and smoking) from those who do not exercise. In theory, both groups should be the same in all important respects, except for the exposure of interest.

The cohort design is also not optimum for rare diseases or those that take a long time to develop (e.g., cancer). However, several large (and thus expensive) cohort studies have made landmark contributions to our knowledge of uncommon diseases.

Keeping participants in follow-up can be difficult, even over 1 month, and particularly so in longitudinal studies that continue for decades. Furthermore, differential losses to follow-up between those exposed and unexposed can bias results. Over time, the exposure status of study participants can change; for example, a proportion of women who use oral contraceptives will switch to an intrauterine device, and vice versa. In such events, partitioning of the follow-up by exposure might be needed to avoid a blurring of exposure, sometimes termed contamination.

3.2.4 What to Look for with Cohort Studies?

Who is at risk?
All participants (both exposed and unexposed) in a cohort study must be at risk of developing the outcome. For example, since women who have had a tubal sterilisation operation have almost no risk of salpingitis, they should not be included in cohort studies of pelvic inflammatory disease.

Who is exposed?
Cohort studies need a clear, unambiguous definition of the exposed in the cohort. This definition sometimes involves quantifying the exposure by degree, rather than just yes or no. For example, the minimum exposure might have to be at least 14 cigarettes per day, or 3-6 months of oral contraceptive use. Defining exposure levels in this way can result in more than two groups, e.g., non-smokers, light smokers and heavy smokers.

Who is an appropriate control?
The key notion is that controls (the unexposed) should be similar to the exposed in all the important respects, except for the lack of exposure. If so, the unexposed group will reveal the background rate of the outcome in the community.

The unexposed group can come from either internal sources (persons from the same time and place, such as a hospital ward) or external sources. Internal comparisons are most desirable. In a particular population, individuals segregate themselves (or are segregated through medical interventions) by exposure status, e.g., cigarette smoking, occupation, contraception. For example, in a cohort study, 138 patients with HIV-1 associated Kaposi’s sarcoma were divided into two groups: those with oral and those with cutaneous lesions. The presence of oral lesions (the exposure) carried a poorer prognosis, with a median survival (the outcome) one-third that of the other group.

If satisfactory internal controls are not available, researchers look elsewhere (sometimes termed a double cohort study). In a study of an occupational exposure, finding an adequate number of employees in the factory without the exposure might be difficult.
Hence, one might choose workers in a similar factory in the same community. This choice assumes that workers in the other factory have the same baseline risk of the outcome in question, which might not be the case.

Even less desirable is the use of population norms: disease-specific mortality rates are an example. A researcher might compare lung cancer death rates among workers in the factory with rates among persons of the same age and sex in the general population. Bias inevitably creeps into such comparisons because of the healthy worker effect: those who work are healthier, in general, than those who do not (or cannot) work. Additionally, the desire to reap economic benefits from a certain outcome might further bias comparisons.

Assessment of outcomes
Outcomes must be defined in advance: they should be clear, specific and measurable. Identification of outcomes should be comparable in every way for the exposed and unexposed to avoid bias. Failure to define outcomes leads to uninterpretable results. Keeping those who judge outcomes unaware of the exposure status of participants (blinding) in a cohort study is important for subjective outcomes, such as tenderness or erythema. By contrast, blinding matters less with objective outcome measures.

Outcome data can come from many sources. For mortality studies, death certificates are often used. Although they are convenient, the validity of their clinical information is highly variable. For non-fatal outcomes, sources include hospital charts, insurance records, laboratory records, disease registries, hospital discharge logs, and physical examination and measurement of participants. Optimally, the person who judges outcomes should be unaware of the exposure. When diagnoses vary in their confidence, assignment of levels of assurance might be helpful, such as definite, probable, and suspect.
3.2.5 Tracking participants over time

How to minimise loss to follow-up
Although loss of participants damages the power and precision of a study, differential loss to follow-up is more problematic. If the likelihood of loss to follow-up is related to both exposure and outcome, then bias can result. For example, some participants given a new antibiotic might have such poor outcomes that they are unable to complete questionnaires or to return for examination. Their disappearance from the cohort would make the new antibiotic look better than it is.

The best way of dealing with loss to follow-up is to avoid it. For example, restrict participation to those judged likely to complete the study. Obtaining, at the start of the study, the names of several family members or friends who do not live with the respondent is often helpful. The participant’s family doctor might also be helpful. Should the respondent move, these contacts would probably know the new address. Motor vehicle registration records can be useful in such instances too. Furthermore, national vital statistics registries facilitate follow-up. Participants can be offered financial compensation for time lost from work as a result of the study. Diligent tracking of participants is hard work, and might require hiring personnel for this task alone.

3.2.6 Reporting Cohort Studies
Many researchers who conduct cohort studies report their findings in an unsatisfactory way. An investigator’s first challenge is to convince the editor (and the readers) that the exposed and unexposed groups were indeed similar in all important respects, except for the exposure. The first table in reports of cohort studies customarily provides demographic and other prognostic factors for both groups, with hypothesis testing (P values) to show the likelihood that observed differences could be due to chance.
For dichotomous outcome measures, such as being sick or feeling well, the investigator should provide raw data sufficient for the reader to confirm the results. For cumulative incidence, the investigator should calculate the proportion who developed the outcome during the specified study interval. For incidence rates, the value is expressed per unit of time. Then relative risks and confidence intervals should be provided. P values should not replace interval estimation (relative risk with confidence intervals), and should be used only as supplementary information.

Like other observational studies, cohort studies have built-in bias. Investigators should identify potential biases in their data and show how these might have affected the results. Whenever possible, confounding factors should be discussed in detail.

3.3 Controlling for Confounding
Case-control studies need to address confounding bias. This type of bias can be dealt with in the design phase by restriction or matching, but researchers generally prefer to handle it in the analysis phase with analytical techniques such as logistic regression or stratification with Mantel–Haenszel approaches. If this second approach is used, investigators should plan carefully in advance which potentially confounding variables to obtain data for; irrespective of the analytical approach used, researchers cannot control for a variable for which they have no data.

Calculating Odds Ratio and Relative risk

What is an Odds Ratio?
An odds ratio is defined as the odds of the event in the exposed group divided by the odds of the event in the non-exposed group. If the data are set up in a 2 x 2 table as shown in Table 3.3, then the odds ratio is (a/b) / (c/d) = ad/bc. Odds ratios are commonly used to report case-control studies. The odds ratio helps identify how likely an exposure is to lead to a specific event. The larger the odds ratio, the higher the odds that the event will occur with exposure.
Odds ratios smaller than one imply that the event has lower odds of happening with the exposure. The following example demonstrates the calculation of the odds ratio (OR):

Practical Session
If we have a hypothetical group of smokers (exposed) and non-smokers (not exposed), then we can look for the occurrence of lung cancer (event). If 17 smokers have lung cancer, 83 smokers do not have lung cancer, one non-smoker has lung cancer, and 99 non-smokers do not have lung cancer, the odds ratio is calculated as follows.

First, we calculate the odds in the exposed group.
• Odds in exposed group = (smokers with lung cancer) / (smokers without lung cancer) = 17/83 = 0.205
Next, we calculate the odds for the non-exposed group.
• Odds in not exposed group = (non-smokers with lung cancer) / (non-smokers without lung cancer) = 1/99 = 0.0101
Finally, we can calculate the odds ratio.
• Odds ratio = ad/bc (Table 3.2) = (odds in exposed group) / (odds in not exposed group) = 0.205 / 0.0101 = 20.3

Thus, using the odds ratio, this hypothetical group of smokers has about 20 times the odds of having lung cancer compared with non-smokers. The question then arises: is this significant?

Table 3.2: Risk of smoking and lung cancer
Exposure factor    Cases     Controls    Total
Smokers            a = 17    b = 83      100
Non-smokers        c = 1     d = 99      100
Total              18        182         200

Odds Ratio and Confidence Intervals
To examine whether this finding is significant, the confidence interval needs to be calculated. The confidence interval gives a range within which the true odds ratio for the population is expected to fall. If we estimate the odds of lung cancer in smokers versus non-smokers in the general population from a smaller sample, the true population odds ratio may differ from the odds ratio found in the sample. In order to calculate the confidence interval, the alpha, or the level of significance, is specified.
An alpha of 0.05 corresponds to a 95% (1 – alpha) confidence interval, that is, a range expected to contain the true odds ratio of the overall population with 95% confidence. A confidence level of 95% is traditionally chosen in the medical literature (but other confidence levels can be used). The formulae for calculating confidence intervals can be complex, and the calculations are usually done with readily available statistical computer packages.

If the confidence interval for the odds ratio includes the number 1, then the calculated odds ratio is not considered statistically significant. This can be seen from the interpretation of the odds ratio. An odds ratio greater than 1 implies that the odds of the event happening are greater in the exposed than in the non-exposed group. An odds ratio of less than 1 implies that the odds of the event happening are lower in the exposed than in the non-exposed group. An odds ratio of exactly 1 means that the odds of the event happening are the same in the exposed and the non-exposed groups. Thus, if the confidence interval includes 1 (e.g., [0.01-2], [0.99-1.01], and [0.99-100] all include 1), then the true population odds ratio may be above or below 1, so it is uncertain, at the specified level of confidence, whether the exposure increases or decreases the odds of the event happening.

The odds ratio can be confused with the relative risk. As stated above, the odds ratio is a ratio of two odds. As odds are never negative, the odds ratio is never negative either, and ranges from zero to very large values.

What is a relative risk?
This is the ratio of the probability of the event occurring in all exposed individuals to the probability of the event occurring in all non-exposed individuals. The relative risk is calculated thus: {a / (a+b)} / {c / (c+d)}. If the disease condition (event) is rare, then the odds ratio and relative risk may be comparable, but the odds ratio will overestimate the risk if the disease is more common.
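The calculations discussed above can be sketched in Python for the hypothetical smoking data of Table 3.2 (a = 17, b = 83, c = 1, d = 99). The confidence interval below uses the log-based (Woolf) approximation, which is one common choice made here for illustration and is not necessarily what a given statistical package reports; the function names are also assumptions of this sketch. The code works from the raw counts, so its odds ratio may differ slightly from a hand calculation that rounds the intermediate odds.

```python
import math


def odds_ratio(a, b, c, d):
    """Odds ratio ad/bc for a 2 x 2 table (a, b = exposed; c, d = non-exposed)."""
    return (a * d) / (b * c)


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Approximate confidence interval by the log (Woolf) method.

    z = 1.96 corresponds to a 95% interval (alpha = 0.05).
    Assumes no cell is zero; with very small cells the approximation is rough.
    """
    log_or = math.log(odds_ratio(a, b, c, d))
    standard_error = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return (math.exp(log_or - z * standard_error),
            math.exp(log_or + z * standard_error))


def relative_risk(a, b, c, d):
    """Relative risk {a / (a + b)} / {c / (c + d)}."""
    return (a / (a + b)) / (c / (c + d))


# Table 3.2: smokers (17 with, 83 without lung cancer),
# non-smokers (1 with, 99 without lung cancer).
a, b, c, d = 17, 83, 1, 99

or_value = odds_ratio(a, b, c, d)
lower, upper = odds_ratio_ci(a, b, c, d)
rr_value = relative_risk(a, b, c, d)

print(f"Odds ratio: {or_value:.1f}")
print(f"95% CI (Woolf approximation): {lower:.1f} to {upper:.1f}")
print(f"Interval includes 1: {lower <= 1 <= upper}")
print(f"Relative risk: {rr_value:.1f}")
```

With these counts the interval is very wide because only one non-smoker has the event, yet it lies entirely above 1. The odds ratio is also further from 1 than the relative risk, as expected when the event is not rare among the exposed (17 in 100).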
When the disease is common, the odds ratio should be avoided as a measure of risk, and the relative risk will be a more accurate estimate. Commonly, odds ratios are reported in case-control studies, in which relative risks cannot be calculated. The relative risk for the above hypothetical example of smokers versus non-smokers developing lung cancer is calculated as:

Relative Risk = (17/100) / (1/100) = 0.17 / 0.01 = 17

Thus, in this example, the relative risk is 17: smokers are 17 times as likely to have lung cancer as non-smokers.

Practical Session:
1. A researcher wanted to find out the effect of drinking boiled water on the occurrence of diarrhoea. He visited a nearby health centre IV and looked for children aged < 5 years who had developed diarrhoea and visited the facility for treatment in the previous one year. He took a history of hygiene practices, including drinking boiled water, and did the same for an equal number of children who visited the facility for other illnesses or immunisation in the same period.
2. The results turned out as follows: among 200 households that did not boil water, 60 children had diarrhoea, while among the comparison group of 200 households that boiled water only 20 children had diarrhoea.
a) Using a 2 by 2 table, calculate the risk of diarrhoea among children due to unboiled water.
b) How do you interpret this?
c) What is the prevalence of diarrhoea at the health facility?
d) What was the relative risk of getting diarrhoea in the exposed group?

Questions to stimulate further reading:
1. Discuss the merits and limitations of case-control and cohort studies.
2. List the uses of cohort studies in epidemiology and discuss how you can use the study design to improve the health of your community.

Bibliography:
1. Beaglehole R, Bonita R, Kjellström T. Basic epidemiology. Geneva: World Health Organization; 1993 Jan.
2. Bonita R, Beaglehole R, Kjellström T. Basic epidemiology. Geneva, Switzerland: World Health Organization; 2006.
3. Grimes DA, Schulz KF. ‘Cohort studies: Marching towards outcomes’. The Lancet. 2002 Jan 26;359(9304):341-45.
4. Rothman KJ, Greenland S, Lash TL. Case–control studies. Encyclopaedia of Quantitative Risk Analysis and Assessment. 2008 Sep 15;1.
5. Schulz KF, Grimes DA. ‘Case-Control studies: Research in Reverse’. The Lancet. 2002 Feb 2;359(9304):431-34.
6. Schulz KF, Grimes DA. ‘The Lancet handbook of essential concepts in clinical research’. The Lancet; 2006.
7. Tenny S and Hoffman MR. ‘Odds ratio (OR)’. University of Nebraska Medical Center and SIU School of Medicine; 2017.

Answers to the practical session

Table 3.3: Calculating Odds ratio and Relative Risk
Exposure                           Cases (diarrhoea)    Controls (no diarrhoea)    Total
Households with no boiled water    a = 60               b = 140                    200
Households with boiled water       c = 20               d = 180                    200
Total                              80                   320                        400

a) The risk of diarrhoea expressed as an odds ratio is defined as the odds of getting diarrhoea in the exposed group divided by the odds of getting diarrhoea in the non-exposed group = (a/b)/(c/d) = ad/bc = (60*180)/(140*20) = 3.9
b) The odds of diarrhoea among children in households that do not boil drinking water are about 3.9 times those among children in households that boil water.
c) The prevalence of diarrhoea at this facility is 80/400 = 20%
d) Relative risk = {a / (a+b)} / {c / (c+d)} = (60/200) / (20/200) = 0.3/0.1 = 3

Lecture Notes-Series Four
Experimental Epidemiology

Lecture Outline
1. Experimental Epidemiology
2. Randomised Controlled Trials
3. Randomisation
4. Blinding
5. Field trials
6. Community trials

Expectations
After reading this Lecture Series, it is expected that you should clearly understand what experimental epidemiology is, the different types of studies, and the techniques used to make these studies robust. You will be introduced to a practical session that you are encouraged to do.
Through this learning, you should master how to interpret research reports based on experimental epidemiology and how to use these in public health and control of infections.

4.1 Experimental Epidemiology
Intervention or experimental epidemiology involves attempting to change a variable in one or more groups of people. This could mean the elimination of a dietary factor thought to cause allergy, or testing a new treatment on a selected group of patients. The effects of an intervention are measured by comparing the outcome in the experimental group with that in a control group. Since the interventions are strictly administered according to a protocol, ethical considerations are of paramount importance in the design of these studies. For example, no patient should be denied appropriate treatment as a result of participation in an experiment, and the treatment being tested must be acceptable in light of current knowledge.

Experimental epidemiology can take one of three forms:
• Randomised controlled trials
• Field trials
• Community trials

4.2 Randomised Controlled Trials
A randomised controlled trial (or randomised clinical trial) is an epidemiological experiment to study a new preventive or therapeutic regimen. Subjects in a population are randomly allocated to groups, usually called treatment and control groups, and the results are assessed by comparing the outcome in the two or more groups. The outcome of interest will vary, but may be, for example, the response to treatment with a drug or to improved practices in a hospital setting.

[Figure 1: Design of a Randomised Controlled Trial. From the study population, potential participants are invited and selected by defined criteria; those who do not meet the selection criteria are non-participants, and the participants are randomised to control or treatment.]

To ensure that the groups being compared are equivalent, patients are allocated to them randomly, i.e., by chance.
Within the limits of chance, randomisation ensures that control and treatment groups will be comparable at the start of the investigation; any differences between groups are chance occurrences unaffected by the conscious or unconscious biases of the investigators.

The intervention under test may be a new drug or a new regimen, such as a new drug to prevent malaria in pregnancy. All subjects in the trial must meet the specified criteria for the condition under investigation, and other criteria are usually specified to ensure a reasonably homogeneous group of subjects, e.g., only patients with long-standing or mild disease.

Randomised controlled trials have been helpful in assessing the value of new therapies to combat diseases. For example, a trial of rice-based versus glucose-based oral rehydration solution involved 342 patients with acute watery diarrhoea during an epidemic of cholera in Bangladesh in 1983 (Molla et al., 1985). The patients were randomly assigned to treatment with either glucose-based or rice-based oral rehydration solution. The study showed that the glucose component of oral rehydration solution could be replaced by rice powder with improved results, as indicated by decreases in mean stool output and intake of solution. Studies such as this have important implications for the efficient use of health care resources in developing countries. Glucose is a costly manufactured product and is not always available in countries where diarrhoeal diseases are a major problem.

The details of a randomised controlled trial of early discharge from hospital after myocardial infarction are shown in Figure 2 below. The study suggests that, for carefully selected patients with uncomplicated myocardial infarction, discharge after three days does not harm the patient. Fewer patients in the early discharge group were readmitted or had subsequent problems than in the late discharge group.
However, only a small proportion of all myocardial infarction patients were included in the study, and its power was limited by the small sample size.

Figure 2: Randomised Controlled Trial of Early Hospital Discharge after Myocardial Infarction
Myocardial infarction patients: complicated, excluded (329); uncomplicated (179).
Uncomplicated patients: randomised (80); not included in study (99).
Randomised patients: early discharge (40); late discharge (40).

Outcomes                 Early discharge    Late discharge
Deaths                   0                  0
Hospital readmissions    6                  10
Patients with angina     3                  8
Re-infarctions           0                  5
Source: Topol et al., 1988

4.3 Randomisation
In order to attribute a difference in outcome between the two trial arms to the new treatment being tested, the characteristics of people should be similar between the groups.
• Random allocation of subjects produces groups that are as similar as possible with regard to all characteristics except the trial interventions.
• The only systematic difference between the two arms should be the treatment given.
• Therefore, any differences in results observed at the end of the trial should be due to the effect of the new treatment, and not to any other factor.

Randomisation is a process for allocating subjects between the different trial interventions. Each subject has the same chance of being allocated to any group, which ensures similarity in characteristics between the arms. This minimises the effect of both known and unknown confounders, and thus has a distinct advantage over observational studies, in which statistical adjustments can only be made for known confounders. Although randomisation is designed to produce groups with similar characteristics, there will always be small differences because of chance variation. Thus randomisation cannot produce identical groups.

Randomisation also minimises bias.
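The allocation process described above can be sketched with Python's standard library. This is a minimal illustration of simple randomisation only, under stated assumptions: the subject identifiers, the ages, the seeds and the function name `randomise` are all made up for the example, and a real trial would use a validated system with the allocation list held away from the recruiting researchers.

```python
import random
from statistics import mean


def randomise(subject_ids, arms=("treatment", "control"), seed=None):
    """Simple randomisation: each subject has the same chance of each arm."""
    rng = random.Random(seed)
    return {subject: rng.choice(arms) for subject in subject_ids}


# Hypothetical subjects with a made-up baseline characteristic (age in years).
age_generator = random.Random(1)
ages = {f"S{i:03d}": age_generator.randint(18, 70) for i in range(1, 201)}

# Allocate every subject; a fixed seed makes the sketch reproducible.
allocation = randomise(ages, seed=42)

# Compare the arms at baseline: sizes and mean ages should be similar,
# but not identical, because of chance variation.
for arm in ("treatment", "control"):
    arm_ages = [ages[s] for s, allocated in allocation.items() if allocated == arm]
    print(f"{arm}: n = {len(arm_ages)}, mean age = {mean(arm_ages):.1f}")
```

Simple randomisation does not guarantee equal group sizes; the small imbalances it produces in size and in baseline characteristics are the chance variation mentioned above.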
If either the researcher or trial subject is allowed to decide which intervention is allocated, then subjects with a certain characteristic, for example, those who are younger or suffering less severe disease, could be over-represented in one of the trial arms. This could produce a bias which makes the new intervention look effective when it really is not, or overestimates the treatment effect. Selection bias can occur if choosing a particular subject for the trial is influenced by knowing the next treatment allocation. Allocation bias involves giving the trial treatment that the clinician or subject feels might be most beneficial. Sometimes, the researcher has access to the randomisation list from which the next allocation can be seen, possibly creating allocation bias. This can be avoided if randomisation is done through a central office (for example, a clinical trials unit) or a computer system, so that the researcher has no control over either process (called allocation concealment).

4.4 Blinding

The randomisation process minimises the potential for bias, but the benefit could be greater if the trial intervention given to each subject is concealed. Subjects or researchers may have expectations associated with a particular treatment, and knowing which was given can create bias. This can affect how people respond to treatment, and how the researcher manages or assesses the subject. In subjects, this bias is specifically referred to as the placebo effect. Humans have a remarkable psychological ability to affect their own health status. The effect of any of these biases could result in subjects receiving the new intervention appearing to do better for reasons other than the action of the new treatment.

Clinical trials are described as double-blind if neither the subject nor anyone involved in giving the treatment, or managing or assessing the subject, is aware of which treatment was given.
In single-blind trials, usually only the subject is blind to the treatment they have received.

A placebo has no known active component. It is often referred to as a ‘sugar pill’ because many treatment trials involve swallowing tablets. However, a placebo could also be a saline injection, a sham surgical procedure, a sham medical device or any other intervention that is meant to resemble the test intervention, but has no known effect on the disease of interest, and no adverse effect. Using a placebo needs to be fully justified in any clinical trial. While there are some arguments against placebos such as sham surgery, these trials can provide valuable evidence on the effectiveness of a new intervention. They can be conducted as long as there is ethical approval, and patients are fully aware that they may be assigned to the sham group.

When it is not possible to conceal the trial interventions, an outcome measure that does not depend on the personal opinion of the subject or researcher is best. For example, in a trial evaluating hypnotherapy for smoking cessation, a subjective measure would be to ask the subjects if they had stopped smoking at, say, 1 year. However, there could be some continuing smokers who misreport their smoking status. An objective endpoint would be to measure serum or urinary nicotine, as a marker of current smoking status, because this is specific to tobacco smoke inhalation and so less prone to bias than a questionnaire on self-reported habits.

Summary Points:
• Clinical trials are essential for evaluating new methods of disease detection, prevention and treatment.
• Clinical trials, especially when randomised, are considered to provide the strongest evidence.
• Randomisation minimises the effect of confounding and bias, and blinding further reduces the potential for bias.

4.5 Field Trials

Field trials, in contrast to clinical trials, involve people who are disease free but presumed to be at risk.
Data collection takes place ‘in the field’, usually among non-institutionalised people in the general population. Since the subjects are disease free and the purpose is to prevent the occurrence of diseases that may occur with relatively low frequency, field trials are often huge undertakings involving major logistic and financial considerations. For example, one of the largest field trials ever undertaken was that of the Salk vaccine for the prevention of poliomyelitis, which involved over one million children. Even a study of the prevention of coronary heart disease in high-risk middle-aged males involved screening 360,000 men to identify 12,866 men eligible for the trial. In each of these two examples, randomisation was used to allocate participants to various treatment groups.

The field trial method can be used to evaluate interventions aimed at reducing exposure, without necessarily measuring the occurrence of health effects. For instance, different protective methods for pesticide exposure have been tested in this way, and measurement of blood lead levels in children has shown the protection provided by elimination of lead paint in the home environment. Such intervention studies can often be carried out on a small scale at low cost.

4.6 Community Trials

In this form of experiment, the treatment groups are communities rather than individuals. This is particularly appropriate for diseases that have their origins in social conditions, which in turn can most easily be influenced by intervention directed at group behaviour as well as at individuals. Cardiovascular disease is a good example of a condition appropriate for community trials (Farquhar et al., 1977), several of which are under way in this field (Salonen et al., 1986).
A limitation of such studies is that only a small number of communities can be included, and random allocation of communities is not practicable: other methods are required to ensure that any differences found at the end of the study can be attributed to the intervention rather than to inherent differences between communities. Furthermore, it is difficult to isolate the communities where intervention is taking place from general social changes that may be occurring. Consequently, this type of study may underestimate the effect of intervention.

Table 4.1: Application of Different Observational Study Designs

                                               Ecological   Cross-sectional   Case-control   Cohort
Investigation of rare disease                  ++++         -                 +++++          -
Investigation of rare cause                    ++           -                 -              +++++
Testing multiple effects of cause              ++           -                 -              +++++
Study of multiple exposures and determinants   ++           ++                -              +++++
Measurements of time relationship              ++           -                 +b             +++++
Direct measurement of incidence                -            -                 +c             +++++
Investigation of long latent periods           -            -                 +++            -

Key:
+ … +++++ indicates the degree of suitability
- Not suitable
b If prospective
c If population based

Source: Beaglehole R, Bonita R, Kjellström T. Basic epidemiology. Geneva: World Health Organization; 1993 Jan.

Table 4.2: Advantages and disadvantages of different observational study designs

                              Ecological   Cross-sectional   Case-control   Cohort
Probability of recall bias    NA           Medium            High           Low
Selection bias                NA           High              High           Low
Loss to follow-up             NA           NA                Low            High
Confounding                   High         Medium            Medium         Low
Time required                 Low          Medium            Medium         High
Cost                          Low          Medium            Medium         High

Source: Beaglehole & Kjellström, 1993

Practical session:
1. A new drug has been developed to treat Hepatitis B. It is cheap ($3.0 per dose) and can easily be afforded by developing countries like Uganda. However, it needs to be evaluated for its effectiveness.
Describe an epidemiological study design you will use to assess the drug's efficacy.
2. Discuss randomisation and why it is important in clinical studies.
3. Discuss the term confounding, its importance and how it can be overcome.
4. Discuss why recall and selection bias are high with case-control studies.

Bibliography
1. Beaglehole R, Bonita R, Kjellström T. Basic epidemiology. Geneva: World Health Organization; 1993 Jan.
2. Farquhar JW. The Stanford cardiovascular disease prevention programs. Annals of the New York Academy of Sciences. 1991 Apr.
3. Molla AM, Ahmed SM, Greenough WB 3rd. Rice-based oral rehydration solution decreases the stool volume in acute diarrhoea. Bulletin of the World Health Organization. 1985;63(4):751.
4. Salonen JT, Salonen R, Seppänen K, Rauramaa R, Tuomilehto J. HDL, HDL2, and HDL3 subfractions, and the risk of acute myocardial infarction. A prospective population study in eastern Finnish men. Circulation. 1991 Jul;84(1):129-39.
5. Schulz KF, Grimes DA. ‘Allocation concealment in randomised trials: defending against deciphering’. The Lancet. 2002 Feb 16;359(9306):614-8.
6. Schulz KF, Grimes DA. ‘Blinding in randomised trials: hiding who got what’. The Lancet. 2002 Feb 23;359:696-700.
7. Schulz KF, Grimes DA. ‘Generation of allocation sequences in randomised trials: chance, not choice’. The Lancet. 2002 Feb 9;359:515-19.

Lecture Notes-Series Five
Measurement and Reporting of Outcomes

Lecture Outline
1. Measurement of Outcomes
2. Sample size calculations
3. Errors and bias
4. Reporting of outcomes

Expectations

After reading this Lecture Series, it is expected that you should clearly know how to measure and report epidemiological outcomes. You will also understand how to calculate the sample size of a study; and you will be introduced to epidemiological errors and biases and how to overcome them.
Later, you will be introduced to a practical session that you are encouraged to do. Through this learning, you should master how to measure and report epidemiological outcomes and how to use these in public health and control of infections.

5.1 Types of Outcomes

Outcome measures fall into two basic categories: counting people and taking measurements on people. There is a special case of ‘taking measurements’ that is based on time-to-event data. It is useful to distinguish between them because it helps to define the trial objectives, and methods of sample size calculation and statistical analysis.

First, the unit of interest is determined, usually a person. Second, consider what will be done to the unit of interest. The outcome measure will involve either counting how many people have a particular characteristic (i.e., putting them into mutually exclusive groups, such as ‘dead’ or ‘alive’), or taking measurements on them. In some situations, taking a measurement on someone involves counting something, but the unit of interest is still a person. Box 1 below shows examples of outcome measures. Having measured the endpoint for each trial subject, it is necessary to summarise the data in a form that can be readily communicated to others.

Box 1: Outcome Measures when the Unit of Interest is a Person

Counting people (binary or categorical data)
  Dead or alive
  Admitted to hospital (Yes or No)
  Suffered a first heart attack (Yes or No)
  Recovered from disease (Yes or No)
  Severity of disease (Mild, Moderate, Severe)
  Ability to perform household duties (none, a little, some, moderate, high)

Taking measurements on people (continuous data)
  Blood pressure
  Body weight
  Cholesterol level
  Size of tumour

Counting People

This type of outcome measure is easily summarised by calculating a percentage or proportion.
For example, the effect of flu vaccine can be examined by counting how many people developed flu in the vaccinated group, and dividing this number by the total number of people in that group. This proportion (or percentage) is the risk, i.e., the risk of developing flu if vaccinated. The same calculation is made in the unvaccinated group, i.e., the risk of developing flu if not vaccinated.

Taking Measurements on People

Table 5.1: Measuring levels of cholesterol

3.6   3.8   3.9   4.1   4.2   4.5   4.5   4.8   5.1   5.3
5.4   5.4   5.6   5.8   5.9   6.0   6.1   6.1   6.2   6.3
6.4   6.5   6.6   6.8   6.9   7.1   7.2   7.2   7.3   7.4
7.5   7.7   8.0   8.1   8.1   8.2   8.3   9.0   9.1   10.0

Table 5.1 above shows cholesterol levels (mmol/L) for 40 healthy men, all aged 45 years (ranked in order of size). These data are summarised by two parameters: the ‘average’ level of cholesterol and a measure of spread or variability.

The average, often referred to as a measure of central tendency, can be described by either the mean or the median. It is where the middle of the distribution lies. The mean is more commonly reported and often taken to be the same as the average. Another measure of average is the mode (the most frequently occurring value), but there are few instances where this is the best summary measure.

The mean is the sum of all the values divided by the number of observations. In the example above, the mean is 256/40 = 6.4 mmol/L. The median is the value that has half the observations above it and half below. In the example, it is halfway between the 20th and 21st values; median = (6.3 + 6.4)/2 = 6.35 mmol/L.

One measure of spread is the standard deviation. It quantifies the amount of variability in a group of people, i.e., how much the data spread about the mean. It is calculated as:

\[ \text{Standard deviation} = \sqrt{\frac{\text{Sum of (the distance of each data point from the mean)}^2}{\text{Number of data values} - 1}} \]

In the example, the standard deviation is 1.57 mmol/L: the cholesterol levels differ from the mean value of 6.4 by, on average, 1.57 mmol/L.
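The three summary values quoted above can be checked with a few lines of Python. This is only an illustrative sketch using the standard `statistics` module and the 40 values of Table 5.1; the variable names are our own.

```python
import statistics

# The 40 cholesterol values (mmol/L) from Table 5.1
cholesterol = [
    3.6, 3.8, 3.9, 4.1, 4.2, 4.5, 4.5, 4.8, 5.1, 5.3,
    5.4, 5.4, 5.6, 5.8, 5.9, 6.0, 6.1, 6.1, 6.2, 6.3,
    6.4, 6.5, 6.6, 6.8, 6.9, 7.1, 7.2, 7.2, 7.3, 7.4,
    7.5, 7.7, 8.0, 8.1, 8.1, 8.2, 8.3, 9.0, 9.1, 10.0,
]

mean = statistics.mean(cholesterol)      # sum of values / number of values
median = statistics.median(cholesterol)  # halfway between the 20th and 21st values
sd = statistics.stdev(cholesterol)       # sample standard deviation (divides by n - 1)

# Approximately 6.4, 6.35 and 1.57 mmol/L, as in the text
print(round(mean, 2), round(median, 2), round(sd, 2))
```

Note that `statistics.stdev` divides by the number of data values minus one, which is the same denominator as in the formula above; `statistics.pstdev` would divide by the number of values instead.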
Another measure of spread is the interquartile range. This is the difference between the 25th centile (the value that has a quarter of the data below it and three-quarters above it) and the 75th centile (the value that has three-quarters of the data below it and a quarter above it).

Table 5.2: Measuring the interquartile range

Cholesterol (mmol/L)   Number of men   Percentage
3.0 – 3.9              3               7.5
4.0 – 4.9              5               12.5
5.0 – 5.9              7               17.5
6.0 – 6.9              10              25.0
7.0 – 7.9              7               17.5
8.0 – 8.9              5               12.5
9.0 – 9.9              2               5.0
10.0 – 10.9            1               2.5
TOTAL                  40              100.0

Source: Hackshaw, 2009

In the example, there are 40 observations, so the 25th centile is between the 10th and 11th data points (i.e. 5.32 mmol/L) and the 75th centile is between the 30th and 31st data points (i.e. 7.47 mmol/L). The interquartile range is therefore 7.47 − 5.32 = 2.15 mmol/L. Sometimes, the actual 25th and 75th centiles are presented instead of the interquartile range.

Deciding which measures of average and spread to use depends on whether the distribution is symmetric or not. To help determine this, the data are grouped into categories of cholesterol levels and the frequency distribution is examined. The shape is reasonably symmetric, indicating that the distribution is Gaussian or Normal (‘N’ is a capital letter to avoid confusion with the usual meaning of normal, which can indicate people without disease). This is more easily visualised by drawing a curve around the histogram, which is said to be bell-shaped. When data are Normally distributed, the mean and median are similar. The preferred measures of average and spread are the mean and standard deviation, because they have useful mathematical properties which underlie many statistical methods used to analyse this type of data.

When the data are not Normally distributed, the median and interquartile range are better measures. To understand why, consider the outcome measure number of days in hospital for 20 patients. It is clear that the distribution is not symmetric.
It is skewed to the right (this is where the tail end of the data is). Conversely, when the tail is on the left and most of the data are towards the right, the distribution is said to be skewed to the left. The summary statistics that describe these data are:

Mean = 17 days
Median = 9 days
Standard deviation = 19 days
Interquartile range = 8 days

The middle of the data, and the spread, are better represented by the median and interquartile range. The mean and standard deviation are heavily influenced by a few very high values.

When data are skewed, it is sometimes possible to transform them, usually by taking logarithms or the square root. Many biological measurements only have a Normal (symmetric) distribution after the logarithm is taken, so using the log of the values would produce a histogram that has a similar shape to the figure. The mean is calculated using the log of the values, and the result is back-transformed to the original scale. For example, if the mean of the transformed values is 0.81, using log to the base 10, the calculation 10^0.81 = 6.5 produces the mean value on the original scale. This is called a geometric mean. Sometimes no transformation is possible that will turn a skewed distribution into a Normal one. In these situations, the median and interquartile range should be used.

A probability (or centile) plot can be used to determine whether data are Normally distributed or not. Many statistical software packages can provide this. An example can be seen above, which includes the 40 cholesterol measurements. If the observations lie reasonably along a straight line, the data are Normally distributed. Another simple check is to examine whether the mean ± 2 × standard deviation produces sensible numbers. In the example above, with a mean of 17 days, it would be 17 ± (2 × 19); the lower limit of −21 days is clearly implausible.
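The centiles, the geometric-mean back-transformation and the mean ± 2 × standard deviation check discussed in this section can be reproduced as a short Python sketch. It assumes the "exclusive" interpolation rule of `statistics.quantiles`, which happens to agree with the centile positions used for the cholesterol data (values of 5.325 and 7.475 before rounding); other software may use slightly different rules and give slightly different centiles.

```python
import statistics

# The 40 cholesterol values (mmol/L) from Table 5.1
cholesterol = [
    3.6, 3.8, 3.9, 4.1, 4.2, 4.5, 4.5, 4.8, 5.1, 5.3,
    5.4, 5.4, 5.6, 5.8, 5.9, 6.0, 6.1, 6.1, 6.2, 6.3,
    6.4, 6.5, 6.6, 6.8, 6.9, 7.1, 7.2, 7.2, 7.3, 7.4,
    7.5, 7.7, 8.0, 8.1, 8.1, 8.2, 8.3, 9.0, 9.1, 10.0,
]

# 25th, 50th and 75th centiles; the 25th lies between the 10th and 11th values
q25, q50, q75 = statistics.quantiles(cholesterol, n=4, method="exclusive")
interquartile_range = q75 - q25

# Geometric mean: back-transform the mean of the log10 values (0.81 in the text)
geometric_mean = 10 ** 0.81

# Quick skewness check for the hospital-stay example: mean 17 days, SD 19 days
lower_limit = 17 - 2 * 19
upper_limit = 17 + 2 * 19

print(round(q25, 3), round(q75, 3), round(interquartile_range, 2))
print(round(geometric_mean, 1))
print(lower_limit, upper_limit)   # a negative number of days signals skewed data
```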
5.3 Time-To-Event Data

A specific category of ‘taking measurements on people’ involves examining the time taken until an event has occurred, based on the difference between two calendar dates. An event could be defined in many ways, and one of the simplest and most commonly used is ‘death’; hence the term survival analysis, which is applied to this type of data. In the following seven subjects, the endpoint is time from randomisation until death (in years), and all have died.

4.5   6.1   6.7   8.3   9.1   9.4   10.0

The mean (7.7 years) or median (8.3 years) are easily calculated. In another group of nine subjects, not all have died at the time of statistical analysis.

2.7 dead   2.9 dead   3.3 alive   4.7 dead   5.1 alive   6.8 alive   7.2 dead   7.8 dead   9.1 alive

The mean or median cannot be calculated in the usual way until the subjects have died, which could take many years, and it is incorrect to ignore those still alive because the summary measure would be biased downward. An alternative is to obtain the survival rate at a specific time point. At, say, 3 years, the survival rate is 7/9 = 78%. This is simply an example of ‘counting people’. However, every subject needs to be followed up for at least 3 years, unless they died beforehand, and the outcome (dead or alive) must be known at that point for all of them. In many studies this is not possible, particularly with long follow-up, because contact is lost with some subjects.

In 1958 a statistical method was developed that changed the way this type of data was displayed and analysed. In the example above, the time-to-event variable is treated as ‘time from randomisation until death or last known to be alive’ (instead of ‘time from randomisation until death’), and there is another variable with the values 0 or 1 to indicate ‘still alive’ or ‘dead’. A subject who is still alive, or last known to be alive at a certain date, is said to be censored.
The two variables are used in a life-table from which it is possible to construct a Kaplan-Meier plot. This approach uses the last available information on every subject and how long they lived for, or have been in the study. It is therefore less of a concern if contact with some subjects was lost, because having the date when they were last known to be alive still provides information.

The table below is based on the group of nine subjects. The plot looks like a series of steps. Every time a subject dies, the step drops down (the first drop is at 2.7 years). When subjects are censored (four in the example), they contribute no further information to the analysis after that date. In large studies with many deaths, the plot looks smoother.

It is possible to estimate survival rates at specific time points, and the median survival. For the 5-year survival rate, a vertical line is drawn from the X-axis at ‘5’ and the corresponding Y-axis value is taken when the line hits the curve: 65%. The median is the time at which half the subjects have died. A horizontal line is drawn from the Y-axis at ‘50%’ and the corresponding X-axis value is taken when the line hits the curve: 7.2 years. These estimates are more accurately obtained from the life-table.

Table 5.3: Life Table of the survival data of nine patients

Time since        Censored      Number of    Percentage alive
randomisation     (0 = Yes,     patients     (survival rate %)
(years)           1 = Dead)     at risk
0                 -             9            100
2.7               1             9            89
2.9               1             8            78
3.3               0             7            78
4.7               1             6            65
5.1               0             5            65
6.8               0             4            65
7.2               1             3            43
7.8               1             2            22
9.1               0             1            22

Source: Hackshaw, 2009

When some subjects are censored, i.e., not all have died, the Kaplan-Meier median survival is not the same as finding the median from a ranked list of numbers (as in the example). They are only identical when every subject has died, which is rare in trials.
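The survival percentages in Table 5.3 can be reproduced with a short Python sketch of the life-table calculation. The function name `life_table` is our own, and the sketch assumes that no two subjects share the same time; at each death, the running survival proportion is multiplied by (number at risk − 1) / (number at risk), while censored subjects simply leave the risk set.

```python
def life_table(times, died):
    """Return rows of (time, number at risk, survival proportion).

    Hypothetical helper for illustration; assumes no tied times.
    died[i] is 1 if subject i died at times[i], 0 if censored.
    """
    rows = []
    at_risk = len(times)
    survival = 1.0
    for time, event in sorted(zip(times, died)):
        if event == 1:
            # A death lowers the curve by the fraction of those at risk who died
            survival *= (at_risk - 1) / at_risk
        rows.append((time, at_risk, survival))
        at_risk -= 1   # dead and censored subjects both leave the risk set
    return rows


# The nine subjects from the text (time in years; 1 = dead, 0 = censored)
times = [2.7, 2.9, 3.3, 4.7, 5.1, 6.8, 7.2, 7.8, 9.1]
died = [1, 1, 0, 1, 0, 0, 1, 1, 0]

rows = life_table(times, died)
for time, at_risk, survival in rows:
    print(time, at_risk, round(100 * survival))

# Median survival: first time at which the survival proportion is 50% or less
median_survival = next(time for time, _, survival in rows if survival <= 0.5)
print(median_survival)
```

For real analyses with tied event times and confidence intervals, a dedicated statistical package would normally be used rather than a hand-written function like this one.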
The median is used instead of the mean, because time-to-event data often have a skewed distribution. The Kaplan-Meier plot starts off with every subject alive at time zero; this is the most common form in the literature. This type of plot is useful when deaths tend to occur early on. However, it is also possible to have a plot that starts from no deaths (0%) at time zero and rises as subjects die.

5.4 Different Types of Time-To-Event Outcome Measures

In the section above, the ‘event’ in the time-to-event data is ‘death’, called overall survival because it relates to death from any cause. The methods can apply to any endpoint that involves measuring the time until a specified event has occurred; for example, time from entry to a trial until the occurrence or recurrence of a disorder, such as severe exacerbation of asthma, or any change in health status, such as time until hospital discharge.

Overall survival is simple because it only requires the date of death. Cause-specific survival requires, in addition, accurate confirmation of cause of death (such as pathology records), which is not always available or reliably recorded. Also, cause-specific survival means that deaths from causes other than that of interest are not counted as an event (they are censored). This may be inappropriate when the treatment has serious side-effects. A new therapy may reduce the lung cancer death rate but increase the risk of dying from treatment-related side-effects, for example, cardiovascular disease. Here, overall survival is probably more appropriate.

When an event is disease incidence, recurrence or progression, the date when this occurs is required. However, obtaining accurate dates is difficult unless subjects are examined regularly. The date is usually when the disease was first discovered. This is either the date when the subject was due to have one of the regular examinations specified in the trial protocol, or after the subject developed symptoms and received clinical confirmation.
Subjects in the trial arms should therefore have their regular examinations at a similar time.

When the measure is based on two or more event types and a subject could have both events, such as disease occurrence followed by death, it is usual to consider only the date of the first event in the analysis. This is because the patient may be managed differently afterwards: the trial treatment changes or stops, non-trial therapies are given, or patients may be given the treatment from the other trial arm. When this occurs, it is difficult to deal with subsequent events and to attribute differences in the endpoint to the trial treatments. Unlike overall survival, disease-free, progression-free or event-free survival are unaffected by subsequent treatments because only the first event matters in the analysis.

Box 3: Time-To-Event Outcome Measures in Trials

For each endpoint, an event is defined as follows. All other subjects are censored.

Overall survival
  Event: Death from any cause.
  Comments: Easily defined. May mask the effect of an intervention if it only affects a specific disease.

Disease-free survival
  Event: First recurrence of the disease; death from any cause.
  Comments: Useful when patients are thought to be free from disease after treatment, so patients have a good prognosis. Need date of recurrence.

Event-free survival
  Event: First recurrence of disease; first occurrence of other specified diseases; death from any cause.
  Comments: Similar to disease-free survival.

Progression-free survival
  Event: First sign of disease progression; death from any cause.
  Comments: Useful for advanced disease, where patients have not been ‘cured’ after treatment, and are expected to get worse in the near future. Needs date of progression.

Disease (or cause)-specific survival
  Event: Death from the disease of interest.
  Comments: Useful when examining interventions that are not expected to have an effect on any disease apart from the one of interest. Need accurate recording and confirmation of cause of death. Assumes treatment is not associated with death from other causes.

Time-to-treatment failure
  Event: First sign of disease progression; stopped treatment; death from any cause.
  Comments: Similar to progression-free survival.

Source: Hackshaw, 2009

Recurrence: there was no clinical evidence of the disease shortly after treatment, but the disease returned later on.

Progression (or relapse): the patient still had the disease after treatment, but it got worse later.

Disease-free and event-free survival may be used interchangeably, so it is useful to be clear about the precise definition.

5.6 Measurement of Outcomes:

Identification and quantification of outcomes is the core consideration of research. However, slippery terminology often complicates matters for investigators and readers alike. For example, the term rate (as in maternal mortality rate) has been misused in textbooks and journal articles for decades. Additionally, rate is often used interchangeably with proportion and ratio. Figure 3 presents a simple approach to classification of these common terms.

Figure 3: Distinguishing Rates, Proportions, and Ratios

Ratio: is the numerator included in the denominator?
  Yes: is time included in the denominator?
    Yes: Rate (example: incidence rate)
    No: Proportion (example: prevalence rate)
  No: Ratio (example: maternal mortality ratio)

Source: Grimes & Schulz, 2002

A ratio is a value obtained by dividing one number by another. These two numbers can be either related or unrelated. This feature, i.e., relatedness of numerator and denominator, divides ratios into two groups: those in which the numerator is included in the denominator (e.g., rate and proportion) and those in which it is not.

A rate measures the frequency of an event in a population. As shown in Figure 3, the numerator (those with the outcome) of a rate must be contained in the denominator (those at risk of the outcome). Although all ratios feature a numerator and denominator, rates have two distinguishing characteristics: time and a multiplier. Rates indicate the time during which the outcomes occur and use a multiplier, commonly a power of ten, to yield whole numbers. An example would be an incidence rate, indicating the number of new cases of disease in a population at risk over a defined interval of time, e.g., 11 cases of tuberculosis per 100,000 persons per year.

Proportion is often used synonymously with rate, but the former does not have a time component.
Like a rate, a proportion must have the numerator contained in the denominator. Since the numerator and denominator have the same units, those divide out, leaving a dimensionless quantity: a number without units. An example of a proportion is prevalence, e.g., 27 of 100 at risk have malaria. This number indicates how many of a population who are at risk have a condition at a particular time (here, 27%); since documentation of new cases over time is not involved, prevalence is more properly considered a proportion than a rate.

Although all rates and proportions are ratios, the opposite is not true. In some ratios, the numerator is not included in the denominator. Perhaps the most common example is the maternal mortality ratio. The definition includes women who die of pregnancy-related causes in the numerator and women with live births (usually per 100,000) in the denominator. However, not all those in the numerator are included in the denominator, e.g., a woman who dies of an ectopic pregnancy cannot be in the denominator of women with live births. Thus, this is actually a ratio, not a rate, a fact only recently appreciated.

5.7 Measures of Association

Relative risk (also termed the risk ratio) is another useful ratio: the frequency of outcome in the exposed group divided by the frequency of outcome in the unexposed. If the frequency of the outcome is the same in both groups, then the ratio is 1.0, indicating no association between exposure and outcome. By contrast, if the outcome is more frequent among the exposed, the ratio will be greater than 1.0, implying an increased risk associated with exposure. Conversely, if the frequency of disease is less among the exposed, then the relative risk will be less than 1.0, implying a protective effect.

The odds ratio has different meanings in different settings. In case-control studies, this measure is the usual measure of association. It indicates the odds of the exposure among the case group divided by the odds of the exposure among controls.
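The two measures defined above can be illustrated with a small worked example in Python. The counts in the 2 × 2 table below are hypothetical and chosen only to make the arithmetic easy to follow; the variable names are our own.

```python
# Hypothetical 2 x 2 table (assumed counts, not from any study)
#                 outcome   no outcome
# exposed            30         70
# unexposed          10         90
exposed_with, exposed_without = 30, 70
unexposed_with, unexposed_without = 10, 90

# Risk = proportion with the outcome in each group
risk_exposed = exposed_with / (exposed_with + exposed_without)
risk_unexposed = unexposed_with / (unexposed_with + unexposed_without)

# Relative risk (risk ratio): risk in the exposed / risk in the unexposed
relative_risk = risk_exposed / risk_unexposed

# Odds ratio: odds of the outcome in the exposed / odds in the unexposed,
# which equals the cross-product of the table
odds_ratio = (exposed_with * unexposed_without) / (exposed_without * unexposed_with)

print(round(relative_risk, 2))   # 3.0
print(round(odds_ratio, 2))      # 3.86
```

Because the cross-product is symmetrical, the same odds ratio is obtained whether the table is read as the odds of the outcome by exposure or the odds of exposure by outcome, which is why it can be used in case-control studies.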
If cases and controls have equal odds of having the exposure, the odds ratio is 1.0, indicating no effect. If the cases have higher odds of exposure than the controls, then the ratio is greater than 1.0, implying an increased risk associated with exposure. Similarly, odds ratios less than 1.0 indicate a protective effect.

An odds ratio can also be calculated for cross-sectional, cohort, and randomised controlled studies. Here, the disease-odds ratio is the ratio of the odds in favour of disease in the exposed versus that in the unexposed. In this context, the odds ratio has some appealing statistical features when studies are aggregated in meta-analyses, but the odds ratio does not approximate the relative risk when the proportion with the outcome is greater than 5-10%, i.e., the term has little clinical relevance or meaning with higher incidence rates.

The confidence interval reflects the precision of study results. The interval provides a range of values for a variable, such as a proportion, relative risk, or odds ratio, that has a specified probability of containing the true value for the entire population from which the study sample was taken. Although 95% CIs are the most commonly used, others, such as 90%, are seen (and advocated). The wider the confidence interval, the less precision exists in the result, and vice versa. For relative risks and odds ratios, when the 95% CI does not include 1.0, the difference is significant at the usual 0.05 level.

5.8 Sample Size Calculations

The desirable size of a proposed study can be assessed using standard formulae.
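As a hedged illustration of what such a formula looks like, the Python sketch below uses one widely taught approximation for comparing two proportions with equal group sizes: n per group = (z for significance + z for power)² × [p1(1 − p1) + p2(1 − p2)] / (p1 − p2)². This particular formula, the function name and the example proportions are our assumptions for illustration; these notes do not specify which formula to use, and published tables or software may apply corrections that give somewhat different numbers.

```python
from math import ceil
from statistics import NormalDist


def sample_size_two_proportions(p1, p2, alpha=0.05, power=0.80):
    """Approximate number of subjects needed in EACH of two equal groups.

    Illustrative helper (an assumption, not a formula given in these notes).
    p1, p2 : expected proportions with the outcome in the two groups
    alpha  : two-sided level of statistical significance
    power  : 1 - acceptable chance of missing a real effect
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)            # about 0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p1 - p2) ** 2)


# Hypothetical example: outcome expected in 50% of controls and 40% of treated
n_per_group = sample_size_two_proportions(0.50, 0.40)
print(n_per_group)   # 385
```

The sketch shows how the required size grows quickly as the expected difference between the groups becomes smaller, since the difference appears squared in the denominator.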
Information on the following variables is required before the formula can be employed:
• Required level of statistical significance for the expected result
• Acceptable chance of missing a real effect
• Magnitude of the effect under investigation
• Amount of disease in the population
• Relative sizes of the groups being compared

In reality, sample size is often determined by logistic and financial considerations, and a compromise always has to be made between sample size and costs. A practical guide to determining sample size in health studies has been published by the WHO (Lwanga & Lemeshow, 1991).

The precision of a study can also be improved by ensuring that the groups are of appropriate relative size. This is often an issue of concern in case-control studies when a decision is required on the number of controls to be chosen for each case. It is not possible to be definitive about the ideal ratio of controls to cases, since this depends on the relative costs of accumulating cases and controls. If cases are scarce and controls plentiful, it is appropriate to increase the ratio of controls to cases. In general, however, there may be little point in having more than four controls for each case. It is important to ensure that there is sufficient similarity between cases and controls when the data are to be analysed by, for example, age group or social class; if most cases and only a few controls were in the older age groups, the study would be inefficient and a wasted effort.

Summary Points
• Trials should have clearly defined outcome measures (endpoints).
• Secondary endpoints should be closely correlated with ‘primary’ endpoints and have been validated, especially if they are used as the main trial endpoint.
• Outcome measures could involve ‘counting people’, ‘taking measurements on people’ or ‘time-to-event’ data.
• Counting people: data are summarised by a percentage or proportion.
• Taking measurements on people: data are summarised by average and spread (mean and standard deviation if the data are Normally distributed, median and interquartile range if the data are skewed).
• Time-to-event data: when not all patients have had the event of interest, the data can be summarised using a Kaplan-Meier plot, median value, or survival or event rate at a specific time point.

5.9 Potential errors in Epidemiological Studies

An important purpose of most epidemiological investigations is to measure accurately the occurrence of disease (or other outcome). Epidemiological measurement is, however, not easy and there are many possible sources of error in measurement. Much attention is devoted to minimising errors and, since they can never be completely eliminated, assessing their importance. Error can be either random or systematic.

Random Error

Random error is the divergence, due to chance alone, of an observation on a sample from the true population value. This may lead to a lack of precision in the measurement of an association.

Random error can never be completely eliminated, since we can study only a sample of the population, individual variation always occurs, and no measurement is perfectly accurate. Random error can be reduced by the careful measurement of exposure and outcome, thus making individual measurements as precise as possible. Sampling error occurs as part of the process of selecting study participants, who are always a sample of a larger population, and the best way to reduce it is to increase the size of the study.

Systematic Error

Systematic error (or bias) occurs in epidemiology when there is a tendency to produce results that differ in a systematic manner from the true values. A study with a small systematic error is said to have a high accuracy. Systematic error is a particular hazard because epidemiologists usually have no control over participants in studies, unlike the situation in laboratory experiments.
Furthermore, it is often difficult to obtain representative samples of source populations. Some variables of interest in epidemiology are particularly difficult to measure, among them personality type, alcohol consumption habits, and past exposures to rapidly changing environmental conditions, and this difficulty may lead to systematic error. The possible sources of systematic error in epidemiology are many and varied; indeed, over 30 specific types of bias have been identified. The principal biases are:
• Selection bias
• Measurement (or classification) bias

Selection Bias

Selection bias occurs when there is a systematic difference between the characteristics of the people selected for a study and the characteristics of those who are not. An obvious source of selection bias occurs when participants select themselves for a study, either because they are unwell or because they are particularly worried about an exposure. It is well known, for example, that people who respond to an invitation to participate in a study on the effects of smoking differ in their smoking habits from non-responders; the latter are usually heavier smokers. In studies of children’s health, where parental cooperation is required, selection bias may also occur. In a cohort study of new-born children (Victora et al., 1987), the proportion successfully followed up for 12 months varied according to the income level of the parents. If individuals entering or remaining in a study display different associations from those who do not, a biased estimate of the association between exposure and outcome is produced.

An important selection bias is introduced when the disease or factor under investigation itself makes people unavailable for study. For example, in a factory where workers are exposed to formaldehyde, those who suffer most from eye irritation are likely to leave their jobs at their own request or after medical advice.
The remaining workers are less affected and, in a prevalence study conducted in the workplace, the association between formaldehyde exposure and eye irritation may be very misleading.

Measurement Bias

Measurement bias occurs when the individual measurements or classifications of disease or exposure are inaccurate (i.e., they do not measure correctly what they are supposed to measure). There are many sources of measurement bias and their effects are of varying importance. For instance, biochemical or physiological measurements are never completely accurate and different laboratories often produce different results on the same specimen. If the specimens of the exposed and control groups are analysed randomly by different laboratories with insufficient joint quality assurance procedures, the errors will be random and potentially less serious for the epidemiological analysis than in the situation where all specimens from the exposed group are analysed in one laboratory and all those from the control group are analysed in another. If the laboratories produce systematically different results when analysing the same specimen, the epidemiological evaluation becomes biased.

A form of measurement bias of particular importance in retrospective case-control studies is known as recall bias. This occurs when there is a differential recall of information by cases and controls; for instance, cases may be more likely to recall past exposure, especially if it is widely known to be associated with the disease under study (for example, lack of exercise and heart disease). Recall bias can either exaggerate the degree of effect associated with the exposure (as with heart patients being more likely to admit to a past lack of exercise) or underestimate it (if cases are more likely than controls to deny past exposure).
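The arithmetic behind recall bias can be sketched with a small calculation. It assumes, purely for illustration, that cases and controls truly have the same exposure (so the true odds ratio is 1.0), that nobody reports an exposure they did not have, and that the two groups differ only in the probability of recalling a real exposure. The function name and all probabilities are hypothetical.

```python
def observed_odds_ratio(true_exposed_fraction, recall_cases, recall_controls):
    """Odds ratio seen when cases and controls recall a real exposure unequally.

    Assumes the same true exposure in both groups (true odds ratio = 1.0)
    and no false reports of exposure.
    """
    # Fraction reporting exposure = truly exposed fraction x probability of recalling it.
    reported_cases = true_exposed_fraction * recall_cases
    reported_controls = true_exposed_fraction * recall_controls
    odds_cases = reported_cases / (1 - reported_cases)
    odds_controls = reported_controls / (1 - reported_controls)
    return odds_cases / odds_controls


# Hypothetical: 40% truly exposed; cases recall 90% of exposures, controls 60%.
exaggerated = observed_odds_ratio(0.40, recall_cases=0.90, recall_controls=0.60)

# Hypothetical: cases are more likely than controls to deny past exposure.
underestimated = observed_odds_ratio(0.40, recall_cases=0.60, recall_controls=0.90)

print(f"Cases recall better: observed OR = {exaggerated:.2f}")
print(f"Cases recall worse:  observed OR = {underestimated:.2f}")
```

In this made-up scenario the observed odds ratio moves away from 1.0 in either direction even though no true association exists, which mirrors the two outcomes described above.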
If measurement bias occurs equally in the groups being compared (non-differential bias), it almost always results in an underestimate of the true strength of the relationship. This form of bias may account for some apparent discrepancies between the results of different epidemiological studies.

5.10 Confounding

In a study of the association between exposure to a cause (or risk factor) and the occurrence of disease, confounding can occur when another exposure exists in the population and is associated both with the disease and the exposure being studied. A problem arises if this extraneous factor, itself a risk factor for the health outcome, is unequally distributed between the exposure subgroups. Confounding occurs when the effects of two exposures (risk factors) have not been separated and it is therefore incorrectly concluded that the effect is due to one rather than the other variable. For instance, in a study of the association between tobacco smoking and lung cancer, age would be a confounding factor if the average ages of the non-smoking and smoking groups in the study population were very different, since lung cancer incidence increases with age.

Confounding can have a very important influence, possibly even changing the apparent direction of an association: a variable that appears to be protective may, after control of confounding, be found to be harmful. The most common concern over confounding is that it may create the appearance of a cause-effect relationship that in reality does not exist. For a variable to be a confounder, it must, in its own right, be a determinant of the occurrence of disease (i.e., a risk factor) and be associated with the exposure under investigation. In a study of an exposure and lung cancer, for example, smoking is not a confounder if the smoking habits are identical in the exposed and control groups.

Age and social class are often confounders in epidemiological studies.
An association between high blood pressure and coronary heart disease may in truth represent concomitant changes in the two variables that occur with increasing age; the potential confounding effect of age has to be considered, and when this is done it is seen that high blood pressure indeed increases the risk of coronary heart disease.

Another example of confounding is shown in Figure 5.11 below. Confounding may be the explanation for the relationship demonstrated between coffee consumption and the risk of coronary heart disease, since it is known that coffee consumption is associated with cigarette smoking: people who drink coffee are more likely to smoke than people who do not drink coffee. It is also well known that cigarette smoking is a cause of coronary heart disease. It is thus possible that the relationship between coffee consumption and coronary heart disease merely reflects the known causal association of smoking with the disease. In this situation smoking confounds the apparent relationship between coffee consumption and coronary heart disease.

Figure 5.11: Confounding: coffee drinking, cigarette smoking, and coronary heart disease
[Diagram labels: EXPOSURE (coffee consumption); DISEASE (coronary heart disease); CONFOUNDING (cigarette smoking)]

Control of confounding

Several methods are available to control confounding, either through study design or during the analysis of results.

The methods commonly used to control confounding in the design of an epidemiological study are:
• Randomisation
• Restriction
• Matching

At the analysis stage, confounding can be controlled by:
• Stratification
• Statistical modelling

Randomisation: Randomisation, which is applicable only to experimental studies, is the ideal method for ensuring that potential confounding variables are equally distributed among the groups being compared. The sample sizes have to be sufficiently large to avoid random maldistribution of such variables. Randomisation avoids the association between potentially confounding variables and the exposure that is being considered.

Restriction: Restriction can be used to limit the study to people who have particular characteristics. For example, in a study on the effects of coffee on coronary heart disease, participation in the study could be restricted to non-smokers, thus removing any potential confounding by cigarette smoking.

Matching: If matching is used to control confounding, the study participants are selected so as to ensure that potential confounding variables are evenly distributed in the two groups being compared. For example, in a case-control study of exercise and coronary heart disease, each patient with heart disease can be matched with a control of the same age group and sex to ensure that confounding by age and sex does not occur. Matching has been used extensively in case-control studies, but it can lead to problems in the selection of controls if the matching criteria are too strict or too numerous; this is called overmatching. Matching can be expensive and time consuming, but is particularly useful if the danger exists of there being no overlap between cases and controls, as where the cases are likely to be older than the controls.

Analysis: In large studies it is usually preferred to control for confounding in the analytical phase rather than in the design phase.
Confounding can then be controlled by stratification, which involves the measurement of the strength of associations in well-defined and homogeneous categories (strata) of the confounding variable. If age is a confounder, the association may be measured in, say, 10-year age groups; if sex or ethnicity is a confounder, the association is measured separately in men and women or in the different ethnic groups. Methods are available for summarising the overall association by producing a weighted average of the estimates calculated in each separate stratum.

Questions to stimulate further reading:
1. Discuss two types of errors commonly encountered in epidemiological studies.
2. How can the errors be minimised?
3. Discuss the term confounding and the different strategies to address it.
4. Uganda has a high burden of malaria, HIV/AIDS, hepatitis B, measles, cholera, typhoid, and non-communicable diseases. Describe how an epidemiologist can help policy makers to control these diseases.
5. Describe several study designs they may use to provide policy relevant data.
6. The Malaria Control Division in the Ministry of Health, together with the Reproductive Health Division, want to evaluate a new antimalarial drug called Atekin for malaria prevention in pregnancy. It is hypothesized that Atekin has more beneficial effects in reducing parasitemia and anemia in pregnancy.
a) What are the outcome indicators for this study?
b) What study design would best measure the outcome indicators?
c) What are the strengths and weaknesses of the design of your choice?
7. Define the term surveillance.
a) Describe the types of surveillance systems commonly used in epidemiology.
b) Describe how surveillance can help in the control of the frequent viral haemorrhagic fevers (like Ebola) in Uganda.

Practical session
1. Discuss important considerations when calculating sample size of a study.
2. Why are outcome measures important in writing research grant proposals?
3. Discuss the importance of time-to-event studies.
4. A case-control study was conducted in Mukono district to determine if children who live in households with dirty and old latrines experience more episodes of diarrhoea compared to children from households with new and improved latrines. In total, 100 ‘cases’ (60 with old latrines and 40 with new latrines) and 400 ‘controls’ (150 with old latrines and 250 with new latrines) were included in the study.

a. Calculate the risk of getting diarrhoea expressed as the Odds Ratio (OR) using a 2 x 2 table. (10 marks)

Table 5.4: Cases of diarrhoea among children by type of latrine

                                             Children with diarrhoea (cases)   Controls   Total
Households with old latrines (exposed)       a = 60                            b = 150    210
Households with new latrines (non-exposed)   c = 40                            d = 250    290
Total                                        100                               400        500

Answer: The odds ratio is defined as (odds of the event in the exposed group)/(odds of the event in the non-exposed group). The formula to calculate the odds ratio is (a/b)/(c/d) = ad/bc = (60 x 250)/(150 x 40) = 2.5.

b. Interpret the results of the Odds Ratio. (5 marks)

The odds ratio of 2.5 implies that the odds of diarrhoea among children who reside in households with old latrines are 2.5 times those among children in households with new latrines.

5. One thousand women of reproductive age visited a cervical cancer clinic and were tested for cervical cancer using the CareHPV test. Pap smear was used as the ‘Gold standard’ for detecting cervical cancer.

Table 5.5: Calculating the sensitivity, specificity and predictive value of a test

                        Pap smear positive (have disease)   Pap smear negative (have no disease)   Total
CareHPV test positive   a = 160                             b = 80 (false positives)               240
CareHPV test negative   c = 40 (false negatives)            d = 720                                760
Total                   200 (total with disease)            800 (total with no disease)            1,000 (total population)

a. Using the information above, calculate the prevalence of HPV in this population.
Prevalence of the disease is: Total with disease/Total population = 200/1,000 = 20%

b. What was the sensitivity of the CareHPV test?
Sensitivity of the test is: True positives/Total with disease, or a/(a+c) x 100 = 160/200 x 100 = 80%

c. What was the specificity of the CareHPV test?
Specificity of the test is: True negatives/Total with no disease, or d/(b+d) x 100 = 720/800 x 100 = 90%

d. What was the predictive value of a positive CareHPV test?
The positive predictive value of the test is: a/(a+b) x 100 = 160/(160+80) x 100 = 66.7%

e. What was the predictive value of a negative CareHPV test?
The negative predictive value of the test is: d/(d+c) x 100 = 720/(720+40) x 100 = 94.7%

6. Uganda has an increasing prevalence of cancers; common among them are cancers of the cervix, breast, lung, colon, and prostate. Accordingly, cases in Kyadondo region are captured in the cancer registry and there are efforts to capture all cancer cases in the country. In the follow-up of cancer patients, the time of diagnosis of cancer and the cure, relapse or death are usually recorded.
a) Describe the technique you would use to find out the survival rate of women who get cancer of the cervix.
b) Why is this epidemiological parameter important?
c) If you wanted to find out the risk factors that expose women to cancer of the cervix, what study design(s) would you prefer, and why?

Bibliography
1. Beaglehole R, Bonita R, Kjellström T. Basic epidemiology. Geneva: World Health Organization; 1993.
2. Grimes DA, Schulz KF. ‘Bias and causal associations in observational research’. The Lancet. 2002 Jan 19;359(9302):248-52.
3. Grimes DA, Schulz KF. ‘Uses and abuses of screening tests’. The Lancet. 2002 Mar 9;359:881-84.
4. Hackshaw A. A Concise Guide to Clinical Trials. Wiley-Blackwell, BMJ Books; 2009.
5. Lwanga SK, Lemeshow S, World Health Organization. Sample size determination in health studies: a practical manual. World Health Organization; 1991.
6. Schulz KF, Grimes DA. ‘Sample size slippages in randomised trials: exclusions and the lost and wayward’. The Lancet. 2002 Mar 2;359:781-85.
7. Schulz KF, Grimes DA. ‘Sample size calculations in randomised trials: mandatory and mystical’. The Lancet. 2005 Apr 9;365(9467):1348-53.
8. Schulz KF, Grimes DA. The Lancet handbook of essential concepts in clinical research. Lancet; 2006.
9. Schulz KF. ‘Randomised trials, human nature, and reporting guidelines’. The Lancet. 1996 Aug 31;348(9027):596-8.

Lecture Notes-Series Six

Practical Steps in investigating a disease outbreak

Lecture Outline
1. Background
2. Key steps in disease investigation
3. Detailed explanation of the key steps for disease investigation
4. Practical Session

Expectations

After reading this Lecture Series, it is expected that you should clearly know the practical steps required to investigate a disease outbreak. Later, you will be introduced to a practical session that you are encouraged to do. Through this learning, you should master how to investigate a disease outbreak and how to use the findings in designing prevention and control measures.

6.1 Background:

Once there is a reported disease outbreak and a decision to conduct a field investigation of that outbreak has been made, working quickly is critical. Below are the essential practical steps necessary to investigate a disease outbreak. These will then be explained and their relevance emphasized in subsequent sections.

6.2 Key steps in disease investigation:
1. Prepare for field work
2. Establish the existence of an outbreak
3. Verify the diagnosis
4. Construct a working case definition
5. Find cases systematically and record the data
6. Perform descriptive epidemiological analyses
7. Develop working hypotheses
8. Evaluate the hypotheses
9. From time to time, review, refine, and re-evaluate the hypotheses
10. Compare and reconcile with laboratory and/or environmental studies
11. Implement control and prevention measures
12. Initiate or maintain surveillance
13. Communicate the findings

6.3 Detailed explanation of the key steps in disease investigation

Step 1: Prepare for field work

Field preparations can be grouped into two categories: (a) scientific and investigative issues, and (b) management and operational issues. Good preparation in both categories is needed to facilitate a smooth field investigation.

As a field investigator, you must have the appropriate scientific knowledge, supplies, and equipment before departing for the field. It is important to discuss the field work preparations with someone knowledgeable about the disease and field investigations. It is also essential to review the relevant literature.

Before leaving for a field investigation, consult laboratory staff to ensure that you take the proper laboratory materials and know the proper collection, storage, and transportation techniques. By talking with the laboratory staff, you are also informing them about the outbreak, and they can anticipate what type of laboratory resources will be needed.

You also need to know what supplies or equipment to bring to protect yourself. Some outbreak investigations require no special equipment, while an investigation of SARS or Ebola haemorrhagic fever may require personal protective equipment such as masks, gowns, and gloves.

Finally, before departing, you should have a plan of action. What are the objectives of this investigation, i.e., what are you trying to accomplish? What will you do first, second, and third? Having a plan of action upon which everyone agrees will allow you to ‘hit the ground running’ and avoid delays resulting from misunderstandings.
A good field investigator must be a good manager and collaborator as well as a good epidemiologist, because most investigations are conducted by a team rather than just one individual. The team members must be selected before departure and know their expected roles and responsibilities. Does the team need a laboratory technician, a veterinarian, translator/interpreter, computer specialist, entomologist, or other specialists? What is the role of each? Who is in charge? If you have been invited to participate but do not work for the local health team, are you expected to lead the investigation, provide consultation to the local staff who will conduct the investigation, or simply lend a hand to the local staff? And who are your local contacts?

Depending on the type of outbreak, the number of involved agencies may be quite large. The investigation of an outbreak from an animal source may include the department of agriculture. If criminal or bioterrorist intent is suspected, law enforcement agencies may be in charge, or at least involved. In a natural disaster (hurricane or flood), the Ministry of Disaster Management and Preparedness and the Prime Minister’s Office may be involved.

Staff from different agencies have different perspectives, approaches, and priorities that must be reconciled. For example, whereas the public health investigation may focus on identifying a pathogen, source, and mode of transmission, a criminal investigation is likely to focus on finding the perpetrator. Sorting out roles and responsibilities in such multi-agency investigations is critical to accomplishing the disparate objectives of the different agencies.

A communications plan must be established.
The need for communicating with the public and the community has long been acknowledged, and the need for communicating quickly and effectively with elected officials and the public is obvious during epidemics such as Ebola, Yellow Fever, West Nile Virus encephalitis, SARS, anthrax, and COVID-19. The plan should include how often and when to have conference calls with involved agencies, who will be the designated spokesperson, who will prepare health alerts and press releases, and the like.

In addition, operational and logistical details are important. Arrange to bring a laptop computer, cell phone or phone card, camera, and other supplies. If you are arriving from outside the area, you should arrange in advance when and where you are to meet with local officials and contacts when you arrive in the field. You must arrange travel, lodging, and local transportation. Many agencies and organizations have strict approval processes and budgetary limits that you must follow. If you are traveling to another country, you will need a passport and often a visa. You should also take care of personal matters before you leave, especially if the investigation is likely to be lengthy.

Step 2: Establish the existence of a disease outbreak

Definitions:
• A disease outbreak or an epidemic is the occurrence of more cases of disease than expected in a given area or among a specific group of people over a particular period of time.
• Many epidemiologists use the terms outbreak and epidemic interchangeably, but the public is more likely to think that an epidemic implies a crisis situation.
• Some epidemiologists apply the term epidemic to situations involving larger numbers of people over a wide geographic area. Indeed, the Dictionary of Epidemiology defines outbreak as an epidemic limited to a localized increase in the incidence of disease, e.g., in a village, town, or closed institution.
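The idea of ‘more cases than expected’ in the first definition can be sketched as a simple comparison of an observed count with a baseline. In the Python sketch below, the expected number is taken as the average count from comparable earlier periods; the function name, that choice of baseline, and all counts are assumptions made for illustration. The sketch deliberately sets no threshold, because an excess on its own does not establish that an outbreak exists.

```python
from statistics import mean


def excess_over_expected(observed_cases, baseline_counts):
    """Compare the observed case count with the average of comparable past periods."""
    expected = mean(baseline_counts)
    return {
        "expected": expected,
        "excess": observed_cases - expected,
        "ratio": observed_cases / expected,
    }


# Hypothetical: 18 cases this month, against counts for the same month in four earlier years.
summary = excess_over_expected(18, [4, 6, 5, 5])
print(summary)
```

A summary like this only flags that the observed number differs from the baseline; judging whether the difference is real remains a matter for the investigation itself.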
One of the first tasks of the field investigator is to verify that a cluster of cases is indeed an outbreak. Some clusters turn out to be true outbreaks with a common cause, some are sporadic and unrelated cases of the same disease, and others are unrelated cases of similar but unrelated diseases.

Even if the cases turn out to be the same disease, the number of cases may not exceed what the health department normally sees in a comparable time period. Here, as in other areas of epidemiology, the observed is compared with the expected. The expected number is usually the number from the previous few weeks or months, or from a comparable period during the previous few years. For a notifiable disease, the expected number is based on health department surveillance records. For other diseases and conditions, the expected number may be based on locally available data such as hospital discharge records, mortality statistics, or cancer or birth defect registries. When local data are not available, a health department may use rates from state or national data, or, alternatively, conduct a telephone survey of physicians to determine whether they are seeing more cases of the disease than usual. Finally, a survey of the community may be conducted to establish the prevalence of the disease.

Even if the current number of reported cases exceeds the expected number, the excess may not necessarily indicate an outbreak. Reporting may rise because of changes in local reporting procedures, changes in the case definition, increased interest because of local or national awareness, or improvements in diagnostic procedures. A new physician, infection control nurse, or healthcare facility may more consistently report cases, when in fact there has been no change in the actual occurrence of the disease. Some apparent increases are actually the result of misdiagnosis or laboratory error.
Finally, particularly in areas with sudden changes in population size such as resort areas, college towns, and migrant farming areas, changes in the numerator (number of reported cases) may simply reflect changes in the denominator (size of the population).

Whether an apparent problem should be investigated further is not strictly tied to verifying the existence of an epidemic (more cases than expected). Sometimes, health agencies respond to small numbers of cases, or even a single case of disease, that may not exceed the expected or usual number of cases. As noted earlier, the severity of the illness, the potential for spread, availability of control measures, political considerations, public relations, available resources, and other factors all influence the decision to launch a field investigation.

Step 3: Verify the diagnosis

The next step is to verify the diagnosis. This is closely linked to verifying the existence of an outbreak. In fact, often these two steps are addressed at the same time. Verifying the diagnosis is important: (a) to ensure that the disease has been properly identified, since control measures are often disease-specific; and (b) to rule out laboratory error as the basis for the increase in reported cases.

First, review the clinical findings and laboratory results. If you have questions about the laboratory findings (for example, if the laboratory tests are inconsistent with the clinical and epidemiologic findings), ask a qualified laboratory technician to review the laboratory techniques being used. If you need specialized laboratory work such as confirmation in a reference laboratory, other chemical or biological fingerprinting, or polymerase chain reaction, you must secure a sufficient number of appropriate specimens, isolates, and other laboratory materials as soon as possible.

Second, many investigators, clinicians and non-clinicians alike, find it useful to visit one or more patients with the disease.
If you do not have the clinical background to verify the diagnosis, bring a qualified clinician with you. Talking directly with some patients gives you a better understanding of the clinical features, and helps you to develop a mental image of the disease and the patients affected by it. In addition, conversations with patients are very useful in generating hypotheses about disease aetiology and spread. They may be able to answer some critical questions: What were their exposures before becoming ill? What do they think caused their illness? Do they know anyone else with the disease? Do they have anything in common with others who have the disease?

Third, summarize the clinical features using frequency distributions. Are the clinical features consistent with the diagnosis? Frequency distributions of the clinical features are useful in characterizing the spectrum of illness, verifying the diagnosis, and developing case definitions. These clinical frequency distributions are considered so important in establishing the credibility of the diagnosis that they are frequently presented in the first table of an investigation’s report or manuscript.

Step 4: Construct a working case definition

A case definition is a standard set of criteria for deciding whether an individual should be classified as having the health condition of interest. It includes clinical criteria and, particularly in the setting of an outbreak investigation, restrictions by time, place, and person. The clinical criteria should be based on simple and objective measures such as ‘fever ≥ 38.3°C (101°F),’ ‘three or more loose bowel movements per day,’ or ‘myalgias (muscle pain) severe enough to limit the patient’s usual activities’.
The case definition may be restricted by time (for example, to persons with onset of illness within the past 2 months), by place (for example, to residents of the nine-county area or to employees of a particular plant) and by person (for example, to persons with no previous history of a positive tuberculin skin test, or to premenopausal women). Whatever the criteria, they must be applied consistently to all persons under investigation.

A case definition must not include the exposure or risk factor you are interested in evaluating. This is a common mistake. For example, if one of the hypotheses under consideration is that persons who worked in the west wing were at greater risk of disease, do not define a case as ‘illness among persons who worked in the west wing with onset between…’ Instead, define a case as ‘illness among persons who worked in the facility with onset between…’ Then conduct the appropriate analysis to determine whether those who worked in the west wing were at greater risk than those who worked elsewhere.

Diagnoses may be uncertain, particularly early in an investigation. As a result, investigators often create different categories of a case definition, such as confirmed, probable, and possible or suspect, that allow for uncertainty.
• To be classified as confirmed, a case usually must have laboratory verification.
• A case classified as probable usually has typical clinical features of the disease without laboratory confirmation.
• A case classified as possible usually has fewer of the typical clinical features.

Case Definitions
• Suspected: A case that meets the clinical case definition.
• Probable: A suspected case as defined above, with an ongoing epidemic and an epidemiological link to a confirmed case.
• Confirmed: A suspected or probable case with laboratory confirmation.

In the outbreak setting, the investigators would need to specify time and place to complete the outbreak case definition.
For example, if investigating an epidemic of meningococcal meningitis in Moyo district, Northern Uganda, the case definition might be the clinical features with onset between January and April of that year among residents and visitors to Moyo district.

Classifications such as confirmed-probable-possible are helpful because they provide flexibility to the investigators. A case might be temporarily classified as probable or possible while laboratory results are pending. Alternatively, a case may be permanently classified as probable or possible if the patient’s physician decided not to order the confirmatory laboratory test because the test is expensive, difficult to obtain, or unnecessary. For example, while investigating an outbreak of diarrhoea, investigators usually try to identify the causative organism from stool samples from a few afflicted persons. If the tests confirm that all of those case-patients were infected with the same organism, the other persons with compatible clinical illness are all presumed to be part of the same outbreak and to be infected with the same organism.

A case definition is a tool for classifying someone as having or not having the disease of interest, but few case definitions are 100% accurate in their classifications. Some persons with mild illness may be missed, and some persons with a similar but not identical illness may be included. Generally, epidemiologists strive to ensure that a case definition includes most if not all of the actual cases, but very few or no false-positive cases. However, this ideal is not always met. For example, case definitions often miss infected people who have mild or no symptoms, because they have little reason to be tested.
Early in an investigation, investigators may use a ‘loose’ or sensitive case definition that includes confirmed, probable, and possible cases to characterize the extent of the problem, identify the populations affected, and develop hypotheses about possible causes. The strategy of being more inclusive early on is especially useful in investigations that require travel to different hospitals, homes, or other sites to gather information, because collecting extra data while you are there is more efficient than having to return a second time. This illustrates an important axiom of field epidemiology: get it while you can.

Later on, when hypotheses have come into sharper focus, the investigator may tighten the case definition by dropping the ‘possible’ and sometimes the ‘probable’ category. In analytic epidemiology, inclusion of false-positive cases can produce misleading results. Therefore, to test these hypotheses by using analytic epidemiology (see Step 8), specific or tight case definitions are recommended.

Other investigations, particularly those of a newly recognized disease or syndrome, begin with a relatively specific or narrow case definition. For example, acquired immunodeficiency syndrome (AIDS) and severe acute respiratory syndrome (SARS) both began with relatively specific case definitions. This ensured that persons whose illness met the case definition truly had the disease in question. As a result, investigators could accurately characterize the typical clinical features of the illness, risk factors for illness, and cause of the illness. After the cause was known and diagnostic tests were developed, investigators could use the laboratory test to learn about the true spectrum of illness, and broaden the case definition to include those with early infection or mild symptoms.
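The confirmed, probable, and possible categories described in this step can be written as a small rule set. The following is only a minimal sketch: the function names, the `typical_threshold` cut-off, and the idea of counting 'typical features' are illustrative assumptions, not criteria from any particular outbreak. It shows how the same records might be selected with a loose (sensitive) definition early on and a tight (specific) one later.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical category sets mirroring the confirmed / probable / possible scheme.
LOOSE: Set[str] = {"confirmed", "probable", "possible"}
TIGHT: Set[str] = {"confirmed"}


def classify_case(typical_features: int, lab_confirmed: bool,
                  typical_threshold: int = 3) -> str:
    """Assign a case category from a count of typical clinical features.

    typical_threshold is an assumed cut-off; a real investigation would
    spell out its own clinical criteria.
    """
    if lab_confirmed:
        return "confirmed"
    if typical_features >= typical_threshold:
        return "probable"
    if typical_features >= 1:
        return "possible"
    return "not a case"


def select_cases(records: Iterable[Tuple[str, int, bool]],
                 allowed: Set[str]) -> List[str]:
    """Return the IDs of records whose category is in `allowed`."""
    return [rid for rid, features, lab in records
            if classify_case(features, lab) in allowed]


# Made-up line list: (record id, typical features present, lab confirmed)
records = [("A", 4, True), ("B", 3, False), ("C", 1, False), ("D", 0, False)]
print(select_cases(records, LOOSE))  # early, sensitive definition
print(select_cases(records, TIGHT))  # later, specific definition
```

Restrictions by time, place, and person would be applied as further filters on the same records.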
Step 5: Find cases systematically and record information

Many outbreaks are brought to the attention of health authorities by concerned healthcare providers or citizens. However, the cases that prompt the concern are often only a small and unrepresentative fraction of the total number of cases. Public health workers must therefore look for additional cases to determine the true geographic extent of the problem and the populations affected by it.

Usually, the first effort to identify cases is directed at healthcare practitioners and facilities (physicians’ clinics, hospitals, and laboratories) where a diagnosis is likely to be made. Investigators may conduct what is sometimes called stimulated or enhanced passive surveillance, by sending a letter describing the situation and asking for reports of similar cases. Alternatively, they may conduct active surveillance by telephoning or visiting the facilities to collect information on any additional cases.

In some outbreaks, public health officials may decide to alert the public directly, usually through the local media. In other situations, the media may have already spread the word. If an outbreak affects a restricted population such as persons in a school, or at a work site, and if many cases are mild or asymptomatic and therefore undetected, a survey of the entire population is sometimes conducted to determine the extent of infection. A questionnaire could be distributed to determine the true occurrence of clinical symptoms, or laboratory specimens could be collected to determine the number of asymptomatic cases.

Finally, investigators should ask case-patients if they know anyone else with the same condition. Frequently, one person with an illness knows or hears of others with the same illness.

In some investigations, investigators develop a data collection form tailored to the specific details of that outbreak. In others, investigators use a generic case report form.
Regardless of which form is used, the data collection form should include the following types of information about each case.
• Identifying information. A name, address, and telephone number are essential if investigators need to contact patients for additional questions, and to notify them of laboratory results and the outcome of the investigation. Names also help in checking for duplicate records, while the addresses allow for mapping the geographic extent of the problem.
• Demographic information. Age, sex, race, occupation, etc., provide the person characteristics of descriptive epidemiology needed to characterize the populations at risk.
• Clinical information. Signs and symptoms allow investigators to verify that the case definition has been met. Date of onset is needed to chart the time course of the outbreak. Supplementary clinical information, such as duration of illness and whether hospitalization or death occurred, helps characterize the spectrum of illness.
• Risk factor information. This information must be tailored to the specific disease in question. For example, since food and water are common vehicles for hepatitis A but not hepatitis B, exposure to food and water sources must be ascertained in an outbreak of the former but not the latter.
• Source of information. The case report must include the source of the report, usually a physician, clinic, hospital, or laboratory. Investigators will sometimes need to contact the reporter, either to seek additional clinical information or report back the results of the investigation.

Traditionally, the information described above is collected on a standard case report form, questionnaire, or data abstraction form.

Step 6: Perform descriptive epidemiology

The next step, after identifying and gathering basic data on the persons with the disease, is to systematically describe some of the key characteristics of those persons.
This process, in which the outbreak is characterized by time, place, and person, is called descriptive epidemiology. It may be repeated several times during the course of an investigation as additional cases are identified or as new data become available. This step is critical for several reasons.
1. Summarizing data by key demographic variables provides a comprehensive characterization of the outbreak: trends over time, geographic distribution (place), and the populations (persons) affected by the disease.
2. From this characterization you can identify or infer the population at risk for the disease.
3. The characterization often provides clues about aetiology, source, and modes of transmission that can be turned into testable hypotheses.
4. Descriptive epidemiology describes the where and whom of the disease, allowing you to begin intervention and prevention measures.
5. Early (and continuing) analysis of descriptive data helps you to become familiar with those data, enabling you to identify and correct errors and missing values.

Epidemic Curves

An epidemic curve shows the frequency of new cases over time, based on the date of onset of a particular disease. The shape of the curve in relation to the incubation period for a particular disease can give clues about the source. There are three main types of epidemic curves:

a) Point source outbreaks (epidemics) involve a common source, such as contaminated food or an infected food handler, and all the exposures tend to occur in a relatively brief period. Consequently, point source outbreaks tend to have epidemic curves with a rapid increase in cases followed by a somewhat slower decline, and all of the cases tend to fall within one incubation period. In a point source epidemic of hepatitis A, you would expect the rise and fall of new cases to occur within about a 30-day span of time, as seen in Figure 6.1 below.
Figure 6.1: A point source epidemic of hepatitis A
Source: LaMorte, 2007

b) Continuous common source epidemics may also rise to a peak and then fall, but the cases do not all occur within the span of a single incubation period. This implies that there is an ongoing source of contamination. The down slope of the curve may be very sharp if the common source is removed or gradual if the outbreak is allowed to exhaust itself. The epidemic curve in Figure 6.2 below is from the cholera outbreak in the Broad Street area of London in 1854 that was investigated by Dr. John Snow. Cholera has an incubation period of 1-3 days, and even though residents began to flee when the outbreak erupted, you can see that this outbreak lasted for more than a single incubation period. This suggests an ongoing source of infection, in this case the Broad Street water pump.

Figure 6.2: An epidemic curve of cholera outbreak in the Broad Street area of London in 1854.
Source: Snow, 1936

c) Propagated (or progressive source) epidemic. The epidemic curve in Figure 6.3 below is from an outbreak of measles that began with a single index case which infected a number of other individuals. (The incubation period for measles averages 10 days with a range of 7-18 days.) One or more of the people infected in the initial wave infected a group of people who became the second wave of infection. The transmission was from person-to-person, rather than from a common source. Propagated epidemic curves usually have a series of successively larger peaks, which are one incubation period apart.
The successive waves tend to involve more and more people, until the pool of susceptible people is exhausted or control measures are implemented. This is an ideal example, however; in reality, most of these epidemics do not produce the classic pattern.

Figure 6.3: An epidemic curve of a measles outbreak
Source: LaMorte, 2007

For some outbreaks, the descriptive information is all that is needed to figure out the source, and control measures can be undertaken rapidly. In other cases, this descriptive information (person, place, and time) helps generate hypotheses about the source, but it isn’t obvious what the source is. When this occurs, it is necessary to test the hypotheses by conducting an analytical study, i.e., either a case-control study or a cohort study. This means collecting data and analyzing it in order to identify the source. However, it is important to recognize that you can’t test a hypothesis unless you have one to test. So, the descriptive studies that generate hypotheses are essential.

Practical session

Figure 6.4 below shows the epidemic curve for a Salmonella outbreak that occurred in Abim in 2009. Salmonella generally has an incubation period of about 1-3 days. What kind of epidemic curve is this? What is your justification?

Figure 6.4: An outbreak of Salmonella that occurred in Abim in 2009.
Source: MOH, 2009

Usefulness of epidemic curves

Epidemic curves are a basic investigative tool because they are so informative. The epi-curve shows the magnitude of the epidemic over time as a simple, easily understood visual. It permits the investigator to distinguish an epidemic from an endemic disease. Potentially correlated events can be noted on the graph.
• The shape of the epidemic curve may provide clues about the pattern of spread in the population, e.g., point versus intermittent source versus propagated.
• The curve shows where you are in the course of the epidemic: still on the upswing, on the down slope, or after the epidemic has ended. This information forms the basis for predicting whether more or fewer cases will occur in the near future.
• The curve can be used for evaluation, answering questions like: how long did it take for the health department to identify a problem? Are intervention measures working?
• Outliers (cases that don’t fit into the body of the curve) may provide important clues.
• If the disease and its incubation period are known, the epi-curve can be used to deduce a probable time of exposure and help develop a questionnaire focused on that time period.

Drawing an epidemic curve. To draw an epidemic curve, you first must know the time of onset of illness for each case. For some diseases, date of onset is sufficient. For other diseases, particularly those with a relatively short incubation period, hour of onset may be more suitable. Occasionally, you may be asked to draw an epidemic curve when you don’t know either the disease or its incubation time. In that situation, it may be useful to draw several epidemic curves with different units on the x-axis to find one that best portrays the data.

Interpreting an epidemic curve. The first step in interpreting an epidemic curve is to consider its overall shape. The shape of the epidemic curve is determined by the epidemic pattern (for example, common source versus propagated), the period of time over which susceptible persons are exposed, and the minimum, average, and maximum incubation periods for the disease.
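As a rough illustration of the drawing step described above, the sketch below bins onset dates into intervals and prints a text histogram. The onset dates and function names are invented for the example; changing `bin_days` corresponds to trying different units on the x-axis.

```python
from collections import Counter
from datetime import date, timedelta
from typing import List, Tuple


def epidemic_curve(onset_dates: List[date],
                   bin_days: int = 1) -> List[Tuple[date, int]]:
    """Count new cases per interval of `bin_days`, starting at the earliest onset.

    Intervals with no cases are kept (count 0) so gaps stay visible.
    """
    start = min(onset_dates)
    counts = Counter((d - start).days // bin_days for d in onset_dates)
    last_bin = max(counts)
    return [(start + timedelta(days=i * bin_days), counts.get(i, 0))
            for i in range(last_bin + 1)]


def print_curve(curve: List[Tuple[date, int]]) -> None:
    """Print one row per interval, one '#' per case."""
    for bin_start, n in curve:
        print(f"{bin_start.isoformat()}  {'#' * n} ({n})")


# Made-up onset dates for illustration only.
onsets = [date(2021, 1, 1), date(2021, 1, 1), date(2021, 1, 2), date(2021, 1, 4)]
print_curve(epidemic_curve(onsets, bin_days=1))
print_curve(epidemic_curve(onsets, bin_days=2))
```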
An epidemic curve that has a steep upslope and a more gradual down slope (a so-called log-normal curve) is characteristic of a point-source epidemic, in which persons are exposed to the same source over a relatively brief period. In fact, any sudden rise in the number of cases suggests sudden exposure to a common source. In a point-source epidemic, all the cases occur within one incubation period. If the duration of exposure is prolonged, the epidemic is called a continuous common-source epidemic, and the epidemic curve has a plateau instead of a peak. An intermittent common-source epidemic (in which exposure to the causative agent is sporadic over time) usually produces an irregularly jagged epidemic curve reflecting the intermittence and duration of exposure and the number of persons exposed. In theory, a propagated epidemic (one spread from person-to-person with increasing numbers of cases in each generation) should have a series of progressively taller peaks one incubation period apart, but in reality few produce this classic pattern.

The cases that stand apart may be just as informative as the overall pattern. An early case may represent a background or unrelated case, a source of the epidemic, or a person who was exposed earlier than most of the cases (for example, the cook who tasted a dish hours before bringing it to the big picnic). Similarly, late cases may represent unrelated cases, cases with long incubation periods, secondary cases, or persons exposed later than most others (for example, someone eating leftovers). On the other hand, these outlying cases sometimes represent miscoded or erroneous data. All outliers are worth examining carefully because if they are part of the outbreak, they may have an easily identifiable exposure that may point directly to the source.

In a point-source epidemic of a known disease with a known incubation period, the epidemic curve can be used to identify a likely period of exposure.
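A back-calculation of this kind might be sketched as follows: count back one average incubation period from the median case and one minimum incubation period from the earliest case, then widen the resulting window. The function name, the example dates, and the incubation values are assumptions for illustration (real values should be looked up for the disease in question), and widening by a fraction of the average incubation period is only one possible reading of 'widen the period'.

```python
from datetime import date, timedelta
from typing import List, Tuple


def likely_exposure_period(onsets: List[date],
                           avg_incubation_days: float,
                           min_incubation_days: float,
                           widen: float = 0.2) -> Tuple[date, date]:
    """Estimate a probable exposure window for an apparent point-source outbreak."""
    ordered = sorted(onsets)
    median_case = ordered[(len(ordered) - 1) // 2]  # lower median onset date
    from_median = median_case - timedelta(days=avg_incubation_days)
    from_earliest = ordered[0] - timedelta(days=min_incubation_days)
    # Assumption: pad each side by a fraction of the average incubation period.
    pad = timedelta(days=widen * avg_incubation_days)
    start = min(from_median, from_earliest) - pad
    end = max(from_median, from_earliest) + pad
    return start, end


# Made-up onset dates; incubation values are placeholders, not reference data.
onsets = [date(2021, 5, 10), date(2021, 5, 20), date(2021, 5, 25)]
start, end = likely_exposure_period(onsets, avg_incubation_days=28,
                                    min_incubation_days=15, widen=0.25)
print(start, end)
```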
Knowing the likely period of exposure allows you to ask questions about the appropriate period of time so you can identify the source of the epidemic. To identify the likely period of exposure from an epidemic curve of an apparent point source epidemic:
1. Look up the average and minimum incubation periods of the disease. This information can be found on disease fact sheets available on the Internet or in the Control of Communicable Diseases Manual.
2. Identify the peak of the outbreak or the median case and count back on the x-axis one average incubation period. Note the date.
3. Start at the earliest case of the epidemic and count back the minimum incubation period, and note this date as well.

Ideally, the two dates will be similar, and represent the probable period of exposure. Since this technique is not precise, widen the probable period of exposure by, say, 20% to 50% on either side of these dates, and then ask about exposures during this widened period in an attempt to identify the source.

In a similar fashion, if the time of exposure and the times of onset of illness are known but the cause has not yet been identified, the incubation period can be estimated from the epidemic curve. Subtract the time of exposure from the time of onset of the earliest cases to estimate the minimum incubation period. Then subtract the time of exposure from the time of onset of the median case to estimate the median incubation period. These incubation periods can be compared with a list of incubation periods of known diseases to narrow the possibilities.

Step 7: Develop a hypothesis

The next step in an investigation is formulating hypotheses, although in reality investigators usually begin to generate hypotheses at the time of the initial telephone call. Depending on the outbreak, the hypotheses may address the source of the agent, the mode (and vehicle or vector) of transmission, and the exposures that caused the disease.
The hypotheses should be testable, since evaluating hypotheses is the next step in the investigation. In an outbreak context, hypotheses are generated in a variety of ways. First, consider what you know about the disease itself: What is the agent’s usual reservoir? How is it usually transmitted? What vehicles are commonly implicated? What are the known risk factors? In other words, by being familiar with the disease, you can, at the very least, ‘round up the usual suspects.’ Another useful way to generate hypotheses is to talk to a few of the case-patients, as discussed in Step 3. The conversations about possible exposures should be open-ended and wide-ranging, not necessarily confined to the known sources and vehicles. In some challenging investigations that yielded few clues, investigators have convened a meeting of several case-patients to search for common exposures. In addition, investigators have sometimes found it useful to visit the homes of case-patients, and look through their refrigerators and shelves for clues to an apparent foodborne outbreak. Just as case-patients may have important insights into causes, so too may the local health department staff. The local staff know the people in the community and their practices, and often have hypotheses based on their knowledge. The descriptive epidemiology may provide useful clues that can be turned into hypotheses. If the epidemic curve points to a narrow period of exposure, what events occurred around that time? Why do the people living in one particular area have the highest attack rate? Why are some groups with particular age, sex, or other person characteristics at greater risk than other groups with different person characteristics? Such questions about the data may lead to hypotheses that can be tested by appropriate analytic techniques. 
Given recent concerns about bioterrorism, investigators should consider intentional dissemination of an infectious or chemical agent when trying to determine the cause of an outbreak.

Epidemiological clues to possible bioterrorism
1. Single case of disease caused by an uncommon agent (e.g., glanders, smallpox, viral haemorrhagic fever, inhalational or cutaneous anthrax) without adequate epidemiologic explanation
2. Unusual, atypical, genetically engineered strain of an agent (or antibiotic-resistance pattern)
3. Higher morbidity and mortality in association with a common disease or syndrome or failure of such patients to respond to usual therapy
4. Unusual disease presentation (e.g., inhalational anthrax or pneumonic plague)
5. Disease with an unusual geographic or seasonal distribution (e.g., influenza in the summer)
6. Stable endemic disease with an unexplained increase in incidence (e.g., tularemia, plague)
7. Atypical disease transmission through aerosols, food, or water, in a mode suggesting deliberate sabotage (i.e., no other physical explanation)
8. No illness in persons who are not exposed to common ventilation systems (have separate closed ventilation systems), when illness is seen in persons in close proximity who have a common ventilation system
9. Several unusual or unexplained diseases coexisting in the same patient without any other explanation
10. Unusual illness that affects a large population (e.g., respiratory disease in a large population may suggest exposure to an inhalational pathogen or chemical agent)
11. Illness that is unusual (or atypical) for a given population or age group (e.g., outbreak of measles-like rash in adults)
12. Unusual pattern of death or illness among animals (which may be unexplained or attributed to an agent of bioterrorism) that precedes or accompanies illness or death in humans
13. Unusual pattern of death or illness among humans (which may be unexplained or attributed to an agent of bioterrorism) that precedes or accompanies illness or death in animals
14. Ill persons who seek treatment at about the same time (point source with compressed epidemic curve)
15. Similar genetic type among agents isolated from temporally or spatially distinct sources
16. Simultaneous clusters of similar illness in non-contiguous areas, domestic or foreign
17. Large number of cases of unexplained diseases or deaths

Step 8: Evaluate hypotheses epidemiologically

After a hypothesis that might explain an outbreak has been developed, the next step is to evaluate the plausibility of that hypothesis. Typically, hypotheses in a field investigation are evaluated using a combination of environmental evidence, laboratory science, and epidemiology. From an epidemiologic point of view, hypotheses are evaluated in one of two ways: either by comparing the hypotheses with the established facts, or by using analytic epidemiology to quantify relationships and assess the role of chance.

The first method is likely to be used when the clinical, laboratory, environmental, and/or epidemiologic evidence so obviously supports the hypotheses that formal hypothesis testing is unnecessary. For example, in an outbreak of hypervitaminosis D that occurred in Massachusetts in 1991, investigators found that all of the case-patients drank milk delivered to their homes by a local dairy. Therefore, investigators hypothesized that the dairy was the source and the milk was the vehicle. When they visited the dairy, they quickly recognized that the dairy was inadvertently adding far more than the recommended dose of vitamin D to the milk. No analytic epidemiology was really necessary to evaluate the basic hypothesis in this setting or to implement appropriate control measures, although investigators did conduct additional studies to identify additional risk factors.
In many other investigations, however, the circumstances are not as straightforward, and information from the series of cases is not sufficiently compelling or convincing. In such investigations, epidemiologists use analytic epidemiology to test their hypotheses. The key feature of analytic epidemiology is a comparison group. The comparison group allows epidemiologists to compare the observed pattern among case-patients or a group of exposed persons with the expected pattern among non-cases or unexposed persons. By comparing the observed with expected patterns, epidemiologists can determine whether the observed pattern differs substantially from what should be expected and, if so, by what degree. In other words, epidemiologists can use analytic epidemiology with its hallmark comparison group to quantify relationships between exposures and disease, and to test hypotheses about causal relationships. The two most common types of analytic epidemiology studies used in field investigations are retrospective cohort studies and case-control studies, as described in the previous Lecture Series.

Retrospective cohort studies

A retrospective cohort study is the study of choice for an outbreak in a small, well-defined population, such as an outbreak of gastroenteritis among wedding guests for which a complete list of guests is available. In a cohort study, the investigator contacts each member of the defined population (e.g., wedding guests), determines each person’s exposure to possible sources and vehicles (e.g., what food and drinks each guest consumed), and notes whether the person later became ill with the disease in question (e.g., gastroenteritis). After collecting similar data from each attendee, the investigator calculates an attack rate for those exposed to (e.g., who ate) a particular item and an attack rate for those who were not exposed.
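The attack-rate calculation for a single item can be sketched as below. The guest records, field names, and function names are hypothetical; the sketch simply divides the number ill by the number in each group and also reports the ratio of the two attack rates and the share of cases that were exposed.

```python
from typing import Dict, List, Optional


def attack_rate(ill: int, total: int) -> Optional[float]:
    """Proportion ill in a group; None if the group is empty."""
    return ill / total if total else None


def item_summary(guests: List[dict], item: str) -> Dict[str, Optional[float]]:
    """Attack rates among guests who did and did not consume `item`."""
    exposed = [g for g in guests if item in g["ate"]]
    unexposed = [g for g in guests if item not in g["ate"]]
    ill_exposed = sum(g["ill"] for g in exposed)
    ill_unexposed = sum(g["ill"] for g in unexposed)
    ar_exp = attack_rate(ill_exposed, len(exposed))
    ar_unexp = attack_rate(ill_unexposed, len(unexposed))
    # The ratio is undefined if either group is empty or nobody unexposed is ill.
    ratio = ar_exp / ar_unexp if ar_exp is not None and ar_unexp else None
    total_ill = ill_exposed + ill_unexposed
    return {
        "attack_rate_exposed": ar_exp,
        "attack_rate_unexposed": ar_unexp,
        "ratio": ratio,
        "share_of_cases_exposed": ill_exposed / total_ill if total_ill else None,
    }


# Made-up guest list: 4 ate salad (3 ill), 4 did not (1 ill).
guests = (
    [{"ate": {"salad"}, "ill": True}] * 3
    + [{"ate": {"salad"}, "ill": False}]
    + [{"ate": {"rice"}, "ill": True}]
    + [{"ate": {"rice"}, "ill": False}] * 3
)
print(item_summary(guests, "salad"))
```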
Generally, an exposure that has the following three characteristics or criteria is considered a strong suspect:
1. The attack rate is high among those exposed to the item.
2. The attack rate is low among those not exposed, so the difference or ratio between attack rates is high.
3. Most of the case-patients were exposed to the item, so that the exposure could ‘explain’ or account for most, if not all, of the cases.

Commonly, the investigator compares the attack rate in the exposed group to the attack rate in the unexposed group to measure the association between the exposure (e.g., the food item) and disease. The ratio of these two attack rates is called the risk ratio, or relative risk. When the attack rate for the exposed group is the same as the attack rate for the unexposed group, the relative risk is equal to 1.0, and the exposure is said not to be associated with disease. The greater the difference in attack rates between the exposed and unexposed groups, the larger the relative risk, and the stronger the association between exposure and disease.

Case-control studies

A cohort study is feasible only when the population is well defined and can be followed over a period of time. However, in many outbreak settings, the population is not well-defined and the speed of investigation is important. In such settings, the case-control study becomes the study design of choice. In a case-control study, the investigator asks both case-patients and a comparison group of persons without disease (‘controls’) about their exposures. Using the information about disease and exposure status, the investigator then calculates an odds ratio to quantify the relationship between exposure and disease. Finally, a p-value or confidence interval is calculated to assess statistical significance.

Step 9: Refine and re-evaluate the hypotheses

Unfortunately, analytic studies sometimes are unrevealing. This is particularly true if the hypotheses were not well founded at the outset.
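For reference, the odds ratio and confidence interval mentioned under case-control studies in Step 8 could be computed along the following lines. This is a sketch only: the counts are invented, the function name is hypothetical, and the interval uses the common log-based (Woolf) approximation, which is one of several available methods and not necessarily the one an investigator would choose.

```python
import math
from typing import Tuple


def odds_ratio_ci(exposed_cases: int, unexposed_cases: int,
                  exposed_controls: int, unexposed_controls: int,
                  z: float = 1.96) -> Tuple[float, float, float]:
    """Odds ratio with an approximate 95% confidence interval (Woolf method).

    Requires all four cells to be positive; small or zero cells need
    other methods (for example, exact methods).
    """
    cells = (exposed_cases, unexposed_cases, exposed_controls, unexposed_controls)
    if min(cells) <= 0:
        raise ValueError("all four cells must be positive for this approximation")
    odds_ratio = (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)
    # Standard error of ln(OR) is the square root of the sum of reciprocal cell counts.
    se_log = math.sqrt(sum(1 / c for c in cells))
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Made-up counts: 30 of 40 cases exposed, 10 of 40 controls exposed.
or_, lower, upper = odds_ratio_ci(30, 10, 10, 30)
print(f"OR = {or_:.1f}, 95% CI {lower:.1f} to {upper:.1f}")
```

An interval that excludes 1.0 suggests the association is unlikely to be due to chance alone, although it says nothing about bias or confounding.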
In field epidemiology, if you cannot generate good hypotheses (for example, by talking to some case-patients or local staff and examining the descriptive epidemiology and outliers), then proceeding to analytic epidemiology, such as a case-control study, is likely to be a waste of time.

When analytic epidemiology is unrevealing, rethink your hypotheses. Consider convening a meeting of the case-patients to look for common links or visit their homes to look at the products on their shelves. Consider new vehicles or modes of transmission.

Even when an analytic study identifies an association between an exposure and disease, the hypothesis may need to be honed. Sometimes a more specific control group is needed to test a more specific hypothesis. For example, in many hospital outbreaks, investigators use an initial study to narrow their focus. They then conduct a second study, with more closely matched controls, to identify a more specific exposure or vehicle.

Finally, recall that one reason to investigate outbreaks is research. An outbreak may provide a ‘natural experiment’ that would be unethical to set up deliberately, but from which the scientific community can learn when it does happen to occur. When an outbreak occurs, whether it is routine or unusual, consider what questions remain unanswered about that particular disease and what kind of study you might do in this setting to answer some of those questions. The circumstances may allow you to learn more about the disease, its modes of transmission, the characteristics of the agent, host factors, and the like.

Step 10: Compare and reconcile with laboratory and environmental studies

While epidemiology can implicate vehicles and guide appropriate public health action, laboratory evidence can confirm the findings. Environmental studies are equally important in some settings, and they are often helpful in explaining why an outbreak occurred.
While you may not be an expert in these other areas, you can help. Use a camera to photograph the environmental conditions. Then coordinate with the laboratory, and bring back physical evidence to be analysed.

Step 11: Implement control and prevention measures

In most outbreak investigations, the primary goal is control of the outbreak and prevention of additional cases. Indeed, although implementing control and prevention measures is listed towards the end of the conceptual sequence, in practice control and prevention activities should be implemented as early as possible. The health department’s first responsibility is to protect the public’s health, so if appropriate control measures are known and available, they should be initiated even before an epidemiologic investigation is launched. For example, a child with measles in a community with other susceptible children may prompt a vaccination campaign before an investigation of how that child became infected.

Confidentiality is an important issue in implementing control measures. Healthcare workers need to be aware of the confidentiality issues relevant to the collection, management and sharing of data. If patient information is disclosed to unauthorized persons without the patient’s permission, the patient may be stigmatized or experience rejection from family and friends, lose a job, or be evicted from housing. Moreover, the healthcare worker may lose the trust of the patient, which can affect adherence to treatment. Therefore, confidentiality (the responsibility to protect a patient’s private information) is critical in disease control and many other situations.

In general, control measures are directed against one or more segments in the chain of transmission (agent, source, mode of transmission, portal of entry, or host) that are susceptible to intervention. For some diseases, the most appropriate intervention may be directed at controlling or eliminating the agent at its source.
A patient with a communicable disease such as tuberculosis, whether symptomatic or asymptomatic, may be treated with antibiotics both to clear the infection and to reduce the risk of transmission to others. For an environmental toxin or infectious agent that resides in soil, the soil may be decontaminated or covered to prevent escape of the agent.

Some interventions are aimed at blocking the mode of transmission. Interruption of direct transmission may be accomplished by isolating someone with infection, or by counselling persons to avoid the specific type of contact associated with transmission. Similarly, to control an outbreak of influenza-like illness in a nursing home, affected residents could be cohorted, that is, put together in a separate area to prevent transmission to others. Vehicle-borne transmission may be interrupted by elimination or decontamination of the vehicle. For example, contaminated foods should be discarded, and surgical equipment is routinely sterilized to prevent transmission. Efforts to prevent faecal-oral transmission often focus on rearranging the environment to reduce the risk of contamination in the future and on changing behaviours, such as promoting hand washing. For airborne diseases, strategies may be directed at modifying ventilation or air pressure, and filtering or treating the air. To interrupt vector-borne transmission, measures may be directed toward controlling the vector population, such as spraying to reduce the mosquito population.

Some simple and effective strategies protect portals of entry. For example, bed nets are used to protect sleeping persons from being bitten by mosquitoes that may transmit malaria.

Some interventions aim to increase a host’s defences. Vaccinations promote development of specific antibodies that protect against infection.
Similarly, prophylactic use of antimalarial drugs, recommended for visitors to malaria-endemic areas, does not prevent exposure through mosquito bites but does prevent infection from taking root.

Step 12: Initiate or maintain surveillance

Once control and prevention measures have been implemented, they must continue to be monitored. If surveillance has not been ongoing, now is the time to initiate active surveillance. If active surveillance was initiated as part of case finding efforts, it should be continued. The reasons for conducting active surveillance at this time are twofold.

First, you must continue to monitor the situation and determine whether the prevention and control measures are working. Is the number of new cases going down? Or are new cases continuing to occur? If so, where are the new cases? Are they occurring throughout the area, indicating that the interventions are generally ineffective, or are they occurring only in pockets, indicating that the interventions may be effective but that some areas were missed?

Second, you need to know whether the outbreak has spread outside its original area or the area where the interventions were targeted. If so, effective disease control and prevention measures must be implemented in these new areas.

Step 13: Communicate the findings

Developing a communications plan, and communicating with those who need to know during the investigation, is critical. The final task is to summarize the investigation, its findings, and the outcome in a report, and to communicate this report in an effective manner. This communication usually takes two forms:

1. An oral briefing for local authorities. If the field investigator is responsible for the epidemiology but not disease control, then the oral briefing should be attended by the local health authorities and persons responsible for implementing control and prevention measures.
Often these persons are not epidemiologists, so findings must be presented in a clear and convincing fashion with appropriate and justifiable recommendations for action. The presentation is an opportunity for the investigators to describe what they did, what they found, and what they think should be done about it. They should present their findings in a scientifically objective fashion, and they should be able to defend their conclusions and recommendations.

2. A written report. Investigators should also prepare a written report that follows the usual scientific format of introduction, background, methods, results, discussion, and recommendations. By formally presenting recommendations, the report provides a basis for action. It also serves as a record of performance and a document for potential legal issues, as well as a reference if the health department encounters a similar situation in the future. Finally, a report that finds its way into the public health literature serves the broader purpose of contributing to the knowledge base of epidemiology and public health.

Questions to stimulate further reading:

1. Discuss two epidemiological study designs helpful in disease outbreak investigations.
2. Discuss three important attributes of an epidemic curve.
3. With examples, show how good leadership and effective communication are important in controlling disease outbreaks.

Practical Session

1. In September 2020, there were heavy rains in Bududa district leading to landslides, displacing many households from the mountain slopes down to the flooded flat lands. Roads were made impassable and food crops were destroyed. Makeshift camps were set up for the displaced population, while relatives and neighbours donated foodstuffs. Within a week, pregnant women and children aged under 5 years were reported to be particularly affected by a disease presenting with fever, headaches and abdominal pains.
There were also frequent episodes of diarrhoea.

a) Identify and discuss the role of each relevant sector that you think can contribute to the mitigation of the effects of this disaster.
b) With examples, show how intersectoral collaboration is necessary to control disease outbreaks and disaster situations.
c) What disease outbreak do you suspect is attacking the pregnant women and children?
d) List the key steps in investigating the outbreak and suggest possible control measures.

2. As a surveillance officer attached to Gulu district, you have been notified by the district health officer that there is a strange disease in the community that has killed two people. It is mentioned that they both presented with bleeding tendencies before they died.

a) List the steps you would take to investigate the disease outbreak.
b) After investigations, preliminary data show that more people have been reported sick with fever, cough, loss of appetite, extreme weakness of body parts, and bleeding tendencies. Further data show that people living in households that ate bush meat were more likely to present with such symptoms and illnesses.
c) In total, 100 patients had been reported at local health units (60 from households that had eaten bush meat and 40 with no history of exposure to bush meat); meanwhile, 100 people in the neighbourhood who had not presented with any illness (20 who ate bush meat and 80 who did not) were included in the epidemiological analyses. Describe the type of study design mentioned above and its relevance to understanding disease epidemiology.
d) Using a 2 x 2 table, calculate the relative risk and the odds ratio.
e) How do you interpret the relative risk and the odds ratio?
f) Using a case definition for Ebola Hemorrhagic Fever, out of 150 confirmed cases, 120 patients died within 5 days of diagnosis. Calculate the case fatality rate of the disease.

3. In a disease outbreak investigation, line-listing is an essential step. Describe the type of variables usually recorded and explain how this technique helps in understanding the epidemic.

Table 6.0: Answers to Question 2

Exposure factor          Cases     Controls   Total
Ate bush meat            a = 60    b = 20     80
Did not eat bush meat    c = 40    d = 80     120
Total                    100       100        200

Relative risk = (a/(a+c)) / (b/(b+d)) = (60/100) / (20/100) = 0.6/0.2 = 3.0
Odds ratio = ad/bc = (60 × 80) / (40 × 20) = 4,800/800 = 6

Note that 60/100 and 20/100 are the proportions exposed among cases and among controls, not attack rates. Because the numbers of cases and controls were fixed by the investigator, the odds ratio is the measure normally reported for this design.

Lecture Notes-Series Seven
Criteria for judging a good research report

Lecture Outline

1. Background
2. Criteria for judging a good research report
3. Other techniques to evaluate research findings
4. Practical Session

Expectations

After reading this Lecture Series, you should clearly understand the steps to take to critically review a research report. It is important for you to identify one research report published in a peer-reviewed journal and attempt to use the criteria to evaluate the findings. In this way, you will master the skills to interpret research findings and decide how to use them.

7.1 Background

Policy makers, program managers, students, and researchers often encounter numerous research reports from which they seek to extract knowledge and insights. Policy makers and program managers usually want compelling evidence on which to base reviews of existing policies and changes to interventions. They have two vested interests that are paramount to the health of the population: efficacy and cost-effectiveness. These help them when confronting politicians on accountability issues or when requesting budgets to fund new policies and new interventions. Researchers and students, on the other hand, have vested interests in expanding the knowledge base and generating evidence around new treatments and interventions. In both worlds, compelling evidence is required, and the ability to scrutinise a research report becomes very useful. Below are key steps to scrutinise and evaluate a research report.

7.2 Criteria for judging a good research report

• When was the work published?
• Where was it published?
• Are the qualifications of the authors appropriate?
• Is the purpose of the study/objectives clearly stated?
• Are methods/experimental design clearly described and appropriate?
• Have all possible influences/confounders on the findings been identified and controls instituted?
• Has the sample been appropriately selected?
• Has the reliability of the scoring been appropriately set?
• Are the comparisons between groups appropriate?
• Is the investigation of sufficient duration?
• Is the statistical analysis appropriate to answer the research questions or hypotheses?
• Have the research questions or hypotheses been answered?
• Do the interpretations and conclusions logically follow the experimental findings?
• Is there a scientific basis for recommending a new therapy?

7.3 Other techniques to evaluate research findings

Ideally, the above steps can be summarised into three critical areas:

1. Study conceptualisation: Why was the study conceptualised? Has a thorough enough literature search been undertaken to highlight any major gaps in the existing knowledge or limitations with existing interventions or new treatments? Are the objectives clearly stated? Are they measurable? Are the research questions/hypotheses posed in a way that ensures they can be answered?
2. Methods: Is the research design selected appropriate? Are study subjects appropriately selected? Have the sample sizes been calculated well? Is the implementation plan adequate? Have the data been analysed appropriately to capture the major findings?
3. Results and conclusions: Have the data been summarised to show the major findings? Have the hypotheses been tested and research questions answered? Has there been a comparison with previous findings? Have the strengths and limitations of the study been discussed, and further areas for research proposed? Have policy implications been presented? Are conclusions supported by the data?

Practical Session

1. With examples, discuss the type of research study design yielding results that are likely to convince policy makers, program managers and practitioners to change to a new treatment of a disease.
2. If you were to design a behavioural intervention to promote early seeking of routine screening for cancer of the cervix, what study design would you use and why?
3. Social science research is important in disease prevention and control. Describe with examples the relevance of the above statement.
4. Retrieve the paper, Mbonye AK, Neema S, Magnussen P. ‘Treatment-seeking practices for malaria prevention in pregnancy among rural women in Mukono district, Uganda’. J Biosoc Sci. 2006 Mar; 38(2):221-37.
a) Comment on the study design.
b) Based on the major study findings, propose interventions to prevent malaria in pregnancy.
c) Discuss the possible study designs to test and evaluate the above interventions.

Lecture Notes on Epidemiology-Made Easy presents a practical approach to understanding epidemiology techniques for disease prevention. It is a work filled with practical insights accumulated from over 20 years of teaching and professional experience, while controlling infectious and non-infectious diseases in Uganda. The book is a handy tool for students, lecturers, policy makers, program managers, and social workers who confront diseases and health issues daily and would like a quick reference guide with facts on which to base their decisions. It is arranged in such a way that the theoretical part is presented first, followed by questions and a practical session to stimulate critical thinking.
In this way, it helps to invigorate the practice of alternative thinking but, most importantly, it encourages discussions, preferably in teams, to gain consensus and facilitate problem solving. Finally, for each topic, worked examples are presented. This is to make it easy for a student or a researcher to hone their practical skills. It is hoped that after going through the practical sessions, skills for aiding in epidemiological research and practice will be developed and mastered. The reader is encouraged to read more about basic and applied epidemiology, on which there is a lot of literature, but can also benefit from reading other books written by the author:

1. Uganda’s Health Sector through Turbulent Politics (1958-2018), 2018
2. How to get a Research Grant, Publish and Influence Policy, 2019
3. Religion, Politics and the Health System in Uganda, 2020
4. Lecture Notes on Health Systems, Policy and Maternal Health in Uganda, 2021

Anthony K Mbonye (PhD, FRCP)
Professor, School of Public Health, College of Health Sciences, Makerere University & Professor, Department of Maternal and Child Health, Save The Mothers Programme, Uganda Christian University.
