Introduction to Evidence Based EMS – Part 2

    by Jason Merrill. Last modified: 01/02/14


    <<< Make sure you read Part 1 first!

    Evaluating Evidence Quality: Individual Studies

    Just like the different types of evidence in the example above had different levels of quality, different types of scientific evidence also have different levels of quality. Unlike the example above, which was pretty straightforward, scientific studies involve a structured method, and they are also relatively complex. We need some background information to understand their relative quality.

    The gold standard for scientific studies in EMS and medicine is the large, multi-centre, double-blind, randomized controlled trial performed on human subjects.

    As an example, we can examine a study comparing biphasic and monophasic AED shocks which was published in 2000 (1). The study examined patient outcomes for out-of-hospital cardiac arrests treated by EMS crews in Europe.

    Sample Size

    When we’re examining scientific studies, “large” simply means that there are a lot of subjects enrolled in the study. This is important because the more patients we examine, the more certain we can be of our results. If we imagine a study of a drug in which we only enrol two subjects, one who will take the drug and one who will not, it’s easy to see why: the single subject who took the drug might just happen to die or get better for reasons completely unrelated to the drug, and thereby skew our results. If we examine hundreds of people, the odds that unrelated, random occurrences will affect our results decrease.

    Also, if the drug has a relatively rare effect, one which might affect only 1 in 1,000 people, we’re much more likely to detect it if we have a study with many thousands of participants. Thus, ideally, larger studies are better. The largest studies, such as vaccine safety studies or large-scale epidemiological studies, can include hundreds of thousands or even millions of patients (Kuulasmaa 2013, 2), but that’s not usually possible. For one thing, such huge studies are expensive, and there just isn’t enough money to conduct them for most questions. For another, it can be difficult to recruit enough subjects to perform such studies.
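
    To put rough numbers on the rare-effect point, here is a minimal sketch. It assumes the effect strikes each participant independently with the same probability, which is a simplification, and the function name is made up for illustration.

```python
def chance_of_seeing_effect(rate: float, participants: int) -> float:
    """Probability that at least one participant shows an effect which
    occurs independently in each person with probability `rate`."""
    # The chance that nobody is affected is (1 - rate) ** participants,
    # so the chance that at least one person is affected is the complement.
    return 1.0 - (1.0 - rate) ** participants


if __name__ == "__main__":
    # A 1-in-1,000 effect in hypothetical studies of increasing size.
    for n in (100, 1_000, 10_000):
        chance = chance_of_seeing_effect(1 / 1000, n)
        print(f"{n:>6} participants: {chance:.1%} chance of at least one case")
```

    Under these assumptions, a study of 100 people has less than a 10% chance of seeing even one case of a 1-in-1,000 effect, a study of 1,000 has roughly a 63% chance, and a study of 10,000 is almost certain to see at least one. Seeing a single case is still not the same as proving the drug caused it, which is why even larger numbers are preferable.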

    In our example AED study, 338 patients were enrolled, of whom 115 had ventricular fibrillation and were examined by the study. That is large for a study of cardiac arrest, but not nearly as large as the largest studies we mentioned above. As a general rule, studies should have a bare minimum of 50 patients in both the intervention group and the control group, and any event being studied should have happened at least 50 times (3). Studies that are smaller than this should be treated with a certain amount of suspicion and should usually be considered low-quality evidence.

    Confidence Interval

    An important number to look for in any study to determine if it was large enough is the “confidence interval.” The confidence interval has two parts. The first is the confidence level, which is often 95%. The second is a range of numbers called the interval estimate. In our example study about AEDs, the authors found that 53% of the patients who received monophasic shocks had a good neurological outcome at discharge. At a 95% confidence level, this finding had an interval estimate of 6% to 62%. What this means is that the researchers performed a statistical calculation which suggests that, if the study were repeated 100 times and an interval were calculated the same way each time, about 95 of those intervals would contain the true proportion of patients who have good neurological outcomes after monophasic shocks, and the remaining five would miss it. Put loosely, we can be reasonably confident that the true figure lies somewhere between 6% and 62%. The wider the interval estimate, the less precise the finding, and small studies tend to produce wide intervals.
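
    To see how sample size drives the width of an interval estimate, here is a small sketch using the Wilson score method, one common way of calculating a confidence interval for a proportion. This is illustrative only: the group sizes below are hypothetical, and the AED study's authors may well have used a different method.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a proportion.

    z = 1.96 corresponds to a 95% confidence level.
    Returns (lower bound, upper bound) as fractions between 0 and 1.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denominator = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denominator
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denominator
    return centre - half_width, centre + half_width


if __name__ == "__main__":
    # Hypothetical studies that each observe 53% good outcomes,
    # but with very different numbers of patients.
    for successes, n in ((53, 100), (530, 1000), (5300, 10000)):
        low, high = wilson_interval(successes, n)
        print(f"{successes}/{n}: {low:.1%} to {high:.1%}")
```

    The observed proportion is the same in every hypothetical study, but the interval estimate tightens as the number of patients grows. That is why a wide interval is a warning sign that a study may have been too small.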

    Ideally, studies of interventions should include a pre-defined minimum clinically important difference that should fall into the interval estimate, and for the study to be useful the interval estimate should never include the number 0 (4). However, a minimum clinically important difference isn’t always given, even when it should be, and occasionally a study will use a different statistical model to estimate its reliability, making it difficult for practitioners who don’t have much training in statistics to interpret. Some studies fail to include estimates of reliability for all the things they test, and these studies must be read with suspicion.


    Multi-Centre Studies

    For a study to be “multi-centre” means that it includes multiple teams from multiple institutions each duplicating the same study procedure. This reduces the chances that a systematic error by any one team or something strange happening at any one institution will skew the study’s results. In our example AED study, there were four centres involved: one in Mainz, Germany, one in Brugge, Belgium, one in Hamburg, Germany, and one in Helsinki, Finland. While multi-centre studies are gold-standard science, they are also expensive and difficult to perform compared with single-centre studies that only include one research team.

    Sometimes, when no multi-centre study has been performed on a subject, but multiple single-centre studies have, the same high quality can be achieved by performing a “metastudy” (more commonly called a meta-analysis), or a study of the studies which have been done. While researchers are human, and even multi-centre studies and metastudies can have serious design flaws, it’s usually safe to consider multi-centre studies and metastudies to be very high-quality evidence. That doesn’t mean that single-centre studies can’t be high-quality evidence. It just means that we have to view the results of single-centre studies with slightly more suspicion, even if they’re well-designed.


    Double-Blind Studies

    A double-blind study is a study in which neither the researchers nor the subjects know which group the subjects are in while the study is being conducted. This is important because of something called the placebo effect. Simply put, the placebo effect is the tendency of people who believe they are being helped to get better, and patients who believe they are receiving an advanced or experimental treatment may have better outcomes simply because they believe they will. If clinical researchers believe that a treatment will be effective, they may subtly change the way they interact with their patients and thereby instil more confidence in them and invoke the placebo effect. Thus, a double-blind design where neither the patient nor the researcher knows what treatment the patient is getting is best for study design. This is pretty easy to achieve when the intervention is a drug: some subjects can be given the drug and others can be given a sugar pill or a saline injection.

    When studying other interventions, blinding can be difficult or impossible. In our AED study, for example, blinding was not used. It would have been impractical and cost-ineffective to try to manufacture generic-looking AEDs specifically for this study, so the researchers just issued AEDs using different waveforms to their EMS crews. If we were studying something like long backboards, it would be practically impossible to blind either the study participants or the researchers to whether a backboard was being used or not! In general, drug studies that don’t use double-blind procedures should be considered suspect. Studies of other interventions can often still be considered high-quality evidence without a double blind, provided that blinding the patients or the researchers to the intervention would have been impractical or impossible.


    Randomization

    Recall from when we were discussing the scientific method above that a crucial element of all science is the comparison of at least two groups. Along with a study’s size, how subjects were allocated to these groups is absolutely critical in assessing a study’s quality. Ideally, patients should be allocated to a study group at random, which is why the very best studies are “randomized-control trials.” What that means is that subjects were assigned to a group completely at random at the very beginning of the study. In our AED study example, this was done by randomly giving EMS crews either a biphasic or monophasic AED to use at the beginning of each shift.

    When subjects are not allocated to study groups at random, it can seriously impact the quality of a study’s results. As an extreme example, let’s imagine that a drug company wants to sell a drug, and is willing to conduct a fraudulent scientific study to do it. (This isn’t that far-fetched a scenario: in 2009 it was discovered that a major drug company and a major scientific publisher had conspired to produce an entire fake medical journal (Grant 2009, Hutson 2009)). The drug company might not be able to make the entire study up, especially if they wanted to fool a regulatory body or some other group with extensive resources to investigate whether the study was completed or not, but they could very easily create a study that seemed legitimate but which made the drug seem far more effective than it was by assigning patients non-randomly. If they assigned less sick patients to the group that was treated with the drug and sicker patients to the group that did not receive the drug, the study would probably show that people taking the drug had better outcomes regardless of whether the drug helped or not.
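
    The effect of biased allocation can be sketched with a toy simulation. Everything here is invented for illustration: each simulated patient gets a made-up "severity" score between 0 and 1, the chance of a good outcome is simply 1 minus that score, and the drug does nothing at all.

```python
import random
from typing import List, Tuple


def simulate_trial(n_patients: int, randomize: bool, seed: int = 0) -> Tuple[float, float]:
    """Simulate a trial of a drug with no effect whatsoever.

    Returns (good-outcome rate in the drug group,
             good-outcome rate in the control group).
    """
    rng = random.Random(seed)
    # Hypothetical severity score for each patient: 0 = well, 1 = very sick.
    severities = [rng.random() for _ in range(n_patients)]
    if randomize:
        rng.shuffle(severities)   # allocation ignores how sick patients are
    else:
        severities.sort()         # biased: the least sick patients get the drug
    half = n_patients // 2
    drug_group, control_group = severities[:half], severities[half:]

    def good_outcome_rate(group: List[float]) -> float:
        # A patient does well with probability (1 - severity) in either group.
        return sum(rng.random() > severity for severity in group) / len(group)

    return good_outcome_rate(drug_group), good_outcome_rate(control_group)


if __name__ == "__main__":
    for label, randomize in (("random allocation", True), ("biased allocation", False)):
        drug, control = simulate_trial(2000, randomize)
        print(f"{label}: drug group {drug:.0%} good outcomes, control group {control:.0%}")
```

    With random allocation the two groups should end up with similar outcome rates, as expected for a useless drug. With biased allocation the drug group should look dramatically better, even though the only thing that changed was who was put in which group.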

    This is just an imaginary example to illustrate why randomization matters, of course, and most randomization problems are not a result of fraud or bad intent, but when they happen they can cause serious flaws in what would otherwise be high-quality science. As a real-world example, several years ago a problem with randomization led to the early termination of a large, multi-centre trial of mechanical CPR devices when it skewed the results so badly that the mechanical CPR devices falsely appeared to be resulting in patient deaths (5).

    When evaluating how a study’s groups are allocated, randomized controlled trials are the gold standard, and they can usually be considered high-quality evidence. Unfortunately, the devil is in the details: a lot of study designs are called “random” by the researchers who use them but aren’t actually random, and many studies aren’t randomized at all. In the terminated study of mechanical CPR devices I mentioned above, for example, the authors used a technique called “block randomization” whereby entire EMS stations were assigned to a group (an approach often described as cluster randomization). The researchers failed to keep the study protocol consistent across EMS stations, and the problems occurred when one EMS station changed its protocols and skewed the results. If this problem hadn’t caused the study to be terminated, it might have been very difficult for the end reader to pick up on. Very often, though, research contains randomization problems that should be obvious to readers.

    In 2012, for example, a study on limiting the use of lights and sirens was published that used a technique very similar to the block randomization used in the mechanical CPR study above (6). The authors implemented a limited lights and sirens protocol at certain EMS stations, and compared them with nearby EMS stations that allowed EMS practitioners to decide whether to use lights and sirens or not without any specific protocol guidance. The authors found that the stations using the limited lights and sirens protocol reduced their proportion of transports using lights and sirens without negatively impacting patient outcomes when compared with the stations not using the protocol.

    However, if we examine the study groups, we find that the towns not using the limited lights and sirens protocol were almost 3 times as far from the hospital as those using the protocol, and they had significantly different socioeconomic conditions. The authors dismissed these differences as insignificant, but there is a well-demonstrated link between socioeconomic status and health (Deaton, 2003), so the people from non-study towns might well have been sicker, and because they were further from the hospital, lights and sirens presumably would have saved more transport time for them.

    As we said above, truly random assignment of subjects to study groups is the gold standard and can be considered high-quality evidence. We should consider studies where subjects weren’t assigned to study groups randomly, or where authors claim that subjects were assigned to groups randomly but they don’t seem like they actually were, to be medium-quality evidence at best. To decide, consider how similar the study groups were likely to be throughout the study. If the study groups seem like they were likely to be composed of very similar subjects, it’s probably safe to accept a study as medium-quality evidence. If the study groups seem like they were likely to be dissimilar in ways that would affect the study’s outcome, such as in the lights and sirens study above, the study must be considered low-quality evidence.

    Observational Studies

    There are certain things that are important to study which we cannot examine with randomized control trials or similar techniques because we can’t ethically assign people to groups to be compared. For example, if we think that a given substance might cause cancer in humans, we can’t randomly assign human study subjects to a group that will ingest that substance: doing so would be unethical. Instead, we can perform an observational study. In an observational study, we simply gather data about subjects without intervening in their lives in any way and compare the people who are exposed to whatever factor we are interested in studying with those who are not.


    For example, in order to determine whether or not cigarette smoking is associated with cancer, epidemiologists gathered extensive data about people in the United States, and compared the cancer rates of people who smoked with those who did not smoke (Saracci, 2010). When evaluating an observational study, we should look for much larger study sizes than we would expect in other studies, especially if the study is a cohort or cross-sectional study. Prospective studies, in which the researchers gather data on subjects from the beginning of the study period until the end of the study period, are generally higher-quality evidence than retrospective studies, where researchers gather data on things that have already happened. Observational studies can be high-quality evidence when they are prospective and have very large sample sizes, but they often rely on sophisticated statistical techniques which are difficult for practitioners without a background in statistics to evaluate (7).

    Subject Type

    Finally, the best studies to guide EMS and medical practice are studies completed with human subjects. The reason for this should be fairly obvious: humans are a different type of animal than rats, or pigs, or any other animal that might be used in a laboratory study. However, if we want to use a randomized control trial to study something which would be unethical to study in humans, we might still use an animal. For example, a recent study on fluid resuscitation in hemorrhagic shock involved removing a large, predetermined volume of blood from its subjects (8). The study used rats, and more than half the rats used in the study died from their blood loss. This kind of research could never be ethically appropriate to perform in humans! Still, even if this sort of study is useful, in EMS animal-model studies must be considered low-quality evidence because rats are just not similar enough to people. (In other disciplines of medicine animal model studies can sometimes be more useful, especially for puzzling out how things work on a microscopic or biochemical level.)

    Evaluating Evidence Quality: Systematic Reviews

    It’s useful to think of evidence based EMS as a court that puts our EMS practices on trial. In this court, an individual study is like a single witness. Just like a court of law has rules of evidence to help it evaluate the value of a given witness’s testimony, we’ve seen some rules that we can use to evaluate the value of a given study. However, a court can’t just listen to a single witness and then reach a judgement. If that were how courts of law worked, there would either be a lot more innocent people in prison or a lot more guilty people walking the streets, depending on which witness the court happened to listen to!

    Evidence based EMS is a lot like that: we can’t just examine one study on a given practice and leave it at that. We need to examine all the studies on the practice and reach a conclusion based on all the evidence. Just like witnesses in court often give different versions of events, scientific studies often contradict each other, and we need a systematic way to compare them and assign weight to their findings.

    Evidence-based EMS projects

    An example of an evidence-based EMS project is the Canadian Prehospital Evidence Based Practice (PEP) project run by Dalhousie University and EHS in Nova Scotia.


    Projects like this evaluate evidence using a systematic, structured method to determine its quality and assign weight to its results, and make recommendations based on the resulting findings. The PEP project, for example, divides evidence into three quality levels.

    Level I evidence

    The highest quality recognized by the PEP project, is defined as “Evidence obtained from at least one properly randomized controlled trial or systematic reviews or meta-analysis that contain randomized control trials”.

    Level II evidence

    Medium-quality evidence, is defined as “Evidence obtained from non-randomized studies with a comparison group or systematic reviews of non-randomized studies with a comparison group. Registry-type studies in which comparisons are made are included here.”

    Level III evidence

    Lowest quality evidence recognized by the PEP project, is defined as “Evidence from studies with no comparison group or simulation studies or animal studies.” Opinion articles and articles which don’t report on primary research are excluded from the PEP project.

    From there, the PEP project further divides evidence into that which supports a given intervention, that which is neutral, and that which opposes a given treatment (Canadian Prehospital Evidence Based Practice Project, 2013a). This structured approach allows the PEP project’s authors to make valid recommendations, even when contradictory evidence exists, and to evaluate the quality of their recommendations. For example, there is evidence both for and against the safety and efficacy of prehospital active external rewarming of hypothermia victims. By evaluating the evidence systematically, however, the PEP project can recommend active external rewarming. It can also state that the existing evidence is fair, meaning that there is a reasonable chance that the recommendation may change as further research is done (Canadian Prehospital Evidence Based Practice Project, 2013b).
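
    As a rough illustration of how the three quality levels quoted above sort studies, here is a simplified sketch. It is my own reading of the quoted definitions, not the PEP project's actual procedure, and the function and parameter names are hypothetical. A systematic review is treated like the kind of study it contains.

```python
def pep_level(randomized: bool, has_comparison_group: bool,
              simulation_or_animal: bool = False) -> str:
    """Return "I", "II" or "III" based on a simplified reading of the
    PEP level-of-evidence definitions quoted in the text."""
    # Level III: no comparison group, or a simulation or animal study.
    if simulation_or_animal or not has_comparison_group:
        return "III"
    # Level I: properly randomized controlled trials (or reviews of them).
    # Level II: non-randomized studies that still have a comparison group.
    return "I" if randomized else "II"


if __name__ == "__main__":
    # A randomized controlled trial in humans.
    print(pep_level(randomized=True, has_comparison_group=True))
    # A registry study comparing two groups without randomization.
    print(pep_level(randomized=False, has_comparison_group=True))
    # A randomized laboratory study in rats.
    print(pep_level(randomized=True, has_comparison_group=True,
                    simulation_or_animal=True))
```

    Note that this sketch says nothing about whether a trial was "properly" randomized, which the Level I definition requires and which, as we saw above, takes careful reading to judge.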

    Another major vehicle for evaluating scientific evidence is the review article published in a peer-reviewed journal. These articles bring the data on a given subject together and evaluate it just like an evidence-based EMS project would. An example would be a review article on prehospital oxygen therapy which appeared in the journal Respiratory Care in 2013 (9). The authors of that review examined 80 studies on the subject and concluded that the only indications for oxygen are hypoxia confirmed by testing (such as pulse oximetry) or by clinical observations, or the presence of carbon monoxide poisoning, and that administering oxygen to patients who are not hypoxic or poisoned with carbon monoxide is potentially harmful.

    It’s also important to remember that science is a moving target. If we think back to our courtroom analogy, when new evidence about a crime comes to light a person can request a new trial. This actually happens quite a bit, and people who were wrongfully convicted of crimes often go free when new evidence comes to light (DNA Exoneree Case Profiles, 2013). Just like court convictions need to be re-examined when new evidence comes to light, we need to re-examine evidence based EMS recommendations when new studies are completed, and new evidence very often means we have to change our past practices.

    Applying Scientific Evidence to EMS Practice

    There’s an old joke that can be useful for understanding how to apply empirical evidence:

    Three tourists are riding together on a train in Scotland. They look out the window, and see a black sheep. One tourist says, “Fascinating! All Scottish sheep are black!” The second tourist says, “No, no, no! Only some Scottish sheep are black.” The third tourist rolls his eyes at his two companions and says, “Gentlemen, all we can say for sure is that there is one sheep in Scotland that appeared to be black on one side to the three of us while this train car was passing by it.”

    All three of the tourists in this joke have some sort of evidence for what they’re saying, but only one of them is saying something particularly useful. In a way the other two are doing something similar to the mistakes I’ve seen many EMS practitioners make in thinking about evidence-based EMS. The tourist who sees a single black sheep in Scotland and presumes on that basis that all sheep in Scotland are black is doing something that we’ve been guilty of a lot as a profession: he’s taking a single piece of evidence and drawing overly broad conclusions from it. For example, the 8-minute ALS response time benchmark widely in use throughout North America is largely based on a small number of studies conducted in the 1970s (10).

    On the other hand, the tourist who says that all he can say for sure is that there’s one sheep in Scotland which appeared black on one side while they were passing by it is technically correct. Based on the single bit of evidence the tourists have, it is technically possible that, say, there are no naturally black sheep in Scotland, and somebody covered half the sheep in coal dust before the train passed by and will come and wash it off after the train has passed. That isn’t a very reasonable or likely explanation, however.

    This tourist is similar to those practitioners I have heard claiming that we shouldn’t be doing anything that’s not evidence-based. While evidence based EMS is a very important tool, and in an ideal world every EMS intervention would be based on high-quality evidence, that’s just not a practical approach at the moment. For an example of why, we can take a look at the PEP project’s recommendations for foreign body airway obstruction (Canadian Prehospital Evidence Based Practice Project, 2013c). At the time of this writing, the PEP project makes no recommendation for or against any of the following interventions:

    • Back Blows
    • Chest Thrusts
    • Cricothyrotomy
    • Direct Laryngoscopy and Magill forceps
    • Heimlich Maneuver
    • Oxygen

    This is because no research exists, or that which does exist hasn’t been evaluated yet. The fact that research hasn’t been done on foreign body airway obstruction doesn’t mean that we have to stand by and let our patients turn blue and die when they’re choking! In the absence of scientific evidence, expert recommendations are still a valid basis for treatments. We need to learn to become less like the tourists who jump to conclusions or narrowly insist on absolute certainty and more like the tourist who sees a single black sheep and draws a reasonable conclusion based on that evidence that some Scottish sheep are black.

    Finally, it’s important to remember that evidence based EMS is not something an individual practitioner can do alone: it’s something that we have to integrate into our EMS systems and our profession. For one thing, the protocol-driven model which most EMS systems follow often doesn’t allow individual practitioners to change their practices based on the latest evidence. As paramedicine evolves and becomes more professionalized this might change, but in the short term it’s one of the realities of our industry.

    Secondly, and perhaps more importantly, EMS practitioners are generalists to a much greater degree than pretty much any other allied health profession. This means that there is an enormous amount of research to keep up with if we’re going to make our practices evidence-based. My RSS feed reader, for example, sometimes has more than a hundred journal articles in a day. I can read the abstracts of fewer than half of them, and I am only able to read the full peer-reviewed article a tiny fraction of the time because it’s expensive and time-consuming to do so.

    No one person can keep current with the volume of research that’s out there being done: if we’re going to become evidence-based, we need to adjust the way we approach evidence as a profession.


    References (non-PubMed)

    • Kuulasmaa K. The WHO MONICA Project [Internet]. 2013 [cited 2013 Jan 31]. Available from: http://www.thl.fi/monica/
    • Grant B. Merck published fake journal. The Scientist [Internet]. 2009 Apr 30 [cited 2014 Jan 31]; Available from: http://www.the-scientist.com/?articles.view/articleNo/27376/title/Merck-published-fake-journal/
    • Hutson S. Publication of fake journals raises ethical questions. Nat Med. 2009 Jun;15(6):598–598.
    • Deaton A. Health, Income, and Inequality [Internet]. National Bureau of Economic Research; 2003. Available from: http://www.nber.org/reporter/spring03/health.html
    • Saracci R. Epidemiology: a very short introduction. Oxford; New York: Oxford University Press; 2010.
    • Canadian Prehospital Evidence Based Practice Project. Level / Direction of Evidence [Internet]. (2013a) EMS Prehospital Evidence Based Protocols. [cited 2014 Jan 30]. Available from: https://emspep.cdha.nshealth.ca/MethodLOEChart.aspx
    • Canadian Prehospital Evidence Based Practice Project. Hypothermia [Internet]. (2013b) EMS Prehospital Evidence Based Protocols. [cited 2014 Jan 30]. Available from: https://emspep.cdha.nshealth.ca/LOE.aspx?VProtStr=Hypothermia&VProtID=154
    • DNA Exoneree Case Profiles [Internet]. (2013) The Innocence Project. [cited 2014 Jan 31]. Available from: http://www.innocenceproject.org/know/
    • Canadian Prehospital Evidence Based Practice Project. Foreign Body Obstruction(Complete/Partial) [Internet]. (2013c) EMS Prehospital Evidence Based Protocols. [cited 2014 Jan 30]. Available from: https://emspep.cdha.nshealth.ca/LOE.aspx?VProtStr=Foreign%20Body%20Obstruction(Complete/Partial)&VProtID=118



    1. Schneider T, Martens PR, Paschen H, Kuisma M, Wolcke B, Gliner BE, Russell JK, Weaver WD, Bossaert L, Chamberlain D. Multicenter, randomized, controlled trial of 150-J biphasic shocks compared with 200- to 360-J monophasic shocks in the resuscitation of out-of-hospital cardiac arrest victims. Optimized Response to Cardiac Arrest (ORCA) Investigators. Circulation. 2000 Oct 10;102(15):1780-7. PMID: 11023932.
    2. Klein NP, Hansen J, Chao C, Velicer C, Emery M, Slezak J, Lewis N, Deosaransingh K, Sy L, Ackerson B, Cheetham TC, Liaw KL, Takhar H, Jacobsen SJ. Safety of quadrivalent human papillomavirus vaccine administered routinely to females. Arch Pediatr Adolesc Med. 2012 Dec;166(12):1140-8. PMID: 23027469.
    3. Glasziou P, Doll H. Was the study big enough? Two café rules. Evid Based Med. 2006 Jun;11(3):69-70. PMID: 17213093.
    4. Stratford PW. The added value of confidence intervals. Phys Ther. 2010 Mar;90(3):333-5. PMID: 20190236.
    5. Paradis NA, Young G, Lemeshow S, Brewer JE, Halperin HR. Inhomogeneity and temporal effects in AutoPulse Assisted Prehospital International Resuscitation–an exception from consent trial terminated early. Am J Emerg Med. 2010 May;28(4):391-8. PMID: 20466215.
    6. Merlin MA, Baldino KT, Lehrfeld DP, Linger M, Lustiger E, Cascio A, Ohman-Strickland P, Dossantos F. Use of a limited lights and siren protocol in the prehospital setting vs standard usage. Am J Emerg Med. 2012 May;30(4):519-25. PMID: 21570233.
    7. Carlson MD, Morrison RS. Study design, precision, and validity in observational studies. J Palliat Med. 2009 Jan;12(1):77-82. PMID: 19284267.
    8. Hussmann B, Lendemans S, de Groot H, Rohrig R. Volume replacement with Ringer-lactate is detrimental in severe hemorrhagic shock but protective in moderate hemorrhagic shock: studies in a rat model. Crit Care. 2014 Jan 6;18(1):R5. PMID: 24393404.
    9. Branson RD, Johannigman JA. Pre-hospital oxygen therapy. Respir Care. 2013 Jan;58(1):86-97. PMID: 23271821.
    10. Swor RA, Cone DC. Emergency medical services advanced life support response times: lots of heat, little light. Acad Emerg Med. 2002 Apr;9(4):320-1. PMID: 11927458.

    Jason Merrill
    Jason is a Primary Care Paramedic/EMT-A in Western Canada. He has worked in EMS and critical care since late 2000, in settings ranging from high-volume urban systems and acute care hospitals in the United States to remote/wilderness and SAR settings in Canada.
