Oh, Epworth, We Hardly Knew Ye
Murray Johns developed the Epworth Sleepiness Scale (ESS) and published his methodology in 1991. (1) He subsequently tested its reliability in a group of 104 medical students (2) and found a test-retest correlation of r = .82. For those of us who are fans of Karl Pearson and his product-moment correlation coefficient (and I know there are many fans out there), this is a very large correlation and indicates that the measure is reliable.
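For readers who want to see the arithmetic behind a test-retest correlation, Pearson's r takes only a few lines of pure Python. The scores below are toy numbers invented for illustration, not data from Dr. Johns' study:

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    # Covariance term (numerator) and the two standard-deviation terms
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical ESS scores from the same subjects on two occasions
test1 = [4, 7, 10, 12, 15, 6, 9]
test2 = [5, 6, 11, 13, 14, 7, 8]
print(round(pearson_r(test1, test2), 2))  # → 0.96
```

When the second administration tracks the first this closely, r approaches 1, which is what "reliable" means in test-retest terms.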
However, you may have noticed that the subjects were medical students. This should have raised a red flag, as we all know that medical students are not normal. Dr. Johns also tested ESS validity in 54 sleep apnea patients before and after treatment with CPAP (2) and found that Epworth scores decreased significantly, with a Student’s t value of 9.59 (p < .001). And for those of us who are fans of William Sealy Gosset (who published the methodology for the t test using “Student” as his pen name for reasons having to do with Guinness beer [long story, which I will save for another blog]), this is a very high t value and indicates a significant result. So, we have documented reliability and validity for the ESS.
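Mr. Gosset's paired t statistic is just the mean of the before/after differences divided by its standard error, which is easy to sketch in a few lines. Again, the pre- and post-CPAP scores below are hypothetical, not the 54 patients from the study:

```python
from math import sqrt

def paired_t(before, after):
    """Paired Student's t statistic for before/after measurements on the same subjects."""
    d = [b - a for b, a in zip(before, after)]
    n = len(d)
    mean_d = sum(d) / n
    # Sample standard deviation of the differences (n - 1 in the denominator)
    sd = sqrt(sum((x - mean_d) ** 2 for x in d) / (n - 1))
    return mean_d / (sd / sqrt(n))

before = [14, 16, 12, 18]  # hypothetical pre-CPAP ESS scores
after = [8, 10, 9, 11]     # hypothetical post-CPAP ESS scores
print(round(paired_t(before, after), 2))  # → 6.35
```

A large t means the before/after drop is big relative to how much it varies from patient to patient, which is exactly the pattern Dr. Johns reported.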
Dr. Johns’ goal was to provide an alternative to the time-consuming and expensive Multiple Sleep Latency Test (MSLT) in the evaluation of sleepiness. The Epworth consists of eight questions, is extremely easy to administer and score, and, with the reliability and validity well established by Dr. Johns, soon became a staple of sleep patient evaluation. The Epworth has in recent years become a method for triaging patients and is frequently used to determine access to diagnostic and therapeutic services. But, as we shall see, there is new evidence to suggest that this may put too much of a burden on the Epworth.
In a recent issue of the Journal of Clinical Sleep Medicine, a group of researchers in Otago, New Zealand, just across the ocean (or sea, or lake, or maybe strait, or at least some body of water) from Dr. Johns in Epworth, Australia (lovely this time of year, I hear) added more data to the ESS story. Dr. Johns showed test-retest reliability in medical students and validity in OSA patients before and after treatment. Drs. Campbell, Neill and Scott of the University of Otago added a measure of test-retest reliability of the ESS in OSA patients. (3)
In the clinical New Zealand population, the correlation between a primary care ESS and a sleep study ESS approximately six months later was r = .67, still a large and statistically significant correlation. More importantly in this clinical situation, the difference between the first and second measures was more than three points in 46 percent of the patients. Almost 8 percent of patients changed from an ESS of less than 10 (normal) to an ESS of more than 10 (sleepy) in the interval between measurements. And another 7 percent changed from sleepy to normal. The authors conclude that the ESS values were variable and “care must be taken when using the ESS as a prioritization or treatment decision tool.” (3)
In an accompanying editorial, Drs. Omobomi and Quan (4) provide a “requiem” for the ESS. A requiem is a mass for the dead, which I think may be a bit premature — the ESS is only wounded, not dead. They point out a variety of factors leading to variability in the ESS, such as napping, stimulant use, motivation (the possible loss of employment, for instance) and educational level. They note that “the limitations of the ESS … make it inadequate as a singular tool in stratifying patients at risk for OSA.” (4) I don’t think using the ESS as the only measure of sleepiness is what Dr. Johns had in mind in the first place.
How do we decide if patients are at high risk of OSA and should have a diagnostic test? Dr. Quan and colleagues have provided data that directly compared the four-variable screening tool, STOP, STOP-Bang and ESS. (5) STOP-Bang won. The ESS lost. Don’t use the ESS to stratify risk.
But is the ESS dead? If it weren’t nailed to its perch, would it have fallen over? I think not. A recent study (6) used cluster analysis and found that sleep apnea patients can be reliably divided into three subgroups: disturbed sleep, minimal symptoms or upper airway symptoms with sleepiness. Although this research is in the early stages, there is some hope that categorization of OSA patients may predict optimal treatment parameters. And the clinical tool used to identify sleepiness in patients was (wait for it) the ESS. Therefore, as part of a thorough clinical evaluation, the ESS may yet provide extremely useful data.
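Cluster analysis sounds exotic, but the core idea — group patients so that each one sits near the center of its own cluster — can be sketched in a few lines. Here is a minimal one-dimensional k-means toy (made-up symptom-style scores, not the study's multivariate data or its actual method, which clustered many variables at once):

```python
def kmeans_1d(points, k=3, iters=25):
    """Minimal 1-D k-means with deterministic centers seeded at min, midpoint and max."""
    lo, hi = min(points), max(points)
    centers = [lo + i * (hi - lo) / (k - 1) for i in range(k)]
    for _ in range(iters):
        # Assign each point to its nearest center
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Move each center to the mean of its cluster (keep it if the cluster is empty)
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return clusters

# Toy ESS-like scores that fall into three well-separated groups
scores = [2, 3, 4, 9, 10, 11, 17, 18, 19]
print(kmeans_1d(scores))  # → [[2, 3, 4], [9, 10, 11], [17, 18, 19]]
```

The real study did this over many clinical variables at once, but the principle is the same: let the data, rather than a single cutoff score, define the subgroups.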
Sleepiness is a difficult construct to pin down, and subjective measurements like the ESS have inherent variability. The ESS does not correlate well with the MSLT, or with sleep apnea severity indicators in the sleep study. We have all awakened patients who are deeply asleep and snoring like a freight train in the waiting room, and yet they turn in an ESS with a score of three. But this does not mean the ESS has no value. We just need to listen to what it is telling us.
- Johns MW. A new method for measuring daytime sleepiness: the Epworth sleepiness scale. Sleep. 1991;14(6):540–545.
- Johns MW. Reliability and factor analysis of the Epworth Sleepiness Scale. Sleep. 1992;15(4):376–381.
- Campbell AJ, Neill AM, Scott DA. Clinical reproducibility of the Epworth Sleepiness Scale for patients with suspected sleep apnea. J Clin Sleep Med. 2018;14(5):791–795.
- Omobomi O, Quan SF. A requiem for the clinical use of the Epworth Sleepiness Scale. J Clin Sleep Med. 2018;14(5):711–712.
- Silva GE, Vana KD, Goodwin JL, Sherrill DL, Quan SF. Identification of patients with sleep disordered breathing: comparing the four-variable screening tool, STOP, STOP-Bang, and Epworth Sleepiness Scales. J Clin Sleep Med. 2011;7(5):467–472.
- Keenan B, Kim J, Singh B, Bittencourt L, Chen N, Cistulli P, et al. Recognizable clinical subtypes of obstructive sleep apnea across international sleep centers: a cluster analysis. Sleep. 2018;41(3). https://doi.org/10.1093/sleep/zsx214