Making Sense of that Mask Study

We should take a moment to talk about the latest mask study that's come out, because it's pretty neat but does seem to be leading to some questionable claims in popular media. You can read the study for yourself here.

The Study

The first, and probably most important, thing to note is that the study isn't designed to determine which mask types are most effective. Instead, its purpose is to present a novel and less expensive protocol for testing mask effectiveness. The authors are mainly checking whether the protocol can reliably detect differences between masks. As the authors say:

Below we describe the measurement method and demonstrate its capabilities for mask testing. In this application, we do not attempt a comprehensive survey of all possible mask designs or a systematic study of all use cases. We merely demonstrated our method on a variety of commonly available masks and mask alternatives with one speaker, and a subset of these masks were tested with four speakers.

The protocol is pretty cool (or I'm just a sucker for studies using lasers). In essence, the speaker places their face in the opening at the front of the apparatus and speaks. The laser lights up the exhaled droplets, and a cell phone camera records them for about 40 seconds so they can be tracked and counted. Because the researchers are cheeky, and because it's important to capture exhalation across several sound types, including plosives, the test phrase was "Stay healthy, people".

Under the testing protocol, the participant tested each mask 10 times, taking a sip of water between takes. They also ran trials with no mask as a control.
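To make the repeated-takes idea concrete, here's a minimal Python sketch of how ten takes for one condition might be boiled down to a mean and a min/max range, which is one simple way to express take-to-take spread. The function name `summarize_takes` and all of the counts are my own illustrative assumptions, not the study's data or code.

```python
from statistics import mean

def summarize_takes(counts):
    """Return (mean, min, max) droplet count over repeated takes of one condition.

    `counts` is a list of per-take droplet counts. The numbers used below
    are hypothetical, not measurements from the study.
    """
    if not counts:
        raise ValueError("need at least one take")
    return mean(counts), min(counts), max(counts)

# Hypothetical counts for ten no-mask takes by one speaker.
no_mask_takes = [950, 1020, 880, 1100, 990, 1010, 930, 1060, 970, 1090]
avg, low, high = summarize_takes(no_mask_takes)
print(f"mean={avg:.1f}, range={low}-{high}")  # prints "mean=1000.0, range=880-1100"
```

A speaker whose counts swing widely from take to take will show a wide range here even if their average looks ordinary, which is why the number of takes and speakers matters.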

This is where the major limitations come in. Notice that I said "the participant", not "the participants": for the test of all 14 masks, only one person tested the masks. After that, the researchers added three more speakers and tested a subset of the masks (surgical, cotton, and bandana) across this group of four speakers. They also ran no-mask trials with all four participants, including a second set of no-mask trials for the first participant. The no-mask droplet counts for each person are shown here:

Curiously, Speaker 1 seems to produce three to four times as many droplets as the other speakers, with a much wider range of droplet counts. That variance seems potentially important, and it makes me wish there had been more participants in the test.

Mask Results

Of course, you want to know about the masks. These are the masks that were tested:

1. ‘Surgical’ (Surgical mask, 3-layer)
2. ‘Valved N95’ (N95 mask with exhalation valve)
3. ‘Knitted’ (Knitted mask)
4. ‘PolyProp’ (2-layer polypropylene apron mask)
5. ‘Poly/Cotton’ (Cotton-polypropylene-cotton mask)
6. ‘MaxAT’ (1-layer Maxima AT mask)
7. ‘Cotton2’ (2-layer cotton, pleated style mask)
8. ‘Cotton4’ (2-layer cotton, Olson style mask)
9. ‘Cotton3’ (2-layer cotton, pleated style mask)
10. ‘Cotton1’ (1-layer cotton, pleated style mask)
11. ‘Fleece’ (Gaiter type neck fleece)
12. ‘Bandana’ (Double-layer bandana)
13. ‘Cotton5’ (2-layer cotton, pleated style mask)
14. ‘Fitted N95’ (N95 mask, no exhalation valve, fitted)

So, let's see how they did. This plot shows the proportion of droplets that passed through each mask relative to no mask, on a logarithmic scale. Keep in mind that error bars with a solid dot show the range for Speaker 1 (Mr. Spit Talker) only, while error bars with open circles incorporate data from all four speakers.
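The quantity on that plot is easy to compute yourself. Here's a minimal sketch, assuming the relative count is simply a speaker's mean masked count divided by the same speaker's mean no-mask count; the function name `relative_count` and the numbers are hypothetical, not taken from the paper. On a log10 scale, a mask passing 1% of droplets and one passing 10% sit exactly one step apart, and anything above zero means more droplets were counted than with no mask at all.

```python
import math
from statistics import mean

def relative_count(mask_takes, no_mask_takes):
    """Mean droplet count with a mask divided by mean count with no mask.

    Values below 1 mean the mask reduced the droplet count; values above 1
    mean more droplets were counted than with no mask.
    """
    baseline = mean(no_mask_takes)
    if baseline == 0:
        raise ValueError("no-mask baseline must be nonzero")
    return mean(mask_takes) / baseline

# Hypothetical numbers for one speaker, for illustration only.
no_mask = [1000, 1000, 1000]
examples = {
    "mask A": [10, 10, 10],        # passes 1% of droplets
    "mask B": [100, 100, 100],     # passes 10% of droplets
    "mask C": [1100, 1100, 1100],  # more droplets than no mask
}
for name, takes in examples.items():
    r = relative_count(takes, no_mask)
    print(f"{name}: relative={r:.2f}, log10={math.log10(r):+.2f}")
```

Dividing by each speaker's own no-mask baseline is what lets a heavy spitter like Speaker 1 be compared with the others at all.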

The results are interesting. First, it's very clear that the Fitted N95 was the most effective mask for Speaker 1! Most of the other masks help at least somewhat, though there seems to be a lot of variation between people (what is going on with Cotton5?). The other weird finding is that the fleece seems to have generated MORE particles than no mask at all. The authors speculate that this is because the fleece might break larger droplets into smaller ones.

Well, maybe. Let's look at relative droplet counts over time for the masks that did get multiple testers:

It seems the bandana had a LOT of variability between users, ranging from minimally useful for Speakers 2 and 3 to surprisingly effective for Speaker 4. Curiously, Speaker 4 also had noticeably poorer results with the cotton mask than any of the other speakers. It seems possible that if Speaker 4 had been the one to test all 14 masks, we might be talking about how bandanas work almost as well as surgical masks!
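To see why a single test speaker matters, here's a toy sketch that ranks masks by relative count separately for each speaker. The two speakers and all of the ratios are invented for illustration and are NOT measurements from the study; they only show how the ordering of masks can flip depending on who wears them.

```python
# Invented relative droplet counts (masked / no-mask) for two hypothetical
# speakers. These are NOT measurements from the study.
relative = {
    "speaker X": {"surgical": 0.05, "cotton": 0.10, "bandana": 0.60},
    "speaker Y": {"surgical": 0.05, "cotton": 0.30, "bandana": 0.08},
}

def ranking(ratios):
    """Order masks from most to least effective (lowest relative count first)."""
    return sorted(ratios, key=ratios.get)

for speaker, ratios in relative.items():
    print(speaker, ranking(ratios))
# speaker X ['surgical', 'cotton', 'bandana']
# speaker Y ['surgical', 'bandana', 'cotton']
```

Since only one speaker tested all 14 masks, the full ranking in the study is effectively a single row of a table like this one.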

So what gives? A few things are going on here. With only four participants, we can't really account for inter-subject variation, and the authors rightly say so:

Inter-subject variations are to be expected, for example due to difference in physiology, mask fit, head position, speech pattern, and such.

This may be compounded by the fact that the setup only counts droplets within a limited physical region, so droplets directed elsewhere aren't captured. Depending on mask fit and head position, this could vary a lot. The authors identify other significant measurement issues that probably aren't of interest to casual readers, and they suggest technical solutions for other researchers to use.

So what's the takeaway? Most importantly, the study shows that the new measurement device works well and can be built cheaply. That's a big deal! It also shows variation in droplet release between masks that's worth further investigation with more rigorous testing methods, but it probably doesn't mean we should burn everything that's not a fitted N95. In fact, the masks by and large produced significant reductions in particles ("all masks have won and all must have prizes" - The Dodo) compared to no mask at all. The authors also suggest something I think is brilliant: using this setup to train people to wear their masks effectively.

I also encourage you to look at this Twitter thread and article by Katherine Ellen Foley. This article by Yasemin Saplakoglu is also good. As always, I'm not an expert on this; I'm just reading the paper and interpreting what I see. I always welcome more nuanced assessments and data from other sources, and I defer to the researchers, who say you should “absolutely not” interpret this as final evidence about the effectiveness of bandanas or other mask types.