GA COVID-19 Report July 28, 2020
Daily Summary & Notes
Today’s report uses the data from the 2:50PM Report from the GA Department of Public Health.
Today we saw 4209 new cases (our record for new cases is 4813), which brings us to 26064 in the past 7 days (14.9% of total cases so far). We also had 54 new deaths (our record for new deaths is 100), which brings us to 309 in the past 7 days (8.7% of total deaths so far). We saw 406 new hospitalizations (our record is 447), bringing our 7-day count to 2050 (11.7% of total hospitalizations so far). Lastly, we had 64 new ICU admissions (85 is the record), bringing our 7-day count to 332 (10.3% of total ICU admissions so far).
For testing, we saw 34363 new COVID19 tests, bringing us to 192035 in the past 7 days (12.9% of total COVID19 tests so far). We also saw 487 new antibody tests, bringing us to 13493 in the past 7 days (6.2% of total antibody tests so far).
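The 7-day counts and percentages above come from summing the last seven days of new counts and dividing by the running total. Here is a minimal Python sketch of that arithmetic (the original analysis was done in R). It assumes a hypothetical list `daily_new` that covers the entire reporting history, so its sum equals the cumulative total.

```python
def seven_day_summary(daily_new):
    """Return (sum of the last 7 days, that sum as a % of the cumulative total).

    Assumes `daily_new` (hypothetical input) holds one count per day for the
    whole history, so sum(daily_new) is the cumulative total.
    """
    last_week = sum(daily_new[-7:])
    total = sum(daily_new)
    return last_week, round(100 * last_week / total, 1)


# Toy example: 14 days of 10 new cases each.
print(seven_day_summary([10] * 14))  # (70, 50.0)
```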
Numbers are back up today — we’re still in the middle of a pandemic. Today I want to take a moment to highlight improvements in DPH’s reporting. On July 17th, this tweet went viral:
While the criticisms of DPH’s reporting are not new (they’re one of the reasons this report exists), the tweet’s virality sparked a number of critical articles and some rebuttals as well. Personally, I think it’s a healthy exercise in continuing to monitor the quality of the information that’s being offered to the public, and how its presentation affects public perception of the outbreak. With that in mind, I think it’s only fair to recognize where progress has been made. As of today, GA is changing their visualization so that it no longer uses the deceptive 4-color bins with shifting ranges. Instead, they’re doing this:
The case count graph, which I’ve pointed out misleadingly makes it look like we’re on a constant decline, now looks like this:
This is progress. There’s still an absolute dearth of publicly available longitudinal data or case level data, which is a major limitation for independent monitoring. And there’s an alarming lack of data on race for a massive number of patients:
But we should acknowledge the progress in reporting that is occurring. Maybe one day they’ll release my fantasy data sets listing each indicator by county by day, and listing each case by age, county, specific pre-existing conditions, outcomes, and duration of treatment. That latter one would probably have to be a protected data set, but I can still dream.
You can access an interactive version of these graphs, including embedded data here.
Prior to 5/11, all data is taken from the noonish update from the GA Department of Public Health, to present even time intervals between data points, which is important for graph interpretation. On 5/11, the reporting schedule shifted to 9AM, 1PM, and 7PM, so this report captured data as of the 1PM reporting time. On June 2nd, reporting was reduced to once a day at 3PM. The data reflects multiple inefficiencies and inaccuracies in the current reporting system, including tests being shown before their results are returned and weekend reporting delays that create artificial spikes and valleys in the change data. In general, interpretation should focus on the overall trends rather than on endpoint trajectories, which are highly sensitive to these data variations.
To help visualize the effects of State actions on the outbreak, I’ve added a few sets of lines to several of the graphs. The first set, vertical blue lines, shows when the state of emergency went into effect (3/15; solid line) and when we might expect to see its first effects (dotted line). The second set, vertical red lines, marks the Friday the Shelter in Place order was instituted (4/3; solid line) and the date we might expect to see its first effects (dotted line). The third set, vertical pink lines, shows when the Shelter in Place order was lifted (4/30; solid line) and the date we might expect to see its first effects (dotted line).
In addition, to help visualize change in graphs of cumulative data that span large counts, both linear and logarithmic scales are offered. You can read more on interpreting graphs using log scales here.
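Why a log scale helps can be shown with a small Python sketch. It uses hypothetical cumulative totals that grow by a steady 10% per day, which are not GA data. On a linear axis the daily steps keep getting larger. On a log axis every step is the same size, so a constant growth rate plots as a straight line, and a bend in that line signals a change in the growth rate.

```python
import math

# Hypothetical cumulative totals growing at a steady 10% per day (not GA data).
cumulative = [1000 * 1.1 ** day for day in range(8)]

# Day-over-day change on a linear axis: the steps keep getting bigger.
linear_steps = [b - a for a, b in zip(cumulative, cumulative[1:])]

# The same changes on a log10 axis: every step is log10(1.1), about 0.0414,
# so a constant growth rate plots as a straight line.
log_steps = [math.log10(b) - math.log10(a) for a, b in zip(cumulative, cumulative[1:])]
```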
Where point data is presented, a LOESS regression with 95% confidence intervals is shown to help the viewer interpret overall trends in the data. This is preferred over a line graph connecting all points, which tends to over-emphasize reporting outliers.
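For readers curious about what a LOESS curve is doing, here is a minimal Python sketch of the fitting step only. It is not the R code behind these graphs. At each day it fits a small weighted polynomial to the nearest fraction (`span`) of the points, using tricube weights so nearby days count most. The values `span=0.75` and `degree=2` are R’s `loess()` defaults, and it is an assumption that the graphs use them. The 95% confidence band is omitted. Because R’s `loess()` interpolates its fit by default, results may differ slightly from R.

```python
import numpy as np


def loess(x, y, span=0.75, degree=2):
    """Local polynomial (LOESS) fit evaluated at each x; no confidence band.

    Assumes x values are distinct (e.g. one point per day).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    # Number of nearest neighbours in each local fit (at least degree + 2,
    # since the farthest neighbour gets zero weight).
    k = min(n, max(int(np.ceil(span * n)), degree + 2))
    fitted = np.empty(n)
    for i, x0 in enumerate(x):
        dist = np.abs(x - x0)
        nearest = np.argsort(dist)[:k]
        h = dist[nearest].max()  # neighbourhood radius
        # Tricube weights: 1 at x0, falling to 0 at the edge of the neighbourhood.
        w = (1 - (dist[nearest] / h) ** 3) ** 3
        # Weighted least squares on a polynomial centred at x0.
        X = np.vander(x[nearest] - x0, degree + 1, increasing=True)
        sw = np.sqrt(w)
        beta, *_ = np.linalg.lstsq(X * sw[:, None], y[nearest] * sw, rcond=None)
        fitted[i] = beta[0]  # intercept = fitted value at x0
    return fitted
```

A quick sanity check: on noise-free quadratic data, a degree-2 LOESS reproduces the data exactly. On noisy daily counts, it yields a smooth trend that isn’t yanked around by single-day reporting spikes.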
Cumulative Confirmed Cases
Cumulative ICU Use
Count Level Tracking
Z Score Fluctuations
Because percentage growth becomes misleading over time, I’ve added a rolling 30-day (roughly 4-week) Z-score visualization for each measure to help put the magnitude of daily variation in numbers into perspective.
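The reason percentage growth misleads can be shown with a short Python sketch using a hypothetical, perfectly steady outbreak of exactly 100 new cases every day (not GA data). Daily growth in the cumulative total still looks like it is collapsing, because each day’s fixed 100 cases are divided by an ever-larger total.

```python
from itertools import accumulate

# Hypothetical: exactly 100 new cases every day for 30 days (not GA data).
daily_new = [100] * 30
cumulative = list(accumulate(daily_new))  # 100, 200, 300, ...

# Percentage growth of the cumulative total from one day to the next.
pct_growth = [100 * (b - a) / a for a, b in zip(cumulative, cumulative[1:])]

print(round(pct_growth[0], 1), round(pct_growth[-1], 1))  # 100.0 3.4
```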
For those who don’t spend a lot of time in the world of statistics, a Z score is a measure that describes the relationship of an observation (in this case, a particular day’s number) to the average across the entire group. It is calculated by taking the difference between the observation and the mean, and dividing by the standard deviation.
Z = (Observed Score - Mean) / Standard Deviation
For example, if the mean score for a group is 50 and the standard deviation is 10, then a score of 60 would have a Z score of (60 - 50) / 10 = 1, and a score of 20 would have a Z score of (20 - 50) / 10 = -3.
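The formula translates directly into code. This Python sketch reproduces the two worked examples above.

```python
def z_score(observed, mean, sd):
    """Z = (observed - mean) / standard deviation."""
    return (observed - mean) / sd


# The two examples from the text: mean 50, standard deviation 10.
print(z_score(60, 50, 10))  # 1.0
print(z_score(20, 50, 10))  # -3.0
```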
This can be useful in identifying patterns in data reporting, and it helps put daily fluctuations in perspective. Because each day is compared only against recent data rather than a growing cumulative total, the Z score doesn’t fall victim to the diminishing-returns effect that shrinks percentage growth over time. These visualizations are limited to the data from the last 30 days, which further helps illustrate trends and fluctuations.
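Here is a minimal Python sketch of a rolling Z score (the original analysis was done in R). It rests on two assumptions that the report doesn’t spell out. First, each day’s 30-day window includes that day. Second, it uses the sample standard deviation (n - 1 denominator), which is what R’s `sd()` computes.

```python
import statistics


def rolling_z(values, window=30):
    """Z score of each day against the trailing `window` days.

    Assumptions: the window includes the day being scored, and the sample
    standard deviation (n - 1 denominator, like R's sd()) is used.
    Days before the first full window are skipped.
    """
    scores = []
    for i in range(window - 1, len(values)):
        chunk = values[i - window + 1 : i + 1]
        mean = statistics.mean(chunk)
        sd = statistics.stdev(chunk)
        scores.append((values[i] - mean) / sd)
    return scores


print(rolling_z([1, 2, 3, 4], window=3))  # [1.0, 1.0]
```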
For today’s cases, the 30-day mean is 3261.4 and the standard deviation is 802.04.
For today’s hospitalizations, the 30-day mean is 227.77 and the standard deviation is 133.87.
For today’s deaths, the 30-day mean is 26.17 and the standard deviation is 23.08.
For today’s ICU Admissions, the 30-day mean is 32.27 and the standard deviation is 21.97.
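Plugging today’s counts from the summary above into the Z formula, with the 30-day means and standard deviations just listed, shows how unusual today is for each measure. This short Python sketch only re-uses the figures reported in this post.

```python
# (today's count, 30-day mean, 30-day standard deviation), as reported above.
reported = {
    "cases": (4209, 3261.4, 802.04),
    "hospitalizations": (406, 227.77, 133.87),
    "deaths": (54, 26.17, 23.08),
    "icu_admissions": (64, 32.27, 21.97),
}

todays_z = {name: round((obs - mean) / sd, 2) for name, (obs, mean, sd) in reported.items()}
print(todays_z)
# {'cases': 1.18, 'hospitalizations': 1.33, 'deaths': 1.21, 'icu_admissions': 1.44}
```

All four measures sit roughly 1.2 to 1.4 standard deviations above their 30-day averages.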
These graphs contain several markers that reflect the changing nature of the testing data that has been provided over time.
As of 4/28, specific counts of the number of tests administered by government and commercial providers stopped being reported. Additionally, on this date we began tracking the number of positive tests conducted by the CDC.
On 5/27, specific counts of serology tests (antibody tests), which had previously been aggregated into the total test count, became available. This date has been marked with a vertical gold line on the graphs. This distinction is important: positive antibody tests do not result in new cases in the overall count, so including them both suppresses the positive test rate and artificially inflates estimates of how much testing is being done. Daily counts of COVID19 molecular tests and serology tests are tracked starting on this date.
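To see how lumping antibody tests into the denominator suppresses the positive rate, here is a small Python sketch with hypothetical round numbers (not GA data). Suppose there are 1,000 molecular tests and 100 new cases, plus 500 antibody tests whose positives do not add any new cases.

```python
def positivity(positive_cases, tests):
    """Percent of tests that produced a new case."""
    return 100 * positive_cases / tests


# Hypothetical day (not GA data).
molecular_tests = 1000
antibody_tests = 500
new_cases = 100  # antibody positives do not add to new cases

molecular_only = positivity(new_cases, molecular_tests)               # 10.0
pooled = positivity(new_cases, molecular_tests + antibody_tests)      # about 6.67
```

The outbreak is the same in both lines. Only the denominator changed, yet the pooled rate looks about a third lower.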
Positive Tests by Source
Total Testing Trends
For today’s new tests, the 30-day mean is 24766.63 and the standard deviation is 7239.92.
COVID19 Molecular Testing Trends
For today’s new tests, the 30-day mean is 22679.57 and the standard deviation is 6785.7.
COVID19 Antibody Testing Trends
For today’s new tests, the 30-day mean is 2087.03 and the standard deviation is 1318.55.
Is Increased Testing Causing Increased Cases?
A popular talking point recently is that the increase in cases that are being detected is not reflective of increased spread, but rather a result of increased testing. There is a certain logic to this: the more tests that are run, the more cases we can potentially identify. However, this reasoning can lead us into significant logical errors, which in turn can lead to dangerous behaviors. While our data does not allow a perfect causal analysis, we can examine what associations between testing and cases exist in our data.
If we run a simple correlation between total number of tests and total number of cases, we get an initially persuasive graph. Note that this graph includes both antibody and molecular tests.
This gives a correlation of 0.98! This is inviting, but it mostly just shows that both of these numbers are increasing. This is potentially misleading because it looks at cumulative data. In fact, if we run a correlation between the total number of tests administered and a simple series of ascending numbers (1, 2, 3, etc.) we get a correlation of 0.97. Because our hypothesis (increased testing causes increases in reported cases) is more about fluctuations in these two variables than cumulative growth, we need a different analysis.
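This is easy to reproduce with made-up data. The Python sketch below generates two unrelated, positive daily series from a seeded random number generator (hypothetical, not GA data). Their cumulative totals correlate almost perfectly with each other and with a plain day counter, while the daily values themselves show essentially no relationship.

```python
import numpy as np

rng = np.random.default_rng(seed=1)
days = 120

# Two unrelated, positive daily series (hypothetical, not GA data).
daily_a = rng.uniform(50, 150, size=days)
daily_b = rng.uniform(50, 150, size=days)

cum_a, cum_b = np.cumsum(daily_a), np.cumsum(daily_b)
day_index = np.arange(1, days + 1)  # the "1, 2, 3, ..." series

r_cumulative = np.corrcoef(cum_a, cum_b)[0, 1]   # close to 1
r_vs_index = np.corrcoef(cum_a, day_index)[0, 1]  # also close to 1
r_daily = np.corrcoef(daily_a, daily_b)[0, 1]     # near 0
```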
If we look at the daily increase in cases against the daily increase in tests, we get a different picture:
This gives us a correlation of 0.76. But this number is also misleading, because there are significant time lags in the reporting of tests and new cases within the data.
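One way to probe for reporting lags, which is not the approach used in this report, is to turn the cumulative totals into daily increases and then correlate new cases on day t with new tests on day t - lag. Here is a Python sketch. The function names are hypothetical helpers.

```python
import numpy as np


def daily_increase(cumulative):
    """Daily new counts from a cumulative series (first day is dropped)."""
    return np.diff(np.asarray(cumulative, dtype=float))


def lagged_corr(tests, cases, lag):
    """Correlate cases on day t with tests on day t - lag (lag >= 0)."""
    tests = np.asarray(tests, dtype=float)
    cases = np.asarray(cases, dtype=float)
    if lag > 0:
        tests, cases = tests[:-lag], cases[lag:]
    return np.corrcoef(tests, cases)[0, 1]
```

If cases simply echoed tests two days later, the lag-2 correlation would be 1 even though the same-day correlation looks weak. This is why same-day comparisons can understate or distort the relationship.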
To better assess the relationship, let’s look at 10-day moving averages for both new tests and new cases, and see what correlation exists between them. This will help balance out the issues of delayed results.
This gives us a correlation of 0.82. Because our data is observational, we can’t infer causation, and we can’t eliminate extraneous factors.
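Here is a Python sketch of the 10-day moving-average comparison (the original analysis was done in R). It assumes trailing averages with incomplete windows dropped. Because both series are smoothed with the same window and aligned the same way, using a centered average instead would yield the same pairs of values and therefore the same correlation.

```python
import numpy as np


def moving_average(values, window=10):
    """Trailing moving average; the first window - 1 days are dropped."""
    values = np.asarray(values, dtype=float)
    return np.convolve(values, np.ones(window) / window, mode="valid")


def smoothed_corr(new_tests, new_cases, window=10):
    """Pearson correlation between the moving averages of two daily series."""
    return np.corrcoef(moving_average(new_tests, window),
                       moving_average(new_cases, window))[0, 1]
```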
As I’ve watched this plot evolve over the past few weeks, I think it’s starting to become clear that we have two different distributions happening. The first is the relatively flat group of points you see across the bottom, which has characterized most of our COVID19 response. During this time, we had relatively stable case numbers regardless of whether we saw heavy testing days or light testing days. In this group, it’s pretty clear that testing and the number of cases detected aren’t strongly related. However, we also see a second group of points, which veers upwards rather abruptly at around 12500 tests per day. These data points are more recent. If our data were limited to these points, you could make a case that there’s a strong association between increased testing and increased case identification. As it is, our correlation estimate ends up sitting somewhere between the two lines.
Having watched the data evolve, my hypothesis is that the increase in testing is a response to the increase in cases. Of particular relevance to this hypothesis is the gap between our two “groups” above the 15000 tests per day level. I think if testing were the driver of new case identification, then we’d see greater variation at the higher testing levels, rather than two distinct, low-variation forks. When we consider the hypothesis that increased case reports spurred an aggressive increase in testing, the pattern here makes more sense: when the number of cases detected at ~12500 tests per day began increasing, testing itself was escalated to try to keep up. This hypothesis would also account for the differing curves between new tests per day and new cases per day.
Comorbidity (Written 7/15/2020)
I think today is a good time to remind people about comorbidity risks. I often see people insist that they have no risk because “only people with pre-existing conditions get COVID”. While pre-existing conditions are associated with increased risk, this view misses two things: healthy people with no prior conditions get COVID, and what’s counted as a pre-existing condition is pretty broad. The GA DPH website indicates that the following are considered comorbid conditions in COVID19 data reporting: Chronic Lung Disease, Diabetes Mellitus, Cardiovascular Disease, Chronic Renal Disease, Chronic Liver Disease, Immunocompromised Condition, Neurologic/Neurodevelopmental Condition, and Pregnancy. These are very prevalent conditions here in Georgia: over 6.9% of adults have COPD or other lung disease, more than 1 in 10 Georgians have diabetes, and more than 1 in 3 Georgians have some sort of cardiovascular disease. I could pull stats for the other conditions listed, but the implication is clear: a large proportion of our citizens are at elevated risk. Most people likely either have one of these comorbidities, or are close to someone who does, and don’t recognize the risk.
As always, I am not trained in epidemiology, and defer to recognized experts in the field on all issues. These analyses and commentary are solely designed to help lay persons approach the publicly available data and larger public health conversations.
Wash Your Hands.
Wear a Mask.
Code and data available here. Analysis conducted using R.