The Tracking the Spread dashboard is a joint project of Spotlight PA and The Philadelphia Inquirer to monitor and visualize the outbreak of COVID-19 in Pennsylvania and the surrounding region. If you’re a researcher, journalist, or just a curious reader, you might have questions about the data we’re using or how we’re presenting it. This article is for you.
Below, you’ll find information about our sourcing and methodology. If you have a question that you don’t feel has been answered or you have other feedback on how we could improve the dashboard, don’t hesitate to email either Dan Simmons-Ritchie at [email protected] or Garland Potts at [email protected].
Why provide a dashboard?
Spotlight PA and The Inquirer are committed to providing Pennsylvania readers the most accurate and up-to-date information on the COVID-19 outbreak. We believe our dashboard provides a simple interface for tracking the spread in Pennsylvania and surrounding states.
The state Department of Health offers its own interactive dashboard on its website that displays Pennsylvania data. However, the department’s dashboard remains cumbersome and difficult to use, particularly on mobile devices. It doesn’t provide data on states outside Pennsylvania and doesn’t prioritize certain data points that we believe are most useful for readers. So we set out to make the spread of COVID-19 easier to understand.
While a number of national media outlets, including the New York Times, provide impressive visualizations of the spread in Pennsylvania and nationally, we believe our dashboard better prioritizes the data most relevant to Pennsylvanians and the decisions they face on a day-to-day basis. Our dashboard also includes certain data points that are not available on these other websites. In addition, because we primarily compile Pennsylvania data from the state Department of Health ourselves, our data is generally updated sooner than that of other media outlets.
How do you compile your Pennsylvania data?
The dashboard provides a way to view both Pennsylvania-specific COVID-19 data and data for neighboring states.
For Pennsylvania, Spotlight PA and The Inquirer compile data each day from the state Department of Health’s interactive COVID-19 dashboard. This collection involves a combination of automated and manual processes. The data we collect from the department is not always perfect: The department does not always update its data at regular intervals and, on some occasions, it will provide inaccurate data that it later corrects.
Sometimes the data on the department’s dashboard doesn’t match the data it has released in other places, like in its press releases or on other pages of its website. We do our best to check the department’s data before publishing it to the dashboard. On some occasions, we will retroactively correct or fill in missing data based on archived information from the Department of Health website.
For Pennsylvania hospitalization numbers, the dashboard relies on data compiled by the COVID Tracking Project. The COVID Tracking Project is a volunteer organization launched by The Atlantic that collects COVID-19 data nationally. While we could compile this data ourselves, we decided it was easier to rely on the COVID Tracking Project because it has been collecting this data since April.
Where do you get your Philadelphia data from?
The Inquirer and Spotlight PA have slightly different versions of the same core dashboard. Readers who view The Inquirer version of the dashboard can switch to Philadelphia-specific data that is unavailable on Spotlight’s version.
The data for Philadelphia is compiled in a similar way to the Pennsylvania data, through a combination of manual and automated collection each day. Similar to the state Department of Health data, Philadelphia’s data is not always perfect. The City of Philadelphia, for instance, does not update its data on weekends.
Where do you get data for the other states from?
For the other states and regions selectable on our dashboard, we rely on data compiled by the New York Times and the COVID Tracking Project. They are among the few organizations compiling state-by-state data on COVID-19 in the U.S. Like the data that Spotlight PA and The Inquirer are compiling for Pennsylvania, this data is not always perfect. Both organizations occasionally have to make retroactive corrections or adjustments to their data to ensure its accuracy and integrity.
Why don’t you just use federal data?
As of July 17, 2020, the federal government has not provided reliable or accurate daily data on COVID-19 cases and deaths in the U.S. Since the beginning of the outbreak in the U.S., news outlets and public health researchers have largely filled the gap: recording and compiling data themselves on a state-by-state basis. As described above, even data provided by state and county officials, as in Pennsylvania, is not always perfect.
Some websites are reporting different numbers than you are. Why is that?
Because there is no central repository for national COVID-19 data for the U.S., there may be differences in the numbers presented on our dashboard compared to data and visualizations produced by other state or national news websites.
There are a number of reasons why this might occur. This may be because different outlets are collecting their data from different sources or on different schedules. It may be because of errors that public health officials have made in the data that were later corrected but were nevertheless recorded. It may be because certain outlets are focusing on slightly different metrics, for instance, in what is considered a “positive” coronavirus case.
And, as described above for Pennsylvania, there are occasions where the numbers in the Department of Health’s own interactive dashboard may not be the same as those reported in the department’s press materials or on other pages of its website.
We do our best to ensure we are providing readers with the best and most accurate COVID-19 data available.
What do you consider a COVID-19 ‘case’?
For all states in the dashboard, “cases” represents the sum of lab-confirmed positive COVID-19 cases and “probable” COVID-19 cases. A probable case is one in which health officials or medical professionals have determined that a person likely had COVID-19 despite not having a positive laboratory test result. This determination is based on criteria defined by the Centers for Disease Control and Prevention. For instance, a person may be considered a “probable” case if they had recent exposure to someone with the coronavirus, like a family member, and then developed COVID-19 symptoms shortly afterward. As of July 1 in Pennsylvania, the number of probable cases was much lower than the number of confirmed cases. At that time, the state had about 85,000 confirmed cases and about 2,500 probable cases.
Are positive “antibody” test results included in your case tally?
There are two dominant types of coronavirus tests: polymerase chain reaction (PCR) tests and serological tests. PCR tests are designed to detect whether a person is currently infected with the coronavirus. Serological tests, sometimes called “antibody” tests, are intended to detect whether a person was previously infected with the coronavirus and has since developed antibodies.
For this reason, results from PCR tests are typically used to determine the current number of people in a community who are infected with COVID-19. In Pennsylvania, as of July 1, the Department of Health said that its tally of 85,000 people who have tested positive for the coronavirus is based solely on PCR tests. The number does not include serological tests.
Among its “probable” cases, the department says it does count some people who have had positive serological results. But these are people, the department says, who also either have COVID-19 symptoms or have recently been exposed to someone infected with COVID-19. For that reason, the department considers these people “probable” COVID-19 cases. As of July 1, 633 of the department’s roughly 2,500 probable cases fell into this category.
How reliable are the official tallies of COVID-19 cases?
Spotlight PA and The Inquirer are gathering the best publicly available data that exists on COVID-19. At the same time, it’s well understood by public health researchers that the number of positive cases being reported by state, city, and county health departments is likely to be a significant undercount of the total number of COVID-19 infections.
There are a number of reasons for this undercount. COVID-19 is unusual in both its level of contagiousness and the fact that many people who are infected will not show symptoms. This means that the virus is able to spread rapidly through a community without detection. People may not seek a coronavirus test because they don’t know they are infected. In other cases, a person may be infected and have symptoms but choose not to get tested or, due to a lack of testing availability in their area, may not be able to get tested.
More generally, as the state Department of Health notes on its own website, there are a number of factors that can affect the number of cases reported each day. Beyond the prevalence of the virus itself, the department notes those factors include: “testing patterns (who gets tested and why), testing availability, lab analysis backlogs, lab reporting delays, new labs joining our electronic laboratory reporting system, mass screenings, etc.”
Why do you use “7-day” moving averages in some of your charts?
As described above, there are a number of factors that can influence how many new cases are reported each day by health officials. If you look closely at the charts for new daily cases and deaths, you may notice a particular pattern: Numbers tend to be higher during the middle of the week and lower during the weekends. Statistical analysts call this “seasonality.” It describes data that conforms to a pattern over regular intervals.
The seasonality in COVID-19 data is widely attributed to irregular reporting: Officials often don’t report data over the weekend and then catch up during the working week. To provide a clearer picture of the overall trend, we overlay some charts with a line representing the 7-day moving average of the data. A number of other news outlets, including the New York Times, have taken a similar approach to visualizing the data.
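To illustrate how a moving average smooths out that weekly pattern, here is a minimal Python sketch. It assumes a trailing window (each day averaged with the six days before it); the dashboard’s exact windowing is not described here, and the daily counts below are hypothetical.

```python
def moving_average_7day(daily_counts):
    """Return trailing 7-day averages; the first six days lack a full window."""
    averages = []
    for i in range(6, len(daily_counts)):
        window = daily_counts[i - 6 : i + 1]  # this day plus the six before it
        averages.append(sum(window) / 7)
    return averages


# Hypothetical new daily cases for two weeks, with low weekend reporting.
daily_new_cases = [130, 140, 145, 150, 135, 60, 40,
                   150, 160, 165, 170, 155, 70, 45]
smoothed = moving_average_7day(daily_new_cases)
print([round(value, 1) for value in smoothed])
```

In this made-up series, the weekend dips disappear and the smoothed values rise steadily from about 114 to about 131.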
How do you determine the “14-day trend” for counties?
For each state/region, the dashboard provides a table of county-level data including a column labelled “14-day trend.” This provides a description of the trend of new daily cases over the past 14 days as either “rising,” “falling,” or “unclear.”
To make this assessment, Spotlight PA and The Inquirer first calculate the 7-day moving average of new cases for each county for each day over the past 14 days. As described above, using 7-day moving averages is typically viewed as a more reliable way of understanding the trend given the seasonality in the data.
We then analyze this data using a statistical model called “linear regression.” Although the term may sound intimidating, the concept is relatively easy to understand when visualized: Imagine plotting a series of points on a chart with an X and a Y axis and then drawing the straight line that best “fits” all those points. That line is called a “linear regression line.” We calculate a regression line for each county based on its 7-day moving averages of new daily cases over the past 14 days.
We then convert the slope of those regression lines into a special number for each county that, in essence, represents the average percentage change in new daily cases for that county, per day, over the past 14 days.
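The slope-to-percentage step can be sketched as follows. This is an illustration under stated assumptions, not the dashboard’s actual code: it fits an ordinary least-squares line to 14 days of 7-day averages and then, as a hypothetical conversion, divides the slope by the average level of those 14 values. The function names and county figures are invented for this example.

```python
def least_squares_slope(values):
    """Slope of the ordinary least-squares line through (day index, value) points."""
    n = len(values)
    mean_x = (n - 1) / 2  # mean of the day indexes 0, 1, ..., n - 1
    mean_y = sum(values) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    var = sum((x - mean_x) ** 2 for x in range(n))
    return cov / var


def percent_change_per_day(moving_averages):
    """HYPOTHETICAL conversion: the regression slope expressed as a
    percentage of the series' average level over the same period."""
    mean_level = sum(moving_averages) / len(moving_averages)
    if mean_level == 0:
        return 0.0  # no cases at all: treat as no change
    return least_squares_slope(moving_averages) / mean_level * 100


# Hypothetical county: the 7-day average climbs by 3 cases per day from 100.
county_averages = [100 + 3 * day for day in range(14)]
print(round(percent_change_per_day(county_averages), 2))  # prints 2.51
```

Dividing by the period’s average level, rather than by the first day’s value, keeps a single unusual day from dominating the percentage; other normalizations are possible.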
Here, we made editorial judgments about what constitutes a “rising” or “falling” trend. If a county’s average percentage change is greater than 2.5% per day, the dashboard evaluates that county as having a “rising” trend. If the county’s percentage change is lower than -2.5%, the dashboard evaluates that county as having a “falling” trend. We chose these thresholds so that our evaluations of “rising” and “falling” would be more likely to err on the conservative side.
We evaluate counties with an average percentage change between -2.5% and 2.5% as having an “unclear” trend.
Separately, our model will evaluate a county’s trend as “unclear” if there appears to be no meaningful trend. In statistics, this determination is typically made by interpreting a number calculated from the regression analysis called a “P value.” If a county’s regression line has a P value above 0.05, the dashboard labels its trend “unclear,” following the common statistical practice of treating such a result as showing no statistically significant trend.
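The decision rules above can be written as a small function. This sketch takes a county’s average daily percentage change and its regression P value as given inputs (the function name and parameter names are hypothetical) and applies the 2.5% and 0.05 thresholds described in the text.

```python
def describe_trend(pct_change, p_value, threshold=2.5, alpha=0.05):
    """Map a county's average daily percentage change and P value to a label."""
    if p_value > alpha:
        return "unclear"  # no statistically significant trend
    if pct_change > threshold:
        return "rising"
    if pct_change < -threshold:
        return "falling"
    return "unclear"  # change too small to call in either direction


print(describe_trend(4.0, 0.01))   # rising
print(describe_trend(-3.1, 0.02))  # falling
print(describe_trend(6.0, 0.20))   # unclear: P value above 0.05
```

A county at exactly 2.5% is labeled “unclear” here, because the text describes “rising” as greater than 2.5%.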
We consulted Krys Johnson, an epidemiologist at Temple University, and launched the “trend description” feature of the dashboard in June.
How reliable is your “14-day trend” analysis?
While the math behind our trend analysis is complicated, it’s worth noting that, in statistical terms, it’s a relatively simplistic model. Linear regression is a common method of analysis used in statistics but, in order to chart the spread of disease, epidemiologists create far more sophisticated models that rely on multiple variables.
When reading the dashboard’s trend descriptions, take particular care in interpreting the results for sparsely populated counties. Many of these counties have relatively few new cases each day. In these situations, the dashboard may readily evaluate a county as having a “rising” or “falling” trend based on small movements in the data.
For these reasons, to understand the overall trend in their county, we urge readers to look closely at the number of new daily cases over a longer period of time and to also consider hospitalizations and other indicators. We also strongly advise readers to follow the advice and guidance of public health officials in their communities. Our trend descriptions are intended as a helpful way to understand the data in your county at a glance, but they are not intended to supersede or replace the judgments of local officials or public health experts.
Why don’t you include other types of data?
New types of COVID-19 data are constantly being published by state officials and public health researchers. Readers sometimes contact us about other types of COVID-19 data they’d like to see on the dashboard. We appreciate and welcome all suggestions from readers. Because of the work involved in adding and maintaining new data sources, however, we think carefully before adding new data to the dashboard. Our preference is to include data that we know comes from a reliable source, will be reliably updated each day, and is provided in a machine-readable format with a structure that is unlikely to change over weeks or months. For these reasons, we may be unable to immediately incorporate new types of data that are made available. However, we are constantly evaluating whether new sources should be included in the dashboard and we continue to appreciate reader suggestions.
CHANGE LOG
Starting July 13, to be as transparent as possible about the changes we make to our data dashboard, we will document all of them here:
July 15, 2020:
In the Testing section for all states, positive test numbers now represent only lab-confirmed positive test results. Prior to this, “positive” test numbers were derived from each state’s total tally of cases, which could include both lab-confirmed positive test results and “probable” cases. This change, coupled with the changes made on July 13, ensures that the dashboard presents the most accurate information on each state’s number of lab-confirmed cases and its percentage of positive tests.
For all states on the dashboard except Pennsylvania, these data changes are retroactive. Because we use data compiled and archived by the COVID Tracking Project, the “positive” tally for each day in the “Total Tests” chart for these states represents only lab-confirmed positive tests.
For Pennsylvania’s testing data, however, these data changes apply only from July 13 onward. That means the “positive” test tallies for each day prior to July 13 in Pennsylvania’s “Total Tests” chart include some probable cases. Please note that, as described in the change log’s July 13 note, “total test” tallies prior to July 13 also include probable cases.
On and after July 13, however, the “positive” test numbers for Pennsylvania represent only lab-confirmed positive results. And, as described in the July 13 change log note, “total tests” tallies for Pennsylvania include only lab-confirmed positive results and negative results, excluding probable cases.
July 13, 2020:
Total test numbers for Pennsylvania are now calculated based on the sum of lab-confirmed positive test results and negative test results.
Prior to this, the number was derived from the sum of “cases” and negative test results. The problem with this approach is that the department’s “cases” tally includes both lab-confirmed positive results and “probable” cases. Although the number of probable cases included in the department’s “cases” tally is relatively small, this change ensures that the dashboard accurately reflects the total number of people tested in Pennsylvania.
These data changes for Pennsylvania are not retroactive. That means the total test tallies for each day prior to July 13 in the “Total Tests” and “Tests per Day” charts include some probable cases. On and after July 13, the total test tallies exclude “probable” cases.
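The difference between the two total-test calculations described in the July 13 entry can be shown with a short sketch. The function names are hypothetical, the figures are invented, and computing the percentage of positive tests as lab-confirmed positives divided by total tests is an assumption made for this example.

```python
def total_tests(confirmed_positive, negative):
    """Method used on and after July 13: lab-confirmed positives plus negatives."""
    return confirmed_positive + negative


def total_tests_before_july_13(cases, negative):
    """Earlier method: the "cases" tally (confirmed plus probable) plus negatives."""
    return cases + negative


def percent_positive(confirmed_positive, negative):
    """Share of tests that came back positive, using only lab-confirmed positives."""
    return confirmed_positive / total_tests(confirmed_positive, negative) * 100


# Hypothetical figures, chosen only to illustrate the arithmetic.
confirmed, probable, negative = 900, 30, 9_100
print(total_tests(confirmed, negative))                            # 10000
print(total_tests_before_july_13(confirmed + probable, negative))  # 10030
print(percent_positive(confirmed, negative))                       # 9.0
```

Under the earlier method, the 30 probable cases would have been counted as tests even though no lab test produced them, slightly inflating the total.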