About Me
I'm a data scientist, astronomer, and chronic overthinker who's spent the last decade trying to figure...
This was something I put together for the interest of colleagues at work, and I got some good feedback from it so I thought I'd share it publicly. I want to preface this with the fact that I'm neither a physician, an epidemiologist, a virologist, nor any other kind of medical expert (I don't even play one on TV!), and I don't work even tangentially in healthcare. I'm just a numbers monkey who reads the news.
I've seen a lot of people on every possible social network posting graphs of the local COVID-19 infections. Sometimes they're line graphs, sometimes they're bar charts, sometimes they're even line graphs and bar charts, but I haven't seen much in the way of model fitting. So, I thought I'd throw a couple of basic models at the data and see if they met expectations.
Given an unlimited number of free hosts, a disease's infection count would grow exponentially, with the number of new cases tomorrow being a multiple of the total number of infections today. In the real world, however, a disease's resources are limited: Populations are limited, and people who are already infected (or who have recovered) cannot be infected again. As time progresses, it gets more and more difficult for the disease to find (if you will) new hosts. This results in the number of cumulative cases breaking from an exponential growth curve into something that slows and eventually stalls: An S-shaped logistic curve.
An epidemic will approximate a logistic curve when starved for whatever reason: Either it burns through the community and runs out of fuel of its own accord, or we starve it by limiting our own exposure to contagious people.
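To make the difference between the two growth patterns concrete, here is a minimal sketch that prints an exponential curve next to a logistic one. The capacity, growth rate, and midpoint are made-up illustrative numbers, not values fitted to the provincial data, and the function names are my own.

```python
import math

def exponential(t, n0, rate):
    """Cumulative cases with unlimited hosts: growth never slows."""
    return n0 * math.exp(rate * t)

def logistic(t, capacity, rate, t_mid):
    """Cumulative cases with limited hosts: an S-curve that levels off at `capacity`."""
    return capacity / (1.0 + math.exp(-rate * (t - t_mid)))

# Illustrative (assumed) parameters, NOT fitted to any real data.
CAPACITY = 1000.0   # total cases the outbreak eventually reaches
RATE = 0.2          # growth rate per day
T_MID = 30.0        # day on which half of CAPACITY has been reached

# Start both curves from the same day-0 case count so they are comparable.
n0 = logistic(0, CAPACITY, RATE, T_MID)

for day in (0, 5, 10, 20, 30, 40, 60):
    exp_cases = exponential(day, n0, RATE)
    log_cases = logistic(day, CAPACITY, RATE, T_MID)
    print(f"day {day:2d}: exponential {exp_cases:12.1f}   logistic {log_cases:7.1f}")
```

Early on the two curves are almost indistinguishable, which is why the first few weeks of an outbreak look exponential. They only part ways as the logistic curve approaches its midpoint and starts to run out of new hosts.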
We see exactly this S-shaped curve in the provincial data:
It's interesting to note that the shape of the data points changes slightly after April's weather started to warm up and days got sunnier. With more people outside, the infection rate appeared to drop, though this could also have been a coincidence caused by the infections in long-term care homes (LTCHs) peaking around 20 April.
If we look at the number of new cases each day, what we see fits reasonably well to a normal distribution, or the so-called bell curve:
The curve over-estimates the number of cases on the back end, though, and under-estimates the height of the peak. This is curious, as the expected deviation from a normal curve is for the infection rate to remain higher than this curve would predict, maintaining a longer right tail, not a shorter one.
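For anyone who wants to try this kind of fit, here is a minimal sketch of estimating a bell curve from a list of daily new-case counts. It uses simple moment matching (the case-weighted mean and standard deviation of the day number), which is an assumption on my part for illustration and not necessarily how the curves in this post were fitted; a least-squares fit would be the more usual choice. The input is synthetic, generated from a known curve, so that the recovered parameters can be checked.

```python
import math

def normal_curve(t, total, mu, sigma):
    """Expected new cases on day t for a bell curve containing `total` cases."""
    z = (t - mu) / sigma
    return total * math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def fit_normal_by_moments(daily_counts):
    """Estimate (total, mu, sigma) from daily counts, where index = day number."""
    total = sum(daily_counts)
    mu = sum(t * c for t, c in enumerate(daily_counts)) / total
    variance = sum(c * (t - mu) ** 2 for t, c in enumerate(daily_counts)) / total
    return total, mu, math.sqrt(variance)

# Synthetic daily counts from a known curve (made-up parameters, not real data).
synthetic = [normal_curve(t, total=900, mu=25, sigma=6) for t in range(60)]

total, mu, sigma = fit_normal_by_moments(synthetic)
print(f"total cases ~ {total:.1f}, peak day ~ {mu:.2f}, spread ~ {sigma:.2f} days")

# Residuals (data minus curve) show where a fit over- or under-estimates.
residuals = [c - normal_curve(t, total, mu, sigma) for t, c in enumerate(synthetic)]
print(f"largest residual: {max(abs(r) for r in residuals):.4f}")
```

With real counts, positive residuals near the peak and negative ones in the tail would correspond to the under- and over-estimates described above.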
We may be able to explain this by thinking of the COVID-19 pandemic as being two mostly parallel events: A community pandemic, and an institutional pandemic. In Nova Scotia, the institutions that dominate the institutional spread of the disease are LTCHs.
Looking at the number of new cases in the community each day, we see a much better fit to a normal curve, especially in the back end. This really points to the physical distancing measures that have been put in place in the province working, with the community infections beginning to drop off after April 8th.
New LTCH cases, on the other hand, fit best to a log-normal curve, which has a steeper incline than a normal curve at the start, but a longer tail at the end:
The steeper incline and longer tail are subtle in the LTCH data, but the fact that this is a better fit than a standard normal curve points to the difficulty in isolating healthy residents from contagion in a mostly insulated environment. It's interesting to note that the LTCH infections both start and peak much later than the community infections, and also that the infection curve is much narrower, with a more well-defined peak. This is the difference between a flattened curve (community) and a less well controlled outbreak (LTCH).
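The "steeper incline, longer tail" shape of a log-normal curve can be checked numerically. The sketch below measures how long the curve takes to climb from half its peak height up to the peak, and how long it takes to fall back down to half height. The parameters are illustrative assumptions, not the values fitted to the LTCH data, and `t` here is assumed to mean days since the start of the outbreak.

```python
import math

def lognormal_curve(t, total, m, s):
    """Expected new cases t days after the outbreak starts (zero for t <= 0)."""
    if t <= 0:
        return 0.0
    z = (math.log(t) - m) / s
    return total * math.exp(-0.5 * z * z) / (t * s * math.sqrt(2 * math.pi))

# Illustrative (assumed) parameters, NOT fitted to any real data.
TOTAL = 300.0        # total cases under the curve
M = math.log(20.0)   # log of the median day
S = 0.4              # spread in log-days

# Evaluate on a fine grid: 0.1, 0.2, ..., 100.0 days.
days = [d / 10 for d in range(1, 1001)]
values = [lognormal_curve(d, TOTAL, M, S) for d in days]

peak_value = max(values)
peak_day = days[values.index(peak_value)]

# Days on which the curve sits at or above half of its peak height.
above_half = [d for d, v in zip(days, values) if v >= peak_value / 2]
rise = peak_day - above_half[0]    # half height -> peak
fall = above_half[-1] - peak_day   # peak -> back to half height

print(f"peak on day {peak_day:.1f}")
print(f"rise takes {rise:.1f} days, fall takes {fall:.1f} days")
```

A normal curve would give equal rise and fall times; for a log-normal curve the fall always takes longer than the rise, and the gap widens as `S` grows.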
If we assume that these two populations (community and LTCH) are independent from each other (and if care home staff are being careful about isolating themselves, this is a reasonable assumption, if not exactly true), we can just add these two infection curves together to get a new estimate for the overall daily infection rate in the province:
As we can see, this new estimate holds much closer to the centre of the data points on the back-end than the wholesale fitting of a normal curve to the total counts did, and it also chases the very high counts seen in mid-April.
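Here is a minimal sketch of that addition step: a normal curve for the community, a log-normal curve for the LTCHs that starts later, and their day-by-day sum. All of the parameters below are made-up placeholders chosen only to give a similar overall picture; they are not the fitted values behind the figures in this post.

```python
import math

def normal_curve(t, total, mu, sigma):
    """Bell curve of daily new cases containing `total` cases."""
    z = (t - mu) / sigma
    return total * math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def lognormal_curve(t, total, m, s, start):
    """Log-normal curve of daily new cases that begins on day `start`."""
    elapsed = t - start
    if elapsed <= 0:
        return 0.0
    z = (math.log(elapsed) - m) / s
    return total * math.exp(-0.5 * z * z) / (elapsed * s * math.sqrt(2 * math.pi))

# Illustrative (assumed) parameters, NOT the fitted values from the post.
COMMUNITY = dict(total=600.0, mu=30.0, sigma=7.0)
LTCH = dict(total=300.0, m=math.log(15.0), s=0.35, start=25.0)

days = range(150)
community = [normal_curve(t, **COMMUNITY) for t in days]
ltch = [lognormal_curve(t, **LTCH) for t in days]

# If the two populations are independent, expected daily counts simply add.
combined = [c + l for c, l in zip(community, ltch)]

peak_day = combined.index(max(combined))
print(f"community peak: {max(community):.1f} cases/day")
print(f"combined peak:  {max(combined):.1f} cases/day on day {peak_day}")
print(f"total cases under combined curve: {sum(combined):.0f}")
```

Because the LTCH curve starts and peaks later than the community one, the combined curve ends up with a higher, later peak than a single bell curve fitted to everything at once.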
Conveniently for us here in the province, both LTCH and community spread appear to be petering out, with average infection rates dropping to near zero by mid-May.
So, why did the LTCH cases start so much later than the community cases? The LTCHs were insulated from the community. Ideally, that would have meant that no cases would have occurred in LTCHs, but that insulation, obviously, wasn't perfect. Staff members likely got sick due to community spread, and then introduced it into the care homes. Once there, it spread very quickly.
Can we identify how, exactly, this occurred? No, though we can look at when the infections are first detected and compare that to reported exposure events:
We can see from Figure 6 that the infections started in the LTCHs, and Northwood care homes in particular, after community exposures occurred on several Halifax Transit bus routes (light grey) and at the Dartmouth Superstore (light blue-green). This isn't to say that LTCH workers were exposed to and infected by the SARS-CoV-2 virus on those bus routes, or at that grocery store, but if they were, that would fit reasonably well with the timing of the LTCH outbreak.
Does anyone have any thoughts or feedback? I don't have regional LTCH data, so I couldn't do a community/LTCH split for regions (at least for regions outside of Central, since I can just assume that Northwood dominates LTCH numbers there), but I could fit some models to the regional data if anyone was interested.