This was something I put together for the interest of colleagues at work, and I got some good feedback from it so I thought I'd share it publicly. I want to preface this with the fact that I'm neither a physician, an epidemiologist, a virologist, nor any other kind of medical expert (I don't even play one on TV!), and I don't work even tangentially in healthcare. I'm just a numbers monkey who reads the news.

I've seen a lot of people on every possible social network posting graphs of the local COVID-19 infections. Sometimes they're line graphs, sometimes they're bar charts, sometimes they're even line graphs and bar charts, but I haven't seen much in the way of model fitting. So, I thought I'd throw a couple of basic models at the data and see if they met expectations.

Given an unlimited number of free hosts, a disease's infection count would grow exponentially, with the number of new cases tomorrow being a multiple of the total number of infections today. In the real world, however, a disease's resources are limited: Populations are limited, and people who are already infected (or who have recovered) cannot be infected again. As time progresses, it gets more and more difficult for the disease to find (if you will) new hosts. This results in the number of cumulative cases breaking from an exponential growth curve into something that slows and eventually stalls: An S-shaped logistic curve.
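To make the difference concrete, here is a minimal sketch comparing the two growth curves. The parameter values (a final size of 1000 cases, a growth rate of 0.2 per day, a midpoint at day 30) are made up for illustration and are not fitted to the Nova Scotia data; the function names are my own.

```python
import math

def exponential_cases(day, initial_cases, growth_rate):
    """Cumulative cases under unconstrained exponential growth."""
    return initial_cases * math.exp(growth_rate * day)

def logistic_cases(day, final_size, growth_rate, midpoint):
    """Cumulative cases under logistic growth, levelling off at final_size."""
    return final_size / (1.0 + math.exp(-growth_rate * (day - midpoint)))

# Illustrative values only (assumptions, not fitted to any real data).
FINAL_SIZE = 1000.0
GROWTH_RATE = 0.2
MIDPOINT = 30.0

# Start the exponential at the same value as the logistic on day 0,
# so the two curves can be compared directly.
INITIAL = logistic_cases(0, FINAL_SIZE, GROWTH_RATE, MIDPOINT)

for day in (0, 10, 20, 30, 40, 60):
    exp_value = exponential_cases(day, INITIAL, GROWTH_RATE)
    log_value = logistic_cases(day, FINAL_SIZE, GROWTH_RATE, MIDPOINT)
    print(f"day {day:2d}: exponential {exp_value:12.1f}   logistic {log_value:7.1f}")
```

Early on the two columns are nearly identical; after the midpoint the exponential keeps climbing while the logistic flattens out just below the final size.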

An epidemic will approximate a logistic curve when it is starved, for whatever reason: Either it burns through the community and runs out of fuel of its own accord, or we starve it by limiting our own exposure to contagious people.

We see exactly this S-shaped curve in the provincial data:

Figure 1. The cumulative count of COVID-19 cases in Nova Scotia. The blue line shows the best fit of the data to a logistic curve. The pink line shows the best fit to the data in early April, constrained as best as possible to the Chief Medical Officer's prediction that the infections would peak "at the end of April". The bars at the bottom show when we might expect to see the impact of known COVID-19 exposures and government actions in the data itself.

It's interesting to note that the shape of the data changes slightly after the weather started to warm up in April and the days got sunnier. With more people outside, the infection rate appeared to drop, though this could also be a coincidence, with the apparent change caused by infections in long-term care homes (LTCHs) peaking around 20 April.

If we look at the number of new cases each day, what we see fits reasonably well to a normal distribution, or the so-called bell curve:

Figure 2. Total new infections reported each day across Nova Scotia.

The curve overestimates the number of cases on the back end, though, and underestimates the height of the peak. This is curious, as the expected deviation from a normal curve is for the infection rate to remain higher than the curve would predict, giving a longer right tail, not a shorter one.
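For anyone who wants to try this at home, here is a rough sketch of one way to fit a bell curve to daily counts. It uses a simple method-of-moments estimate (total, average day, and spread of the cases), which is an assumption on my part for keeping the example dependency-free; a least-squares fit would give somewhat different numbers. The daily counts below are made up and are not the provincial figures.

```python
import math

def normal_curve(day, total, mean, sd):
    """Expected new cases on a given day: a normal density scaled by the total count."""
    z = (day - mean) / sd
    return total * math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def fit_normal_by_moments(daily_counts):
    """Estimate (total, mean, sd), treating each case as one observation of its day index."""
    total = sum(daily_counts)
    mean = sum(day * n for day, n in enumerate(daily_counts)) / total
    variance = sum(n * (day - mean) ** 2 for day, n in enumerate(daily_counts)) / total
    return total, mean, math.sqrt(variance)

# Made-up daily counts for illustration; these are NOT the Nova Scotia figures.
daily_counts = [1, 2, 4, 7, 11, 15, 18, 19, 18, 15, 11, 7, 4, 2, 1]

total, mean, sd = fit_normal_by_moments(daily_counts)
print(f"total={total}, peak day={mean:.1f}, sd={sd:.2f}")

# Compare observed and fitted values day by day.
for day, observed in enumerate(daily_counts):
    fitted = normal_curve(day, total, mean, sd)
    print(f"day {day:2d}: observed {observed:2d}, fitted {fitted:5.1f}")
```

Where the fitted column sits above the observed one in the tail, or below it at the peak, you are seeing the same kind of misfit described above.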

We may be able to explain this by thinking of the COVID-19 pandemic as being two mostly parallel events: A community pandemic, and an institutional pandemic. In Nova Scotia, institutional spread of the disease is dominated by LTCHs.

Figure 3. New community infections reported daily across Nova Scotia, calculated by taking the total reported cases and subtracting the reported LTCH cases.
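The subtraction described in the Figure 3 caption can be sketched as follows. I'm assuming here that both series start out as cumulative counts covering the same days; the numbers are invented for illustration, and the function names are my own.

```python
def daily_from_cumulative(cumulative):
    """Convert a cumulative series into daily new counts."""
    return [cumulative[0]] + [later - earlier
                              for earlier, later in zip(cumulative, cumulative[1:])]

def community_daily(total_daily, ltch_daily):
    """Community cases = total cases minus LTCH cases, day by day."""
    if len(total_daily) != len(ltch_daily):
        raise ValueError("both series must cover the same days")
    return [total - ltch for total, ltch in zip(total_daily, ltch_daily)]

# Invented cumulative counts for six consecutive days (not real data).
total_cumulative = [3, 8, 15, 27, 40, 49]
ltch_cumulative = [0, 0, 2, 8, 15, 20]

total_daily = daily_from_cumulative(total_cumulative)
ltch_daily = daily_from_cumulative(ltch_cumulative)

print(community_daily(total_daily, ltch_daily))  # [3, 5, 5, 6, 6, 4]
```

One thing to watch for with real data: if LTCH counts are revised after the fact, the subtraction can briefly produce negative community counts, so the two series need to be reconciled first.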

Looking at the number of new cases in the community each day, we see a much better fit to a normal curve, especially on the back end. This suggests that the physical distancing measures put in place in the province are working, with community infections beginning to drop off after 8 April.

New LTCH cases, on the other hand, fit best to a log-normal curve, which has a steeper incline than a normal curve at the start, but a longer tail at the end:

Figure 4. New LTCH infections reported daily across Nova Scotia. On dates where these values were not reported explicitly by the province, data from Northwood's website was used to adjust previously reported values.

The steeper incline and longer tail are subtle in the LTCH data, but the fact that this is a better fit than an ordinary normal curve points to the difficulty in isolating healthy residents from contagion in a mostly insulated environment. It's interesting to note that the LTCH infections both start and peak much later than the community infections, and also that the infection curve is much narrower, with a more well-defined peak. This is the difference between a flattened curve (community) and a less well-controlled outbreak (LTCH).
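To show what "steeper incline, longer tail" means in practice, here is a small sketch of the log-normal density. The parameters (a median of 10 days, a shape of 0.5) are arbitrary illustrative choices, not values fitted to the LTCH data.

```python
import math

def lognormal_pdf(x, mu, sigma):
    """Log-normal density; zero for x <= 0."""
    if x <= 0:
        return 0.0
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2.0 * math.pi))

# Arbitrary illustrative parameters (assumptions, not fitted values).
MU = math.log(10.0)   # the median of the distribution is exp(MU) = 10
SIGMA = 0.5

# The peak (mode) of a log-normal sits at exp(mu - sigma^2), left of the median.
mode = math.exp(MU - SIGMA ** 2)

# Compare the curve at equal distances on either side of the peak.
for offset in (2.0, 4.0, 6.0):
    before = lognormal_pdf(mode - offset, MU, SIGMA)
    after = lognormal_pdf(mode + offset, MU, SIGMA)
    print(f"{offset:.0f} days from peak: before {before:.4f}, after {after:.4f}")
```

A normal curve would give identical "before" and "after" values. With the log-normal, the curve drops away faster on the way up than on the way down, which is the asymmetry described above.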

If we assume that these two populations (community and LTCH) are independent of each other, we can just add the two infection curves together to get a new estimate for the overall daily infection rate in the province. The assumption is reasonable, if not exactly true, as long as care home staff are being careful about isolating themselves:

Figure 5. Daily new infections reported across Nova Scotia. The red line represents the best fit to community cases, the gold line the best fit to LTCH cases, and the blue line is the summation of the community and LTCH curves.

As we can see, this new estimate holds much closer to the centre of the data points on the back end than fitting a single normal curve to the total counts did, and it also tracks the very high counts seen in mid-April.
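Adding the two curves together is simple enough to sketch in a few lines. Everything numeric here is hypothetical: the totals, peak days, and the LTCH start day are placeholder values chosen to look plausible, not the fitted parameters behind Figure 5.

```python
import math

def normal_curve(day, total, mean, sd):
    """Daily new cases modelled as a normal density scaled by the total count."""
    z = (day - mean) / sd
    return total * math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def lognormal_curve(day, total, start, mu, sigma):
    """Daily new cases modelled as a scaled log-normal, with time measured from `start`."""
    elapsed = day - start
    if elapsed <= 0:
        return 0.0
    z = (math.log(elapsed) - mu) / sigma
    return total * math.exp(-0.5 * z * z) / (elapsed * sigma * math.sqrt(2.0 * math.pi))

def combined_curve(day, community, ltch):
    """Sum of the two curves, assuming the two populations are independent."""
    return normal_curve(day, *community) + lognormal_curve(day, *ltch)

# Hypothetical parameters (NOT the fitted values from the figures).
COMMUNITY = (600.0, 25.0, 8.0)              # total cases, peak day, spread in days
LTCH = (350.0, 20.0, math.log(15.0), 0.4)   # total cases, start day, log-median, shape

for day in range(0, 71, 10):
    community_part = normal_curve(day, *COMMUNITY)
    ltch_part = lognormal_curve(day, *LTCH)
    print(f"day {day:2d}: community {community_part:5.1f}  "
          f"LTCH {ltch_part:5.1f}  combined {combined_curve(day, COMMUNITY, LTCH):5.1f}")
```

Because the LTCH curve starts later and peaks more sharply, the combined curve ends up with a taller, later peak than the community curve alone.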

Conveniently for us here in the province, both LTCH and community spread appear to be petering out, with average infection rates dropping to near zero by mid-May.

So, why did the LTCH cases start so much later than the community cases? The LTCHs were insulated from the community. Ideally, that would have meant that no cases would have occurred in LTCHs, but that insulation, obviously, wasn't perfect. Staff members likely got sick due to community spread, and then introduced the virus into the care homes. Once there, it spread very quickly.

Can we identify how, exactly, this occurred? No, though we can look at when the infections were first detected and compare that to reported exposure events:

Figure 6. Cumulative COVID-19 infections in LTCHs across Nova Scotia. The bars at the bottom show when we might expect to see the impact of known COVID-19 exposures and government actions in the data itself.

We can see from Figure 6 that the infections started in the LTCHs, and Northwood care homes in particular, after community exposures occurred on several Halifax Transit bus routes (light grey) and at the Dartmouth Superstore (light blue-green). This isn't to say that LTCH workers were exposed to and infected by the SARS-CoV-2 virus on those bus routes, or at that grocery store, but if they were, that would fit reasonably well with the timing of the LTCH outbreak.

Does anyone have any thoughts or feedback? I don't have regional LTCH data, so I couldn't do a community/LTCH split for regions (at least for regions outside of Central, since I can just assume that Northwood dominates LTCH numbers there), but I could fit some models to the regional data if anyone is interested.
