We apply the SIR epidemiological model to the COVID-19 outbreak in New York City.
The data used in this study is the reported number of deaths per day. Assumptions are being made about the mortality rate, and for each case we infer the daily contact rate and the removal rate.
The methodology falls under Bayesian school of thought. For a given data set and a model, the most likely values for model’s parameters are being determined.
We end with forecasts about the possible evolution of pandemic.
Epidemiological study COVID-19
Based on the models found at http://www.mtholyoke.edu/~ahoyerle/math333/ThreeBasicModels.pdf
The population of size N is divided in three classes – these are fractions, not numbers:
S – susceptible, ones that can get the disease
I – infected
R – removed, ones that got the disease and developed immunity
The above quantities – S, I, R – are functions of time. The purpose of the model is to understand their evolution.
The model of interest for us is aptly name “SIR”. If the disease does not offer immunity, the class R is missing, and the models are called “SIS”. Simple versions without “vital dynamic”, i.e. naturally occurring deaths and births for relevant time scales are appropriate for our case.
Other quantities:
λ – daily contact rate, average number of “adequate” (i.e. that lead to infection) contacts per infective per day
γ – rate of removal of infected from the infected class (I). Note that some infected are assumed to become not infected immediately – perhaps not very correct, might need to add a latent period.
μ – daily death removal rate
σ = λ/(γ + μ) the contact number;
Immediate consequences from above definitions:
deaths per day: μ I
number of susceptible infected per day from one infected is λ S total number of infected per day λ S N I
proportion of individuals infected “t” time ago that are still infected is exp(-γ t) removal rate from infective class by death and recovery is γ + μ
average number of adequate contacts of an infective during the time is infected (before it dies or
recovers) is σ = λ/(γ + μ) σ S is the average number of infected from one single infected
(N S)’ = – λ S N I
(N I)’ = λ S N I – γ N I
(N R)’ = γ N I
S + I + R = 1
Keep N constant:
S’ = – λ S I
I’ = λ S I – γ I
R’ = γ I
S + I + R = 1
Data
We base our study only on the number of reported deaths per day for the initial phase of the pandemic. The assumptions of the SIR model – the one of a homogeneous population – are likely to hold.
Once the restrictions for social interaction were in place (around March 17), it is likely that the model must be amended.
New York Times curated and presented the data. They asked to provide a link to their page: https://www.nytimes.com/interactive/2020/us/coronavirus-us-cases.html
Figure 1 Data used in this study: number of deaths per day in NYC. The vertical line shows the approximate time of intervention (restriction on social interactions)
Bayesian fitting of epidemiological model
For a given set of parameters λ, γ, μ the SIR model describes the evolution of the three population classes – susceptible, infected, and recovered.
Bayesian framework chooses the values of the unknown parameters that are most likely for the given data.
We chose to focus on the reported deaths only, and this raises an additional problem: the time from initial infection and the first death is also unknown!
With a small trick we should be able to retrieve this piece of information as well.
The Bayesian fitting performs as follows. For a given set of λ, γ, μ, the SIR system of differential equations is solved. The time between the infection and first death is chosen so that the best fit of number of deaths per day is obtained. In other words, once the curve of infected was being determined, it is scaled down by μ – the mortality rate – and translated left to obtain the best possible fit.
The quality of the set of parameters is given by the log-likelihood of this best fit and based on this criterion a proposed set is retained or discarded.
The process is repeated many times and a big (~ few thousands) set of likely values for λ, γ, μ is obtained.
In this report the mortality μ is removed from the set of parameters under Bayesian search. Instead, we make assumptions about some reasonable values it might take, and for each of them the fully Bayesian treatment is applied to λ, γ (daily contact rate and rate of removal).
This is how the model fit the actual data:
Figure 2. Bayesian fitting for four assumptions about mortality rate; 0.1% provides the best fit of data
Figure 3. Minus log likelihood as a fit quality measure (small is better). Note that 0.1% provides the best values – hence it is framed in green.
The fits provide also the likely delay between the first infection and the reported death:
Figure 4. Delay between the first infection and first reported death
The distributions for daily contact rate and daily removal rate are presented not for their actual values but converted into doubling or halving times, that have more intuitive appeal:
Figure 5. Daily contact rate as a doubling time, in days
Figure 6. Daily removal rate as a halving time, in days
Figure 7. Evolution of infected population in New York City. Most likely case framed in green. Time of intervention by dashed line
Note that the best fits – the ones for 0.1% and 0.3% assumed mortality – tell a similar story.