An analysis of SARS-CoV2 Data and Forecasts: Introduction

We analyze the available data on daily deaths caused by the SARS-CoV2 virus and use it, together with a few simple models, to predict how the disease will evolve in selected geographical areas.

Selected Data

Although a lot of data is available, including the numbers of confirmed cases, recoveries and deaths, we use only the number of deaths. One reason is that the number of confirmed cases depends heavily on each geographical area's testing policy, so it has no uniform predictive power over the progression of the disease. We also do not use the number of confirmed recoveries, because each recovery is tied to a confirmed case and would therefore expose us to the same discretionary policies. We confine our analysis to the confirmed-deaths data, which is much less sensitive to differences in testing policy.

Geographical regions

For the purposes of this entry we cover only data from certain regions in Europe and North America, because in these regions deaths are less likely to be misreported or falsified by governments. For now, the analysis covers the following regions in North America: Washington, California, Colorado, New York and New Jersey. In Europe it covers Italy, Spain, Germany and the Netherlands. A few more regions will be added in the next few days.


During the initial stages of an outbreak, the number of casualties grows exponentially. For this period we use a simple exponential function:

When we run a fit for this model, we find that the c and a parameters are degenerate: the fit cannot distinguish between the two of them. To remove that degeneracy we need to eliminate one of them, and we chose to eliminate the c parameter. So for the initial stages of the outbreak, we chose this formula:
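The degeneracy can be checked numerically. The sketch below assumes the three-parameter model has the form N(t) = c * exp(b * (t - a)), which is consistent with the parameter names used in this entry but is an assumption. The function names and the parameter values are hypothetical, chosen only for illustration. Under that assumption, any factor c can be absorbed into a shifted start date a' = a - ln(c) / b, so two different parameter sets describe exactly the same curve.

```python
import math

def three_param(t, a, b, c):
    # Assumed form of the three-parameter model: c * exp(b * (t - a)).
    return c * math.exp(b * (t - a))

def two_param(t, a, b):
    # Assumed form of the reduced model with c eliminated: exp(b * (t - a)).
    return math.exp(b * (t - a))

def absorb_c(a, b, c):
    # c * exp(b * (t - a)) == exp(b * (t - (a - ln(c) / b))),
    # so the factor c is equivalent to shifting the start date a.
    return a - math.log(c) / b

# Illustrative values only, not fitted to any real data.
a, b, c = 10.0, 0.25, 3.0
a_shifted = absorb_c(a, b, c)

# Largest relative difference between the two curves over a range of days.
days = range(0, 40, 5)
max_rel_diff = max(
    abs(three_param(t, a, b, c) - two_param(t, a_shifted, b))
    / three_param(t, a, b, c)
    for t in days
)
print(f"shifted start date: {a_shifted:.4f}")
print(f"max relative difference: {max_rel_diff:.2e}")
```

Because the two curves coincide up to floating-point rounding, no amount of data can pin down c and a separately, which is why one of them has to be dropped.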

Here, b is the rate of exponential growth and a is the start date, with the caveat that a also carries information about the size of the outbreak.
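One way to estimate a and b from cumulative death counts is a straight-line least-squares fit to the logarithm of the counts. This is a minimal sketch, not necessarily the fitting procedure used for this analysis. It assumes the reduced model has the form N(t) = exp(b * (t - a)), so that ln N(t) = b * t - b * a. The function name `fit_exponential` is hypothetical and the data are synthetic, generated from known parameters rather than taken from any region.

```python
import math

def fit_exponential(days, deaths):
    """Least-squares fit of ln(deaths) = b * t - b * a; returns (a, b).

    Assumes every count is positive and the counts are not all equal
    (otherwise the slope b is zero and a is undefined).
    """
    logs = [math.log(d) for d in deaths]
    n = len(days)
    mean_t = sum(days) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in days)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(days, logs))
    b = sxy / sxx                   # slope of the line is the growth rate
    intercept = mean_y - b * mean_t
    a = -intercept / b              # intercept equals -b * a
    return a, b

# Synthetic, noise-free data from known parameters (illustration only).
true_a, true_b = 5.0, 0.2
days = list(range(10, 25))
deaths = [math.exp(true_b * (t - true_a)) for t in days]

a_hat, b_hat = fit_exponential(days, deaths)
doubling_time = math.log(2) / b_hat  # days for the count to double
print(f"a = {a_hat:.3f}, b = {b_hat:.3f}, doubling time = {doubling_time:.2f} days")
```

Under the assumed form, a is the day on which the fitted curve equals one death, so a larger outbreak that started on the same day shows up as an earlier a. That is one way to read the caveat that a mixes timing and size.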

As the outbreak grows, we tend to see a flattening and then a plateau in the total number of fatalities. As the outbreak matures we also have more data points, which allows a better fit with more parameters. In that case, the formula that best approximates the more mature outbreak, including all its phases, is:

Your feedback is important to us. Please send it to
