# 📈 Quantifying 'The Curve'

A naive attempt at COVID curve fitting

With COVID-19 being at the top of everyone’s mind, I decided to poke around with
some of the data floating around the internet. It seems that the `Johns Hopkins University`

data-set is among the most used sources.

It also allowed me to acquaint myself with a new and exciting part of science: data science.

I stocked up my toolbox with `Python`

, `Jupyter`

and `pandas`

and went to work.
All my code can be found on GitHub:
https://github.com/JeppeKlitgaard/COVID-19

## Getting the data

It consists of a `covid`

package that contains methods for grabbing the Johns
Hopkins University data as a neatly formatted `pandas.DataFrame`

, indexed by
`country`

and `date.`

In order to get metrics per population a `population`

column is added by using
data from data.worldbank.org via the `wbdata`

Python package and the
`SP.POP.TOTL`

indicator.

We then just iterate over the three metrics and calculate the `metric_per_1M`

column.

## Processing the data

Since the COVID outbreak did not start at the same time everywhere around the
world it makes sense to index the data based on the number of days, `rel_day`

,
since a threshold amount of a given metric was reached. For example, we may want
to index based on the number of days since 10 deaths were recorded for the
country/region.

This is done by applying the `covid.utils.get_x_day`

function country-wise:

A simple mathematical model that fits our data reasonably is a Logistic Fit
model, as described by `covid.statistics.LogisticMode`

which we can fit to using
the `lmfit`

Python package for each country.

## Graphing our data and fit

We can now plot the Johns Hopkins data and our fit using `matplotlib`

, giving
nice graphs with standard errors like the one below. Feel free to download the
git repository and have a play with the data in the `COVID_data.ipynb`

Jupyter
notebook yourself.

In the legend we also find the 3 parameters from our logistic fit, where the `c`

value corresponds to the maximum number of deaths per 1 million population
reached according to the crude logistic model.

As a Dane living in the UK, I find it interesting to see how the governmental responses of Sweden, the UK and Denmark seem to correlate to the number of deaths per population. Particularly Sweden is very comparable to Denmark in many regards, but has chosen a wildly different strategy for dealing with COVID-19.

For the time being, the predicted number of US deaths per population is quite low, but that is probably explained by the fact the outbreak is still fairly contained to a few major cities. The model is unable to take this into consideration, and given the media coverage surrounding the US response so far, it seems likely that the death toll there will be much larger than that described by the model.

### Breaking the curve

In order to see when the curve is “broken”, we can change the `y-axis`

to a log
scale to get:

We see that the US has only just broken the curve, while hard-hit countries like Spain and Italy seem to finally see the results of their strict lock-downs.

### All countries

Here is the graph for all countries — it is a bit busy.

*It should be noted that the tool also does *

`cases`

and `recoveries`

, but
deaths was chosen as this is probably the most reliable number across the
different countries.## Conclusion

For me this was a really nice way of playing around with the rather tragic COVID-19. I was quite surprised to see just how well the logistic model fits the data, though it does not account for the fact that societies will eventually have to reopen, which will likely result in a second ‘wave’.

I hope this was at least interesting and hopefully the git repository can serve as a starting point for the next guy looking to poke around with the data.

NOTE: I myself took
https://github.com/willhaslett/covid-19-growth/
as a starting point for using the Johns Hopkins data and `covid.constants`

is
shamelessly stolen from there.