In this post we will analyze the COVID-19 spread in Israel.
We will use Python for the data analysis. The data is fetched from the Worldometer site using the Python requests package, and the graphs are created using the Python matplotlib package.
The first thing we need to do is fetch the data from the Worldometer site. This is done by fetching the https://www.worldometers.info/coronavirus/country/israel/ page and using regular expressions to extract the dates and the total cases values.
To avoid accessing the site whenever we run the application, we cache the site response into a file. If the file already exists, we use it instead of accessing the site. To reload the site data, we simply delete the cache file.
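# os, re and requests are imported at the top of the script (see the import lines further down)
data_path = 'data.txt'
if not os.path.isfile(data_path):
    print("fetching data")
    # fetch first, so that a failed request does not leave an empty cache file behind
    r = requests.get("https://www.worldometers.info/coronavirus/country/israel/")
    r.raise_for_status()
    with open(data_path, 'w', encoding='utf-8') as f:
        f.write(r.text)

print("reading data from file")
with open(data_path, 'r', encoding='utf-8') as f:
    data = f.read()

print("get only the total data")
# skip everything that precedes the "Total Coronavirus Cases" chart
position = data.find("Total Coronavirus Cases")
data = data[position:]

print("get dates from data")
# the chart labels are stored as: categories: ["...","...",...]
categories = re.search(r"categories: \[([^]]*)]", data)
categories_non_quoted = categories.group(1).replace("\"", "")
dates = categories_non_quoted.split(",")
print("loaded {} dates".format(len(dates)))

print("get values from data")
# the chart values are stored as: data: [n1,n2,...]
values = re.search(r"data: \[([^]]*)]", data)
total = values.group(1).split(",")
total = [int(x) for x in total]
print("loaded {} values".format(len(total)))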
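To see what the two regular expressions match, here is a minimal sketch that runs them on a hand-written sample string. The sample only imitates the shape of the chart arrays that the snippet above looks for (`categories: [...]` and `data: [...]`). The dates and numbers in it are made up for illustration, and the helper name `parse_series` is hypothetical.

```python
import re


def parse_series(page_text):
    """Extract the date labels and the integer values of the first chart found.

    Assumes the text contains 'categories: [...]' and 'data: [...]' arrays,
    as in the snippet above.
    """
    categories = re.search(r"categories: \[([^]]*)]", page_text)
    values = re.search(r"data: \[([^]]*)]", page_text)
    if categories is None or values is None:
        # fail with a clear message instead of an AttributeError on None
        raise ValueError("chart data not found, the page layout may have changed")
    dates = [d.strip() for d in categories.group(1).replace('"', "").split(",")]
    total = [int(v) for v in values.group(1).split(",")]
    return dates, total


# Made-up sample that only mimics the structure of the real page.
sample = 'categories: ["Feb 15","Feb 16","Feb 17"] ... data: [0,1,3]'
sample_dates, sample_total = parse_series(sample)
print(sample_dates, sample_total)
```

Checking the result of `re.search` before calling `group` is a small design choice: if the site changes its layout, the script stops with a readable error instead of a confusing one.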
Next, we create a function that saves a graph using matplotlib. The imports listed here are the ones used by the whole script.
import os
import re
import matplotlib.pyplot as plt
import requests
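def plot(title, x_dates, y_values):
    print("create {} graph".format(title))
    # create the figure first, so that the title ends up on the figure we save
    fig, ax = plt.subplots()
    ax.set_title(title.title())
    # show only every 10th date on the x axis
    ax.set_xticks(range(0, len(x_dates), 10))
    plt.xticks(rotation=75)
    plt.plot(x_dates, y_values)
    plt.minorticks_on()
    plt.grid(axis='y', which='major')
    plt.grid(axis='y', which='minor', alpha=0.2)
    plt.ylim(0, max(y_values) * 1.05)
    # spaces are replaced so the title can be used as the file name
    plt.savefig(title.replace(" ", "_"), dpi=300)
    # close the figure to free its memory before the next graph
    plt.close(fig)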
And now we can plot the total cases in Israel:
plot("total cases", dates, total)
To get the following graph:
While we might get a first impression from this graph, the real information is in the delta: the new cases discovered every day. Let's create this data, and plot the graph.
print("new cases")
new_cases = []
prev = 0
for i in range(len(total)):
    new_cases.append(total[i] - prev)
    prev = total[i]
plot("new cases", dates, new_cases)
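As a side note, the same delta can be written as a small standalone function, which makes it easy to check on a tiny input. This is only a sketch: the function name `daily_new_cases` and the sample numbers are made up for illustration.

```python
def daily_new_cases(total):
    """Return the per-day increments of a cumulative series (a list of numbers)."""
    # Pair every day with the previous day's total; the first day is compared with 0.
    previous = [0] + total[:-1]
    return [today - before for today, before in zip(total, previous)]


# Made-up cumulative totals, for illustration only.
example_total = [0, 1, 3, 3, 7]
example_new = daily_new_cases(example_total)
print(example_new)  # [0, 1, 2, 0, 4]
```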
And the graph is:
We can see that the number of new cases per day fluctuates. A quick examination of the data reveals that the fluctuation has a period of one week. This can be explained by the fact that the COVID-19 test labs are less productive on the weekends.
Let's create a weekly average of the COVID-19 new cases:
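print("new cases weekly average")
new_cases_weekly = []
# sliding window of the last 7 days; it starts filled with zeros,
# so the first 6 averages are lower than the real values
week = [0] * 7
for i in range(len(new_cases)):
    week.pop(0)
    week.append(new_cases[i])
    new_cases_weekly.append(sum(week) / len(week))
plot("new cases weekly", dates, new_cases_weekly)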
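A possible variation, shown here only as a sketch: instead of starting the window with zeros, average over the days that are actually available, so the first values are not pulled down. The function name `trailing_average` and the sample numbers are made up for illustration, and a window of 3 is used to keep the example short.

```python
def trailing_average(values, window=7):
    """Average each value with up to window - 1 values that precede it."""
    averages = []
    for i in range(len(values)):
        # At the start of the series the window is shorter than `window`.
        start = max(0, i - window + 1)
        chunk = values[start:i + 1]
        averages.append(sum(chunk) / len(chunk))
    return averages


# Made-up daily values, for illustration only.
example_values = [3, 6, 9, 12]
example_avg = trailing_average(example_values, window=3)
print(example_avg)  # [3.0, 4.5, 6.0, 9.0]
```

With the zero-filled window used in the post, the same input would start with 1.0 and 3.0 instead of 3.0 and 4.5. From the third value on, both approaches give the same result.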
And now we get the graph:
That's much better!
We can now clearly see the first infection wave, starting in March, and the second wave, starting in June. One last thing we should check is a rough estimate of the R0 infection rate. We assume that every newly infected person might infect others for a period of 10 days, and compare the number of new cases 10 days later with the current number of new cases.
print("R zero")
r_zero = []
infection_days = 10
days_range = len(new_cases_weekly) - infection_days
for i in range(days_range):
    current = new_cases_weekly[i]
    if current == 0:
        r_zero.append(0)
    else:
        r_zero.append(new_cases_weekly[i + infection_days] / current)
plot("r zero", dates[:days_range], r_zero)
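To make the calculation concrete, here is a minimal sketch of the same ratio as a standalone function, run on a tiny made-up series. The function name `growth_ratio` and the sample numbers are hypothetical, and a 2-day period is used instead of 10 only to keep the example short.

```python
def growth_ratio(smoothed, infection_days=10):
    """For each day, divide the value `infection_days` later by the day's value."""
    ratios = []
    for i in range(len(smoothed) - infection_days):
        current = smoothed[i]
        later = smoothed[i + infection_days]
        # As in the post, a day with zero cases gets a ratio of 0.
        ratios.append(later / current if current else 0)
    return ratios


# Made-up smoothed daily cases, for illustration only.
example_cases = [10, 20, 40, 30, 15]
example_ratio = growth_ratio(example_cases, infection_days=2)
print(example_ratio)  # [4.0, 1.5, 0.375]
```

A value above 1 means that the cases grew over the period, and a value below 1 means that they shrank. The result is shorter than the input by `infection_days` entries, which is why the last point of the graph refers to 10 days ago.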
And we get the R0 graph:
And so, we can see that as of 10 days ago, the R0 factor is below 1, which indicates that the current wave started fading about 10 days ago.