Analysis of COVID-19 data in Spain

[5-minute read]

At ZZ Data Labs we wanted to help people understand the extent of the COVID-19 epidemic. Over 15 weeks, we published several studies that helped explain the disease.

Here you will find historical infection data going back to the earliest available records, along with predictions from artificial intelligence algorithms that we readjusted daily until May.

The data used for the training of our models were obtained from official bodies: the Ministry of Health, the National Institute of Statistics and the National Geographic Institute.

Estimated number of cases in Spain in the coming weeks

For a full week, from March 12 to 19, we worked on an artificial intelligence algorithm capable of estimating the number of simultaneous positive cases in Spain over the following weeks, based on hypotheses about:

  • Percentage of citizens interacting with others outside their place of residence
  • Movements between municipalities in Spain
  • Average number of people infected by a carrier of the virus
  • Number of undetected positive cases
  • Average daily percentage of recovered or deceased
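To illustrate how hypotheses like these can act as the parameters of a forecast, here is a minimal sketch of a single-region, discrete-time simulation. It is not the ZZ Data Labs algorithm: the function name, the parameter names and every number are illustrative assumptions, and movement between municipalities is left out for brevity.

```python
from typing import List


def simulate_active_cases(
    detected_cases: float,
    population: float,
    days: int,
    contact_fraction: float,        # share of citizens interacting outside their home
    infections_per_carrier: float,  # average number of people infected by one carrier
    undetected_ratio: float,        # undetected cases per detected case
    daily_removal_rate: float,      # daily share of active cases that recover or die
) -> List[float]:
    """Return the estimated number of active cases for day 0..days (assumed model)."""
    # Start from detected cases plus the assumed share of undetected ones.
    active = detected_cases * (1.0 + undetected_ratio)
    susceptible = population - active
    history = [active]

    # A carrier stays active for about 1 / daily_removal_rate days, so spreading
    # its infections evenly over that period gives this per-day rate.
    daily_transmission = infections_per_carrier * daily_removal_rate

    for _ in range(days):
        new_infections = (
            daily_transmission * contact_fraction * active * susceptible / population
        )
        new_infections = min(new_infections, susceptible)
        removed = daily_removal_rate * active
        susceptible -= new_infections
        active += new_infections - removed
        history.append(active)
    return history


if __name__ == "__main__":
    # Hypothetical confinement scenarios, expressed as contact fractions.
    for name, fraction in {"low": 0.8, "medium": 0.4, "strict": 0.1}.items():
        curve = simulate_active_cases(100, 1_000_000, 30, fraction, 2.5, 1.0, 0.1)
        print(name, round(curve[-1]))
```

In this sketch, lowering `contact_fraction` is the only lever that separates one confinement scenario from another, which is why stricter scenarios produce fewer simultaneous cases.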

Below is the high-level diagram of our solution, which was able to predict the number of infections at the municipal level, as well as hospital admissions by province.

High-level diagram (in Spanish) of the artificial intelligence algorithm developed to predict contagion by municipality and hospitalised patients by province

The result of our analysis is shown in the following graph, which includes three scenarios: low, medium and strict confinement. Strict compliance with confinement was key to reducing the number of simultaneous infections.

Predicted number of active infections from March 12 (beginning of the outbreak) until mid-April, under the different scenarios.

Density of cases per 10,000 inhabitants in each Autonomous Community

Given the large amount of information we receive every day, it is often difficult to find the key data. One of the most relevant figures is the number of infections relative to population, which is directly related to the probability of infection. The following map shows that Madrid was the most affected community.

Number of cases per 10,000 inhabitants.
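The density figure is simple arithmetic: confirmed cases divided by population, scaled to 10,000 inhabitants. The sketch below shows the calculation and a ranking of regions by density. The function names and the two regions are hypothetical and do not reproduce any real figures.

```python
def cases_per_10k(cases: int, population: int) -> float:
    """Confirmed cases per 10,000 inhabitants."""
    if population <= 0:
        raise ValueError("population must be positive")
    return 10_000 * cases / population


def rank_by_density(regions: dict) -> list:
    """Return (name, density) pairs sorted from most to least affected."""
    densities = {
        name: cases_per_10k(cases, population)
        for name, (cases, population) in regions.items()
    }
    return sorted(densities.items(), key=lambda item: item[1], reverse=True)


# Hypothetical regions: name -> (confirmed cases, population). Not real data.
EXAMPLE_REGIONS = {
    "Region A": (4_000, 2_000_000),
    "Region B": (3_000, 500_000),
}

if __name__ == "__main__":
    for name, density in rank_by_density(EXAMPLE_REGIONS):
        print(f"{name}: {density:.1f} cases per 10,000 inhabitants")
```

Region B has fewer cases than Region A in absolute terms but a higher density, which is why the relative figure is the more useful indicator of infection risk.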

Lethality of the disease by sex and age

In the first weeks after the March outbreak, ZZ Data Labs was among the first to break the data down by sex and age and to report on the lethality of the virus. The data show that lethality in the age segments above 80 years was above 10% in women and above 15% in men.

Lethality of the disease by sex and age during the first weeks
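As a rough sketch of how such a breakdown can be computed, the code below assumes that lethality means deaths as a percentage of confirmed cases within each sex and age group. That definition, the function names and the counts are assumptions made for illustration and are not the figures behind the chart.

```python
def lethality_pct(deaths: int, confirmed_cases: int) -> float:
    """Deaths as a percentage of confirmed cases (assumed definition)."""
    if confirmed_cases <= 0:
        raise ValueError("confirmed_cases must be positive")
    return 100 * deaths / confirmed_cases


def lethality_by_group(counts: dict) -> dict:
    """Map each (sex, age band) group to its lethality percentage."""
    return {
        group: lethality_pct(deaths, cases)
        for group, (deaths, cases) in counts.items()
    }


# Hypothetical counts: (sex, age band) -> (deaths, confirmed cases). Not real data.
EXAMPLE_COUNTS = {
    ("women", "80+"): (12, 100),
    ("men", "80+"): (18, 100),
    ("women", "60-69"): (1, 100),
}

if __name__ == "__main__":
    for (sex, age_band), pct in lethality_by_group(EXAMPLE_COUNTS).items():
        print(f"{sex}, {age_band}: {pct:.1f}%")
```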

Next steps and more advanced models

ZZ Data Labs' rapid response came to the attention of the press and institutions. The initial model continued to evolve, drawing on mobility data provided by the Secretary of State for Digitalization and Artificial Intelligence and on collaboration with the Technological Institute of Aragon.

The model's evolution produced results that were both more technically advanced and highly useful to institutions, and it led to a scientific article that will be published in the journal PLOS ONE. A preliminary version of this article is available on request.

The model developed for institutional use consists of a Markov chain with 10 states and more than 30 transitions, applied to more than 3,000 regions of Spain, each with a population of up to 5,000 inhabitants.
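To give a feel for this kind of model, here is a toy Markov chain for a single region with only four states. The state names and transition probabilities are invented for illustration. The institutional model has 10 states, more than 30 transitions and covers many regions, none of which is reproduced here.

```python
# Hypothetical states and daily transition probabilities (each row sums to 1).
STATES = ["infected", "hospitalised", "recovered", "deceased"]
TRANSITIONS = {
    "infected": {"infected": 0.85, "hospitalised": 0.05, "recovered": 0.10},
    "hospitalised": {"hospitalised": 0.88, "recovered": 0.09, "deceased": 0.03},
    "recovered": {"recovered": 1.0},  # absorbing state
    "deceased": {"deceased": 1.0},    # absorbing state
}


def step(distribution: dict) -> dict:
    """Advance the expected number of people in each state by one day."""
    next_distribution = {state: 0.0 for state in STATES}
    for source, count in distribution.items():
        for target, probability in TRANSITIONS[source].items():
            next_distribution[target] += count * probability
    return next_distribution


def simulate(distribution: dict, days: int) -> dict:
    """Apply the daily transition repeatedly and return the final distribution."""
    for _ in range(days):
        distribution = step(distribution)
    return distribution


if __name__ == "__main__":
    start = {"infected": 1000.0}
    for state, count in simulate(start, 14).items():
        print(f"{state}: {count:.1f}")
```

Because every row of the transition table sums to 1, the total number of people is conserved at each step and only their distribution across states changes.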

The block diagram of the solution, which is much more complex than the initial one, is as follows:

Block diagram of the Markov Chain developed to predict the expansion of COVID-19 in Spain