Data

Telecom Serbia is the national telecom operator in the Republic of Serbia. Under a very strict NDA, Telecom Serbia granted access to anonymized call detail record (CDR) data exclusively to BioSense Institute, for research purposes. Three types of data were provided for the first six months of 2020:

  • Activity data set – indicates telecommunication activity on radio base stations across the country
  • Connectivity data set – reflects telecommunication connections between radio base stations across the country
  • Mobility data set – shows the spatial flow of users based on telecommunication activities on radio base stations across the country

 

Telecom activity data reflect the amount of telecommunication activity that occurred on a particular radio base station. The data set distinguishes the following types of telecom activity:

  • Call In – activity proportional to the number of calls received at a radio base station in a defined time step
  • Call Out – activity proportional to the number of calls generated at a radio base station in a defined time step
  • SMS In – activity proportional to the number of SMS messages received at a radio base station in a defined time step
  • SMS Out – activity proportional to the number of SMS messages sent from a radio base station in a defined time step
  • Uplink bytes – activity proportional to the number of bytes uploaded at a radio base station through the telecom provider’s internet service in a defined time step
  • Downlink bytes – activity proportional to the number of bytes downloaded at a radio base station through the telecom provider’s internet service in a defined time step
  • Number of sessions – activity proportional to the number of internet sessions users made in a radio base station’s domain through the telecom provider’s internet service in a defined time step

Moreover, the data include the activity of international users who use the Telecom Serbia network while roaming. We obtained activity data aggregated at a temporal resolution of one hour and broken down by country code, which indicates the users’ origin. To distinguish domestic traffic from foreign traffic, we split the data into two data sets, activity and foreign activity, and visualized them independently on the map. These data gave us valuable insight into the population pulse within each radio base station’s domain. The volume of telecom activity data for the observed period was approximately 20 GB.
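The split into domestic and foreign activity can be sketched as follows. This is a minimal illustration, not the pipeline actually used: the record layout (station, hour, country code, activity type, value), the station identifiers, and the use of "RS" as the domestic country code are all assumptions made for the example.

```python
from collections import defaultdict

# Assumption: domestic users are marked with the country code "RS".
DOMESTIC_CODE = "RS"


def split_activity(records, domestic_code=DOMESTIC_CODE):
    """Sum hourly activity per (station, hour, activity type),
    keeping domestic and foreign (roaming) users in separate tables."""
    domestic = defaultdict(float)
    foreign = defaultdict(float)
    for station_id, hour, country_code, activity_type, value in records:
        key = (station_id, hour, activity_type)
        target = domestic if country_code == domestic_code else foreign
        target[key] += value
    return dict(domestic), dict(foreign)


# Hypothetical records: (station_id, hour, country_code, activity_type, value)
records = [
    ("BS001", "2020-03-01T10", "RS", "call_in", 12.0),
    ("BS001", "2020-03-01T10", "DE", "call_in", 3.0),
    ("BS001", "2020-03-01T10", "AT", "call_in", 1.5),
    ("BS002", "2020-03-01T10", "RS", "sms_out", 7.0),
]

domestic, foreign = split_activity(records)
```

Here all non-domestic country codes are merged into a single foreign table; keeping the country code in the key would preserve the origin of roaming users instead.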

 

Telecom connectivity data indicate the amount of voice activity exchanged between pairs of radio base stations. They form a directed graph from the originating antenna to the terminating antenna. The weight of each link is given by the following indicator:

  • Number of calls – a number proportional to the calls exchanged between two antennas in a predefined time window

In the data set, links that were initiated or terminated by a user of a different telecom operator can be distinguished. The data set was aggregated at a time resolution of one hour. The amount of communication exchanged between antennas sheds light on the connections between different areas of the country. The volume of voice connectivity data is close to 40 GB.
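The directed, weighted graph described above can be represented as a nested mapping from originating antenna to terminating antenna. The sketch below is illustrative only; the record layout (origin, destination, hour, number of calls) and the antenna names are hypothetical.

```python
from collections import defaultdict


def build_call_graph(records):
    """Build a directed graph origin -> destination whose edge weight
    is the call indicator summed over all hourly records."""
    graph = defaultdict(lambda: defaultdict(float))
    for origin, destination, _hour, n_calls in records:
        graph[origin][destination] += n_calls
    # Convert to plain dicts so missing edges raise KeyError instead of
    # silently creating empty entries.
    return {origin: dict(edges) for origin, edges in graph.items()}


# Hypothetical hourly records: (origin, destination, hour, n_calls)
records = [
    ("A", "B", "2020-03-01T10", 3),
    ("A", "B", "2020-03-01T11", 2),
    ("B", "A", "2020-03-01T10", 1),
    ("A", "C", "2020-03-01T10", 4),
]

graph = build_call_graph(records)
```

Because the graph is directed, the weight from A to B (5) is stored separately from the weight from B to A (1).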

 

The mobility data set contains telecommunication records (SMS In/Out, Call In/Out, Internet activity) generated by randomly selected anonymized users. No spatial or temporal aggregation was applied. However, following standard privacy protection procedures for this type of data, users were randomly reselected every two weeks over the six-month period. Connecting a user’s telecommunication activities on different antennas over time gave us that user’s mobility path, and these paths in turn indicated population migrations within the country. This was the largest data set, at approximately 850 GB.
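One way to turn such records into mobility paths is to order each user's events by time and keep the sequence of distinct consecutive antennas. The sketch below assumes a hypothetical record layout (user, ISO timestamp, antenna) and assumes that all records come from a single two-week selection window, so that a user identifier refers to the same person throughout.

```python
from collections import defaultdict


def mobility_paths(records):
    """Return, per user, the time-ordered list of antennas visited,
    with consecutive repeats of the same antenna collapsed."""
    by_user = defaultdict(list)
    for user_id, timestamp, antenna_id in records:
        by_user[user_id].append((timestamp, antenna_id))

    paths = {}
    for user_id, events in by_user.items():
        events.sort()  # ISO 8601 timestamps sort chronologically as strings
        path = []
        for _timestamp, antenna_id in events:
            if not path or path[-1] != antenna_id:
                path.append(antenna_id)
        paths[user_id] = path
    return paths


# Hypothetical records: (user_id, timestamp, antenna_id)
records = [
    ("u1", "2020-03-01T09:00", "A"),
    ("u1", "2020-03-01T12:30", "B"),
    ("u1", "2020-03-01T09:40", "A"),
    ("u1", "2020-03-01T18:00", "A"),
    ("u2", "2020-03-01T08:00", "C"),
]

paths = mobility_paths(records)
```

In this example, user u1 is seen at A, then B, then A again, while u2 never leaves C.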

 

Besides the telecom data sets, we used several other data sources to gain better insight into population trends.

 

CORINE land cover/land use data were obtained from the Copernicus Land Monitoring Service. Copernicus is a European system for monitoring the Earth. Data are collected from different sources, including Earth observation satellites and in-situ sensors, and are processed to provide reliable and up-to-date information in six thematic areas: land, marine, atmosphere, climate change, emergency management and security. The land thematic data are divided into categories, of which the Pan-European category provides land cover / land use (LC/LU) information through the CORINE Land Cover data, High Resolution Layers, Biophysical parameters and European Ground Motion Service. The CORINE Land Cover data are provided for 1990, 2000, 2006, 2012, and 2018. This vector-based data set includes 44 land cover and land use classes. The time series also includes a land change layer, which highlights changes in land cover and land use. The High Resolution Layers (HRL) are raster-based data sets that provide information about different land cover characteristics and are complementary to land cover mapping data sets such as CORINE. We used data from 2018; the volume of this set is approximately 200 MB.

 

We also used data sets from OpenStreetMap (OSM), a popular crowdsourcing platform containing a huge database of geotagged information. For this analysis, we used OSM points of interest (POI) and road traffic data. The former represent physical features on the ground whose function is described with tags (e.g. shop, tourist location, public transport, etc.), while the latter contain line vector features that represent roads together with additional traits, such as road classification, speed limit, etc. These data sets indicate the development of municipalities through their traffic infrastructure, as well as through the number of different tags contained in the POI data set. The volume of the traffic data set was 1 GB, while the POI data set was approximately 400 MB.

 

Official population data from 2011 (CENSUS 2011), as well as estimated population data from 2019, were obtained from the Statistical Office of the Republic of Serbia (www.stat.gov.rs/). The estimated population data are calculated from the number of newborns and deceased, and by tracking the number of people who changed their permanent address. However, they do not account for people who changed their location without updating their personal documents, or who went to live abroad. Both data sets are available at the municipal level. The difference in population between the two was calculated to serve as a baseline for comparison with indicators derived from all other data sources.
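The baseline can be illustrated with a short sketch that computes the absolute and relative population change per municipality. The municipality names and population figures below are invented for illustration, and reporting a relative change alongside the absolute difference is an assumption rather than something stated above.

```python
def population_change(census_2011, estimate_2019):
    """Compute the population change between the 2011 census and the 2019
    estimate for every municipality present in both inputs."""
    changes = {}
    for municipality in census_2011.keys() & estimate_2019.keys():
        before = census_2011[municipality]
        after = estimate_2019[municipality]
        absolute = after - before
        # Relative change is undefined when the 2011 population is zero.
        relative = absolute / before if before else None
        changes[municipality] = {"absolute": absolute, "relative": relative}
    return changes


# Illustrative numbers only, not real statistics.
census_2011 = {"Municipality A": 10000, "Municipality B": 25000}
estimate_2019 = {"Municipality A": 9500, "Municipality B": 26000}

baseline = population_change(census_2011, estimate_2019)
```

A negative value marks a municipality whose registered population shrank between 2011 and 2019.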

 
