Who Does Smartphone Location Data Represent?
A rapid validation of Facebook data
This work has been done entirely using open data, and was co-authored with Kai Kaiser. All errors and omissions are those of the author(s).
In crisis situations, governments and international relief organizations have traditionally relied on administrative reporting and survey data to track the onset of, response to, and recovery from shocks such as natural disasters or public health crises. Although traditional data can be carefully collected and purpose-built to measure the socio-economic dimensions of disasters, it has limitations: it can be infrequent, costly to collect, and subject to lag, and it may not be representative at finer spatial granularity. Moreover, statistical systems in many developing countries are simply not equipped to collect this data and make it available rapidly in the face of an unanticipated event.
The proliferation of smartphone devices has led to a paradigm shift in data collection
Devices containing GPS technology make it possible to measure human movement at an unprecedented scale and frequency. Increasingly, tech companies such as Facebook, Google, and Apple, as well as app integration or mapping providers such as Unacast, Cuebiq and Mapbox, have made these data assets available to the humanitarian community as part of a ‘data for good’ movement, one accelerated by COVID-19. Although not purpose-built for risk management, datasets derived from location data offer high spatial precision and frequent (typically daily or even 8-hourly) updates. Moreover, the same provider often releases them for several countries at a time, a consistency in method and measurement that is uncommon in traditional data.
For policymakers used to traditional mobility measures, this can be game-changing. Some datasets are available at the level of grid tiles or pixels, typically about 1km x 1km, which can be aggregated upwards to any area of interest. This makes it possible to map mobility onto increasingly small units, such as wards or communes in Viet Nam, Union Councils in Pakistan, or census tracts in the United States.
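As a minimal sketch of this upward aggregation, assuming each tile has already been assigned to the administrative unit that contains its centroid, tile-level counts can simply be summed per unit. The helper `aggregate_tiles`, the tile IDs and the ward names are hypothetical.

```python
from collections import defaultdict


def aggregate_tiles(tile_counts: dict[str, int], tile_to_unit: dict[str, str]) -> dict[str, int]:
    """Sum tile-level user counts into the administrative unit containing each tile.

    Assumes a precomputed tile -> unit lookup (e.g. by tile centroid).
    """
    totals: dict[str, int] = defaultdict(int)
    for tile_id, count in tile_counts.items():
        unit = tile_to_unit.get(tile_id)
        if unit is None:
            # Tile lies outside every unit of interest; leave it out.
            continue
        totals[unit] += count
    return dict(totals)
```

Assigning whole tiles by centroid keeps the sketch simple; tiles that straddle a boundary could instead be split in proportion to the area falling in each unit.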
The make-or-break question: Who does this data represent?
If only people carrying a smartphone or other GPS-enabled device are included in this data, the data will inevitably skew towards those who can afford one. Additionally, in low-income countries where a household may have only one smartphone (compared with the US, where a third of households have three or more), the data may skew towards ‘breadwinners’ and therefore also be gendered. Frequent, accessible analysis of how representative GPS-based data is remains critically important, especially if it is to inform risk mitigation policy.
Globally available geospatial layers (population counts, including breakdowns by gender, and building footprints) make it possible to quantify representativeness rapidly and to conduct comparative national validations of this data.
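One simple way to quantify representativeness, offered here as an illustrative assumption rather than the exact method used in this work, is to compare each area's rate of device users per resident with the rate across all areas, using a gridded population layer as the denominator. The function name and inputs are hypothetical.

```python
def relative_coverage(users: dict[str, float], population: dict[str, float]) -> dict[str, float]:
    """Each area's users-per-resident rate divided by the rate across all areas.

    1.0 means the area is represented in proportion to its population;
    values below 1.0 suggest under-representation, above 1.0 over-representation.
    """
    # Only compare areas present in both layers with a positive population.
    areas = [a for a in users if population.get(a, 0) > 0]
    total_users = sum(users[a] for a in areas)
    total_pop = sum(population[a] for a in areas)
    if total_users <= 0:
        raise ValueError("no users in areas with known population")
    overall_rate = total_users / total_pop
    return {a: (users[a] / population[a]) / overall_rate for a in areas}
```

Mapping these ratios, or swapping in a different denominator such as a building-footprint count, would show where the skew concentrates.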
A deep dive into Facebook mobility data
Given the wealth of possible applications of mobility data for public policy, we apply a rapid and replicable appraisal method to baseline mobility data across different local contexts. This method is applied below to understand how representative Facebook traces are of local residents, as well as transient populations such as tourists or displaced persons. We chose Facebook data because it has a large community of users already applying it to answer policy questions.
We discuss two major Facebook products for mobility analytics. First, Movement Range Maps measure mobility at the level of administrative units; this dataset can prove instrumental for rapid assessments of disaster impacts. Second, Facebook Disaster Maps provide counts of Facebook users not by administrative boundary but by Bing tile, with tile sizes starting at 600m x 600m. These tiles can be aggregated upwards to any area of interest; Movement Range Maps are themselves aggregated metrics built from the number of Bing tiles that Facebook users visit over a given time period.
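Bing tiles follow Microsoft's published Bing Maps Tile System, a Web Mercator grid in which each tile is identified by a quadkey and each quadkey character adds one level of detail. The sketch below implements those published formulas; the function names are our own. At level 16 a tile is about 611m wide at the equator, which matches the roughly 600m tiles described here (that correspondence is our inference, not a documented Facebook parameter). Because a tile's ancestors are the prefixes of its quadkey, aggregating upwards amounts to truncating quadkeys.

```python
import math

EARTH_RADIUS_M = 6378137.0  # WGS 84 equatorial radius used by the tile system
MAX_LATITUDE = 85.05112878  # Web Mercator latitude limit


def lat_lon_to_tile_xy(lat: float, lon: float, level: int) -> tuple[int, int]:
    """Convert WGS 84 coordinates to Bing tile X/Y at the given level of detail."""
    lat = min(max(lat, -MAX_LATITUDE), MAX_LATITUDE)
    lon = min(max(lon, -180.0), 180.0)
    n = 1 << level  # number of tiles along each axis
    sin_lat = math.sin(math.radians(lat))
    x = (lon + 180.0) / 360.0
    y = 0.5 - math.log((1 + sin_lat) / (1 - sin_lat)) / (4 * math.pi)
    # Clip so points on the far edge fall in the last tile.
    tile_x = min(max(math.floor(x * n), 0), n - 1)
    tile_y = min(max(math.floor(y * n), 0), n - 1)
    return tile_x, tile_y


def tile_xy_to_quadkey(tile_x: int, tile_y: int, level: int) -> str:
    """Interleave the bits of tile X and Y into a base-4 quadkey string."""
    digits = []
    for i in range(level, 0, -1):
        mask = 1 << (i - 1)
        digit = 0
        if tile_x & mask:
            digit += 1
        if tile_y & mask:
            digit += 2
        digits.append(str(digit))
    return "".join(digits)


def parent_quadkey(quadkey: str, level: int) -> str:
    """Return the coarser tile at `level` that contains this tile."""
    return quadkey[:level]


def tile_width_m(lat: float, level: int) -> float:
    """Approximate ground width of a tile at a given latitude, in metres."""
    return math.cos(math.radians(lat)) * 2 * math.pi * EARTH_RADIUS_M / (1 << level)
```

For example, `parent_quadkey(tile_xy_to_quadkey(*lat_lon_to_tile_xy(lat, lon, 16), 16), 10)` maps a point's level-16 tile to the level-10 tile that contains it, so counts keyed by quadkey can be rolled up with a dictionary keyed on the truncated string.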

Movement Range Maps allow a quick validation step. We find that the Stay Put metric, the “ratio of Facebook users staying in a single tile all day”, is highly correlated with the residential category of Google Community Mobility Reports. Figure 1 illustrates this consistency for three countries in East Asia and the Pacific: Viet Nam, Indonesia and the Philippines.
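The validation step can be sketched as a Pearson correlation over the days both series cover. This is an illustrative sketch, not the authors' code: the inputs are assumed to be date-keyed dictionaries holding one unit's Stay Put values and the matching Google residential series, and the function names are hypothetical.

```python
import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


def align_and_correlate(stay_put: dict[str, float], residential: dict[str, float]) -> tuple[float, int]:
    """Correlate two daily series on the dates they share; returns (r, days used)."""
    dates = sorted(set(stay_put) & set(residential))
    if len(dates) < 3:
        raise ValueError("need at least three overlapping days")
    xs = [stay_put[d] for d in dates]
    ys = [residential[d] for d in dates]
    return pearson_r(xs, ys), len(dates)
```

Pearson correlation is unaffected by positive linear rescaling of either series, so it does not matter that Stay Put is a ratio while Google reports a percent change from baseline.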
Globally, Google and Facebook mobility data tend to be correlated in most countries, as illustrated in Figure 2. This suggests consistency across the datasets that different tech companies provide, and motivates further investigation into how they can be used in tandem.
While this data necessarily excludes anyone who does not carry a smartphone, it does point to one promising result: across platforms, the spatial distribution of two major mobility datasets (Facebook Movement Range Maps and Google Community Mobility Reports) is highly correlated. This is an important step towards building confidence in using these datasets for humanitarian work.
How can interested users access this data?
Movement Range Maps is an open dataset, available at the district level on the Humanitarian Data Exchange (HDX). As more mobility products are made available to the humanitarian community in the spirit of data for good, rapid validation exercises like this one will become foundational to making the data operational.
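For readers who download the Movement Range Maps file, a minimal sketch of reading it and averaging the Stay Put metric per district follows. It assumes a tab-separated text file with a header row; the column names below (`country`, `polygon_id`, `all_day_ratio_single_tile_users`) are assumptions based on the public release and should be checked against the header of the file you download. The function name and the handling of `NA` values are likewise our own choices.

```python
import csv
from collections import defaultdict
from typing import Iterable

# Assumed column names; verify against the header row of the downloaded file.
COUNTRY_COL = "country"
UNIT_COL = "polygon_id"
STAY_PUT_COL = "all_day_ratio_single_tile_users"


def mean_stay_put(lines: Iterable[str], country: str) -> dict[str, float]:
    """Unweighted mean of the daily Stay Put ratio per district for one country.

    `lines` can be an open text file or any iterable of tab-separated lines
    whose first line is the header.
    """
    sums: dict[str, float] = defaultdict(float)
    counts: dict[str, int] = defaultdict(int)
    for row in csv.DictReader(lines, delimiter="\t"):
        if row[COUNTRY_COL] != country:
            continue
        value = row[STAY_PUT_COL]
        if value in ("", "NA"):  # skip missing observations
            continue
        sums[row[UNIT_COL]] += float(value)
        counts[row[UNIT_COL]] += 1
    return {unit: sums[unit] / counts[unit] for unit in sums}
```

Usage might look like `with open("movement-range.txt", encoding="utf-8") as f: stay_put = mean_stay_put(f, "VNM")`, where the file name is hypothetical and the country code assumes the release uses ISO 3166-1 alpha-3 codes.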