In this month’s research spotlight, COSMOS highlights two recent studies that model and investigate the global refugee crisis. The first paper, “Proposing Location-based Predictive Features for Modeling Refugee Counts” by Dr. Esther Mead, Dr. Maryam Maleki, Dr. Mohammad Arani, and Dr. Nitin Agarwal, proposes location-based features for predicting refugee counts. The second paper, “Impact of Data Imputation on the Performance of Global Refugee Modeling Dataset” by Kazi Tanvir Islam, Dr. Esther Mead, and Dr. Nitin Agarwal, uses machine-learning-assisted data imputation to model refugee populations when values of the predictive features are missing.

These studies explore ways to predict trends in refugee populations. In particular, both examine whether these populations can be predicted from sociocultural, socioeconomic, and economic features.

The first study identified several location-based features with strong predictive power for refugee populations: the Global Peace Index, access to electricity, access to water, media censorship, and healthcare. The model performed best for European countries, particularly those with available data on justice and homicide rates. Some features were also especially predictive for particular regions, such as corruption-related features for African and Asian countries and population features for the Americas.

The second study examined how well models perform when data are missing, specifically models that fill in missing values through data imputation. The authors compared four machine learning methods with three traditional imputation techniques: mean, median, and mode. They took a real-world dataset, removed some of its values, applied each of the seven methods to fill the gaps, and compared the imputed values against the original, complete dataset. Of the seven, stochastic regression, which fills each missing value with a linear-regression prediction plus a random error term, achieved the highest accuracy.
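To make this evaluation procedure concrete, here is a minimal Python sketch. It is not the authors' code. It masks values in a hypothetical synthetic dataset, fills the gaps with mean, median, mode, and stochastic regression imputation, and scores each method by root-mean-square error (RMSE) on the masked entries only. The synthetic data, the 20% missing rate, the RMSE metric, and all function names are illustrative assumptions. The study's real-world dataset, its other machine learning methods, and its accuracy measure may differ.

```python
import math
import random
import statistics

# Hypothetical, illustrative sketch; not the study's implementation.


def impute_constant(values, fill):
    """Replace every missing entry (None) with a single fill value."""
    return [fill if v is None else v for v in values]


def impute_stochastic_regression(x, y, rng):
    """Fill missing y values with a least-squares line on x plus Gaussian noise.

    The noise standard deviation is the spread of the residuals on the observed
    rows, so imputed values scatter around the line instead of sitting on it.
    """
    observed = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    xs = [xi for xi, _ in observed]
    ys = [yi for _, yi in observed]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    slope = sum((a - mx) * (b - my) for a, b in observed) / sum((a - mx) ** 2 for a in xs)
    intercept = my - slope * mx
    residuals = [yi - (intercept + slope * xi) for xi, yi in observed]
    sigma = statistics.pstdev(residuals)
    return [
        intercept + slope * xi + rng.gauss(0, sigma) if yi is None else yi
        for xi, yi in zip(x, y)
    ]


def rmse_on_missing(truth, filled, missing):
    """Root-mean-square error computed only over the entries that were masked."""
    errors = [(t - f) ** 2 for t, f, m in zip(truth, filled, missing) if m]
    return math.sqrt(sum(errors) / len(errors))


def compare_methods(seed=0, n=200, missing_rate=0.2):
    rng = random.Random(seed)
    # Hypothetical complete dataset: one predictor x and an integer-valued target.
    x = [rng.uniform(0, 100) for _ in range(n)]
    truth = [round(2 * xi + 5 + rng.gauss(0, 3)) for xi in x]
    # Remove a random subset of target values to simulate missing data.
    missing = [rng.random() < missing_rate for _ in range(n)]
    y = [None if m else t for t, m in zip(truth, missing)]
    observed = [v for v in y if v is not None]

    candidates = {
        "mean": impute_constant(y, statistics.fmean(observed)),
        "median": impute_constant(y, statistics.median(observed)),
        "mode": impute_constant(y, statistics.mode(observed)),
        "stochastic regression": impute_stochastic_regression(x, y, rng),
    }
    # Score each method against the original values it had to reconstruct.
    return {name: rmse_on_missing(truth, filled, missing) for name, filled in candidates.items()}


if __name__ == "__main__":
    for name, score in sorted(compare_methods().items(), key=lambda kv: kv[1]):
        print(f"{name:>22}: RMSE {score:.2f}")
```

In this toy setup the target depends linearly on `x`, so regression-based imputation should recover the masked values far more closely than any single constant fill. On real refugee data, the size of that gap would depend on how well the available predictors explain the target.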

“Together, these studies show how computational methods can be applied to predict real-world refugee populations. Policy makers, human rights organizations, and general researchers can use these models to predict population fluctuations related to refugee populations,” said Dr. Agarwal.