Digital Connectivity in sub-Saharan Africa: A Comparative Perspective1
Authors:
Mr. Emre Alper
and
Michal Miktus https://isni.org/isni/0000000404811396 International Monetary Fund


Contributor Notes

Authors’ E-Mail Addresses: EAlper@imf.org, MMiktus@imf.org

Abstract

Higher digital connectivity is expected to bring opportunities to leapfrog development in sub-Saharan Africa (SSA). Experience within the region demonstrates that if there is an adequate digital infrastructure and a supportive business environment, new forms of business spring up and create jobs for the educated as well as the less educated. The paper first confirms the global digital divide through the unsupervised machine learning K-means clustering algorithm. Next, it derives a composite digital connectivity index, in the spirit of De Muro-Mazziotta-Pareto, for about 190 economies. Descriptive analysis shows that the majority of SSA countries lag in digital connectivity, specifically in infrastructure, internet usage, and knowledge. Finally, using fractional logit regressions, we document that a better business-enabling and regulatory environment, financial access, and urbanization are associated with higher digital connectivity.

I. Introduction

1. Large uncertainties characterize the major trends determining the future of work in sub-Saharan Africa (SSA), and connectivity is a key policy area. With SSA’s population projected to rise from about 1.0 billion currently to about 1.7 billion by 2040, the United Nations projects a net increase in the region’s working-age population (15–64 years) of about 20 million people per year. Generating 20 million jobs per year over the next two decades will be the key challenge facing policymakers in SSA. The October 2018 Regional Economic Outlook for sub-Saharan Africa (IMF, 2018) identifies connectivity as a key policy area to promote job creation and yield dramatic improvements in living conditions. Connectivity goes beyond the traditional physical infrastructure of roads, railways, and ports, which is currently the focus of most country investment plans.

2. SSA countries also need to be digitally connected to take advantage of technical change and growth opportunities. Higher digital connectivity, coupled with an improved business climate, strong investment in people’s education and health, and good governance, would deliver digital dividends, deemed critical for the twenty-first century workplace (World Bank, 2016). Experience within the region demonstrates that if there is an adequate digital infrastructure and a supportive business environment, new forms of business spring up and create jobs for the educated as well as the less educated.2 The region has been investing heavily in information and communications technology (ICT) infrastructure, including, most recently, internet and mobile-cellular signal coverage.

3. Nonetheless, the quantity of infrastructure per se is only part of the challenge. It is also important to consider the quality of infrastructure and its cost to users. Ongoing efforts to reform policy and regulatory frameworks to make broadband access more affordable, accessible, and universal need to be accompanied by skills development to fully reap the benefits of technological advances. Finally, the population’s capacity to access the Internet depends on further factors, including cultural acceptance, supportive policies, and the availability of smartphones and computers at the household level.

4. We investigate the current state of play in digital connectivity in SSA from a comparative perspective and analyze the drivers of heterogeneity across countries. The paper improves upon previous work by (i) assessing a significantly larger number of ICT indicators; (ii) using the most recent available data (2016–17) for a comprehensive set of countries, determined by data availability (193 economies); and (iii) applying several methodologies, including machine learning techniques, to investigate the existence of a global digital divide and to construct a composite index across countries.3 Using fractional logit regressions with this index as the dependent variable, the paper assesses the relative importance for digital connectivity of various factors, including SDG indicators, variables that characterize countries’ business and regulatory environment, risk, transparency and corruption perceptions, as well as the usual set of macroeconomic indicators.

5. We first employ unsupervised machine learning algorithms: (i) k-means clustering to assess the existence of a global digital divide; and (ii) dimensionality reduction, via principal component analysis (PCA), to investigate variation in digital connectivity. In this initial step, we do not impose any modelling structure on the available ICT variables. In other words, we let the data speak for themselves and determine the optimal number of clusters, and the countries in each cluster, using the elbow technique, the silhouette method, and the gap statistic. The PCA is intended to motivate the composite index of digital connectivity by checking whether a quasi-linear dimensionality-reduction technique provides a good approximation.

6. The paper constructs a digital connectivity index by imposing a modelling structure on the ICT variables and grouping them under five fundamental categories. These categories form the key sub-indices summarizing a country’s ability to access ICT, in line with those used by the International Telecommunication Union (ITU): (i) infrastructure; (ii) knowledge; (iii) affordability; (iv) quality; and (v) actual internet usage. These sub-indices are then aggregated into a single composite index through the Mazziotta-Pareto methodology (De Muro and others, 2011), which allows us to summarize a set of individual indicators that are assumed not to be fully substitutable. Relative to the Digital Access Index (DAI), launched by the ITU in 2003, our composite digital connectivity index, the Enhanced Digital Access Index (EDAI), uses an improved aggregation technique, expands the set of variables that inform the index, and uses the most recent available data for a larger number of countries. Next, we use the EDAI to explore the drivers of variation in digital connectivity across the world, including in SSA, by examining relative strengths and weaknesses across the five dimensions. Policymakers can use the index to assess their countries’ preparedness for the Fourth Industrial Revolution.

7. Finally, we estimate fractional logit models for the full sample and for other country groupings, including SSA, to assess the relationship between various factors and digital connectivity. We consider over 100 candidate explanatory variables and use stepwise regressions to reduce their number by minimizing the quasi-Akaike Information Criterion. Besides macroeconomic indicators, the explanatory variables include indicators related to the Sustainable Development Goals (SDGs), ease of doing business, regulatory environment, transparency, country risk, employment, climate, and corruption perceptions. After narrowing down the set of explanatory variables, we estimate the same model for Advanced Economies (AEs), Emerging Market and Middle-Income Countries (MICs), Low-Income Developing Countries (LIDCs), and SSA economies to verify whether these variables are robust across country groupings. In addition, for SSA economies only, we check which components of the Country Policy and Institutional Assessment (CPIA) affect digital connectivity, while controlling for per capita income.

8. Our results indicate the existence of a global digital divide and significant correlations between the business and regulatory environment and digital connectivity, suggesting room for policy action to improve connectivity by addressing weaknesses in these areas. Specifically, the results indicate that (i) there is a global digital divide, with countries clustering into three main groups; (ii) the variation in digital connectivity across countries can be broadly approximated by the first principal component, motivating a quasi-linear construction of the digital connectivity index; (iii) there is significant heterogeneity in digital connectivity across analytical country groupings based on income and geography; (iv) the majority of SSA countries lag behind in digital connectivity, with the exception of Botswana, Cabo Verde, Gabon, Lesotho, Mauritius, Seychelles, and South Africa, as well as LIDCs such as Ghana and Rwanda; (v) among the five dimensions, SSA countries on average perform well in terms of affordability and quality, but less well on infrastructure, internet usage, and knowledge; and (vi) fractional logit regressions underscore the importance for digital connectivity of the regulatory and business-enabling environment, higher urbanization, and urban access to electricity. Estimation results for SSA indicate that a better business-enabling and regulatory environment, financial access, urbanization, and availability of postal services are associated with higher digital connectivity. In particular, we find that leveling the playing field for female entrepreneurs and reducing property registration costs are positively related to digital connectivity.

9. The paper is structured as follows. Section II introduces the unsupervised machine learning algorithm of k-means and its implementation on the ICT data, for both the world and SSA. Section III details the construction of the composite EDAI through the Mazziotta-Pareto methodology and provides robustness checks based on equal and progressive weighting schemes. Section IV presents the results of fractional logit regressions exploring the factors that correlate strongly with digital connectivity across countries. Section V concludes.

II. Estimating the Digital Divide: Unsupervised Machine Learning

10. In this section, we describe the ICT dataset used in the estimations and the methodology used to investigate patterns across countries. Specifically, we describe the data used to construct the composite index of digital connectivity and the rationale for using the k-means algorithm to investigate ICT adoption. The primary data source is the World Telecommunication/ICT Indicators Database, augmented by the UN E-Government Survey and the UNESCO Institute for Statistics (UIS) database. The list of variables, their detailed definitions, and sources are described in Appendix I. The three-digit ISO codes of the 193 countries covered (selected based on data availability), together with their income-based analytical groupings, are presented in Appendix II. Throughout the paper, all estimations are done in R, and averages for analytical groups are computed as weighted averages, using PPP GDP shares from the World Economic Outlook database (IMF, 2019a) as weights.
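As a minimal illustration of the PPP GDP weighting described above, the Python sketch below computes a group average in which each country’s weight is its share of the group’s total PPP GDP. The paper’s own computations are done in R; the function name `ppp_weighted_average`, the country codes, and all numbers here are hypothetical.

```python
def ppp_weighted_average(values, ppp_gdp):
    """Average of `values`, weighting each country by its share of the group's total PPP GDP."""
    if set(values) != set(ppp_gdp):
        raise ValueError("values and ppp_gdp must cover the same countries")
    total_gdp = sum(ppp_gdp.values())
    return sum(values[c] * ppp_gdp[c] for c in values) / total_gdp


# Hypothetical group of three countries (index values and PPP GDP are made up).
index_values = {"AAA": 60.0, "BBB": 40.0, "CCC": 20.0}
ppp_gdp_bn = {"AAA": 300.0, "BBB": 100.0, "CCC": 100.0}
print(ppp_weighted_average(index_values, ppp_gdp_bn))  # → 48.0
```

The large economy pulls the group average (48.0) well above the simple mean of 40.0, which is the intended effect of GDP weighting.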

12. Unsupervised machine learning algorithms infer patterns from a dataset without using labels. Therefore, unlike their supervised counterparts, they cannot be directly applied to a regression or classification problem. Nevertheless, they help uncover the underlying structure of the data and are therefore often used as part of exploratory data analysis. Two of the main unsupervised learning techniques are principal component analysis and cluster analysis. The latter groups or segments observations that share attributes in order to uncover relationships in the data. It identifies commonalities in the data and, as more data are brought into the analysis, reacts to the presence or absence of such commonalities in each new piece of data.

13. We implement K-means clustering, one of the most popular unsupervised clustering algorithms. It aims to divide n observations into k distinct, non-overlapping clusters in which each observation belongs to the cluster with the nearest mean, which serves as the prototype of the cluster (James and others, 2013). Because the K-means algorithm finds a local rather than a global optimum, the results depend on the initial (random) cluster assignment of each observation. We therefore run the algorithm 100 times from different random initial configurations and select the best solution, i.e., the one that minimizes equation 1 in Appendix III. The algorithm requires the number of clusters to be specified in advance; we use the elbow method (Thorndike, 1953), the average silhouette technique (Rousseeuw, 1987), and the gap statistic (Tibshirani, 2000) to select K.4,5
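The random-restart procedure described above can be sketched in Python as follows. This is an illustration, not the authors’ code: it uses Lloyd’s algorithm started from a random cluster assignment of every observation (in the style of James and others, 2013) and keeps the run with the smallest within-cluster sum of squares, which we assume is the objective in equation 1 of Appendix III. In R, `kmeans()` exposes the number of random starts through its `nstart` argument, although its default algorithm (Hartigan-Wong) differs from the one below. All function names and the toy data are hypothetical.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroids_for(data, labels, k):
    """Mean of each cluster; None for a cluster with no members."""
    dim = len(data[0])
    cents = []
    for j in range(k):
        members = [x for x, lab in zip(data, labels) if lab == j]
        cents.append([sum(m[d] for m in members) / len(members) for d in range(dim)] if members else None)
    return cents


def within_cluster_ss(data, labels, k):
    """Total squared distance of each observation to its own cluster mean."""
    cents = centroids_for(data, labels, k)
    return sum(squared_distance(x, cents[lab]) for x, lab in zip(data, labels))


def kmeans_once(data, k, rng, max_iter=100):
    """Lloyd's algorithm started from a random cluster assignment of every observation."""
    labels = [rng.randrange(k) for _ in data]
    for _ in range(max_iter):
        cents = centroids_for(data, labels, k)
        # Re-seed any empty cluster at a randomly chosen observation.
        cents = [c if c is not None else list(rng.choice(data)) for c in cents]
        # Assignment step: move each observation to its nearest centroid.
        new_labels = [min(range(k), key=lambda j: squared_distance(x, cents[j])) for x in data]
        if new_labels == labels:
            break
        labels = new_labels
    return labels


def kmeans_best_of(data, k, n_starts=100, seed=0):
    """Run K-means n_starts times and keep the run with the smallest within-cluster sum of squares."""
    rng = random.Random(seed)
    best_labels, best_wcss = None, float("inf")
    for _ in range(n_starts):
        labels = kmeans_once(data, k, rng)
        wcss = within_cluster_ss(data, labels, k)
        if wcss < best_wcss:
            best_labels, best_wcss = labels, wcss
    return best_labels, best_wcss


# Hypothetical two-dimensional data with two well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, wcss = kmeans_best_of(points, k=2)
print(labels, round(wcss, 4))
```

Keeping the best of many restarts guards against a poor local optimum from any single random initialization.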

14. The K-means algorithm broadly groups countries into three general classes (Figure 1).6 Applying the algorithm to the ICT variables for the world, with the optimal number of clusters set to three, yields the grouping presented in Figure 1. Robustness checks with different cluster specifications (K equal to 2 and 4) yield similar results. The list of countries and their clusters is included in Appendix IV.
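To show how the average silhouette technique can compare candidate values of K, the sketch below computes the mean silhouette width (Rousseeuw, 1987) for a given labeling, using Euclidean distance. In practice one would compute it for the K-means solutions with, say, K = 2, 3, and 4 and prefer the K with the highest value; here the two labelings are fixed by hand on hypothetical data. The function name `average_silhouette` is illustrative; in R, the `cluster` package provides `silhouette()` for the same purpose.

```python
import math


def average_silhouette(data, labels):
    """Mean silhouette width with Euclidean distance; observations in singleton clusters score 0."""
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")
    widths = []
    for i, x in enumerate(data):
        own = [y for j, (y, lab) in enumerate(zip(data, labels)) if lab == labels[i] and j != i]
        if not own:
            widths.append(0.0)
            continue
        # a: mean distance to the other members of the observation's own cluster.
        a = sum(math.dist(x, y) for y in own) / len(own)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(x, y) for y, lab in zip(data, labels) if lab == c)
            / sum(1 for lab in labels if lab == c)
            for c in clusters
            if c != labels[i]
        )
        widths.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return sum(widths) / len(widths)


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
two_groups = [0, 0, 0, 1, 1, 1]
three_groups = [0, 0, 1, 1, 2, 2]  # a deliberately poor split
print(round(average_silhouette(points, two_groups), 3))
print(round(average_silhouette(points, three_groups), 3))
```

A value near 1 means observations sit much closer to their own cluster than to the nearest alternative, so the natural two-group split scores clearly higher than the forced three-group split.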

Figure 1.

Digital Connectivity: country groupings under K=3 clusters

Citation: IMF Working Papers 2019, 210; 10.5089/9781513514604.001.A001

Sources: ITU’s ICT Indicators database, UN’s E-Government Survey, UNESCO’s UIS and authors’ calculations.

15. PCA reveals that digital connectivity can be quantified by a composite, quasi-linear index.7 We implement PCA on the available ICT dataset and then plot the above-mentioned groups as a function of the first two components8 (Figure 2). The results suggest that the heterogeneity in digital connectivity can be largely explained by differences in the values of the first principal component, which in turn explains nearly half of the sample variation. Because PCA is a linear dimensionality-reduction method, the results support the quasi-linear composite index of digital connectivity introduced in the next section.
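The “share of variation explained by the first principal component” can be computed as the largest eigenvalue of the variables’ correlation matrix divided by its trace. The sketch below does this with power iteration on standardized variables. Standardizing the ICT variables before PCA is our assumption (the text does not say), and the function names and toy data are hypothetical; in R, `prcomp(x, scale. = TRUE)` would be the usual route.

```python
import math


def standardize(column):
    """Center and scale one variable (sample SD); assumes the variable is not constant."""
    n = len(column)
    mean = sum(column) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in column) / (n - 1))
    return [(v - mean) / sd for v in column]


def correlation_matrix(columns):
    z = [standardize(col) for col in columns]
    n = len(z[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / (n - 1) for zj in z] for zi in z]


def leading_eigenvalue(matrix, n_iter=1000, tol=1e-12):
    """Power iteration for the largest eigenvalue of a symmetric positive semi-definite matrix."""
    p = len(matrix)
    v = [1.0 / math.sqrt(p)] * p
    lam = 0.0
    for _ in range(n_iter):
        w = [sum(matrix[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
        # Rayleigh quotient v' M v for the normalized vector.
        new_lam = sum(v[i] * sum(matrix[i][j] * v[j] for j in range(p)) for i in range(p))
        if abs(new_lam - lam) < tol:
            return new_lam
        lam = new_lam
    return lam


def first_pc_share(columns):
    """Share of total standardized variance captured by the first principal component."""
    corr = correlation_matrix(columns)
    trace = sum(corr[i][i] for i in range(len(corr)))
    return leading_eigenvalue(corr) / trace


# Hypothetical variables: x2 duplicates x1 up to scale, x3 is uncorrelated with both.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 4, 6, 8, 10]
x3 = [1, -2, 0, 2, -1]
print(round(first_pc_share([x1, x2, x3]), 4))  # → 0.6667
```

With two perfectly correlated variables and one unrelated variable, the first component captures two of the three units of standardized variance.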

Figure 2.

K-means clustering under K=3, PCA

Sources: ITU’s ICT Indicators database, UN’s E-Government Survey, UNESCO’s UIS and authors’ calculations.

III. Constructing an Index of Digital Connectivity: EDAI

16. In this section, we describe the methodology for constructing the Enhanced Digital Access Index (EDAI) and its five sub-indices. In addition to the results for various analytical country groupings, we check the robustness of the outcomes by applying different weighting schemes: equal, gradual, and Wroclaw. Furthermore, we present radar graphs for several country groupings, such as oil exporters or economies in fragile situations, to identify the main strengths and weaknesses in digital connectivity and inform potential areas for improved ICT adoption. The EDAI is constructed following the methodology of the Digital Access Index, introduced below, which takes a holistic approach reflecting the highly complementary use of modern technologies. We use the same data sources in this section as in the unsupervised machine learning section (Appendix II).

A. Digital Access Index (DAI)

17. The DAI was launched by the International Telecommunication Union in 2003 to measure the ability of each country’s population to take advantage of ICT.9 It was calculated for 178 economies as a composite score of eight variables covering five categories: availability of infrastructure, affordability of access, educational level, quality of information and communication technology services, and internet usage. The variables measure access to and usage of ICT as well as the education level of the population. Each variable is converted to an indicator with a value between zero and one by dividing it by the maximum value, or “goalpost”. Each indicator is then weighted within its category, and the resulting category index values are averaged to obtain the overall DAI value. Each category is of equal importance, although some variables within categories are assigned unequal weights. The overall weighting scheme is presented in Appendix III.
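The DAI aggregation steps just described (divide by a goalpost, weight within each category, average the categories equally) can be sketched as below. The goalposts, within-category weights, and values are hypothetical placeholders, not the ITU’s; capping ratios at 1 when a value exceeds its goalpost, and expressing affordability as a “higher is better” score, are our assumptions.

```python
def goalpost_indicator(value, goalpost):
    """Divide by the goalpost; values above the goalpost are capped at 1 (assumption)."""
    return min(value / goalpost, 1.0)


def dai_style_index(categories):
    """categories maps a category name to (value, goalpost, weight) triples whose weights sum to 1."""
    category_scores = {}
    for name, items in categories.items():
        if abs(sum(w for _, _, w in items) - 1.0) > 1e-9:
            raise ValueError(f"weights in category {name!r} must sum to 1")
        category_scores[name] = sum(w * goalpost_indicator(v, g) for v, g, w in items)
    # Categories count equally in the overall index.
    overall = sum(category_scores.values()) / len(category_scores)
    return overall, category_scores


# Hypothetical country with eight variables in five categories; all numbers are illustrative.
country = {
    "infrastructure": [(30.0, 60.0, 0.5), (80.0, 100.0, 0.5)],
    "affordability": [(0.7, 1.0, 1.0)],  # pre-expressed as a "higher is better" score
    "knowledge": [(90.0, 100.0, 2 / 3), (60.0, 100.0, 1 / 3)],
    "quality": [(5.0, 10.0, 0.5), (12.0, 30.0, 0.5)],
    "usage": [(40.0, 80.0, 1.0)],
}
overall, by_category = dai_style_index(country)
print(round(overall, 4))  # → 0.62
```

Because the overall score is a plain average of category scores, a shortfall in one category can be fully offset by a surplus in another, which is the substitutability property discussed in paragraph 19.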

18. Based on the DAI constructed in 2003, SSA countries lagged other regions in digital access (Figure 3). Higher DAI values are positively related to Purchasing Power Parity (PPP)-based Gross Domestic Product (GDP) per capita. In 2003, the top ten economies were exclusively Asian and European, with the exception of Canada (ranked 10th), while the lowest-ranked economies in terms of ICT adoption were in SSA.

Figure 3.

DAI distribution

Sources: ITU’s ICT Indicators database, UN’s terminology database, and authors’ calculations.

B. Enhanced Digital Access Index (EDAI)

19. The Digital Access Index (2003) has several shortcomings as a measure of the current global digital divide. First, it arbitrarily imposes equal weights during aggregation. This implies the strong assumption that all indicators within each sub-index are perfectly substitutable and, similarly, that all sub-indices are fully substitutable. This assumption may have weak theoretical justification given the disparity among the sub-indices (for instance, educational level and infrastructure). Second, the availability of indicators that define digital connectivity has expanded since the launch of the DAI in 2003, so the index is informed by a constrained set of variables. Finally, the DAI is outdated and was constructed for a somewhat smaller number of countries.

20. We propose a new composite index to measure digital connectivity: the EDAI. In the spirit of the DAI methodology, we consider five sub-categories of digital connectivity: availability of infrastructure, affordability of access, educational level of the population, quality of information and communication technology services, and internet usage. We augment the DAI with recently available indicators included in the ITU’s ICT Development Index and with indicators from the Digitization Index (Katz and others, 2014) for a larger set of countries. We then rescale all indicators to the [0, 100] interval, following the methodology of the Inclusive Internet Index by Facebook and The Economist, through the following transformation:

$$X_{New} = \frac{X_{Old} - \min(X_{Old})}{\max(X_{Old}) - \min(X_{Old})} \times 100$$
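The min-max transformation above, applied to one indicator across countries, can be sketched as follows. The function name `rescale_0_100` and the sample values are hypothetical; a constant indicator is rejected because the denominator would be zero.

```python
def rescale_0_100(values):
    """Min-max rescaling of one indicator across countries to the [0, 100] interval."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("indicator is constant across countries; min-max rescaling is undefined")
    return [(v - lo) / (hi - lo) * 100 for v in values]


print(rescale_0_100([2.0, 5.0, 8.0]))  # → [0.0, 50.0, 100.0]
```

The worst-performing country on each indicator maps to 0 and the best to 100, which removes the unit of measurement before aggregation.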

21. Using the rescaled variables and the five sub-indices, we construct EDAI values for each country using the Mazziotta-Pareto Index (MPI) methodology. An advantage of the MPI aggregation methodology is that it avoids artificially imposing the equal weights used in constructing the DAI and the World Bank’s Digital Adoption Index, or the progressive scheme used in constructing the Inclusive Internet Index. The technique is based on a quasi-linear function that, starting from the arithmetic mean of the normalized indicators, introduces a penalty for units with unbalanced values.10 More precisely, with a similar rescaling to the [0, 100] interval, the composite index is given by:

$$MPI_i = M_i\left(1 - CV_i^2\right) = M_i - S_i \, CV_i$$

where $M_i$, $CV_i = S_i / M_i$, and $S_i$ denote, respectively, the mean, coefficient of variation, and standard deviation of the normalized values for country i. The MPI is designed to (i) normalize the indicators by a specific criterion, thereby removing the unit of measure and variability effects; (ii) provide a synthesis that is independent of an ideal unit; and (iii) simplify the computations.11 Country-specific values for the EDAI and its sub-indices are in Appendix VI.
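A numerical check of the MPI formula helps show how the penalty works: two countries with the same mean sub-index score receive different MPI values if one of them has an unbalanced profile. The sketch computes $M_i - S_i \, CV_i$ for one country’s five normalized sub-index scores. Using the population standard deviation (dividing by the number of sub-indices) follows our reading of the original Mazziotta-Pareto formulation and is an assumption; the scores are hypothetical.

```python
import math


def mazziotta_pareto(scores):
    """MPI for one country: mean of its normalized scores minus the penalty S * CV."""
    n = len(scores)
    mean = sum(scores) / n
    if mean <= 0:
        raise ValueError("the MPI needs a positive mean")
    sd = math.sqrt(sum((x - mean) ** 2 for x in scores) / n)  # population SD (assumption)
    cv = sd / mean
    return mean - sd * cv  # identical to mean * (1 - cv ** 2)


# Hypothetical sub-index scores (infrastructure, knowledge, affordability, quality, usage).
balanced = [60.0, 60.0, 60.0, 60.0, 60.0]
unbalanced = [100.0, 20.0, 60.0, 60.0, 60.0]
print(mazziotta_pareto(balanced))              # → 60.0
print(round(mazziotta_pareto(unbalanced), 3))  # → 49.333
```

Both profiles average 60, but the unbalanced one loses about 10.7 points, so a strong score in one dimension cannot fully compensate for a weak score in another.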

22. The EDAI provides further evidence of a global digital divide, but compared to 2003 the gap seems to be narrowing (Figure 4).12 Keeping in mind the caveats regarding differences in the aggregation method and the indicator set, we note worldwide progress in digital connectivity over the last 15 years. The diffusion of technology is clearly a global phenomenon, marked by a steep increase in the number of people connected, rather than a privilege held by a few wealthy nations. SSA countries still lag other country groupings, while Western European economies achieve the highest digital connectivity. Within SSA, countries including Cabo Verde, Mauritius, Seychelles, and South Africa remain at the top in digital connectivity, while many other nations continue to lag. We therefore view these results as further evidence in favor of the digital divide hypothesis, in line with the results from the unsupervised machine learning algorithm. Nevertheless, the digital divide seems to be narrowing: the SSA distance-to-frontier (DTF) fell from 45.1 in 2003 to 17.3 in 2017.13 Relying on the DTF as evidence of a narrowing digital gap mitigates the caveats arising from differences in coverage, methodology, and variables between the DAI and the EDAI.

Figure 4.

Evolution of Digital Connectivity: from DAI to EDAI

Sources: ITU’s ICT Indicators database, UN’s E-Government Survey, UNESCO’s UIS and authors’ calculations.

23. Heterogeneity in digital connectivity is related to income and geography. We group the EDAI values into deciles and plot them on a world map (Figure 5). North America, Europe, Western Asia, and Australia and Oceania rank highest in digital connectivity, while SSA and LIDCs in general lag. This outcome is consistent with the observation that geographic location is fundamentally important to the digital divide (Grubesic and Murray, 2005), and provides evidence against the hypothesis that advances in ICT will render geographical divides irrelevant.14
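Grouping index values into deciles, as done for the map, can be sketched with a rank-based rule: sort countries by score and split the ranking into ten equal-sized bands. The text does not say how deciles or ties are handled, so the rank-based rule and alphabetical tie-breaking are assumptions, and the scores are hypothetical.

```python
def decile_groups(scores):
    """Map each country to a decile from 1 (lowest scores) to 10 (highest) by rank."""
    # Ties are broken alphabetically by country code (assumption).
    ranked = sorted(scores, key=lambda c: (scores[c], c))
    n = len(ranked)
    return {country: rank * 10 // n + 1 for rank, country in enumerate(ranked)}


# Hypothetical EDAI scores for 20 countries.
edai = {f"C{i:02d}": float(10 + 4 * i) for i in range(20)}
deciles = decile_groups(edai)
print(deciles["C00"], deciles["C19"])  # → 1 10
```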

Figure 5.

EDAI world distribution

Sources: ITU’s ICT Indicators database, UN’s E-Government Survey, UNESCO’s UIS and authors’ calculations.

24. The EDAI distribution differs across AEs, MICs, and LIDCs (Figure 6).15 AEs perform better in terms of digital connectivity than MICs, which in turn perform better than LIDCs. Moreover, the distribution is much more dispersed for LIDCs.

Figure 6.

EDAI distribution across AEs, MICs and LIDCs

Sources: ITU’s ICT Indicators database, UN’s E-Government Survey, UNESCO’s UIS and authors’ calculations.

25. The digital divide is more prominent for the individual components of the EDAI. Table 1 presents the sub-index values for the world, AEs, MICs, and LIDCs, computed as averages of country values within each group weighted by PPP-based GDP. AEs lead in every digital connectivity sub-index, with the most prominent divergences in the Knowledge and Internet Usage categories.

Table 1.

Sub-indices values

Source: Authors’ calculations.

C. EDAI for SSA

26. SSA exhibits heterogeneity in digital connectivity. The digital divide literature has tended to focus on differences between Africa and the industrialized world, paying inadequate attention to heterogeneity within the region (Onyeiwu, 2002). Figure 7 focuses on deciles for SSA only. It highlights heterogeneity within SSA, with countries such as Botswana, Cabo Verde, Gabon, Ghana, Lesotho, Mauritius, Rwanda, Seychelles, and South Africa ranking highest in the region in digital connectivity.16

Figure 7.

EDAI SSA distribution

Sources: ITU’s ICT Indicators database, UN’s E-Government Survey, UNESCO’s UIS and authors’ calculations.

Figure 7.

EDAI sub-indices

Sources: ITU’s ICT Indicators database, UN’s E-Government Survey, UNESCO’s UIS, and authors’ calculations.

27. In terms of sub-indices, SSA lags other regions mainly in digital infrastructure, internet usage, and human capital. Figure 8 compares SSA with the world and LIDCs. Digital connectivity in SSA is similar to the LIDC average. Across the five dimensions, SSA and LIDCs are close to the rest of the world in quality (speed of connection) and affordability but lag in infrastructure, internet usage, and knowledge. The comparable outcome for affordability in SSA relative to the rest of the world reflects similar SMS and internet prices in US$ across countries.17 Likewise, the similar outcomes for the quality sub-index reflect its variables measuring maximum download speeds, which are relatively alike globally.

Figure 8a.

EDAI for different groups of SSA economies

Sources: ITU’s ICT Indicators database, UN’s E-Government Survey, UNESCO’s UIS, and authors’ calculations.

28. Income differences can only partially account for the observed heterogeneity in digital connectivity across SSA countries. Countries with similar socio-economic backgrounds continue to diverge in digital connectivity.18 For example, Liberia and Lesotho, with comparable PPP GDP per capita and relatively low rankings in the UN’s Human Development Index, have considerably different EDAI values (50 and 70, respectively). This 20-point gap is comparable to the differences between Japan (85) and North Korea (65) or between the USA (95) and Ghana (75).

29. Economic structure can also partially explain heterogeneity in digital connectivity. Non-resource-intensive economies (mostly agricultural and commodity exporters) lag oil exporters and other resource-intensive exporters in digital connectivity, specifically in quality, internet usage, and knowledge (Figure 8). This may reflect insufficient funds to invest in digital infrastructure in non-resource-intensive economies. Similarly, countries classified as fragile, as defined by violence and political instability, lag in terms of knowledge.19 This underscores the potential gains in connectivity from raising education levels in these economies.

Figure 8b.

EDAI for different groups of SSA economies

Sources: ITU’s ICT Indicators database, UN’s E-Government Survey, UNESCO’s UIS, and authors’ calculations.

IV. Estimating the Drivers: Fractional Logit Regressions

30. We use fractional logit regressions to explore the variables associated with digital connectivity. This method acknowledges the fractional nature of the dependent variable, can be employed for both discrete and continuous variables, and can handle the extreme values of 0 and 1 without manipulating the data (Papke and Wooldridge, 1996; Baum, 2008; Mullahy, 2010). Moreover, fractional logit models can capture non-linear relationships, particularly when the outcome variable is near 0 or 1 (Ramalho and others, 2011). The model is described in Appendix VIII.
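The fractional logit of Papke and Wooldridge (1996) is estimated by maximizing a Bernoulli quasi-log-likelihood with a logistic link, so fractional outcomes, including exact 0s and 1s, need no transformation. The sketch below does this with Newton-Raphson. It is not the authors’ code: it assumes the EDAI would first be divided by 100 to lie in [0, 1] (the text does not spell this out), it omits the robust (sandwich) standard errors normally used for inference, and all function names and data are hypothetical. In R, `glm()` with `family = quasibinomial(link = "logit")` is a common way to fit the same model.

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def solve_linear(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting (small dense systems)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fractional_logit(X, y, max_iter=50, tol=1e-10):
    """Bernoulli quasi-MLE with a logit link; each row of X starts with 1.0 for the intercept."""
    if any(not 0.0 <= v <= 1.0 for v in y):
        raise ValueError("the dependent variable must lie in [0, 1]")
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(max_iter):
        p = [logistic(sum(b * x for b, x in zip(beta, row))) for row in X]
        # Score X'(y - p) and information X' diag(p(1 - p)) X of the quasi-log-likelihood.
        score = [sum(row[j] * (yi - pi) for row, yi, pi in zip(X, y, p)) for j in range(k)]
        info = [[sum(row[i] * row[j] * pi * (1.0 - pi) for row, pi in zip(X, p)) for j in range(k)]
                for i in range(k)]
        step = solve_linear(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


# Hypothetical data whose conditional mean follows a logit curve exactly.
true_beta = (-0.5, 0.8)
xs = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
X = [[1.0, x] for x in xs]
y = [logistic(true_beta[0] + true_beta[1] * x) for x in xs]
beta_hat = fractional_logit(X, y)
print([round(b, 4) for b in beta_hat])  # → [-0.5, 0.8]
```

Because the outcome enters only through $y - p$, observations at exactly 0 or 1 pose no problem, unlike a linear regression on the log-odds of $y$.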

31. We implement stepwise regressions in light of the large number of potential explanatory variables. The methodology iteratively adds and removes regressors to find the subset of variables that yields the best-performing model, defined as the model that minimizes the quasi-Akaike Information Criterion. Stepwise regression is useful for high-dimensional data containing many predictor variables. Alternative methods were also considered, such as penalized regression (ridge regression, lasso regression, elastic net) and principal-component-based regression methods (Principal Component Regression, PCR, and Partial Least Squares, PLS). However, penalized regression methods can select variables that are correlated with each other, which may reduce interpretability (Takada and others, 2018). Principal-component-based methods can be an effective tool for reducing dimensionality in problems where many variables are measured, particularly when there are strong linear relationships among the variables. However, interpreting the principal components requires sifting through the coefficients (or loadings) of the linear combinations to identify patterns. This can be quite challenging in problems with many variables, which is precisely when principal components are most helpful (Chipman and Gu, 2005). Thus, to facilitate interpretation, we implement stepwise fractional logit regressions.20
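The add-and-remove search described above can be sketched generically: at each step, evaluate every single-variable addition and removal and take the move that lowers the criterion most, stopping when no move helps. In the paper the criterion would be the quasi-AIC of a fractional logit refitted on each candidate subset; to keep the sketch self-contained, `toy_qaic` below is a made-up additive stand-in (a fit term plus a penalty of 2 per variable), and the variable names are hypothetical.

```python
def stepwise_select(candidates, criterion, start=()):
    """Bidirectional stepwise search minimizing criterion(subset).

    At each step, evaluate every single-variable addition and removal and take the
    move that lowers the criterion most; stop when no move improves it.
    """
    current = frozenset(start)
    current_score = criterion(current)
    while True:
        moves = [current | {v} for v in candidates if v not in current]
        moves += [current - {v} for v in current]
        best_move, best_score = None, current_score
        for move in sorted(moves, key=sorted):  # deterministic order for tie-breaking
            score = criterion(move)
            if score < best_score:
                best_move, best_score = move, score
        if best_move is None:
            return current, current_score
        current, current_score = best_move, best_score


# Hypothetical stand-in for a quasi-AIC: fit improves with "useful" variables, minus 2 per variable.
FIT_GAIN = {"urbanization": 10.0, "account_ownership": 6.0, "noise_a": 1.0, "noise_b": 0.5}


def toy_qaic(subset):
    return 100.0 - sum(FIT_GAIN[v] for v in subset) + 2.0 * len(subset)


selected, score = stepwise_select(FIT_GAIN, toy_qaic)
print(sorted(selected), score)  # → ['account_ownership', 'urbanization'] 88.0
```

Variables whose contribution to fit is smaller than the complexity penalty are never added, which is how the criterion trims the candidate list.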

32. The selected explanatory variables (127 in total) can be classified into 18 broad thematic groups. They were chosen to provide expansive coverage of potential factors related to digital connectivity. These include indicators related to the ease of doing business and the rule of law, based on prior literature emphasizing the role of national governments in Africa in framing ICT sector policies for investment, privatization, deregulation, and providing access in underserved areas (Sarkar and others, 2015). Furthermore, we include variables related to demographics, employment, education, and health.

Table 2.

Data for Fractional Logit Regression

Sources: Available in Appendix VII and authors’ calculations.

33. As a preliminary step, we checked bivariate correlations, considering the absolute values of pairwise correlations. Whenever two variables were highly correlated (absolute correlation coefficient above 0.9), we calculated the mean absolute correlation of each variable with the others and removed the one with the higher mean absolute correlation.
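The correlation screen just described can be sketched as below: repeatedly take the most strongly correlated remaining pair, stop once it is no longer above the cutoff, and otherwise drop whichever member has the higher mean absolute correlation with the other remaining variables. The description closely resembles the documented behaviour of `findCorrelation()` in R’s caret package, but whether the authors used it is not stated, and the exact processing order here is one reasonable reading. Function names and data are hypothetical.

```python
import math
from itertools import combinations


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (assumes neither is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def drop_highly_correlated(data, cutoff=0.9):
    """data maps variable names to value lists; returns the names kept after the screen."""
    kept = list(data)
    abs_corr = {}
    for a, b in combinations(kept, 2):
        abs_corr[(a, b)] = abs_corr[(b, a)] = abs(pearson(data[a], data[b]))
    while len(kept) > 1:
        # Most strongly correlated remaining pair.
        a, b = max(combinations(kept, 2), key=lambda pair: abs_corr[pair])
        if abs_corr[(a, b)] <= cutoff:
            break
        # Mean absolute correlation of each member of the pair with all other remaining variables.
        mean_abs = {v: sum(abs_corr[(v, w)] for w in kept if w != v) / (len(kept) - 1) for v in (a, b)}
        kept.remove(a if mean_abs[a] >= mean_abs[b] else b)
    return kept


# Hypothetical candidate regressors: x and y are nearly collinear, z is not.
candidates = {
    "x": [1, 2, 3, 4, 5, 6],
    "y": [1, 2, 3, 4, 5, 7],
    "z": [2, 1, 2, 1, 2, 1],
}
print(drop_highly_correlated(candidates))  # → ['x', 'z']
```

Here x and y correlate at about 0.99, and y is dropped because it is also slightly more correlated with z than x is.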

34. We first estimate a stepwise fractional logit model for the full sample (Table 3, column 3).21,22 Minimizing the quasi-Akaike Information Criterion reduces the number of explanatory variables from 110 to 14.23 Similar to Sarkar and others (2015), our empirical results provide evidence in favor of a role for policies in digital connectivity. In the full-sample regression, a better business-enabling and regulatory environment and a higher tax revenue yield are associated with higher digital connectivity, as is a higher share of renewable energy in total energy production. Moreover, higher urban access to electricity, urbanization in general, and lower dependency on remittances seem to matter for digital connectivity, controlling for income per capita.24 Private consumption is also positively related to digital connectivity, likely capturing affordability and the availability of smartphones and personal computers for accessing the internet. Finally, ICT adoption seems to be related to the overall level of development, proxied by the share of remittances in GDP, access to electricity, and renewable energy consumption. The estimated logit coefficients can be transformed into changes in odds. Assuming away potential endogeneity, for the full sample, a decrease of 1 percentage point in the share of rural population leads to an odds increase of exp(-0.656) – 1, i.e., 0.48, in digital connectivity. The average share of rural population is about 40 percent in the world, compared with 21 percent in AEs. If the world average were to halve to 20 percent, i.e., a 20 percentage point decline, the odds of higher digital connectivity would rise by 0.09.
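For reference, the standard transformation behind these odds statements can be sketched as follows: with a logit link, moving a regressor by $\Delta x$ multiplies the odds of the mean outcome by $\exp(\beta \Delta x)$, so the proportional change in odds is $\exp(\beta \Delta x) - 1$. The coefficient used below is hypothetical and is not taken from Table 3; the function name `odds_change` is illustrative.

```python
import math


def odds_change(beta, delta_x):
    """Proportional change in the odds of the mean outcome when a regressor moves by delta_x (logit link)."""
    return math.exp(beta * delta_x) - 1


beta_example = -0.05  # hypothetical coefficient, not a value from Table 3
one_step = odds_change(beta_example, -1)       # regressor falls by one unit
twenty_steps = odds_change(beta_example, -20)  # regressor falls by twenty units
print(round(one_step, 4), round(twenty_steps, 4))  # → 0.0513 1.7183
```

Because the effect works multiplicatively on the odds, a 20-unit change corresponds to $\exp(20\beta) - 1$ rather than twenty times the one-unit effect.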

Table 3.

Fractional Logit Regression Results

Note: *p<0.1; **p<0.05; ***p<0.01. The coefficients report changes in the odds ratio: values greater than 0 indicate an increase in the odds ratio relative to the unconditional odds. Standard errors are reported in parentheses. All regressions control for per capita GDP in PPP terms; the world regression additionally includes regional dummies. Variable definitions are provided in Appendix VII.

35. The fractional logit regressions estimated for income-based country groupings reveal heterogeneity in the drivers of digital connectivity. The results for AEs (column 4) show that only account ownership is significantly related to digital connectivity, pointing to reforms that enhance financial access in AEs, such as further promotion of the FinTech industry or full digital financial transformation through mobile banking apps, mobile money, or e-wallets. Conversely, the estimation results for MICs (column 5) emphasize the positive association with a better regulatory and business enabling environment, better logistics, and higher tax revenue capacity. Finally, the regression results for LIDCs (column 6) underscore the importance of higher electricity access in cities, as well as improved financial access and a business facilitating environment, for higher digital connectivity. These results warrant further exploration. As mentioned earlier, lower dependence on remittances and higher private consumption expenditure could reflect better affordability in the form of access to smartphones and personal computers for internet usage.

36. Results for the SSA-specific sample (column 7) further underscore the importance of a better business enabling and regulatory environment, financial access, and urbanization. This includes leveling the playing field for female entrepreneurs, investing in better government services and in people's health, and providing a better regulatory environment. Indeed, controlling for income per capita, a higher percentage of the population without postal services and a lack of health regulations are associated with lower digital connectivity. Assuming away potential endogeneity, for the SSA sample a decrease of 1 percentage point in the share of rural population is associated with a 0.77 increase in the odds of higher digital connectivity. The average share of rural population in SSA is about 57 percent, compared with 21 percent in AEs. If the SSA average fell to 21 percent, i.e., a 36 percentage point decline, the odds of higher digital connectivity would rise by 0.17. Similarly, an increase of 1 percentage point in financial access, as measured by account ownership, is associated with a 3.1 increase in the odds of higher digital connectivity. Average account ownership is 41 percent in SSA compared with 95 percent in AEs. Hence, if SSA account ownership were to improve to AE levels, the odds of higher digital connectivity would rise by 1.67.

37. The differences between the regression results for LIDCs and SSA require further investigation. Access to electricity and property registration are significant for LIDCs but, surprisingly, not for SSA. Conversely, health regulation capacity, the percentage of rural population, and the population without postal services are significant for SSA but not for LIDCs. This discrepancy could reflect, for instance, the inclusion of MICs in the SSA sample or geographic differences among LIDCs not captured by the dummy variables.

38. Results from the logit regression using only CPIA indicators for SSA point to the importance of responsive governance. Finally, we estimate fractional logit regressions for the SSA sample using the Country Policy and Institutional Assessment (CPIA) indicators (see Appendix VII for the 16 indicators), controlling for per capita income, to assess the variation in the EDAI across SSA. Among the 16 CPIA indicators, only the CPIA environmental sustainability rating survives the stepwise regression estimation (Table 4).25 With the exception of oil exporters, the environmental sustainability rating is robustly related to digital connectivity. This result likely reflects the importance of responsive governance by the authorities and needs to be explored further.
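Paragraphs 34 and 38 rely on stepwise selection by a quasi-Akaike Information Criterion (QAIC), but the paper does not spell out the search direction or the dispersion correction. The Python sketch below is therefore only one plausible variant, with hypothetical function names: forward selection starting from a constant-only model, a fractional logit fitted by Newton's method, and QAIC = -2ℓ/ĉ + 2k with the dispersion factor ĉ assumed to equal 1 (in which case QAIC coincides with AIC computed on the Bernoulli quasi-log-likelihood). The controls that the paper always retains, such as per capita income, are omitted for brevity.

```python
import numpy as np


def quasi_loglik(X: np.ndarray, y: np.ndarray, max_iter: int = 100, tol: float = 1e-10) -> float:
    """Fit a fractional logit by Newton's method and return the maximized
    Bernoulli quasi-log-likelihood."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
        info = X.T @ (X * (mu * (1.0 - mu))[:, None])  # minus the Hessian
        step = np.linalg.solve(info, X.T @ (y - mu))    # Newton step from the score
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    mu = np.clip(1.0 / (1.0 + np.exp(-(X @ beta))), 1e-12, 1.0 - 1e-12)
    return float(np.sum(y * np.log(mu) + (1.0 - y) * np.log(1.0 - mu)))


def qaic(loglik: float, k: int, c_hat: float = 1.0) -> float:
    """QAIC = -2 * loglik / c_hat + 2k (equal to AIC when c_hat = 1)."""
    return -2.0 * loglik / c_hat + 2.0 * k


def forward_stepwise(Z: np.ndarray, y: np.ndarray, names: list) -> list:
    """Add, one at a time, the candidate column of Z that lowers QAIC the most;
    stop when no remaining candidate lowers it."""
    X = np.ones((len(y), 1))  # constant-only starting model
    best = qaic(quasi_loglik(X, y), k=1)
    selected = []
    while len(selected) < Z.shape[1]:
        scores = {
            j: qaic(quasi_loglik(np.column_stack([X, Z[:, j]]), y), k=X.shape[1] + 1)
            for j in range(Z.shape[1]) if j not in selected
        }
        j_best = min(scores, key=scores.get)
        if scores[j_best] >= best:
            break
        selected.append(j_best)
        X = np.column_stack([X, Z[:, j_best]])
        best = scores[j_best]
    return [names[j] for j in selected]


# Usage: the fractional outcome depends on z0 only.
rng = np.random.default_rng(0)
Z = rng.normal(size=(300, 3))
y = 1.0 / (1.0 + np.exp(-(0.2 + 1.5 * Z[:, 0])))
print(forward_stepwise(Z, y, ["z0", "z1", "z2"]))
```

A backward or bidirectional search, or an estimated ĉ > 1, would be equally consistent with the description in the text and can select a different set of regressors.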

Table 4.

Fractional Logit Regression Results for CPIA in SSA

Note: *p<0.1; **p<0.05; ***p<0.01. The coefficients report changes in the odds ratio: values greater than 0 indicate an increase in the odds ratio relative to the unconditional odds. Standard errors are reported in parentheses. The regression controls for per capita GDP in PPP terms.

V. Conclusion

39. Digital connectivity is a key policy area to promote job creation and yield dramatic improvements in living conditions. This is an especially relevant concern for SSA, given that the region needs to generate 20 million jobs per year over the next two decades. In this paper, we contribute to this debate by creating a global index of digital connectivity (EDAI), using more recent data and better-fitting methodologies, to assess the current state of digital connectivity in SSA from a comparative perspective and to identify its main drivers. Policymakers can use the EDAI to assess how prepared their countries are for the Fourth Industrial Revolution.

40. Our results indicate the existence of a global digital divide and a substantial lag in connectivity in SSA. Specifically, we find:

  • Evidence in favor of a global digital divide, obtained by clustering countries into three main groups;

  • Significant heterogeneity in digital connectivity across different analytical country groupings based on income and geography;

  • Descriptive analyses based on the EDAI suggest that the majority of SSA countries lag in digital connectivity; the exceptions include MICs such as Botswana, Cabo Verde, Gabon, Lesotho, Mauritius, Seychelles, and South Africa, and LIDCs such as Ghana and Rwanda;

  • Among the five dimensions, SSA countries on average perform well in affordability and quality, but lag in infrastructure, internet usage, and knowledge;

  • Fractional logit regressions underscore the importance of the business enabling and regulatory environment for improved digital connectivity. Higher urbanization, greater financial access, higher shares of investment and private consumption, and a higher share of renewable energy are also associated with higher digital connectivity; and

  • Estimation results for SSA indicate that a better business enabling and regulatory environment, financial access, urbanization, and the availability of postal services are associated with higher digital connectivity. Specifically, we find that leveling the playing field for female entrepreneurs and reducing property registration costs are associated with higher digital connectivity.

41. Finally, we acknowledge that the channels through which these variables affect connectivity, as well as ways to address endogeneity concerns, need to be explored in further research. As data availability improves, the EDAI could also support multi-year analysis, which would be a valuable extension.

References

  • Abdychev, A., Alonso, C., Alper, E., Desruelle, D., Kothari, S., Liu, Y., Perinet, M., Rehman, S., Schimmelpfennig, A., Sharma, P., 2018. The Future of Work in Sub-Saharan Africa, International Monetary Fund African Department Paper, No. 18/18, Washington, D.C.

  • Baum, C. F., 2008. Stata tip 63: Modeling Proportions. The Stata Journal, 8(2), pp. 299–303.

  • Chinn, M. D. and Ito, H., 2006. What Matters for Financial Development? Capital Controls, Institutions, and Interactions. Journal of Development Economics, 81(1), pp. 163–192.

  • Chipman, H. A. and Gu, H., 2005. Interpretable Dimension Reduction. Journal of Applied Statistics, 32(9), pp. 969–987.

  • Cullen, R., 2001. Addressing the Digital Divide. Online Information Review, 25(5), pp. 311–320.

  • De Muro, P., Mazziotta, M. and Pareto, A., 2011. Composite Indices of Development and Poverty: An Application to MDGs. Social Indicators Research, 104(1), pp. 1–18.

  • Grubesic, T. H. and Murray, A. T., 2005. Geographies of Imperfection in Telecommunication Analysis. Telecommunications Policy, 29(1), pp. 69–94.

  • Gruss, B., and Kebhaj, S., 2019. Commodity Terms of Trade: A New Database, IMF WP/19/21, Washington, D.C.

  • Hardin, J. W. and Hilbe, J. M., 2007. Generalized Linear Models and Extensions. Stata Press.

  • Hjort, J. and Poulsen, J., 2019. The Arrival of Fast Internet and Employment in Africa. American Economic Review, 109(3), pp. 1032–1079.

  • International Monetary Fund, 2018. Regional Economic Outlook. Sub-Saharan Africa: Capital Flows and the Future of Work. Chapter 3. The Future of Work in Sub-Saharan Africa, Washington, D. C., October.

  • International Monetary Fund, 2019a. World Economic Outlook: Growth Slowdown, Precarious Recovery. Statistical Appendix, Washington, D. C., April.

  • International Monetary Fund, 2019b. Fiscal Monitor: Curbing Corruption. Methodological and Statistical Appendix, Washington, D. C., April.

  • International Monetary Fund, 2019c. Regional Economic Outlook: Sub-Saharan Africa: Recovery Amid Elevated Uncertainty. Background Paper and Expanded Statistical Appendix, Washington, D. C., April.

  • International Telecommunication Union (ITU), 2003. ITU World Telecommunication Development Report: Access Indicators for the Information Society, Digital Access Index.

  • International Telecommunication Union (ITU), 2017. ICT Development Index (IDI).

  • James, G., Witten, D., Hastie, T. and Tibshirani, R., 2013. An Introduction to Statistical Learning. Springer.

  • Katz, R., Koutroumpis, P. and Martin Callorda, F., 2014. Using a Digitization Index to Measure the Economic and Social Impact of Digital Agendas. Info, 16(1), pp. 32–44.

  • Lenert, M.C. and Walsh, C. G., 2018. Balancing Performance and Interpretability: Selecting Features with Bootstrapped Ridge Regression. In AMIA Annual Symposium Proceedings (Vol. 2018, p. 1377). American Medical Informatics Association.

  • Mullahy, J., 2015. Multivariate Fractional Regression Estimation of Econometric Share Models. Journal of Econometric Methods, 4(1), pp. 71–100.

  • Norris, P., 2001. Digital Divide: Civic Engagement, Information Poverty, and the Internet Worldwide. Cambridge University Press.

  • Onyeiwu, S., 2002. Inter-Country Variations in Digital Technology in Africa: Evidence, Determinants, and Policy Implications (No. 2002/72). WIDER Discussion Papers//World Institute for Development Economics (UNU-WIDER).

  • Papke, L. E. and Wooldridge, J. M., 1996. Econometric Methods for Fractional Response Variables with an Application to 401(k) Plan Participation Rates. Journal of Applied Econometrics, 11(6), pp. 619–632.

  • Ramalho, E. A., Ramalho, J. J. and Murteira, J. M., 2011. Alternative Estimating and Testing Empirical Strategies for Fractional Regression Models. Journal of Economic Surveys, 25(1), pp. 19–68.

  • Rousseeuw, P. J., 1987. Silhouettes: A Graphical Aid to the Interpretation and Validation of Cluster Analysis. Journal of Computational and Applied Mathematics, 20, pp. 53–65.

  • Sarkar, A., Pick, J.B. and Johnson, J., 2015. Africa’s Digital Divide: Geography, Policy, and Implications.

  • Takada, M., Suzuki, T. and Fujisawa, H., 2017. Independently Interpretable Lasso: A New Regularizer for Sparse Regression with Uncorrelated Variables. ArXiv Preprint ArXiv:1711.01796.

  • Thorndike, R. L., 1953. Who Belongs in the Family? Psychometrika, 18(4), pp. 267–276.

  • Tibshirani, R., Walther, G. and Hastie, T., 2001. Estimating the Number of Clusters in a Data Set via the Gap Statistic. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 63(2), pp. 411–423.

  • United Nations, 2019. Global Indicator Framework for the Sustainable Development Goals and Targets of the 2030 Agenda for Sustainable Development.

  • World Bank Group, 2016. World Development Report 2016: Digital Dividends. World Bank Publications.

  • World Bank Group, 2017. Doing Business report 2017, Distance to Frontier and Ease of Doing Business Ranking, World Bank Publications.

  • World Bank Group, 2019. Doing Business Report 2019: Training for Reform. World Bank Publications.

Appendix I – Variables used in Enhanced Digital Access Index (EDAI)

Note: Selection of these variables is based on the following criterion: at least one observation for each variable should be available during 2014–17 for all countries. When a given economy has more than one observation for a given variable, the latest data point is selected. Most of the observations date from 2016–17.

Appendix II – Country groupings

Global groupings by income (Fiscal Monitor) 26

Advanced Economies

AUS, AUT, BEL, CAN, CHE, CYP, CZE, DNK, ESP, EST, FIN, FRA, DEU, GBR, GRC, IRL, ISL, ISR, ITA, JPN, KOR, LVA, LTU, LUX, MCO, MLT, NLD, NOR, NZL, PRT, SGP, SVK, SVN, SWE, USA

Emerging and Middle-Income Countries

AGO, ARE, ARG, AZE, BGR, BLR, BRA, BWA, CHL, CHN, CIV, COG, COL, CPV, DZA, DOM, ECU, EGY, GAB, GNQ, HRV, HUN, IDN, IND, IRN, KAZ, KWT, LBY, LKA, LSO, MAR, MEX, MUS, MYS, NAM, OMN, PAK, PER, PHL, POL, QAT, ROU, RUS, SAU, SMR, SRB, STP, SWZ, SYC, THA, TUR, UKR, URY, VEN, ZAF

Low-Income Developing Countries

AFG, ALB, AND, ARM, ATG, BEN, BDI, BFA, BGD, BHR, BHS, BIH, BLZ, BOL, BRB, BRN, BTN, CMR, CAF, COD, COM, CRI, CUB, DJI, DMA, ERI, ETH, FJI, FSM, GEO, GHA, GIN, GMB, GNB, GRD, GTM, GUY, HND, HTI, IRQ, JAM, JOR, KEN, KGZ, KHM, KIR, KNA, LAO, LBN, LBR, LCA, LIE, MDA, MDG, MDV, MHL, MKD, MOZ, MLI, MNE, MNG, MMR, MRT, MWI, NER, NGA, NIC, NPL, NRU, PAN, PLW, PNG, PRK, PRY, RWA, SEN, SLE, SLB, SLV, SOM, SDN, SSD, SUR, SYR, TCD, TGO, TJK, TKM, TLS, TON, TTO, TUN, TUV, TZA, UGA, UZB, VCT, VNM, VUT, WSM, YEM, ZMB, ZWE

Sub-Saharan Africa (SSA) groupings (Sub-Saharan African Regional Economic Outlook) 27

Oil-exporting countries (SSA)

AGO, CMR, COG, GAB, GNQ, NGA, SSD, TCD

Other resource-intensive exporters (SSA)

BWA, BFA, GHA, NAM, NER, SLE, SOM, TZA, ZAF, ZMB

Non-resource-intensive exporters (SSA)

BEN, BDI, CIV, COM, CPV, ERI, ETH, GMB, GNB, KEN, LSO, MDG, MOZ, MUS, MWI, RWA, SEN, STP, SWZ, SYC, TGO, UGA

Countries in Fragile situations (SSA)

BDI, CAF, COM, COG, CIV, COD, ERI, GIN, GMB, GNB, LBR, MWI, MLI, SSD, STP, TCD, TGO, ZWE

APPENDIX III – K-means algorithm

  • Let $C_1, C_2, \ldots, C_K$ denote the sets of indices of the observations in each cluster, such that:

  • ✓ Each observation belongs to at least one of the $K$ clusters:
    $C_1 \cup C_2 \cup \cdots \cup C_K = \{1, \ldots, n\}$
  • ✓ The clusters are not overlapping:
    $C_k \cap C_{k'} = \emptyset$ for all $k \neq k'$
  • K-means clustering aims to minimize the within-cluster variation:
    $\min_{C_1, \ldots, C_K} \left\{ \sum_{k=1}^{K} W(C_k) \right\}$ (1)
  • where the within-cluster variation is defined through the squared Euclidean distance:
    $W(C_k) = \frac{1}{|C_k|} \sum_{i, i' \in C_k} \sum_{j=1}^{p} (x_{ij} - x_{i'j})^2$ (2)
  • with $|C_k|$ denoting the cardinality of the set, i.e., the number of observations in the $k$th cluster.

  • The algorithm for solving this problem can be described as follows:

Algorithm 1. K-means clustering

1. Randomly assign a number, from 1 to K, to each of the observations, as the initial cluster assignment.

2. Iterate until the cluster assignments stop changing:

a) For each of the K clusters, compute the cluster centroid (the vector of the p feature means for the observations in the kth cluster).

b) Assign each observation to the cluster whose centroid is closest (where closest is defined using Euclidean distance).
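Algorithm 1 translates almost line by line into code. The Python/NumPy sketch below is illustrative, not the authors' implementation: the function names are hypothetical, an empty cluster is re-seeded at a random observation (a detail Algorithm 1 leaves open), and, because K-means only reaches a local optimum, the algorithm is run from several random starts and the solution with the lowest value of objective (1) is kept, as is common practice (James and others, 2013). The helper `within_cluster_variation` implements equation (2); it equals twice the sum of squared deviations from the cluster centroid, so minimizing (1) is equivalent to minimizing the familiar within-cluster sum of squares.

```python
import numpy as np


def within_cluster_variation(Xk: np.ndarray) -> float:
    """W(C_k) in equation (2): sum of pairwise squared Euclidean distances
    within the cluster, divided by the cluster size |C_k|."""
    diffs = Xk[:, None, :] - Xk[None, :, :]
    return float((diffs ** 2).sum() / len(Xk))


def kmeans_once(X: np.ndarray, K: int, rng: np.random.Generator, max_iter: int = 100):
    n = X.shape[0]
    # Step 1: randomly assign each observation to one of the K clusters.
    labels = rng.integers(0, K, size=n)
    for _ in range(max_iter):
        # Step 2(a): each centroid is the vector of feature means of its cluster.
        # An empty cluster is re-seeded at a random observation (our assumption).
        centroids = np.array([
            X[labels == k].mean(axis=0) if np.any(labels == k) else X[rng.integers(n)]
            for k in range(K)
        ])
        # Step 2(b): reassign each observation to the closest centroid.
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)
        if np.array_equal(new_labels, labels):  # assignments stopped changing
            break
        labels = new_labels
    objective = sum(within_cluster_variation(X[labels == k])
                    for k in range(K) if np.any(labels == k))
    return labels, objective


def kmeans(X: np.ndarray, K: int, n_init: int = 10, seed: int = 0) -> np.ndarray:
    """Run Algorithm 1 from n_init random starts; keep the lowest objective (1)."""
    rng = np.random.default_rng(seed)
    best_labels, best_obj = None, np.inf
    for _ in range(n_init):
        labels, obj = kmeans_once(X, K, rng)
        if obj < best_obj:
            best_labels, best_obj = labels, obj
    return best_labels


# Usage: two well-separated groups of 50 observations each.
rng = np.random.default_rng(1)
X = np.concatenate([rng.normal(0.0, 0.5, 50), rng.normal(10.0, 0.5, 50)]).reshape(-1, 1)
labels = kmeans(X, K=2)
```

In applications such as the country clustering in the main text, the indicators would normally be standardized first so that no single variable dominates the Euclidean distances.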

APPENDIX IV – K-means clustering country groupings


APPENDIX V – Digital Access Index (DAI) weighting scheme

Infrastructure

Fixed telephone subscribers per 100 inhabitants: 10%

Mobile cellular subscribers per 100 inhabitants: 10%

Affordability

Internet access price as percent of GNI x 100: 20%

Knowledge

Adult Literacy: 13%

Combined primary, secondary, and tertiary school enrollment: 7%

Quality

International Internet bandwidth per capita: 10%

Broadband subscribers per 1000 inhabitants: 10%

Usage

Internet users per 100 inhabitants: 20%
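The weighting scheme above amounts to a weighted sum of eight indicators, with weights summing to 100 percent. The Python sketch below assumes that each indicator has already been rescaled to [0, 1] with 1 the best outcome (so the price indicator must enter inverted); the rescaling goalposts used by ITU (2003) are not reproduced here, and the dictionary keys and the function name `dai_score` are hypothetical.

```python
# Weights from the DAI weighting scheme above (ITU, 2003), by category.
DAI_WEIGHTS = {
    "fixed_telephone_per_100": 0.10,    # Infrastructure
    "mobile_cellular_per_100": 0.10,    # Infrastructure
    "internet_affordability": 0.20,     # Affordability
    "adult_literacy": 0.13,             # Knowledge
    "school_enrollment": 0.07,          # Knowledge
    "intl_bandwidth_per_capita": 0.10,  # Quality
    "broadband_per_1000": 0.10,         # Quality
    "internet_users_per_100": 0.20,     # Usage
}


def dai_score(normalized: dict) -> float:
    """Weighted sum of indicators already rescaled to [0, 1], 1 being best."""
    missing = set(DAI_WEIGHTS) - set(normalized)
    if missing:
        raise KeyError(f"missing indicators: {sorted(missing)}")
    for name in DAI_WEIGHTS:
        if not 0.0 <= normalized[name] <= 1.0:
            raise ValueError(f"{name} must be rescaled to [0, 1]")
    return sum(w * normalized[name] for name, w in DAI_WEIGHTS.items())


# Usage with hypothetical, already-normalized values for one country.
example = {name: 0.5 for name in DAI_WEIGHTS}
example["internet_users_per_100"] = 0.9
print(round(dai_score(example), 2))  # 0.58
```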

Appendix VI – Enhanced DAI (EDAI) and components


Appendix VII. Variables used in fractional logit regression

Note: With the exception of Balance of Payments and Macro Indicators (Chinn-Ito, IMF); Climate and Health (UN); and Corruption, transparency and country risk (BTI Project, Transparency International, IHS Global Insight, PRS Group, Economist Intelligence Unit, V-Dem, World Economic Forum), all other categories of variables are from the World Bank, mostly from the World Development Indicators.

Appendix VIII. Fractional logit regression

The model proposed by Papke and Wooldridge (1996) has the following structure:

$E(y \mid X) = G(X\beta)$

where G(·) is a known function (the inverse of the link function) satisfying G(·) ∈ [0,1], X represents a set of explanatory variables, and y is the dependent variable, a fraction in [0,1]. G(·) guarantees that the predicted values lie in this interval. In this paper, we use the logistic function:

$G(z) = \frac{\exp(z)}{1 + \exp(z)}$

Generalized linear models (GLMs) are usually fitted with maximum-likelihood algorithms (Hardin and Hilbe, 2007). Papke and Wooldridge (1996), however, propose a quasi-maximum likelihood method, which maximizes the following Bernoulli log-likelihood function:

$\ell(\beta) = \sum_{i=1}^{n} \left\{ y_i \log\left[G(X_i\beta)\right] + (1 - y_i) \log\left[1 - G(X_i\beta)\right] \right\}$
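The estimator above can be sketched in a few lines of NumPy. This is an illustrative implementation under stated assumptions, not the code used for Tables 3 and 4: the function name `fractional_logit` is hypothetical, the quasi-log-likelihood is maximized by Newton's method, and standard errors use the robust (sandwich) form, which remains valid when the Bernoulli variance assumption does not hold for a fractional outcome.

```python
import numpy as np


def fractional_logit(X: np.ndarray, y: np.ndarray, max_iter: int = 100, tol: float = 1e-10):
    """Bernoulli quasi-maximum likelihood with a logistic G(.), fitted by
    Newton's method. X should include a constant column; y must lie in [0, 1].
    Returns the coefficients and robust (sandwich) standard errors."""
    y = np.asarray(y, dtype=float)
    if np.any((y < 0.0) | (y > 1.0)):
        raise ValueError("the dependent variable must lie in [0, 1]")
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        mu = 1.0 / (1.0 + np.exp(-(X @ beta)))          # G(X beta)
        score = X.T @ (y - mu)                           # gradient of the quasi-log-likelihood
        info = X.T @ (X * (mu * (1.0 - mu))[:, None])    # minus its Hessian
        step = np.linalg.solve(info, score)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    mu = 1.0 / (1.0 + np.exp(-(X @ beta)))
    # Sandwich covariance A^{-1} B A^{-1}.
    A_inv = np.linalg.inv(X.T @ (X * (mu * (1.0 - mu))[:, None]))
    B = X.T @ (X * ((y - mu) ** 2)[:, None])
    se = np.sqrt(np.diag(A_inv @ B @ A_inv))
    return beta, se


# Usage on simulated fractional data (hypothetical coefficients).
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(500), rng.normal(size=(500, 2))])
beta_true = np.array([-0.5, 1.0, -2.0])
y = np.clip(1.0 / (1.0 + np.exp(-(X @ beta_true))) + rng.normal(scale=0.05, size=500), 0.0, 1.0)
beta_hat, se = fractional_logit(X, y)
```

In Stata, the same estimator is typically obtained with `fracreg logit`, or with `glm` using `family(binomial) link(logit) vce(robust)` along the lines of Baum (2008).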
1

The authors wish to express their gratitude to Aidar Abdychev, Ben Clements, Jean Philippe Gillet, Clement Ncuti, Laure Redifer, Axel Schimmelpfennig, Murat Yavuz, and participants of the IMF seminar on July 18, 2019 for useful comments and suggestions.

2

Among others, see Hjort and Poulsen (2019).

3

The term digital divide or digital split is attributed to Norris (2001). Drivers of the digital divide include socio-economic, geographical, educational, attitudinal, and generational factors, as well as physical disabilities (Cullen, 2001).

4

We also considered hierarchical clustering, which yielded similar results. Specifically, we implemented Divisive Hierarchical Clustering because of its superior properties in identifying relatively large clusters. We prefer the K-means algorithm because hierarchical clustering methods require arbitrary choices of both the distance metric and the linkage criterion, have a time complexity of at least $O(n^2 \log n)$, where n is the number of data points, and are sensitive to noise and outliers.

5

The elbow method is a visual approach that chooses the number of clusters such that adding another cluster does not substantially increase the share of total variance explained (the ratio of between-group to total variance). The average silhouette technique is another visual approach, which aims to maximize within-cluster similarity while simultaneously maximizing across-cluster dissimilarity. Finally, the gap statistic is a statistical method that compares the total within-cluster variation for different numbers of clusters with its expected value under a null reference distribution of the data, selecting the number of clusters that maximizes the gap.

6

The assigned colors in Figure 1 do not represent any intensity or qualitative nature of the groupings.

7

Intuitively, PCA can be regarded as a statistical procedure to reveal the internal structure of the data in a way that best explains the variation in the data within a multivariate context. It provides a lower-dimensional picture by considering only the first few principal components to reduce the dimensionality of the transformed data.

8

The first component can be viewed as the linear transformation of the indicators that explains the largest variation. The second component is also a linear transformation, orthogonal to the first, explaining the largest portion of the residual variation.

9

An alternative index with the same acronym is the Digital Adoption Index of World Bank (2016). This index is based on three sectoral sub-indices covering businesses, people, and governments, with each sub-index assigned an equal weight: DAI(Economy) = [DAI(Businesses) + DAI(People) + DAI(Governments)]/3. Each sub-index is the simple average of several normalized indicators measuring the adoption rate for the relevant group.

10

This implies that for any country, each lagging value of any sub-index would act as a bottleneck and therefore reduce the EDAI.

11

It is important to make the synthesis independent of an “ideal unit”, because a set of “optimal values” is arbitrary, non-univocal, and may vary over time. See De Muro and others (2011).

12

This is in contrast to the key findings of the Inclusive Internet Index, commissioned by Facebook and conducted by the Economist Intelligence Unit that finds the digital divide to be widening at the bottom of the income pyramid.

13

The values of the DAI for the world, normalized through the same transformation as in the EDAI derivation and then averaged using GDP per capita as a share of the world total as weights, were compared with the analogous outcomes for the EDAI: $\text{DAI}_{\text{world}} = 69.8$, $\text{DAI}_{\text{SSA}} = 24.7$, $\text{EDAI}_{\text{world}} = 85.1$, $\text{EDAI}_{\text{SSA}} = 67.8$. For more information on the distance-to-frontier approach, see World Bank Group (2017), Distance to Frontier and Ease of Doing Business Ranking.

14

These factors motivate the use of income and regional dummies as control variables in the fractional logit regressions (see Section IV).

15

Income-based country groupings are from IMF (2019b) and are listed in Appendix II.

16

These are the nine countries from SSA that have EDAI values above 70. The median for the world is 78 and there are four countries from SSA above the median: Cabo Verde, Mauritius, Seychelles, and South Africa.

17

Figure 7 shows that the cost of digital connectivity in SSA is similar to that in the rest of the world. However, relative to per capita income, affordability is an issue for SSA. Indeed, Abdychev and others (2018) and IMF (2019c) note that the cost of a fixed broadband connection is the highest in sub-Saharan Africa compared with other regions. The “Affordability” sub-index used in the aggregation of the EDAI goes beyond broadband costs and internet penetration and also includes factors such as the price of SMS and the price per minute of a peak-rate call (see Appendix I).

18

This was first noted for SSA countries by Onyeiwu (2002).

19

SSA country classifications are from IMF (2019c).

20

A potential extension would be to implement an interpretable penalized regression method, such as Bootstrapped Ridge Regression (Lenert and Walsh, 2018).

21

In all regressions, PPP GDP per capita is used as a control variable. Additionally, for the world regression, we include 16 regional dummy variables as further controls.

22

Potential endogeneity is an issue that we are not able to address, owing to the lack of a “good” instrument for the purposes of this study and the cross-sectional nature of the data used in the estimations. Accordingly, we refrain from attributing causation and from emphasizing the magnitudes of the coefficients, focusing instead on the strength of the correlations and the signs of the coefficients. With regular data availability over time, panel and distributed lag models would be a valuable extension of further work in this area.

23

We use the CPIA variables (16) only in the SSA regression. Hence, the total number of variables used in the global regression is 110.

24

We acknowledge that remittances as a percentage of GDP, as well as the renewable energy share of total energy consumption, may in fact capture the effect of variables not included in the analysis, for instance by serving as a proxy for the general level of development.

25

Environmental sustainability rating assesses the extent to which environmental policies foster the protection and sustainable use of natural resources and the management of pollution.

26

Classification from the IMF (2019b).

27

Classification from the IMF (2019c).
