Building Statistics and Data Science Capacity for Development

1 September 2022
Eric A. Vance, University of Colorado at Boulder Laboratory for Interdisciplinary Statistical Analysis, and Kim Love, Quantitative Consulting and Collaboration
Editor’s note: This article originally appeared in CHANCE 34.3. The metrics have been updated to reflect 2022.
We consider development, at a broad level, to be sustainable action that positively affects society. We believe three components lead to such development: producing scientific and research innovations; creating jobs for sustainable activities; and implementing effective policies.
Statistics, as a discipline and practice, enables and accelerates all aspects of data-driven research, business, and policy. Data is key to knowledge, and statistics and data science are the bridge to understanding data. Understanding data allows people to make scientifically sound decisions, thereby answering questions, solving problems, and generating development benefits—often at a deeply meaningful level. This is especially relevant for developing countries, which often have pressing needs for which solutions can be guided through both the production and analysis of data.
We are mindful that local challenges require local solutions. Solutions provided by a consultant from a highly developed Western country may not apply in a developing country. The best people to collect or produce data are local experts who understand the context of the local data-collection activity. The people best positioned to analyze that data are local statisticians and data scientists who understand the context of how the data was produced. The best people to craft and implement policy innovations are locals with the lived experience of those to be affected by changes in policy. In other words, it is local researchers, businesses, and policymakers who actually solve local challenges.
In our experience, policymaking and other types of decision-making often ignore the need for rigorous statistical analyses. The simplistic perception is that only two actors are necessary for data-based decision-making: those who collect or produce the data and those who review the data and make policy or other decisions. In many cases, the data producer and the data decision-maker are the same organization or individual, such as a business that collects data on its customers and uses it to make a business decision or an academic researcher who conducts an experiment to answer a scientific question.
As collaborative statisticians who have worked on hundreds of projects and supervised thousands more, we are keenly aware that there are, in fact, at least four important components required to make informed data-driven decisions: the domain expertise required to ask the right questions; high-quality, relevant data; appropriate, nuanced statistical analyses; and the power to make and implement a decision.
Assuming domain expertise exists within data producers and data decision-makers, this requires the addition of a third actor—the statistician or data scientist who possesses specialized skills in data analysis and interpretation. Sustainable development requires all three actors operating at full capacity.
Intersections of Data-Driven Development Actors
Consequently, our strategy for engaging in locally powered, data-driven development starts with acknowledging the following four potential gaps:
Gap 1. Local expertise to frame locally relevant development questions
Gap 2. Consistent production of high-quality data through carefully designed experiments, studies, or surveys at the international, national, and local levels
Gap 3. Technical ability in statistics and data science to design studies or experiments and appropriately model, analyze, and interpret data
Gap 4. Transforming evidence from data into action for development
A New Model for Building Statistics and Data Science Capacity
One of the International Statistical Institute’s four strategic priorities is building statistical capacity in developing countries. Historically, this has focused on building the capacity of developing countries to produce state-sponsored data (e.g., conduct accurate population censuses, consumer price index surveys, etc.).
While the state-sponsored production of data is foundationally important for development, it only addresses one of the potential gaps in data-driven development. Even though the data produced may be of high quality, it might not provide the information or evidence needed to answer the questions raised by the data decision-makers, the data may be interpreted incorrectly, or the data alone may be insufficient for taking action for development.
It is important to keep in mind that working in academic isolation will not create positive development outcomes. To create real-world impact, statisticians must take steps to apply theories and methods to help transform academic evidence into action for the benefit of society. For example, at the University of Calcutta, India, statisticians and data scientists created a model to forecast the solar energy output of solar farms and then collaborated with decision-makers to use the model to optimize the production of sustainable solar energy.
Transforming technical statistical methods into positive action for society requires statisticians and data scientists to be skilled collaborators, methodologists, and analysts. In other words, these statisticians must have the skill to work in the intersections mentioned above in addition to the skills to analyze and interpret data. They must be able to understand the data and projects they are working with deeply and broadly and be able to communicate the results of statistical methods and analytical work in ways that provide actionable evidence to those who can use it to positively affect society.
Our model for building statistics and data science capacity is to create statistics and data science collaboration laboratories (“stat labs”) that work in the intersections of data-driven development by collaborating with data producers and data decision-makers to transform evidence into action. Increased collaborations between statisticians and development actors in developing countries can contribute to improved infrastructure, business development, education, agricultural growth, and human rights issues, among other areas in urgent need of progress. Especially in developing countries, it is essential that local statisticians and data scientists interact with local researchers, businesses, and policymakers to develop sustainable solutions to local challenges.
Stat Labs Are Engines for Development
Stat labs provide a mechanism for increasing collaboration between statisticians and researchers, business professionals, and development policy actors. In our context, these labs are housed within research institutions in developing countries (generally universities) and have the following three main objectives:
Train statisticians to have a collaborative, evidence-to-action mindset
Teach researchers, business professionals, and development policy actors to become more capable of using data and more aware of the power of statistical analysis to inform decisions
Provide a collaborative space for statisticians and data scientists to work with those individuals to create data-driven innovations and solutions leading to widespread development outcomes
Stat labs are not rooms full of computers. Rather, a stat lab is a team of statisticians and data scientists empowered to collaborate with domain experts to ask relevant questions, produce high-quality data, analyze and interpret data to create knowledge and evidence, and transform that evidence into action for development.
We view these stat labs as “engines for development.” Stat labs initially focus on training students, faculty, and staff to become effective interdisciplinary researchers with the technical and collaborative skills necessary to accelerate research and transform it into solutions to development challenges. These stat labs reach out to the local community of researchers, business leaders, government agencies, and nongovernmental organizations to provide training and tailored statistical support. As a result, the community becomes more aware of the ability (and indeed, the necessity) of the stat labs to provide this type of assistance.
As this awareness grows, the community sends more requests to the stat lab for this type of assistance, which in turn provides more opportunities for capacity development within the lab. Initial successes create a positive feedback loop in which more development actors want to collaborate with the stat labs and become “data-capable,” and more statistics students, faculty, and staff want to work in the stat lab.
Stat labs that focus on using projects as opportunities to train students rise to the challenge of building capacity quickly to successfully complete a higher number of projects. When projects come with funding, senior students and faculty can be compensated to both work on projects and mentor junior students. Successful, high-profile projects also engender support for stat labs from administrators, potentially loosening restrictive rules, removing institutional barriers to success, and attracting more students to study statistics and data science. This self-reinforcing cycle is key to the long-term success of stat labs.
Over time, these stat labs output a supply of experienced statisticians with a collaborative, evidence-to-action mindset and a cadre of more capable development actors who recognize the power of statistics and data science to help solve their development challenges. The stat labs also provide a physical and intellectual space for statisticians/data scientists and development actors to collaborate on projects. These data-driven innovations lead to widespread, substantial, and sustainable development impacts, because a well-trained collaborative statistician or data scientist can enable and accelerate 10 or more development projects per year, and those projects can positively affect thousands or more people.
Through all these mechanisms, stat labs not only fulfill the role of data analyzer, but serve as a conduit that leads to development work in the intersections of the various actors. When collaborative statisticians are able to participate in and initiate projects that involve policymakers and decision-makers, they can translate the policymakers’ questions and desired information into quantitative questions that can be answered with data. When collaborative statisticians are trained to work directly with data producers, they can help design experiments or studies that result in high-quality data, which they can then analyze in a way that can be relied upon. When collaborative statisticians can do both, they can ensure data production occurs in a way that is appropriate for answering the questions posed by the decision-makers so they can make recommendations and implement policies. This three-way intersection, for which stat labs can be the catalyst, provides the most desirable setting for data-driven development.
The LISA 2020 Network of Stat Labs
While director of the Laboratory for Interdisciplinary Statistical Analysis (LISA) at Virginia Tech, Eric Vance recognized the power of the LISA stat lab to build the technical and collaborative capacity of his students while collaborating on projects to benefit researchers, businesses, and policymakers. The LISA students were, in fact, learning more statistics and how to translate their technical work into useful results by collaborating on these projects. The more projects they worked on, the more effective they became and the better they were at training and mentoring junior students.
Vance, while traveling to more than 70 countries, also recognized the LISA model—when adapted to local conditions—could help statisticians in developing countries build their own capacity to do similar work to solve local challenges. In 2012, he created the LISA 2020 Network to build statistical analysis and data science capacity in developing countries with a goal of creating a network of at least 20 stat labs by 2020.
In 2016, with five newly established stat labs as members of the LISA 2020 Network, Vance moved LISA from Virginia Tech to the University of Colorado at Boulder. Word of the benefits of the LISA 2020 Network spread widely as Vance collaborated with the International Statistical Institute, with particularly strong uptake and adoption in Africa, South Asia, and Brazil.
As of August 2021, the LISA 2020 Network consisted of 34 stat labs located in 10 low- and middle-income countries. There are 14 additional stat labs in the process of becoming full members of the LISA 2020 Network.
The overall purpose of the network is to facilitate the process of individual stat labs becoming engines for development. This not only allows for faster capacity growth at each lab but also increases the potential for worldwide effects of the stat labs through lab- and country-level collaborations. Stat labs undertake the following seven-step process to become a member of the LISA 2020 Network:
Step 1. Identify a potential director or coordinator and a mentor from within the LISA 2020 Network who will help the potential director or coordinator receive sufficient training and guidance in the needed nontechnical skills.
Step 2. Gather and document support for the stat lab from within the department and across the university in the form of letters from senior university officials.
Step 3. Complete and submit the full lab plan/proposal to become a “proposed member” of the LISA 2020 Network.
Step 4. Respond to a review committee’s feedback on the full lab plan/proposal. If the response to the feedback is satisfactory, the lab will be granted “transitional member” status in the LISA 2020 Network.
Step 5. Open the stat lab: train students and staff; provide research infrastructure for local domain experts; teach short courses/workshops to improve statistical skills and data literacy; report on the stat lab’s activities, outcomes, and impacts (metrics).
Step 6. Stay connected with the network via semi-monthly Zoom meetings, annual symposia, quarterly reports of stat lab activities and numbers, and other channels.
Step 7. Report a full quarter of metrics and present about the lab to the LISA 2020 Network.
Labs typically complete steps 1–4 before Step 5, though some steps may be taken out of order. For example, a stat lab can open before it has completed Step 3. We encourage labs to become connected with the LISA 2020 Network (Step 6) at the early stages of the process. In general, labs that complete steps 1–3 are considered proposed members. Labs that successfully respond to the feedback on the full lab plan/proposal become transitional members. Next, they begin operation of the stat lab, report a full quarter of metrics, and introduce the lab to the LISA 2020 Network. If accepted by the LISA 2020 Network by a two-thirds majority vote, they become full members.
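The membership pathway described above can be sketched as a small status function. This is only an illustration, not the network's actual tooling: the names `Status` and `membership_status` are hypothetical, and it makes three assumptions: that completing Step 4 means the response was judged satisfactory, that Step 6 is ongoing and does not gate any status, and that a "two-thirds majority" means at least two-thirds of the votes cast.

```python
from enum import Enum


class Status(Enum):
    PROSPECTIVE = "prospective"          # hypothetical label for labs before Step 3
    PROPOSED = "proposed member"
    TRANSITIONAL = "transitional member"
    FULL = "full member"


def membership_status(steps_done, yes_votes=0, votes_cast=0):
    """Return the status implied by the set of completed step numbers.

    Assumption: a vote passes when yes_votes / votes_cast >= 2/3.
    Integer arithmetic avoids floating-point rounding at the threshold.
    """
    steps = set(steps_done)
    passed_vote = votes_cast > 0 and 3 * yes_votes >= 2 * votes_cast

    # Full membership: lab is open (5), has reported and presented (7),
    # and has been accepted by the network vote.
    if {1, 2, 3, 4, 5, 7} <= steps and passed_vote:
        return Status.FULL
    # Transitional: satisfactory response to the review committee (4).
    if {1, 2, 3, 4} <= steps:
        return Status.TRANSITIONAL
    # Proposed: full lab plan/proposal submitted (3).
    if {1, 2, 3} <= steps:
        return Status.PROPOSED
    return Status.PROSPECTIVE
```

Because the steps are passed as a set, the sketch tolerates steps taken out of order: a lab that opens early (Step 5) before submitting its plan is still reported as prospective until steps 1–3 are done.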
A further purpose of the LISA 2020 Network is to enable continued connections among the stat labs at various stages of development. This allows them to share progress, learn from one another, and collaborate on projects on an international scale. More established stat labs share their cultivated best practices with newer labs; the new labs then innovate to fit these practices to their particular circumstances and, in turn, share new successes and challenges with the network. The LISA 2020 Network facilitates these exchanges through regular emails and newsletters, twice-monthly Zoom meetings (including more extended presentations by member labs), and annual symposia.
Finally, the network is a united organization that can act as a venue to provide funding and connect stat labs to decision-makers and policymakers for data-driven development. As an example of the utility of the network, the US Agency for International Development signed a cooperative agreement with the University of Colorado at Boulder in 2018 for the LISA 2020 Network to provide funding for several stat labs to engage in pilot projects with data decision-makers. This fund is called the Transforming Evidence to Action (TEA) fund and enables the selected stat labs to collaborate in the intersections with data producers and data decision-makers. A selection of TEA fund projects currently underway includes the following:
The University of Ibadan in Nigeria partnered with the Independent National Electoral Commission to assess the quality of the country’s Continuous Voter Registration exercise, examine the effectiveness of the electoral process for voters, and make recommendations regarding the quality of the voter register and future election-related activities.
Wolkite University in Ethiopia is partnering with the Gurage Zone Vital Events Registration Agency to improve the current vital events (e.g., births, deaths) registration system through design and analysis of resident surveys, database creation, and training of agency workers in data management and analysis.
The Federal University of Rio Grande do Norte (UFRN) in Brazil is partnering with the Department of Public Policy at UFRN and União dos Dirigentes Municipais de Educação to address educational inequalities in the state of Rio Grande do Norte by analyzing data obtained from the Brazilian Ministry of Education and producing models to determine relationships across school infrastructure, students’ social background, and students’ performance on standardized tests.
The African Center for Education Development in Nigeria is partnering with the Nigerian National Bureau of Statistics to study the impact of COVID-19 on small-scale business enterprises in northern Nigeria through the design and analysis of a survey of small business owners and using the results to assist in developing a road map for implementing economic intervention for small scale businesses.
The Kwame Nkrumah University of Science and Technology in Ghana conducted a two-part workshop for 26 female scientists in government positions on data analysis for decision-making (part one) and methods for policy analysis, planning, evaluation, and leadership (part two).
These projects allow stat labs to work in the intersections between data analyzers, data producers, and policymakers (i.e., data decision-makers). In several cases, the data producer and the data decision-maker are the same entity. The stat lab provides collaborative statistics and data science expertise to its partner to ask development-relevant questions, produce high-quality data, analyze the data, and formulate policy recommendations, fully operating within the three-way intersection.
Statistics and Data Science Capacity Built
The LISA 2020 Network currently uses multiple indicators—collected quarterly from all labs in the network—to evaluate the accomplishments of the network and promote learning and sharing of information among the members.
These metrics have been collected since January 2019, when there were 10 members of the LISA 2020 Network that were in a position to collect them. As of March 31, 2022, with 36 labs reporting metrics, the labs have reported 1,467 stat lab projects. They have also reported 2,401 statisticians trained to be collaborative statisticians, 68 percent of whom are students (graduate and undergraduate) and 39 percent of whom are female. The stat labs have offered 399 workshops with 14,909 attendees, of whom 42 percent were female. Finally, the work of the stat labs has resulted in 146 peer-reviewed publications with 623 authors, including both stat lab collaborators and researchers (of whom 35 percent are female).
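The percentages above can be turned into approximate head counts with simple arithmetic. The sketch below is an illustration only: the helper name `approx_count` is hypothetical, the inputs are the totals and percentages quoted in the preceding paragraph, and because those percentages are rounded, the resulting counts are estimates rather than reported figures.

```python
def approx_count(total, percent):
    """Estimate a subgroup size from a total and a rounded percentage."""
    return round(total * percent / 100)


# Figures quoted in the text (as of March 31, 2022).
statisticians_trained = 2401
workshops = 399
workshop_attendees = 14909
publications = 146
authors = 623

# Note: "students" and "female" overlap; they are not a partition of trainees.
students_trained = approx_count(statisticians_trained, 68)
female_trained = approx_count(statisticians_trained, 39)
female_attendees = approx_count(workshop_attendees, 42)
female_authors = approx_count(authors, 35)

# Simple ratios that put the totals in context.
attendees_per_workshop = workshop_attendees / workshops
authors_per_publication = authors / publications

print(students_trained, female_trained, female_attendees, female_authors)
print(round(attendees_per_workshop, 1), round(authors_per_publication, 1))
```

Under these assumptions, the estimates come to roughly 1,633 students and 936 women among the statisticians trained, about 6,262 women among workshop attendees, and about 218 women among publication authors, with an average of about 37 attendees per workshop.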
In addition to the indicators collected from all labs, there are indicators we collect from labs that undertake funded development-oriented projects. These include the number of program and policy changes made by public sector, private sector, or other development actors influenced by lab-funded research results or related scientific activities; number of convenings held to disseminate research for use and/or develop policy recommendations; and publications specifically related to the results of these projects. The majority of these projects are nearing completion, and we will be compiling and releasing results at the end of 2022.
Stat labs are also encouraged to collect and report customized metrics unique to their lab, which may be determined by the lab’s unique stakeholders. So far, this has included metrics such as the number of events to promote the lab to potential collaborators, undergraduate lectures, and number of theses and dissertations assisted by the lab.
Nine Lessons Learned (So Far)
Through the process of creating and growing stat labs, measuring and evaluating the progress of those labs through metrics, and assisting with the implementation and administration of the TEA fund projects, the LISA 2020 Network has learned nine important lessons.
Lesson 1. Stat labs should attain a stronger role in state-sponsored data production and analysis of that data. Stat labs are skilled at helping data producers produce high-quality data. For example, the LISA stat lab at the University of Ibadan was able to successfully assist the Independent National Electoral Commission in designing sampling plans and initiating sampling of the voting population and voting register to answer questions regarding the quality of the voter register and strength of Nigerian democracy. However, we have observed a divide between academic statisticians (including stat labs) and national data producers.
Therefore, stat labs should work toward bridging this gap to attain a more prominent role in the data production process, as those who produce data at a national level are often not aware of the benefits of working with skilled statisticians and data scientists during the planning stage of data production.
In addition, the production of state-sponsored data is often separate from the analysis or use of that data. Expert statistical analyses by stat labs could produce useful evidence leading to action for development.
To facilitate future interactions and strengthen the data production system, stat labs should focus on placing more of their graduating students in jobs in national statistics offices.
Lesson 2. Stat labs are skilled in training data decision-makers to become statistically aware and data literate and should focus more effort on training policymakers. Training is a strong point of the LISA 2020 Network and opens the door for future collaborations. In just over two years, and despite the difficulties created by the COVID-19 pandemic, LISA 2020 stat labs have taught 220 workshops to 8,728 attendees, the vast majority of whom were research-focused data decision-makers (i.e., university staff and students). Stat labs can increase their impact by focusing on training business and policy decision-makers, as well as state-sponsored data producers.
An example of a stat lab expanding its scope for training was the TEA fund project at the Kwame Nkrumah University of Science and Technology in Ghana to build data analysis and interpretation capacity for policy decision-making and strategic planning, develop leadership capabilities, and provide a mentoring platform for mid-career female scientists in government positions. The effects of these workshops include 11 job promotions and acceptances into funded PhD programs, five research grants funded, five scientific publications relying on statistics learned in the workshop, and one participant becoming one of the leading voices in Ghana in the fight against COVID-19.
Lesson 3. Projects building capacity in the three-way intersection of data-driven development actors have the most potential for impact. When the data producer is also positioned to be a policy decision-maker, a stat lab can provide statistical expertise in all phases of the project to frame development-relevant questions, produce high-quality data, analyze the data, and make policy recommendations. Combining state-sponsored data production and local-level data production with thorough analyses and interpretation of data will help development actors move beyond the common practice of thinking data alone is sufficient for making decisions.
Therefore, the LISA 2020 Network should focus its capacity-building efforts on helping stat labs work in this intersection. It should redouble its efforts to provide training for statisticians to collaborate with the data producers and decision-makers to transform evidence into action.
Lesson 4. Projects in the intersection of the research, business, and policy domains also have high potential for impact. Our labs, being primarily centered at universities, naturally focus on supporting researchers by helping them design experiments and studies to produce data and/or analyze the data to make decisions about scientific research questions. Disseminating the findings of a collaboration between a stat lab and academic researcher through a journal, however, is often insufficient for influencing policy decisions. Therefore, if a project’s goal is to influence development decisions, policymakers should become involved in the project and stat labs must deliberately reach out to them.
Similarly, involving the local business community in academic research projects can increase the potential development effects of those projects. By reaching out early as skilled collaborators, stat labs can ensure they are helping answer questions of interest to policy or business decision-makers. The highest-impact projects will involve all three of the research, business, and policy domains.
Lesson 5. Transforming evidence to action (TEA) requires a mindset shift uncommon in statisticians and data scientists. Even when statisticians collaborate with other development actors, the “traditional” end of the statistics or data science cycle (i.e., a timely, cogent, well-motivated, and contextualized analysis with easily digestible findings, conclusions, and recommendations) is still only part way toward development action.
We have found that, even when working directly with policy actors who can make data-based decisions, it is difficult to follow project outcomes until they result in verifiable policy changes. This is partly because policy change is often a long and complex process and our statisticians feel compelled to move on to the next project. Nevertheless, statisticians must adopt a TEA mindset to see a project all the way to its end to transform evidence into action for the benefit of society.
Lesson 6. LISA 2020 stat labs are becoming engines for development. In a wide variety of contexts in 10 developing countries, our stat labs are successfully carrying out their missions. They are training their own staff and students and providing them with projects to further enable their growing capacity in collaborative statistics and data science. They are conducting short courses and workshops to broad audiences and engaging deeply in research projects, as evidenced by many coauthored publications. The stat labs are establishing themselves as local infrastructure to enable and accelerate data-driven development.
Lesson 7. LISA 2020 stat labs are adaptable in adverse circumstances. In 2020, navigating the COVID-19 pandemic required operational changes on a global scale. This unexpected crisis also forced the stat labs to adopt novel approaches to their training, teaching, and collaborative activities. As one example, the lab at the Federal University of Rio Grande do Norte in Brazil was able to transfer its student collaborator training activities to an online environment, providing lessons and group discussion. The initial semester provided the opportunity to record videos of the lessons, allowing for asynchronous lessons in future semesters when necessary.
Workshops offered by Afe Babalola University, which continued to be offered online to internal participants at the university, are another example. The technology available at many stat labs is adequate to provide alternatives when in-person gatherings are not possible; the motivation of the lab personnel is sufficient to prioritize lab activities, even when they cannot be done in the typical setting.
Lesson 8. LISA 2020 stat labs have many opportunities for improvement. We observe a lack of gender equality, with fewer women represented in every area in which gender is recorded—collaborative statisticians trained, workshop attendees, and publication authors. Gender issues are not specific to stat labs, but rather stem from larger societal issues in the countries where the labs are located. Despite the current lack of gender equality, however, the stat lab directors are enthusiastic about achieving gender equality. Labs that are approved as full members provide a plan for including females in the administration of the lab and working toward future gender equality in their activities.
Another area challenging stat labs is measuring longer-term effects of their activities. For example, although we record the number of short courses, workshops, and attendees for all the labs, only the Kwame Nkrumah University of Science and Technology has implemented a longer-term follow-up evaluation of their attendees to learn how they incorporated the training received into their work or research.
Lesson 9. Sustainable funding for stat labs is a challenge. Some labs have obtained ongoing funding from their universities; most rely on volunteer efforts from their passionate statisticians and data scientists, as well as support from individual researchers and workshop fees. Some labs have received initial funding to work with organizations and often continue working with them because the organizations recognize the value of skilled statistical collaborators and are willing to provide funding for future projects. This helps ensure the continued existence of these labs, as they strengthen their statistics and data science capacity while enabling and accelerating data-driven development.
Building Statistics and Data Science Capacity
The LISA 2020 Network began almost a decade ago with the idea that collaborative statisticians in developing countries could create stat labs to build statistics and data science capacity. Based on the collective experiences of more than 30 newly created stat labs since then, the network is being transformed by the idea of building statistics and data science capacity by focusing research, education, and outreach efforts on the intersections of data-driven development. The current and near-future focus of the network is on improving the quality and sustainability of the individual stat labs and strengthening connections between them.
