Khulisa has been conducting evaluations for nearly 30 years, and we have collected a lot of data during those three decades. Our organization’s lifespan also coincides with the internet’s rise to dominance, a period in which the data collection field has rapidly expanded and transformed. Gone are the days when in-person interviews, mail, or telephone surveys were the only ways to collect data. Smartphones, tablets, and artificial intelligence have created a dizzying number of ways to collect data.
Considering the wealth of experience and knowledge Khulisa has accumulated in this field, we decided to write a post about how we’ve designed some of our recent data collection projects and how we choose the best, most current data collection tools and processes for particular situations. We recently sat down with members of Khulisa’s Education and Development Division to discuss the data collection designs and methods they’ve used in their recent work, including the Early Grade Reading Assessment (EGRA) and the Trafficking in Persons (TIP) evaluations.
Factors to consider when choosing a data collection tool
There are many questions an evaluator or evaluation team must ask when designing a data collection project:
1) What’s the budget?
Cost is a major factor when deciding how to collect data for an evaluation. Sending field workers out to collect data in person is very expensive, as is computer-assisted telephone interviewing (CATI). Programming online data collection forms, as well as cleaning and validating data, can also be costly, depending on the program.
Fortunately, there are free data collection tools available that can save on costs, reduce the likelihood of errors, and make quality data collection possible for many different types of organizations. Khulisa and its partners were early adopters of Open Data Kit (ODK), an open-source data collection tool, and have recently moved to KoboToolbox, a free tool provided by a technology non-profit, for several of their evaluations in the education sector. Kobo is user-friendly, allowing enumerators to collect data in any language, anywhere in the world, using phones or tablets. These features make Kobo a great fit for Khulisa’s Schools2030 project, which involves surveying dozens of schools across nine countries. “The huge benefit of KoboToolbox is that it’s free. It’s used as a humanitarian tool,” says Khulisa Monitoring and Evaluation Associate Jesse Webb. “And that’s really incredible because we can work with people all over the world and ask them to get this tool.”
“[Kobo] just made it a lot more user-friendly, and a lot of people can now build a survey using functionalities that on ODK would have been more complicated, because you needed to know a little bit of coding,” says Senior Monitoring, Evaluation, Research, and Learning (MERL) Specialist Leticia Taimo. “It’s a very powerful tool because it includes a lot of options within it to validate your data on the go, to use things like skip logic and native calculations,” Jesse says.
Free tools like Kobo also contribute to Khulisa’s capacity-building work. Khulisa helped its early grade reading partner, Molteno, migrate many of its language and literacy training and assessment tools from paper to tablets using Kobo.
Other popular data collection tools offer both free and premium options, with modestly priced paid versions offering more versatility than the free ones. “There’s a paid version of SurveyMonkey, so when we need a tool to do some basic analysis for us or the client, then SurveyMonkey is a good option because it will pull the data into graphs pretty quickly and easily,” Jesse says. “Whereas with Kobo, you’re just getting [the data] back in an Excel form.”
2) Who are the respondents?
When deciding on a data collection design, the question is: Who do we need the data from?
“If it’s young people with access to cell phones and internet and data, any of the electronic tools would work,” says Education and Development Director Margie Roper. “If it’s an enumerated survey, with field workers, you can use a tool that requires training to collect the data. If it’s a group for which an electronic device may be intimidating [like older people, or anyone without access to the internet], then you could go for paper-based. But we’re really trying to move away from paper-based because there are a lot more errors in the whole process of capturing the data and cleaning it.”
In some cases, there are data collection tools built specifically for respondents in a particular practice area. RTI International has developed a tool called Tangerine, designed specifically for early grade reading and mathematics assessments. “A lot of organizations use it worldwide because it is the only [data collection tool] that was developed specifically for the early grade reading assessment…We wouldn’t be able to administer the EGRA in the way that we do if we didn’t use this tool,” says Leticia.
For complex projects with more than one type of respondent, a combination of survey tools often works best. In the case of setting early grade reading benchmarks in African languages, Khulisa uses Tangerine for its learner assessments and Kobo for its teacher surveys.
3) Is there online access?
Inconsistent internet access and/or cellular coverage often calls for innovative data collection design. When conducting surveys in rural areas, or anywhere connectivity is unreliable, it’s important to choose tools that work both online and offline.
Tangerine runs on tablets and can collect data both with and without cellular service. “When you are out there, and there is no network, you’re still able to actually do your assessment and then you send the data later,” says Evaluation Coordinator Tshandapiwa Tshuma, who works on Khulisa’s Tshivenda Language Benchmarking Project in rural South Africa. “So that’s one of the really good advantages with Tangerine.”
When collecting large amounts of qualitative data using laptops, which aren’t always able to access the internet as easily as phones or tablets, Khulisa sometimes uses a tool called Jotform. Jotform, which offers both free and premium versions, allows data collectors to record information using fillable PDF forms.
“When you’re sitting, maybe observing a teacher or going through documents at a clinic, a laptop would be easier for people to type a lot of information in,” Leticia explains. “But you don’t know if you’re necessarily going to have internet in that moment. So working in an online tool doesn’t work, but a PDF that you can open and you can save, you can type things as you read through documents as you talk to people throughout a whole day, is useful.”
4) What kind of data are you collecting?
Data collection is not always about asking people questions and recording their answers. As the field of evaluation has broadened, data collection tools have become progressively more sophisticated and specialized.
Khulisa uses ArcGIS, a geographic information system (GIS) mapping platform, to create interactive maps that track human trafficking hotspots. ArcGIS allows us to create heat maps of locations where the likelihood of trafficking is higher, such as high-density areas, areas with high concentrations of immigrants, and places where sex workers tend to congregate.
“Heat maps are a useful tool to aggregate different data sources for the purpose of either demonstrating how funders can allocate resources for a given project, or to determine how a project might sample the population,” noted Thembi Mahlangu, Stakeholder Manager for Khulisa’s Measures for Countering Trafficking in Persons in South Africa (MCTIP) project. “The heat map is really talking to hotspots, to say, ‘Where is your highest risk area where human trafficking can occur, and how do we respond to that so that we have interventions that can help those people?’”
5) What are the challenges?
Some evaluations involve challenges, such as difficult terrain or political instability, that require major considerations in the data collection design process. In 2020, the outbreak of the COVID-19 pandemic forced evaluators around the world to reconfigure their data collection strategies.
When South Africa’s pandemic lockdown began, the Khulisa team realized it would not be able to conduct in-person interviews with school principals and staff for the Early Grade Reading project. So the team turned to GeoPoll to conduct interviews via CATI. Khulisa has also used Viamo, a company specializing in mobile communication, for CATI data collection.
Using a company like GeoPoll, although more costly than most of the methods described above, was an extremely effective way to collect data at a time when meeting in person was impossible. A company like GeoPoll also offers the added benefit of reducing the burden of data management, quality assurance, language translation, and so on. “[GeoPoll] can do this in multiple countries, multiple provinces, multiple languages, and all of the data will come to us in one database,” says Leticia. “And then when we get the data it will all be translated into the language that we want.”
Data science in data collection?
To collect quality data for data science, and to provide meaningful, evidence-based insights in an evaluation, we must bring a rigorous approach to selecting tools that fit a particular evaluation. The questions above are some of the critical ones we need to ask ourselves when selecting a tool.
The data collection tools and methods described here are only a small sample of what’s available. Many more tools exist, and they can be combined in countless designs; these are simply a few that have worked well for Khulisa recently.
The Big Question: What is the role of artificial intelligence (AI) in data collection and analysis?
AI technology has advanced rapidly in recent years, and many new AI programs and applications are available to speed up analysis and writing. But the data science considerations discussed above are still crucial when designing data collection processes and activities. AI can speed up processes, but it doesn’t possess the same contextual knowledge as human evaluators. The job of an evaluator is to make value judgements, and AI doesn’t have values. Understanding the culture and context of the people from whom you are collecting data is vital to analyzing that data accurately.
Khulisa will continue to monitor developments in AI to determine how best to use this evolving technology in data collection. We recently hosted a webinar discussing the science behind data collection and the role that AI might play. Watch the recording of this webinar here.
Keep an eye on Khulisa’s #EvalTuesdayTips section for more about our favorite data collection methods and designs.
Visit our YouTube page for the full gLOCAL presentation on this topic.