SAPRIN.SMHDP2022V1
SAPRIN Mental Health Data Prize 2022
Name | Country code |
---|---|
South Africa | RSA |
Demographic Surveillance
This dataset contains demographic surveillance data covering the period from 1 Jan 1993 to 30 Apr 2022.
SAPRIN (South African Population Research Infrastructure Network) is a network of five Health and Demographic Surveillance System (HDSS) nodes located in South Africa.
Between them, the nodes follow more than 75 000 households (320 000 individuals) longitudinally through regular surveillance visits. Shortly after the start of the Covid-19 pandemic, SAPRIN implemented a shared Covid-19 surveillance programme in three nodes: the MRC/Wits University Agincourt HDSS in Bushbuckridge District, Mpumalanga, established in 1993; the University of Limpopo DIMAMO HDSS in the Capricorn District of Limpopo, established in 1996; and the Africa Health Research Institute (AHRI) HDSS in uMkhanyakude District, KwaZulu-Natal, established in 2000. This Covid-19 surveillance is still being conducted in these three SAPRIN nodes.
As part of the Covid-19 surveillance, the PHQ-2 and GAD-2 screening questions were administered to household respondents in Agincourt, DIMAMO and AHRI. By the end of 2021, a total of 90 000 such interviews had been conducted, approximately 12 000 of them in the target group of 14-24-year-olds. Although the PHQ-2 and GAD-2 questions on their own are not likely to be of interest to Mental Health Data Prize participants, several factors make this dataset of greater interest:
The interviews can be linked directly to the detailed longitudinal surveillance data in the three nodes, providing contextual data for these observations on depression and anxiety over the span of the Covid-19 epidemic, from its start through several waves of infection in South Africa.
These contextual data include:
The Covid-19 specific interviews were conducted from May 2020 and are still ongoing, with more than one interview with some participants at different points in time, allowing for the analysis of temporal effects.
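As an illustration of how the two-item screeners mentioned above are commonly scored, the sketch below sums two items coded 0 to 3 and flags totals of 3 or more. The function name, the 0-3 item coding and the cutoff of 3 are assumptions based on the usual PHQ-2/GAD-2 convention; the coding of the released variables should be checked against the data dictionary before use.

```python
from typing import Optional, Tuple


def screen_score(
    item1: Optional[int], item2: Optional[int], cutoff: int = 3
) -> Tuple[Optional[int], Optional[bool]]:
    """Score a two-item screener (e.g. PHQ-2 or GAD-2).

    Assumes each item is coded 0-3 (hypothetical coding, verify against
    the dataset). Returns (total, screens_positive), or (None, None)
    when either item is missing or out of range.
    """
    items = (item1, item2)
    if any(i is None or not 0 <= i <= 3 for i in items):
        return None, None
    total = item1 + item2
    return total, total >= cutoff


# Example: two repeat interviews for the same (hypothetical) participant
first_visit = screen_score(1, 2)
second_visit = screen_score(0, 1)
```

Because some participants were interviewed more than once, the same function can be applied per interview and the results compared over time.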
The SAPRIN Mental Health Data Prize 2022 release consists of six demographic surveillance datasets:
1. SAPRIN individual exposure episodes. This dataset splits the basic surveillance episodes at calendar year-end and at the date when the age in years of an individual changes (the birthday). For women who have given birth, episodes are also split at the time of delivery.
2. SAPRIN individual status observations. This dataset consists of status observations for an individual, such as education, employment and partnership status, that recur at more or less regular intervals over the study period.
3. SAPRIN household status observations. This dataset consists of socio-economic status observations for a household. These data are collected from a household proxy respondent, preferably the head of household or the next available senior adult resident household member, at more or less regular intervals over the duration of the study.
4. SAPRIN household asset status observations. This dataset consists of asset status observations for a household. These data are collected from a household proxy respondent, preferably the head of household or the next available senior adult resident household member, at more or less regular intervals over the duration of the study.
5. SAPRIN individual COVID-19. This dataset consists of Covid-related status observations pertaining to Covid-19 diagnosis, vaccination status, attitudes to vaccination, and the PHQ-2 and GAD-2 mental health questions.
6. SAPRIN household COVID-19. This dataset consists of Covid-19 related household-level status observations, household awareness, and the impact of Covid-19 control measures on the household.
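The episode splitting described for the individual exposure episodes dataset can be sketched as follows. This is a minimal illustration, not the SAPRIN production code: it assumes inclusive start and end dates, starts a new sub-episode on each 1 January and on each birthday, and leaves out the additional split at delivery. The function names are hypothetical.

```python
from datetime import date, timedelta


def birthday_in(year: int, dob: date) -> date:
    """Birthday falling in the given year (29 Feb maps to 1 Mar in non-leap years)."""
    try:
        return dob.replace(year=year)
    except ValueError:
        return date(year, 3, 1)


def split_episode(start: date, end: date, dob: date):
    """Split the inclusive episode [start, end] at every 1 January and birthday."""
    cuts = set()
    for year in range(start.year, end.year + 1):
        for cut in (date(year, 1, 1), birthday_in(year, dob)):
            if start < cut <= end:
                cuts.add(cut)
    pieces = []
    piece_start = start
    for cut in sorted(cuts):
        # The previous piece ends the day before the new one starts.
        pieces.append((piece_start, cut - timedelta(days=1)))
        piece_start = cut
    pieces.append((piece_start, end))
    return pieces


pieces = split_episode(date(2020, 11, 1), date(2021, 3, 31), dob=date(2004, 2, 15))
```

In this example the single episode becomes three: one ending on 31 December 2020, one running from 1 January 2021 to the day before the birthday, and one starting on the birthday.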
Event history data
Individual and household interviews
v1: Dataset for public distribution.
2022-08-01
Each record in the exposure dataset represents a period of observation for an individual during which all the recorded characteristics of the individual stay constant. For example, on the birthday of the individual a new episode will start, because the age of the individual has changed. Any change in one of the status values, such as education or marital status, will likewise result in a new episode on the date of the change. For the COVID-19 data, the questionnaire included both household-level and individual-specific questions, the latter of which could be directly addressed by other household members if they were present. The primary respondent acted as a proxy in all other cases. COVID-19 symptom screening was included in the questionnaire.
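Because each exposure record covers a period during which an individual's characteristics are constant, an interview can be given its surveillance context by finding the episode whose date range contains the interview date. The sketch below shows one way to do this; the field names (`individual_id`, `start`, `end`, `education`) are hypothetical and inclusive date ranges are assumed.

```python
from datetime import date


def find_episode(episodes, individual_id, interview_date):
    """Return the episode of this individual that contains the interview date, or None."""
    for episode in episodes:
        if (
            episode["individual_id"] == individual_id
            and episode["start"] <= interview_date <= episode["end"]
        ):
            return episode
    return None


# Two consecutive episodes for one (made-up) individual; the education
# status changes on 2021-01-15, so a new episode starts on that date.
episodes = [
    {"individual_id": 1, "start": date(2020, 1, 1), "end": date(2021, 1, 14), "education": "Grade 11"},
    {"individual_id": 1, "start": date(2021, 1, 15), "end": date(2021, 12, 31), "education": "Grade 12"},
]

match = find_episode(episodes, 1, date(2021, 3, 10))
```

A linear scan is enough for illustration; for the full dataset an interval join in a database or dataframe library would be more practical.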
Topic |
---|
Mental Health, Covid-19 |
SAPRIN (South African Population Research Infrastructure Network) is a network of five Health and Demographic Surveillance System (HDSS) nodes located in South Africa, namely:
1) The MRC/Wits University Agincourt HDSS in Bushbuckridge District, Mpumalanga, which has collected data since 1993. The nodal website is http://www.agincourt.co.za.
2) The University of Limpopo DIMAMO HDSS in the Capricorn District of Limpopo, which has collected data since 1996. The nodal website is: N/A.
3) The Africa Health Research Institute (AHRI) HDSS in uMkhanyakude District, KwaZulu-Natal, which has collected data since 2000. The nodal website is http://www.ahri.org.
4) The Gauteng Research Triangle Initiative for the Study of Population, Infrastructure and Regional Economic Development (GRT-INSPIRED) in Hillbrow, Johannesburg, and Atteridgeville and Melusi, Tshwane, Gauteng. The nodal website is: N/A.
5) The Cape Town Surveillance through Healthcare Action Research Project (C-SHARP) in Nomzamo and Bishop Lavis, Cape Town, Western Cape. The nodal website is: N/A.
Eligibility covered all individuals who were between the ages of 14 and 24 years on 1 January 2020 and resident within one of the three SAPRIN nodal sites. Residence was defined as the intention to sleep at a dwelling in these areas the majority of the time over a four-month period.
Name | Affiliation |
---|---|
Prof Mark Collinson | SAPRIN |
Dr Kobus Herbst | SAPRIN |
Prof Steve Tollman | Agincourt |
Prof Eric Maimela | DIMAMO |
Prof Willem Hanekom | AHRI |
Name | Affiliation | Role |
---|---|---|
Molulaqhooa Linda Maoyi | SAPRIN | Technical Assistance |
Tinofa Mutevedzi | SAPRIN | Technical Assistance |
Chodziwadziwa Kabudula | Agincourt | Technical Assistance |
Joseph Tlouyamma | DIMAMO | Technical Assistance |
Dickman Gareta | AHRI | Technical Assistance |
Name | Role |
---|---|
Department of Science and Innovation | Current Funder |
Name | Affiliation | Role |
---|---|---|
Agincourt Data Team | Agincourt | Providing Data |
DIMAMO Data Team | DIMAMO | Providing Data |
AHRI Data Team | AHRI | Providing Data |
Steve Tollman | Agincourt | |
Eric Maimela | DIMAMO | |
Willem Hanekom | AHRI | |
Centre for High Performance Computing | Centre for High Performance Computing | Providing IT Infrastructure for Data Processing |
All individuals who were between the ages of 14 and 24 years on 1 January 2020. All exposure episodes of these individuals, from the start of exposure to 30 April 2022, are included in the dataset.
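The selection rule above can be expressed as a short check. This sketch assumes that "between the ages of 14 and 24" means age in completed years from 14 to 24 inclusive, which the documentation does not state explicitly; the function and constant names are hypothetical.

```python
from datetime import date

REFERENCE_DATE = date(2020, 1, 1)   # date on which age is evaluated
CENSOR_DATE = date(2022, 4, 30)     # last date of exposure included


def age_on(dob: date, on: date) -> int:
    """Age in completed years on the given date."""
    return on.year - dob.year - ((on.month, on.day) < (dob.month, dob.day))


def is_eligible(dob: date) -> bool:
    """True if aged 14 to 24 (inclusive, assumed) on the reference date."""
    return 14 <= age_on(dob, REFERENCE_DATE) <= 24


def truncate_episode(start: date, end: date):
    """Clip an episode at the censor date; return None if it starts after it."""
    if start > CENSOR_DATE:
        return None
    return start, min(end, CENSOR_DATE)
```

For example, someone born on 1 January 2006 is exactly 14 on the reference date and is included, while someone born a day later is still 13 and is not.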
The data in this repository are not the result of a single questionnaire. They are harmonised data from three different sites, collected longitudinally over more than twenty years using questionnaires that varied over time and between sites.
Start | End | Cycle |
---|---|---|
1993-01-01 | 2022-04-30 | Agincourt |
1996-01-01 | 2022-04-30 | DIMAMO |
2000-01-01 | 2022-04-30 | AHRI |
2017
Start date | End date | Cycle |
---|---|---|
1993-01-01 | 2022-05-31 | Agincourt |
1996-01-01 | 2022-05-31 | DIMAMO |
2000-01-01 | 2022-05-31 | AHRI |
In all the HDSS nodes, data are collected from a household proxy respondent, preferably the head of household or the next available senior adult resident household member, after informed consent has been obtained by trained fieldworkers. Respondents are informed of the purpose and confidentiality of the interview, of their right to refuse participation or withdraw from the study, and that scientists will be given access to anonymised data to analyse and publish information. Informed consent was verbal in all HDSS nodes until 2016. Written informed consent started in 2017 in AHRI, in 2018 in DIMAMO and in 2019 in Agincourt. Until 2016 for Agincourt and AHRI, and until 2017 for DIMAMO, data collection was by field-based 'paper and pen' personal interviews (PAPI), before changing to field-based computer-assisted personal interviews (CAPI). Since 2019, all SAPRIN HDSS nodes have collected data in 3 annual rounds over a 45-week data collection schedule: one field-based CAPI round, preceded and followed by a call-centre-based computer-assisted telephonic interview (CATI) round, to create 3 data points at intervals of approximately 4 months in each calendar year. In the past, HDSS nodes had different data collection frequencies. AHRI data collection was 2 PAPI rounds per year from inception to 2011, changing to 3 PAPI rounds per year between 2012 and 2016, before becoming 1 PAPI round and 2 CATI rounds from 2017. Agincourt and DIMAMO collected data once annually in a census-type format, over a 4-5-month period, until 2018.
The first step in the data preparation process is quality assurance. The SAPRIN Management hub team assesses the data submitted to ensure that it is in the correct format and falls within expected value ranges. Other potential issues checked include missing data, incorrect data types, and unexpected duplicate or orphan records. The SAPRIN Management hub assesses the conversion by running both the original operational database and the SAPRIN database created from it through the iSHARE data quality assessment and indicator process. The data quality checking process is conducted using Pentaho Data Integration (PDI). PDI provides the Extract, Transform, and Load (ETL) capabilities that facilitate the process of capturing, cleansing, and storing data in a uniform and consistent format that is accessible and relevant to end users. The principle of the data quality checks is that if the data conversion conducted by the nodes was complete and accurate, there should be little or no difference in the data quality and demographic indicators between the base and SAPRIN versions of the nodal data. If the data submitted by a node meet the criteria for inclusion in the consolidated dataset, the data move to the second step of the data production process. If the data fail the inclusion checks, this can lead to further iterations of data submission and quality control checks until the SAPRIN Management hub is satisfied that the data are of high quality. To produce the final standard dataset, the data are processed using PDI on the Centre for High Performance Computing cluster.
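The kinds of checks listed above (missing data, unexpected duplicates, orphan records, values outside expected ranges) can be illustrated with a small Python sketch. This is not the PDI pipeline used by SAPRIN, only an illustration of the logic; the record layout and field names are hypothetical.

```python
from datetime import date


def check_episodes(individuals, episodes):
    """Return a list of (row_index, problem) tuples for a set of episode records."""
    issues = []
    known_ids = {person["individual_id"] for person in individuals}
    seen = set()
    for index, episode in enumerate(episodes):
        key = (episode.get("individual_id"), episode.get("start"), episode.get("end"))
        if None in key:
            issues.append((index, "missing value"))
            continue
        if key in seen:
            issues.append((index, "duplicate record"))
        seen.add(key)
        if episode["individual_id"] not in known_ids:
            # An episode that refers to an individual not in the register.
            issues.append((index, "orphan record"))
        if episode["start"] > episode["end"]:
            issues.append((index, "start after end"))
    return issues


individuals = [{"individual_id": 1}]
episodes = [
    {"individual_id": 1, "start": date(2020, 1, 1), "end": date(2020, 12, 31)},
    {"individual_id": 1, "start": date(2020, 1, 1), "end": date(2020, 12, 31)},
    {"individual_id": 2, "start": date(2021, 1, 1), "end": date(2021, 6, 30)},
    {"individual_id": 1, "start": date(2021, 1, 1), "end": None},
]
issues = check_episodes(individuals, episodes)
```

Running the same checks on the operational and the converted data and comparing the counts follows the principle described above: an accurate conversion should produce little or no difference.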
Not Applicable
Name | Affiliation | URL | Email |
---|---|---|---|
Kobus Herbst | SAPRIN | http://saprin.mrc.ac.za/ | kobus.herbst@mrc.ac.za |
Molulaqhooa Linda Maoyi | SAPRIN | http://saprin.mrc.ac.za/ | linda.maoyi@mrc.ac.za |
This data is made available for access under the following conditions:
1) The data and other materials provided by SAPRIN will not be redistributed or sold to other individuals, institutions, or organizations without the written agreement of SAPRIN.
2) The data will be used for statistical and scientific research purposes only. They will be used solely for reporting of aggregated information, and not for investigation of specific individuals or organisations. The Data User will neither use nor permit others to use the data in any way other than listed in the original application (Analysis Plan) for access to the dataset.
3) No attempt will be made to re-identify respondents, and no use will be made of the identity of any person or establishment discovered inadvertently. Any such discovery should immediately be reported to SAPRIN.
4) No attempt will be made to produce links among datasets provided by SAPRIN, or among data from SAPRIN and other datasets that could identify individuals or organizations.
5) The Data User will ensure that the data are kept in a secured environment and that only authorized users have access to the data.
6) Any books, articles, conference papers, theses, dissertations, reports, or other publications that employ data obtained from SAPRIN will cite the source of data in accordance with the Citation Requirement provided with each dataset.
7) An electronic copy of all reports and publications based on the requested data will be sent to SAPRIN.
8) The original collector of the data, SAPRIN, and relevant funding agencies bear no responsibility for use of the data or for interpretations or inferences based upon such uses.
9) Once the dataset has served its indicated purpose it must be destroyed. If the dataset needs to be lodged for publication purposes, a reference to the original dataset on the SAPRIN data repository should be used instead (SAPRIN maintains a digital object identifier for this purpose). Derived or aggregated datasets produced from the original dataset do not fall within this provision and may be lodged as publication datasets. If the same dataset is needed for a different purpose, it should be re-requested and the new purpose indicated.
Maoyi, ML; Herbst, K; Collinson, M; Mutevedzi, T (2022): SAPRIN Mental Health Data Prize 2022: Individual exposure episodes dataset. South African Population Research Infrastructure Network. https://doi.org/10.23667/SAPRIN.SMHDPIEE2022
Maoyi, ML; Herbst, K; Collinson, M; Mutevedzi, T (2022): SAPRIN Mental Health Data Prize 2022: Individual status observations dataset. South African Population Research Infrastructure Network. https://doi.org/10.23667/SAPRIN.SMHDPISO2022
Maoyi, ML; Herbst, K; Collinson, M; Mutevedzi, T (2022): SAPRIN Mental Health Data Prize 2022: Household status observations dataset. South African Population Research Infrastructure Network. https://doi.org/10.23667/SAPRIN.SMHDPHHS2022
Maoyi, ML; Herbst, K; Collinson, M; Mutevedzi, T (2022): SAPRIN Mental Health Data Prize 2022: Household asset status observations dataset. South African Population Research Infrastructure Network. https://doi.org/10.23667/SAPRIN.SMHDPHHAS2022
Maoyi, ML; Herbst, K; Collinson, M; Mutevedzi, T (2022): SAPRIN Mental Health Data Prize 2022: Individual COVID-19 dataset. South African Population Research Infrastructure Network. https://doi.org/10.23667/SAPRIN.SMHDPICS2022
Maoyi, ML; Herbst, K; Collinson, M; Mutevedzi, T (2022): SAPRIN Mental Health Data Prize 2022: Household Covid-19 dataset. South African Population Research Infrastructure Network. https://doi.org/10.23667/SAPRIN.SMHDPHHCS2022
The user of the data acknowledges that the original collector of the data and the relevant funding agencies bear no responsibility for the data's use or interpretation or inferences based upon it.
This dataset documentation is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License. The dataset is shared in terms of the data-use agreement accepted at the time of data download.
Name | Affiliation | Email | URL |
---|---|---|---|
Kobus Herbst | SAPRIN | kobus.herbst@mrc.ac.za | http://saprin.mrc.ac.za/ |
Molulaqhooa Linda Maoyi | SAPRIN | linda.maoyi@mrc.ac.za | http://saprin.mrc.ac.za/ |
DDI.SAPRIN.SMHDP2022V1
Name | Affiliation | Role |
---|---|---|
Molulaqhooa Linda Maoyi | SAPRIN | Documentation of Study and Review of the metadata |
Kobus Herbst | SAPRIN | Documentation of Study and Review of the metadata |
Tinofa Mutevedzi | SAPRIN | Documentation of Study and Review of the metadata |
Mark Collinson | SAPRIN | Documentation of Study and Review of the metadata |
2022-08-07
Version 2 (August 2022)