Data quality
Note: Currently, this page is under construction...
In the social sciences, data quality is usually discussed under the headings of validity and reliability, with reference to concepts, measurement, and data. Few authors, however, discuss transparency in connection with data quality, and there is no consistent understanding of what data quality means in the social sciences. In their famous book "Designing Social Inquiry", King et al. (1994, 24) set out five guidelines for "improving data quality", encompassing
- valid measures,
- reliable data collections,
- replicable analyses,
- a thorough documentation of the data generating process, and
- the collection of data "on as many of [a theory’s] observable implications as possible".
Today, far more data is available, which poses new challenges: counterfeit data, typos, and unintended mistakes in data collection, for example, create problems that go beyond invalid and unreliable measures. Several features of WeSIS speak directly to King et al.'s suggestions: the structured documentation of each indicator (including metadata) in WeSISpedia, common coding rules and formatting standards, and the resources provided in WeSIS, such as public data sets and community notebooks. Still, we opted for a broader concept based on data quality definitions commonly used in information systems design (cf. Pipino et al. 2002).
Concise and consistent representation and "free-of-errorness" are taken care of by the validation check performed when data is uploaded to WeSIS (for more details, see the file upload guide). The check not only requires data providers to comply with the agreed coding rules and file formatting standards but also looks for "common" mistakes such as typos, mismatches between country codes and country names, or values that do not fit an indicator's scale.
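To give a rough idea of what such a check might involve, the following Python sketch tests a small CSV file for missing columns, country code and name mismatches, and values outside an indicator's scale. It is an illustration only and not the WeSIS implementation: the column names, the three-entry country table, the scale bounds, and the function name `validate` are all assumptions made for this example.

```python
import csv
import io

# Hypothetical reference table (assumption): three-letter code -> country name.
COUNTRY_NAMES = {"DEU": "Germany", "FRA": "France", "BRA": "Brazil"}

# Hypothetical indicator scale (assumption): permitted closed interval.
SCALE = (0.0, 1.0)

# Hypothetical file format (assumption): columns every upload must contain.
REQUIRED_COLUMNS = ["country_code", "country_name", "year", "value"]


def validate(csv_text, country_names=COUNTRY_NAMES, scale=SCALE):
    """Return a list of (row_number, message) tuples; an empty list means no issue was found."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        # Without the expected columns, the row-level checks cannot run.
        return [(1, f"missing columns: {', '.join(missing)}")]

    problems = []
    low, high = scale
    for row_number, row in enumerate(reader, start=2):  # row 1 is the header
        code = (row["country_code"] or "").strip()
        name = (row["country_name"] or "").strip()

        # Country code and name must agree with the reference table.
        if code not in country_names:
            problems.append((row_number, f"unknown country code {code!r}"))
        elif country_names[code] != name:
            problems.append((row_number, f"code {code!r} does not match name {name!r}"))

        # The value must be numeric and lie within the indicator's scale.
        try:
            value = float(row["value"])
        except (TypeError, ValueError):
            problems.append((row_number, f"non-numeric value {row['value']!r}"))
            continue
        if not low <= value <= high:
            problems.append((row_number, f"value {value} outside scale [{low}, {high}]"))
    return problems


SAMPLE = """country_code,country_name,year,value
DEU,Germany,2000,0.42
FRA,Germnay,2000,0.37
BRA,Brazil,2000,1.8
"""

for row_number, message in validate(SAMPLE):
    print(f"row {row_number}: {message}")
```

In this sample, the second data row pairs the code FRA with a misspelled, non-matching name and the third carries a value above the assumed scale, so both are reported while the first row passes. Returning a list of problems rather than stopping at the first one lets a data provider fix all issues in a single pass.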
References
- King, Gary, Robert O. Keohane, and Sidney Verba. 1994. Designing Social Inquiry: Scientific Inference in Qualitative Research. Princeton: Princeton University Press.
- Pipino, Leo L., Yang W. Lee, and Richard Y. Wang. 2002. "Data quality assessment." Communications of the ACM 45 (4): 211–18.