Data quality
In the social sciences, the quality of concepts, measurement, and data is usually discussed under the headings of validity and reliability. However, few authors discuss transparency and data quality, and there is no consistent understanding of what data quality means in the social sciences. In their well-known book, King et al. (1994, 24) set out five guidelines for "improving data quality" encompassing
- valid measures,
- reliable data collections,
- replicable analyses,
- a thorough documentation of the data generating process, and
- the collection of data "on as many of [a theory’s] observable implications as possible".
Today, however, far more data is available, and this abundance poses new challenges: counterfeit data, typos, and unintended mistakes in data collection, for example, create problems that go beyond invalid and unreliable measures. For this reason, we opted for a broader concept that takes the data quality definitions commonly used in information systems design as its starting point (cf. Pipino et al. 2002).
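As a rough illustration of how problems such as missing values, duplicated records, and typos might be quantified, the sketch below uses the "simple ratio" form described by Pipino et al. (2002), that is, one minus the share of undesirable outcomes. The toy records, the field names, the plausible range of 0 to 100, and all helper functions are hypothetical and chosen only for this example; they are not part of any particular data quality framework.

```python
def simple_ratio(undesirable: int, total: int) -> float:
    """Simple ratio: 1 minus the share of undesirable outcomes."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 1 - undesirable / total


def count_missing(rows, field):
    """Count rows in which the given field is missing (None)."""
    return sum(1 for row in rows if row.get(field) is None)


def count_duplicates(rows):
    """Count rows that exactly repeat an earlier row."""
    seen = set()
    duplicates = 0
    for row in rows:
        key = tuple(sorted(row.items()))  # hashable fingerprint of the row
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return duplicates


def count_out_of_range(rows, field, low, high):
    """Count non-missing values that fall outside [low, high]."""
    return sum(
        1
        for row in rows
        if row.get(field) is not None and not (low <= row[field] <= high)
    )


# Hypothetical toy data; None marks a missing value.
records = [
    {"country": "DE", "year": 2001, "turnout": 79.1},
    {"country": "FR", "year": 2002, "turnout": None},    # missing value
    {"country": "DE", "year": 2001, "turnout": 79.1},    # duplicate of row 1
    {"country": "IT", "year": 2001, "turnout": 181.4},   # likely typo (> 100)
]

n = len(records)
completeness = simple_ratio(count_missing(records, "turnout"), n)
uniqueness = simple_ratio(count_duplicates(records), n)
plausibility = simple_ratio(count_out_of_range(records, "turnout", 0, 100), n)

print(f"completeness={completeness:.2f}")
print(f"uniqueness={uniqueness:.2f}")
print(f"plausibility={plausibility:.2f}")
```

Checks of this kind only flag candidate problems. Whether a flagged value is actually an error still has to be decided against the documentation of the data generating process.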
References
- King, Gary, Robert O. Keohane, and Sidney Verba. 1994. Designing Social Inquiry: Scientific Inference in Qualitative Research. Princeton: Princeton University Press.
- Pipino, Leo L., Yang W. Lee, and Richard Y. Wang. 2002. "Data quality assessment." Communications of the ACM 45 (4): 211–18.