File formats
This page describes the file formats suitable and accepted for data upload to WeSIS as of November 2023.
Two templates, one for monadic and one for dyadic data, are available to CRC researchers in the CRC shared Nextcloud folder "WeSIS Phase 2 > Data Collection > Data Templates" (each as a CSV and an XLSX file). Both templates are approved and ready to use.
After a lengthy process, a conclusion was reached on how to mark data quality.
In addition, a new way of flagging "missing values" was introduced.
Accordingly, the template was changed to include two new mandatory columns, namely user_na_values and plausibility_concerns.
The former columns data_quality and data_quality_confidence were marked as deprecated (they are now optional in order to retain old data in the database; the documentation for the old template remains accessible).
General remarks
When preparing the upload file, the first row, and only the first row (!), may be used for any kind of additional information (coder information, project notes, date of last update, etc.). This line must start with the hash character # (like a comment in R).
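To illustrate the first-row rule, here is a minimal Python sketch of how a CSV version of a template could be read while skipping the optional comment line. It uses only the standard library `csv` module; the function name `read_template` and the sample content are hypothetical and not part of any WeSIS tooling. Only the first line is treated as a comment, so a later line starting with # is still read as data.

```python
import csv
import io


def read_template(text: str) -> list[dict[str, str]]:
    """Parse template CSV text, skipping an optional first-row comment.

    Only the very first line may be a comment, and it must start with '#'.
    """
    lines = text.splitlines(keepends=True)
    # Drop the first line only if it is a comment; later lines are data.
    if lines and lines[0].startswith("#"):
        lines = lines[1:]
    reader = csv.DictReader(io.StringIO("".join(lines)))
    return list(reader)


# Hypothetical sample file with a comment row followed by the header.
sample = (
    "# coder: J. Doe; project notes go here\n"
    "cow_code,country_name,year\n"
    "255,Germany,2020\n"
)
rows = read_template(sample)
print(rows[0]["country_name"])  # Germany
```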
Many coding rules have been agreed upon and must be followed. Please familiarize yourself with these rules and check the site regularly, as further decisions may be added. Among other things, the rules cover country and IO names and codes, date time formats, technical variable names and other aspects that affect data collection; they ensure consistency and reduce errors.
Both templates include
- mandatory and standardized columns for the data per se,
- mandatory and standardized columns for metadata, and
- (unlimited) optional columns for additional information regarding the data.
Standards for optional column names will evolve according to the agreed principle of "first come, first served". If you want to use optional columns,
- check the page on optional column names to see whether another project has already suggested a name that fits your needs.
- If a name already exists and its description suits the information you want to store, please use the existing name.
- If no existing name suits the information you want to store, create a new name and describe the purpose of the column on that page so that others can use it later.
Mandatory columns for the monadic data template
Column name | Column number | Standardized? | Type | Description | Comments |
---|---|---|---|---|---|
cow_code | 1 | yes | Numeric | Country code according to the COW scheme | See the list of country and IO codes used for WeSIS and WeSISpedia. |
country_name | 2 | yes | String | Country name | See the list of country and IO codes used for WeSIS and WeSISpedia. |
year | 3 | yes | Numeric | The year to which value refers | Note that a date (e.g., an election date or the date a policy was introduced) is itself a value; year may thus "double" information already contained in value. |
technical_variable_name | 4 | partly | String | Technical name of an indicator | Make sure to follow the naming convention for technical names in WeSIS and WeSISpedia. |
value | 5 | yes | Numeric, string or date | The actual value of the indicator | Always use the dot (.) as the decimal separator for numeric data. For dates follow the date time format. |
user_na_values | 6 | yes | Symbol | Flag for "missing data" | In principle, you can use an exclamation mark (!) to flag any value(s) as "missing data". Certain codes are strongly recommended, though, and the values must be of the same type as value. The default is a blank cell/column. |
unit | 7 | yes | String | The unit of value | For dates the unit is "date", otherwise name the actual unit of the value (e.g. "per 1000 inhabitants", "as % of GDP" etc.) |
scale | 8 | yes | String | The scale of value | Only one of six possible values is allowed here (see note 1). |
source | 9 | partly | String | The source of value | If value derives from a common source, please check whether a harmonized abbreviation already exists. |
publication_date | 10 | yes | Date | Date of data collection and/or upload to WeSIS | This date refers to the data collection, the last check or the upload, i.e., it does not refer to year or to a date stored in value. The common date time format nevertheless applies. |
category | 11 | partly | String | Name of the category an indicator belongs to | Category refers to the upper level an indicator belongs to (its "parent topic", i.e., a policy field, a domestic condition or a relation). |
label | 12 | no | String | Label of the technical variable name | An easy-to-read label of the indicator. It is identical to the "indicator name" in WeSISpedia and to the label in the Quick info box of an indicator page in WeSISpedia. |
plausibility_concerns | 13 | yes | String | Data quality aspect | Responses to the question "Are there notable concerns about the plausibility of this data point?"; only one of six answers is allowed here (see note 2). |
1 Note that binary is used as a strict boolean type with allowed values of 0/1 or true/false; if you have a binary indicator but want to use "user-defined missing values", you have to choose "Multinomial" as scale.
2 If multiple concerns are raised for the focal value, only the most prevalent one is coded. If you choose "Other" as the response, it is recommended to provide an explanation in the optional column comment.
Mandatory columns for the dyadic data template
Column name | Column number | Standardized? | Type | Description | Comments |
---|---|---|---|---|---|
cow_code_sender | 1 | yes | Numeric | Country code of the "sender" according to the COW scheme | "Sender" is used here synonymously with any node in undirected networks. See the list of country and IO codes used for WeSIS and WeSISpedia. |
country_name_sender | 2 | yes | String | Country name of the "sender" | See the list of country and IO codes used for WeSIS and WeSISpedia. |
cow_code_receiver | 3 | yes | Numeric | Country code of the "receiver" according to the COW scheme | "Receiver" is used here synonymously with any node in undirected networks. See the list of country and IO codes used for WeSIS and WeSISpedia. |
country_name_receiver | 4 | yes | String | Country name of the "receiver" | See the list of country and IO codes used for WeSIS and WeSISpedia. |
year | 5 | yes | Numeric | The year to which value refers | Note that a date (e.g., an election date or the date a policy was introduced) is itself a value; year may thus "double" information already contained in value. |
technical_variable_name | 6 | partly | String | Technical name of an indicator | Make sure to follow the naming convention for technical names in WeSIS and WeSISpedia. |
value | 7 | yes | Numeric, string or date time | The actual value of the indicator | Always use the dot (.) as the decimal separator for numeric data. For dates follow the date time format. |
user_na_values | 8 | yes | Symbol | Flag for "missing data" | In principle, you can use an exclamation mark (!) to flag any value(s) as "missing data". Certain codes are strongly recommended, though, and the values must be of the same type as value. The default is a blank cell/column. |
unit | 9 | yes | String | The unit of value | For dates the unit is "date", otherwise name the actual unit of the value (e.g. "per 1000 inhabitants", "as % of GDP" etc.) |
scale | 10 | yes | String | The scale of value | Only one of six possible values is allowed here (see note 1). |
source | 11 | partly | String | The source of value | If value derives from a common source, please check whether a harmonized abbreviation already exists. |
publication_date | 12 | yes | Date | Date of data collection and/or upload to WeSIS | This date refers to the data collection, the last check or the upload, i.e., it does not refer to year or to a date stored in value. The common date time format nevertheless applies. |
category | 13 | partly | String | Name of the category an indicator belongs to | Category refers to the upper level an indicator belongs to (its "parent topic", i.e., a policy field, a domestic condition or a relation). |
label | 14 | no | String | Label of the technical variable name | An easy-to-read label of the indicator. It is identical to the "indicator name" in WeSISpedia and to the label in the Quick info box of an indicator page in WeSISpedia. |
plausibility_concerns | 15 | yes | String | Data quality aspect | Responses to the question "Are there notable concerns about the plausibility of this data point?"; only one of six answers is allowed here (see note 2). |
1 Note that binary is used as a strict boolean type with allowed values of 0/1 or true/false; if you have a binary indicator but want to use "user-defined missing values", you have to choose "Multinomial" as scale.
2 If multiple concerns are raised for the focal value, only the most prevalent one is coded. If you choose "Other" as the response, it is recommended to provide an explanation in the optional column comment.
Misc
Contributors: Nils Düpont, Sebastian Haunss, Gabriela León, Gabriella Skitalinska, Nate Breznau, Mohamad Tofayel Ahmed
Revisions:
- CRC internal release of templates on March 06, 2019
- Update of the template on November 06, 2023