File upload guide

From WeSISpedia
Revision as of 10:40, 16 June 2020 by Nils Duepont

This guide walks you through the process of uploading data to WeSIS.

Warning: Currently, only monadic data is saved. We are still working on processing dyadic data: you can test the upload validation with dyadic files, but the data won't be saved in the database. Furthermore, some country codes are not accepted yet.

Please use the "Discussion" tab of this page to discuss further details.

Before you start

Once you have decided that your data is ready to be uploaded to WeSIS, please make sure that you have:

  1. Created an indicator page on WeSISpedia. You should use the following template. Here is an example indicator page to give you an idea of what the page structure should look like. Please make sure that all relevant fields in the info box are filled out, since this is important for data validation.
  2. Added all new indicators to the appropriate topic index pages on WeSISpedia. WeSIS only recognizes indicators that exist on these pages and have the mandatory columns filled out.
  3. Formatted your files according to the provided templates as described in File formats. The system accepts .csv, .xls and .xlsx files, although uploading .xls files is not recommended.
  4. Created a user account in WeSIS. Even if you participated in a previous testing session, this is the first time that WeSIS is online. Therefore, everyone needs to create a new account.

Circumventing "typical" issues

There are a couple of known issues that you can already check while preparing your data to avoid validation errors. This list is by no means exhaustive; rather, it describes "typical" problems that have occurred in the past and their solutions.

Issue: I got a Multiple_triples error...

Answer: For every indicator, there can be only one observation per country-year. This error occurs if, for example, "New Zealand 2000" appears twice for the same indicator.
Solution: Use one of the R or Stata code snippets for identifying such rows. They are stored in Seafile > WeSIS > Script-Templates_WeSIS > duplicate triple identification.
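If you prefer Python, the same check can be sketched in a few lines. This is a minimal illustration, not one of the R/Stata snippets on Seafile. It assumes the three key columns are named cow_code, year and technical_variable_name (the names used for Multiple_triples in the Error logs section). The function name and the sample rows are made up.

```python
import csv
import io
from collections import defaultdict

# Assumed names of the columns that form the key triple; adapt them to your file.
KEY_COLUMNS = ("cow_code", "year", "technical_variable_name")


def find_duplicate_triples(rows):
    """Map every key triple that occurs more than once to the rows it occurs in.

    Rows are numbered from 1, starting with the first row below the header.
    """
    positions = defaultdict(list)
    for row_number, row in enumerate(rows, start=1):
        key = tuple(row[column].strip() for column in KEY_COLUMNS)
        positions[key].append(row_number)
    return {key: numbers for key, numbers in positions.items() if len(numbers) > 1}


# Made-up example: New Zealand (COW code 920) in 2000 appears twice.
SAMPLE = """cow_code,year,technical_variable_name,value
920,2000,example_indicator,1.5
920,2000,example_indicator,1.7
920,2001,example_indicator,1.6
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
print(find_duplicate_triples(rows))  # {('920', '2000', 'example_indicator'): [1, 2]}
```

For a real file, replace io.StringIO(SAMPLE) with open("your_file.csv", newline="", encoding="utf-8").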

Issue: In Stata, I use export delimited using abcd.csv to write the output file for WeSIS, but the validation gives a warning...

Answer: For unknown reasons, Stata sometimes seems to write malformed csv files.
Solution 1: Use export excel using abcd.xlsx, firstrow(variables) instead (surprisingly, this works better). The firstrow(variables) option is needed because export excel does not write the variable names as a header row by default.
Solution 2: Open the csv file in Libre/Open Office, save it and re-upload it.

Issue: WeSIS cannot validate a csv-file written by R...

Answer 1: By default, write.csv (base R) writes the row names (usually just the row numbers) as an extra first column, which breaks the validation (write_csv from the tidyverse does not do this).
Solution: Explicitly tell R not to write this column (row.names = FALSE).
Answer 2: By default, write.csv does not use UTF-8 as its encoding (write_csv does), which may cause errors regarding the comma and dot as decimal separators and/or the quoting of strings.
Solution: Explicitly tell R to use UTF-8 (fileEncoding = "UTF-8").
The following calls work: write.csv(df, file = "name_out.csv", row.names = FALSE, fileEncoding = "UTF-8") or write_csv(df, path = "name_out.csv") (in readr 1.4.0 and later, this argument is called file instead of path).
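If you prepare your data in Python rather than R, the same two points apply: write no index column and use UTF-8. The following is a minimal sketch that uses only the standard csv module. The function name write_wesis_csv and the column names are illustrative. With pandas, the equivalent call is df.to_csv("name_out.csv", index=False, encoding="utf-8").

```python
import csv


def write_wesis_csv(path, fieldnames, rows):
    """Write rows (a list of dicts) to a UTF-8 csv file with a header line.

    Only the listed columns are written, so no index column is added, and
    float values are written with a dot as the decimal separator.
    """
    # newline="" lets the csv module control line endings, as its docs recommend.
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)


# Example call (hypothetical data):
# write_wesis_csv("name_out.csv", ["cow_code", "year", "value"],
#                 [{"cow_code": 920, "year": 2000, "value": 1.5}])
```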

Issue: I got a Scale/Value_mismatch error for metric data with decimals while working with Excel...

Answer: WeSIS expects a dot as the decimal separator, but a "German Excel" in particular uses a comma by default.
Solution 1: Change this in the Excel options. In a "German Excel", go to Datei (File) > Optionen (Options) > Erweitert (Advanced), untick "Trennzeichen vom Betriebssystem übernehmen" ("Use system separators"), and enter a dot as Dezimaltrennzeichen (decimal separator) and a comma as Tausendertrennzeichen (thousands separator).
Solution 2: Format the column as "text" and replace the comma with a dot.
Solution 3: Use Libre or Open Office, especially for csv files.
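Solution 2 can also be scripted outside Excel. The following is a minimal Python sketch. It assumes the metric column contains no thousands separators, and the function name comma_to_dot is hypothetical. Values that the function cannot safely convert are left untouched rather than guessed, so you can check them by hand.

```python
import re

# A number written with a decimal comma, e.g. "3,14" or "-0,5".
# Assumption: the column contains no thousands separators.
DECIMAL_COMMA = re.compile(r"-?\d+,\d+")


def comma_to_dot(cell):
    """Return the cell with its decimal comma replaced by a dot.

    Cells that do not look like a plain decimal-comma number (for example
    "1.234,5" with a thousands separator) are returned unchanged.
    """
    stripped = cell.strip()
    if DECIMAL_COMMA.fullmatch(stripped):
        return stripped.replace(",", ".")
    return cell


print([comma_to_dot(c) for c in ["3,14", "-0,5", "12", "1.234,5"]])
# ['3.14', '-0.5', '12', '1.234,5']
```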

Issue: I got an error regarding date formats while working with Excel...

Answer: The WeSIS standard is YYYY.MM.DD. When you open a file, Excel automatically converts anything it considers a date and displays it in your system's default format (on a "German" OS, this is usually DD.MM.YYYY).
Solution 1: Format the column as "text" to keep the WeSIS standard.
Solution 2: Use Libre or Open Office, especially for csv files.
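If Excel has already converted the dates, a small script can convert them back to the WeSIS standard. The following is a minimal Python sketch. It assumes the dates are now in DD.MM.YYYY, and the function name to_wesis_date is hypothetical.

```python
from datetime import datetime

WESIS_DATE_FORMAT = "%Y.%m.%d"  # YYYY.MM.DD, the WeSIS standard


def to_wesis_date(value, source_format="%d.%m.%Y"):
    """Convert a date string from source_format (default DD.MM.YYYY) to YYYY.MM.DD.

    Raises ValueError if the value does not match source_format, so that
    malformed dates are reported instead of being passed through silently.
    """
    return datetime.strptime(value.strip(), source_format).strftime(WESIS_DATE_FORMAT)


print(to_wesis_date("31.12.1999"))              # 1999.12.31
print(to_wesis_date("2000-01-05", "%Y-%m-%d"))  # 2000.01.05
```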

Issue: WeSIS gives me a CowCode/CountryName_mismatch error, but I am sure there is an issue with the entity list...

Solution: Get in touch with A01, particularly Nils, to discuss the issue.

Issue: WeSIS says there is a mismatch between a country name and the COW code...

Answer: WeSIS relies on the Correlates of War (COW) project, which defines both the country codes and the country names. The validation needs such a standard: checking both the name and the code prevents errors (!) and wrong assignments of data points.
Solution: Stick to the names provided in the entity list.
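Before uploading, you can compare your name/code pairs against the entity list yourself. The following is a minimal Python sketch. The three-entry ENTITY_LIST is a hypothetical excerpt for illustration only; the authoritative names are those in the WeSIS entity list. The sketch also assumes that names must match exactly, including case.

```python
# Hypothetical excerpt of the entity list (COW code -> country name).
# Take the real names from the WeSIS entity list.
ENTITY_LIST = {
    2: "United States of America",
    255: "Germany",
    920: "New Zealand",
}


def find_name_code_mismatches(pairs, entity_list=ENTITY_LIST):
    """Check (cow_code, country_name) pairs against the entity list.

    Returns (row_number, cow_code, country_name, problem) tuples, with rows
    numbered from 1. Names are compared exactly, including case.
    """
    problems = []
    for row_number, (code, name) in enumerate(pairs, start=1):
        expected = entity_list.get(code)
        if expected is None:
            problems.append((row_number, code, name, "code not in entity list"))
        elif name != expected:
            problems.append((row_number, code, name, f"expected {expected!r}"))
    return problems


pairs = [(920, "New Zealand"), (255, "Deutschland"), (999, "Atlantis")]
for problem in find_name_code_mismatches(pairs):
    print(problem)
```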

Issue: What about entities that were part of others in general? Or about "Yugoslavia" in particular?

Answer: COW has unique codes and defined relations for its entities. Thus, COW allows for, and WeSIS accepts, data for the entire time period, but you have to ensure that the data is attributable to the focal entity. Some fuzziness is unavoidable. For example, does the data point for 345 in 1968 refer to "core" Serbia as part of the federation or to "Yugoslavia" in general?
Solution: Use the optional "comment" column of the template, as well as WeSISpedia, to document the "proper" assignment. Consider providing two indicators, a "raw" and an "imputed" one, if it is (a) justifiable and (b) methodologically sensible to "impute" or "disaggregate" the data.

Steps Outline

A quick demonstration of the upload process (gif or video) will be added here. Alternatively, you can read the description below.


The upload process consists of two main steps.

Step 1: Upload dataset files

In this step, select your file or drag and drop it into the upload area, then press "SAVE".

Note: Large files may take a while to validate (about 2-5 minutes).

Step 2: Preview and Data Validation

In this step, you are presented with a file overview, where you can:

  1. Switch the file format between "monadic" and "dyadic" (however, as mentioned above, dyadic data is not saved yet).
  2. See all recognized mandatory columns, technical indicator names, optional columns, countries and year values fetched from the file by hovering over the appropriate "See full list" fields.
  3. Go back to the WeSIS home page by clicking the "Back to homepage" button.
  4. Access this upload guide by using the "Upload Guide" link on the bottom right.
  5. Open the file preview page, which shows the uploaded data with color-coded cells for different types of validation errors if present.
  6. Upload an updated file by clicking the "Reupload File" button, which takes you back to Step 1.

On the right, you can see the validation log output. You can also download either your file as a .csv with an additional column marking the rows with errors, or the logs themselves as a .txt file.

If the file passed all validation checks, the Parsing Logs box will be empty. Click on the "Upload" button to complete the upload process.

Uploading the validated data to the database may take a few hours, especially for files with more than 1000 rows. Please check the next day whether the indicator values are visible in WeSIS. If they are not, contact an A01 project member.

Error logs

At the moment the system performs the following types of error checks:

  1. Missing_columns - checks for mandatory columns missing in the uploaded file.
  2. Multiple_triples - validates that there is no more than one entry for each (cow_code, year, technical_variable_name) triple.
  3. Invalid_datatype - checks whether cells have the data type described in the File formats section, i.e. Numeric, String, Binary, Datetime or other.
  4. Unrecognized_values - checks whether the values in the columns country, cow_code, technical_indicator_name and scale exist in the database. If not, please update the appropriate WeSISpedia page.
  5. TechName/Scale_mismatch - checks whether the scale documented in WeSISpedia for each technical indicator agrees with the scale in the file.
  6. Scale/Value_mismatch - checks whether the scale chosen in the file agrees with the actual values.
  7. CowCode/CountryName_mismatch - checks whether the country_name and cow_code pairs agree with the pairs documented in WeSISpedia.
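The Missing_columns check is easy to run locally before uploading. The following is a minimal Python sketch. MANDATORY_COLUMNS is a hypothetical set assembled from column names mentioned in the list above; it is not the authoritative list, which is given in File formats. The file is read with the utf-8-sig codec. This prevents a byte-order mark, which Excel may write at the start of UTF-8 csv files, from being attached to the first column name.

```python
import csv

# Hypothetical mandatory columns; use the list from "File formats" instead.
MANDATORY_COLUMNS = {
    "country_name",
    "cow_code",
    "year",
    "technical_indicator_name",
    "scale",
    "value",
}


def missing_columns(header, mandatory=MANDATORY_COLUMNS):
    """Return the mandatory column names absent from header, sorted."""
    present = {name.strip() for name in header}
    return sorted(mandatory - present)


def missing_columns_in_file(path, mandatory=MANDATORY_COLUMNS):
    """Read only the header line of a csv file and check it."""
    # utf-8-sig strips a byte-order mark if present and also reads files without one.
    with open(path, newline="", encoding="utf-8-sig") as handle:
        header = next(csv.reader(handle), [])
    return missing_columns(header, mandatory)


print(missing_columns(["country_name", "cow_code", "year", "value"]))
# ['scale', 'technical_indicator_name']
```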

The logic used to validate data is shown in the flowchart below.

Figure: Flowchart of the validation logic (Mind Map.jpg)

File preview

There are two options when previewing the data. Each can be toggled on or off by pressing the appropriate buttons located above the table on the left.

  • "Show/Hide All Rows" allows the user to show/hide all rows from the uploaded file. By default, the table in the data preview shows only the rows with errors.
  • "Show/Hide Optional Columns" allows the user to show/hide non-mandatory columns if present.

Please keep in mind that the row numbering starts at the first row with data, so it may not exactly match the row numbers in your file.