How to encode data

The best way to integrate with the APPLICATE Data Portal at the data level is to serve data (observational or analysed/simulated) through the Open-source Project for a Network Data Access Protocol (OPeNDAP). OPeNDAP and the Network Common Data Format (NetCDF) both use the Common Data Model, which simplifies data handling. When serving data through OPeNDAP, data must be encoded according to the Climate and Forecast (CF) convention. Starting with version 1.6 of the CF conventions, standardised approaches are available for encoding:

  • gridded data
  • timeseries at stations
  • profiles at stations
  • trajectories
  • trajectories of profiles

In the upcoming 1.8 version, support for geometries (e.g. information that is currently handled in KML or Shapefiles) will be added, as well as better descriptions of satellite swath data. If datasets also follow the Attribute Convention for Dataset Discovery (ACDD), the APPLICATE Data Portal can generate discovery metadata directly from the data.
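As an illustration of what a CF-1.6 station time series with ACDD discovery attributes might contain, the sketch below describes the header of such a file as plain Python data and checks it for the four global attributes that ACDD-1.3 lists as highly recommended (`title`, `summary`, `keywords`, `Conventions`). The station, titles, time origin and the dictionary layout (`dims`/`attrs` per variable) are hypothetical; in practice the header would be written with a NetCDF library rather than kept as a dictionary.

```python
# Hypothetical header of a CF-1.6 station time series ("timeSeries" feature
# type) with ACDD-1.3 discovery attributes, kept as plain Python data so the
# structure is easy to read. Names, values and the time origin are examples only.
DATASET_HEADER = {
    "dimensions": {"station": 1, "name_strlen": 32, "time": None},  # None = unlimited
    "global_attributes": {
        "Conventions": "CF-1.6, ACDD-1.3",
        "featureType": "timeSeries",
        "title": "Hourly 2 m air temperature at an example Arctic station",
        "summary": "Hypothetical example used to illustrate CF/ACDD encoding.",
        "keywords": "air temperature, Arctic, station",
        "time_coverage_start": "2017-01-01T00:00:00Z",
    },
    "variables": {
        "station_name": {
            "dims": ("station", "name_strlen"),
            "attrs": {"cf_role": "timeseries_id", "long_name": "station name"},
        },
        "time": {
            "dims": ("time",),
            "attrs": {"standard_name": "time",
                      "units": "hours since 2017-01-01 00:00:00",
                      "calendar": "standard"},
        },
        "lat": {"dims": ("station",),
                "attrs": {"standard_name": "latitude", "units": "degrees_north"}},
        "lon": {"dims": ("station",),
                "attrs": {"standard_name": "longitude", "units": "degrees_east"}},
        "air_temperature": {
            "dims": ("station", "time"),
            "attrs": {"standard_name": "air_temperature", "units": "K",
                      "coordinates": "time lat lon station_name"},
        },
    },
}

# ACDD-1.3 lists these four global attributes as "highly recommended".
ACDD_HIGHLY_RECOMMENDED = ("title", "summary", "keywords", "Conventions")


def missing_acdd_attributes(global_attributes):
    """Return the highly recommended ACDD attributes that are absent or blank."""
    return [name for name in ACDD_HIGHLY_RECOMMENDED
            if not str(global_attributes.get(name, "")).strip()]


def timeseries_id_variables(header):
    """Return variables marked with cf_role = "timeseries_id" (CF discrete sampling geometries)."""
    return [name for name, var in header["variables"].items()
            if var["attrs"].get("cf_role") == "timeseries_id"]
```

For the example header, `missing_acdd_attributes` returns an empty list and `timeseries_id_variables` returns `["station_name"]`, the variable that identifies each station in a CF timeSeries dataset.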

In this context it is worth noting that APPLICATE is more concerned with how data are served than with how they are stored. Data might be stored as NetCDF/CF files, but also as WMO GRIB or in a relational database, provided they can be served as NetCDF/CF objects through OPeNDAP. The application server serving the data performs the transformation from the storage format to a NetCDF/CF-compliant OPeNDAP object. Data served directly by the APPLICATE Data Portal must be encoded in NetCDF/CF. A number of open source tools facilitating data sharing using OPeNDAP are available; some are listed below:

If your local data centre does not support this, data should be submitted to the APPLICATE data management system. Further information on how to do this is given on a separate page.

Concerning the formatting of data, NCAR/UCAR has created a useful overview of various file formats for climate data, as well as samples of NetCDF encoding according to the Climate and Forecast convention. NetCDF is well supported in Fortran, C, Python, Perl, Matlab, R, etc. CMOR files are sometimes mentioned in this context: the Climate Model Output Rewriter (CMOR) is software that helps users rewrite existing data into NetCDF files compliant with the Climate and Forecast convention. The APPLICATE Data Portal also has software that can convert back and forth between GRIB and NetCDF/CF, although this has not been tested on all flavours of WMO GRIB.

A number of compliance checkers are available for checking your data.

NetCDF/CF checkers:

Please remember to check against CF version 1.6 or higher.
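A quick pre-check, before running a full compliance checker, is to confirm that the `Conventions` global attribute declares CF 1.6 or later. The sketch below assumes the attribute is written as a comma- or space-separated list such as `CF-1.6, ACDD-1.3`; the function names are hypothetical, and the check only reads the declared version, it does not test whether the file actually complies.

```python
import re


def cf_version(conventions):
    """Extract the CF version from a Conventions attribute, e.g. "CF-1.6, ACDD-1.3".

    Returns a tuple of integers such as (1, 6), or None if no CF entry is present.
    """
    match = re.search(r"\bCF-(\d+)\.(\d+)", conventions)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def declares_minimum_cf(conventions, minimum=(1, 6)):
    """True if the declared CF version is at least `minimum` (default CF-1.6)."""
    version = cf_version(conventions)
    # Compare integer tuples, not strings: as strings "1.10" sorts before "1.6".
    return version is not None and version >= minimum
```

Comparing versions as integer tuples keeps later releases such as `CF-1.10` correctly ranked above `CF-1.6`, which a plain string comparison would get wrong.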

If there is demand, a compliance checker can be included in the APPLICATE Data Portal or the Post processing Environment. For now, however, it is recommended to use one of the publicly available ones.

The APPLICATE Data Portal team will support partners in encoding information as NetCDF/CF, not by doing the conversion but by guiding the data producers.

Important issues to remember

The APPLICATE Data Portal supports visualisation of datasets using the OGC Web Map Service (WMS). For this to work properly, and to be able to show the temporal evolution of a dataset, datasets must be composed of physical files that the OPeNDAP server can aggregate and serve as an aggregated dataset through OGC WMS. This implies that files to be aggregated cannot change structure through time; that is, the dimensions, units, standard names, etc. of all variables must remain the same for the full dataset. Variables that do not appear in all files will not be aggregated. Whether data are put in a few large files or many small files is left to the data producer to decide. The CMOR software creates one file per variable and justifies this as follows:

Each file contains a single output variable (along with coordinate variables, attributes and other metadata) from a single model and a single simulation (i.e., from a single ensemble member of a single climate experiment). This method of structuring model output efficiently serves the needs of most researchers who are typically interested in only a few of the many variables in the MIP databases. Data requests can be satisfied by simply sending the appropriate file(s) without first extracting the individual field(s) of interest.

Splitting data into one file per variable is not necessary as long as data are served through OPeNDAP. The APPLICATE Data Portal lets the consumer select the parameters wanted for download. This is the "transform" option on a dataset, which exploits the subsetting functionality of OPeNDAP.
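The aggregation requirement and the OPeNDAP-based selection of parameters can both be sketched in Python. In this sketch each file header is assumed to be a dictionary with a `variables` entry mapping variable names to their `dims` and `attrs`; this layout and all function names are hypothetical. `structure_differences` reports variables whose dimensions, `units` or `standard_name` differ from the first file, or that are missing or extra. `opendap_subset_url` builds a DAP2 request for a single variable using `[start:stride:stop]` hyperslabs, where `stop` is inclusive.

```python
def structure_signature(header):
    """Reduce a file header to what must stay constant across aggregated files:
    per variable, its dimension names, units and standard_name."""
    signature = {}
    for name, var in header["variables"].items():
        attrs = var.get("attrs", {})
        signature[name] = (tuple(var["dims"]), attrs.get("units"),
                           attrs.get("standard_name"))
    return signature


def structure_differences(headers):
    """Compare each header with the first one and list the differences found."""
    reference = structure_signature(headers[0])
    problems = []
    for index, header in enumerate(headers[1:], start=1):
        current = structure_signature(header)
        for name in sorted(set(reference) | set(current)):
            if name not in current:
                problems.append(f"file {index}: variable {name!r} missing")
            elif name not in reference:
                problems.append(f"file {index}: extra variable {name!r}")
            elif current[name] != reference[name]:
                problems.append(f"file {index}: variable {name!r} changed "
                                f"{reference[name]} -> {current[name]}")
    return problems


def opendap_subset_url(dataset_url, variable, slices, response=".ascii"):
    """Build a DAP2 request for one variable.

    `slices` holds one (start, stride, stop) triple per dimension; in DAP2
    hyperslab syntax the stop index is inclusive.
    """
    hyperslab = "".join(f"[{start}:{stride}:{stop}]" for start, stride, stop in slices)
    return f"{dataset_url}{response}?{variable}{hyperslab}"
```

With a hypothetical dataset URL, `opendap_subset_url("https://example.org/opendap/station_t2m.nc", "air_temperature", [(0, 1, 0), (0, 1, 23)])` returns `https://example.org/opendap/station_t2m.nc.ascii?air_temperature[0:1:0][0:1:23]`, i.e. the first 24 time steps of the first station. Some server configurations may require the square brackets to be percent-encoded (`%5B`, `%5D`).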

Summary

  1. Encode your data as NetCDF following the Climate and Forecast (CF) convention for use metadata.
  2. Add Attribute Convention for Dataset Discovery (ACDD) global attributes to provide discovery metadata.
  3. Check your files using a compliance checker.
  4. If in doubt, contact the APPLICATE Data Portal team for guidance and support.