NARCliM data processing, testing and validation

Key points

  • The NARCliM project produces an extensive climate dataset. Most of it is raw data, which are used primarily by the NARCliM team and technical stakeholders.
  • The post-processed data are more accessible to end-users and underpin the NARCliM information products such as the regional climate change snapshots.
  • For some variables, such as temperature and rainfall, NARCliM develops bias corrected datasets after the model simulations have run, so that the data align more closely with past measurements.
  • All NARCliM data undergo rigorous testing and validation to ensure the modelling outputs are complete.

Processing the data

NARCliM produces data for a vast time period which includes both future and past climates (see NARCliM modelling methodology).

For each generation, the NARCliM project produces climate data in 3 forms:

  • Raw data, which are the projections and hindcasts directly produced by the modelling process. The raw NARCliM datasets are exceptionally large. For example, the complete NARCliM2.0 dataset is approximately 15 petabytes, which is too large for most users to interact with on standard desktop computers.
  • Post-processed data, which summarise the raw data into time-based (temporal) categories. The post-processed data are more accessible to end-users and underpin NARCliM products such as the regional climate change snapshots.
  • Bias corrected data, which have been through a process of reducing bias by comparing modelled data for a historical time period with observed climate records, and then correcting the modelled data until they align as closely as possible with these measurements. The bias correction process occurs after the post-processing of climate model outputs.

Raw data

Following comprehensive research and development, NARCliM data are first delivered as raw outputs. At approximately 15 petabytes, NARCliM2.0 is currently the single biggest raw dataset in Australia. This amount of data is too large for most purposes so, before release, the raw data for each generation are processed to make them more manageable.

Post-processed data

Most end-users interact with post-processed NARCliM data, which are grouped into manageable chunks for further use. The raw NARCliM data are summarised at several temporal resolutions (frequencies): sub-daily (1 hour, 3 hour, 6 hour), daily, monthly and seasonal averages. Depending on the generation, daily, monthly and seasonal averages are available via the NSW Climate Data Portal.
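The temporal summarising described above can be sketched in a few lines. This is a minimal illustration only, not the NARCliM processing code: the function names, the season labels and the placeholder daily values are assumptions chosen so the results are easy to check by hand.

```python
from collections import defaultdict
from datetime import date, timedelta
from statistics import mean

# Assumed mapping of calendar months to three-month seasons.
SEASON_BY_MONTH = {
    12: "DJF", 1: "DJF", 2: "DJF",
    3: "MAM", 4: "MAM", 5: "MAM",
    6: "JJA", 7: "JJA", 8: "JJA",
    9: "SON", 10: "SON", 11: "SON",
}

def monthly_means(series):
    """Average daily values within each (year, month)."""
    groups = defaultdict(list)
    for day, value in series:
        groups[(day.year, day.month)].append(value)
    return {key: mean(values) for key, values in groups.items()}

def seasonal_means(series):
    """Average daily values within each (season year, season)."""
    groups = defaultdict(list)
    for day, value in series:
        # December is counted with the January and February that follow it.
        season_year = day.year + 1 if day.month == 12 else day.year
        groups[(season_year, SEASON_BY_MONTH[day.month])].append(value)
    return {key: mean(values) for key, values in groups.items()}

# Placeholder daily series for 1 January to 31 March 2020 (91 days).
# Each value is 10 x the month number, purely so the averages are obvious.
days = [date(2020, 1, 1) + timedelta(days=i) for i in range(91)]
daily = [(d, 10.0 * d.month) for d in days]

monthly = monthly_means(daily)
seasonal = seasonal_means(daily)
print(monthly[(2020, 2)])        # 20.0
print(seasonal[(2020, "MAM")])   # 30.0
```

Note that the seasonal average is taken over all days in the season, so months with more days carry slightly more weight than they would in an average of monthly means.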

NARCliM2.0 provides post-processed data for approximately 150 climatic variables, including core variables such as rainfall, temperature, humidity, air pressure and wind. NARCliM1.0 and NARCliM1.5 variables are listed in Table 6 in the NARCliM1.5 Technical Methods Report; a selection of these are publicly available through the NSW Climate Data Portal.

Post-processing is also conducted to combine the outputs of multiple GCMs and RCMs into a multi-model average. Scientific research assessing climate models has shown that using the multi-model average gives a better representation of Earth’s climate. This approach has been widely adopted by the Intergovernmental Panel on Climate Change and others. It also reduces the influence of model bias (see below) on conclusions about current and future climates.
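As a rough illustration of a multi-model average, the sketch below averages the same quantity across several model runs, grid cell by grid cell. The run names and values are hypothetical, and an unweighted mean is an assumption; the source does not state how NARCliM combines its GCM and RCM outputs.

```python
from statistics import mean

# Hypothetical projected temperature change (degrees C) at three grid cells
# from three GCM-RCM combinations. Names and values are illustrative only.
projections = {
    "gcm_a_rcm_1": [1.8, 2.1, 2.4],
    "gcm_a_rcm_2": [2.0, 2.3, 2.2],
    "gcm_b_rcm_1": [1.6, 1.9, 2.6],
}

def multi_model_mean(runs):
    """Return the unweighted mean across runs for each grid cell."""
    per_cell = zip(*runs.values())  # regroup values by grid cell
    return [mean(cell_values) for cell_values in per_cell]

ensemble_mean = multi_model_mean(projections)
print([round(v, 2) for v in ensemble_mean])  # [1.8, 2.1, 2.4]
```

Because each run's errors differ, averaging tends to cancel part of them, which is the sense in which the multi-model average reduces the influence of any single model's bias.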

Bias corrected data

Climate models provide projections of future climate changes that are governed by the fundamental principles of the climate system. A benefit of this approach is that future changes in all climate variables (such as temperature or rainfall) are simulated together in a physically consistent way. However, a drawback is that the models often display biases (or systematic errors) in their modelled climates.

These biases can make it difficult to use the data for any impact assessment that is sensitive to non-linear changes in climate – for example, where temperature thresholds are more frequently exceeded.
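A small worked example shows why threshold-based assessments are sensitive to bias. The temperatures, the 35 degree threshold and the 0.5 degree warm bias below are all hypothetical values chosen for illustration.

```python
# Hypothetical observed daily maximum temperatures (degrees C) for ten days.
observed = [31.0, 33.5, 34.2, 34.8, 35.1, 36.0, 32.4, 34.6, 33.9, 34.9]

WARM_BIAS = 0.5  # assumed systematic model error, degrees C
modelled = [t + WARM_BIAS for t in observed]

def days_above(temps, threshold=35.0):
    """Count the days on which the temperature exceeds the threshold."""
    return sum(1 for t in temps if t > threshold)

print(days_above(observed))  # 2
print(days_above(modelled))  # 5
```

Here a bias of only half a degree in every value more than doubles the number of days counted above the threshold, because many days sit just below it.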

The bias correction process involves comparing climate model data with observed climate records for a historical time period and adjusting the model data until they are sufficiently aligned with the observations. For technical guidance on using bias-corrected NARCliM1.0 data, see NARCliM Technical Note 3. The same bias-correction process was used for NARCliM1.5 and is explained in the NARCliM1.5 Technical Methods Report.
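To make the idea concrete, the sketch below applies the simplest possible form of bias correction: estimate the mean difference between modelled and observed values over a historical period, then subtract it from the model output. This additive mean correction is an assumption used for illustration; it is not the NARCliM method, which is documented in the reports cited above. All function names and values are hypothetical.

```python
from statistics import mean

def fit_mean_bias(model_hist, obs_hist):
    """Estimate the systematic error as the difference in historical means."""
    return mean(model_hist) - mean(obs_hist)

def apply_correction(series, bias):
    """Remove the estimated bias from every value in a model series."""
    return [value - bias for value in series]

# Hypothetical temperatures (degrees C) for the same historical period.
obs_hist = [20.0, 22.0, 24.0]
model_hist = [21.5, 23.5, 25.5]    # the model runs 1.5 degrees too warm

# Hypothetical model output for a future period.
model_future = [23.5, 25.5, 27.5]

bias = fit_mean_bias(model_hist, obs_hist)
corrected_future = apply_correction(model_future, bias)
print(bias)              # 1.5
print(corrected_future)  # [22.0, 24.0, 26.0]
```

The correction is fitted only on the historical period, where observations exist, and then carried forward to the projections. The projected change between periods (2.0 degrees in this example) is left untouched.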

For each NARCliM generation, this bias corrected dataset has been developed for only a subset of outputs: daily precipitation, and maximum and minimum daily temperature. These variables are the most widely used in exploration, model evaluation and the derivation of metrics to characterise some physical climate hazards, such as extreme rain.

Data testing and validation

NARCliM data undergo a rigorous quality control process and scientific technical peer review. Models are tested and validated by running a ‘reanalysis simulation’ for a historical period on the same NARCliM grid (geographical area) and comparing the results with observed climate records. The more closely the model data align with the observed data, the better the model performance.
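How closely model data align with observations can be quantified in several ways. The sketch below uses two common measures, mean bias and root-mean-square error (RMSE); the source does not say which statistics the NARCliM evaluation relies on, so treat these as assumed examples. The values are hypothetical.

```python
import math

def mean_bias(model, obs):
    """Average signed difference between model and observations."""
    return sum(m - o for m, o in zip(model, obs)) / len(obs)

def rmse(model, obs):
    """Root-mean-square error: typical size of the model-observation gap."""
    return math.sqrt(sum((m - o) ** 2 for m, o in zip(model, obs)) / len(obs))

# Hypothetical observed values and two model simulations of the same period.
obs = [10.0, 12.0, 14.0, 16.0]
model_a = [11.0, 13.0, 15.0, 17.0]   # consistently 1 degree too warm
model_b = [10.0, 12.0, 14.0, 20.0]   # accurate except for one large error

print(mean_bias(model_a, obs), rmse(model_a, obs))  # 1.0 1.0
print(mean_bias(model_b, obs), rmse(model_b, obs))  # 1.0 2.0
```

Both simulations have the same mean bias, but the RMSE separates them, which is why an evaluation usually looks at more than one statistic.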

Quality assurance for NARCliM data has 2 parts.

  • Technical quality assurance – for each variable of both raw and post-processed data, output files are assessed for file appearance, metadata information and other technical system checks. This ensures that individual data files are not corrupt and include all necessary metadata, and that no data loss has occurred during processing or during file transfer between file storage systems.
  • Scientific quality assurance – for NARCliM2.0 data, the key and most frequently accessed variables (primarily surface temperature and rainfall) are individually evaluated as they become available. In addition, all NARCliM2.0 outputs undergo a more sophisticated, semi-automated approach. This considers the statistics of each variable to flag potentially unreasonable results, which DCCEEW climate scientists then assess on a case-by-case basis.
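The two kinds of check can be sketched as follows. This is a simplified illustration under stated assumptions, not the NARCliM quality assurance system: the required metadata keys, the record layout and the z-score rule for flagging values are all hypothetical.

```python
from statistics import mean, stdev

REQUIRED_METADATA = ("variable", "units", "source")  # hypothetical keys

def technical_issues(record, expected_length):
    """List technical problems: missing metadata, missing or lost values."""
    issues = [
        f"missing metadata: {key}"
        for key in REQUIRED_METADATA
        if key not in record["metadata"]
    ]
    if len(record["values"]) != expected_length:
        issues.append("unexpected number of values")
    if any(value is None for value in record["values"]):
        issues.append("missing values")
    return issues

def flag_outliers(values, z_limit=3.0):
    """Return indices of values far from the mean, for manual review."""
    mu, sigma = mean(values), stdev(values)
    if sigma == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > z_limit]

# Hypothetical record: 20 daily maximum temperatures with one suspect value
# and no "units" entry in the metadata.
record = {
    "metadata": {"variable": "tasmax", "source": "example run"},
    "values": [20.0] * 19 + [60.0],
}

print(technical_issues(record, expected_length=20))  # ['missing metadata: units']
print(flag_outliers(record["values"]))               # [19]
```

The automated step only flags candidates. As in the process described above, deciding whether a flagged value is a genuine extreme or an error is left to a scientist.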