CESM Coupled Data Assimilation

This page documents our effort to run CESM in several configurations in order to test concepts relevant to coupled data assimilation, including:

  • forced ocean-ice experiments, and

  • CESM B-compsets with active CAM6, CLM, CICE6 and POP2 components.

Compiling

Since DART is designed to minimize dependencies and maximize cross-platform compatibility, compiling DART on Cheyenne is trivial.

cd /glade/work/$USER
git clone https://github.com/NCAR/DART

Since Cheyenne is a Linux system with Intel processors, there is already a mkmf.template that works. Move it into place.

cd DART/build_templates
mv mkmf.template.intel.linux mkmf.template
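
The template assumes a compiler and netCDF installation are available in the environment. A quick sanity check before building (the module names below are the Cheyenne defaults and may differ on other systems):

module load intel netcdf
grep -i netcdf mkmf.template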

Build the lorenz_63 model to ensure the setup is correct.

cd ../models/lorenz_63/work
./quickbuild.csh
[...]
Success: All single task DART programs compiled.
Script is exiting after building the serial versions of the DART programs.

Troubleshooting

Problems can arise if the DART executables weren’t built with the same compilers that were used to compile the netCDF library.

vim /glade/scratch/damrhein/GNYF.f09_g17_e80/run/da.log.2810502.chadmin1.ib0.cheyenne.ucar.edu.220209-231033

Error

Wed Feb 9 23:21:33 MST 2022 – BEGIN FILTER
/glade/scratch/damrhein/GNYF.f09_g17_e80/bld/filter: error while loading shared libraries: libhwloc.so.15: cannot open shared object file: No such file or directory
MPT ERROR: could not run executable.
HPE MPT 2.19 02/23/19 05:31:12
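
Before rebuilding, it can help to confirm which shared library is unresolved and which compiler/MPI modules are currently loaded. A sketch, using the executable path from the log above:

ldd /glade/scratch/damrhein/GNYF.f09_g17_e80/bld/filter | grep 'not found'
module list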

Rebuilding the DART executables

As a first attempt at troubleshooting, try rebuilding the DART executables.

cd /glade/work/johnsonb/git/DAN_DART_2021-11-17/models/POP/work
./quickbuild.csh
./setup_CESM_hybrid_ensemble.csh
Building cesm with output to /glade/scratch/johnsonb/GNYF.f09_g17_e3/bld/cesm.bldlog.220214-161400
Time spent not building: 1286.737632 sec
Time spent building: 1622.594832 sec
MODEL BUILD HAS FINISHED SUCCESSFULLY
cd /glade/work/johnsonb/cases/GNYF.f09_g17_e3
./case.submit -M begin,end
[...]
Wait for run to finish
[...]
./CESM_DART_config.csh
./case.submit -M begin,end
[...]
Wait for run to finish
[...]
vim /glade/scratch/johnsonb/GNYF.f09_g17_e3/run/GNYF.f09_g17_e3.pop.dart_log.2010-01-04-00000.out

This rebuilt case ran successfully. Check to see if the DART state output is present.

cd /glade/scratch/johnsonb/GNYF.f09_g17_e3/run
ls *prior*
GNYF.f09_g17_e3.pop.output_priorinf_mean.2010-01-04-00000.nc
GNYF.f09_g17_e3.pop.output_priorinf_sd.2010-01-04-00000.nc
GNYF.f09_g17_e3.pop.preassim_priorinf_mean.2010-01-04-00000.nc
GNYF.f09_g17_e3.pop.preassim_priorinf_sd.2010-01-04-00000.nc
input_priorinf_mean.nc
input_priorinf_sd.nc
ls *output*
GNYF.f09_g17_e3.pop.output_mean.2010-01-04-00000.nc
GNYF.f09_g17_e3.pop.output_priorinf_mean.2010-01-04-00000.nc
GNYF.f09_g17_e3.pop.output_priorinf_sd.2010-01-04-00000.nc
GNYF.f09_g17_e3.pop.output_sd.2010-01-04-00000.nc

This output matches the stages_to_write entry in input.nml.
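
For reference, the relevant filter_nml entries can be checked directly. The values below are a sketch of settings consistent with the files listed above, not necessarily the exact namelist that was used:

grep -A 2 stages_to_write input.nml
   stages_to_write  = 'preassim', 'output'
   output_mean      = .true.
   output_sd        = .true.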

Getting started

Community Earth System Model (CESM) installation instructions are available via the README on the GitHub repository. The cube sphere grid is available as of CESM2.2.0.

Cloning and installing

Important

CESM has already been ported and should work “out of the box” on most of the supercomputers that are widely used in the geosciences community, including Pleiades. When compiling the model, be sure to set the machine command line option, --mach, to match the supercomputer you are working on.

cd <installation_directory>
git clone https://github.com/ESCOMP/CESM.git cesm2_1_3
cd cesm2_1_3
git checkout release-cesm2.1.3
./manage_externals/checkout_externals

Commonly used grids

The CESM documentation includes a comprehensive list of grids. Typically, grids are given descriptive long names when they are defined, such as fv0.9x1.25, the atmospheric finite volume ~1° grid, or gx1v7, the seventh version of the oceanic displaced Greenland pole ~1° grid.

These long names are shortened when the atmospheric/land grids are coupled to the ocean/sea ice grid. Instead of fv0.9x1.25 and gx1v7, the shortened name becomes f09_g17.
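
To confirm what a short alias expands to on your checkout, CIME’s query_config tool can list the grid definitions (the exact output format varies between CESM versions):

cd <installation_directory>/cesm2_1_3/cime/scripts
./query_config --grids | grep -A 5 f09_g17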

Atmospheric grids

The workhorse atmospheric grid is the ~1° finite-volume f09 grid, which is used for CMIP experiments and the Large Ensemble. Other grids used for iHESP are the f05 ~0.5° finite volume grid and the f02 ~0.25° finite volume grid.

Cube sphere

As of CESM2.2.0, CAM supports a spectral element dynamical core (CAM-SE) on cube-sphere grids. Lauritzen et al. (2018) 1 list the available cube-sphere grids in their Table 1. A subset of their table is reproduced here.

Grid name   Average node spacing   Model timestep
ne16np4     ∼208 km                1,800 s
ne30np4     ∼111 km                1,800 s
ne60np4     ∼56 km                 900 s
ne120np4    ∼28 km                 450 s
ne240np4    ∼14 km                 225 s

The analog to the ~1.0° f09 finite volume grid is the ne30 ~1.0° spectral element grid.

Note

In addition to supporting the spectral element dynamical core on these grids, CAM also supports GFDL’s FV3 dynamical core on the ~1.0° C96 grid. For more information, see the CAM developmental compsets documentation.

Oceanic grids

Low-resolution

The workhorse oceanic grid is the ~1° displaced Greenland pole grid. CESM2.1.3 is compatible with two configurations of it: gx1v6 (g16) and gx1v7 (g17). The grids are identical except that in g17 the Caspian Sea has been removed from the ocean/sea ice domain and inserted into the land domain.

[Figures: the g16 and g17 grids]

High-resolution

The eddy-resolving grid is the ~0.1° Poseidon Tripole grid. Again, just as with the low-resolution grid, CESM2.1.3 is compatible with two configurations of it: tx0.1v2 (t12) and tx0.1v3 (t13). The grids are identical except that in t13 the Caspian Sea has been removed from the ocean/sea ice domain and inserted into the land domain.

[Figures: the t12 and t13 grids]

Building a case

The scripts for building cases within CESM are part of a software collection known as the Common Infrastructure for Modeling the Earth (CIME). This software supports both NCAR models and those developed within the Department of Energy’s Energy Exascale Earth System Model (E3SM) collection. Thus the build scripts to create a new case are contained within the cime subdirectory.

cd <installation_directory>/cesm2_1_3/cime/scripts
ls
create_clone    create_test        fortran_unit_testing  query_config     tests
create_newcase  data_assimilation  lib                   query_testlists  Tools

The create_newcase script is invoked and passed command line arguments to build a new case.

--case
    The directory in which the case will be built. It is common practice to include the experiment’s grid resolution and component set (described below) in the case name so that these aspects can be identified easily when browsing the file system later.

--compset
    The component set of the experiment, including which models will be actively integrating (atmosphere, land, ocean, sea ice) and what boundary forcing will be used. CESM has an extensive list of component set definitions. These instructions use the FHIST compset, which has an active atmospheric component, the Community Atmosphere Model version 6, and historical sea surface forcing starting in 1979.

--res
    The grid resolution the model will run on. Each grid includes at least two parts: the atmospheric/land grid and the ocean/sea ice grid to which it is coupled. These instructions use a low-resolution finite volume grid for the atmosphere, fv0.9x1.25, and couple it to a ~1° ocean/sea ice grid, gx1v7. These grid names are truncated to f09_g17. Again, CESM has an extensive list of available grids.

--mach
    The supercomputer the case will be built on. These instructions build a case on NCAR’s Cheyenne computer; if you are building on Pleiades, consult the table in the note below.

--project
    The account code the project will be run under. When jobs from the experiment are run, the specified account will automatically be debited. Replace PXXXXXXXX with your project code.

--run-unsupported
    Since the cube-sphere grid is a newly released aspect of CESM that is not used in Coupled Model Intercomparison Project runs, it is not yet considered a scientifically supported grid. In order to use it, you need to append this option.

Note

If you are building on Pleiades, the core layout per node differs based on which nodes you are using. These differences are already accounted for within CESM. When specifying --mach, there are four valid options:

Compute node processor   Corresponding --mach option
Broadwell                pleiades-bro
Haswell                  pleiades-has
Ivy Bridge               pleiades-ivy
Sandy Bridge             pleiades-san
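
For example, to create the same FHIST case shown below on Pleiades Broadwell nodes, only the machine argument changes (the case path and project code here are placeholders):

./create_newcase --case /nobackup/$USER/cases/FHIST.cesm2_1_3.f09_g17.001 --compset FHIST --res f09_g17 --mach pleiades-bro --project PXXXXXXXX --run-unsupported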

To build a case using the ~1° f09 finite volume grid:

./create_newcase --case /glade/work/johnsonb/cesm_runs/FHIST.cesm2_1_3.f09_g17.001 --compset FHIST --res f09_g17 --mach cheyenne --project PXXXXXXXX --run-unsupported
[...]
Creating Case directory /glade/work/johnsonb/cesm_runs/FHIST.cesm2_1_3.f09_g17.001

The case directory has successfully been created. Change to the case directory and set up the case.

cd /glade/work/johnsonb/cesm_runs/FHIST.cesm2_1_3.f09_g17.001
./case.setup

The case.setup script scaffolds out the case directory, creating the Buildconf and CaseDocs directories that you can customize. These instructions use the default configurations and continue on to compiling the model. On machines that don’t throttle CPU usage on the login nodes, the case.build command can be invoked directly. On Cheyenne, however, CPU-intensive activities are killed on the login nodes, so you will need to use a build wrapper to build the model on a shared compute node and specify a project code. Again, replace PXXXXXXXX with your project code.

qcmd -q share -l select=1 -A PXXXXXXXX -- ./case.build

The model build should progress for several minutes. If it compiles properly, a success message should be printed.

Time spent not building: 6.320388 sec
Time spent building: 603.685347 sec
MODEL BUILD HAS FINISHED SUCCESSFULLY

The model is actually built and run in a user’s scratch space.

/glade/scratch/johnsonb/FHIST.cesm2_1_3.f09_g17.001/bld/cesm.exe

Submitting a job

To submit a job, change to the case directory and use the case.submit script. The -M begin,end option sends the user an email when the job starts and stops running.

When the case is built, its default configuration is to run for five model days. This setting can be changed to run for a single model day using ./xmlchange STOP_N=1.

cd /glade/work/johnsonb/cesm_runs/FHIST.cesm2_1_3.f09_g17.001
./xmlchange STOP_N=1
./case.submit -M begin,end
[...]
Submitted job id is 2658061.chadmin1.ib0.cheyenne.ucar.edu
Submitted job case.run with id 2658060.chadmin1.ib0.cheyenne.ucar.edu
Submitted job case.st_archive with id 2658061.chadmin1.ib0.cheyenne.ucar.edu

Restart file

After the job completes, restart files are written to the run directory, which is also in scratch space. Restart files are written for both active and data components. The CAM restart file contains a cam.r substring. By default, the FHIST case begins on January 1st, 1979, so the restart file will be for January 2nd, 1979.

/glade/scratch/johnsonb/FHIST.cesm2_1_3.f09_g17.001/run/FHIST.cesm2_1_3.f09_g17.001.cam.r.1979-01-02-00000.nc

The fields in the restart file can be plotted using various languages such as MATLAB or Python’s matplotlib.
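
Before plotting, it can help to list the variables the restart file contains. A minimal check with ncdump (on Cheyenne, ncdump is provided by the netcdf module; module names may differ elsewhere):

module load netcdf
ncdump -h /glade/scratch/johnsonb/FHIST.cesm2_1_3.f09_g17.001/run/FHIST.cesm2_1_3.f09_g17.001.cam.r.1979-01-02-00000.nc | less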

References

1

Lauritzen, P. H., and Coauthors, 2018: NCAR Release of CAM-SE in CESM2.0: A Reformulation of the Spectral Element Dynamical Core in Dry-Mass Vertical Coordinates With Comprehensive Treatment of Condensates and Energy. Journal of Advances in Modeling Earth Systems, 10, 1537–1570, https://doi.org/10.1029/2017MS001257.

Compsets

Overview

CESM’s component models can be run in a variety of combinations, ranging from all components being active to all components being off. A given set of component configurations is known as a “component set” or “compset” for short.

Similar compsets are grouped using aliases. The CESM2 documentation provides a table showing which components are active.

Coupled data assimilation experiments typically use compsets with aliases beginning with B, since the atmosphere, land, sea ice and ocean components are all active in these compsets.

The CESM website provides a comprehensive list of all available compsets.
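
The same information can be queried from a CESM checkout with CIME’s query_config; a sketch (the allactive component argument selects the fully coupled compset definitions):

cd <installation_directory>/cesm2_1_3/cime/scripts
./query_config --compsets allactive | grep B1850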

Data assimilation cycles

The DATA_ASSIMILATION_CYCLES setting within CESM denotes how many model integration, filter and inflation cycles will be attempted within a single CESM job submission. This is a typical setting for CAM:

STOP_N: 6
STOP_OPTION: nhours
DATA_ASSIMILATION_CYCLES: 4

Within a single CESM ./case.submit job, the model will run for 24 hours with filter and inflation running at 6-hour intervals.

This is a typical setting for POP:

STOP_N: 1
STOP_OPTION: ndays
DATA_ASSIMILATION_CYCLES: 5

So within a single CESM ./case.submit job, the model will run for 5 days with filter and inflation running daily.

Note

The CESM RESUBMIT setting is distinct from DATA_ASSIMILATION_CYCLES. The value that RESUBMIT is set to denotes how many times ./case.submit will be invoked. Each submitted job must be completed within the period denoted by JOB_WALLCLOCK_TIME.
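
Both groups of settings live in env_run.xml and can be inspected or changed from the case directory with CIME’s XML tools. A sketch using the POP values above (the case path and the RESUBMIT value are placeholders):

cd /glade/work/${USER}/cases/<your_case>
./xmlquery STOP_N,STOP_OPTION,DATA_ASSIMILATION_CYCLES,RESUBMIT
./xmlchange STOP_N=1,STOP_OPTION=ndays,DATA_ASSIMILATION_CYCLES=5
./xmlchange RESUBMIT=3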

Modifying setup scripts for NUOPC

Note

It isn’t actually necessary to use a setup script to create a multi-instance CESM case to test NUOPC functionality. CIME’s ./create_newcase script can be invoked with the --ninst option instead.
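
If you want to try that route, a minimal sketch of creating a three-instance case directly with --ninst (the compset, case path, and project code are illustrative placeholders):

cd <cesm_root>/cime/scripts
./create_newcase --case /glade/work/${USER}/cases/B1850.f09_g17.ninst3 --compset B1850 --res f09_g17 --ninst 3 --mach cheyenne --project PXXXXXXXX --run-unsupported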

Overview

The setup scripts for coupled CESM runs in the DART 9.X.X releases are already Manhattan compliant, however they invoke CESM utilities that are no longer present in CESM2.0.

Upgrading these setup scripts to test NUOPC functionality involves removing references to CESM1 utilities and replacing them with the analogous CIME utilities that CESM2 uses.

Example

For example, setting up CESM1 made heavy use of environment variables that were accessed using the Tools/ccsm_getenv utility.

The analogous functionality in CESM2 uses CIME’s xmlquery utility, since many of these variables are now stored in XML configuration files rather than environment variables.
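
A side-by-side sketch of the two patterns (the csh syntax mirrors the existing setup scripts; CASEROOT is assumed to point at the case directory):

# CESM1-era csh setup scripts sourced the environment variables:
#   source $CASEROOT/Tools/ccsm_getenv
#   echo $RUNDIR
# The CESM2/CIME replacement queries the XML configuration directly:
cd $CASEROOT
./xmlquery RUNDIR --value
set rundir = `./xmlquery RUNDIR --value`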

Tractable path

To learn which aspects of the setup scripts need to be modified, you can diff the existing cam-fv and pop scripts, since both CESM1 and CESM2 versions of these scripts are contained within the DART repository.

You should begin by modifying the CESM perfect model obs setup script in DART/models/CESM/shell_scripts/CESM1_1_1_setup_pmo.

POP example

cd DART/models/pop/shell_scripts
diff cesm1_x/CESM1_1_1_setup_pmo cesm2_1/setup_CESM_perfect_model.csh

CAM example

cd DART/models/cam-fv/shell_scripts
diff cesm1_5/setup_hybrid cesm2_1/setup_hybrid

Forcing ocean and sea ice components

Runs with active ocean and sea ice components that are forced with prescribed atmospheric fluxes can be customized by editing the data atmosphere namelist and streams files.

Note

The data atmosphere forcing can come from any source as long as it contains the requisite fields. The easiest way to generate a “correct” set of atmospheric forcing fields is to use coupler history files from a CESM run in which CAM is active, but forcing from other reanalyses and data products such as JRA-55 and GISS is also acceptable.

The CIME documentation does contain instructions for configuring data atmosphere files. However, those instructions are vague:

“Edit the user_datm.streams.txt.* file.”

Setting up a case to get acquainted with the files

Setting up a G compset case is the simplest way to become familiar with the data atmosphere stream files (datm.streams*).

cd <cesm_root>/cime/scripts/
./create_newcase --case /glade/work/${USER}/cases/G.fosi.f09_g17.001 --compset G --res f09_g17 --project PXXXXXXXX --run-unsupported
cd /glade/work/${USER}/cases/G.fosi.f09_g17.001
./case.setup
./preview_namelists

The preview_namelists script will fill the CaseDocs directory with the namelist and data stream files necessary to build a forced ocean/sea ice (FOSI) run.

ls CaseDocs/
atm_modelio.nml                   datm.streams.txt.presaero.clim_2000   ice_in           seq_maps.rc
cpl_modelio.nml                   drof_in                               ice_modelio.nml  wav_in
datm_in                           drof.streams.txt.rof.diatren_ann_rx1  lnd_modelio.nml  wav_modelio.nml
datm.streams.txt.CORE2_NYF.GISS   drv_in                                ocn_modelio.nml
datm.streams.txt.CORE2_NYF.GXGXS  esp_modelio.nml                       pop_in
datm.streams.txt.CORE2_NYF.NCEP   glc_modelio.nml                       rof_modelio.nml

There are four datm.streams* files. They contain lists of all of the forcing fields from coupler history files that are necessary to conduct a FOSI run.

Structure of a data atmosphere stream file

The data atmosphere stream files are XML files that specify a domain file, a variable file and a time offset.

vim CaseDocs/datm.streams.txt.CORE2_NYF.GISS

These are the XML nodes in a datm.streams* file:

<?xml version="1.0"?>
<file id="stream" version="1.0">
  <dataSource>
     GENERIC
  </dataSource>
  <domainInfo>
    <variableNames>
      time    time
      lon      lon
      lat      lat
      area    area
      mask    mask
    </variableNames>
    <filePath>
      /glade/p/cesmdata/cseg/inputdata/atm/datm7/NYF
    </filePath>
    <fileNames>
      nyf.giss.T62.051007.nc
    </fileNames>
  </domainInfo>
  <fieldInfo>
    <variableNames>
      lwdn  lwdn
      swdn  swdn
      swup  swup
    </variableNames>
    <filePath>
      /glade/p/cesmdata/cseg/inputdata/atm/datm7/NYF
    </filePath>
    <fileNames>
      nyf.giss.T62.051007.nc
    </fileNames>
    <offset>
      0
    </offset>
  </fieldInfo>
</file>

The dataSource node typically specifies where the data come from. While this GISS file merely says GENERIC, the dataSource node from the CAM6 reanalysis is more descriptive: CAM6-DART Ensemble Reanalysis (NCAR RDA ds345.0).

The domainInfo node specifies a netCDF domain file and the variables it contains: time, lon, lat, area, and mask. In this example, the forcing fields are specified on the T62 spectral grid.

The fieldInfo node specifies a netCDF variable file and the variables it contains. There are two strings in the variableNames entry because the data source might name a field differently than what the coupler is expecting. For example, in CaseDocs/datm.streams.txt.CORE2_NYF.GXGXS, the precipitation variable is declared in this manner:

<variableNames>
   prc  prec
</variableNames>

The pair of strings translates between the variable key in the GXGXS source file (Large and Yeager, 2004) 1, which specifies this field as prc, and the key expected by the coupler, which is prec.
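
If you need to point a stream at your own forcing files or change these variable-name pairs, CIME looks for a user_datm.streams.txt.* copy in the case root before falling back on the generated defaults. A sketch of that workflow for the case above:

cd /glade/work/${USER}/cases/G.fosi.f09_g17.001
cp CaseDocs/datm.streams.txt.CORE2_NYF.GXGXS user_datm.streams.txt.CORE2_NYF.GXGXS
chmod u+w user_datm.streams.txt.CORE2_NYF.GXGXS
# edit <filePath>, <fileNames>, or the variable-name pairs, then regenerate the CaseDocs copies
./preview_namelists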

Time offset and axis mode

The time offset, offset, and time axis mode, taxmode, are the trickiest aspects of the datm.streams* files to get right.

The best explanation of these settings is in the CLM Customizing the DATM namelist documentation.
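
The offset is set per stream in the stream file itself (the <offset> node shown above), while taxmode is a per-stream entry in shr_strdata_nml within datm_in. A sketch of changing it through user_nl_datm, assuming the four streams of the G compset case above (verify the entry names against your generated datm_in):

cat >> user_nl_datm << 'EOF'
 taxmode = 'cycle', 'cycle', 'cycle', 'cycle'
EOF
./preview_namelists
grep taxmode CaseDocs/datm_in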

Specified fields

These are the fields that should be specified for a FOSI run:

  • lwdn downwelling longwave radiation

  • swdn downwelling shortwave radiation

  • swup upwelling shortwave radiation

  • prec precipitation

  • dens density

  • pslv sea level pressure

  • shum specific humidity

  • tbot 10-meter temperature

  • u 10-meter zonal velocity

  • v 10-meter meridional velocity
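
A quick way to confirm that a candidate forcing file carries these fields is to inspect its header with ncdump (the file name is a placeholder, and, as noted above, the variable names in your source file may differ from the names the coupler expects):

ncdump -h <your_forcing_file>.nc | grep -iE 'lwdn|swdn|swup|prec|dens|pslv|shum|tbot'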

References

1

Large, W. G., and S. G. Yeager, 2004: Diurnal to decadal global forcing for ocean and sea-ice models: The data sets and flux climatologies. NCAR Tech. Note NCAR/TN-460+STR, 111 pp.

Reanalysis

The coupler history files from the Reanalysis are available both from the NCAR Research Data Archive (ds345.0) and from Campaign storage.

To access the files on Campaign storage, you’ll need to log onto either Casper or the data-access nodes. Campaign storage is not mounted on Cheyenne.

ssh <user>@data-access.ucar.edu
cd /gpfs/csfs1/cisl/dares/Reanalyses/f.e21.FHIST_BGC.f09_025.CAM6assim.011/cpl/hist

Each ensemble member’s coupler history files are stored in their own subdirectories, 0001, 0002, 0003, … 0080.
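
For example, to list the first few coupler history files for ensemble member 0001:

ls /gpfs/csfs1/cisl/dares/Reanalyses/f.e21.FHIST_BGC.f09_025.CAM6assim.011/cpl/hist/0001 | head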

CESM2 Large Ensemble

The CESM2 Large Ensemble (LENS2) is a 100-member ~1.0° ensemble covering the period 1850-2100. It provides restart files that can be used to initialize a CESM ensemble.

The ensemble is constructed in a way that samples internal variability throughout an extended pre-industrial control simulation:

Members 11-90: “The chosen start dates (model years 1231, 1251, 1281, & 1301) sample AMOC and sea surface height (SSH) in the Labrador Sea at their maximum, minimum, and transition states.”

Access on GLADE

Restart files from the LENS2 are available on GLADE:

/glade/campaign/cesm/collections/CESM2-LE/restarts
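
Campaign storage is visible from Casper and the data-access nodes (not from Cheyenne, as noted above), so a quick listing from one of those machines shows how the restart sets are organized:

ls /glade/campaign/cesm/collections/CESM2-LE/restarts | head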

SMYLE initialization

The Seasonal-to-Multiyear Large Ensemble (SMYLE) seeks to use a CESM B Compset on the ~1.0° f09_g17 finite-volume grid in order to make multi-year predictions.

It’s important to note that the peer-reviewed papers used to justify the SMYLE effort use anomaly correlations to support the notion that extended forecasts have meaningful predictive value: Luo et al. (2008), 1 Dunstone et al. (2016), 2 DiNezio et al. (2017), 3 Lovenduski et al. (2019), 4 Dunstone et al. (2020), 5 and Esit et al. (2021). 6

Anomaly correlations are distinct from the skill scores used historically in studies of prediction skill. The skill scores that have been used by the numerical prediction community exhibit two features:

  1. they make an explicit prediction

  2. they estimate an error in that prediction.

For example, the \(S_1\) score compares the error in the forecasted 500 hPa pressure surface against the magnitude of the horizontal pressure gradient (Teweles and Wobus, 1954 7 ). The height of the 500 hPa pressure surface is an explicit prediction of the future state of the atmosphere, and the horizontal gradient “normalizes,” in a sense, the magnitude of the error, since the error should be larger in areas where gradients are large.
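
For reference, the \(S_1\) score is commonly written in the following form (a sketch of the standard definition; see Teweles and Wobus, 1954 for the precise formulation):

\[ S_1 = 100 \, \frac{\sum_i |e_{g,i}|}{\sum_i G_{L,i}} \]

where \(e_{g,i}\) is the error in the forecast horizontal gradient of 500 hPa height at point \(i\) and \(G_{L,i}\) is the larger of the observed or forecast gradient magnitude there.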

Skill scores of this type are meaningful and useful – the National Centers for Environmental Prediction have tracked the operational \(S_1\) score throughout the history of the center, since it effectively tracks the improvement in predictive skill through several scientific generations.

Before devoting considerable time to the SMYLE effort, note that:

  1. anomaly correlations aren’t predictions,

  2. strong, spatially coherent correlations on interannual timescales can be observed even when the “signal” that is being correlated is synthetic noise (Livezey and Chen, 1983 8 ), and

  3. anomaly correlations aren’t predictions (this bears repeating).

References

1

Luo et al., 2008: Extended ENSO Predictions Using a Fully Coupled Ocean–Atmosphere Model, J Clim, 21(1), 84–93, https://doi.org/10.1175/2007JCLI1412.1.

2

Dunstone et al., 2016: Skilful predictions of the winter North Atlantic Oscillation one year ahead, Nat Geosci, 9, 809–814, https://doi.org/10.1038/NGEO2824.

3

DiNezio et al., 2017: A 2 Year Forecast for a 60–80% Chance of La Niña in 2017–2018, GRL, 44(22), 11624–11635, https://doi.org/10.1002/2017GL074904.

4

Lovenduski et al., 2019: Predicting near-term variability in ocean carbon uptake, Earth Syst Dynam, 10, 45–57, https://doi.org/10.5194/esd-10-45-2019.

5

Dunstone et al., 2020: Skilful interannual climate prediction from two large initialized model ensembles, ERL, 15(9), https://doi.org/10.1088/1748-9326/ab9f7d.

6

Esit, M., S. Kumar, A. Pandey, D. M. Lawrence, I. Rangwala, and S. Yeager, 2021: Seasonal to multi-year soil moisture drought forecasting. npj Clim Atmos Sci, 4, 1–8, https://doi.org/10.1038/s41612-021-00172-z.

7

Teweles, S., and H. B. Wobus, 1954: Verification of Prognostic Charts. Bul Am Meteor Soc, 35, 455–463, https://doi.org/10.1175/1520-0477-35.10.455.

8

Livezey, R. E., and W. Y. Chen, 1983: Statistical Field Significance and its Determination by Monte Carlo Techniques. Mon Wea Rev, 111, 46–59, https://doi.org/10.1175/1520-0493(1983)111<0046:SFSAID>2.0.CO;2.