CESM Coupled Data Assimilation
This page documents our effort to run CESM in a variety of configurations to test concepts relevant to coupled data assimilation, including:
forced ocean-ice experiments, and
CESM B-compsets with active CAM6, CLM, CICE6 and POP2 components.
Compiling
Since DART is designed to minimize dependencies and maximize cross-platform compatibility, compiling DART on Cheyenne is trivial.
cd /glade/work/$USER
git clone https://github.com/NCAR/DART
Since Cheyenne is a Linux system with Intel processors, there is already an mkmf.template that works. Move it into place.
cd DART/build_templates
mv mkmf.template.intel.linux mkmf.template
Build the lorenz_63 model to ensure the setup is correct.
cd ../models/lorenz_63/work
./quickbuild.csh
[...]
Success: All single task DART programs compiled.
Script is exiting after building the serial versions of the DART programs.
Troubleshooting
Problems can arise if the DART executables weren’t built with the same compilers that were used to compile the netCDF library.
vim /glade/scratch/damrhein/GNYF.f09_g17_e80/run/da.log.2810502.chadmin1.ib0.cheyenne.ucar.edu.220209-231033
Error
Wed Feb 9 23:21:33 MST 2022 – BEGIN FILTER
/glade/scratch/damrhein/GNYF.f09_g17_e80/bld/filter: error while loading shared libraries: libhwloc.so.15: cannot open shared object file: No such file or directory
MPT ERROR: could not run executable.
HPE MPT 2.19 02/23/19 05:31:12
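A quick way to check which shared libraries the executable cannot resolve with the currently loaded modules is to run ldd on it (a hedged example; point it at your own build):
module list
ldd /glade/scratch/damrhein/GNYF.f09_g17_e80/bld/filter | grep "not found"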
Rebuilding the DART executables
As a first attempt at troubleshooting, try rebuilding the DART executables.
cd /glade/work/johnsonb/git/DAN_DART_2021-11-17/models/POP/work
./quickbuild.csh
./setup_CESM_hybrid_ensemble.csh
Building cesm with output to /glade/scratch/johnsonb/GNYF.f09_g17_e3/bld/cesm.bldlog.220214-161400
Time spent not building: 1286.737632 sec
Time spent building: 1622.594832 sec
MODEL BUILD HAS FINISHED SUCCESSFULLY
cd /glade/work/johnsonb/cases/GNYF.f09_g17_e3
./case.submit -M begin,end
[...]
Wait for run to finish
[...]
./CESM_DART_config.csh
./case.submit -M begin,end
[...]
Wait for run to finish
[...]
vim /glade/scratch/johnsonb/GNYF.f09_g17_e3/run/GNYF.f09_g17_e3.pop.dart_log.2010-01-04-00000.out
This rebuilt case ran successfully. Check to see if the DART state output is present.
cd /glade/scratch/johnsonb/GNYF.f09_g17_e3/run
ls *prior*
GNYF.f09_g17_e3.pop.output_priorinf_mean.2010-01-04-00000.nc
GNYF.f09_g17_e3.pop.output_priorinf_sd.2010-01-04-00000.nc
GNYF.f09_g17_e3.pop.preassim_priorinf_mean.2010-01-04-00000.nc
GNYF.f09_g17_e3.pop.preassim_priorinf_sd.2010-01-04-00000.nc
input_priorinf_mean.nc
input_priorinf_sd.nc
ls *output*
GNYF.f09_g17_e3.pop.output_mean.2010-01-04-00000.nc
GNYF.f09_g17_e3.pop.output_priorinf_mean.2010-01-04-00000.nc
GNYF.f09_g17_e3.pop.output_priorinf_sd.2010-01-04-00000.nc
GNYF.f09_g17_e3.pop.output_sd.2010-01-04-00000.nc
This output matches the stages_to_write entry in input.nml.
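For reference, the corresponding &filter_nml entry would look something like the following (an illustrative sketch; check the actual input.nml in the run directory for the full namelist):
&filter_nml
   stages_to_write = 'preassim', 'output'
/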
Getting started
Community Earth System Model (CESM) installation instructions are available via the README on the GitHub repository. The cube sphere grid is available as of CESM2.2.0.
Cloning and installing
Important
CESM has already been ported and should work “out of the box” on most of the
supercomputers that are widely used in the geosciences community, including
Pleiades. When compiling the model, be sure to set the machine command-line option, --mach, to match the supercomputer you are working on.
cd <installation_directory>
git clone https://github.com/ESCOMP/CESM.git cesm2_1_3
cd cesm2_1_3
git checkout release-cesm2.1.3
./manage_externals/checkout_externals
Commonly used grids
The CESM documentation includes a comprehensive list of grids. Typically, grids have descriptive long names when they are defined, such as fv0.9x1.25 – which is the atmospheric finite volume ~1° grid – or gx1v7 – which is the seventh version of the oceanic displaced Greenland pole ~1° grid.
These long names are shortened when the atmospheric/land grid is coupled to the ocean/sea ice grid. Instead of fv0.9x1.25 and gx1v7, the shortened name becomes f09_g17.
Atmospheric grids
The workhorse atmospheric grid is the ~1° finite-volume f09 grid, which is used for CMIP experiments and the Large Ensemble. Other grids used for iHESP are the f05 ~0.5° finite volume grid and the f02 ~0.25° finite volume grid.
Cube sphere
As of CESM2.2.0, CAM supports a spectral element dynamical core (CAM-SE) on cube-sphere grids. Lauritzen et al. (2018) 1 list the available cube-sphere grids in their Table 1. A subset of their table is reproduced here.
Grid name | Average node spacing | Model timestep
---|---|---
ne16np4 | ∼208 km | 1,800 s
ne30np4 | ∼111 km | 1,800 s
ne60np4 | ∼56 km | 900 s
ne120np4 | ∼28 km | 450 s
ne240np4 | ∼14 km | 225 s
The analog to the ~1.0° f09 finite volume grid is the ne30 ~1.0° spectral element grid.
Note
In addition to supporting the spectral element dynamical core on these grids, CAM also supports GFDL’s FV3 dynamical core on the ~1.0° C96 grid. For more information, see the CAM developmental compsets documentation.
Oceanic grids
Low-resolution
The workhorse oceanic grid is the ~1° displaced Greenland pole grid. There are two configurations of it that CESM2.1.3 is compatible with: gx1v6 (g16) and gx1v7 (g17). These grids are identical, except that in g17 the Caspian Sea has been removed from the ocean/sea ice domain and inserted into the land domain.
[Figure: maps comparing the g16 and g17 grids]
High-resolution
The eddy-resolving grid is the ~0.1° Poseidon Tripole grid. Again, just like with the low-resolution grid, there are two configurations of it that CESM2.1.3 is compatible with: tx0.1v2 (t12) and tx0.1v3 (t13). These grids are identical, except that in t13 the Caspian Sea has been removed from the ocean/sea ice domain and inserted into the land domain.
[Figure: maps comparing the t12 and t13 grids]
Building a case
The scripts for building cases within CESM are part of a software collection known as the Common Infrastructure for Modeling the Earth (CIME). This software supports both NCAR models and those developed within the Department of Energy’s Energy Exascale Earth System Model (E3SM) collection. Thus the build scripts to create a new case are contained within the cime subdirectory.
cd <installation_directory>/cesm2_1_3/cime/scripts
ls
create_clone create_test fortran_unit_testing query_config tests
create_newcase data_assimilation lib query_testlists Tools
The create_newcase script is invoked and passed command line arguments to build a new case.
Command line option | Meaning
---|---
--case | The directory the case will be built in. It is common practice to include the experiment’s grid resolution and component set (described below) in the name of the case so that these aspects can be easily identified when browsing the file system later.
--compset | The component set of the experiment, including which models will be actively integrating (atmosphere, land, ocean, sea ice) and what boundary forcing will be used. CESM has an extensive list of component set definitions; these instructions use the FHIST compset.
--res | The grid resolution the model will run on. Each grid includes at least two parts, the atmospheric/land grid and the ocean/sea ice grid to which it is coupled. These instructions use a low-resolution finite volume grid for the atmosphere coupled to the ~1° ocean/sea ice grid, f09_g17.
--mach | The supercomputer the case will be built on. These instructions build a case on NCAR’s Cheyenne computer; however, if you are building on Pleiades, consult the table in the note below.
--project | The account code the project will be run on. When jobs from the experiment are run, the specified account will automatically be debited. Replace PXXXXXXXX with your project code.
--run-unsupported | Since the cube-sphere grid is a newly released aspect of CESM that is not used in Coupled Model Intercomparison Project runs, it is not considered a scientifically supported grid yet. In order to use it, you need to append this option.
Note
If you are building on pleiades, the core layout per node differs based on which nodes you are using. These differences are already accounted for within CESM. When specifying --mach, there are four valid options:
Compute node processor | Corresponding --mach option
---|---
Broadwell | pleiades-bro
Haswell | pleiades-has
Ivy Bridge | pleiades-ivy
Sandy Bridge | pleiades-san
To build a case using the ~1° f09 finite volume grid:
./create_newcase --case /glade/work/johnsonb/cesm_runs/FHIST.cesm2_1_3.f09_g17.001 --compset FHIST --res f09_g17 --mach cheyenne --project PXXXXXXXX --run-unsupported
[...]
Creating Case directory /glade/work/johnsonb/cesm_runs/FHIST.cesm2_1_3.f09_g17.001
The case directory has successfully been created. Change to the case directory and set up the case.
cd /glade/work/johnsonb/cesm_runs/FHIST.cesm2_1_3.f09_g17.001
./case.setup
The case.setup script scaffolds out the case directory, creating the Buildconf and CaseDocs directories that you can customize. These instructions use the default configurations and continue on to compiling the model. On machines that don’t throttle CPU usage on the login nodes, the case.build command can be invoked directly. On Cheyenne, however, CPU-intensive activities are killed on the login nodes, so you will need to use a build wrapper to build the model on a shared compute node and specify a project code. Again, replace PXXXXXXXX with your project code.
qcmd -q share -l select=1 -A PXXXXXXXX -- ./case.build
The model build should progress for several minutes. If it compiles properly, a success message should be printed.
Time spent not building: 6.320388 sec
Time spent building: 603.685347 sec
MODEL BUILD HAS FINISHED SUCCESSFULLY
The model is actually built and run in a user’s scratch space.
/glade/scratch/johnsonb/FHIST.cesm2_1_3.f09_g17.001/bld/cesm.exe
Submitting a job
To submit a job, change to the case directory and use the case.submit script. The -M begin,end option sends the user an email when the job starts and stops running.
When the case is built, its default configuration is to run for five model days. This setting can be changed to run for a single model day using ./xmlchange STOP_N=1.
cd /glade/work/johnsonb/cesm_runs/FHIST.cesm2_1_3.f09_g17.001
./xmlchange STOP_N=1
./case.submit -M begin,end
[...]
Submitted job id is 2658061.chadmin1.ib0.cheyenne.ucar.edu
Submitted job case.run with id 2658060.chadmin1.ib0.cheyenne.ucar.edu
Submitted job case.st_archive with id 2658061.chadmin1.ib0.cheyenne.ucar.edu
Restart file
After the job completes, restart files are written to the run directory, which is also in scratch space. These restart files are written for both active and data components. The CAM restart file contains a cam.r substring. By default, the FHIST case begins on January 1st, 1979. Thus, the restart file will be for January 2nd, 1979.
/glade/scratch/johnsonb/FHIST.cesm2_1_3.f09_g17.001/run/FHIST.cesm2_1_3.f09_g17.001.cam.r.1979-01-02-00000.nc
The fields in the restart file can be plotted using various languages such as MATLAB or Python’s matplotlib.
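For a quick look at which fields the restart file contains before plotting, the netCDF header can be inspected from the command line (a minimal sketch, assuming the netCDF utilities are available in your environment):
ncdump -h /glade/scratch/johnsonb/FHIST.cesm2_1_3.f09_g17.001/run/FHIST.cesm2_1_3.f09_g17.001.cam.r.1979-01-02-00000.nc | less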
References
- 1
Lauritzen, P. H., and Coauthors, 2018: NCAR Release of CAM-SE in CESM2.0: A Reformulation of the Spectral Element Dynamical Core in Dry-Mass Vertical Coordinates With Comprehensive Treatment of Condensates and Energy. Journal of Advances in Modeling Earth Systems, 10, 1537–1570, https://doi.org/10.1029/2017MS001257.
Compsets
Overview
CESM’s component models can be run in a variety of combinations, ranging from all components being active to all components being off. A given set of component configurations is known as a “component set” or “compset” for short.
Similar compsets are grouped using aliases. The CESM2 documentation provides a table showing which components are active.
Coupled data assimilation experiments typically use compsets with aliases beginning with B, since the atmosphere, land, sea ice and ocean components are all active in these compsets.
The CESM website provides a comprehensive list of all available compsets.
Data assimilation cycles
The DATA_ASSIMILATION_CYCLES setting within CESM denotes how many model integration, filter and inflation cycles will be attempted within a single CESM job submission. This is a typical setting for CAM:
STOP_N=6
STOP_OPTION=nhours
DATA_ASSIMILATION_CYCLES=4
Within a single CESM ./case.submit job, the model will run for 24 hours with filter and inflation running at 6-hour intervals.
This is a typical setting for POP:
STOP_N=1
STOP_OPTION=ndays
DATA_ASSIMILATION_CYCLES=5
So within a single CESM ./case.submit job, the model will run for 5 days with filter and inflation running daily.
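These values are set from the case directory with xmlchange. A sketch matching the POP configuration above (the case path is a placeholder):
cd /glade/work/$USER/cases/<case_name>
./xmlchange STOP_N=1
./xmlchange STOP_OPTION=ndays
./xmlchange DATA_ASSIMILATION_CYCLES=5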
Note
The CESM RESUBMIT setting is distinct from DATA_ASSIMILATION_CYCLES. The value of RESUBMIT denotes how many additional times the case will automatically be resubmitted after the initial ./case.submit job completes. Each submitted job must complete within the period denoted by JOB_WALLCLOCK_TIME.
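For example, to request automatic resubmissions and confirm the relevant settings (a minimal sketch):
./xmlchange RESUBMIT=3
./xmlquery RESUBMIT,DATA_ASSIMILATION_CYCLES,JOB_WALLCLOCK_TIME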
Modifying setup scripts for NUOPC
Note
It isn’t actually necessary to use a setup script to create a multi-instance CESM case to test NUOPC functionality. CIME’s ./create_newcase script can be invoked with the --ninst option instead, as shown in the sketch below.
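A hedged sketch of creating a three-instance case directly (the case path, compset, and instance count are placeholders):
./create_newcase --case /glade/work/$USER/cases/B1850.f09_g17.ninst3 --compset B1850 --res f09_g17 --ninst 3 --mach cheyenne --project PXXXXXXXX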
Overview
The setup scripts for coupled CESM runs in the DART 9.X.X releases are already Manhattan compliant; however, they invoke CESM utilities that are no longer present in CESM2.0.
Upgrading these setup scripts to test NUOPC functionality involves removing references to CESM1 utilities and replacing them with the analogous utilities from CIME that CESM2 makes use of.
Example
For example, setting up CESM1 made heavy use of environment variables that were accessed using the Tools/ccsm_getenv utility.
The analogous functionality in CESM2 uses CIME’s xmlquery utility, since many of these variables are now stored in XML configuration files rather than environment variables.
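As an illustration of the difference (a sketch in csh; the CASEROOT variable is just an example):
# CESM1-era setup scripts read environment variables populated by ccsm_getenv:
source ./Tools/ccsm_getenv
echo $CASEROOT
# CESM2/CIME reads the same information from the XML files in the case directory:
./xmlquery CASEROOT
set CASEROOT = `./xmlquery CASEROOT --value`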
Tractable path
To learn which aspects of the setup scripts need to be modified, you can diff the existing cam-fv and pop scripts, since both CESM1 and CESM2 versions of these scripts are contained within the DART repository.
You should begin by modifying the CESM perfect model obs setup script in DART/models/CESM/shell_scripts/CESM1_1_1_setup_pmo.
POP example
cd DART/models/pop/shell_scripts
diff cesm1_x/CESM1_1_1_setup_pmo cesm2_1/setup_CESM_perfect_model.csh
CAM example
cd DART/models/cam-fv/shell_scripts
diff cesm1_5/setup_hybrid cesm2_1/setup_hybrid
Forcing ocean and sea ice components
Runs with active ocean and sea ice components that are forced with prescribed atmospheric fluxes can be customized by editing the data atmosphere namelist and streams files.
Note
The data atmosphere fields can come from any source as long as they contain the requisite forcing fields. The easiest way to generate a “correct” set of atmospheric forcing fields is to use coupler history files from a CESM run in which CAM is active, but forcing from other reanalyses and data products such as JRA-55 and GISS is also acceptable.
The CIME documentation does contain instructions for configuring data atmosphere files. However, the instructions contained in that document are vague:
“Edit the user_datm.streams.txt.* file.”
Setting up a case to get acquainted with the files
Setting up a G compset case is the simplest way to become familiar with the data atmosphere stream files (datm.streams*).
cd <cesm_root>/cime/scripts/
./create_newcase --case /glade/work/${USER}/cases/G.fosi.f09_g17.001 --compset G --res f09_g17 --project PXXXXXXXX --run-unsupported
cd /glade/work/${USER}/cases/G.fosi.f09_g17.001
./case.setup
./preview_namelists
The preview_namelists script will fill the CaseDocs directory with the namelist and data streams files necessary to build a forced ocean/sea ice (FOSI) run.
ls CaseDocs/
atm_modelio.nml datm.streams.txt.presaero.clim_2000 ice_in seq_maps.rc
cpl_modelio.nml drof_in ice_modelio.nml wav_in
datm_in drof.streams.txt.rof.diatren_ann_rx1 lnd_modelio.nml wav_modelio.nml
datm.streams.txt.CORE2_NYF.GISS drv_in ocn_modelio.nml
datm.streams.txt.CORE2_NYF.GXGXS esp_modelio.nml pop_in
datm.streams.txt.CORE2_NYF.NCEP glc_modelio.nml rof_modelio.nml
There are four datm.streams* files. They contain lists of all of the forcing fields from coupler history files that are necessary to conduct a FOSI run.
Structure of a data atmosphere stream file
The data atmosphere stream files are XML files that specify a domain file, a variable file and a time offset.
vim CaseDocs/datm.streams.txt.CORE2_NYF.GISS
These are the XML nodes in a datm.streams* file:
<?xml version="1.0"?>
<file id="stream" version="1.0">
<dataSource>
GENERIC
</dataSource>
<domainInfo>
<variableNames>
time time
lon lon
lat lat
area area
mask mask
</variableNames>
<filePath>
/glade/p/cesmdata/cseg/inputdata/atm/datm7/NYF
</filePath>
<fileNames>
nyf.giss.T62.051007.nc
</fileNames>
</domainInfo>
<fieldInfo>
<variableNames>
lwdn lwdn
swdn swdn
swup swup
</variableNames>
<filePath>
/glade/p/cesmdata/cseg/inputdata/atm/datm7/NYF
</filePath>
<fileNames>
nyf.giss.T62.051007.nc
</fileNames>
<offset>
0
</offset>
</fieldInfo>
</file>
The dataSource node typically specifies where the data come from. While this GISS file merely says GENERIC, the dataSource node from the CAM6 reanalysis is more descriptive: CAM6-DART Ensemble Reanalysis (NCAR RDA ds345.0).
The domainInfo node specifies a netCDF domain file and the variables it contains: time, lon, lat, area, mask. In this example, the forcing fields are specified on the T62 spectral grid.
The fieldInfo node specifies a netCDF variable file and the variables it contains. There are two strings in the variableNames entry because the data source might name a field differently than what the coupler is expecting. For example, in CaseDocs/datm.streams.txt.CORE2_NYF.GXGXS, the precipitation variable is declared in this manner:
<variableNames>
prc prec
</variableNames>
The pair of strings translates between the variable key in the GXGXS source file (Large and Yeager, 2004) 1, which specifies this field as prc, and the key expected by the coupler, which is prec.
Time offset and axis mode
The time offset, offset, and time axis mode, taxmode, are the trickiest aspects of the datm.streams* files to get right.
The best explanation of these settings is in the CLM Customizing the DATM namelist documentation.
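To inspect the current values before editing anything, the generated files under CaseDocs can be searched directly (a hedged example using the GISS stream file from above):
grep -i taxmode CaseDocs/datm_in
grep -A 1 "<offset>" CaseDocs/datm.streams.txt.CORE2_NYF.GISS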
Specified fields
These are the fields that should be specified for a FOSI run:
lwdn: downwelling longwave radiation
swdn: downwelling shortwave radiation
swup: upwelling shortwave radiation
prec: precipitation
dens: density
pslv: sea level pressure
shum: specific humidity
tbot: 10-meter temperature
u: 10-meter zonal velocity
v: 10-meter meridional velocity
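If the forcing source already uses the coupler’s names for these fields, the corresponding fieldInfo variableNames node would look something like this (illustrative only; the left-hand names must match whatever the source files actually contain):
<variableNames>
   lwdn lwdn
   swdn swdn
   swup swup
   prec prec
   dens dens
   pslv pslv
   shum shum
   tbot tbot
   u u
   v v
</variableNames>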
References
- 1
Large, W. G., and S. G. Yeager, 2004: Diurnal to decadal global forcing for ocean and sea-ice models: The data sets and flux climatologies. NCAR Tech. Note NCAR/TN-460+STR, 111 pp.
Reanalysis
The coupler history files from the Reanalysis are available both from:
NCAR’s Research Data Archive and
on Campaign storage.
To access the files on Campaign storage you’ll need to log onto either Casper or the data-access nodes. Campaign storage is not mounted on Cheyenne.
ssh <user>@data-access.ucar.edu
cd /gpfs/csfs1/cisl/dares/Reanalyses/f.e21.FHIST_BGC.f09_025.CAM6assim.011/cpl/hist
Each ensemble member’s coupler history files are stored in its own subdirectory: 0001, 0002, 0003, …, 0080.
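For example, to list the member subdirectories:
ls -d /gpfs/csfs1/cisl/dares/Reanalyses/f.e21.FHIST_BGC.f09_025.CAM6assim.011/cpl/hist/00*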
CESM2 Large Ensemble
The CESM2 Large Ensemble (LENS2) is a 100-member ~1.0° ensemble covering the period 1850-2100. It provides restart files that can be used to initialize a CESM ensemble.
The ensemble is constructed in a way that samples internal variability throughout an extended pre-industrial control simulation:
Members 11-90: “The chosen start dates (model years 1231, 1251, 1281, & 1301) sample AMOC and sea surface height (SSH) in the Labrador Sea at their maximum, minimum, and transition states.”
Access on GLADE
Restart files from the LENS2 are available on GLADE:
/glade/campaign/cesm/collections/CESM2-LE/restarts
SMYLE initialization
The Seasonal-to-Multiyear Large Ensemble (SMYLE) seeks to use a CESM B compset on the ~1.0° f09_g17 finite-volume grid in order to make multi-year predictions.
It’s important to note that the peer-reviewed papers used to justify the SMYLE effort use anomaly correlations to support the notion that extended forecasts have meaningful predictive value: Luo et al. (2008), 1 Dunstone et al. (2016), 2 DiNezio et al. (2017), 3 Lovenduski et al. (2019), 4 Dunstone et al. (2020), 5 and Esit et al. (2021). 6
Anomaly correlations are distinct from the skill scores used historically in studies of prediction skill. The skill scores that have been used by the numerical prediction community exhibit two features:
they make an explicit prediction
they estimate an error in that prediction.
For example, the \(S_1\) score compares the error in the forecasted 500 hPa pressure surface against the magnitude of the horizontal pressure gradient (Teweles and Wobus, 1954 7 ). The height of the 500 hPa surface is an explicit prediction of the future state of the atmosphere, and the horizontal gradient “normalizes,” in a sense, the magnitude of the error, since the error should be larger in areas where gradients are large.
Skill scores of this type are meaningful and useful – the National Centers for Environmental Prediction have tracked the operational \(S_1\) score throughout the history of the center since it effectively tracks the improvement in predictive skill across several scientific generations.
Before devoting considerable time to the SMYLE effort, note that:
anomaly correlations aren’t predictions,
strong, spatially coherent correlations on interannual timescales can be observed even when the “signal” that is being correlated is synthetic noise (Livezey and Chen, 1983 8 ), and
anomaly correlations aren’t predictions (this bears repeating).
References
- 1
Luo et al., 2008: Extended ENSO Predictions Using a Fully Coupled Ocean–Atmosphere Model, J Clim, 21(1), 84–93, https://doi.org/10.1175/2007JCLI1412.1.
- 2
Dunstone et al., 2016: Skilful predictions of the winter North Atlantic Oscillation one year ahead, Nat Geosci, 9, 809–814, https://doi.org/10.1038/NGEO2824.
- 3
DiNezio et al., 2017: A 2 Year Forecast for a 60–80% Chance of La Niña in 2017–2018, GRL, 44(22) 11,624-11,635, https://doi.org/10.1002/2017GL074904.
- 4
Lovenduski et al., 2019: Predicting near-term variability in ocean carbon uptake, Earth Syst Dynam, 10, 45–57, https://doi.org/10.5194/esd-10-45-2019.
- 5
Dunstone et al., 2020: Skilful interannual climate prediction from two large initialized model ensembles, ERL, 15(9), https://doi.org/10.1088/1748-9326/ab9f7d.
- 6
Esit, M., S. Kumar, A. Pandey, D. M. Lawrence, I. Rangwala, and S. Yeager, 2021: Seasonal to multi-year soil moisture drought forecasting. npj Clim Atmos Sci, 4, 1–8, https://doi.org/10.1038/s41612-021-00172-z.
- 7
Teweles, S., and H. B. Wobus, 1954: Verification of Prognostic Charts. Bul Am Meteor Soc, 35, 455–463, https://doi.org/10.1175/1520-0477-35.10.455.
- 8
Livezey, R. E., and W. Y. Chen, 1983: Statistical Field Significance and its Determination by Monte Carlo Techniques. Mon Wea Rev, 111, 46–59, https://doi.org/10.1175/1520-0493(1983)111<0046:SFSAID>2.0.CO;2.