From Cocoa/Readme instructions:
Note
We provide several cosmolike projects that can be loaded and compiled using the setup_cocoa.sh and compile_cocoa.sh scripts. To activate them, comment out the corresponding lines in set_installation_options.sh.
[Adapted from Cocoa/set_installation_options.sh shell script]
```bash
(...)
# ------------------------------------------------------------------------------
# The keys below control which cosmolike projects will be installed and compiled
# ------------------------------------------------------------------------------
#export IGNORE_COSMOLIKE_LSSTY1_CODE=1
#export IGNORE_COSMOLIKE_DES_Y3_CODE=1
#export IGNORE_COSMOLIKE_ROMAN_FOURIER_CODE=1
export IGNORE_COSMOLIKE_ROMAN_REAL_CODE=1
(...)
# ------------------------------------------------------------------------------
# Cosmolike projects below -----------------------------------------------------
# ------------------------------------------------------------------------------
(...)
export ROMAN_REAL_URL="https://github.com/CosmoLike/cocoa_roman_real.git"
export ROMAN_REAL_NAME="roman_real"
# BRANCH: if set, load the latest commit on the specified branch
#export ROMAN_REAL_BRANCH="main"
# COMMIT: if set, load the specified commit
export ROMAN_REAL_COMMIT="23a774c32480b7b4bd5da5f637270310bc88f86c"
# TAG: if set, load the specified TAG
#export ROMAN_REAL_TAG="v4.0-beta17"
```
Note
If users want to recompile cosmolike, there is no need to rerun the general Cocoa scripts. Instead, run the following three commands:

```bash
source start_cocoa.sh
source ./installation_scripts/setup_cosmolike_projects.sh
source ./installation_scripts/compile_all_projects.sh
```

or, in case users just want to compile the roman_real project, replace the last command with

```bash
source ./projects/roman_real/scripts/compile_roman_real.sh
```
Tip
Assuming Cocoa is installed on a local (not remote!) machine, type the command below after step 2️⃣ to run Jupyter notebooks.

```bash
jupyter notebook --no-browser --port=8888
```

The terminal will then show a message similar to the following template:

```
(...)
[... NotebookApp] Jupyter Notebook 6.1.1 is running at:
[... NotebookApp] http://f0a13949f6b5:8888/?token=XXX
[... NotebookApp] or http://127.0.0.1:8888/?token=XXX
[... NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
```

Now go to the local web browser and type http://127.0.0.1:8888/?token=XXX, where XXX is the token displayed on the line

```
[... NotebookApp] or http://127.0.0.1:8888/?token=XXX
```
The roman_real project contains Jupyter notebook examples located at projects/roman_real.

To run the examples:

Step 1️⃣: Activate the cocoa Conda environment and the private Python environment.

```bash
conda activate cocoa
source start_cocoa.sh
```
Step 2️⃣: Select the number of OpenMP cores (below, we set it to 8).

```bash
export OMP_PROC_BIND=close; export OMP_NUM_THREADS=8; export OMP_PLACES=cores; export OMP_DYNAMIC=FALSE
```
Step 3️⃣: The folder projects/roman_real contains examples. Run cobaya-run on the first example following the commands below.

One model evaluation:

Linux

```bash
mpirun -n 1 --oversubscribe --mca pml ^ucx --mca btl vader,tcp,self --report-bindings \
  --bind-to core:overload-allowed --rank-by slot --map-by numa:pe=${OMP_NUM_THREADS} \
  cobaya-run ./projects/roman_real/EXAMPLE_EVALUATE1.yaml -f
```

macOS (arm)

```bash
mpirun -n 1 --oversubscribe cobaya-run ./projects/roman_real/EXAMPLE_EVALUATE1.yaml -f
```
MCMC (Metropolis-Hastings algorithm):

Linux

```bash
mpirun -n 4 --oversubscribe --mca pml ^ucx --mca btl vader,tcp,self --report-bindings \
  --bind-to core:overload-allowed --rank-by slot --map-by numa:pe=${OMP_NUM_THREADS} \
  cobaya-run ./projects/roman_real/EXAMPLE_MCMC1.yaml -f
```

macOS (arm)

```bash
mpirun -n 4 --oversubscribe cobaya-run ./projects/roman_real/EXAMPLE_MCMC1.yaml -f
```
Cocoa contains a few transformer- and CNN-based neural-network emulators capable of simulating the CMB, cosmolike outputs, the matter power spectrum, and distances. We provide a few scripts that exemplify their API. To run them, users should ensure the following lines are commented out in set_installation_options.sh before running setup_cocoa.sh and compile_cocoa.sh. By default, these lines should already be commented out, but it is worth checking.
[Adapted from Cocoa/set_installation_options.sh shell script]

```bash
# insert the # symbol (i.e., unset these environment variables in set_installation_options.sh)
#export IGNORE_EMULTRF_CODE=1            # SaraivanovZhongZhu (SZZ) transformer/CNN-based emulators
#export IGNORE_EMULTRF_DATA=1
#export IGNORE_LIPOP_LIKELIHOOD_CODE=1   # to run EXAMPLE_EMUL_(EVALUATE/MCMC/NAUTILUS/EMCEE1).yaml
#export IGNORE_LIPOP_CMB_DATA=1
#export IGNORE_ACTDR6_CODE=1             # to run EXAMPLE_EMUL_(EVALUATE/MCMC/NAUTILUS/EMCEE1).yaml
#export IGNORE_ACTDR6_DATA=1
#export IGNORE_NAUTILUS_SAMPLER_CODE=1   # to run PROJECTS/EXAMPLE/EXAMPLE_EMUL_NAUTILUS1.py
#export IGNORE_POLYCHORD_SAMPLER_CODE=1  # to run PROJECTS/EXAMPLE/EXAMPLE_EMUL_POLY1.yaml
#export IGNORE_GETDIST_CODE=1            # to run EXAMPLE_TENSION_METRICS.ipynb
#export IGNORE_TENSIOMETER_CODE=1        # to run EXAMPLE_TENSION_METRICS.ipynb
```
Tip
What if users have not configured the ML-related keys before sourcing setup_cocoa.sh?

Answer: comment out the keys below before rerunning setup_cocoa.sh.

[Adapted from Cocoa/set_installation_options.sh shell script]

```bash
# These keys are only relevant if you run setup_cocoa multiple times
#export OVERWRITE_EXISTING_ALL_PACKAGES=1
#export OVERWRITE_EXISTING_COSMOLIKE_CODE=1
#export REDOWNLOAD_EXISTING_ALL_DATA=1
```
Now, users must follow all the steps below.
Step 1️⃣: Activate the private Python environment by sourcing the script start_cocoa.sh.

```bash
source start_cocoa.sh
```
Step 2️⃣: Ensure OpenMP is OFF.

```bash
export OMP_NUM_THREADS=1
```
Step 3️⃣: Run cobaya-run on the first emulator example following the commands below.

One model evaluation:

Linux

```bash
mpirun -n 1 --oversubscribe --mca pml ^ucx --mca btl vader,tcp,self \
  --bind-to core:overload-allowed --rank-by slot --map-by slot:pe=${OMP_NUM_THREADS} \
  cobaya-run ./projects/roman_real/EXAMPLE_EMUL_EVALUATE1.yaml -f
```

macOS (arm)

```bash
mpirun -n 1 --oversubscribe cobaya-run ./projects/roman_real/EXAMPLE_EMUL_EVALUATE1.yaml -f
```
MCMC (Metropolis-Hastings algorithm):

Linux

```bash
mpirun -n 4 --oversubscribe --mca pml ^ucx --mca btl vader,tcp,self \
  --bind-to core:overload-allowed --rank-by slot --map-by slot:pe=${OMP_NUM_THREADS} \
  cobaya-run ./projects/roman_real/EXAMPLE_EMUL_MCMC1.yaml -r
```

macOS (arm)

```bash
mpirun -n 4 --oversubscribe cobaya-run ./projects/roman_real/EXAMPLE_EMUL_MCMC1.yaml -r
```
Halofit Comparison

The scripts that generated the plots below are provided at scripts/EXAMPLE_PLOT_COMPARE_CHAINS_EMUL[1-4].py.
PolyChord:

Linux

```bash
mpirun -n 90 --oversubscribe --mca pml ^ucx --mca btl vader,tcp,self \
  --bind-to core:overload-allowed --rank-by slot --map-by slot:pe=${OMP_NUM_THREADS} \
  cobaya-run ./projects/roman_real/EXAMPLE_EMUL_POLY1.yaml -r
```

macOS (arm)

```bash
mpirun -n 12 --oversubscribe cobaya-run ./projects/roman_real/EXAMPLE_EMUL_POLY1.yaml -r
```
Nautilus:

Linux

```bash
mpirun -n 90 --oversubscribe --mca pml ^ucx --mca btl vader,tcp,self \
  --bind-to core:overload-allowed --rank-by slot --map-by slot:pe=${OMP_NUM_THREADS} \
  python -m mpi4py.futures ./projects/roman_real/EXAMPLE_EMUL_NAUTILUS1.py \
  --root ./projects/roman_real/ --outroot "EXAMPLE_EMUL_NAUTILUS1" \
  --maxfeval 750000 --nlive 2048 --neff 15000 --flive 0.01 --nnetworks 5
```

macOS (arm)

```bash
mpirun -n 12 --oversubscribe python -m mpi4py.futures ./projects/roman_real/EXAMPLE_EMUL_NAUTILUS1.py \
  --root ./projects/roman_real/ --outroot "EXAMPLE_EMUL_NAUTILUS1" \
  --maxfeval 750000 --nlive 2048 --neff 15000 --flive 0.01 --nnetworks 5
```
Emcee:

Linux

```bash
mpirun -n 51 --oversubscribe --mca pml ^ucx --mca btl vader,tcp,self \
  --bind-to core:overload-allowed --rank-by slot --map-by slot:pe=${OMP_NUM_THREADS} \
  python ./projects/roman_real/EXAMPLE_EMUL_EMCEE1.py --root ./projects/roman_real/ \
  --outroot "EXAMPLE_EMUL_EMCEE1" --maxfeval 1000000
```

macOS (arm)

```bash
mpirun -n 12 --oversubscribe python ./projects/roman_real/EXAMPLE_EMUL_EMCEE1.py \
  --root ./projects/roman_real/ --outroot "EXAMPLE_EMUL_EMCEE1" --maxfeval 1000000
```
The number of steps per MPI worker is $n_{\rm sw} = {\rm maxfeval}/n_{\rm w}$, with the number of walkers being $n_{\rm w} = {\rm max}(3n_{\rm params}, n_{\rm MPI})$. For proper convergence, each walker should traverse 50 times the autocorrelation length ($\tau$), which is provided in the header of the output chain file. A reasonable rule of thumb is to assume $\tau > 200$ and therefore set ${\rm maxfeval} > 10{,}000 \times n_{\rm w}$. Finally, our code sets the burn-in (per walker) at $5 \times \tau$.

With these numbers, users may ask when Emcee is preferable to Metropolis-Hastings. Here are a few numbers based on our Planck CMB (l < 396) + SN + BAO + LSST-Y1 test case.

- MH achieves convergence with $n_{\rm sw} \sim 150{,}000$ (number of steps per walker), but only requires four walkers.
- Emcee has $\tau \sim 300$, so it requires $n_{\rm sw} \sim 15{,}000$ when running with $n_{\rm w} = 114$.

Conclusion: Emcee requires $\sim 3$ times more evaluations in this case, but the number of evaluations per MPI worker (assuming one MPI worker per walker) is reduced by a factor of $\sim 10$. Therefore, Emcee seems well-suited for chains where the evaluation of a single cosmology is time-consuming (and there is no slow/fast decomposition).

What if the user runs an Emcee chain with maxfeval insufficient for convergence? Emcee saves the chain checkpoint at chains/outroot.h5.
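The bookkeeping above can be reproduced in a few lines of plain Python. This is only an illustrative sketch: n_params = 17 and n_mpi = 51 are assumptions inferred from the `mpirun -n 51` example (they are not read from the roman_real YAML files), and only maxfeval = 1000000 matches the example command.

```python
# Hypothetical bookkeeping helper; mirrors the formulas in the text above.

def emcee_budget(maxfeval, n_params, n_mpi):
    """Return (number of walkers, steps per walker)."""
    n_w = max(3 * n_params, n_mpi)   # n_w = max(3*n_params, n_MPI)
    n_sw = maxfeval // n_w           # n_sw = maxfeval / n_w
    return n_w, n_sw

n_w, n_sw = emcee_budget(maxfeval=1_000_000, n_params=17, n_mpi=51)
tau = 300                            # autocorrelation length from the test case above
print(n_w, n_sw)                     # 51 19607
print(n_sw > 50 * tau)               # convergence rule of thumb satisfied: True
print(5 * tau)                       # burn-in per walker: 1500
```

With these assumed numbers, maxfeval = 1000000 comfortably exceeds the $10{,}000 \times n_{\rm w}$ rule of thumb.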
Sampler Comparison

The scripts that generated the plots below are provided at scripts/EXAMPLE_PLOT_COMPARE_CHAINS_EMUL[1-4].py.
Global Minimizer:

Our minimizer is a reimplementation of Procoli, developed by Karwal et al (arXiv:2401.14225).

Linux

```bash
mpirun -n 51 --oversubscribe --mca pml ^ucx --mca btl vader,tcp,self \
  --bind-to core:overload-allowed --rank-by slot --map-by slot:pe=${OMP_NUM_THREADS} \
  python ./projects/roman_real/EXAMPLE_EMUL_MINIMIZE1.py --root ./projects/roman_real/ \
  --outroot "EXAMPLE_EMUL_MIN1" --nstw 450
```

macOS (arm)

```bash
mpirun -n 12 python ./projects/roman_real/EXAMPLE_EMUL_MINIMIZE1.py --root ./projects/roman_real/ \
  --outroot "EXAMPLE_EMUL_MIN1" --nstw 450
```
The number of steps per Emcee walker per temperature is $n_{\rm stw}$, and the number of walkers is $n_{\rm w} = {\rm max}(3n_{\rm params}, n_{\rm MPI})$. The minimum number of total evaluations is $3n_{\rm params} \times n_{\rm T} \times n_{\rm stw}$, which can be distributed among $n_{\rm MPI} = 3n_{\rm params}$ MPI processes for faster results.

The scripts that generated the plots below are provided at scripts/EXAMPLE_PLOT_MIN_COMPARE_CONV[1-2].py.

In our testing, $n_{\rm stw} \sim 250$ worked reasonably well up to $n_{\rm param} \sim \mathcal{O}(10)$.
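As a quick sanity check of the evaluation budget above, consider the sketch below. The values n_params = 17 and n_T = 4 (number of annealing temperatures) are illustrative assumptions, not values taken from the minimizer code; only nstw = 450 comes from the example command.

```python
# Hypothetical budget estimate; mirrors 3*n_params * n_T * n_stw from the text.

def minimizer_budget(n_params, n_T, n_stw):
    n_w = 3 * n_params               # walkers, assuming n_MPI = 3*n_params
    return n_w * n_T * n_stw         # minimum number of total evaluations

total = minimizer_budget(n_params=17, n_T=4, n_stw=450)
print(total)                         # 91800 total evaluations
print(total // (3 * 17))             # 1800 evaluations per MPI process
```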
Profile:

Linux

```bash
mpirun -n 51 --oversubscribe --mca pml ^ucx --mca btl vader,tcp,self \
  --bind-to core:overload-allowed --rank-by slot --map-by slot:pe=${OMP_NUM_THREADS} \
  python ./projects/roman_real/EXAMPLE_EMUL_PROFILE1.py \
  --root ./projects/roman_real/ --cov 'chains/EXAMPLE_EMUL_MCMC1.covmat' \
  --outroot "EXAMPLE_EMUL_PROFILE1" --factor 3 --nstw 450 --numpts 10 \
  --profile ${SLURM_ARRAY_TASK_ID} \
  --minfile="./projects/roman_real/chains/EXAMPLE_EMUL_MIN1.txt"
```

macOS (arm)

```bash
mpirun -n 51 --oversubscribe python ./projects/roman_real/EXAMPLE_EMUL_PROFILE1.py \
  --root ./projects/roman_real/ --cov 'chains/EXAMPLE_EMUL_MCMC1.covmat' \
  --outroot "EXAMPLE_EMUL_PROFILE1" --factor 3 --nstw 450 --numpts 10 \
  --profile ${SLURM_ARRAY_TASK_ID} \
  --minfile="./projects/roman_real/chains/EXAMPLE_EMUL_MIN1.txt"
```
The argument factor specifies the start and end of the parameter being profiled:

```
start value ~ minimum value - factor*np.sqrt(np.diag(cov))
end value   ~ minimum value + factor*np.sqrt(np.diag(cov))
```

We advise ${\rm factor} \sim 3$ for parameters that are well constrained by the data when a covariance matrix is provided. If cov is not supplied, the code estimates one internally from the prior. If a parameter is poorly constrained or cov is not given, we recommend ${\rm factor} \ll 1$.

The scripts of the plots below are provided at projects/roman_real/scripts/EXAMPLE_PLOT_PROFILE[1-2].py.

Profile 1: Cosmic Shear only (plus weak Gaussian priors)
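The grid implied by factor and numpts can be sketched as follows. The minimum value and the variance are made-up numbers for illustration; only the start/end formula and numpts = 10 come from the text above.

```python
import math

# Hypothetical sketch of the profiled-parameter grid built from `factor`,
# `numpts`, and one diagonal entry of the covariance (a made-up variance).

def profile_grid(minimum, variance, factor, numpts):
    sigma = math.sqrt(variance)
    start = minimum - factor * sigma   # start ~ minimum - factor*sqrt(diag(cov))
    end = minimum + factor * sigma     # end   ~ minimum + factor*sqrt(diag(cov))
    step = (end - start) / (numpts - 1)
    return [start + i * step for i in range(numpts)]

# e.g. profiling omegam around a best fit of 0.30 with sigma = 0.01
grid = profile_grid(minimum=0.30, variance=1e-4, factor=3, numpts=10)
print(len(grid), round(grid[0], 4), round(grid[-1], 4))  # 10 0.27 0.33
```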
Warning
When running profiles, you should not set flat priors on parameters that are not well constrained by the data. Doing so risks having the minimizer select values near the boundary of the parameter space. This is a serious problem when using emulators, as the volume near the boundary will inevitably lie outside the training range. You can instead convert a flat prior into a weak Gaussian prior, as in the example below:
```yaml
prior:
  # These priors are meant to prevent the sampler from wandering far outside the training range
  g1: "lambda As_1e9: stats.norm.logpdf(As_1e9, loc=2.35, scale=1.6)"
  g2: "lambda ns: stats.norm.logpdf(ns, loc=0.96, scale=0.05)"
  g3: "lambda H0: stats.norm.logpdf(H0, loc=70, scale=10.0)"
  g4: "lambda omegab: stats.norm.logpdf(omegab, loc=0.045, scale=0.012)"
  g5: "lambda omegam: stats.norm.logpdf(omegam, loc=0.3, scale=0.25)"
  g6: "lambda w0pwa: stats.norm.logpdf(w0pwa, loc=-1.0, scale=1.44)"
  g7: "lambda w: stats.norm.logpdf(w, loc=-1.0, scale=1.44)"
  g8: "lambda roman_A1_1: stats.norm.logpdf(roman_A1_1, loc=0, scale=2.5)"
  g9: "lambda roman_A1_2: stats.norm.logpdf(roman_A1_2, loc=-1.7, scale=2.5)"
```
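To see what these priors do, the pure-Python sketch below evaluates the same Gaussian log-density that stats.norm.logpdf returns (scipy is not needed for the illustration); the loc/scale values match the g3 entry above.

```python
import math

# Pure-Python equivalent of stats.norm.logpdf, for illustration only.

def norm_logpdf(x, loc, scale):
    z = (x - loc) / scale
    return -0.5 * z * z - math.log(scale * math.sqrt(2.0 * math.pi))

# g3 prior: H0 ~ N(70, 10). Two sigma away, the log-prior drops by exactly 2.
peak = norm_logpdf(70.0, loc=70.0, scale=10.0)
two_sigma = norm_logpdf(90.0, loc=70.0, scale=10.0)
print(round(peak - two_sigma, 6))  # 2.0
```

So a chain stepping to H0 = 90 pays only a mild penalty of 2 log-units, instead of hitting a hard flat-prior boundary at the edge of the training range.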
Running Profile also requires emulators trained on larger volumes of the parameter space.
Warning
The code and examples associated with this section are still in the alpha stage.
Our main line of research involves emulators that simulate the entire Cosmolike data vectors, and each project (LSST, Roman, DES) contains its own README with emulator examples. Such emulators are extremely fast, especially when GPUs are available, and they take advantage of the CPU-GPU integration on Apple M-series chips. For example, the average timing of the lsst-y1 cosmic shear data vector emulation is around 0.005s.
While the data vector emulators are incredibly fast, there is an intermediate approach that emulates only the Boltzmann outputs (comoving distance and linear and nonlinear matter power spectra). This hybrid-ML approach can offer greater flexibility, especially in the initial phases of a research project, because changes to the modeling of nuisance parameters or to the assumed galaxy distributions do not require retraining the network.
Examples in the hybrid case all have the prefix EXAMPLE_EMUL2 (note the 2). The required flags in set_installation_options.sh are similar to those shown in the previous emulator section.
Now, users must follow all the steps below.
Step 1️⃣: Activate the private Python environment by sourcing the script start_cocoa.sh.

```bash
source start_cocoa.sh
```
Step 2️⃣: Select the number of OpenMP cores. Below, we set it to 4, the ideal setting for the hybrid examples.

Linux

```bash
export OMP_NUM_THREADS=4; export OMP_PROC_BIND=close; export OMP_PLACES=cores; export OMP_DYNAMIC=FALSE
```

macOS (arm)

```bash
export OMP_NUM_THREADS=4; export OMP_PROC_BIND=disabled; export OMP_PLACES=cores; export OMP_DYNAMIC=FALSE
```
Step 3️⃣: Run cobaya-run on the first emulator example following the commands below (here we only provide lsst-y1 examples).

One model evaluation:

Linux

```bash
mpirun -n 1 --oversubscribe --mca pml ^ucx --mca btl vader,tcp,self --report-bindings \
  --bind-to core:overload-allowed --rank-by slot --map-by numa:pe=${OMP_NUM_THREADS} \
  cobaya-run ./projects/roman_real/EXAMPLE_EMUL2_EVALUATE1.yaml -f
```

macOS (arm)

```bash
mpirun -n 1 --oversubscribe cobaya-run ./projects/roman_real/EXAMPLE_EMUL2_EVALUATE1.yaml -f
```
MCMC (Metropolis-Hastings algorithm):

Linux

```bash
mpirun -n 4 --oversubscribe --mca pml ^ucx --mca btl vader,tcp,self --report-bindings \
  --bind-to core:overload-allowed --rank-by slot --map-by numa:pe=${OMP_NUM_THREADS} \
  cobaya-run ./projects/roman_real/EXAMPLE_EMUL2_MCMC1.yaml -r
```

macOS (arm)

```bash
mpirun -n 4 --oversubscribe cobaya-run ./projects/roman_real/EXAMPLE_EMUL2_MCMC1.yaml -r
```
Details on the matter power spectrum emulator designs will be presented in the emulator_code repository. Basically, we apply standard neural-network techniques to generalize the syren-new formula (Eq. 6 of arXiv:2410.14623) for the linear power spectrum (w0waCDM with a fixed neutrino mass).
Note
Users can decide not to correct the syren-new formula for the linear power spectrum (a flag in the YAML file). Although we have not conducted extensive studies of the caveats of the syren-new approximation, it appears sufficient for w0waCDM forecasts when combined with the Euclid Emulator to compute the nonlinear boost.

For back-of-the-envelope LCDM calculations (e.g., to test cosmolike features), users can also choose not to correct the syren-halofit formula for the LCDM nonlinear boost (see figure below). In this case, the overhead on top of cosmolike computations is minimal.
Emulator Comparison

The scripts that generated the plots below are provided at scripts/EXAMPLE_PLOT_COMPARE_CHAINS_EMUL[1-4].py.
Emulators for Roman are stored in emulators/; we usually provide the YAML files used to train them. For example, the settings used to train an emulator similar to

```
nla_cosmic_shear/w0wa_takahashi_cs_CNN.h5
```

can be found in

```
nla_cosmic_shear/w0wa_takahashi_cs_CNN.yaml
```
The repository emulators_code provides the script dataset_generator_lensing.py, which can generate data vectors for any cosmological project.

The script below computes data vectors for cosmic shear (NLA):
```bash
mpirun -n 10 --oversubscribe \
  python external_modules/code/emulators/emultrf/emultraining/dataset_generator_lensing.py \
  --root projects/roman_real/ \
  --fileroot emulators/nla_cosmic_shear/ \
  --nparams 10000 \
  --yaml 'w0wa_takahashi_cs_CNN.yaml' \
  --datavsfile 'w0wa_takahashi_dvs_train' \
  --paramfile 'w0wa_takahashi_params_train' \
  --failfile 'w0wa_takahashi_params_failed_train' \
  --chain 0 \
  --unif 0 \
  --temp 64 \
  --maxcorr 0.15 \
  --freqchk 2000 \
  --loadchk 0 \
  --append 1
```
- The requested number of data vectors is given by the --nparams flag.
- There are two possible samplings:
  - The option --unif 1 sets the sampling to follow a uniform distribution (respecting the parameter boundaries set in the YAML file).
  - The option --unif 0 sets the sampling to follow a Gaussian distribution with the following options:
    - The covariance matrix is set in the YAML file (keyword params_covmat_file inside the train_args block). For example, our provided YAML selects the Fisher-based w0wa_fisher_covmat.txt covariance matrix.
    - The temperature reduces the curvature of the likelihood (cov = cov/T) and is set by the --temp flag.
    - The correlations of the original covariance matrix are reduced to be less than --maxcorr.
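The --temp and --maxcorr operations can be sketched on a toy 2x2 covariance. The numbers below are made up for illustration; the actual code operates on the full covariance matrix specified in the YAML file.

```python
import math

# Hypothetical sketch of the Gaussian-sampling options: clip the correlation
# to |corr| <= maxcorr, then divide the covariance by the temperature T
# (cov = cov/T, as stated in the text).

def temper_and_clip(cov, T, maxcorr):
    s0, s1 = math.sqrt(cov[0][0]), math.sqrt(cov[1][1])
    corr = cov[0][1] / (s0 * s1)
    corr = max(-maxcorr, min(maxcorr, corr))   # reduce correlation to maxcorr
    c01 = corr * s0 * s1
    return [[cov[0][0] / T, c01 / T],
            [c01 / T, cov[1][1] / T]]

cov = [[4.0, 3.0], [3.0, 4.0]]                  # toy covariance, corr = 0.75
print(temper_and_clip(cov, T=4.0, maxcorr=0.5)) # [[1.0, 0.5], [0.5, 1.0]]
```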
- For visualization purposes, the option --chain 1 sets the script to generate the training parameters without computing the data vectors.
- The output files are:

```
# Distribution of training points ready to be plotted by GetDist
w0wa_params_train_cs_64.1.txt
w0wa_params_train_cs_64.covmat
w0wa_params_train_cs_64.paramnames
w0wa_params_train_cs_64.ranges
# Corresponding data vectors
w0wa_takahashi_nobaryon_dvs_train_cs_64.npy
# Training parameters in which the data vector computation failed
w0wa_params_failed_train_cs_64.txt
```
- The flags --freqchk, --loadchk, and --append are related to checkpoints:
  - The option --freqchk sets the frequency at which the code saves checkpoints (chk).
  - The options --loadchk and --append specify whether the code loads the parameters and data vectors from a chk. In the two cases below, the code determines which remaining data vectors to compute based on the flags saved in the --failfile file.
    - Case 1 (--loadchk 1 and --append 1): the code loads params from the chk and appends ~nparams models to it.
    - Case 2 (--loadchk 1 and --append 0): the code loads the params.
Warning
Computing data vectors for training requires so many MPI processes that, assuming an HPC environment, it is advantageous to distribute the job among multiple compute nodes. Running Cocoa on multiple nodes requires a few modifications to the MPI command, as indicated below:

- Replace --mca btl vader,tcp,self with --mca btl tcp,self (vader is a fast communication protocol that only works on a single node).
- Add the line -x PATH -x LD_LIBRARY_PATH -x CONDA_PREFIX -x ROOTDIR -x OMP_NUM_THREADS \.
- Replace python with the full path ${ROOTDIR}/.local/bin/python.
- Replace mpirun with the full path "${CONDA_PREFIX}"/bin/mpirun.

The SLURM script scripts/random_scripts_used_by_dev/EXAMPLE_EMUL_TRAINX_SBU.sbatch exemplifies such changes.




