In branch Eren_HF_DF_28Jun23 (merged with main, but not yet tested in main), there is a platform-specific issue triggered by the following line in ADEMCMC.c:
" N.ITER=MCO.nSTART; //Pick up our status from nSTART, as we might be resuming an existing run! "
Here N.ITER picks up a nonsense value from MCO.nSTART. This causes intermittent failure with some chance of success. When N.ITER gets a positive value bigger than 200000, none of the MCMC routines run, because the iteration-based conditional logic fails. When N.ITER < 0, the run works as long as the value is not so far below zero that the stepping can never finish: starting from N.ITER = -716000 finished, but starting from N.ITER = -384756000 did not finish and got only as far as -361150000 out of 200000 within the allowable clock time. For example, in a run with 842 pixels, 4 completed and looked fine, while the rest failed.
Symptoms suggest that MCO.nSTART is being read as an uninitialized or corrupt memory value on TACC's lonestar6, but not on an M1 Mac.
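
If the cause is indeed an uninitialized MCO.nSTART, one possible guard is sketched below. This is only an illustration under assumptions: the struct MCMC_OPTIONS_SKETCH and the helpers default_options and checked_start_iteration are hypothetical names, not the real definitions in the code base, and falling back to iteration 0 is an assumed policy that may not be what a resumed run wants.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for the real options struct;
   only the field name nSTART is taken from the report. */
typedef struct {
    int nSTART;
} MCMC_OPTIONS_SKETCH;

/* Build an options struct with every field zeroed, so a fresh run
   starts at iteration 0 instead of whatever was left in memory. */
static MCMC_OPTIONS_SKETCH default_options(void)
{
    MCMC_OPTIONS_SKETCH opts = {0};
    return opts;
}

/* Return nstart if it lies in [0, max_iter]; otherwise warn on stderr
   and return 0 (assumed fallback: restart from the beginning). */
static int checked_start_iteration(int nstart, int max_iter)
{
    assert(max_iter >= 0);
    if (nstart < 0 || nstart > max_iter) {
        fprintf(stderr,
                "warning: nSTART=%d outside [0, %d]; restarting from 0\n",
                nstart, max_iter);
        return 0;
    }
    return nstart;
}
```

With helpers like these, the assignment could read N.ITER = checked_start_iteration(MCO.nSTART, 200000); where 200000 is the iteration limit mentioned above. The range check only hides the symptom and makes it visible in the log. The actual fix is still to find where MCO.nSTART should have been set on lonestar6.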