Hi,
I am running the Aurora air pollution model autoregressively, and the rollout becomes unstable after about 13 days of forecasting. From that point on, the predictions degrade badly and eventually contain NaN values.
What I observed
The rollout is stable initially.
Around day 13, the forecast starts to drift strongly.
After that, the model diverges (blows up) and begins predicting NaNs.
In my tests, the surface variables degrade much faster, while some pressure-level variables remain relatively reasonable for longer.
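To make the onset easier to pin down per variable, here is a minimal sketch of the kind of check that could be run over a rollout. It assumes each step is available as a mapping from variable name to a flat sequence of floats. The function name `first_bad_step`, the variable names, and the `max_abs` threshold are hypothetical and not part of the Aurora API; real predictions would be tensors that need flattening first.

```python
import math


def first_bad_step(rollout, max_abs=1e6):
    """Return {variable: index of the first step that looks diverged}.

    A step counts as diverged for a variable if any value is NaN/inf or its
    magnitude exceeds max_abs (an arbitrary, hypothetical threshold).
    Variables that never diverge are absent from the result.
    """
    first_bad = {}
    for step, prediction in enumerate(rollout):
        for name, values in prediction.items():
            if name in first_bad:
                continue  # onset already recorded for this variable
            if any(not math.isfinite(v) or abs(v) > max_abs for v in values):
                first_bad[name] = step
    return first_bad


# Toy data with made-up variable names, only to show the call.
example_rollout = [
    {"pm2p5": [1.0, 2.0], "t": [250.0, 260.0]},
    {"pm2p5": [1.0, float("nan")], "t": [251.0, 259.0]},
    {"pm2p5": [float("nan"), float("nan")], "t": [float("inf"), 259.0]},
]
onset = first_bad_step(example_rollout)
print(onset)
```

Comparing the recorded step indices for surface and pressure-level variables would show which group diverges first.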
My question
Is this expected behavior for the Aurora air pollution model during long autoregressive rollouts?
Since the model is mainly designed for shorter forecasts, I would like to understand:
Is there a known reason the model diverges during longer rollouts?
What are the recommended ways to extend the forecast horizon more safely?
Specifically, I would appreciate guidance on:
Whether this is a limitation of the released checkpoint
Whether any normalization or preprocessing issues could cause this
Whether clamping is recommended
Best practices for extending inference beyond the standard forecast range
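To be concrete about what I mean by clamping: clipping each predicted variable to a physically plausible range before feeding it back as input for the next step. The sketch below is only an illustration under the assumption that a step is a mapping from variable name to a flat sequence of floats. `clamp_step`, the variable names, and the bounds are hypothetical and not taken from the Aurora code base.

```python
import math


def clamp_step(prediction, bounds):
    """Clip each variable to its (lo, hi) range.

    Variables without an entry in bounds are copied unchanged.
    NaN values pass through unchanged, so clamping cannot repair a step
    that already contains NaNs.
    """
    clamped = {}
    for name, values in prediction.items():
        if name not in bounds:
            clamped[name] = list(values)
            continue
        lo, hi = bounds[name]
        clamped[name] = [min(max(v, lo), hi) for v in values]
    return clamped


# Hypothetical bounds: concentrations cannot be negative.
example_bounds = {"pm2p5": (0.0, math.inf)}
example_step = {"pm2p5": [-0.5, 3.0], "t": [250.0]}
clamped_step = clamp_step(example_step, example_bounds)
print(clamped_step)
```

Because NaNs survive the clipping, this would have to be applied from the first step of the rollout, before any divergence, to have a chance of helping.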