This repository was archived by the owner on Jun 14, 2026. It is now read-only.
Follow-up to #202 and part of the broader target-surface cleanup in #200.
#202 fixes the immediate inversion by applying a documented constant fallback: when a record has only a dividend total and no observed qualified/non-qualified components, split it 78% qualified / 22% non-qualified based on the 2015 PUF aggregate E00650/E00600 share. That is a good first-order patch, but it gives every unsplit CPS dividend row the same qualified share.
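For reference, the #202 fallback amounts to one fixed share applied to every unsplit total. The sketch below assumes a hypothetical helper `constant_split` and constant name `QUALIFIED_SHARE_2015_PUF`; it is not the pipeline's actual code. It only illustrates why every unsplit row ends up with the same qualified share.

```python
# Hypothetical constant: the 2015 PUF aggregate E00650 / E00600 share cited in #202.
QUALIFIED_SHARE_2015_PUF = 0.78


def constant_split(dividend_total: float, share: float = QUALIFIED_SHARE_2015_PUF) -> tuple[float, float]:
    """Split an unsplit dividend total into (qualified, non_qualified) with one fixed share."""
    qualified = dividend_total * share
    non_qualified = dividend_total - qualified  # remainder keeps the row total intact
    return qualified, non_qualified


# Every unsplit row receives the same 78/22 split, whatever its other characteristics.
for total in (100.0, 5_000.0):
    q, nq = constant_split(total)
    print(total, round(q, 2), round(nq, 2))
```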
We should replace that constant fallback with a stochastic or modeled `qualified_dividend_share` imputation learned from PUF rows with observed dividend composition.
Suggested shape:
- Train/impute `qualified_dividend_share = qualified_dividend_income / ordinary_dividend_income` from PUF donor rows where ordinary dividends are positive and the qualified/non-qualified split is observed.
- Apply the imputed share only to rows with an unsplit positive dividend total and no observed components, e.g. CPS `DIV_VAL`-only rows.
- Preserve each row's total dividend: `qualified + non_qualified == ordinary_dividend_income == dividend_income` up to numerical tolerance.
- Keep observed PUF component rows unchanged.
- Make the stochastic draw reproducible via the pipeline seed/checkpoint metadata.
- Prefer conditioning on relevant predictors if available, such as dividend amount, income/AGI proxies, age, filing/tax-unit features, and asset/investment indicators.
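The suggested shape could look roughly like the following minimal sketch. It assumes a nearest-neighbour hot-deck on dividend amount alone, which is one possible technique and not necessarily what the pipeline should adopt. `Row`, `build_donor_pool`, `impute_qualified_share`, the window size `k`, and the explicit `seed` argument are all hypothetical; in the pipeline the seed would presumably come from the seed/checkpoint metadata. The toy rows are made up.

```python
import bisect
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class Row:
    dividend_income: float                 # ordinary (total) dividends for the record
    qualified: Optional[float] = None      # None means the component is not observed
    non_qualified: Optional[float] = None


def build_donor_pool(puf_rows: list[Row]) -> tuple[list[float], list[float]]:
    """Collect PUF donors with positive dividends and an observed split.

    Returns donor dividend amounts sorted ascending, plus the matching
    qualified_dividend_share = qualified / ordinary, clipped to [0, 1].
    """
    donors = sorted(
        (r.dividend_income, min(max(r.qualified / r.dividend_income, 0.0), 1.0))
        for r in puf_rows
        if r.dividend_income > 0 and r.qualified is not None and r.non_qualified is not None
    )
    return [amount for amount, _ in donors], [share for _, share in donors]


def impute_qualified_share(
    rows: list[Row],
    donor_amounts: list[float],
    donor_shares: list[float],
    seed: int,
    k: int = 5,
) -> list[Row]:
    """Hot-deck a qualified share onto unsplit positive-dividend rows.

    Each recipient takes the share of one donor drawn at random from a window
    of up to k donors adjacent to its dividend amount in sorted order. Rows with
    any observed component, or without positive dividends, are returned unchanged.
    """
    if not donor_amounts:
        raise ValueError("donor pool is empty")
    rng = random.Random(seed)  # seeded so the draw is reproducible
    n = len(donor_amounts)
    out = []
    for r in rows:
        observed = r.qualified is not None or r.non_qualified is not None
        if observed or r.dividend_income <= 0:
            out.append(r)
            continue
        pos = bisect.bisect_left(donor_amounts, r.dividend_income)
        lo = max(0, min(pos - k // 2, n - k))  # keep the window inside the pool
        hi = min(n, lo + k)
        share = donor_shares[rng.randrange(lo, hi)]
        qualified = r.dividend_income * share
        # Non-qualified is the remainder, so the components sum to the total up to rounding.
        out.append(Row(r.dividend_income, qualified, r.dividend_income - qualified))
    return out


# Toy usage with made-up numbers.
puf = [Row(100.0, 90.0, 10.0), Row(1_000.0, 700.0, 300.0), Row(50_000.0, 45_000.0, 5_000.0)]
cps = [Row(2_000.0), Row(0.0), Row(300.0, 100.0, 200.0)]
amounts, shares = build_donor_pool(puf)
imputed = impute_qualified_share(cps, amounts, shares, seed=42)
```

A hot-deck like this carries over the donor distribution of shares (including rows that are fully qualified or fully non-qualified) instead of collapsing everything to 78%. Conditioning on the other predictors listed above would mean extending the match key or replacing the window draw with a model fitted on those features.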
Validation target:
- Rebuild or run a focused diagnostic showing the qualified/non-qualified split moves toward the SOI/eCPS evidence without breaking export support parity.
- Report national weighted totals and filer counts for `qualified_dividend_income`, `non_qualified_dividend_income`, and total dividends before/after.
- Confirm this does not reintroduce the old all-non-qualified CPS-spine failure.
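A focused diagnostic along these lines might be sketched as follows. It assumes each record carries a weight and treats "filer count" as the weighted count of records with a positive amount; both are assumptions, and `WeightedRow`, `summarize`, and `diagnose` are hypothetical names. The toy rows are made up and stand in for the constant-share (before) and imputed (after) datasets.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class WeightedRow:
    weight: float                          # survey weight of the record
    dividend_income: float                 # total dividends
    qualified: Optional[float] = None
    non_qualified: Optional[float] = None


VARIABLES: dict[str, Callable[[WeightedRow], float]] = {
    "qualified_dividend_income": lambda r: r.qualified or 0.0,
    "non_qualified_dividend_income": lambda r: r.non_qualified or 0.0,
    "dividend_income": lambda r: r.dividend_income,
}


def summarize(rows: list[WeightedRow]) -> dict[str, tuple[float, float]]:
    """Map each variable to (weighted total, weighted count of rows with a positive amount)."""
    return {
        name: (sum(r.weight * get(r) for r in rows), sum(r.weight for r in rows if get(r) > 0))
        for name, get in VARIABLES.items()
    }


def diagnose(before: list[WeightedRow], after: list[WeightedRow], rel_tol: float = 1e-9) -> list[str]:
    """Print before/after totals and filer counts; return failed checks (empty list means pass)."""
    b, a = summarize(before), summarize(after)
    for name in VARIABLES:
        print(f"{name:32s} total {b[name][0]:>14,.0f} -> {a[name][0]:>14,.0f}   "
              f"filers {b[name][1]:>12,.0f} -> {a[name][1]:>12,.0f}")
    problems = []
    total_before, total_after = b["dividend_income"][0], a["dividend_income"][0]
    if abs(total_after - total_before) > rel_tol * max(1.0, abs(total_before)):
        problems.append("total dividends changed")
    if total_after > 0 and a["qualified_dividend_income"][0] == 0:
        problems.append("all dividends are non-qualified (old CPS-spine failure)")
    for r in after:
        parts = (r.qualified or 0.0) + (r.non_qualified or 0.0)
        if r.qualified is not None and abs(parts - r.dividend_income) > rel_tol * max(1.0, abs(r.dividend_income)):
            problems.append("row components do not sum to the row total")
            break
    return problems


# Toy before/after rows with made-up numbers.
before = [WeightedRow(1000.0, 200.0, 156.0, 44.0), WeightedRow(500.0, 50.0, 39.0, 11.0)]
after = [WeightedRow(1000.0, 200.0, 180.0, 20.0), WeightedRow(500.0, 50.0, 0.0, 50.0)]
print(diagnose(before, after))
```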
This should be treated as a quality improvement after #202, not a reason to block the constant-share bug fix.