If there are multiple endogenous variables, `IV2SLS.first_stage` reports a separate F-statistic for each endogenous variable, obtained by regressing that variable on the instruments (and controls). This is misleading. If the endogenous variables are correlated, the individual F-statistics can all be large while the causal parameter is not well identified.
See the following example:
```
In [1]: from linearmodels.iv import IV2SLS
   ...: import numpy as np
   ...:
   ...: rng = np.random.default_rng(0)
   ...:
   ...: n = 1000
   ...:
   ...: Z = rng.normal(size=(n, 3))
   ...:
   ...: H = rng.normal(size=(n, 3))  # confounder
   ...: X = Z @ np.ones((3, 2)) + H @ np.array([[1, 0], [0, -1], [0, 0]])
   ...: y = H @ np.array([1, 1, 0.1])  # beta = 0
   ...:
   ...: tsls = IV2SLS(y, None, X, Z).fit(cov_type="unadjusted")
   ...: print(tsls.first_stage)
         First Stage Estimation Results
================================================
                              endog.0    endog.1
------------------------------------------------
R-squared                      0.7256     0.7364
Partial R-squared              0.7256     0.7364
Shea's R-squared               0.0010     0.0010
Partial F-statistic            881.59     931.31
P-value (Partial F-stat)     1.11e-16   1.11e-16
Partial F-stat Distn         F(3,997)   F(3,997)
========================== ========== ==========
instruments.0                  1.0079     0.9792
                             (31.124)   (31.102)
instruments.1                  0.9454     0.9773
                             (28.305)   (30.095)
instruments.2                  0.9673     0.9655
                             (30.639)   (31.453)
------------------------------------------------

T-stats reported in parentheses
T-stats use same covariance type as original model
```
The individual F-statistics are large, suggesting that Wald-based confidence sets can be trusted. They cannot.
```
In [2]: tsls
Out[2]:
                          IV-2SLS Estimation Summary
==============================================================================
Dep. Variable:              dependent   R-squared:                      0.9775
Estimator:                    IV-2SLS   Adj. R-squared:                 0.9775
No. Observations:                1000   F-statistic:                    29.778
Date:                Fri, Oct 04 2024   P-value (F-stat)                0.0000
Time:                        16:48:29   Distribution:                  chi2(2)
Cov. Estimator:            unadjusted

                             Parameter Estimates
==============================================================================
            Parameter  Std. Err.     T-stat    P-value    Lower CI    Upper CI
------------------------------------------------------------------------------
endog.0        0.8687     0.1592     5.4558     0.0000      0.5566      1.1807
endog.1       -0.8694     0.1593    -5.4568     0.0000     -1.1817     -0.5571
==============================================================================

Endogenous: endog.0, endog.1
Instruments: instruments.0, instruments.1, instruments.2
Unadjusted Covariance (Homoskedastic)
Debiased: False
IVResults, id: 0x15b26a510
```
Even though the true parameter is zero, the F-statistic is highly significant at ~30. So are the t-statistics.
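To check that this is not an artifact of one particular seed, here is a minimal Monte Carlo sketch that redraws the data from the same design and counts how often a nominal 5% Wald test of $\beta = 0$ rejects. It uses plain numpy rather than `linearmodels`. The helper `tsls_wald`, the number of repetitions, and the variable names are my own choices for illustration. The covariance is the homoskedastic one without a degrees-of-freedom correction, which is what I assume `cov_type="unadjusted"` with `Debiased: False` corresponds to.

```python
import numpy as np


def tsls_wald(Z, X, y):
    """Wald statistic for H0: beta = 0 in 2SLS with homoskedastic covariance.

    Hypothetical helper written for this illustration, not a linearmodels API.
    """
    n = Z.shape[0]
    # First stage: fitted values P_Z X via least squares.
    X_proj = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
    # X^T P_Z X (P_Z is symmetric and idempotent).
    XPX = X_proj.T @ X
    beta = np.linalg.solve(XPX, X_proj.T @ y)
    resid = y - X @ beta
    sigma2 = resid @ resid / n  # no degrees-of-freedom correction
    # beta^T Cov(beta)^{-1} beta with Cov(beta) = sigma2 * (X^T P_Z X)^{-1}.
    return beta @ XPX @ beta / sigma2


rng = np.random.default_rng(0)
n, reps = 1000, 200
crit = 5.991  # 95% quantile of the chi2(2) distribution

rejections = 0
for _ in range(reps):
    Z = rng.normal(size=(n, 3))
    H = rng.normal(size=(n, 3))  # confounder
    X = Z @ np.ones((3, 2)) + H @ np.array([[1, 0], [0, -1], [0, 0]])
    y = H @ np.array([1, 1, 0.1])  # true beta = 0
    rejections += int(tsls_wald(Z, X, y) > crit)

rejection_rate = rejections / reps
print(f"{rejection_rate=}")
```

A valid 5% test would reject in roughly 5% of the draws. A rejection rate far above that means the Wald test is unreliable for this design, not just for this seed.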
In Testing for Weak Instruments in Linear IV Regression (2005), Stock and Yogo suggest using the Cragg and Donald statistic for reduced rank to test for identification. If $P_Z$ is the projection onto the column span of $Z$, and $M_Z = I - P_Z$ the projection onto its orthogonal complement, the statistic is $n \cdot \lambda_\mathrm{min}\left( (X^T M_Z X)^{-1} X^T P_Z X \right).$ In the case of a single endogenous variable, this is the F-statistic, up to scaling by the number of instruments and a degrees-of-freedom correction. Otherwise, it takes the correlation of the columns of $\Pi$ in $X = Z \Pi + V$ into account. In Table 1, they report thresholds for the statistic, similar to the first-stage F-test heuristic based on Staiger and Stock (1997).
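For concreteness, here is a rough numpy sketch that evaluates the formula above on the data from the first snippet, next to the per-column first-stage F-statistics. The function name `first_stage_summaries` is made up for this illustration. There is no intercept and there are no controls, matching the example. The scaling by $n$ follows the formula as written here and may differ slightly from what a given library uses.

```python
import numpy as np


def first_stage_summaries(Z, X):
    """Per-column first-stage F-statistics and n * lambda_min(...).

    Hypothetical helper for illustration; no intercept, no controls.
    """
    n, k = Z.shape
    X_proj = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]  # P_Z X
    X_resid = X - X_proj                               # M_Z X
    explained = X_proj.T @ X_proj                      # X^T P_Z X
    residual = X_resid.T @ X_resid                     # X^T M_Z X

    # One F-statistic per endogenous variable, ignoring the other columns.
    f_stats = (np.diag(explained) / k) / (np.diag(residual) / (n - k))

    # Smallest eigenvalue of (X^T M_Z X)^{-1} X^T P_Z X. The eigenvalues are
    # real and nonnegative in exact arithmetic; .real drops rounding noise.
    eigvals = np.linalg.eigvals(np.linalg.solve(residual, explained))
    cragg_donald = n * eigvals.real.min()
    return f_stats, cragg_donald


rng = np.random.default_rng(0)
n = 1000
Z = rng.normal(size=(n, 3))
H = rng.normal(size=(n, 3))  # confounder
X = Z @ np.ones((3, 2)) + H @ np.array([[1, 0], [0, -1], [0, 0]])

f_stats, cragg_donald = first_stage_summaries(Z, X)
print(f"{f_stats=}, {cragg_donald=}")
```

The diagonal of the two cross-product matrices only sees each column of $X$ on its own. The eigenvalue also sees the direction $(1, -1)$ in which the instruments carry almost no signal, since both columns of $\Pi$ are equal here.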
In the example above, the Cragg and Donald test statistic is very small, correctly suggesting that Wald-based inference cannot be trusted.
```
In [3]: from ivmodels.tests import rank_test
   ...:
   ...: statistic, p_value = rank_test(Z, X, fit_intercept=False)
   ...: print(f"{statistic=}, {p_value=}")
statistic=np.float64(0.8939161043879634), p_value=np.float64(0.6395707363012899)
```