[BUG] error in LAPACK routine #291

@nurfatimaj

Bug report

Please complete the following information:

  • Stata version: 18.0 22 May 2024
  • OS: Windows 10
  • reghdfe version: 6.12.3 (08aug2023)

Behavior

  • Expected behavior: reghdfe finishes the estimation.
  • Actual behavior:
. reghdfe income _Iyear* _IeduXyea* _IeduXage* age age2 age3, absorb(theta=iid psi=fid) residuals(r) verbose(4) poolsize(1)

...

Converged in 94 iterations (last error = 9.7e-09)

# Solving least-squares regression of partialled-out variables

note: age omitted because of collinearity

              _hqrdmult():   3930 error in LAPACK routine
            hqrdmultq1t():      - function returned error
               _qrsolve():      - function returned error
                qrsolve():      - function returned error
      reghdfe_solve_ols():      - function returned error
                  <istmt>:      - function returned error
r(3930);

The dataset is very large (more than 34 million observations), so we had to split the sample into two: one spanning the years 1987-2003 and the other 1995-2019.

  • The exact same estimation on the first subset (1987-2003) finishes without errors.
  • We previously also ran the estimation on 2004-2019, also without errors.

However, we need to extend the year span, so we are now trying to run the estimation on 1995-2019 (26.5 million observations). The command drops 390,181 singleton observations. Judging from the error output above, the failure occurs at the very last stage: after the partialling-out has converged, in the least-squares step (reghdfe_solve_ols()). Since the estimation finishes without errors on slightly different slices of the same dataset, it is not clear to me why it fails now or what I can do to fix it. It also seems strange to me that the error does not arise earlier, while the solver is iterating.

Any comments/directions are welcome!

Steps to reproduce the problem

I cannot provide the underlying data because it is confidential. The regression specification is given above. The data form an unbalanced person-year panel in long format. Main variables:

  • iid - individual unique identifier
  • fid - firm that person i is working in year t
  • income - monthly income of person i in year t
  • year - year variable (1987, 1988, ..., 2019)
  • educ - categorical education variable of individual i in year t (1 = compulsory, 2 = secondary, 3 = tertiary)
  • age - age of individual i in year t (20, 21, ..., 65)
None of the variables are ever missing. If a person is not working in a given year, that person-year is absent from the dataset.
Given these variables, the regressors are generated as follows:
gen age2 = age * age
gen age3 = age * age * age
xi i.educ*i.year i.educ|age i.educ|age2 i.educ|age3
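Because the real data cannot be shared, a synthetic panel with the same layout may help others attempt a reproduction. The following is a minimal sketch under stated assumptions: the number of individuals, the helper variable `birthyr`, the birth-year range, the 200 firms, the random yearly firm assignment, education being fixed within person, and the income process are all hypothetical choices, not properties of the real data. It is not known whether data generated this way triggers the same LAPACK error.

```stata
* Hypothetical synthetic panel with the same layout as the real data.
* All sizes and the data-generating process below are assumptions.
clear
set seed 12345
set obs 2000                                  // individuals (assumed)
gen long iid = _n
gen int birthyr = 1930 + floor(runiform() * 75)   // hypothetical helper
gen byte educ = 1 + floor(runiform() * 3)     // 1 compulsory, 2 secondary, 3 tertiary; fixed within person (assumption)
expand 25                                     // one row per year 1995-2019
bysort iid: gen int year = 1994 + _n
gen int age = year - birthyr
keep if inrange(age, 20, 65)                  // working ages only
gen long fid = 1 + floor(runiform() * 200)    // random firm each year (assumption)
gen double income = 2000 + 300 * educ + 40 * age - 0.4 * age^2 + rnormal(0, 500)

* Same regressors as in the real specification
gen age2 = age * age
gen age3 = age * age * age
xi i.educ*i.year i.educ|age i.educ|age2 i.educ|age3
```

After running it, the reghdfe command from the report can be run unchanged on this data; raising `set obs` moves the sample size closer to the real one.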
