Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

test_mtl mid-z QSO test failure (intermittent?) #787

Open
sbailey opened this issue Jan 12, 2022 · 6 comments
Open

test_mtl mid-z QSO test failure (intermittent?) #787

sbailey opened this issue Jan 12, 2022 · 6 comments
Assignees

Comments

@sbailey
Copy link
Contributor

sbailey commented Jan 12, 2022

Nightly integration tests at NERSC failed last night with the following error, though I haven't been able to reproduce it by rerunning tests with my own checkout at NERSC. @geordie666 please check.

======================================================================
ERROR: test_midz_no_lock_in (desitarget.test.test_mtl.TestMTL)
Test true mid-z QSOs revert to tracers when redshifts change.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/global/common/software/desi/cori/desiconda/20200801-1.4.0-spec/code/desitarget/master/py/desitarget/test/test_mtl.py", line 317, in test_midz_no_lock_in
    pp[iimidzmtl] = pp[iilowzmtl]
ValueError: NumPy boolean array indexing assignment cannot assign 2 input values to the 1 output values where the mask is true
@geordie666
Copy link
Contributor

I ran python -m test_mtl 200 times at NERSC after sourcing master and I couldn't recreate this failure. It's a little hard to debug the issue when I can't recreate it, so how much should I push on this?

@sbailey
Copy link
Contributor Author

sbailey commented Jan 12, 2022

Thanks for checking. Since the exception is inside the test itself (rather than inside algorithmic code that the test is testing), let's call this good enough and I'll reopen the ticket if the failure occurs again.

@sbailey sbailey closed this as completed Jan 12, 2022
@sbailey
Copy link
Contributor Author

sbailey commented Jan 20, 2022

@geordie666 another intermittent test_mtl failure popped up last night, but passes this morning:

======================================================================
FAIL: test_midz_no_lock_in (desitarget.test.test_mtl.TestMTL)
Test true mid-z QSOs revert to tracers when redshifts change.
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/global/common/software/desi/cori/desiconda/20200801-1.4.0-spec/code/desitarget/master/py/desitarget/test/test_mtl.py", line 320, in test_midz_no_lock_in
    self.assertTrue(np.all(mtl['PRIORITY'] == pp))
AssertionError: False is not true

----------------------------------------------------------------------

Please review that code for whether there might be side effects to the individual tests such that running them in a different order could cause one of them to break another. e.g. leaving a test file behind, or a modification to a self.blah variable that isn't cleared out by the setUp call before the next test.

If nothing pops up after a few minutes investigation, ok to close this again and we'll re-report if it comes up again.

@sbailey sbailey reopened this Jan 20, 2022
@geordie666
Copy link
Contributor

Irritating. We could take the lazy option and turn off the test, as the lines of code that are being covered are stable and unlikely to change. But, led me dig some more to try to understand the problem. One solution might be to hard-code some numbers in that test, rather than inheriting them from the class.

@geordie666
Copy link
Contributor

@sbailey: Suggested approach. Let's skip this unit test with a decorator, for now, but leave this issue open. When I find time to drill down on a solution, I'll turn the unit test back on.

@sbailey
Copy link
Contributor Author

sbailey commented Jan 21, 2022

it fails infrequently enough that it isn't a big problem. I suggest leaving it on to build up stats and examples. I can post here if/when there are other failures, but it isn't urgent, nor is it overly distracting or problematic.

If we're going to do anything short term to the code, I'd suggest augmenting the test logging so that the next time one of those two tests fail, it leaves behind more information that might be useful for debugging what happened (e.g. printing the values of pp, mtl, iilowz, iihighz, ...)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants