suite/util: fix KeyError in get_sha1s #1643
The git.ceph.com:8080/history returns json with 'err' instead of 'error'.

2021-04-21 08:29:51,334.334 DEBUG:teuthology.suite.util:got response: {'committish': 'e647a64c1e8147b04e84575a0fc53dee65cecab2', 'err': 'fatal: bad object e647a64c1e8147b04e84575a0fc53dee65cecab2\n', 'sha1s': []}
Traceback (most recent call last):
  File "/home/runner/src/teuthology_master/virtualenv/bin/teuthology-suite", line 33, in <module>
    sys.exit(load_entry_point('teuthology', 'console_scripts', 'teuthology-suite')())
  File "/home/runner/src/teuthology_master/scripts/suite.py", line 189, in main
    return teuthology.suite.main(args)
  File "/home/runner/src/teuthology_master/teuthology/suite/__init__.py", line 143, in main
    run.prepare_and_schedule()
  File "/home/runner/src/teuthology_master/teuthology/suite/run.py", line 397, in prepare_and_schedule
    num_jobs = self.schedule_suite()
  File "/home/runner/src/teuthology_master/teuthology/suite/run.py", line 612, in schedule_suite
    util.find_git_parent('ceph', self.base_config.sha1)
  File "/home/runner/src/teuthology_master/teuthology/suite/util.py", line 491, in find_git_parent
    sha1s = get_sha1s(project, sha1, 2)
  File "/home/runner/src/teuthology_master/teuthology/suite/util.py", line 485, in get_sha1s
    int(count), sha1, project, resp.json()['error'])
KeyError: 'error'

Signed-off-by: Kyr Shatskyy <[email protected]>
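For illustration, here is a minimal sketch of one way to read the error message without raising KeyError when the server uses 'err' rather than 'error'. This is not the actual patch in this PR; the helper name `extract_error` is hypothetical, and the sample payload is copied from the debug log above.

```python
def extract_error(payload):
    """Return the error text from a history response dict, or None.

    Hypothetical helper: accepts either the 'error' or the 'err' key,
    so a missing key no longer raises KeyError.
    """
    for key in ("error", "err"):
        message = payload.get(key)
        if message:
            return message.strip()
    return None


# Payload as logged by teuthology.suite.util in the description above.
response = {
    "committish": "e647a64c1e8147b04e84575a0fc53dee65cecab2",
    "err": "fatal: bad object e647a64c1e8147b04e84575a0fc53dee65cecab2\n",
    "sha1s": [],
}

print(extract_error(response))
```

Using `dict.get` for both spellings keeps the code working whether or not the server side is later changed to return 'error'.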
@susebot run deploy |
Commit dce3594 is NOT OK. |
@susebot run deploy |
Commit 1b86548 is NOT OK. |
@susebot run deploy |
Commit 181c8e8 is NOT OK. |
Signed-off-by: Kyr Shatskyy <[email protected]>
@susebot run deploy |
Commit aa89a2d is NOT OK. |
@susebot run deploy |
Commit aa89a2d is NOT OK. |
Scheduling fails because the arm build failed, and build_complete returns False:
{
"status": "failed",
"sha1": "e647a64c1e8147b04e84575a0fc53dee65cecab2",
"distro_arch": "arm64",
"started": "2021-04-20 19:06:00.620116",
"distro_codename": null,
"completed": null,
"extra": {
"node_name": "172.21.4.63+confusa01",
"version": "",
"build_user": "",
"root_build_cause": "SCMTRIGGER",
"job_name": "ceph-dev-build/ARCH=arm64,AVAILABLE_ARCH=arm64,AVAILABLE_DIST=centos8,DIST=centos8,MACHINE_SIZE=gigantic"
},
"modified": "2021-04-20 20:29:55.141954",
"distro_version": "8",
"project": "ceph",
"url": "https://jenkins.ceph.com/job/ceph-dev-build/ARCH=arm64,AVAILABLE_ARCH=arm64,AVAILABLE_DIST=centos8,DIST=centos8,MACHINE_SIZE=gigantic/47457/",
"log_url": "https://jenkins.ceph.com/job/ceph-dev-build/ARCH=arm64,AVAILABLE_ARCH=arm64,AVAILABLE_DIST=centos8,DIST=centos8,MACHINE_SIZE=gigantic/47457//consoleFull",
"flavor": "default",
"ref": "octopus",
"distro": "centos"
}
{
"status": "completed",
"sha1": "e647a64c1e8147b04e84575a0fc53dee65cecab2",
"distro_arch": "x86_64",
"started": "2021-04-20 17:43:16.257781",
"distro_codename": null,
"completed": "2021-04-20 18:34:27.490982",
"extra": {
"node_name": "172.21.2.4+braggi04",
"version": "15.2.11-166-ge647a64c",
"build_user": "",
"root_build_cause": "SCMTRIGGER",
"job_name": "ceph-dev-build/ARCH=x86_64,AVAILABLE_ARCH=x86_64,AVAILABLE_DIST=centos8,DIST=centos8,MACHINE_SIZE=gigantic"
},
"modified": "2021-04-20 18:34:27.492338",
"distro_version": "8",
"project": "ceph",
"url": "https://jenkins.ceph.com/job/ceph-dev-build/ARCH=x86_64,AVAILABLE_ARCH=x86_64,AVAILABLE_DIST=centos8,DIST=centos8,MACHINE_SIZE=gigantic/47457/",
"log_url": "https://jenkins.ceph.com/job/ceph-dev-build/ARCH=x86_64,AVAILABLE_ARCH=x86_64,AVAILABLE_DIST=centos8,DIST=centos8,MACHINE_SIZE=gigantic/47457//consoleFull",
"flavor": "default",
"ref": "octopus",
"distro": "centos"
} |
And this failure is reproduced with the command:
|
in general teuthology-suite is supposed to only check for builds that are required by the jobs being scheduled - so in this case arm status should not matter |
It looks like the problem was a 500 error from gitserver in the most-recent job:
2021-04-21 20:12:13,835.835 DEBUG:teuthology.suite.util:got response: {'committish': '7088a74fcebc164188f87a0b01ce9853309c18c6', 'err': 'fatal: bad object 7088a74fcebc164188f87a0b01ce9853309c18c6\n', 'sha1s': []} |
@susebot run deploy |
Commit aa89a2d is NOT OK. |
This "bad object" condition still obtains on git.ceph.com's local repo, but not on mine, which leads me to think that something's wrong with updating the repo (at git.ceph.com:/home/cephgit/gitserver/git). Looking into it.
$ git rev-list --first-parent --max-count 1 7088a74fcebc164188f87a0b01ce9853309c18c6 |
This is only part of the problem with 'err'. See this log:
It finds the first ARM build, which has failed, and gives up without checking the next build. Only then does it hit:
Why does it stop instead of going further? I believe there is a combination of different issues. |
it's supposed to be searching for arch as well.
I'm assuming the 500 error is treated as fatal. It's definitely the case that git.ceph.com is out of date and clogged up. Let's get that back into shape before going much further. It's possible that somehow the check for arch isn't working right, but I don't see it from code examination, and I don't want to experiment much right now with git.ceph.com as ill as it is. |
in fact....the query string contained x86_64; I don't know why shaman gave a result including arm64 build. That's odd and I can't reproduce it. Using the exact query string from above:
|
@dmick You are making a different request. The build_complete check makes a request to, for example, https://shaman.ceph.com/api/builds/ceph/octopus/7088a74fcebc164188f87a0b01ce9853309c18c6, and then goes through all the builds and compares each one with what was queried. |
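To make the discussion concrete, here is a rough sketch of a check that walks every build in the response and only judges the one matching the requested distro and arch, so a failed arm64 build listed first does not mask a completed x86_64 build. This is not teuthology's actual build_complete code; the function names and the set of compared fields are assumptions, and the sample records are trimmed from the JSON posted earlier in this thread.

```python
def find_matching_build(builds, distro, distro_version, arch, flavor="default"):
    """Return the first build matching all requested fields, or None.

    Hypothetical helper: keeps scanning past non-matching builds
    instead of stopping at the first entry in the list.
    """
    for build in builds:
        if (
            build.get("distro") == distro
            and build.get("distro_version") == distro_version
            and build.get("distro_arch") == arch
            and build.get("flavor") == flavor
        ):
            return build
    return None


def is_build_complete(builds, **wanted):
    """True only if a matching build exists and its status is 'completed'."""
    build = find_matching_build(builds, **wanted)
    return build is not None and build.get("status") == "completed"


# Trimmed copies of the two records above; the failed arm64 build comes first.
builds = [
    {"status": "failed", "distro": "centos", "distro_version": "8",
     "distro_arch": "arm64", "flavor": "default"},
    {"status": "completed", "distro": "centos", "distro_version": "8",
     "distro_arch": "x86_64", "flavor": "default"},
]

print(is_build_complete(builds, distro="centos", distro_version="8", arch="x86_64"))
print(is_build_complete(builds, distro="centos", distro_version="8", arch="arm64"))
```

With this shape, the order of builds in the response does not affect the result for a given arch.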
@susebot run deploy |
@kshtsk: I see that now. I was using the URL from the log, but build_complete doesn't log its request. Hacking, unless you already have something.
Commit aa89a2d is OK. |
@dmick No, it's not fixed yet. So far so good, though: scheduling passed. Did anyone fix the internal server error? |
If you mean the git.ceph.com/history error, yes, that should be working well again. I suspect if you hit another condition where the arm64 build failed and is first in the response, it will still fail to find that build, but it might step back in history until it works now that the git.ceph.com/history works. |
yeah right |
Closing since there is another PR merged addressing the issue #1958 |