suite/util: fix KeyError in get_sha1s #1643

Closed
wants to merge 2 commits into from


kshtsk
Contributor

@kshtsk kshtsk commented Apr 21, 2021

The git.ceph.com:8080/history endpoint returns JSON with an 'err' key instead of 'error'.

  2021-04-21 08:29:51,334.334 DEBUG:teuthology.suite.util:got response: {'committish': 'e647a64c1e8147b04e84575a0fc53dee65cecab2', 'err': 'fatal: bad object e647a64c1e8147b04e84575a0fc53dee65cecab2\n', 'sha1s': []}
  Traceback (most recent call last):
    File "/home/runner/src/teuthology_master/virtualenv/bin/teuthology-suite", line 33, in <module>
      sys.exit(load_entry_point('teuthology', 'console_scripts', 'teuthology-suite')())
    File "/home/runner/src/teuthology_master/scripts/suite.py", line 189, in main
      return teuthology.suite.main(args)
    File "/home/runner/src/teuthology_master/teuthology/suite/__init__.py", line 143, in main
      run.prepare_and_schedule()
    File "/home/runner/src/teuthology_master/teuthology/suite/run.py", line 397, in prepare_and_schedule
      num_jobs = self.schedule_suite()
    File "/home/runner/src/teuthology_master/teuthology/suite/run.py", line 612, in schedule_suite
      util.find_git_parent('ceph', self.base_config.sha1)
    File "/home/runner/src/teuthology_master/teuthology/suite/util.py", line 491, in find_git_parent
      sha1s = get_sha1s(project, sha1, 2)
    File "/home/runner/src/teuthology_master/teuthology/suite/util.py", line 485, in get_sha1s
      int(count), sha1, project, resp.json()['error'])
  KeyError: 'error'

Signed-off-by: Kyr Shatskyy [email protected]
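
One way to avoid this KeyError is to read the message without assuming which key the server uses. The following is a minimal sketch, not the actual patch: `extract_error` is a hypothetical helper, and the lookup order ('err' first, then 'error') is an assumption based on the response logged above.

```python
def extract_error(payload):
    """Return the error text from a /history response body, or '' if absent.

    Hypothetical helper: checks 'err' (what git.ceph.com returned in the log
    above) and falls back to 'error' (what get_sha1s expected).
    """
    for key in ("err", "error"):
        message = payload.get(key)
        if message:
            return message.strip()
    return ""


# Response body copied from the debug log above.
response = {
    "committish": "e647a64c1e8147b04e84575a0fc53dee65cecab2",
    "err": "fatal: bad object e647a64c1e8147b04e84575a0fc53dee65cecab2\n",
    "sha1s": [],
}
print(extract_error(response))
```

Using `dict.get` means a response with neither key yields an empty string instead of raising while an error is already being reported.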

@kshtsk
Contributor Author

kshtsk commented Apr 21, 2021

@susebot run deploy

@susebot

susebot commented Apr 21, 2021

Commit dce3594 is NOT OK.
Check tests results in the Jenkins job: https://ceph-ci.suse.de/job/pr-teuthology-deploy/326/

@kshtsk
Contributor Author

kshtsk commented Apr 21, 2021

@susebot run deploy

@susebot

susebot commented Apr 21, 2021

Commit 1b86548 is NOT OK.
Check tests results in the Jenkins job: https://ceph-ci.suse.de/job/pr-teuthology-deploy/327/

@kshtsk
Contributor Author

kshtsk commented Apr 21, 2021

@susebot run deploy

@susebot

susebot commented Apr 21, 2021

Commit 181c8e8 is NOT OK.
Check tests results in the Jenkins job: https://ceph-ci.suse.de/job/pr-teuthology-deploy/328/

@kshtsk
Contributor Author

kshtsk commented Apr 21, 2021

@susebot run deploy

@susebot

susebot commented Apr 21, 2021

Commit aa89a2d is NOT OK.
Check tests results in the Jenkins job: https://ceph-ci.suse.de/job/pr-teuthology-deploy/329/

@kshtsk
Contributor Author

kshtsk commented Apr 21, 2021

@susebot run deploy

@susebot

susebot commented Apr 21, 2021

Commit aa89a2d is NOT OK.
Check tests results in the Jenkins job: https://ceph-ci.suse.de/job/pr-teuthology-deploy/330/

@kshtsk
Contributor Author

kshtsk commented Apr 22, 2021

Scheduling fails because the arm64 build failed, so build_complete returns False:

curl -s https://shaman.ceph.com/api/builds/ceph/octopus/e647a64c1e8147b04e84575a0fc53dee65cecab2/ | jq '.[] | select(.distro=="centos" and .distro_version=="8" and .flavor=="default")'
{
  "status": "failed",
  "sha1": "e647a64c1e8147b04e84575a0fc53dee65cecab2",
  "distro_arch": "arm64",
  "started": "2021-04-20 19:06:00.620116",
  "distro_codename": null,
  "completed": null,
  "extra": {
    "node_name": "172.21.4.63+confusa01",
    "version": "",
    "build_user": "",
    "root_build_cause": "SCMTRIGGER",
    "job_name": "ceph-dev-build/ARCH=arm64,AVAILABLE_ARCH=arm64,AVAILABLE_DIST=centos8,DIST=centos8,MACHINE_SIZE=gigantic"
  },
  "modified": "2021-04-20 20:29:55.141954",
  "distro_version": "8",
  "project": "ceph",
  "url": "https://jenkins.ceph.com/job/ceph-dev-build/ARCH=arm64,AVAILABLE_ARCH=arm64,AVAILABLE_DIST=centos8,DIST=centos8,MACHINE_SIZE=gigantic/47457/",
  "log_url": "https://jenkins.ceph.com/job/ceph-dev-build/ARCH=arm64,AVAILABLE_ARCH=arm64,AVAILABLE_DIST=centos8,DIST=centos8,MACHINE_SIZE=gigantic/47457//consoleFull",
  "flavor": "default",
  "ref": "octopus",
  "distro": "centos"
}
{
  "status": "completed",
  "sha1": "e647a64c1e8147b04e84575a0fc53dee65cecab2",
  "distro_arch": "x86_64",
  "started": "2021-04-20 17:43:16.257781",
  "distro_codename": null,
  "completed": "2021-04-20 18:34:27.490982",
  "extra": {
    "node_name": "172.21.2.4+braggi04",
    "version": "15.2.11-166-ge647a64c",
    "build_user": "",
    "root_build_cause": "SCMTRIGGER",
    "job_name": "ceph-dev-build/ARCH=x86_64,AVAILABLE_ARCH=x86_64,AVAILABLE_DIST=centos8,DIST=centos8,MACHINE_SIZE=gigantic"
  },
  "modified": "2021-04-20 18:34:27.492338",
  "distro_version": "8",
  "project": "ceph",
  "url": "https://jenkins.ceph.com/job/ceph-dev-build/ARCH=x86_64,AVAILABLE_ARCH=x86_64,AVAILABLE_DIST=centos8,DIST=centos8,MACHINE_SIZE=gigantic/47457/",
  "log_url": "https://jenkins.ceph.com/job/ceph-dev-build/ARCH=x86_64,AVAILABLE_ARCH=x86_64,AVAILABLE_DIST=centos8,DIST=centos8,MACHINE_SIZE=gigantic/47457//consoleFull",
  "flavor": "default",
  "ref": "octopus",
  "distro": "centos"
}

@kshtsk
Contributor Author

kshtsk commented Apr 22, 2021

Hey @dmick, this looks related to d36cd1c; can you review this patch? I am not sure why it is failing, or what the logic should be when the arm container build fails but the x86_64 one is complete. Should teuthology-suite fail like it does now?

@kshtsk kshtsk requested a review from dmick April 22, 2021 16:18
@kshtsk
Contributor Author

kshtsk commented Apr 22, 2021

This failure can be reproduced with the command:

teuthology-suite -v --machine-type gra --ceph octopus --suite smoke -d centos -D 7.6 --filter-out ubuntu,rhel,7.7,rados_bench,kclient_workunit_suites_dbench,cfuse_workunit_suites_iozone,_s3tests --limit 2 --seed 0 --newest 100

@jdurgin
Member

jdurgin commented Apr 22, 2021

in general teuthology-suite is supposed to only check for builds that are required by the jobs being scheduled - so in this case arm status should not matter
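
A rough sketch of that idea, assuming the build records have the fields shown in the shaman output above (`status`, `distro`, `distro_version`, `distro_arch`, `flavor`). `build_complete_for` is a hypothetical name, not teuthology's actual `build_complete`; it only considers builds for the requested architecture, so a failed arm64 build cannot mask a completed x86_64 one.

```python
def build_complete_for(builds, distro, version, flavor, arch):
    """Return True if a matching build for `arch` has status 'completed'.

    Builds for other architectures are skipped instead of ending the search.
    """
    for build in builds:
        if (
            build.get("distro") == distro
            and build.get("distro_version") == version
            and build.get("flavor") == flavor
            and build.get("distro_arch") == arch
        ):
            if build.get("status") == "completed":
                return True
    return False


# Trimmed copies of the two records from the shaman response above.
builds = [
    {"status": "failed", "distro_arch": "arm64", "distro": "centos",
     "distro_version": "8", "flavor": "default"},
    {"status": "completed", "distro_arch": "x86_64", "distro": "centos",
     "distro_version": "8", "flavor": "default"},
]
print(build_complete_for(builds, "centos", "8", "default", "x86_64"))
print(build_complete_for(builds, "centos", "8", "default", "arm64"))
```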

@dmick
Member

dmick commented Apr 22, 2021

It looks like the problem was a 500 error from gitserver in the most-recent job:

2021-04-21 20:12:13,835.835 DEBUG:teuthology.suite.util:got response: {'committish': '7088a74fcebc164188f87a0b01ce9853309c18c6', 'err': 'fatal: bad object 7088a74fcebc164188f87a0b01ce9853309c18c6\n', 'sha1s': []}
2021-04-21 20:12:13,835.835 ERROR:teuthology.suite.util:cant find 2 parents of 7088a74fcebc164188f87a0b01ce9853309c18c6 in ceph: fatal: bad object 7088a74fcebc164188f87a0b01ce9853309c18c6

@kshtsk
Contributor Author

kshtsk commented Apr 22, 2021

@susebot run deploy

@susebot

susebot commented Apr 22, 2021

Commit aa89a2d is NOT OK.
Check tests results in the Jenkins job: https://ceph-ci.suse.de/job/pr-teuthology-deploy/331/

@dmick
Member

dmick commented Apr 22, 2021

This "bad object" condition still obtains on git.ceph.com's local repo, but not on mine, which leads me to think that something's wrong with updating the repo (at git.ceph.com:/home/cephgit/gitserver/git). Looking into it.

$ git rev-list --first-parent --max-count 1 7088a74fcebc164188f87a0b01ce9853309c18c6
fatal: bad object 7088a74fcebc164188f87a0b01ce9853309c18c6

@kshtsk
Contributor Author

kshtsk commented Apr 22, 2021

> This "bad object" condition still obtains on git.ceph.com's local repo, but not on mine, which leads me to think that something's wrong with updating the repo (at git.ceph.com:/home/cephgit/gitserver/git). Looking into it.
>
> $ git rev-list --first-parent --max-count 1 7088a74fcebc164188f87a0b01ce9853309c18c6
> fatal: bad object 7088a74fcebc164188f87a0b01ce9853309c18c6

This is only part of the problem with 'err'.

See this log:

2021-04-22 20:29:31,699.699 INFO:teuthology.suite.util:container build centos/8, checking for build_complete
2021-04-22 20:29:31,700.700 DEBUG:teuthology.packaging:Querying https://shaman.ceph.com/api/search?status=ready&project=ceph&flavor=default&distros=centos%2F8%2Fx86_64&sha1=7088a74fcebc164188f87a0b01ce9853309c18c6
2021-04-22 20:29:32,594.594 DEBUG:teuthology.packaging:Check build complete for results: {'status': 'ready', 'sha1': '7088a74fcebc164188f87a0b01ce9853309c18c6', 'extra': {'build_url': 'https://jenkins.ceph.com/job/ceph-dev-build/ARCH=arm64,AVAILABLE_ARCH=arm64,AVAILABLE_DIST=centos8,DIST=centos8,MACHINE_SIZE=gigantic/47474/', 'root_build_cause': 'SCMTRIGGER', 'version': '15.2.11-172-g7088a74f', 'node_name': '172.21.4.66+confusa04', 'job_name': 'ceph-dev-build/ARCH=arm64,AVAILABLE_ARCH=arm64,AVAILABLE_DIST=centos8,DIST=centos8,MACHINE_SIZE=gigantic', 'package_manager_version': '15.2.11-172.g7088a74f'}, 'url': 'https://3.chacra.ceph.com/r/ceph/octopus/7088a74fcebc164188f87a0b01ce9853309c18c6/centos/8/flavors/default/', 'distro_codename': None, 'modified': '2021-04-21 21:53:30.940025', 'distro_version': '8', 'project': 'ceph', 'flavor': 'default', 'ref': 'octopus', 'chacra_url': 'https://3.chacra.ceph.com/repos/ceph/octopus/7088a74fcebc164188f87a0b01ce9853309c18c6/centos/8/flavors/default/', 'archs': ['x86_64', 'source', 'arm64'], 'distro': 'centos'}
2021-04-22 20:29:32,594.594 DEBUG:teuthology.packaging:Querying https://shaman.ceph.com/api/builds/ceph/octopus/7088a74fcebc164188f87a0b01ce9853309c18c6...
2021-04-22 20:29:33,448.448 DEBUG:teuthology.packaging:Matched build: {'status': 'failed', 'sha1': '7088a74fcebc164188f87a0b01ce9853309c18c6', 'distro_arch': 'arm64', 'started': '2021-04-21 20:34:41.083883', 'distro_codename': None, 'completed': None, 'extra': {'node_name': '172.21.4.66+confusa04', 'version': '', 'build_user': '', 'root_build_cause': 'SCMTRIGGER', 'job_name': 'ceph-dev-build/ARCH=arm64,AVAILABLE_ARCH=arm64,AVAILABLE_DIST=centos8,DIST=centos8,MACHINE_SIZE=gigantic'}, 'modified': '2021-04-21 21:49:52.434171', 'distro_version': '8', 'project': 'ceph', 'url': 'https://jenkins.ceph.com/job/ceph-dev-build/ARCH=arm64,AVAILABLE_ARCH=arm64,AVAILABLE_DIST=centos8,DIST=centos8,MACHINE_SIZE=gigantic/47474/', 'log_url': 'https://jenkins.ceph.com/job/ceph-dev-build/ARCH=arm64,AVAILABLE_ARCH=arm64,AVAILABLE_DIST=centos8,DIST=centos8,MACHINE_SIZE=gigantic/47474//consoleFull', 'flavor': 'default', 'ref': 'octopus', 'distro': 'centos'}
2021-04-22 20:29:33,449.449 INFO:teuthology.suite.util:build not complete

It finds the ARM build first, which has failed, and gives up without checking the next build.

Only after that does it hit:

2021-04-22 20:29:33,449.449 ERROR:teuthology.suite.run:Packages for os_type 'centos', flavor basic and ceph hash '7088a74fcebc164188f87a0b01ce9853309c18c6' not found
2021-04-22 20:32:09,452.452 ERROR:teuthology.suite.util:git refresh failed for ceph: <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>500 Internal Server Error</title>
</head><body>
<h1>Internal Server Error</h1>
<p>The server encountered an internal error or
misconfiguration and was unable to complete
your request.</p>
<p>Please contact the server administrator at 
 [no address given] to inform them of the time this error occurred,
 and the actions you performed just before this error.</p>
<p>More information about this error may be available
in the server error log.</p>
<hr>
<address>Apache/2.4.25 (Ubuntu) Server at git.ceph.com Port 8080</address>
</body></html>

2021-04-22 20:32:11,904.904 DEBUG:teuthology.suite.util:got response: {'committish': '7088a74fcebc164188f87a0b01ce9853309c18c6', 'err': 'fatal: bad object 7088a74fcebc164188f87a0b01ce9853309c18c6\n', 'sha1s': []}
2021-04-22 20:32:11,904.904 ERROR:teuthology.suite.util:cant find 2 parents of 7088a74fcebc164188f87a0b01ce9853309c18c6 in ceph: fatal: bad object 7088a74fcebc164188f87a0b01ce9853309c18c6

Traceback (most recent call last):
  File "/home/runner/src/teuthology_master/virtualenv/bin/teuthology-suite", line 33, in <module>
    sys.exit(load_entry_point('teuthology', 'console_scripts', 'teuthology-suite')())
  File "/home/runner/src/teuthology_master/scripts/suite.py", line 189, in main
    return teuthology.suite.main(args)
  File "/home/runner/src/teuthology_master/teuthology/suite/__init__.py", line 143, in main
    run.prepare_and_schedule()
  File "/home/runner/src/teuthology_master/teuthology/suite/run.py", line 397, in prepare_and_schedule
    num_jobs = self.schedule_suite()
  File "/home/runner/src/teuthology_master/teuthology/suite/run.py", line 614, in schedule_suite
    util.schedule_fail('Backtrack for --newest failed', name)
  File "/home/runner/src/teuthology_master/teuthology/suite/util.py", line 76, in schedule_fail
    raise ScheduleFailError(message, name)
teuthology.exceptions.ScheduleFailError: Scheduling runner-2021-04-22_20:29:19-smoke-octopus-distro-basic-gra failed: Backtrack for --newest failed

Why does it stop instead of going further back, given that --newest 100 is provided?

I believe there is a combination of different issues.

@dmick
Member

dmick commented Apr 22, 2021

> it found first ARM build

it's supposed to be searching for arch as well.

> Why it stops to go further

I'm assuming the 500 error is treated as fatal.

It's definitely the case that git.ceph.com is out of date and clogged up. Let's get that back into shape before going much further. It's possible that somehow the check for arch isn't working right, but I don't see it from code examination, and I don't want to experiment much right now with git.ceph.com as ill as it is.

@dmick
Member

dmick commented Apr 22, 2021

In fact... the query string contained x86_64; I don't know why shaman gave a result that includes the arm64 build. That's odd and I can't reproduce it. Using the exact query string from above:

$ curl -L 'https://shaman.ceph.com/api/search?status=ready&project=ceph&flavor=default&distros=centos%2F8%2Fx86_64&sha1=7088a74fcebc164188f87a0b01ce9853309c18c6' | jq .
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   515  100   515    0     0   5000      0 --:--:-- --:--:-- --:--:--  4951
100   976  100   976    0     0   6921      0 --:--:-- --:--:-- --:--:--  6921
[
  {
    "status": "ready",
    "sha1": "7088a74fcebc164188f87a0b01ce9853309c18c6",
    "extra": {
      "build_url": "https://jenkins.ceph.com/job/ceph-dev-build/ARCH=arm64,AVAILABLE_ARCH=arm64,AVAILABLE_DIST=centos8,DIST=centos8,MACHINE_SIZE=gigantic/47474/",
      "root_build_cause": "SCMTRIGGER",
      "version": "15.2.11-172-g7088a74f",
      "node_name": "172.21.4.66+confusa04",
      "job_name": "ceph-dev-build/ARCH=arm64,AVAILABLE_ARCH=arm64,AVAILABLE_DIST=centos8,DIST=centos8,MACHINE_SIZE=gigantic",
      "package_manager_version": "15.2.11-172.g7088a74f"
    },
    "url": "https://3.chacra.ceph.com/r/ceph/octopus/7088a74fcebc164188f87a0b01ce9853309c18c6/centos/8/flavors/default/",
    "distro_codename": null,
    "modified": "2021-04-21 21:53:30.940025",
    "distro_version": "8",
    "project": "ceph",
    "flavor": "default",
    "ref": "octopus",
    "chacra_url": "https://3.chacra.ceph.com/repos/ceph/octopus/7088a74fcebc164188f87a0b01ce9853309c18c6/centos/8/flavors/default/",
    "archs": [
      "x86_64",
      "source",
      "arm64"
    ],
    "distro": "centos"
  }
]



@kshtsk
Contributor Author

kshtsk commented Apr 23, 2021

@dmick You are making a different request. The build_complete check makes a request to, for example, https://shaman.ceph.com/api/builds/ceph/octopus/7088a74fcebc164188f87a0b01ce9853309c18c6, and then goes through all the builds and compares them with the result of the query you ran.
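
For illustration, the two requests can be put side by side. This is a sketch built only from the URLs visible in the logs above; the helper names `search_url` and `builds_url` are hypothetical and not part of teuthology.

```python
from urllib.parse import urlencode

SHAMAN = "https://shaman.ceph.com/api"


def search_url(project, flavor, distro, version, arch, sha1):
    """URL of the 'search' request seen in the teuthology.packaging log."""
    query = urlencode({
        "status": "ready",
        "project": project,
        "flavor": flavor,
        "distros": f"{distro}/{version}/{arch}",  # '/' is encoded as %2F
        "sha1": sha1,
    })
    return f"{SHAMAN}/search?{query}"


def builds_url(project, ref, sha1):
    """URL of the per-build listing; note that it carries no arch filter."""
    return f"{SHAMAN}/builds/{project}/{ref}/{sha1}"


sha1 = "7088a74fcebc164188f87a0b01ce9853309c18c6"
print(search_url("ceph", "default", "centos", "8", "x86_64", sha1))
print(builds_url("ceph", "octopus", sha1))
```

The search URL names the architecture in its `distros` parameter, while the builds URL does not, so any architecture filtering on the second response has to happen on the client side.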

@kshtsk
Contributor Author

kshtsk commented Apr 23, 2021

@susebot run deploy

@dmick
Member

dmick commented Apr 23, 2021

@kshtsk: I see that now. I was using the URL from the log, but build_complete doesn't log its request. Hacking on it, unless you already have something.

@susebot

susebot commented Apr 23, 2021

Commit aa89a2d is OK.
Check tests results in the Jenkins job: https://ceph-ci.suse.de/job/pr-teuthology-deploy/332/

@kshtsk
Contributor Author

kshtsk commented Apr 23, 2021

@dmick No, not fixed yet. So far so good, though: scheduling passed. Did anyone fix the internal server error?

@dmick
Member

dmick commented Apr 23, 2021

If you mean the git.ceph.com/history error, yes, that should be working well again. I suspect if you hit another condition where the arm64 build failed and is first in the response, it will still fail to find that build, but it might step back in history until it works now that the git.ceph.com/history works.
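
The stepping back could look roughly like the sketch below. It is not teuthology's implementation: `find_newest_built`, `has_packages`, and `parent_of` are hypothetical names, and the two callables stand in for the shaman and git.ceph.com/history lookups. The assumption is that a commit without usable packages is skipped and the search moves to its first parent, up to the `--newest` limit.

```python
def find_newest_built(sha1, limit, has_packages, parent_of):
    """Walk first parents from `sha1`; return the first commit with packages.

    Returns None if no commit within `limit` steps back has packages, or if
    the parent lookup fails (parent_of returns None).
    """
    current = sha1
    for _ in range(limit + 1):
        if has_packages(current):
            return current
        current = parent_of(current)
        if current is None:
            return None
    return None


# Toy history: c3 -> c2 -> c1, where only c1 has packages.
parents = {"c3": "c2", "c2": "c1"}
built = {"c1"}
print(find_newest_built("c3", 100, built.__contains__, parents.get))
```

In this sketch a failed parent lookup ends the search early, which mirrors the behaviour seen while git.ceph.com/history was returning errors.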

@kshtsk
Contributor Author

kshtsk commented Apr 23, 2021

yeah right

@djgalloway djgalloway changed the base branch from master to main June 1, 2022 17:03
@kshtsk
Contributor Author

kshtsk commented Jul 2, 2024

Closing, since another PR addressing the issue has been merged: #1958

@kshtsk kshtsk closed this Jul 2, 2024
@kshtsk kshtsk deleted the fix-get-sha1s branch July 2, 2024 00:54