Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

server: broken tests in bazel #61913

Closed
rickystewart opened this issue Mar 12, 2021 · 17 comments · Fixed by #65899
Closed

server: broken tests in bazel #61913

rickystewart opened this issue Mar 12, 2021 · 17 comments · Fixed by #65899
Assignees
Labels
A-build-system C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.

Comments

@rickystewart
Copy link
Collaborator

We are making a lot of progress in the migration from make to Bazel. Before we can complete our migration, we need to make sure that all tests in-tree continue to pass when run inside of the Bazel sandbox. The following tests are broken when run in the Bazel sandbox:

pkg/util/caller:caller_test
pkg/util/log:log_test

Please help us by doing the following:

  • Reproduce the failure locally; for example, with bazel test bazel test pkg/util/caller:caller_test, and consult the test logs to see what went wrong
  • Identify the root cause of the failure
  • If you are able, repair the failure such that the test passes, and submit a PR removing the broken_in_bazel tag from the test (see the linked documentation below)
  • If not, take your findings to #bazel in Slack, and we’ll work with you to figure out how we can get the tests working

Helpful documentation:

@rickystewart rickystewart added the C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior. label Mar 12, 2021
@knz
Copy link
Contributor

knz commented Apr 13, 2021

@rickystewart I looked into this today; I encounter the following error when I run the bazel command, but it seems unrelated to the packages you refer to at the top:

kena@kenax ....com/cockroachdb/cockroach % bazel test pkg/util/log:log_test
INFO: Build option --define has changed, discarding analysis cache.
INFO: Repository com_github_cockroachdb_errors instantiated at:
  /data/home/kena/src/go/src/github.com/cockroachdb/cockroach/WORKSPACE:154:8: in <toplevel>
  /data/home/kena/src/go/src/github.com/cockroachdb/cockroach/DEPS.bzl:537:18: in go_deps
Repository rule go_repository defined at:
  /data/home/kena/.cache/bazel/_bazel_kena/9b9f53b7334495d6a871cbb472c466bb/external/bazel_gazelle/internal/go_repository.bzl:194:32: in <toplevel>
ERROR: An error occurred during the fetch of repository 'com_github_cockroachdb_errors':
   Traceback (most recent call last):
        File "/data/home/kena/.cache/bazel/_bazel_kena/9b9f53b7334495d6a871cbb472c466bb/external/bazel_gazelle/internal/go_repository.bzl", line 192, column 10, in _go_repository_impl
                patch(ctx)
        File "/data/home/kena/.cache/bazel/_bazel_kena/9b9f53b7334495d6a871cbb472c466bb/external/bazel_gazelle/internal/go_repository.bzl", line 290, column 17, in patch
                fail("Error applying patch %s:\n%s%s" %
Error in fail: Error applying patch @cockroach//build/patches:com_github_cockroachdb_errors.patch:
Hmm...  Looks like a unified diff to me...
The text leading up to this was:
--------------------------
|diff -urN a/errorspb/BUILD.bazel b/errorspb/BUILD.bazel
|--- a/errorspb/BUILD.bazel     1969-12-31 19:00:00.000000000 -0500
|+++ b/errorspb/BUILD.bazel     2000-01-01 00:00:00.000000000 -0000
--------------------------
Patching file errorspb/BUILD.bazel using Plan A...
Hunk #1 failed at 1.
No such line 35 in input file, ignoring
Hunk #2 succeeded at 25 (offset -12 lines).
1 out of 2 hunks failed--saving rejects to errorspb/BUILD.bazel.rej
done
ERROR: Error fetching repository: Traceback (most recent call last):
        File "/data/home/kena/.cache/bazel/_bazel_kena/9b9f53b7334495d6a871cbb472c466bb/external/bazel_gazelle/internal/go_repository.bzl", line 192, column 10, in _go_repository_impl
                patch(ctx)
        File "/data/home/kena/.cache/bazel/_bazel_kena/9b9f53b7334495d6a871cbb472c466bb/external/bazel_gazelle/internal/go_repository.bzl", line 290, column 17, in patch
                fail("Error applying patch %s:\n%s%s" %
Error in fail: Error applying patch @cockroach//build/patches:com_github_cockroachdb_errors.patch:
Hmm...  Looks like a unified diff to me...
The text leading up to this was:
--------------------------
|diff -urN a/errorspb/BUILD.bazel b/errorspb/BUILD.bazel
|--- a/errorspb/BUILD.bazel     1969-12-31 19:00:00.000000000 -0500
|+++ b/errorspb/BUILD.bazel     2000-01-01 00:00:00.000000000 -0000
--------------------------
Patching file errorspb/BUILD.bazel using Plan A...
Hunk #1 failed at 1.
No such line 35 in input file, ignoring
Hunk #2 succeeded at 25 (offset -12 lines).
1 out of 2 hunks failed--saving rejects to errorspb/BUILD.bazel.rej
done
ERROR: /data/home/kena/src/go/src/github.com/cockroachdb/cockroach/pkg/util/log/BUILD.bazel:120:8: //pkg/util/log:log_test depends on @com_github_cockroachdb_errors//:errors in repository @com_github_cockroachdb_errors which failed to fetch. no such package '@com_github_cockroachdb_errors//': Error applying patch @cockroach//build/patches:com_github_cockroachdb_errors.patch:
Hmm...  Looks like a unified diff to me...
The text leading up to this was:
--------------------------
|diff -urN a/errorspb/BUILD.bazel b/errorspb/BUILD.bazel
|--- a/errorspb/BUILD.bazel     1969-12-31 19:00:00.000000000 -0500
|+++ b/errorspb/BUILD.bazel     2000-01-01 00:00:00.000000000 -0000
--------------------------
Patching file errorspb/BUILD.bazel using Plan A...
Hunk #1 failed at 1.
No such line 35 in input file, ignoring
Hunk #2 succeeded at 25 (offset -12 lines).
1 out of 2 hunks failed--saving rejects to errorspb/BUILD.bazel.rej
done
ERROR: Analysis of target '//pkg/util/log:log_test' failed; build aborted: Analysis failed
INFO: Elapsed time: 0.521s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (32 packages loaded, 122 targets configured)
FAILED: Build did NOT complete successfully (32 packages loaded, 122 targets configured)
    Fetching @com_github_cockroachdb_datadriven; fetching
    Fetching @com_github_kr_pretty; fetching
    Fetching @com_github_stretchr_testify; fetching
    Fetching @org_golang_x_net; fetching

Can you advise what to do?

@knz
Copy link
Contributor

knz commented Apr 13, 2021

(FYI I am using bazel 4.0.0)

@knz
Copy link
Contributor

knz commented Apr 13, 2021

I am able to work around the error above by commenting out the "patch" directives in DEPS.bzl.

Then I can reproduce the error (and I will be able to fix the broken test). Would still like to know what's wrong with the errorspb patch.

@knz
Copy link
Contributor

knz commented Apr 13, 2021

@rickystewart I have investigated the failing tests further.

There is something problematic in the way that bazel invokes the Go compiler or the test engine.

Specifically, the file names for source files stored by the compiler/linker into the executable is different: they all include the pkg/ prefix, whereas an executable built without bazel strips the pkg/ prefix from file names embedded inside the executable.

This is problematic because these file names are used in multiple places inside the source code -- not just in tests. The fact the tests fail is not an indication that the test is broken; it's a positive signal (as the test was intended) that the binary was not built properly.

I think we need to look at this first.

@knz
Copy link
Contributor

knz commented Apr 13, 2021

I could perhaps investigate further if you could teach me how to run bazel verbosely, with details about which flags are passed to the go commands and where these flags are defined.

@rickystewart
Copy link
Collaborator Author

The -s flag prints out all the commands that Bazel runs to produce a build/test. You might also want --sandbox_debug, which keeps some stuff in the sandbox around so you can cd in and manually play with things if that helps.

What configuration do we need to set to make sure the pkg/ prefix gets stripped out? Presumably there's something we're doing in the Makefile that wasn't translated to Bazel correctly, but I can't find it.

Let me open a bug for that diff thing.

@knz
Copy link
Contributor

knz commented Apr 28, 2021

I will use -s --sandbox_debug to scrutinize the flags we pass to the go compiler and see if there's a difference.

It's also possible that the go executable has logic internally that causes the file paths to be processed differently depending on whether you use go build to compile+link or only to link. Investigation will tell.

@knz
Copy link
Contributor

knz commented Apr 28, 2021

One step further in the analysis, using the flags you recommended:

  • I was able to confirm that the .a files generated for each sub-package properly include the pkg/ prefix in the object definitions therein.
  • the link script in bazel-out/freebsd-fastbuild/bin/pkg/cmd/cockroach-short/cockroach-short_/cockroach-short-0.params contain all the paths with a pkg/ prefix, so that's also fine.

However I see that the final link stage is not done using the go executable, but instead a program called builder, in bazel-out/host/bin/external/go_sdk/builder.

How is this program generated?

@rickystewart
Copy link
Collaborator Author

I believe that's https://github.com/bazelbuild/rules_go/tree/master/go/tools/builders (see builder.go in particular), but I'm not 100% sure. I can look more closely at it later.

@knz
Copy link
Contributor

knz commented Apr 28, 2021

Thank you I'll start there and investigate further

@knz
Copy link
Contributor

knz commented Apr 29, 2021

I found a much clearer symptom of the issue reported at top. Compare running the following command:

cockroach demo --empty -e " select crdb_internal.force_error('XX000', 'woo')"

Using either a binary produced by bazel build, or one produced by make buildshort.

With make buildshort, the stack trace looks like this:

DETAIL: stack trace:
github.com/cockroachdb/cockroach/pkg/sql/sem/builtins/builtins.go:4082: func223()
github.com/cockroachdb/cockroach/pkg/sql/sem/tree/eval.go:4049: Eval()
github.com/cockroachdb/cockroach/pkg/sql/values.go:156: startExec()
...

With bazel, the stack trace looks like this:

DETAIL: stack trace:
pkg/sql/sem/builtins/builtins.go:4082: func223()
pkg/sql/sem/tree/eval.go:4049: Eval()
pkg/sql/values.go:156: startExec()

The 2nd one is incorrect.

@knz
Copy link
Contributor

knz commented Apr 29, 2021

I will fork this into a separate issue.

@knz
Copy link
Contributor

knz commented Apr 29, 2021

Separate issue: #64379 - let's move the convo there. Once we solve that issue, I'll resume work on fixing the broken tests.

@knz
Copy link
Contributor

knz commented Apr 29, 2021

See this comment: #64379 (comment)

Until that separate issue is addressed, we may be able to alleviate the broken tests here by changing the logic to not rely on file paths as much.

rickystewart added a commit to rickystewart/cockroach that referenced this issue May 12, 2021
This works around an issue where Go source file names as stored in the
`cockroach` binary were truncated (e.g., with the leading
`github.com/cockroachdb/cockroach` prefix, or even the entire package
name prefix, removed). This breaks unit tests (cockroachdb#61913) and some other
internal stuff.

We solve this by staging all Go source files during the build in a
temporary directory named after the package. This incurs an additional
I/O cost, but for now while our codebase isn't able to deal with the
differing file names, we can deal with it.

Fixes cockroachdb#64379
See also cockroachdb#64383

Release note: None
rickystewart added a commit to rickystewart/cockroach that referenced this issue May 13, 2021
This works around an issue where Go source file names as stored in the
`cockroach` binary were truncated (e.g., with the leading
`github.com/cockroachdb/cockroach` prefix, or even the entire package
name prefix, removed). This breaks unit tests (cockroachdb#61913) and some other
internal stuff.

We solve this by staging all Go source files during the build in a
temporary directory named after the package. This incurs an additional
I/O cost, but for now while our codebase isn't able to deal with the
differing file names, we can deal with it.

Fixes cockroachdb#64379
See also cockroachdb#64383

Release note: None
craig bot pushed a commit that referenced this issue May 13, 2021
47413: kvserver/reports: handle joint-quorums r=andreimatei a=andreimatei

See individual commits.

65094: bazel: stage all go source files in a temp dir named after the package r=rail,knz a=rickystewart

This works around an issue where Go source file names as stored in the
`cockroach` binary were truncated (e.g., with the leading
`github.com/cockroachdb/cockroach` prefix, or even the entire package
name prefix, removed). This breaks unit tests (#61913) and some other
internal stuff.

We solve this by staging all Go source files during the build in a
temporary directory named after the package. This incurs an additional
I/O cost, but for now while our codebase isn't able to deal with the
differing file names, we can deal with it.

Fixes #64379
See also #64383

Release note: None

Co-authored-by: Andrei Matei <[email protected]>
Co-authored-by: Ricky Stewart <[email protected]>
@rickystewart
Copy link
Collaborator Author

whoops :)

@rickystewart rickystewart reopened this May 13, 2021
@rickystewart
Copy link
Collaborator Author

@knz Gentle ping on this -- these are the last tests we're waiting for, so once they're fixed we can move on to more exciting stuff :)

@knz
Copy link
Contributor

knz commented May 31, 2021

Hi Ricky, thanks for the reminder, and my apologies for the tardiness.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-build-system C-bug Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants