Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove dependency on CGo #2649

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open

Conversation

pojntfx
Copy link

@pojntfx pojntfx commented Dec 3, 2024

This removes the dependency on CGo caused by to the hard-coded import of the github.com/mattn/go-sqlite3 SQLite driver by replacing it with the CGo-free github.com/ncruces/go-sqlite3 alternative. To answer #2092 (comment), unlike other alternatives like https://pkg.go.dev/modernc.org/sqlite, this is the C version, it's just embedded as Wasm instead of linked against the native version with CGo.

The reason for replacing the hard-coded import is mostly that importing it here makes it impossible to use a CGo-free SQLite driver anywhere that imports github.com/containers/image since it's not possible to register two drivers with the same name at the same time:

panic: sql: Register called twice for driver sqlite3

goroutine 1 [running]:
database/sql.Register({0x1c69587, 0x7}, {0x2010700, 0xc000ad7e60})
        /usr/lib/golang/src/database/sql/sql.go:62 +0x150
github.com/ncruces/go-sqlite3/driver.init.0()
        /home/pojntfx/go/pkg/mod/github.com/ncruces/[email protected]/driver/driver.go:99 +0x58
exit status 2

Alternatively it might also make sense to do what ORMs like ent and most other projects that depend on a SQLite implementation do which is to make it the responsibility of the user to import the driver themselves, which avoids this problem entirely.

Copy link
Collaborator

@mtrmac mtrmac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

I think, if this should happen, we’d need this to be coordinated with Podman at least. Cc: @mheon .

I’m a bit skeptical — if nothing else, embedding a 1.34 MB binary not built from source is not really acceptable, and requiring/maintaining a wasm toolchain to build c/image and/or Podman is a notable complication.


The reason for replacing the hard-coded import is mostly that importing it here makes it impossible to use a CGo-free SQLite driver anywhere that imports github.com/containers/image since it's not possible to register two drivers with the same name at the same time:

That conflict can only really be solved by one of the two projects choosing to use a different name. If we decided to embed the github.com/ncruces/go-sqlite3 here, that would, instead, prevent importing github.com/mattn/go-sqlite3. That’s not much of an improvement.

Alternatively it might also make sense to do what ORMs like ent and most other projects that depend on a SQLite implementation do which is to make it the responsibility of the user to import the driver themselves, which avoids this problem entirely.

The thing is, users should not need to care: this is an internal cache that users should not manage individually, so if they call c/image/v5/copy.Image, they need to get some cache. We could provide an option for users to use some other BlobInfoCache, or even some other backend to the sqlite cache backend — but we would still need a default for users who don’t specify any, and therefore we would still depend on some specific implementation.


One option here might be to use Go build tags to choose between the alternative backends. We’d probably still want to default to the current one, but, shrug, not-encouraged build tags are not that costly.


(Marked as ”request changes” to prevent overzealous merging.)

@mheon
Copy link
Member

mheon commented Dec 3, 2024

It's understandable to not want a library like c/image to depend on CGo, but Miloslav is quite correct that this is deeply tied to Podman (we'd end up with two separate sqlite3 drivers imported, which sounds like a bunch of problems waiting to happen - plus a substantial size increase to our executable). I'm reluctant to swap Podman over because of the massive amount of support the mattn driver receives (with so many users, we're a lot less likely to find bugs). I'm also a big worried about relative performance - losing CGo seems like it could cost there. But I am not entirely opposed to a swap.

@pojntfx
Copy link
Author

pojntfx commented Dec 9, 2024

I’m a bit skeptical — if nothing else, embedding a 1.34 MB binary not built from source is not really acceptable, and requiring/maintaining a wasm toolchain to build c/image and/or Podman is a notable complication.

This is very understandable. wazero-based libraries are usually super portable for embedding from my experience, but if the goal is to also rebuild them (as I presume most downstreams/distros probably would want to), then that adds a lot of complexity.

The thing is, users should not need to care: this is an internal cache that users should not manage individually, so if they call c/image/v5/copy.Image, they need to get some cache. We could provide an option for users to use some other BlobInfoCache, or even some other backend to the sqlite cache backend — but we would still need a default for users who don’t specify any, and therefore we would still depend on some specific implementation.

That's a fair point - users having to import the SQL driver manually this way would also be a pretty implicit dependency.

I'm reluctant to swap Podman over because of the massive amount of support the mattn driver receives (with so many users, we're a lot less likely to find bugs).

That's very fair. The Wasm versions are at least not rewrites/ports the way the modernc version is for example, but they are still much less tested than the CGo version.

I'm also a big worried about relative performance - losing CGo seems like it could cost there.

It's actually a bit faster than the CGo versions in most tasks - I presume due to the complexities with CGo's context switching: https://github.com/cvilsmeier/go-sqlite-bench - that's not to say that there aren't any regressions in unexpected places ofc.

One option here might be to use Go build tags to choose between the alternative backends. We’d probably still want to default to the current one, but, shrug, not-encouraged build tags are not that costly.

I think this might be a good idea. Go devs seem to be used to requiring some manual configuration for static compilation of Go applications already (containers_image_openpgp comes to mind). Maybe it could be an option to use the current SQLite driver by default, but to switch to BoltDB if containers_image_boltdb is specified?

@mtrmac
Copy link
Collaborator

mtrmac commented Dec 9, 2024

I was going to suggest using a build tag to choose between the two sqlite implementations, but that doesn’t work ideally — in that case users of c/image would include both versions in their source code. The deactivated one would presumably not be included in the final binary, but, still, Open Source concerns about shipping the full source code (in the “preferred form for making modifications” sense) would apply.

An option to switching to the existing BoltDB backend is not ideal (the BoltDB backend, IIRC, handles some corruptions by triggering a Go panic, a rather inconvenient error reporting mechanism), but at least the licensing/source code aspect would be all right.

@pojntfx
Copy link
Author

pojntfx commented Dec 11, 2024

If neither BoltDB nor shipping the prebuilt Wasm binary are an option - maybe using the Go port of SQLite (with https://pkg.go.dev/modernc.org/sqlite), but hiding it behind the build tag would work? It's less tested and more restrictive in terms of architectures it can run on compared to the Wasm version, but that should at least fix the issue with including the prebuilt binary otherwise, plus it wouldn't have an effect on existing systems if it's not explicitly specified, much like how containers_image_openpgp uses the Go port of PGP only if the tag is specified.

@mtrmac
Copy link
Collaborator

mtrmac commented Dec 11, 2024

If all the sqlite database/sql backends are used the same way, it might be simplest to gate just the import of the backend behind a (lack of a) Go tag. Then the default build of c/image would continue to use the current backend; and non-CGo callers of c/image could build c/image with the opt-out Go tag, and import a backend of their choice themselves.

@ncruces
Copy link

ncruces commented Dec 14, 2024

Hi! I'm obviously biased so I won't try to sway you too much. I'd still like to say a couple of things.

I’m a bit skeptical — if nothing else, embedding a 1.34 MB binary not built from source is not really acceptable, and requiring/maintaining a wasm toolchain to build c/image and/or Podman is a notable complication.

I understand the skepticism, but if it makes you more comfortable, building the binary is fully reproducible.

My release process is that I build it locally, usually on an Intel mac, the binary is committed, I then tag a release, and manually run this script in a GitHub action across 3 OSes (macOS, Linux and Windows). I use git to check all 3 builds produce the same binaries, and generate an attestation that includes the SHA256 hash of all binaries for that tag.

If you run the process locally (which is not hard), you'll get the same hash, but the point of attestations is that you shouldn't need to.

You can also do a custom build of SQLite and use my driver with it (which you can't with modernc).

That conflict can only really be solved by one of the two projects choosing to use a different name.

For my driver this can be solved by changing this variable at build time.

One reason to force the conflict, is that if you by accident use 2 different versions of SQLite in the same process to access the same database you are very likely to corrupt data in the process.

It's actually a bit faster than the CGo versions in most tasks - I presume due to the complexities with CGo's context switching: https://github.com/cvilsmeier/go-sqlite-bench - that's not to say that there aren't any regressions in unexpected places ofc.

Everything that's CPU bound will be slower than mattn and maybe modernc. My driver will also use more memory. There's a startup cost to JIT compiling the Wasm, and the interpreter is slow if your platform doesn't support the compiler. My driver wins out on IO, concurrency (modernc has some serious issues there), and (my actual goal): developer experience.

All that said, build tags are a perfectly reasonable way of supporting multiple drivers!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants