Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

hdf5: figure out how to resolve the Go pointer to Go pointer issue #12

Open
kortschak opened this issue Aug 12, 2017 · 12 comments
Open

Comments

@kortschak
Copy link
Member

kortschak commented Aug 12, 2017

The GODEBUG=cgocheck=0 incantation is untenable in the long run. So we need to sort out how to deal with types holding pointers.

It is not clear to me how to deal with this using the current approach to calling HDF5 functions where entire values are simply handed off to the library. But anything else involves significant reengineering/reimplementation of the HDF5 code in Go.

Note that the example in the tests that fails this runtime check is already incorrect, since the string field that causes the failure is not correctly round-tripped anyway.

@sbinet
Copy link
Member

sbinet commented Aug 14, 2017

for gopy, I opted for the following:

  • when passing a Go value to python, just send an identifier (an int32 or int64) idtype
  • the identifier is kept in a map[idtype]unsafe.Pointer registry so Go won't change the address of the pointer (a reverse map is also built)
  • when one needs to call a function on this handle, send that call to the Go side (idtype + what call to perform)

@kortschak
Copy link
Member Author

In gopy, you still required GODEBUG=cgocheck=0 from the README.

I do not see how this approach fixes the problem though. The runtime will call the cgo pointer check at some point, and any pointer hiding just pushes the problem into harder to debug areas. The issue that raised this is the WriteSubset function which by necessity has Go pointers inside a pointer value, unless we prohibit the serialisation of structs containing slices and strings (and any other types allowed by HDF5 that are reference-like types).

@ianlancetaylor, do you have any suggestions for how to deal with this kind of a problem. The relevant type is s1Type which is handled here where it's interpreted into an unsafe.Pointer at L145, but containing a string field with its data pointer.

@sbinet
Copy link
Member

sbinet commented Aug 14, 2017

yes, gopy master is still requiring GODEBUG=cgocheck=0.
I am working on this new scheme I devised above.

@ianlancetaylor
Copy link

@kortschak If Python does not examine the contents of the Go value, and doesn't do anything with it other than pass it back to Go, then it seems to me that the scheme that @sbinet outlines will work. You can see a roughly similar scheme at https://github.com/swig/swig/blob/master/Lib/go/goruntime.swg#L418 , used to pass Go values through C++ code.

@kortschak
Copy link
Member Author

The issue here is not with python, but with HDF5. The HDF5 routine needs to read the data pointed to by the Go pointer in pointer in order to be able to serialise it in the HDF5 file. The HDF5 call in the Go wrapper passes what is essentially an opaque pointer with another parameter explaining how to handle it. The HDF5 code does not call back into Go.

@ianlancetaylor
Copy link

OK, if I understand that correctly, then the only way to handle it is to copy the data into memory allocated by C.malloc. Sorry.

@kortschak
Copy link
Member Author

That's what I thought. This is going to be unpleasant.

@kortschak
Copy link
Member Author

@ianlancetaylor, to follow this up, I'd like to clarify the position on having a C pointer, backed by an allocation be C.malloc, which is populated with a shallow copy of the original Go data structure. This seems like it breaks the spirit of the check and would fit in the vein of the last paragraph of the Passing pointers section.

Concretely:

type T struct {
    data string
}

func foo() {
    g := &T{"value"}
    c := C.malloc(unsafe.Sizeof(T{})
    defer C.free(c)
    *c  = *g
    C.hdf5_function(c)
}

If this is OK, it would solve partially the problem, unless there is nesting, at which point we really just need to recursively copy the data to C-allocated memory (which I think is what we will have to do in the long run).

@ianlancetaylor
Copy link

That is not OK, because it is never OK to store a Go pointer into C memory (where the definition of a Go pointer is a pointer to memory allocated by the Go memory allocator, and the definition of C memory is memory allocated by the C memory allocator).

It may or may not help to read https://github.com/golang/proposal/blob/master/design/12416-cgo-pointers.md .

@kortschak
Copy link
Member Author

Thanks. I figured that, but wanted to check. I guess that leaves us with the task of writing a generic deep copy function from Go to C memory.

I have read that proposal in the past, but probably should re-read it.

@kortschak
Copy link
Member Author

@sbinet @TacoVox While I was looking at the finalizer problem I was also consider what we will do here.

We need a deep copy routine for moving Go<->C, but we don't always need to use it; in the case of structs of arrays or structs of simple types there is no need for the copy. I added some code to the Datatype construction function that notes whether the value contains a pointer, leaving a mark at the root type that can then be used later. This means that when a write is performed with one of the non-pointerish types not copy will be done, but would be if it is pointerish.

The read operation is more complicated since we don't start with a known Go data structure; the first operation is an OpenDataset which gives no access to the structure of the data type. That can however be determined on the first read by the Dataset which could be given a similar field reflecting whether there are pointers so caching that work and not copying on subsequence reads.

Does this sound reasonable?

The copy should be reasonably straightforward since we are not handling map types. It's just a walk and retaining a map of visited unsafe.Pointer to ensure we don't get caught in cycles. We cannot walk unsafe.Pointers and we should panic on that.

@donkahlero
Copy link
Collaborator

SGTM 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants