DuckDBClient.of does not create tables in database when Arrow tables are passed in parameter #623

Open
keller-mark opened this issue Mar 15, 2024 · 4 comments

@keller-mark

Is your feature request related to a problem? Please describe.

Is DuckDBClient.of() intended to work with arrow Table objects?

The following code snippet returns an empty list of tables.

(await DuckDBClient.of({
    arrowTable: arrow.tableFromArrays({
      col1: [1, 2, 3],
      col2: ["one", "two", "three"],
    }),
  })).describeTables()

All examples I can find use FileAttachments. The stdlib source code seems to indicate that this is possible, but I cannot find examples or tests to reference.

Describe the solution you'd like

  • Documentation about using DuckDBClient.of with Arrow table objects directly (rather than FileAttachments), specifically whether a CREATE TABLE step is required vs. implicit based on the Arrow table schema.
  • Perhaps throw an error/warning if the above pattern is not supported

Describe alternatives you've considered

Additional context

Minimal reproducer: https://observablehq.com/d/e21c08e832074f40

@mootari
Member

mootari commented Mar 15, 2024

The problem is that DuckDB loads its own instance of Arrow and then does instanceof checks against those symbols. If you load Arrow from a different URL it becomes a separate module and those checks will fail.

The workaround (for now) is to load the exact same module:

arrow = import('https://cdn.observableusercontent.com/npm/[email protected]/+esm')
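The failure mode described above can be reproduced without Arrow or DuckDB at all. The following is a minimal sketch, assuming that loading a library from two different URLs behaves like evaluating its module body twice; `loadLibrary` and its `Table` class are hypothetical stand-ins, not Arrow's real classes.

```javascript
// Hypothetical stand-in for "importing the library": every call evaluates
// the class definition again, just as a second URL yields a second module.
function loadLibrary() {
  class Table {
    constructor(columns) {
      this.columns = columns;
    }
  }
  return { Table };
}

const arrowA = loadLibrary(); // stands in for the copy the consumer loaded
const arrowB = loadLibrary(); // stands in for the copy the notebook loaded

// The table is built with the notebook's copy of the library.
const table = new arrowB.Table({ col1: [1, 2, 3] });

const sameModule = table instanceof arrowB.Table; // true
const otherModule = table instanceof arrowA.Table; // false: different class object
console.log(sameModule, otherModule);
```

The two `Table` classes are structurally identical, but `instanceof` compares prototype identity, so an object created by one copy is never an instance of the other copy's class.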

@mbostock
Member

For what it’s worth, this isn’t an issue with Observable Framework because we expressly override dependency resolution to ensure a consistent version of Apache Arrow. It would be better if DuckDB used duck testing instead of instanceof, though. (And c’mon, you’d think DuckDB would know to use “duck” testing… 🦆)

https://github.com/observablehq/framework/blob/84d3e5c3a4809d0062dabc815c94402eaef9c838/src/npm.ts#L161-L166

Though, there is a separate issue with Observable Framework which is that db.describeTables is currently broken because we’ve switched to returning Arrow tables from queries for performance. But I have a fix for that latter issue up at observablehq/framework#1068.
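As a rough illustration of the "duck testing" idea mentioned above, a structural check looks at the shape of a value instead of its class identity. This is only a sketch: `looksLikeTable` is a hypothetical helper, and the properties it inspects (`schema.fields` and `batches`) are an assumption about what an Arrow table exposes, not the check that Arrow or DuckDB actually performs.

```javascript
// Hypothetical structural ("duck") check: accept any object that exposes
// a schema with a fields array and an array of batches, regardless of
// which copy of the library created it.
function looksLikeTable(value) {
  return (
    value !== null &&
    typeof value === "object" &&
    typeof value.schema === "object" &&
    value.schema !== null &&
    Array.isArray(value.schema.fields) &&
    Array.isArray(value.batches)
  );
}

// A plain object with the assumed shape passes; unrelated values do not.
const tableLike = { schema: { fields: [{ name: "col1" }] }, batches: [] };
const passesShapeCheck = looksLikeTable(tableLike);
const rejectsBytes = !looksLikeTable(new Uint8Array(4));
const rejectsNull = !looksLikeTable(null);
console.log(passesShapeCheck, rejectsBytes, rejectsNull);
```

The trade-off is that a structural check can also accept objects that merely resemble a table, so it exchanges strictness for tolerance of duplicate module instances.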

@keller-mark
Author

Thanks for the info and the workaround! It seems the instanceof checks happen within the Arrow source code (possibly one of these lines), not in the DuckDB source code (insertArrowTable source). So another workaround is to run arrow.tableToIPC, followed by conn.insertArrowFromIPCStream, using the same Arrow library instance that was used to run arrow.tableFromArrays (example in notebook). Since the resulting buffer is a Uint8Array, there are no instanceof issues (though, by the same token, it could be any Uint8Array).
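The IPC workaround described above might be wrapped up roughly as follows. This is a sketch under assumptions: `insertViaIPC` is a hypothetical helper, `arrow` is expected to be the same Arrow library instance that built the table, and `conn` is expected to be an open DuckDB-Wasm connection. Whether further options are needed may depend on the library versions involved.

```javascript
// Hypothetical helper: serialize the table with the caller's own Arrow
// instance, then hand DuckDB plain bytes instead of an Arrow object.
async function insertViaIPC(arrow, conn, table, name) {
  // tableToIPC returns a Uint8Array, so no class identity crosses the
  // boundary between the two copies of the Arrow library.
  const buffer = arrow.tableToIPC(table, "stream");
  await conn.insertArrowFromIPCStream(buffer, { name });
  return buffer.byteLength;
}
```

In a notebook this could be called as `await insertViaIPC(arrow, conn, arrow.tableFromArrays({col1: [1, 2, 3]}), "arrowTable")`, where the table name is just an example.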

@mootari
Member

mootari commented May 19, 2024

Related: duckdb/duckdb-wasm#1708 (comment)
