Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Glob reading with the file: schema fails on some paths #582

Open
thomaspurchas opened this issue Jul 17, 2024 · 5 comments
Open

Glob reading with the file: schema fails on some paths #582

thomaspurchas opened this issue Jul 17, 2024 · 5 comments

Comments

@thomaspurchas
Copy link
Contributor

The handling of files reads using read* is a bit broken at the moment. Behaviour changes depending on whether or not the file: schema is used, and behaviour in a pkl vs the REPL doesn't always seem consistent. Additionally there are a number of simple glob patterns that cause NullPointerExceptionss or similar incorrect behaviour.

Pkl REPL:

An unexpected error has occurred. Would you mind filing a bug report?
Cmd+Double-click the link below to open an issue.
Please copy and paste the entire error output into the issue's description, provided you can share it.

https://github.com/apple/pkl/issues/new

java.lang.NullPointerException

–– Pkl Error ––
None (cause has no message)

1 | read*("file:**/*.txt")
    ^^^^^^^^^^^^^^^^^^^^^^
at  (repl:pkl0)

Pkl 0.27.0-dev+7917ddb0c (macOS 15.0, native)

java.lang.NullPointerException
	at org.pkl.core.util.GlobResolver.splitGlobPatternIntoBaseAndWildcards(GlobResolver.java:452)
	at org.pkl.core.util.GlobResolver.resolveGlob(GlobResolver.java:483)

This error is is caused by the unsafe assumption that .getPath always returns a value. We should probably be using . getSchemeSpecificPart instead.

Evaling the following will work

a = read*("**/*.txt")

but not (addition of the file: schema)

a = read*("file:**/*.txt")

and running read*("**/*.txt") in the REPL results in

read*("**/*.txt")
–– Pkl Error ––
No resource reader is registered for scheme null.

1 | read*("**/*.txt")
    ^^^^^^^^^^^^^^^^^
at  (repl:pkl0)
@bioball
Copy link
Contributor

bioball commented Jul 17, 2024

The underlying issue here is that file: URIs are hierarchical; it's invalid for a file URI to not have a hierarchical path part (a path that starts with /); see RFC-8089.

Right now, read("file:foo.txt") will cause Pkl to read foo.txt relative to the process CWD, but file:foo.txt is not a valid file URI. This might be changed to a throw in the future. For similar reasons, read*("file:*") is also not supported, and would also be changed to a throw.

When globbing using a file scheme, the path part must be a hierarchical path. For example, read*("file:/path/to/*.txt") (or read("file:///path/to/*.txt" for a more canonical representation; file URIs typically have a host section.)

@thomaspurchas
Copy link
Contributor Author

thomaspurchas commented Jul 17, 2024

In that situation, how are we meant to use read() with a relative path, in particular, how can we use read() with a relative in the REPL where the schema is mandatory?

Also these nuances aren't described in the docs. The docs mention the file: scheme, and under the Globbed Read section provides examples of using read without the file: scheme. But I can't see anywhere that explicitly states relative reads shouldn't use the file: scheme.

Also for the read API, do we have to stick so closely to the RFC? Obviously we should correctly handle any RFC-8089 valid URI. But there's no reason we can't also accept technically invalid URI, but where the intent is still obvious and useful (e.g. using the file: scheme with a relative path)

@bioball
Copy link
Contributor

bioball commented Jul 17, 2024

I don't see a strong need for relative paths with file URIs. If this is declared from within a file-based module, it should use a relative path without the scheme part. If declared within a non file-based module (e.g. from a package), what should the path be relative to? It can't be relative to the enclosing module, and also shouldn't be relative to the CWD, because we want imports to be statically analyzable.

In particular, how can we use read() with a relative in the REPL where the schema is mandatory

I think this is a tangential but definitely a problem. Ideally, you should be able to use relative file imports from within the REPL.

@thomaspurchas
Copy link
Contributor Author

That sounds reasonable. I guess then my issue is that docs aren't very clear. It's not obvious that using read() with file: and without are different. It also creates a slightly odd situation where file reading has a "special" status, in that it supports relative reads is you don't use the file: scheme, but no other scheme has that capability. Although it doesn't really make sense for other schemes to support relative paths in any situation, it still creates a slightly inconsistent read() API, that's a little surprising.

@bioball
Copy link
Contributor

bioball commented Jul 17, 2024

Agree that we should improve our docs here. We have a section on relative paths, but it's easily missed and a common source of confusion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants