Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add input cleaner #51

Merged
merged 1 commit into from
Feb 19, 2019
Merged

Add input cleaner #51

merged 1 commit into from
Feb 19, 2019

Conversation

djdv
Copy link
Contributor

@djdv djdv commented Feb 13, 2018

So that input such as http://localhost:8080/ipfs/QmcbLLZJa9oXnYTZWfS4T16n8B2mHFZkAzkKBNdGVzgG8g, https://ipfs.io/ipfs/QmcbLLZJa9oXnYTZWfS4T16n8B2mHFZkAzkKBNdGVzgG8g, etc. may be passed in and turned into proper references (/ipfs/QmcbLLZJa9oXnYTZWfS4T16n8B2mHFZkAzkKBNdGVzgG8g)
If nothing is found the input string is returned so that it may pass or fail by path.ParsePath later.

Blocked by: #50
submitting now so I don't forget.

@djdv
Copy link
Contributor Author

djdv commented Jan 31, 2019

Specifically I originally wanted this to do this:
djdv/go-ipfs@0d250cc

@djdv
Copy link
Contributor Author

djdv commented Jan 31, 2019

Pinging @Stebalien for opinions on this.
I feel like it's more likely that people are going to have url's on their clipboard and in their scripts than bare canonical paths. They should probably be supported as valid input.

@Stebalien
Copy link
Member

Stebalien commented Jan 31, 2019

I'm really wary of stuff like this. What if we were to match URLs only? That is, check:

  1. Does it start with /.
  2. Does it start with a CID.
  3. Does it start with https?://foo.com/(ipfs|ipns)/...?

@djdv djdv force-pushed the feat/url-clean branch 3 times, most recently from 2510ef3 to cba2051 Compare February 4, 2019 04:56
@djdv
Copy link
Contributor Author

djdv commented Feb 4, 2019

I changed this so that we first try to use go-path's parser which should handle 1 & 2, and return early.
If that fails, we check if the path looks like a URL and pass the found string to go-path again.

Regex changed ^.*(/ip(f|n)s/.+)$ (group anything in the string that looks like an ipfs/ipns path)
->
^https?.*(/(ipfs|ipns|ipld)/.+)$ (specifically try to clamp to HTTP gateways, added IPLD)

Added a simple test as well.
(Ignore the force push spam.)

@djdv djdv force-pushed the feat/url-clean branch 4 times, most recently from 877795e to eb0b41c Compare February 4, 2019 05:28
@djdv
Copy link
Contributor Author

djdv commented Feb 4, 2019

The CI was not happy about the lack of execute permission on the test script file. This should be good for real now (assuming it passes).

Edit: I lied I left an additional test in there by mistake

@djdv
Copy link
Contributor Author

djdv commented Feb 4, 2019

This may be a better variant of the regex string.
^https?://.*(/(?:ipfs|ipns|ipld)/.{46}(?:/.+)?$)
match HTTP, check for {/namespace/}, {fixed length for cid} {optional subpath}

If that's too specific
^https?.*(/(ipfs|ipns|ipld)/.+)$ should still probably be changed to
^https?://.*(/(?:ipfs|ipns|ipld)/.+)$ (add ://, and don't capture the namespace, we have no need for that)
Edit: pushed the latter, waiting for judgment on the former. I'm not certain about the length of CID's being fixed.
Double edit: I forgot it doesn't matter since we're passing it to go-path anyway which will error out if it is a false positive.
Too much caffeine.

main.go Outdated
return ipfsPath, nil
}
// allow HTTP gateway URLs as input as well
matches := regexp.MustCompile(`^https?://.*(/(?:ipfs|ipns|ipld)/.+)$`).FindStringSubmatch(path)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

* is greedy. This'll parse http://bla.com/ipfs/QmA/ipfs/foobar as /ipfs/foobar. What if we just parsed the input as a URL and took the path? IMO, it's reasonable to assume that gateways are run on their own origin (for security reasons).

@djdv djdv force-pushed the feat/url-clean branch 2 times, most recently from 28576b3 to 0af4734 Compare February 13, 2019 02:42
@djdv
Copy link
Contributor Author

djdv commented Feb 13, 2019

What if we just parsed the input as a URL and took the path?

Much better!

I also added the handling for these ipfs/ipfs-companion#533
ipfs://QmcbLLZJa9oXnYTZWfS4T16n8B2mHFZkAzkKBNdGVzgG8g
and it seems fine.
But I'm not sure if this is wanted or not. (revertible)

Edit:
Forced over this: 0af4734#diff-b9cfc7f2cdf78a7f4b91a753d10865a2

Copy link
Member

@Stebalien Stebalien left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two nits but otherwise LGTM.

main.go Outdated

switch proto := u.Scheme; proto {
case "ipfs", "ipld", "ipns":
return ipath.ParsePath(gopath.Join(fmt.Sprintf("/%s/", proto), u.Host, u.Path))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: We can just write gopath.Join("/", proto, u.Host, u.Path).

main.go Outdated
switch proto := u.Scheme; proto {
case "ipfs", "ipld", "ipns":
return ipath.ParsePath(gopath.Join(fmt.Sprintf("/%s/", proto), u.Host, u.Path))
default:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd be conservative for now.

case "https", "http":

Accept HTTP gateway URLs and IPFS URIs as arguments
@Stebalien Stebalien merged commit 8f760b0 into ipfs:master Feb 19, 2019
@djdv djdv deleted the feat/url-clean branch February 20, 2019 14:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants