Provide something like is_dnsname(s: str) -> bool and is_fqdn(s: str) -> bool #1019

Closed
jpgoldberg opened this issue Dec 2, 2023 · 7 comments

Comments

@jpgoldberg

Motivation

I am trying to handle Mastodon addresses, which are of the form "[email protected]". See sinaatalay/rendercv#10 for my attempt.

In doing so, I was unable to find an RFC-compliant validator for domain names. It is possible that there is a Python package that does what I was seeking, but I failed to find it. I feel that dnspython would be the most natural place for it.

I will also add (though this might be a separate bug report) that dns.name.from_text() does not validate whether all of the bytes within a label are valid. For example, I would have expected an error from

print(dns.name.from_text("an.st*red.example"))

So what I am seeking would require additional label validation beyond what dns.name._validate_labels() does. I do not know whether this behavior of name.from_text() and _validate_labels() is intended, so I don't know whether making _validate_labels() stricter is the correct approach.

Describe the solution you'd like.

I would like a method (or two) in dns.name that tells me whether I have a syntactically valid domain name. I would also like a related method that tells me whether I have a syntactically valid fully qualified domain name (with an optional argument on whether the root "." is required).

@rthalley
Owner

rthalley commented Dec 2, 2023

Dnspython already does this, as dns.name.from_text() only accepts valid domain names. I realize that it may be surprising that an.st*red.example is a valid domain name, but it is! I don't know what rules Mastodon imposes, but if the part to the right of the "@" in a Mastodon name is going to be used as a hostname, then additional rules apply as defined in RFC 952, modified by RFC 1123 and subsequent updates. Also, dnspython appends the origin supplied to dns.name.from_text(), or the default of dns.name.root, to any names that do not explicitly end in ".", so you will always get an FQDN (what dnspython calls an "absolute" name) unless you specify an origin of None. So basically there is nothing to be done in dnspython.

Here's an example program to show how you could check for hostname-ness, and also showing different ways of dealing with the relativity / absoluteness of text names:

import dns.name

LOWER_A = ord("a")
LOWER_Z = ord("z")
UPPER_A = ord("A")
UPPER_Z = ord("Z")
DIGIT_0 = ord("0")
DIGIT_9 = ord("9")
HYPHEN = ord("-")


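def is_hostname(name: dns.name.Name) -> bool:
    # True if every non-empty label contains only letters, digits, and "-"
    # and does not start or end with "-" (RFC 952 / RFC 1123 style).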
    for label in name.labels:
        if len(label) > 0:
            for c in label:
                if not (
                    (c >= LOWER_A and c <= LOWER_Z)
                    or (c >= UPPER_A and c <= UPPER_Z)
                    or (c >= DIGIT_0 and c <= DIGIT_9)
                    or c == HYPHEN
                ):
                    return False
            # Starting or ending with "-" is also forbidden.
            if label[0] == HYPHEN or label[-1] == HYPHEN:
                return False
    return True

# these are all valid domain names, but only the first two are valid hostnames too
for text in [
    "dnspython.org",
    "host-123.example.",
    "dnspython-99-bogus--.org",
    "-dnspython.org",
    "an.st*red.example",
]:
    n = dns.name.from_text(text)
    print(n, is_hostname(n))

print()
print("using the default origin, dns.name.root")
for text in [ "no-dot-at-end", "dot-at-end." ]:
    n = dns.name.from_text(text)
    print(text, n, "absolute =", n.is_absolute())

print()
print("using an origin of 'example.'")
origin = dns.name.from_text("example.")
for text in [ "no-dot-at-end", "dot-at-end." ]:
    n = dns.name.from_text(text, origin=origin)
    print(text, n, "absolute =", n.is_absolute())

print()
print("using an origin of None")
for text in [ "no-dot-at-end", "dot-at-end." ]:
    n = dns.name.from_text(text, origin=None)
    print(text, n, "absolute =", n.is_absolute())

@rthalley rthalley closed this as completed Dec 2, 2023
@rthalley
Owner

rthalley commented Dec 2, 2023

P.S. My is_hostname() isn't quite right: it accepts the root name as a hostname, which it shouldn't :). It should probably check that the name is absolute and that the number of labels is > 1.
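Applying the P.S. fixes, a corrected check might look like the minimal sketch below. is_hostname_labels() is a hypothetical helper, not part of dnspython. It takes a tuple of bytes labels in the same shape as dnspython's Name.labels (an absolute name ends with the empty root label b""), which keeps it testable without dnspython installed.

```python
from typing import Tuple

# Hypothetical helper: the allowed hostname characters, as a set of byte values
# (iterating over a bytes object yields ints).
HOSTNAME_CHARS = frozenset(
    b"abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-"
)
HYPHEN = ord("-")


def is_hostname_labels(labels: Tuple[bytes, ...]) -> bool:
    """Check a Name.labels-style tuple against the P.S. rules."""
    # Must be absolute: the final label is the empty root label b"".
    if not labels or labels[-1] != b"":
        return False
    host_labels = labels[:-1]
    # More than one label overall, so the root name alone is rejected.
    if not host_labels:
        return False
    for label in host_labels:
        if not label:  # empty interior labels are never valid
            return False
        if any(c not in HOSTNAME_CHARS for c in label):
            return False
        # Starting or ending with "-" is forbidden.
        if label[0] == HYPHEN or label[-1] == HYPHEN:
            return False
    return True
```

With dnspython installed it would be called as is_hostname_labels(dns.name.from_text(text).labels). A relative name (origin=None) and dns.name.root are both rejected; the sketch still does not reject an all-numeric last label.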

@jpgoldberg
Author

I realize that it may be surprising that an.st*red.example is a valid domain name, but it is!

Thank you. I failed to distinguish between a valid hostname and a valid domain name. What I wanted was hostname validation, so this is not where I should have been looking. It is fit and proper that labels in domain names are much more liberal than what hostnames require, so it's not this package that should be providing hostname validation.

So the relevant standards for hostnames are RFC 952, updated by the "3Com amendment" in RFC 1123 that allows leading digits. But of course a valid hostname has to meet both of those and also be a valid domain name (so the limits on label lengths apply). I'm still not sure which standard says that the last non-root component of a hostname can't be all digits, but I see the fact referred to in RFC 1123 §2.1.
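Putting the rules listed in this comment together, a plain-text hostname check might look like the following minimal sketch. It assumes LDH (letters, digits, hyphen) labels per RFC 952 with RFC 1123 §2.1's leading-digit relaxation, at most 63 characters per label, at most 253 characters for the dotted name without a trailing dot (derived from RFC 1035's 255-octet wire limit), and a last label that is not all digits. is_hostname_text() is a hypothetical helper, not a dnspython API, and it does not handle internationalized (IDNA) names.

```python
import re

# One LDH label: 1-63 chars, alphanumeric at both ends, hyphens allowed inside.
_LDH_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def is_hostname_text(text: str) -> bool:
    """Syntactic hostname check on a text name (sketch, assumptions above)."""
    if text.endswith("."):
        text = text[:-1]  # a single trailing dot just marks the name absolute
    if not text or len(text) > 253:
        return False
    labels = text.split(".")
    # Empty labels (e.g. "a..b") fail the regex because it needs >= 1 char.
    if not all(_LDH_LABEL.fullmatch(label) for label in labels):
        return False
    # Reject an all-numeric final label, so "192.168.1.1" is not a hostname.
    return not labels[-1].isdigit()
```

A regex is used for the label shape only because it states the "no leading or trailing hyphen" rule compactly; the length and all-digits rules are clearer as plain comparisons.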

Well, this could all be worse: I could be trying to syntactically validate email addresses.

@pspacek
Collaborator

pspacek commented Dec 4, 2023

Funnily enough, e-mail might be a (good) special case. In theory you could take the whole domain.name after the @, send a DNS query for domain.name MX, and see if something comes back. If the mail server name is anything other than ".", you have actually validated that the domain has e-mail configured for it, and unlike regex or static validation, this will catch typos.

@pspacek
Collaborator

pspacek commented Dec 4, 2023

Maybe lemme add an example: you can compare google.com MX vs. googl.com MX vs. surelynonexistentdomain.example MX to see the difference. Only the first would qualify as a valid e-mail target. See also https://datatracker.ietf.org/doc/html/rfc7505
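The MX heuristic from these two comments can be sketched with dnspython, assuming dnspython 2.0 or later (for dns.resolver.resolve()). accepts_mail() and lookup_mx() are hypothetical helper names. Following the comment, a domain counts as accepting mail if at least one MX exchange is something other than "." (an RFC 7505 null MX). The sketch deliberately ignores the RFC 5321 fallback to A/AAAA records when no MX exists, and the lookup itself needs network access.

```python
from typing import List, Tuple


def accepts_mail(mx_records: List[Tuple[int, str]]) -> bool:
    """True if at least one MX exchange is a real host rather than ".".

    An empty list (no MX data) or a lone null MX (RFC 7505: preference 0,
    exchange ".") both count as "does not accept mail" here.
    """
    return any(exchange != "." for _, exchange in mx_records)


def lookup_mx(domain: str) -> List[Tuple[int, str]]:
    """Return (preference, exchange) pairs; [] on NXDOMAIN or no MX records.

    Other resolver errors (timeouts, unreachable nameservers) propagate.
    """
    import dns.resolver  # deferred so accepts_mail() works without dnspython

    try:
        answer = dns.resolver.resolve(domain, "MX")
    except (dns.resolver.NXDOMAIN, dns.resolver.NoAnswer):
        return []
    # Name.to_text() renders the root name as "." and other names with a final dot.
    return [(rr.preference, rr.exchange.to_text()) for rr in answer]
```

With network access, accepts_mail(lookup_mx(name)) can be run for each of the three domains above to see the difference. Splitting the decision from the lookup keeps the null-MX logic testable offline.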

@jpgoldberg
Author

jpgoldberg commented Dec 4, 2023

I wasn't talking about MX records; I was talking about the fact that RFCs {8,28,53}22 literally allow comments in the part of the address after the "@", among other very, very strange things. For example,

(first comment) jsmith@ (second comment)
 example.com (third comment)

is valid, even with the newline.

(first comment) jsmith@ (second comment)
example.com (third comment)

is not valid, because there needs to be white space after the newline.

The rules for the local part of the address are even more complicated.
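The folding rule behind the two examples above can be checked in isolation. This minimal sketch covers only RFC 5322 folding white space (§3.2.2): a line break inside a header value is acceptable only when the next line begins with a space or tab. is_properly_folded() is a hypothetical helper; it does not parse comments, the local part, or anything else in the address.

```python
def is_properly_folded(value: str) -> bool:
    """True if every line break is followed by a space or tab (FWS only)."""
    # Accept both CRLF (as on the wire) and bare LF (as in the examples).
    lines = value.replace("\r\n", "\n").split("\n")
    # line[:1] is "" for an empty continuation line, which is rejected.
    return all(line[:1] in (" ", "\t") for line in lines[1:])
```

Applied to the two examples, the first (continuation line starting with a space) passes and the second (continuation line starting with "example.com") fails.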

@pspacek
Collaborator

pspacek commented Dec 5, 2023

Oh, sorry! We were talking completely different topics then.
