Provide something like `is_dnsname(s: str) -> bool` and `is_fqdn(s: str) -> bool` #1019

jpgoldberg · 2023-12-02T18:44:19Z

Motivation

I am trying to handle Mastodon addresses, which are of the form "[email protected]". See sinaatalay/rendercv#10 for my attempt.

In doing so, I was unable to find an RFC-compliant validator for domain names. It is possible that there is a Python package that does what I was seeking, but I failed to find it. I feel that dnspython would be the most natural place for it.

I will also add (though this might be a separate bug report) that dns.name() does not validate whether all of the bytes within a label are valid. For example, I would have expected an error from

print(dns.name.from_text("an.st*red.example"))

So what I am seeking would require additional label validation beyond what dns.name._validate_labels() does. I do not know whether the behavior of name.from_text() and _validate_labels() is intended in this respect, so I don't know whether making _validate_labels() stricter is the correct approach.

Describe the solution you'd like.

I would like a method (or two) in dns.name which tells me if I have a syntactically valid domain name. I would also like to have a related method which tells me if I have a syntactically valid fully qualified domain name (with an optional argument on whether the root "." is required).

rthalley · 2023-12-02T20:34:39Z

Dnspython already does this, as dns.name.from_text() only accepts valid domain names. I realize that it may be surprising that an.st*red.example is a valid domain name, but it is! I don't know what rules Mastodon imposes, but if the part to the right of the "@" in a Mastodon name is going to be used as a hostname, then additional rules apply as defined in RFC 952, modified by RFC 1123 and subsequent updates. Also, dnspython appends the origin supplied to dns.name.from_text(), or the default of dns.name.root to any names that do not explicitly end in ., so you will always get a FQDN (what dnspython calls an "absolute" name) unless you specify an origin of None. So basically there is nothing to be done in dnspython.

Here's an example program to show how you could check for hostname-ness, and also showing different ways of dealing with the relativity / absoluteness of text names:

import dns.name

LOWER_A = ord("a")
LOWER_Z = ord("z")
UPPER_A = ord("A")
UPPER_Z = ord("Z")
DIGIT_0 = ord("0")
DIGIT_9 = ord("9")
HYPHEN = ord("-")


def is_hostname(name: dns.name.Name) -> bool:
    for label in name.labels:
        if len(label) > 0:
            for c in label:
                if not (
                    (c >= LOWER_A and c <= LOWER_Z)
                    or (c >= UPPER_A and c <= UPPER_Z)
                    or (c >= DIGIT_0 and c <= DIGIT_9)
                    or c == HYPHEN
                ):
                    return False
            # Starting or ending with "-" is also forbidden.
            if label[0] == HYPHEN or label[-1] == HYPHEN:
                return False
    return True

# these are all valid domain names, but only the first two are valid hostnames too
for text in [
    "dnspython.org",
    "host-123.example.",
    "dnspython-99-bogus--.org",
    "-dnspython.org",
    "an.st*red.example",
]:
    n = dns.name.from_text(text)
    print(n, is_hostname(n))

print()
print("using the default origin, dns.name.root")
for text in [ "no-dot-at-end", "dot-at-end." ]:
    n = dns.name.from_text(text)
    print(text, n, "absolute =", n.is_absolute())

print()
print("using an origin of 'example.'")
origin = dns.name.from_text("example.")
for text in [ "no-dot-at-end", "dot-at-end." ]:
    n = dns.name.from_text(text, origin=origin)
    print(text, n, "absolute =", n.is_absolute())

print()
print("using an origin of None")
for text in [ "no-dot-at-end", "dot-at-end." ]:
    n = dns.name.from_text(text, origin=None)
    print(text, n, "absolute =", n.is_absolute())

rthalley · 2023-12-02T20:42:45Z

P.S. my is_hostname() isn't quite right as it allows the root name to be a hostname, which isn't right :). It should probably check that the name is absolute and that the number of labels is > 1.
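
A minimal sketch of the check with that correction applied. It assumes the input is the tuple of byte-string labels that dnspython exposes as `Name.labels`, where an absolute name ends with the empty root label. The function name `is_hostname_labels` is hypothetical, not part of dnspython.

```python
import string

# Bytes permitted in a hostname label: ASCII letters, digits, and hyphen.
_ALLOWED = frozenset((string.ascii_letters + string.digits + "-").encode("ascii"))
_HYPHEN = ord("-")


def is_hostname_labels(labels: tuple[bytes, ...]) -> bool:
    """Hypothetical helper: apply hostname rules to a tuple of byte labels."""
    # An absolute name ends with the empty root label. Requiring at least
    # two labels rejects the bare root name.
    if len(labels) < 2 or labels[-1] != b"":
        return False
    for label in labels[:-1]:
        if not label:
            # An empty label is only valid in the final (root) position.
            return False
        if any(c not in _ALLOWED for c in label):
            return False
        # Starting or ending with "-" is also forbidden.
        if label[0] == _HYPHEN or label[-1] == _HYPHEN:
            return False
    return True
```

With dnspython this could be called as `is_hostname_labels(dns.name.from_text(text).labels)`. A relative name (parsed with an origin of None) is rejected because its last label is not the root.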

jpgoldberg · 2023-12-03T07:04:33Z

> I realize that it may be surprising that an.st*red.example is a valid domain name, but it is!

Thank you. I failed to distinguish between valid hostname and valid domain name. My intent was for hostname, and so this is not where I should have been looking. So it is fit and proper that labels in domain names be much more liberal than what is required for hostnames. So it's not this package that should be providing hostname validation.

So the relevant standards for hostnames are RFC 952 updated with the "3Com amendment" in RFC 1123 allowing leading digits. But of course a valid hostname has to meet both those and also be a valid domain name. (So the limits on label lengths apply.) I'm still not sure which standard tells me that the last non-root component of a hostname can't be all digits, but I see the fact referred to in RFC 1123 §2.1.

Well, this all could be worse. I could be trying to syntactically validate email addresses.
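
The combined rules described above could be sketched on plain strings roughly as follows. This is an illustration under stated assumptions, not a dnspython API: the name `looks_like_hostname` is hypothetical, the limits used are 63 octets per label and 253 characters for the text form without the trailing dot, and the all-digits test on the last label follows the remark in RFC 1123 §2.1 rather than a rule confirmed in this thread.

```python
import re

# One label: letters, digits, and hyphens, 1 to 63 characters, with no
# hyphen in the first or last position. Leading digits are allowed.
_LABEL = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,61}[A-Za-z0-9])?")


def looks_like_hostname(text: str) -> bool:
    """Hypothetical helper: syntactic hostname check on a text name."""
    # Accept one optional trailing dot denoting the root.
    if text.endswith("."):
        text = text[:-1]
    if not text or len(text) > 253:
        return False
    labels = text.split(".")
    if not all(_LABEL.fullmatch(label) for label in labels):
        return False
    # Assumption drawn from RFC 1123 section 2.1: the highest-level label
    # is not all digits, so dotted-decimal addresses are rejected.
    return not labels[-1].isdigit()
```

For example, `looks_like_hostname("3com.example")` is accepted because leading digits are allowed, while `looks_like_hostname("192.0.2.1")` is rejected by the last-label test.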

pspacek · 2023-12-04T13:20:58Z

Funnily enough, e-mail might be a (good) special case. In theory you could take the whole domain.name after @, stick it into a DNS query for domain.name MX, and see if something comes back. If it is anything other than . for the mail server name, you have actually validated that the domain has e-mail configured for it, and unlike regex or static validation it will catch typos.

pspacek · 2023-12-04T13:23:06Z

Maybe lemme add an example: You can compare google.com MX vs. googl.com MX vs. surelynonexistentdomain.example MX to see the difference. Only the first would qualify as a valid e-mail target. See also https://datatracker.ietf.org/doc/html/rfc7505
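
A rough sketch of the decision step in that heuristic. It assumes the MX records have already been fetched (for instance with dnspython's `dns.resolver.resolve(name, "MX")`) and reduced to `(preference, exchange)` pairs with the exchange in text form, and that an empty list stands for "nothing came back". The function name `accepts_mail` is hypothetical.

```python
def accepts_mail(mx_records: list[tuple[int, str]]) -> bool:
    """Hypothetical helper: True if any MX exchange is not the root name."""
    # RFC 7505 defines a "null MX" whose exchange is the root name ".",
    # which signals that the domain accepts no mail.
    return any(exchange != "." for _preference, exchange in mx_records)
```

Note that this follows the heuristic as described here and treats a missing MX answer as "no mail", whereas SMTP itself can fall back to address records when no MX exists.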

jpgoldberg · 2023-12-04T23:14:00Z

I wasn't talking about MX records, I was talking about the fact that RFCs {8,28,53}22 literally allow comments in the part of the address after an "@" among other very very strange things. For example

(first comment) jsmith@ (second comment)
 example.com (third comment)

is valid, even with the newline.

(first comment) jsmith@ (second comment)
example.com (third comment)

is not valid, because there needs to be white space after the newline.

The rules for the local part of the address are even more complicated.

pspacek · 2023-12-05T11:49:18Z

Oh, sorry! We were talking about completely different topics then.

jpgoldberg added the Enhancement Request label Dec 2, 2023

rthalley closed this as completed Dec 2, 2023
