-
-
Notifications
You must be signed in to change notification settings - Fork 143
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adds Mastodon social network #10
Conversation
but there are tests
Thank you! I will review it thoroughly this weekend. |
Whilst I cannot review this, @jpgoldberg I think what you'll actually want to do is make a webfinger request:
this'll either succeed or fail, and gives the URLs to any such profile. e.g., mine:
|
Oh, that is a very cool thing. I did not know about it. (As you see, I really know very little about the protocols associated with Mastodon.) But I do not want to take that approach for two (related) reasons
I really am a stickler for checking syntactic validity before any use. And, as you see, I do not pass up the opportunity to rant/lecture about it. Checking strict syntactic validity before any use is a really good habit to get into, and it prevents security issues that might otherwise arise years later. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, this looks great. I would like to merge it. Before the merge, it would be great if we could change a few small things I mentioned in the comments. Let me know if there is something you disagree with. I might have missed something.
@staticmethod | ||
def is_valid_hostname(hostname: str) -> bool: | ||
"""Is hostname a valid hostname by RFCs 952 and 1123""" | ||
|
||
# slightly modified from | ||
# https://stackoverflow.com/a/33214423/1304076 | ||
if hostname[-1] == ".": | ||
# strip exactly one dot from the right, if present | ||
hostname = hostname[:-1] | ||
if len(hostname) > 253: | ||
return False | ||
|
||
labels = hostname.split(".") | ||
|
||
# the last label must be not all-numeric | ||
if re.match(r"[0-9]+$", labels[-1]): | ||
return False | ||
|
||
# labels cannot begin with a hyphen | ||
# labels must have at least character | ||
# labels may not have more than 63 characters | ||
allowed = re.compile(r"(?!-)[a-z0-9-]{1,63}(?<!-)$", re.IGNORECASE) | ||
return all(allowed.match(label) for label in labels) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we need to verify Mastodon usernames, we should also verify other networks. Is it necessary to validate usernames? Moreover, if we want to do a validation, we should use the @field_validator decorator of Pydantic. This will help exception handling since we will know it's a validation error.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are two separate suggestions. The reason that Mastodon needs special validation as opposed to, say, LinkedIn is that we know that know that www.linkedin.com
ia syntactically valid hostname. We aren't going to create a malicious PDF by using it.
There may be value in syntactically validating other usernames. But I do think it would be a mistake to actually perform any network calls while running RenderCV. So the validation is only syntactic.
I very much suspect you are correct about @field_validator
, but the be perfectly honest, I don't fully grok what Pydantic is doing. However, I am happy to learn. So I will work on this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, (if you haven’t started working on it), we can keep it like this for now because the @field_validator check should be done on the SocialNetwork
class since the Connection
class is not used by the users directly. But if we move it to SocialNetwork
, we will need to do something about other social networks, too. Therefore, we can address this validation problem later. Whatever you prefer.
@staticmethod | ||
def MastodonUname2Url(address: str) -> Optional[HttpUrl]: | ||
"""returns profile url from a mastodon user address. | ||
|
||
Args: | ||
address (str): A Mastodon user address. E.g., "[email protected]" | ||
|
||
Returns: | ||
A pydantic HttpUrl object with the https URL for the user profile | ||
|
||
Example: | ||
``` | ||
url = MastodonUname2Url("[email protected]") | ||
assert(url == HttpUrl(http://social.example/@user)) | ||
``` | ||
|
||
Exceptions: | ||
ValueError if the address is malformed. | ||
Note that well-formed addresses should never yield | ||
syntactically invalid URLs. | ||
""" | ||
|
||
# The closest thing to a formal spec of Mastodon usernames | ||
# where these regular expressions from a (reference?) | ||
# implementation | ||
# | ||
# https://github.com/mastodon/mastodon/blob/f1657e6d6275384c199956e8872115fdcec600b0/app/models/account.rb#L68 | ||
# | ||
# USERNAME_RE = /[a-z0-9_]+([a-z0-9_.-]+[a-z0-9_]+)?/i | ||
# MENTION_RE = %r{(?<![=/[:word:]])@((#{USERNAME_RE})(?:@[[:word:].-]+[[:word:]]+)?)}i | ||
# | ||
# `[[:word:]]` in Ruby includes lots of things that could never be in a # domain name. As my intent here is to construct an HTTPS URL, | ||
# What we need are valid hostnames, | ||
# and so need to satisfy the constraints of RFC 952 and and 1123. | ||
|
||
pattern = re.compile( | ||
r""" | ||
^\s* # ignore leading spaces | ||
@? # Optional @ prefix | ||
(?P<uname>[a-z0-9_]+([a-z0-9_.-]+[a-z0-9_]+)?) # username part | ||
@ # separator | ||
(?P<domain>[a-z0-9]+([a-z0-9.-]+)?) # domain part | ||
\s*$ # ignore trailing whitespace | ||
""", | ||
re.VERBOSE | re.IGNORECASE, | ||
) | ||
|
||
m = pattern.match(address) | ||
if m is None: | ||
raise ValueError("Invalid mastodon address") | ||
uname = m.group("uname") | ||
domain = m.group("domain") | ||
|
||
# the domain part of pattern allows some things that are not | ||
# valid names. So we run a stricter check | ||
if not Connection.is_valid_hostname(domain): | ||
raise ValueError("Invalid hostname in mastodon address") | ||
|
||
url = HttpUrl(f"https://{domain}/@{uname}") | ||
return url |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see this function being used anywhere other than the url
method. Maybe we can move the contents inside the url
method for clarity?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have different opinions about "clarity", but it is your project, so I will follow your suggestion.
def test_mastodon_parsing(self): | ||
mastodon_name = "[email protected]" | ||
expected = HttpUrl("https://example.exchange/@a_tooter") | ||
result = data_model.Connection.MastodonUname2Url(mastodon_name) | ||
with self.subTest("Without '@' prefix"): | ||
self.assertEqual(result, expected) | ||
|
||
mastodon_name = "@[email protected]" | ||
expected = HttpUrl("https://example.exchange/@a_tooter") | ||
result = data_model.Connection.MastodonUname2Url(mastodon_name) | ||
with self.subTest("With '@' prefix"): | ||
self.assertEqual(result, expected) | ||
|
||
mastodon_name = "@too@many@symbols" | ||
with self.subTest("Too many '@' symbols"): | ||
with self.assertRaises(ValueError): | ||
data_model.Connection.MastodonUname2Url(mastodon_name) | ||
|
||
mastodon_name = "@not_enough_at_symbols" | ||
with self.subTest("Missing '@' separator"): | ||
with self.assertRaises(ValueError): | ||
data_model.Connection.MastodonUname2Url(mastodon_name) | ||
|
||
mastodon_name = "user@bad_domain.example" | ||
with self.subTest("Underscore in domain portion"): | ||
with self.assertRaises(ValueError): | ||
data_model.Connection.MastodonUname2Url(mastodon_name) | ||
|
||
mastodon_name = "[email protected]" | ||
with self.subTest("All digit TLD"): | ||
with self.assertRaises(ValueError): | ||
data_model.Connection.MastodonUname2Url(mastodon_name) | ||
|
||
mastodon_name = "[email protected]." | ||
expected = HttpUrl("https://example.exchange./@a_tooter") | ||
result = data_model.Connection.MastodonUname2Url(mastodon_name) | ||
with self.subTest("With FQDN root '.'"): | ||
self.assertEqual(result, expected) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes sense, and I will consider creating a new test for each social network. However, instead of using the MastodonUname2Url
method directly, we can create a Connection object and check its url
field (since we plan to move MastodonUname2Url
to inside the url
method.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense. Note that we can use Pydantic's HttpUrl class for the urls created for the other social networks.
Indeed, that offers use a simpler way of validating Mastodon domain names, as Pydantic does (mostly) check that the host part of a URL is syntactically valid.
Hello, if you're not working on this PR, I would like to merge it as it is. |
The purpose of this PR is to add Mastodon as a social network, resolving issue #11.
Unlike the other social networks supported, there is no fixed hostname for the URL. Instead the hostname for the URL must be constructed from the full Mastodon address.
That is, a mastodon address may look like "
[email protected]
" which should then have the associated url be "https://social.example/@a_tooter
"Why is this PR more complicated than one might expect?
This really seems like it would be a simple fix.
Mastodon specs and DNS hostnames
Unfortunately there is no formal specification for these things. I am grateful @ThisIsMissEm for pointing me to the relevant lines of Mastodon reference implementation.
Unfortunately, what I glean from that would allow syntactically invalid hostnames. And so I included stricter validation of hostnames. Perhaps I failed to understand the Pydantic and dnspython documentation packages, but I found no obvious way to validate hostnames properly. So I resorted to a stack exchange answer.
I wasn't sure where to put my Mastodon address handling methods, so they are static methods in data_model.Connections.
Getting things to run
I need to tinker with
pyproject.toml
to get even the original to build on my system. I do not believe that I've done any harm, but testers should make sure that I haven't messed up build or packaging process.I also added tests, and to make it easier for me to run tests tinkered with
tests/test_data_model.py
. Again, I don't think I did anything that should interfere with other testing.