Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adds Mastodon social network #10

Merged
merged 84 commits into from
Jan 26, 2024
Merged

Conversation

jpgoldberg
Copy link
Contributor

@jpgoldberg jpgoldberg commented Nov 27, 2023

The purpose of this PR is to add Mastodon as a social network, resolving issue #11.

Unlike the other social networks supported, there is no fixed hostname for the URL. Instead the hostname for the URL must be constructed from the full Mastodon address.

That is, a mastodon address may look like "[email protected]" which should then have the associated url be "https://social.example/@a_tooter"

Why is this PR more complicated than one might expect?

This really seems like it would be a simple fix.

Mastodon specs and DNS hostnames

Unfortunately there is no formal specification for these things. I am grateful @ThisIsMissEm for pointing me to the relevant lines of Mastodon reference implementation.

Unfortunately, what I glean from that would allow syntactically invalid hostnames. And so I included stricter validation of hostnames. Perhaps I failed to understand the Pydantic and dnspython documentation packages, but I found no obvious way to validate hostnames properly. So I resorted to a stack exchange answer.

I wasn't sure where to put my Mastodon address handling methods, so they are static methods in data_model.Connections.

Getting things to run

I need to tinker with pyproject.toml to get even the original to build on my system. I do not believe that I've done any harm, but testers should make sure that I haven't messed up build or packaging process.

I also added tests, and to make it easier for me to run tests tinkered with tests/test_data_model.py. Again, I don't think I did anything that should interfere with other testing.

@sinaatalay
Copy link
Owner

Thank you! I will review it thoroughly this weekend.

@ThisIsMissEm
Copy link

Whilst I cannot review this, @jpgoldberg I think what you'll actually want to do is make a webfinger request:

https://host.example/.well-known/webfinger?resource=acct:[email protected]

this'll either succeed or fail, and gives the URLs to any such profile.

e.g., mine:

{
  "subject" : "acct:[email protected]",
  "aliases" : [
    "https://hachyderm.io/@thisismissem",
    "https://hachyderm.io/users/thisismissem"
  ],
  "links" : [
    {
      "rel" : "http://webfinger.net/rel/profile-page",
      "type" : "text/html",
      "href" : "https://hachyderm.io/@thisismissem"
    },
    {
      "rel" : "self",
      "type" : "application/activity+json",
      "href" : "https://hachyderm.io/users/thisismissem"
    },
    {
      "rel" : "http://ostatus.org/schema/1.0/subscribe",
      "template" : "https://hachyderm.io/authorize_interaction?uri={uri}"
    },
    {
      "rel" : "http://webfinger.net/rel/avatar",
      "type" : "image/jpeg",
      "href" : "https://media.hachyderm.io/accounts/avatars/109/296/581/037/186/439/original/b55ac3f3a27e171e.jpeg"
    }
  ]
}

@jpgoldberg
Copy link
Contributor Author

Whilst I cannot review this, @jpgoldberg I think what you'll actually want to do is make a webfinger request:

https://host.example/.well-known/webfinger?resource=acct:[email protected]

Oh, that is a very cool thing. I did not know about it. (As you see, I really know very little about the protocols associated with Mastodon.)

But I do not want to take that approach for two (related) reasons

  1. I do not want to make a query to https://host.example/ before knowing that "host.example is syntactically valid. Attempting to use malformed data to see if it breaks as a way to test for its validity is the road to many security bugs. Instead, we should always check that something is syntactically valid before attempting to use it. Making a network call with potentially malformed data is a use, even if the use is intended as a validity check.

  2. I would like to be able to construct the CV without a network connection. The CV should build off-line.

I really am a stickler for checking syntactic validity before any use. And, as you see, I do not pass up the opportunity to rant/lecture about it. Checking strict syntactic validity before any use is a really good habit to get into, and it prevents security issues that might otherwise arise years later.

Copy link
Owner

@sinaatalay sinaatalay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, this looks great. I would like to merge it. Before the merge, it would be great if we could change a few small things I mentioned in the comments. Let me know if there is something you disagree with. I might have missed something.

Comment on lines +1059 to +1082
@staticmethod
def is_valid_hostname(hostname: str) -> bool:
"""Is hostname a valid hostname by RFCs 952 and 1123"""

# slightly modified from
# https://stackoverflow.com/a/33214423/1304076
if hostname[-1] == ".":
# strip exactly one dot from the right, if present
hostname = hostname[:-1]
if len(hostname) > 253:
return False

labels = hostname.split(".")

# the last label must be not all-numeric
if re.match(r"[0-9]+$", labels[-1]):
return False

# labels cannot begin with a hyphen
# labels must have at least character
# labels may not have more than 63 characters
allowed = re.compile(r"(?!-)[a-z0-9-]{1,63}(?<!-)$", re.IGNORECASE)
return all(allowed.match(label) for label in labels)

Copy link
Owner

@sinaatalay sinaatalay Dec 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we need to verify Mastodon usernames, we should also verify other networks. Is it necessary to validate usernames? Moreover, if we want to do a validation, we should use the @field_validator decorator of Pydantic. This will help exception handling since we will know it's a validation error.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are two separate suggestions. The reason that Mastodon needs special validation as opposed to, say, LinkedIn is that we know that know that www.linkedin.com ia syntactically valid hostname. We aren't going to create a malicious PDF by using it.

There may be value in syntactically validating other usernames. But I do think it would be a mistake to actually perform any network calls while running RenderCV. So the validation is only syntactic.

I very much suspect you are correct about @field_validator, but the be perfectly honest, I don't fully grok what Pydantic is doing. However, I am happy to learn. So I will work on this.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, (if you haven’t started working on it), we can keep it like this for now because the @field_validator check should be done on the SocialNetwork class since the Connection class is not used by the users directly. But if we move it to SocialNetwork, we will need to do something about other social networks, too. Therefore, we can address this validation problem later. Whatever you prefer.

Comment on lines +1083 to +1142
@staticmethod
def MastodonUname2Url(address: str) -> Optional[HttpUrl]:
"""returns profile url from a mastodon user address.

Args:
address (str): A Mastodon user address. E.g., "[email protected]"

Returns:
A pydantic HttpUrl object with the https URL for the user profile

Example:
```
url = MastodonUname2Url("[email protected]")
assert(url == HttpUrl(http://social.example/@user))
```

Exceptions:
ValueError if the address is malformed.
Note that well-formed addresses should never yield
syntactically invalid URLs.
"""

# The closest thing to a formal spec of Mastodon usernames
# where these regular expressions from a (reference?)
# implementation
#
# https://github.com/mastodon/mastodon/blob/f1657e6d6275384c199956e8872115fdcec600b0/app/models/account.rb#L68
#
# USERNAME_RE = /[a-z0-9_]+([a-z0-9_.-]+[a-z0-9_]+)?/i
# MENTION_RE = %r{(?<![=/[:word:]])@((#{USERNAME_RE})(?:@[[:word:].-]+[[:word:]]+)?)}i
#
# `[[:word:]]` in Ruby includes lots of things that could never be in a # domain name. As my intent here is to construct an HTTPS URL,
# What we need are valid hostnames,
# and so need to satisfy the constraints of RFC 952 and and 1123.

pattern = re.compile(
r"""
^\s* # ignore leading spaces
@? # Optional @ prefix
(?P<uname>[a-z0-9_]+([a-z0-9_.-]+[a-z0-9_]+)?) # username part
@ # separator
(?P<domain>[a-z0-9]+([a-z0-9.-]+)?) # domain part
\s*$ # ignore trailing whitespace
""",
re.VERBOSE | re.IGNORECASE,
)

m = pattern.match(address)
if m is None:
raise ValueError("Invalid mastodon address")
uname = m.group("uname")
domain = m.group("domain")

# the domain part of pattern allows some things that are not
# valid names. So we run a stricter check
if not Connection.is_valid_hostname(domain):
raise ValueError("Invalid hostname in mastodon address")

url = HttpUrl(f"https://{domain}/@{uname}")
return url
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see this function being used anywhere other than the url method. Maybe we can move the contents inside the url method for clarity?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have different opinions about "clarity", but it is your project, so I will follow your suggestion.

Comment on lines +864 to +901
def test_mastodon_parsing(self):
mastodon_name = "[email protected]"
expected = HttpUrl("https://example.exchange/@a_tooter")
result = data_model.Connection.MastodonUname2Url(mastodon_name)
with self.subTest("Without '@' prefix"):
self.assertEqual(result, expected)

mastodon_name = "@[email protected]"
expected = HttpUrl("https://example.exchange/@a_tooter")
result = data_model.Connection.MastodonUname2Url(mastodon_name)
with self.subTest("With '@' prefix"):
self.assertEqual(result, expected)

mastodon_name = "@too@many@symbols"
with self.subTest("Too many '@' symbols"):
with self.assertRaises(ValueError):
data_model.Connection.MastodonUname2Url(mastodon_name)

mastodon_name = "@not_enough_at_symbols"
with self.subTest("Missing '@' separator"):
with self.assertRaises(ValueError):
data_model.Connection.MastodonUname2Url(mastodon_name)

mastodon_name = "user@bad_domain.example"
with self.subTest("Underscore in domain portion"):
with self.assertRaises(ValueError):
data_model.Connection.MastodonUname2Url(mastodon_name)

mastodon_name = "[email protected]"
with self.subTest("All digit TLD"):
with self.assertRaises(ValueError):
data_model.Connection.MastodonUname2Url(mastodon_name)

mastodon_name = "[email protected]."
expected = HttpUrl("https://example.exchange./@a_tooter")
result = data_model.Connection.MastodonUname2Url(mastodon_name)
with self.subTest("With FQDN root '.'"):
self.assertEqual(result, expected)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense, and I will consider creating a new test for each social network. However, instead of using the MastodonUname2Url method directly, we can create a Connection object and check its url field (since we plan to move MastodonUname2Url to inside the url method.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. Note that we can use Pydantic's HttpUrl class for the urls created for the other social networks.

Indeed, that offers use a simpler way of validating Mastodon domain names, as Pydantic does (mostly) check that the host part of a URL is syntactically valid.

@sinaatalay
Copy link
Owner

Hello, if you're not working on this PR, I would like to merge it as it is.

@sinaatalay sinaatalay merged commit 5456a8b into sinaatalay:main Jan 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants