[FormAnalyzer] scoring password hints #720

Open
wants to merge 8 commits into
base: main

dbajpeyi
Collaborator

@dbajpeyi dbajpeyi commented Dec 16, 2024

Reviewer: @GioSensation
Asana: https://app.asana.com/0/1203822806345703/1208965540697669/f

Description

  1. Use password hints to score the form when the other signals are weak.
  2. Move input attribute evaluation into evaluateForm instead of the constructor, so that all inputs get evaluated. The Hindustan Times form regressed because of this, but that form seems out of date; I think their new login form is different.

Steps to test

https://sleeper.com/create?type=league is the site that was originally broken.
Score before:
image

Score after:
Screenshot 2024-12-20 at 16 17 27

@dbajpeyi dbajpeyi force-pushed the dbajpeyi/feature/password-hints branch from 201c856 to 0640b6f Compare December 19, 2024 14:00
@dbajpeyi dbajpeyi changed the title wip: scoring password hints towards signup [FormAnalyzing] scoring password hints and sibling headers Dec 19, 2024
});
}

evaluatePasswordHints() {
Member

Looping through divs and spans is complex and potentially expensive. What if we instead looked at the whole form's textContent and ran the regex once per form? We'd need to increase TEXT_LENGTH_CUTOFF, or carve out an exception for this case, for example by making safeRegexTest accept a cutoff parameter and adding a second, higher cutoff constant to use here.
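A minimal sketch of that suggestion, assuming safeRegexTest is a small guard that refuses to run a regex on overly long strings. The default value of 100, the name FORM_TEXT_LENGTH_CUTOFF, and the sample regex are hypothetical and only serve the illustration; the project's actual helper may differ.

```javascript
// Hypothetical constants: names and values are assumptions for illustration.
const TEXT_LENGTH_CUTOFF = 100;
const FORM_TEXT_LENGTH_CUTOFF = 200;

/**
 * Run a regex only when the input is short enough to be cheap to scan.
 * The third parameter lets a caller opt into a higher cutoff.
 */
function safeRegexTest(regex, string, textLengthCutoff = TEXT_LENGTH_CUTOFF) {
    if (!regex || !string || string.length > textLengthCutoff) return false;
    return regex.test(string);
}

// Simplified stand-in for a password hints pattern (not the PR's regex).
const hintRegex = /at least\s+\d+\s+characters?/i;

// 171 characters: too long for the default cutoff, fine for the higher one.
const formText = 'Create your account. ' + 'Your password must contain at least 8 characters. '.repeat(3);

console.log(safeRegexTest(hintRegex, formText)); // false: over the default cutoff
console.log(safeRegexTest(hintRegex, formText, FORM_TEXT_LENGTH_CUTOFF)); // true
```

Keeping the default unchanged means existing callers behave as before, and only the form-level check pays for the longer scan.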

@dbajpeyi dbajpeyi changed the title [FormAnalyzing] scoring password hints and sibling headers [FormAnalyzer] scoring password hints and sibling headers Dec 19, 2024
@dbajpeyi dbajpeyi marked this pull request as ready for review December 20, 2024 15:05
@dbajpeyi dbajpeyi force-pushed the dbajpeyi/feature/password-hints branch from 5efa4cd to 71d51da Compare December 20, 2024 15:06
if (this.form.textContent) {
const hasPasswordHints = safeRegexTest(this.matching.getDDGMatcherRegex('passwordHintsRegex'), this.form.textContent, 200);
if (hasPasswordHints) {
this.increaseSignalBy(5, 'Password hints');
Collaborator Author

I have given a much stronger score to hints, as they seem to be a very clear indicator.
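For readers following along, here is a rough, DOM-free sketch of how the hint check in the excerpt above could sit inside the analyzer. FormAnalyzerSketch, simpleHintsRegex, and the stand-in safeRegexTest are hypothetical; only the increaseSignalBy(5, 'Password hints') call and the 200 cutoff come from the diff. It also assumes that a positive signal leans toward signup.

```javascript
// Simplified stand-in for the project's length-guarded regex helper (assumed shape).
const safeRegexTest = (regex, string, cutoff = 100) =>
    Boolean(regex && string && string.length <= cutoff && regex.test(string));

// Hypothetical, heavily simplified hints pattern; the PR's real regex is broader.
const simpleHintsRegex = /(?:at least|minimum)\s+\d+\s+characters?/i;

class FormAnalyzerSketch {
    constructor(form) {
        this.form = form;
        this.autofillSignal = 0; // assumption: positive leans signup, negative leans login
        this.signals = [];
    }

    increaseSignalBy(strength, signal) {
        this.autofillSignal += strength;
        this.signals.push(`${signal}: +${strength}`);
        return this;
    }

    // One regex run over the whole form text, as in the diff excerpt.
    evaluatePasswordHints() {
        const text = this.form.textContent;
        if (text && safeRegexTest(simpleHintsRegex, text, 200)) {
            this.increaseSignalBy(5, 'Password hints');
        }
    }
}

// Plain objects stand in for form elements so the sketch runs without a DOM.
const signupLike = new FormAnalyzerSketch({ textContent: 'Email Password Must be at least 8 characters Sign up' });
signupLike.evaluatePasswordHints();
console.log(signupLike.autofillSignal, signupLike.signals);

const loginLike = new FormAnalyzerSketch({ textContent: 'Email Password Forgot your password? Log in' });
loginLike.evaluatePasswordHints();
console.log(loginLike.autofillSignal);
```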

@@ -331,11 +345,16 @@ class FormAnalyzer {
}

// A form with many fields is unlikely to be a login form
- const relevantFields = this.form.querySelectorAll(this.matching.cssSelector('genericTextField'));
+ const relevantFields = this.form.querySelectorAll(this.matching.cssSelector('genericTextInputField'));
Collaborator Author

Just a rename!

@dbajpeyi dbajpeyi force-pushed the dbajpeyi/feature/password-hints branch from 71d51da to a110052 Compare December 20, 2024 15:07
if (relevantFields.length >= 4) {
this.increaseSignalBy(relevantFields.length * 1.5, 'many fields: it is probably not a login');
}

// If we can't decide at this point, try reading password hints
if (this.areLoginOrSignupSignalsWeak()) {
Collaborator Author

If we see more examples in triage, I might just remove this check. But for now it feels safer.
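To make the gating concrete, here is a small sketch of the idea: the hint bonus is applied only while the score is still close to zero. The body of areLoginOrSignupSignalsWeak is not shown in this excerpt, so the threshold of 10 and both function names below are hypothetical.

```javascript
// Hypothetical threshold: the real check is not shown in this excerpt.
const WEAK_SIGNAL_THRESHOLD = 10;

// A score near zero means neither login nor signup evidence dominates.
function areSignalsWeak(autofillSignal, threshold = WEAK_SIGNAL_THRESHOLD) {
    return Math.abs(autofillSignal) < threshold;
}

// Apply the +5 hint bonus only when the existing signals are inconclusive.
function scoreWithHintFallback(autofillSignal, hasPasswordHints) {
    if (areSignalsWeak(autofillSignal) && hasPasswordHints) {
        return autofillSignal + 5;
    }
    return autofillSignal;
}

console.log(scoreWithHintFallback(2, true)); // 7: weak signal, hints found
console.log(scoreWithHintFallback(-20, true)); // -20: already a confident login
console.log(scoreWithHintFallback(2, false)); // 2: weak signal, no hints
```

Dropping the check, as floated above, would amount to applying the bonus regardless of the current score.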

@@ -442,6 +442,9 @@ const matchingConfiguration = {
// French
'| avec ',
},
passwordHintsRegex: {
match: '\\b(?:password.*?(?:must|should|has to|needs to|can))?\\b.*?(?:(at least|minimum|no fewer than)\\s+\\d+\\s+(characters?|letters?|numbers?|special characters?)|(uppercase|lowercase|capital|digit|number|symbol|special character)|\\b(no spaces|cannot contain your email|cannot repeat characters|must be unique|case sensitive)\\b)',
Collaborator Author

@dbajpeyi dbajpeyi Dec 20, 2024

The regex is based on sleeper.com's text.
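As a quick way to see what the pattern accepts, the sketch below compiles it and runs it over a few sample strings. The sample strings and the case-insensitive `i` flag are assumptions (the flags used by getDDGMatcherRegex are not shown here). Every word boundary is written as `\\b`, since a lone `\b` inside a JavaScript string literal is a backspace character rather than a regex boundary.

```javascript
// Pattern source as it would appear in a JS string literal, split for readability.
const passwordHintsSource =
    '\\b(?:password.*?(?:must|should|has to|needs to|can))?\\b.*?(?:' +
    '(at least|minimum|no fewer than)\\s+\\d+\\s+(characters?|letters?|numbers?|special characters?)' +
    '|(uppercase|lowercase|capital|digit|number|symbol|special character)' +
    '|\\b(no spaces|cannot contain your email|cannot repeat characters|must be unique|case sensitive)\\b)';

// Assumption: compiled case-insensitively.
const passwordHintsRegex = new RegExp(passwordHintsSource, 'i');

// Hypothetical sample texts, not taken from any real site.
const samples = [
    'Password must be at least 8 characters',
    'Use 8 or more characters with an uppercase letter',
    'No spaces allowed',
    'Forgot your password?',
    'Email address',
    'Enter your phone number',
];

const results = samples.map((text) => [text, passwordHintsRegex.test(text)]);
for (const [text, matched] of results) {
    console.log(`${matched ? 'match   ' : 'no match'}  ${text}`);
}

// '\b' in a string literal is U+0008 (backspace), hence the doubled backslashes above.
console.log('\b'.charCodeAt(0)); // 8
```

The first three samples match and the next two do not. The last one also matches, because the single-keyword alternatives such as `number` match on their own, which is one reason to apply the hint score only when the other signals are weak.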

@dbajpeyi dbajpeyi force-pushed the dbajpeyi/feature/password-hints branch from a110052 to 7402c0d Compare December 20, 2024 15:11
@dbajpeyi dbajpeyi changed the title [FormAnalyzer] scoring password hints and sibling headers [FormAnalyzer] scoring password hints Dec 20, 2024
@dbajpeyi dbajpeyi force-pushed the dbajpeyi/feature/password-hints branch from 83b7d1c to d494250 Compare December 20, 2024 15:53
@dbajpeyi dbajpeyi force-pushed the dbajpeyi/feature/password-hints branch from d494250 to cb29187 Compare December 20, 2024 15:55