
Establishing Transparency and Fairness Guidelines for Feature Visibility #277

Open
jcscottiii opened this issue May 14, 2024 · 11 comments
Labels
community Issues seeking input from the community on project direction, policy, or decision-making


@jcscottiii
Collaborator

Description

As webstatus.dev grows to encompass a wide range of browser feature
implementations, situations may arise where we need to temporarily or
permanently hide feature scores or features altogether. This could be due to a
variety of reasons, such as:

  • Data anomalies: Potential errors or inconsistencies in gathered data (for
    example, features that are implemented but have no Web Platform Test
    coverage, or coverage that implies a feature is not implemented when in
    reality the test suite needs better tests).
  • Ongoing development: Features in an early stage where scores aren't
    representative (such as Web Platform Test scores showing 100% for a feature
    that has not been fully implemented yet).

While these scenarios are understandable, it's crucial to handle them in a way
that maintains trust, transparency, and fairness amongst browser vendors while
also describing the actual state of the world to end-users. We want to invite
open discussion on how to best achieve this.

Desired Goals

  1. Equitable Process: Establish clear, unbiased criteria for when and how to
    hide information.
  2. Transparency: Document all decisions regarding feature visibility, along
    with the rationale.
  3. Accountability: Create a mechanism for the community to raise concerns and
    suggest changes.
  4. User Information: Provide clear explanations for end-users when information
    is hidden, including links back to the relevant discussion.

Possible Solutions (Not Exhaustive)

  • Test Suite Review Process: Review of test suite to ensure reasonable coverage
    and that failures are explainable.
  • Public Comment Period: Allow a timeframe for feedback before any information
    is hidden.
  • "Hidden Score" Label: Add a visual indicator to hidden items, with a link to
    the rationale.
  • GitHub Discussions: Utilize Discussions to host conversations around feature
    visibility concerns.

These are just starting points. We encourage everyone to share their ideas,
concerns, and suggestions to ensure we create a process that upholds the values
of this project.

Call to Action

Please feel free to comment on this issue with your thoughts. Your input is
invaluable in shaping the future of this project and ensuring it is a trusted
resource for everyone. Let's work together to build a truly transparent and
equitable process!

Please voice your concerns as well, while adhering to the project's
Code of Conduct.

This process can evolve over time as well, as we try things out.

@jcscottiii jcscottiii pinned this issue May 14, 2024
@jcscottiii jcscottiii added the community Issues seeking input from the community on project direction, policy, or decision-making label May 14, 2024
@foolip
Member

foolip commented May 17, 2024

Feedback from @meyerweb on Mastodon:
https://mastodon.social/@Meyerweb/112457440224134542

@foolip
Member

foolip commented May 17, 2024

On "data anomalies", I think we'll need clear criteria for hiding scores, a few different common rationales, and perhaps links out to issues tracking fixing it.

Common rationales are insufficient coverage and widespread failures for reasons unrelated to the implementation quality.

@meyerweb

meyerweb commented May 17, 2024

As a follow-up to @foolip’s link to my toot (thanks, @foolip!): I think as long as rationales for absent scores are clear and consistent, you’ll be a lot further along.

I also believe there should be a lot more transparency on why a thing is listed at all when the supporting data doesn’t seem to be there. Example: https://webstatus.dev/features/canvas-text-baselines is listed as a newly-available baseline even though one of the tracked browsers is passing 0% of tests. (A whole three tests, it is true.) How can this be considered baseline when it’s apparently not supported at all by a tracked browser? I mean, I can think of at least one scenario where that sort of thing might be defensible, but I have no idea if this is such a scenario, nor does anyone else.

Even beyond that, https://webstatus.dev/features/hyphens is listed as baseline when its scores are mostly in the 50s, and the highest score is just short of 75%. It also has 55 tests, of which only 20 are passed by all four tracked browsers, which is a 36.4% Interop score. Does that qualify as baseline? I personally wouldn’t think so, but if there were a list of the ways things can get on the list, that would help a lot.
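For clarity, that Interop-style figure is just the share of tests passed by every tracked browser; a minimal sketch (the data shape is hypothetical):

```python
def interop_score(results):
    """Share of tests passed by all tracked browsers.

    `results` maps test name -> set of browsers passing it
    (a hypothetical shape, not webstatus.dev's actual schema).
    """
    browsers = {"chrome", "edge", "firefox", "safari"}
    passed_by_all = sum(1 for passing in results.values() if browsers <= passing)
    return passed_by_all / len(results)

# For hyphens: 20 of 55 tests pass in all four browsers -> 20/55, about 36.4%
```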

And then, I found https://webstatus.dev/features/conic-gradients, which is “Widely available” baseline, with one browser passing 18% of tests? And then https://webstatus.dev/features/webvtt, which ranges from 37-56% in terms of passing tests, and would have an Interop score of 9.1%? These also seem strange to include.

(I know that scores aren’t always the basis of something being considered baseline, but because the scores are so prominent, the questions seem inevitable. This is especially the case since “Insufficient test coverage” is given as a reason to not list scores, even if inconsistently.)

@foolip
Member

foolip commented May 20, 2024

As a start, I've added source comments explaining each case in #301. We used the same reason for all of these for expedience, but we should make the distinction between a few different reasons:

  • Obviously insufficient coverage, like for AVIF
  • Widespread failures that we know to be for some reason other than the feature's implementation quality, like for device orientation events
  • Failures that aren't understood, but seem unlikely to be a reflection of the implementation quality based on some out-of-band knowledge. For example, I'm fairly confident that preservesPitch works well enough in the majority of use cases web developers care about, so 22.2% Firefox and 0% for Safari would be unreasonable.

@jcscottiii what do you think about always showing the ⓘ when we don't have a percentage to show, and about adding more reasons? The existing "---" should be "no tests found" with an invitation to contribute to the mapping.

Reviewing the specific features @meyerweb mentioned:

https://webstatus.dev/features/canvas-text-baselines: I reviewed the Safari failures and guessed that since it was implemented in Safari so long before other browsers, that the spec probably changed in some way and Safari's implementation doesn't match the current spec. But this needs to be verified, I've filed web-platform-dx/web-features#1120.

https://webstatus.dev/features/hyphens: We'll need a subject matter expert to review this test suite. It's hard to tell if the failures are for cases that will affect web developers or not. My guess is that basic usage of the feature is fine, but that interoperability in the details isn't very good.

https://webstatus.dev/features/conic-gradients: This was an oversight on my part. The failures mostly look like minor pixel value differences. If we can't fix the tests we should hide this score for Safari specifically.

https://webstatus.dev/features/webvtt: I think WebVTT interop is somewhat bad, but you can use the basic feature. However, I see that I need to update this mapping to split WebVTT from WebVTT regions, since that contributes to the low score in at least Chrome and Edge.

@foolip
Member

foolip commented May 23, 2024

For WebVTT I've filed web-platform-tests/wpt#46453 and sent #314 to hide the scores on webstatus.dev. This fits the "Widespread failures that we know to be for some reason other than the feature's implementation quality" reason I think.

@foolip
Member

foolip commented May 23, 2024

Thinking about some guardrails for support status vs. test results:

  • Show no scores when a feature isn't supported (already the case)
  • For supported features, automatically hide scores <50% until reviewed, because most such cases will be a problem with the test suite or infrastructure more than the implementation quality
  • If a score is changed by more than 10% due to an issue other than implementation quality, hide the score for that specific browser

This would be the general approach, but exceptions could still be made based on other documented principles.
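These guardrails could be applied roughly as follows; a minimal sketch where all parameter names and thresholds are taken from the list above, not from any actual webstatus.dev code:

```python
def visible_score(supported, score, reviewed, non_impl_delta):
    """Decide whether to show a WPT score, per the guardrails above.

    Hypothetical parameters: `supported` is the browser's support status,
    `score` is the pass rate (0-1), `reviewed` marks a completed manual
    review, and `non_impl_delta` is how much the score is shifted by
    issues unrelated to implementation quality.
    Returns the score to display, or None to hide it.
    """
    if not supported:
        return None  # unsupported features show no score (already the case)
    if score < 0.5 and not reviewed:
        return None  # likely a test-suite or infrastructure problem until reviewed
    if non_impl_delta > 0.10:
        return None  # non-implementation issues move the score by more than 10%
    return score
```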

@dmitriid

For the sake of transparency, features whose status is "not on any standards track" should be shown as such instead of "limited availability".

@jcscottiii
Collaborator Author

For the sake of transparency, features whose status is "not on any standards track" should be shown as such instead of "limited availability".

@dmitriid

That's a great idea. And it would provide better insights than the current solution.

Looking at your comment on the related issue, we can leverage the status field from caniuse and check if it is unoff.
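A minimal sketch of that check against caniuse's data.json, where the feature ID is hypothetical and "unoff" is the status caniuse uses for non-standard features:

```python
def is_unofficial(caniuse_data, feature_id):
    """True if caniuse marks the feature as not on any standards track.

    `caniuse_data` is the parsed caniuse data.json (e.g. via json.load);
    features live under the top-level "data" key, each with a "status"
    field whose value "unoff" means unofficial / not on a standards track.
    """
    return caniuse_data["data"][feature_id]["status"] == "unoff"
```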

@dmitriid

Yeah, I didn't realize there was a related issue, so ended up commenting (rather tersely 😬) on both.

I don't know how complete/up-to-date the data is, but it's probably okay given that Can I Use ended up using it :)

@past
Collaborator

past commented Aug 21, 2024

I don't see why we would conflate "not on any standards track" and "limited availability" given they are orthogonal issues. I agree that the first part should be captured somehow though, which we should explore in #486.

