Feature - SSO keycloak #5691

Open
wants to merge 3 commits into
base: develop

paulbauriegel
Contributor

Introduces a new SSO option using Keycloak

Enables a different SSO provider next to HuggingFace SSO

Type of change

  • New feature (non-breaking change which adds functionality)

How Has This Been Tested
Local build of front-end & backend. Keycloak deployment as described in the docs

Checklist

  • I added relevant documentation
  • I followed the style guidelines of this project
  • I did a self-review of my code
  • I made corresponding changes to the documentation
  • I confirm my changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • TODO - I have added relevant notes to the CHANGELOG.md file (See https://keepachangelog.com/)

How to test & use it

  • Follow the instructions for Keycloak: docs/_source/tutorials_and_integrations/integrations/sso_keycloak.md
  • Build both front & backend
  • Then it should look like this:
    Argilla_SSO

@frascuchon
Member

@paulbauriegel Thanks for this contribution. Last week I started working on a code refactoring to simplify the OAuth provider configuration and integrate better with the social auth package. The design changes a bit with my refactoring. It would be nice if you could adapt your changes based on this PR. If not, we can combine them later.

@paulbauriegel
Contributor Author

@frascuchon Ok, let me have a look. I will try to understand the changes.

Since you are working on the OAuth integration, it would be nice to be able to use the roles from the OAuth audience so that OAuth users can access specific workspaces based on those roles. I wanted to contribute this at a later stage.

@frascuchon
Member

Great @paulbauriegel! I have some doubts about how to match the OAuth audience with the Argilla roles. I would love to hear your thoughts on that.

Member

This docs folder is outdated. For 2.x docs you should use the argilla/docs folder.

@frascuchon
Member

Hi @paulbauriegel ,

Here is the docs section related to the refactor PR. It would be nice if you could take a look and give some feedback. It may also be useful for understanding the refactoring approach.

@paulbauriegel
Contributor Author

Thank you, yes I will have a look. Just to set expectations, I will only have some time later this week :-)

@paulbauriegel
Contributor Author

@frascuchon I looked through your code. Integrating a new SSO provider is fairly straightforward. Self-hosted SSO providers such as Keycloak create a small problem, though, because values like authorization_url are dynamic and depend on the configuration.
For example, if you forget to set the correct environment variables or misspell them, self._authorization_endpoint = self._backend.authorization_url() resolves to None. Such issues are hard to debug without knowing social core well, so it might be helpful to have a check in place there.
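
A minimal sketch of the kind of check I mean, written without social core so it stands alone. The names OAuthConfigurationError and require_endpoints are hypothetical and not part of the refactor PR; the idea is only to fail fast and name every endpoint that resolved to None or an empty string.

```python
from typing import Mapping, Optional


class OAuthConfigurationError(ValueError):
    """Raised when a provider endpoint could not be resolved (hypothetical name)."""


def require_endpoints(provider: str, endpoints: Mapping[str, Optional[str]]) -> dict:
    """Return the endpoints unchanged, or fail fast naming every missing one."""
    # Treat both None and "" as unresolved.
    missing = sorted(name for name, url in endpoints.items() if not url)
    if missing:
        raise OAuthConfigurationError(
            f"OAuth provider '{provider}' has unresolved endpoints: "
            f"{', '.join(missing)}. Check the provider's environment variables."
        )
    return dict(endpoints)
```

Calling something like this once, right after the backend is constructed, would turn a silent None into an error message that points at the configuration.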

Generally speaking, I would rather configure such settings in oauth.yaml than via environment variables, but that might be a bit more complicated now: since there is a common Provider class, there are no optional extra settings for an SSO that requires more of them.
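
To illustrate the "optional extra settings" idea, here is a rough sketch that assumes each provider entry has already been parsed from YAML into a mapping. ProviderConfig, parse_provider and the field names are assumptions for illustration, not the actual Provider class from the refactor.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Mapping


@dataclass
class ProviderConfig:
    """One provider entry; `extra` holds provider-specific options (hypothetical shape)."""

    name: str
    client_id: str
    client_secret: str
    extra: Dict[str, Any] = field(default_factory=dict)


def parse_provider(entry: Mapping[str, Any]) -> ProviderConfig:
    """Split a parsed YAML mapping into common fields and optional extras."""
    common = {"name", "client_id", "client_secret"}
    missing = sorted(common - set(entry))
    if missing:
        raise ValueError(f"provider entry is missing: {', '.join(missing)}")
    # Everything that is not a common field is kept as a provider-specific extra.
    extra = {key: value for key, value in entry.items() if key not in common}
    return ProviderConfig(entry["name"], entry["client_id"], entry["client_secret"], extra)
```

With this shape, a Keycloak entry could carry its realm endpoint as an extra key, while providers that need nothing more keep an empty extra dict.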

I will open a new MR based on the new code tomorrow.

@paulbauriegel
Contributor Author

@frascuchon Regarding how to match the OAuth audience with the Argilla roles, this is an example of what I get back from the OAuth token:

{
  "exp": 1732293901,
  "iat": 1732293601,
  "auth_time": 1732292840,
  "jti": "0d11e667-4174-41fe-978e-45a9248d009d",
  "iss": "http://localhost:8080/realms/argilla",
  "aud": [
    "argilla",
    "account"
  ],
  "sub": "66caeedf-df61-4cc3-9ebf-c269643e454e",
  "typ": "Bearer",
  "azp": "argilla",
  "sid": "359fe83f-7263-486d-93a3-61949d15d224",
  "acr": "0",
  "allowed-origins": [
    "http://127.0.0.1:5000",
    "http://localhost:5000"
  ],
  "realm_access": {
    "roles": [
      "default-roles-argilla",
      "offline_access",
      "uma_authorization",
      "llmbot-annotations-at"
    ]
  },
  "resource_access": {
    "account": {
      "roles": [
        "manage-account",
        "manage-account-links",
        "view-profile"
      ]
    },
    "argilla": {
      "roles": [
        "argilla-access"
      ]
    }
  },
  "scope": "microprofile-jwt aud email openid profile",
  "upn": "paulat",
  "email_verified": true,
  "name": "Paul Bauriegel",
  "groups": [
    "default-roles-argilla",
    "offline_access",
    "uma_authorization",
    "llmbot-annotations-at"
  ],
  "preferred_username": "paulat",
  "given_name": "Paul",
  "family_name": "Bauriegel",
  "email": "...
}

In this case, the llmbot-annotations-at group was added through Keycloak. My initial thought is to leverage these group roles to define Argilla roles and control access to specific Argilla workspaces.
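
As a rough sketch of that idea: given the group names from the token, pick the highest-privilege Argilla role that any group maps to and collect the workspaces the groups grant. The two mapping tables and the "argilla-admins" group are hypothetical and would come from configuration; owner, admin and annotator are the existing Argilla roles.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical mapping tables; real values would come from configuration.
GROUP_TO_ROLE: Dict[str, str] = {"argilla-admins": "admin"}
GROUP_TO_WORKSPACES: Dict[str, List[str]] = {
    "llmbot-annotations-at": ["llmbot-annotations-at"],
}
ROLE_PRIORITY = ["owner", "admin", "annotator"]  # highest privilege first


def resolve_access(
    groups: Iterable[str], default_role: str = "annotator"
) -> Tuple[str, List[str]]:
    """Map token groups to one Argilla role and a de-duplicated workspace list."""
    groups = list(groups)
    roles = {GROUP_TO_ROLE[g] for g in groups if g in GROUP_TO_ROLE}
    # Take the most privileged mapped role, or the default when nothing maps.
    role = next((r for r in ROLE_PRIORITY if r in roles), default_role)
    workspaces: List[str] = []
    for group in groups:
        for workspace in GROUP_TO_WORKSPACES.get(group, []):
            if workspace not in workspaces:
                workspaces.append(workspace)
    return role, workspaces
```

Groups that appear in neither table, such as the Keycloak defaults offline_access and uma_authorization, are simply ignored.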

Would it make sense to integrate this logic into the new UserInfo class? In our enterprise environment, managing groups of annotators via a central SSO (e.g., Entra ID) is crucial, as we need fine-grained control over roles and workspace access for different groups.

The issue might be that OpenID Connect (OIDC) does not inherently guarantee the availability of roles or groups in all implementations.
However, the major OIDC providers support including role and group information in tokens, e.g. Azure AD, Keycloak, and GitLab.
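
Because those claims are optional, any extraction would have to tolerate their absence. A minimal sketch, assuming the Keycloak claim layout from the token above (groups, realm_access.roles, resource_access.<client>.roles); other providers may use different claim names, and extract_groups is a hypothetical helper.

```python
from typing import Any, List, Mapping


def extract_groups(claims: Mapping[str, Any], client_id: str = "argilla") -> List[str]:
    """Collect group/role names from common claim locations, tolerating absence."""
    candidates = [
        claims.get("groups"),
        (claims.get("realm_access") or {}).get("roles"),
        ((claims.get("resource_access") or {}).get(client_id) or {}).get("roles"),
    ]
    found: List[str] = []
    for values in candidates:
        if not isinstance(values, list):
            continue  # claim missing or not in the expected shape
        for value in values:
            if isinstance(value, str) and value not in found:
                found.append(value)
    return found
```

A token without any of these claims yields an empty list, so the caller can fall back to default roles and workspaces.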

What do you think, @frascuchon?

@frascuchon
Member

frascuchon commented Nov 27, 2024

Great! Based on my proposal here, one thing we can do is extend the backend.get_user_details method so that its response includes all parsed info about roles and workspaces (resolve the proper role and the proper list of workspaces, and set them as part of the user data).

from social_core.backends.open_id_connect import OpenIdConnectAuth


class KeycloakOpenId(OpenIdConnectAuth):
    """Keycloak OpenID Connect authentication backend."""

    name = "keycloak"

    @staticmethod
    def from_oidc_endpoint(oidc_endpoint: str):
        if oidc_endpoint is None:
            raise ValueError("oidc_endpoint must be set")

        # Normalize once so the derived URLs never contain a double slash.
        oidc_endpoint = oidc_endpoint.rstrip("/")

        KeycloakOpenId.OIDC_ENDPOINT = oidc_endpoint
        KeycloakOpenId.AUTHORIZATION_URL = f"{oidc_endpoint}/protocol/openid-connect/auth"
        KeycloakOpenId.ACCESS_TOKEN_URL = f"{oidc_endpoint}/protocol/openid-connect/token"

        return KeycloakOpenId

    def oidc_endpoint(self) -> str:
        return self.OIDC_ENDPOINT

    def get_user_details(self, response):
        data = super().get_user_details(response)

        # Placeholders: compute the role based on the response content, and
        # identify the allowed workspaces dynamically in a similar way.
        data["role"] = None
        data["allowed_workspaces"] = None

        return data

Then we could extend UserInfo to include workspace-level access and use it here to set up the user properly. Something like:

   ...,
   workspaces = userinfo.allowed_workspaces or settings.oauth.allowed_workspaces
)
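
A self-contained sketch of that fallback, using a stand-in for UserInfo. The fields and the workspaces_for helper are assumptions for illustration, not the real class.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class UserInfo:
    """Stand-in for the real UserInfo class; fields here are assumptions."""

    username: str
    role: Optional[str] = None
    allowed_workspaces: Optional[List[str]] = None


def workspaces_for(userinfo: UserInfo, default_workspaces: List[str]) -> List[str]:
    """Prefer workspaces derived from the token; otherwise use the configured defaults."""
    # `or` falls back when the provider sent nothing (None) or an empty list.
    return list(userinfo.allowed_workspaces or default_workspaces)
```

Note that with `or`, an empty list from the provider also falls back to the defaults. If an empty list should mean "no workspaces at all", an explicit `is None` check would be needed instead.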
