KCM: provide mechanism to purge expired credentials #6667

Open
qralston opened this issue Apr 8, 2023 · 11 comments

qralston commented Apr 8, 2023

Because KCM permits multiple credentials for the same user to be stored in the cache collection, credentials tend to accumulate. Over time, as cached credentials expire, the user’s cache collection becomes littered with duplicate credentials.

We have discovered that having duplicate expired credentials in the cache collection causes breakage. For example, ssh credential delegation can select an expired credential to delegate to the target host, even when a duplicate non-expired credential exists in the collection. In our environment, where home directories are mounted via NFSv4 with sec=krb5p, this locks the user out of their home directory, as they must either acquire or delegate a non-expired credential in order to access their home directory.

Problems like this—where a failure on a remote host is in fact being caused by issues on the local host that initiated the remote connection—are exceedingly difficult for many users to grasp.

(Issue #6357, where KCM will randomly change the primary cache in the cache collection (now fixed, but it will take a while for that fix to propagate out to distros) makes this even worse.)

User complaints have gotten bad enough that we are trying to figure out a way to throw together some sort of “poor man’s expired credential purger.” But unfortunately, sssd makes this exceedingly difficult, because sssctl provides no ability to query any aspect of KCM.

After trial and error, running this command as root:

$ tdbdump /var/lib/sss/secrets/secrets.ldb |
  grep ^key |
  tr , '\012' |
  grep -E '^CN=[[:digit:]]+$' |
  sort |
  uniq |
  cut -d= -f2 |
  xargs -e -r getent passwd |
  cut -d: -f1

…looks like it will show us the usernames of all users with credentials in KCM. From there, it should be possible to enumerate over those users via runuser and run a script to purge any expired credentials:

$ klist -l |
  awk '$3 == "(Expired)" {print $2}' |
  xargs -e -r -t -l kdestroy -c
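
The two pipelines above could be combined into one root-run script along the following lines. This is only a sketch: it assumes the secrets.ldb key layout matches what the first pipeline relies on, and that tdbdump, runuser, klist, and kdestroy are installed. The function names (kcm_uids, expired_caches, purge_user) and the "run" argument are made up for illustration.

```shell
#!/bin/sh
# Sketch only: purge expired caches for every user that has entries in KCM.
# Function names and the "run" argument are illustrative, not part of SSSD.

# Read `tdbdump` output on stdin; print the numeric UIDs found in its keys.
kcm_uids() {
  grep '^key' | tr , '\012' | grep -E '^CN=[[:digit:]]+$' | sort -u | cut -d= -f2
}

# Read `klist -l` output on stdin; print the names of expired caches.
expired_caches() {
  awk '$3 == "(Expired)" {print $2}'
}

# Destroy every expired cache belonging to the named user.
purge_user() {
  runuser -u "$1" -- klist -l 2>/dev/null | expired_caches |
    while read -r cache; do
      runuser -u "$1" -- kdestroy -c "$cache" </dev/null
    done
}

# Only touch the system when explicitly asked to (run as root).
if [ "${1:-}" = "run" ]; then
  tdbdump /var/lib/sss/secrets/secrets.ldb | kcm_uids |
    while read -r uid; do
      user=$(getent passwd "$uid" | cut -d: -f1)
      if [ -n "$user" ]; then
        purge_user "$user" </dev/null
      fi
    done
fi
```

Keeping the parsing in small stdin-to-stdout functions means they can be checked against saved tdbdump and klist output without touching a live KCM database.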

But if sssd users have to resort to kluges like this—using third-party tools (tdbdump is a Samba utility) to dump KCM internals—in order to prevent KCM from causing breakage, it means that KCM lacks critical functionality.

Specifically: KCM needs a mechanism to automatically purge expired credentials. E.g., something like this:

krb5_expired_purge_interval (string)

The time between checks for expired credentials in KCM. When a check for expired credentials occurs, all expired credentials found in KCM, for all users except the root user, will be purged, regardless of the mechanism by which the credential was added to KCM. The value is an integer immediately followed by a time unit:

s for seconds
m for minutes
h for hours
d for days.

If there is no unit given, s is assumed.

NOTE: It is not possible to mix units. To set the purge interval to one and a half hours, use 90m instead of 1h30m.

If this option is not set, or is set to 0, no checks for expired credentials occur. This means that expired credentials will persist in all users’ respective cache collections until manually deleted via kdestroy.

Default: not set
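
To make the proposed value syntax concrete, here is a rough sketch of how such an interval string might be converted to seconds. The helper name interval_to_seconds is hypothetical and is not SSSD code; it only mirrors the rules described above (one optional unit, seconds by default, no mixed units).

```shell
# Sketch: convert an interval such as "90m" into seconds.
# interval_to_seconds is a hypothetical helper, not part of SSSD.
interval_to_seconds() {
  num=${1%[smhd]}      # strip one trailing unit letter, if present
  unit=${1#"$num"}     # whatever was stripped is the unit (may be empty)
  case "$num" in
    ''|*[!0-9]*) return 1 ;;   # empty, or mixed units such as 1h30m
  esac
  # Drop leading zeros so the arithmetic below never treats the value as octal.
  while [ "${num#0}" != "$num" ] && [ -n "${num#0}" ]; do
    num=${num#0}
  done
  case "$unit" in
    ''|s) echo "$num" ;;
    m) echo $((num * 60)) ;;
    h) echo $((num * 3600)) ;;
    d) echo $((num * 86400)) ;;
    *) return 1 ;;
  esac
}
```

With this sketch, `interval_to_seconds 90m` prints 5400, while `interval_to_seconds 1h30m` prints nothing and returns a non-zero status.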

Note that the “regardless of the mechanism by which the credential was added to KCM” part is critical: our users frequently use kinit to stuff other credentials into their cache collections.

There is a pressing need for this: it will eliminate problems caused by other processes and services unintentionally plucking expired credentials out of the user’s cache collection, and it will prevent the secrets database from growing without bounds because expired credentials are never purged.

Please add this feature.

@alexey-tikhonov
Member

aplopez added a commit to aplopez/sssd that referenced this issue Sep 6, 2023
When adding a new credential and the user reached its quota, try to
remove the user's oldest expired credential to make place.

Resolves: SSSD#6667
aplopez added a commit to aplopez/sssd that referenced this issue Sep 7, 2023
When adding a new credential and the user reached its quota, try to
remove the user's oldest expired credential to make place.

Resolves: SSSD#6667
:feature: The auto removal of expired credentials allows to automatically
          remove the oldest expired credential when the user's maximum
          limit was reached and a new credential is to be added to KCM.
          If no expired credential is found to be removed, the operation
          will fail as it happened in the previous versions.
aplopez added a commit to aplopez/sssd that referenced this issue Sep 7, 2023
When adding a new credential and the user reached its quota, try to
remove the user's oldest expired credential to make place.

Resolves: SSSD#6667
:feature: When adding a new credential to KCM and the user has
          already reached their limit, the oldest expired credential
          will be removed to free some space.
          If no expired credential is found to be removed, the operation
          will fail as it happened in the previous versions.
aplopez added a commit to aplopez/sssd that referenced this issue Sep 7, 2023
:feature: When adding a new credential to KCM and the user has
          already reached their limit, the oldest expired credential
          will be removed to free some space.
          If no expired credential is found to be removed, the operation
          will fail as it happened in the previous versions.

Resolves: SSSD#6667
pbrezina pushed a commit that referenced this issue Oct 11, 2023
:feature: When adding a new credential to KCM and the user has
          already reached their limit, the oldest expired credential
          will be removed to free some space.
          If no expired credential is found to be removed, the operation
          will fail as it happened in the previous versions.

Resolves: #6667

Reviewed-by: Sumit Bose <[email protected]>
Reviewed-by: Tomáš Halman <[email protected]>
(cherry picked from commit 93ee015)
@pbrezina
Member

Pushed PR: #6917

  • master
    • 96d8b77 - KCM: Display in the log the limit as set by the user
    • 93ee015 - KCM: Remove the oldest expired credential if no more space.
  • sssd-2-9
    • 834b536 - KCM: Display in the log the limit as set by the user
    • 1fa7210 - KCM: Remove the oldest expired credential if no more space.

pbrezina added the "Closed: Fixed" label Oct 11, 2023
@opoplawski

So, I'm curious about the actual fix implemented. The requester here seems to have asked for expired credentials to automatically be removed from the cache (which I would like to see as well). But what seems to have been implemented is that if the cache fills up the oldest expired credential will be removed to make room. This would still suggest that we will end up with large amounts of expired credentials in the cache. Is that right or am I missing something?

@alexey-tikhonov
Member

what seems to have been implemented is that if the cache fills up the oldest expired credential will be removed to make room. This would still suggest that we will end up with large amounts of expired credentials in the cache. Is that right or am I missing something?

That's right.

@aplopez
Contributor

aplopez commented Feb 1, 2024

You are right. What was implemented is that the oldest expired credential will be removed if a new credential needs to be added to the cache. This is the best solution considering other users want the opposite behavior.

If you want to limit the number of credentials, you can use max_uid_ccaches, max_ccaches and max_ccache_size. Please check man sssd-kcm(8).
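
As a rough illustration of where those three options go, the sketch below prints a [kcm] section for sssd.conf. The helper name kcm_limits_snippet and the numbers in the usage note are placeholders, not recommendations; see sssd-kcm(8) for the actual defaults and semantics.

```shell
# Sketch: print a [kcm] section that sets the three limits.
# kcm_limits_snippet is a hypothetical helper; all three values are required.
kcm_limits_snippet() {
  [ $# -eq 3 ] || return 1
  cat <<EOF
[kcm]
max_uid_ccaches = $1
max_ccaches = $2
max_ccache_size = $3
EOF
}
```

For example, `kcm_limits_snippet 32 512 65536` prints a section that could be merged into the [kcm] section of sssd.conf by hand.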

And, of course, you can always run kdestroy -A to clean the whole cache.

@joakim-tjernlund
Contributor

How do I use kdestroy to delete another user's cache?

@alexey-tikhonov
Member

How do I use kdestroy to delete another user's cache?

As root: su - "$user" -c 'kdestroy -A'

@qralston
Author

While I appreciate the effort that went into PR #6917, unfortunately, PR #6917 does not fix this issue.

Our issue isn’t that we’re filling up KCM with credentials. Our issue is that we rely heavily on authenticated filesystem access (CIFS, NFS RPCGSS) where a kernel upcall mechanism needs to obtain user credentials, and these upcall mechanisms seem to assume kernel persistent keyring behavior, where 1) duplicate credentials are not permitted, and 2) the kernel automatically purges expired credentials. As such, these upcall mechanisms can misbehave if there are multiple expired user credentials in the user’s cache collection, plucking an expired credential instead of a non-expired one, causing Permission denied errors and all sorts of other breakage.

To put it as simply as possible: having expired credentials present in a user’s cache collection badly breaks things. KCM needs a mechanism to purge expired credentials, regardless of how they were added to KCM, reasonably quickly after the credential expires.

In pseudocode, we need this:

for each user U cache collection in KCM; do
  for each credential C in user U cache collection; do
    if credential C is expired; then
      kdestroy C
    fi
  done
done

Basically, we want this option:

  • purge_expired_credentials_interval

    This parameter goes in the [kcm] section.

    KCM periodically purges all expired credentials, for all users who have credential collections. This parameter specifies how many seconds KCM waits after completing a purge before performing the next purge.

    The minimum value is 300 (5 minutes) and the maximum value is 86400 (24 hours).

    As a special case, a value of 0 is also supported. If the value is 0, KCM does not purge expired credentials.

    The default is 0; that is, KCM does not purge expired credentials by default; one must specifically set this parameter to a legal nonzero value to enable purging expired credentials.
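
    The value rules above (0, or anything from 300 to 86400) can be expressed as a small check. This is a sketch only; valid_purge_interval is a hypothetical helper, not SSSD code.

```shell
# Sketch: accept 0 (purging disabled) or a value within 300..86400 seconds.
# valid_purge_interval is a hypothetical helper, not part of SSSD.
valid_purge_interval() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;   # reject empty or non-numeric input
  esac
  if [ "$1" -eq 0 ]; then
    return 0
  fi
  [ "$1" -ge 300 ] && [ "$1" -le 86400 ]
}
```

    The function returns success for 0, 300, and 86400, and failure for values such as 299 or 86401.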

If the root user could easily enumerate the set of users who have any active credentials in KCM, then we could implement our own purge_expired_credentials_interval feature using a cron job / systemd timer that enumerated over the users with credentials, and used setpriv to invoke a (e.g.) purge-any-expired-credentials script for each user with credentials. (klist -l flags which credentials are expired in its output, so at that point, one can purge expired credentials simply by looking for expired credentials in the klist -l output and then executing kdestroy with KRB5CCNAME set to that specific credential.)
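
Assuming the list of UIDs were available, the per-user step described above might look roughly like this. It is a sketch, not a tested tool: purge_as_uid and DRY_RUN are illustrative names, the caller is assumed to supply the user's UID and primary GID, and setpriv from util-linux is assumed to be installed.

```shell
# Sketch: run the expired-credential purge as a given UID/GID via setpriv.
# purge_as_uid and DRY_RUN are illustrative names, not part of SSSD.
purge_as_uid() {
  uid=$1
  gid=$2
  # Inner script: destroy each cache that `klist -l` flags as expired,
  # selecting the cache through KRB5CCNAME.
  inner='klist -l 2>/dev/null | awk "\$3 == \"(Expired)\" {print \$2}" | while read -r c; do KRB5CCNAME="$c" kdestroy; done'
  set -- setpriv --reuid="$uid" --regid="$gid" --init-groups sh -c "$inner"
  if [ -n "${DRY_RUN:-}" ]; then
    # Show the privilege-dropping part of the command without running it.
    printf '%s\n' "$1 $2 $3 $4"
  else
    "$@"
  fi
}
```

Running `DRY_RUN=1 purge_as_uid 1000 1000` prints the setpriv invocation without executing it, which is a cheap way to check the arguments before letting a timer run it as root.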

But, alas, the contents of KCM are completely opaque: even if one is running as root, there is no sssd tool (e.g. sssctl) that will enumerate the set of users who have credentials in KCM. (I briefly played around with attempting to parse the output of tdbdump /var/lib/sss/secrets/secrets.ldb, but lordy, that would graduate from a kluge to an ugly hack.)

So, we are stuck: SSSD neither implements a feature to purge expired credentials (which cause massive breakage in our environment), nor gives us the ability to kluge something together ourselves. We don’t want to abandon KCM and go back to using the kernel persistent keyring, but for the amount of breakage we are experiencing with KCM and expired credentials, we are reluctantly considering it.

I know it is difficult to infer tone in online communication, so I will specifically disclaim that this is a completely honest question (neither sarcasm nor snark): have I adequately explained what the issue here is? If not, what is unclear; what do I need to clarify?

Finally: please reopen this issue, because the issue is not fixed.

@andreboscatto
Contributor

Hi @qralston,

Thank you for your honesty and taking the time to respond with such a detailed explanation. It really helped us discuss and develop the following User Story, Description, and Acceptance Criteria. Could you please confirm if these address the needs you described?

Before that, we'd like to be transparent as well. We do intend to work on this, but our pipeline is currently full. We're focusing on new features related to Zero Trust Architecture, Passwordless authentication, OAuth2, and others.

@aplopez is about to start significant work on the performance of SSSD's caching mechanism, which has been a frequent source of user complaints over the years. Identifying the bottlenecks and potential solutions, drafting the design page with the proposed changes, development, testing, and the other tasks involved in improving SSSD's caching performance will take a good amount of time. Once that work is done, we can tackle this KCM RFE. If you’re okay with waiting a few months, that’s great. If anyone reading this comment is willing to contribute, you are more than welcome, and we will assist however we can.

User Story

As an admin, I want a mechanism that periodically purges expired credentials from KCM, so that expired credentials are removed automatically and do not cause permission errors and system breakage.

Description:

The system currently faces issues with expired credentials in the Kerberos Cache Manager (KCM) causing permission errors and operational disruptions. These issues arise because the kernel upcall mechanisms, which handle authenticated filesystem access (such as CIFS, NFS RPCGSS), incorrectly handle expired credentials. To resolve this, we need to introduce a parameter purge_expired_credentials_interval in the KCM configuration that allows the system to periodically purge expired credentials for all users. This feature will ensure that expired credentials are promptly removed, thus maintaining the integrity and functionality of the upcall mechanisms.

Acceptance Criteria

  1. Configuration Parameter Addition:
  • A new configuration parameter purge_expired_credentials_interval is added to the [kcm] section of the KCM configuration file.
  • The minimum accepted nonzero value is 300 seconds.
  • A value of 0 disables the purging mechanism.
  2. Default Behavior:
  • By default, purge_expired_credentials_interval is set to 0, meaning no automatic purging of expired credentials occurs unless explicitly configured.
  3. Purging Mechanism Implementation:
  • The system periodically checks and purges expired credentials based on the interval specified by purge_expired_credentials_interval.
  • The purging process involves iterating over each user's credential collection and removing credentials that have expired.
  • The purge happens regardless of which user owns the credentials.
  4. System Integrity and Logging:
  • Ensure that the purging process does not affect valid credentials or disrupt active sessions.
  • Detailed logging is implemented for purging activities, including timestamps and user identifiers for purged credentials, to aid in monitoring and troubleshooting (SSSD debug level 9).
  5. Man page:
  • This information should be documented in the man page, describing the option's behavior and warning users about the potential harm when both mechanisms are enabled (this one and "Remove the oldest expired credential if no more space").
  6. Test:
  • Create and automate tests of this new feature.

Kindly
André Boscatto - SSSD Product Owner

andreboscatto reopened this Jun 23, 2024
@yrro
Contributor

yrro commented Jun 27, 2024

I think that if credentials are only purged on a timer, there can still be a period of time (up to 300 seconds in the above design) where a user's KCM cache collection will contain a valid credential and an expired credential for the same principal.

If purging of expired credentials happens during the process of adding a new credential to the cache then this window is greatly shortened. We're already removing the oldest expired credential if there's no space: how about an option to, when a credential cache is added for a principal, remove all other credential caches for that principal? That way, as long as a new credential cache for a principal is added before the old one expires, there's no period of time where an expired credential cache confuses clients.
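
The "remove all other credential caches for that principal" idea can be approximated from outside KCM by filtering klist -l output. This is a minimal sketch under that assumption; other_caches_for is a hypothetical helper name, and it only selects the caches, leaving removal to the caller.

```shell
# Sketch: read `klist -l` output on stdin and print every cache for the given
# principal except the one to keep. The first two lines are column headers.
# other_caches_for is a hypothetical helper, not part of SSSD.
other_caches_for() {
  awk -v p="$1" -v keep="$2" 'NR > 2 && $1 == p && $2 != keep {print $2}'
}
```

A caller could then run something like `klist -l | other_caches_for "$principal" "$new_cache" | xargs -r -n 1 kdestroy -c`. Note that this removes unexpired duplicates for the principal as well, which is what the proposal describes.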

As for the clients themselves: it might be worth filing separate issues with the clients (nfs-utils/gssproxy/cifs-utils) to improve their behaviour in the presence of credential cache collections that may contain multiple credential caches for a given principal. If that were to happen then this improvement in SSSD wouldn't be so important.

[I've edited this comment to improve wording and flesh a few things out]

@qralston
Author

qralston commented Jul 8, 2024

Hi @andreboscatto, yes; I think the (User Story, Description, and Acceptance Criteria) you described are accurate. Thank you!

@yrro: I think it would be fine if there were an option to enable KCM to automatically purge any expired credentials in a credential collection when certain types of interactions occur (or perhaps any type of interaction occurs) with that credential collection. However, I think that feature might be more difficult to implement than a simple background timer/cleanup action. Furthermore, that’s something that can easily be implemented outside of sssd. E.g., an /etc/profile.d/purge-expired-credentials.sh file as follows:

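#! /bin/sh
# Purge the invoking user's expired credential caches; skipped for root (UID 0).

if [ 0$(id -u 2>/dev/null) -gt 0 ]; then
  # klist -l marks expired caches with "(Expired)" in the third column.
  klist -l 2>/dev/null | awk '$3 == "(Expired)" {print $2}' | while read -r C; do
    # KRB5CCNAME selects the cache that kdestroy removes.
    env KRB5CCNAME="${C}" kdestroy 2>/dev/null
  done
  unset C
fi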

This won’t help us when a user logs in with expired credentials that derail upcall mechanisms, though, because the upcall mechanism fires when the shell touches the user’s home directory, which occurs before the /etc/profile.d scripts are sourced.

And yes, I agree that in an ideal world, the upcall mechanisms should not misbehave. But the reality is that changing the upcall mechanisms is likely going to be a tough sell, because the behavior of the kernel persistent keyring (no duplicate credentials; the kernel automatically purges credentials when they expire) is the de-facto standard behavior for credential collections, and with that behavior, no issues occur.
