Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: rotate root CA #433

Open
wants to merge 15 commits into
base: main
Choose a base branch
from
Open

WIP: rotate root CA #433

wants to merge 15 commits into from

Conversation

vkhoroz
Copy link
Member

@vkhoroz vkhoroz commented Aug 19, 2024

This is incomplete and not yet fully tested amid API changes not ready too.

@mike-sul @doanac @StealthyCoder @camilamacedo86 @detsch I'd like to start an internal team discussion on this first.

Please, pay attention to the UI flow, and not the implementation details in the first iteration.

While working on this I've completely rewritten the commands at least twice, and I am not happy with the current flow yet.
A reason is: it is hard to find a proper trade off between the user control and UI simplicity:

  • Counter-intuitively, a simpler UI would mean that the user would screw up more often, given that the operation needs to be as extended in time as possible.
  • While a more complex UI apparently leads to a harder user engagement and more work for the customer success/support teams.

Copy link
Member

@doanac doanac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you walk us through the flow you see? I'm guessing its:

  • fioctl keys ca renewal start
  • fioctl keys ca renewal resign-device-ca

At that point I'm not sure if i do active or the config operations.

This allows old certificates (issued by a previous root CA) to continue being used to issue device client certificates.

Re-signed device CA certificates are stored in the provided PKI directory.
An old PKI directory is used to locate corresponding private keys, and copy them into the PKI directory.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't we use what we know from the server fioctl keys ca show --just-device-cas. Trusting this local directory is correct seems error-prone?

Copy link
Member Author

@vkhoroz vkhoroz Aug 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to copy private keys from an old PKI directory to the new PKI directory. Otherwise, a user might lose them when deleting an old directory.

We don't have those private keys on our servers. But, we have their public parts which we can use for a match.

As for your concerns about a trust: YES, a list of device CA certificates is fetched from the server, not a local directory.

@vkhoroz
Copy link
Member Author

vkhoroz commented Aug 24, 2024

@doanac sorry, I should have posted the command order initially.
It was planned as a help message in the final commit... but as it is not there yet, here is the flow:

# Start CA root renewal
fioctl keys ca renewal start

# Provision renewal bundle to all devices
fioctl config renew-root
# Just think about an alternative notation to the above command like this:
fioctl keys ca renewal provision | deploy --device <uuid> | --group <name> | --all

# Some time after all devices were updated, activate the new CA root, and then re-sign all certificates
fioctl keys ca renewal activate
fioctl keys ca renewal re-sign-device-ca
fioctl keys ca renewal re-sign-tls

# Finally, two optional steps to revoke the previous root CA and provision that change to all devices
fioctl keys ca renewal revoke-root
fioctl config renew-root

Does it make sense to you?

@StealthyCoder
Copy link
Member

I only have a first comment. I do not think the flow is that bad, the only suggestion I would do is make fioctl config renew-root part of fioctl keys ca renewal start and fioctl keys ca renewal revoke-root. Then all commands are within fioctl keys ca renewal "namespace". Then the big question is of course, when can someone determine if all devices have received the new CA ?

This is a trivial change which makes the code layout more clear.
Further commits will make gradual changes to this module.
Hence, keeping the public interface on the top helps a lot.

Signed-off-by: Volodymyr Khoroz <[email protected]>
This starts making common CA routines more navigable.
There will me more functions added to it during the course of this PR

Signed-off-by: Volodymyr Khoroz <[email protected]>
In 80% of use cases they are exactly the same for all PKI related commands.

Signed-off-by: Volodymyr Khoroz <[email protected]>
The x509 common package should not know anything about the argument parsing or validation.
That is a sole responsibility of the subcommands.

Signed-off-by: Volodymyr Khoroz <[email protected]>
A command help and examples will be extended as new sub-commands are being added.

Signed-off-by: Volodymyr Khoroz <[email protected]>
This is the Root CA that is used to sign Device CAs and TLS certificates.
Several factory root CAs can be valid at the same time, but only one of them can be active.

Signed-off-by: Volodymyr Khoroz <[email protected]>
This adds the first workflow command of the root CA renewal,
which generates the EST compliant CA renewal bundle and uploads it to the server.

This is a bare minimum implementation, further extended with auxiliary features in later commits.
For example, an HSM support is added in the next commit.
That approach allows to decrease the level of complexity while traversing commits.

Signed-off-by: Volodymyr Khoroz <[email protected]>
This is useful from 2 perspectives:
1. A user may want to view the certificates (e.g. using openssl storeutl).
2. A user may need to (re-)upload this file to the API (e.g. while experimenting or fixing a broken PKI).

Signed-off-by: Volodymyr Khoroz <[email protected]>
This closely resembles the `rotate-cert` commands layout by adding two new commands:
- A `devices config renew-root` for device level config.
- A `config renew-root` for group and factory level configs.

The key difference is the support for factory-wide config change.
That is needed to facilitate the root CA renewal for group-less devices and devices (auto-)registered in the future.

The created config needs a correlation ID which is fetched from the server.
The server generates a new correlation ID upon any changes to the root CA bundle; entire change log is stored for audit.
This is needed so that config updates triggered for the same root CA renewal (but different devices) are counted as one.
That allows to accurately calculate a number of already updated devices, before proceeding to the next root CA renewal step.

I am not sure if the proposed layout is the best one from the user convenience perspective.
The other option was to add single command like `keys ca renewal deploy [-g | -d]`.
That has its pros and cons, so I am open to start a discussion on it.

Signed-off-by: Volodymyr Khoroz <[email protected]>
This command can be used two-way to switch between old and new root CA.
It is made a standalone command to give the user a better grasp of what is going on.

Signed-off-by: Volodymyr Khoroz <[email protected]>
There are several sub-use cases here:
- A user may lose all or a part of Device CA private keys.
- A user may keep these private key files in different folders.
- A user may wish to not copy them into a new PKI folder.

All of the above use cases are supported by this extension.

Signed-off-by: Volodymyr Khoroz <[email protected]>
@mike-sul
Copy link
Contributor

mike-sul commented Oct 7, 2024

@vkhoroz Is it possible to try and experiment with the new functionality against the backend version that supports it? Maybe something running in the meds/dev environment? I guess to test it e2e will require fioconfig changes too?

@vkhoroz
Copy link
Member Author

vkhoroz commented Oct 9, 2024

Is it possible to try and experiment

@mike-sul The API changes are not ready at this stage.
Fioconfig PoC is here, but it is still not tested: foundriesio/fioconfig#51

Copy link
Member

@doanac doanac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is looking sane. I've added a couple of comments.

Now that you are getting the mechanics of this out of the way: I think we need to think about how we can understand how we can determine the status of this as it gets rolled out

var caRenewalCmd = &cobra.Command{
Use: "renewal",
Short: "Renew the root of trust for your factory PKI",
Long: `These sub-commands allow you to gradually renew a root of trust for your factory PKI.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

factory-> Factory


A guided process allows you to:
- Generate a new root of trust.
- Create an EST standard compliant root CA renewal bundle.- Re-sign all necessary factory PKI certificates.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

line break missing there

If you are using an HSM device - a new private key should be stored under a different label, possibly a new device.
This extreme level of isolation is necessary to prevent an accidental rewrite of the old root of trust.

Once this command completes successfully, a root of trust renewal process is started.`,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kind of a nitpick, but I think you are doing newlines by sentence. Cobra doesn't handle smart line breaks or anything, so it would probably render better if you just hard wrap a column 72 or 80.

factory := viper.GetString("factory")
newCertsDir := args[0]
oldCertsDir := args[1]
if oldCertsDir == newCertsDir {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably should normalize these paths before doing this check so you don't get messed up by something like start ./ $(pwd)

1. First, it generates a new root of trust for your factory.
2. Second, it cross-signs a new root of trust using an old root of trust to prepare a CA renewal bundle.
This CA renewal bundle is compliant with the EST standard, necessary for a secure root CA update on devices.
3. Finally, it uploads the CA renewal bundle to the backend API for validation and storage.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure where to ask in the PR, but I'm wondering if there's going to wind up being an easy way to check "if in progress" and also make sure two different people don't try this at the same time?


if len(group) > 0 {
logrus.Debugf("Renewing device root CA for devices in group %s", group)
} else {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we've decided we should stop adding factory wide config operations like this - they are too dangerous.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants