-
Notifications
You must be signed in to change notification settings - Fork 16
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP: rotate root CA #433
base: main
Are you sure you want to change the base?
WIP: rotate root CA #433
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you walk us through the flow you see? I'm guessing its:
fioctl keys ca renewal start
fioctl keys ca renewal resign-device-ca
At that point I'm not sure if i do active
or the config operations.
This allows old certificates (issued by a previous root CA) to continue being used to issue device client certificates. | ||
|
||
Re-signed device CA certificates are stored in the provided PKI directory. | ||
An old PKI directory is used to locate corresponding private keys, and copy them into the PKI directory. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't we use what we know from the server fioctl keys ca show --just-device-cas
. Trusting this local directory is correct seems error-prone?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to copy private keys from an old PKI directory to the new PKI directory. Otherwise, a user might lose them when deleting an old directory.
We don't have those private keys on our servers. But, we have their public parts which we can use for a match.
As for your concerns about a trust: YES, a list of device CA certificates is fetched from the server, not a local directory.
@doanac sorry, I should have posted the command order initially.
Does it make sense to you? |
I only have a first comment. I do not think the flow is that bad, the only suggestion I would do is make |
This is a trivial change which makes the code layout more clear. Further commits will make gradual changes to this module. Hence, keeping the public interface on the top helps a lot. Signed-off-by: Volodymyr Khoroz <[email protected]>
This starts making common CA routines more navigable. There will me more functions added to it during the course of this PR Signed-off-by: Volodymyr Khoroz <[email protected]>
In 80% of use cases they are exactly the same for all PKI related commands. Signed-off-by: Volodymyr Khoroz <[email protected]>
The x509 common package should not know anything about the argument parsing or validation. That is a sole responsibility of the subcommands. Signed-off-by: Volodymyr Khoroz <[email protected]>
A command help and examples will be extended as new sub-commands are being added. Signed-off-by: Volodymyr Khoroz <[email protected]>
This is the Root CA that is used to sign Device CAs and TLS certificates. Several factory root CAs can be valid at the same time, but only one of them can be active. Signed-off-by: Volodymyr Khoroz <[email protected]>
This adds the first workflow command of the root CA renewal, which generates the EST compliant CA renewal bundle and uploads it to the server. This is a bare minimum implementation, further extended with auxiliary features in later commits. For example, an HSM support is added in the next commit. That approach allows to decrease the level of complexity while traversing commits. Signed-off-by: Volodymyr Khoroz <[email protected]>
Signed-off-by: Volodymyr Khoroz <[email protected]>
This is useful from 2 perspectives: 1. A user may want to view the certificates (e.g. using openssl storeutl). 2. A user may need to (re-)upload this file to the API (e.g. while experimenting or fixing a broken PKI). Signed-off-by: Volodymyr Khoroz <[email protected]>
This closely resembles the `rotate-cert` commands layout by adding two new commands: - A `devices config renew-root` for device level config. - A `config renew-root` for group and factory level configs. The key difference is the support for factory-wide config change. That is needed to facilitate the root CA renewal for group-less devices and devices (auto-)registered in the future. The created config needs a correlation ID which is fetched from the server. The server generates a new correlation ID upon any changes to the root CA bundle; entire change log is stored for audit. This is needed so that config updates triggered for the same root CA renewal (but different devices) are counted as one. That allows to accurately calculate a number of already updated devices, before proceeding to the next root CA renewal step. I am not sure if the proposed layout is the best one from the user convenience perspective. The other option was to add single command like `keys ca renewal deploy [-g | -d]`. That has its pros and cons, so I am open to start a discussion on it. Signed-off-by: Volodymyr Khoroz <[email protected]>
This command can be used two-way to switch between old and new root CA. It is made a standalone command to give the user a better grasp of what is going on. Signed-off-by: Volodymyr Khoroz <[email protected]>
Signed-off-by: Volodymyr Khoroz <[email protected]>
There are several sub-use cases here: - A user may lose all or a part of Device CA private keys. - A user may keep these private key files in different folders. - A user may wish to not copy them into a new PKI folder. All of the above use cases are supported by this extension. Signed-off-by: Volodymyr Khoroz <[email protected]>
Signed-off-by: Volodymyr Khoroz <[email protected]>
Signed-off-by: Volodymyr Khoroz <[email protected]>
80f29d6
to
039da9f
Compare
@vkhoroz Is it possible to try and experiment with the new functionality against the backend version that supports it? Maybe something running in the meds/dev environment? I guess to test it e2e will require fioconfig changes too? |
@mike-sul The API changes are not ready at this stage. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is looking sane. I've added a couple of comments.
Now that you are getting the mechanics of this out of the way: I think we need to think about how we can understand how we can determine the status of this as it gets rolled out
var caRenewalCmd = &cobra.Command{ | ||
Use: "renewal", | ||
Short: "Renew the root of trust for your factory PKI", | ||
Long: `These sub-commands allow you to gradually renew a root of trust for your factory PKI. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
factory-> Factory
|
||
A guided process allows you to: | ||
- Generate a new root of trust. | ||
- Create an EST standard compliant root CA renewal bundle.- Re-sign all necessary factory PKI certificates. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
line break missing there
If you are using an HSM device - a new private key should be stored under a different label, possibly a new device. | ||
This extreme level of isolation is necessary to prevent an accidental rewrite of the old root of trust. | ||
|
||
Once this command completes successfully, a root of trust renewal process is started.`, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
kind of a nitpick, but I think you are doing newlines by sentence. Cobra doesn't handle smart line breaks or anything, so it would probably render better if you just hard wrap a column 72 or 80.
factory := viper.GetString("factory") | ||
newCertsDir := args[0] | ||
oldCertsDir := args[1] | ||
if oldCertsDir == newCertsDir { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably should normalize these paths before doing this check so you don't get messed up by something like start ./ $(pwd)
1. First, it generates a new root of trust for your factory. | ||
2. Second, it cross-signs a new root of trust using an old root of trust to prepare a CA renewal bundle. | ||
This CA renewal bundle is compliant with the EST standard, necessary for a secure root CA update on devices. | ||
3. Finally, it uploads the CA renewal bundle to the backend API for validation and storage. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure where to ask in the PR, but I'm wondering if there's going to wind up being an easy way to check "if in progress" and also make sure two different people don't try this at the same time?
|
||
if len(group) > 0 { | ||
logrus.Debugf("Renewing device root CA for devices in group %s", group) | ||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we've decided we should stop adding factory wide config operations like this - they are too dangerous.
This is incomplete and not yet fully tested amid API changes not ready too.
@mike-sul @doanac @StealthyCoder @camilamacedo86 @detsch I'd like to start an internal team discussion on this first.
Please, pay attention to the UI flow, and not the implementation details in the first iteration.
While working on this I've completely rewritten the commands at least twice, and I am not happy with the current flow yet.
A reason is: it is hard to find a proper trade off between the user control and UI simplicity: