-
Notifications
You must be signed in to change notification settings - Fork 316
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(NODE-1474): refine config tool and add config versioning #2299
base: master
Are you sure you want to change the base?
Conversation
}; | ||
|
||
// Assemble GuestOSConfig | ||
let guestos_config = GuestOSConfig { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider creating a function which returns Result, testing would be easier.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
generate_testnet_config does return a Result. Are you saying to actually return the file path or the contents written to the file rather than just ()?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I missed the whole point in my comment, what I meant was a function that returns Result. The tests have to initialize a NamedTempFile (and we can't even nicely test the contents without parsing the output file). If you split the implementation into two, it'd be easier to test the contents of the created GuestOSConfig.
rs/ic_os/config/src/lib.rs
Outdated
"elasticsearch_tags": "tag1 tag2" | ||
}, | ||
"nns_public_key_exists": true, | ||
"nns_urls": [ | ||
"http://localhost" | ||
], | ||
"node_operator_private_key_exists": true, | ||
"use_ssh_authorized_keys": false, | ||
"icos_dev_settings": {} | ||
}, | ||
"hostos_settings": { | ||
"vm_memory": 490, | ||
"vm_cpu": "kvm", | ||
"verbose": false | ||
}, | ||
"guestos_settings": { | ||
"inject_ic_crypto": false, | ||
"inject_ic_state": false, | ||
"inject_ic_registry_local_store": false, | ||
"guestos_dev_settings": { | ||
"backup_spool": { | ||
"backup_retention_time_seconds": 3600, | ||
"backup_purging_interval_seconds": 600 | ||
}, | ||
"malicious_behavior": null, | ||
"query_stats_epoch_length": 1000, | ||
"bitcoind_addr": "127.0.0.1:8333", | ||
"jaeger_addr": "127.0.0.1:6831", | ||
"socks_proxy": "127.0.0.1:1080", | ||
"hostname": "my-node" | ||
} | ||
} | ||
} | ||
"#; | ||
|
||
const GUESTOS_CONFIG_JSON_V1_0_0: &str = r#" | ||
{ | ||
"network_settings": { | ||
"ipv6_config": { | ||
"Fixed": { | ||
"address": "2a00:fb01:400:200::2/64", | ||
"gateway": "2a00:fb01:400:200::1" | ||
} | ||
}, | ||
"ipv4_config": null | ||
}, | ||
"icos_settings": { | ||
"config_version": "1.0.0", | ||
"mgmt_mac": "ec:2a:72:31:a2:0c", | ||
"deployment_environment": "Mainnet", | ||
"logging": { | ||
"elasticsearch_hosts": "elasticsearch-node-0.mercury.dfinity.systems:443", | ||
"elasticsearch_tags": "tag1 tag2" | ||
}, | ||
"nns_public_key_exists": true, | ||
"nns_urls": [ | ||
"http://localhost" | ||
], | ||
"node_operator_private_key_exists": true, | ||
"use_ssh_authorized_keys": false, | ||
"icos_dev_settings": {} | ||
}, | ||
"guestos_settings": { | ||
"inject_ic_crypto": true, | ||
"inject_ic_state": true, | ||
"inject_ic_registry_local_store": true, | ||
"guestos_dev_settings": { | ||
"backup_spool": { | ||
"backup_retention_time_seconds": 7200, | ||
"backup_purging_interval_seconds": 1200 | ||
}, | ||
"malicious_behavior": null, | ||
"query_stats_epoch_length": 2000, | ||
"bitcoind_addr": "127.0.0.1:8333", | ||
"jaeger_addr": "127.0.0.1:6831", | ||
"socks_proxy": "127.0.0.1:1080", | ||
"hostname": "guest-node" | ||
} | ||
} | ||
} | ||
"#; | ||
|
||
#[test] | ||
fn test_deserialize_hostos_config_v1_0_0() -> Result<(), Box<dyn std::error::Error>> { | ||
let config: HostOSConfig = serde_json::from_str(HOSTOS_CONFIG_JSON_V1_0_0)?; | ||
assert_eq!(config.icos_settings.config_version, "1.0.0"); | ||
assert_eq!(config.hostos_settings.vm_cpu, "kvm"); | ||
Ok(()) | ||
} | ||
|
||
#[test] | ||
fn test_deserialize_guestos_config_v1_0_0() -> Result<(), Box<dyn std::error::Error>> { | ||
let config: GuestOSConfig = serde_json::from_str(GUESTOS_CONFIG_JSON_V1_0_0)?; | ||
assert_eq!(config.icos_settings.config_version, "1.0.0"); | ||
assert_eq!( | ||
config.icos_settings.mgmt_mac.to_string(), | ||
"ec:2a:72:31:a2:0c" | ||
); | ||
Ok(()) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The simplest solution is to just maintain config backwards compatibility via unit tests (and system tests once we have those going).
When someone updates config, they will increment the CONFIG_VERSION field and add deserialization unit tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about it a little more and I have a few concerns with this approach:
- When would the version be bumped? After every change (addition, removal)?
- What happens if a type changes which is defined outside types.rs?
- How would the unit tests ensure that an older version of the code can read a newer version of the json?
- Is the plan to add a json constant every time a new version is introduced?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When would the version be bumped? After every change (addition, removal)?
That's what I was thinking, yes:
//! - **Updating CONFIG_VERSION**: Always update the CONFIG_VERSION constant (increment the minor version) whenever you modify the configuration.
What happens if a type changes which is defined outside types.rs?
That's a great point. The types defined outside of types.rs:
- malicous_behavior (this is only used in testing, so I don't think it's an issue. It should always be null in production)
- FormattedMacAddress (FormattedMacAddress is just a String wrapper, but even if it were to change, the unit tests would catch any deserialization issues that arise with the old config)
pub struct FormattedMacAddress(String);
How would the unit tests ensure that an older version of the code can read a newer version of the json?
The deserialization code is updated on HostOS and GuestOS updates, so this shouldn't be an issue. Therefore, each version of deserialize_config only needs backwards compatibility, not forwards compatibility, in case we did want to change deserialize_config.
Is the plan to add a json constant every time a new version is introduced?
That was what I was thinking, yes. If you're introducing a new version, add a GUESTOS_CONFIG_JSON_V1_1_0 and HOSTOS_CONFIG_JSON_V1_1_0 constant and a unit test for each.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
Therefore, each version of deserialize_config only needs backwards compatibility, not forwards compatibility
Are you sure? What if we deploy a newer HostOS and then retrograde the GuestOS? Then an older GuestOS would need to understand a config "from the future".
For example, say we have an ip_addr
field which we rename to ip_address
at some point. Even if we implement reading from both ip_addr
and ip_address
in the newer version to be backwards compatible, we'd still have problems if serialize the config using the newer field name (ip_address
) which the older GuestOS would not understand. Or am I missing something?
add a GUESTOS_CONFIG_JSON_V1_1_0 and HOSTOS_CONFIG_JSON_V1_1_0 constant and a unit test for each.
What does the unit test check? Just that deserialization is successful or also that all fields have been preserved? In the latter case, the unit tests would be huge.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure? What if we deploy a newer HostOS and then retrograde the GuestOS? Then an older GuestOS would need to understand a config "from the future".
Ah, yes, you're right. I'll add a note capturing this and emphasize that deserialize_config should only be updated when necessary (and it's very short so I doubt it will be) and that if it is updated, to not do it in tandem with config updates. Are we comfortable with this risk?
What does the unit test check? Just that deserialization is successful or also that all fields have been preserved?
Just that deserialization is successful. Does this seem sufficient?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we comfortable with this risk?
I'm concerned about the following situation (lmk if I'm wrong): we have a logging
field in ICOSSettings
. One day someone comes and says "hmm we don't need to configure logging anymore" and they remove the logging
field while bumping the version and adding a unittest. The change roles out to a newly installed HostOS and the machine is added to a subnet which runs a GuestOS version without the change. This older version will error out / crash while parsing the config because of the missing logging
field and we may not even be able to easily reproduce the problem.
Just that deserialization is successful. Does this seem sufficient?
I see, so it protects us from adding a new required field without a default value. Makes sense. We may want to consider marking every field as optional or defaulted (which is what protocol buffers do). That would address my concern above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm concerned about the following situation...
Arff. You're absolutely right. So our options for handling this:
- Set policy: you cannot remove a non-optional/non-default field unless you are absolutely sure it's no longer in use anywhere on mainnet (and sure that we won't rollback to a version that uses it). ANALYSIS: I actually think this is manageable. I suspect we won't be removing a lot of fields. And the impact of hitting issue is not catastrophic. As you just outlined, it's a downgrade issue impacting freshly-deployed nodes. Removing fields should never introduce an upgrade compatibility risk.
- Make every field optional or have a default value. ANALYSIS: I don't love this. What do we do for ipv6_config? This is not optional, and we don't want default values for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unless you are absolutely sure it's no longer in use anywhere on mainnet
When removing a field, we'd need to remove the field from the guestos deserialization (while keeping it in the hostos), wait until all guests have been updated and only then remove the field.
This is not optional, and we don't want default values for it.
Fair, if we want to refactor it in the future, we'll have to keep populating the old field as well.
I believe the correct policy is to say that required fields must not be removed directly. In a first step, they have to be made optional and code that reads the value must be removed/handle missing values. In a second step, after the first step has rolled out to all guestos-s, the field can be removed. We need to keep a list of removed field names to ensure they're never reused.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In addition, it'd also be a good policy that we require a thorough review process on these files to ensure that changes are long-term and correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good! I've updated the Configuration Update Protocol:
//! - Removing Fields: To prevent backwards-compatibility deserialization errors, required fields must not be removed directly:
//! In a first step, they have to be made optional and code that reads the value must be removed/handle missing values.
//! In a second step, after the first step has rolled out to all OSes and there is no risk of a rollback, the field can be removed.
Regarding this:
We need to keep a list of removed field names to ensure they're never reused.
I actually believe the unit tests should handle this. Consider field X is a String in unit tests. Then it's removed from config, and later, a new field named X is added that's an integer. When this new field X is added to the config, the old unit test will fail as it's unable to deserialize the String as an int.
In addition, it'd also be a good policy that we require a thorough review process on these files to ensure that changes are long-term and correct.
I agree. Is there any way to formalize this? I added to the config readme:
Any changes to the configuration should undergo a thorough review process to ensure they follow the guidlines.
..Default::default() | ||
}; | ||
|
||
let result = generate_testnet_config(args, PathBuf::from("/tmp/guestos_config.json")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not 100% optimal to do I/O during a unit test (unless it's the I/O that we're testing) but it's also not too bad. Optimally, we'd have a function which returns Result instead of Result<()> which we's used for unit testing. But I'm fine with the current approach as well because it's simpler and we can always refactor the code in the future.
Changes: