WIP: Client refactor #1291
Thanks, this is a great start. There is already a discussion on Slack about how to continue (longer-living branch or PRs or what), so I'll just leave some thoughts on where we should try to move next. I think this is the right time to try splitting the Updater functionality into meaningful components -- the Metadata API does a good bit of that already but I think Updater is still going to be too large and should be split if possible (and I think it is possible, also tuf-on-a-plane has some useful ideas). Listing some Updater functionality below for inspiration:
To me, tracking the currently valid set of metadata is one potentially separate thing here: an object could hold the currently valid set of metadata (and verify new metadata and maybe handle reading/writing the local metadata cache) without knowing about all the steps that the TUF spec requires or about the API that Updater wants to provide. Maybe this leads to most code being in the metadata handler but this might be worth testing -- it's also something a bit like what @trishankatdatadog did in tuf-on-a-plane (Repository vs models, writers, readers).
On download.py: maybe download helper methods are still useful but I don't think you should keep using download.py as it is. What might make sense is a redesign of mirrors.py + download.py where a "MirrorDownloader" knows about the fetcher, knows about all the mirrors and gives the Updater an easy way to download metadata and targets without worrying about the details?
On MetadataWrapper: I think you should start aggressively filing issues for any individual functionality that you think should be in Metadata so we can discuss them one by one... unless there's a wider discussion that you think should be had about it? -- in any case I think a separate bug works in that case too.
On public API changes: I think no one has talked about major changes to the client API, so I think fine-tuning (if needed) can wait until we have the architecture looking reasonable? The only major change I can imagine is cleaning up the mirrors configuration somehow, but even there I have no practical ideas.
Reading local metadata vs downloading metadata: currently these are intermingled in the update process, and I'm not sure they should be. This also relates to the API: refresh() can be called multiple times (even though it's not something e.g. pip intends to do), and it seems odd to reload the local files every time. On the other hand, the delegated target metadata must anyway be loaded only when needed... so maybe my point isn't that well made.
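To make the "MirrorDownloader" idea above a bit more concrete, here is a minimal sketch. Everything in it is an assumption for discussion, not existing TUF code: the class and method names, the `fetch(url, max_length)` callable standing in for the fetcher, the `DownloadError` exception and the `metadata`/`targets` path prefixes are all hypothetical.

```python
from typing import Callable, Iterable, List


class DownloadError(Exception):
    """Hypothetical error: no mirror could provide the requested file."""


class MirrorDownloader:
    """Hypothetical helper that hides mirror and fetcher details from Updater."""

    def __init__(
        self,
        fetch: Callable[[str, int], bytes],  # stand-in for the fetcher
        mirrors: Iterable[str],
        metadata_path: str = "metadata",  # assumed layout on each mirror
        targets_path: str = "targets",  # assumed layout on each mirror
    ) -> None:
        self._fetch = fetch
        self._mirrors: List[str] = [m.rstrip("/") for m in mirrors]
        self._metadata_path = metadata_path
        self._targets_path = targets_path

    def _download(self, prefix: str, name: str, max_length: int) -> bytes:
        """Try each mirror in order and return the first acceptable result."""
        errors: List[Exception] = []
        for mirror in self._mirrors:
            url = f"{mirror}/{prefix}/{name}"
            try:
                data = self._fetch(url, max_length)
            except Exception as e:  # a real implementation would be narrower
                errors.append(e)
                continue
            if len(data) > max_length:
                errors.append(DownloadError(f"{url}: more than {max_length} bytes"))
                continue
            return data
        raise DownloadError(f"all mirrors failed for {name}: {errors}")

    def download_metadata(self, rolename: str, max_length: int) -> bytes:
        return self._download(self._metadata_path, f"{rolename}.json", max_length)

    def download_target(self, target_path: str, max_length: int) -> bytes:
        return self._download(self._targets_path, target_path, max_length)
```

With something like this, Updater would only call `download_metadata()` and `download_target()`; which mirror served the bytes, and how they were fetched, stays an internal detail.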
Some random comments from reading the code:
Big thanks for pioneering this, @sechkova, and for your great high-level review, @jku! Let me add some quick thoughts to the latter:
💯 Let's once more stress the importance of the word reference in reference implementation. :)
I agree with @jku that it would be good to implement this separately (i.e. in a re-usable way). We will need similar VCS-like functionality for a repository tool too. Although, unlike the repository tool, the client does not need to worry about differences between metadata in memory and on disk within a given version number. (related issues: #955, #958, #964)
Note that simple read/write to file storage is implemented on the Metadata class see
Will need to take a closer look at @sechkova's diff in order to comment.
I agree that we should evaluate whether there is an architectural need for a client-side MetadataWrapper, or if these things could be on the Metadata class. -- So much for now. Will comment more after having taken a closer look.
Had a closer look, and here are more comments. :) (Note: they are not related to the lines where I posted them, only to the modules).
tuf/download.py
@@ -195,42 +195,42 @@ def _download_file(url, required_length, fetcher, STRICT_REQUIRED_LENGTH=True):
  average_download_speed = 0
  number_of_bytes_received = 0
I seem to recall that there has been discussion about `download.py`, where it should live, why it shouldn't be in the client subdirectory, or even part of `updater.py`, etc... I can't remember details. Can someone else? Maybe from the #1250 team?
Also, I suggest we get rid of `safe_download` and `unsafe_download`. The names are unspecific and the latter sounds scarier than it is. AFAICS they only add an unnecessary level to the call stack.
Maybe we can even merge `_download_file` and `_check_downloaded_length`. The latter seems to be mostly docstring and logging statements.
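As a rough illustration of the suggested merge, the sketch below folds the length check into the download loop itself. It is not the current `download.py` code: `download_file`, the `chunks` iterable (a stand-in for whatever the fetcher yields), the `strict` flag and the local exception class are assumptions made for this example.

```python
from typing import Iterable


class DownloadLengthMismatchError(Exception):
    """Local stand-in exception used only in this sketch."""


def download_file(
    chunks: Iterable[bytes], required_length: int, strict: bool = True
) -> bytes:
    """Collect chunks and check the length in one place.

    In this sketch 'required_length' is always an upper bound; with
    strict=True the total must also match it exactly.
    """
    received = bytearray()
    for chunk in chunks:
        received.extend(chunk)
        # Fail early instead of reading an arbitrarily large response.
        if len(received) > required_length:
            raise DownloadLengthMismatchError(
                f"received more than {required_length} bytes"
            )

    if strict and len(received) != required_length:
        raise DownloadLengthMismatchError(
            f"expected {required_length} bytes, got {len(received)}"
        )
    return bytes(received)
```

A single function like this would also leave nothing for separate `safe_download`/`unsafe_download` wrappers to do beyond choosing the value of `strict`.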
> I seem to recall that there has been discussion about `download.py`, where it should live, why it shouldn't be in the client subdirectory, or even part of `updater.py`, etc... I can't remember details. Can someone else? Maybe from the #1250 team?

This is at least one of the places it has been mentioned: #1250 (comment)

> Also, I suggest we get rid of `safe_download` and `unsafe_download`. The names are unspecific and the latter sounds scarier than it is. AFAICS they only add an unnecessary level to the call stack.
>
> Maybe we can even merge `_download_file` and `_check_downloaded_length`. The latter seems to be mostly docstring and logging statements.

I think we are all on the same page about totally reworking download.py + mirrors.py.
# Copyright 2020, New York University and the TUF contributors
# SPDX-License-Identifier: MIT OR Apache-2.0

"""TUF client 1.0.0 draft
Cool stuff, @sechkova! Is this as feature-complete as the existing `updater.py`? If so, nice job condensing it to 1/3 of LOC! :)
Some high-level comments:
- `_mirror_meta_download` and `_mirror_target_download` look a lot alike. Do we need both?
- Same goes for `_get_full_meta_name` and `_get_relative_meta_name`. Maybe the former could be a wrapper around the latter?
- The 5 top-level functions at the bottom of the module feel a bit lost. Should we add a client.util module? Or make them part of the `Updater` class, e.g. as `staticmethod`s?
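To illustrate the last two points, a small sketch of one name helper wrapping the other, with the stateless one declared as a `staticmethod`. The class name, the signatures and the `<version>.<role>.json` naming are assumptions for the example and are not taken from the diff.

```python
import os
from typing import Optional


class UpdaterSketch:
    """Hypothetical stand-in for Updater, showing only the name helpers."""

    def __init__(self, metadata_dir: str) -> None:
        self._metadata_dir = metadata_dir

    @staticmethod
    def get_relative_meta_name(
        role: str, extension: str = ".json", version: Optional[int] = None
    ) -> str:
        """Build the file name only; needs no instance state."""
        if version is None:
            return f"{role}{extension}"
        return f"{version}.{role}{extension}"

    def get_full_meta_name(
        self, role: str, extension: str = ".json", version: Optional[int] = None
    ) -> str:
        """Thin wrapper: relative name joined onto the local metadata dir."""
        relative = self.get_relative_meta_name(role, extension, version)
        return os.path.join(self._metadata_dir, relative)
```

Marking the first helper as a `staticmethod` documents that it depends on nothing but its arguments, which also makes it trivial to move to a util module later.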
> Cool stuff, @sechkova! Is this as feature-complete as the existing `updater.py`? If so, nice job condensing it to 1/3 of LOC! :)

I believe it implements the same features as the existing updater, but it most likely fails to address some corner cases. Anyway, in terms of volume I think this will be approximately the end result :)

> Some high-level comments:
> - `_mirror_meta_download` and `_mirror_target_download` look a lot alike. Do we need both?

I consider them part of the "totally rework download and mirrors" plan and I hope to see them gone!

> - Same goes for `_get_full_meta_name` and `_get_relative_meta_name`. Maybe the former could be a wrapper around the latter?
> - The 5 top-level functions at the bottom of the module feel a bit lost. Should we add a client.util module? Or make them part of the `Updater` class, e.g. as `staticmethod`s?

Yes, you are right, `staticmethod` would have helped describe them better. They are actually a copy-paste transfer of the old code, which I needed to make a full update cycle work. They need a better place (same as the `_get_*_name` methods).
# SPDX-License-Identifier: MIT OR Apache-2.0

"""Metadata wrapper
"""
I find it a bit paradoxical that `MetadataWrapper` is another container on top of `Metadata`, adding another level of nesting, when most of the methods/properties are actually shortcuts to attributes contained somewhere in `Metadata`. If we want a client variant of `Metadata`, I'd maybe use inheritance instead of composition. 🤷
But maybe we don't even need it. Let's quickly go through the individual methods/properties here:
On `MetadataWrapper`:
- `from_json_file` -> available as `Metadata.from_file` with json as default (Move metadata class model de/serialization to sub-package #1279)
- `from_json_object` -> could be a format-agnostic method on `Metadata` just like `from_file`.
- `persist` -> available as `Metadata.to_file` with json as default (Move metadata class model de/serialization to sub-package #1279)
- `expires` -> could be a method on `Signed`, I'd call it 'is_expired'
- `verify` -> This is definitely needed for the repository tool as well. So it should be somewhere where both client and repository tool can access it. Maybe even on `Metadata` in addition to the vanilla one-signature `verify` we currently have. See A simple, per-file metadata CRUD API #1060 (comment) for thoughts about that distinction.
All the other methods here are either pure shortcuts or filters of attributes contained in Metadata objects (or contained objects).
- Re shortcuts: https://github.com/theupdateframework/tuf/pull/1060/files#r452906629 has arguments against shortcuts.
- Re filters: Maybe it will become easier to access individual properties once we have classes for all complex attributes as planned in Add classes for complex metadata fields #1139? But similarly to the arguments against shortcuts, I'd be careful when implementing behavior that relates to contained classes on container classes.
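For the `expires` and `verify` items above, a minimal sketch of what an `is_expired` method and a threshold check could look like. This is not the Metadata API: `SignedSketch`, `verify_threshold` and the `verify_one(keyid, signature)` callable (standing in for the existing one-signature verification) are hypothetical names used only for illustration.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable, Dict, Iterable, Optional


class SignedSketch:
    """Hypothetical stand-in for Signed, holding only the expiry time."""

    def __init__(self, expires: datetime) -> None:
        self.expires = expires  # assumed to be a timezone-aware datetime

    def is_expired(self, reference_time: Optional[datetime] = None) -> bool:
        """Return True if 'expires' is not later than the reference time."""
        if reference_time is None:
            reference_time = datetime.now(timezone.utc)
        return reference_time >= self.expires


def verify_threshold(
    signatures: Dict[str, bytes],  # keyid -> signature
    authorized_keyids: Iterable[str],
    threshold: int,
    verify_one: Callable[[str, bytes], bool],  # stand-in single-signature check
) -> bool:
    """Count unique authorized keys with a valid signature against a threshold."""
    authorized = set(authorized_keyids)
    valid_keyids = {
        keyid
        for keyid, signature in signatures.items()
        if keyid in authorized and verify_one(keyid, signature)
    }
    return len(valid_keyids) >= threshold
```

Passing the reference time explicitly keeps `is_expired` usable with a fixed update start time, and neither function needs anything client-specific, so both client and repository tool could share them.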
Some really nice work happening here, thank you all! On a more procedural point, we agreed that we want to do this experimental client in a separate branch (
I'm not opposed to merging this PR as is. The only disadvantage I can see is that it will close this discussion board. So let's make sure to capture all relevant items in new or existing issues first. @sechkova, would you be willing to do that?
Agreed, let me work on this for a couple of days and I will notify you when I think it is ready to be merged.
Rebased on the latest changes to include the new metadata serialization.
I opened the following list of issues to capture the discussion, most of them are marked as
Let me know if you disagree or you find something missing. Otherwise I think we can merge this PR and continue working on the list.
That's a healthy list of issues, thanks @sechkova. I'd like CI to pass for PRs against the experimental-client branch before we merge pull requests. I think it would be reasonable to add a temporary revertible commit which prevents coverage failing on the new client, either by ignoring it or by lowering coveralls' With that change, we should be able to see genuine problems in PRs to the experimental-client branch, such as the fact that test_refresh is failing on Windows. Let's fix this before we land this PR:
e5c9502 to fa4065f
Temporarily reduced the coverage tool failure boundary to 90%, as suggested, in order to uncover other test failures. The Windows tests were failing due to DOS-style line ending conversion in the test metadata files, which in turn led to a file size not matching the one defined in the metadata. This is fixed by adding a Now with tests passing, probably I should do something with the linter too or at least temporarily ignore the new files.
Looks like ~95% of the warnings are "bad-indentation". I suggest to run It might be worth rebasing on top of #1314 once it's merged and configuring linting and auto-formatting for your new files akin to how we do for new files in
Applied your suggestions and now Fixed most of the linter issues but still had to temporarily disable some checks that required less trivial changes. I suggest fixing them when working on the related issues (see last two commits).
Opened another issue to track "temporary" commits: #1320
182ed12 to 0172753
Everything I could think of is now filed as an issue, will merge this (to experimental-client) today if there are no more objections.
A proposal of a new Updater.refresh() implementation:
- based on metadata API
- no longer dependent on keydb/roledb
- follows the TUF specification's client workflow
Introduces a MetadataWrapper class with the goal of providing functionality which is at this point missing in metadata API.
Signed-off-by: Teodora Sechkova <[email protected]>
Mostly a transfer of the current client code related to the actual target files download. Needs to be further reworked. Signed-off-by: Teodora Sechkova <[email protected]>
Adds a basic test case for Updater. Applies the linter config used in api/metadata.py to all files under client_rework. Signed-off-by: Teodora Sechkova <[email protected]>
Coverage failures may hide other failing tests in the CI. Configure coverage to fail under 90 percent during the ongoing experimental-client development. Signed-off-by: Teodora Sechkova <[email protected]>
For compatibility with Windows systems, declare repository_data files to always have LF line endings on checkout. A trailing "/**" matches everything inside, with infinite depth. Signed-off-by: Teodora Sechkova <[email protected]>
Apply the updated api/pylintrc config to the client_rework directory. Signed-off-by: Teodora Sechkova <[email protected]>
Run manually the black and isort code formatters over the client_rework code. Signed-off-by: Teodora Sechkova <[email protected]>
Configure tox to run black and isort over the files under client_rework directory. Signed-off-by: Teodora Sechkova <[email protected]>
Fix linter issues after applying the api/pylintrc config over the client_rework/* code. Signed-off-by: Teodora Sechkova <[email protected]>
Temporary disable (inline) try-except-raise and broad-except warnings in the new Updater code until client exception handling is revised (theupdateframework#1312). Signed-off-by: Teodora Sechkova <[email protected]>
Temporary disable (inline) undefined-loop-variable pylint checks in the new Updater code until the download functionality is revised (theupdateframework#1307). Signed-off-by: Teodora Sechkova <[email protected]>
The commits are only reworded, no code changes.
Description of the changes being introduced by the pull request:
This is an in-progress draft for the client code refactor.
Proposes a new `Updater.refresh()` implementation:
- based on `metadata API`
- no longer dependent on `keydb/roledb`
- follows the TUF specification's client workflow
- reusing the network abstraction work #1250
Introduces a `MetadataWrapper` class with the goal of providing functionality which is at this point missing in metadata API.
I imagine this code would find a proper place with the advancement of the refactor.
The code related to the actual target files download is mostly a transfer from the old code. Needs to be further reworked.
A non-extensive list of next steps:
- `Updater` methods and exceptions: Design client library #1135
- `get_one_valid_targetinfo`, `updated_targets`, `download_target`, `_preorder_depth_first_walk` etc.): Split Updater into logical components: bundle metadata #1308
- `MetadataWrapper` functionality: metadata api, keys api, ???: Remove `MetadataWrapper` class #1304, Add 'expired' method to Signed class #1305, Implement verification by a threshold of keys #1306
- `download.py`, `mirrors.py` together with the new `fetcher.py`: client/updater design: mirror config redesign #1143, Split Updater into logical components: redesign mirrors.py and download.py #1307
Please verify and check that the pull request fulfills the following requirements: