Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Embiggen the goodness of maghemite #385

Open
wants to merge 14 commits into
base: main
Choose a base branch
from
Open

Conversation

taspelund
Copy link
Contributor

@taspelund taspelund commented Oct 3, 2024

This addresses a few different items found in Maghemite.

  1. remove_prefix_path() did not delete the prefix key from rib_in after the last path was removed.
    Fixes: Static route key retained despite deleting all paths #369

  2. remove_prefix_path() was removing all paths for a prefix that happened to share a nexthop. This is bad because a single call to static_remove_v4_route() would delete all paths with that nexthop even when fields like rib_priority didn't match. This would likely also allow the deletion of a floating static route to inadvertently clobber a BGP route via the same next-hop.

  3. apply_update() and send_update() were incorrectly applying the relevant ImportExportPolicy to the list of withdrawn routes. Filtering route withdrawals causes stale routing information, as we can end up ignoring a legitimate withdrawal from a peer or we could forget to inform a peer that we're no longer originating a route (e.g. if we first set an export policy matching N routes in an announce-set, then removed a route from that announce-set). In either case, this is incorrect behavior that can lead to forwarding issues.
    Fixes: When Export Policy is configured, withdrawn NLRI can only ever contain routes allowed by Export Policy #330

@taspelund taspelund added Bug mgd Maghemite daemon labels Oct 3, 2024
@taspelund taspelund self-assigned this Oct 3, 2024
@taspelund
Copy link
Contributor Author

taspelund commented Oct 3, 2024

rib_in key behavior before:

➜  maghemite git:(f5dbb6a) ./target/debug/mgadm static get-v4-routes
{
    "1.1.1.1/32": [
        Path {
            bgp: None,
            local_pref: None,
            nexthop: 100.64.0.1,
            shutdown: false,
            vlan_id: None,
        },
    ],
}
➜  maghemite git:(f5dbb6a) ./target/debug/mgadm static remove-v4-routes 1.1.1.1/32 100.64.0.1
➜  maghemite git:(f5dbb6a) ./target/debug/mgadm static get-v4-routes
{
    "1.1.1.1/32": [],
}

rib_in key behavior after:

➜  maghemite git:(trey/cleanup_static_route_key) ./target/debug/mgadm static get-v4-routes
{
    "1.1.1.1/32": [
        Path {
            bgp: None,
            nexthop: 100.64.0.0,
            rib_priority: 2,
            shutdown: false,
            vlan_id: None,
        },
        Path {
            bgp: None,
            nexthop: 100.64.0.0,
            rib_priority: 10,
            shutdown: false,
            vlan_id: None,
        },
    ],
}
➜  maghemite git:(trey/cleanup_static_route_key) ✗ ./target/debug/mgadm static remove-v4-routes 1.1.1.1/32 100.64.0.0
➜  maghemite git:(trey/cleanup_static_route_key) ✗ ./target/debug/mgadm static get-v4-routes
{}

@taspelund taspelund marked this pull request as ready for review October 3, 2024 22:25
@taspelund taspelund marked this pull request as draft October 4, 2024 07:50
@taspelund taspelund changed the title Cleanup key in rib_in when all paths are removed Embiggen the goodness of maghemite Oct 4, 2024
@taspelund taspelund added bgp Border Gateway Protocol static Static Routing labels Oct 4, 2024
@taspelund taspelund marked this pull request as ready for review October 4, 2024 08:19
Copy link

@zeeshanlakhani zeeshanlakhani left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems fine to me, but can we add a test demonstrating the example you showed for removing the prefix from the rib?

Import/Export filters are meant to modify which advertised prefixes
are allowed. For Import, this is simply an allow-list that accepts a
subset of the advertised nlri in a received update. For Export, this
is an allow-list that accepts a subset of the locally originated nlri.
In neither case do you want to apply these filters to the list of
withdrawn nlri, as this can result in stale routes if a legitimate
withdrawal is not sent or received.

Fixes: #330

Signed-off-by: Trey Aspelund <[email protected]>
Removes an unused function.
Guards an illumos-specific import.
Consolidates (bfd) nexthop enabled functions into one that takes a bool.
Moves RIB locking and looping of prefixes and PrefixChangeNotifications
into RIB removal helper functions to improve pcn batching and
consolidate locations for future RIB batch work.
Removes re-processing of BGP path attributes.
Removes re-looping over routes/paths in a few places.
Eliminates some return types when no callers handled Result.
Adds some TODOs.

Signed-off-by: Trey Aspelund <[email protected]>
@taspelund taspelund force-pushed the trey/cleanup_route_key branch 2 times, most recently from 334fbdb to eb07d9d Compare October 20, 2024 08:29
@taspelund
Copy link
Contributor Author

taspelund commented Oct 20, 2024

I added two more commits + rebased off main.

The second commit adds the requested tests, plus additional unit testing for rdb::Db methods.

The first commit is a refactor of various rdb::Db methods and their callers. Highlights below:

  • Rename of various add/remove methods + helper functions, ideally helping to keep naming more consistent (e.g. using "peer" instead of "id", adding "bgp"/"static" to method names)
  • In the add/remove codepaths, there is now a single PrefixChangeNotification used to batch all pcn's together instead of firing them off individually. I tried to keep the updated routes together in a collection up until the point of processing them, so it will be more clear in the future where to make changes to support bulk prefix updates.
  • Delete codepath now accepts a closure that determines what paths are removed
  • removed an unused method get_imported() and consolidated the enable_nexthop()/disable_nexthop() methods into a single method accepting a bool: set_nexthop_shutdown().
  • Changed return types from Result -> void, since callers were not actually checking the Result anyway
  • loc_rib() now returns a Rib, which aligns with full_rib(), static_rib(), and bgp_rib()
  • Moved BGP attribute -> Path parsing out of a for loop, since these are per-update not per-route
  • Guarded an illumos-specific debug import
  • Added some TODOs into the comments, primarily for bgp: router id insufficient discriminator for route-refresh stale marking #241 and to check that we cover all cases for nexthop parsing
  • Introduced get_selected_prefix_paths() to get an entry from rib_loc

Other refactoring we might consider later:

  • - Moving test helpers into a common spot (e.g. wait_for_eq macros in bgp
  • - Moving logging helpers into a common spot (e.g. all of bgp/src/log.rs)
  • - Cleaning up remaining instances of .lock().unwrap() by moving to lock!()
  • - More ergonomic conversions between messages::Prefix and rdb::Prefix types/subtypes (e.g. Prefix4)

edit: Addressed the logging helpers and updates to use lock!() in new commits

mgd/src/bgp_admin.rs was re-defining lock! identically to the one in
mg-common.  Remove this dupe definition and just import from mg-common.

Signed-off-by: Trey Aspelund <[email protected]>
Replace all old instances of .lock().unwrap() with lock!()

Signed-off-by: Trey Aspelund <[email protected]>
Signed-off-by: Trey Aspelund <[email protected]>
Actually run the same cargo clippy command as CI, so I can see
errors locally :/

Signed-off-by: Trey Aspelund <[email protected]>
Copy link
Collaborator

@rcgoodfellow rcgoodfellow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Trey! Some great improvements here. Just a few comments.

rdb/src/db.rs Show resolved Hide resolved
bgp/src/session.rs Show resolved Hide resolved
@@ -378,8 +379,13 @@ impl Db {
}
}

pub fn get_imported(&self) -> Rib {
lock!(self.rib_in).clone()
pub fn get_selected_prefix_paths(&self, prefix: &Prefix) -> Vec<Path> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this function is only used for tests, it should probably have a #[cfg(test)] attribute.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Today it's only used in tests, but it seems like this would be a useful function for exposing per-prefix details up through mgadm/nexus. I can add the attribute if you'd like, but if so then I'd probably want to add a TODO comment along with it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

get_prefix_paths() is used in the same way today, except it's being imported by #[test] functions in bgp/src/test.rs... and mod test; in bgp/src/lib.rs is also guarded by #[cfg(test)].

For whatever reason, clippy refuses to accept the import of get_prefix_paths() in bgp/src/test.rs while its definition is guarded with #[cfg(test)].

I think that's probably something that could be addressed while doing cleanup of the existing test code. If you feel that's important to address in this PR, I can do that. Otherwise I can follow up on that later.

rdb/src/db.rs Outdated Show resolved Hide resolved
}

Self::update_loc_rib(&rib, &mut lock!(self.rib_loc), prefix);
self.notify(prefix.into());
self.notify(pcn);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we may want to flush before sending out the PCN.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I moved the flush after the notification because an error can cause an early return. My thought was that a successful update to the loc_rib should trigger a PCN even if the flush of the persistent route DB fails later, since the "running state" is what other listeners are likely intending to react to.

You can correct me if my read on the situation is wrong, or if we need to give some more thought to robust handling here.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The thing I worry about here is sending out a PCN that causes a persistent DB read and the data not being flushed yet. I think there is a deeper issue here, as well as in several other methods that modify in-memory state and then write the result to disk (that were also there before this PR). We need transactional semantics for these methods, either everything succeeds or nothing does. If it's possible to run all the fallible commands that persist state up front and then run the soft state updates that don't have error conditions, that would be ideal.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably not an important detail in this context, but it doesn't look like PCNs trigger a read of the persistent DB.
mg-lower seems to be the only watcher and it reacts to PCNs by kicking off sync_prefix(), which looks at running state known in dpd (ASIC) or the loc_rib.

All that seems appropriate to me, as I don't think we want mg-lower paying attention to rib_in or state changes that don't make their way to the set of best paths. And since the persistent bits here are specific to the static route DB, I also think it makes sense to only update the running state if the persistent change succeeds.

I do agree we should have transactional semantics (especially when the config and running state overlap so closely), but I'm not sure how to go about reverting a failed operation, or what an effective way to propagate errors for a collection of operands (routes) would be.
e.g.
If flush() fails after making a file smaller (removing a route), would we expect the revert action (presumably re-adding the route back into the DB and calling flush() again) to succeed? What happens if the subsequent flush() to get us into the last known good state also fails?
The API also allows for a collection of static routes to come in through this delete path. What happens if an error happens for just a subset of those routes? Should we bail out upon first error (which I think we did prior to this PR), stop and try to revert the operations we've already done, or try to complete the operations for as many routes as we can (aggregating or transforming the errors we've gotten up until this point)?

For the revert action, I think it might behoove us to understand what guarantees flush() makes when it returns an Err. Do we have any guarantees which prior writes have/haven't made it to disk? If not, then I'm not sure what other approach to take beyond calling flush() again... which feels a bit like the definition of insanity (something something... expecting a different outcome)

rdb/src/db.rs Show resolved Hide resolved
rdb/src/db.rs Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bgp Border Gateway Protocol Bug mgd Maghemite daemon needs testing static Static Routing
Projects
None yet
3 participants