Added config for socket options for listeners #5352

Merged
merged 2 commits into from
Aug 15, 2023

tsaarni
Member

@tsaarni tsaarni commented May 10, 2023

This PR adds a new optional config field to support DSCP marking of outbound IP packets, for both IPv4 (TOS field) and IPv6 (Traffic Class field).

Fixes #4605

Signed-off-by: Tero Saarni [email protected]

Detailed description

The new configuration is added to the existing listener stanza:

Example config file:

listener:
  socket-options:
    tos: 64
    traffic-class: 64

Example ContourConfiguration CR

apiVersion: projectcontour.io/v1alpha1
kind: ContourConfiguration
metadata:
  name: contour
  namespace: projectcontour
spec:
  envoy:
    listener:
      socketOptions:
        tos: 64
        trafficClass: 64

The DSCP field occupies the 6 most significant bits of the exposed fields. However, the PR proposes exposing the socket options as they are exposed in Envoy and the socket API (the complete TOS and Traffic Class bytes). For example, to set DSCP value 16, the user needs to shift the value left by two bits, into the 6 most significant bits of the byte, and set tos and/or trafficClass to 64.
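The conversion described above can be sketched in Go as follows. This is only an illustration: the helper name `DSCPToByte` is hypothetical and is not part of this PR or of Contour.

```go
package main

import (
	"errors"
	"fmt"
)

// DSCPToByte is a hypothetical helper (not part of Contour). It shifts a
// 6-bit DSCP code point into the upper six bits of the TOS / Traffic Class
// byte. The lower two bits (used for ECN) are left as zero.
func DSCPToByte(dscp int) (int, error) {
	if dscp < 0 || dscp > 63 {
		return 0, errors.New("DSCP must be in the range 0-63")
	}
	return dscp << 2, nil
}

func main() {
	for _, dscp := range []int{16, 46} {
		v, err := DSCPToByte(dscp)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		// Prints 64 for DSCP 16 and 184 for DSCP 46.
		fmt.Printf("DSCP %d -> tos/trafficClass %d\n", dscp, v)
	}
}
```

The resulting number is what would go into the tos and/or trafficClass fields.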

The PR proposes moving the existing TCP keepalive options into a generic socket option file. If more socket options are added in the future, the new file would be the place to add them - regardless of being TCP or IP level options.

Limitations

Since the two new socket options depend on the IP version, the user needs to know in advance which one they want to set.

For example, if the worker nodes where Envoy runs have IPv6 support enabled in the kernel, and Envoy is configured to listen on an IPv6 address, then traffic-class can be set. If not, setting traffic-class will cause listener configuration to fail: Envoy calls setsockopt() to set an IPv6 socket option on a socket that does not support IPv6, the syscall returns an error, and Envoy rejects the whole listener configuration. Contour cannot validate in advance whether this syscall will succeed on the nodes where Envoy runs, so unfortunately this will be a runtime failure.
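To illustrate why the two fields are tied to the IP version, here is a rough sketch of how each field corresponds to a separate setsockopt() level/name pair. The `SockOpt` type and `BuildSockOpts` function are hypothetical and are not the code in this PR. The numeric constants are the Linux values, and treating zero as "not set" is an assumption made for this example.

```go
package main

import "fmt"

// Linux values of the socket level and option name constants.
const (
	ipprotoIP   = 0  // IPPROTO_IP
	ipTOS       = 1  // IP_TOS
	ipprotoIPv6 = 41 // IPPROTO_IPV6
	ipv6TClass  = 67 // IPV6_TCLASS
)

// SockOpt is a hypothetical description of one setsockopt() call.
type SockOpt struct {
	Level int
	Name  int
	Value int
}

// BuildSockOpts returns one entry per configured field.
// Assumption for this sketch: a value of zero means "not set".
func BuildSockOpts(tos, trafficClass int) []SockOpt {
	var opts []SockOpt
	if tos != 0 {
		// IPv4 option: only meaningful on sockets that carry IPv4 traffic.
		opts = append(opts, SockOpt{Level: ipprotoIP, Name: ipTOS, Value: tos})
	}
	if trafficClass != 0 {
		// IPv6 option: setsockopt() fails on a socket without IPv6 support.
		opts = append(opts, SockOpt{Level: ipprotoIPv6, Name: ipv6TClass, Value: trafficClass})
	}
	return opts
}

func main() {
	for _, o := range BuildSockOpts(64, 64) {
		fmt.Printf("setsockopt(level=%d, name=%d, value=%d)\n", o.Level, o.Name, o.Value)
	}
}
```

Because the second entry is an IPv6-level option, it is the one that triggers the runtime failure when the listener socket is not an IPv6 socket.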

The documentation for the config has a note about this limitation.

@tsaarni tsaarni requested a review from a team as a code owner May 10, 2023 15:15
@tsaarni tsaarni requested review from stevesloka and skriss and removed request for a team May 10, 2023 15:15
@tsaarni tsaarni added the release-note/small A small change that needs one line of explanation in the release notes. label May 10, 2023
@tsaarni tsaarni added release-note/small A small change that needs one line of explanation in the release notes. and removed release-note/small A small change that needs one line of explanation in the release notes. labels May 10, 2023
@codecov
codecov bot commented May 10, 2023

Codecov Report

Merging #5352 (f616af2) into main (aa18b98) will increase coverage by 0.03%.
The diff coverage is 94.11%.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #5352      +/-   ##
==========================================
+ Coverage   78.56%   78.59%   +0.03%     
==========================================
  Files         138      138              
  Lines       19101    19150      +49     
==========================================
+ Hits        15006    15051      +45     
- Misses       3809     3812       +3     
- Partials      286      287       +1     
Files Changed                           Coverage Δ
cmd/contour/serve.go                    20.26% <0.00%> (-0.03%) ⬇️
pkg/config/parameters.go                85.80% <72.72%> (-0.46%) ⬇️
cmd/contour/servecontext.go             83.55% <100.00%> (+0.14%) ⬆️
internal/envoy/v3/listener.go           98.48% <100.00%> (ø)
internal/envoy/v3/socket_options.go     100.00% <100.00%> (ø)
internal/envoy/v3/stats.go              100.00% <100.00%> (ø)
internal/featuretests/v3/envoy.go       99.06% <100.00%> (ø)
internal/xdscache/v3/listener.go        92.05% <100.00%> (+0.15%) ⬆️

@github-actions

The Contour project currently lacks enough contributors to adequately respond to all PRs.

This bot triages PRs according to the following rules:

  • After 14d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, the PR is closed

You can:

  • Mark this PR as fresh by commenting or pushing a commit
  • Close this PR
  • Offer to help out with triage

Please send feedback to the #contour channel in the Kubernetes Slack

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 25, 2023
@skriss skriss removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 25, 2023
@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 9, 2023
@tsaarni tsaarni removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 10, 2023
@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 25, 2023
@tsaarni tsaarni removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 30, 2023
@tsaarni
Member Author

tsaarni commented Jun 30, 2023

I've now resolved merge conflicts that had appeared during recent changes in main. The PR is now again ready for review.

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 15, 2023
@sunjayBhatia sunjayBhatia self-requested a review July 19, 2023 18:26
@github-actions github-actions bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 20, 2023
@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 4, 2023
@sunjayBhatia sunjayBhatia removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 4, 2023
@tsaarni
Member Author

tsaarni commented Aug 4, 2023

Catching up with some more merge conflicts. These changes seem to be on a path that is a bit prone to merge conflicts over time, so reviews would be appreciated! 😄 🙏

Member

@sunjayBhatia sunjayBhatia left a comment

overall looks correct from a quick pass over the code, seems straightforward enough to set some socket options on the Envoy Listeners

should maybe add a featuretest in lieu of an e2e test to make sure config gets plumbed all the way through correctly

this looks like it will intersect a bit with #5523 but I can sort that out if this merges first 👍🏽

}

// SocketOptions defines configurable socket options for Envoy listeners.
type SocketOptions struct {
Member

(w/o knowing the details of TOS vs. Traffic Class) is it valid to auto detect the Listener ip family to be able to collapse these two into one field?

Member Author

Reliably auto-detecting could be challenging when the listener address is set to ::. We use Ipv4Compat: true so that Envoy also accepts IPv4 clients, unless IPv4 is disabled on the host where Envoy is running, I think.

Member

@sunjayBhatia sunjayBhatia Aug 7, 2023

ah fair point, if we wanted to be clever could we maybe do something like "only have one new API field and if listen address == :: then set the ipv4 and ipv6 option with that value, otherwise set the appropriate option for the address family"?

Member

if in practice users may want to set different values between ipv4 and ipv6 seems more correct to keep these options/configurable values separate
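For reference, the single-field alternative floated in this thread (and not adopted, since the two options stay separate) could be sketched roughly as below. The `Applied` type and `OptionsForAddress` function are hypothetical names used only for this illustration. The sketch only decides which options would be attempted; it does not call setsockopt().

```go
package main

import (
	"fmt"
	"net/netip"
)

// Applied records which socket options a single hypothetical API field
// would be mapped to for a given listener address.
type Applied struct {
	TOS          bool // IPv4 TOS option
	TrafficClass bool // IPv6 Traffic Class option
}

// OptionsForAddress is a hypothetical helper implementing the idea from the
// thread: set both options when listening on "::", otherwise set only the
// option that matches the address family.
func OptionsForAddress(listenAddr string) (Applied, error) {
	addr, err := netip.ParseAddr(listenAddr)
	if err != nil {
		return Applied{}, err
	}
	switch {
	case addr.Is6() && addr.IsUnspecified():
		// "::" may serve both IPv6 and IPv4 clients.
		return Applied{TOS: true, TrafficClass: true}, nil
	case addr.Is4():
		return Applied{TOS: true}, nil
	default:
		return Applied{TrafficClass: true}, nil
	}
}

func main() {
	for _, a := range []string{"::", "0.0.0.0", "::1"} {
		got, err := OptionsForAddress(a)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%-8s tos=%v trafficClass=%v\n", a, got.TOS, got.TrafficClass)
	}
}
```

As noted in the thread, this would prevent users from choosing different values for IPv4 and IPv6, which is the argument for keeping the two fields separate.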

@tsaarni
Member Author

tsaarni commented Aug 4, 2023

Thank you for the review @sunjayBhatia!

should maybe add a featuretest in lieu of an e2e test to make sure config gets plumbed all the way through correctly

So far, I've checked the values manually from the IP header using Wireshark. It would be great to have an e2e test, but I think it would require a test client that processes HTTP responses at the IP packet level. Normally the values are processed only by routers and host IP stacks, and the receiving application is unaware of them.

Did you have some other approach in mind for e2e?

@sunjayBhatia
Member

sunjayBhatia commented Aug 4, 2023

Thank you for the review @sunjayBhatia!

should maybe add a featuretest in lieu of an e2e test to make sure config gets plumbed all the way through correctly

So far, I've checked the values manually from the IP header using Wireshark. It would be great to have an e2e test, but I think it would require a test client that processes HTTP responses at the IP packet level. Normally the values are processed only by routers and host IP stacks, and the receiving application is unaware of them.
Did you have some other approach in mind for e2e?

yeah totally agree, I think just an additional explicit test in internal/featuretest (unless I missed it is already there) for the new socket options added in this PR to make sure the config is generated properly when all the different internal components are hooked up should be sufficient on top of that validation you've done

@tsaarni
Member Author

tsaarni commented Aug 7, 2023

Sorry @sunjayBhatia, somehow, I've accidentally managed to edit your reply instead of replying to you. Maybe I picked "edit" inadvertently, instead of "quote reply" 😵‍💫

Well, regardless of that confusion, I've now added a test in internal/featuretests/ to validate the new socket options. I also kept internal/envoy/v3/socket_options_test.go, although it is a bit redundant now.

Member

@sunjayBhatia sunjayBhatia left a comment

Looks like some conflicts to resolve and just to close out this thread #5352 (comment)

Otherwise 👍🏽

New config field was added to support DSCP marking for outbound traffic,
for both IPv4 (TOS field) and IPv6 (Traffic Class field).

Signed-off-by: Tero Saarni <[email protected]>
@sunjayBhatia sunjayBhatia merged commit e508a4a into projectcontour:main Aug 15, 2023
27 checks passed
Labels
release-note/small A small change that needs one line of explanation in the release notes.
Development

Successfully merging this pull request may close these issues.

Make configurable Envoy listener socketOptions to allow DSCP marking for QoS purposes
3 participants