Meeting Minutes: February 15, 2018

Mukesh Hira edited this page Apr 19, 2018 · 3 revisions

Meeting Time and Location

10 am to 11:30 am PST (GMT - 8), Barefoot Networks, 4750 Patrick Henry Dr, Santa Clara, CA 95054

Attendees

Alibaba: Heidi Ou
AT&T: Shyam Parekh
Barefoot Networks: Jeongkeun Lee, Mickey Spiegel, Arkadiy Shapiro, Roberto Mari
Cisco Systems: Andy Fingerhut
Dell: Anoop Ghanwani, Senthil Ganesan
Ixia: Chris Sommers
Marvell: Tal Mizrahi, Gidi Navon
Netronome: Bapi Vinnakota
Netsia: Serkant Uluderya
POSTECH: Jonghwan Hyun
Surfnet: Ronald Vanderpol
VMware: Mukesh Hira
Xilinx: Robert Halstead

Discussion

Probe Marker Approach

The group reviewed Heidi’s pull request for supporting the probe marker approach and agreed to the changes. The only caveat is that the probe marker must not be used in combination with the other mechanisms for indicating INT, since a switch that recognizes an INT packet based only on L4 destination port, DSCP, or IPv4 options may start processing the probe marker bytes that follow the L4 header as INT headers. We will clarify this in the spec.
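
The caveat can be illustrated with a minimal sketch. The marker value, the DSCP value, and both function names below are hypothetical and are not taken from the spec; the sketch only assumes, as described above, that the marker directly follows the L4 header and that INT headers follow the marker.

```python
# Hypothetical values for illustration only; not defined by the INT spec.
PROBE_MARKER = bytes.fromhex("aaaaaaaabbbbbbbb")
INT_DSCP = 0x17


def dscp_only_int_offset(dscp):
    """Switch that recognizes INT by DSCP alone.

    It assumes INT headers start immediately after the L4 header,
    i.e. at offset 0 of the L4 payload.
    """
    return 0 if dscp == INT_DSCP else None


def marker_aware_int_offset(l4_payload):
    """Switch that recognizes INT by probe marker.

    It expects INT headers only after the marker bytes.
    """
    if l4_payload.startswith(PROBE_MARKER):
        return len(PROBE_MARKER)
    return None


# A packet that combines both mechanisms: INT DSCP plus a probe marker.
l4_payload = PROBE_MARKER + b"\x10\x00\x00\x04"
```

For this packet the two switches disagree on where the INT headers begin: the DSCP-only switch would parse the marker bytes themselves as an INT header, which is the conflict the spec clarification is meant to rule out.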

DSCP bits

We revisited use of DSCP bits to identify presence of INT metadata. There are two orthogonal aspects to this -

  1. DSCP space to use for indicating INT: The current spec calls for reserving one bit to indicate INT. However, in brownfield scenarios, the network operator may have a fragmented space of 32 DSCP values in which no single bit can be reserved. The operator may still have half the space unused, and could allocate an INT-enabled DSCP value for every QoS DSCP value and map each INT-enabled DSCP value to the same egress queue as the corresponding QoS DSCP value. The spec could allow this flexibility.
  2. Handling conflict of INT-signaling DSCP values in non-INT packets: JK presented a proposal to handle DSCP conflicts by defining INT behavior based on the port that the traffic is received on (port connected to a switch within the INT domain vs. a non-INT switch). The proposal may need to be refined to account for all possible network topologies.
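
The pairing idea in item 1 can be sketched as follows. This is only an illustration under stated assumptions: the DSCP-to-queue table and the function names are hypothetical, and the allocation policy (lowest unused value first) is an arbitrary choice, not something the group decided.

```python
# Hypothetical brownfield configuration: QoS DSCP values already in use,
# each mapped to an egress queue.
QOS_DSCP_TO_QUEUE = {0: 0, 10: 1, 18: 2, 26: 3, 34: 4, 46: 5}


def allocate_int_dscp(qos_to_queue):
    """Pair every in-use QoS DSCP value with an unused DSCP value that signals INT.

    Returns (int_for_qos, queue_for_dscp), where queue_for_dscp covers both
    the original QoS values and their INT-enabled counterparts.
    """
    unused = [v for v in range(64) if v not in qos_to_queue]
    if len(unused) < len(qos_to_queue):
        raise ValueError("not enough unused DSCP values to pair with QoS values")
    int_for_qos = dict(zip(sorted(qos_to_queue), unused))
    queue_for_dscp = dict(qos_to_queue)
    for qos, int_dscp in int_for_qos.items():
        # The INT-enabled value shares the egress queue of its QoS value.
        queue_for_dscp[int_dscp] = qos_to_queue[qos]
    return int_for_qos, queue_for_dscp


def classify(dscp, int_for_qos, queue_for_dscp):
    """Return (egress queue, carries INT?) for a received DSCP value."""
    return queue_for_dscp[dscp], dscp in set(int_for_qos.values())


int_for_qos, queue_for_dscp = allocate_int_dscp(QOS_DSCP_TO_QUEUE)
```

With such a table, a packet keeps its QoS treatment whether or not it carries INT, at the cost of one extra DSCP value per QoS class instead of one reserved bit.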

Action item: JK from Barefoot Networks to follow up on addressing these aspects related to use of DSCP bits for identifying INT.

Tal Mizrahi from Marvell proposed adding a Checksum Complement metadata field to the INT specifications. This allows an INT transit switch to add 4 bytes of metadata to the stack such that the L4 checksum remains unchanged, thus performing a “checksum-neutral” update to the packet instead of having to modify the L4 checksum, which sits a certain number of bytes away from the point where the switch is inserting metadata. The INT sink still needs to decapsulate INT metadata and recompute the L4 checksum; however, this could be useful at INT transit switches. It was pointed out that the IPv4 and IPv6 UDP checksum can be set to zero or ignored in constrained situations such as VXLAN and Geneve tunnels that carry checksum-protected payloads. But in situations that require L4 checksum updates (e.g. INT over native TCP/UDP), this capability may be useful. It was decided that if checksum complement metadata is supported in the INT data plane specification, it should not be enforced on the entire path. A transit switch may choose to add the checksum complement field, while another switch may insert 0xFFFFFFFF (or any arbitrary contents) in the checksum complement metadata but update the L4 checksum.
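
A minimal sketch of the checksum-neutral idea, using the standard 16-bit ones' complement Internet checksum. The function names, the sample bytes, and the layout of the 4-byte field (two zero bytes followed by a 16-bit complement) are assumptions for illustration, not the proposed format. The sketch also assumes the metadata is inserted at an even byte offset and does not model changes to length fields, which a real implementation would have to fold into the complement as well.

```python
import struct


def ones_complement_sum(data: bytes) -> int:
    """16-bit ones' complement sum of data, zero-padded to an even length."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    while total >> 16:
        # End-around carry: fold the overflow back into the low 16 bits.
        total = (total & 0xFFFF) + (total >> 16)
    return total


def internet_checksum(data: bytes) -> int:
    """Internet checksum: complement of the ones' complement sum."""
    return ~ones_complement_sum(data) & 0xFFFF


def checksum_complement(metadata: bytes) -> bytes:
    """4-byte field chosen so that metadata + field sums to ones' complement zero.

    Assumed layout (hypothetical): 16 zero bits, then the 16-bit complement.
    """
    comp = ~ones_complement_sum(metadata) & 0xFFFF
    return struct.pack("!HH", 0, comp)


# Arbitrary sample bytes standing in for the checksummed part of a packet.
segment = bytes.fromhex("c0a80001c0a800020011001c1f901f90001c0000deadbeef")
metadata = bytes.fromhex("0000002a00001f40")  # sample metadata being pushed
field = checksum_complement(metadata)
updated = segment[:20] + metadata + field + segment[20:]
assert internet_checksum(updated) == internet_checksum(segment)
```

Because the inserted bytes sum to zero in ones' complement arithmetic, the checksum computed over the enlarged data is the same as before, so the transit switch does not need to touch the L4 checksum field itself.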

Question for switch vendors in the working group: In the case of INT over VXLAN GPE or Geneve, an INT switch writes the checksum complement as the last metadata in the packet. However, in the case of INT over native TCP/UDP, the INT source will need to write to the INT tail header past the metadata, then compute the L4 checksum or checksum complement value, and write the checksum or checksum complement at a lower byte offset in the packet than the INT tail header. Are there any particular challenges in supporting such an operation in different chip architectures?

Mukesh from VMware presented the motivation for being able to report both physical and logical port IDs in INT metadata. The group agreed this would be useful. However, reporting the entire port stack (e.g. egress tunnel, L3 SVI, LAG, physical port) would need a variable-length metadata field. That implies significant changes to the specification, which is currently based on a conscious choice to use fixed-length metadata fields. The group was inclined to add a second port metadata field for reporting a second level of port IDs, leaving it to individual switches to decide what gets reported in the logical port field (LAG/SVI/Tunnel ID). This supports the majority of use cases with a small modification to the spec.

The question then is whether the new field should be a similar 4-byte field with 2 bytes for the ingress logical port and 2 bytes for the egress logical port, or whether we need to scale beyond 64K logical ports on a single INT switch and define 4 bytes for the ingress logical port and 4 bytes for the egress logical port.
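
The two candidate layouts can be compared with a short sketch. The function names are hypothetical, and network byte order with ingress before egress is an assumption; the minutes do not fix either.

```python
import struct


def pack_logical_ports_16(ingress: int, egress: int) -> bytes:
    """4-byte field: 2 bytes ingress + 2 bytes egress (at most 65536 logical ports)."""
    return struct.pack("!HH", ingress, egress)


def pack_logical_ports_32(ingress: int, egress: int) -> bytes:
    """8-byte field: 4 bytes ingress + 4 bytes egress."""
    return struct.pack("!II", ingress, egress)


def fits_16_bit_layout(ingress: int, egress: int) -> bool:
    """True if both logical port IDs can be carried in the 4-byte layout."""
    try:
        pack_logical_ports_16(ingress, egress)
    except struct.error:
        return False
    return True
```

The 4-byte layout matches the size of the existing port ID field, while the 8-byte layout doubles the per-hop cost of this metadata in exchange for room beyond 64K logical ports.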

Action item: Mukesh to follow up with proposed modifications.

Next Steps

  • We are targeting the release of version 1.0 of the data plane and report format specifications by the end of March. Please call out issues you would like to address in version 1.0 at the upcoming meetings.
  • The next bi-weekly meeting will be on Thursday, March 1, 2018, 10 am to 11:30 am. Meeting location and agenda will be announced over the meeting list closer to the meeting date.