QUIC Initial Packet Decryption and Parsing #37

Merged
merged 12 commits into from
Jul 26, 2024

sippejw
Contributor

@sippejw sippejw commented Jul 15, 2024

Summary

The standard QUIC handshake between two hosts that do not have a pre-shared key uses QUIC Initial packets to exchange a TLS 1.3 ClientHello and ServerHello. After these two messages have been received, the hosts switch to new keys unknown to network observers. This PR adds the ability to parse QUIC Initial packets, decrypt their protected payloads, parse the QUIC frames, and parse the TLS ClientHello and ServerHello. This lets users filter QUIC traffic on many of the same features as TLS, such as SNI.

Features

  • Fixes a bug that caused a subtraction-overflow panic on short Retry packets
  • Adds module (crypto.rs) for initial key derivation, based on Cloudflare's Quiche
  • Decrypts all Initial packets
  • Parses QUIC frames through new module frame.rs
  • Introduces QuicConn for tracking multi-packet QUIC connections
  • Handles CRYPTO frame reassembly, necessary for Chrome
  • Parses TLS Client and ServerHello, stored in a field on the QuicConn object
  • Adds buffers for multi-packet ClientHello and ServerHello TLS messages, necessary for connections negotiating a Kyber key
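
For readers unfamiliar with CRYPTO frame reassembly, here is a minimal sketch of the idea. It is not the code in this PR: `CryptoBuffer` and its methods are hypothetical names. It assumes each CRYPTO frame carries a byte offset plus a chunk of the TLS handshake stream, and that chunks may arrive out of order or overlap.

```rust
use std::collections::BTreeMap;

/// Hypothetical reassembly buffer for CRYPTO frame data (illustrative only).
#[derive(Default)]
pub struct CryptoBuffer {
    /// Chunks that cannot be appended yet, keyed by their stream offset.
    pending: BTreeMap<u64, Vec<u8>>,
    /// Contiguous handshake bytes received so far, starting at offset 0.
    assembled: Vec<u8>,
}

impl CryptoBuffer {
    /// Record one CRYPTO frame, then append every chunk that is now contiguous.
    pub fn insert(&mut self, offset: u64, data: &[u8]) {
        // Keep the longest chunk seen for a given offset (covers retransmits).
        let slot = self.pending.entry(offset).or_default();
        if data.len() > slot.len() {
            *slot = data.to_vec();
        }
        loop {
            let have = self.assembled.len() as u64;
            // Take the lowest pending offset; stop if it would leave a gap.
            let off = match self.pending.keys().next().copied() {
                Some(off) if off <= have => off,
                _ => break,
            };
            let chunk = self.pending.remove(&off).unwrap_or_default();
            // Skip any prefix of the chunk that overlaps bytes we already hold.
            let skip = (have - off) as usize;
            if skip < chunk.len() {
                self.assembled.extend_from_slice(&chunk[skip..]);
            }
        }
    }

    /// The contiguous prefix of the handshake stream assembled so far.
    pub fn assembled(&self) -> &[u8] {
        &self.assembled
    }
}
```

Once `assembled()` holds a complete handshake message, it can be handed to a TLS parser. A real parser would also cap how much data it is willing to buffer; this sketch does not.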

Testing

  • Adds two new traces quic_kyber.pcapng and quic_xargs.pcap for testing QUIC parsing
  • Additionally tested on all preexisting pcaps in traces/
  • 5-minute run on live network traffic:
Port 0 statistics
SW Capture %: UNKNOWN
Out of Buffer %: UNKNOWN
HW Discard %: UNKNOWN
+--------------+---------------------------+
|    824503802 | rx_good_packets           |
| 681271301084 | rx_good_bytes             |
|      1515575 | rx_missed_errors          |
|            0 | rx_mbuf_allocation_errors |
+--------------+---------------------------+
----------------------------------------------
Current time: 299s
mempool_0 avail: 8355951, in use: 32683 (0.390%)
Ingress: 0 bps / 0 pps
Good:    19021608712 bps / 2792173 pps
Process: 19021608712 bps / 2792173 pps
Drop: 0 pps (NaN%)
HW Dropped: 0 pkts (NaN%)
SW Dropped: 1515575 pkts (inf%)
Total Dropped: 1515575 pkts (inf%)
----------------------------------------------
AVERAGE Ingress: 0.000 bps / 0.000 pps
AVERAGE Good:    18820386371.329 bps / 2766791.282 pps
AVERAGE Process: 18820386371.329 bps / 2766791.282 pps
DROPPED: 1515575 pkts (inf%)

Main done. Ran for 300.100442755s
Done. Logged 316235 Quic stream to "quic.json"

@sippejw sippejw marked this pull request as draft July 15, 2024 22:17
@sippejw
Contributor Author

sippejw commented Jul 15, 2024

Currently, this PR has the ability to decrypt at least some portion of QUIC client initial traffic. I will test further with a variety of client traces to ensure that no edge cases have been missed. I have gone ahead and opened this PR to get feedback from @thegwan and @thearossman on the direction that you guys were planning to go with QUIC processing at the QUIC frame, TLS, and application layer.

@thearossman
Collaborator

Hey! Initial thoughts on the WIP:

  • Related to any directions we're going w/r/t QUIC, TLS, and app-layer parsing: nothing is in the works right now, so that shouldn't impact this PR.

  • Can you add more detail to both the PR and comments in the documents around what you mean by decrypting QUIC payload? Would be helpful to have something like RFC link, clarification around client init packets, etc. (When a few of us first looked at this PR, our reaction was... "wait, you can decrypt the contents of a QUIC stream???" I now see that's not what you're doing, but would be helpful to have the information more clear up-front.)

Collaborator

@thearossman thearossman left a comment

Two clarifying q's, thank you!

@@ -0,0 +1,295 @@
// This is heavily based on Cloudflare's Rust implementation of QUIC, known as Quiche.
Collaborator

similar to general comment -- at some point, header comment would be helpful for us (I'm only surface-level familiar with inner workings of QUIC)

Contributor Author

Will do!

@@ -39,6 +40,9 @@ pub struct QuicPacket {

/// The number of bytes contained in the estimated payload
pub payload_bytes_count: Option<u64>,

// The decrypted QUIC packet payload
pub decrypted_payload: Option<Vec<u8>>,
Collaborator

We had a few questions out of curiosity - not feedback -

I ran this, and got a dump of raw payload (just bytes). Can you clarify what is expected to be in that payload? Is there anything that should be parsed from it, for example that a researcher might reasonably want to filter on? If not, what kinds of possible use-cases are there for collecting the raw bytes?

Contributor Author

Yes, raw bytes are definitely not helpful!

For background, I have an ongoing project looking at specific features of the QUIC Client Init that can be used to develop a fingerprint and identify QUIC clients. I already have a functioning QUIC parser that was built on top of TLS Fingerprint. However, I have been wanting to rebuild the QUIC version on top of Retina.

Before this is ready to merge, I intend to add support for a fully parsed QUIC Client Init, including the TLS ClientHello. I added that field temporarily to ensure that the decrypted payload was as expected: a list of QUIC frames.

To answer your question, I envision a researcher being able to filter on the same things one can filter on in TLS, such as SNI.

Contributor Author

The latest commit introduces frame parsing. I have replaced the decrypted_payload field with a frames vector on the QuicPacket struct. Once I have CRYPTO frame reassembly and TLS ClientHello parsing implemented, I will remove the data field from the serialized CRYPTO frame output, which will help clean things up further.
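
As a rough illustration of what frame parsing involves, the sketch below decodes a QUIC variable-length integer (RFC 9000, Section 16) and uses it to read a single CRYPTO frame (type 0x06, followed by an offset, a length, and the data). The names `read_varint`, `CryptoFrame`, and `parse_crypto_frame` are made up for this example and are not taken from frame.rs.

```rust
/// Decode a QUIC variable-length integer.
/// Returns the decoded value and the number of bytes consumed.
pub fn read_varint(buf: &[u8]) -> Option<(u64, usize)> {
    let first = *buf.first()?;
    // The two most significant bits give the encoded length: 1, 2, 4, or 8 bytes.
    let len = 1usize << (first >> 6);
    if buf.len() < len {
        return None;
    }
    // The remaining six bits of the first byte start the big-endian value.
    let mut value = u64::from(first & 0x3f);
    for &b in &buf[1..len] {
        value = (value << 8) | u64::from(b);
    }
    Some((value, len))
}

/// Hypothetical view of one CRYPTO frame (illustrative only).
#[derive(Debug, PartialEq, Eq)]
pub struct CryptoFrame<'a> {
    pub offset: u64,
    pub data: &'a [u8],
}

/// Parse a CRYPTO frame at the start of `buf`.
/// Returns the frame and the total number of bytes it occupies.
pub fn parse_crypto_frame(buf: &[u8]) -> Option<(CryptoFrame<'_>, usize)> {
    let (frame_type, mut pos) = read_varint(buf)?;
    if frame_type != 0x06 {
        return None; // not a CRYPTO frame
    }
    let (offset, n) = read_varint(&buf[pos..])?;
    pos += n;
    let (length, n) = read_varint(&buf[pos..])?;
    pos += n;
    // Reject frames whose declared length runs past the end of the buffer.
    let remaining = (buf.len() - pos) as u64;
    if length > remaining {
        return None;
    }
    let end = pos + length as usize;
    Some((CryptoFrame { offset, data: &buf[pos..end] }, end))
}
```

A full frame parser would loop over the decrypted payload, dispatching on the frame type and skipping PADDING and other frame types; only the CRYPTO case is shown here.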

@sippejw
Contributor Author

sippejw commented Jul 16, 2024

  • Can you add more detail to both the PR and comments in the documents around what you mean by decrypting QUIC payload? Would be helpful to have something like RFC link, clarification around client init packets, etc. (When a few of us first looked at this PR, our reaction was... "wait, you can decrypt the contents of a QUIC stream???" I now see that's not what you're doing, but would be helpful to have the information more clear up-front.)

I will gladly add more detail as it comes together! At this stage, the PR does decrypt the contents of the first packet. This packet, which I've been referring to as the QUIC Client Init, is encrypted using keys derived from a version-specific, publicly available salt. The plaintext contains QUIC frames that can be parsed to produce a TLS 1.3 ClientHello. The same process can be applied to the QUIC Server Init, which will produce a TLS ServerHello. Once the key exchange has been completed, the connection uses the new keys and we can no longer see into the packet.
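
To make the "publicly available salt" point concrete, here is a small sketch of how the two public inputs to Initial key derivation can be read straight off the wire: the version (which selects the salt) and the Destination Connection ID of the client's first Initial packet. The function names are hypothetical and this is not the code in crypto.rs; only the QUIC v1 salt from RFC 9001, Section 5.2 is included, and the HKDF steps are described in comments rather than implemented, since they need a crypto library.

```rust
/// QUIC version 1 as it appears in the long header.
pub const QUIC_V1: u32 = 0x0000_0001;

/// Initial salt for QUIC v1 (RFC 9001, Section 5.2). Other versions use other salts.
pub const INITIAL_SALT_V1: [u8; 20] = [
    0x38, 0x76, 0x2c, 0xf7, 0xf5, 0x59, 0x34, 0xb3, 0x4d, 0x17,
    0x9a, 0xe6, 0xa4, 0xc8, 0x0c, 0xad, 0xcc, 0xbb, 0x7f, 0x0a,
];

/// Look up the public salt for a version; only v1 is known to this sketch.
pub fn initial_salt(version: u32) -> Option<&'static [u8; 20]> {
    match version {
        QUIC_V1 => Some(&INITIAL_SALT_V1),
        _ => None,
    }
}

/// Extract (salt, DCID) from a QUIC v1 Initial packet's long header.
///
/// Per RFC 9001, Section 5.2, these feed:
///   initial_secret = HKDF-Extract(salt, client_dst_connection_id)
///   client secret  = HKDF-Expand-Label(initial_secret, "client in", "", 32)
///   server secret  = HKDF-Expand-Label(initial_secret, "server in", "", 32)
/// Both directions use the DCID from the client's first Initial packet.
pub fn initial_key_inputs(pkt: &[u8]) -> Option<(&'static [u8; 20], &[u8])> {
    let first = *pkt.first()?;
    // Header Form bit set means long header; type bits 0x30 == 0 means Initial in v1.
    if first & 0x80 == 0 || first & 0x30 != 0 {
        return None;
    }
    let v = pkt.get(1..5)?;
    let version = u32::from_be_bytes([v[0], v[1], v[2], v[3]]);
    let dcid_len = usize::from(*pkt.get(5)?);
    let dcid = pkt.get(6..6 + dcid_len)?;
    Some((initial_salt(version)?, dcid))
}
```

Because every input is visible to an on-path observer, anyone can derive the same Initial keys; that is why this only exposes the handshake's first messages and not the rest of the connection.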

@sippejw
Contributor Author

sippejw commented Jul 16, 2024

Here are a couple of links for more details:
RFC 9000, Section 21.1.2 - Describes packet protection and the Initial packet encryption.
RFC 9001, Section 5 - An in-depth explanation of how the initial keys are derived and what is occurring in crypto.rs.
A nice illustrated example of how to parse the initial packets of a QUIC connection: https://quic.xargs.org

@sippejw sippejw changed the title [WIP] QUIC Payload Decryption Support [WIP] QUIC Initial Packet Decryption and Parsing Jul 19, 2024
@sippejw sippejw changed the title [WIP] QUIC Initial Packet Decryption and Parsing QUIC Initial Packet Decryption and Parsing Jul 25, 2024
@sippejw
Contributor Author

sippejw commented Jul 25, 2024

Hi @thearossman,
I have implemented a significant number of changes and figured this is a good stopping point for now. You'll notice that the structure of the output has changed significantly, mainly due to QUIC being parsed per connection instead of per packet. The output also contains a TLS object that will typically contain a parsed ClientHello and ServerHello. Let me know your thoughts and if there are any changes you would like to see before getting this merged upstream. Thanks!

@sippejw sippejw marked this pull request as ready for review July 25, 2024 18:42
@thearossman
Collaborator

Thank you!! I agree re: connection-level quic data making sense.

Going to run this in the afternoon when there's more traffic on our network with a few different types of subscribed data (quic, connection/flow features, frames).

One high-level question -- is there any point in the QuicStream when the connection could be reasonably considered ParseResult::Done, given that this PR is parsing connection-level data? Being "done" parsing a session, at some point, would help performance.

@sippejw
Contributor Author

sippejw commented Jul 26, 2024

I'm glad you asked about that. I thought for a while about how to handle this situation, and I'm open to whatever you think makes the most sense.

Unfortunately, it is difficult to tell when a connection has ended, as the frames that the hosts send to close a connection are in the encrypted payload. I decided to always leave the connection open so that additional stats on connection size and shape could be collected. I figured that somewhere further up the stack Retina has a timeout on flows and would eventually consider the connection closed.

However, I have noticed an increase in drops after the application has been running on live traffic for a few minutes. So if this is a persistent issue, and related to the lack of connection closure, I see a couple of ways it could be handled:

  • Complete the connection after the client and server hello messages
    The initial packets contain the only decipherable information about the connection. For what I am working on, this is the most interesting information. However, questions on QUIC performance or adoption may want statistics on connection duration, size, number of packets, etc.
  • Cap the number of packets collected
    I could create a counter that parses the first n packets and then marks the connection as complete once that cap has been met.
  • Adjust connection timeout threshold
    As I hinted at above, I am not super familiar with how Retina manages long-lived connections that it never receives a close result for. However, I could dig into this and attempt to lower the threshold for QUIC-specific flows to account for the unknown connection closure.

Let me know what you think and I am happy to make changes accordingly!

@thearossman
Collaborator

This is really helpful, thank you! I agree with all of this. Two thoughts --

(1)
There are a few issues with UDP connections in Retina right now, to be honest. What you observe around drops spiking after a few minutes is pretty consistent for UDP protocols across the board.

  • The timer wheel will age out connections eventually, so lower timeouts help -- but this also risks prematurely aging out UDP connections that are actually still valid (e.g., an IoT device that occasionally pings a controller).
  • One development task on the back-burner is to improve UDP connection handling, such as using different heuristics for timeouts.

So, for a QUIC-specific experiment, decreasing the connection timeout (see the config file) makes a lot of sense and is definitely sufficient!

(2)
For a dev solution, I like your option 1 -- stop parsing when there's no more interesting parsable information. For QUIC performance or adoption, a user can subscribe to Connection metadata (connection duration, size, num packets, etc.) and filter for QUIC (could mean writing a subscribable type that delivers parsed QUIC data + connection metadata).

Maybe one clarification is that "stop tracking the connection" is different from "stop parsing". The former is dependent on both the filter and the subscribable type. The latter is dependent on the filter only (match_state, nomatch_state, ParseResult).

Some examples:

(1) If a user filters for TLS and subscribes to a type that is Level::Connection, then the connection will still be tracked until termination or timeout -- that's when the subscribable type is ready to stop tracking the data. You still get reasonable performance because parsing is expensive, and TLS is considered "done parsing" after the handshake.
(2) If a user filters for TLS and subscribes to TLS handshakes, then the handshake is delivered when it's parsed. After that, the connection won't be tracked; there won't be another handshake.
(3) HTTP is a bit different, since there can be multiple HTTP txns per connection; session match state == start parsing again.

I'm thinking for QUIC, we can stop parsing if we don't think there will be more interesting information (ParseResult::Done) and once we have the first session (match/nomatch state).

Lots of info here - does that make sense to you?

@sippejw
Contributor Author

sippejw commented Jul 26, 2024

That makes sense to me! Thank you for the clarification. I am still trying to wrap my head around all of the nuances of Retina.

The latest commits add a return state for when the first short header packet is seen. This packet can't be sent until at least one of the hosts has completed the handshake, so it will arrive well after the Initial packets are sent. Let me know what you think of this solution.
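
Roughly, the check looks like the sketch below. It is illustrative only: `ParseState` is a stand-in for Retina's parse result type, `HandshakeTracker` is a hypothetical name, and the sketch assumes each call receives one QUIC packet (it does not split coalesced packets within a UDP datagram).

```rust
/// Stand-in for the parser's result type; the real type in Retina may differ.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum ParseState {
    Continue,
    Done,
}

/// True if the Header Form bit (0x80) of the first byte is cleared,
/// which marks a short header packet.
pub fn is_short_header(packet: &[u8]) -> bool {
    matches!(packet.first(), Some(&b) if b & 0x80 == 0)
}

/// Hypothetical per-connection state that stops parsing at the first
/// short header packet.
#[derive(Default)]
pub struct HandshakeTracker {
    /// Long header packets seen before parsing stopped.
    pub long_header_packets: u64,
    done: bool,
}

impl HandshakeTracker {
    /// Feed one packet; returns `Done` from the first short header packet onward.
    pub fn on_packet(&mut self, packet: &[u8]) -> ParseState {
        if self.done {
            return ParseState::Done;
        }
        match packet.first() {
            // Empty input carries no header; ignore it and keep waiting.
            None => ParseState::Continue,
            Some(_) if is_short_header(packet) => {
                self.done = true;
                ParseState::Done
            }
            Some(_) => {
                self.long_header_packets += 1;
                ParseState::Continue
            }
        }
    }
}
```

The design choice here is that the short header acts as a cheap, unencrypted signal that the decipherable part of the connection is over, so no decryption is needed to decide when to stop.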

@thearossman
Collaborator

This makes sense based on my understanding of QUIC, which is admittedly not super deep. Trusting you for the deeper QUIC expertise and experience with what's useful from QUIC parsing, but the output and perf look good to us. Going to merge!

@thearossman thearossman merged commit 68a5de2 into stanford-esrg:main Jul 26, 2024
2 checks passed