IPIP-0421: HTTP Delegated Routing Reader Privacy Upgrade #421

Open
wants to merge 15 commits into
base: main
from
79 changes: 79 additions & 0 deletions src/ipips/ipip-0421.md
@@ -0,0 +1,79 @@
---
title: "IPIP-0421: HTTP Delegated Routing Reader Privacy Upgrade"
date: 2023-05-31
ipip: ratified
editors:
  - name: Andrew Gillis
    github: gammazero
  - name: Ivan Schasny
    github: ischasny
  - name: Masih Derkani
    github: masih
  - name: Will Scott
    github: willscott
order: XXX
tags: ['ipips', 'routing', 'privacy', 'double hashing']
---

## Summary

This IPIP specifies a new HTTP API for Privacy Preserving Delegated Content Routing provider lookups.

## Motivation

IPFS currently lacks many privacy protections. One of its main weak points is the lack
of privacy protections in the Content Routing subsystem. Currently neither Readers (clients accessing files)
nor Writers (hosts storing and distributing content) have much privacy with regard to the content they publish or
consume. It is very easy for a Content Router or a Passive Observer to learn which file is requested by
which client during the routing process, as the potential adversary easily learns the requested `CID`.
A curious actor could request the same `CID` and download the associated file to monitor the user’s behavior.
This is undesirable, and the community has been asking for a fix for some time.

The latest upgrades to the DHT and IPNI have introduced Double Hashing - a technique that aims to better preserve Reader Privacy.
With Double Hashing in place Provider Records are encrypted and opaque to Content Routers. If presented with the original `CID` a
Content Router can decrypt the relevant Provider Records and serve them via the existing Delegated Routing API.
However, in order to benefit from the privacy enhancement, users need to change the way they interact with Content Routers. In particular:
- A second hash over the original `Multihash` must be used when looking up the content;
- Returned Provider Records are encrypted and must be decrypted by the client before using them;
- The client might choose to fetch additional encrypted Metadata from the Content Router.

This new way of interaction cannot be fulfilled by the existing API. This IPIP is an incremental improvement to the HTTP Delegated Routing API that adds
new endpoints for serving encrypted content. The original API can still be used for lookups that are not Privacy Preserving.

Writer Privacy is out of scope of this IPIP and is going to be addressed separately.

## Detailed design

See the Delegated Routing Reader Privacy Upgrade spec (:cite[http-routing-reader-privacy-v1]) included with this IPIP.

## Design rationale

This API proposal makes the following changes:
- Adds new methods for looking up encrypted Provider Records and encrypted Metadata;
- Defines Hashing and Encryption functions and response payloads structure.

There are no idiomatic changes to the API: all data formats, design rationale and principles outlined in the original [HTTP Delegated Routing IPIP](./ipip-0337.md) apply here.

### User benefit

With the new APIs users can protect themselves from a malicious actor spying on the user by observing the traffic between the user and the Content Router and then downloading the same data.

The new API is also a first step towards a fully private HTTP Delegated Routing protocol that will eliminate IPNI as a centralised observer.

There are no other functional improvements.

### Compatibility

#### Backwards Compatibility

The new API will be implemented in [go-delegated-routing](https://github.com/ipfs/boxo/tree/main/routing/http) and will not introduce any breaking changes.
The API will be released in a new minor version.

### Resources

- [IPIP-272 (double hashed DHT)](https://github.com/ipfs/specs/pull/373/)
- [ipni#5 (reader privacy in indexers)](https://github.com/ipni/specs/pull/5)

### Copyright

Copyright and related rights waived via [CC0](https://creativecommons.org/publicdomain/zero/1.0/).
107 changes: 107 additions & 0 deletions src/routing/http-routing-reader-privacy-v1.md
@@ -0,0 +1,107 @@
---
title: Routing V1 HTTP Delegated Routing Reader Privacy Upgrade
description: >
  This specification describes Delegated Routing Reader Privacy Upgrade. It's an
  incremental improvement to HTTP Delegated Routing API and inherits all of its
  formats and design rationale.
date: 2023-05-31
maturity: reliable
editors:
  - name: Andrew Gillis
    github: gammazero
  - name: Ivan Schasny
    github: ischasny
  - name: Masih Derkani
    github: masih
  - name: Will Scott
    github: willscott
order: 0
tags: ['routing', 'double hashing', 'privacy']
---

This specification describes a new HTTP API for Privacy Preserving Delegated Content Routing provider lookups. It is an extension to the HTTP Delegated Routing API and inherits all of its formats and design rationale.

## API Specification

### Magic Values

All salts below are 64 bytes long and represent a string padded with `\x00`.

- `SALT_DOUBLEHASH = bytes("CR_DOUBLEHASH\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00")`
- `SALT_ENCRYPTIONKEY = bytes("CR_ENCRYPTIONKEY\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00")`
- `SALT_NONCE = bytes("CR_NONCE\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00")`
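
The padding rule above can be sketched in a few lines. This is only an illustration of the stated rule (ASCII label, right-padded with `\x00` to 64 bytes); the helper name `make_salt` is hypothetical and not part of the specification.

```python
SALT_LEN = 64  # the spec states that every salt is 64 bytes long


def make_salt(label: str) -> bytes:
    """Right-pad an ASCII label with NUL bytes up to SALT_LEN (hypothetical helper)."""
    raw = label.encode("ascii")
    if len(raw) > SALT_LEN:
        raise ValueError("label is longer than the salt length")
    return raw.ljust(SALT_LEN, b"\x00")


SALT_DOUBLEHASH = make_salt("CR_DOUBLEHASH")
SALT_ENCRYPTIONKEY = make_salt("CR_ENCRYPTIONKEY")
SALT_NONCE = make_salt("CR_NONCE")
```

Building the salts from their labels avoids having to count NUL bytes by hand when transcribing the constants.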

### Glossary

- **`enc`** is [AESGCM](https://en.wikipedia.org/wiki/Galois/Counter_Mode) encryption. The rest of the specification uses the notation `enc(passphrase, nonce, payload)`.
- **`hash`** is [SHA256](https://en.wikipedia.org/wiki/SHA-2) hashing.
- **`||`** is concatenation of two values.
- **`deriveKey`** derives a 32-byte encryption key from a passphrase, computed as `hash(SALT_ENCRYPTIONKEY || passphrase)`.
- **`Nonce`** is a 12-byte nonce used as the Initialization Vector (IV) for the AESGCM encryption. IPNI expects an explicit instruction to delete a record (in contrast to the DHT, where records expire).
Hence the IPNI server needs to be able to compare encrypted values without decrypting them, as that would require a key that it does not have.
That means that the nonce has to be chosen deterministically so that `enc(passphrase, nonce, payload)` produces the same output for the same
`passphrase` + `payload` pair. The nonce must be calculated as `hash(SALT_NONCE || passphrase || len(payload) || payload)[:12]`, where `len(payload)` is
the length of the `payload` encoded as an 8-byte Little Endian integer. The choice of nonce is not enforced by the IPNI specification. The described approach will
be used while IPNI encrypts Advertisements on behalf of Publishers. However, once Writer Privacy is implemented, the choice of nonce will be left up to the Publisher.
- **`CID`** is the [Content IDentifier](https://github.com/multiformats/cid).
- **`MH`** is the [Multihash](https://github.com/multiformats/multihash) contained in a `CID`. It corresponds to the
digest of a hash function over some content. `MH` is represented as a 32-byte array.
Review comment (Member): Does this mean CIDs with longer hash functions are truncated at 32-byte mark?

Reply (Author): Removed the 32-byte length as indeed it can be longer.

- **`HASH2`** is a second hash over the multihash. Second Hashes must be of `Multihash` format with `DBL_SHA_256` codec.
The digest must be calculated as `hash(SALT_DOUBLEHASH || MH)`.
- **`ProviderRecord`** is a Provider Record as described in the [HTTP Delegated Routing Specification](http-routing-v1.md).
- **`ProviderRecordKey`** is a concatenation of `peerID || contextID`. There is no need to encode lengths explicitly, as they are
already encoded as part of the multihash format.
- **`EncProviderRecordKey`** is `Nonce || enc(deriveKey(multihash), Nonce, ProviderRecordKey)`.
- **`HashProviderRecordKey`** is a hash over `ProviderRecordKey` that must be calculated as `hash(SALT_DOUBLEHASH || ProviderRecordKey)`.
- **`Metadata`** is free-form bytes that can represent information such as IPNI metadata.
- **`EncMetadata`** is `Nonce || enc(deriveKey(ProviderRecordKey), Nonce, Metadata)`.
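
The hashing definitions in the glossary can be sketched as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the function names are hypothetical, `DBL_SHA_256` is assumed to be the multicodec `dbl-sha2-256` entry (code `0x56`), and the AESGCM step itself is left out because it needs a cryptography library.

```python
import hashlib

SALT_DOUBLEHASH = b"CR_DOUBLEHASH".ljust(64, b"\x00")
SALT_ENCRYPTIONKEY = b"CR_ENCRYPTIONKEY".ljust(64, b"\x00")
SALT_NONCE = b"CR_NONCE".ljust(64, b"\x00")

# Assumption: DBL_SHA_256 in the spec maps to multicodec dbl-sha2-256 (0x56).
DBL_SHA2_256_CODE = 0x56


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def hash2(mh: bytes) -> bytes:
    """HASH2: multihash wrapping hash(SALT_DOUBLEHASH || MH)."""
    digest = sha256(SALT_DOUBLEHASH + mh)
    # Multihash layout is <code varint><length varint><digest>;
    # both 0x56 and 32 fit in a single varint byte.
    return bytes([DBL_SHA2_256_CODE, len(digest)]) + digest


def derive_key(passphrase: bytes) -> bytes:
    """deriveKey: hash(SALT_ENCRYPTIONKEY || passphrase), 32 bytes."""
    return sha256(SALT_ENCRYPTIONKEY + passphrase)


def nonce(passphrase: bytes, payload: bytes) -> bytes:
    """Deterministic 12-byte nonce with an 8-byte little-endian payload length."""
    length = len(payload).to_bytes(8, "little")
    return sha256(SALT_NONCE + passphrase + length + payload)[:12]


def hash_provider_record_key(provider_record_key: bytes) -> bytes:
    """HashProviderRecordKey: hash(SALT_DOUBLEHASH || ProviderRecordKey)."""
    return sha256(SALT_DOUBLEHASH + provider_record_key)
```

Because the nonce depends only on the passphrase and the payload, encrypting the same pair twice yields the same ciphertext, which is the property the `Nonce` entry asks for.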

### API

Assembling a full `ProviderRecord` from the encrypted data requires multiple roundtrips to the server: the first one fetches a list of `EncProviderRecordKey`s, and then one roundtrip per
`EncProviderRecordKey` fetches its `EncMetadata`. To reduce the number of roundtrips to one, the client implementation should use the local libp2p peerstore for multiaddress discovery
and [libp2p multistream select](https://github.com/multiformats/multistream-select) for protocol negotiation.
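
The multi-roundtrip flow described above can be sketched as a client-side function. Everything injected here is an assumption for illustration: `get_json` stands in for an HTTP client, `encode_path` and `decode_value` stand in for the path and payload encodings (the path encoding is not fixed by this text), and `decrypt` stands in for AESGCM decryption. Error handling such as `404` responses is omitted.

```python
import hashlib
from typing import Callable, List, Tuple

SALT_DOUBLEHASH = b"CR_DOUBLEHASH".ljust(64, b"\x00")
SALT_ENCRYPTIONKEY = b"CR_ENCRYPTIONKEY".ljust(64, b"\x00")
NONCE_LEN = 12


def sha256(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def derive_key(passphrase: bytes) -> bytes:
    return sha256(SALT_ENCRYPTIONKEY + passphrase)


def find_providers(
    mh: bytes,
    get_json: Callable[[str], dict],                  # hypothetical HTTP GET returning parsed JSON
    encode_path: Callable[[bytes], str],              # hypothetical path-segment encoding
    decode_value: Callable[[str], bytes],             # hypothetical decoder for response values
    decrypt: Callable[[bytes, bytes, bytes], bytes],  # hypothetical (key, nonce, ciphertext) -> plaintext
) -> List[Tuple[bytes, bytes]]:
    """Return (ProviderRecordKey, Metadata) pairs for a multihash."""
    # Roundtrip 1: look up encrypted provider record keys by HASH2.
    # Assumption: DBL_SHA_256 is multicodec 0x56 with a 32-byte digest.
    hash2 = bytes([0x56, 0x20]) + sha256(SALT_DOUBLEHASH + mh)
    listing = get_json("/routing/v1/encrypted/providers/" + encode_path(hash2))

    results = []
    for item in listing["EncProviderRecordKeys"]:
        blob = decode_value(item)
        # EncProviderRecordKey is Nonce || ciphertext.
        record_key = decrypt(derive_key(mh), blob[:NONCE_LEN], blob[NONCE_LEN:])

        # Roundtrip 2..n: one metadata lookup per provider record key.
        lookup = sha256(SALT_DOUBLEHASH + record_key)
        body = get_json("/routing/v1/encrypted/metadata/" + encode_path(lookup))
        meta_blob = decode_value(body["EncMetadata"])
        metadata = decrypt(
            derive_key(record_key), meta_blob[:NONCE_LEN], meta_blob[NONCE_LEN:]
        )
        results.append((record_key, metadata))
    return results
```

A client that already knows a peer's addresses and protocols from its local peerstore could stop after the first roundtrip and skip the metadata lookups.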

#### `GET /routing/v1/encrypted/providers/{HASH2}`
Review comment (Contributor): How is Hash2 encoded here?

- Some specific base (e.g. 16, 32, 64)
- Multibase prefixed, with a standard base being expected (16, 32, 64)
- CIDv1 with 0x55 (i.e. raw) codec (and a standard multibase)
- ...


##### Response codes

- `200` (OK): the response body contains 0 or more records
- `404` (Not Found): must be returned if no matching records are found
- `422` (Unprocessable Entity): request does not conform to schema or semantic constraints

##### Response Body

```json
{
  "EncProviderRecordKeys": [
    "EBxdYDhd.....",
    "IOknr9DK....."
  ]
}
```

Where:

- `EncProviderRecordKeys` is a list of base58-encoded `EncProviderRecordKey` values.
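
A client has to decode each entry and split off the leading nonce before decrypting. The sketch below assumes the base58btc (Bitcoin) alphabet, which the text does not state explicitly, and uses hypothetical helper names; decryption itself is not shown.

```python
import json
from typing import List, Tuple

# Assumption: "base58" means the base58btc (Bitcoin) alphabet.
B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"
NONCE_LEN = 12


def b58decode(text: str) -> bytes:
    """Decode a base58btc string; leading '1' characters map to zero bytes."""
    number = 0
    for ch in text:
        number = number * 58 + B58_ALPHABET.index(ch)  # ValueError on invalid input
    body = number.to_bytes((number.bit_length() + 7) // 8, "big")
    leading_zeros = len(text) - len(text.lstrip("1"))
    return b"\x00" * leading_zeros + body


def split_nonce(value_b58: str) -> Tuple[bytes, bytes]:
    """Split a decoded `Nonce || ciphertext` value into its two parts."""
    blob = b58decode(value_b58)
    if len(blob) < NONCE_LEN:
        raise ValueError("value is shorter than the 12-byte nonce")
    return blob[:NONCE_LEN], blob[NONCE_LEN:]


def parse_providers_response(body: str) -> List[Tuple[bytes, bytes]]:
    """Return (nonce, ciphertext) pairs from a providers response body."""
    doc = json.loads(body)
    return [split_nonce(item) for item in doc["EncProviderRecordKeys"]]
```

The same `split_nonce` step would apply to an `EncMetadata` value, since it has the same `Nonce || ciphertext` layout.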

#### `GET /routing/v1/encrypted/metadata/{HashProviderRecordKey}`
Review comment (Contributor): Same question about encoding as for HASH2


##### Response codes

- `200` (OK): the response body contains 1 record
- `404` (Not Found): must be returned if no matching records are found
- `422` (Unprocessable Entity): request does not conform to schema or semantic constraints

##### Response Body

```json
{
  "EncMetadata": "EBxdYDhd....."
}
```

Where:

- `EncMetadata` is the base58-encoded `EncMetadata` value.