Identity Matching

Identity matching is the core mechanism that links external identifiers -- wallet holder keys, institutional subject IDs (the institution-scoped eduID, the eduPersonPrincipalName), and other credentials -- to internal identity IDs without storing any plaintext identifiers in the database. When a user presents a trusted eduID credential through an OID4VP wallet presentation, or authenticates through a federated OIDC login via SURFconext, the system converts their identifier into an HMAC-SHA256 hash using a domain-separated key, then looks up that hash in the database. If a match is found, the system resolves the user's internal identity and decrypts the cached canonical attributes associated with it.

Matching occurs only for credentials that the system trusts. The DCQL query configuration defines which credential types and issuers are accepted, and only a valid eduID credential from a trusted issuer triggers the matching process. The wallet does not just present "a key": it presents a verified eduID credential that contains identity attributes. The system uses both the credential's holder key (for matching) and its attributes (such as the eduPersonPrincipalName) in the reconciliation process.

This approach means the database on its own gives an attacker very little. Even with full read access to every table, an adversary sees only one-way HMAC hashes and AES-256-GCM ciphertext. There are no plaintext wallet keys, no plaintext institutional identifiers, and no plaintext personal attributes stored anywhere in the matching tables. The cryptographic keys that produced these hashes and ciphertexts reside in a Hardware Security Module (HSM) or Key Management Service (KMS), entirely separate from the application database.

Why HMAC Instead of Plaintext

The decision to use HMAC-SHA256 rather than storing identifiers in plaintext, or even using bare SHA-256 hashes, follows from the threat model. Each alternative fails against a specific attack.

Plaintext storage is vulnerable to data breach. If an attacker gains read access to the database (through SQL injection, a misconfigured backup, or a compromised admin account), they immediately obtain every user's wallet public key and institutional identifier. This enables impersonation, tracking, and correlation attacks across systems.

Bare SHA-256 hashes are vulnerable to rainbow table attacks. While SHA-256 is a strong hash function, it is deterministic and keyless. An attacker who knows the format of the input (for example, a JWK thumbprint or a SURFconext subject identifier) can precompute hashes for a large set of plausible inputs and match them against the database. This is especially feasible when the input space is structured, as it is with OIDC subject identifiers.

HMAC-SHA256 eliminates rainbow table attacks because it requires a secret key. Without the key, an attacker cannot compute the hash for any input, no matter how much they know about the input format. The key is not stored in the database or in the application configuration files -- it lives in the HSM or KMS, accessible only through authenticated API calls.
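The difference between a bare hash and a keyed hash can be sketched in a few lines. This is an illustration only: the identifier format `urn:example:subject:NNNN` is made up, and the key is generated locally as a stand-in for a secret that, in the system described here, would never leave the HSM or KMS.

```python
import hashlib
import hmac
import secrets

# Hypothetical stand-in for the secret held in the HSM/KMS.
HMAC_KEY = secrets.token_bytes(32)


def bare_sha256(identifier: str) -> str:
    """Keyless hash: anyone who can guess the input can compute this."""
    return hashlib.sha256(identifier.encode("utf-8")).hexdigest()


def keyed_hash(identifier: str, key: bytes) -> str:
    """HMAC-SHA256: cannot be computed without the secret key."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


# What the database would hold for one user under each scheme.
victim = "urn:example:subject:0042"  # made-up identifier format
stored_bare = bare_sha256(victim)
stored_keyed = keyed_hash(victim, HMAC_KEY)

# An attacker who knows the format precomputes hashes for plausible inputs.
candidates = [f"urn:example:subject:{n:04d}" for n in range(10_000)]
precomputed = {bare_sha256(c): c for c in candidates}

# The bare hash is found in the precomputed table; the keyed hash is not,
# because the attacker cannot build a table without the key.
recovered_from_bare = precomputed.get(stored_bare)
recovered_from_keyed = precomputed.get(stored_keyed)
```

With a structured input space like the one assumed here, the precomputation takes a fraction of a second, which is why keyless hashing offers little protection for this kind of identifier.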

Domain separation prevents cross-domain correlation. The system uses separate HMAC keys for different identifier domains (Key A for wallet holder keys, Key B for institutional identifiers). Even if an attacker somehow obtained Key A, they could not use it to compute or test Key B hashes, so compromising one key does not expose identifiers hashed under the other. Each domain is cryptographically isolated.
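A minimal sketch of domain separation follows. The in-memory `KEYS` registry and the fixed key bytes are assumptions for illustration; the domain labels are the ones used in this document, and a real deployment would request the HMAC from the HSM or KMS instead of holding key bytes in process memory.

```python
import hashlib
import hmac

# Hypothetical key registry keyed by domain label. Fixed bytes are used only
# so the example is reproducible.
KEYS = {
    "reconciliation:holder": bytes.fromhex("11" * 32),       # Key A
    "reconciliation:institution": bytes.fromhex("22" * 32),  # Key B
}


def domain_hash(domain: str, value: str) -> str:
    """Hash a value with the HMAC key that belongs to the given domain."""
    key = KEYS[domain]
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


# The same input hashed in two domains yields unrelated outputs, so a hash
# from one domain cannot be matched against the other.
same_input = "identical-input"
holder_hash = domain_hash("reconciliation:holder", same_input)
institution_hash = domain_hash("reconciliation:institution", same_input)
```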

Three-Key Architecture

The matching system uses three distinct cryptographic keys, each serving a different purpose. This separation of concerns ensures that compromising one key does not compromise the entire system.

Key A -- Holder HMAC Key (`reconciliation:holder`, HMAC-SHA256)

Key A is used to hash wallet public key fingerprints. When a holder presents a Verifiable Presentation (VP) through the OID4VP protocol, the system extracts the holder's public key from the VP proof and computes its JWK Thumbprint according to RFC 7638. This thumbprint is a deterministic, canonical representation of the key that is independent of serialization format. The thumbprint is then passed through HMAC-SHA256 with Key A, and the resulting hash is stored as holder_identifier_hash in the database.
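The thumbprint-then-HMAC step might look like the sketch below. The required-member lists come from RFC 7638 (and RFC 8037 for OKP keys); the function names, the sample EC coordinates, and the fixed key bytes are hypothetical and exist only for illustration.

```python
import base64
import hashlib
import hmac
import json

# Members that enter the thumbprint, per key type (RFC 7638 section 3.2,
# RFC 8037 section 2 for OKP). All other JWK members are ignored.
REQUIRED_MEMBERS = {
    "EC": ("crv", "kty", "x", "y"),
    "RSA": ("e", "kty", "n"),
    "OKP": ("crv", "kty", "x"),
}


def jwk_thumbprint(jwk: dict) -> str:
    """Base64url-encoded SHA-256 over the canonical JSON of required members."""
    members = REQUIRED_MEMBERS[jwk["kty"]]
    canonical = json.dumps(
        {name: jwk[name] for name in members},
        separators=(",", ":"),  # no whitespace
        sort_keys=True,         # lexicographic member order
    )
    digest = hashlib.sha256(canonical.encode("utf-8")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def holder_identifier_hash(jwk: dict, key_a: bytes) -> str:
    """HMAC-SHA256 of the thumbprint under Key A (hex-encoded here by choice)."""
    thumbprint = jwk_thumbprint(jwk).encode("ascii")
    return hmac.new(key_a, thumbprint, hashlib.sha256).hexdigest()


# Made-up EC key: the coordinates are placeholders, not a valid curve point.
sample_jwk = {"kty": "EC", "crv": "P-256", "x": "ZXhhbXBsZS14", "y": "ZXhhbXBsZS15"}
# The same key serialized differently, with an extra non-required member.
reordered_jwk = {"y": "ZXhhbXBsZS15", "kid": "k1", "x": "ZXhhbXBsZS14",
                 "crv": "P-256", "kty": "EC"}

key_a = bytes.fromhex("11" * 32)  # stand-in for the HSM-held Key A
stored_hash = holder_identifier_hash(sample_jwk, key_a)
```

Because only the required members are serialized, in a fixed order and without whitespace, both sample keys above produce the same thumbprint and therefore the same stored hash.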

This means the system never stores the actual public key. It stores only a keyed hash of the key's thumbprint. HMAC output cannot be reversed; the most an attacker could do is test a candidate public key against a stored hash, and even that requires the HMAC secret.

Key B -- Institution HMAC Key (`reconciliation:institution`, HMAC-SHA256)

Key B is used to hash institutional identifiers. When a user completes a reconciliation flow through SURFconext (or another OIDC provider), the system receives the sub claim from the OIDC token. This subject identifier is passed through HMAC-SHA256 with Key B, and the resulting hash is stored as institution_identifier_hash.

Using a separate key from Key A ensures that wallet hashes and institutional hashes are cryptographically independent. An attacker who somehow obtained Key A could not use it to look up or correlate institutional identifiers, and vice versa.

Key C -- Encryption Key (`reconciliation:encryption`, AES-256-GCM)

Key C is fundamentally different from Keys A and B. While the HMAC keys produce irreversible hashes (used for lookup), Key C performs reversible encryption using AES-256-GCM. This key encrypts data that must be recoverable: the actual eduID identifier, canonical claims (name, email, affiliation), and auxiliary metadata.

Key C is used only when data must be read back in its original form. The encrypted payloads include the persisted_attributes_envelope (the cached canonical claims bundle) and encrypted_institution_id (the ciphertext of the institutional identifier, whose plaintext is needed for token generation). On read, the system decrypts these values using Key C and projects the attributes into the OIDC response.

Data Model Overview

The matching system uses a two-table design that separates fast lookup from detailed storage.

`identity_match` -- The Fast-Lookup Index

The identity_match table is the primary lookup structure. Given an HMAC-hashed identifier and its type, it returns the internal identity ID in a single indexed query. This table supports multiple identifier types per identity, meaning a single person can be looked up by their wallet key, their institutional subject ID, their email address, or any other registered identifier type.

The table is intentionally lightweight. It contains only the hashed identifier, the identifier type, the resolved internal identity ID, and metadata for key versioning, timestamps, and soft deletion. No encrypted payloads, no canonical attributes, no assurance metadata. This keeps the index small and the lookups fast.

`identity_link_binding` -- The Full Dossier

The identity_link_binding table stores everything else: encrypted canonical attributes from the reconciliation, assurance metadata about how the identity was verified, version tracking for staleness detection, and both holder and institution hashes for bidirectional lookup. This table is linked to identity_match via a match_id foreign key.

When a known wallet holder returns, the system first queries identity_match to resolve their identity, then loads the associated identity_link_binding to decrypt the cached attributes and project them into the token. This two-step process keeps the common lookup path (identity resolution) fast while still providing access to the full attribute set when needed.
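The two-table layout and the two-step read can be sketched with an in-memory SQLite database. The table names and the columns mentioned in this document are taken from the text; every column type, the `binding_id`, `key_version`, and `created_at` column names, the index, and the sample rows are assumptions made for the example.

```python
import sqlite3

# Column types and any column not named in the text are assumptions.
SCHEMA = """
CREATE TABLE identity_match (
    match_id             INTEGER PRIMARY KEY,
    tenant_id            TEXT NOT NULL,
    identifier_hash      TEXT NOT NULL,
    identifier_type      TEXT NOT NULL,
    internal_identity_id TEXT NOT NULL,
    key_version          INTEGER NOT NULL,
    created_at           TEXT NOT NULL,
    deleted_at           TEXT
);
CREATE INDEX idx_match_lookup
    ON identity_match (tenant_id, identifier_type, identifier_hash);

CREATE TABLE identity_link_binding (
    binding_id                    INTEGER PRIMARY KEY,
    match_id                      INTEGER NOT NULL
                                  REFERENCES identity_match (match_id),
    holder_identifier_hash        TEXT,
    institution_identifier_hash   TEXT,
    encrypted_institution_id      BLOB,
    persisted_attributes_envelope BLOB,
    canonical_schema_version      INTEGER NOT NULL
);
"""


def resolve_identity(conn, tenant_id, identifier_type, identifier_hash):
    """Step 1: indexed lookup. Returns (match_id, internal_identity_id) or None."""
    return conn.execute(
        "SELECT match_id, internal_identity_id FROM identity_match "
        "WHERE tenant_id = ? AND identifier_hash = ? "
        "AND identifier_type = ? AND deleted_at IS NULL",
        (tenant_id, identifier_hash, identifier_type),
    ).fetchone()


def load_binding(conn, match_id):
    """Step 2: fetch the encrypted envelope and its schema version."""
    return conn.execute(
        "SELECT persisted_attributes_envelope, canonical_schema_version "
        "FROM identity_link_binding WHERE match_id = ?",
        (match_id,),
    ).fetchone()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
# Sample rows; 'aa11' stands in for a real HMAC hash.
conn.execute(
    "INSERT INTO identity_match (match_id, tenant_id, identifier_hash, "
    "identifier_type, internal_identity_id, key_version, created_at) "
    "VALUES (1, 'tenant-1', 'aa11', 'KEY', 'abc-123', 1, '2024-01-01T00:00:00Z')"
)
conn.execute(
    "INSERT INTO identity_link_binding (match_id, holder_identifier_hash, "
    "persisted_attributes_envelope, canonical_schema_version) "
    "VALUES (1, 'aa11', ?, 1)",
    (b"ciphertext-placeholder",),
)

match = resolve_identity(conn, "tenant-1", "KEY", "aa11")
binding = load_binding(conn, match[0])
```

Keeping the ciphertext columns out of `identity_match` means the lookup index stays narrow; the larger binding row is read only after a match has been found.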

Lookup Flow

The following pseudocode illustrates how an external identifier is resolved to an internal identity with decrypted attributes.

1. Receive external identifier (e.g., wallet public key from VP proof)
2. Compute fingerprint = JWK_Thumbprint(public_key)  [RFC 7638]
3. Compute identifier_hash = HMAC-SHA256(fingerprint, Key_A)
4. Query: SELECT * FROM identity_match
     WHERE tenant_id = :tenant
       AND identifier_hash = :identifier_hash
       AND identifier_type = 'KEY'
       AND deleted_at IS NULL
5. If found -> load identity_link_binding by match_id
6. Decrypt persisted_attributes_envelope using Key C (AES-256-GCM)
7. Validate canonical_schema_version against current configuration
8. Return resolved identity with decrypted canonical attributes

Step 3 is where the privacy protection happens: the raw public key is never sent to the database. Only the HMAC hash is used in the query, and the database index is built on these hashes. If dual-read is active during a key rotation, step 3 may also compute a hash with the previous key version and attempt a second lookup if the first returns no results.

Step 7 is a staleness check. If the schema version stored in the binding does not match the currently configured schema version, the system knows the cached attributes may be outdated and can trigger a re-materialization on the next reconciliation opportunity.
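The dual-read fallback in step 3 and the staleness check in step 7 can be sketched together. Here a plain dictionary stands in for the identity_match index, and `Match`, `Resolution`, `resolve`, and `CURRENT_SCHEMA_VERSION` are hypothetical names chosen for this example, as is the choice of integer schema versions.

```python
import hashlib
import hmac
from typing import NamedTuple, Optional

CURRENT_SCHEMA_VERSION = 2  # hypothetical configured value


class Match(NamedTuple):
    internal_identity_id: str
    canonical_schema_version: int


class Resolution(NamedTuple):
    internal_identity_id: str
    stale: bool  # True when cached attributes should be re-materialized


def keyed_hash(key: bytes, fingerprint: str) -> str:
    return hmac.new(key, fingerprint.encode("utf-8"), hashlib.sha256).hexdigest()


def resolve(
    index: dict,
    fingerprint: str,
    current_key: bytes,
    previous_key: Optional[bytes] = None,
) -> Optional[Resolution]:
    """Look up by the current key version, then fall back to the previous one."""
    match = index.get(keyed_hash(current_key, fingerprint))
    if match is None and previous_key is not None:
        # Dual-read: the record may still be stored under the old key version.
        match = index.get(keyed_hash(previous_key, fingerprint))
    if match is None:
        return None
    stale = match.canonical_schema_version != CURRENT_SCHEMA_VERSION
    return Resolution(match.internal_identity_id, stale)


# Stand-in key versions; real keys would stay inside the HSM/KMS.
old_key = bytes.fromhex("01" * 32)
new_key = bytes.fromhex("02" * 32)

# One record written before the rotation, one written after it.
index = {
    keyed_hash(old_key, "thumb-1"): Match("abc-123", 1),
    keyed_hash(new_key, "thumb-2"): Match("def-456", 2),
}
```

In this sketch a stale result is still returned to the caller, flagged so that re-materialization can be scheduled, which mirrors the behavior described for step 7.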

Identifier Types

The system supports multiple identifier types, each representing a different way a user can be identified.

| Type | Description | When Used |
| --- | --- | --- |
| `KEY` | Wallet holder key fingerprint (JWK thumbprint per RFC 7638) | OID4VP wallet authentication. The holder's public key from the VP proof is thumbprinted and HMAC-hashed with Key A. |
| `SUBJECT_ID` | OIDC subject identifier (`sub` claim) | After reconciliation completes. The `sub` claim from SURFconext is HMAC-hashed with Key B and stored as a reverse-lookup path. |
| `EMAIL` | Email address hash | Alternative identifier matching when email is the only available correlating attribute. |
| `DID` | Decentralized Identifier | DID-based authentication flows where the user presents a DID rather than a raw public key. |
| `CLAIM_TUPLE` | Composite hash of multiple claims | Multi-attribute matching where no single claim is sufficient for unique identification. Multiple claims are concatenated in canonical order and hashed together. |

Each identifier type is hashed with the HMAC key of the domain it belongs to, not with a key of its own: Key A for holder-side identifiers such as KEY and DID, Key B for institution-side identifiers such as SUBJECT_ID. The EMAIL and CLAIM_TUPLE types use whichever of the two keys corresponds to the context in which they are being matched.
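One way to build a CLAIM_TUPLE hash is sketched below. The document specifies only that claims are combined in canonical order; the JSON encoding used here is an assumption, swapped in for plain concatenation so that claim boundaries stay unambiguous. The claim names and the fixed key are made up for illustration.

```python
import hashlib
import hmac
import json


def claim_tuple_hash(claims: dict, key: bytes) -> str:
    """HMAC-SHA256 over the claims, serialized in a canonical order.

    Claims are sorted by name and encoded as a JSON array of [name, value]
    pairs. This encoding is an assumption of the sketch: unlike plain
    concatenation, it keeps ("ab", "c") distinct from ("a", "bc").
    """
    canonical = json.dumps(
        sorted(claims.items()),
        separators=(",", ":"),
        ensure_ascii=False,
    )
    return hmac.new(key, canonical.encode("utf-8"), hashlib.sha256).hexdigest()


key = bytes.fromhex("22" * 32)  # stand-in for the context-appropriate HMAC key

# Insertion order does not matter because claims are sorted by name.
first = claim_tuple_hash({"family_name": "ab", "given_name": "c"}, key)
reordered = claim_tuple_hash({"given_name": "c", "family_name": "ab"}, key)

# Under naive concatenation both tuples would collapse to "abc";
# with delimited encoding they hash differently.
shifted = claim_tuple_hash({"family_name": "a", "given_name": "bc"}, key)
```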

Multi-Match Design

A single internal identity can have multiple matches pointing to it from different identifier types. This is not an edge case -- it is the expected outcome of a successful reconciliation.

Consider a first-time wallet user who completes the reconciliation flow. Before reconciliation, the system has only a KEY match from the wallet presentation. After reconciliation links the wallet identity to an institutional identity, the system creates a second match, of type SUBJECT_ID, pointing to the same internal_identity_id. The result is two matches for one identity:

Match 1 (type KEY): HMAC(wallet_thumbprint, Key_A) maps to internal_identity_id = abc-123
Match 2 (type SUBJECT_ID): HMAC(surfconext_sub, Key_B) maps to internal_identity_id = abc-123

This enables bidirectional lookup. When the user returns with their wallet, the system finds them through Match 1 (the KEY path). When an external system queries the REST API with an institutional identifier, the system finds them through Match 2 (the SUBJECT_ID path). Both paths resolve to the same internal identity and the same cached attributes.

Over time, additional identifier types may be added to the same identity. For example, if the user later authenticates with a DID, a third match of type DID is created, still pointing to abc-123. The identity accumulates matches as the user interacts with the system through different channels, building a richer but still privacy-preserving identity graph.
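The multi-match idea can be illustrated with a dictionary keyed by identifier type and hash. The input strings and key bytes below are placeholders, and `add_match` and `resolve` are hypothetical helpers standing in for writes to and reads from identity_match.

```python
import hashlib
import hmac
from typing import Optional

KEY_A = bytes.fromhex("11" * 32)  # stand-in for the holder HMAC key
KEY_B = bytes.fromhex("22" * 32)  # stand-in for the institution HMAC key


def keyed_hash(key: bytes, value: str) -> str:
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


# (identifier_type, identifier_hash) -> internal_identity_id
matches: dict = {}


def add_match(identifier_type: str, identifier_hash: str, identity_id: str) -> None:
    matches[(identifier_type, identifier_hash)] = identity_id


def resolve(identifier_type: str, identifier_hash: str) -> Optional[str]:
    return matches.get((identifier_type, identifier_hash))


# Match 1 is created at the first wallet presentation,
# Match 2 once reconciliation has linked the institutional identity.
wallet_hash = keyed_hash(KEY_A, "wallet-thumbprint-placeholder")
sub_hash = keyed_hash(KEY_B, "surfconext-sub-placeholder")
add_match("KEY", wallet_hash, "abc-123")
add_match("SUBJECT_ID", sub_hash, "abc-123")

# Both paths resolve to the same internal identity.
via_wallet = resolve("KEY", wallet_hash)
via_institution = resolve("SUBJECT_ID", sub_hash)
```

Because the identifier type is part of the lookup key, a hash is only ever compared against hashes of the same type, so adding a later DID match for the same identity does not interfere with the existing two.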