Database Overview
PostgreSQL serves as the single persistence layer for the Auth Bridge service in the eduID Wallet Matching Portal. The database schema is deliberately compact, consisting of seven carefully designed tables that together cover the full spectrum of identity matching, reconciliation workflow state, auxiliary institution metadata, cryptographic key migration tracking, and regulatory audit logging.
Every piece of sensitive data stored in these tables is protected at rest. External identifiers are never stored in plaintext; instead, they are transformed through HMAC-SHA256 hashing before being written to any column. Sensitive attributes such as institution identifiers and persisted identity attributes are encrypted using AES-256-GCM before storage. This means that even in the event of a full database breach, an attacker would find only opaque hashes and ciphertext, with no way to recover the original values without simultaneous access to the corresponding cryptographic keys held in the KMS.
Entity Relationship Diagram
The following diagram illustrates the relationships between all seven tables, their primary keys, foreign key references, and the flow of data through the system.
Table Summary
The seven tables each serve a distinct role in the system. The following summary provides a quick reference for what each table holds and whether it contains cryptographically protected data.
| Table | Purpose | Sensitive Data |
|---|---|---|
identity_match | Hash-to-identity lookup index. Maps HMAC-hashed external identifiers to internal identity IDs, enabling fast lookups without exposing the original identifier values. | HMAC-SHA256 hashes of external identifiers |
identity_link_binding | Encrypted holder-institution mapping. Records the binding between a wallet holder and an institutional identity, including encrypted institution identifiers and persisted attribute envelopes. | HMAC hashes (Key A and Key B) plus AES-256-GCM ciphertext (Key C) |
reconciliation_session | IDV session state machine. Tracks the lifecycle of each identity verification reconciliation flow, from authorization URL generation through token exchange to completion or failure. | AES-256-GCM encrypted identity data |
reconciliation_provider | OIDC provider configurations. Stores the configuration for each upstream identity provider used during reconciliation, including client IDs and attribute mappings. | None (configuration data only) |
auxiliary_data | Encrypted institution metadata. Holds supplementary data associated with an identity, such as enrollment information or institutional attributes, encrypted at rest. | AES-256-GCM ciphertext (Key C) |
key_migration_history | Key rotation tracking. Records the progress and outcome of cryptographic key migration operations, enabling operators to monitor and audit key rotation events. | None (operational metadata only) |
audit_event | Append-only audit trail. Captures every significant operation performed by the system, including identity lookups, binding creation, GDPR erasure requests, and key migrations. | HMAC-hashed subject identifiers |
SqlDelight: Type-Safe Database Access
The database schema is defined using SqlDelight .sq files, which are compiled at build time into type-safe Kotlin code. This approach eliminates raw SQL strings from the application codebase entirely, providing several important benefits:
- Compile-time SQL validation: Every SQL statement is checked against the schema at compile time. If a query references a column that does not exist or uses an incompatible type, the build fails immediately rather than at runtime.
- Generated type-safe API: SqlDelight generates Kotlin data classes for query results and type-safe function signatures for all named queries. This means the compiler enforces that all parameters are supplied and that result columns are accessed with their correct types.
- IDE auto-completion: Because the generated code is standard Kotlin, developers get full auto-completion, navigation, and refactoring support in their IDE for all database operations.
- Schema as source of truth: The
.sqfiles serve as the canonical schema definition. There is no separate migration framework to keep in sync; the schema files are the migrations.
Connection Pool Configuration
The Auth Bridge service maintains a connection pool to PostgreSQL, configurable through the database.max-pool-size property in application.yml. The default pool size is 5 connections, which is appropriate for typical development and light production workloads.
sphereon:
app:
database:
url: "jdbc:postgresql://localhost:5432/auth_bridge"
username: "auth_bridge"
password: "${env:DB_PASSWORD}"
max-pool-size: 5
The connection pool is managed through the PostgreSQL JDBC driver as configured by SqlDelight's JDBC driver adapter. For production environments with higher concurrency requirements, the pool size should be tuned based on the expected number of concurrent reconciliation sessions and external API requests.
Multi-Tenancy
The database is designed for multi-tenant operation from the ground up. Every table that holds tenant-specific data includes a tenant_id column, and every query that accesses tenant-specific data filters on this column. This design ensures strict data isolation between tenants at the database level.
The multi-tenancy model extends beyond simple row filtering. The HMAC hashing scheme incorporates the tenant ID into the hash computation, which means that even if two tenants happen to have a user with the same external identifier, the resulting hash values will be different. This prevents any possibility of cross-tenant correlation through hash comparison.
Tenant isolation is enforced at the application layer through the query design. Every named SqlDelight query that touches tenant data requires a tenant_id parameter, making it impossible to accidentally omit the tenant filter. There is no "superuser" query that bypasses tenant isolation in the application code.
Further Reading
The remaining pages in this section provide detailed information about each aspect of the database:
- Schema Reference -- Complete DDL for all seven tables, including column types, constraints, indexes, and encryption annotations. This is the definitive reference for understanding what each column holds and how tables relate to one another.
- GDPR Data Lifecycle -- Data retention policies, soft delete and hard delete mechanisms, inactive binding cleanup, GDPR Article 17 erasure implementation, and crypto-shredding. This page explains how data flows through its lifecycle from creation to eventual deletion.