GDPR Data Lifecycle

The eduID Wallet Matching Portal implements a comprehensive data lifecycle management system designed to comply with the European General Data Protection Regulation (GDPR). Every piece of identity data in the system has a defined lifecycle, from creation through active use to eventual deletion. This page describes the mechanisms that govern how data ages, how inactive data is detected and cleaned up, how deletion requests are handled, and how cryptographic techniques provide an ultimate guarantee of data destruction.

Active Data

When identity data is first created -- whether through a wallet presentation, a reconciliation binding, or an auxiliary data write -- the created_at timestamp is set to the current time, and the last_used_at timestamp is initialized to the same value.

On every subsequent access, the last_used_at field is updated to the current timestamp. This applies to identity match lookups, binding retrievals, and auxiliary data reads. The consistent update of this field is the foundation of the inactivity detection system: it provides a reliable signal of whether a piece of data is still being actively used.

The last_used_at field is updated regardless of whether the access originates from a wallet presentation, an external API lookup, or an internal system operation. This ensures that any form of legitimate use resets the inactivity clock.
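The timestamp behavior described above can be sketched as follows. This is a minimal illustration, not the portal's actual code: the `Binding` class, its `touch` method, and the `binding_id` field are hypothetical names introduced here.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Optional


def utc_now() -> datetime:
    return datetime.now(timezone.utc)


@dataclass
class Binding:
    # Hypothetical in-memory stand-in for an identity link binding row.
    binding_id: str
    created_at: datetime = field(default_factory=utc_now)
    last_used_at: Optional[datetime] = None

    def __post_init__(self) -> None:
        # On creation, last_used_at is initialized to the same value as created_at.
        if self.last_used_at is None:
            self.last_used_at = self.created_at

    def touch(self, now: Optional[datetime] = None) -> None:
        # Any access (lookup, retrieval, read) resets the inactivity clock.
        self.last_used_at = now or utc_now()


binding = Binding("binding-1")
# Simulate an access three days after creation.
binding.touch(binding.created_at + timedelta(days=3))
```

Calling `touch` from every read path, rather than only from selected ones, is what keeps `last_used_at` usable as an inactivity signal.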

Inactivity Detection

The system runs a background job that periodically scans for identity link bindings that have not been accessed within a configurable inactivity threshold. The default threshold is two years (730 days), reflecting the assumption that an identity binding not used for two full years is likely no longer needed.

Configuration

sphereon:
  app:
    retention:
      inactive-days: 730

The inactive-days property controls the number of days of inactivity after which a binding is considered stale. This value can be adjusted per deployment to meet specific regulatory or institutional requirements.

How It Works

The background cleanup job runs every 60 minutes and executes the findInactiveBindings query, which selects all bindings where the last_used_at timestamp is older than the current time minus the configured inactivity threshold. The query also filters for records that have not already been soft-deleted (i.e., deleted_at IS NULL).

For each inactive binding found, the system performs a soft delete: it sets the deleted_at timestamp to the current time and the deletion_reason to INACTIVE. The corresponding identity_match record is also checked; if all bindings for a given match have been soft-deleted, the match itself is soft-deleted as well.

An audit event of type binding_soft_deleted is logged for each affected binding, providing a traceable record of the automated cleanup action.
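The scan-and-soft-delete pass can be sketched against an in-memory SQLite database. The schema below is an assumption made for illustration (in particular the `match_id` column and the `deletion_reason` column on `identity_match`), and `soft_delete_inactive` is a hypothetical function name, not the portal's `findInactiveBindings` implementation. Audit logging is omitted.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

INACTIVE_DAYS = 730  # mirrors sphereon.app.retention.inactive-days

# Hypothetical minimal schema; only the columns needed for the sketch.
SCHEMA = """
CREATE TABLE identity_match (
    id TEXT PRIMARY KEY, deleted_at TEXT, deletion_reason TEXT);
CREATE TABLE identity_link_binding (
    id TEXT PRIMARY KEY,
    match_id TEXT NOT NULL REFERENCES identity_match(id),
    last_used_at TEXT NOT NULL, deleted_at TEXT, deletion_reason TEXT);
"""


def soft_delete_inactive(conn, now):
    """Soft-delete bindings unused for INACTIVE_DAYS; return their ids."""
    cutoff = (now - timedelta(days=INACTIVE_DAYS)).isoformat()
    stamp = now.isoformat()
    stale = conn.execute(
        "SELECT id, match_id FROM identity_link_binding "
        "WHERE last_used_at < ? AND deleted_at IS NULL ORDER BY id",
        (cutoff,),
    ).fetchall()
    for binding_id, match_id in stale:
        conn.execute(
            "UPDATE identity_link_binding "
            "SET deleted_at = ?, deletion_reason = 'INACTIVE' WHERE id = ?",
            (stamp, binding_id),
        )
        # Soft-delete the match only once none of its bindings remain active.
        remaining = conn.execute(
            "SELECT COUNT(*) FROM identity_link_binding "
            "WHERE match_id = ? AND deleted_at IS NULL",
            (match_id,),
        ).fetchone()[0]
        if remaining == 0:
            conn.execute(
                "UPDATE identity_match "
                "SET deleted_at = ?, deletion_reason = 'INACTIVE' "
                "WHERE id = ? AND deleted_at IS NULL",
                (stamp, match_id),
            )
    return [binding_id for binding_id, _ in stale]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
now = datetime(2025, 1, 1, tzinfo=timezone.utc)
old = (now - timedelta(days=800)).isoformat()
recent = (now - timedelta(days=10)).isoformat()
conn.executemany("INSERT INTO identity_match (id) VALUES (?)", [("m1",), ("m2",)])
conn.executemany(
    "INSERT INTO identity_link_binding (id, match_id, last_used_at) VALUES (?, ?, ?)",
    [("b1", "m1", old), ("b2", "m2", old), ("b3", "m2", recent)],
)
deleted_ids = soft_delete_inactive(conn, now)
```

In this sample data, match `m1` is soft-deleted because its only binding is stale, while `m2` stays active because binding `b3` was used recently.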

Soft Delete

Soft deletion is the first phase of the two-phase deletion process. When a record is soft-deleted, it remains physically present in the database but is excluded from all standard queries. The deleted_at timestamp records when the deletion occurred, and the deletion_reason field documents why.

Deletion Reasons

| Reason | Trigger |
|---|---|
| INACTIVE | Automated inactivity detection exceeded the configured threshold. |
| GDPR_ERASURE | Explicit deletion request via the external API (Article 17). |
| ADMIN_REQUEST | Manual administrative action. |
| KEY_MIGRATION | Record purged during a key rotation migration. |

Retention Period

Soft-deleted records are retained in the database for a configurable period before being permanently removed. The default retention period is 30 days.

sphereon:
  app:
    retention:
      soft-delete-days: 30

During this retention window, the data is not accessible through normal application operations, but it remains available for recovery if the deletion was made in error. This is particularly important for automated inactivity deletions, where an extended period of non-use (such as a leave of absence that outlasts the inactivity threshold) might be mistaken for permanent abandonment.

Hard Delete

After the soft-delete retention period has elapsed, the background cleanup job identifies candidates for permanent deletion. Two queries are used:

  • findSoftDeletedMatchesPastRetention: Finds identity match records where deleted_at is older than the current time minus the soft-delete retention period.
  • findSoftDeletedBindingsPastRetention: Finds identity link binding records under the same criteria.

For each candidate, the system performs a hard delete, which permanently removes the record from the database. This is an irreversible operation; once a record is hard-deleted, it cannot be recovered from the database.

The hard delete process follows a specific order to respect foreign key constraints:

  1. First, all bindings associated with a match are hard-deleted.
  2. Then, the match record itself is hard-deleted.
  3. Any auxiliary data associated with the identity is also hard-deleted.

An audit event is logged for each hard-deleted record, ensuring that even though the data itself is gone, there is an immutable record that the deletion occurred.
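The deletion order can be illustrated with SQLite foreign keys enabled. This is a sketch under assumptions: the schema, the `internal_identity_id` link between matches and auxiliary data, and the function name `hard_delete_past_retention` are hypothetical, and audit logging is left out.

```python
import sqlite3
from datetime import datetime, timedelta, timezone

SOFT_DELETE_DAYS = 30  # mirrors sphereon.app.retention.soft-delete-days

# Hypothetical minimal schema; column names beyond the text are assumptions.
SCHEMA = """
CREATE TABLE identity_match (
    id TEXT PRIMARY KEY, internal_identity_id TEXT NOT NULL, deleted_at TEXT);
CREATE TABLE identity_link_binding (
    id TEXT PRIMARY KEY,
    match_id TEXT NOT NULL REFERENCES identity_match(id),
    deleted_at TEXT);
CREATE TABLE auxiliary_data (id TEXT PRIMARY KEY, identity_id TEXT NOT NULL);
"""


def hard_delete_past_retention(conn, now):
    """Permanently remove records soft-deleted before the retention cutoff."""
    cutoff = (now - timedelta(days=SOFT_DELETE_DAYS)).isoformat()
    removed = {"bindings": 0, "matches": 0, "auxiliary": 0}
    matches = conn.execute(
        "SELECT id, internal_identity_id FROM identity_match "
        "WHERE deleted_at IS NOT NULL AND deleted_at < ? ORDER BY id",
        (cutoff,),
    ).fetchall()
    for match_id, identity_id in matches:
        # 1. Bindings first, so the foreign key to identity_match is never violated.
        removed["bindings"] += conn.execute(
            "DELETE FROM identity_link_binding WHERE match_id = ?", (match_id,)
        ).rowcount
        # 2. Then the match record itself.
        removed["matches"] += conn.execute(
            "DELETE FROM identity_match WHERE id = ?", (match_id,)
        ).rowcount
        # 3. Finally, auxiliary data for the identity.
        removed["auxiliary"] += conn.execute(
            "DELETE FROM auxiliary_data WHERE identity_id = ?", (identity_id,)
        ).rowcount
    # Bindings soft-deleted on their own, whose match is still active.
    removed["bindings"] += conn.execute(
        "DELETE FROM identity_link_binding "
        "WHERE deleted_at IS NOT NULL AND deleted_at < ?",
        (cutoff,),
    ).rowcount
    conn.commit()
    return removed


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(SCHEMA)
now = datetime(2025, 1, 1, tzinfo=timezone.utc)
past = (now - timedelta(days=45)).isoformat()    # beyond the 30-day window
inside = (now - timedelta(days=5)).isoformat()   # still within the window
conn.executemany(
    "INSERT INTO identity_match VALUES (?, ?, ?)",
    [("m1", "u1", past), ("m2", "u2", inside)],
)
conn.executemany(
    "INSERT INTO identity_link_binding VALUES (?, ?, ?)",
    [("b1", "m1", past), ("b2", "m2", inside)],
)
conn.executemany(
    "INSERT INTO auxiliary_data VALUES (?, ?)", [("a1", "u1"), ("a2", "u2")]
)
removed = hard_delete_past_retention(conn, now)
```

With foreign keys enforced, attempting to delete a match before its bindings raises `sqlite3.IntegrityError`, which is why the bindings are removed first.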

GDPR Article 17: Right to Erasure

The GDPR grants data subjects the right to request the erasure of their personal data (Article 17, commonly known as the "right to be forgotten"). The eduID Wallet Matching Portal implements this through a dedicated external API endpoint.

Erasure API

DELETE /api/v1/identities/{internalIdentityId}

When this endpoint is called with a valid internal identity ID, the system performs an immediate and complete deletion of all data associated with that identity:

  1. All identity_match records for the given internal_identity_id are permanently deleted (hard delete, not soft delete).
  2. All identity_link_binding records associated with those matches are permanently deleted.
  3. All auxiliary_data records for the given identity_id are permanently deleted.

Unlike the inactivity-based cleanup process, GDPR erasure does not go through a soft-delete phase. The deletion is immediate and permanent, reflecting the urgency and legal weight of an Article 17 request.
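From the caller's side, an erasure request might be constructed as in the sketch below. Only the HTTP method and path come from this page; the base URL, the bearer-token `Authorization` header, and the function name `build_erasure_request` are assumptions, since the authentication scheme is not described here. The request is built but deliberately not sent.

```python
from urllib.parse import quote
from urllib.request import Request


def build_erasure_request(base_url: str, internal_identity_id: str, token: str) -> Request:
    """Build (but do not send) a DELETE request for the erasure endpoint."""
    # Percent-encode the identifier so it is safe as a single path segment.
    path = "/api/v1/identities/" + quote(internal_identity_id, safe="")
    request = Request(base_url.rstrip("/") + path, method="DELETE")
    # Assumption: bearer-token authentication; adjust to the deployment's scheme.
    request.add_header("Authorization", "Bearer " + token)
    return request


# Hypothetical host and token, for illustration only.
request = build_erasure_request("https://portal.example.org", "id/123", "example-token")
```

Because the operation is irreversible, a caller would typically pass the request to `urllib.request.urlopen` only after its own authorization checks have succeeded.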

Audit Trail

An audit event of type gdpr_erasure is logged with the following details:

  • The correlation_id linking the erasure to the API request that triggered it.
  • The subject_hash (HMAC-hashed, not plaintext) of the erased identity.
  • A detail JSON object recording the number of match records, bindings, and auxiliary data records that were deleted.

This audit record serves as evidence that the erasure was performed, supporting the GDPR accountability principle (Article 5(2)), which requires being able to demonstrate compliance. The audit record itself contains no plaintext personal data -- only hashed identifiers and operational counts.
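An audit event with these fields could be assembled as in the following sketch. The use of HMAC-SHA-256, the key handling, the count field names inside `detail`, and the function name `build_erasure_audit_event` are assumptions for illustration; the page specifies only that the subject is HMAC-hashed.

```python
import hashlib
import hmac
import json


def build_erasure_audit_event(subject_id, hmac_key, correlation_id, counts):
    """Return a gdpr_erasure audit event that carries no plaintext identifier."""
    # Assumption: HMAC-SHA-256; the actual algorithm is not stated on this page.
    subject_hash = hmac.new(
        hmac_key, subject_id.encode("utf-8"), hashlib.sha256
    ).hexdigest()
    return {
        "type": "gdpr_erasure",
        "correlation_id": correlation_id,
        "subject_hash": subject_hash,
        # Operational counts only, serialized as a JSON object.
        "detail": json.dumps(counts, sort_keys=True),
    }


event = build_erasure_audit_event(
    subject_id="student-42",                  # hypothetical identifier
    hmac_key=b"demo-key-not-for-production",  # hypothetical key material
    correlation_id="req-0001",                # hypothetical correlation id
    counts={"matches": 1, "bindings": 2, "auxiliary_data": 3},
)
```

The plaintext identifier is used only as HMAC input and never appears in the resulting event.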

Important Considerations

  • Irreversibility: GDPR erasure is irreversible. There is no undo mechanism. The API should be called only after appropriate authorization checks have been performed by the calling system.
  • Cascading deletion: The erasure cascades across all three data tables. There is no option for partial deletion; the entire identity is removed.
  • Concurrent access: If a wallet holder happens to be in the middle of a session when their erasure is processed, the session will fail gracefully. The reconciliation session itself is not deleted by the erasure endpoint (sessions are cleaned up by their own expiration mechanism), but any binding creation or update that the session attempts after the erasure will find no match record and will fail.

Auxiliary Data Expiration

Auxiliary data records support an optional expires_at timestamp. When set, this timestamp indicates that the data is only valid until a specific point in time. This is useful for time-limited data such as semester-specific enrollment information, temporary access grants, or certification validity periods.

Background Cleanup

A background job periodically executes the findExpiredAux query, which identifies auxiliary data records where expires_at is in the past. Expired records are permanently deleted.

Auxiliary data expiration is handled by the same cleanup job that manages session and inactive binding cleanup.

Use Cases

| Category | Example Expiration | Rationale |
|---|---|---|
| semester_enrollment | End of academic semester | Enrollment status is only valid for the duration of the semester. |
| temp_access_grant | 24 hours after issuance | Temporary access tokens should not persist beyond their validity. |
| certification_validity | Certificate expiry date | Professional certifications have defined validity periods. |
| course_registration | End of course period | Course-specific data is irrelevant after the course concludes. |

Auxiliary data without an expires_at value is retained indefinitely (subject to the identity-level inactivity detection and GDPR erasure mechanisms).
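The expiration rule can be sketched with plain in-memory records. The record layout and the function name `purge_expired` are hypothetical; the portal's `findExpiredAux` query is not shown on this page.

```python
from datetime import datetime, timedelta, timezone


def purge_expired(records, now):
    """Keep records with no expires_at or with an expires_at not yet in the past."""
    return [
        record
        for record in records
        if record["expires_at"] is None or record["expires_at"] >= now
    ]


now = datetime(2025, 1, 1, tzinfo=timezone.utc)
records = [
    # Expired a day ago: removed.
    {"category": "temp_access_grant", "expires_at": now - timedelta(days=1)},
    # Expires in the future: kept.
    {"category": "semester_enrollment", "expires_at": now + timedelta(days=90)},
    # No expiration set: kept until inactivity cleanup or GDPR erasure applies.
    {"category": "profile_note", "expires_at": None},
]
remaining = purge_expired(records, now)
```

Records without `expires_at` pass through untouched, matching the indefinite retention described above.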

Session Cleanup

Reconciliation sessions are inherently ephemeral. Each session has an expires_at timestamp set at creation (typically 5 to 15 minutes after creation, depending on configuration). A background job runs every 5 minutes to delete expired sessions.

sphereon:
  app:
    session-cleanup:
      interval-minutes: 5

The session cleanup process executes the deleteExpiredSessions query, which permanently removes all sessions where expires_at is in the past. This is a hard delete; there is no soft-delete phase for sessions, as they are transient by nature and contain no long-term value.

Session cleanup is particularly important because sessions may contain sensitive temporary data such as PKCE code verifiers and OIDC nonces. Prompt deletion of expired sessions limits the window during which this data is present in the database.

Crypto-Shredding

Crypto-shredding is the ultimate data destruction mechanism. All sensitive data in the database is either HMAC-hashed or encrypted with keys managed by the KMS (Key Management Service). If a key is destroyed, all data encrypted with that key becomes permanently irrecoverable, even if the database itself is fully intact.

How It Works

The system uses three domain-separated keys:

| Key | Purpose | Tables Affected |
|---|---|---|
| Key A | HMAC hashing of wallet/holder identifiers | identity_match.identifier_hash, identity_link_binding.holder_identifier_hash |
| Key B | HMAC hashing of institution identifiers | identity_match.identifier_hash (for institution-type identifiers), identity_link_binding.institution_identifier_hash |
| Key C | AES-256-GCM encryption of sensitive data | identity_link_binding.encrypted_institution_id, identity_link_binding.persisted_attributes_envelope, reconciliation_session.encrypted_identity, auxiliary_data.encrypted_payload |

Destroying Key C makes all encrypted institution identifiers, persisted attributes, session identity data, and auxiliary data payloads permanently unreadable. The HMAC hashes (keyed with Keys A and B) would remain. HMAC is a one-way function, so the hashes cannot be reversed to recover the original identifiers; however, while Keys A and B still exist, a known identifier can be re-hashed and matched against the stored values.

Destroying all three keys renders the entire database contents meaningless: the hashes cannot be correlated with any external identifiers, and the ciphertext cannot be decrypted. The database would contain only opaque strings with no recoverable information.
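The effect of destroying the HMAC keys can be sketched with the standard library alone. The `InMemoryKms` class, its method names, and the choice of HMAC-SHA-256 are assumptions for illustration; a real KMS keeps key material outside the application. The AES-256-GCM side (Key C) is omitted because Python's standard library has no AES implementation.

```python
import hashlib
import hmac
import secrets


class InMemoryKms:
    """Hypothetical stand-in for a KMS holding domain-separated HMAC keys."""

    def __init__(self):
        self._keys = {
            "A": secrets.token_bytes(32),  # wallet/holder identifiers
            "B": secrets.token_bytes(32),  # institution identifiers
        }

    def hmac_hex(self, key_name: str, value: str) -> str:
        key = self._keys.get(key_name)
        if key is None:
            raise KeyError("key " + key_name + " has been destroyed")
        # Assumption: HMAC-SHA-256; the actual algorithm is not stated here.
        return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()

    def destroy(self, key_name: str) -> None:
        # After this call the key material is gone from the sketch's "KMS".
        self._keys.pop(key_name, None)


kms = InMemoryKms()
identifier = "holder-123"  # hypothetical identifier
stored_hash = kms.hmac_hex("A", identifier)

# Domain separation: the same input hashed under Key B yields a different value.
other_domain_hash = kms.hmac_hex("B", identifier)

# While Key A exists, a known identifier can still be matched to the stored hash.
still_correlatable = hmac.compare_digest(stored_hash, kms.hmac_hex("A", identifier))

kms.destroy("A")
try:
    kms.hmac_hex("A", identifier)
    correlatable_after_destroy = True
except KeyError:
    correlatable_after_destroy = False
```

Once Key A is destroyed, the stored hash can no longer be recomputed from any candidate identifier, so it becomes an opaque string.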

When to Use Crypto-Shredding

Crypto-shredding is an extreme measure and should be considered only in scenarios such as:

  • Tenant decommissioning: When a tenant is being permanently removed from the system, destroying their keys provides a cryptographic guarantee that no residual data can be recovered.
  • Regulatory requirement: Some data protection frameworks may require the ability to demonstrate that data has been rendered permanently inaccessible, beyond what database deletion alone can provide.
  • Breach response: In the event of a suspected database compromise, destroying the encryption keys renders any exfiltrated ciphertext unreadable, provided the keys themselves were not also compromised.

Configuration Reference

The following table summarizes all data lifecycle configuration properties:

| Property | Default | Description |
|---|---|---|
| retention.inactive-days | 730 | Number of days of inactivity before a binding is soft-deleted. |
| retention.soft-delete-days | 30 | Number of days a soft-deleted record is retained before hard deletion. |
| session-cleanup.interval-minutes | 5 | How often the session cleanup background job runs. |
| Inactivity check interval | 60 minutes | How often the inactive binding cleanup job runs (hardcoded). |
| Session TTL | Configured per provider | How long a reconciliation session remains valid before expiring. |
| Auxiliary data expires_at | Set per record | Optional per-record expiration for auxiliary data. |