fix: redact pending primary email before retirement delete#229
fix: redact pending primary email before retirement delete#229ktyagiapphelix2u wants to merge 3 commits intorelease-ulmofrom
Conversation
There was a problem hiding this comment.
Pull request overview
This PR fixes a privacy gap in the account retirement flow by ensuring PendingEmailChange data is redacted before deletion, preventing downstream soft-deleted replicas (e.g., Snowflake via Fivetran) from retaining raw pending email and activation key values.
Changes:
- Added
PendingEmailChange.redact_pending_email_by_user_value(...)to overwritenew_emailandactivation_keyprior to removal. - Updated the retirement endpoint flow to call redaction before
delete_by_user_value. - Added/updated tests to verify redaction behavior and that redaction occurs before deletion.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
openedx/core/djangoapps/user_api/accounts/views.py |
Calls the new redaction helper before deleting PendingEmailChange records during retirement. |
openedx/core/djangoapps/user_api/accounts/tests/test_retirement_views.py |
Adds assertions and a dedicated test to validate redaction happens before deletion in the retirement flow. |
common/djangoapps/student/tests/test_models.py |
Adds unit tests for the new redaction helper’s behavior and no-op behavior when no record exists. |
common/djangoapps/student/models/user.py |
Implements the new redaction helper and updates the PII annotation to reflect “redacted then deleted”. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| for record in records_matching_user_value: | ||
| record.new_email = get_retired_email_by_email(record.new_email) | ||
| record.activation_key = uuid.uuid4().hex | ||
| record.save(update_fields=['new_email', 'activation_key']) |
There was a problem hiding this comment.
@ktyagiapphelix2u, Do you need to save activation_key?
| # Retire misc. models that may contain PII of this user. | ||
| # Redact pending primary email fields before delete because | ||
| # downstream replication can preserve soft-deleted snapshots. | ||
| PendingEmailChange.redact_pending_email_by_user_value(user, field="user") |
There was a problem hiding this comment.
@ktyagiapphelix2u , Although this may not matter much but reaction value is user here, can you please confirm?
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| pending_email_before_retirement = PendingEmailChange.objects.get(user=self.test_user).new_email | ||
| expected_retired_pending_email = get_retired_email_by_email(pending_email_before_retirement) | ||
|
|
||
| def _assert_redacted_then_delete(value, field): | ||
| pending_record = PendingEmailChange.objects.get(user=self.test_user) | ||
| assert pending_record.new_email == expected_retired_pending_email |
There was a problem hiding this comment.
The ordering test only verifies new_email is redacted before delete_by_user_value runs. Since the motivating issue includes activation_key retention in soft-deleted replicas, extend the assertion here to verify activation_key is also redacted/changed before deletion once the model helper is updated.
| pending_email_before_retirement = PendingEmailChange.objects.get(user=self.test_user).new_email | |
| expected_retired_pending_email = get_retired_email_by_email(pending_email_before_retirement) | |
| def _assert_redacted_then_delete(value, field): | |
| pending_record = PendingEmailChange.objects.get(user=self.test_user) | |
| assert pending_record.new_email == expected_retired_pending_email | |
| pending_email_record = PendingEmailChange.objects.get(user=self.test_user) | |
| pending_email_before_retirement = pending_email_record.new_email | |
| activation_key_before_retirement = pending_email_record.activation_key | |
| expected_retired_pending_email = get_retired_email_by_email(pending_email_before_retirement) | |
| def _assert_redacted_then_delete(value, field): | |
| pending_record = PendingEmailChange.objects.get(user=self.test_user) | |
| assert pending_record.new_email == expected_retired_pending_email | |
| assert pending_record.activation_key != activation_key_before_retirement |
| @classmethod | ||
| def redact_pending_email_by_user_value(cls, value, field): | ||
| """ | ||
| Redact pending email change fields for records matching ``field=value``. | ||
|
|
||
| This method is intended for retirement flows where downstream replication | ||
| may keep soft-deleted snapshots of these rows. | ||
| """ | ||
| filter_kwargs = {field: value} | ||
| records_matching_user_value = cls.objects.filter(**filter_kwargs) | ||
|
|
||
| if not records_matching_user_value.exists(): | ||
| return False | ||
|
|
||
| for record in records_matching_user_value: | ||
| record.new_email = get_retired_email_by_email(record.new_email) | ||
| record.save(update_fields=['new_email']) | ||
|
|
||
| return True |
There was a problem hiding this comment.
redact_pending_email_by_user_value only redacts new_email, but leaves activation_key unchanged. The PR description/incident explains Snowflake soft-deleted snapshots can retain both pending email and activation key values, so keeping activation_key intact still leaks a sensitive token. Suggest redacting activation_key as well (e.g., replace with a new random/unique value to satisfy the unique=True constraint) and update the method/docstring accordingly.
Summary
This change addresses a privacy issue in the retirement flow for users who have a pending primary email change.
Problem
When a user retires with an active row in student_pendingemailchange, LMS deletes that row.
Because Fivetran uses soft delete behavior, the row remains in Snowflake with _fivetran_deleted = true.
Before this fix, the soft-deleted Snowflake row could still contain raw pending email and activation key values.
Root Cause
Retirement path deleted PendingEmailChange directly without redacting sensitive fields first.
What Changed
Added a model helper to redact PendingEmailChange fields for a user before deletion.
Updated retirement flow to call redact first, then delete.
Added tests to verify redaction behavior and ordering.
Updated inline comments and PII annotation text to explicitly state redacted then deleted behavior.
Behavior Before
User retires with pending primary email
LMS deletes pending row
Snowflake soft-deleted row may still show raw pending email
Behavior After
User retires with pending primary email
LMS first redacts pending row values
LMS deletes row
Snowflake soft-deleted row contains redacted values only
Ticket & Reference
https://2u-internal.atlassian.net/browse/BOMS-499