Skip to content

feat: replace user identity strings with user IDs#38775

Open
ttak-apphelix wants to merge 17 commits into
openedx:masterfrom
ttak-apphelix:ttak/BOMS-641
Open

feat: replace user identity strings with user IDs#38775
ttak-apphelix wants to merge 17 commits into
openedx:masterfrom
ttak-apphelix:ttak/BOMS-641

Conversation

@ttak-apphelix

@ttak-apphelix ttak-apphelix commented Jun 17, 2026

Copy link
Copy Markdown
Contributor

Update application logging to prevent exposure of customer identity information in logs when SQUELCH_PII_IN_LOGS is enabled. Log records that currently include usernames, email addresses, or other user-identifying strings should instead use non-PII identifiers (for example, numeric user IDs) where appropriate.
This change should be applied consistently across the platform to reduce PII exposure in logs

Private JIRA ticket:
https://2u-internal.atlassian.net/browse/BOMS-641

@ttak-apphelix ttak-apphelix requested review from a team as code owners June 17, 2026 12:46
log.info(f'The user "{username}" has been added to the retirement pipeline \
by "{request.user}"')
if settings.FEATURES['SQUELCH_PII_IN_LOGS']:
log.info('A user has been added to the retirement pipeline')

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@robrap Should we keep request.user in this logger? request.user is the user who triggered the bulk retirement pipeline.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably just use the request user's id instead.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

Comment thread lms/djangoapps/instructor/views/api.py Outdated
})
log.warning('email %s already exist', email)
if settings.FEATURES['SQUELCH_PII_IN_LOGS']:
log.warning('email [REDACTED] already exist')

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the logger warning is a necessity to know which user's email is being referred here , then as a suggestion we can use it as:

log.warning(
    'email for user_id=%s already exists',
    user.id,
)

Just a suggestion upto you to decide, can we use this or the current change is enough?

Other places also may need some attention to see if user id can be used in the logger, so take a look at those also.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

Comment thread cms/djangoapps/course_creators/admin.py Outdated
except SMTPException:
log.warning("Failure sending 'pending state' e-mail for %s to %s", user.email, studio_request_email)
if settings.FEATURES['SQUELCH_PII_IN_LOGS']:
log.warning("Failure sending 'pending state' e-mail for user ID %s to [REDACTED]", user.id)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. I'd keep away from using "REDACTED". That's a nice automated solution if we were trying to have Datadog redact based on regex, or something. But when updating code, we have the ability to spell things out in english in the message.
  2. Also, I was wondering what studio_request_email was and how we'd spell that out. It turns out it is just a system email coming from a setting, so there is no reason to redact that email in the first place. Something this brings up, if we choose to not redact an email (something that looks like PII), what's a good way to annotate that? For now, you could just add a comment like:
# studio_request_email is a system email address, not PII, which can safely be logged.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

@robrap robrap left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't finish, so I might have missed other things, but I'd rather go through after clean-up of the types of issues I already pointed out. Thank you.

Comment thread cms/djangoapps/course_creators/admin.py Outdated
user.email_user(subject, message, studio_request_email)
except: # pylint: disable=bare-except
log.warning("Unable to send course creator status e-mail to %s", user.email)
except: # lint-amnesty, pylint: disable=bare-except

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As discussed, lint-amnesty should never be add manually. It was used in the past, and means that the disable pragma was necessary to make pylint happy at one point in time, but could possibly be fixed. If the disable pragma is what is wanted permanently, then lint-amnesty should actually be removed. There is an OEP around this that was since deleted, but this should be documented somewhere at some point. Maybe I'll open a ticket.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ttak-apphelix: I was going to document something, but then I realized there are only 3 usages left: https://github.com/search?q=repo%3Aopenedx%2Fopenedx-platform%20lint-amnesty&type=code. I imagine there were zero at some point, but they returned via copy/paste somehow.

Was there a doc or something you saw that had you add #lint-amnesty? I wonder if there is something we can correct, and if we can help get rid of the remaining usages, even if on a separate PR?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed lint-amnesty from here. Will create a separate PR to remove lint-amnesty across the project.

Comment thread cms/djangoapps/course_creators/admin.py Outdated
except: # pylint: disable=bare-except
log.warning("Unable to send course creator status e-mail to %s", user.email)
except: # lint-amnesty, pylint: disable=bare-except
if settings.FEATURES['SQUELCH_PII_IN_LOGS']:

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't we help deprecate the FEATURES dict? Shouldn't we be getting this setting from the top-level now?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, Updated

log.warning("User %s failed to enroll in non-existent course %s", user.id, str(course_key))
else:
log.warning("User %s failed to enroll in non-existent course %s", user.username, str(course_key))
raise NonExistentCourseError from err # lint-amnesty, pylint: disable=raise-missing-from

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I presume you resolved the raise-missing-from linting issue, so you probably don't need the pragma?

Suggested change
raise NonExistentCourseError from err # lint-amnesty, pylint: disable=raise-missing-from
raise NonExistentCourseError from err

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, removed pragma.

Comment on lines +757 to +770
if settings.FEATURES['SQUELCH_PII_IN_LOGS']:
log.warning(
"User %s failed to enroll in course %s because enrollment is closed (can_upgrade=%s).",
user.id,
str(course_key),
can_upgrade,
)
else:
log.warning(
"User %s failed to enroll in course %s because enrollment is closed (can_upgrade=%s).",
user.username,
str(course_key),
can_upgrade,
)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure how many places this would apply, but the following has a lot less redundancy, and thus changes to the log message are more likely to remain consistent.

Suggested change
if settings.FEATURES['SQUELCH_PII_IN_LOGS']:
log.warning(
"User %s failed to enroll in course %s because enrollment is closed (can_upgrade=%s).",
user.id,
str(course_key),
can_upgrade,
)
else:
log.warning(
"User %s failed to enroll in course %s because enrollment is closed (can_upgrade=%s).",
user.username,
str(course_key),
can_upgrade,
)
if settings.FEATURES['SQUELCH_PII_IN_LOGS']:
user_identifier_for_log = user.id
else:
user_identifier_for_log = user.username
log.warning(
"User %s failed to enroll in course %s because enrollment is closed (can_upgrade=%s).",
user_identifier_for_log,
str(course_key),
can_upgrade,
)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated across the PR

@@ -829,7 +863,10 @@ def enroll_by_email(cls, email, course_id, mode=None, ignore_errors=True):
return cls.enroll(user, course_id, mode)
except User.DoesNotExist:
err_msg = "Tried to enroll email {} into course {}, but user not found"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Not sure why err_msg was broken out and not using %s, but as written, this variable belongs in the else where it is used (if the message were left redundant).
  2. That said, what if you used email_for_log = email and email_for_log = '[REDACTED]', and have a single log?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactored to use correct syntax

raise ValueError(_('Unable to send email activation link. Please try again later.')) # pylint: disable=raise-missing-from # noqa: B904
if settings.FEATURES['SQUELCH_PII_IN_LOGS']:
log.error(
'Unable to send email activation link for user %s from a redacted address',

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think redacted address is misleading. It makes it sound like the address in the DB is redacted. I'd probably go with the simpler:

Suggested change
'Unable to send email activation link for user %s from a redacted address',
'Unable to send email activation link for user %s.',

But you could also use:

Suggested change
'Unable to send email activation link for user %s from a redacted address',
'Unable to send email activation link for user %s from email [REDACTED].',
  1. What is the from_address? Is this just a system address, and does it not even need to be redacted?
  2. Are there other places where we used lowercase "redacted", where the meaning might be misleading?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a system address. Reverted the message

try:
ace.send(msg)
except Exception: # pylint: disable=broad-except
log.warning('Unable to send confirmation email to old address', exc_info=True)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Why did this need to change? It wasn't including any user arguments?
  • Note: If you remove this, you could probably remove the # pylint: disable=too-many-statements that was added earlier as well.
  1. Are there other cases like this?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. My bad, reverted the changes
  2. Removed # pylint: disable=too-many-statements
  3. NO

except User.DoesNotExist:
log.exception(f'The user "{username}" does not exist.')
if settings.FEATURES['SQUELCH_PII_IN_LOGS']:
log.exception('A user does not exist for bulk retirement.')

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One idea is to start the loop with for index, username in enumerate(usernames_to_retire): (note: this is untested code), and update the message to something like:

Suggested change
log.exception('A user does not exist for bulk retirement.')
log.exception('Bulk retirement user at index %s does not exist.', index)

You could possibly add the index to the INFO log for consistency as well.

Alternatively, since we have all the other INFO logs, we could just use username_in_log='[REDACTED]', and someone could probably figure out which inputs weren't found.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

if not user.has_usable_password():
log.info(f'Goal Reminder User is disabled {user.username} course {goal.course_key}')
if settings.FEATURES['SQUELCH_PII_IN_LOGS']:
log.info(f'Goal Reminder User is disabled user ID {user.id} course {goal.course_key}')

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alternative example with less redundancy:

Suggested change
log.info(f'Goal Reminder User is disabled user ID {user.id} course {goal.course_key}')
log.info('Goal Reminder User is disabled user %s course %s', user_identifier_for_log, goal.course_key)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

if not is_course_passed(student, course):
log.info("User %s has not passed the course: %s", student.username, course_id)
if settings.FEATURES['SQUELCH_PII_IN_LOGS']:
log.info("User ID %s has not passed the course: %s", student.id, course_id)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another example...

Suggested change
log.info("User ID %s has not passed the course: %s", student.id, course_id)
log.info("User %s has not passed the course: %s", user_identifier_for_log, course_id)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated. Also updated at other places

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants