feat: replace user identity strings with user IDs#38775
Conversation
| log.info(f'The user "{username}" has been added to the retirement pipeline \ | ||
| by "{request.user}"') | ||
| if settings.FEATURES['SQUELCH_PII_IN_LOGS']: | ||
| log.info('A user has been added to the retirement pipeline') |
There was a problem hiding this comment.
@robrap Should we keep request.user in this logger? request.user is the user who triggered the bulk retirement pipeline.
There was a problem hiding this comment.
We should probably just use the request user's id instead.
| }) | ||
| log.warning('email %s already exist', email) | ||
| if settings.FEATURES['SQUELCH_PII_IN_LOGS']: | ||
| log.warning('email [REDACTED] already exist') |
There was a problem hiding this comment.
If the logger warning is a necessity to know which user's email is being referred here , then as a suggestion we can use it as:
log.warning(
'email for user_id=%s already exists',
user.id,
)
Just a suggestion upto you to decide, can we use this or the current change is enough?
Other places also may need some attention to see if user id can be used in the logger, so take a look at those also.
| except SMTPException: | ||
| log.warning("Failure sending 'pending state' e-mail for %s to %s", user.email, studio_request_email) | ||
| if settings.FEATURES['SQUELCH_PII_IN_LOGS']: | ||
| log.warning("Failure sending 'pending state' e-mail for user ID %s to [REDACTED]", user.id) |
There was a problem hiding this comment.
- I'd keep away from using "REDACTED". That's a nice automated solution if we were trying to have Datadog redact based on regex, or something. But when updating code, we have the ability to spell things out in english in the message.
- Also, I was wondering what
studio_request_emailwas and how we'd spell that out. It turns out it is just a system email coming from a setting, so there is no reason to redact that email in the first place. Something this brings up, if we choose to not redact an email (something that looks like PII), what's a good way to annotate that? For now, you could just add a comment like:
# studio_request_email is a system email address, not PII, which can safely be logged.
robrap
left a comment
There was a problem hiding this comment.
I didn't finish, so I might have missed other things, but I'd rather go through after clean-up of the types of issues I already pointed out. Thank you.
| user.email_user(subject, message, studio_request_email) | ||
| except: # pylint: disable=bare-except | ||
| log.warning("Unable to send course creator status e-mail to %s", user.email) | ||
| except: # lint-amnesty, pylint: disable=bare-except |
There was a problem hiding this comment.
As discussed, lint-amnesty should never be add manually. It was used in the past, and means that the disable pragma was necessary to make pylint happy at one point in time, but could possibly be fixed. If the disable pragma is what is wanted permanently, then lint-amnesty should actually be removed. There is an OEP around this that was since deleted, but this should be documented somewhere at some point. Maybe I'll open a ticket.
There was a problem hiding this comment.
@ttak-apphelix: I was going to document something, but then I realized there are only 3 usages left: https://github.com/search?q=repo%3Aopenedx%2Fopenedx-platform%20lint-amnesty&type=code. I imagine there were zero at some point, but they returned via copy/paste somehow.
Was there a doc or something you saw that had you add #lint-amnesty? I wonder if there is something we can correct, and if we can help get rid of the remaining usages, even if on a separate PR?
There was a problem hiding this comment.
removed lint-amnesty from here. Will create a separate PR to remove lint-amnesty across the project.
| except: # pylint: disable=bare-except | ||
| log.warning("Unable to send course creator status e-mail to %s", user.email) | ||
| except: # lint-amnesty, pylint: disable=bare-except | ||
| if settings.FEATURES['SQUELCH_PII_IN_LOGS']: |
There was a problem hiding this comment.
Didn't we help deprecate the FEATURES dict? Shouldn't we be getting this setting from the top-level now?
| log.warning("User %s failed to enroll in non-existent course %s", user.id, str(course_key)) | ||
| else: | ||
| log.warning("User %s failed to enroll in non-existent course %s", user.username, str(course_key)) | ||
| raise NonExistentCourseError from err # lint-amnesty, pylint: disable=raise-missing-from |
There was a problem hiding this comment.
I presume you resolved the raise-missing-from linting issue, so you probably don't need the pragma?
| raise NonExistentCourseError from err # lint-amnesty, pylint: disable=raise-missing-from | |
| raise NonExistentCourseError from err |
There was a problem hiding this comment.
Yes, removed pragma.
| if settings.FEATURES['SQUELCH_PII_IN_LOGS']: | ||
| log.warning( | ||
| "User %s failed to enroll in course %s because enrollment is closed (can_upgrade=%s).", | ||
| user.id, | ||
| str(course_key), | ||
| can_upgrade, | ||
| ) | ||
| else: | ||
| log.warning( | ||
| "User %s failed to enroll in course %s because enrollment is closed (can_upgrade=%s).", | ||
| user.username, | ||
| str(course_key), | ||
| can_upgrade, | ||
| ) |
There was a problem hiding this comment.
I'm not sure how many places this would apply, but the following has a lot less redundancy, and thus changes to the log message are more likely to remain consistent.
| if settings.FEATURES['SQUELCH_PII_IN_LOGS']: | |
| log.warning( | |
| "User %s failed to enroll in course %s because enrollment is closed (can_upgrade=%s).", | |
| user.id, | |
| str(course_key), | |
| can_upgrade, | |
| ) | |
| else: | |
| log.warning( | |
| "User %s failed to enroll in course %s because enrollment is closed (can_upgrade=%s).", | |
| user.username, | |
| str(course_key), | |
| can_upgrade, | |
| ) | |
| if settings.FEATURES['SQUELCH_PII_IN_LOGS']: | |
| user_identifier_for_log = user.id | |
| else: | |
| user_identifier_for_log = user.username | |
| log.warning( | |
| "User %s failed to enroll in course %s because enrollment is closed (can_upgrade=%s).", | |
| user_identifier_for_log, | |
| str(course_key), | |
| can_upgrade, | |
| ) |
There was a problem hiding this comment.
Updated across the PR
| @@ -829,7 +863,10 @@ def enroll_by_email(cls, email, course_id, mode=None, ignore_errors=True): | |||
| return cls.enroll(user, course_id, mode) | |||
| except User.DoesNotExist: | |||
| err_msg = "Tried to enroll email {} into course {}, but user not found" | |||
There was a problem hiding this comment.
- Not sure why
err_msgwas broken out and not using%s, but as written, this variable belongs in the else where it is used (if the message were left redundant). - That said, what if you used
email_for_log = emailandemail_for_log = '[REDACTED]', and have a single log?
There was a problem hiding this comment.
Refactored to use correct syntax
| raise ValueError(_('Unable to send email activation link. Please try again later.')) # pylint: disable=raise-missing-from # noqa: B904 | ||
| if settings.FEATURES['SQUELCH_PII_IN_LOGS']: | ||
| log.error( | ||
| 'Unable to send email activation link for user %s from a redacted address', |
There was a problem hiding this comment.
I think redacted address is misleading. It makes it sound like the address in the DB is redacted. I'd probably go with the simpler:
| 'Unable to send email activation link for user %s from a redacted address', | |
| 'Unable to send email activation link for user %s.', |
But you could also use:
| 'Unable to send email activation link for user %s from a redacted address', | |
| 'Unable to send email activation link for user %s from email [REDACTED].', |
- What is the
from_address? Is this just a system address, and does it not even need to be redacted? - Are there other places where we used lowercase "redacted", where the meaning might be misleading?
There was a problem hiding this comment.
It's a system address. Reverted the message
| try: | ||
| ace.send(msg) | ||
| except Exception: # pylint: disable=broad-except | ||
| log.warning('Unable to send confirmation email to old address', exc_info=True) |
There was a problem hiding this comment.
- Why did this need to change? It wasn't including any user arguments?
- Note: If you remove this, you could probably remove the
# pylint: disable=too-many-statementsthat was added earlier as well.
- Are there other cases like this?
There was a problem hiding this comment.
- My bad, reverted the changes
- Removed # pylint: disable=too-many-statements
- NO
| except User.DoesNotExist: | ||
| log.exception(f'The user "{username}" does not exist.') | ||
| if settings.FEATURES['SQUELCH_PII_IN_LOGS']: | ||
| log.exception('A user does not exist for bulk retirement.') |
There was a problem hiding this comment.
One idea is to start the loop with for index, username in enumerate(usernames_to_retire): (note: this is untested code), and update the message to something like:
| log.exception('A user does not exist for bulk retirement.') | |
| log.exception('Bulk retirement user at index %s does not exist.', index) |
You could possibly add the index to the INFO log for consistency as well.
Alternatively, since we have all the other INFO logs, we could just use username_in_log='[REDACTED]', and someone could probably figure out which inputs weren't found.
| if not user.has_usable_password(): | ||
| log.info(f'Goal Reminder User is disabled {user.username} course {goal.course_key}') | ||
| if settings.FEATURES['SQUELCH_PII_IN_LOGS']: | ||
| log.info(f'Goal Reminder User is disabled user ID {user.id} course {goal.course_key}') |
There was a problem hiding this comment.
Alternative example with less redundancy:
| log.info(f'Goal Reminder User is disabled user ID {user.id} course {goal.course_key}') | |
| log.info('Goal Reminder User is disabled user %s course %s', user_identifier_for_log, goal.course_key) |
| if not is_course_passed(student, course): | ||
| log.info("User %s has not passed the course: %s", student.username, course_id) | ||
| if settings.FEATURES['SQUELCH_PII_IN_LOGS']: | ||
| log.info("User ID %s has not passed the course: %s", student.id, course_id) |
There was a problem hiding this comment.
Another example...
| log.info("User ID %s has not passed the course: %s", student.id, course_id) | |
| log.info("User %s has not passed the course: %s", user_identifier_for_log, course_id) |
There was a problem hiding this comment.
updated. Also updated at other places
Update application logging to prevent exposure of customer identity information in logs when SQUELCH_PII_IN_LOGS is enabled. Log records that currently include usernames, email addresses, or other user-identifying strings should instead use non-PII identifiers (for example, numeric user IDs) where appropriate.
This change should be applied consistently across the platform to reduce PII exposure in logs
Private JIRA ticket:
https://2u-internal.atlassian.net/browse/BOMS-641