[PULP-578] Optimize filtering of repo versions by content#7817
Conversation
| if isinstance(content, models.QuerySet): | ||
| content_pks = list(content.values_list("pk", flat=True)) | ||
| else: | ||
| content_pks = list(content) |
There was a problem hiding this comment.
What is this branch for? (According to the docstring, content is always a queryset.)
There was a problem hiding this comment.
DistributionWithContentFilter.filter sends a list (versions = RepositoryVersion.objects.with_content([content.pk]).values_list("pk", flat=True)), so the docstring was already outdated. Updated now.
There was a problem hiding this comment.
values_list should still return a (lazily evaluated) queryset:
https://docs.djangoproject.com/en/6.0/ref/models/querysets/#values-list
There was a problem hiding this comment.
values_list is not what is passed in. [content.pk] is the argument to with_content(), and this is a plain list, not a queryset
There was a problem hiding this comment.
So I changed list(content) to just content as __overlap accepts any iterable.
EDIT: overlap only works on lists, list() is still needed
There was a problem hiding this comment.
I may be missing something. But the queryset from line 902 works with overlap perfectly, right?
There was a problem hiding this comment.
Yeah, overlap is happy with both querysets and lists (so not only with lists, I was wrong).
There was a problem hiding this comment.
So does that mean the branch is unnecessary?
There was a problem hiding this comment.
Well, my thinking is that composing querysets (being lazy) can reduce the back and forth between app and db. Calling "list" on a queryset evaluates it forcing the id's to be loaded into django.
(Whether leaving a much more complex query for postgres to digest is actually an optimization, i can only assume...)
35787ad to
fc896b9
Compare
|
Measured benefit by repeatedly adding/removing the same content unit into a repo (creating 2 RepositoryVersion and 1 RepositoryContent per cycle): Without patch: With patch: |
4462e89 to
5b75d44
Compare
| if not content_pks: | ||
| return self.none() |
There was a problem hiding this comment.
Now i wonder if these to lines give us anything.
In case of a queryset, the truthy test will probably call a premature evaluation defeating the benefit from composing SQL.
Maybe in the case of the parameter being an empty list of id's, this can spare us from makeing an SQL call altogether. But it should be part of the "else" branch then.
There was a problem hiding this comment.
All this still depends on what kind of argument we actually allow. I see you changed the docstring...
There was a problem hiding this comment.
Updated. We allow lists of PKs (this behavior existed before any of my changes) or querysets.
closes pulp#7799 Assisted By: Claude Opus 4.6
closes #7799
📜 Checklist
See: Pull Request Walkthrough