Skip to content

[PULP-578] Optimize filtering of repo versions by content#7817

Open
jobselko wants to merge 1 commit into
pulp:mainfrom
jobselko:7799
Open

[PULP-578] Optimize filtering of repo versions by content#7817
jobselko wants to merge 1 commit into
pulp:mainfrom
jobselko:7799

Conversation

@jobselko

Copy link
Copy Markdown
Member

closes #7799

📜 Checklist

  • Commits are cleanly separated with meaningful messages (simple features and bug fixes should be squashed to one commit)
  • A changelog entry or entries has been added for any significant changes
  • Follows the Pulp policy on AI Usage
  • (For new features) - User documentation and test coverage has been added

See: Pull Request Walkthrough

@jobselko jobselko self-assigned this Jun 25, 2026

@mdellweg mdellweg left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work.

Comment thread pulpcore/app/models/repository.py Outdated
if isinstance(content, models.QuerySet):
content_pks = list(content.values_list("pk", flat=True))
else:
content_pks = list(content)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this branch for? (According to the docstring, content is always a queryset.)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DistributionWithContentFilter.filter sends a list (versions = RepositoryVersion.objects.with_content([content.pk]).values_list("pk", flat=True)), so the docstring was already outdated. Updated now.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

values_list should still return a (lazily evaluated) queryset:

https://docs.djangoproject.com/en/6.0/ref/models/querysets/#values-list

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

values_list is not what is passed in. [content.pk] is the argument to with_content(), and this is a plain list, not a queryset

@jobselko jobselko Jun 26, 2026

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I changed list(content) to just content as __overlap accepts any iterable.

EDIT: overlap only works on lists, list() is still needed

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I may be missing something. But the queryset from line 902 works with overlap perfectly, right?

@jobselko jobselko Jun 26, 2026

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, overlap is happy with both querysets and lists (so not only with lists, I was wrong).

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So does that mean the branch is unnecessary?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, my thinking is that composing querysets (being lazy) can reduce the back and forth between app and db. Calling "list" on a queryset evaluates it forcing the id's to be loaded into django.

(Whether leaving a much more complex query for postgres to digest is actually an optimization, i can only assume...)

Comment thread pulpcore/app/models/repository.py Outdated
@jobselko jobselko force-pushed the 7799 branch 2 times, most recently from 35787ad to fc896b9 Compare June 25, 2026 12:57
@jobselko

Copy link
Copy Markdown
Member Author

Measured benefit by repeatedly adding/removing the same content unit into a repo (creating 2 RepositoryVersion and 1 RepositoryContent per cycle):

Without patch:
Response time grows with the number of cycles (e.g. ~0.5s at 100, ~2-3s at 700)

With patch:
Response time stays constant at ~0.45s regardless of the number of cycles

@jobselko jobselko force-pushed the 7799 branch 3 times, most recently from 4462e89 to 5b75d44 Compare June 29, 2026 10:50
Comment thread pulpcore/app/models/repository.py Outdated
Comment on lines +906 to +907
if not content_pks:
return self.none()

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now i wonder if these to lines give us anything.
In case of a queryset, the truthy test will probably call a premature evaluation defeating the benefit from composing SQL.
Maybe in the case of the parameter being an empty list of id's, this can spare us from makeing an SQL call altogether. But it should be part of the "else" branch then.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All this still depends on what kind of argument we actually allow. I see you changed the docstring...

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated. We allow lists of PKs (this behavior existed before any of my changes) or querysets.

closes pulp#7799
Assisted By: Claude Opus 4.6
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Use new memoized field in repository content API filtering

3 participants