Tag search optimization by evgenyfadeev · Pull Request #969 · ASKBOT/askbot-devel

evgenyfadeev · 2026-05-23T00:21:37Z

No description provided.

… JOINs

The previous implementation added one JOIN through the M2M for every filter tag:

    for tag in tags:
        qs = qs.filter(tags__name=tag)

With 6-8 tags this produces ~16 JOINs against thread_tags/tag and the planner degrades superlinearly. On a small dataset (~3000 posts), 5-8 tag intersection queries took 4-12 seconds.

The relational idiom for set intersection is one subquery counting matched tags per thread:

    SELECT thread_id FROM thread_tags
    WHERE tag_name IN (...)
    GROUP BY thread_id
    HAVING COUNT(DISTINCT tag_id) = N

Implemented via the auto-generated M2M through model (Thread.tags.through) with Count(distinct=True).

Semantics preserved: case-sensitive name match, AND across all tags. Language-code filtering remains handled upstream (via the per-tag Tag.objects.get(name__iexact=..., language_code=...) when TAG_SEARCH_INPUT_ENABLED is true).

Tests: extend ThreadTagModelsTests.test_run_adv_search_ANDing_tags with 4-tag and 6-tag assertions to exercise the new code path; the existing 1/2/3-tag assertions continue to verify semantics.

Production data on ask.wingware.com (2026-05-19): avg response time on tag-filter URLs dropped from 947ms to 112ms after this fix landed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
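The counting subquery described in this commit message can be tried in isolation. What follows is a minimal sketch using Python's built-in sqlite3 module, not the Django ORM code from the PR. The two-table schema (`tag`, `thread_tags`), the helper names `threads_with_all_tags` and `build_demo_db`, and the sample rows are all assumptions made for illustration; Askbot's real tables may differ.

```python
import sqlite3


def threads_with_all_tags(conn, tag_names):
    """Return ids of threads that carry every name in tag_names (AND semantics).

    Hypothetical helper: one GROUP BY / HAVING query instead of one JOIN per tag.
    """
    names = sorted(set(tag_names))  # distinct names, so the count can be reached
    if not names:
        raise ValueError("no tags given; the caller should skip the filter")
    placeholders = ", ".join("?" for _ in names)
    sql = f"""
        SELECT tt.thread_id
        FROM thread_tags AS tt
        JOIN tag AS t ON t.id = tt.tag_id
        WHERE t.name IN ({placeholders})
        GROUP BY tt.thread_id
        HAVING COUNT(DISTINCT tt.tag_id) = ?
        ORDER BY tt.thread_id
    """
    rows = conn.execute(sql, (*names, len(names))).fetchall()
    return [row[0] for row in rows]


def build_demo_db():
    """Create an in-memory database with an assumed, simplified schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE thread_tags (
            thread_id INTEGER NOT NULL,
            tag_id INTEGER NOT NULL
        );
        INSERT INTO tag (id, name) VALUES (1, 'python'), (2, 'django'), (3, 'sql');
        INSERT INTO thread_tags (thread_id, tag_id) VALUES
            (10, 1), (10, 2), (10, 3),
            (11, 1), (11, 2),
            (12, 3);
        """
    )
    return conn
```

A thread is returned only when the number of distinct matched tags equals the number of requested names, so the query cost no longer grows by two JOINs per extra tag. SQLite compares TEXT values case-sensitively by default, which matches the case-sensitive name match mentioned above.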

The subquery + HAVING COUNT form compares Count(DISTINCT tag_id) to len(tags), but unified_tags() concatenates query_tags + tags, so the same tag can appear twice (e.g. "#tag1" plus selector "tag1"). The inflated len() then exceeds any possible distinct count and no thread can match. Likewise, when every requested tag is unknown, the tag list is empty after the case-insensitive lookup, and the new query returns zero threads instead of leaving the result set unfiltered. Fix: deduplicate the names and skip the subquery when the list is empty.
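The two edge cases in this commit message can be illustrated without a database. The sketch below is an in-memory stand-in, not the PR's code: `dedupe_tag_names` and `filter_threads_by_tags` are hypothetical names, and the mapping from thread id to a set of tag names replaces the real queryset.

```python
def dedupe_tag_names(names):
    """Drop repeated names while keeping first-seen order."""
    return list(dict.fromkeys(names))


def filter_threads_by_tags(thread_tags, requested):
    """Return sorted ids of threads that have every requested tag.

    thread_tags: dict mapping thread id -> set of tag names (assumed shape).
    An empty request leaves the result unfiltered instead of matching nothing.
    """
    wanted = dedupe_tag_names(requested)
    if not wanted:
        # Skip the count comparison entirely: nothing to filter on.
        return sorted(thread_tags)
    wanted_set = set(wanted)
    target = len(wanted)  # distinct count, so duplicates cannot inflate it
    return sorted(
        thread_id
        for thread_id, tags in thread_tags.items()
        if len(tags & wanted_set) == target
    )
```

Without the deduplication step, a request of `["tag1", "tag1"]` would set the target to 2 while a thread can match at most 1 distinct tag, which is the "can never match" case described above.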

scripts/perf_tag_search.py drives /questions/tags:.../ with an increasing tag count against a running dev server and prints min/median/max timings; useful for measuring the JOIN-vs-subquery gap before/after the optimization.
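The measuring loop such a script needs is small. The following is a rough sketch of that idea, not the contents of scripts/perf_tag_search.py: the function names (`fetch`, `time_call`, `summarize`, `report`) are invented here, and the caller is assumed to supply the full tag-filter URL for each tag count, since the exact URL format is not reproduced in this page.

```python
import statistics
import time
import urllib.request


def fetch(url, timeout=30.0):
    """Download one page and discard the body (only the timing matters)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        response.read()


def time_call(func, repeats=5):
    """Call func repeatedly and return the elapsed seconds of each call."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        samples.append(time.perf_counter() - start)
    return samples


def summarize(samples):
    """Return (min, median, max) of the timing samples."""
    return min(samples), statistics.median(samples), max(samples)


def report(urls_by_tag_count, repeats=5):
    """Print one timing line per tag count.

    urls_by_tag_count: dict mapping number of tags -> URL to request
    (assumed input; the caller builds the URLs for the dev server).
    """
    for tag_count, url in sorted(urls_by_tag_count.items()):
        low, mid, high = summarize(time_call(lambda: fetch(url), repeats))
        print(
            f"{tag_count} tags: min={low * 1000:.0f}ms "
            f"median={mid * 1000:.0f}ms max={high * 1000:.0f}ms"
        )
```

Running `report` once on the old code and once on the new code, with the same URLs, would give the before/after comparison the commit message refers to.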

sdeibel and others added 4 commits May 19, 2026 14:40

Merge branch 'master' into tag-search-optimization

e93eb73

evgenyfadeev merged commit 9c79816 into master May 23, 2026
3 checks passed

evgenyfadeev merged 4 commits into master from tag-search-optimization

2 participants
