Claude/perf test fixes response size #45
Merged
The performance summary outgrew Slack's 3000-char per-section text limit when the runner started emitting per-outcome stats, so the incoming webhook silently rejected the POST and the human-readable summary stopped appearing in Slack. Split each notification across multiple section blocks at line boundaries and log non-2xx webhook responses so the next time this happens it isn't invisible.
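A minimal sketch of the splitting and logging described above, assuming a plain-text summary and Slack's Block Kit section shape. The function names (`split_into_sections`, `build_blocks`, `log_if_rejected`) are illustrative and are not taken from this PR.

```python
import logging

logger = logging.getLogger(__name__)

SLACK_SECTION_LIMIT = 3000  # per-section text limit mentioned in the commit message


def split_into_sections(text, limit=SLACK_SECTION_LIMIT):
    """Split text into chunks of at most `limit` characters at line boundaries."""
    chunks, current = [], ""
    for line in text.splitlines():
        # A single line longer than the limit cannot be kept whole; hard-split it.
        while len(line) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(line[:limit])
            line = line[limit:]
        candidate = line if not current else current + "\n" + line
        if len(candidate) > limit:
            # Adding this line would overflow: close the chunk, start a new one.
            chunks.append(current)
            current = line
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


def build_blocks(text):
    """Wrap each chunk in its own section block."""
    return [
        {"type": "section", "text": {"type": "mrkdwn", "text": chunk}}
        for chunk in split_into_sections(text)
    ]


def log_if_rejected(status_code, body):
    """Log a non-2xx webhook response; return True if it was rejected."""
    if not 200 <= status_code < 300:
        logger.error("Slack webhook returned %s: %s", status_code, body)
        return True
    return False
```

Splitting at line boundaries keeps each stats row intact inside one block, so a reader never sees a row cut in half between sections.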
get_ars_responses shared a single start_time across every child poll, so once one ARA exhausted MAX_ARA_TIME the next children entered their poll loop with the deadline already past and were recorded as timed out without ever being checked. Set start_time per child so each ARA is polled independently against the full MAX_ARA_TIME window.
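The per-child deadline can be sketched as follows. This is not the actual `get_ars_responses` code: `poll_children`, the `check` callable, and the injectable `clock` and `sleep` parameters are assumptions made so the example is self-contained, and the `MAX_ARA_TIME` value is a placeholder.

```python
import time

MAX_ARA_TIME = 60  # seconds; placeholder value, not taken from the PR


def poll_children(children, check, max_time=MAX_ARA_TIME,
                  clock=time.monotonic, sleep=time.sleep, interval=1.0):
    """Poll each child until check(child) returns a status or its own deadline passes."""
    results = {}
    for child in children:
        # The fix: start the timer per child, not once before the loop,
        # so a slow child cannot consume the budget of the ones after it.
        start_time = clock()
        results[child] = "timeout"
        while clock() - start_time < max_time:
            status = check(child)  # hypothetical: returns None while still running
            if status is not None:
                results[child] = status
                break
            sleep(interval)
    return results
```

With a single shared `start_time`, the `while` condition would already be false for every child after the first one that timed out, which matches the symptom described above.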
A query can come back with a successful status but a payload that isn't a full TRAPI message (e.g. an error body), and the summary had no way to surface that. Track the byte size of each completed query's final response and report per-outcome min/max/avg plus a distinct-size count, with an explicit warning when responses with the same outcome came back at different sizes. For ARS, the /trace poll only carries status metadata, so fetch the merged_version PK on completion and use that response's content length as the recorded size. ARAs already returned the final TRAPI directly, so no extra request is needed there. A custom QUERY event listener collects per-outcome sizes and pipes them through results into the result collector.
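The per-outcome size report could be computed roughly as below. The input shape (a list of `(outcome, size_in_bytes)` pairs) and the name `summarize_sizes` are assumptions for illustration; the PR's event listener and result collector are not reproduced here.

```python
from collections import defaultdict


def summarize_sizes(samples):
    """Group response sizes by outcome and compute min/max/avg and distinct count."""
    by_outcome = defaultdict(list)
    for outcome, size in samples:
        by_outcome[outcome].append(size)

    summary = {}
    for outcome, sizes in by_outcome.items():
        distinct = len(set(sizes))
        summary[outcome] = {
            "count": len(sizes),
            "min": min(sizes),
            "max": max(sizes),
            "avg": sum(sizes) / len(sizes),
            "distinct": distinct,
            # Same outcome but different sizes is the condition worth flagging.
            "warn": distinct > 1,
        }
    return summary
```

A warning under this rule does not prove a response is broken, since sizes can differ for legitimate reasons; it only marks the outcome as worth inspecting.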
The summary previously listed every distinct Locust failure row with its raw error message, which could push the message past Slack's 3000-char per-section limit on a noisy run. Print just the total occurrence count and the distinct-row count; the full failure bodies are already included in the uploaded performance_stats JSON.
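As a rough illustration of the compact failures line, assuming each distinct failure row is available as a dict with an `occurrences` count (a hypothetical shape, not necessarily how the PR reads Locust's stats):

```python
def failures_line(rows):
    """Summarize failure rows as total occurrences plus the number of distinct rows."""
    total = sum(row["occurrences"] for row in rows)
    return f"Failures: {total} occurrences across {len(rows)} distinct rows"
```

The line stays the same length no matter how many failure rows a noisy run produces, which is what keeps the section under the limit.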
The text summary in Slack didn't show the time-series shape of a run and (after moving failure details out of the summary) no longer captured distinct error types either. Locust already collects the data for both: env.runner.stats.history holds per-second snapshots, and locust.html.get_html_report renders the full UI page programmatically. For each performance target the runner now also returns the history list and the Locust HTML report. ResultCollector exposes a generator that yields (filename, bytes) pairs - a stacked-subplot PNG built with matplotlib (RPS+failures, p50/p95 response time, user count) and the Locust HTML report itself. main.py uploads each artifact via a new Slacker.upload_binary_file helper, isolating failures so one bad upload doesn't block the rest. The Slack text summary's failures line now points to the HTML report for the full Locust-identical failures table.
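The upload isolation described above might look like the following sketch. The `upload` callable stands in for `Slacker.upload_binary_file`, whose real signature is not shown in this PR text, and `upload_all` is a hypothetical name.

```python
import logging

logger = logging.getLogger(__name__)


def upload_all(artifacts, upload):
    """Upload each (filename, bytes) pair; one failure must not stop the rest."""
    failed = []
    for filename, data in artifacts:
        try:
            upload(filename, data)
        except Exception:
            # Record and log the failure, then continue with the next artifact.
            logger.exception("Upload failed for %s", filename)
            failed.append(filename)
    return failed
```

Because `artifacts` can be any iterable of pairs, a generator such as the one on ResultCollector can be passed in directly, and each artifact is built only when its turn comes.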
No description provided.