Fix SELECT detection and max_rows bypass for comment-prefixed SQL#87
Merged
FreeOnePlus merged 3 commits intoMay 21, 2026
Merged
Conversation
Replace brittle sql_upper.startswith() keyword check with cursor.description, which reliably detects any statement that returns a result set (SELECT, SHOW, DESCRIBE, EXPLAIN, CTEs, etc.) without maintaining a hardcoded list of SQL keywords.
Same root cause as apache#75: sql.upper().startswith("SELECT") at query_executor.py:689 returns False when SQL begins with a leading '--' or '/* */' comment, so the auto-injected LIMIT {max_rows} is silently skipped and large queries can be dispatched unbounded. This callsite runs before cursor.execute, so cursor.description is not available. Use a small helper that strips leading SQL comments before extracting the first keyword.
These tests pin the user-facing contract of DorisConnection.execute(): any SQL the driver reports as producing a result set must return its rows, regardless of how the statement is phrased. Guards against regression of: - Issue apache#62 Bug 5 (CTE / WITH returning empty data) - The leading-comment bug fixed in apache#75 (data: [] with row_count > 0 for SELECT prefixed by '--' or '/* */') - The same-root-cause LIMIT bypass fixed in the previous commit Also adds unit tests for the new get_first_sql_keyword helper.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #86. Continues #75.
Two related bugs, one root cause
sql.startswith("SELECT")at two callsites fails when SQL begins with a leading--or/* */comment.db.py:100data: []withrow_count > 0query_executor.py:689LIMIT {max_rows}silently skippedFixes
db.py(commit by @jonasbrami, cherry-picked from [fix] Use cursor.description to detect result-returning queries #75): usecursor.descriptioninstead of parsing SQL text. PEP 249 guarantees the driver populates it for any result-returning statement, eliminating the "missing-from-whitelist" bug class ([some bugs] #62 Bug 5, comments, parenthesized SELECT, etc.).query_executor.py: this runs beforecursor.execute, socursor.descriptionisn't available. Use a small helper (get_first_sql_keyword) that strips leading comments before extracting the first keyword.Regression tests (addresses @catpineapple's feedback on #75)
TestExecuteResultSetDetectionpins the user-facing contract: any SQL the driver reports as producing a result set must return its rows, regardless of phrasing. Parametrized across 12 positive cases (--comment,/* */, multiline block, mixed whitespace, parenthesized, CTE, comment+CTE,SHOW,DESC,EXPLAIN, …) and 4 negative cases (INSERT/UPDATE/DELETE/CREATE).Tests assert on the contract, not the implementation — they'll continue to guard whatever heuristic future maintainers choose.
Note to @jonasbrami
Your commit is preserved as-is at the base of this branch. If you'd prefer to continue #75 yourself, I'll close this PR — just let me know.