Skip to content

Fix SELECT detection and max_rows bypass for comment-prefixed SQL#87

Merged
FreeOnePlus merged 3 commits into
apache:masterfrom
mortalBibo:fix/select-detection-and-limit-bypass
May 21, 2026
Merged

Fix SELECT detection and max_rows bypass for comment-prefixed SQL#87
FreeOnePlus merged 3 commits into
apache:masterfrom
mortalBibo:fix/select-detection-and-limit-bypass

Conversation

@mortalBibo

Copy link
Copy Markdown
Contributor

Closes #86. Continues #75.

Two related bugs, one root cause

sql.startswith("SELECT") at two callsites fails when SQL begins with a leading -- or /* */ comment.

Callsite Symptom
db.py:100 data: [] with row_count > 0
query_executor.py:689 LIMIT {max_rows} silently skipped

Fixes

  1. db.py (commit by @jonasbrami, cherry-picked from [fix] Use cursor.description to detect result-returning queries #75): use cursor.description instead of parsing SQL text. PEP 249 guarantees the driver populates it for any result-returning statement, eliminating the "missing-from-whitelist" bug class ([some bugs] #62 Bug 5, comments, parenthesized SELECT, etc.).

  2. query_executor.py: this runs before cursor.execute, so cursor.description isn't available. Use a small helper (get_first_sql_keyword) that strips leading comments before extracting the first keyword.

Regression tests (addresses @catpineapple's feedback on #75)

TestExecuteResultSetDetection pins the user-facing contract: any SQL the driver reports as producing a result set must return its rows, regardless of phrasing. Parametrized across 12 positive cases (-- comment, /* */, multiline block, mixed whitespace, parenthesized, CTE, comment+CTE, SHOW, DESC, EXPLAIN, …) and 4 negative cases (INSERT/UPDATE/DELETE/CREATE).

Tests assert on the contract, not the implementation — they'll continue to guard whatever heuristic future maintainers choose.

Note to @jonasbrami

Your commit is preserved as-is at the base of this branch. If you'd prefer to continue #75 yourself, I'll close this PR — just let me know.

jonasbrami and others added 3 commits May 21, 2026 17:42
Replace brittle sql_upper.startswith() keyword check with
cursor.description, which reliably detects any statement that
returns a result set (SELECT, SHOW, DESCRIBE, EXPLAIN, CTEs, etc.)
without maintaining a hardcoded list of SQL keywords.
Same root cause as apache#75: sql.upper().startswith("SELECT") at
query_executor.py:689 returns False when SQL begins with a leading
'--' or '/* */' comment, so the auto-injected LIMIT {max_rows} is
silently skipped and large queries can be dispatched unbounded.

This callsite runs before cursor.execute, so cursor.description is not
available. Use a small helper that strips leading SQL comments before
extracting the first keyword.
These tests pin the user-facing contract of DorisConnection.execute():
any SQL the driver reports as producing a result set must return its
rows, regardless of how the statement is phrased.

Guards against regression of:
- Issue apache#62 Bug 5 (CTE / WITH returning empty data)
- The leading-comment bug fixed in apache#75 (data: [] with row_count > 0
  for SELECT prefixed by '--' or '/* */')
- The same-root-cause LIMIT bypass fixed in the previous commit

Also adds unit tests for the new get_first_sql_keyword helper.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

SELECT with leading comment returns empty data, and max_rows is silently bypassed for the same statements

3 participants