Fix SELECT detection and max_rows bypass for comment-prefixed SQL by mortalBibo · Pull Request #87 · apache/doris-mcp-server

mortalBibo · 2026-05-21T09:59:14Z

Closes #86. Continues #75.

Two related bugs, one root cause

sql.startswith("SELECT") at two callsites fails when SQL begins with a leading -- or /* */ comment.

Callsite	Symptom
`db.py:100`	`data: []` with `row_count > 0`
`query_executor.py:689`	`LIMIT {max_rows}` silently skipped

Fixes

db.py (commit by @jonasbrami, cherry-picked from [fix] Use cursor.description to detect result-returning queries #75): use cursor.description instead of parsing SQL text. PEP 249 guarantees the driver populates it for any result-returning statement, eliminating the "missing-from-whitelist" bug class ([some bugs] #62 Bug 5, comments, parenthesized SELECT, etc.).
query_executor.py: this runs before cursor.execute, so cursor.description isn't available. Use a small helper (get_first_sql_keyword) that strips leading comments before extracting the first keyword.

Regression tests (addresses @catpineapple's feedback on #75)

TestExecuteResultSetDetection pins the user-facing contract: any SQL the driver reports as producing a result set must return its rows, regardless of phrasing. Parametrized across 12 positive cases (-- comment, /* */, multiline block, mixed whitespace, parenthesized, CTE, comment+CTE, SHOW, DESC, EXPLAIN, …) and 4 negative cases (INSERT/UPDATE/DELETE/CREATE).

Tests assert on the contract, not the implementation — they'll continue to guard whatever heuristic future maintainers choose.

Note to @jonasbrami

Your commit is preserved as-is at the base of this branch. If you'd prefer to continue #75 yourself, I'll close this PR — just let me know.

Replace brittle sql_upper.startswith() keyword check with cursor.description, which reliably detects any statement that returns a result set (SELECT, SHOW, DESCRIBE, EXPLAIN, CTEs, etc.) without maintaining a hardcoded list of SQL keywords.

Same root cause as apache#75: sql.upper().startswith("SELECT") at query_executor.py:689 returns False when SQL begins with a leading '--' or '/* */' comment, so the auto-injected LIMIT {max_rows} is silently skipped and large queries can be dispatched unbounded. This callsite runs before cursor.execute, so cursor.description is not available. Use a small helper that strips leading SQL comments before extracting the first keyword.

These tests pin the user-facing contract of DorisConnection.execute(): any SQL the driver reports as producing a result set must return its rows, regardless of how the statement is phrased. Guards against regression of: - Issue apache#62 Bug 5 (CTE / WITH returning empty data) - The leading-comment bug fixed in apache#75 (data: [] with row_count > 0 for SELECT prefixed by '--' or '/* */') - The same-root-cause LIMIT bypass fixed in the previous commit Also adds unit tests for the new get_first_sql_keyword helper.

jonasbrami and others added 3 commits May 21, 2026 17:42

FreeOnePlus merged commit 3fe6811 into apache:master May 21, 2026

mortalBibo mentioned this pull request May 22, 2026

[fix] Use cursor.description to detect result-returning queries #75

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix SELECT detection and max_rows bypass for comment-prefixed SQL#87

Fix SELECT detection and max_rows bypass for comment-prefixed SQL#87
FreeOnePlus merged 3 commits into
apache:masterfrom
mortalBibo:fix/select-detection-and-limit-bypass

mortalBibo commented May 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

mortalBibo commented May 21, 2026

Two related bugs, one root cause

Fixes

Regression tests (addresses @catpineapple's feedback on #75)

Note to @jonasbrami

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants