{Misc.} Fix help % token handling for Python 3.14 argparse compatibility#300
{Misc.} Fix help % token handling for Python 3.14 argparse compatibility#300YangAn-microsoft wants to merge 1 commit intomicrosoft:devfrom
% token handling for Python 3.14 argparse compatibility#300Conversation
% token handling for Python 3.14 argparse compatibility% token handling for Python 3.14 argparse compatibility
|
/azp run |
|
Commenter does not have sufficient privileges for PR 300 in repo microsoft/knack |
|
Command 'ru' is not supported by Azure Pipelines. Supported commands
See additional documentation. |
|
Commenter does not have sufficient privileges for PR 300 in repo microsoft/knack |
|
/azp run |
|
Azure Pipelines successfully started running 1 pipeline(s). |
| class CLICommandParser(argparse.ArgumentParser): | ||
|
|
||
| @staticmethod | ||
| def _sanitize_help_for_argparse(help_text): |
There was a problem hiding this comment.
This function is a little bit too complex. A regular expression can do the job. AI response:
Yes — the whole
_sanitize_help_for_argparseloop can be replaced by a singlere.subwith a negative lookahead. The idea: match any%that is not followed by another%(already-escaped) and not followed by a%(name)xmapping placeholder, then replace it with%%.import re # Matches a single '%' that is NOT: # - followed by another '%' (already escaped: %%) # - the start of a mapping placeholder like %(default)s, %(prog)r, etc. _LITERAL_PERCENT_RE = re.compile(r'%(?!%|\([^()]+\)[A-Za-z])') @staticmethod def _sanitize_help_for_argparse(help_text): if not isinstance(help_text, str) or '%' not in help_text: return help_text return _LITERAL_PERCENT_RE.sub('%%', help_text)How the lookahead reads:
%%→ the%is followed by%, lookahead fails the "not" condition → left alone (the second%is then consumed on its own iteration; since it's followed by whatever came after%%, it usually matches and… wait — see caveat below).%(default)s→ the%is followed by(default)s, matches\([^()]+\)[A-Za-z]→ left alone.%Y,%,%,100%→ no match in lookahead → replaced with%%.Important caveat about
%%A naive single-pass
re.subwalks left-to-right and doesn't re-scan its own replacement, but it does advance past the matched%only. With the lookahead(?!%), the first%in%%is skipped, then the regex engine moves to the second%. That second%is followed by whatever comes next (e.g. end of string, a space,Y, etc.), so it would get doubled into%%, turning%%into%%%. ❌To handle this correctly with one regex, match the
%%pair explicitly and keep it, e.g. using an alternation + a callback:_PERCENT_RE = re.compile(r'%%|%\([^()]+\)[A-Za-z]|%') def _sanitize_help_for_argparse(help_text): if not isinstance(help_text, str) or '%' not in help_text: return help_text # Keep %% and %(name)x as-is; escape any other lone %. return _PERCENT_RE.sub(lambda m: m.group(0) if len(m.group(0)) > 1 else '%%', help_text)This is equivalent in behavior to the hand-written loop in the PR but ~6 lines instead of ~40, and arguably easier to audit. The alternation order matters:
%%and%(...)xare tried before the bare%fallback, so they "win" and are returned unchanged; only the lone%branch gets escaped.If you'd like, I can open a follow-up PR replacing the loop with this regex version (plus keeping the existing tests, which already cover
%(default)s,100%%, and%Y-%m-%d).
Test result:
import re
_PERCENT_RE = re.compile(r'%%|%\([^()]+\)[A-Za-z]|%')
def _sanitize_help_for_argparse(help_text):
if not isinstance(help_text, str) or '%' not in help_text:
return help_text
# Keep %% and %(name)x as-is; escape any other lone %.
return _PERCENT_RE.sub(lambda m: m.group(0) if len(m.group(0)) > 1 else '%%', help_text)
print(_sanitize_help_for_argparse("A string with single %, double %%, and placeholders %(prog)s. Test test."))Output:
A string with single %%, double %%, and placeholders %(prog)s. Test test.
There was a problem hiding this comment.
However, even this level of complexity is not necessary, as pointed out by #299 (comment), the sanitization can be achieved as below:
- If there is
%(or%%in the help string, leave the string as is, because this indicates the help string author is aware of the escaping rule. - If not, escape
%to%%, like AWS CLI did (Add support for Python 3.14 aws/aws-cli#9790).
It is not necessary to handle "a string with both % and %%". Just let it fail!
There was a problem hiding this comment.
AI comments on the above approach :
Yes, the comment is valid — and arguably the best approach for this PR. It trades a bit of edge-case coverage for a large reduction in complexity, with a clear and defensible rule.
The proposed rule
- If the help string contains
%(or%%, the author already knows about argparse's escaping → leave it alone.- Otherwise, blindly escape every
%→%%.Why it's valid
It's a "signal of intent" heuristic. The presence of
%(or%%is a strong indicator that the author wrote argparse-aware help. If they didn't use either, they almost certainly meant%literally (%Y-%m-%d,50% off,ratio 100%, etc.).Implementation collapses to ~5 lines, no regex needed:
@staticmethod def _sanitize_help_for_argparse(help_text): if not isinstance(help_text, str) or '%' not in help_text: return help_text if '%(' in help_text or '%%' in help_text: return help_text # author handled escaping themselves return help_text.replace('%', '%%')Covers all the realistic cases in the PR's tests:
Input Has %(/%%?Output argparse result default=%(default)s prog=%(prog)syes unchanged ✅ expands ratio 100%%yes unchanged ✅ renders 100%bad % tokenno bad %% token✅ renders bad % tokenExpected format: %Y-%m-%dno Expected format: %%Y-%%m-%%d✅ renders literally The conscious trade-off
Strings that mix both — e.g.
"Use %Y-%m-%d, default %(default)s"— fall through rule 1 and crash argparse. The comment explicitly says "just let it fail!" That's reasonable because:
- Such mixing is extremely rare in practice.
- The failure is loud at parser-construction time, not silent corruption.
- The author can fix it by escaping (
%%Y-%%m-%%d).How it compares to the regex approaches we discussed
Approach LOC Handles %Y+%(default)smixedCognitive load Hand-written loop (current PR) ~40 yes medium Alternation regex + lambda ~5 yes medium Lookbehind/lookahead regex ~5 partial ( %%%edge)high %(/%%heuristic (this comment)~5 no (by design) low So the comment is valid both technically and as a design judgment. I'd recommend adopting it, and updating the
n8test ('bad % token') to assert the escaped-rendering path, plus dropping then7/%%test's expectation since%%strings are now passed through untouched (which still renders as%via argparse — same observable result).
Summary
This PR fixes Knack help rendering for
%-based help strings to keep behavior stable with Python 3.14 argparse validation. Replaces #299 and optionally link any key discussion from #299.Why this is needed (Python 3.14)
Python 3.14 argparse performs stricter help-format validation.
Help text containing unescaped literal
%can raise formatting errors during help processing.Since Knack renders help through custom paths, we must handle these cases safely while preserving argparse placeholder semantics.
What changed
%sanitization for argparse-facing help text.%does not crash help rendering.%(default)s%(prog)s%%%fallback behaviorResulting behavior
%(default)sexpands to the argument default value.%(prog)sexpands to the parser program string.%%is rendered as%.%does not crash and is shown as%.Validation
245 passed).Scope
This is a focused compatibility and robustness fix for help rendering behavior under Python 3.14.