fix(parsers/php): extract procedural top-level + closure units; seed PHP entry points #76
Open
gadievron wants to merge 2 commits into
…PHP entry points
Fixes two extraction/seeding defects on the PHP analysis path, plus the PHP
entry-point-seeding gap they depend on. All changes are confined to the extraction
layer (parsers/php/function_extractor.py) and the entry-point detector
(utilities/agentic_enhancer/entry_point_detector.py); the PHP call-graph builder is
untouched.
1. Procedural top-level blackout:
_extract_functions_from_tree emitted units only for named definitions; top-level
procedural statements (assignments, echo, add_action(...) hook registrations)
fell through the catch-all else and produced NO unit, so a WordPress-style
plugin.php was invisible to reachability seeding. The Python parser has a
module-level synthesizer (extract_module_level_code -> unit_type='module_level');
PHP had none. Adds _extract_module_level_unit (called from process_file),
synthesising a <file>:__module__ unit from program-level statements. Handles
braceless + braced namespaces; emits nothing for files with no file-scope code.
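A minimal sketch of the module-level synthesis described in item 1, not the code in this PR. It assumes a simplified dict-based tree in place of the real parse tree; `SKIPPED_TYPES`, `collect_file_scope`, and `synthesize_module_unit` are hypothetical names, and the exact set of skipped node types is an assumption.

```python
from typing import Optional

# Hypothetical: node types that are not file-scope procedural code.
SKIPPED_TYPES = {
    "function_definition",
    "class_declaration",
    "interface_declaration",
    "trait_declaration",
    "namespace_use_declaration",
}


def collect_file_scope(statements: list) -> list:
    """Gather the source text of procedural statements at file scope."""
    collected = []
    for node in statements:
        kind = node["type"]
        if kind == "namespace_definition":
            # Braced namespace: its statements are nested children.
            # Braceless namespace: no children; the siblings that follow
            # are visited by this same loop.
            collected.extend(collect_file_scope(node.get("children", [])))
        elif kind not in SKIPPED_TYPES:
            collected.append(node["text"])
    return collected


def synthesize_module_unit(file_path: str, statements: list) -> Optional[dict]:
    """Return a <file>:__module__ unit, or None when there is no file-scope code."""
    code = collect_file_scope(statements)
    if not code:
        return None
    return {
        "name": f"{file_path}:__module__",
        "unit_type": "module_level",
        "code": "\n".join(code),
    }
```

A class-only file yields `None` under this sketch, which is the no-false-positive case the guard test covers.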
2. PHP entry-point seeding:
entry_point_detector USER_INPUT_PATTERNS / MODULE_LEVEL_INPUT_PATTERNS were
Python/JS-only, so a PHP handler reading $_POST was never an entry point (Check 3)
and the module_level unit could not fire Check 4. Adds PHP superglobals
($_GET/$_POST/$_REQUEST/$_COOKIE/$_SERVER/$_FILES/$_ENV/$_SESSION), php://input /
filter_input, and WordPress hook idioms (add_action/add_filter/do_action/
apply_filters) for the module-level path.
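A rough sketch of how the patterns listed in item 2 could be expressed, assuming regex matching over a unit's source text. The real shape of USER_INPUT_PATTERNS / MODULE_LEVEL_INPUT_PATTERNS is not shown here; `PHP_USER_INPUT`, `PHP_MODULE_LEVEL_HOOKS`, and both helper functions are hypothetical, as is the choice to let module-level code also match on the user-input patterns.

```python
import re

# Hypothetical: superglobals, php://input and filter_input as user-input markers.
PHP_USER_INPUT = re.compile(
    r"\$_(?:GET|POST|REQUEST|COOKIE|SERVER|FILES|ENV|SESSION)\b"
    r"|php://input"
    r"|\bfilter_input\s*\("
)

# Hypothetical: WordPress hook idioms, relevant to the module-level path only.
PHP_MODULE_LEVEL_HOOKS = re.compile(
    r"\b(?:add_action|add_filter|do_action|apply_filters)\s*\("
)


def reads_user_input(code: str) -> bool:
    """True if the unit's code touches a PHP user-input source."""
    return bool(PHP_USER_INPUT.search(code))


def is_module_level_entry(code: str) -> bool:
    """True if module-level code reads user input or registers/fires a hook."""
    return reads_user_input(code) or bool(PHP_MODULE_LEVEL_HOOKS.search(code))
```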
3. Anonymous closures + arrow functions as units:
anonymous_function / arrow_function nodes fell through the same else and were
never modeled; the named-definition walk also did not descend into function/method
bodies, so nested closures were unreachable. Adds a closure dispatch branch
(unit_type='closure', synthetic {closure@line:col} name) and makes
function_definition / method_declaration recurse into their bodies. The
closure-DISPATCH edge ($cb() -> closure) lives in call_graph_builder.py and is out
of this file's scope; this fixes only the extraction half.
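The closure extraction in item 3 can be sketched as follows, again over a simplified dict-based tree rather than the real parse tree. `collect_closures` is a hypothetical helper, and whether the line/column in the synthetic name are 0- or 1-based is an assumption not settled by this description.

```python
CLOSURE_TYPES = {"anonymous_function", "arrow_function"}


def collect_closures(node: dict, found: list = None) -> list:
    """Walk the whole tree, including function/method bodies, and emit closure units."""
    if found is None:
        found = []
    if node["type"] in CLOSURE_TYPES:
        line, col = node["start"]
        found.append({
            "name": f"{{closure@{line}:{col}}}",  # synthetic name, e.g. {closure@3:11}
            "unit_type": "closure",
        })
    # Always descend: closures nest inside named functions and inside other closures.
    for child in node.get("children", []):
        collect_closures(child, found)
    return found
```

Descending unconditionally is what makes a closure declared inside a method body visible; a walk that stops at named definitions would miss it.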
Out of scope (not fixed here):
- The use_declaration -> namespace_use_declaration node-type correction is already
handled by the existing PHP import-extraction code in _extract_imports; re-touching
it here would duplicate that change. Left untouched.
- Aliased `use Foo\Bar as Baz` -> alias-to-FQN translation lives in
call_graph_builder.py::_resolve_class_call (out of this file's scope). An
alias-capture in function_extractor alone would be unobservable (the only consumer
of the imports map is call_graph_builder.py) and would risk regressing
import-matching. No change made, since it would be a no-op.
Tests: tests/parsers/php/test_php_extractor.py (new; the package had no PHP
extractor tests). Modules loaded under unique importlib names (function_extractor.py
is a basename shared by every parser). Eight tests covering module_level synthesis,
no-false-positive on class-only files, PHP superglobal entry-point seeding (Check 3
+ Check 4), and closure/arrow-function unit extraction. 6 failed pre-fix (2 guard
tests green at base by design) -> 8 passed after the fix. ruff clean; full suite 184
passed / 63 skipped (suite excluding the new file is exactly 176/63 — zero
regression).
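One way to load same-named modules under unique names, as the Tests paragraph describes, is the standard importlib file-location API. This is a sketch under that assumption; `load_module_as` and the module names in the usage note are hypothetical and not taken from the test file.

```python
import importlib.util
import sys
from pathlib import Path


def load_module_as(unique_name: str, path: Path):
    """Load the file at `path` as a module registered under `unique_name`."""
    spec = importlib.util.spec_from_file_location(unique_name, path)
    module = importlib.util.module_from_spec(spec)
    # Register before executing so the module is resolvable by its unique name.
    sys.modules[unique_name] = module
    spec.loader.exec_module(module)
    return module
```

For example, `load_module_as("php_function_extractor", Path("parsers/php/function_extractor.py"))` would not collide with another parser's `function_extractor.py` loaded under a different name.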
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…th components
extract_all() skipped files whose ABSOLUTE path contained the substring 'tmp' (or
'vendor'/'node_modules'/'.git'/'.cache'). When the analyzed repo lives under such an
ancestor directory (e.g. a Linux /tmp working dir, as on CI, or any path with a
'template'-like segment), every file was wrongly excluded and zero functions were
extracted.
Match the excluded names against the path's components RELATIVE to the repo root
instead, so only the repo's own vendored/transient dirs are skipped, not ancestor
directories.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
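The relative-component check described in this commit can be sketched as below. `EXCLUDED_DIRS` and `is_excluded` are hypothetical names; the excluded names are the ones listed in the commit message, and POSIX-style paths are assumed for the illustration.

```python
from pathlib import PurePosixPath

# Names taken from the commit message; the constant itself is hypothetical.
EXCLUDED_DIRS = {"tmp", "vendor", "node_modules", ".git", ".cache"}


def is_excluded(file_path: str, repo_root: str) -> bool:
    """Exclude a file only if a component BELOW the repo root is an excluded name."""
    relative = PurePosixPath(file_path).relative_to(repo_root)
    # Exact component match, so ancestors of the repo root never count.
    return any(part in EXCLUDED_DIRS for part in relative.parts)
```

With a repo checked out at `/tmp/ci/repo`, `src/a.php` is kept while `vendor/lib/a.php` is still skipped.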