v1.8.0 updates: improve parser.c size, fix dotted statements edge cases and nested #IF 0 edge cases #41
Open
hkimura-intersys wants to merge 4 commits into
…nts with blocks, and dotted statements with tags

dotted statement fix includes:
- ability to parse multiline blocks (ex: if blocks within dotted statements)
- ability to parse dotted statements that are part of tags, as it is valid for a dotted statement to come after a tag
- ability to parse dotted statements that are not part of a do command; this is valid and compiles in IRIS even though it doesn't actually run

other scanner changes include:
- a comment is a statement termination for argumentless commands
- the special #IF 0 #else case is only considered for outer #IF 0 cases, as nested #IF 0 cases still require #ENDIF
- fixed parsing dotted statements within #IF cases
- separated argumentless statement termination tokens from termination tokens. _statement_termination is used for a singular statement, and _termination is used to terminate something that could have multiple statements, like an if command or dotted statement.

other changes:
- rewrote many parts of the grammar, so the parser.c size has now decreased from 75MB to 44MB.
Overview
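The key changes from this PR include:
- a decrease in parser.c size (from 75MB to 44MB)
- a dotted statements fix (multiline statements and tags)
- a fix for the nested #IF 0 edge case
- other scanner changes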
Testing
First, all test cases pass here (see workflow).
Additionally, I tested this on all .rtn and .mac files from //projects/sql/databases/sys/rtn/routine/ and //projects/sql/databases/sys/rtn/sql/. These tests are not included in this repo, as they are not open source, but this was done as part of my local testing.
Detailed Explanation of Changes
Parser.c size decrease
I was able to decrease the parser.c size mainly by:
… _statements_block, build_legacy_version).
All of these efforts were made in the core grammar, so there is still room for improvement in the udl grammar, specifically the keywords.js file. Although the parser.c size isn't notably larger in udl, the wasm file is, so that would be a good thing to optimize next.
Dotted Statements Fix (multiline statements and tags)
My first fix was to correctly parse dotted statements that include multiline commands, such as an if block inside a dotted statement.
Previously, the scanner expected each dotted statement to be contained on a single line.
Additionally, the parser now correctly handles dotted statements with tags: it is valid for a tag to come before a dotted statement.
Lastly, the parser can now handle dotted statements that are not part of a do command. In this scenario the statements after the dot never run, but since the code compiles, I decided to add support for it.
ex:
. w "hi"
I decided against treating this as a comment even though it doesn't run, because the compiler requires the content after the dot to be valid statements.
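To make the line shapes above concrete, here is a minimal C sketch of how a dot level could be counted on a single line. It is not taken from the actual scanner; `dot_level` and `body_offset` are hypothetical names, and the assumed line layout (an optional tag at column 0, whitespace, then dots optionally separated by spaces or tabs) is a simplification.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper, NOT the scanner.c implementation.
 * Assumed line shape: [tag] blanks ('.' blanks)* statements
 * Returns the number of leading dots and stores, in *body_offset,
 * the index where the statements start. */
int dot_level(const char *line, size_t *body_offset) {
    assert(line != NULL && body_offset != NULL);
    size_t i = 0;
    int level = 0;

    /* Assumption: a tag starts at column 0 with a letter, digit, or '%'
     * and runs until the first blank. */
    if (isalnum((unsigned char)line[0]) || line[0] == '%') {
        while (line[i] != '\0' && line[i] != ' ' && line[i] != '\t') {
            i++;
        }
    }

    /* Skip the blanks that separate the tag (or line start) from the dots. */
    while (line[i] == ' ' || line[i] == '\t') {
        i++;
    }

    /* Each dot adds one level; blanks between dots are allowed. */
    while (line[i] == '.') {
        level++;
        i++;
        while (line[i] == ' ' || line[i] == '\t') {
            i++;
        }
    }

    *body_offset = i;
    return level;
}
```

Under these assumptions, a tagged line and an untagged line with the same number of dots report the same level, and a bare `. w "hi"` line reports level 1 whether or not a do command precedes it.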
#IF 0 edge case fix
There is a special edge case of #IF: when the condition is 0 (#IF 0), you don't actually need an #ENDIF, and it is valid to terminate the block with #ELSE. However, this does not apply to nested #IF 0 scenarios. If there is an #IF 0 within an #IF 0, the nested #IF 0 still requires an #ENDIF. This rule has been added, along with tests that reflect it.
Note that everything within the outer #IF 0 is treated as a comment, as it will never be reached.
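The nesting rule can be sketched as a small depth counter in C. This is an illustration only, not the scanner's actual code: `is_directive` and `if0_region_end` are hypothetical names, and the sketch assumes that only a plain #IF opens a nested level and that directives are matched case-insensitively.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Does `line`, after leading blanks, start with the upper-case directive
 * `name` followed by a non-letter? Matching is case-insensitive. */
static int is_directive(const char *line, const char *name) {
    while (*line == ' ' || *line == '\t') {
        line++;
    }
    for (; *name != '\0'; line++, name++) {
        if (toupper((unsigned char)*line) != *name) {
            return 0;
        }
    }
    return !isalpha((unsigned char)*line);
}

/* Hypothetical helper: lines[0] is assumed to be the outer "#IF 0".
 * Returns the index of the line that ends its dead region, or -1. */
int if0_region_end(const char *const lines[], size_t count) {
    assert(lines != NULL);
    int depth = 0;
    for (size_t i = 0; i < count; i++) {
        if (is_directive(lines[i], "#IF")) {
            depth++;                      /* outer or nested #IF */
        } else if (is_directive(lines[i], "#ENDIF")) {
            if (--depth == 0) {
                return (int)i;            /* #ENDIF closes the outer #IF 0 */
            }
        } else if (is_directive(lines[i], "#ELSE") && depth == 1) {
            return (int)i;                /* #ELSE may end only the outer #IF 0 */
        }
        /* An #ELSE at depth > 1 does not close anything: the nested
         * #IF 0 still needs its own #ENDIF. */
    }
    return -1;
}
```

With the input `#IF 0`, `#IF 0`, `#ELSE`, `#ENDIF`, `#ELSE`, the sketch skips the inner #ELSE and reports the second #ELSE (index 4) as the end of the outer region.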

Other Scanner Changes
I rewrote all of the logic for _termination, _argumentless_command_end, BOL, _immediate_single_whitespace.., and _argumentless_loop, because these scenarios really should be evaluated together, and there were so many separate blocks that it was hard to trace the flow. With this rewrite, I also decided to change a few things:
- … _statement_termination. Before, I allowed either _argumentless_command_end or _termination, but that overcomplicated things and sometimes led to newlines and whitespace being wrongfully consumed. In all scenarios, _statement_termination is a zero-width token.
- … INLINE_COMMENT that represents a comment within a command, specifically a comment that usually sits between the if expression and the statements block that follows.
- … _XECUTE_ARG_INVALID and ZBREAK_DEVICE_TERMINATION; they didn't add anything but complexity.
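As a rough illustration of what "zero-width" means for a termination token, the C sketch below checks for a termination point without consuming any input. It is a simplified model, not the tree-sitter scanner itself: `Cursor` and `at_statement_termination` are hypothetical names, the comment starts it recognizes (`;` and `//`) are assumptions, and it inspects only the character at the current position.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a lexer position; not the tree-sitter API. */
typedef struct {
    const char *src;
    size_t pos;
} Cursor;

/* Zero-width check: the cursor is const, so this function can look at
 * the input but cannot advance past newlines, whitespace, or comments. */
int at_statement_termination(const Cursor *c) {
    assert(c != NULL && c->src != NULL);
    char ch = c->src[c->pos];

    if (ch == '\0' || ch == '\n' || ch == '\r') {
        return 1;                          /* end of input or end of line */
    }
    if (ch == ';') {
        return 1;                          /* assumed comment start */
    }
    if (ch == '/' && c->src[c->pos + 1] == '/') {
        return 1;                          /* assumed comment start */
    }
    return 0;
}
```

Because the check leaves the position untouched, whatever follows (the newline, or the comment itself) is still available to be matched by the next token.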