Build out component type section parsing#10
Conversation
Support is added for the majority of the <type> section and recursive parsing of inner components. Parser is now able to pass all cases from wasm-tools/types.wast and wasm-tools/definedtypes.wast, with expected WasmComponent structures added for equality assertions in each valid component case.
andreaTP
left a comment
There was a problem hiding this comment.
Skimmed through and looks again a very good step forward.
To effectively review this one, can I ask a direct link to the spec you are following?
| .build(); | ||
| } | ||
|
|
||
| private static WasmComponent case29() { |
There was a problem hiding this comment.
The idea here is to tie a given expected structure to a particular .wast test case. When wasm-tools json-from-wast is run, it splits each individual test case out into its own .wasm binary, such as input.29.wasm. These class and method names are meant to line up with the individual cases.
I think originally I was using the test case line numbers that are also part of the json output, but one minor change to a preceding case in the .wast could throw those numbers off and break things. There is still the chance of a case being added somewhere in the middle that would also throw off the case numbers, but this should be more stable overall. In my observation of the spec repo, these cases are all fairly stable at this point. Long term it could be better to have a more dynamic approach like what Endive has done, but this is working well for now.
| } | ||
| } | ||
|
|
||
| private static final class WasmToolsDefinedTypesAssertions { |
There was a problem hiding this comment.
is this intended to be a separate class?
There was a problem hiding this comment.
Yes. For the sake of organization, I've taken the approach of creating these *Assertions classes that each line up with a specific .wast file. This class contains the assertions that match up to spec-tests/wasm-tools/definedtypes.wast, etc. As I continue building these out, I'll probably start moving them into top level classes in the test source set to make them easier to track down and to keep ComponentParserTests itself from getting too huge.
| assertThat(actualComponent) | ||
| .usingRecursiveComparison() | ||
| .ignoringFields("customSections") | ||
| .ignoringFieldsMatchingRegexes(".*customSection\\.bytes.*") |
There was a problem hiding this comment.
noticing this, I believe that adding(basic) support for custom sections(if they are similar to Wasm core we don't need to "understand" the content) is less work than keeping ignoring them.
There was a problem hiding this comment.
Right, I would like to build that out eventually, but it seems less of a priority. Currently they are all being parsed as an UnknownCustomSection, and ignoring the content of the byte[] allows me to easily build out the other parts of the expected class structure.
The ones that I'm ignoring are all from this Name Section part of the spec. To properly build out the expected structure I'd need to populate it with all of the variable names used in the .wast text format of the test.
| return sort; | ||
| } | ||
|
|
||
| public static final class ID { |
There was a problem hiding this comment.
It definitely could be, and probably should be? I only did it this way to try to stay consistent with some of the code in the main Endive parser such as ValType, but as I look at that code some more I'm seeing there are probably some special reasons as for why ValType.ID is not itself an enum.
There was a problem hiding this comment.
I went ahead and did the conversion here and in DefValType to enums. I also renamed ID to Kind, which is a slightly more accurate name and is consistent with some of the other added classes such as ExternDesc (which I am considering breaking up into a class hierarchy the way I have with Alias and DefValType).
| this.id = id; | ||
| } | ||
|
|
||
| public static final class ID { |
Sure! The two most relevant documents are the Component Model AST Explainer and the Component Model Binary Format Explainer. In building out the parser, I spend probably 85% of the time looking at the Binary.md doc and just refer to the Explainer.md doc when I need more context. My main focus thus far has been on the |
andreaTP
left a comment
There was a problem hiding this comment.
This is an heavy lift! Thanks a lot for taking the time!
LGTM
It's less of a lift than the code makes it appear, as I've already written this once in my experimental personal project using Kaitai. This is mostly a porting exercise. And LLMs have gotten quite good at being able to handle the repetitive and tedious parts where there are obvious patterns being repeated. :) I got the Alias, Imports, Exports, and Core Module sections almost completed as well this weekend...will have another PR up for those sometime this week, just want to fill out more of the test cases first. After that, the biggest remaining part of the parsing is the Canonical Definitions section. |
Support is added for the type section and recursive parsing of inner components. Parser is now able to pass all cases from wasm-tools/types.wast and wasm-tools/definedtypes.wast, with expected
WasmComponentstructures added for equality assertions in each valid component case.The type section is the largest set of definitions from the spec, so this felt like another good place to pause and submit as a PR. Classes and their corresponding parser methods have been added to support all of the type section for the sake of thoroughness, but not all of them are being exercised by the parser tests yet. That should come naturally as I continue to work through more of the spec test suite.