Skip to content

Build out component type section parsing#10

Merged
andreaTP merged 3 commits into
roastedroot:mainfrom
jeremyg484:type-section-parsing
Jun 29, 2026
Merged

Build out component type section parsing#10
andreaTP merged 3 commits into
roastedroot:mainfrom
jeremyg484:type-section-parsing

Conversation

@jeremyg484

@jeremyg484 jeremyg484 commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

Support is added for the type section and recursive parsing of inner components. Parser is now able to pass all cases from wasm-tools/types.wast and wasm-tools/definedtypes.wast, with expected WasmComponent structures added for equality assertions in each valid component case.

The type section is the largest set of definitions from the spec, so this felt like another good place to pause and submit as a PR. Classes and their corresponding parser methods have been added to support all of the type section for the sake of thoroughness, but not all of them are being exercised by the parser tests yet. That should come naturally as I continue to work through more of the spec test suite.

Support is added for the majority of the <type> section and recursive
parsing of inner components. Parser is now able to pass all cases from
wasm-tools/types.wast and wasm-tools/definedtypes.wast, with expected
WasmComponent structures added for equality assertions in each valid
component case.

@andreaTP andreaTP left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Skimmed through and looks again a very good step forward.

To effectively review this one, can I ask a direct link to the spec you are following?

.build();
}

private static WasmComponent case29() {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

case29?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea here is to tie a given expected structure to a particular .wast test case. When wasm-tools json-from-wast is run, it splits each individual test case out into its own .wasm binary, such as input.29.wasm. These class and method names are meant to line up with the individual cases.

I think originally I was using the test case line numbers that are also part of the json output, but one minor change to a preceding case in the .wast could throw those numbers off and break things. There is still the chance of a case being added somewhere in the middle that would also throw off the case numbers, but this should be more stable overall. In my observation of the spec repo, these cases are all fairly stable at this point. Long term it could be better to have a more dynamic approach like what Endive has done, but this is working well for now.

}
}

private static final class WasmToolsDefinedTypesAssertions {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this intended to be a separate class?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. For the sake of organization, I've taken the approach of creating these *Assertions classes that each line up with a specific .wast file. This class contains the assertions that match up to spec-tests/wasm-tools/definedtypes.wast, etc. As I continue building these out, I'll probably start moving them into top level classes in the test source set to make them easier to track down and to keep ComponentParserTests itself from getting too huge.

assertThat(actualComponent)
.usingRecursiveComparison()
.ignoringFields("customSections")
.ignoringFieldsMatchingRegexes(".*customSection\\.bytes.*")

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

noticing this, I believe that adding(basic) support for custom sections(if they are similar to Wasm core we don't need to "understand" the content) is less work than keeping ignoring them.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I would like to build that out eventually, but it seems less of a priority. Currently they are all being parsed as an UnknownCustomSection, and ignoring the content of the byte[] allows me to easily build out the other parts of the expected class structure.

The ones that I'm ignoring are all from this Name Section part of the spec. To properly build out the expected structure I'd need to populate it with all of the variable names used in the .wast text format of the test.

return sort;
}

public static final class ID {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this an enum?

@jeremyg484 jeremyg484 Jun 26, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It definitely could be, and probably should be? I only did it this way to try to stay consistent with some of the code in the main Endive parser such as ValType, but as I look at that code some more I'm seeing there are probably some special reasons as for why ValType.ID is not itself an enum.

@jeremyg484 jeremyg484 Jun 26, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went ahead and did the conversion here and in DefValType to enums. I also renamed ID to Kind, which is a slightly more accurate name and is consistent with some of the other added classes such as ExternDesc (which I am considering breaking up into a class hierarchy the way I have with Alias and DefValType).

this.id = id;
}

public static final class ID {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should it be an enum?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above.

@jeremyg484

jeremyg484 commented Jun 26, 2026

Copy link
Copy Markdown
Contributor Author

Skimmed through and looks again a very good step forward.

To effectively review this one, can I ask a direct link to the spec you are following?

Sure!

The two most relevant documents are the Component Model AST Explainer and the Component Model Binary Format Explainer. In building out the parser, I spend probably 85% of the time looking at the Binary.md doc and just refer to the Explainer.md doc when I need more context.

My main focus thus far has been on the <core:type> section (mostly in the previous PR) and <type> section (in this PR). Thus I've been spending most of my time in the Type Definitions section, with just the bare minimum from a few other of the definitions outside of there filled in as needed in order to pass the particular subset of .wast tests that I have chosen thus far.

@andreaTP andreaTP left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an heavy lift! Thanks a lot for taking the time!
LGTM

@andreaTP andreaTP merged commit 2c3d2a4 into roastedroot:main Jun 29, 2026
2 checks passed
@jeremyg484

jeremyg484 commented Jun 29, 2026

Copy link
Copy Markdown
Contributor Author

This is an heavy lift! Thanks a lot for taking the time!

It's less of a lift than the code makes it appear, as I've already written this once in my experimental personal project using Kaitai. This is mostly a porting exercise. And LLMs have gotten quite good at being able to handle the repetitive and tedious parts where there are obvious patterns being repeated. :)

I got the Alias, Imports, Exports, and Core Module sections almost completed as well this weekend...will have another PR up for those sometime this week, just want to fill out more of the test cases first.

After that, the biggest remaining part of the parsing is the Canonical Definitions section.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants