Add support for writing HTML literals using UTF-8 strings#13052

Closed
chsienki wants to merge 9 commits into main from chsienki/utf8-html-literals-refactor

@chsienki chsienki commented Apr 15, 2026

Member

Summary

Builds on #12848 by @DamianEdwards, refactoring the UTF-8 HTML literal detection to use a pipeline-friendly pre-computed map approach.

When a .cshtml page's @inherits base class has a callable WriteLiteral(ReadOnlySpan<byte>) overload, HTML literals are emitted as C# UTF-8 string literals ("..."u8), enabling direct binding to the byte-span overload and avoiding UTF-16→UTF-8 transcoding at runtime.
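
As a rough illustration of the binding described above, the following sketch shows how a "..."u8 literal selects a ReadOnlySpan<byte> overload while a regular string literal selects the string overload. DemoPage and its Calls list are hypothetical names invented for this example; they are not the real Razor base classes, and the actual generated code may look different. It assumes C# 11 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

var page = new DemoPage();

// A regular string literal binds to WriteLiteral(string).
page.WriteLiteral("<p>Hello</p>");

// A UTF-8 string literal has type ReadOnlySpan<byte>, so it binds to the
// byte-span overload and no transcoding is needed at the call site.
page.WriteLiteral("<p>Hello</p>"u8);

Console.WriteLine(string.Join(", ", page.Calls)); // prints "string:12, utf8:12"

// Hypothetical stand-in for a page base class (assumption, not a Razor type).
public class DemoPage
{
    public List<string> Calls { get; } = new();

    // Records the number of bytes the string would occupy once encoded as UTF-8.
    public void WriteLiteral(string value) =>
        Calls.Add($"string:{Encoding.UTF8.GetByteCount(value)}");

    // Receives bytes that are already UTF-8 encoded.
    public void WriteLiteral(ReadOnlySpan<byte> value) =>
        Calls.Add($"utf8:{value.Length}");
}
```

Both calls record the same byte count; the difference is that the second call receives bytes that were encoded at compile time.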

Key changes from the original PR

  • Pre-computed Utf8SupportMap instead of per-file probe compilations — the source generator extracts @inherits base type names from parsed syntax trees, combines with the declaration compilation to build a value-comparable map, and passes it to ProcessRemaining
  • No compilation reference in the project engine — the map flows through the incremental pipeline as pure data with value equality, so downstream stages only re-run when UTF-8 support results actually change
  • IUtf8WriteLiteralFeature engine feature with DefaultUtf8WriteLiteralFeature implementation backed by the map
  • Moved Utf8WriteLiteralDetectionPass to CSharp namespace

Tests

  • End-to-end source generator tests with baseline verification (u8 vs string literals)
  • Mixed files: two .cshtml files with different @inherits, only one uses UTF-8
  • Incremental switching: overload added → u8, removed → string, in a single test
  • No @inherits directive → string literals (default base class)
  • MVC integration tests for @inherits detection

Closes #8429

Implement auto-detection of UTF-8 WriteLiteral support for legacy .cshtml
code generation. When a page's @inherits base class has a callable
WriteLiteral(ReadOnlySpan<byte>) overload, HTML literals are emitted as
C# UTF-8 string literals ("..."u8).
@chsienki chsienki requested a review from a team as a code owner April 15, 2026 04:14
- FullyQualifiedInherits: namespaced type with fully-qualified @inherits
- ShortNameInherits_WithUsing: documents that short names don't resolve
  for UTF-8 detection (GetTypeByMetadataName requires full qualification)
- PartiallyQualifiedInherits: documents partial names don't resolve
- SwitchesWhenOverloadAddedOrRemoved: uses fresh drivers per edit step

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@chsienki chsienki marked this pull request as draft April 15, 2026 04:31
@chsienki

Member Author

Just realized a flaw in this that I need to fix. No need to review yet; I'll ping when it's ready.


// Build a map of which @inherits base types support UTF-8 WriteLiteral.
var utf8SupportMap = parsedDocuments
    .Select(static (item, _) => item.Item3.CodeDocument.GetInheritsDirectiveContent())

Member

Just to be horribly annoying, couldn't two different files have the same @inherits content, but different base types, due to @using directives?

Member Author

Yeah, I think that's possible. Given that we need to fully resolve the type anyway, I'll use the assembly-qualified type name as the key so they can't clash.

@davidwengier

Member

> No need to review yet

Too late! 😛

Though I think, based on the commit message, you found the same issue I did anyway :)

chsienki and others added 3 commits April 15, 2026 15:54
- Add GetInheritsDirectiveContent and GetUsingDirectives extension methods
  on RazorCodeDocument for extracting @inherits and @using directives
- Resolve short/aliased type names via augmented compilation with the
  document's @using directives when GetTypeByMetadataName fails
- Dual-lookup Utf8SupportMap: per-file (filePath -> FQN) + per-type
  (FQN -> bool) to handle same @inherits text resolving differently
- Use GetFullName() for metadata name formatting
- Call HasCallableUtf8WriteLiteralOverload via string overload to avoid
  cross-compilation symbol issues
- Add InheritsInfo nested record on DefaultUtf8WriteLiteralFeature
- Tests: short name with @using, alias via _ViewImports, file-level
  alias, alias shadowing (CS0576 graceful fallback), fully-qualified

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Build one probe syntax tree with namespace-scoped usings for all entries
that need resolution, instead of creating a separate augmented compilation
per entry. This reduces O(N) AddSyntaxTrees calls to O(1).

- Two-pass Create: fast path via GetTypeByMetadataName, then batch slow path
- ResolveTypeNamesWithUsings takes CSharpCompilation directly
- Split pipeline: extract @inherits first, then usings only for files that need it
- Rename GetInheritsDirectiveContent to GetInheritsDirectiveValue
- Make InheritsInfo fields non-nullable

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@chsienki chsienki marked this pull request as ready for review April 17, 2026 18:18
GetInheritsDirectiveValue() now searches import syntax trees when the
main document has no @inherits directive. The most specific _ViewImports
wins, and the page's own @inherits overrides everything.

Added tests for @inherits in _ViewImports (global and namespaced types)
and cascading _ViewImports with override precedence.
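
The precedence rule in this commit message can be sketched roughly as follows. ResolveInherits and the type names are hypothetical and exist only for this illustration; the sketch assumes the @inherits values from the import files are supplied ordered from least specific to most specific, and it is not the actual GetInheritsDirectiveValue implementation.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// @inherits values from _ViewImports files, least specific first.
// A null entry means that import file has no @inherits directive.
string?[] imports = { "MyApp.SiteBasePage", null, "MyApp.AdminBasePage" };

// No @inherits on the page: the most specific import wins.
Console.WriteLine(ResolveInherits(null, imports));               // MyApp.AdminBasePage

// The page's own @inherits overrides every import.
Console.WriteLine(ResolveInherits("MyApp.CustomPage", imports)); // MyApp.CustomPage

// Nothing anywhere: the default base class applies.
Console.WriteLine(ResolveInherits(null, Array.Empty<string?>()) ?? "(default base class)");

// Hypothetical helper (assumption): returns the effective @inherits value, or null.
static string? ResolveInherits(string? pageInherits, IReadOnlyList<string?> importsLeastToMostSpecific)
{
    if (pageInherits is not null)
    {
        return pageInherits;
    }

    // Walk from the most specific import back toward the least specific one.
    for (var i = importsLeastToMostSpecific.Count - 1; i >= 0; i--)
    {
        if (importsLeastToMostSpecific[i] is { } value)
        {
            return value;
        }
    }

    return null;
}
```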

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Comment on lines +69 to +71
private readonly ImmutableSortedDictionary<string, string> _fileToType;
// fully-qualified type name -> supports UTF-8
private readonly ImmutableSortedDictionary<string, bool> _typeSupport;

Member

Since these are only written to once, I wonder if a Dictionary might be better, for faster lookup.

Member Author

We need it for the equality checks. Dictionary doesn't have an ordering so you can't easily compare across two different iterations of the generator.
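
A minimal sketch of the point about ordering, assuming the map only needs element-wise value equality: because ImmutableSortedDictionary enumerates in key order, two maps built in different insertion orders can be compared with SequenceEqual. MapsEqual and the type names are hypothetical and are not taken from the PR.

```csharp
using System;
using System.Collections.Immutable;
using System.Linq;

var empty = ImmutableSortedDictionary.Create<string, bool>(StringComparer.Ordinal);

// The same entries added in different orders, as might happen across generator runs.
var first = empty.Add("MyApp.Utf8Page", true).Add("MyApp.PlainPage", false);
var second = empty.Add("MyApp.PlainPage", false).Add("MyApp.Utf8Page", true);

// One entry flipped: downstream stages should see this as a change.
var changed = second.SetItem("MyApp.PlainPage", true);

Console.WriteLine(MapsEqual(first, second));  // True
Console.WriteLine(MapsEqual(first, changed)); // False

// Hypothetical helper (assumption): sorted enumeration makes a pairwise comparison sufficient.
static bool MapsEqual(
    ImmutableSortedDictionary<string, bool> a,
    ImmutableSortedDictionary<string, bool> b) =>
    a.Count == b.Count && a.SequenceEqual(b);
```

With a plain Dictionary, enumeration order is not guaranteed, so the same pairwise comparison would not be reliable without sorting first.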

var results = new List<(int, string)>();

// Build a single probe tree with namespace-scoped usings for each entry.
var sb = new StringBuilder();

Member

Pool?

CancellationToken cancellationToken)
{
if (_utf8Feature is null ||
!codeDocument.FileKind.IsLegacy() ||

Member

So, as per my comment below in the resolved thread, is this restriction necessary? It feels like only supporting .cshtml and requiring an explicit @inherits directive just make the feature harder to use without any benefit. Are we worried the ASP.NET base classes will add a compatible overload that doesn't mean "write a literal"?

@DamianEdwards DamianEdwards Apr 18, 2026

Member

> Are we worried the ASP.NET base classes will add a compatible overload that doesn't mean "write a literal"?

I'm certainly not. I do agree it feels restrictive to require that an explicit @inherits is present rather than just the default base class for the view type (MVC view or Razor Page). While we could relax this in the future if and when dotnet/aspnetcore#65605 is done, I'm not sure what benefit the guard based on @inherits adds.

As for support in .razor files, that's a whole other world as the rendering model of Razor Components is fundamentally different to that of .cshtml in that it's inherently delayed rendering, not immediate, and as such isn't a good fit for ReadOnlySpan<byte>. Not to say .razor couldn't be updated to realize a benefit, but the mechanics are just completely different, i.e. mapping a literal span to a write method vs. a method that inserts a node in the render tree.

In both the case of MVC and Blazor, it's almost certain that the UTF8 literal would need to be passed as ref via ReadOnlyMemory<byte> instead of a span and we'd need to do more work in the Razor Compiler to emit code that suits that in the most optimal (and safe) way.

Member Author

@davidwengier Apologies, I missed that comment (about requiring an explicit @inherits) on the previous thread.

We certainly don't have to limit this to @inherits. I just wanted to start with the smallest change possible that we can open up in the future. Right now, we know that the default base class doesn't support it, so it didn't seem worth checking it. I put in a comment a couple of lines down to that effect.

If we update the base classes to support it, it would need a new TFM, which would need a new SDK, which would mean a newer generator, so I don't think there is a scenario where the base class would get enabled and a user couldn't pick it up.

That being said, if we want to just start checking it up front, we can do so.

@DamianEdwards

Member

@chsienki tried this out with Razor Slices and it works great! Found a bug in my existing code that pre-empted this feature too 😄

@DamianEdwards

DamianEdwards commented Apr 22, 2026

Member

@chsienki OK, I think I found a bug: views that inherit from generic base classes with a WriteLiteral(ReadOnlySpan<byte>) method don't trigger the detection and thus don't get UTF8 literals.

Can be worked around by putting a non-generic type in the inheritance chain (aspnet/Benchmarks@cc5ab3f)

chsienki and others added 3 commits April 23, 2026 10:06
The slow path for resolving @inherits type names previously skipped
entries with no Razor @using directives. Since .cshtml files always
have default MVC imports, this filter was ineffective. Removing it
ensures types resolvable via C# global usings or the compilation's
existing context are not missed.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The slow path now uses GetFullMetadataName() which builds a proper CLR
metadata name (with backtick arity for generics and + for nested types)
instead of GetFullName() which produces C# display syntax that cannot
be resolved by GetTypeByMetadataName.

Added tests for generic base classes (single and multiple type params),
generics in namespaces, nested generics, and generics from metadata
references.
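
To illustrate the name format this commit message refers to, the sketch below prints CLR metadata names using System.Reflection rather than Roslyn. It is only an analogy: the PR's GetFullMetadataName() helper is not reproduced here, and the types are arbitrary BCL examples chosen for this illustration. The backtick arity and the + separator are the same conventions the commit message describes.

```csharp
using System;
using System.Collections.Generic;

// Generic types carry a backtick followed by the number of type parameters.
Console.WriteLine(typeof(List<>).FullName);        // System.Collections.Generic.List`1
Console.WriteLine(typeof(Dictionary<,>).FullName); // System.Collections.Generic.Dictionary`2

// Nested types are separated from their containing type by '+', not '.'.
Console.WriteLine(typeof(Environment.SpecialFolder).FullName); // System.Environment+SpecialFolder

// A lookup by metadata name succeeds when the name is in this format.
var resolved = Type.GetType("System.Collections.Generic.List`1");
Console.WriteLine(resolved == typeof(List<>)); // True
```

A C# display name such as List<T> does not follow this format, which matches the failure mode the commit describes for generic base classes.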

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@chsienki chsienki requested a review from davidwengier April 23, 2026 18:17
@chsienki

Member Author

Superseded by dotnet/roslyn#83457 -- ported onto dotnet/roslyn after the razor -> roslyn repo merge.

@chsienki chsienki closed this Apr 28, 2026

Development

Successfully merging this pull request may close these issues.

Add ability to opt-in to HTML literals being written as UTF8 string literals in generated class files

3 participants