Optimize JSON codec #1148

Merged
adwsingh merged 49 commits into main from adwsingh/json-perf
Apr 14, 2026

@adwsingh
Contributor

@adwsingh adwsingh commented Apr 9, 2026

  • Better JSON benchmarks
  • Reuse JsonMapSerializer instance instead of allocating per writeMap()
  • Override serialize() in JsonCodec with larger initial buffer
  • Close deserializer in deserializeShape to enable Jackson buffer recycling
  • Pre-compute SerializableString per Schema member for field name writes
  • Cache resolved TimestampFormatter per Schema in UseTimestampFormatTrait
  • Optimize fieldToMember cache with get-before-compute pattern
  • Replace InterceptingSerializer with flat struct serializer
  • Add Schema extension system with SPI and migrate JSON caches
  • Use bounded recycler pool for Jackson buffer recycling
  • Remove FAST_DOUBLE_WRITER
  • Add some more thread safety guards in DeferredRootSchema
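
For readers unfamiliar with the "get-before-compute" pattern named in the list above, here is a minimal sketch of the general technique. It is not the PR's code: `GetBeforeComputeCache` and its `loader` are hypothetical names, and the assumption is that the cache is a `ConcurrentHashMap`. The idea is to try a plain `get()` first, which is a lock-free read, and fall back to `computeIfAbsent()` only on a miss, since `computeIfAbsent()` may take a bin lock even when the key is already present (the exact behavior depends on the JDK version).

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical cache illustrating the get-before-compute pattern. */
final class GetBeforeComputeCache<K, V> {
    private final ConcurrentHashMap<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> loader;

    GetBeforeComputeCache(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K key) {
        // Fast path: a plain get() does not lock on a cache hit.
        V value = cache.get(key);
        if (value != null) {
            return value;
        }
        // Slow path: computeIfAbsent() applies the loader at most once per key,
        // even if several threads miss at the same time.
        return cache.computeIfAbsent(key, loader);
    }
}
```

The pattern pays off when hits dominate, for example when the same member is looked up on every request.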

Issue #, if available:

Description of changes:

  Benchmark                      (testCase)  Mode   Baseline                Jackson                Improvement  Faster    Smithy                 Improvement  Faster
  JsonBench.serialize                SIMPLE  avgt    445.018 ±    5.086     391.184 ±    4.997      -12.1%     1.14x      167.105 ±    5.994      -62.5%     2.66x
  JsonBench.serialize               COMPLEX  avgt   5235.102 ±  614.972    4872.618 ±   49.656       -6.9%     1.07x     2750.038 ±   63.010      -47.5%     1.90x
  JsonBench.deserialize              SIMPLE  avgt   1183.312 ±   31.967     631.371 ±   13.215      -46.7%     1.87x      207.856 ±    2.746      -82.4%     5.69x
  JsonBench.deserialize             COMPLEX  avgt  14428.343 ± 2318.861   10848.429 ±  184.464      -24.8%     1.33x     5177.113 ±  136.239      -64.1%     2.79x
  JsonBench.deserializeReversed      SIMPLE  avgt   1121.856 ±    5.531     637.508 ±   14.086      -43.2%     1.76x      305.524 ±    3.409      -72.8%     3.67x
  JsonBench.deserializeReversed     COMPLEX  avgt  13994.261 ±  255.641   11196.554 ± 1930.630      -20.0%     1.25x     5938.104 ±  145.732      -57.6%     2.36x
  JsonBench.roundtrip                SIMPLE  avgt   3372.301 ±   72.620    1024.011 ±   12.481      -69.6%     3.29x      388.857 ±    3.317      -88.5%     8.67x
  JsonBench.roundtrip               COMPLEX  avgt  22271.692 ±  220.388   16723.857 ±  406.419      -24.9%     1.33x     8373.092 ± 1043.459      -62.4%     2.66x
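
The PR does not say how the Improvement and Faster columns were derived. The numbers are consistent, to within about 0.1 percentage point, with both being measured against the Baseline column: Improvement as the relative change in average time, and Faster as Baseline divided by the new score. Assuming that reading, and assuming lower is better (which is what an average-time mode implies), the arithmetic can be sketched as follows. `BenchmarkColumns` is a hypothetical helper, not part of the PR.

```java
/** Hypothetical helper reproducing the derived columns of the benchmark table. */
final class BenchmarkColumns {
    /** Relative change in average time versus the baseline, in percent (negative means faster). */
    static double improvementPercent(double baseline, double candidate) {
        return (candidate - baseline) / baseline * 100.0;
    }

    /** Speedup factor: how many times faster the candidate is than the baseline. */
    static double speedup(double baseline, double candidate) {
        return baseline / candidate;
    }

    public static void main(String[] args) {
        // JsonBench.serialize, SIMPLE: Baseline vs. Jackson scores from the table.
        double baseline = 445.018;
        double jackson = 391.184;
        System.out.printf("%.1f%% %.2fx%n",
                improvementPercent(baseline, jackson),
                speedup(baseline, jackson));
    }
}
```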

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@adwsingh adwsingh force-pushed the adwsingh/json-perf branch from 032a008 to b6316c8 Compare April 9, 2026 15:51
@adwsingh adwsingh requested a review from mtdowling April 9, 2026 16:38
@timocov
Contributor

timocov commented Apr 9, 2026

Hey @adwsingh, I have a couple of questions:

  • does it mean that with this change it would be possible for custom integrations using json-codec (e.g. via http-binding module) to override certain things like a timestamp formatter (e.g. provide custom implementations for built-in formats or add custom formats if needed)?
  • it seems that the order of SchemaExtensionProviders matters, i.e. whichever provider returns a non-null value first wins? If so, how (if at all) would it be possible to change it so that a custom provider takes precedence?
  • with the caching of certain things internally, what would you suggest custom integrations avoid doing to prevent issues? For instance, to implement a custom serde for alloy#discriminated I had to 1) create another schema with just a "type: String" member (where the type is based on whatever value was provided to the trait), 2) override serialization of each generated union sub-type to be "serialize type; serialize members" instead of just "serialize struct", and 3) override deserialization of the whole union to be something like "read document; check the type; based on the type, deserialize members of its sub-type". Is it expected that something in such a flow might break?

@adwsingh
Contributor Author

@timocov great questions.

I can answer the first two.

does it mean that with this change it would be possible for custom integrations using json-codec (e.g. via http-binding module) to override certain things like a timestamp formatter (e.g. provide custom implementations for built-in formats or add custom formats if needed)?

Possibly, but that was not really the main motivation behind Schema Extensions.

What I have been running into more and more is the need to keep caches keyed by Schema, and then look them up multiple times within a single request. Schema Extensions are meant to address that by letting us compute data that depends only on the Schema once, and then attach it directly to the Schema itself.

it seems that the order of SchemaExtensionProviders matters, i.e. whichever provider returns a non-null value first wins? If so, how (if at all) would it be possible to change it so that a custom provider takes precedence?

I do not think so. Each SchemaExtensionProvider uses its own key to store a value on the Schema, and two SchemaExtensionKeys cannot have the same id.
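
To make the idea concrete, here is a rough sketch of how per-key extension slots on a schema could work. This is an illustration only, not the API added in this PR: `ExtKey`, `SketchSchema`, `extension()`, and `MAX_KEYS` are hypothetical names, and the real `SchemaExtensionKey` and `SchemaExtensionProvider` types may look quite different. It shows the two properties described above: a value that depends only on the schema is computed once and then read straight off the schema, and each key has a distinct id, so two providers never write to the same slot.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReferenceArray;
import java.util.function.Function;

/** Hypothetical key: every instance gets a distinct id, used as a slot index. */
final class ExtKey<T> {
    private static final AtomicInteger NEXT_ID = new AtomicInteger();
    final int id = NEXT_ID.getAndIncrement();
}

/** Hypothetical stand-in for a schema that carries one extension value per key. */
final class SketchSchema {
    // Arbitrary limit chosen for this sketch; a key id at or above it would throw.
    static final int MAX_KEYS = 16;

    final String name;
    private final AtomicReferenceArray<Object> slots = new AtomicReferenceArray<>(MAX_KEYS);

    SketchSchema(String name) {
        this.name = name;
    }

    @SuppressWarnings("unchecked")
    <T> T extension(ExtKey<T> key, Function<SketchSchema, T> provider) {
        // Fast path: the value was already computed for this schema.
        Object existing = slots.get(key.id);
        if (existing != null) {
            return (T) existing;
        }
        // Slow path: compute, then publish. If another thread published first,
        // discard our result and return the value that won the race.
        T computed = provider.apply(this);
        return slots.compareAndSet(key.id, null, computed) ? computed : (T) slots.get(key.id);
    }
}
```

With this shape, a lookup during a request is an array read rather than a map lookup keyed by Schema.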

@adwsingh adwsingh force-pushed the adwsingh/json-perf branch from f21b56a to 50aa1c3 Compare April 10, 2026 10:22
@adwsingh adwsingh force-pushed the adwsingh/json-perf branch from 50aa1c3 to 14a678e Compare April 14, 2026 01:27
@adwsingh adwsingh force-pushed the adwsingh/json-perf branch from 14a678e to 9930568 Compare April 14, 2026 02:00
@adwsingh adwsingh force-pushed the adwsingh/json-perf branch from 15aa1a7 to 9377301 Compare April 14, 2026 03:38
import java.util.concurrent.atomic.AtomicReferenceArray;

/**
* Virtual-thread-friendly buffer pool for JSON serialization.
Member

You say it's virtual-thread-friendly here, but the next paragraph says you don't use pooling in VTs. IME, buffer pooling is still useful even with VTs though (e.g., in the VT HTTP client I made). Did you benchmark this and see otherwise?

Contributor Author

It's VT-friendly in the sense that it doesn't cause a leak for VTs. We can't use this for VTs in its current form because it relies on thread IDs. We could use something like a ConcurrentLinkedDeque, similar to Jackson's RecyclerPool, but that is generally less performant on platform threads.

I am adding a TODO to better handle VTs.
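
For context on the trade-off discussed in this thread, here is a minimal sketch of a bounded pool that picks a slot from the current thread's ID. It is an assumption-laden illustration, not the class under review: `StripedBufferPool`, its constructor parameters, and the one-buffer-per-slot policy are all hypothetical. The only details taken from the thread are the use of `AtomicReferenceArray` and the reliance on thread IDs.

```java
import java.util.concurrent.atomic.AtomicReferenceArray;

/** Hypothetical bounded buffer pool that stripes slots by thread ID. */
final class StripedBufferPool {
    private final AtomicReferenceArray<byte[]> slots;
    private final int mask;
    private final int bufferSize;

    StripedBufferPool(int slotCount, int bufferSize) {
        if (slotCount <= 0 || Integer.bitCount(slotCount) != 1) {
            throw new IllegalArgumentException("slotCount must be a positive power of two");
        }
        this.slots = new AtomicReferenceArray<>(slotCount);
        this.mask = slotCount - 1;
        this.bufferSize = bufferSize;
    }

    private int slotIndex() {
        // Stripe by thread ID. A real implementation would also need a policy
        // for virtual threads (for example, bypassing the pool); that check is
        // omitted here so the sketch compiles on older JDKs.
        return (int) (Thread.currentThread().getId() & mask);
    }

    byte[] acquire() {
        // Take the pooled buffer for this slot if there is one, else allocate.
        byte[] pooled = slots.getAndSet(slotIndex(), null);
        return pooled != null ? pooled : new byte[bufferSize];
    }

    void release(byte[] buffer) {
        // Bounded: each slot holds at most one buffer. If the slot is already
        // occupied, the buffer is dropped and left to the garbage collector.
        slots.compareAndSet(slotIndex(), null, buffer);
    }
}
```

Because a slot never holds more than one buffer, the pool cannot grow without bound regardless of how many threads pass through it. A shared `ConcurrentLinkedDeque` avoids depending on thread IDs, at the cost of all threads contending on one structure.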

Comment thread core/src/main/java/software/amazon/smithy/java/core/schema/Schema.java Outdated
@adwsingh adwsingh force-pushed the adwsingh/json-perf branch from 1f3df85 to 8d3571c Compare April 14, 2026 19:54
@adwsingh adwsingh force-pushed the adwsingh/json-perf branch from 8d3571c to 7178198 Compare April 14, 2026 20:00
@adwsingh adwsingh enabled auto-merge (squash) April 14, 2026 20:11
@adwsingh adwsingh force-pushed the adwsingh/json-perf branch from 8350031 to c898175 Compare April 14, 2026 20:21
@adwsingh adwsingh merged commit 191e35c into main Apr 14, 2026
4 checks passed
@adwsingh adwsingh deleted the adwsingh/json-perf branch April 14, 2026 20:33

4 participants