8385651: HotCodeSampler crashes with JFR enabled by chadrako · Pull Request #31326 · openjdk/jdk

chadrako · 2026-05-29T20:38:38Z

When running with -XX:StartFlightRecording -XX:+HotCodeHeap the JVM crashes when both samplers attempt to suspend the same Java thread. By holding Threads_lock the HotCodeSampler is able to coordinate with other samplers. This change only effects the HotCodeCollector which is currently experimental so the change is low risk. All hot code tests under test/hotspot/jtreg/compiler/hotcode/ pass

I confirm that I make this contribution in accordance with the OpenJDK Interim AI Policy.

Progress

Change must not contain extraneous whitespace
Commit message must refer to an issue
Change must be properly reviewed (2 reviews required, with at least 1 Reviewer, 1 Author)

Issue

JDK-8385651: HotCodeSampler crashes with JFR enabled (Bug - P4)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/31326/head:pull/31326
$ git checkout pull/31326

Update a local copy of the PR:
$ git checkout pull/31326
$ git pull https://git.openjdk.org/jdk.git pull/31326/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 31326

View PR using the GUI difftool:
$ git pr show -t 31326

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/31326.diff

Using Webrev

Link to Webrev Comment

bridgekeeper · 2026-05-29T20:39:49Z

👋 Welcome back crakoczy! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

openjdk · 2026-05-29T20:40:41Z

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

openjdk · 2026-05-29T20:41:33Z

@chadrako The following label will be automatically applied to this pull request:

hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

openjdk · 2026-05-29T20:41:36Z

The total number of required reviews for this PR has been set to 2 based on the presence of this label: hotspot. This can be overridden with the /reviewers command.

mlbridge · 2026-05-29T20:51:53Z

Webrevs

01: Full - Incremental (81e0165a)
00: Full (75b3b049)

mgronlun · 2026-05-29T21:28:04Z

What is the HotCode sampler? How does it interact with JFR in ways that make the JVM crash?

Can you provide more details in the bug (full hs_err, for example)

mgronlun · 2026-05-29T21:28:22Z

/label add hotspot-jfr

openjdk · 2026-05-29T21:29:56Z

@mgronlun
The hotspot-jfr label was successfully added.

chadrako · 2026-05-29T22:40:57Z

What is the HotCode sampler?

The HotCodeSampler was added in JDK-8326205: Grouping frequently called C2 nmethods in CodeCache. It is responsible for sampling Java threads to find the hottest nmethods which are then grouped into a dedicated hot code heap.

How does it interact with JFR in ways that make the JVM crash?

JFR and the HotCodeSampler both use SuspendedThreadTask to sample the thread. JFR holds Threads_lock when it does this. HotCodeSampler does not. This can lead to a race where one tries to suspend an already suspended thread.

eastig · 2026-05-30T20:25:38Z

- * @bug 8383605
- * @summary test some out of bounds parameters for createInterleavedRaster()
+ * @summary Verify the HotCodeSampler and JFR do not attempt to suspend the same JavaThread and crash
+ * @requires vm.compiler2.enabled


* @requires vm.compiler2.enabled & vm.hasJFR

eastig · 2026-05-30T20:34:41Z


-/**
+/*
 * @test


missing @bug 8385651

mgronlun · 2026-05-31T09:30:10Z

What is the HotCode sampler?

The HotCodeSampler was added in JDK-8326205: Grouping frequently called C2 nmethods in CodeCache. It is responsible for sampling Java threads to find the hottest nmethods which are then grouped into a dedicated hot code heap.

How does it interact with JFR in ways that make the JVM crash?

JFR and the HotCodeSampler both use SuspendedThreadTask to sample the thread. JFR holds Threads_lock when it does this. HotCodeSampler does not. This can lead to a race where one tries to suspend an already suspended thread.

The JVM already has two profilers that report which code is hot: the default, out-of-the-box, “wall-clock” JFR sampler, and the newly introduced, experimental JFR CPUTime sampler.

Introducing a third “Hot Code” sampler is gratuitous. Code is duplicated, as it's clearly a copy of the default JFR sampler. Because it uses the same internal mechanisms as the default JFR sampler, the two samplers share resources and therefore cannot be run simultaneously—they are mutually exclusive. As a consequence, and most importantly, the runtime performance overhead of sampling is now duplicated: two samplers do the same, or at least very similar, work twice.

If you are interested in reporting what code is hot, I suggest you instead hook into the existing JFR samplers that already do this work and let them report back. If you are only interested in the top frame, you could have a callback placed at a single location:

jdk/src/hotspot/share/jfr/periodic/sampling/jfrThreadSampling.cpp

Line 188 in d3073b5

nmethod* const sampled_nm = sampled_cb->as_nmethod_or_null();

On the receiving end, using a ConcurrentHashTable, you will have your histogram immediately.

eastig · 2026-05-31T21:34:24Z

The JVM already has two profilers that report which code is hot: the default, out-of-the-box, “wall-clock” JFR sampler, and the newly introduced, experimental JFR CPUTime sampler.

Introducing a third “Hot Code” sampler is gratuitous. Code is duplicated, as it's clearly a copy of the default JFR sampler. Because it uses the same internal mechanisms as the default JFR sampler, the two samplers share resources and therefore cannot be run simultaneously—they are mutually exclusive. As a consequence, and most importantly, the runtime performance overhead of sampling is now duplicated: two samplers do the same, or at least very similar, work twice.

We had a good reason for the “Hot Code” sampler. JFR is designed to collect data and to provide it to an external observer. This is its main purpose. It is designed for observability use cases as a lightweight tool. I don't think its design considered internal uses.

The sampler is very simple. It does not have much code, less than 100 lines. It is not designed to provide data outside JVM. It does not need to do much work JFR does, e.g. constructing frames or traversing stacks. We don't want to pay for something we don't use. If I am correct, JFR will cost us 1-2% overhead.

If you are interested in reporting what code is hot, I suggest you instead hook into the existing JFR samplers that already do this work and let them report back. If you are only interested in the top frame, you could have a callback placed at a single location:

Thank you for the suggestion. I think the HotCodeCollector might detect JFR running and use it as the sampler instead.

We really want to have some sampling engine HotCodeCollector and JFR both can use. Is it possible to extract sampling functionality? I don't think hooking into JFR might be a good idea.

The HotCodeCollector is experimental. We can replace the current sampler with the better one at any time.

mgronlun · 2026-06-01T09:39:40Z

We had a good reason for the “Hot Code” sampler. JFR is designed to collect data and to provide it to an external observer. This is its main purpose. It is designed for observability use cases as a lightweight tool. I don't think its design considered internal uses.

The sampler is very simple. It does not have much code, less than 100 lines. It is not designed to provide data outside JVM. It does not need to do much work JFR does, e.g. constructing frames or traversing stacks. We don't want to pay for something we don't use. If I am correct, JFR will cost us 1-2% overhead.

But instead you will have the end user pay twice for two samplers that essentially produce the same data?

Those JFR ballpark numbers come when you are running with the full default.jfc and the profile.jfc configurations, respectively, but JFR is highly configurable; here, you only need to enable a single event. If you are concerned, you can also reduce the length of the stack traces to 1, like so:

-XX:FlightRecorderOptions=stackdepth=1 -XX:StartFlightRecording:settings=none,+jdk.ExecutionSample#enabled=true,+jdk.ExecutionSample#period=1ms

If you are interested in reporting what code is hot, I suggest you instead hook into the existing JFR samplers that already do this work and let them report back. If you are only interested in the top frame, you could have a callback placed at a single location:

Thank you for the suggestion. I think the HotCodeCollector might detect JFR running and use it as the sampler instead.

We really want to have some sampling engine HotCodeCollector and JFR both can use. Is it possible to extract sampling functionality? I don't think hooking into JFR might be a good idea.

The HotCodeCollector is experimental. We can replace the current sampler with the better one at any time.

The main problem I have is that the two implementations are mutually exclusive and so cannot run concurrently. JFR needs to wait, and this sampler needs to wait for JFR. I don't see any refactoring of sampler functionality in the near future. The Threads_lock is really too coarse to hold during sampling, and attempts have been made to remove it; see JDK-8358429. The problem with holding it during sampling is that it stalls safe pointing. That is one of the reasons the JFR samples only a subset of threads, whereas I see your sampler samples the entire set, which can take some time to complete.

Taking a step back, is the 'hot code' functionality really best addressed through coarse-grained sampling? Isn't this better addressed internally, using something like MethodData and the compiler's internals? What is lacking there? There is no tracking of method invocations for methods compiled at tier 4?

Hold Threads_lock to fix crash when running with JFR

75b3b04

openjdk Bot added the hotspot hotspot-dev@openjdk.org label May 29, 2026

chadrako marked this pull request as ready for review May 29, 2026 20:45

openjdk Bot added the rfr Pull request is ready for review label May 29, 2026

Remove unneeded JVM arg

81e0165

openjdk Bot added the hotspot-jfr hotspot-jfr-dev@openjdk.org label May 29, 2026

eastig reviewed May 30, 2026

View reviewed changes

Comment thread test/hotspot/jtreg/compiler/hotcode/HotCodeCollectorJFR.java

/**

/*

* @test

Copy link
Copy Markdown

Member

eastig May 30, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing @bug 8385651

Conversation

chadrako commented May 29, 2026 • edited by openjdk Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Progress

Issue

Reviewing

Uh oh!

bridgekeeper Bot commented May 29, 2026

Uh oh!

openjdk Bot commented May 29, 2026

Uh oh!

openjdk Bot commented May 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

openjdk Bot commented May 29, 2026

Uh oh!

mlbridge Bot commented May 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Webrevs

Uh oh!

mgronlun commented May 29, 2026

Uh oh!

mgronlun commented May 29, 2026

Uh oh!

openjdk Bot commented May 29, 2026

Uh oh!

chadrako commented May 29, 2026

Uh oh!

eastig May 30, 2026

Choose a reason for hiding this comment

Uh oh!

eastig May 30, 2026

Choose a reason for hiding this comment

Uh oh!

mgronlun commented May 31, 2026

Uh oh!

eastig commented May 31, 2026

Uh oh!

mgronlun commented Jun 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

3 participants

chadrako commented May 29, 2026 •

edited by openjdk Bot

Loading

openjdk Bot commented May 29, 2026 •

edited

Loading

mlbridge Bot commented May 29, 2026 •

edited

Loading

mgronlun commented Jun 1, 2026 •

edited

Loading