Skip to content

8385651: HotCodeSampler crashes with JFR enabled#31326

Open
chadrako wants to merge 2 commits into
openjdk:masterfrom
chadrako:JDK-8385651
Open

8385651: HotCodeSampler crashes with JFR enabled#31326
chadrako wants to merge 2 commits into
openjdk:masterfrom
chadrako:JDK-8385651

Conversation

@chadrako
Copy link
Copy Markdown
Contributor

@chadrako chadrako commented May 29, 2026

JDK-8385651

When running with -XX:StartFlightRecording -XX:+HotCodeHeap the JVM crashes when both samplers attempt to suspend the same Java thread. By holding Threads_lock the HotCodeSampler is able to coordinate with other samplers. This change only effects the HotCodeCollector which is currently experimental so the change is low risk. All hot code tests under test/hotspot/jtreg/compiler/hotcode/ pass



Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed (2 reviews required, with at least 1 Reviewer, 1 Author)

Issue

  • JDK-8385651: HotCodeSampler crashes with JFR enabled (Bug - P4)

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/31326/head:pull/31326
$ git checkout pull/31326

Update a local copy of the PR:
$ git checkout pull/31326
$ git pull https://git.openjdk.org/jdk.git pull/31326/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 31326

View PR using the GUI difftool:
$ git pr show -t 31326

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/31326.diff

Using Webrev

Link to Webrev Comment

@bridgekeeper
Copy link
Copy Markdown

bridgekeeper Bot commented May 29, 2026

👋 Welcome back crakoczy! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link
Copy Markdown

openjdk Bot commented May 29, 2026

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

@openjdk openjdk Bot added the hotspot hotspot-dev@openjdk.org label May 29, 2026
@openjdk
Copy link
Copy Markdown

openjdk Bot commented May 29, 2026

@chadrako The following label will be automatically applied to this pull request:

  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk
Copy link
Copy Markdown

openjdk Bot commented May 29, 2026

The total number of required reviews for this PR has been set to 2 based on the presence of this label: hotspot. This can be overridden with the /reviewers command.

@chadrako chadrako marked this pull request as ready for review May 29, 2026 20:45
@openjdk openjdk Bot added the rfr Pull request is ready for review label May 29, 2026
@mlbridge
Copy link
Copy Markdown

mlbridge Bot commented May 29, 2026

Webrevs

@mgronlun
Copy link
Copy Markdown

What is the HotCode sampler? How does it interact with JFR in ways that make the JVM crash?

Can you provide more details in the bug (full hs_err, for example)

@mgronlun
Copy link
Copy Markdown

/label add hotspot-jfr

@openjdk openjdk Bot added the hotspot-jfr hotspot-jfr-dev@openjdk.org label May 29, 2026
@openjdk
Copy link
Copy Markdown

openjdk Bot commented May 29, 2026

@mgronlun
The hotspot-jfr label was successfully added.

@chadrako
Copy link
Copy Markdown
Contributor Author

What is the HotCode sampler?

The HotCodeSampler was added in JDK-8326205: Grouping frequently called C2 nmethods in CodeCache. It is responsible for sampling Java threads to find the hottest nmethods which are then grouped into a dedicated hot code heap.

How does it interact with JFR in ways that make the JVM crash?

JFR and the HotCodeSampler both use SuspendedThreadTask to sample the thread. JFR holds Threads_lock when it does this. HotCodeSampler does not. This can lead to a race where one tries to suspend an already suspended thread.

* @bug 8383605
* @summary test some out of bounds parameters for createInterleavedRaster()
* @summary Verify the HotCodeSampler and JFR do not attempt to suspend the same JavaThread and crash
* @requires vm.compiler2.enabled
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

* @requires vm.compiler2.enabled & vm.hasJFR


/**
/*
* @test
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing @bug 8385651

@mgronlun
Copy link
Copy Markdown

What is the HotCode sampler?

The HotCodeSampler was added in JDK-8326205: Grouping frequently called C2 nmethods in CodeCache. It is responsible for sampling Java threads to find the hottest nmethods which are then grouped into a dedicated hot code heap.

How does it interact with JFR in ways that make the JVM crash?

JFR and the HotCodeSampler both use SuspendedThreadTask to sample the thread. JFR holds Threads_lock when it does this. HotCodeSampler does not. This can lead to a race where one tries to suspend an already suspended thread.

The JVM already has two profilers that report which code is hot: the default, out-of-the-box, “wall-clock” JFR sampler, and the newly introduced, experimental JFR CPUTime sampler.

Introducing a third “Hot Code” sampler is gratuitous. Code is duplicated, as it's clearly a copy of the default JFR sampler. Because it uses the same internal mechanisms as the default JFR sampler, the two samplers share resources and therefore cannot be run simultaneously—they are mutually exclusive. As a consequence, and most importantly, the runtime performance overhead of sampling is now duplicated: two samplers do the same, or at least very similar, work twice.

If you are interested in reporting what code is hot, I suggest you instead hook into the existing JFR samplers that already do this work and let them report back. If you are only interested in the top frame, you could have a callback placed at a single location:

nmethod* const sampled_nm = sampled_cb->as_nmethod_or_null();

On the receiving end, using a ConcurrentHashTable, you will have your histogram immediately.

@eastig
Copy link
Copy Markdown
Member

eastig commented May 31, 2026

The JVM already has two profilers that report which code is hot: the default, out-of-the-box, “wall-clock” JFR sampler, and the newly introduced, experimental JFR CPUTime sampler.

Introducing a third “Hot Code” sampler is gratuitous. Code is duplicated, as it's clearly a copy of the default JFR sampler. Because it uses the same internal mechanisms as the default JFR sampler, the two samplers share resources and therefore cannot be run simultaneously—they are mutually exclusive. As a consequence, and most importantly, the runtime performance overhead of sampling is now duplicated: two samplers do the same, or at least very similar, work twice.

We had a good reason for the “Hot Code” sampler. JFR is designed to collect data and to provide it to an external observer. This is its main purpose. It is designed for observability use cases as a lightweight tool. I don't think its design considered internal uses.

The sampler is very simple. It does not have much code, less than 100 lines. It is not designed to provide data outside JVM. It does not need to do much work JFR does, e.g. constructing frames or traversing stacks. We don't want to pay for something we don't use. If I am correct, JFR will cost us 1-2% overhead.

If you are interested in reporting what code is hot, I suggest you instead hook into the existing JFR samplers that already do this work and let them report back. If you are only interested in the top frame, you could have a callback placed at a single location:

Thank you for the suggestion. I think the HotCodeCollector might detect JFR running and use it as the sampler instead.

We really want to have some sampling engine HotCodeCollector and JFR both can use. Is it possible to extract sampling functionality? I don't think hooking into JFR might be a good idea.

The HotCodeCollector is experimental. We can replace the current sampler with the better one at any time.

@mgronlun
Copy link
Copy Markdown

mgronlun commented Jun 1, 2026

We had a good reason for the “Hot Code” sampler. JFR is designed to collect data and to provide it to an external observer. This is its main purpose. It is designed for observability use cases as a lightweight tool. I don't think its design considered internal uses.

The sampler is very simple. It does not have much code, less than 100 lines. It is not designed to provide data outside JVM. It does not need to do much work JFR does, e.g. constructing frames or traversing stacks. We don't want to pay for something we don't use. If I am correct, JFR will cost us 1-2% overhead.

But instead you will have the end user pay twice for two samplers that essentially produce the same data?

Those JFR ballpark numbers come when you are running with the full default.jfc and the profile.jfc configurations, respectively, but JFR is highly configurable; here, you only need to enable a single event. If you are concerned, you can also reduce the length of the stack traces to 1, like so:

-XX:FlightRecorderOptions=stackdepth=1 -XX:StartFlightRecording:settings=none,+jdk.ExecutionSample#enabled=true,+jdk.ExecutionSample#period=1ms

If you are interested in reporting what code is hot, I suggest you instead hook into the existing JFR samplers that already do this work and let them report back. If you are only interested in the top frame, you could have a callback placed at a single location:

Thank you for the suggestion. I think the HotCodeCollector might detect JFR running and use it as the sampler instead.

We really want to have some sampling engine HotCodeCollector and JFR both can use. Is it possible to extract sampling functionality? I don't think hooking into JFR might be a good idea.

The HotCodeCollector is experimental. We can replace the current sampler with the better one at any time.

The main problem I have is that the two implementations are mutually exclusive and so cannot run concurrently. JFR needs to wait, and this sampler needs to wait for JFR. I don't see any refactoring of sampler functionality in the near future. The Threads_lock is really too coarse to hold during sampling, and attempts have been made to remove it; see JDK-8358429. The problem with holding it during sampling is that it stalls safe pointing. That is one of the reasons the JFR samples only a subset of threads, whereas I see your sampler samples the entire set, which can take some time to complete.

Taking a step back, is the 'hot code' functionality really best addressed through coarse-grained sampling? Isn't this better addressed internally, using something like MethodData and the compiler's internals? What is lacking there? There is no tracking of method invocations for methods compiled at tier 4?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

hotspot hotspot-dev@openjdk.org hotspot-jfr hotspot-jfr-dev@openjdk.org rfr Pull request is ready for review

Development

Successfully merging this pull request may close these issues.

3 participants