Skip to content

8376561: Rewrite alloc-release-alloc sequences on Windows using VirtualAlloc2#30270

Closed
roberttoyonaga wants to merge 19 commits into
openjdk:masterfrom
roberttoyonaga:JDK-8376561
Closed

8376561: Rewrite alloc-release-alloc sequences on Windows using VirtualAlloc2#30270
roberttoyonaga wants to merge 19 commits into
openjdk:masterfrom
roberttoyonaga:JDK-8376561

Conversation

@roberttoyonaga
Copy link
Copy Markdown
Contributor

@roberttoyonaga roberttoyonaga commented Mar 16, 2026

Motivation and Goal

Previously on Windows, memory could not be (1) reserved then split up and (2) reserved then mapped to. Due to those limitations, there are places in Hotspot where the following race-y pattern happens on Windows:

  1. Reserve a large region in order to get an available address range.
  2. Release the whole region.
  3. Quickly re-reserve/map subregion(s) within the original region.

VirtualAlloc2 placeholders (available since Windows version 1803) removes those 2 limitations by introducing “placeholders”. “Placeholders” allow for reserving, then splitting or mapping into a region. However, they come with a few restrictions:

  • The entire placeholder must be replaced or released. You cannot operate on only part of it. This means you need to split it down to the size you want before mapping or committing into it.
  • There is no good way to check whether a region is in fact a placeholder. VirtualQuery cannot identify placeholder regions.
  • After a placeholder has been replaced, it can be converted back to a placeholder. But this is only possible if the entire placeholder is available (no committed or mapped parts).
  • Large pages are not compatible with placeholders (they are already committed).
  • You cannot assign NUMA nodes to placeholder regions.

The goal is to eliminate the race-y patterns previously required on Windows by using VirtualAlloc2 placeholders. This will require some changes to the os:: memory API and to the caller code in specific locations.

What’s been done in this PR

Infrastructure for dynamic loading of Windows memory API

The Windows API functions are dynamically loaded at startup at the end of os::init_2 in os_windows.cpp. This is when os::win32::VirtualAlloc2 and os::win32::MapViewOfFile3 are prepared. If the loading fails, the Windows implementation falls back to the old racey approach.

lookup_kernelbase_library and lookup_kernelbase_symbol have been moved from zSyscall_windows.cpp to os::win32:: to reduce code duplication.

Additions to the os:: memory API

os::reserve_placeholder_memory , os::split_memory , and os::convert_to_reserved have been introduced. See “API design” section for more details.

Removed the race in map_or_reserve_memory_aligned in os_windows.cpp

Previously, map_or_reserve_memory_aligned over-reserved a large range in order to get a range of available addresses, released it, then rounded up to the requested alignment and re-reserved an aligned subrange of the original range. On POSIX, mmap allows you to over-reserve then chop off the front of the reservation at an aligned address. This was not possible on windows.

Now, VirtualAlloc2 and MapViewOfFile3 allow us to request an alignment directly.

Remove race when reserving on NUMA nodes

Previously, if NUMA interleaving was enabled, pd_attempt_reserve_memory_at needed to call allocate_pages_individually. This resulted in reserving the desired range, releasing the whole range, then re-reserving subranges assigned one-by-one to a round robin of NUMA nodes.

Now, when VirtualAlloc2 is available, we can avoid that racy pattern by using os::reserve_with_numa_placeholder. This uses a placeholder to reserve the overall range, then uses os::split_memory to divide up the region (while everything remains reserved). The placeholder chunks are then replaced by real reservations assigned to the NUMA node round robin.

Deciding between 2 Approaches to the API

Two approaches were considered:
(A): Make all reserved regions splittable by default.
(B): Draw a distinction between placeholder regions and regular reserved regions.

(A) Seems to make the API surface much cleaner, but the limitations of Windows placeholders make the underlying implementation much more complex and end up imposing implicit rules about how callers can interact with the memory. So although the API surface appears cleaner, the actual API usage ends up being complex and not well defined.

(B) Increases the API surface because we need os::reserve_placeholder_memory and os::convert_to_reserved in addition to os::reserve_memory. However, this approach forms a clear distinction between placeholder regions and regular reserved regions. The API usages and intents are clear and well defined.

Please see my comment on https://bugs.openjdk.org/browse/JDK-8376561 explaining why (B) is the better approach in more detail.

The API Design

image

In general, callers will make the following calls in order:

PlaceholderRegion os::reserve_placeholder_memory(size_t bytes, MemTag mem_tag, bool executable, char* addr)

PlaceholderRegion os::split_memory(PlaceholderRegion& region, size_t offset)

char* os::convert_to_reserved(PlaceholderRegion region)

A placeholder is functionally distinct from a regular reserved region. It has both limitations and abilities that normal reserved regions don’t have. Designing the API functions to accept/return the PlaceholderRegion instance instead of a raw char* forces callers to treat them differently and enforces a conceptual distinction.

The returned PlaceholderRegion cannot be committed until it is converted to a regular reserved region. This allows os::reserve_memory, os::commit_memory, and os::uncommit_memory to remain unchanged. The pre-existing os:: code does not need to be placeholder-aware. This is very important because of two limitations (1) there is no good way to test whether a region is a placeholder vs regular reserved region (2) placeholders must be replaced in their entirety before being reserved/committed/released. Without the help of os::convert_to_reserved, existing os:: memory functions would need special handling for placeholders. This complicates and slows the common path (where the caller doesn’t care about placeholders).

This API design keeps a clear distinction between the placeholder side of the workflow and the regular reserved region side. We avoid needing to change the whole flow diagram, instead we just add new bits to the start of the diagram. Callers that don’t care about placeholders won’t be affected by them. This is hopefully cleaner and easier to reason about.

Naming

I originally was using the name “SplittableRegion”. I eventually renamed it to “PlaceholderRegion" because, later on, it would be good to use placeholders to replace the reserve-release-map race as well. The name “SplittableRegion” lends itself nicely to the reserve-split-commit workflow, but not so nicely to the reserve-map and reserve-split-map workflow.

Next steps

Use placeholders to resolve file mapping races

Improve attempt_map_memory_to_file_at and map_memory by taking advantage of placeholders. This will allow us to eliminate the reserve → release → map race.

image

Change CDS code to take advantage of placeholders

The CDS code uses the racey reserve → release → map subregion pattern on Windows. This could be replaced with reserve_placeholder memory → map_memory.

Testing

  • Added new tests to os_windows.cpp and test_os.cpp
  • make test TEST=gtest:os
  • make test TEST="test/hotspot/jtreg/runtime/cds"
  • Manual CDS test with log inspection
  • Tier 1
  • Tested on amd64 linux and 64 bit Windows. Not tested on AIX


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed (2 reviews required, with at least 1 Reviewer, 1 Author)

Issue

  • JDK-8376561: Rewrite alloc-release-alloc sequences on Windows using VirtualAlloc2 (Enhancement - P4)

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/30270/head:pull/30270
$ git checkout pull/30270

Update a local copy of the PR:
$ git checkout pull/30270
$ git pull https://git.openjdk.org/jdk.git pull/30270/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 30270

View PR using the GUI difftool:
$ git pr show -t 30270

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/30270.diff

Using Webrev

Link to Webrev Comment

@bridgekeeper
Copy link
Copy Markdown

bridgekeeper Bot commented Mar 16, 2026

👋 Welcome back roberttoyonaga! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link
Copy Markdown

openjdk Bot commented Mar 16, 2026

❗ This change is not yet ready to be integrated.
See the Progress checklist in the description for automated requirements.

@openjdk openjdk Bot changed the title 8376561 8376561: Rewrite alloc-release-alloc sequences on Windows using VirtualAlloc2 Mar 16, 2026
@openjdk openjdk Bot added the hotspot hotspot-dev@openjdk.org label Mar 16, 2026
@openjdk
Copy link
Copy Markdown

openjdk Bot commented Mar 16, 2026

@roberttoyonaga The following label will be automatically applied to this pull request:

  • hotspot

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

test working. need to focus on map

refactor and one more test

new test, comments, fix aix split memory accounting, use unified logging instead of warning()

remove map API code. Rename to PlacholderRegion. Fix os::release

remove map_memory API code

fix race and simplify AOTMetaspace::reserve_address_space_for_archives

clean up

comments

test: map_memory_to_file_aligned_larger

remove old test and modify tests

clean up

fix aix and comments

cleanup

gate fix

anothe rgate fix

fix windows test

account for wish address

fix missing NMT accounting

comments

small cleanup

trailing whitespaces
@roberttoyonaga roberttoyonaga marked this pull request as ready for review March 17, 2026 12:50
@openjdk openjdk Bot added the rfr Pull request is ready for review label Mar 17, 2026
@mlbridge
Copy link
Copy Markdown

mlbridge Bot commented Mar 17, 2026

@tstuefe
Copy link
Copy Markdown
Member

tstuefe commented Mar 17, 2026

Very nice work, and well-written patch description. I will take a look at it when I have time, but it may take a while. @johan-sjolen @afshin-zafari may be interested, since its NMT-relevant.

/reviewers 2

@openjdk
Copy link
Copy Markdown

openjdk Bot commented Mar 17, 2026

@tstuefe
The total number of required reviews for this PR (including the jcheck configuration and the last /reviewers command) is now set to 2 (with at least 1 Reviewer, 1 Author).

Copy link
Copy Markdown
Member

@tstuefe tstuefe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Started reviewing the first part

Comment thread src/hotspot/os/windows/os_windows.cpp Outdated
"Size must be a multiple of allocation granularity (page size)");

// VirtualAlloc2 and MapViewOfFile3 support alignment natively.
// This avoids the race prone retry loop below.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For clarity I would split this into two (internal, static) sub functions, one to use placeholders, one to use conventional technique with over-reserving. Either make them filescope static or, if that is not possible, put them into the os:: class into a private section.

Then, you can also write a fitting comment at the start of the new function, and leave the old implementation's comment unchanged.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, good idea. I have split it up into two functions. The old path now remains the same as before, and a net new path is taken when VirtualAlloc2 is available.

}
}

os::PlaceholderRegion os::pd_reserve_placeholder_memory(size_t bytes, bool exec, char* addr) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Attention @TheRealMDoerr and @JoKern65 :

Here, we must make sure that Java Heap gets always allocated with 64K pages. Otherwise, we may see performance problems on AIX. So, we may need to restrict the use of this API on AIX if it's used for Java Heap memory. For other memory, e.g. CDS/Metaspace, its probably not that important.

I hope we can phase out the SystemV shm coding in the near future. @TheRealMDoerr , how far are we from that point? Which AIX versions that don't support 64K mmap are still in use for modern OpenJDK variants JDK27 and later?


vmembk_add(base, offset, vmi->pagesize, vmi->type);
vmi->addr = base + offset;
vmi->size = region_size - offset;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Throwing out the shm V coding would also allow us to throw out this ancient bookkeeping code. IIRC it long predates OpenJDK.

Comment thread src/hotspot/os/bsd/os_bsd.cpp Outdated
return anon_mmap(nullptr /* addr */, bytes, exec);
}

os::PlaceholderRegion os::pd_reserve_placeholder_memory(size_t bytes, bool exec, char* addr) {
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mmap usage should behave identically on all Unices, apart from the bookkeeping we need to do on AIX.

I would therefore unify Linux and BSD coding into os_posix.cpp, ifdef it out for AIX, and leave AIX implementation in os_aix.cpp.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea! I have done the suggested refactoring.

Comment thread src/hotspot/os/windows/os_windows.cpp Outdated
char ebuf[1024];
void* const handle = os::dll_load(name, ebuf, sizeof(ebuf));
if (handle == nullptr) {
log_trace(os)("Failed to load library: %s", name);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Print out ebuf, or just pass nullptr, 0 to dll_load. I think its safe to assume that this load will always work.

Comment thread src/hotspot/os/windows/os_windows.cpp Outdated
install_kernelbase_symbol(os::win32::VirtualAlloc2, "VirtualAlloc2");
log_info(os)("VirtualAlloc2 is%s available.", os::win32::VirtualAlloc2 == nullptr ? " not" : "");
install_kernelbase_symbol(os::win32::MapViewOfFile3, "MapViewOfFile3");
log_info(os)("MapViewOfFile3 is%s available.", os::win32::MapViewOfFile3 == nullptr ? " not" : "");
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd use debug level, not info level.

Comment thread src/hotspot/os/windows/os_windows.cpp Outdated

#ifndef MEM_RESERVE_PLACEHOLDER
#define MEM_RESERVE_PLACEHOLDER 0x00040000
#endif
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These always make me itchy, but I guess its safe to assume that the constants will never change. Otherwise, I do something like this:

ifndef X
  define X value
else
  STATIC_ASSERT(x==value)
endif

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After some more thought, I think we can actually remove these ifdefs. All the code paths where these constants are used are guarded by is_VirtualAlloc2_supported(). When VirtualAlloc2 exists, these constants should already be defined. It seems like ZGC also doesn't bother with this.

Copy link
Copy Markdown
Member

@tstuefe tstuefe Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, it was introduced in Windows 10, which is in extended EOL, and nobody uses Windows 7 anymore. So anyone with a (properly updated) W10 should have this (both build and runtime machine)

Even if someone runs this on an old W7 machine to run their old minecraft server - we would just gracefully handle that, right? Since the API is missing, we'd just fall back to the old way of doing things?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even if someone runs this on an old W7 machine to run their old minecraft server - we would just gracefully handle that, right? Since the API is missing, we'd just fall back to the old way of doing things?

Yes, that's right. If VirtualAlloc2 is missing, we don't even need these constants anyway.

Copy link
Copy Markdown
Member

@tstuefe tstuefe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More reviewing, still not complete

char* base() const { return _base; }
size_t size() const { return _size; }
bool is_empty() const { return _base == nullptr; }
};
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would make this a truly immutable object:

char* const _base; const size_t _size;

and give it a proper copy-constructor.

Then, I would change split_region like this:

struct PlaceHolderRegionTuple { PlaceHolderRegion left, right };
bool pd_split_memory(const PlaceHolderRegion& orig, PlaceHolderRegionTuple& result, size_t offset);

or alternatively, just

bool pd_split_memory(const PlaceHolderRegion& orig, PlaceHolderRegion& left, PlaceHolderRegion& right, size_t offset);

or, yet another one:

struct PlaceHolderRegionTuple { PlaceHolderRegion left, right };
// .. blabla
// In case of an error, the returned regions are both empty
PlaceHolderRegionTuple pd_split_memory(const PlaceHolderRegion& orig, size_t offset);

Somewhat cleaner, and easier to mentally grep.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, and have adopted your 3rd suggestion.

Comment thread src/hotspot/share/runtime/os.hpp Outdated
Comment on lines +555 to +556
// (an empty PlaceholderRegion is returned).
// If the returned PlaceholderRegion is empty, the reservation failed.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this layer its just interesting that it can fail (mmap can fail on POSIX too, so no need to mention Windows specifically) and that this will return an empty region. So I would cut out the Windows specific part.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done! I've removed it

@@ -216,6 +236,22 @@ class os: AllStatic {

static char* pd_reserve_memory(size_t bytes, bool executable);

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need comments above the pd_... in the generic OS header. These are impl specifics. I would instead add some small comments to the relevant parts to the various cpp files.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done! I've removed those comments and added some where appropriate in the platform-dependent code.

Comment thread src/hotspot/os/posix/os_posix.cpp Outdated
}

os::PlaceholderRegion os::pd_split_memory(PlaceholderRegion& region, size_t offset) {
// On POSIX, mmap regions are inherently splittable. Just do bookkeeping.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What bookkeeping? :-)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! I must have copied my comment over from the AIX code and forgot to change that part.


assert(base != nullptr, "Region base cannot be null");
assert(offset > 0, "Offset must be positive");
assert(offset < region_size, "Offset must be less than region size");
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please assert that split point is page aligned

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

Comment thread src/hotspot/os/windows/os_windows.cpp Outdated
res = (char*)virtualAlloc(addr, bytes, MEM_RESERVE, PAGE_READWRITE);
} else {
bool use_numa_interleaving = (UseNUMAInterleaving && !UseLargePages);

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please split the conditions into two:

if (numa)
  if va2 supported
   // numa stuff new way, placeholders
  else
    // numa stuff, old way, racy
else
  // non-numa

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that is clearer. Done!

Comment thread src/hotspot/os/windows/os_windows.cpp Outdated
return os::win32::convert_placeholder_to_reserved(region);
}

// This function is for convenience to help with reserve_with_numa_placeholder.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, and for other functions that are newly exported as part of the private section of os::win32: could these be file-scope static instead or do they have to live inside os::win32?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Previously, reserve_with_numa_placeholder was calling os::pd_reserve_placeholder_memory which is private in os::. This is why I had originally put reserve_with_numa_placeholder in os::win32::. convert_placeholder_to_reserved can be made file-scope static without problems.

I've created a helper to solve the access modifier issue and changed both convert_placeholder_to_reserved & reserve_with_numa_placeholder to be file-scope static

Comment thread src/hotspot/os/windows/os_windows.cpp Outdated
return aligned_base;
}

// VirtualAlloc2 and MapViewOfFile3 support alignment natively.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was confused here by the name, since this does not use placeholders but a different feature of the new APIs. I would rename this function to map_or_reserve_memory_aligned_va2.

(Pre-existing, but having two very distinct functionalities (reserving memory and mapping a file into memory) crammed into the same API is an eye-sore.)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point about the naming. I've renamed it now.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And I split up the new version into reserve_memory_aligned_va2 and map_memory_aligned_va2, but kept the old version as a single function (map_or_reserve_memory_aligned). I think old version combines two code paths because they both do the reserve-release-reserve dance and it reduces code duplication to have them combined. But I don't mind splitting up the old version too if you think it's better.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ended up deciding to split up the old version too for the sake of consistency.
So now there's map_memory_aligned, map_memory_aligned_va2, reserve_memory_aligned, and reserve_memory_aligned_va2

Comment thread src/hotspot/os/windows/os_windows.cpp Outdated
}
return aligned_base;
}
log_trace(os)("Aligned allocation via VirtualAlloc2/MapViewOfFile3 failed, falling back to retry loop.");
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Give us a GetLastError printout here

Comment thread src/hotspot/os/windows/os_windows.cpp Outdated

if (res != nullptr) {
if (addr != nullptr && res != addr) {
// Got a different address than requested; release and fail.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this actually happen?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Windows docs just say "If the function succeeds, the return value is the base address of the allocated region of pages." The docs don't mention about retrying at another address if the requested address is not possible. So I think the check here is unecessary and will remove it.

@openjdk openjdk Bot removed the rfr Pull request is ready for review label Apr 9, 2026
@roberttoyonaga
Copy link
Copy Markdown
Contributor Author

/template append

@openjdk
Copy link
Copy Markdown

openjdk Bot commented Apr 10, 2026

@roberttoyonaga The pull request template has been appended to the pull request body

@openjdk openjdk Bot added the rfr Pull request is ready for review label Apr 10, 2026
@roberttoyonaga
Copy link
Copy Markdown
Contributor Author

I decided to drop my changes to AOTMetaspace::reserve_address_space_for_archives for a couple reasons:

  1. I realized that I still needed to keep the old Windows path (which I originally tried to remove) in case an older version of Windows is used. This defeated the purpose of the simplification/unification with the POSIX path I was hoping to achieve.
  2. My changes there didn't actually fix the whole race, just the first part. Fixing the full race would require changes to AOTMetaspace::map_archive as well. I think that's probably best saved for future PR.

@roberttoyonaga
Copy link
Copy Markdown
Contributor Author

Seems like the gate linux-x64-static / test (hs/tier1 compiler part 1) is failing due to a download/unpacking problem.

Unpacking jdk bundle...

gzip: stdin: unexpected end of file
tar: Unexpected EOF in archive
tar: Unexpected EOF in archive
tar: Error is not recoverable: exiting now
Error: Process completed with exit code 2.

It doesn't look like it's related to the changes in this PR.

@roberttoyonaga roberttoyonaga requested a review from tstuefe April 21, 2026 21:01
Copy link
Copy Markdown
Member

@tstuefe tstuefe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay. Its looking pretty good, but I think there is still a remaining issue

Comment thread src/hotspot/os/aix/os_aix.cpp
}
// Do manual alignment
aligned_base = align_up(extra_base, alignment);
os::unmap_memory(extra_base, extra_size);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pre-existing, and possibly for another RFE or not at all. I always stumble on Windows when seeing code that uses os::release_memory and os::unmap_memory, since we only can release mappings at a whole. And in fact, the size parameter is always ignored. So I wonder whether we should just pass -1 or 0 here with a /* ignored */ after it. But would that even compile, or assert somewhere? Alternatively, we could assert that the size specified is the full mapping size in os::release_memory (but then, the size is probably not known at the release point unless one asks the operating system.)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that it's good to keep passing the size parameter in this case. On Windows in pd_release_memory we need the caller to provide a size because the region may actually be made up multiple smaller allocations due to NUMA interleaving. So even though each region is freed entirely, we otherwise wouldn't know how many regions to expect. If running without NUMA, we assert if the actual size freed doesn't match what the caller provided.

It looks like os::unmap_memory ignores the size parameter in most cases, but there is still one code path that reroutes to pd_release_memory and needs the size

while (remaining_len > 0) {
const size_t bytes_to_rq = MIN2(remaining_len, chunk_size - ((size_t)cur % chunk_size));
os::PlaceholderRegion remaining(cur, remaining_len);
os::PlaceholderRegionPair split = os::split_memory(remaining, bytes_to_rq);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the last chunk, bytes_to_rq = remaining_len. This would probably assert later, since in pd_split_memory we check that split offset < region size.

Copy link
Copy Markdown
Contributor Author

@roberttoyonaga roberttoyonaga May 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this code path shouldn't actually reach pd_split_memory. It would be handled early in os::split_memory. I decided to put the logic for when the split consumes the whole region in os::split_memory because it doesn't actually have any platform dependency, we only superficially wrap the return value in a PlaceholderRegionPair without making any OS calls.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. No, this is good, lets keep it.

// Split 'orig' at 'offset'. Returns left and right placeholder pieces as a PlaceholderRegionPair.
// The caller must not use 'orig' afterward.
// Offset must be page-aligned.
// If offset == orig.size(), returns { orig, empty }.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add a comment

// offset has to be > 0

or alternatively return Pair { empty, orig }, mirroring what we do with offset==orig.size.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that's a good idea. The mirroring makes sense to me intuitively. I've adopted your suggestion.


assert(base != nullptr, "Region base cannot be null.");
assert(offset > 0, "Offset must be positive (nothing to split at 0).");
assert(offset < region_size, "Offset must be less than region size.");
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the assert is wrong, should be <= region_size. Since we allow that, and we would return { orig, empty }

Weird, though, that your NUMA gtest did not pick it up. Can you explain why?

Copy link
Copy Markdown
Contributor Author

@roberttoyonaga roberttoyonaga May 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The assert doesn't complain here because this case is handled in os::split_memory before we reach the assert in pd_split_memory. The way I was reasoning about it is that if we have reached pd_split_memory, then we already know there is some splitting to do (not offset==size or offset==0). What do you think?

Alternatively, I could move handling of those special offset cases into pd_split_memory. This has the benefit of keeping all the logic in one spot, but it would get duplicated across all the pd_ implementations.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, okay, you are right.

// --- Splittable Memory API tests ---

#define SKIP_IF_SPLITTABLE_NOT_SUPPORTED() \
WINDOWS_ONLY(if (os::win32::VirtualAlloc2 == nullptr) GTEST_SKIP() << "VirtualAlloc2 not available";)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: no trailing (), no need to make this a function-like macro.

Copy link
Copy Markdown
Contributor Author

@roberttoyonaga roberttoyonaga May 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed it. Done!

Comment thread test/hotspot/gtest/runtime/test_os_windows.cpp Outdated
Comment thread src/hotspot/os/windows/os_windows.cpp Outdated
const int node_count = numa_node_list_holder.get_count();

while (remaining_len > 0) {
const size_t bytes_to_rq = MIN2(remaining_len, chunk_size - ((size_t)cur % chunk_size));
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Preexisting, but likely UB: casting a pointer directly to size_t. At least go via uintptr_t.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay I have change it to uintptr_t now.

roberttoyonaga and others added 3 commits May 12, 2026 16:58
Co-authored-by: Thomas Stuefe <thomas.stuefe@gmail.com>
Co-authored-by: Thomas Stuefe <thomas.stuefe@gmail.com>
@roberttoyonaga
Copy link
Copy Markdown
Contributor Author

Sorry for the delay. Its looking pretty good, but I think there is still a remaining issue

Thank you for the new review feedback! I've addressed each of your comments.

The Github gate Pre-submit tests - windows-x64 / test - Test (tier1) is failing in the Get the BootJDK step. It seems like CI flakeyness rather than a problem due to the changes in this PR.

Copy link
Copy Markdown
Member

@tstuefe tstuefe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost good. Please add a test to gtest testing the two corner cases of split -> { empty, orig } and split -> { orig, empty }.

Comment thread src/hotspot/os/windows/os_windows.cpp Outdated

assert(base != nullptr, "Region base cannot be null.");
assert(offset > 0, "Offset must be positive (nothing to split at 0).");
assert(offset < region_size, "Offset must be less than region size.");
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, okay, you are right.

while (remaining_len > 0) {
const size_t bytes_to_rq = MIN2(remaining_len, chunk_size - ((size_t)cur % chunk_size));
os::PlaceholderRegion remaining(cur, remaining_len);
os::PlaceholderRegionPair split = os::split_memory(remaining, bytes_to_rq);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. No, this is good, lets keep it.

@johan-sjolen
Copy link
Copy Markdown
Contributor

Hi,

I've been gone on parental leave and didn't see your ping until now. Unfortunately, I'm currently deep into some other work and cannot prioritize reviewing this for the time being. I looked at the PR description and this seems like a very good change!

@roberttoyonaga
Copy link
Copy Markdown
Contributor Author

Almost good. Please add a test to gtest testing the two corner cases of split -> { empty, orig } and split -> { orig, empty }.

Done! I have added tests for those two special cases.

@tstuefe
Copy link
Copy Markdown
Member

tstuefe commented May 14, 2026

Hi,

I've been gone on parental leave and didn't see your ping until now. Unfortunately, I'm currently deep into some other work and cannot prioritize reviewing this for the time being. I looked at the PR description and this seems like a very good change!

Welcome back, Johan 🙂

Copy link
Copy Markdown
Member

@tstuefe tstuefe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Great work !

@roberttoyonaga
Copy link
Copy Markdown
Contributor Author

Thank you @tstuefe !

Copy link
Copy Markdown
Contributor

@afshin-zafari afshin-zafari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you Robert for this great work. It brings very good features to the virtual memory allocation. I learned that there were restrictions that shaped the API and structures as they are now in the PR. Therefore I made my review points as starting point for further discussions, as my p.o.v may be incomplete. I tried to separate my concerns by bullet-points here, but they are not requests that are necessarily to be Resolved/Implemented individually. Feel free to reply them in any format you like.

  • The 'placeholder' stuff can be internal in Windows related code (os_windows.?pp), since non-Windows platforms do not have such a term/feature.
  • To detect placeholder regions, can we use VirtualQuery and examine the returned State and Type, for example? Placeholders are created with PAGE_NOACCESS flag; can we use this info for detecting? If we cannot detect placeholder regions in any way, then we should not provide public API (os::reserve_placeholder_memory) for creating placeholders.
  • Separate calls for different use-cases of placeholders. Can we have APIs like this?
    reserve_memory_for_splitting(...)
    reserve_memory_for_mapping(...)
    reserve_memory_for_numa(...)
    reserve_memory_for_xxxxxx(...)
    and encapsulate placeholder creation/usage in specific code blocks. As I found, the numa use-case and aligned allocation are already encapsulated.
    This way, we don't have any placeholder region that is not converted to a reserved region. We used placeholder whenever they are needed and then converted them to whatever needed.
  • Internal impl of release_placeholder() is needed.
  • I am confused by the restriction 'You cannot assign NUMA nodes to placeholder regions.' written in PR description while reserve_with_numa_placeholder is implemented and tested. How that restriction should be interpreted?
  • Is it possible that only one of is_MapViewOfFile3_supported and is_VirtualAlloc2_supported be true (or false). They both should have the same state. If they can have different states, what happens for vmems that are reserved by va2 and then not mapped by MapViewOfFile3? Inconsistent calls of one to another. If not, then no need to check them separately. Suggestion, e.g., put both in a single method is_extended_vm_alloc_available().
  • In reserve_with_numa_placeholder, what happens if some of the splitting attempts fail? It is not covered nor tested.
  • I like to see tests and/or asserts for non acceptable paths, such as:
    • double converting to reserved region
    • release-before-convert
    • commit-before-convert
    • missuse of split/convert
    • cleanup after fails
  • It'd be great if we have unit-tests for verifying the log output of various cases.
    • When debug/warning/trace/error levels to be used?
    • Do we have duplicate/redundant outputs or not? For example, reserve_placeholder_memory_helper may fail with one warning and then the reserve_with_numa_placeholder fail afterward with another warning.
  • reserve_placeholder_memory, bytes is to be granular_allocation checked/asserted.
  • The test TEST_VM(os_windows, numa_placeholder_reserve_commit) only uses node-0 for test, since the NUMA node-list is constructed at os::init_2() during vm init. Missing the required vm flag to be asserted as failure or skipped with proper log. One can pass the required vm flag when running the test.
  • VirtualAlloc2 and MapViewOfFile3 are initialized at the end of os::init_2(). This implies that all the virtual memory allocation until this point would go through the native path in Windows and not the extended placeholder feature. It is good to document it somehow to let the developers be aware of.
  • In map_memory_aligned_va2 if file-mapping fails, GetLastError() used in the log_trace line returns the result of CloseHandle() call. Error to be captured before calling CloseHandle().
  • What is the plan for testing on AIX? Could we repeat all corner cases there?
  • PlaceholderRegion to be replaced by ReservedSpace structure and modified if new methods/fields are needed.
  • The suggestions given in the 'Next Steps' section of the PR's description, could be given as separate tickets/PRs and blocked by this ticket. With the current limitations of placeholders we are not sure yet whether we can successfully/effectively implement the suggestions or not.
  • I suggest and prefer to break down this PR to the parts that are cleanly implemented and tested here, like NUMA and aligned memory allocation. This new smaller PR can be integrated independently. Then create another ticket and PR for splittable memory regions and go for review rounds.

@tstuefe
Copy link
Copy Markdown
Member

tstuefe commented May 18, 2026

Hi @afshin-zafari,

Firstly, many thanks a lot for reviewing this! As we all know, reviewing is a ton of work.

As context, my main motivation is to get this in soon, since I want to use it in follow-up PRs I have open, at least in time for JDK 27 code freeze. Ideally, earlier to give this a bit of cooking time.

I commented inline on a few points.

* The 'placeholder' stuff can be internal in Windows related code (`os_windows.?pp`), since non-Windows platforms do not have such a term/feature.

This should be usable from generic code, too, even if that does not happen in this PR. Things like the CDS+Metaspace splitting would be prime candidates for this API, and there are likely others (e.g. in Metaspace). Robert removed the CDS changes from the initial set of changes because the patch got too complex, but it is still planned.

Also remember that having os:: wrappers for stuff that only happens on a single platform follows a tradition in hotspot. Currently, at least the following APIs are only implemented on Linux:
can_commit_large_page_memory realign_memory (for THPs) numa_make_local|global and numa_get_group_ids_for_range, pd_pretouch_memory, os::trim_native_heap. Probably a lot more. The alternatives would be ifdefs in shared code, which is not great either.

* Separate calls for different use-cases of placeholders. Can we have APIs like this?
  `reserve_memory_for_splitting(...)`
  `reserve_memory_for_mapping(...)`
  `reserve_memory_for_numa(...)`
  `reserve_memory_for_xxxxxx(...)`
  and encapsulate placeholder creation/usage in specific code blocks. As I found, the numa use-case and aligned allocation are already encapsulated.
  This way, we don't have any placeholder region that is not converted to a reserved region. We used placeholder whenever they are needed and then converted them to whatever needed.

A use-case-specific splitup of the API would arguably go against the existing pattern, which abstracts OS-abilities, not the intended use. We e.g. don't have separate APIs at os level for reserve_codecache, reserve_heap etc.

It would also be inflexible. What if I want to split NUMA memory? Or file-map something into memory I then split? In fact, this would be one use case of this API (CDS+class-space reservations).

...

* It'd be great if we have unit-tests for verifying the log output of various cases.
  
  * When `debug/warning/trace/error` levels to be used?

+1, but something for a follow-up RFE. We don't seem to have tests normal memory operation logging as well, e.g. os::map_memory. So those would have to be added as well.

* What is the plan for testing on AIX? Could we repeat all corner cases there?

+1, but it would be better in a follow-up RFE, since it affects more gtests than the new ones.

There is an issue on AIX due to the "fake-64k-pages" mode, where we use SystemV shm instead of mmap. I would guess these tests on AIX would run into something like that. No reason not to fix it - but I don't think this is needed in this RFE, and this should be fixed by someone with AIX knowledge.

I'll see if I find time to fix it, or if the AIX guys at SAP can pick it up. But if we wait long enough, the 64K-fake-pages issue will go away on its own anyway because modern AIX versions can use mmap with 64k pages, and then we can just scrap all that AIX-specific stuff and it should just work.

* `PlaceholderRegion` to be replaced by `ReservedSpace` structure and modified if new methods/fields are needed.

Its not really the same, no? ReservedSpace is something you can commit and use. PlaceholderRegion designates a placeholder region that needs conversion into reserved space before it can be committed. By keeping them separate, we delineate the difference between them.

@roberttoyonaga
Copy link
Copy Markdown
Contributor Author

roberttoyonaga commented May 19, 2026

Hi @afshin-zafari, thank you very much for your review feedback! I have tried to reply to each of your comments/questions below:

The 'placeholder' stuff can be internal in Windows related code (os_windows.?pp), since non-Windows platforms do not have such a term/feature.

Currently, only the windows.cpp code uses the placeholder-->convert-->reserve workflow. But the future goal is to replace similar races in shared code (like CDS). I decided to limit the scope of this PR so I removed some of the changes I had originally made in the CDS code using this new API.
Non-Windows platforms do not technically need the "placeholder" API. On POSIX, the placeholder API will just do a no-op. However, we still need the placeholder API in os.cpp so that shared code (like in CDS) can eventually be platform agnostic.

To detect placeholder regions, can we use VirtualQuery and examine the returned State and Type, for example? Placeholders are created with PAGE_NOACCESS flag; can we use this info for detecting? If we cannot detect placeholder regions in any way, then we should not provide public API (os::reserve_placeholder_memory) for creating placeholders.

I don't think this will be possible. I tested it this and found that state and type cannot be used to distinguish between placeholders and regular reserved regions. Similarly, PAGE_NOACCESS is not unique to placeholders.
With the current implementation, I think this is okay. With the current approach, we avoid needing to manually check whether the region is indeed a placeholder. The Placeholder struct and function parameters already make the distinction explicit.

Separate calls for different use-cases of placeholders. Can we have APIs like this?
reserve_memory_for_splitting(...)
reserve_memory_for_mapping(...)
reserve_memory_for_numa(...)
reserve_memory_for_xxxxxx(...)
and encapsulate placeholder creation/usage in specific code blocks. As I found, the numa use-case and aligned allocation are already encapsulated.
This way, we don't have any placeholder region that is not converted to a reserved region. We used placeholder whenever they are needed and then converted them to whatever needed.

It would be good if we could do this, but eventually we want to be be able to use this API in shared code where it is not possible to encapsulate placeholder usage. For example, a block of memory might be reserved in method_A, then much later it may need to be split or mapped in method_B. Or we may have the case where a block may need to be repeatedly split after doing some operation (which I think is the case with CDS). In which case we have no choice but to return a Placeholder rather than a reserved region, otherwise successive splits are not possible.

Internal impl of release_placeholder() is needed.

Yes, okay I can implement this. I didn't bother implementing this originally because I think that, in practice, callers should not need this since they will likely want to convert the placeholder to a reserved/mapped region to do something with it.

I am confused by the restriction 'You cannot assign NUMA nodes to placeholder regions.' written in PR description while reserve_with_numa_placeholder is implemented and tested. How that restriction should be interpreted?

In order to bind a region to a NUMA node we need to use MEM_REPLACE_PLACEHOLDER which materalizes the placeholder to an actual reserved region assigned to that NUMA node.
reserve_with_numa_placeholder does not contradict this restriction, it only uses placeholders to avoid a race when assigning subregions of a larger region to NUMA nodes. Once the subregions are assigned, they have become normal reserved regions.

Is it possible that only one of is_MapViewOfFile3_supported and is_VirtualAlloc2_supported be true (or false). They both should have the same state. If they can have different states, what happens for vmems that are reserved by va2 and then not mapped by MapViewOfFile3? Inconsistent calls of one to another. If not, then no need to check them separately. Suggestion, e.g., put both in a single method is_extended_vm_alloc_available().

Good idea. I will modify the checks to ensure that both are available, otherwise we fall back to the old way.
I think it would be unlikely that only one is supported since they both come from the same Windows version and kernelbase.

In reserve_with_numa_placeholder, what happens if some of the splitting attempts fail? It is not covered nor tested.

There is no recovery if os::split_memory fails. All failures are treated as fatal(). If the Windows placeholder is valid, then the virtualFree(base, offset, MEM_RELEASE | MEM_PRESERVE_PLACEHOLDER) should never fail unless the OS has an unexpected catastrophic problem.

I like to see tests and/or asserts for non acceptable paths, such as:
double converting to reserved region
release-before-convert
commit-before-convert
missuse of split/convert
cleanup after fails

Yes this is a good point. I will work on adding these tests.

reserve_placeholder_memory, bytes is to be granular_allocation checked/asserted.

Okay I will add a check that the size is page aligned

The test TEST_VM(os_windows, numa_placeholder_reserve_commit) only uses node-0 for test, since the NUMA node-list is constructed at os::init_2() during vm init. Missing the required vm flag to be asserted as failure or skipped with proper log. One can pass the required vm flag when running the test.

I still would like the test to be run even if there is only one NUMA node because it will still exercise most of the important code paths. The main difference is that the assignment will all be delivered to node_0 rather than a round robin. So on systems with only 1 NUMA node, I think it's better to run the test and get some coverage rather than always skip it and get none. What do you think?

VirtualAlloc2 and MapViewOfFile3 are initialized at the end of os::init_2(). This implies that all the virtual memory allocation until this point would go through the native path in Windows and not the extended placeholder feature. It is good to document it somehow to let the developers be aware of.

Yes, that's a good point. I will do that.

In map_memory_aligned_va2 if file-mapping fails, GetLastError() used in the log_trace line returns the result of CloseHandle() call. Error to be captured before calling CloseHandle().

Yes good point, I will fix it.

What is the plan for testing on AIX? Could we repeat all corner cases there?

I unfortunately do not have access to an AIX machine. I agree with what Thomas said about AIX above.
If we are really uncomfortable with including these AIX changes, we can just disable placeholders on AIX (pd_reserve_placeholder_memory would return an empty region).

PlaceholderRegion to be replaced by ReservedSpace structure and modified if new methods/fields are needed.

I considered doing this, but the problem is that PlaceholderRegion and ReservedSpace are conceptually different. They have different limitations and abilities. So I think that it is clearer to keep them distinct. This is the main reason I introduced the PlaceholderRegion in the first place. I want to make it clear to callers of the API that Placeholders must be treated differently than regular reserved regions.

The suggestions given in the 'Next Steps' section of the PR's description, could be given as separate tickets/PRs and blocked by this ticket. With the current limitations of placeholders we are not sure yet whether we can successfully/effectively implement the suggestions or not.

Yes, that is true. I can work on creating some new tickets for this future work in JBS.

I suggest and prefer to break down this PR to the parts that are cleanly implemented and tested here, like NUMA and aligned memory allocation. This new smaller PR can be integrated independently. Then create another ticket and PR for splittable memory regions and go for review rounds.

I am not opposed to doing this if it helps the review, but it would take a bit more time.

@ashu-mehra
Copy link
Copy Markdown
Contributor

@roberttoyonaga @tstuefe Nice work!

Reading up the patch I have the same suggestion as mentioned by @afshin-zafari about splitting it up into multiple parts. On the surface this patch is tackling the Windows anti-pattern of alloc-release-alloc sequence (or variation of it), but IMO it essentially employing three different strategies:

  1. alloc-release-alloc sequence in map_or_reserve_memory_aligned to get an aligned memory area. This is resolved by explicitly specifying the alignment in the call to VirtualAlloc2.
  2. alloc-release-alloc sequence in pd_attempt_reserve_memory_at when using numa interleaving. This is resolved by using VirtualAlloc2 with placeholder.
  3. reserve-release-map sequence in VM (like CDS). This is proposed to be resolved in subsequent work using APIs introduced in this patch.

I believe these three can be delivered in three parts. The second part can be limited to adding internal routines that introduces PlaceHolder in the windows subsystem. And the third part can build up on it to expose these internal routines as APIs for the VM to use. It will also introduce the APIs in the same patch where they intend to be used, which will make it further easier to review. Does that make sense?

@roberttoyonaga
Copy link
Copy Markdown
Contributor Author

Hi @ashu-mehra @afshin-zafari @tstuefe Thank you for your reviews! I am taking the advice suggested above and splitting this patch into smaller PRs. I am closing this larger PR and will link to the smaller ones here.

Please see #31311

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

hotspot hotspot-dev@openjdk.org rfr Pull request is ready for review

Development

Successfully merging this pull request may close these issues.

5 participants