8376561: Rewrite alloc-release-alloc sequences on Windows using VirtualAlloc2#30270
8376561: Rewrite alloc-release-alloc sequences on Windows using VirtualAlloc2#30270roberttoyonaga wants to merge 19 commits into
Conversation
|
👋 Welcome back roberttoyonaga! A progress list of the required criteria for merging this PR into |
|
❗ This change is not yet ready to be integrated. |
|
@roberttoyonaga The following label will be automatically applied to this pull request:
When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command. |
test working. need to focus on map refactor and one more test new test, comments, fix aix split memory accounting, use unified logging instead of warning() remove map API code. Rename to PlacholderRegion. Fix os::release remove map_memory API code fix race and simplify AOTMetaspace::reserve_address_space_for_archives clean up comments test: map_memory_to_file_aligned_larger remove old test and modify tests clean up fix aix and comments cleanup gate fix anothe rgate fix fix windows test account for wish address fix missing NMT accounting comments small cleanup trailing whitespaces
23eb5c5 to
700ff1b
Compare
Webrevs
|
|
Very nice work, and well-written patch description. I will take a look at it when I have time, but it may take a while. @johan-sjolen @afshin-zafari may be interested, since its NMT-relevant. /reviewers 2 |
tstuefe
left a comment
There was a problem hiding this comment.
Started reviewing the first part
| "Size must be a multiple of allocation granularity (page size)"); | ||
|
|
||
| // VirtualAlloc2 and MapViewOfFile3 support alignment natively. | ||
| // This avoids the race prone retry loop below. |
There was a problem hiding this comment.
For clarity I would split this into two (internal, static) sub functions, one to use placeholders, one to use conventional technique with over-reserving. Either make them filescope static or, if that is not possible, put them into the os:: class into a private section.
Then, you can also write a fitting comment at the start of the new function, and leave the old implementation's comment unchanged.
There was a problem hiding this comment.
Yes, good idea. I have split it up into two functions. The old path now remains the same as before, and a net new path is taken when VirtualAlloc2 is available.
| } | ||
| } | ||
|
|
||
| os::PlaceholderRegion os::pd_reserve_placeholder_memory(size_t bytes, bool exec, char* addr) { |
There was a problem hiding this comment.
Attention @TheRealMDoerr and @JoKern65 :
Here, we must make sure that Java Heap gets always allocated with 64K pages. Otherwise, we may see performance problems on AIX. So, we may need to restrict the use of this API on AIX if it's used for Java Heap memory. For other memory, e.g. CDS/Metaspace, its probably not that important.
I hope we can phase out the SystemV shm coding in the near future. @TheRealMDoerr , how far are we from that point? Which AIX versions that don't support 64K mmap are still in use for modern OpenJDK variants JDK27 and later?
|
|
||
| vmembk_add(base, offset, vmi->pagesize, vmi->type); | ||
| vmi->addr = base + offset; | ||
| vmi->size = region_size - offset; |
There was a problem hiding this comment.
Throwing out the shm V coding would also allow us to throw out this ancient bookkeeping code. IIRC it long predates OpenJDK.
| return anon_mmap(nullptr /* addr */, bytes, exec); | ||
| } | ||
|
|
||
| os::PlaceholderRegion os::pd_reserve_placeholder_memory(size_t bytes, bool exec, char* addr) { |
There was a problem hiding this comment.
mmap usage should behave identically on all Unices, apart from the bookkeeping we need to do on AIX.
I would therefore unify Linux and BSD coding into os_posix.cpp, ifdef it out for AIX, and leave AIX implementation in os_aix.cpp.
There was a problem hiding this comment.
Good idea! I have done the suggested refactoring.
| char ebuf[1024]; | ||
| void* const handle = os::dll_load(name, ebuf, sizeof(ebuf)); | ||
| if (handle == nullptr) { | ||
| log_trace(os)("Failed to load library: %s", name); |
There was a problem hiding this comment.
Print out ebuf, or just pass nullptr, 0 to dll_load. I think its safe to assume that this load will always work.
| install_kernelbase_symbol(os::win32::VirtualAlloc2, "VirtualAlloc2"); | ||
| log_info(os)("VirtualAlloc2 is%s available.", os::win32::VirtualAlloc2 == nullptr ? " not" : ""); | ||
| install_kernelbase_symbol(os::win32::MapViewOfFile3, "MapViewOfFile3"); | ||
| log_info(os)("MapViewOfFile3 is%s available.", os::win32::MapViewOfFile3 == nullptr ? " not" : ""); |
There was a problem hiding this comment.
I'd use debug level, not info level.
|
|
||
| #ifndef MEM_RESERVE_PLACEHOLDER | ||
| #define MEM_RESERVE_PLACEHOLDER 0x00040000 | ||
| #endif |
There was a problem hiding this comment.
These always make me itchy, but I guess its safe to assume that the constants will never change. Otherwise, I do something like this:
ifndef X
define X value
else
STATIC_ASSERT(x==value)
endif
There was a problem hiding this comment.
After some more thought, I think we can actually remove these ifdefs. All the code paths where these constants are used are guarded by is_VirtualAlloc2_supported(). When VirtualAlloc2 exists, these constants should already be defined. It seems like ZGC also doesn't bother with this.
There was a problem hiding this comment.
Well, it was introduced in Windows 10, which is in extended EOL, and nobody uses Windows 7 anymore. So anyone with a (properly updated) W10 should have this (both build and runtime machine)
Even if someone runs this on an old W7 machine to run their old minecraft server - we would just gracefully handle that, right? Since the API is missing, we'd just fall back to the old way of doing things?
There was a problem hiding this comment.
Even if someone runs this on an old W7 machine to run their old minecraft server - we would just gracefully handle that, right? Since the API is missing, we'd just fall back to the old way of doing things?
Yes, that's right. If VirtualAlloc2 is missing, we don't even need these constants anyway.
tstuefe
left a comment
There was a problem hiding this comment.
More reviewing, still not complete
| char* base() const { return _base; } | ||
| size_t size() const { return _size; } | ||
| bool is_empty() const { return _base == nullptr; } | ||
| }; |
There was a problem hiding this comment.
I would make this a truly immutable object:
char* const _base; const size_t _size;
and give it a proper copy-constructor.
Then, I would change split_region like this:
struct PlaceHolderRegionTuple { PlaceHolderRegion left, right };
bool pd_split_memory(const PlaceHolderRegion& orig, PlaceHolderRegionTuple& result, size_t offset);
or alternatively, just
bool pd_split_memory(const PlaceHolderRegion& orig, PlaceHolderRegion& left, PlaceHolderRegion& right, size_t offset);
or, yet another one:
struct PlaceHolderRegionTuple { PlaceHolderRegion left, right };
// .. blabla
// In case of an error, the returned regions are both empty
PlaceHolderRegionTuple pd_split_memory(const PlaceHolderRegion& orig, size_t offset);
Somewhat cleaner, and easier to mentally grep.
There was a problem hiding this comment.
I agree, and have adopted your 3rd suggestion.
| // (an empty PlaceholderRegion is returned). | ||
| // If the returned PlaceholderRegion is empty, the reservation failed. |
There was a problem hiding this comment.
At this layer its just interesting that it can fail (mmap can fail on POSIX too, so no need to mention Windows specifically) and that this will return an empty region. So I would cut out the Windows specific part.
There was a problem hiding this comment.
Done! I've removed it
| @@ -216,6 +236,22 @@ class os: AllStatic { | |||
|
|
|||
| static char* pd_reserve_memory(size_t bytes, bool executable); | |||
|
|
|||
There was a problem hiding this comment.
I don't think we need comments above the pd_... in the generic OS header. These are impl specifics. I would instead add some small comments to the relevant parts to the various cpp files.
There was a problem hiding this comment.
Done! I've removed those comments and added some where appropriate in the platform-dependent code.
| } | ||
|
|
||
| os::PlaceholderRegion os::pd_split_memory(PlaceholderRegion& region, size_t offset) { | ||
| // On POSIX, mmap regions are inherently splittable. Just do bookkeeping. |
There was a problem hiding this comment.
Good catch! I must have copied my comment over from the AIX code and forgot to change that part.
|
|
||
| assert(base != nullptr, "Region base cannot be null"); | ||
| assert(offset > 0, "Offset must be positive"); | ||
| assert(offset < region_size, "Offset must be less than region size"); |
There was a problem hiding this comment.
Please assert that split point is page aligned
| res = (char*)virtualAlloc(addr, bytes, MEM_RESERVE, PAGE_READWRITE); | ||
| } else { | ||
| bool use_numa_interleaving = (UseNUMAInterleaving && !UseLargePages); | ||
|
|
There was a problem hiding this comment.
Please split the conditions into two:
if (numa)
if va2 supported
// numa stuff new way, placeholders
else
// numa stuff, old way, racy
else
// non-numa
There was a problem hiding this comment.
Yes, that is clearer. Done!
| return os::win32::convert_placeholder_to_reserved(region); | ||
| } | ||
|
|
||
| // This function is for convenience to help with reserve_with_numa_placeholder. |
There was a problem hiding this comment.
Here, and for other functions that are newly exported as part of the private section of os::win32: could these be file-scope static instead or do they have to live inside os::win32?
There was a problem hiding this comment.
Previously, reserve_with_numa_placeholder was calling os::pd_reserve_placeholder_memory which is private in os::. This is why I had originally put reserve_with_numa_placeholder in os::win32::. convert_placeholder_to_reserved can be made file-scope static without problems.
I've created a helper to solve the access modifier issue and changed both convert_placeholder_to_reserved & reserve_with_numa_placeholder to be file-scope static
| return aligned_base; | ||
| } | ||
|
|
||
| // VirtualAlloc2 and MapViewOfFile3 support alignment natively. |
There was a problem hiding this comment.
I was confused here by the name, since this does not use placeholders but a different feature of the new APIs. I would rename this function to map_or_reserve_memory_aligned_va2.
(Pre-existing, but having two very distinct functionalities (reserving memory and mapping a file into memory) crammed into the same API is an eye-sore.)
There was a problem hiding this comment.
That's a good point about the naming. I've renamed it now.
There was a problem hiding this comment.
And I split up the new version into reserve_memory_aligned_va2 and map_memory_aligned_va2, but kept the old version as a single function (map_or_reserve_memory_aligned). I think old version combines two code paths because they both do the reserve-release-reserve dance and it reduces code duplication to have them combined. But I don't mind splitting up the old version too if you think it's better.
There was a problem hiding this comment.
I ended up deciding to split up the old version too for the sake of consistency.
So now there's map_memory_aligned, map_memory_aligned_va2, reserve_memory_aligned, and reserve_memory_aligned_va2
| } | ||
| return aligned_base; | ||
| } | ||
| log_trace(os)("Aligned allocation via VirtualAlloc2/MapViewOfFile3 failed, falling back to retry loop."); |
There was a problem hiding this comment.
Give us a GetLastError printout here
|
|
||
| if (res != nullptr) { | ||
| if (addr != nullptr && res != addr) { | ||
| // Got a different address than requested; release and fail. |
There was a problem hiding this comment.
The Windows docs just say "If the function succeeds, the return value is the base address of the allocated region of pages." The docs don't mention about retrying at another address if the requested address is not possible. So I think the check here is unecessary and will remove it.
…2 and reserve_memory_aligned_va2
|
/template append |
|
@roberttoyonaga The pull request template has been appended to the pull request body |
|
I decided to drop my changes to
|
|
Seems like the gate It doesn't look like it's related to the changes in this PR. |
tstuefe
left a comment
There was a problem hiding this comment.
Sorry for the delay. Its looking pretty good, but I think there is still a remaining issue
| } | ||
| // Do manual alignment | ||
| aligned_base = align_up(extra_base, alignment); | ||
| os::unmap_memory(extra_base, extra_size); |
There was a problem hiding this comment.
Pre-existing, and possibly for another RFE or not at all. I always stumble on Windows when seeing code that uses os::release_memory and os::unmap_memory, since we only can release mappings at a whole. And in fact, the size parameter is always ignored. So I wonder whether we should just pass -1 or 0 here with a /* ignored */ after it. But would that even compile, or assert somewhere? Alternatively, we could assert that the size specified is the full mapping size in os::release_memory (but then, the size is probably not known at the release point unless one asks the operating system.)
There was a problem hiding this comment.
I think that it's good to keep passing the size parameter in this case. On Windows in pd_release_memory we need the caller to provide a size because the region may actually be made up multiple smaller allocations due to NUMA interleaving. So even though each region is freed entirely, we otherwise wouldn't know how many regions to expect. If running without NUMA, we assert if the actual size freed doesn't match what the caller provided.
It looks like os::unmap_memory ignores the size parameter in most cases, but there is still one code path that reroutes to pd_release_memory and needs the size
| while (remaining_len > 0) { | ||
| const size_t bytes_to_rq = MIN2(remaining_len, chunk_size - ((size_t)cur % chunk_size)); | ||
| os::PlaceholderRegion remaining(cur, remaining_len); | ||
| os::PlaceholderRegionPair split = os::split_memory(remaining, bytes_to_rq); |
There was a problem hiding this comment.
For the last chunk, bytes_to_rq = remaining_len. This would probably assert later, since in pd_split_memory we check that split offset < region size.
There was a problem hiding this comment.
I think this code path shouldn't actually reach pd_split_memory. It would be handled early in os::split_memory. I decided to put the logic for when the split consumes the whole region in os::split_memory because it doesn't actually have any platform dependency, we only superficially wrap the return value in a PlaceholderRegionPair without making any OS calls.
There was a problem hiding this comment.
You are right. No, this is good, lets keep it.
| // Split 'orig' at 'offset'. Returns left and right placeholder pieces as a PlaceholderRegionPair. | ||
| // The caller must not use 'orig' afterward. | ||
| // Offset must be page-aligned. | ||
| // If offset == orig.size(), returns { orig, empty }. |
There was a problem hiding this comment.
Please add a comment
// offset has to be > 0
or alternatively return Pair { empty, orig }, mirroring what we do with offset==orig.size.
There was a problem hiding this comment.
Yes that's a good idea. The mirroring makes sense to me intuitively. I've adopted your suggestion.
|
|
||
| assert(base != nullptr, "Region base cannot be null."); | ||
| assert(offset > 0, "Offset must be positive (nothing to split at 0)."); | ||
| assert(offset < region_size, "Offset must be less than region size."); |
There was a problem hiding this comment.
I think the assert is wrong, should be <= region_size. Since we allow that, and we would return { orig, empty }
Weird, though, that your NUMA gtest did not pick it up. Can you explain why?
There was a problem hiding this comment.
The assert doesn't complain here because this case is handled in os::split_memory before we reach the assert in pd_split_memory. The way I was reasoning about it is that if we have reached pd_split_memory, then we already know there is some splitting to do (not offset==size or offset==0). What do you think?
Alternatively, I could move handling of those special offset cases into pd_split_memory. This has the benefit of keeping all the logic in one spot, but it would get duplicated across all the pd_ implementations.
| // --- Splittable Memory API tests --- | ||
|
|
||
| #define SKIP_IF_SPLITTABLE_NOT_SUPPORTED() \ | ||
| WINDOWS_ONLY(if (os::win32::VirtualAlloc2 == nullptr) GTEST_SKIP() << "VirtualAlloc2 not available";) |
There was a problem hiding this comment.
Nit: no trailing (), no need to make this a function-like macro.
There was a problem hiding this comment.
Removed it. Done!
| const int node_count = numa_node_list_holder.get_count(); | ||
|
|
||
| while (remaining_len > 0) { | ||
| const size_t bytes_to_rq = MIN2(remaining_len, chunk_size - ((size_t)cur % chunk_size)); |
There was a problem hiding this comment.
Preexisting, but likely UB: casting a pointer directly to size_t. At least go via uintptr_t.
There was a problem hiding this comment.
Okay I have change it to uintptr_t now.
Co-authored-by: Thomas Stuefe <thomas.stuefe@gmail.com>
Co-authored-by: Thomas Stuefe <thomas.stuefe@gmail.com>
Thank you for the new review feedback! I've addressed each of your comments. The Github gate |
tstuefe
left a comment
There was a problem hiding this comment.
Almost good. Please add a test to gtest testing the two corner cases of split -> { empty, orig } and split -> { orig, empty }.
|
|
||
| assert(base != nullptr, "Region base cannot be null."); | ||
| assert(offset > 0, "Offset must be positive (nothing to split at 0)."); | ||
| assert(offset < region_size, "Offset must be less than region size."); |
| while (remaining_len > 0) { | ||
| const size_t bytes_to_rq = MIN2(remaining_len, chunk_size - ((size_t)cur % chunk_size)); | ||
| os::PlaceholderRegion remaining(cur, remaining_len); | ||
| os::PlaceholderRegionPair split = os::split_memory(remaining, bytes_to_rq); |
There was a problem hiding this comment.
You are right. No, this is good, lets keep it.
|
Hi, I've been gone on parental leave and didn't see your ping until now. Unfortunately, I'm currently deep into some other work and cannot prioritize reviewing this for the time being. I looked at the PR description and this seems like a very good change! |
Done! I have added tests for those two special cases. |
Welcome back, Johan 🙂 |
|
Thank you @tstuefe ! |
afshin-zafari
left a comment
There was a problem hiding this comment.
Thank you Robert for this great work. It brings very good features to the virtual memory allocation. I learned that there were restrictions that shaped the API and structures as they are now in the PR. Therefore I made my review points as starting point for further discussions, as my p.o.v may be incomplete. I tried to separate my concerns by bullet-points here, but they are not requests that are necessarily to be Resolved/Implemented individually. Feel free to reply them in any format you like.
- The 'placeholder' stuff can be internal in Windows related code (
os_windows.?pp), since non-Windows platforms do not have such a term/feature. - To detect placeholder regions, can we use
VirtualQueryand examine the returnedStateandType, for example? Placeholders are created withPAGE_NOACCESSflag; can we use this info for detecting? If we cannot detect placeholder regions in any way, then we should not provide public API (os::reserve_placeholder_memory) for creating placeholders. - Separate calls for different use-cases of placeholders. Can we have APIs like this?
reserve_memory_for_splitting(...)
reserve_memory_for_mapping(...)
reserve_memory_for_numa(...)
reserve_memory_for_xxxxxx(...)
and encapsulate placeholder creation/usage in specific code blocks. As I found, the numa use-case and aligned allocation are already encapsulated.
This way, we don't have any placeholder region that is not converted to a reserved region. We used placeholder whenever they are needed and then converted them to whatever needed. - Internal impl of
release_placeholder()is needed. - I am confused by the restriction 'You cannot assign NUMA nodes to placeholder regions.' written in PR description while
reserve_with_numa_placeholderis implemented and tested. How that restriction should be interpreted? - Is it possible that only one of
is_MapViewOfFile3_supportedandis_VirtualAlloc2_supportedbe true (or false). They both should have the same state. If they can have different states, what happens for vmems that are reserved by va2 and then not mapped by MapViewOfFile3? Inconsistent calls of one to another. If not, then no need to check them separately. Suggestion, e.g., put both in a single methodis_extended_vm_alloc_available(). - In
reserve_with_numa_placeholder, what happens if some of the splitting attempts fail? It is not covered nor tested. - I like to see tests and/or asserts for non acceptable paths, such as:
- double converting to reserved region
- release-before-convert
- commit-before-convert
- missuse of split/convert
- cleanup after fails
- It'd be great if we have unit-tests for verifying the log output of various cases.
- When
debug/warning/trace/errorlevels to be used? - Do we have duplicate/redundant outputs or not? For example,
reserve_placeholder_memory_helpermay fail with one warning and then thereserve_with_numa_placeholderfail afterward with another warning.
- When
reserve_placeholder_memory, bytes is to be granular_allocation checked/asserted.- The test
TEST_VM(os_windows, numa_placeholder_reserve_commit)only uses node-0 for test, since the NUMA node-list is constructed atos::init_2()during vm init. Missing the required vm flag to be asserted as failure or skipped with proper log. One can pass the required vm flag when running the test. VirtualAlloc2andMapViewOfFile3are initialized at the end ofos::init_2(). This implies that all the virtual memory allocation until this point would go through the native path in Windows and not the extended placeholder feature. It is good to document it somehow to let the developers be aware of.- In
map_memory_aligned_va2if file-mapping fails,GetLastError()used in thelog_traceline returns the result ofCloseHandle()call. Error to be captured before callingCloseHandle(). - What is the plan for testing on AIX? Could we repeat all corner cases there?
PlaceholderRegionto be replaced byReservedSpacestructure and modified if new methods/fields are needed.- The suggestions given in the 'Next Steps' section of the PR's description, could be given as separate tickets/PRs and blocked by this ticket. With the current limitations of placeholders we are not sure yet whether we can successfully/effectively implement the suggestions or not.
- I suggest and prefer to break down this PR to the parts that are cleanly implemented and tested here, like NUMA and aligned memory allocation. This new smaller PR can be integrated independently. Then create another ticket and PR for splittable memory regions and go for review rounds.
|
Hi @afshin-zafari, Firstly, many thanks a lot for reviewing this! As we all know, reviewing is a ton of work. As context, my main motivation is to get this in soon, since I want to use it in follow-up PRs I have open, at least in time for JDK 27 code freeze. Ideally, earlier to give this a bit of cooking time. I commented inline on a few points.
This should be usable from generic code, too, even if that does not happen in this PR. Things like the CDS+Metaspace splitting would be prime candidates for this API, and there are likely others (e.g. in Metaspace). Robert removed the CDS changes from the initial set of changes because the patch got too complex, but it is still planned. Also remember that having os:: wrappers for stuff that only happens on a single platform follows a tradition in hotspot. Currently, at least the following APIs are only implemented on Linux:
A use-case-specific splitup of the API would arguably go against the existing pattern, which abstracts OS-abilities, not the intended use. We e.g. don't have separate APIs at os level for reserve_codecache, reserve_heap etc. It would also be inflexible. What if I want to split NUMA memory? Or file-map something into memory I then split? In fact, this would be one use case of this API (CDS+class-space reservations). ...
+1, but something for a follow-up RFE. We don't seem to have tests normal memory operation logging as well, e.g. os::map_memory. So those would have to be added as well.
+1, but it would be better in a follow-up RFE, since it affects more gtests than the new ones. There is an issue on AIX due to the "fake-64k-pages" mode, where we use SystemV shm instead of mmap. I would guess these tests on AIX would run into something like that. No reason not to fix it - but I don't think this is needed in this RFE, and this should be fixed by someone with AIX knowledge. I'll see if I find time to fix it, or if the AIX guys at SAP can pick it up. But if we wait long enough, the 64K-fake-pages issue will go away on its own anyway because modern AIX versions can use mmap with 64k pages, and then we can just scrap all that AIX-specific stuff and it should just work.
Its not really the same, no? ReservedSpace is something you can commit and use. PlaceholderRegion designates a placeholder region that needs conversion into reserved space before it can be committed. By keeping them separate, we delineate the difference between them. |
|
Hi @afshin-zafari, thank you very much for your review feedback! I have tried to reply to each of your comments/questions below:
Currently, only the windows.cpp code uses the placeholder-->convert-->reserve workflow. But the future goal is to replace similar races in shared code (like CDS). I decided to limit the scope of this PR so I removed some of the changes I had originally made in the CDS code using this new API.
I don't think this will be possible. I tested it this and found that state and type cannot be used to distinguish between placeholders and regular reserved regions. Similarly, PAGE_NOACCESS is not unique to placeholders.
It would be good if we could do this, but eventually we want to be be able to use this API in shared code where it is not possible to encapsulate placeholder usage. For example, a block of memory might be reserved in method_A, then much later it may need to be split or mapped in method_B. Or we may have the case where a block may need to be repeatedly split after doing some operation (which I think is the case with CDS). In which case we have no choice but to return a Placeholder rather than a reserved region, otherwise successive splits are not possible.
Yes, okay I can implement this. I didn't bother implementing this originally because I think that, in practice, callers should not need this since they will likely want to convert the placeholder to a reserved/mapped region to do something with it.
In order to bind a region to a NUMA node we need to use
Good idea. I will modify the checks to ensure that both are available, otherwise we fall back to the old way.
There is no recovery if
Yes this is a good point. I will work on adding these tests.
Okay I will add a check that the size is page aligned
I still would like the test to be run even if there is only one NUMA node because it will still exercise most of the important code paths. The main difference is that the assignment will all be delivered to node_0 rather than a round robin. So on systems with only 1 NUMA node, I think it's better to run the test and get some coverage rather than always skip it and get none. What do you think?
Yes, that's a good point. I will do that.
Yes good point, I will fix it.
I unfortunately do not have access to an AIX machine. I agree with what Thomas said about AIX above.
I considered doing this, but the problem is that PlaceholderRegion and ReservedSpace are conceptually different. They have different limitations and abilities. So I think that it is clearer to keep them distinct. This is the main reason I introduced the PlaceholderRegion in the first place. I want to make it clear to callers of the API that Placeholders must be treated differently than regular reserved regions.
Yes, that is true. I can work on creating some new tickets for this future work in JBS.
I am not opposed to doing this if it helps the review, but it would take a bit more time. |
…astError, release_placeholder
|
@roberttoyonaga @tstuefe Nice work! Reading up the patch I have the same suggestion as mentioned by @afshin-zafari about splitting it up into multiple parts. On the surface this patch is tackling the Windows anti-pattern of alloc-release-alloc sequence (or variation of it), but IMO it essentially employing three different strategies:
I believe these three can be delivered in three parts. The second part can be limited to adding internal routines that introduces PlaceHolder in the windows subsystem. And the third part can build up on it to expose these internal routines as APIs for the VM to use. It will also introduce the APIs in the same patch where they intend to be used, which will make it further easier to review. Does that make sense? |
|
Hi @ashu-mehra @afshin-zafari @tstuefe Thank you for your reviews! I am taking the advice suggested above and splitting this patch into smaller PRs. I am closing this larger PR and will link to the smaller ones here. Please see #31311 |
Motivation and Goal
Previously on Windows, memory could not be (1) reserved then split up and (2) reserved then mapped to. Due to those limitations, there are places in Hotspot where the following race-y pattern happens on Windows:
VirtualAlloc2 placeholders (available since Windows version 1803) removes those 2 limitations by introducing “placeholders”. “Placeholders” allow for reserving, then splitting or mapping into a region. However, they come with a few restrictions:
The goal is to eliminate the race-y patterns previously required on Windows by using VirtualAlloc2 placeholders. This will require some changes to the
os::memory API and to the caller code in specific locations.What’s been done in this PR
Infrastructure for dynamic loading of Windows memory API
The Windows API functions are dynamically loaded at startup at the end of
os::init_2in os_windows.cpp. This is whenos::win32::VirtualAlloc2andos::win32::MapViewOfFile3are prepared. If the loading fails, the Windows implementation falls back to the old racey approach.lookup_kernelbase_libraryandlookup_kernelbase_symbolhave been moved from zSyscall_windows.cpp toos::win32::to reduce code duplication.Additions to the os:: memory API
os::reserve_placeholder_memory,os::split_memory, andos::convert_to_reservedhave been introduced. See “API design” section for more details.Removed the race in
map_or_reserve_memory_alignedin os_windows.cppPreviously,
map_or_reserve_memory_alignedover-reserved a large range in order to get a range of available addresses, released it, then rounded up to the requested alignment and re-reserved an aligned subrange of the original range. On POSIX, mmap allows you to over-reserve then chop off the front of the reservation at an aligned address. This was not possible on windows.Now, VirtualAlloc2 and MapViewOfFile3 allow us to request an alignment directly.
Remove race when reserving on NUMA nodes
Previously, if NUMA interleaving was enabled,
pd_attempt_reserve_memory_at neededto callallocate_pages_individually. This resulted in reserving the desired range, releasing the whole range, then re-reserving subranges assigned one-by-one to a round robin of NUMA nodes.Now, when VirtualAlloc2 is available, we can avoid that racy pattern by using
os::reserve_with_numa_placeholder. This uses a placeholder to reserve the overall range, then usesos::split_memoryto divide up the region (while everything remains reserved). The placeholder chunks are then replaced by real reservations assigned to the NUMA node round robin.Deciding between 2 Approaches to the API
Two approaches were considered:
(A): Make all reserved regions splittable by default.
(B): Draw a distinction between placeholder regions and regular reserved regions.
(A) Seems to make the API surface much cleaner, but the limitations of Windows placeholders make the underlying implementation much more complex and end up imposing implicit rules about how callers can interact with the memory. So although the API surface appears cleaner, the actual API usage ends up being complex and not well defined.
(B) Increases the API surface because we need
os::reserve_placeholder_memoryandos::convert_to_reservedin addition toos::reserve_memory. However, this approach forms a clear distinction between placeholder regions and regular reserved regions. The API usages and intents are clear and well defined.Please see my comment on https://bugs.openjdk.org/browse/JDK-8376561 explaining why (B) is the better approach in more detail.
The API Design
In general, callers will make the following calls in order:
PlaceholderRegion os::reserve_placeholder_memory(size_t bytes, MemTag mem_tag, bool executable, char* addr)PlaceholderRegion os::split_memory(PlaceholderRegion& region, size_t offset)char* os::convert_to_reserved(PlaceholderRegion region)A placeholder is functionally distinct from a regular reserved region. It has both limitations and abilities that normal reserved regions don’t have. Designing the API functions to accept/return the
PlaceholderRegioninstance instead of a rawchar*forces callers to treat them differently and enforces a conceptual distinction.The returned
PlaceholderRegioncannot be committed until it is converted to a regular reserved region. This allowsos::reserve_memory,os::commit_memory, andos::uncommit_memoryto remain unchanged. The pre-existingos::code does not need to be placeholder-aware. This is very important because of two limitations (1) there is no good way to test whether a region is a placeholder vs regular reserved region (2) placeholders must be replaced in their entirety before being reserved/committed/released. Without the help ofos::convert_to_reserved, existingos::memory functions would need special handling for placeholders. This complicates and slows the common path (where the caller doesn’t care about placeholders).This API design keeps a clear distinction between the placeholder side of the workflow and the regular reserved region side. We avoid needing to change the whole flow diagram, instead we just add new bits to the start of the diagram. Callers that don’t care about placeholders won’t be affected by them. This is hopefully cleaner and easier to reason about.
Naming
I originally was using the name “SplittableRegion”. I eventually renamed it to “PlaceholderRegion" because, later on, it would be good to use placeholders to replace the reserve-release-map race as well. The name “SplittableRegion” lends itself nicely to the reserve-split-commit workflow, but not so nicely to the reserve-map and reserve-split-map workflow.
Next steps
Use placeholders to resolve file mapping races
Improve
attempt_map_memory_to_file_atandmap_memoryby taking advantage of placeholders. This will allow us to eliminate the reserve → release → map race.Change CDS code to take advantage of placeholders
The CDS code uses the racey reserve → release → map subregion pattern on Windows. This could be replaced with reserve_placeholder memory → map_memory.
Testing
make test TEST=gtest:osmake test TEST="test/hotspot/jtreg/runtime/cds"Progress
Issue
Reviewers
Reviewing
Using
gitCheckout this PR locally:
$ git fetch https://git.openjdk.org/jdk.git pull/30270/head:pull/30270$ git checkout pull/30270Update a local copy of the PR:
$ git checkout pull/30270$ git pull https://git.openjdk.org/jdk.git pull/30270/headUsing Skara CLI tools
Checkout this PR locally:
$ git pr checkout 30270View PR using the GUI difftool:
$ git pr show -t 30270Using diff file
Download this PR as a diff file:
https://git.openjdk.org/jdk/pull/30270.diff
Using Webrev
Link to Webrev Comment