From 02e4649aec629fe710e99bb8227ef377cbae1eb8 Mon Sep 17 00:00:00 2001
From: "copilot-swe-agent[bot]" <198982749+Copilot@users.noreply.github.com>
Date: Sat, 6 Jun 2026 23:50:31 +0000
Subject: [PATCH] Add NLS format analysis: locale.c vs ReactOS XP/2003 infrastructure

---
 media/doc/nls_format_analysis.md | 287 +++++++++++++++++++++++++++++++
 1 file changed, 287 insertions(+)
 create mode 100644 media/doc/nls_format_analysis.md

diff --git a/media/doc/nls_format_analysis.md b/media/doc/nls_format_analysis.md
new file mode 100644
index 0000000000000..6e6cf8a3b00ed
--- /dev/null
+++ b/media/doc/nls_format_analysis.md
@@ -0,0 +1,287 @@
+# NLS Format Analysis: Wine locale.c vs. ReactOS Infrastructure
+
+## Background
+
+`dll/win32/kernelbase/wine/locale.c` is the Wine-derived implementation of the
+kernelbase locale API. It targets the Windows Vista+ (NT 6.0+) NLS architecture,
+which uses a substantially different set of NLS section types, file formats, and
+kernel APIs compared to what ReactOS currently implements (Windows XP/2003, NT 5.x).
+
+The tool `sdk/tools/txt2nls/main.cpp` (note: file is `.cpp`, not `.c`) converts
+`.txt` codepage definitions to binary NLS files. It is only one part of the
+full NLS infrastructure.
+
+---
+
+## NtGetNlsSectionPtr Section Type Mapping
+
+The central dispatch mechanism is `NtGetNlsSectionPtr(type, id, unk, &ptr, &size)`.
+The type numbers differ between the two eras; only type 11 (codepage files) has the same meaning in both.
+
+### Windows Vista+ / Wine (correct mapping, confirmed by Wine test)
+
+| Type | File / purpose | Notes |
+|------|----------------|-------|
+| 9 | sortdefault (sort keys, casemap, ctypes, sort GUIDs) | bogus ID → `STATUS_INVALID_PARAMETER_1` |
+| 10 | casemap table (l_intl.nls format) | bogus ID → `STATUS_INVALID_PARAMETER_1` or `STATUS_UNSUCCESSFUL` |
+| 11 | codepage files (`c_*.nls`), keyed by codepage number | bogus ID → `STATUS_OBJECT_NAME_NOT_FOUND` |
+| 12 | Unicode normalization files, keyed by normalization form | bogus ID → `STATUS_OBJECT_NAME_NOT_FOUND` |
+| 13 | unknown sort-related | bogus ID → `STATUS_INVALID_PARAMETER_1` |
+| 14 | unknown | success or `STATUS_INVALID_PARAMETER_1` depending on Windows version |
+| all others (0–8, 15+) | invalid | → `STATUS_INVALID_PARAMETER_1` |
+
+### ReactOS Current (`subsystems/win/basesrv/nls.c`)
+
+| Type | File | Status |
+|------|------|--------|
+| 1 | `unicode.nls` | implemented |
+| 2 | `locale.nls` | implemented |
+| 3 | `ctype.nls` | implemented |
+| 4 | `sortkey.nls` | implemented |
+| 5 | `sorttbls.nls`| implemented |
+| 6 | `c_437.nls` | implemented |
+| 7 | `c_1252.nls` | implemented |
+| 8 | `l_except.nls`| implemented |
+| 9 | (none) | `STATUS_NOT_IMPLEMENTED` |
+| 10 | (none) | `STATUS_NOT_IMPLEMENTED` |
+| 11 | codepage files (dynamic) | implemented, **matches Vista+** |
+| 12 | `geo.nls` | implemented, **CONFLICTS with Vista+ normalization** |
+
+Key observations:
+- Types 1–8 are the old XP/2003 CSR-based mechanism; they do not exist in Vista+.
+- Type 9 is stubbed out; `locale.c` calls it unconditionally on startup.
+- Type 12 in ReactOS maps to `geo.nls`; in Vista+ it maps to normalization files.
+  This is a direct conflict that will cause `IsNormalizedString`/`NormalizeString`
+  to receive geo data instead of normalization tables.
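The two tables above can be condensed into a small lookup. The following C sketch is illustrative only: the enum, its values, and the functions `classify_nls_type` and `nls_type_state_name` are hypothetical names that exist in neither ReactOS nor Wine, and treating types 13 and 14 as "unlisted" is an assumption drawn from their absence in the ReactOS table.

```c
#include <assert.h>
#include <stddef.h>

/* How a section type number in ReactOS relates to the Vista+ scheme.
   Hypothetical classification, derived only from the two tables above. */
typedef enum {
    NLS_TYPE_LEGACY_ONLY, /* 1-8: XP/2003 CSR types, invalid on Vista+ */
    NLS_TYPE_STUBBED,     /* 9-10: Vista+ types, STATUS_NOT_IMPLEMENTED in ReactOS */
    NLS_TYPE_MATCHES,     /* 11: codepage files, same meaning in both */
    NLS_TYPE_CONFLICTS,   /* 12: geo.nls in ReactOS, normalization on Vista+ */
    NLS_TYPE_UNLISTED,    /* 13-14: known on Vista+, absent from the ReactOS table (assumption) */
    NLS_TYPE_INVALID      /* 0 and 15+: invalid on Vista+, absent from the ReactOS table */
} nls_type_state;

/* Map a section type number to its state. */
nls_type_state classify_nls_type(unsigned int type)
{
    if (type >= 1 && type <= 8)   return NLS_TYPE_LEGACY_ONLY;
    if (type == 9 || type == 10)  return NLS_TYPE_STUBBED;
    if (type == 11)               return NLS_TYPE_MATCHES;
    if (type == 12)               return NLS_TYPE_CONFLICTS;
    if (type == 13 || type == 14) return NLS_TYPE_UNLISTED;
    return NLS_TYPE_INVALID;
}

/* Human-readable label for a state, e.g. for a diagnostic log line. */
const char *nls_type_state_name(nls_type_state state)
{
    static const char *const names[] = {
        "legacy-only", "stubbed", "matches", "conflicts", "unlisted", "invalid"
    };
    assert((size_t)state < sizeof names / sizeof names[0]);
    return names[state];
}
```

For example, `classify_nls_type(12)` yields `NLS_TYPE_CONFLICTS`, the slot that would have to be reassigned before normalization can work, while `classify_nls_type(11)` yields `NLS_TYPE_MATCHES`.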
+
+---
+
+## Function Stubs in ntdll
+
+From `dll/ntdll/def/ntdll.spec`:
+
+| Function | Status | Used by locale.c |
+|----------|--------|-----------------|
+| `NtGetNlsSectionPtr` | `-stub -version=0x600+` | Yes: types 9, 11, 12 |
+| `NtInitializeNlsFiles` | `-stub -version=0x600+` | Indirectly (Wine test uses it) |
+| `RtlNormalizeString` | `-stub -version=0x600+` | Yes: called by `NormalizeString` |
+| `RtlGetLocaleFileMappingAddress` | **not exported at all** | Yes: called in `load_locale_nls()` |
+
+`RtlGetLocaleFileMappingAddress` is the most critical missing export.
+`load_locale_nls()` calls it during `init_locale()`, and if the call fails the
+entire locale subsystem fails to initialize.
+
+---
+
+## NLS File Format Differences
+
+### 1. `locale.nls`: major format change
+
+**Old format (ReactOS, XP/2003):**
+- Binary table, no magic header, ~209 KB
+- Per-LCID records in an undocumented 2003-era layout
+- Read by old `kernel32` via `NtGetNlsSectionPtr(2, ...)`
+
+**New format (Vista+, required by Wine `locale.c`):**
+- Accessed via `RtlGetLocaleFileMappingAddress()`
+- Starts with `NLS_LOCALE_HEADER` at offset 0 (magic `'NSDS'` at offset 0x0C)
+- Followed by: `NLS_LOCALE_DATA[]` array, sorted LCID index, sorted LCNAME index,
+  string pool, calendar array, **and embedded geo data** (new `struct geo_id[]` +
+  `struct geo_index[]` with its own sub-header)
+
+The `NLS_LOCALE_HEADER` and `NLS_LOCALE_DATA` types are defined only in
+`sdk/include/wine/winternl.h` (because only Wine code uses them today).
+ReactOS NDK headers (`sdk/include/ndk/`) do not have these types.
+If `RtlGetLocaleFileMappingAddress` is to be implemented inside ntdll/RTL,
+these types will need to be added to a suitable ReactOS header.
+
+**Impact:** `load_locale_nls()` reads locale data, geo IDs, geo index, and character
+maps entirely from the pointer returned by `RtlGetLocaleFileMappingAddress`.
+None of this works today on ReactOS.
+
+**Tooling gap:** No tool exists in ReactOS to produce a `locale.nls` in the new
+format. Wine's `nls/locale.nls` (generated by its `tools/make_unicode` script) is
+a compatible source; alternatively, a new tool would need to consume the MS
+locale data.
+
+Note also that `sdk/lib/rtl/locale.c` currently resolves LCID↔name via a
+hardcoded `RtlpLocaleTable[]` array. This approach is orthogonal to
+`RtlGetLocaleFileMappingAddress`; the two need to be reconciled.
+
+### 2. `sortdefault.nls`: new file, missing in ReactOS
+
+**Old format (ReactOS, XP/2003):**
+- Two separate files served via old CSR types:
+  - `sortkey.nls` (type 4, ~262 KB): 4-byte sort weights per Unicode code point
+  - `sorttbls.nls` (type 5, ~21 KB): sort table metadata and filenames for
+    locale-specific sort files (e.g. `big5.nls`, `prcp.nls`)
+
+**New format (Vista+, required by Wine `locale.c`):**
+- Single file `sortdefault.nls` served via `NtGetNlsSectionPtr(9, 0, ...)`
+- Header layout (from `load_sortdefault_nls()`):
+  ```c
+  struct {
+      UINT sortkeys;   // offset to sort key table (UINT per Unicode code point)
+      UINT casemaps;   // offset to casemap table (l_intl.nls format, USHORT pairs)
+      UINT ctypes;     // offset to CT_CTYPE1/2/3 table
+      UINT sortids;    // offset to sort ID block
+  };
+  ```
+- Sort ID block: `version` + `guid_count` + `struct sortguid[]`
+  (each: 16-byte GUID + flags + compression/exception/casing offsets)
+- After sort IDs: expansion count + `struct sort_expansion[]` (2×WCHAR per entry)
+- After expansions: compression count + `struct sort_compression[]` + compression data
+- After compression data: multiple-weights block + `struct jamo_sort[]`
+
+**Tooling gap:** No tool exists in ReactOS to build `sortdefault.nls`.
+Wine generates it from its `tools/make_unicode` script using Unicode data.
+
+### 3. Normalization NLS files: missing entirely
+
+**Old format (ReactOS, XP/2003):** Not present.
+
+**New format (Vista+, required by Wine `locale.c`):**
+- Four files keyed by normalization form (NormalizationC=1, D=2, KC=5, KD=6),
+  served via `NtGetNlsSectionPtr(12, form, ...)`
+- Parsed via `struct norm_table` header (defined in `locale.c`):
+  - File name (13 WCHARs), checksum, Unicode version, normalization form
+  - Offsets to: combining class table, property tables (level 1 + 2),
+    decomposition hash + map + sequence tables, composition hash + sequence tables
+- Used by `RtlNormalizeString()` (also stubbed)
+
+**Conflict:** ReactOS currently maps type 12 to `geo.nls`. The type 12 slot must
+be reassigned to normalization. Geo data in the new architecture is embedded in
+`locale.nls` itself (see section 1 above).
+
+**Tooling gap:** No tool exists in ReactOS to build normalization NLS files.
+Wine's `nls/norm{nfc,nfd,nfkc,nfkd}.nls` files can serve as a source.
+
+### 4. Casemap table (type 10)
+
+**Old format (ReactOS):** `l_intl.nls` served via old CSR type, also referenced
+directly from `ExpNlsSectionPointer` in the kernel.
+
+**New format (Vista+):** Served via `NtGetNlsSectionPtr(10, 0, ...)`. The format
+is the same `l_intl.nls` USHORT-pair layout; `locale.c` explicitly notes
+`/* casemap table, in l_intl.nls format */` for `sort.casemap`. The content of
+ReactOS's existing `l_intl.nls` (4870 bytes) should be compatible once type 10
+is implemented.
+
+### 5. Codepage NLS files (`c_*.nls`): minor header difference, mostly resolved
+
+Both old and new code use the same 26-byte `NLS_FILE_HEADER` (13 WORDs).
+The DDK header `sdk/include/ddk/ntnls.h` already defines the correct layout:
+
+```c
+typedef struct _NLS_FILE_HEADER {
+    USHORT HeaderSize;           // = 13 (WORDs)
+    USHORT CodePage;
+    USHORT MaximumCharacterSize; // 1 = SBCS, 2 = DBCS
+    USHORT DefaultChar;
+    USHORT UniDefaultChar;
+    USHORT TransDefaultChar;     // Unicode translation of DefaultChar
+    USHORT TransUniDefaultChar;  // codepage translation of UniDefaultChar
+    UCHAR  LeadByte[12];
+} NLS_FILE_HEADER;
+```
+
+**Old tool (`sdk/tools/create_nls/create_nls.c`):**
+- Uses a different in-memory layout (`BYTE DefaultChar[2]` + `unknown1`/`unknown2`)
+- Always writes `'?'` (0x003F) for `TransDefaultChar` and `TransUniDefaultChar`
+- Reads data from the host OS via `GetCPInfoExA` (Windows-only)
+
+**New tool (`sdk/tools/txt2nls/main.cpp`):**
+- Uses the correct layout matching `ntnls.h`
+- Properly computes `TransDefaultChar` and `TransUniDefaultChar`
+- Reads from portable `.txt` source files; runs cross-platform
+
+The `txt2nls` tool already generates correct NLS files. All codepage `.nls` files
+in `media/nls/` are now built by `txt2nls` from the `.txt` sources in
+`media/nls/src/`. The `create_nls.c` tool is obsolete for this purpose.
+
+Files not yet converted: `c_856.nls` and `c_878.nls` are listed in
+`media/nls/CMakeLists.txt` as static (manually generated) rather than built
+from `.txt` sources. They may still be in the old format with `'?'` for the
+translated default chars, and should have `.txt` sources added so they can be
+rebuilt by `txt2nls`.
+
+---
+
+## Summary of Required Changes
+
+### Critical (locale.c cannot initialize without these)
+
+1. **Implement `RtlGetLocaleFileMappingAddress`** in `sdk/lib/rtl/` or `dll/ntdll/`.
+   Must map and return a pointer to a Vista+-format `locale.nls` image.
+
+2. **Implement `NtGetNlsSectionPtr` type 9** (sortdefault) in ntdll/ntoskrnl.
+   Must serve `sortdefault.nls` in the new unified format.
+
+3. **Implement `NtGetNlsSectionPtr` type 10** (casemap) in ntdll/ntoskrnl.
+   Can reuse `l_intl.nls` data (format is already compatible).
+
+4. **Fix `NtGetNlsSectionPtr` type 12** conflict in `subsystems/win/basesrv/nls.c`:
+   change from `geo.nls` to the normalization NLS files. Geo data must instead
+   be embedded in the new `locale.nls`.
+
+5. **Create new `locale.nls`** in Vista+ format (`NLS_LOCALE_HEADER` + `NLS_LOCALE_DATA[]`).
+   Wine's `nls/locale.nls` can serve as a base.
+
+6. **Create `sortdefault.nls`** in the new unified format.
+   Wine's `nls/sortdefault.nls` can serve as a base.
+
+7. **Add `NLS_LOCALE_HEADER` and `NLS_LOCALE_DATA` type definitions** to a ReactOS
+   NDK header (e.g. `sdk/include/ndk/rtltypes.h` or a new `sdk/include/ndk/nlstypes.h`),
+   so that `RtlGetLocaleFileMappingAddress` can be implemented without depending
+   on Wine's `winternl.h`.
+
+### Important (affects correctness)
+
+8. **Implement `RtlNormalizeString`** in `sdk/lib/rtl/`. Requires normalization NLS
+   files served via `NtGetNlsSectionPtr(12, ...)`.
+
+9. **Create normalization NLS files** (`norm{nfc,nfd,nfkc,nfkd}.nls`).
+   Wine's `nls/norm*.nls` can serve as a base.
+
+10. **Implement `NtInitializeNlsFiles`** in ntdll. Used by some callers to prime
+    the locale file mapping before `RtlGetLocaleFileMappingAddress` is called.
+
+11. **Reconcile `sdk/lib/rtl/locale.c`** hardcoded `RtlpLocaleTable[]` with the
+    file-based data from `locale.nls`. Currently both represent name↔LCID
+    mappings independently. Long term the RTL should derive this data from
+    `locale.nls` rather than a compile-time table.
+
+### Lower priority
+
+12. **Convert `c_856.nls` and `c_878.nls`** from static files to `.txt`-sourced
+    builds via `txt2nls`, ensuring `TransDefaultChar`/`TransUniDefaultChar` are
+    correct.
+
+13. **Deprecate `sdk/tools/create_nls/create_nls.c`**. It is superseded by
+    `txt2nls`. The files it would produce have incorrect `TransDefaultChar` values.
+
+14. **Document the old CSR NLS types 1–8** in `basesrv/nls.c` as XP/2003-era
+    compatibility only. They should be preserved for any remaining old code paths
+    but are invisible to Vista+ callers.
+
+---
+
+## File Reference
+
+| File | Role |
+|------|------|
+| `dll/win32/kernelbase/wine/locale.c` | New locale API; requires Vista+ NLS |
+| `sdk/tools/txt2nls/main.cpp` | Codepage NLS generator (correct, in use) |
+| `sdk/tools/create_nls/create_nls.c` | Old codepage NLS generator (obsolete) |
+| `sdk/include/ddk/ntnls.h` | `CPTABLEINFO`, `NLSTABLEINFO`, `NLS_FILE_HEADER` |
+| `sdk/include/ndk/rtltypes.h` | `NLS_FILE_HEADER` (matches new format) |
+| `sdk/include/wine/winternl.h` | `NLS_LOCALE_HEADER`, `NLS_LOCALE_DATA` (Wine only) |
+| `sdk/lib/rtl/locale.c` | RTL locale name↔LCID resolution (hardcoded table) |
+| `dll/ntdll/def/ntdll.spec` | Stubs: `NtGetNlsSectionPtr`, `RtlNormalizeString`, etc. |
+| `subsystems/win/basesrv/nls.c` | NLS section type → file mapping (old scheme) |
+| `media/nls/locale.nls` | Old-format locale data (must be replaced) |
+| `media/nls/sortkey.nls` | Old sort key table (superseded by `sortdefault.nls`) |
+| `media/nls/sorttbls.nls` | Old sort table metadata (superseded by `sortdefault.nls`) |
+| `media/nls/l_intl.nls` | Case mapping (compatible with new type 10) |
+| `media/nls/geo.nls` | Geo data (must move into new `locale.nls`) |
+| `media/nls/CMakeLists.txt` | NLS file build rules |
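
As a rough illustration of the 26-byte codepage header described in section 5, the sketch below decodes it from a raw byte buffer, which could help when checking whether a static file such as `c_856.nls` carries a plausible header. The names `nls_header`, `nls_parse_header`, and `nls_header_is_dbcs` are hypothetical and not part of ReactOS or `txt2nls`. It assumes the fields are stored little-endian and validates only `HeaderSize`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NLS_HEADER_WORDS 13u                      /* expected HeaderSize, in 16-bit words */
#define NLS_HEADER_BYTES (NLS_HEADER_WORDS * 2u)  /* 26 bytes */

/* Decoded header; field order follows NLS_FILE_HEADER. Hypothetical type. */
typedef struct {
    uint16_t header_size;
    uint16_t code_page;
    uint16_t max_char_size;           /* 1 = SBCS, 2 = DBCS */
    uint16_t default_char;
    uint16_t uni_default_char;
    uint16_t trans_default_char;
    uint16_t trans_uni_default_char;
    uint8_t  lead_byte[12];
} nls_header;

/* Read one 16-bit value, assuming little-endian storage. */
static uint16_t read_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | ((unsigned)p[1] << 8));
}

/* Returns 0 on success, -1 if the buffer is too short or HeaderSize != 13. */
int nls_parse_header(const uint8_t *buf, size_t len, nls_header *out)
{
    size_t i;

    assert(buf != NULL && out != NULL);
    if (len < NLS_HEADER_BYTES || read_le16(buf) != NLS_HEADER_WORDS)
        return -1;

    out->header_size            = read_le16(buf + 0);
    out->code_page              = read_le16(buf + 2);
    out->max_char_size          = read_le16(buf + 4);
    out->default_char           = read_le16(buf + 6);
    out->uni_default_char       = read_le16(buf + 8);
    out->trans_default_char     = read_le16(buf + 10);
    out->trans_uni_default_char = read_le16(buf + 12);
    for (i = 0; i < sizeof out->lead_byte; i++)
        out->lead_byte[i] = buf[14 + i];
    return 0;
}

/* Nonzero when the header declares a double-byte code page. */
int nls_header_is_dbcs(const nls_header *h)
{
    return h->max_char_size == 2;
}
```

Note that a `'?'` in both translated default fields does not by itself prove a file came from `create_nls`: a code page whose `DefaultChar` and `UniDefaultChar` are both `'?'` can legitimately translate to `'?'` as well, so rebuilding from `.txt` sources remains the more reliable check.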