287 changes: 287 additions & 0 deletions media/doc/nls_format_analysis.md
@@ -0,0 +1,287 @@
# NLS Format Analysis: Wine locale.c vs. ReactOS Infrastructure

## Background

`dll/win32/kernelbase/wine/locale.c` is the Wine-derived implementation of the
kernelbase locale API. It targets the Windows Vista+ (NT 6.0+) NLS architecture,
which uses a substantially different set of NLS section types, file formats, and
kernel APIs compared to what ReactOS currently implements (Windows XP/2003, NT 5.x).

The tool `sdk/tools/txt2nls/main.cpp` (a C++ source file, `.cpp` rather than `.c`)
converts `.txt` codepage definitions to binary NLS files. It covers only the
codepage part of the full NLS infrastructure.

---

## NtGetNlsSectionPtr Section Type Mapping

The central dispatch mechanism is `NtGetNlsSectionPtr(type, id, unk, &ptr, &size)`.
The type numbers differ between the two eras; type 11 (codepage files) is the
only slot that carries the same meaning in both.

### Windows Vista+ / Wine (correct mapping, confirmed by Wine test)

| Type | File / purpose | Notes |
|------|----------------|-------|
| 9 | sortdefault (sort keys, casemap, ctypes, sort GUIDs) | bogus ID → `STATUS_INVALID_PARAMETER_1` |
| 10 | casemap table (l_intl.nls format) | bogus ID → `STATUS_INVALID_PARAMETER_1` or `STATUS_UNSUCCESSFUL` |
| 11 | codepage files (`c_*.nls`), keyed by codepage number | bogus ID → `STATUS_OBJECT_NAME_NOT_FOUND` |
| 12 | Unicode normalization files, keyed by normalization form | bogus ID → `STATUS_OBJECT_NAME_NOT_FOUND` |
| 13 | unknown sort-related | bogus ID → `STATUS_INVALID_PARAMETER_1` |
| 14 | unknown | success or `STATUS_INVALID_PARAMETER_1` depending on Windows version |
| all others (0–8, 15+) | invalid | → `STATUS_INVALID_PARAMETER_1` |

### ReactOS Current (`subsystems/win/basesrv/nls.c`)

| Type | File | Status |
|------|------|--------|
| 1 | `unicode.nls` | implemented |
| 2 | `locale.nls` | implemented |
| 3 | `ctype.nls` | implemented |
| 4 | `sortkey.nls` | implemented |
| 5 | `sorttbls.nls`| implemented |
| 6 | `c_437.nls` | implemented |
| 7 | `c_1252.nls` | implemented |
| 8 | `l_except.nls`| implemented |
| 9 | — | `STATUS_NOT_IMPLEMENTED` |
| 10 | — | `STATUS_NOT_IMPLEMENTED` |
| 11 | codepage files (dynamic) | implemented, **matches Vista+** |
| 12 | `geo.nls` | implemented, **CONFLICTS with Vista+ normalization** |

Key observations:
- Types 1–8 are the old XP/2003 CSR-based mechanism; they do not exist in Vista+.
- Types 9 and 10 are stubbed out (`STATUS_NOT_IMPLEMENTED`); `locale.c` requests type 9 unconditionally on startup.
- Type 12 in ReactOS maps to `geo.nls`; in Vista+ it maps to normalization files.
This is a direct conflict that will cause `IsNormalizedString`/`NormalizeString`
to receive geo data instead of normalization tables.
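The comparison behind these observations can be sketched as a small lookup over the two tables. This is only an illustration: `nls_type_entry`, `classify_type` and the `nls_compat` values are hypothetical names that exist in neither code base, and only types 9 to 12 are covered (13 and 14 are left out because their purpose is unknown).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One table row: a section type and what it serves.
 * purpose == NULL means the slot exists but is not implemented. */
struct nls_type_entry {
    unsigned type;
    const char *purpose;
};

/* Vista+ meanings for types 9-12, copied from the first table. */
static const struct nls_type_entry vista_types[] = {
    { 9,  "sortdefault" },
    { 10, "casemap" },
    { 11, "codepage" },
    { 12, "normalization" },
};

/* ReactOS meanings for the same types, copied from the second table. */
static const struct nls_type_entry reactos_types[] = {
    { 9,  NULL },
    { 10, NULL },
    { 11, "codepage" },
    { 12, "geo" },
};

enum nls_compat { NLS_MATCH, NLS_MISSING, NLS_CONFLICT, NLS_NOT_COMPARED };

static const struct nls_type_entry *
find_entry(const struct nls_type_entry *table, size_t count, unsigned type)
{
    size_t i;

    assert(table != NULL);
    for (i = 0; i < count; i++)
        if (table[i].type == type)
            return &table[i];
    return NULL;
}

/* Compare what ReactOS serves for a type with what Vista+ callers expect. */
enum nls_compat classify_type(unsigned type)
{
    const struct nls_type_entry *v =
        find_entry(vista_types, sizeof(vista_types) / sizeof(vista_types[0]), type);
    const struct nls_type_entry *r =
        find_entry(reactos_types, sizeof(reactos_types) / sizeof(reactos_types[0]), type);

    if (v == NULL || r == NULL)
        return NLS_NOT_COMPARED;   /* outside the 9-12 range covered here */
    if (r->purpose == NULL)
        return NLS_MISSING;        /* ReactOS returns STATUS_NOT_IMPLEMENTED */
    return strcmp(v->purpose, r->purpose) == 0 ? NLS_MATCH : NLS_CONFLICT;
}
```

With these tables, type 11 classifies as a match, types 9 and 10 as missing, and type 12 as a conflict.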

---

## Function Stubs in ntdll

From `dll/ntdll/def/ntdll.spec`:

| Function | Status | Used by locale.c |
|----------|--------|-----------------|
| `NtGetNlsSectionPtr` | `-stub -version=0x600+` | Yes — types 9, 11, 12 |
| `NtInitializeNlsFiles` | `-stub -version=0x600+` | Indirectly (Wine test uses it) |
| `RtlNormalizeString` | `-stub -version=0x600+` | Yes — called by `NormalizeString` |
| `RtlGetLocaleFileMappingAddress` | **not exported at all** | Yes — called in `load_locale_nls()` |

`RtlGetLocaleFileMappingAddress` is the most critical missing export.
`load_locale_nls()` calls it during `init_locale()`; if the call fails, the
entire locale subsystem fails to initialise.

---

## NLS File Format Differences

### 1. `locale.nls` — Major format change

**Old format (ReactOS, XP/2003):**
- Binary table, no magic header, ~209 KB
- Per-LCID records in an undocumented 2003-era layout
- Read by old `kernel32` via `NtGetNlsSectionPtr(2, ...)`

**New format (Vista+, required by Wine `locale.c`):**
- Accessed via `RtlGetLocaleFileMappingAddress()`
- Starts with `NLS_LOCALE_HEADER` at offset 0 (magic `'NSDS'` at offset 0x0C)
- Followed by: `NLS_LOCALE_DATA[]` array, sorted LCID index, sorted LCNAME index,
string pool, calendar array, **and embedded geo data** (new `struct geo_id[]` +
`struct geo_index[]` with its own sub-header)
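To illustrate why the LCID index is stored sorted, here is a minimal sketch of a binary search over such an index. The entry layout (`lcid_index_entry`), the function name `find_locale_by_lcid` and the three sample rows are assumptions made for this example; they are not the real on-disk layout of `locale.nls`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical index entry, NOT the real locale.nls layout. */
struct lcid_index_entry {
    uint32_t lcid;      /* search key, stored in ascending order */
    uint16_t data_idx;  /* position of the record in the locale data array */
};

/* bsearch() comparator: key is a uint32_t LCID, elem is an index entry. */
static int cmp_lcid(const void *key, const void *elem)
{
    uint32_t k = *(const uint32_t *)key;
    uint32_t e = ((const struct lcid_index_entry *)elem)->lcid;

    return (k > e) - (k < e);
}

/* Returns the data index for lcid, or -1 if the LCID is not in the index. */
int find_locale_by_lcid(const struct lcid_index_entry *index, size_t count,
                        uint32_t lcid)
{
    const struct lcid_index_entry *hit;

    assert(index != NULL || count == 0);
    if (count == 0)
        return -1;
    hit = bsearch(&lcid, index, count, sizeof(*index), cmp_lcid);
    return hit ? (int)hit->data_idx : -1;
}

/* Sample index with three well-known LCIDs, already sorted by key. */
static const struct lcid_index_entry sample_index[] = {
    { 0x0407, 0 },  /* de-DE */
    { 0x0409, 1 },  /* en-US */
    { 0x040c, 2 },  /* fr-FR */
};
```

A lookup by locale name could follow the same pattern against the sorted LCNAME index, with a string comparison in place of the integer one.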

The `NLS_LOCALE_HEADER` and `NLS_LOCALE_DATA` types are defined only in
`sdk/include/wine/winternl.h` (because only Wine code uses them today).
ReactOS NDK headers (`sdk/include/ndk/`) do not have these types.
If `RtlGetLocaleFileMappingAddress` is to be implemented inside ntdll/RTL,
these types will need to be added to a suitable ReactOS header.

**Impact:** `load_locale_nls()` reads locale data, geo IDs, geo index, and character
maps entirely from the pointer returned by `RtlGetLocaleFileMappingAddress`.
None of this works today on ReactOS.

**Tooling gap:** No tool exists in ReactOS to produce a `locale.nls` in the new
format. Wine's `nls/locale.nls` (generated by its `tools/make_unicode` script) is
a compatible source; alternatively, a new tool would need to consume the MS
locale data.

Note also that `sdk/lib/rtl/locale.c` currently resolves LCID↔name via a
hardcoded `RtlpLocaleTable[]` array. This approach is orthogonal to
`RtlGetLocaleFileMappingAddress`; the two need to be reconciled.

### 2. `sortdefault.nls` — New file, missing in ReactOS

**Old format (ReactOS, XP/2003):**
- Two separate files served via old CSR types:
  - `sortkey.nls` (type 4, ~262 KB): 4-byte sort weights per Unicode code point
  - `sorttbls.nls` (type 5, ~21 KB): sort table metadata and filenames for
    locale-specific sort files (e.g. `big5.nls`, `prcp.nls`)

**New format (Vista+, required by Wine `locale.c`):**
- Single file `sortdefault.nls` served via `NtGetNlsSectionPtr(9, 0, ...)`
- Header layout (from `load_sortdefault_nls()`):
```c
struct {
    UINT sortkeys;  // offset to sort key table (UINT per Unicode code point)
    UINT casemaps;  // offset to casemap table (l_intl.nls format, USHORT pairs)
    UINT ctypes;    // offset to CT_CTYPE1/2/3 table
    UINT sortids;   // offset to sort ID block
};
```
- Sort ID block: `version` + `guid_count` + `struct sortguid[]`
(each: 16-byte GUID + flags + compression/exception/casing offsets)
- After sort IDs: expansion count + `struct sort_expansion[]` (2×WCHAR per entry)
- After expansions: compression count + `struct sort_compression[]` + compression data
- After compression data: multiple-weights block + `struct jamo_sort[]`
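Any code that serves or consumes this file will probably want to sanity-check the four header offsets before following them. The sketch below shows one way to do that. It assumes the offsets are relative to the start of the file and that the host has the same byte order as the file; `sortdefault_header` and `read_sortdefault_header` are hypothetical names, not existing ReactOS or Wine symbols.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Mirrors the four-offset header described above. */
struct sortdefault_header {
    uint32_t sortkeys;
    uint32_t casemaps;
    uint32_t ctypes;
    uint32_t sortids;
};

/* Copies the header out of a mapped file image and checks that every
 * offset points past the header and inside the image.
 * Returns 0 on success, -1 if the image is too small or an offset is bad. */
int read_sortdefault_header(const void *data, size_t size,
                            struct sortdefault_header *out)
{
    uint32_t offsets[4];
    size_t i;

    assert(out != NULL);
    if (data == NULL || size < sizeof(*out))
        return -1;

    /* Assumption: host byte order equals file byte order. */
    memcpy(out, data, sizeof(*out));

    offsets[0] = out->sortkeys;
    offsets[1] = out->casemaps;
    offsets[2] = out->ctypes;
    offsets[3] = out->sortids;

    for (i = 0; i < 4; i++)
        if (offsets[i] < sizeof(*out) || offsets[i] >= size)
            return -1;
    return 0;
}
```

The check is deliberately shallow: it does not validate the sort ID block, the expansion table or the compression data that follow.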

**Tooling gap:** No tool exists in ReactOS to build `sortdefault.nls`.
Wine generates it from its `tools/make_unicode` script using Unicode data.

### 3. Normalization NLS files — Missing entirely

**Old format (ReactOS, XP/2003):** Not present.

**New format (Vista+, required by Wine `locale.c`):**
- Four files keyed by normalization form (NormalizationC=1, D=2, KC=5, KD=6),
served via `NtGetNlsSectionPtr(12, form, ...)`
- Parsed via `struct norm_table` header (defined in `locale.c`):
  - File name (13 WCHARs), checksum, Unicode version, normalization form
  - Offsets to: combining class table, property tables (level 1 + 2),
    decomposition hash + map + sequence tables, composition hash + sequence tables
- Used by `RtlNormalizeString()` (also stubbed)
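Because the form number doubles as the section ID, a type 12 handler would likely validate it before looking up a file. A minimal sketch of that check follows, using the four form values listed above. The enum and helper names are made up for this example; the statement that forms C and KC compose while KC and KD use compatibility mappings comes from the Unicode definitions of NFC, NFD, NFKC and NFKD.

```c
#include <assert.h>
#include <stdbool.h>

/* Form IDs as listed above (C=1, D=2, KC=5, KD=6). */
enum norm_form_id {
    NORM_FORM_C  = 1,
    NORM_FORM_D  = 2,
    NORM_FORM_KC = 5,
    NORM_FORM_KD = 6
};

/* True only for the four form IDs that have a normalization file. */
bool norm_form_is_valid(unsigned form)
{
    return form == NORM_FORM_C || form == NORM_FORM_D ||
           form == NORM_FORM_KC || form == NORM_FORM_KD;
}

/* Forms C and KC recompose after decomposing; D and KD do not. */
bool norm_form_composes(unsigned form)
{
    assert(norm_form_is_valid(form));
    return form == NORM_FORM_C || form == NORM_FORM_KC;
}

/* Forms KC and KD apply compatibility mappings; C and D are canonical only. */
bool norm_form_is_compat(unsigned form)
{
    assert(norm_form_is_valid(form));
    return form == NORM_FORM_KC || form == NORM_FORM_KD;
}
```

Rejecting IDs such as 0, 3 or 4 up front keeps the handler from probing for files that cannot exist.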

**Conflict:** ReactOS currently maps type 12 to `geo.nls`. The type 12 slot must
be reassigned to normalization. Geo data in the new architecture is embedded in
`locale.nls` itself (see section 1 above).

**Tooling gap:** No tool exists in ReactOS to build normalization NLS files.
Wine's `nls/Normalize{C,D,KC,KD}.nls` files can serve as a source.

### 4. Casemap table (type 10)

**Old format (ReactOS):** `l_intl.nls` served via old CSR type, also referenced
directly from `ExpNlsSectionPointer` in the kernel.

**New format (Vista+):** Served via `NtGetNlsSectionPtr(10, 0, ...)`. The format
is the same `l_intl.nls` USHORT-pair layout — `locale.c` explicitly notes
`/* casemap table, in l_intl.nls format */` for `sort.casemap`. The content of
ReactOS's existing `l_intl.nls` (4870 bytes) should be compatible once type 10
is implemented.

### 5. Codepage NLS files (`c_*.nls`) — Minor header difference, mostly resolved

Both old and new code use the same 26-byte `NLS_FILE_HEADER` (13 WORDs).
The DDK header `sdk/include/ddk/ntnls.h` already defines the correct layout:

```c
typedef struct _NLS_FILE_HEADER {
    USHORT HeaderSize;            // = 13 (WORDs)
    USHORT CodePage;
    USHORT MaximumCharacterSize;  // 1 = SBCS, 2 = DBCS
    USHORT DefaultChar;
    USHORT UniDefaultChar;
    USHORT TransDefaultChar;      // Unicode → CP fallback: Unicode of DefaultChar
    USHORT TransUniDefaultChar;   // CP → Unicode fallback: CP of UniDefaultChar
    UCHAR LeadByte[12];
} NLS_FILE_HEADER;
```
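The 26-byte claim and the field order can be checked mechanically. The sketch below is a portable stand-in for the structure above, written with fixed-width types so it builds outside the ReactOS tree. It assumes the file stores its fields in little-endian order, and the names `nls_file_header`, `read_le16` and `parse_nls_file_header` are hypothetical helpers for this example only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Portable mirror of NLS_FILE_HEADER: seven 16-bit fields + 12 lead bytes. */
struct nls_file_header {
    uint16_t header_size;        /* expected to be 13 (in WORDs) */
    uint16_t code_page;
    uint16_t max_char_size;      /* 1 = SBCS, 2 = DBCS */
    uint16_t default_char;
    uint16_t uni_default_char;
    uint16_t trans_default_char;
    uint16_t trans_uni_default_char;
    uint8_t  lead_byte[12];
};

/* 7 * 2 + 12 = 26 bytes = 13 WORDs (C11 static assertion). */
_Static_assert(sizeof(struct nls_file_header) == 26,
               "header must be 26 bytes");

/* Reads one little-endian 16-bit value from a byte buffer. */
static uint16_t read_le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Decodes the header field by field, so the result does not depend on
 * host byte order. Returns 0 on success, -1 on a short or odd header. */
int parse_nls_file_header(const uint8_t *data, size_t size,
                          struct nls_file_header *out)
{
    size_t i;

    assert(out != NULL);
    if (data == NULL || size < 26)
        return -1;

    out->header_size            = read_le16(data + 0);
    out->code_page              = read_le16(data + 2);
    out->max_char_size          = read_le16(data + 4);
    out->default_char           = read_le16(data + 6);
    out->uni_default_char       = read_le16(data + 8);
    out->trans_default_char     = read_le16(data + 10);
    out->trans_uni_default_char = read_le16(data + 12);
    for (i = 0; i < 12; i++)
        out->lead_byte[i] = data[14 + i];

    if (out->header_size != 13)
        return -1;
    if (out->max_char_size != 1 && out->max_char_size != 2)
        return -1;
    return 0;
}
```

A check like this could be run over the static files in `media/nls/` to confirm that they at least carry a well-formed header.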

**Old tool (`sdk/tools/create_nls/create_nls.c`):**
- Uses a different in-memory layout (`BYTE DefaultChar[2]` + `unknown1`/`unknown2`)
- Always writes `'?'` (0x003F) for `TransDefaultChar` and `TransUniDefaultChar`
- Reads data from the host OS via `GetCPInfoExA` — Windows-only

**New tool (`sdk/tools/txt2nls/main.cpp`):**
- Uses the correct layout matching `ntnls.h`
- Properly computes `TransDefaultChar` and `TransUniDefaultChar`
- Reads from portable `.txt` source files; runs cross-platform

The `txt2nls` tool already generates correct NLS files. All codepage `.nls` files
in `media/nls/` are now built by `txt2nls` from the `.txt` sources in
`media/nls/src/`. The `create_nls.c` tool is obsolete for this purpose.

Files not yet converted: `c_856.nls` and `c_878.nls` are listed in
`media/nls/CMakeLists.txt` as static (manually generated) rather than built
from `.txt` sources. They may still be in the old format with `'?'` for the
translated default chars, and should have `.txt` sources added so they can be
rebuilt by `txt2nls`.

---

## Summary of Required Changes

### Critical (locale.c cannot initialize without these)

1. **Implement `RtlGetLocaleFileMappingAddress`** in `sdk/lib/rtl/` or `dll/ntdll/`.
Must map and return a pointer to a Vista+-format `locale.nls` image.

2. **Implement `NtGetNlsSectionPtr` type 9** (sortdefault) in ntdll/ntoskrnl.
Must serve `sortdefault.nls` in the new unified format.

3. **Implement `NtGetNlsSectionPtr` type 10** (casemap) in ntdll/ntoskrnl.
Can reuse `l_intl.nls` data (format is already compatible).

4. **Fix `NtGetNlsSectionPtr` type 12** conflict in `subsystems/win/basesrv/nls.c`:
change from `geo.nls` to the normalization NLS files. Geo data must instead
be embedded in the new `locale.nls`.

5. **Create new `locale.nls`** in Vista+ format (`NLS_LOCALE_HEADER` + `NLS_LOCALE_DATA[]`).
Wine's `nls/locale.nls` can serve as a base.

6. **Create `sortdefault.nls`** in the new unified format.
Wine's `nls/sortdefault.nls` can serve as a base.

7. **Add `NLS_LOCALE_HEADER` and `NLS_LOCALE_DATA` type definitions** to a ReactOS
NDK header (e.g. `sdk/include/ndk/rtltypes.h` or a new `sdk/include/ndk/nlstypes.h`),
so that `RtlGetLocaleFileMappingAddress` can be implemented without depending
on Wine's `winternl.h`.

### Important (affects correctness)

8. **Implement `RtlNormalizeString`** in `sdk/lib/rtl/`. Requires normalization NLS
files served via `NtGetNlsSectionPtr(12, ...)`.

9. **Create normalization NLS files** (`Normalize{C,D,KC,KD}.nls`).
Wine's `nls/Normalize*.nls` can serve as a base.

10. **Implement `NtInitializeNlsFiles`** in ntdll. Used by some callers to prime
the locale file mapping before `RtlGetLocaleFileMappingAddress` is called.

11. **Reconcile `sdk/lib/rtl/locale.c`** hardcoded `RtlpLocaleTable[]` with the
file-based data from `locale.nls`. Currently both represent locale↔LCID
mappings independently. Long term the RTL should derive this data from
`locale.nls` rather than a compile-time table.

### Lower priority

12. **Convert `c_856.nls` and `c_878.nls`** from static files to `.txt`-sourced
builds via `txt2nls`, ensuring `TransDefaultChar`/`TransUniDefaultChar` are
correct.

13. **Deprecate `sdk/tools/create_nls/create_nls.c`**. It is superseded by
`txt2nls`. The files it would produce have incorrect `TransDefaultChar` values.

14. **Document the old CSR NLS types 1–8** in `basesrv/nls.c` as XP/2003-era
compatibility only. They should be preserved for any remaining old code paths
but are invisible to Vista+ callers.

---

## File Reference

| File | Role |
|------|------|
| `dll/win32/kernelbase/wine/locale.c` | New locale API; requires Vista+ NLS |
| `sdk/tools/txt2nls/main.cpp` | Codepage NLS generator (correct, in use) |
| `sdk/tools/create_nls/create_nls.c` | Old codepage NLS generator (obsolete) |
| `sdk/include/ddk/ntnls.h` | `CPTABLEINFO`, `NLSTABLEINFO`, `NLS_FILE_HEADER` |
| `sdk/include/ndk/rtltypes.h` | `NLS_FILE_HEADER` (matches new format) |
| `sdk/include/wine/winternl.h` | `NLS_LOCALE_HEADER`, `NLS_LOCALE_DATA` (Wine only) |
| `sdk/lib/rtl/locale.c` | RTL locale name↔LCID resolution (hardcoded table) |
| `dll/ntdll/def/ntdll.spec` | Stubs: `NtGetNlsSectionPtr`, `RtlNormalizeString`, etc. |
| `subsystems/win/basesrv/nls.c` | NLS section type → file mapping (old scheme) |
| `media/nls/locale.nls` | Old-format locale data (must be replaced) |
| `media/nls/sortkey.nls` | Old sort key table (superseded by `sortdefault.nls`) |
| `media/nls/sorttbls.nls` | Old sort table metadata (superseded by `sortdefault.nls`) |
| `media/nls/l_intl.nls` | Case mapping (compatible with new type 10) |
| `media/nls/geo.nls` | Geo data (must move into new `locale.nls`) |
| `media/nls/CMakeLists.txt` | NLS file build rules |