diff --git a/NOTES.md b/NOTES.md new file mode 100644 index 0000000..b17f208 --- /dev/null +++ b/NOTES.md @@ -0,0 +1,296 @@ +# mtga-reader local fork — investigation notes + +This is a **local fork** of [`mtgatool/mtga-reader`](https://github.com/mtgatool/mtga-reader), built from HEAD +(`v0.1.6` unreleased as of **2026-04-10**) with local patches so it compiles +and runs on macOS arm64. Originally forked into `/tmp/mtga-reader-head` during +a `commander-tuner` `mtga-import` integration investigation, then moved here +for durability. + +The fork is **not currently wired into anything**. `commander-tuner`'s +`mtga-import` removed its `--collection-source mtga-reader` option because the +`--collection-source untapped-csv` path ended up being a cleaner, +no-privileges-required alternative. These notes exist so the work is +recoverable if someone wants to pick it back up later. + +## What works + +- Builds cleanly on `darwin-arm64` with `npm install && npm run build` + (produces `mtga-reader.darwin-arm64.node`, already present in the repo root + from the last build). +- `readMtgaCards("MTGA")` (our custom napi function added to this fork) scans + Arena's live process memory, finds the `Cards` dictionary via a signature + scan, and returns `{ cards: [{cardId: int, quantity: int}, ...] }`. +- The scan is fast (<1s wall clock) and deterministic against a running Arena + process. +- Requires `sudo` to run because `task_for_pid` on macOS needs elevated + privileges unless the calling binary is signed with + `com.apple.security.cs.debugger`, which requires an Apple developer + entitlement we don't have. Sudo is effectively mandatory for local use. + +## What doesn't + +- **The returned dict is per-printing (grp_id), not per-oracle.** A card like + Lightning Strike with 6 printings has 6 separate entries, each at whatever + physical-copy count the user acquired for that printing. Downstream code + needs to aggregate by oracle name and cap at 4 for deckbuilding use. 
(The + Python side of `mtga-import` handles this in `_resolve_collection` — see the + `sum + cap at 4` comment block.) +- **Scryfall's `default_cards.json` has null `arena_id` for many recent Arena + printings** (Alchemy Y-sets, Avatar TLA/TLE, Final Fantasy, Lorwyn Eclipsed, + etc.) so downstream name resolution misses ~1500 cards when using this + reader alone. This is a Scryfall-side upstream data ingestion lag, not a + local bulk staleness issue. Running `download-bulk` doesn't help. +- **`PAPA._instance` reads as `0`** on current Arena builds. Upstream's code + path walks `PAPA._instance.InventoryManager._inventoryServiceWrapper.Cards` + and that first step is broken. Unknown root cause — probably a + GC-static-vs-value-static layout difference in the IL2CPP class struct, or a + stale `CLASS_STATIC_FIELDS` offset that happens to work for some value-type + statics but not for reference-type statics. **Our signature scan bypasses + this entirely** by finding the Cards dict directly rather than walking to + it from PAPA, so the broken static read doesn't block card extraction, but + it prevents walking the other fields on PAPA (InventoryManager, EventManager, + MatchManager, etc.) that might be interesting to read in the future. + +## Local patches on top of upstream HEAD + +All in `src/napi/mod.rs` and `src/mono_reader.rs`. None have been sent upstream. + +1. **`MonoReader::is_admin` — macOS branch added** + (`src/mono_reader.rs:48-75`). Upstream's function had branches for Windows + (via `is_elevated` crate) and Linux (via `sudo::check()`) but no macOS + branch, so the function body is `()` on macOS and doesn't match the + declared `-> bool` return type. The crate doesn't compile on macOS as + published. Our patch adds a `#[cfg(target_os = "macos")]` branch that + uses `libc::geteuid() == 0`. + +2. **`MemReader::read_{u64,i64,u16,i16,i8,f32,f64}` methods added** + (`src/napi/mod.rs` in the `macos_backend` module's `MemReader` impl). 
The + local `MemReader` struct upstream only defines `read_u8/u32/i32/ptr`, but + `read_field_value` further down in the file calls all the missing ones. + Seven missing methods = seven build errors on a cold `cargo build`. Each + fix follows the same `from_le_bytes` pattern as the existing methods. + +3. **`scan_heap_for_cards_dictionary` + `read_cards_dictionary_entries`** + (`src/napi/mod.rs` in `macos_backend`). Signature scan that walks writable + heap regions looking for a `Dictionary<int, int>`-shaped object with: + - `count` in `[500, 50_000]` + - `buckets_ptr` and `entries_ptr` in a plausible heap range + - First 30 entries have `hash == key` (the defining signature of a .NET + `Dictionary<int, int>` with the default `EqualityComparer<int>`, + since `GetHashCode(x) == x` for int) + - Keys in Arena card-id range `[1, 200_000]` + - Values in `[1, 4]` (Arena's internal cap) + This uniquely identifies the live card collection dict across 200k+ + candidate positions. Scoring supports optional `MTGA_KNOWN_CARD_IDS` and + `MTGA_VERIFY_QTYS` env vars for cross-validation if you're debugging. + +4. **`read_mtga_cards_impl` + `readMtgaCards` napi export** + (`src/napi/mod.rs`, napi export section at the bottom of the file). Public + entry point that calls the scanner and returns the Cards dictionary's + contents. Bypasses all of upstream's PAPA walker / WrapperController + walker / InventoryManager walker / field-walk machinery. + +5. **Various diagnostic functions — `scan_for_type_info_table`, + `find_class_by_direct_scan`, `dump_class_names_matching`, + `find_papa_instance_via_static_field`, `find_wrapper_controller_instance`, + `find_papa_instance_by_field_verification`, `probe_card_printing_record`, + `scan_for_dict_entry_pattern`.** All dead code now — used during the + reverse-engineering to learn about Arena's memory layout. Feel free to + delete if you're cleaning up for an upstream PR, but they're useful + reference for how to probe specific aspects of Arena's in-process state.
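Patch 3 has a per-entry signature check: hash equals key, the key is a card id and the value is a quantity. The Rust below is a minimal, hypothetical sketch of that check, not the fork's actual `scan_heap_for_cards_dictionary`. It makes two assumptions:

- The entries array has already been copied out of the target process into a local byte buffer.
- The entries use the layout described elsewhere in these notes: `hashCode` at +0, `next` at +4, `key` at +8 and `value` at +12. That gives a 16-byte stride for `Dictionary<int, int>`.

The names `looks_like_cards_entries` and `read_i32` are illustrative only.

```rust
// Hypothetical sketch of patch 3's per-entry checks (not the fork's code).
// Assumed Dictionary<int, int> entry layout, 16-byte stride:
//   +0 hashCode (i32), +4 next (i32), +8 key (i32), +12 value (i32)
const ENTRY_STRIDE: usize = 16;
const CHECK_ENTRIES: usize = 30; // the fork inspects the first 30 entries
const MAX_CARD_ID: i32 = 200_000;
const MAX_QTY: i32 = 4;

/// Little-endian i32 at `off`, or None if the buffer is too short.
fn read_i32(buf: &[u8], off: usize) -> Option<i32> {
    let b = buf.get(off..off + 4)?;
    Some(i32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

/// True if the first min(count, 30) entries all look like
/// (card id -> quantity) pairs stored under the default int comparer.
fn looks_like_cards_entries(entries: &[u8], count: usize) -> bool {
    let n = count.min(CHECK_ENTRIES);
    if n == 0 {
        return false;
    }
    for i in 0..n {
        let base = i * ENTRY_STRIDE;
        let (hash, key, value) = match (
            read_i32(entries, base),
            read_i32(entries, base + 8),
            read_i32(entries, base + 12),
        ) {
            (Some(h), Some(k), Some(v)) => (h, k, v),
            _ => return false, // buffer shorter than `count` claims
        };
        // The default int comparer hashes x to x, so for non-negative
        // card ids the stored hash should equal the key.
        if hash != key {
            return false;
        }
        if !(1..=MAX_CARD_ID).contains(&key) || !(1..=MAX_QTY).contains(&value) {
            return false;
        }
    }
    true
}
```

Consider three hand-built entries `(90881, 4)`, `(90804, 1)` and `(91088, 2)`, each written with `next = -1`. They pass as a 48-byte buffer. The check fails if any stored hash is changed, if a quantity is raised to 5, or if the buffer is truncated. The `count` window and the `buckets_ptr`/`entries_ptr` range checks from the list above apply to the dictionary header, so they would run before this.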
+ +## CardPrintingRecord field layout + +Captured by running `probe_card_printing_record()` from our own napi +module against a live Arena process. The function calls +`get_class_fields(cpr_class)`, which walks the `FieldInfo[]` array +stored on Arena's own IL2CPP class metadata at startup — so it's +authoritative for whatever Arena build is currently running. We did +not consult any third-party reader to derive this table. + +**Class**: `CardPrintingRecord` in Assembly-CSharp. 50 fields, +confirmed via `get_class_fields()` on current Arena: + +| Offset | Name | Notes | +|---|---|---| +| `0x00` | `Blank` | static sentinel | +| `0x10` | **`GrpId`** | **int — this is Scryfall's `arena_id`** | +| `0x14` | `ArtId` | int | +| `0x18` | `ArtPath` | pointer to string | +| `0x20` | **`TitleId`** | **int — NOT a string; index into a localization table** | +| `0x24` | `InterchangeableTitleId` | int | +| `0x28` | `AltTitleId` | int | +| `0x2c` | `FlavorTextId` | int | +| `0x30` | `ReminderTextId` | int | +| `0x34` | `TypeTextId` | int | +| `0x38` | `SubtypeTextId` | int | +| `0x40` | `ArtistCredit` | pointer to string | +| `0x48` | `ArtSize` | ? | +| `0x4c` | `Rarity` | enum int | +| `0x50` | **`ExpansionCode`** | **pointer to Il2CppString like `"tle"`** | +| `0x58` | `DigitalReleaseSet` | pointer | +| `0x60` | `IsToken` | bool | +| `0x61` | `IsPrimaryCard` | bool | +| `0x62` | `IsDigitalOnly` | bool | +| `0x63` | `IsRebalanced` | bool | +| `0x64` | `RebalancedCardGrpId` | int | +| `0x68` | `DefunctRebalancedCardGrpId` | int | +| `0x6c` | `AlternateDeckLimit` | int | +| `0x70` | **`CollectorNumber`** | **pointer to Il2CppString like `"162"`** | +| `0x78` | `CollectorMax` | pointer | +| `0x80` | `CollectorSuffix` | pointer | +| `0x88` | `DraftContent` | bool | +| `0x8a` | `UsesSideboard` | bool | +| `0x90` | `OldSchoolManaText` | pointer | +| `0x98` | `LinkedFaceType` | ? | +| `0xa0` | `RawFrameDetail` | ? | +| `0xa8` | `Watermark` | pointer | +| `0xb0` | `TextChangeData` | ? 
| +| `0xc0` | `Power` | pointer to string | +| `0xd0` | `Toughness` | pointer to string | +| `0xe0` | `Colors` | array | +| `0xe8` | `ColorIdentity` | array | +| `0xf0` | `FrameColors` | array | +| `0xf8` | `IndicatorColors` | array | +| `0x100` | `Types` | array | +| `0x108` | `Subtypes` | array | +| `0x110` | `Supertypes` | array | +| `0x118` | `AbilityIds` | array | +| `0x120` | `HiddenAbilityIds` | array | +| `0x128` | `LinkedFaceGrpIds` | array | +| `0x130` | `LinkedAbilityTemplateCardGrpIds` | array | +| `0x138` | `AbilityIdToLinkedTokenGrpId` | dict | +| `0x140` | `AbilityIdToLinkedConjurations` | dict | +| `0x148` | `KnownSupportedStyles` | array | +| `0x150` | `AdditionalFrameDetails` | ? | + +### Paths to resolve a grp_id to a card name + +**Path A — via `TitleId` + Arena's localization table (offline, untried)**: +`TitleId` at offset `0x20` is an int ID into Arena's localization database, +not a direct string pointer. Resolving to English text requires walking a +localization data structure we haven't explored — likely keyed first by +language code and then by TitleId, with some form of fallback handling, +possibly lazy-loaded. The walker hasn't been written. + +**Path B — via `ExpansionCode` + `CollectorNumber` (online, simpler)**: +`ExpansionCode` at `0x50` and `CollectorNumber` at `0x70` are both +Il2CppString pointers (UTF-16 managed strings). With both, hit +`https://api.scryfall.com/cards/{set}/{number}` — which **returns cards +even when Scryfall's `arena_id` field is null** (verified during the +investigation, e.g. `/cards/tle/162` returns `Diresight` with +`arena_id: None`). Introduces a network dependency but that's +cacheable on disk. + +### Finding CardPrintingRecord instances + +Our direct heap scan for `obj[0] == cpr_class` produces **mostly false +positives**. The sample we captured (`probe_card_printing_record` output) +showed: + +- **Instance 1** at `0x1036a9910`: `GrpId=75, TitleId=1`, rest mostly zero. 
+ Looked like a tiny token or placeholder slot. +- **Instance 2** at `0x10399d1a8`: all zeros. Uninitialized. +- **Instance 3** at `0x103a320d8`: **GrpId=71806704** (a pointer value, not + an int) — the "class pointer" at offset 0 coincidentally matched + `cpr_class` but the struct at that address is some other type. +- **Instances 4-5** at `0x10ff09b98`/`0x10ff09ba8`: have fields reading as + strings `"_count"`, `"_entries"`, `"_freeList"`, `"_buckets"` — **they're + actually Dictionary internal field-name string literals** that happened to + land at addresses whose first 8 bytes equal `cpr_class`. + +**The real instances must be inside a container** — probably Arena has a +`Dictionary<int, CardPrintingRecord>` or similar. To find it, scan for a +dictionary with these properties: + +- `hash == key` at the standard Dictionary layout (hash at +0, key at + +8 of each entry) +- Entry stride `24` bytes (not 16) because the value is an 8-byte pointer, + plus 4 bytes alignment padding between `key` (int, 4 bytes) and `value` + (ptr, 8 bytes) +- `count` around **17,000** — roughly how many cards Arena ships with +- Keys (grp_ids) in the Arena range `[1, 200_000]` +- Values pointing to objects whose first 8 bytes equal `cpr_class` + +Once found, iterate the entries and for each valid (key, value_ptr) pair, +the value_ptr is a real `CardPrintingRecord*`. Read the fields we care about +(`GrpId`, `ExpansionCode`, `CollectorNumber` — or `TitleId` if going the +localization-table route). + +### Improving static field reading (alternative approach) + +If we wanted to fix the broken `PAPA._instance` read instead of +bypassing it, some ideas that haven't been explored: + +1. **Dump the raw bytes of the `Il2CppClass` struct for PAPA** and compare + against the layout our code assumes (`CLASS_STATIC_FIELDS` at `0xA8`). + Look for another pointer field nearby that might be the GC-tracked static + area. +2.
**Cross-reference against the actual IL2CPP source** at + `https://github.com/Unity-Technologies/il2cpp` or similar. The `Il2CppClass` + struct layout is public, but it varies by Unity version. +3. **Use `Il2CppDumper`** or `Il2CppInspector` against Arena's + `GameAssembly.dylib` (plus its `global-metadata.dat`) to get a definitive dump of every class's metadata. + Those tools are open source and specifically target reverse-engineering + IL2CPP binaries. They'd tell us exactly what offset `static_fields` is at + and whether there's a separate GC-static-fields pointer. + +## Resume / rebuild instructions + +```sh +cd ~/repos/mtga-reader + +# Make sure Rust is on PATH (rustup lives in ~/.cargo by default) +. "$HOME/.cargo/env" + +# devDeps (napi-rs CLI etc.) — only needed on first checkout +npm install + +# Full release build. Produces mtga-reader.darwin-arm64.node in the repo +# root, which Node's native loader picks up via ./index.js. +npm run build + +# Optional: expose as a global package for experiments +npm link + +# Run against live MTGA from the repo root (needs sudo for task_for_pid). +# Example using our custom readMtgaCards function: +sudo node -e 'console.log(require("./index.js").readMtgaCards("MTGA"))' +``` + +`MTGA_KNOWN_CARD_IDS` and `MTGA_VERIFY_QTYS` are read by `scan_heap_for_cards_dictionary` +for ground-truth validation when multiple candidates pass. Set them after `sudo` (here via +`env`), because sudo's default `env_reset` drops variables set in the calling environment: + +```sh +sudo env MTGA_KNOWN_CARD_IDS="90881,90804,91088" \ + MTGA_VERIFY_QTYS="98307:4,98487:3" \ + node -e 'console.log(require("./index.js").readMtgaCards("MTGA"))' +``` + +## Decision log + +- **2026-04-10: Forked from upstream HEAD.** `v0.1.5` on npm doesn't build + on macOS at all (no `cfg(target_os = "macos")` deps in `Cargo.toml`, no macOS + branch in `is_admin`, stale API surface). + +- **2026-04-10: Bypassed the PAPA walker entirely.** Upstream's code path + walks `PAPA._instance → InventoryManager → _inventoryServiceWrapper → + Cards`, but `PAPA._instance` reads as 0 on current Arena builds.
Replaced + with a signature scan (`scan_heap_for_cards_dictionary`) that finds the + Cards dict directly by `hash == key` + value-range signature. + +- **2026-04-10: Accepted that Scryfall arena_id coverage is incomplete.** + Scryfall's `default_cards.json` has ~16,500 entries with populated + `arena_id` but Arena has ~17,466 cards total; the gap is newer Alchemy and + Universes Beyond sets where Scryfall's upstream data hasn't caught up. + Adding `commander-tuner/mtga-import --untapped-csv` as a fallback / + primary source proved to be the cheapest and most reliable fix. The + CardDatabase walker was NOT implemented. + +- **2026-04-11: Removed mtga-reader support from `mtga-import`.** The + `untapped-csv` source is simpler (no sudo, no Arena running, no Rust + toolchain, no Scryfall coverage gap), and the mtga-reader code path hit + enough macOS-specific friction that it wasn't worth maintaining in the + commander-tuner repo. This fork stays in `~/repos/mtga-reader` as a + standalone project for future experimentation. diff --git a/index.d.ts b/index.d.ts index 516e67c..f9fc875 100644 --- a/index.d.ts +++ b/index.d.ts @@ -61,5 +61,25 @@ export declare function getInstanceField(address: number, fieldName: string): an export declare function getStaticField(classAddress: number, fieldName: string): any export declare function getDictionary(address: number): DictionaryData export declare function readData(processName: string, fields: Array<string>): any +/** + * Signature-based card-collection reader. Scans the MTGA process + * heap for a `Dictionary<int, int>` object whose contents match + * the shape of an Arena player collection (enough entries, keys in + * the Arena card-id range, values in the quantity range) and + * returns the list of (cardId, quantity) entries.
+ * + * This is a macOS-only path added as a local patch: the + * `readData` walker starting from PAPA / WrapperController turned + * out to be too fragile against current Arena builds (IL2CPP + * metadata layout drift, runtime-class-vs-metadata-class + * indirection, inconsistent CLASS_NAME offsets on runtime-allocated + * class structs). The signature scan sidesteps every one of those + * by searching for the only dictionary in the process whose entries + * all look like real card entries. + * + * Returns a JSON array of `{ "cardId": int, "quantity": int }` + * objects on success, or `{ "error": string }` on any failure. + */ +export declare function readMtgaCards(processName: string): any export declare function readClass(processName: string, address: number): any export declare function readGenericInstance(processName: string, address: number): any diff --git a/index.js b/index.js index 59c4790..f064a46 100644 --- a/index.js +++ b/index.js @@ -310,7 +310,7 @@ if (!nativeBinding) { throw new Error(`Failed to load native binding`) } -const { isAdmin, findProcess, init, close, isInitialized, getAssemblies, getAssemblyClasses, getClassDetails, getInstance, getInstanceField, getStaticField, getDictionary, readData, readClass, readGenericInstance } = nativeBinding +const { isAdmin, findProcess, init, close, isInitialized, getAssemblies, getAssemblyClasses, getClassDetails, getInstance, getInstanceField, getStaticField, getDictionary, readData, readMtgaCards, readClass, readGenericInstance } = nativeBinding module.exports.isAdmin = isAdmin module.exports.findProcess = findProcess @@ -325,5 +325,6 @@ module.exports.getInstanceField = getInstanceField module.exports.getStaticField = getStaticField module.exports.getDictionary = getDictionary module.exports.readData = readData +module.exports.readMtgaCards = readMtgaCards module.exports.readClass = readClass module.exports.readGenericInstance = readGenericInstance diff --git a/package-lock.json b/package-lock.json index 
0dff401..1a3e28e 100644 --- a/package-lock.json +++ b/package-lock.json @@ -161,6 +161,7 @@ "resolved": "https://registry.npmjs.org/acorn/-/acorn-8.11.3.tgz", "integrity": "sha512-Y9rRfJG5jcKOE0CLisYbojUjIrIEE7AGMzA/Sm4BslANhbS+cDMpgBdcPT91oJ7OuJ9hYJBx59RjbhxVnrF8Xg==", "dev": true, + "peer": true, "bin": { "acorn": "bin/acorn" }, diff --git a/src/napi/mod.rs b/src/napi/mod.rs index ba3da22..1865f02 100644 --- a/src/napi/mod.rs +++ b/src/napi/mod.rs @@ -676,6 +676,41 @@ mod macos_backend { bytes.first().copied().unwrap_or(0) } + pub fn read_i8(&self, addr: usize) -> i8 { + let bytes = self.read_bytes(addr, 1); + i8::from_le_bytes(bytes.try_into().unwrap_or([0; 1])) + } + + pub fn read_u16(&self, addr: usize) -> u16 { + let bytes = self.read_bytes(addr, 2); + u16::from_le_bytes(bytes.try_into().unwrap_or([0; 2])) + } + + pub fn read_i16(&self, addr: usize) -> i16 { + let bytes = self.read_bytes(addr, 2); + i16::from_le_bytes(bytes.try_into().unwrap_or([0; 2])) + } + + pub fn read_u64(&self, addr: usize) -> u64 { + let bytes = self.read_bytes(addr, 8); + u64::from_le_bytes(bytes.try_into().unwrap_or([0; 8])) + } + + pub fn read_i64(&self, addr: usize) -> i64 { + let bytes = self.read_bytes(addr, 8); + i64::from_le_bytes(bytes.try_into().unwrap_or([0; 8])) + } + + pub fn read_f32(&self, addr: usize) -> f32 { + let bytes = self.read_bytes(addr, 4); + f32::from_le_bytes(bytes.try_into().unwrap_or([0; 4])) + } + + pub fn read_f64(&self, addr: usize) -> f64 { + let bytes = self.read_bytes(addr, 8); + f64::from_le_bytes(bytes.try_into().unwrap_or([0; 8])) + } + pub fn read_string(&self, addr: usize) -> String { if addr == 0 { return String::new(); @@ -701,34 +736,313 @@ mod macos_backend { pub static STATE: Mutex<StateWrapper> = Mutex::new(StateWrapper(None)); fn find_second_data_segment(pid: u32) -> usize { + // Historical name kept for compatibility.
Upstream hardcoded + // the "second __DATA segment of GameAssembly" pattern with + // the assumption that a fixed offset inside that segment + // held the IL2CPP type info table. Both assumptions drifted: + // the real table lives in the first __DATA segment on current + // MTGA builds, and even there the offset `0x24360` is wrong. + // Returning any segment start here is now just a sentinel so + // init_impl knows whether vmmap parsing succeeded at all; + // init_impl does its own scan via `scan_for_type_info_table`. + find_all_data_segments(pid) + .into_iter() + .next() + .map(|(s, _e)| s) + .unwrap_or(0) + } + + fn find_all_data_segments(pid: u32) -> Vec<(usize, usize)> { let output = Command::new("vmmap") .args(["-wide", &pid.to_string()]) .output() .ok(); + let mut result: Vec<(usize, usize)> = Vec::new(); if let Some(output) = output { let stdout = String::from_utf8_lossy(&output.stdout); - let mut found_first = false; - for line in stdout.lines() { - if line.contains("GameAssembly") && line.contains("__DATA") && !line.contains("__DATA_CONST") { + if line.contains("GameAssembly") + && line.contains("__DATA") + && !line.contains("__DATA_CONST") + { let parts: Vec<&str> = line.split_whitespace().collect(); if parts.len() >= 2 { let addr_parts: Vec<&str> = parts[1].split('-').collect(); - if let Ok(start) = usize::from_str_radix(addr_parts[0], 16) { - if found_first { - return start; + if addr_parts.len() >= 2 { + if let (Ok(start), Ok(end)) = ( + usize::from_str_radix(addr_parts[0], 16), + usize::from_str_radix(addr_parts[1], 16), + ) { + result.push((start, end)); } - found_first = true; } } } } } + // vmmap output ordering isn't guaranteed, so sort so callers + // can reliably walk segments low-to-high. + result.sort(); + result + } + + /// Scan GameAssembly's __DATA segments for an IL2CPP type info + /// table. 
Hardcoded offsets in upstream rot every time Unity + /// reshuffles its metadata layout, so we find the table + /// heuristically each time init runs. + /// + /// Strategy: + /// 1. For each __DATA segment, read the whole segment in ONE + /// `mach_vm_read_overwrite` call into a local buffer (avoids + /// doing 720k syscalls one slot at a time). + /// 2. Phase 1 (in-memory): walk the buffer 8 bytes at a time + /// looking for positions where the next 20 aligned slots are + /// all either zero or look like pointers (in the coarse + /// macOS arm64 userspace range). + /// 3. Phase 2 (syscalls): for each candidate position, take the + /// first non-zero slot, treat it as a class pointer, read + /// `slot + CLASS_NAME` to get a name-pointer, read a string + /// from there, and check that the result is a plausible class + /// name (non-empty, short, ASCII-graphic). If 10+ slots in a + /// row yield valid names, the position is the table base. + /// + /// Returns the absolute address of the discovered type info table, + /// or 0 if no plausible table was found in any __DATA segment. + fn scan_for_type_info_table(reader: &MemReader, pid: u32) -> usize { + // Coarse bounds for "looks like a loaded-library pointer on + // macOS arm64 userspace". Real pointers into GameAssembly are + // all in [0x100000000, 0x200000000] on current MTGA builds — + // and the scanner is only a heuristic anyway, so a permissive + // range is fine. + const MIN_PTR: usize = 0x1_0000_0000; + const MAX_PTR: usize = 0x2_0000_0000; + const WINDOW_SLOTS: usize = 20; + const PLAUSIBLE_THRESHOLD: usize = 18; // out of 20 + const NAME_VALIDATION_STREAK: usize = 10; + + let is_ptr = |v: usize| v == 0 || (v >= MIN_PTR && v <= MAX_PTR); + // Any plausibly-decoded C# / IL2CPP identifier — including generic + // placeholders like `$i1` — passes this check. 
Used in phase 2 to + // filter garbage memory reads, but NOT enough on its own to pick + // the right table (the IL2CPP generic-instance table is full of + // `$i1`-style placeholders and will pass this check). + let is_valid_name = |s: &str| { + !s.is_empty() + && s.len() <= 128 + && s.chars().all(|c| { + c.is_ascii_graphic() || c == '_' || c == '.' || c == '`' || c == '<' || c == '>' + }) + }; + // A "rich" name is one that is almost certainly a real C# type + // definition from user code (as opposed to a generic placeholder + // or metadata marker): starts with letter/underscore, has + // enough length to be meaningful, and either contains a + // namespace dot or is capitalized (a PascalCase type name). + // Counting these in candidate tables lets us distinguish the + // main type_info_table (many rich names) from the generic + // instance table (mostly $i1 placeholders). + let is_rich_name = |s: &str| { + if s.len() < 4 || s.len() > 128 { + return false; + } + let first = match s.chars().next() { + Some(c) => c, + None => return false, + }; + if !(first.is_ascii_alphabetic() || first == '_') { + return false; + } + // All chars ASCII-identifier-ish + if !s.chars().all(|c| { + c.is_ascii_alphanumeric() + || c == '_' + || c == '.' + || c == '`' + || c == '<' + || c == '>' + || c == ',' + || c == ' ' + || c == '[' + || c == ']' + }) { + return false; + } + // Either has a namespace dot OR starts with an uppercase + // PascalCase-style identifier + s.contains('.') || first.is_ascii_uppercase() + }; + + let segments = find_all_data_segments(pid); + eprintln!( + "scan_for_type_info_table: {} __DATA segments to scan: {:?}", + segments.len(), + segments + .iter() + .map(|(s, e)| format!("0x{:x}-0x{:x} ({}K)", s, e, (e - s) / 1024)) + .collect::<Vec<_>>(), + ); + + for (seg_start, seg_end) in segments { + let seg_size = seg_end - seg_start; + // Read the whole segment into a local buffer in one syscall.
+ // MTGA's __DATA segments are single contiguous VM regions + // per vmmap, so this call is cheap and doesn't span holes. + let buf = reader.read_bytes(seg_start, seg_size); + if buf.len() != seg_size { + eprintln!( + "scan_for_type_info_table: segment 0x{:x}-0x{:x} short read ({} of {} bytes), skipping", + seg_start, seg_end, buf.len(), seg_size, + ); + continue; + } + + let slot_count = seg_size / 8; + // Phase 1: collect in-memory candidates. + let mut candidates: Vec<usize> = Vec::new(); + let mut i = 0; + while i + WINDOW_SLOTS < slot_count { + // Read window of 20 aligned slots directly from buffer. + let mut plausible = 0usize; + let mut any_nonzero = false; + for j in 0..WINDOW_SLOTS { + let off = (i + j) * 8; + let slot = u64::from_le_bytes(buf[off..off + 8].try_into().unwrap_or([0; 8])) + as usize; + if slot != 0 { + any_nonzero = true; + } + if is_ptr(slot) { + plausible += 1; + } + } + if plausible >= PLAUSIBLE_THRESHOLD && any_nonzero { + candidates.push(i); + // Skip ahead past this window to avoid flooding + // candidates with every shifted copy of the same + // hit. Phase 2 will verify more rigorously. + i += WINDOW_SLOTS; + continue; + } + i += 1; + } + eprintln!( + "scan_for_type_info_table: segment 0x{:x}: {} phase-1 candidates", + seg_start, candidates.len(), + ); + + // Phase 2 + 3: score candidates by the number of UNIQUE + // rich class names they contain in their first VERIFY_DEPTH + // slots. The main type_info_table has thousands of unique + // classes (each C# type definition gets one slot); IL2CPP's + // generic-specialization tables have repeats where every + // slot shares the base generic name (e.g., 31 slots all + // reading `AltAssetReference\`1`). Scoring by uniqueness + // separates the two. We also require a meaningful floor of + // unique rich names so a candidate with 3 unique names and + // 100 zeros doesn't accidentally win.
+ const VERIFY_DEPTH: usize = 300; + const MIN_UNIQUE_RICH: usize = 20; + + use std::collections::HashSet; + let mut best: Option<(usize, usize, Vec<String>)> = None; + + for ci in candidates { + let off = ci * 8; + let first_slot = u64::from_le_bytes(buf[off..off + 8].try_into().unwrap_or([0; 8])) + as usize; + if first_slot == 0 || !is_ptr(first_slot) { + continue; + } + // Does slot[0] look like a class? + let name_ptr = reader.read_ptr(first_slot + offsets::CLASS_NAME); + if name_ptr == 0 || !is_ptr(name_ptr) { + continue; + } + let first_name = reader.read_string(name_ptr); + if !is_valid_name(&first_name) { + continue; + } + + // Deep verify: count valid names + UNIQUE rich names + // over VERIFY_DEPTH slots. Bail early if the walker + // wanders off the end of the table into garbage. + let mut valid_names = 0usize; + let mut unique_rich: HashSet<String> = HashSet::new(); + let mut out_of_band = 0usize; + + for k in 0..VERIFY_DEPTH { + if ci + k >= slot_count { + break; + } + let off_k = (ci + k) * 8; + let slot_k = u64::from_le_bytes( + buf[off_k..off_k + 8].try_into().unwrap_or([0; 8]), + ) as usize; + if slot_k == 0 { + continue; + } + if !is_ptr(slot_k) { + // Non-pointer garbage: 3 in a row means we walked + // past the table's end, stop scanning.
+ out_of_band += 1; + if out_of_band >= 3 { + break; + } + continue; + } + out_of_band = 0; + let nk_ptr = reader.read_ptr(slot_k + offsets::CLASS_NAME); + if nk_ptr == 0 || !is_ptr(nk_ptr) { + continue; + } + let nk = reader.read_string(nk_ptr); + if !is_valid_name(&nk) { + continue; + } + valid_names += 1; + if is_rich_name(&nk) { + unique_rich.insert(nk); + } + } + + let unique_count = unique_rich.len(); + if valid_names < NAME_VALIDATION_STREAK || unique_count < MIN_UNIQUE_RICH { + continue; + } + + let addr = seg_start + off; + let is_new_best = match &best { + Some((_, best_unique, _)) => unique_count > *best_unique, + None => true, + }; + if is_new_best { + let mut samples: Vec<String> = unique_rich.iter().take(10).cloned().collect(); + samples.sort(); + eprintln!( + "scan_for_type_info_table: candidate at 0x{:x} valid={}, unique_rich={}, sample={:?}", + addr, valid_names, unique_count, samples, + ); + best = Some((addr, unique_count, samples)); + } + } + + if let Some((addr, unique, samples)) = best { + eprintln!( + "scan_for_type_info_table: FOUND at 0x{:x} (best candidate in segment 0x{:x}), unique_rich_names={}, samples={:?}", + addr, seg_start, unique, samples, + ); + return addr; + } + } 0 } fn find_class_by_name(reader: &MemReader, type_info_table: usize, name: &str) -> Option<usize> { + // Unused when the caller prefers find_class_by_direct_scan + // (which is more robust across table-layout drift). Kept for + // compatibility with upstream code paths that still treat + // `state.type_info_table` as authoritative.
for i in 0..50000 { let class_ptr = reader.read_ptr(type_info_table + i * 8); if class_ptr == 0 { @@ -738,13 +1052,155 @@ mod macos_backend { if name_ptr == 0 { continue; } - if reader.read_string(name_ptr) == name { + let class_name = reader.read_string(name_ptr); + if class_name.is_empty() { + continue; + } + if class_name == name { return Some(class_ptr); } } None } + /// Scan both __DATA segments and dump every unique class name + /// containing the given substring. Diagnostic only — used to + /// discover the right class names when upstream's hardcoded + /// names drift. Caps at `limit` results per call to avoid + /// flooding stderr. + fn dump_class_names_matching(reader: &MemReader, pid: u32, needle: &str, limit: usize) { + use std::collections::HashSet; + const MIN_PTR: usize = 0x1_0000_0000; + const MAX_PTR: usize = 0x2_0000_0000; + + let segments = find_all_data_segments(pid); + let mut seen: HashSet<usize> = HashSet::new(); + let mut names_seen: HashSet<String> = HashSet::new(); + let mut matches: Vec<String> = Vec::new(); + + 'outer: for (seg_start, seg_end) in segments { + let size = seg_end - seg_start; + let buf = reader.read_bytes(seg_start, size); + if buf.len() != size { + continue; + } + let slot_count = size / 8; + for i in 0..slot_count { + let off = i * 8; + let p = u64::from_le_bytes(buf[off..off + 8].try_into().unwrap_or([0; 8])) + as usize; + if p < MIN_PTR || p > MAX_PTR { + continue; + } + if !seen.insert(p) { + continue; + } + let name_ptr = reader.read_ptr(p + offsets::CLASS_NAME); + if name_ptr < MIN_PTR || name_ptr > MAX_PTR { + continue; + } + let class_name = reader.read_string(name_ptr); + if class_name.is_empty() || class_name.len() > 128 { + continue; + } + // Must look like a real C# identifier, not garbage.
+ if !class_name.chars().next().map_or(false, |c| c.is_ascii_alphabetic() || c == '_' || c == '<') { + continue; + } + if class_name.contains(needle) && names_seen.insert(class_name.clone()) { + matches.push(class_name); + if matches.len() >= limit { + break 'outer; + } + } + } + } + matches.sort(); + eprintln!( + "dump_class_names_matching({:?}): {} unique match(es): {:?}", + needle, matches.len(), matches, + ); + } + + /// Scan all __DATA segments of GameAssembly.dylib for an + /// `Il2CppClass*` whose `CLASS_NAME` string equals `name`. + /// + /// This is a more robust alternative to + /// `find_class_by_name(type_info_table, ...)` because it does not + /// depend on locating "the" type info table — IL2CPP's metadata + /// layout has enough sub-tables (generic instantiations, + /// per-assembly name lookups, interface method tables, etc.) that + /// picking a specific one by heuristic is fragile. Every class + /// pointer that matters for this importer is referenced from + /// somewhere in __DATA at least once, so a direct pointer scan + /// finds them regardless of which sub-table holds them. + /// + /// Algorithm: + /// 1. Read both __DATA segments into memory in one syscall each. + /// 2. Walk the buffer 8 bytes at a time, collect every + /// pointer-shaped value within the GameAssembly mapping range. + /// 3. Deduplicate pointers via a HashSet so we only dereference + /// each candidate once. + /// 4. For each unique candidate, read `ptr + CLASS_NAME` then + /// read the name string; compare to target. 
+ fn find_class_by_direct_scan( + reader: &MemReader, + pid: u32, + name: &str, + ) -> Option<usize> { + use std::collections::HashSet; + const MIN_PTR: usize = 0x1_0000_0000; + const MAX_PTR: usize = 0x2_0000_0000; + + let segments = find_all_data_segments(pid); + let mut seen: HashSet<usize> = HashSet::new(); + let mut checked: usize = 0; + let mut matched: Option<usize> = None; + + for (seg_start, seg_end) in segments { + let size = seg_end - seg_start; + let buf = reader.read_bytes(seg_start, size); + if buf.len() != size { + continue; + } + let slot_count = size / 8; + for i in 0..slot_count { + let off = i * 8; + let p = u64::from_le_bytes(buf[off..off + 8].try_into().unwrap_or([0; 8])) + as usize; + if p < MIN_PTR || p > MAX_PTR { + continue; + } + if !seen.insert(p) { + continue; + } + let name_ptr = reader.read_ptr(p + offsets::CLASS_NAME); + if name_ptr < MIN_PTR || name_ptr > MAX_PTR { + continue; + } + let class_name = reader.read_string(name_ptr); + if class_name.is_empty() || class_name.len() > 128 { + continue; + } + checked += 1; + if class_name == name { + matched = Some(p); + break; + } + } + if matched.is_some() { + break; + } + } + eprintln!( + "find_class_by_direct_scan: target={:?}, unique_candidates_checked={}, found={}", + name, + checked, + matched.is_some(), + ); + matched + } + + pub fn read_class_name(reader: &MemReader, class: usize) -> String { + if class == 0 || class < 0x100000 { + return String::new(); @@ -761,43 +1217,1235 @@ mod macos_backend { reader.read_string(ns_ptr) } - fn find_papa_instance(reader: &MemReader, papa_class: usize) -> Option<usize> { - let heap_regions = [ - (0x15a000000_usize, 0x15b000000_usize), - (0x158000000_usize, 0x16a000000_usize), - (0x145000000_usize, 0x150000000_usize), - ]; + /// Find PAPA's singleton instance by reading a static field + /// directly out of the class's static-fields region, rather than + /// heap-scanning.
This is much more reliable: every C# + /// `public static T Instance { get; }` compiles to a backing field + /// on the declaring class whose value is the singleton pointer. + /// We don't have to guess object layouts or scan gigabytes of + /// heap. + /// + /// Returns the first non-null pointer found in a static field of + /// PAPA whose name contains `"instance"` (case-insensitive). Also + /// dumps the full field list of PAPA to stderr on every call, so + /// a failed lookup shows what to look for when chasing down any + /// future renames of the backing field. + fn find_papa_instance_via_static_field( + reader: &MemReader, + papa_class: usize, + ) -> Option<usize> { + let fields = get_class_fields(reader, papa_class); + eprintln!( + "find_papa_instance_via_static_field: PAPA has {} field(s)", + fields.len(), + ); + + let static_fields_base = reader.read_ptr(papa_class + offsets::CLASS_STATIC_FIELDS); + eprintln!( + "find_papa_instance_via_static_field: CLASS_STATIC_FIELDS base = 0x{:x}", + static_fields_base, + ); + + // Pass 1: report every field so we can see the layout. + for (i, field) in fields.iter().enumerate() { + let value = if field.is_static && static_fields_base > 0x100000 { + reader.read_ptr(static_fields_base + field.offset as usize) + } else { + 0 + }; + eprintln!( + " field[{}] name={:?} type={:?} offset=0x{:x} is_static={} static_value=0x{:x}", + i, field.name, field.type_name, field.offset, field.is_static, value, + ); + } + + if static_fields_base < 0x100000 { + return None; + } + + // Pass 2: return the first plausible static instance pointer. + // Prefer a field explicitly named `<Instance>k__BackingField` + // (the compiler-generated backing field for a standard C# + // `public static T Instance { get; }` auto-property), then fall + // back to any static field whose name contains "instance".
+ let mut preferred: Option<usize> = None; + let mut fallback: Option<usize> = None; + for field in &fields { + if !field.is_static { + continue; + } + let value = reader.read_ptr(static_fields_base + field.offset as usize); + if value < 0x100000 { + continue; + } + let name_lower = field.name.to_ascii_lowercase(); + if field.name == "<Instance>k__BackingField" { + preferred = Some(value); + break; + } + if name_lower.contains("instance") && fallback.is_none() { + fallback = Some(value); + } + } + preferred.or(fallback) + } + + /// Find PAPA's singleton instance by cross-verifying two + /// independently discovered class pointers: the object's own + /// class pointer (which must equal `papa_class`) AND the + /// InventoryManager backing field (`<InventoryManager>k__BackingField`) + /// on PAPA (which must dereference to an object whose class + /// pointer equals InventoryManager's class). The combination + /// uniquely identifies the real PAPA instance — random heap + /// pointers matching one check are common, but matching BOTH + /// simultaneously is astronomically unlikely. + /// + /// This sidesteps every static-field / object-layout offset + /// assumption. We only need: + /// - `papa_class` (found via direct scan) + /// - The offset of `<InventoryManager>k__BackingField` in PAPA + /// (read from `get_class_fields` at runtime — the field name + /// matches exactly on current MTGA builds) + /// - `InventoryManager` class pointer (also found via direct scan) + /// + /// Scans writable heap regions and returns the first match where + /// both checks pass. If InventoryManager class isn't found via + /// direct scan OR the field enumeration for PAPA doesn't turn + /// up `<InventoryManager>k__BackingField`, returns None and lets + /// the caller fall through to the next strategy. + /// Read an object-typed static field from a class. Returns the + /// pointer value stored in the static field, or 0 if anything + /// goes wrong.
Used to resolve singleton `Instance` fields like + /// `WrapperController.k__BackingField` which are the + /// documented C# singleton pattern. + fn read_static_object_field( + reader: &MemReader, + class_addr: usize, + field_name: &str, + ) -> usize { + let fields = get_class_fields(reader, class_addr); + let field = match fields.iter().find(|f| f.name == field_name) { + Some(f) => f, + None => { + eprintln!( + "read_static_object_field: class 0x{:x} has no field named {:?}", + class_addr, field_name, + ); + return 0; + } + }; + if !field.is_static { + eprintln!( + "read_static_object_field: field {:?} on class 0x{:x} is not marked static (offset=0x{:x})", + field_name, class_addr, field.offset, + ); + return 0; + } + let static_base = reader.read_ptr(class_addr + offsets::CLASS_STATIC_FIELDS); + if static_base < 0x100000 { + eprintln!( + "read_static_object_field: CLASS_STATIC_FIELDS for class 0x{:x} is 0x{:x}, unusable", + class_addr, static_base, + ); + return 0; + } + let value = reader.read_ptr(static_base + field.offset as usize); + eprintln!( + "read_static_object_field: class 0x{:x} field {:?} static_base=0x{:x} field_offset=0x{:x} value=0x{:x}", + class_addr, field_name, static_base, field.offset, value, + ); + value + } + + /// Locate the WrapperController singleton instance using the + /// documented C# singleton pattern (`WrapperController.Instance` + /// auto-property → `k__BackingField`). Falls back to + /// scanning the heap for any object whose class pointer equals + /// `WrapperController` if the static field read fails. + /// + /// On success returns a pointer to a real WrapperController + /// instance; the caller can use it as `state.papa_instance` + /// (misnomer kept for API compatibility — the walker only cares + /// that the value is a real object whose class the reader can + /// resolve). 
+ /// Signature-scan the heap for a `Dictionary<int, int>` whose + /// contents look like an Arena card collection: many entries, + /// keys in the card-id range, values in the quantity range. + /// + /// This deliberately avoids every IL2CPP class/metadata offset + /// (`CLASS_NAME`, `CLASS_FIELDS`, `CLASS_STATIC_FIELDS`) because + /// those have been unreliable at every level above runtime + /// instance data on current MTGA builds. The only layout + /// assumption is the private field order of .NET's + /// `Dictionary<TKey, TValue>` (`buckets`, `entries`, `count`, ...), + /// which gives this object layout: + /// + /// ```text + /// +0x00 klass pointer (Il2CppObject header) + /// +0x08 monitor pointer (Il2CppObject header) + /// +0x10 buckets[] pointer + /// +0x18 entries[] pointer <-- what we read + /// +0x20 count (int32) <-- what we read + /// ``` + /// + /// And the `Dictionary<TKey, TValue>.Entry` layout, which + /// upstream's existing dictionary-reader code already uses: + /// + /// ```text + /// Entry[] array header is 0x20 bytes (klass + monitor + bounds + max_length) + /// Each entry is 16 bytes starting at entries[] + 0x20: + /// +0x00 hash (int32; -1 means empty slot) + /// +0x04 next (int32; unused here) + /// +0x08 key (int32) <-- cardId + /// +0x0c value (int32) <-- quantity + /// ``` + /// + /// Arena card IDs are small positive integers (typically + /// 1..200_000), and quantities are 1..4. The scanner enforces + /// exactly that quantity range (`MAX_QUANTITY = 4`), because + /// loosening it lets counter-shaped dictionaries through. A card + /// collection dictionary has hundreds to tens of thousands of + /// entries, so we require at least `MIN_COUNT` to drop noise from + /// small runtime dictionaries. + /// + /// A candidate passes validation when at least `MIN_VALID_SAMPLES` + /// of its first `SAMPLE_ENTRIES` entries look like card rows. + /// Returns the address of the best-scoring candidate (ranked by + /// the `MTGA_VERIFY_QTYS` / `MTGA_KNOWN_CARD_IDS` cross-checks + /// when either is set, otherwise by largest `count`), or 0 if + /// nothing passed. + /// Parse the `MTGA_KNOWN_CARD_IDS` environment variable into a + /// set of arena_ids the caller knows should appear in the real + /// collection dict. Format: comma-separated decimal integers, + /// e.g., `"90881,90804,91088"`.
Empty or unset → empty set, + /// which disables the known-ids cross-check. + fn parse_known_card_ids_env() -> std::collections::HashSet<i32> { + let raw = std::env::var("MTGA_KNOWN_CARD_IDS").unwrap_or_default(); + raw.split(',') + .filter_map(|s| s.trim().parse::<i32>().ok()) + .collect() + } + + /// Byte-pattern scan: find every 4-byte-aligned position in any + /// writable heap region where the int32 at +0 equals the int32 + /// at +8 (the defining hash==key shape of a .NET + /// `Dictionary<int, TValue>.Entry` for any value type TValue). For + /// each hit, record the 32 bytes starting at the hit so we can + /// recognize entry stride from context. + /// + /// Targeted mode: if `target_key` is Some, only report hits + /// whose hash/key matches the target; otherwise report any hit + /// whose key looks like a plausible card id. Used to confirm + /// whether a specific card id exists anywhere in Arena's memory + /// without assuming entry size. `expected_value` is only echoed + /// in the log; it does not filter hits. + fn scan_for_dict_entry_pattern( + reader: &MemReader, + pid: u32, + target_key: Option<i32>, + expected_value: Option<i32>, + max_hits: usize, + ) { + eprintln!( + "scan_for_dict_entry_pattern: target_key={:?} expected_value={:?}", + target_key, expected_value, + ); + let heap_regions = find_scannable_heap_regions(pid); + let mut hits: Vec<(usize, Vec<u8>)> = Vec::new(); + + for (start, end) in heap_regions { + let size = end - start; + let buf = reader.read_bytes(start, size); + if buf.len() != size { + continue; + } + // Walk every 4-byte-aligned offset looking for int at +0 + // matching int at +8.
+ let mut i = 0usize; + while i + 16 <= buf.len() { + let a = i32::from_le_bytes(buf[i..i + 4].try_into().unwrap_or([0; 4])); + let b = i32::from_le_bytes(buf[i + 8..i + 12].try_into().unwrap_or([0; 4])); + if a != 0 && a == b { + // Potential hash == key + let matches_target = match target_key { + Some(t) => a == t, + None => a > 1000 && a < 200_000, // plausible card id + }; + if matches_target { + // Record the 32 bytes starting at the hit (truncated at the end of the buffer) + let ctx_end = (i + 32).min(buf.len()); + let ctx_bytes = buf[i..ctx_end].to_vec(); + hits.push((start + i, ctx_bytes)); + if hits.len() >= max_hits { + break; + } + } + } + i += 4; + } + if hits.len() >= max_hits { + break; + } + } + + eprintln!( + "scan_for_dict_entry_pattern: {} hits for target={:?}", + hits.len(), target_key, + ); + for (addr, bytes) in hits.iter().take(20) { + let b32: Vec<String> = (0..bytes.len() / 4) + .map(|k| { + let v = i32::from_le_bytes( + bytes[k * 4..k * 4 + 4].try_into().unwrap_or([0; 4]), + ); + format!("{}", v) + }) + .collect(); + eprintln!( + " 0x{:x}: [{}]", + addr, + b32.join(" "), + ); + } + } + + /// Parse `MTGA_VERIFY_QTYS` as a map of `arena_id → expected_quantity`. + /// Format: comma-separated `<arena_id>:<quantity>` pairs, e.g., + /// `"98307:4,98487:3,90804:2"`. Used by the scanner to + /// distinguish between multiple dicts that all pass the base + /// signature — the correct collection dict will have the + /// expected quantities for these specific cards, while stale + /// caches or format-filtered subsets will have different + /// (smaller or absent) values.
+ fn parse_verify_qtys_env() -> std::collections::HashMap<i32, i32> { + let raw = std::env::var("MTGA_VERIFY_QTYS").unwrap_or_default(); + let mut out = std::collections::HashMap::new(); + for pair in raw.split(',') { + let pair = pair.trim(); + if pair.is_empty() { + continue; + } + let mut parts = pair.splitn(2, ':'); + let id_part = parts.next().unwrap_or(""); + let qty_part = parts.next().unwrap_or(""); + if let (Ok(id), Ok(qty)) = (id_part.parse::<i32>(), qty_part.parse::<i32>()) { + out.insert(id, qty); + } + } + out + } + + fn scan_heap_for_cards_dictionary(reader: &MemReader, pid: u32) -> usize { + // Arena collections have hundreds to low tens-of-thousands + // of entries; we accept 500..=50_000. `hash == key` below is + // the load-bearing signature; the count range is a secondary + // filter that mostly drops small on-demand dicts created by + // game state. The tight upper bound matters: without + // env-var ground truth (known_ids / verify_qtys), the + // selection falls through to "biggest count wins", and + // letting 50k+ dicts through means junk game-state dicts + // (e.g., hash=3/key=3 counter arrays) can beat real + // collections. 50k is plenty for a live card collection; + // anything bigger is noise. + const MIN_COUNT: i32 = 500; + const MAX_COUNT: i32 = 50_000; + // Arena card IDs (internal "grp_id" / Arena IDs) are + // small positive integers, observed in the range + // ~60_000..110_000, but the range slowly extends as new sets + // release. Keep a generous upper bound. + const MIN_CARD_ID: i32 = 1; + const MAX_CARD_ID: i32 = 200_000; + // Arena's internal card-ownership model caps quantities at + // 4. Cards whose rules text lifts the four-copy limit (Hare + // Apparent, Persistent Petitioners, Rat Colony, Relentless + // Rats, Shadowborn Apostle allow any number; Seven Dwarves + // allows up to seven) are still capped at 4 internally.
With MAX_QUANTITY=4 the real + // card collection is essentially the only Dictionary that passes; relaxing this bound lets junk + // counter-shaped dicts through. + const MIN_QUANTITY: i32 = 1; + const MAX_QUANTITY: i32 = 4; + // Sample more entries to handle hash buckets that happen to + // have many empty slots at the start of the entries array. + const SAMPLE_ENTRIES: usize = 30; + const MIN_VALID_SAMPLES: usize = 12; + const MIN_PTR: usize = 0x1_0000_0000; + const MAX_PTR: usize = 0x4_0000_0000; + + let known_ids = parse_known_card_ids_env(); + let verify_qtys = parse_verify_qtys_env(); + let heap_regions = find_scannable_heap_regions(pid); + eprintln!( + "scan_heap_for_cards_dictionary: scanning {} heap regions for Dictionary with {}-{} entries, values in [{}..{}], known_ids={:?}, verify_qtys={:?}", + heap_regions.len(), MIN_COUNT, MAX_COUNT, MIN_QUANTITY, MAX_QUANTITY, known_ids, verify_qtys, + ); + + // Collect EVERY candidate that passes validation rather than + // only the largest. Multiple int→int dictionaries can coexist + // (counters, progress, state, cards), and the "biggest that + // looks valid" heuristic can still land on the wrong one. + // Printing them all lets us tell which is the real collection. 
+ let mut candidates: Vec<(usize, i32, Vec<(i32, i32, i32)>)> = Vec::new(); // (addr, count, first_valid_samples) + let mut candidates_examined = 0usize; + + for (start, end) in heap_regions { + let size = end - start; + let buf = reader.read_bytes(start, size); + if buf.len() != size { + continue; + } + let slot_count = size / 8; + let mut i = 0; + while i + 5 < slot_count { + let base = i * 8; + let buckets_ptr = u64::from_le_bytes( + buf[base + 0x10..base + 0x18].try_into().unwrap_or([0; 8]), + ) as usize; + let entries_ptr = u64::from_le_bytes( + buf[base + 0x18..base + 0x20].try_into().unwrap_or([0; 8]), + ) as usize; + let count = i32::from_le_bytes( + buf[base + 0x20..base + 0x24].try_into().unwrap_or([0; 4]), + ); + + if count < MIN_COUNT + || count > MAX_COUNT + || buckets_ptr < MIN_PTR + || buckets_ptr > MAX_PTR + || entries_ptr < MIN_PTR + || entries_ptr > MAX_PTR + { + i += 1; + continue; + } + + candidates_examined += 1; + + let mut valid = 0usize; + let mut sample_valid_entries: Vec<(i32, i32, i32)> = Vec::new(); // (hash, key, value) + for entry_idx in 0..SAMPLE_ENTRIES { + let entry_addr = entries_ptr + 0x20 + entry_idx * 16; + let entry_bytes = reader.read_bytes(entry_addr, 16); + if entry_bytes.len() != 16 { + break; + } + let hash = i32::from_le_bytes(entry_bytes[0..4].try_into().unwrap_or([0; 4])); + let key = i32::from_le_bytes(entry_bytes[8..12].try_into().unwrap_or([0; 4])); + let value = i32::from_le_bytes(entry_bytes[12..16].try_into().unwrap_or([0; 4])); + + if hash == -1 { + // .NET dict empty-slot marker. + continue; + } + // Signature of a Dictionary<int, int> entry that uses + // the default equality comparer: + // hash == key + // because EqualityComparer<int>.Default.GetHashCode(x) == x. + // This alone is not unique to the card collection: + // every int-keyed dictionary with the default comparer + // (counters, stats, game state) shares it. Combined + // with the card-id and quantity ranges, though, it is + // what in practice separates the card collection from + // the other Dictionary-shaped data in the heap. + if hash == key + && key >= MIN_CARD_ID + && key <= MAX_CARD_ID + && value >= MIN_QUANTITY + && value <= MAX_QUANTITY + { + valid += 1; + if sample_valid_entries.len() < 5 { + sample_valid_entries.push((hash, key, value)); + } + } + } + + if valid >= MIN_VALID_SAMPLES { + let dict_addr = start + base; + candidates.push((dict_addr, count, sample_valid_entries)); + } + i += 1; + } + } + + candidates.sort_by_key(|(_, count, _)| std::cmp::Reverse(*count)); + eprintln!( + "scan_heap_for_cards_dictionary: examined {} pre-filter candidates, {} passed validation", + candidates_examined, candidates.len(), + ); + for (i, (addr, count, samples)) in candidates.iter().take(10).enumerate() { + let sample_strs: Vec<String> = samples + .iter() + .map(|(h, k, v)| format!("(hash={},key={},val={})", h, k, v)) + .collect(); + eprintln!( + " [{}] 0x{:x} count={} samples=[{}]", + i, addr, count, sample_strs.join(", "), + ); + } + + // Score each candidate by, in priority order: + // 1. Number of `verify_qtys` whose quantity matches exactly + // (the strictest signal: it distinguishes stale/cached + // dicts from the live collection dict because the + // quantities will differ) + // 2. Number of `known_ids` present (membership check) + // + // Tiebreakers: prefer the candidate with more extracted + // entries, then the bigger `count` field (the stable sort + // keeps the count-descending order of `candidates`). + // + // If neither env var is set, we fall back to "biggest count + // wins", which is not guaranteed to pick the right dict but + // is the best we can do without ground truth.
+ let best = if !known_ids.is_empty() || !verify_qtys.is_empty() { + #[allow(clippy::type_complexity)] + let mut scored: Vec<(usize, i32, usize, usize, usize)> = Vec::new(); // (addr, count, matched_known, matched_qtys, total_valid) + for (addr, count, _) in &candidates { + let entries = read_cards_dictionary_entries(reader, *addr); + let by_id: std::collections::HashMap = + entries.iter().copied().collect(); + let matched_known: usize = known_ids + .iter() + .filter(|id| by_id.contains_key(id)) + .count(); + let matched_qtys: usize = verify_qtys + .iter() + .filter(|(id, expected)| by_id.get(*id) == Some(*expected)) + .count(); + scored.push((*addr, *count, matched_known, matched_qtys, entries.len())); + eprintln!( + " scoring 0x{:x}: count={} extracted={} known_ids={}/{} verify_qtys={}/{}", + addr, count, entries.len(), + matched_known, known_ids.len(), + matched_qtys, verify_qtys.len(), + ); + } + // Rank: verify_qtys is the strictest signal (only the + // TRUE live collection has exactly-matching quantities), + // then known_ids presence, then total extracted count. + scored.sort_by(|a, b| { + b.3.cmp(&a.3) + .then_with(|| b.2.cmp(&a.2)) + .then_with(|| b.4.cmp(&a.4)) + }); + scored.first().and_then(|(addr, _, _, _, _)| { + candidates + .iter() + .find(|(a, _, _)| a == addr) + .cloned() + }) + } else { + candidates.first().cloned() + }; + + match best { + Some((addr, count, samples)) => { + let sample_strs: Vec = samples + .iter() + .map(|(h, k, v)| format!("(h={},k={},v={})", h, k, v)) + .collect(); + eprintln!( + "scan_heap_for_cards_dictionary: SELECTED 0x{:x} count={} samples=[{}]", + addr, count, sample_strs.join(", "), + ); + addr + } + None => 0, + } + } + + /// Read the card entries out of a previously-discovered + /// Dictionary object. 
+ /// + /// Applies the SAME filter that `scan_heap_for_cards_dictionary` + /// uses to identify the dict in the first place: only accept + /// entries where `hash == key` (the defining signature of + /// `Dictionary` with the default equality comparer, + /// since `EqualityComparer.Default.GetHashCode(x) == x`), + /// the key is a plausible Arena card id, and the value is in + /// the 1..4 ownership range. Entries that fail these checks are + /// skipped rather than returned as garbage rows: they represent + /// either deleted/rehashed slots (common in any `Dictionary` + /// that has seen removals) or array-tail padding past the count. + /// Without this filter we would emit hundreds of ghost rows that + /// downstream Arena-id → name resolution has no hope of mapping + /// to real cards. + fn read_cards_dictionary_entries( + reader: &MemReader, + dict_addr: usize, + ) -> Vec<(i32, i32)> { + const MIN_CARD_ID: i32 = 1; + const MAX_CARD_ID: i32 = 200_000; + // Matches the tight value range in + // scan_heap_for_cards_dictionary (see comment there): + // Arena's internal card-ownership cap is 4, so any entry + // with value > 4 is almost certainly not a card collection + // entry. 
+ const MIN_QUANTITY: i32 = 1; + const MAX_QUANTITY: i32 = 4; + + let entries_ptr = reader.read_ptr(dict_addr + 0x18); + let count = reader.read_i32(dict_addr + 0x20); + if entries_ptr < 0x100000 || count <= 0 { + return Vec::new(); + } + let mut entries = Vec::new(); + let mut skipped_empty = 0usize; + let mut skipped_mismatched_hash = 0usize; + let mut skipped_out_of_range = 0usize; + for i in 0..count.min(50_000) as usize { + let entry_addr = entries_ptr + 0x20 + i * 16; + let hash = reader.read_i32(entry_addr); + let key = reader.read_i32(entry_addr + 8); + let value = reader.read_i32(entry_addr + 12); + if hash == -1 { + skipped_empty += 1; + continue; + } + if hash != key { + skipped_mismatched_hash += 1; + continue; + } + if key < MIN_CARD_ID + || key > MAX_CARD_ID + || value < MIN_QUANTITY + || value > MAX_QUANTITY + { + skipped_out_of_range += 1; + continue; + } + entries.push((key, value)); + } + eprintln!( + "read_cards_dictionary_entries: count={} kept={} skipped(empty={}, hash!=key={}, out_of_range={})", + count, + entries.len(), + skipped_empty, + skipped_mismatched_hash, + skipped_out_of_range, + ); + entries + } + + /// Diagnostic: find the `CardPrintingRecord` class and dump its + /// fields, then scan the heap for the first few instances and + /// show what's at each field offset. Used to reverse-engineer + /// Arena's in-process card database layout so we can build our + /// own arena_id → card_name lookup table without depending on + /// Scryfall having populated arena_id values. + /// + /// Untapped's companion app doesn't read this dictionary — they + /// download their own pre-built arena_id → card metadata + /// mapping from their server. We're reconstructing it from + /// Arena's memory directly instead, which gives us an + /// authoritative source that works offline and doesn't lag + /// behind Scryfall's data ingestion. 
+ fn probe_card_printing_record(reader: &MemReader, pid: u32) { + eprintln!("probe_card_printing_record: looking for CardPrintingRecord class..."); + let cpr_class = match find_class_by_direct_scan(reader, pid, "CardPrintingRecord") { + Some(addr) => addr, + None => { + eprintln!("probe_card_printing_record: CardPrintingRecord class not found — bail"); + return; + } + }; + eprintln!("probe_card_printing_record: class = 0x{:x}", cpr_class); + + let fields = get_class_fields(reader, cpr_class); + eprintln!("probe_card_printing_record: {} fields:", fields.len()); + for (i, f) in fields.iter().enumerate() { + eprintln!( + " field[{}] name={:?} type={:?} offset=0x{:x} is_static={}", + i, f.name, f.type_name, f.offset, f.is_static, + ); + } + + // Scan heap for instances: objects whose first 8 bytes equal + // cpr_class. For each of the first few hits, dump 256 bytes + // of the instance so we can see the raw field values. + eprintln!( + "probe_card_printing_record: scanning heap for instances (first 5)...", + ); + let heap_regions = find_scannable_heap_regions(pid); + let mut hits = 0usize; + for (start, end) in heap_regions { + if hits >= 5 { + break; + } + let size = end - start; + let buf = reader.read_bytes(start, size); + if buf.len() != size { + continue; + } + let mut i = 0; + while i + 256 <= buf.len() { + let ptr = u64::from_le_bytes(buf[i..i + 8].try_into().unwrap_or([0; 8])) as usize; + if ptr == cpr_class { + let obj_addr = start + i; + eprintln!(" instance at 0x{:x}:", obj_addr); + // Dump first 20 int32 slots and 20 pointer slots + let mut i32s: Vec = Vec::new(); + let mut ptrs: Vec = Vec::new(); + for k in 0..20 { + let i32_off = k * 4; + if i + i32_off + 4 <= buf.len() { + let v = i32::from_le_bytes( + buf[i + i32_off..i + i32_off + 4].try_into().unwrap_or([0; 4]), + ); + i32s.push(format!("+{:02x}:{}", i32_off, v)); + } + let ptr_off = k * 8; + if i + ptr_off + 8 <= buf.len() { + let p = u64::from_le_bytes( + buf[i + ptr_off..i + ptr_off + 
8].try_into().unwrap_or([0; 8]), + ) as usize; + ptrs.push(format!("+{:02x}:0x{:x}", ptr_off, p)); + } + } + eprintln!(" i32s: {}", i32s.join(" ")); + eprintln!(" ptrs: {}", ptrs.join(" ")); + // For each field in the metadata, try to read + // its value from the instance and display. + eprintln!(" field values (from metadata offsets):"); + for f in fields.iter().take(30) { + if f.is_static { + continue; + } + let field_off = f.offset as usize; + if i + field_off + 8 > buf.len() { + continue; + } + let as_int = i32::from_le_bytes( + buf[i + field_off..i + field_off + 4].try_into().unwrap_or([0; 4]), + ); + let as_ptr = u64::from_le_bytes( + buf[i + field_off..i + field_off + 8].try_into().unwrap_or([0; 8]), + ) as usize; + let as_string = if as_ptr >= 0x100000 && as_ptr < 0x400000000 { + reader.read_string(as_ptr) + } else { + String::new() + }; + let string_display = if as_string.is_empty() || as_string.len() > 40 { + String::new() + } else { + format!(" str={:?}", as_string) + }; + eprintln!( + " {}: int={} ptr=0x{:x}{}", + f.name, as_int, as_ptr, string_display, + ); + } + hits += 1; + if hits >= 5 { + break; + } + i += 8; + continue; + } + i += 8; + } + } + eprintln!("probe_card_printing_record: {} instances examined", hits); + } + + /// Public entry point for the signature-based card collection + /// reader. Bypasses the entire PAPA/WrapperController/InventoryManager + /// walker. Called from `read_mtga_cards` below. + pub fn read_mtga_cards_impl(process_name: &str) -> Result> { + let pid = find_pid_by_name(process_name) + .ok_or_else(|| Error::from_reason(format!("Process '{}' not found", process_name)))?; + let reader = MemReader::new(pid); + + // Diagnostic byte-pattern scan — find every location in heap + // where an int equals the int 8 bytes later AND equals a + // specific target arena_id. 
This confirms whether the + // (cardId, quantity) pair we're looking for actually exists + // in Arena's memory at all, independent of the dict-header + // signature scan below. If the target key is set via + // MTGA_PROBE_CARD_ID, run it before the normal scan so we + // see the diagnostic output regardless of what the scan + // ultimately returns. + if let Ok(probe_str) = std::env::var("MTGA_PROBE_CARD_ID") { + if let Ok(target_key) = probe_str.trim().parse::() { + scan_for_dict_entry_pattern(&reader, pid, Some(target_key), None, 50); + } + } + + // Diagnostic: probe CardPrintingRecord class and dump its + // fields so we can figure out how to map grp_id → card name + // directly from Arena's in-process card database. Gated + // behind MTGA_PROBE_CARD_DB env var so it doesn't always run. + if std::env::var("MTGA_PROBE_CARD_DB").is_ok() { + probe_card_printing_record(&reader, pid); + } + + let dict_addr = scan_heap_for_cards_dictionary(&reader, pid); + if dict_addr == 0 { + return Err(Error::from_reason( + "Cards dictionary not found via heap signature scan. \ + Either the MTGA player is not logged in yet, the card \ + collection is empty, or the Dictionary layout \ + has changed in a way the scanner doesn't recognize.", + )); + } + let entries = read_cards_dictionary_entries(&reader, dict_addr); + if entries.is_empty() { + return Err(Error::from_reason(format!( + "Found Cards dictionary at 0x{:x} but it had no valid entries. \ + This usually means the collection is still loading.", + dict_addr, + ))); + } + eprintln!( + "read_mtga_cards_impl: extracted {} cards from dictionary at 0x{:x}", + entries.len(), dict_addr, + ); + Ok(entries) + } + + fn find_wrapper_controller_instance( + reader: &MemReader, + pid: u32, + wrapper_controller_class: usize, + ) -> Option { + // Strategy 1: read the static k__BackingField. 
+ // Only trust it if the returned pointer dereferences to an + // object whose class is WrapperController — the static field + // parser has been observed returning stale / shared values + // (offset 0 of PAPA and WrapperController both report the + // same static_base, which can't be right). + let inst = read_static_object_field( + reader, + wrapper_controller_class, + "<Instance>k__BackingField", + ); + if inst >= 0x100000 { + let obj_class = reader.read_ptr(inst); + if obj_class == wrapper_controller_class { + eprintln!( + "find_wrapper_controller_instance: static = 0x{:x} verified (obj->class matches)", + inst, + ); + return Some(inst); + } + eprintln!( + "find_wrapper_controller_instance: static = 0x{:x} but obj->class = 0x{:x} != 0x{:x}, rejecting", + inst, obj_class, wrapper_controller_class, + ); + } + + // Strategy 2: heap scan + cross-verified field walk. Scan for + // any object whose first 8 bytes equal the WrapperController + // class pointer, then verify by reading its + // `<InventoryManager>k__BackingField` field and checking that + // the referenced object's class is InventoryManager (compared + // by class name, as explained below). A false positive would + // have to hit BOTH conditions simultaneously, which is + // astronomically unlikely for non-instance heap data. + eprintln!( + "find_wrapper_controller_instance: static read failed, scanning heap with field verification for instances of class 0x{:x}", + wrapper_controller_class, + ); + + // Cross-verify by CLASS NAME rather than class pointer. In + // IL2CPP there can be multiple `Il2CppClass*` variants for + // the same logical type — a static metadata entry in __DATA + // and a runtime class struct that actual heap instances + // reference. These have different addresses but both have a + // CLASS_NAME offset pointing to the same string literal. + // Comparing by name is the robust check.
+        let im_field_offset = get_class_fields(reader, wrapper_controller_class)
+            .iter()
+            .find(|f| f.name == "k__BackingField")
+            .map(|f| f.offset as usize);
+        if im_field_offset.is_none() {
+            eprintln!("find_wrapper_controller_instance: WrapperController has no k__BackingField field!");
+        }
+        eprintln!(
+            "find_wrapper_controller_instance: im_field_offset = {:?}",
+            im_field_offset.map(|v| format!("0x{:x}", v)),
+        );
+
+        let heap_regions = find_scannable_heap_regions(pid);
+        let mut total_raw_matches = 0usize;
+        let mut first_raw_match: Option<usize> = None;
+        let mut sample_raw_matches: Vec<(usize, usize, usize, usize, String)> = Vec::new(); // (addr, +0, im_ptr, im_ptr_class, im_ptr_class_name)
         for (start, end) in heap_regions {
             let step = 0x100000;
-            for chunk_start in (start..end).step_by(step) {
-                let bytes = reader.read_bytes(chunk_start, step);
+            let mut chunk_start = start;
+            while chunk_start < end {
+                let chunk_size = step.min(end - chunk_start);
+                let bytes = reader.read_bytes(chunk_start, chunk_size);
                 if bytes.is_empty() || bytes.iter().all(|&b| b == 0) {
+                    chunk_start += chunk_size;
                     continue;
                 }
+                let mut i = 0;
+                while i + 8 <= bytes.len() {
+                    let ptr = usize::from_le_bytes(bytes[i..i + 8].try_into().unwrap_or([0; 8]));
+                    if ptr == wrapper_controller_class {
+                        let obj_addr = chunk_start + i;
+                        total_raw_matches += 1;
+                        if first_raw_match.is_none() {
+                            first_raw_match = Some(obj_addr);
+                        }
+
+                        if let Some(im_off) = im_field_offset {
+                            let im_ptr = reader.read_ptr(obj_addr + im_off);
+                            if im_ptr > 0x100000 && im_ptr < 0x400000000 {
+                                let im_ptr_class = reader.read_ptr(im_ptr);
+                                let im_ptr_class_name = if im_ptr_class > 0x100000 {
+                                    read_class_name(reader, im_ptr_class)
+                                } else {
+                                    String::new()
+                                };
+                                if sample_raw_matches.len() < 10 {
+                                    sample_raw_matches.push((
+                                        obj_addr,
+                                        ptr,
+                                        im_ptr,
+                                        im_ptr_class,
+                                        im_ptr_class_name.clone(),
+                                    ));
+                                }
+                                if im_ptr_class_name == "InventoryManager" {
+                                    eprintln!(
+                                        "find_wrapper_controller_instance: VERIFIED by class name at 0x{:x} (im_ptr=0x{:x}, im_ptr_class=0x{:x})",
+                                        obj_addr, im_ptr, im_ptr_class,
+                                    );
+                                    return Some(obj_addr);
+                                }
+                            } else if sample_raw_matches.len() < 10 {
+                                sample_raw_matches.push((
+                                    obj_addr,
+                                    ptr,
+                                    im_ptr,
+                                    0,
+                                    String::new(),
+                                ));
+                            }
+                        }
+                    }
+                    i += 8;
+                }
+                chunk_start += chunk_size;
+            }
+        }
-                for i in (0..bytes.len() - 8).step_by(8) {
+        eprintln!(
+            "find_wrapper_controller_instance: {} raw matches scanned, none verified by class name.",
+            total_raw_matches,
+        );
+        if !sample_raw_matches.is_empty() {
+            eprintln!("find_wrapper_controller_instance: first 10 samples (obj_addr, +0, im_ptr, im_ptr_class, im_ptr_class_name):");
+            for (a, v0, imp, imc, imn) in &sample_raw_matches {
+                eprintln!(
+                    " 0x{:x} +0=0x{:x} im_ptr=0x{:x} im_ptr_class=0x{:x} name={:?}",
+                    a, v0, imp, imc, imn,
+                );
+            }
+        }
+        None
+    }
+
+    fn find_papa_instance_by_field_verification(
+        reader: &MemReader,
+        pid: u32,
+        papa_class: usize,
+    ) -> Option<usize> {
+        let im_class = find_class_by_direct_scan(reader, pid, "InventoryManager")?;
+        eprintln!(
+            "find_papa_instance_by_field_verification: InventoryManager class = 0x{:x}",
+            im_class,
+        );
+
+        let papa_fields = get_class_fields(reader, papa_class);
+        let im_field_offset = papa_fields
+            .iter()
+            .find(|f| f.name == "k__BackingField")
+            .map(|f| f.offset as usize)?;
+        eprintln!(
+            "find_papa_instance_by_field_verification: PAPA.k__BackingField at offset 0x{:x}",
+            im_field_offset,
+        );
+
+        let heap_regions = find_scannable_heap_regions(pid);
+        eprintln!(
+            "find_papa_instance_by_field_verification: scanning {} heap regions",
+            heap_regions.len(),
+        );
+
+        let mut total_papa_matches = 0usize;
+        let mut verified_matches: Vec<usize> = Vec::new();
+
+        for (start, end) in heap_regions {
+            let step = 0x100000;
+            let mut chunk_start = start;
+            while chunk_start < end {
+                let chunk_size = step.min(end - chunk_start);
+                let bytes = reader.read_bytes(chunk_start, chunk_size);
+                if bytes.is_empty() || bytes.iter().all(|&b| b == 0) {
+                    chunk_start += chunk_size;
+                    continue;
+                }
+
+                let mut i = 0;
+                while i + 8 <= bytes.len() {
+                    let ptr = usize::from_le_bytes(bytes[i..i + 8].try_into().unwrap_or([0; 8]));
+                    if ptr == papa_class {
+                        total_papa_matches += 1;
+                        let obj_addr = chunk_start + i;
+                        let im_ptr = reader.read_ptr(obj_addr + im_field_offset);
+                        if im_ptr > 0x100000 && im_ptr < 0x400000000 {
+                            let im_obj_class = reader.read_ptr(im_ptr);
+                            if im_obj_class == im_class {
+                                verified_matches.push(obj_addr);
+                                if verified_matches.len() >= 5 {
+                                    break;
+                                }
+                            }
+                        }
+                    }
+                    i += 8;
+                }
+                if verified_matches.len() >= 5 {
+                    break;
+                }
+                chunk_start += chunk_size;
+            }
+            if verified_matches.len() >= 5 {
+                break;
+            }
+        }
+
+        eprintln!(
+            "find_papa_instance_by_field_verification: papa_class matched in {} slots, {} verified as real PAPA instances",
+            total_papa_matches, verified_matches.len(),
+        );
+        if !verified_matches.is_empty() {
+            eprintln!(
+                "find_papa_instance_by_field_verification: verified PAPA instances: {:?}",
+                verified_matches
+                    .iter()
+                    .map(|a| format!("0x{:x}", a))
+                    .collect::<Vec<_>>(),
+            );
+            return Some(verified_matches[0]);
+        }
+        None
+    }
+
+    fn find_papa_instance(reader: &MemReader, pid: u32, papa_class: usize) -> Option<usize> {
+        // Strategy 1: field-verified heap scan. Cross-checks the
+        // candidate's class pointer AND its InventoryManager backing
+        // field. Astronomically unlikely to false-positive.
+        if let Some(inst) = find_papa_instance_by_field_verification(reader, pid, papa_class) {
+            eprintln!(
+                "find_papa_instance: using field-verified instance = 0x{:x}",
+                inst,
+            );
+            return Some(inst);
+        }
+
+        // Strategy 2: static-field lookup. Only works if PAPA exposes
+        // its singleton via a conventional C# `static Instance` — on
+        // current MTGA builds the `_instance` static field reads as
+        // null even though Arena is fully initialized, so this rarely
+        // helps, but it stays as a fallback for older MTGA versions.
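The chunked, field-verified heap scan can be sketched over an in-memory byte buffer. This is a simplified model, assuming one contiguous region whose field reads also land inside it; `read_word`, `region_from_words` and `field_verified_scan` are hypothetical names, not functions in this fork. It mirrors the loop shape of `find_papa_instance_by_field_verification`: the last chunk is clamped to the region end, the word loop is bounded by `i + 8 <= len` so the final 8-byte slot of every chunk is examined, and the scan stops after a fixed number of verified hits.

```rust
/// Decode the little-endian 8-byte word at `addr` from a region mapped at `base`.
/// Returns 0 when the word is not fully inside the region, mimicking a failed read.
pub fn read_word(region: &[u8], base: usize, addr: usize) -> usize {
    let off = match addr.checked_sub(base) {
        Some(off) => off,
        None => return 0,
    };
    match off.checked_add(8).and_then(|end| region.get(off..end)) {
        Some(bytes) => {
            let mut word = [0u8; 8];
            word.copy_from_slice(bytes);
            u64::from_le_bytes(word) as usize
        }
        None => 0,
    }
}

/// Pack 64-bit words into a little-endian byte buffer (test helper).
pub fn region_from_words(words: &[u64]) -> Vec<u8> {
    words.iter().flat_map(|w| w.to_le_bytes()).collect()
}

/// Scan `region` (mapped at `base`) in chunks of `chunk` bytes. A word equal to
/// `class_ptr` marks a candidate object; it is accepted only if the pointer stored
/// at `field_offset` leads to an object whose header equals `field_class`.
pub fn field_verified_scan(
    region: &[u8],
    base: usize,
    chunk: usize,
    class_ptr: usize,
    field_offset: usize,
    field_class: usize,
    max_hits: usize,
) -> Vec<usize> {
    assert!(chunk >= 8 && chunk % 8 == 0, "chunk must be a positive multiple of 8");
    let end = base + region.len();
    let mut hits = Vec::new();
    let mut chunk_start = base;
    while chunk_start < end && hits.len() < max_hits {
        // Clamp the last chunk so we never read past the end of the region.
        let chunk_size = chunk.min(end - chunk_start);
        let off = chunk_start - base;
        // Skip all-zero chunks, as the patch does.
        if region[off..off + chunk_size].iter().all(|&b| b == 0) {
            chunk_start += chunk_size;
            continue;
        }
        let mut i = 0;
        // `i + 8 <= chunk_size` also examines the final word of the chunk.
        while i + 8 <= chunk_size && hits.len() < max_hits {
            let obj = chunk_start + i;
            if read_word(region, base, obj) == class_ptr {
                let field_ptr = read_word(region, base, obj + field_offset);
                if field_ptr != 0 && read_word(region, base, field_ptr) == field_class {
                    hits.push(obj);
                }
            }
            i += 8;
        }
        chunk_start += chunk_size;
    }
    hits
}
```

In the test data the only genuine candidate sits in the last word of a 24-byte chunk; a `i + 16` bound would never look at it, and the decoy candidate at the region start is rejected because its field does not lead to a `field_class` header.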
+        if let Some(inst) = find_papa_instance_via_static_field(reader, papa_class) {
+            eprintln!(
+                "find_papa_instance: using static-field instance = 0x{:x}",
+                inst,
+            );
+            return Some(inst);
+        }
+
+        // Strategy 3 (last resort): upstream's heap scan with its
+        // brittle `+16` / `+224` verification, now run over
+        // vmmap-enumerated regions instead of upstream's hardcoded
+        // ranges (see the next paragraph). This path is effectively
+        // dead on current Arena builds because the field offsets
+        // have drifted, but it's kept so we have a clear error path
+        // if the first two strategies both fail.
+        //
+        // Local patch — upstream used three hardcoded heap ranges
+        // (`0x15a000000..0x15b000000`, `0x158000000..0x16a000000`,
+        // `0x145000000..0x150000000`) that happened to be where the
+        // PAPA instance lived on whatever macOS build was tested.
+        // macOS arm64 heap addresses drift between OS versions and
+        // even between Arena restarts, so hardcoded ranges rot.
+        // Instead, enumerate writable (`rw-`) regions of reasonable
+        // size from `vmmap` and scan each of them.
+        let heap_regions = find_scannable_heap_regions(pid);
+        eprintln!(
+            "find_papa_instance: scanning {} heap regions for papa_class=0x{:x}",
+            heap_regions.len(), papa_class,
+        );
+
+        // Diagnostics: count total slots where `ptr == papa_class`.
+        // If that count is zero, either our papa_class pointer is
+        // wrong or the scannable heap regions miss the GC heap.
+        // If the count is non-zero but verification fails every time,
+        // the +16 / +224 object layout offsets have drifted.
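`find_papa_instance` tries its lookups in a fixed order and returns the first address any of them produces, so the slower or less trustworthy fallbacks only run when earlier ones fail. A minimal sketch of that ordering with `Iterator::find_map`; `Strategy` and `first_success` are hypothetical names, and the closures stand in for the real scan functions.

```rust
/// Which lookup produced an address (mirrors the three strategies in the patch).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Strategy {
    FieldVerified,
    StaticField,
    LegacyScan,
}

/// Run the strategies in order and return the first hit together with the
/// strategy that produced it. `find_map` stops at the first `Some`, so later
/// strategies are never invoked once an earlier one succeeds.
pub fn first_success(
    strategies: &[(Strategy, &dyn Fn() -> Option<usize>)],
) -> Option<(Strategy, usize)> {
    strategies
        .iter()
        .find_map(|(tag, run)| run().map(|addr| (*tag, addr)))
}
```

Returning the tag alongside the address keeps the "which path did we take" information that the patch currently reports through `eprintln!`.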
+        let mut total_matches: usize = 0;
+        let mut sample_matches: Vec<(usize, usize, usize, String)> = Vec::new();
+
+        for (start, end) in heap_regions {
+            let step = 0x100000;
+            let mut chunk_start = start;
+            while chunk_start < end {
+                let chunk_size = step.min(end - chunk_start);
+                let bytes = reader.read_bytes(chunk_start, chunk_size);
+                if bytes.is_empty() || bytes.iter().all(|&b| b == 0) {
+                    chunk_start += chunk_size;
+                    continue;
+                }
+
+                let mut i = 0;
+                while i + 8 <= bytes.len() {
                     let ptr = usize::from_le_bytes(bytes[i..i + 8].try_into().unwrap_or([0; 8]));
                     if ptr == papa_class {
                         let obj_addr = chunk_start + i;
+                        total_matches += 1;
                         let val_at_16 = reader.read_ptr(obj_addr + 16);
+                        let inv_mgr_224 = reader.read_ptr(obj_addr + 224);
+                        if sample_matches.len() < 10 {
+                            let inv_mgr_class = if inv_mgr_224 > 0x100000 {
+                                reader.read_ptr(inv_mgr_224)
+                            } else {
+                                0
+                            };
+                            let inv_name = if inv_mgr_class > 0x100000 {
+                                read_class_name(reader, inv_mgr_class)
+                            } else {
+                                String::new()
+                            };
+                            sample_matches.push((obj_addr, val_at_16, inv_mgr_224, inv_name));
+                        }
+
+                        // Upstream verification: +16 looks like a
+                        // non-self pointer, +224 points to something
+                        // whose class name contains "InventoryManager".
                         if val_at_16 != papa_class && val_at_16 > 0x100000 {
-                            let inv_mgr = reader.read_ptr(obj_addr + 224);
-                            if inv_mgr > 0x100000 && inv_mgr < 0x400000000 {
-                                let inv_class = reader.read_ptr(inv_mgr);
+                            if inv_mgr_224 > 0x100000 && inv_mgr_224 < 0x400000000 {
+                                let inv_class = reader.read_ptr(inv_mgr_224);
                                 let inv_name = read_class_name(reader, inv_class);
                                 if inv_name.contains("InventoryManager") {
+                                    eprintln!(
+                                        "find_papa_instance: FOUND (strict) at 0x{:x} after {} match(es)",
+                                        obj_addr, total_matches,
+                                    );
                                     return Some(obj_addr);
                                 }
                             }
                         }
                     }
+                    i += 8;
+                }
+                chunk_start += chunk_size;
+            }
+        }
+        eprintln!(
+            "find_papa_instance: total slots matching papa_class = {}. Strict InventoryManager check did not pass for any of them.",
+            total_matches,
+        );
+        if !sample_matches.is_empty() {
+            eprintln!("find_papa_instance: first {} matches (obj_addr, +16 ptr, +224 ptr, class_name_at_+224):", sample_matches.len());
+            for (a, v16, v224, name) in &sample_matches {
+                eprintln!(" 0x{:x} +16=0x{:x} +224=0x{:x} name={:?}", a, v16, v224, name);
+            }
+            // Loose fallback: if we found a match where +16 is a
+            // valid non-self pointer (i.e., it looks like a typical
+            // Il2CppObject header), return the first one. This is
+            // less certain than the InventoryManager-verified hit
+            // but lets downstream code try the field walk anyway
+            // and produce a more actionable error if field offsets
+            // have drifted.
+            for (obj_addr, val_at_16, _v224, _name) in &sample_matches {
+                if *val_at_16 != papa_class && *val_at_16 > 0x100000 {
+                    eprintln!(
+                        "find_papa_instance: using LOOSE fallback at 0x{:x} (first match with plausible +16 header)",
+                        obj_addr,
+                    );
+                    return Some(*obj_addr);
                 }
             }
         }
         None
     }

+    /// Parse vmmap output for writable, reasonably-sized VM regions
+    /// that a Unity GC-managed heap might live in. Returns
+    /// `(start, end)` pairs sorted by address, filtered to exclude
+    /// the GameAssembly dylib's own segments (which we already know
+    /// are code and static metadata, not C# object instances) and
+    /// any region smaller than 1MB (too small to hold the managed
+    /// heap) or larger than 4GB (to avoid reading the entire VM if
+    /// vmmap reports some weird very-large mapping).
+    fn find_scannable_heap_regions(pid: u32) -> Vec<(usize, usize)> {
+        let output = Command::new("vmmap")
+            .args(["-wide", &pid.to_string()])
+            .output()
+            .ok();
+
+        let mut result: Vec<(usize, usize)> = Vec::new();
+        const MIN_SIZE: usize = 1 << 20; // 1 MB
+        const MAX_SIZE: usize = 4usize << 30; // 4 GB
+
+        if let Some(output) = output {
+            let stdout = String::from_utf8_lossy(&output.stdout);
+            for line in stdout.lines() {
+                // Skip obvious non-heap regions. We want the IL2CPP
+                // GC heap (Boehm) and the managed heap Unity allocates
+                // for C# objects — both are `rw-` `SM=PRV` or
+                // `SM=ZER` mappings in the anonymous-mapping range.
+                // We exclude the GameAssembly dylib segments because
+                // the PAPA instance is a heap-allocated C# object
+                // whose pointer lives in GC-managed memory, not in
+                // the dylib.
+                if line.contains("GameAssembly") {
+                    continue;
+                }
+                // Only rw- regions can hold mutable C# object data.
+                if !line.contains("rw-") {
+                    continue;
+                }
+                // Parse "start-end" (bare hex, no 0x prefix) from the
+                // first whitespace-separated token shaped like an
+                // address range. vmmap lines look like:
+                //   "MALLOC_LARGE 142000000-142100000 [ 1024K ...] rw-/rwx SM=PRV"
+                let parts: Vec<&str> = line.split_whitespace().collect();
+                let addr_field_idx = parts.iter().position(|p| p.contains('-') && p.split('-').count() == 2 && p.chars().next().map_or(false, |c| c.is_ascii_hexdigit()));
+                let idx = match addr_field_idx {
+                    Some(i) => i,
+                    None => continue,
+                };
+                let addr_parts: Vec<&str> = parts[idx].split('-').collect();
+                if addr_parts.len() != 2 {
+                    continue;
+                }
+                let start = match usize::from_str_radix(addr_parts[0], 16) {
+                    Ok(v) => v,
+                    Err(_) => continue,
+                };
+                let end = match usize::from_str_radix(addr_parts[1], 16) {
+                    Ok(v) => v,
+                    Err(_) => continue,
+                };
+                if end <= start {
+                    continue;
+                }
+                let size = end - start;
+                if size < MIN_SIZE || size > MAX_SIZE {
+                    continue;
+                }
+                result.push((start, end));
+            }
+        }
+        result.sort();
+        // Drop exact duplicates. `dedup` only removes adjacent identical
+        // (start, end) pairs; overlapping regions are not merged.
+        result.dedup();
+        result
+    }
+
     pub fn is_admin_impl() -> bool {
         unsafe { libc::geteuid() == 0 }
     }
@@ -825,20 +2473,79 @@ mod macos_backend {

         let reader = MemReader::new(pid);

+        // Sanity-check that vmmap can see GameAssembly at all. The
+        // returned segment address isn't used for the table walk
+        // anymore — we scan directly for class pointers — but if it's
+        // 0, neither the direct scan nor the table scan has any data
+        // to work with.
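The vmmap filtering in `find_scannable_heap_regions` can be exercised without a live process by feeding it text lines. The sketch below reimplements the same heuristics in a hypothetical `parse_region` and adds a hypothetical `merge_regions` step, because `Vec::dedup` only removes adjacent identical pairs and would leave overlapping mappings to be scanned twice. The sample lines are invented in the shape of the example quoted in the patch's comment, not captured vmmap output, and the code assumes a 64-bit `usize`.

```rust
/// Parse one vmmap-style line into a (start, end) range using the patch's
/// heuristics: skip GameAssembly lines, require "rw-", take the first token
/// shaped like "hex-hex" (no 0x prefix), keep only sizes in [min_size, max_size].
pub fn parse_region(line: &str, min_size: usize, max_size: usize) -> Option<(usize, usize)> {
    if line.contains("GameAssembly") || !line.contains("rw-") {
        return None;
    }
    let token = line.split_whitespace().find(|p| {
        p.split('-').count() == 2 && p.chars().next().map_or(false, |c| c.is_ascii_hexdigit())
    })?;
    let (lo, hi) = token.split_once('-')?;
    let start = usize::from_str_radix(lo, 16).ok()?;
    let end = usize::from_str_radix(hi, 16).ok()?;
    let size = end.checked_sub(start)?;
    if size == 0 || size < min_size || size > max_size {
        return None;
    }
    Some((start, end))
}

/// Sort ranges and merge any that overlap or touch, so each byte is scanned once.
pub fn merge_regions(mut regions: Vec<(usize, usize)>) -> Vec<(usize, usize)> {
    regions.sort();
    let mut merged: Vec<(usize, usize)> = Vec::with_capacity(regions.len());
    for (start, end) in regions {
        if let Some(last) = merged.last_mut() {
            if start <= last.1 {
                // Overlapping or adjacent: extend the previous range.
                last.1 = last.1.max(end);
                continue;
            }
        }
        merged.push((start, end));
    }
    merged
}
```

In the test data, two partially overlapping `rw-` regions survive parsing and collapse into one range after merging; `sort` followed by `dedup` would have kept both.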
         let data_base = find_second_data_segment(pid);
+        eprintln!("init_impl: vmmap data_base = 0x{:x}", data_base);
         if data_base == 0 {
             return Err(Error::from_reason("Could not find GameAssembly __DATA segment"));
         }

-        let type_info_table = reader.read_ptr(data_base + offsets::TYPE_INFO_TABLE_OFFSET);
-        if type_info_table == 0 {
-            return Err(Error::from_reason("Could not find type info table"));
+        // Find PAPA by scanning __DATA for pointers that dereference
+        // to a class named "PAPA". This bypasses the fragile
+        // "find the type info table" step — IL2CPP has many
+        // sub-tables that all pass rich-name heuristics, and picking
+        // the "right" one is version-dependent. Direct scan doesn't
+        // care which table the class pointer lives in.
+        let papa_class = find_class_by_direct_scan(&reader, pid, "PAPA")
+            .ok_or_else(|| Error::from_reason(
+                "PAPA class not found via direct __DATA scan. Either the \
+                 top-level singleton has been renamed in this MTGA version, \
+                 or GameAssembly's __DATA segments are structured differently \
+                 than expected. Check mtga-reader debug output above.",
+            ))?;
+        eprintln!("init_impl: direct-scan papa_class = 0x{:x}", papa_class);
+
+        // Find WrapperController class — present on both macOS and
+        // Windows Arena builds. On Windows this is the singleton
+        // entry point that holds the `k__BackingField` →
+        // `k__BackingField` → ... path. On macOS
+        // the upstream code preferred PAPA, but PAPA's heap layout
+        // and static-field state are hostile (see
+        // find_papa_instance_by_field_verification and
+        // find_papa_instance_via_static_field both failing).
+        // WrapperController is worth trying as an alternative root.
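The direct `__DATA` scan that `init_impl` now relies on (through `find_class_by_direct_scan`, whose body is outside this hunk) is described as looking for pointers that dereference to a class with the right name. A hypothetical model of that idea: `Image`, `NAME_OFFSET` and `scan_data_for_class` are illustrative names, the name-pointer offset is an assumption rather than a real IL2CPP constant, and reading a C string from the target process is replaced by a map lookup.

```rust
use std::collections::HashMap;

/// Hypothetical memory image: 8-byte words keyed by address, plus class-name
/// strings keyed by the address of their first byte.
pub struct Image {
    pub words: HashMap<usize, usize>,
    pub strings: HashMap<usize, &'static str>,
}

/// Assumed offset of the name pointer inside a class struct (illustrative only).
pub const NAME_OFFSET: usize = 0x10;

impl Image {
    /// Unmapped addresses read as 0, like a failed remote read.
    fn word(&self, addr: usize) -> usize {
        self.words.get(&addr).copied().unwrap_or(0)
    }
}

/// Treat every 8-byte slot in [data_start, data_end) as a candidate class pointer
/// and return the first one whose name pointer leads to `name`. Slot values
/// below `min_ptr` are skipped as implausible pointers.
pub fn scan_data_for_class(
    img: &Image,
    data_start: usize,
    data_end: usize,
    name: &str,
    min_ptr: usize,
) -> Option<usize> {
    (data_start..data_end).step_by(8).find_map(|slot| {
        let candidate = img.word(slot);
        if candidate < min_ptr {
            return None;
        }
        let name_ptr = img.word(candidate.checked_add(NAME_OFFSET)?);
        match img.strings.get(&name_ptr) {
            Some(s) if *s == name => Some(candidate),
            _ => None,
        }
    })
}
```

Because the scan only cares that a slot's value leads to the right name, it does not depend on which IL2CPP sub-table the slot belongs to, which is the property the comment in `init_impl` relies on.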
+        let wrapper_controller_class = find_class_by_direct_scan(&reader, pid, "WrapperController");
+        eprintln!(
+            "init_impl: WrapperController class = {}",
+            wrapper_controller_class.map(|v| format!("0x{:x}", v)).unwrap_or_else(|| "not found".to_string()),
+        );
+
+        // type_info_table is kept in state for API compatibility with
+        // the rest of the module (some read paths still reference it).
+        // Use the best table we can find, but don't fail init if we
+        // can't — the direct-scanned papa_class is enough for readData.
+        let type_info_table = scan_for_type_info_table(&reader, pid);
+        eprintln!("init_impl: scan_for_type_info_table result = 0x{:x}", type_info_table);
+
+        // Try WrapperController first — it's the Windows-proven
+        // singleton entry point and exists on macOS too. If we get a
+        // real instance this way, we store it as `papa_instance`
+        // (misnomer kept for API compat) and the walker starts
+        // from WrapperController instead of PAPA.
+        let wrapper_instance_opt = wrapper_controller_class.and_then(|wc_class| {
+            find_wrapper_controller_instance(&reader, pid, wc_class)
+        });
+        if let Some(wc_inst) = wrapper_instance_opt {
+            eprintln!(
+                "init_impl: using WrapperController instance 0x{:x} as papa_instance",
+                wc_inst,
+            );
+            let mut wrapper = STATE.lock().map_err(|_| Error::from_reason("Failed to lock state"))?;
+            wrapper.0 = Some(Il2CppState {
+                reader,
+                pid,
+                type_info_table,
+                papa_class,
+                papa_instance: wc_inst,
+            });
+            return Ok(true);
         }
-        let papa_class = find_class_by_name(&reader, type_info_table, "PAPA")
-            .ok_or_else(|| Error::from_reason("PAPA class not found"))?;
-
-        let papa_instance = find_papa_instance(&reader, papa_class).unwrap_or(0);
+        let papa_instance = find_papa_instance(&reader, pid, papa_class).unwrap_or(0);

         let mut wrapper = STATE.lock().map_err(|_| Error::from_reason("Failed to lock state"))?;
         wrapper.0 = Some(Il2CppState {
@@ -1488,6 +3195,45 @@ pub fn read_data(process_name: String, fields: Vec) -> serde_json::Value {
         serde_json::json!({ "error": "Platform not supported" })
     }
 }
+/// Signature-based card-collection reader. Scans the MTGA process
+/// heap for a `Dictionary` object whose contents match
+/// the shape of an Arena player collection (enough entries, keys in
+/// the Arena card-id range, values in the quantity range) and
+/// returns the list of (cardId, quantity) entries.
+///
+/// This is a macOS-only path added as a local patch: the
+/// `readData` walker starting from PAPA / WrapperController turned
+/// out to be too fragile against current Arena builds (IL2CPP
+/// metadata layout drift, runtime-class-vs-metadata-class
+/// indirection, inconsistent CLASS_NAME offsets on runtime-allocated
+/// class structs). The signature scan sidesteps every one of those
+/// by searching for the only dictionary in the process whose entries
+/// all look like real card entries.
+///
+/// Returns `{ "cards": [{ "cardId": int, "quantity": int }, ...] }`
+/// on success, or `{ "error": string }` on any failure.
+#[napi]
+pub fn read_mtga_cards(process_name: String) -> serde_json::Value {
+    #[cfg(target_os = "macos")]
+    {
+        match macos_backend::read_mtga_cards_impl(&process_name) {
+            Ok(entries) => {
+                let cards: Vec<serde_json::Value> = entries
+                    .into_iter()
+                    .map(|(key, value)| serde_json::json!({ "cardId": key, "quantity": value }))
+                    .collect();
+                serde_json::json!({ "cards": cards })
+            }
+            Err(e) => serde_json::json!({ "error": e.to_string() }),
+        }
+    }
+    #[cfg(not(target_os = "macos"))]
+    {
+        let _ = process_name;
+        serde_json::json!({ "error": "readMtgaCards is macOS-only in this local fork" })
+    }
+}
+
 #[napi]
 pub fn read_class(process_name: String, address: i64) -> serde_json::Value {
     #[cfg(target_os = "windows")]
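The doc comment on `read_mtga_cards` describes the collection signature only in outline (enough entries, keys in the card-id range, values in the quantity range); `read_mtga_cards_impl` itself is outside this diff. A sketch of such a shape check, assuming a 16-byte entry layout of little-endian `i32` fields (hash code, next, key, value) as in the .NET reference-source `Dictionary` with 32-bit keys and values, and treating negative hash codes as free slots. `Entry`, `Shape`, `decode_entries` and `match_collection_shape` are hypothetical, and the thresholds used in the test are illustrative, not values taken from the fork.

```rust
use std::ops::RangeInclusive;

/// One decoded 16-byte slot of a hypothetical Dictionary entries array with
/// 32-bit keys and values (assumed layout: hash_code, next, key, value).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Entry {
    pub hash_code: i32,
    pub next: i32,
    pub key: i32,
    pub value: i32,
}

/// Decode raw bytes into entries; a trailing partial entry is ignored.
pub fn decode_entries(bytes: &[u8]) -> Vec<Entry> {
    bytes
        .chunks_exact(16)
        .map(|c| {
            let field = |o: usize| i32::from_le_bytes([c[o], c[o + 1], c[o + 2], c[o + 3]]);
            Entry { hash_code: field(0), next: field(4), key: field(8), value: field(12) }
        })
        .collect()
}

/// Illustrative acceptance criteria for a collection-like dictionary.
pub struct Shape {
    pub min_entries: usize,
    pub key_range: RangeInclusive<i32>,
    pub qty_range: RangeInclusive<i32>,
}

/// Return the live (key, value) pairs if every live entry fits `shape` and there
/// are at least `shape.min_entries` of them. Entries with a negative hash code
/// are treated as free-list slots and skipped (an assumption about the layout).
pub fn match_collection_shape(entries: &[Entry], shape: &Shape) -> Option<Vec<(i32, i32)>> {
    let mut cards = Vec::new();
    for e in entries {
        if e.hash_code < 0 {
            continue;
        }
        if !shape.key_range.contains(&e.key) || !shape.qty_range.contains(&e.value) {
            return None; // one implausible entry rejects the whole dictionary
        }
        cards.push((e.key, e.value));
    }
    if cards.len() >= shape.min_entries {
        Some(cards)
    } else {
        None
    }
}
```

Rejecting the whole dictionary on a single out-of-range entry is what makes the signature selective: unrelated integer dictionaries rarely have every key and value inside both ranges.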