Make foreign-alloc memory permanent until explicit free#1792
Conversation
ForeignData_O for %foreign-alloc was created with DeleteOnDtor, so the malloc'd block was freed by the wrapper's GC finalizer. CFFI code (and cffi-libffi in particular) stores the raw address in foreign memory the GC does not scan and drops the wrapper, so the next GC freed still-referenced blocks -- a use-after-free that crashed struct-by-value calls after a few hundred iterations. Allocate with None instead: foreign-alloc memory now persists until %foreign-free, matching the malloc semantics every other CFFI backend provides. with-foreign-object's allocator keeps DeleteOnDtor as it always frees explicitly. Add src/tests/fli/cffi-fsbv-stress.lisp: correctness across scalar / struct return / nested-struct by-value calls plus a 300k-call stress loop that SEGVs on the pre-fix backend.
|
CI note: the Locally (macOS arm64, boehmprecise, base = current main + the W^X fixes from #1781/#1786/#1788): full regression suite 1963 passed / 4 failed, all four known-preexisting (SBCL-CROSS-COMPILE-4, INCLUDE-LEVEL-2B/3, TYPES-CLASSES-10), zero new failures; plus the new cffi-fsbv stress test passes (300k mixed struct-by-value calls, no crash). |
|
We don't really use the tests/ directory and should probably delete it, but no big |
Problem
CFFI struct-by-value calls (via cffi-libffi) crash with memory corruption after a few hundred calls on Clasp — SEGV inside
ffi_callreading a freedffi_type*.Root cause
%allocate-foreign-data(fli.cc) creates theForeignData_Owrapper withDeleteOnDtor, so the malloc'd block isfree()d by the wrapper's GC finalizer. That ties the lifetime of the foreign memory to the GC-reachability of the Lisp wrapper.CFFI code routinely stores the raw address inside other foreign (GC-unscanned) memory and drops the wrapper. cffi-libffi's
make-libffi-cifis the concrete case: it caches the cif but drops the wrappers for thearg_typesarray and the per-structffi_typedescriptors, whose raw pointers live inside the cif's malloc'd block. The next GC collects those wrappers, the finalizers free the still-referenced blocks, and the following call is a use-after-free.Every other CFFI backend's
foreign-allocis plain malloc — permanent untilforeign-free. Clasp's violated that contract.Proven by pinning every
%foreign-allocwrapper (200k calls, zero crashes) and by forcing a GC each iteration (crash moves to iteration 0).Fix
Allocate with
core::Noneinstead ofcore::DeleteOnDtorinPERCENTallocate_foreign_dataandallocate_foreign_data, soforeign-allocmemory persists until an explicit%foreign-free— matching every other CFFI backend.with-foreign-object's allocator keepsDeleteOnDtorsince it always frees explicitly on scope exit.Blast radius: the only other in-tree
%foreign-alloccaller (defcallback-native.lisp) already frees explicitly.Test
src/tests/fli/cffi-fsbv-stress.lisp: correctness across scalar / struct-return / nested-struct by-value calls plus a 300k-call stress loop. Verified on macOS arm64 (boehm): SEGVs with the fix reverted, passes with it applied.Note: full struct-by-value support also needs a one-line fix in CFFI's
cffi-clasp.lisp(:defaultlibrary marker not mapped to:rtld-default); that goes upstream to cffi separately.