16 changes: 12 additions & 4 deletions docs/internals/memory-model.md
@@ -128,12 +128,12 @@ The currently running fiber is tracked in `_fiber_current`. When execution switc

```asm
.comm _concat_buf, 65536 ; 64KB scratch buffer
.comm _concat_off, 8 ; current write offset (reset per statement)
.comm _concat_off, 8 ; current write offset (reset per statement, to the frame base)
```

The string buffer (`_concat_buf`) is a 64KB scratch region used by all string operations — `itoa`, `ftoa`, `concat`, `strtolower`, `str_replace`, etc. Each operation writes its result at the current offset and advances the offset.

**The buffer is reset to offset 0 at the start of every statement.** This means strings in the buffer are temporary: they only live for the duration of one statement's evaluation.
**The buffer is reset at the start of every statement — to the frame's inherited base, not necessarily 0.** Strings in the buffer are temporary: they live only for the duration of one statement's evaluation, *plus* (for an argument passed into a call) the lifetime of the callee. See [Cross-call slice arguments](#cross-call-slice-arguments) below.

### How it works

@@ -147,7 +147,15 @@ _concat_buf:
offset=0 offset=5 offset=6 _concat_off = 17
```

Each sub-expression writes its result further into the buffer. After the statement completes (echo writes to stdout), the next statement resets `_concat_off` to 0.
Each sub-expression writes its result further into the buffer. After the statement completes (echo writes to stdout), the next statement resets `_concat_off` back to the current frame's base offset (0 in `main`).

### Cross-call slice arguments

A string operation returns a *borrowed slice* into `_concat_buf` (a pointer + length), not a heap copy. When such a slice is passed **as an argument** to a function, method, or closure, the callee runs its own statements — and resetting `_concat_off` all the way to 0 would overwrite the caller's slice bytes before the callee could read them.

To prevent that, each frame records, on entry, the `_concat_off` value it inherited from the caller (the high-water mark below which the caller's live slices sit). That value is the frame's **base**: per-statement resets restore `_concat_off` to the base rather than 0, so the callee's own concatenations append *above* the caller's slices instead of clobbering them. `main` (and other root contexts) have a base of 0, so their behaviour is unchanged. The cursor is also saved/restored around each nested call so the caller can keep concatenating after the call returns.

A consequence is that `_concat_buf` usage grows with the depth of nested calls that are *holding live slice arguments* (each frame reserves its caller's region). In practice this depth is shallow; deeply recursive string builders that would accumulate are a separate, pre-existing compile-time limitation, so the 64KB budget is not a concern for ordinary code.
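
The base-offset rule above can be modelled in a few lines of Rust. This is only an illustrative sketch, not the compiler's generated code: `ConcatBuf`, `Slice`, `begin_statement`, `callee`, and `demo` are hypothetical names invented here, and the real mechanism is emitted as assembly operating on `_concat_buf` and `_concat_off`.

```rust
/// Hypothetical model of the 64KB scratch buffer (stands in for `_concat_buf`).
struct ConcatBuf {
    bytes: Vec<u8>,
    off: usize, // stands in for `_concat_off`
}

/// A borrowed slice into the buffer: start offset + length, not a heap copy.
#[derive(Clone, Copy)]
struct Slice {
    start: usize,
    len: usize,
}

impl ConcatBuf {
    fn new() -> Self {
        ConcatBuf { bytes: vec![0; 65536], off: 0 }
    }

    /// Write a string at the current offset and advance the offset.
    fn write(&mut self, s: &str) -> Slice {
        let start = self.off;
        let end = start + s.len();
        self.bytes[start..end].copy_from_slice(s.as_bytes());
        self.off = end;
        Slice { start, len: s.len() }
    }

    /// Read the bytes a slice refers to.
    fn read(&self, s: Slice) -> &str {
        std::str::from_utf8(&self.bytes[s.start..s.start + s.len]).unwrap()
    }

    /// Per-statement reset: restore the offset to the frame's base, not to 0.
    fn begin_statement(&mut self, base: usize) {
        self.off = base;
    }
}

/// A callee that runs two statements of its own before reading its argument.
fn callee(buf: &mut ConcatBuf, arg: Slice) -> String {
    let base = buf.off; // frame base: the offset inherited from the caller
    buf.begin_statement(base);
    let _first = buf.write("callee scratch"); // lands above the caller's slice
    buf.begin_statement(base);
    let _second = buf.write("more"); // reuses the callee's own region only
    buf.read(arg).to_string() // the caller's bytes are still intact
}

/// Models `main` (base 0) passing a scratch-buffer slice into a call.
fn demo() -> (String, usize) {
    let mut buf = ConcatBuf::new();
    buf.begin_statement(0);
    let arg = buf.write("hello");
    let saved = buf.off; // cursor saved around the nested call
    let seen = callee(&mut buf, arg);
    buf.off = saved; // cursor restored so the caller can keep concatenating
    (seen, buf.off)
}
```

In this model `demo()` returns `("hello", 5)`: the callee still sees the caller's five bytes, and the cursor is back at 5 after the call. If `callee` reset to 0 instead of `base`, its first write would land on top of `hello`.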

### Copy-on-store

@@ -159,7 +167,7 @@ When a string result is stored to a variable (e.g., `$x = "a" . "b";`), the code

### Implications

- **No overflow.** Because the buffer resets each statement, only one statement's worth of string operations need to fit in 64KB.
- **Bounded usage.** Because the buffer resets each statement, only one statement's worth of string operations needs to fit in 64KB — plus the slice arguments held by any enclosing calls on the current stack (see [Cross-call slice arguments](#cross-call-slice-arguments)). For ordinary code this is comfortably within 64KB.
- **No mutation.** You can't modify a string in place — you always create a new one.
- **Scratch only.** The buffer is strictly temporary. Anything that needs to survive goes to the heap.

93 changes: 80 additions & 13 deletions src/codegen/builtins/arrays/array_fill.rs
@@ -7,26 +7,46 @@
//!
//! Key details:
//! - Returned arrays must use the payload layout expected by later codegen and GC/refcount paths.
//! - A non-zero start index, or a string fill value, needs a keyed (hash) result because a
//! 0-based indexed array cannot represent keys `start..start+count-1` and the scalar indexed
//! fill cannot store a string pointer+length. Those cases route through `__rt_array_fill_assoc`.

use crate::codegen::abi;
use crate::codegen::context::Context;
use crate::codegen::data_section::DataSection;
use crate::codegen::emit::Emitter;
use crate::codegen::expr::emit_expr;
use crate::codegen::platform::Arch;
use crate::parser::ast::Expr;
use crate::codegen::runtime_value_tag;
use crate::parser::ast::{Expr, ExprKind};
use crate::types::PhpType;

/// Returns the assoc-result type produced by `__rt_array_fill_assoc` (int keys, boxed values).
fn assoc_fill_type() -> PhpType {
PhpType::AssocArray {
key: Box::new(PhpType::Int),
value: Box::new(PhpType::Mixed),
}
}

/// Returns true when `array_fill` must build a keyed (hash) array rather than a 0-based
/// indexed one: a non-literal-zero start index produces keys `start..start+count-1`, and a
/// string value cannot be stored in the scalar indexed fill's 8-byte slots.
fn needs_assoc_fill(start_arg: &Expr, value_ty: &PhpType) -> bool {
let start_is_literal_zero = matches!(start_arg.kind, ExprKind::IntLiteral(0));
!start_is_literal_zero || matches!(value_ty.codegen_repr(), PhpType::Str)
}

/// Emits the `array_fill(start_index, count, value)` builtin call.
///
/// Evaluates arguments left-to-right, pushing `start_index` and `count` on the stack
/// before evaluating `value` to preserve ordering. Calls `__rt_array_fill` or
/// `__rt_array_fill_refcounted` depending on whether `value` is refcounted.
/// On x86_64 Linux, delegates to `emit_array_fill_linux_x86_64` for correct System V ABI
/// register usage.
/// Evaluates arguments left-to-right, pushing `start_index` and `count` on the stack before
/// evaluating `value` to preserve ordering. A literal-zero start with a scalar/refcounted value
/// uses the indexed `__rt_array_fill`/`__rt_array_fill_refcounted` helpers; a non-zero start or
/// a string value routes through `__rt_array_fill_assoc`, which builds a Mixed-valued hash with
/// keys `start..start+count-1`. On x86_64 Linux, delegates to `emit_array_fill_linux_x86_64`.
///
/// Returns `PhpType::Array(Box::new(value_ty))` where `value_ty` is the inferred type
/// of the fill value.
/// Returns `PhpType::Array(value_ty)` for the indexed path or `AssocArray{Int, Mixed}` for the
/// keyed path.
pub fn emit(
_name: &str,
args: &[Expr],
@@ -46,6 +66,30 @@ pub fn emit(
// -- save count, evaluate fill value --
emitter.instruction("str x0, [sp, #-16]!"); // push count onto stack
let value_ty = emit_expr(&args[2], emitter, ctx, data);

if needs_assoc_fill(&args[0], &value_ty) {
// -- marshal the fill value into value_lo (x2), value_hi (x3), value_tag (x4) --
match value_ty.codegen_repr() {
PhpType::Str => {
emitter.instruction("mov x3, x2"); // string length becomes the value high word
emitter.instruction("mov x2, x1"); // string pointer becomes the value low word
}
PhpType::Float => {
emitter.instruction("fmov x2, d0"); // move the float bits into the value low word
emitter.instruction("mov x3, #0"); // floats use no high word
}
_ => {
emitter.instruction("mov x2, x0"); // scalar value or heap pointer becomes the value low word
emitter.instruction("mov x3, #0"); // non-string payloads use no high word
}
}
abi::emit_load_int_immediate(emitter, "x4", runtime_value_tag(&value_ty) as i64); // runtime value tag for per-slot boxing
emitter.instruction("ldr x1, [sp], #16"); // pop count into x1 (second arg)
emitter.instruction("ldr x0, [sp], #16"); // pop start index into x0 (first arg)
emitter.instruction("bl __rt_array_fill_assoc"); // build a keyed hash with keys start..start+count-1
return Some(assoc_fill_type());
}

let uses_refcounted_runtime = value_ty.is_refcounted();
// -- set up three-arg call: start, count, value --
emitter.instruction("mov x2, x0"); // move fill value to x2 (third arg)
@@ -63,12 +107,11 @@ pub fn emit(

/// x86_64 Linux-specific entry point for `array_fill`.
///
/// Uses System V AMD64 ABI: `rdi` = start_index, `rsi` = count, `rdx` = fill value.
/// Floats are moved via `movq` from `xmm0` into `rdx`. Calls either
/// `__rt_array_fill_refcounted` or `__rt_array_fill` depending on whether the fill
/// value type is refcounted.
/// Uses System V AMD64 ABI: `rdi` = start_index, `rsi` = count, `rdx` = fill value (or
/// value_lo for the keyed path). The keyed path additionally passes value_hi in `rcx` and the
/// runtime value tag in `r8`, then calls `__rt_array_fill_assoc`.
///
/// Returns `PhpType::Array(Box::new(value_ty))`.
/// Returns `PhpType::Array(value_ty)` for the indexed path or `AssocArray{Int, Mixed}`.
fn emit_array_fill_linux_x86_64(
args: &[Expr],
emitter: &mut Emitter,
@@ -80,6 +123,30 @@ fn emit_array_fill_linux_x86_64(
emit_expr(&args[1], emitter, ctx, data);
abi::emit_push_reg(emitter, "rax"); // preserve the count while evaluating the fill value argument
let value_ty = emit_expr(&args[2], emitter, ctx, data);

if needs_assoc_fill(&args[0], &value_ty) {
// -- marshal the fill value into value_lo (rdx), value_hi (rcx), value_tag (r8) --
match value_ty.codegen_repr() {
PhpType::Str => {
emitter.instruction("mov rcx, rdx"); // string length becomes the value high word
emitter.instruction("mov rdx, rax"); // string pointer becomes the value low word
}
PhpType::Float => {
emitter.instruction("movq rdx, xmm0"); // move the float bits into the value low word
emitter.instruction("xor rcx, rcx"); // floats use no high word
}
_ => {
emitter.instruction("mov rdx, rax"); // scalar value or heap pointer becomes the value low word
emitter.instruction("xor rcx, rcx"); // non-string payloads use no high word
}
}
abi::emit_load_int_immediate(emitter, "r8", runtime_value_tag(&value_ty) as i64); // runtime value tag for per-slot boxing
abi::emit_pop_reg(emitter, "rsi"); // restore the requested count into the second argument register
abi::emit_pop_reg(emitter, "rdi"); // restore the start index into the first argument register
abi::emit_call_label(emitter, "__rt_array_fill_assoc"); // build a keyed hash with keys start..start+count-1
return Some(assoc_fill_type());
}

if matches!(value_ty, PhpType::Float) {
emitter.instruction("movq rdx, xmm0"); // move the floating-point fill payload bits into the third x86_64 runtime argument register
} else {
7 changes: 7 additions & 0 deletions src/codegen/builtins/math/abs.rs
@@ -42,6 +42,13 @@ pub fn emit(
) -> Option<PhpType> {
emitter.comment("abs()");
let ty = emit_expr(&args[0], emitter, ctx, data);
if matches!(ty, PhpType::Mixed | PhpType::Union(_)) {
// The operand is a boxed Mixed cell pointer, not a raw scalar; the runtime helper
// unboxes it, applies the integer or float absolute value per the stored tag, and
// reboxes — preserving PHP's int→int / float→float result typing.
crate::codegen::abi::emit_call_label(emitter, "__rt_abs_mixed");
return Some(PhpType::Mixed);
}
if ty == PhpType::Float {
// -- float absolute value --
match emitter.target.arch {
22 changes: 18 additions & 4 deletions src/codegen/builtins/math/pow.rs
@@ -45,31 +45,45 @@ pub fn emit(
emitter.comment("pow()");
// -- evaluate base, save it, evaluate exponent, call C pow() --
let t0 = emit_expr(&args[0], emitter, ctx, data);
let t0_mixed = matches!(t0, PhpType::Mixed | PhpType::Union(_));
match emitter.target.arch {
Arch::AArch64 => {
if t0 != PhpType::Float {
if t0_mixed {
// The base is a boxed Mixed cell pointer, not a scalar; cast it to a double
// through the runtime so `scvtf` does not convert the pointer itself.
abi::emit_call_label(emitter, "__rt_mixed_cast_float");
} else if t0 != PhpType::Float {
emitter.instruction("scvtf d0, x0"); // convert the pow() base to float when the first argument is an integer
}
}
Arch::X86_64 => {
if t0 != PhpType::Float {
if t0_mixed {
abi::emit_call_label(emitter, "__rt_mixed_cast_float");
} else if t0 != PhpType::Float {
emitter.instruction("cvtsi2sd xmm0, rax"); // convert the pow() base to float when the first argument is an integer
}
}
}
abi::emit_push_float_reg(emitter, abi::float_result_reg(emitter)); // preserve the floating pow() base while the exponent expression is evaluated
let t1 = emit_expr(&args[1], emitter, ctx, data);
let t1_mixed = matches!(t1, PhpType::Mixed | PhpType::Union(_));
match emitter.target.arch {
Arch::AArch64 => {
if t1 != PhpType::Float {
if t1_mixed {
// The exponent is a boxed Mixed cell pointer; cast it to a double through
// the runtime so `scvtf` does not convert the pointer itself.
abi::emit_call_label(emitter, "__rt_mixed_cast_float");
} else if t1 != PhpType::Float {
emitter.instruction("scvtf d0, x0"); // convert the pow() exponent to float when the second argument is an integer
}
emitter.instruction("fmov d1, d0"); // move the floating exponent into the second libc pow() argument register
abi::emit_pop_float_reg(emitter, "d0"); // restore the floating base into the first libc pow() argument register
emitter.bl_c("pow"); // delegate the exponentiation to libc pow() on AArch64
}
Arch::X86_64 => {
if t1 != PhpType::Float {
if t1_mixed {
abi::emit_call_label(emitter, "__rt_mixed_cast_float");
} else if t1 != PhpType::Float {
emitter.instruction("cvtsi2sd xmm0, rax"); // convert the pow() exponent to float when the second argument is an integer
}
abi::emit_pop_float_reg(emitter, "xmm1"); // restore the floating base into a scratch floating-point register before ordering the SysV libc pow() arguments
4 changes: 3 additions & 1 deletion src/codegen/builtins/types/is_finite.rs
@@ -33,7 +33,9 @@ pub fn emit(
) -> Option<PhpType> {
emitter.comment("is_finite()");
let ty = emit_expr(&args[0], emitter, ctx, data);
if ty != PhpType::Float {
if matches!(ty, PhpType::Mixed | PhpType::Union(_)) {
abi::emit_call_label(emitter, "__rt_mixed_cast_float"); // unbox a boxed Mixed payload to a double before the finite check (avoids treating the cell pointer as a value)
} else if ty != PhpType::Float {
abi::emit_int_result_to_float_result(emitter); // normalize integer inputs into the active floating-point result register before the finite check
}
match emitter.target.arch {
38 changes: 30 additions & 8 deletions src/codegen/builtins/types/is_float.rs
@@ -12,15 +12,22 @@ use crate::codegen::context::Context;
use crate::codegen::data_section::DataSection;
use crate::codegen::emit::Emitter;
use crate::codegen::expr::emit_expr;
use crate::codegen::abi;
use crate::codegen::{abi, platform::Arch};
use crate::parser::ast::Expr;
use crate::types::PhpType;

/// Emits the `is_float` builtin call.
/// Emits a PHP `is_float` type predicate call.
///
/// Inspects the compile-time type of `args[0]`. Returns a PHP boolean in the
/// active integer result register: 1 if the resolved type is `PhpType::Float`,
/// 0 otherwise. Always returns `Some(PhpType::Bool)`.
/// For `PhpType::Mixed` or `PhpType::Union`, unpacks the boxed mixed payload via
/// `__rt_mixed_unbox` and tests the runtime tag (2 = float). For all other types,
/// returns the compile-time predicate result directly.
///
/// Arguments:
/// args[0] — the expression to inspect
///
/// Outputs:
/// - Result register: 1 if the value is a float at runtime, 0 otherwise
/// - Return type: `PhpType::Bool`
pub fn emit(
_name: &str,
args: &[Expr],
@@ -30,8 +37,23 @@ pub fn emit(
) -> Option<PhpType> {
emitter.comment("is_float()");
let ty = emit_expr(&args[0], emitter, ctx, data);
// -- return true/false based on compile-time type --
let val = if ty == PhpType::Float { 1 } else { 0 };
abi::emit_load_int_immediate(emitter, abi::int_result_reg(emitter), val); // return the compile-time type predicate result in the active integer result register

if matches!(ty, PhpType::Mixed | PhpType::Union(_)) {
abi::emit_call_label(emitter, "__rt_mixed_unbox"); // normalize boxed mixed payloads to their concrete runtime tag
match emitter.target.arch {
Arch::AArch64 => {
emitter.instruction("cmp x0, #2"); // runtime tag 2 = float payload
emitter.instruction("cset x0, eq"); // x0 = 1 if the unboxed payload is a float, 0 otherwise
}
Arch::X86_64 => {
emitter.instruction("cmp rax, 2"); // runtime tag 2 = float payload
emitter.instruction("sete al"); // set al when the unboxed payload is a float
emitter.instruction("movzx rax, al"); // widen the boolean byte into the integer result register
}
}
} else {
let val = if matches!(ty, PhpType::Float) { 1 } else { 0 };
abi::emit_load_int_immediate(emitter, abi::int_result_reg(emitter), val); // return the compile-time type predicate result
}
Some(PhpType::Bool)
}