Skip to content

Improve RLOAD/RSTORE handling#133

Open
arnaud-lb wants to merge 1 commit into
dstogov:masterfrom
arnaud-lb:gh132
Open

Improve RLOAD/RSTORE handling#133
arnaud-lb wants to merge 1 commit into
dstogov:masterfrom
arnaud-lb:gh132

Conversation

@arnaud-lb

@arnaud-lb arnaud-lb commented Mar 27, 2026

Copy link
Copy Markdown
Contributor

The result of RLOAD() may be be clobbered by an RSTORE() before use: GH-132.

This is an attempt to fix that:

  • Add fixed live ranges for RSTOREs
  • Allocate registers normally for RLOAD, even for loads of fixed regs, so that a copy is made in case an RSTORE conflicts with the RLOAD live range
  • Introduce IR_LIVE_INTERVAL_ALLOW_FIXED: Allows to allocate a register from the fixed_regs set, when hints match
  • Do not add a fixed live range for PARAM/RLOAD of fixed regs, as it's not necessary

Generated code in PHP JIT mostly didn't change after this.

TODO:

  • Aarch64

Comment thread ir_ra.c
Comment on lines +1446 to +1447
IR_USE_LIVE_POS_FROM_REF(ref),
IR_DEF_LIVE_POS_FROM_REF(ref));

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about the start/end positions here. I don't fully understand the differences between the various sub-positions.

 * Add fixed live ranges for RSTOREs
 * Allocate registers normally for RLOAD, so that a copy is made in case
   an RSTORE conflicts with the RLOAD live range
 * Introduce IR_LIVE_INTERVAL_ALLOW_FIXED: Allows to allocate a register
   from the fixed_regs set, when hints match
 * Do not add a fixed live range for PARAM/RLOAD of fixed regs, as it's
   not necessary
@arnaud-lb arnaud-lb marked this pull request as ready for review March 27, 2026 15:29

@dstogov dstogov left a comment

Copy link
Copy Markdown
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the attempt!

I didn't look into all implementation details yet, most probably, this is logically right, but this makes significant overhead. I see 6% slowdown on Wordpress function JIT (both time and callgrind) and also degradation in generated code because of inference of few RLOAD intervals (e.g. bench.php total() with function JIIT). May be we miss some RLOADs in zend_jit_ir.c.

Another "proper" way to fix this would be handling anti-dependencies between RLOAD usages and RSTOREs in "Global Code Motion" and "Local Scheduling". (some related background can be found at https://github.com/SeaOfNodes/Simple/blob/main/chapter11/README.md#inserting-anti-dependencies ). This is not simple and dangerous for stable releases...

For now, I think it's better to find a simple work-around. E.g. disable IF folding into COND if condition is RLOAD. Will this work?

@arnaud-lb

Copy link
Copy Markdown
Contributor Author

Yes, disabling folding into COND would probably fix the issue for now. I agree this is the best solution for stable branches.

@dstogov

dstogov commented Mar 31, 2026

Copy link
Copy Markdown
Owner

Yes, disabling folding into COND would probably fix the issue for now. I agree this is the best solution for stable branches.

See 703eb90
Can you check if this fixes the Psalm problem?
I may merge this to PHP later today.

@arnaud-lb

Copy link
Copy Markdown
Contributor Author

It doesn't appear to be enough.

In the case of PHP-21460, the cond op is LOAD() of an RLOAD(). Here is a less minimized reproducer:

{
    uintptr_t c_4 = 0x10;
    uintptr_t c_5 = 0x20;
    uintptr_t c_30 = 1;
    uintptr_t c_31 = 2;
    uintptr_t c_35 = 0xffff;

    l_1 = START(l_364);

    uintptr_t d_2, l_2 = RLOAD(l_1, 15);
    
    uintptr_t d_3 = ADD(d_2, c_4);
    uintptr_t d_4, l_4 = LOAD(l_2, d_3);
    uintptr_t d_5 = ADD(d_4, c_5);
    uintptr_t d_6, l_6 = LOAD(l_4, d_5);

    l_81 = IF(l_6, d_6);
    l_82 = IF_TRUE(l_81);
    l_83 = END(l_82);
    l_84 = IF_FALSE(l_81);
    l_85 = END(l_84);
    l_86 = MERGE(l_83, l_85);
    uintptr_t d_87 = PHI(l_86, c_30, c_31);

    l_92 = RSTORE(l_86, c_35, 15);
    uintptr_t d_93, l_93 = RLOAD(l_92, 15);
    uintptr_t d_95, l_95 = LOAD(l_93, d_93);
    l_98 = IF(l_95, d_95);
    l_99 = IF_TRUE(l_98, 1);
    l_104 = IJMP(l_99, c_35);
    l_105 = IF_FALSE(l_98);
    l_108 = STORE(l_105, c_35, d_87);

    l_363 = TAILCALL(l_108, c_35);
    l_364 = UNREACHABLE(l_363);
}

@dstogov

dstogov commented Mar 31, 2026

Copy link
Copy Markdown
Owner

It doesn't appear to be enough.

This seems compiled right

test:
	movq 0x10(%r15), %rax          ; RLOAD + first LOAD
	movq 0x20(%rax), %rax          ; second LOAD
	movl $0xffff, %r15d            ; RSTORE
	cmpq $0, (%r15)                ; second RLOAD + third LOAD + second IF
	jne .L1
	movl $2, %ecx                  ; COND sank to IF_FALSE of the second IF
	testq %rax, %rax               ; COND condition (result of second LOAD)
	movl $1, %eax
	cmoveq %rcx, %rax
	movq %rax, 0xffff
	movq $0xffff, %rax
	jmpq *%rax
.L1:
	movq $0xffff, %rax
	jmpq *%rax

What is wrong?

@arnaud-lb

Copy link
Copy Markdown
Contributor Author

My bad, wrong reproducer. With this one, the result is incorrect:

{
    uint32_t c_29 = 49121616;
    uintptr_t c_30 = 1;
    uintptr_t c_31 = 2;
    uintptr_t c_35 = 0xffff;

    l_1 = START(l_364);

    uintptr_t d_2, l_2 = RLOAD(l_1, 15);
    
    bool d_80 = NE(d_2, c_29);

    l_81 = IF(l_2, d_80);
    l_82 = IF_TRUE(l_81);
    l_83 = END(l_82);
    l_84 = IF_FALSE(l_81);
    l_85 = END(l_84);
    l_86 = MERGE(l_83, l_85);
    uintptr_t d_87 = PHI(l_86, c_30, c_31);

    l_92 = RSTORE(l_86, c_35, 15);
    uintptr_t d_93, l_93 = RLOAD(l_92, 15);
    uintptr_t d_95, l_95 = LOAD(l_93, d_93);
    l_98 = IF(l_95, d_95);
    l_99 = IF_TRUE(l_98, 1);
    l_104 = IJMP(l_99, c_35);
    l_105 = IF_FALSE(l_98);
    l_108 = STORE(l_105, c_35, d_87);

    l_363 = TAILCALL(l_108, c_35);
    l_364 = UNREACHABLE(l_363);
}
$ ./ir -S --debug-regset 0xffffffffffff3fff test.ir
test:
        movl $0xffff, %r15d    ; RSTORE
        cmpq $0, (%r15)
        jne .L1
        movl $2, %eax
        cmpq $0x2ed8950, %r15  ; First RLOAD() + NE()
        movl $1, %ecx
        cmoveq %rax, %rcx
        movq %rcx, 0xffff
        movq $0xffff, %rax
        jmpq *%rax
.L1:
        movq $0xffff, %rax
        jmpq *%rax

@dstogov

dstogov commented Mar 31, 2026

Copy link
Copy Markdown
Owner

Aha. This shouldn't be difficult to fix. Thanks.

@dstogov

dstogov commented Mar 31, 2026

Copy link
Copy Markdown
Owner

Aha. This shouldn't be difficult to fix. Thanks.

BTW Do you know what kind of PHP code produces this?

@dstogov

dstogov commented Mar 31, 2026

Copy link
Copy Markdown
Owner

27650a8

@dstogov

dstogov commented Mar 31, 2026

Copy link
Copy Markdown
Owner

Please don't close this PR. It may be useful.

@arnaud-lb

Copy link
Copy Markdown
Contributor Author

BTW Do you know what kind of PHP code produces this?

There is a small reproducer in this comment:
php/php-src#21460 (comment) (the script should print int(4))

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants