Summary
microblaze_invalidate_dcache_range() can loop indefinitely when asked to invalidate a valid range that touches the final cache line in the 32-bit address space.
This appears to affect the MicroBlaze write-through D-cache path, where XPAR_MICROBLAZE_DCACHE_USE_WRITEBACK == 0. In that configuration, the implementation walks cache-line addresses upward and relies on advancing past the final aligned address. If the final aligned address is the last cache line of the address space (the line ending at 0xFFFFFFFF), the next increment wraps to 0x00000000 and the loop never terminates.
The relevant file in this repository appears to be:
lib/bsp/standalone/src/microblaze/microblaze_invalidate_dcache_range.S
Minimal Reproducer
For a MicroBlaze configuration with a 16-byte D-cache line:
#include "mb_interface.h"
#include "xil_types.h"

void repro(void)
{
    microblaze_invalidate_dcache_range((UINTPTR)0xFFFFFFF0u, 1u);
}
Equivalently, through the public cache API:
#include "xil_cache.h"

void repro(void)
{
    Xil_DCacheInvalidateRange((UINTPTR)0xFFFFFFF0u, 1u);
}
This is a valid range. It invalidates one byte at 0xFFFFFFF0, which is within the 32-bit address space. However, it causes microblaze_invalidate_dcache_range() to spin indefinitely on the write-through path.
Expected Behavior
The function should invalidate the cache line containing the specified byte range and return.
Actual Behavior
The function invalidates the final cache line, increments the current address by one cache line, wraps to 0x00000000, and then continues looping indefinitely.
Mechanics Of The Problem
The implementation in:
lib/bsp/standalone/src/microblaze/microblaze_invalidate_dcache_range.S
computes an inclusive end address, addr + len - 1.
It then aligns both the start and inclusive end addresses down to cache-line boundaries.
In the write-through path, the loop effectively does this:
current = aligned_start;
end = aligned_end;
while (current <= end) {
    invalidate_cache_line(current);
    current += line_size;
}
That loop shape requires the address after the final cache line to be representable. If aligned_end is the final cache line, for example 0xFFFFFFF0 with a 16-byte line size, the next increment wraps to 0x00000000. Because 0x00000000 <= aligned_end still holds, the exit condition is never met and the walk starts over from the bottom of the address space.
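To see the wraparound without hardware, the loop shape above can be modeled on a host machine. This is only a sketch of the pattern, not the BSP implementation: the function name upward_walk_terminates and the max_iters cap are hypothetical, introduced so the model can report a hang instead of actually spinning.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Host-side model of the upward address walk (NOT the BSP code).
 * Returns true if the loop exits on its own, false if it is still
 * running after max_iters iterations (the model's stand-in for a hang).
 */
bool upward_walk_terminates(uint32_t addr, uint32_t len,
                            uint32_t line_size, uint32_t max_iters)
{
    assert(len != 0u);
    assert(line_size != 0u && (line_size & (line_size - 1u)) == 0u);

    uint32_t mask = ~(line_size - 1u);
    uint32_t current = addr & mask;            /* aligned start line */
    uint32_t end = (addr + len - 1u) & mask;   /* aligned inclusive end line */
    uint32_t iters = 0u;

    while (current <= end) {
        if (iters++ == max_iters) {
            return false;                      /* cap reached: still looping */
        }
        /* A real implementation would invalidate the line at 'current' here. */
        current += line_size;                  /* wraps to 0 after the last line */
    }
    return true;
}
```

Under this model, upward_walk_terminates(0x1000u, 64u, 16u, 1000u) returns true, while upward_walk_terminates(0xFFFFFFF0u, 1u, 16u, 1000u) hits the cap and returns false.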
This affects any valid range whose aligned inclusive end address is the final cache line. With a 16-byte cache line, that means any range including bytes in 0xFFFFFFF0 through 0xFFFFFFFF.
Impact In Xilinx lwIP / AXI Ethernet TCP Transmit Path
This can be hit even when user code never calls Xil_DCacheInvalidateRange directly. I ran into it in the lwIP TCP transmit path: it can affect normal TCP transmission when using Xilinx lwIP with AXI Ethernet and no-copy buffers.
In:
ThirdParty/sw_services/lwip220/src/lwip-2.2.0/contrib/ports/xilinx/netif/xaxiemacif_dma.c
the AXI Ethernet DMA transmit path flushes each pbuf payload before handing it to the DMA engine:
XCACHE_FLUSH_DCACHE_RANGE(q->payload, q->len);
For MicroBlaze write-through D-cache configurations, the cache macro in:
lib/bsp/standalone/src/common/xenv_standalone.h
maps XCACHE_FLUSH_DCACHE_RANGE(Addr, Len) to microblaze_invalidate_dcache_range(...).
Therefore, if a no-copy TCP transmit buffer has a payload range that touches the final cache line near 0xFFFFFFFF, the Ethernet transmit path can stall indefinitely inside the cache maintenance routine.
This is particularly easy to hit with large external-memory buffers placed near the top of the 32-bit address space. For example, a no-copy transmit buffer at 0xC0000000 with length 0x40000000 ends exactly at 0xFFFFFFFF.
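Whether a given buffer is exposed can be checked with a few lines of address arithmetic. The helper below is a hypothetical sketch (range_ends_in_last_line is not part of the BSP or lwIP); it assumes a non-zero length, a power-of-two line size, and a range that does not itself wrap past 0xFFFFFFFF.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: returns true if the aligned inclusive end of
 * [addr, addr + len - 1] is the last cache line of the 32-bit address
 * space, i.e. the case that triggers the wraparound described above.
 */
bool range_ends_in_last_line(uint32_t addr, uint32_t len, uint32_t line_size)
{
    assert(len != 0u);
    assert(line_size != 0u && (line_size & (line_size - 1u)) == 0u);

    uint32_t mask = ~(line_size - 1u);
    uint32_t aligned_end = (addr + len - 1u) & mask;  /* inclusive end line */

    return aligned_end == (UINT32_MAX & mask);        /* 0xFFFFFFF0 for 16-byte lines */
}
```

For the buffer in the example, range_ends_in_last_line(0xC0000000u, 0x40000000u, 16u) is true, as is the one-byte reproducer range at 0xFFFFFFF0; a range ending at 0xFFFFFFEF is not affected.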
Why This Appears To Be A Bug
The cache API describes the arguments as a start address and a byte length. It does not document a requirement that the range must avoid the final cache line, or that the one-past-end address addr + len must be representable.
The range:
addr = 0xFFFFFFF0
len = 1
is valid and fully contained in the 32-bit address space, but the function does not return.
Proposed Minimal Fix
Avoid using an address-walk loop that requires representing the address after the final cache line.
One minimal approach is to use the same offset-countdown style already used by the write-back branch of microblaze_invalidate_dcache_range.S.
Conceptually:
if (len == 0) {
    return;
}
aligned_start = addr & ~(line_size - 1);
aligned_end = (addr + len - 1) & ~(line_size - 1);
offset = aligned_end - aligned_start;
for (;;) {
    invalidate_cache_line(aligned_start + offset);
    if (offset == 0) {
        break;
    }
    offset -= line_size;
}
In assembly terms, after aligning r5 to the start cache line and r6 to the inclusive end cache line, the write-through branch could count down an offset rather than incrementing the address:
    RSUBK   r6, r5, r6      /* r6 = aligned_end - aligned_start */
L_start:
    wdc     r5, r6          /* invalidate aligned_start + offset */
#if defined (__arch64__ )
    addlik  r6, r6, -(XPAR_MICROBLAZE_DCACHE_LINE_LEN * 4)
    beagei  r6, L_start     /* loop while the offset is still >= 0 */
#else
    bneid   r6, L_start     /* loop while the offset is non-zero */
    addik   r6, r6, -(XPAR_MICROBLAZE_DCACHE_LINE_LEN * 4)  /* delay slot: runs whether or not the branch is taken */
#endif
This avoids ever needing to compute or compare against the address after the final cache line. For the failing large-range case:
aligned_start = 0xC0000000
aligned_end = 0xFFFFFFF0
offset = 0x3FFFFFF0
The first operation invalidates aligned_start + offset == 0xFFFFFFF0, then the offset counts down to zero and the loop terminates normally.
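The countdown behavior can be checked on a host with a small model. As before, this is a sketch rather than the BSP code: countdown_walk and walk_result are hypothetical names, and recording the visited address stands in for the actual invalidate instruction.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t first_line;  /* first line address visited */
    uint32_t last_line;   /* last line address visited */
    uint32_t count;       /* number of lines visited */
} walk_result;

/*
 * Host-side model of the offset-countdown loop (NOT the BSP code).
 * Visits lines from aligned_end down to aligned_start and never
 * computes an address past the final cache line.
 */
walk_result countdown_walk(uint32_t addr, uint32_t len, uint32_t line_size)
{
    walk_result r = {0u, 0u, 0u};

    if (len == 0u) {
        return r;                               /* nothing to invalidate */
    }
    assert(line_size != 0u && (line_size & (line_size - 1u)) == 0u);

    uint32_t mask = ~(line_size - 1u);
    uint32_t aligned_start = addr & mask;
    uint32_t aligned_end = (addr + len - 1u) & mask;
    uint32_t offset = aligned_end - aligned_start;

    r.first_line = aligned_start + offset;
    for (;;) {
        r.last_line = aligned_start + offset;   /* stand-in for the invalidate */
        r.count++;
        if (offset == 0u) {
            break;
        }
        offset -= line_size;
    }
    return r;
}
```

For the large-range case, countdown_walk(0xC0000000u, 0x40000000u, 16u) starts at 0xFFFFFFF0, finishes at 0xC0000000, and visits 0x04000000 lines; the one-byte reproducer range visits exactly one line.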
The same issue appears to exist in the write-through path of:
lib/bsp/standalone/src/microblaze/microblaze_flush_dcache_range.S
That file uses the same upward address-walk pattern, so it may need the same treatment.