Replaces inline for loops with proper runtime loops. #241

Open
LucasSantos91 wants to merge 3 commits into Snektron:master from LucasSantos91:optimal_load

@LucasSantos91
Contributor

Current load functions make use of inline loops:

pub fn BaseWrapperWithCustomDispatch(DispatchType: type) type {
    return struct {
        const Self = @This();
        pub const Dispatch = DispatchType;

        dispatch: Dispatch,
        pub fn load(loader: anytype) Self {
            var self: Self = .{ .dispatch = .{} };
            inline for (std.meta.fields(Dispatch)) |field| {
                if (loader(Instance.null_handle, field.name.ptr)) |cmd_ptr| {
                    @field(self.dispatch, field.name) = @ptrCast(cmd_ptr);
                }
            }
            return self;
        }

This generates a lot of bloat. Here's a small program for code analysis:

const vk = @import("vk");

fn loadStuff(a: vk.Device, b: [*:0]const u8) vk.PfnVoidFunction {
    return asm volatile (""
        : [ret] "={rax}" (-> vk.PfnVoidFunction),
        : [a] "r" (a),
          [b] "r" (b),
    );
}

pub fn main() void {
    const w: vk.DeviceWrapper = .load(@enumFromInt(1), loadStuff);
    _ = w;
}

Here's the generated assembly:

push rbp
mov rbp, rsp
lea rax, [0x00007FF60FEA3000]
mov ecx, 0x01
lea rax, [0x00007FF60FEA3010]
lea rax, [0x00007FF60FEA3021]
lea rax, [0x00007FF60FEA302F]
lea rax, [0x00007FF60FEA303F]
lea rax, [0x00007FF60FEA3050]
lea rax, [0x00007FF60FEA3061]
...

I truncated it. It just goes on forever.

We can make this better by noticing that Dispatch is supposed to be just an array of pointers. Since fields of same size follow source code order, we can cast the dispatch to a slice, and replace the loop with a proper runtime loop. Also, all handles lower to just a usize.
Here's the generated load function with this patch:

fn loadCommonImpl(loader: *const fn (usize, [*:0]const u8) PfnVoidFunction, handle: usize, names: []const [*:0]const u8, ptrs: [*]PfnVoidFunction) void {
    for (ptrs[0..names.len], names) |*ptr, name| {
        ptr.* = loader(handle, name);
    }
}
pub const BaseWrapper = BaseWrapperWithCustomDispatch(BaseDispatch);
pub fn BaseWrapperWithCustomDispatch(DispatchType: type) type {
    return struct {
        const Self = @This();
        pub const Dispatch = DispatchType;

        dispatch: Dispatch,
        pub fn load(loader: *const fn (Instance, [*:0]const u8) PfnVoidFunction) Self {
            var self: Self = .{ .dispatch = .{} };
            const names = comptime blk: {
                const fields = @typeInfo(Dispatch).@"struct".fields;
                var names: [fields.len][*:0]const u8 = undefined;
                for (&names, fields) |*d, f| d.* = f.name.ptr;
                break :blk names;
            };
            loadCommonImpl(@ptrCast(loader), @intFromEnum(Instance.null_handle), &names, @ptrCast(&self.dispatch));
            return self;
        }

And here's the generated assembly from that sample program:

push rbp
mov rbp, rsp
xor ecx, ecx
lea rdx, [0x00007FF796052000]
mov r8d, 0x01
nop [rax+rax*1], ax
mov rax, [rcx+rdx*1]
add rcx, 0x08
cmp rcx, 0x1668
jnz 0x00007FF796051040 (main)
pop rbp
ret

That's the entire thing.

@Snektron
Owner

Ah, interesting note. I don't mind changing it, but

> Since fields of same size follow source code order, we can cast the dispatch to a slice

This is not true for a non-extern struct. I guess we could change the layout of the dispatch structs to extern though (I doubt that it changes anything in practice).

Also, can you rebase? Master branch should be fixed again now.

@LucasSantos91 force-pushed the optimal_load branch 2 times, most recently from 224caba to 3cef569 (April 20, 2026 23:34)
Also, change `loader` function back to being `anytype`, as a concrete type was causing problems related to callconv.
@LucasSantos91
Contributor Author

> This is not true for a non-extern struct.

You're right. For some reason I had it in my head that it was true even for non-extern structs (in practice, it is true).

Rebased and added the extern qualifier.
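
For readers following along, here is a minimal sketch of why the extern qualifier matters for this approach. It is not the vulkan-zig code: `DemoDispatch`, `demoLoader`, `loadAll`, `loadDemo` and `dummy` are hypothetical names made up for illustration, and it assumes a Zig version that accepts the `@typeInfo(...).@"struct"` and `callconv(.c)` spellings. The comptime block checks that every field sits at `index * @sizeOf(pointer)`, which is the property the runtime loop relies on and which `extern struct` guarantees.

```zig
const std = @import("std");

// Hypothetical stand-in for the generated function pointer type.
const PfnVoidFunction = ?*const fn () callconv(.c) void;

// `extern struct` uses the C ABI layout: fields are laid out in declaration order.
// A plain `struct` gives no such guarantee.
const DemoDispatch = extern struct {
    first: PfnVoidFunction = null,
    second: PfnVoidFunction = null,
    third: PfnVoidFunction = null,
};

comptime {
    // Verify the layout assumption: the struct is exactly an array of pointers.
    const fields = @typeInfo(DemoDispatch).@"struct".fields;
    std.debug.assert(@sizeOf(DemoDispatch) == fields.len * @sizeOf(PfnVoidFunction));
    for (fields, 0..) |field, i| {
        std.debug.assert(@offsetOf(DemoDispatch, field.name) == i * @sizeOf(PfnVoidFunction));
    }
}

fn dummy() callconv(.c) void {}

// Hypothetical loader: pretends that only "second" is unavailable.
fn demoLoader(handle: usize, name: [*:0]const u8) PfnVoidFunction {
    _ = handle;
    if (std.mem.eql(u8, std.mem.span(name), "second")) return null;
    return &dummy;
}

// One runtime loop over names and pointer slots, in the spirit of loadCommonImpl.
fn loadAll(names: []const [*:0]const u8, ptrs: [*]PfnVoidFunction) void {
    for (ptrs[0..names.len], names) |*ptr, name| {
        ptr.* = demoLoader(0, name);
    }
}

fn loadDemo() DemoDispatch {
    // Build the table of field names at compile time.
    const names = comptime blk: {
        const fields = @typeInfo(DemoDispatch).@"struct".fields;
        var result: [fields.len][*:0]const u8 = undefined;
        for (&result, fields) |*dst, field| dst.* = field.name.ptr;
        break :blk result;
    };
    var dispatch: DemoDispatch = .{};
    // Reinterpret the struct as a many-item pointer to its pointer-sized fields.
    loadAll(&names, @ptrCast(&dispatch));
    return dispatch;
}

pub fn main() void {
    const d = loadDemo();
    std.debug.print("first={} second={} third={}\n", .{ d.first != null, d.second != null, d.third != null });
}
```

If `extern` is dropped from `DemoDispatch`, the language no longer promises the field order, so the cast in `loadDemo` would rest on compiler behaviour rather than on a guarantee.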
