
Speed up +/- of arrays on GPU #83

@blegat

The following code uses scalar indexing, so it fails on GPU with `ERROR: Scalar indexing is disallowed.` We should update it to use `view_linear`. There is no need to branch on `view_matrix`, because the shape doesn't matter for elementwise `+` and `-`: `infer_size` has already checked that all operands have the same size.

if node.index == 1 # :+
    for j in _eachindex(f.sizes, k)
        tmp_sum = zero(T)
        for c_idx in children_indices
            ix = children_arr[c_idx]
            @j f.partials_storage[ix] = one(T)
            tmp_sum += @j f.forward_storage[ix]
        end
        @j f.forward_storage[k] = tmp_sum
    end
elseif node.index == 2 # :-
    @assert N == 2
    child1 = first(children_indices)
    @inbounds ix1 = children_arr[child1]
    @inbounds ix2 = children_arr[child1+1]
    for j in _eachindex(f.sizes, k)
        tmp_sub = @j f.forward_storage[ix1]
        tmp_sub -= @j f.forward_storage[ix2]
        @j f.partials_storage[ix1] = one(T)
        @j f.partials_storage[ix2] = -one(T)
        @j f.forward_storage[k] = tmp_sub
    end
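
A minimal sketch of a broadcast-based version, written against plain Julia arrays. It assumes each node's values occupy a contiguous slice of a flat storage vector; `node_view`, `offsets`, `lens`, `forward_plus!` and `forward_minus!` are hypothetical names standing in for `view_linear` and the real storage layout, whose actual signatures may differ. Every write is a `fill!` or a broadcast over a `view`, with no per-element indexing, which is what should let the same code run on GPU array types (not tested on GPU here).

```julia
# Hypothetical stand-in for `view_linear`: assumes node `k` occupies the
# contiguous slice offsets[k]+1 : offsets[k]+lens[k] of a flat storage vector.
node_view(storage, offsets, lens, k) = view(storage, offsets[k] .+ (1:lens[k]))

# `:+` node: whole-slice broadcasts replace the scalar `for j` loop.
# Assumes all children have the same length as node `k` (checked by `infer_size`).
function forward_plus!(forward, partials, offsets, lens, k, children)
    T = eltype(forward)
    out = node_view(forward, offsets, lens, k)
    fill!(out, zero(T))
    for ix in children
        # d(sum)/d(child) is one for every element of every child
        fill!(node_view(partials, offsets, lens, ix), one(T))
        out .+= node_view(forward, offsets, lens, ix)
    end
    return nothing
end

# `:-` node with exactly two children: computes ix1 - ix2 elementwise.
function forward_minus!(forward, partials, offsets, lens, k, ix1, ix2)
    T = eltype(forward)
    out = node_view(forward, offsets, lens, k)
    out .= node_view(forward, offsets, lens, ix1) .- node_view(forward, offsets, lens, ix2)
    # d(ix1 - ix2)/d(ix1) = 1 and d(ix1 - ix2)/d(ix2) = -1, elementwise
    fill!(node_view(partials, offsets, lens, ix1), one(T))
    fill!(node_view(partials, offsets, lens, ix2), -one(T))
    return nothing
end
```

For example, with `forward = [0.0, 0.0, 1.0, 2.0, 10.0, 20.0]`, `partials = zeros(6)`, `offsets = [0, 2, 4]` and `lens = [2, 2, 2]`, calling `forward_plus!(forward, partials, offsets, lens, 1, [2, 3])` sets `forward[1:2]` to `[11.0, 22.0]`.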

Once this is done, we should be able to write `Y_hat - Y`, and not only `Y_hat .- Y`, when training neural networks on GPU.
