This code uses scalar indexing, which fails on GPU with `ERROR: Scalar indexing is disallowed.` We should update it to use `view_linear`. There is no need to branch with `view_matrix`, because the shape doesn't matter here: `infer_size` has already checked that the operands are the same size.
```julia
if node.index == 1 # :+
    for j in _eachindex(f.sizes, k)
        tmp_sum = zero(T)
        for c_idx in children_indices
            ix = children_arr[c_idx]
            @j f.partials_storage[ix] = one(T)
            tmp_sum += @j f.forward_storage[ix]
        end
        @j f.forward_storage[k] = tmp_sum
    end
elseif node.index == 2 # :-
    @assert N == 2
    child1 = first(children_indices)
    @inbounds ix1 = children_arr[child1]
    @inbounds ix2 = children_arr[child1+1]
    for j in _eachindex(f.sizes, k)
        tmp_sub = @j f.forward_storage[ix1]
        tmp_sub -= @j f.forward_storage[ix2]
        @j f.partials_storage[ix1] = one(T)
        @j f.partials_storage[ix2] = -one(T)
        @j f.forward_storage[k] = tmp_sub
    end
```
Once this is done, we should be able to write `Y_hat - Y` and not just `Y_hat .- Y` when training a NN on GPU.
The quoted code is from ArrayDiff.jl/src/reverse_mode.jl, lines 205 to 226 in 2bec169.
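As a rough illustration of the proposed change, here is a minimal sketch of how the `:+` and `:-` branches could operate on whole blocks instead of looping over scalar entries. Everything in it is an assumption made for illustration: `linear_view` is a hypothetical stand-in for `view_linear` (whose real signature may differ), the flat `forward` and `partials` vectors stand in for `f.forward_storage` and `f.partials_storage`, and the integer offsets stand in for whatever the node indices resolve to.

```julia
# Hypothetical stand-in for ArrayDiff's `view_linear`; the real signature may differ.
# Returns a view of `len` consecutive entries of `storage`, starting after `offset`.
linear_view(storage::AbstractVector, offset::Int, len::Int) =
    view(storage, (offset+1):(offset+len))

# Sketch of the `:+` branch: accumulate each child block into the output block
# with broadcasts, and set every child's partial derivative to one.
function add_blocks!(forward, partials, off_k, child_offsets, len)
    T = eltype(forward)
    out = linear_view(forward, off_k, len)
    fill!(out, zero(T))
    for off in child_offsets
        fill!(linear_view(partials, off, len), one(T))
        out .+= linear_view(forward, off, len)
    end
    return forward
end

# Sketch of the `:-` branch: one broadcast for the values, two `fill!` calls
# for the partials (+1 for the first operand, -1 for the second).
function subtract_block!(forward, partials, off_k, off_1, off_2, len)
    T = eltype(forward)
    out = linear_view(forward, off_k, len)
    out .= linear_view(forward, off_1, len) .- linear_view(forward, off_2, len)
    fill!(linear_view(partials, off_1, len), one(T))
    fill!(linear_view(partials, off_2, len), -one(T))
    return forward
end

# Usage on a plain CPU vector: blocks of length 2 at offsets 0, 2 and 4.
forward = [5.0, 7.0, 1.0, 2.0, 0.0, 0.0]
partials = zeros(6)
subtract_block!(forward, partials, 4, 0, 2, 2)
```

Because each step is a broadcast or a `fill!` over a contiguous view, no element is read or written through scalar `getindex`/`setindex!`, which is the pattern that GPU array types reject. Whether the actual fix can be this direct depends on how `view_linear` and the `@j` macro map node indices to storage ranges.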