I'm not 100% sure about the PyTorch syntax here. Should the following two ways of computing the gradients df/dtheta be equivalent? Why are they not? :) I'm also not entirely sure what `loss.backward(backward_ones)` does. Is this df/d1?
```python
loss.mean().backward(retain_variables=True)  # retain_variables was renamed retain_graph in newer PyTorch
print(reg_funcs.params.grad.data)
reg_funcs.params.grad.data.zero_()  # reset the accumulated gradient (not params.data) before the second backward
loss.backward(backward_ones)  # backward_ones is a tensor of ones with the same shape as loss
print(reg_funcs.params.grad.data)
```
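For reference, here is a minimal self-contained sketch of the two approaches, with a stand-in parameter `theta` instead of my actual `reg_funcs` (names here are hypothetical, not from my real code). My understanding is that `backward(v)` computes the vector-Jacobian product v^T * (d loss / d theta), so passing ones gives the gradient of `loss.sum()` rather than `loss.mean()`:

```python
import torch

# Stand-in parameter vector (plays the role of reg_funcs.params).
theta = torch.randn(3, requires_grad=True)
loss = theta ** 2  # vector-valued loss, one element per sample

# Way 1: reduce to a scalar first, then backpropagate.
loss.mean().backward(retain_graph=True)
grad_mean = theta.grad.clone()
theta.grad.zero_()

# Way 2: pass a vector of ones; backward() then computes the
# vector-Jacobian product 1^T * (d loss / d theta), i.e. the
# gradient of loss.sum().
loss.backward(torch.ones_like(loss))
grad_ones = theta.grad.clone()

# If my understanding is right, the two differ by a factor of N:
print(torch.allclose(grad_mean, grad_ones / loss.numel()))  # True
```

If that is correct, then the two `print` calls in my original snippet should differ by a factor of 1/N (mean vs. sum), which would explain the discrepancy, but I'd like to confirm.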