Deep Learning: The Linear Class Explained

[code lang="python"]
import numpy as np

# Node is the base class, assumed to be defined elsewhere in the network code.


class Linear(Node):
    """
    Represents a node that performs a linear transform.
    """
    def __init__(self, X, W, b):
        # The base class (Node) constructor. Weights and bias
        # are treated like inbound nodes.
        Node.__init__(self, [X, W, b])

    def forward(self):
        """
        Performs the math behind a linear transform.
        """
        X = self.inbound_nodes[0].value
        W = self.inbound_nodes[1].value
        b = self.inbound_nodes[2].value
        self.value = np.dot(X, W) + b

    def backward(self):
        """
        Calculates the gradient based on the output values.
        """
        # Initialize a partial for each of the inbound_nodes.
        self.gradients = {n: np.zeros_like(n.value) for n in self.inbound_nodes}
        # Cycle through the outputs. The gradient will change depending
        # on each output, so the gradients are summed over all outputs.
        for n in self.outbound_nodes:
            # Get the partial of the cost with respect to this node.
            grad_cost = n.gradients[self]
            # Set the partial of the loss with respect to this node's inputs.
            self.gradients[self.inbound_nodes[0]] += np.dot(grad_cost, self.inbound_nodes[1].value.T)
            # Set the partial of the loss with respect to this node's weights.
            self.gradients[self.inbound_nodes[1]] += np.dot(self.inbound_nodes[0].value.T, grad_cost)
            # Set the partial of the loss with respect to this node's bias.
            self.gradients[self.inbound_nodes[2]] += np.sum(grad_cost, axis=0, keepdims=False)
[/code]

1. The gradient of the loss with respect to the inputs:

[code]np.dot(grad_cost, self.inbound_nodes[1].value.T)[/code]

A Linear node has three inbound nodes (inputs, weights, and bias), and backward() computes one gradient for each of them.


So each node passes the cost gradient on to its inbound nodes, and each node receives the cost gradient from its outbound nodes. For each node, we then need to calculate a gradient that is the cost gradient times the gradient of that node with respect to its inputs.
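One way to convince yourself that "cost gradient times local gradient" gives the right answer is a numerical gradient check. The sketch below is not part of the original class: it uses plain NumPy arrays instead of nodes, and the `cost` function is a hypothetical stand-in chosen so that the gradient arriving at the linear output is exactly `grad_cost`.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))          # batch of 4 examples, 3 features
W = rng.normal(size=(3, 2))          # 3 inputs -> 2 outputs
b = rng.normal(size=2)
grad_cost = rng.normal(size=(4, 2))  # stand-in for the gradient from an outbound node


def cost(X, W, b):
    # Hypothetical scalar cost whose gradient with respect to the
    # linear output (X.W + b) is exactly grad_cost.
    return np.sum((np.dot(X, W) + b) * grad_cost)


# Analytic gradient, the same expression used in backward().
analytic_dX = np.dot(grad_cost, W.T)

# Numerical gradient by central differences, one entry of X at a time.
eps = 1e-6
numeric_dX = np.zeros_like(X)
for i in range(X.shape[0]):
    for j in range(X.shape[1]):
        X_plus = X.copy()
        X_plus[i, j] += eps
        X_minus = X.copy()
        X_minus[i, j] -= eps
        numeric_dX[i, j] = (cost(X_plus, W, b) - cost(X_minus, W, b)) / (2 * eps)

max_diff = np.max(np.abs(analytic_dX - numeric_dX))
print(max_diff)  # should be very close to zero
```

The same check can be repeated for the weights and the bias by perturbing `W` or `b` instead of `X`.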


2. The gradient of the loss with respect to the weights:

[code]np.dot(self.inbound_nodes[0].value.T, grad_cost)[/code]


3. The gradient of the loss with respect to the bias:

[code]np.sum(grad_cost, axis=0, keepdims=False)[/code]
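A quick way to read these three expressions is to follow the shapes: each gradient must have the same shape as the value it belongs to. The following is a minimal sketch with made-up numbers, assuming a batch of 2 examples, 3 input features and 2 outputs.

```python
import numpy as np

X = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])       # shape (2, 3): 2 examples, 3 features
W = np.ones((3, 2))                   # shape (3, 2)
b = np.zeros(2)                       # shape (2,)
grad_cost = np.array([[0.1, 0.2],
                      [0.3, 0.4]])    # shape (2, 2), same as the output X.W + b

dX = np.dot(grad_cost, W.T)           # (2, 2) x (2, 3) -> (2, 3), matches X
dW = np.dot(X.T, grad_cost)           # (3, 2) x (2, 2) -> (3, 2), matches W
# The bias is broadcast to every example in forward(), so its gradient
# is the sum over the batch axis: (2, 2) -> (2,), matches b.
db = np.sum(grad_cost, axis=0, keepdims=False)

print(dX.shape, dW.shape, db.shape)
```

Using `keepdims=False` drops the batch axis, which is why `db` ends up with the same one-dimensional shape as `b`.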



This explains the loop [code]for n in self.outbound_nodes:[/code] in backward(). Its purpose is to visit every outbound node of the current node.
If a node has multiple outgoing nodes, you just sum up the gradients from each node.
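As a small illustration of that summing rule, here is a sketch in which one value feeds two downstream computations. The two downstream functions (`2 * v` and `v ** 2`) are arbitrary choices for the example, not something from the original network.

```python
import numpy as np

# A value v feeds two downstream nodes: a = 2 * v and c = v ** 2.
# The cost is taken to be sum(a) + sum(c).
v = np.array([1.0, 2.0, 3.0])

grad_from_a = 2.0 * np.ones_like(v)   # d(sum(2 * v)) / dv
grad_from_c = 2.0 * v                 # d(sum(v ** 2)) / dv

# Same accumulation pattern as the loop in backward():
# start from zeros and add the contribution of each outbound node.
grad_v = np.zeros_like(v)
for g in (grad_from_a, grad_from_c):
    grad_v += g

print(grad_v)
```

The result matches the derivative of the full cost, `2 + 2 * v`, evaluated at each element of `v`.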

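Note that backpropagation and gradient descent are two separate steps. Backpropagation finds the gradient, which gives the direction of change. Gradient descent then uses that gradient to minimize the error.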

To find the gradient, you just multiply the gradients for all nodes in front of it going backwards from the cost. This is the idea behind backpropagation. The gradients are passed backwards through the network and used with gradient descent to update the weights and biases.


Backpropagation only computes the derivatives. Gradient descent is the overall process that uses them.
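The split between the two steps can be sketched in a short training loop. This is an illustrative example rather than the original network code: it fits a single linear transform with a mean squared error cost, and the data, the learning rate of 0.1 and the 200 steps are all assumed values.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
true_W = np.array([[1.0], [-2.0], [0.5]])
y = np.dot(X, true_W) + 0.3           # targets generated from a known linear rule

W = np.zeros((3, 1))
b = np.zeros(1)
learning_rate = 0.1                   # assumed value for this sketch

losses = []
for step in range(200):
    # Forward pass: the linear transform and the MSE cost.
    pred = np.dot(X, W) + b
    diff = pred - y
    losses.append(np.mean(diff ** 2))

    # Backpropagation: only computes the gradients.
    grad_cost = 2.0 * diff / len(X)   # d(MSE) / d(pred)
    dW = np.dot(X.T, grad_cost)
    db = np.sum(grad_cost, axis=0, keepdims=False)

    # Gradient descent: uses the gradients to update the parameters.
    W -= learning_rate * dW
    b -= learning_rate * db

print(losses[0], losses[-1])
```

The backpropagation lines never touch `W` or `b`. Only the gradient descent lines change the parameters, which is the distinction drawn above.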

