Basic principles and use of Autodiff#

In the field of deep learning, any complex deep neural network model can essentially be represented by a computational graph.

A computational graph involves both a forward computation pass and a backward computation pass. When training model parameters, the `Backpropagation <https://en.wikipedia.org/wiki/Backpropagation>`_ algorithm is used to compute, layer by layer via the chain rule, the gradient of the loss function with respect to each parameter. As a deep learning framework, MegEngine implements an automatic differentiation mechanism that automatically computes and maintains the gradient information of Tensors during backpropagation. This mechanism can also be used in other scientific computing scenarios.

In this section, we will introduce the gradient (``grad``) attribute of Tensor and how to use the ``autodiff`` module to do the corresponding work.

Gradient and Gradient Manager#

Let’s take the following simple operation :math:`y = w * x + b` as an example:

>>> from megengine import Tensor
>>> x = Tensor([3.])
>>> w = Tensor([2.])
>>> b = Tensor([-1.])
>>> y = w * x + b

Tensor has a ``grad`` (i.e. gradient) attribute, which records gradient information and is used in scenarios where gradients are required to participate in computation.

By default, MegEngine does not record gradient information during Tensor computation:

>>> print(x.grad)
None

If you want to manage Tensor gradients, you need to use :py:class:`~.GradManager`, which implements reverse-mode automatic differentiation:

>>> from megengine.autodiff import GradManager
>>> with GradManager() as gm:
...      gm.attach(x)
...      y = w * x + b
...      gm.backward(y)  # dy/dx = w
>>> x.grad
Tensor([2.], device=xpux:0)

In the above code, the operation history inside the ``with`` block is recorded by the gradient manager. We use the :py:meth:`~.GradManager.attach` method to bind the Tensors to be tracked (``x`` in the example), and then perform the computation; the :py:meth:`~.GradManager.backward` method computes the gradients of the given ``y`` with respect to all bound Tensors, accumulates them into the ``grad`` attribute of the corresponding Tensors, and releases resources in the process.
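
The same mechanism works when several Tensors are bound at once. Below is a minimal sketch (reusing ``w``, ``x`` and ``b`` from above, and assuming :py:meth:`~.GradManager.attach` also accepts a list of Tensors) that computes the gradients with respect to ``w`` and ``b``:

>>> with GradManager() as gm:
...      gm.attach([w, b])  # bind multiple Tensors at once
...      y = w * x + b
...      gm.backward(y)     # dy/dw = x, dy/db = 1
>>> w.grad
Tensor([3.], device=xpux:0)
>>> b.grad
Tensor([1.], device=xpux:0)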

See also

  • You can use the :py:meth:`~.Tensor.detach` method to return an unbound Tensor.

  • You can query the Tensors currently bound in the gradient manager through the :py:meth:`~.GradManager.attached_tensors` interface (both interfaces are shown in the sketch below).
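
A minimal sketch of these two interfaces, reusing the ``x`` Tensor defined above:

gm = GradManager()
gm.attach(x)                     # bind x to the gradient manager
print(gm.attached_tensors())     # the list of Tensors currently bound (contains x)

x_detached = x.detach()          # a new Tensor that is not bound for gradient tracking
print(x_detached.grad)           # None, since its gradient is not tracked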

We can also use ``record`` and ``release`` instead of the ``with`` semantics, as in the sketch below. For more instructions, please refer to the :py:class:`~.GradManager` API documentation.
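
The following is a minimal sketch of the same workflow written with :py:meth:`~.GradManager.record` and :py:meth:`~.GradManager.backward` instead of ``with`` (a new Tensor ``p`` is introduced purely for illustration):

p = Tensor([4.])
gm = GradManager()
gm.attach(p)
gm.record()              # start recording, equivalent to entering the with block
z = w * p + b
gm.backward(z)           # dz/dp = w; as noted above, this also releases resources
print(p.grad)            # Tensor([2.], device=xpux:0)

If you only want to stop recording without computing gradients, :py:meth:`~.GradManager.release` can be used to discard the recorded history.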

Neural network programming example#

When training a neural network, we can use a gradient manager to perform the backpropagation computation and obtain the gradient information of the parameters:

gm = GradManager()
gm.attach(model.parameters())   # bind all trainable parameters of the model

for data in dataset:
    with gm:                    # record the forward computation
        loss = model(data)      # forward pass (the model here directly returns the loss)
        gm.backward(loss)       # compute the gradients of the parameters w.r.t. the loss
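
In practice, gradient computation is usually combined with an optimizer, which consumes the accumulated ``grad`` attributes to update the parameters. The following slightly more complete sketch uses assumed placeholders: ``model`` (a module returning predictions), ``dataset`` (an iterable of ``(data, label)`` pairs) and ``loss_fn`` (a loss function such as cross entropy) are all hypothetical:

import megengine.optimizer as optim
from megengine.autodiff import GradManager

gm = GradManager()
gm.attach(model.parameters())                   # bind all trainable parameters
opt = optim.SGD(model.parameters(), lr=0.01)    # plain SGD, chosen only for illustration

for data, label in dataset:
    with gm:
        pred = model(data)                      # forward pass
        loss = loss_fn(pred, label)             # compute the loss
        gm.backward(loss)                       # accumulate gradients into each parameter's grad
    opt.step()                                  # update parameters using the accumulated gradients
    opt.clear_grad()                            # reset grads to avoid accumulation across iterations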

A more complete usage example can be found in the Getting Started Tutorial of the MegEngine documentation.