Python assignment operators do not create copies of objects; they only bind a name to an object. For immutable objects, that usually doesn’t make a difference.

When working with mutable objects or collections of mutable objects, you might be looking for a way to create “real copies” or “clones” of these objects.
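For example, with a plain Python list:

a = [1, 2, 3]
b = a          # b is just another name for the same list
b.append(4)
print(a)       # [1, 2, 3, 4] -- the "original" changed too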

Essentially, you’ll sometimes want copies that you can modify without also modifying the original. In this article, we will explain the difference between detach(), clone(), copy.deepcopy(), detach().clone(), and clone().detach() for a PyTorch tensor.

Difference between detach() and clone()

tensor.detach() creates a new tensor that shares storage with the original but does not require grad. Use detach() when you want to remove a tensor from a computation graph: the returned tensor has requires_grad set to False.

import torch

x = torch.tensor([1.0], requires_grad=True)
y = x.detach()
print(y.requires_grad)  # False

A detached tensor shares storage with the original, which means that any modification made to the detached tensor also shows up in the attached version.

y[0] = 2.0
print(x)  # tensor([2.], requires_grad=True)

Another use case is when you want to clone/copy a non-parameter tensor without its autograd history; for that you should use .detach(). If you want to perform operations on a tensor without affecting the gradients of the original graph, detach it first: detach() does not affect the original graph.

x = torch.tensor([1.0], requires_grad=True)
y = x**2
z = 2*y
w = z**3

p = z
q = torch.tensor([2.0], requires_grad=True)

pq = p*q

pq.backward(retain_graph=True)

w.backward()
print(x.grad)  # tensor([56.])

Because p is just another name for z, the gradient from pq.backward() flows through z back to x and contributes 8; w.backward() adds another 48, so x.grad accumulates to 56.

x = torch.tensor([1.0], requires_grad=True)
y = x**2
z = 2*y
w = z**3

# detach it, so the gradient w.r.t. `p` does not affect `z`!
p = z.detach()
q = torch.tensor([2.0], requires_grad=True)
pq = p*q
pq.backward(retain_graph=True)

w.backward()
print(x.grad)  # tensor([48.])

The gradient is now 48, which is only the contribution from w. So detach() lets the gradient flow through the side path without disturbing the main path.

In short, .detach() gives a new tensor that is a view of the original one, so any in-place modification of one will affect the other.
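To verify the shared storage, here is a minimal sketch (the data_ptr() comparison and the torch.no_grad() guard are just one way to show it):

x = torch.tensor([1.0, 2.0], requires_grad=True)
y = x.detach()

print(x.data_ptr() == y.data_ptr())  # True -- same underlying storage

with torch.no_grad():
    x[0] = 9.0                       # modify the original in place ...
print(y)                             # tensor([9., 2.]) -- ... and the detached view sees it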

tensor.clone() creates a copy of the tensor that preserves the original tensor’s requires_grad field. Use clone() when you want a copy of the tensor that stays part of the computation graph it came from.

.clone() produces a new tensor instance with its own memory allocation for the tensor data. In addition, it remembers the history of the original tensor and stays connected to the earlier graph, appearing as CloneBackward in grad_fn. Its main advantage is that it is safer with respect to in-place operations.

x = torch.ones(5, requires_grad=True)

y = x.clone()*2
z = x.clone()*3

total = (y + z).sum()
total.backward()

print(x.grad)  # tensor([5., 5., 5., 5., 5.])

When you call clone() on a tensor, the clone is still on the graph, and any operation on it will be reflected in the graph. The clone has its own memory, so writing into it does not change the original tensor’s data, but operations on the clone (including in-place changes) do affect the graph computation when doing a backward pass.
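A minimal sketch of this behavior (the printed outputs are approximate):

x = torch.ones(3, requires_grad=True)
y = x.clone()

print(y.grad_fn)                     # <CloneBackward0 object at 0x...> -- the clone is on the graph
print(x.data_ptr() == y.data_ptr())  # False -- the clone has its own memory

y.sum().backward()
print(x.grad)                        # tensor([1., 1., 1.]) -- gradients flow back through the clone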

In short, .clone() gives a new tensor with the same content, backed by new memory.

Difference between detach().clone() and clone().detach()

They give an equivalent end result. The minor optimization of calling detach() first is that the clone operation won’t be tracked by autograd. If you clone first, autograd info is created for the clone, and after the detach it becomes unreachable and is deleted.

So the end result is the same, but clone().detach() does a bit of unnecessary work. In any meaningful workload you shouldn’t see a performance difference, so there is no need to worry too much about it.

If you want a new tensor backed by new memory that does not share the autograd history of the original one, use .detach().clone().
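A quick sketch of the equivalence (the data_ptr() checks simply confirm that the copy has its own memory):

x = torch.ones(3, requires_grad=True)

a = x.detach().clone()  # detach first: the clone op is never tracked by autograd
b = x.clone().detach()  # clone first: autograd info for the clone is created, then discarded

print(a.requires_grad, b.requires_grad)  # False False
print(a.data_ptr() == x.data_ptr())      # False -- new memory
print(b.data_ptr() == x.data_ptr())      # False -- new memory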

copy.deepcopy() vs clone()

copy.deepcopy() makes a deep copy of the original tensor, meaning it creates a new tensor instance with its own memory allocation for the tensor data.

import copy
import torch

x = torch.ones(5, requires_grad=True)
x_deepcopy = copy.deepcopy(x)

print(x, 'x')                    # tensor([1., 1., 1., 1., 1.], requires_grad=True) x
print(x_deepcopy, 'x deepcopy')  # tensor([1., 1., 1., 1., 1.], requires_grad=True) x deepcopy

copy.deepcopy() disregards any graph-related information and just copies the data as a plain object, while clone() creates a new tensor whose operations are still reflected in the graph; to prevent that, you need to use detach() as well.
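To illustrate (a minimal sketch; note that, at the time of writing, deepcopy of a non-leaf tensor raises a RuntimeError because only graph leaves support the deepcopy protocol):

import copy
import torch

x = torch.ones(3, requires_grad=True)
y = x * 2                    # non-leaf tensor, attached to the graph

print(y.clone().grad_fn)     # <CloneBackward0 ...> -- the clone stays on the graph

x_copy = copy.deepcopy(x)    # deepcopy of a leaf: data and requires_grad are copied ...
print(x_copy.grad_fn)        # None -- ... but no graph history
print(x_copy.is_leaf)        # True

try:
    copy.deepcopy(y)         # deepcopy of a non-leaf tensor is not supported
except RuntimeError as e:
    print(e)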

For tensors, in most cases you should go for clone(), since it is a PyTorch operation that will be recorded by autograd. When it comes to a Module, there is no clone() method available, so you can either use copy.deepcopy() or create a new instance of the model and just copy the parameters.
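For example (a minimal sketch using nn.Linear as a stand-in for any model):

import copy
import torch.nn as nn

model = nn.Linear(4, 2)

# Option 1: deep-copy the whole module (parameters, buffers, and attributes).
model_copy = copy.deepcopy(model)

# Option 2: build a fresh instance and copy the parameters over.
model_fresh = nn.Linear(4, 2)
model_fresh.load_state_dict(model.state_dict())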
