Estimated reading time: 3 minutes

Tensor Addition is a fundamental operation in tensor algebra. It involves adding corresponding elements of two tensors that have the same shape, resulting in a new tensor of the same shape where each element is the sum of the corresponding elements of the input tensors. When performed on a GPU using CUDA, this operation can be highly parallelized, leading to significant performance gains, especially for large tensors.
Code Example with PyTorch and CUDA
import torch

# Check if CUDA is available and set the device
if torch.cuda.is_available():
    device = torch.device("cuda")
    print(f"Using CUDA device: {torch.cuda.get_device_name(0)}")
else:
    device = torch.device("cpu")
    print("CUDA not available, using CPU")

# Define two tensors of the same shape
tensor_a = torch.tensor([[1, 2, 3], [4, 5, 6]], dtype=torch.float32).to(device)
tensor_b = torch.tensor([[7, 8, 9], [10, 11, 12]], dtype=torch.float32).to(device)

# Perform tensor addition
tensor_sum = torch.add(tensor_a, tensor_b)
# Alternatively, you can use the '+' operator:
# tensor_sum = tensor_a + tensor_b

# Print the result
print("Tensor A:\n", tensor_a)
print("Tensor B:\n", tensor_b)
print("Sum of Tensor A and Tensor B:\n", tensor_sum.cpu().numpy())
Code Explanation:
- import torch: Imports the PyTorch library.
- if torch.cuda.is_available(): ... else: ...: Checks for CUDA availability and sets the device accordingly.
- tensor_a = torch.tensor(...) and tensor_b = torch.tensor(...): Create two tensors and move them to the specified device.
- tensor_sum = torch.add(tensor_a, tensor_b): Performs element-wise addition.
- The print() statements display the original tensors and the result.
How CUDA Accelerates Tensor Addition
When these tensors are on the GPU, CUDA enables parallel computation of the addition operation. Each element-wise addition can be performed by a different thread running on the GPU’s numerous cores. This massive parallelism allows for significantly faster computation, especially for very large tensors, compared to performing the additions sequentially on a CPU.
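As a rough illustration of that speedup, the sketch below times the same element-wise addition on the CPU and on the GPU. The tensor size (10 million elements), the warm-up step, and the use of torch.cuda.synchronize() around the timed region are assumptions made for benchmarking purposes; actual results depend heavily on the hardware.

import time
import torch

# Illustrative benchmark only: the size and timing approach are assumptions.
n = 10_000_000  # hypothetical size, large enough for parallelism to matter

a_cpu = torch.rand(n)
b_cpu = torch.rand(n)

start = time.perf_counter()
_ = a_cpu + b_cpu  # element-wise addition on the CPU
cpu_time = time.perf_counter() - start

if torch.cuda.is_available():
    a_gpu = a_cpu.to("cuda")
    b_gpu = b_cpu.to("cuda")
    _ = a_gpu + b_gpu          # warm-up so CUDA initialization is not timed
    torch.cuda.synchronize()   # wait for the GPU before starting the timer

    start = time.perf_counter()
    _ = a_gpu + b_gpu          # each element can be handled by a separate GPU thread
    torch.cuda.synchronize()   # CUDA launches are asynchronous; wait for completion
    gpu_time = time.perf_counter() - start

    print(f"CPU: {cpu_time:.4f}s, GPU: {gpu_time:.4f}s")
else:
    print(f"CPU: {cpu_time:.4f}s (CUDA not available)")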
In Recurrent Neural Networks (RNNs) and Transformer architectures, hidden states are updated through a series of computations, and tensor addition is used to combine the components of these updates, for example in residual (skip) connections. Performing these additions on the GPU accelerates the sequential-data processing that LLMs depend on.
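As a minimal sketch of that idea, the residual connection below adds a sub-layer's output back to its input with a single tensor addition. The dimensions and the linear "sub-layer" are placeholder assumptions for illustration, not any specific model's architecture.

import torch

# Hypothetical sizes chosen only for the example
batch, seq_len, hidden_dim = 8, 16, 64
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

hidden_states = torch.rand(batch, seq_len, hidden_dim, device=device)
sublayer = torch.nn.Linear(hidden_dim, hidden_dim).to(device)  # stand-in for attention/FFN

# The residual update is an element-wise tensor addition, parallelized on the GPU
updated = hidden_states + sublayer(hidden_states)
print(updated.shape)  # torch.Size([8, 16, 64])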