
Element-wise tensor multiplication, also known as the Hadamard product, involves multiplying corresponding elements of two tensors that have the same shape. Utilizing CUDA on a GPU significantly accelerates this operation through parallel processing.
Code Example with PyTorch and CUDA
import torch

# Check if CUDA is available and set the device
if torch.cuda.is_available():
    device = torch.device("cuda")
    print(f"Using CUDA device: {torch.cuda.get_device_name(0)}")
else:
    device = torch.device("cpu")
    print("CUDA not available, using CPU")

# Define two tensors of the same shape
tensor_a = torch.tensor([[1, 2, 3], [4, 5, 6]], dtype=torch.float32).to(device)
tensor_b = torch.tensor([[7, 8, 9], [10, 11, 12]], dtype=torch.float32).to(device)

# Perform element-wise tensor multiplication
tensor_product = torch.mul(tensor_a, tensor_b)

# Alternatively, you can use the '*' operator:
# tensor_product = tensor_a * tensor_b

# Print the result
print("Tensor A:\n", tensor_a)
print("Tensor B:\n", tensor_b)
print("Element-wise Product of Tensor A and Tensor B:\n", tensor_product.cpu().numpy())
Code Explanation:
- import torch: Imports the PyTorch library.
- if torch.cuda.is_available(): ... else: ...: Checks for CUDA availability and sets the device accordingly.
- tensor_a = torch.tensor(...) and tensor_b = torch.tensor(...): Create two tensors and move them to the specified device.
- tensor_product = torch.mul(tensor_a, tensor_b): Performs element-wise multiplication.
- The print() statements display the original tensors and the result.
CUDA Acceleration of Element-wise Tensor Multiplication
When tensors are processed on a CUDA-enabled GPU, the element-wise multiplication operation is parallelized across the GPU’s numerous cores, leading to significant speedups.
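As a rough illustration, the sketch below times the same element-wise multiplication on CPU and GPU. The tensor size and iteration count are arbitrary choices for demonstration, and actual timings will vary by hardware; torch.cuda.synchronize() is needed because CUDA kernels launch asynchronously.

import time
import torch

size = (4096, 4096)  # arbitrary large shape, chosen only for illustration
a_cpu = torch.rand(size)
b_cpu = torch.rand(size)

# Time 100 element-wise multiplications on the CPU
start = time.perf_counter()
for _ in range(100):
    _ = a_cpu * b_cpu
print(f"CPU: {time.perf_counter() - start:.4f} s")

# Time the same operation on the GPU, if available
if torch.cuda.is_available():
    a_gpu = a_cpu.to("cuda")
    b_gpu = b_cpu.to("cuda")
    torch.cuda.synchronize()  # wait for the host-to-device transfers to finish
    start = time.perf_counter()
    for _ in range(100):
        _ = a_gpu * b_gpu
    torch.cuda.synchronize()  # wait for all queued kernels to complete
    print(f"GPU: {time.perf_counter() - start:.4f} s")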
In Transformer networks, element-wise multiplication is used to apply attention scores to value embeddings, scaling each embedding by its importance. Efficient execution on the GPU with CUDA is crucial for the performance of large language models (LLMs).
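A toy sketch of this idea follows; the shapes and variable names are illustrative and not taken from any particular model. Broadcasting expands a per-token weight to the full embedding width, so the scaling is a single element-wise multiplication:

import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

seq_len, d_model = 4, 8  # illustrative sizes
values = torch.rand(seq_len, d_model, device=device)  # one value embedding per token
weights = torch.rand(seq_len, 1, device=device)       # one attention weight per token

# Broadcasting stretches weights from (seq_len, 1) to (seq_len, d_model),
# so each token's value embedding is scaled element-wise by its weight.
scaled = values * weights
print(scaled.shape)  # torch.Size([4, 8])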