
In artificial neural networks, the fundamental building blocks are nodes (also called neurons or units). These nodes perform computations on incoming data and pass the result to other nodes in the network. A crucial component of each node is its activation function, which introduces non-linearity and determines the node’s output. The design and choice of these components are critical for the network’s ability to learn complex patterns from data (Nature Research on Neural Networks).
1. Nodes (Neurons/Units) in Detail:
- Computational Units: Nodes are the core processing units within a neural network, inspired by biological neurons. They receive one or more input signals (analogous to a biological neuron’s dendrites), process these signals, and transmit a single output signal (analogous to its axon) to other nodes. (deeplizard Definition, Khan Academy on Neurons)
- Inputs and Weights: Each node receives inputs ($x_i$) from the previous layer or the initial data. Each input connection has an associated weight ($w_i$), which represents the strength or influence of that input on the node’s activation. Higher weights indicate a stronger influence.
- Weighted Sum: Inside the node, the inputs are multiplied by their corresponding weights, and these weighted inputs are summed together. A bias term ($b$) is then added to this sum:
$z = \sum_{i=1}^{n} (w_i \cdot x_i) + b$
where:
- $x_i$ are the input values from the previous layer or input data.
- $w_i$ are the synaptic weights associated with each input.
- $b$ is the bias term, which acts like an offset, allowing the neuron to fire even when the weighted sum of inputs is zero. It helps the network learn more flexible decision boundaries.
- Activation Function: The calculated weighted sum ($z$) is then passed through the node’s activation function ($\sigma$). This non-linear function determines the output of the neuron based on its input. The choice of activation function significantly impacts the network’s learning capabilities and performance.
- Output: The output of the activation function, $a = \sigma(z)$, becomes the activation of the neuron and is transmitted as input to the neurons in the subsequent layer. This flow of activation propagates through the network until the output layer produces the final prediction. (A minimal code sketch of this per-node computation appears just after this list.)
- Organization into Layers: Nodes are typically organized into interconnected layers (IBM Explanation, TensorFlow Keras Layers):
- Input Nodes: These nodes form the input layer and receive the raw data or features. The number of input nodes matches the dimensionality of the input data.
- Hidden Nodes: These nodes reside in one or more hidden layers between the input and output layers. They perform intermediate computations, progressively extracting higher-level features from the input. The depth and width (number of nodes per layer) of the hidden layers are crucial hyperparameters.
- Output Nodes: These nodes form the output layer and produce the final predictions or classifications. The number of output nodes and their activation functions are tailored to the specific task (e.g., one node for binary classification, multiple nodes for multi-class classification or regression).
- More on Neural Network Nodes (GeeksforGeeks)
- TensorFlow Playground (Interactive Visualization) – Explore how nodes and layers interact.
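To make the per-node computation described above concrete, here is a minimal NumPy sketch of a single neuron computing $z = \sum_i w_i x_i + b$ followed by $a = \sigma(z)$. The input values, weights, bias, and the choice of sigmoid as the activation are illustrative assumptions, not values from any particular network.

```python
import numpy as np

def sigmoid(z):
    """Logistic activation: squashes any real z into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def neuron_forward(x, w, b, activation=sigmoid):
    """Single-node forward pass: weighted sum plus bias, then activation."""
    z = np.dot(w, x) + b          # z = sum_i(w_i * x_i) + b
    return activation(z)          # a = sigma(z)

# Illustrative inputs, weights, and bias (arbitrary example values).
x = np.array([0.5, -1.2, 3.0])    # inputs from the previous layer
w = np.array([0.8, 0.1, -0.4])    # one weight per input connection
b = 0.2                           # bias term

print(neuron_forward(x, w, b))    # the node's activation, passed to the next layer
```

A full layer repeats this computation for every node, which is why layer weights are usually stored as a matrix and the whole layer is computed as a single matrix-vector product.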
2. Activation Functions in Detail:
- Introducing Non-Linearity: As mentioned earlier, non-linear activation functions are essential for enabling neural networks to model complex, non-linear relationships in data. Without them, the network would be limited to linear transformations, regardless of its depth (V7 Labs Explanation, Towards Data Science on Non-linearity).
- Decision Making (Activation): Activation functions act as a thresholding mechanism, determining the extent to which a neuron’s input should contribute to the next layer. They control the “firing” or activation level of the neuron.
- Common Types of Activation Functions:
- Linear Activation Function (Identity Function):
- Formula: $\sigma(z) = z$
- Output Range: $(-\infty, +\infty)$
- Use Case: Primarily used in the output layer for regression tasks where the output can be any real value. Sometimes used in specific layers for identity mapping.
- Note: While simple, its lack of non-linearity limits its use in most parts of deep neural networks.
- Sigmoid (Logistic) Activation Function:
- Formula: $\sigma(z) = \frac{1}{1 + e^{-z}}$
- Output Range: $(0, 1)$
- Use Case: Historically used in the output layer for binary classification (output interpreted as the probability of belonging to the positive class). Also found in some older recurrent neural networks. However, it can suffer from the vanishing gradient problem, especially in deep networks, where gradients become very small, hindering learning. It’s also not zero-centered.
- Sigmoid Function on Wikipedia
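As a rough illustration of the vanishing-gradient issue mentioned above, the sketch below (assuming NumPy) evaluates the sigmoid and its derivative $\sigma'(z) = \sigma(z)\,(1 - \sigma(z))$ at a few points. The derivative never exceeds 0.25 and shrinks rapidly for large $|z|$, so when many such factors are multiplied during backpropagation through deep networks, the gradient can become vanishingly small.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)   # peaks at 0.25 when z = 0

for z in [0.0, 2.0, 5.0, 10.0]:
    print(f"z={z:5.1f}  sigmoid={sigmoid(z):.5f}  gradient={sigmoid_grad(z):.5f}")
# The gradient collapses toward 0 as |z| grows, which is what slows learning
# in deep stacks of sigmoid layers.
```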
- Tanh (Hyperbolic Tangent) Activation Function:
- Formula: $\sigma(z) = \frac{e^z - e^{-z}}{e^z + e^{-z}}$
- Output Range: $(-1, 1)$
- Use Case: Similar to the sigmoid function but is zero-centered, which can sometimes lead to faster convergence during training. However, it also suffers from the vanishing gradient problem in deep networks.
- Hyperbolic Functions on Wikipedia
- ReLU (Rectified Linear Unit) Activation Function:
- Formula: $\sigma(z) = \max(0, z)$
- Output Range: $[0, +\infty)$
- Use Case: A very popular choice for hidden layers in many types of neural networks due to its simplicity and computational efficiency. It helps to alleviate the vanishing gradient problem for positive inputs as the gradient is 1. A major drawback is the “dying ReLU” problem, where neurons can become inactive if their input is consistently negative.
- ReLU on Wikipedia
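A small sketch (again assuming NumPy) of ReLU and its gradient makes the “dying ReLU” point concrete: for any negative pre-activation, both the output and the gradient are exactly zero, so a neuron stuck in that regime receives no weight updates.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def relu_grad(z):
    # Gradient is 1 for positive inputs and 0 for negative inputs.
    return (z > 0).astype(float)

z = np.array([-3.0, -0.5, 0.0, 0.5, 3.0])
print("output:  ", relu(z))       # [0.  0.  0.  0.5 3. ]
print("gradient:", relu_grad(z))  # [0. 0. 0. 1. 1.] -- zero gradient means no learning
```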
- Leaky ReLU:
- Formula: $\sigma(z) = \max(\alpha z, z)$, where $\alpha$ is a small positive constant (commonly around 0.01).
- Output Range: $(-\infty, +\infty)$
- Use Case: Addresses the dying ReLU problem by allowing a small, non-zero gradient for negative inputs, ensuring that the neuron doesn’t become completely inactive.
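A minimal Leaky ReLU sketch, assuming NumPy and the common (but tunable) slope $\alpha = 0.01$: negative inputs keep a small output and a small non-zero gradient, so the neuron can recover rather than dying.

```python
import numpy as np

def leaky_relu(z, alpha=0.01):
    # alpha is an illustrative default; it is a tunable hyperparameter.
    return np.where(z > 0, z, alpha * z)

z = np.array([-3.0, -0.5, 0.5, 3.0])
print(leaky_relu(z))  # negative inputs are scaled by alpha instead of being zeroed out
```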
- Softmax Activation Function:
- Formula: $\sigma(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}$, where $K$ is the number of output classes and $z_i$ is the $i$-th element of the input vector to the output layer.
- Output Range: $(0, 1)$ for each output, with the sum of all outputs equal to 1. This represents a probability distribution over the $K$ classes.
- Use Case: Almost exclusively used in the output layer for multi-class classification problems, providing the probability of the input belonging to each class.
- Softmax Function on Wikipedia
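The softmax sketch below (NumPy assumed) subtracts the maximum logit before exponentiating; this is a standard numerical-stability trick that leaves the result unchanged, because softmax is invariant to adding a constant to every $z_j$.

```python
import numpy as np

def softmax(z):
    # Shift by the max logit for numerical stability; the result is unchanged.
    exp_z = np.exp(z - np.max(z))
    return exp_z / np.sum(exp_z)

logits = np.array([2.0, 1.0, 0.1])   # illustrative raw outputs for 3 classes
probs = softmax(logits)
print(probs)            # approximately [0.659 0.242 0.099]
print(probs.sum())      # 1.0 -- a valid probability distribution over the classes
```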
- Other Activation Functions: Many other activation functions exist, each with its own characteristics and use cases, including:
- ELU (Exponential Linear Unit): Similar to Leaky ReLU but with a smoother transition for negative inputs.
- SELU (Scaled Exponential Linear Unit): A self-normalizing activation function that can help stabilize learning in deep networks.
- Swish: A relatively new activation function that has shown promising results in some deep learning tasks.
- Overview of More Activation Functions (ML Cheat Sheet)
- Choosing the Right Activation Function: The selection of the appropriate activation function is a crucial hyperparameter tuning step. The choice depends on factors such as the type of layer (hidden or output), the nature of the problem (classification or regression), and the potential issues like vanishing gradients. ReLU and its variants are often a good starting point for hidden layers, while the output layer’s activation is dictated by the desired output format (Choosing Activation Functions (Towards Data Science), TensorFlow Keras Activation Layers).
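As a hedged illustration of these guidelines, the Keras sketch below uses ReLU in the hidden layers and lets the task dictate the output activation. The layer sizes, input dimensionality, and the ten-class setup are made-up example values, not a recommended architecture.

```python
import tensorflow as tf

# Hypothetical 10-class classifier over 32-dimensional inputs.
classifier = tf.keras.Sequential([
    tf.keras.Input(shape=(32,)),
    tf.keras.layers.Dense(64, activation="relu"),    # ReLU in hidden layers
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"), # softmax for multi-class output
])

# For binary classification the last layer would typically be
# Dense(1, activation="sigmoid"); for regression, Dense(1) with a
# linear (identity) activation.
classifier.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```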
In summary, nodes are the fundamental computational units in neural networks, and activation functions are essential non-linear transformations applied within these nodes. The careful design and selection of nodes and their activation functions are key to building effective and powerful neural network models capable of learning intricate patterns from data.