K-Nearest Neighbors (KNN)
KNN is a simple yet effective algorithm for classification and regression. For classification, it assigns a new data point the majority class among its K nearest neighbors in the feature space; for regression, it predicts the average of those neighbors' target values.
Use Cases:
- Image recognition.
- Recommendation systems.
- Pattern recognition.
Sample Data:
import numpy as np
# Features (Feature 1, Feature 2)
neighbor_features = np.array([[1, 1], [1, 2], [2, 0], [5, 5], [5, 6], [6, 4]])
# Target (Class)
neighbor_labels = np.array([0, 0, 0, 1, 1, 1])
neighbor_features are the coordinates of data points, and neighbor_labels are their corresponding classes.
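Before the scikit-learn version below, here is a minimal hand-rolled sketch of what "K nearest neighbors plus majority vote" means on this sample data. It assumes plain Euclidean distance and a hypothetical query point [3, 3], and is meant only to illustrate the idea, not to replace the library call.
import numpy as np
neighbor_features = np.array([[1, 1], [1, 2], [2, 0], [5, 5], [5, 6], [6, 4]])
neighbor_labels = np.array([0, 0, 0, 1, 1, 1])
query = np.array([3, 3])  # hypothetical new point to classify
k = 3
# Euclidean distance from the query to every training point
distances = np.linalg.norm(neighbor_features - query, axis=1)
# Indices of the k closest points, then a majority vote over their labels
nearest = np.argsort(distances)[:k]
predicted = np.bincount(neighbor_labels[nearest]).argmax()
print(f"Manual 3-NN prediction for {query.tolist()}: {predicted}")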
Code Implementation (Classification Example):
from sklearn.neighbors import KNeighborsClassifier
import numpy as np
X = np.array([[1, 1], [1, 2], [2, 0], [5, 5], [5, 6], [6, 4]])
y = np.array([0, 0, 0, 1, 1, 1])
model = KNeighborsClassifier(n_neighbors=3)
model.fit(X, y)
new_X = np.array([[3, 3]])
predicted_class = model.predict(new_X)
print(f"Predicted class for [3, 3]: {predicted_class}")
Code Explanation:
- `from sklearn.neighbors import KNeighborsClassifier`: Imports the KNeighborsClassifier class for classification. For regression, use KNeighborsRegressor (a minimal sketch follows after this list).
- `n_neighbors=3`: Specifies the number of nearest neighbors to consider.
- The rest of the steps are similar to other classification algorithms.
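Since the explanation mentions KNeighborsRegressor, a minimal sketch of the regression variant may help. The one-dimensional data below is hypothetical (not taken from the sample above); the prediction is simply the average of the three nearest targets.
from sklearn.neighbors import KNeighborsRegressor
import numpy as np
# Hypothetical data: one input feature and a continuous target
X = np.array([[1], [2], [3], [4], [5], [6]])
y = np.array([1.2, 1.9, 3.1, 3.9, 5.2, 6.1])
# Averages the targets of the 3 nearest neighbors instead of taking a majority vote
model = KNeighborsRegressor(n_neighbors=3)
model.fit(X, y)
print(f"Predicted value for [3.5]: {model.predict(np.array([[3.5]]))[0]:.2f}")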
Support Vector Regression (SVR)
SVR is the regression counterpart of Support Vector Machines. It aims to find a function whose predictions deviate from the actual training targets by at most a specified margin (epsilon), while remaining as flat as possible.
Use Cases:
- Time series prediction.
- Financial forecasting.
- Demand forecasting.
Sample Data:
import numpy as np
# Features (Input feature)
svr_features = np.array([[1], [2], [3], [4], [5]])
# Target (Continuous value)
svr_target = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
svr_features are the input values, and svr_target are the corresponding continuous target values.
Code Implementation:
from sklearn.svm import SVR
import numpy as np
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
model = SVR(kernel='rbf')
model.fit(X, y)
new_X = np.array([[6]])
predicted_value = model.predict(new_X)
print(f"Predicted value for [6]: {predicted_value}")
Code Explanation:
- `from sklearn.svm import SVR`: Imports the SVR class.
- `kernel='rbf'`: Specifies the Radial Basis Function kernel. Other kernels such as 'linear', 'poly', and 'sigmoid' can also be used (a sketch with a linear kernel follows after this list).
- The rest of the steps are similar to other regression algorithms.
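Since several kernels are possible, here is a brief sketch using the same sample data with a linear kernel and explicitly set SVR hyperparameters; the values of C and epsilon below are illustrative, not tuned.
from sklearn.svm import SVR
import numpy as np
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
# C controls regularization strength; epsilon sets the width of the "no-penalty" tube
linear_model = SVR(kernel='linear', C=10.0, epsilon=0.1)
linear_model.fit(X, y)
print(f"Predicted value for [6]: {linear_model.predict(np.array([[6]]))[0]:.2f}")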
Independent Component Analysis (ICA)
ICA is a statistical and computational technique for revealing hidden factors that underlie sets of random variables, measurements, or signals.
Use Cases:
- Signal processing (e.g., separating audio sources).
- Biomedical signal analysis (e.g., EEG analysis).
- Financial data analysis.
Sample Data:
import numpy as np
# Mixed signals (each row is a mixed signal)
mixed_signals = np.array([[1.0, 2.0], [3.0, 1.5], [1.5, 2.5]])
mixed_signals represents data where independent sources have been mixed together.
Code Implementation:
from sklearn.decomposition import FastICA
import numpy as np
X = np.array([[1.0, 2.0], [3.0, 1.5], [1.5, 2.5]])
model = FastICA(n_components=2, random_state=0)
S = model.fit_transform(X) # Estimated source signals
A = model.mixing_ # Estimated mixing matrix
print("Estimated source signals:\n", S)
print("\nEstimated mixing matrix:\n", A)
Code Explanation:
- `from sklearn.decomposition import FastICA`: Imports the FastICA class, a common implementation of ICA.
- `n_components=2`: Specifies the number of independent components to extract.
- `model.fit_transform(X)`: Fits the ICA model to the mixed data and transforms it to estimate the independent source signals (`S`).
- `model.mixing_`: Provides the estimated mixing matrix (`A`) that combined the original sources (a more complete separation sketch follows after this list).
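The tiny 3-by-2 matrix above only shows the API; ICA is easier to appreciate with longer signals. The following is a rough sketch, under assumed synthetic sources (a sine wave and a square wave) and an assumed mixing matrix, of how FastICA separates mixed signals; recovered sources may come back in a different order and scale.
import numpy as np
from sklearn.decomposition import FastICA
# Two hypothetical independent sources sampled over time
t = np.linspace(0, 8, 2000)
sources = np.c_[np.sin(2 * t), np.sign(np.sin(3 * t))]
# Assumed mixing matrix: each observed signal is a weighted sum of the sources
mixing = np.array([[1.0, 0.5], [0.5, 2.0]])
mixed = sources @ mixing.T
ica = FastICA(n_components=2, random_state=0)
recovered = ica.fit_transform(mixed)  # estimated sources, shape (2000, 2)
print("Recovered sources shape:", recovered.shape)
print("Estimated mixing matrix:\n", ica.mixing_)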
Singular Value Decomposition (SVD)
SVD is a matrix factorization technique that decomposes a matrix into the product of three other matrices. It has applications in dimensionality reduction and recommendation systems.
Use Cases:
- Recommendation systems (e.g., collaborative filtering).
- Dimensionality reduction.
- Information retrieval (e.g., Latent Semantic Analysis).
Sample Data:
import numpy as np
# A user-item rating matrix (rows are users, columns are items, values are ratings)
ratings_matrix = np.array([[5, 1, 0, 0], [1, 0, 0, 4], [0, 0, 5, 0], [0, 3, 0, 0]])
ratings_matrix represents user ratings for different items, with 0 indicating no rating.
Code Implementation:
import numpy as np
from numpy.linalg import svd
A = np.array([[5, 1, 0, 0], [1, 0, 0, 4], [0, 0, 5, 0], [0, 3, 0, 0]])
U, s, Vh = svd(A)
print("U (User features):\n", U)
print("\ns (Singular values):\n", s)
print("\nVh (Item features):\n", Vh)
# You can use the decomposed matrices for various tasks like recommendation
# by reconstructing a lower-rank approximation of the original matrix.
k = 2 # Number of components to keep
Ak_approx = U[:, :k] @ np.diag(s[:k]) @ Vh[:k, :]
print("\nLower-rank approximation (k=2):\n", Ak_approx)
Code Explanation:
- `from numpy.linalg import svd`: Imports the Singular Value Decomposition function from NumPy's linear algebra module.
- `A`: The matrix to be decomposed (e.g., the user-item rating matrix).
- `U, s, Vh = svd(A)`: Performs the SVD, resulting in three matrices:
  - `U`: The left singular vectors (representing user features).
  - `s`: The singular values (indicating the importance of each component).
  - `Vh`: The right singular vectors (representing item features, transposed).
- The code then demonstrates how to reconstruct a lower-rank approximation of the original matrix by keeping only the top `k` singular values and corresponding singular vectors. This lower-rank matrix can be used for recommendations or dimensionality reduction, as in the sketch after this list.
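To connect this to the recommendation use case, one rough sketch is to treat the low-rank scores as predicted affinities and, for each user, suggest the unrated item with the highest approximated score. This is a deliberate simplification of collaborative filtering, shown only to illustrate how the decomposition might be used.
import numpy as np
from numpy.linalg import svd
A = np.array([[5, 1, 0, 0], [1, 0, 0, 4], [0, 0, 5, 0], [0, 3, 0, 0]])
U, s, Vh = svd(A)
k = 2
Ak_approx = U[:, :k] @ np.diag(s[:k]) @ Vh[:k, :]
# For each user, pick the item they have not rated (original value 0)
# with the highest score in the low-rank approximation
for user in range(A.shape[0]):
    unrated = np.where(A[user] == 0)[0]
    best_item = unrated[np.argmax(Ak_approx[user, unrated])]
    print(f"User {user}: recommend item {best_item}")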