Category: Data structure

  • Efficient String Search algorithms among Millions of Strings

    Efficient String Search in a Large List (2025). Searching for a specific string within a list containing millions of entries requires efficient algorithms and data structures to avoid performance bottlenecks. A simple linear search, which checks every entry in turn, would be highly inefficient at this scale. Here are several efficient ways to tackle this problem in 2025: 1. Using a …

  • Most used Search Algorithms

    Search Algorithms for Techies (2025). For techies, understanding search algorithms is fundamental. Whether you’re working with databases, web search, AI, or even game development, efficient search is often at the core of your applications. Here’s a look at essential search algorithms in 2025, categorized for clarity. Basic Search Algorithms: Linear Search (Sequential Search): A straightforward …

  • Top 8 Essential Data Structures You Should Know

    Data structures are fundamental building blocks in computer science, enabling efficient organization and manipulation of data. Understanding these structures is crucial for writing effective and performant code. Here are eight of the most commonly used data structures: 1. Arrays (and Python Lists): An array is a contiguous block of memory used to store a collection …

  • Detail of Parquet

    Parquet is a column-oriented file format designed for efficient data storage and retrieval. It is an open-source project within the Apache Hadoop ecosystem. Here’s a breakdown of its key aspects: Key Characteristics; Advantages of Using Parquet; Disadvantages of Using Parquet; Parquet vs. Other Data Formats. In summary, Parquet is a powerful and …

  • Data Lake vs. Data Lakehouse: Understanding Modern Data Architectures

    Organizations today grapple with ever-increasing volumes and varieties of data. To effectively store, manage, and analyze this data, different architectural approaches have emerged. Two prominent concepts in this landscape are the data lake and the data lakehouse. While both aim to provide a centralized data repository, they differ significantly in their design principles and capabilities.

  • Apache Spark

    Let’s illustrate Apache Spark with a classic “word count” example using PySpark (the Python API for Spark). This example demonstrates the fundamental concepts of distributed data processing with Spark. Scenario: You have a large text file (or multiple files) and you want to count the occurrences of each unique word in the file(s). Steps: from …

  • What is a Tensor

    In computer science, especially in machine learning and deep learning, a tensor is a fundamental data structure. Think of it as a generalization of vectors and matrices to potentially higher dimensions. Here’s a breakdown of how to understand tensors: Key Properties of Tensors; Why are Tensors Important in Machine …

  • Tensor

    PyTorch’s fundamental data structure is the Tensor. It’s the central object for numerical computation in PyTorch, analogous to NumPy’s ndarray but with added capabilities for GPU acceleration and automatic differentiation (crucial for deep learning). Here’s a breakdown of PyTorch’s data structure landscape, with the Tensor at the core: 1. Tensors (torch.Tensor); 2. NumPy Arrays (numpy.ndarray) …