Using a Local LLM for Document Extraction

Estimated reading time: 5 minutes

Non-Cloud LLM for Document Extraction

This guide explains how to use a non-cloud version of a pretrained Large Language Model (LLM) for document extraction, focusing on open-source models and local execution.

Phase 1: Setting Up Your Local Environment

1. Hardware Requirements

Ensure your system meets the following recommendations; a quick environment check follows the list:

  • GPU/CPU: An NVIDIA GPU with sufficient VRAM (8GB+) is highly recommended for faster inference. Otherwise, a powerful multi-core CPU is necessary.
  • RAM: Adequate system RAM to load the model and process documents. Requirements vary by model size.
  • Storage: Sufficient disk space to store the LLM weights (can range from GBs to hundreds of GBs).
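
Before downloading large model weights, a quick environment check can confirm what the machine has available. The sketch below reports CUDA GPU availability and VRAM via PyTorch (installed in step 2) and free disk space; adapt it to your own setup.

        import shutil
        import torch

        # Report GPU availability and total VRAM
        if torch.cuda.is_available():
            vram_gb = torch.cuda.get_device_properties(0).total_memory / 1e9
            print(f"GPU: {torch.cuda.get_device_name(0)} ({vram_gb:.1f} GB VRAM)")
        else:
            print("No CUDA GPU detected; inference will run on the CPU.")

        # Report free disk space in the current directory for storing model weights
        free_gb = shutil.disk_usage(".").free / 1e9
        print(f"Free disk space: {free_gb:.1f} GB")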

2. Software Installation

Install the necessary software and libraries:


        pip install pip --upgrade
        pip install torch torchvision torchaudio  # For PyTorch (if your model uses it)
        pip install tensorflow  # For TensorFlow (if your model uses it)
        pip install transformers sentencepiece accelerate  # Hugging Face Transformers
        pip install PyPDF2 Pillow opencv-python  # For document handling
        pip install pytesseract  # For OCR (if needed)
        # Optional:
        # Follow instructions for llama.cpp or Ollama from their respective sites.
    

3. Downloading a Pretrained LLM

Download a suitable open-source LLM from the Hugging Face Hub. Consider models like Mixtral, Llama 2, or smaller models depending on your resources.

Using transformers:


        from transformers import AutoTokenizer, AutoModelForCausalLM
        import torch

        model_name = "mistralai/Mixtral-8x7B-Instruct-v0.1"  # Replace with your chosen model (Mixtral-8x7B is very large; pick a smaller model on limited hardware)
        tokenizer = AutoTokenizer.from_pretrained(model_name)
        model = AutoModelForCausalLM.from_pretrained(model_name)

        if torch.cuda.is_available():
            model = model.to("cuda")  # Move the model to the GPU when one is available
        print(f"Model '{model_name}' loaded successfully.")
    

Using Ollama:


        ollama pull mistral:latest
        echo "Mistral model pulled successfully (if Ollama is installed)."
    
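
Before scripting against Ollama, an optional smoke test from the shell can confirm the model was pulled and responds (this assumes the Ollama daemon is running locally):

        ollama list
        ollama run mistral:latest "Reply with a short greeting."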

Phase 2: Implementing the Document Extraction

1. Loading Your Document

Load the document content based on its format:

PDF:


        from PyPDF2 import PdfReader

        def extract_text_from_pdf(pdf_path):
            text = ""
            with open(pdf_path, 'rb') as file:
                reader = PdfReader(file)
                for page in reader.pages:
                    text += page.extract_text() or ""  # Guard against pages with no extractable text
            return text

        pdf_text = extract_text_from_pdf("your_document.pdf")
        print("PDF text extracted.")
    

Image (Scanned Document):


        # Note: pytesseract requires the Tesseract OCR engine to be installed on the system,
        # in addition to the Python package.
        from PIL import Image
        import pytesseract

        def extract_text_from_image(image_path):
            img = Image.open(image_path)
            text = pytesseract.image_to_string(img)
            return text

        image_text = extract_text_from_image("scanned_document.png")
        print("Text extracted from image.")
    

Text File:


        with open("document.txt", "r", encoding="utf-8") as f:
            text_file_content = f.read()
        print("Text file content loaded.")
    

2. Preparing the Input Prompt

Create a clear and concise prompt to instruct the LLM on what information to extract and the desired output format (e.g., JSON).


        # Use whichever document source was loaded above (PDF, image, or plain text)
        if 'pdf_text' in locals():
            document_text = pdf_text
        elif 'image_text' in locals():
            document_text = image_text
        elif 'text_file_content' in locals():
            document_text = text_file_content
        else:
            document_text = ""

        prompt = f"""Extract the following information from the document:
        - Invoice Number
        - Date
        - Customer Name
        - Total Amount

        Document:
        {document_text}

        Output the information as a JSON object.
        """
        print("Prompt created.")
    

3. Interacting with the Local LLM

Send the prompt to your loaded LLM for processing.

Using transformers:


        inputs = tokenizer(prompt, return_tensors="pt")
        if torch.cuda.is_available():
            inputs = inputs.to("cuda")

        outputs = model.generate(**inputs, max_new_tokens=500)
        extracted_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
        print("\nLLM Output:")
        print(extracted_text)

        import json
        try:
            extracted_data = json.loads(extracted_text.split("```json")[-1].split("```")[0].strip())
            print("\nExtracted Data (JSON):")
            print(extracted_data)
        except (json.JSONDecodeError, IndexError):
            print("\nCould not parse JSON from the output.")
            print("Raw output:", extracted_text)
    
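
Instruct-tuned models often respond more reliably when the prompt is wrapped in their chat template rather than passed as raw text. A minimal sketch, assuming the tokenizer defines a chat template:

        # Optional: wrap the prompt in the model's chat template before generating
        messages = [{"role": "user", "content": prompt}]
        chat_inputs = tokenizer.apply_chat_template(
            messages, add_generation_prompt=True, return_tensors="pt"
        )
        if torch.cuda.is_available():
            chat_inputs = chat_inputs.to("cuda")
        outputs = model.generate(chat_inputs, max_new_tokens=500)
        print(tokenizer.decode(outputs[0], skip_special_tokens=True))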

Using Ollama:


        import requests
        import json

        data = {
            "prompt": prompt,
            "model": "mistral:latest",  # Ensure Ollama is running and this model is pulled
            "stream": False
        }

        response = requests.post('http://localhost:11434/api/generate', json=data)
        if response.status_code == 200:
            extracted_text = response.json()['response'].strip()
            print("\nLLM Output:")
            print(extracted_text)
            try:
                extracted_data = json.loads(extracted_text)
                print("\nExtracted Data (JSON):")
                print(extracted_data)
            except json.JSONDecodeError:
                print("\nCould not parse JSON from the output.")
                print("Raw output:", extracted_text)
        else:
            print(f"Ollama API error: {response.status_code} - {response.text}")
    

Phase 3: Considerations for Non-Cloud LLM Usage

  • Resource Management: Monitor your system resources, especially GPU and RAM usage.
  • Model Selection: Experiment with different open-source models to find the best balance of accuracy and resource consumption.
  • Prompt Engineering: Refine your prompts for better accuracy and desired output format.
  • Fine-tuning: Consider fine-tuning on your specific document types for improved results.
  • Chunking: For large documents, implement chunking strategies to fit within the model’s context window (see the sketch after this list).
  • Error Handling: Add error handling for file operations, OCR, and JSON parsing.
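
As a starting point for chunking, the sketch below splits the document into overlapping character-based pieces; the sizes are illustrative, and token-aware splitting with the model's tokenizer is more precise. Each chunk can then be substituted for the document text in the Phase 2 prompt and sent through the same generation code.

        def chunk_text(text, max_chars=4000, overlap=200):
            """Split text into overlapping character-based chunks (illustrative sizes)."""
            chunks = []
            start = 0
            while start < len(text):
                end = min(start + max_chars, len(text))
                chunks.append(text[start:end])
                start = end if end == len(text) else end - overlap
            return chunks

        for i, chunk in enumerate(chunk_text(document_text)):
            print(f"Chunk {i}: {len(chunk)} characters")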
