AI Framework

LangChain + DocDigitizer

Use DocDigitizer as a document loader in LangChain RAG and agent pipelines. Structured extraction, not raw text.

Quick Start

bash

pip install langchain-community docdigitizer

Features

Document Loader

Standard LangChain BaseLoader interface.

Structured Metadata

Extracted fields are returned in doc.metadata, not mixed into page_content.

Batch Loading

Directory paths with configurable concurrency.

RAG-Optimized

Chunks by document section, not character count.

Agent Tool

Schema-Aware Chunking

Define a schema once and let it drive chunking.
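To make "chunks by document section, not character count" concrete, here is a minimal standalone sketch. It does not use DocDigitizer or LangChain; the `Chunk` class, the `chunk_by_section` helper, and the rule that a section starts at a numbered heading are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class Chunk:
    section: str  # heading the chunk belongs to
    text: str     # body text found under that heading


# Assumption: a section starts at a numbered heading such as "1. Parties".
HEADING = re.compile(r"^\d+\.\s+(.+)$")


def chunk_by_section(text: str) -> list[Chunk]:
    """Split text into one chunk per numbered section heading."""
    chunks: list[Chunk] = []
    title, body = "Preamble", []
    for line in text.splitlines():
        match = HEADING.match(line.strip())
        if match:
            # Close the previous section before starting the next one.
            # Sections with no body text are skipped.
            if body:
                chunks.append(Chunk(title, "\n".join(body)))
            title, body = match.group(1), []
        elif line.strip():
            body.append(line.strip())
    if body:
        chunks.append(Chunk(title, "\n".join(body)))
    return chunks


contract = """\
This agreement is made on 1 March 2024.

1. Parties
Acme Ltd and Globex Inc.

2. Term
Twelve months from the effective date.
"""

chunks = chunk_by_section(contract)
for chunk in chunks:
    print(chunk.section, "->", chunk.text)
```

Each chunk keeps a whole clause together, so a retriever never returns half of a "Term" section cut off at an arbitrary character offset.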

How It Works

Initialize

Create loader with API key and document path

Load

Call .load() to get documents with structured metadata

Index

Store in FAISS, Chroma, or Pinecone

Query

Metadata filters combined with semantic search

python

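from langchain_community.document_loaders import DocDigitizerLoader

# Initialize: create the loader with your API key and a document path.
loader = DocDigitizerLoader(
    api_key="dd-YOUR_KEY",  # placeholder; substitute your own key
    file_path="contract.pdf",
)

# Load: returns a list of LangChain Document objects.
docs = loader.load()

# Extracted fields live in metadata, not in page_content.
print(docs[0].metadata["parties"])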

Structured metadata, not raw text chunks
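The Index and Query steps can be sketched without any external service. The example below is a toy, in-memory stand-in for a vector store such as FAISS or Chroma: the `Doc` and `TinyIndex` classes, the bag-of-words `embed` function, and the `doc_type` and `section` metadata keys are all hypothetical and are not part of DocDigitizer or LangChain. It only illustrates the idea of applying a metadata filter first and ranking the remaining documents by similarity.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Doc:
    page_content: str
    metadata: dict = field(default_factory=dict)


def embed(text: str) -> Counter:
    """Toy embedding: bag-of-words counts (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class TinyIndex:
    """In-memory stand-in for a vector store."""

    def __init__(self, docs):
        self.entries = [(doc, embed(doc.page_content)) for doc in docs]

    def search(self, query, metadata_filter=None, k=1):
        wanted = metadata_filter or {}
        query_vec = embed(query)
        # Step 1: keep only documents whose metadata matches the filter exactly.
        candidates = [
            (doc, vec)
            for doc, vec in self.entries
            if all(doc.metadata.get(key) == value for key, value in wanted.items())
        ]
        # Step 2: rank the survivors by similarity to the query.
        candidates.sort(key=lambda pair: cosine(query_vec, pair[1]), reverse=True)
        return [doc for doc, _ in candidates[:k]]


docs = [
    Doc("Payment is due within 30 days of invoice.",
        {"doc_type": "contract", "section": "Payment"}),
    Doc("Either party may terminate with 60 days notice.",
        {"doc_type": "contract", "section": "Termination"}),
    Doc("Invoice total due: 1200 EUR, payment by bank transfer.",
        {"doc_type": "invoice", "section": "Total"}),
]

index = TinyIndex(docs)
hits = index.search("when is payment due", metadata_filter={"doc_type": "contract"})
print(hits[0].metadata["section"])  # Payment
```

With a real vector store the filter and the similarity ranking are typically handled in a single search call; how the filter is expressed differs between stores, so check the documentation of the one you choose.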

Ready to extract?

Get your API key in 30 seconds. First 50 extractions free.
