
LangChain + DocDigitizer

Use DocDigitizer as a document loader in LangChain RAG and agent pipelines. Structured extraction, not raw text.

Quick Start

```bash
pip install langchain-community docdigitizer
```

Features

📄

Document Loader

Standard LangChain BaseLoader interface.
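In recent versions of LangChain, a `BaseLoader` subclass implements `lazy_load()`, which yields `Document` objects, and the inherited `load()` collects them into a list. The sketch below mirrors that contract with a stand-in `Document` dataclass so it runs without LangChain installed. `SketchLoader` and the `extract` callable are hypothetical names for illustration, not part of the DocDigitizer or LangChain APIs.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Iterator, List


@dataclass
class Document:
    """Stand-in for LangChain's Document: the same two fields."""
    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


class SketchLoader:
    """Hypothetical loader following the BaseLoader pattern: lazy_load() plus load()."""

    def __init__(self, file_path: str, extract: Callable[[str], Dict[str, Any]]):
        self.file_path = file_path
        self.extract = extract  # hypothetical extraction callable (e.g. an API client call)

    def lazy_load(self) -> Iterator[Document]:
        # Yield one Document per file; extracted fields go into metadata.
        result = self.extract(self.file_path)
        yield Document(
            page_content=result.get("text", ""),
            metadata={"source": self.file_path, **result.get("fields", {})},
        )

    def load(self) -> List[Document]:
        # Same idea as BaseLoader.load(): materialize the lazy iterator.
        return list(self.lazy_load())


def fake_extract(path: str) -> Dict[str, Any]:
    # Hypothetical canned response standing in for a real extraction call.
    return {"text": "This Agreement is made...", "fields": {"parties": ["Acme", "Globex"]}}


docs = SketchLoader("contract.pdf", fake_extract).load()
print(docs[0].metadata["parties"])  # ['Acme', 'Globex']
```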

{}

Structured Metadata

Extracted fields are stored in doc.metadata, not in page_content.
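Keeping extracted fields in metadata makes them filterable, but some vector stores accept only scalar (string, number, boolean) metadata values. A minimal sketch, assuming the extraction comes back as nested JSON, flattens it into scalar keys before indexing; `flatten_metadata` and the sample fields are hypothetical. LangChain also ships a `filter_complex_metadata` helper that drops such values rather than converting them.

```python
from typing import Any, Dict


def flatten_metadata(fields: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested extraction output into scalar metadata values.

    Nested dicts become dotted keys, lists are joined into one string,
    None values are dropped, and any other type is converted with str().
    """
    flat: Dict[str, Any] = {}
    for key, value in fields.items():
        name = f"{prefix}{key}"
        if value is None:
            continue
        if isinstance(value, dict):
            flat.update(flatten_metadata(value, prefix=f"{name}."))
        elif isinstance(value, list):
            flat[name] = ", ".join(str(v) for v in value)
        elif isinstance(value, (str, int, float, bool)):
            flat[name] = value
        else:
            flat[name] = str(value)
    return flat


extracted = {  # hypothetical extraction result for a contract
    "parties": ["Acme Corp", "Globex Ltd"],
    "payment": {"amount": 12000, "currency": "EUR"},
    "signed": True,
    "notes": None,
}
print(flatten_metadata(extracted))
# {'parties': 'Acme Corp, Globex Ltd', 'payment.amount': 12000, 'payment.currency': 'EUR', 'signed': True}
```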

📁

Batch Loading

Directory paths with configurable concurrency.
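Batch loading is largely I/O-bound (one extraction request per file), so a thread pool with a capped worker count is a natural way to model configurable concurrency. This is a sketch under that assumption: `load_directory`, its `pattern` and `max_concurrency` parameters, and the `extract` callable are hypothetical and need not match the loader's real options.

```python
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Any, Callable, Dict, List, Tuple


def load_directory(
    dir_path: str,
    extract: Callable[[str], Dict[str, Any]],
    pattern: str = "*.pdf",
    max_concurrency: int = 4,
) -> List[Tuple[str, Dict[str, Any]]]:
    """Run `extract` over every matching file with at most `max_concurrency` workers."""
    paths = sorted(str(p) for p in Path(dir_path).glob(pattern))
    # Threads suit I/O-bound work such as HTTP calls to an extraction API.
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        results = list(pool.map(extract, paths))  # map() preserves input order
    return list(zip(paths, results))


# Usage: a temporary directory with two PDFs and one file the pattern skips.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a.pdf", "b.pdf", "notes.txt"):
        Path(tmp, name).write_text("stub")
    batch = load_directory(tmp, lambda p: {"file": Path(p).name}, max_concurrency=2)
    print([result["file"] for _, result in batch])  # ['a.pdf', 'b.pdf']
```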

🔍

RAG-Optimized

Chunks by document section, not character count.
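Section-based chunking keeps each clause intact instead of cutting at a fixed character count. The sketch below assumes sections are introduced by numbered heading lines such as `1. Definitions` or `2.1 Late Fees`; that heading pattern and the `chunk_by_section` function are illustrative assumptions, not how DocDigitizer detects sections. A body line that happens to start with a number and a space would be misread as a heading, so a robust approach needs more than a regular expression.

```python
import re
from typing import Dict, List

# Assumption: headings are numbered lines such as "1. Definitions" or "2.1 Late Fees".
HEADING = re.compile(r"^\d+(?:\.\d+)*\.?\s+\S")


def chunk_by_section(text: str) -> List[Dict[str, str]]:
    """Split text at heading lines so that each chunk holds exactly one section."""
    chunks: List[Dict[str, str]] = []
    title, body = "Preamble", []  # text before the first heading gets its own chunk

    def flush() -> None:
        content = "\n".join(body).strip()
        if content:  # skip sections with no body text
            chunks.append({"section": title, "text": content})

    for line in text.splitlines():
        stripped = line.strip()
        if HEADING.match(stripped):
            flush()
            title, body = stripped, []
        else:
            body.append(line)
    flush()
    return chunks


contract = """This Agreement is made between Acme and Globex.
1. Definitions
"Services" means consulting work.
2. Payment
Fees are due within 30 days.
2.1 Late Fees
Late payments accrue 1% per month.
"""
for chunk in chunk_by_section(contract):
    print(chunk["section"], "|", chunk["text"])
```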

🤖

Agent Tool

Register it as a LangChain Tool that agents can call on demand.
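An agent tool is essentially a described function that takes simple arguments and returns text the model can read. A minimal sketch, assuming an `extract` callable that returns a dict of fields: the factory below builds such a function, and in LangChain it could then be wrapped with the `tool` decorator from `langchain_core.tools`, which uses the docstring as the tool description. `make_extraction_tool`, `extract_document_fields`, and `_stub_extract` are hypothetical names.

```python
import json
from typing import Any, Callable, Dict


def _stub_extract(file_path: str) -> Dict[str, Any]:
    # Hypothetical stand-in for a real DocDigitizer extraction call.
    return {"parties": ["Acme Corp", "Globex Ltd"], "total": 12000}


def make_extraction_tool(extract: Callable[[str], Dict[str, Any]]) -> Callable[..., str]:
    def extract_document_fields(file_path: str, field_names: str = "") -> str:
        """Extract structured fields from a document.

        field_names: optional comma-separated list of fields to return.
        Returns a JSON object string so the agent can parse the result.
        """
        fields = extract(file_path)
        wanted = [name.strip() for name in field_names.split(",") if name.strip()]
        if wanted:
            fields = {key: value for key, value in fields.items() if key in wanted}
        return json.dumps(fields)

    return extract_document_fields


tool_fn = make_extraction_tool(_stub_extract)
print(tool_fn("contract.pdf", "parties"))  # {"parties": ["Acme Corp", "Globex Ltd"]}
```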

✅

Schema-Aware Chunking

Define a schema once and reuse it to drive chunking.
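One way to read "define a schema once" is to group related fields and emit one chunk per group, so retrieval returns whole units such as payment terms rather than isolated values. The schema format, field names, and `schema_chunks` function below are hypothetical, shown only to illustrate that idea.

```python
from typing import Any, Dict, List

# Hypothetical schema: each group of related fields becomes one chunk.
CONTRACT_SCHEMA: Dict[str, List[str]] = {
    "parties": ["party_a", "party_b"],
    "payment": ["amount", "currency", "due_days"],
    "term": ["start_date", "end_date"],
}


def schema_chunks(fields: Dict[str, Any], schema: Dict[str, List[str]]) -> List[Dict[str, Any]]:
    """Group extracted fields by schema group; skip groups with no extracted values."""
    chunks = []
    for group, names in schema.items():
        present = {name: fields[name] for name in names if name in fields}
        if not present:
            continue
        text = "\n".join(f"{name}: {value}" for name, value in present.items())
        chunks.append({"page_content": text, "metadata": {"group": group, **present}})
    return chunks


extracted = {"party_a": "Acme Corp", "party_b": "Globex Ltd", "amount": 12000, "currency": "EUR"}
for chunk in schema_chunks(extracted, CONTRACT_SCHEMA):
    print(chunk["metadata"]["group"], "->", chunk["page_content"].replace("\n", "; "))
# parties -> party_a: Acme Corp; party_b: Globex Ltd
# payment -> amount: 12000; currency: EUR
```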

How It Works

1. Initialize: Create a loader with your API key and document path.
2. Load: Call .load() to get documents with structured metadata.
3. Index: Store the documents in FAISS, Chroma, or Pinecone.
4. Query: Combine metadata filters with semantic search.

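```python
from langchain_community.document_loaders import DocDigitizerLoader

# Point the loader at a single document, authenticating with your DocDigitizer API key
loader = DocDigitizerLoader(
    api_key="dd-YOUR_KEY",
    file_path="contract.pdf",
)

# load() returns a list of LangChain Document objects
docs = loader.load()

# Extracted fields live in metadata rather than in page_content
print(docs[0].metadata["parties"])
```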
Structured metadata, not raw text chunks
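The Index and Query steps come down to narrowing candidates by metadata and then ranking the survivors by vector similarity. The in-memory sketch below uses hand-made 3-dimensional vectors so it runs anywhere; with FAISS, Chroma, or Pinecone, real embeddings and each store's own filter syntax would replace `filtered_search` and the `where` dict, both of which are hypothetical.

```python
import math
from typing import Any, Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def filtered_search(
    index: List[Tuple[Vector, Dict[str, Any]]],
    query_vec: Vector,
    where: Dict[str, Any],
    k: int = 2,
) -> List[Dict[str, Any]]:
    """Keep entries whose metadata matches `where` exactly, then rank by cosine similarity."""
    candidates = [
        (cosine(vec, query_vec), meta)
        for vec, meta in index
        if all(meta.get(key) == value for key, value in where.items())
    ]
    candidates.sort(key=lambda pair: pair[0], reverse=True)
    return [meta for _, meta in candidates[:k]]


# Toy 3-d "embeddings"; real ones would come from an embedding model.
index = [
    ([0.9, 0.1, 0.0], {"doc": "contract-1", "doc_type": "contract", "section": "Payment"}),
    ([0.1, 0.9, 0.0], {"doc": "contract-1", "doc_type": "contract", "section": "Termination"}),
    ([1.0, 0.05, 0.0], {"doc": "invoice-7", "doc_type": "invoice", "section": "Totals"}),
]
hits = filtered_search(index, query_vec=[1.0, 0.0, 0.0], where={"doc_type": "contract"}, k=1)
print(hits)  # [{'doc': 'contract-1', 'doc_type': 'contract', 'section': 'Payment'}]
```

Without the filter, the invoice would rank first because its vector is closest to the query; the metadata filter is what restricts results to contracts.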

Ready to extract?

Get your API key in 30 seconds. First 50 extractions free.

Questions? → Talk to Us