AI Framework
LangChain + DocDigitizer
Use DocDigitizer as a document loader in LangChain RAG and agent pipelines. Structured extraction, not raw text.
Quick Start
bash
pip install langchain-community docdigitizer
Features
Document Loader
Standard LangChain BaseLoader interface.
Structured Metadata
Fields in doc.metadata, not page content.
Batch Loading
Directory paths with configurable concurrency.
RAG-Optimized
Chunks by document section, not character count.
Agent Tool
Register as LangChain Tool for dynamic calls.
Schema-Aware Chunking
Define schema once for intelligent chunking.
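To make "chunks by document section, not character count" concrete, here is a minimal sketch in plain Python. It does not call DocDigitizer or LangChain: the `Chunk` class, the `chunk_by_section` helper, and the assumed heading format ("1. Parties", "2. Term", ...) are all hypothetical, chosen only to show the idea. A fixed-size splitter is included for comparison.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """Hypothetical stand-in for a document chunk: text plus metadata."""
    page_content: str
    metadata: dict = field(default_factory=dict)


# Assumed heading format for this sketch: "1. Parties", "2. Term", ...
HEADING = re.compile(r"^(\d+)\.\s+(.+)$")


def chunk_by_section(text: str) -> list[Chunk]:
    """Split text at numbered headings; each chunk records its section title."""
    chunks: list[Chunk] = []
    title, body = None, []
    for line in text.splitlines():
        match = HEADING.match(line.strip())
        if match:
            # A new heading closes the previous section, if there was one.
            if title is not None:
                chunks.append(Chunk("\n".join(body).strip(), {"section": title}))
            title, body = match.group(2), []
        elif title is not None:
            body.append(line)
    if title is not None:
        chunks.append(Chunk("\n".join(body).strip(), {"section": title}))
    return chunks


def chunk_by_chars(text: str, size: int) -> list[str]:
    """Fixed-size splitting, shown only for comparison: it ignores structure."""
    return [text[i:i + size] for i in range(0, len(text), size)]


contract = """1. Parties
Acme Corp and Globex Ltd enter into this agreement.
2. Term
The agreement runs for 24 months.
3. Payment
Invoices are due within 30 days."""

sections = chunk_by_section(contract)
for chunk in sections:
    print(chunk.metadata["section"], "->", chunk.page_content)
```

Each section-based chunk holds one complete clause and carries its section title as metadata, whereas the character-based split can cut a clause in the middle.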
How It Works
1. Initialize
Create loader with API key and document path
2. Load
Call .load() to get documents with structured metadata
3. Index
Store in FAISS, Chroma, or Pinecone
4. Query
Metadata filters combined with semantic search
python
from langchain_community.document_loaders import DocDigitizerLoader
loader = DocDigitizerLoader(
    api_key="dd-YOUR_KEY",
    file_path="contract.pdf",
)
docs = loader.load()
print(docs[0].metadata["parties"])  # structured extraction
Structured metadata, not raw text chunks
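Steps 3 and 4 (Index and Query) can be sketched without any vector store. The example below is a toy illustration, not DocDigitizer's or LangChain's API: the `Doc` class, the `search` helper, the metadata keys, and the three-dimensional hand-written "embeddings" are all assumptions made for the sketch. In a real pipeline, FAISS, Chroma, or Pinecone would store the vectors and apply filters using their own syntax.

```python
import math
from dataclasses import dataclass


@dataclass
class Doc:
    """Hypothetical stand-in for an indexed document."""
    page_content: str
    metadata: dict
    embedding: list[float]  # hand-written toy vector, not a real embedding


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(docs: list[Doc], query_embedding: list[float],
           metadata_filter: dict, k: int = 2) -> list[Doc]:
    """Keep docs whose metadata matches every filter key, then rank by similarity."""
    candidates = [
        d for d in docs
        if all(d.metadata.get(key) == value for key, value in metadata_filter.items())
    ]
    ranked = sorted(candidates,
                    key=lambda d: cosine(d.embedding, query_embedding),
                    reverse=True)
    return ranked[:k]


index = [
    Doc("Termination requires 60 days notice.",
        {"doc_type": "contract"}, [0.9, 0.1, 0.0]),
    Doc("Payment is due within 30 days.",
        {"doc_type": "contract"}, [0.1, 0.9, 0.0]),
    Doc("Invoice total: 4,200 EUR.",
        {"doc_type": "invoice"}, [0.0, 0.95, 0.1]),
]

# Pretend this vector is the embedding of "When is payment due?"
query = [0.0, 1.0, 0.0]
hits = search(index, query, {"doc_type": "contract"})
for hit in hits:
    print(hit.page_content)
```

The invoice is the closest vector to the query, but the metadata filter removes it before ranking, so only contract clauses are returned. That is the benefit of keeping extracted fields in metadata rather than in page content.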
Ready to extract?
Get your API key in 30 seconds. First 50 extractions free.
Questions? → Talk to Us