I'm excited to introduce Unsiloed Chunker, an open-source Python library designed for efficient document chunking in retrieval-augmented generation (RAG) applications.
Key Features:

- Multi-threaded Processing: Speeds up chunking operations by processing multiple documents simultaneously.
- Supports Multiple File Types: Handles PDF, DOCX, and PPTX formats.
- Flexible Chunking Strategies: Offers fixed-size and page-based chunking methods.
- Zero Dependencies: Lightweight and easy to integrate into your projects.

Installation:
    pip install unsiloed-chunker

Usage Example:

    from unsiloed_chunker import Chunker

    chunker = Chunker(file_path="your_document.pdf")
    chunks = chunker.chunk(strategy="fixed_size", chunk_size=500)
    for chunk in chunks:
        print(chunk)

For more details, check out the documentation.
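For readers unfamiliar with the terms, here is a rough standalone sketch of what fixed-size chunking and multi-threaded processing can look like in general. This is not Unsiloed Chunker's code or API: the function names `fixed_size_chunks` and `chunk_many` are made up for illustration, the inputs are assumed to be text that has already been extracted from the documents, and `chunk_size` is assumed to count characters (the post does not say whether the library counts characters or tokens).

```python
from concurrent.futures import ThreadPoolExecutor


def fixed_size_chunks(text, chunk_size=500):
    """Split text into consecutive pieces of at most chunk_size characters.

    Hypothetical helper for illustration; not part of unsiloed_chunker.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def chunk_many(texts, chunk_size=500, max_workers=4):
    """Chunk several already-extracted texts with a thread pool.

    Hypothetical helper; results come back in the same order as the inputs.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda t: fixed_size_chunks(t, chunk_size), texts))


if __name__ == "__main__":
    docs = ["a" * 1200, "b" * 300]
    for chunks in chunk_many(docs):
        print([len(c) for c in chunks])
    # prints [500, 500, 200] and then [300]
```

In a real pipeline the threads would mostly pay off during file reading and parsing, which this sketch leaves out.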
I'd love to hear your feedback and suggestions!
Comments URL: https://news.ycombinator.com/item?id=44272502