Show HN: Unsiloed – VLMs for Document Ingestion

I'm excited to introduce Unsiloed Chunker, an open-source Python library designed for efficient document chunking in retrieval-augmented generation (RAG) applications.

Key Features:

Multi-threaded Processing: Speeds up chunking operations by processing multiple documents simultaneously.

Supports Multiple File Types: Handles PDF, DOCX, and PPTX formats.

Flexible Chunking Strategies: Offers fixed-size and page-based chunking methods.

Zero Dependencies: Lightweight and easy to integrate into your projects.

Installation:

    pip install unsiloed-chunker

Usage Example:

    from unsiloed_chunker import Chunker

    chunker = Chunker(file_path="your_document.pdf")
    chunks = chunker.chunk(strategy="fixed_size", chunk_size=500)
    for chunk in chunks:
        print(chunk)

For more details, check out the documentation.
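For anyone unfamiliar with the fixed-size strategy and the multi-document processing mentioned above, here is a minimal standalone sketch of the general idea. It is not the library's implementation: the names fixed_size_chunks and chunk_many are hypothetical, the input is assumed to be text that has already been extracted from the files, and chunk_size is assumed to count characters (the library may well count tokens or something else).

```python
from concurrent.futures import ThreadPoolExecutor


def fixed_size_chunks(text: str, chunk_size: int = 500) -> list[str]:
    """Split text into consecutive pieces of at most chunk_size characters.

    Hypothetical helper for illustration; the unit (characters) is an assumption.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


def chunk_many(texts: list[str], chunk_size: int = 500, max_workers: int = 4) -> list[list[str]]:
    """Chunk several already-extracted documents with a thread pool.

    Hypothetical helper; pool.map returns results in the same order as the inputs.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda t: fixed_size_chunks(t, chunk_size), texts))


if __name__ == "__main__":
    docs = ["a" * 1200, "b" * 300]
    for chunks in chunk_many(docs, chunk_size=500):
        print([len(c) for c in chunks])
```

In CPython, threads mostly help when the per-document work is I/O-bound (reading and parsing files), so a sketch like this shows the structure rather than a guaranteed speedup.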

I'd love to hear your feedback and suggestions!


Comments URL: https://news.ycombinator.com/item?id=44272502
