Pipelines
A pipeline is a pre-configured processing workflow that bundles the components needed for document ingestion and vector search. Encapsulating these components in a single entity simplifies setup and keeps behavior consistent across your application.
What's in a Pipeline?
Each pipeline consists of four key components:
- Embedding Model: Transforms text into high-dimensional vectors that capture semantic meaning
- Vector Store: Manages the storage and retrieval of embeddings for efficient similarity search
- Default Text Splitter: Defines how documents are chunked into processable segments
- Default Distance Strategy: Specifies the method for measuring similarity between vectors (e.g., cosine similarity, Euclidean distance)
Pipelines allow you to define these settings once and reuse them throughout your application, making it easy to maintain consistent behavior across different operations.
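To make the bundling idea concrete, here is a minimal sketch in plain Python. It is not this product's API: the `Pipeline` class, its field names, `toy_embed`, and `split_fixed` are all hypothetical stand-ins, and the "vector store" is just an in-memory list. It only illustrates how the four components can be defined once and reused for both ingestion and search.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 - cosine similarity; returns 1.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 if norm == 0 else 1.0 - dot / norm


def euclidean_distance(a: Vector, b: Vector) -> float:
    """Straight-line distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def split_fixed(text: str, size: int = 20) -> List[str]:
    """Hypothetical splitter: fixed-size character chunks."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def toy_embed(text: str) -> List[float]:
    """Toy stand-in for an embedding model: vowel counts, not semantics."""
    return [float(text.lower().count(c)) for c in "aeiou"]


@dataclass
class Pipeline:
    """Hypothetical bundle of the four components described above."""
    name: str
    embed: Callable[[str], List[float]]            # embedding model
    splitter: Callable[[str], List[str]]           # default text splitter
    distance: Callable[[Vector, Vector], float]    # default distance strategy
    # In-memory stand-in for a vector store: (doc_id, chunk, vector) rows.
    store: List[Tuple[str, str, List[float]]] = field(default_factory=list)

    def ingest(self, doc_id: str, text: str) -> None:
        """Chunk the document, embed each chunk, and store the result."""
        for chunk in self.splitter(text):
            self.store.append((doc_id, chunk, self.embed(chunk)))

    def search(self, query: str, k: int = 3) -> List[Tuple[str, str, float]]:
        """Return the k stored chunks closest to the query."""
        q = self.embed(query)
        scored = [(doc_id, chunk, self.distance(q, vec))
                  for doc_id, chunk, vec in self.store]
        return sorted(scored, key=lambda row: row[2])[:k]


pipeline = Pipeline(
    name="demo",
    embed=toy_embed,
    splitter=split_fixed,
    distance=euclidean_distance,
)
pipeline.ingest("doc1", "banana")
pipeline.ingest("doc2", "teepee")
top = pipeline.search("aa", k=1)
```

Because ingestion and search both go through the same `Pipeline` object, a query is always embedded and compared the same way the stored chunks were, which is the consistency benefit described above. Swapping `euclidean_distance` for `cosine_distance` changes the strategy in one place.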
Classifications
Pipelines can also be associated with classifications, which provide a hierarchical way to categorize and organize your documents. Classifications enable you to:
- Group related documents together for easier management
- Filter search results based on document categories
- Apply different processing rules to different document types
- Maintain logical separation between different content domains
By combining pipelines with classifications, you can build sophisticated document management systems that are both flexible and maintainable.
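As an illustration of hierarchical filtering, the sketch below assumes classifications are written as slash-separated paths such as "legal/contracts". That representation, along with the `Document` class and both helper functions, is an assumption made for this example and is not taken from the product itself.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Document:
    """Hypothetical document record carrying a classification path."""
    doc_id: str
    classification: str  # assumed format: slash-separated, e.g. "legal/contracts"


def in_classification(doc: Document, category: str) -> bool:
    """True if the document sits at or below the given category."""
    doc_parts = doc.classification.split("/")
    cat_parts = category.split("/")
    # Compare whole path segments so "legal" does not match "legalese".
    return doc_parts[:len(cat_parts)] == cat_parts


def filter_by_classification(docs: List[Document], category: str) -> List[Document]:
    """Keep only the documents that belong to the category or its children."""
    return [d for d in docs if in_classification(d, category)]


docs = [
    Document("a", "legal/contracts"),
    Document("b", "legal/policies"),
    Document("c", "engineering/specs"),
    Document("d", "legalese"),
]
legal_docs = filter_by_classification(docs, "legal")
contract_docs = filter_by_classification(docs, "legal/contracts")
```

Matching on path segments rather than raw string prefixes is what makes the filter hierarchical: a parent category selects all of its children, while a more specific category narrows the results.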