Pipelines

A pipeline is a pre-configured processing workflow that bundles the components needed for document ingestion and vector search. Encapsulating these components in a single entity simplifies setup and keeps behavior consistent across your application.

What's in a Pipeline?

Each pipeline consists of four key components:

Embedding Model: Transforms text into high-dimensional vectors that capture semantic meaning
Vector Store: Manages the storage and retrieval of embeddings for efficient similarity search
Default Text Splitter: Defines how documents are chunked into processable segments
Default Distance Strategy: Specifies the method for measuring similarity between vectors (e.g., cosine similarity, Euclidean distance)

Pipelines allow you to define these settings once and reuse them throughout your application, making it easy to maintain consistent behavior across different operations.
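
To make the idea of "define once, reuse everywhere" concrete, here is a minimal sketch in plain Python. It is not the product's API: `PipelineConfig`, `DistanceStrategy`, `split_text`, and `distance` are hypothetical names, and the embedding model and vector store are represented only by placeholder identifiers. The sketch assumes a simple character-based splitter with overlap.

```python
import math
from dataclasses import dataclass
from enum import Enum


class DistanceStrategy(Enum):
    """Hypothetical enum for the pipeline's default distance strategy."""
    COSINE = "cosine"
    EUCLIDEAN = "euclidean"


@dataclass(frozen=True)
class PipelineConfig:
    """Hypothetical container for the four pipeline components."""
    name: str
    embedding_model: str   # placeholder identifier, not a real model name
    vector_store: str      # placeholder identifier, not a real store name
    chunk_size: int        # default text splitter: characters per chunk
    chunk_overlap: int     # default text splitter: characters shared by neighbors
    distance_strategy: DistanceStrategy

    def __post_init__(self):
        if not 0 <= self.chunk_overlap < self.chunk_size:
            raise ValueError("chunk_overlap must be >= 0 and < chunk_size")


def split_text(text: str, cfg: PipelineConfig) -> list[str]:
    """Chunk text using the pipeline's default splitter settings."""
    step = cfg.chunk_size - cfg.chunk_overlap
    return [text[i:i + cfg.chunk_size] for i in range(0, len(text), step)]


def distance(a: list[float], b: list[float], cfg: PipelineConfig) -> float:
    """Compare two non-zero vectors using the pipeline's default strategy."""
    if cfg.distance_strategy is DistanceStrategy.EUCLIDEAN:
        return math.dist(a, b)
    # Cosine distance = 1 - cosine similarity.
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.hypot(*a) * math.hypot(*b))


# Define the settings once; every operation below reads from the same config.
cfg = PipelineConfig(
    name="docs",
    embedding_model="example-embedding-model",
    vector_store="example-vector-store",
    chunk_size=10,
    chunk_overlap=2,
    distance_strategy=DistanceStrategy.COSINE,
)
chunks = split_text("abcdefghijklmnopqrstuvwxyz", cfg)
```

Because both the splitter and the distance function take the same config object, changing a setting in one place changes it for ingestion and search alike, which is the consistency benefit described above.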

Classifications

Pipelines can also be associated with classifications, which provide a hierarchical way to categorize and organize your documents. Classifications enable you to:

Group related documents together for easier management
Filter search results based on document categories
Apply different processing rules to different document types
Maintain logical separation between different content domains

Combining pipelines with classifications lets you keep processing settings consistent while organizing, filtering, and separating documents by category.
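
The hierarchical filtering described above can be sketched as follows. This is an illustration under stated assumptions, not the product's API: it assumes a classification is stored as a slash-separated path such as `legal/contracts`, and `Document`, `in_classification`, and `filter_by_classification` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    """Hypothetical document record tagged with one classification path."""
    doc_id: str
    classification: str  # assumed format: slash-separated, e.g. "legal/contracts"


def in_classification(doc: Document, root: str) -> bool:
    """True if the document sits at `root` or anywhere beneath it."""
    # Appending "/" prevents "legal" from matching "legalese/notes".
    return doc.classification == root or doc.classification.startswith(root + "/")


def filter_by_classification(docs: list[Document], root: str) -> list[Document]:
    """Keep only the documents that belong to the given classification subtree."""
    return [d for d in docs if in_classification(d, root)]


docs = [
    Document("a", "legal/contracts"),
    Document("b", "legal"),
    Document("c", "legalese/notes"),
    Document("d", "hr/policies"),
]
legal_docs = filter_by_classification(docs, "legal")
```

Matching on a whole path segment rather than a raw string prefix is what keeps sibling categories with similar names logically separate.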
