Pipelex · lchoquel · Jul 7, 2025 · Jul 2, 2025
diff --git a/.cursor/rules/best_practices.mdc b/.cursor/rules/best_practices.mdc
@@ -15,13 +15,9 @@ This document outlines the core best practices and patterns used in our codebase
    - Use type hints for all variables where type is not obvious
 
 2. **StrEnum**
-   - Import StrEnum from pipelex.types
+    If you want to use StrEnum, import it from `pipelex.types`
    ```python
    from pipelex.types import StrEnum
-
-   class ModelType(StrEnum):
-       GPT4 = "gpt-4"
-       GPT35 = "gpt-3.5-turbo"
    ```
 
 ## Factory Pattern
@@ -75,3 +71,7 @@ This document outlines the core best practices and patterns used in our codebase
            "The fal-client SDK is required to use FAL models."
        ) from exc
    ```
+
+## Pipelines
+
+Always run the CLI command `pipelex validate` when you have finished writing pipelines. It checks them for errors; if it reports any, fix them and run it again.
diff --git a/.cursor/rules/llms.mdc b/.cursor/rules/llms.mdc
@@ -1,5 +1,5 @@
 ---
-description: Use LLM models with approrpiate settings. Define LLM handles. Define LLM parameters directly in PipeLLM or through presets.
+description: 
 globs: 
 alwaysApply: false
 ---
@@ -41,7 +41,7 @@ Here is an example of using an llm_handle to specify which LLM to use in a PipeL
 PipeLLM = "Write text about Hello World."
 output = "Text"
 llm = { llm_handle = "gpt-4o-mini", temperature = 0.9, max_tokens = "auto" }
-prompt = """
+prompt_template = """
 Write a haiku about Hello World.
 """
 ```

diff --git a/.cursor/rules/pipe-batch.mdc b/.cursor/rules/pipe-batch.mdc
@@ -0,0 +1,31 @@
+---
+description: 
+globs: 
+alwaysApply: false
+---
+# PipeBatch Controller
+
+The PipeBatch controller applies a pipe to each element of a list of inputs, in parallel. It is created within a PipeSequence step, by adding `batch_over` and `batch_as` to that step.
+
+## Usage in TOML Configuration
+
+```toml
+[pipe.sequence_with_batch]
+PipeSequence = "A Sequence of pipes"
+inputs = { input_list = "ConceptName" } # the list that `batch_over` iterates over
+output = "OutputConceptName"
+steps = [
+    { pipe = "pipe_to_apply", batch_over = "input_list", batch_as = "current_item", result = "batch_results" }
+]
+```
+
+## Key Parameters
+
+- `pipe`: The pipe operation to apply to each element in the batch
+- `batch_over`: The name of the list in the context to iterate over
+- `batch_as`: The name to use for the current element in the pipe's context
+- `result`: The name under which the list of outputs (one per element) is stored
+
+## Important tip
+
+Always run the CLI command `pipelex validate` when you have finished writing pipelines. It checks them for errors; if it reports any, fix them and run it again.
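The batch semantics described in `pipe-batch.mdc` can be modelled in plain Python. This is a minimal conceptual sketch, not Pipelex's implementation. It makes three assumptions. First, a hypothetical async function `pipe_to_apply` stands in for the pipe. Second, a plain dict plays the role of the working memory. Third, `run_batch_step` is an illustrative helper. Each element of `input_list` is bound to the `batch_as` name and processed concurrently with `asyncio.gather`. The ordered outputs are then stored under the `result` name.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List


async def pipe_to_apply(current_item: str) -> str:
    # Hypothetical stand-in for a real pipe: upper-cases one item.
    await asyncio.sleep(0)  # yield control, as a real async LLM call would
    return current_item.upper()


async def run_batch_step(
    working_memory: Dict[str, Any],
    pipe: Callable[..., Awaitable[Any]],
    batch_over: str,
    batch_as: str,
    result: str,
) -> Dict[str, Any]:
    # Look up the list named by `batch_over` in the working memory.
    items: List[Any] = working_memory[batch_over]
    # Bind each item to the `batch_as` name and run all calls concurrently.
    outputs = await asyncio.gather(*(pipe(**{batch_as: item}) for item in items))
    # asyncio.gather preserves input order, so outputs[i] matches items[i].
    working_memory[result] = list(outputs)
    return working_memory


memory = {"input_list": ["alpha", "beta", "gamma"]}
asyncio.run(
    run_batch_step(
        memory,
        pipe_to_apply,
        batch_over="input_list",
        batch_as="current_item",
        result="batch_results",
    )
)
print(memory["batch_results"])  # ['ALPHA', 'BETA', 'GAMMA']
```

The sketch makes one distinction explicit. `batch_as` names a single element as the pipe sees it, while `result` receives the whole list of outputs.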
diff --git a/.cursor/rules/pipe-condition.mdc b/.cursor/rules/pipe-condition.mdc
@@ -0,0 +1,50 @@
+---
+description: 
+globs: 
+alwaysApply: false
+---
+# PipeCondition Controller
+
+The PipeCondition controller allows you to implement conditional logic in your pipeline, choosing which pipe to execute based on an evaluated expression. It supports both direct expressions and expression templates.
+
+## Usage in TOML Configuration
+
+### Basic Usage with Direct Expression
+
+```toml
+[pipe.conditional_operation]
+PipeCondition = "A conditional pipe to decide whether..."
+inputs = { input_data = "CategoryInput" }
+output = "native.Text"
+expression = "input_data.category"
+
+[pipe.conditional_operation.pipe_map]
+small = "process_small"
+medium = "process_medium"
+large = "process_large"
+```
+Or, using a Jinja2 expression template instead of a direct expression:
+```toml
+[pipe.conditional_operation]
+PipeCondition = "A conditional pipe to decide whether..."
+inputs = { input_data = "CategoryInput" }
+output = "native.Text"
+expression_template = "{{ input_data.category }}" # Jinja2 code
+
+[pipe.conditional_operation.pipe_map]
+small = "process_small"
+medium = "process_medium"
+large = "process_large"
+```
+
+## Key Parameters
+
+- `expression`: Direct boolean or string expression (mutually exclusive with `expression_template`)
+- `expression_template`: Jinja2 template for more complex conditional logic (mutually exclusive with `expression`)
+- `pipe_map`: Table mapping expression results to pipe codes:
+  - The key on the left (`small`, `medium`, ...) is a possible result of `expression` or `expression_template`.
+  - The value on the right (`process_small`, `process_medium`, ...) is the name of the pipe to trigger.
+
+## Important tip
+
+Always run the CLI command `pipelex validate` when you have finished writing pipelines. It checks them for errors; if it reports any, fix them and run it again.
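The `pipe_map` routing described in `pipe-condition.mdc` can be pictured with a plain-Python model. This is a sketch, not Pipelex's code. The expression is resolved against the inputs, and its string result is used as the key that selects a pipe name. The resolver `evaluate_expression` is a hypothetical simplification: it only handles dotted attribute lookups such as `input_data.category`, whereas real expressions and Jinja2 templates can do much more. Raising `KeyError` for an unmapped result is also an assumption of this sketch.

```python
from types import SimpleNamespace
from typing import Any, Dict


def evaluate_expression(expression: str, inputs: Dict[str, Any]) -> str:
    # Hypothetical simplification: resolve "name.attr.attr" against the inputs.
    root_name, *attributes = expression.split(".")
    value: Any = inputs[root_name]
    for attribute in attributes:
        value = getattr(value, attribute)
    return str(value)


def select_pipe(expression: str, pipe_map: Dict[str, str], inputs: Dict[str, Any]) -> str:
    key = evaluate_expression(expression, inputs)
    if key not in pipe_map:
        # Assumption: an unmapped result is treated as an error in this sketch.
        raise KeyError(f"No pipe mapped for expression result {key!r}")
    return pipe_map[key]


pipe_map = {"small": "process_small", "medium": "process_medium", "large": "process_large"}
inputs = {"input_data": SimpleNamespace(category="medium")}

print(select_pipe("input_data.category", pipe_map, inputs))  # process_medium
```

In this sketch the `pipe_map` keys must match the string result exactly, so `small` and `Small` would be different keys.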
diff --git a/.cursor/rules/pipe-func.mdc b/.cursor/rules/pipe-func.mdc
diff --git a/.cursor/rules/pipe-imgg.mdc b/.cursor/rules/pipe-imgg.mdc
@@ -0,0 +1,5 @@
+---
+description:
+globs:
+alwaysApply: false
+---
diff --git a/.cursor/rules/pipe-llm.mdc b/.cursor/rules/pipe-llm.mdc
@@ -0,0 +1,107 @@
+---
+description: 
+globs: 
+alwaysApply: false
+---
+# PipeLLM Guide
+
+## Purpose
+
+PipeLLM is used to:
+1. Generate text or objects with LLMs
+2. Process images with Vision LLMs
+
+## Basic Usage
+
+### Simple Text Generation
+```toml
+[pipe.write_story]
+PipeLLM = "Write a short story"
+output = "Text"
+prompt_template = """
+Write a short story about a programmer.
+"""
+```
+
+### Structured Data Extraction
+```toml
+[pipe.extract_info]
+PipeLLM = "Extract information"
+inputs = { text = "Text" }
+output = "PersonInfo"
+prompt_template = """
+Extract person information from this text:
+@text
+"""
+```
+
+### Where to Put Structured Objects
+Place your Pydantic models in `pipelex_libraries/pipelines/your_models.py`:
+
+```python
+from pipelex.core.stuff_content import StructuredContent
+
+class PersonInfo(StructuredContent): # Output models must subclass StructuredContent
+    name: str
+    age: int
+    email: str
+```
+
+## Advanced Features
+
+### LLM Settings
+
+You can specify LLM settings in two ways:
+
+1. **Direct in the pipe**:
+```toml
+[pipe.analyze]
+PipeLLM = "Analyze text"
+output = "Analysis"
+llm = { llm_handle = "gpt-4", temperature = 0.7 }
+prompt_template = "Analyze this text"
+```
+
+2. **Using predefined settings** from `pipelex_libraries/llm_deck/base_llm_deck.toml`:
+```toml
+[pipe.analyze]
+PipeLLM = "Analyze text"
+output = "Analysis"
+llm = "llm_for_analysis"  # References a preset from llm_deck
+prompt_template = "Analyze this text"
+```
+
+### System Prompts
+Add system-level instructions:
+```toml
+[pipe.expert_analysis]
+PipeLLM = "Expert analysis"
+output = "Analysis"
+system_prompt = "You are a data analysis expert"
+prompt_template = "Analyze this data"
+```
+
+### Multiple Outputs
+Generate multiple results:
+```toml
+[pipe.generate_ideas]
+PipeLLM = "Generate ideas"
+output = "Idea"
+nb_output = 3  # Generate exactly 3 ideas
+# OR, instead of nb_output (set only one of the two):
+# multiple_output = true  # Let the LLM decide how many to generate
+```
+
+### Vision Tasks
+Process images with VLMs:
+```toml
+[pipe.analyze_image]
+PipeLLM = "Analyze image"
+inputs = { image = "Image" } # `image` is the name of the stuff that holds the Image. If the image is nested in another stuff, use a dotted path such as `{ "page.image" = "Image" }`
+output = "ImageAnalysis"
+prompt_template = "Describe what you see in this image"
+```
+
+## Important tip
+
+Always run the CLI command `pipelex validate` when you have finished writing pipelines. It checks them for errors; if it reports any, fix them and run it again.
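`pipe-llm.mdc` describes two ways of setting `llm`, and they produce different TOML value types. An inline table yields a mapping of settings, while a preset name is a string to look up in the LLM deck. The sketch below models that distinction in plain Python. The `resolve_llm_setting` helper and the contents of the `llm_deck` dict are hypothetical: they do not reflect the real structure of `base_llm_deck.toml` or how Pipelex resolves presets.

```python
from typing import Any, Dict, Union

# Hypothetical deck contents; the real base_llm_deck.toml may be structured differently.
llm_deck: Dict[str, Dict[str, Any]] = {
    "llm_for_analysis": {"llm_handle": "gpt-4", "temperature": 0.2},
}


def resolve_llm_setting(
    llm: Union[str, Dict[str, Any]], deck: Dict[str, Dict[str, Any]]
) -> Dict[str, Any]:
    if isinstance(llm, str):
        # A string is treated as the name of a preset defined in the deck.
        if llm not in deck:
            raise KeyError(f"Unknown LLM preset: {llm!r}")
        return dict(deck[llm])
    # An inline table already carries the settings directly.
    return dict(llm)


inline = resolve_llm_setting({"llm_handle": "gpt-4", "temperature": 0.7}, llm_deck)
preset = resolve_llm_setting("llm_for_analysis", llm_deck)
print(inline["temperature"], preset["temperature"])  # 0.7 0.2
```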
diff --git a/.cursor/rules/pipe-ocr.mdc b/.cursor/rules/pipe-ocr.mdc
@@ -0,0 +1,39 @@
+---
+description: 
+globs: 
+alwaysApply: false
+---
+# PipeOCR Guide
+
+## Purpose
+
+Extract text and images from an image or a PDF
+
+## Basic Usage
+
+### Simple Text and Image Extraction
+```toml
+[pipe.extract_info]
+PipeOcr = "extract the information"
+inputs = { ocr_input = "PDF" } # or { ocr_input = "Image" } for an image. This is the only input
+output = "Page"
+```
+
+The output concept `Page` is a native concept whose content has the structure `PageContent` below.
+Each `Page` corresponds to one page, so PipeOcr outputs a `ListContent` of `Page` items.
+
+```python
+class TextAndImagesContent(StuffContent):
+    text: Optional[TextContent]
+    images: Optional[List[ImageContent]]
+
+class PageContent(StructuredContent):
+    text_and_images: TextAndImagesContent
+    page_view: Optional[ImageContent] = None
+```
+- `text_and_images`: the text, and the related images, found on the page of the input image or PDF.
+- `page_view`: an optional screenshot of the whole PDF page or image.
+
+## Important tip
+
+Always run the CLI command `pipelex validate` when you have finished writing pipelines. It checks them for errors; if it reports any, fix them and run it again.
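PipeOcr yields one `Page` per page, so downstream code typically walks the resulting list. The sketch below shows one way to do this. It uses plain dataclasses as hypothetical stand-ins that mirror the field names of `PageContent` and `TextAndImagesContent` from `pipe-ocr.mdc`. It does not import Pipelex. `text` is simplified to a plain string, and `images` to a list that defaults to empty. The helper `join_page_texts` is likewise illustrative.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TextAndImages:
    # Stand-in for TextAndImagesContent; text simplified to a plain str.
    text: Optional[str] = None
    images: List[str] = field(default_factory=list)  # e.g. image file names


@dataclass
class Page:
    # Stand-in for PageContent.
    text_and_images: TextAndImages
    page_view: Optional[str] = None  # screenshot of the whole page, if any


def join_page_texts(pages: List[Page]) -> str:
    # Skip pages where no text was extracted (text is Optional).
    texts = [page.text_and_images.text for page in pages if page.text_and_images.text]
    return "\n\n".join(texts)


pages = [
    Page(TextAndImages(text="Page one text")),
    Page(TextAndImages(text=None, images=["figure-1.png"])),
    Page(TextAndImages(text="Page three text"), page_view="page-3.png"),
]
print(join_page_texts(pages))
```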
diff --git a/.cursor/rules/pipe-parallel.mdc b/.cursor/rules/pipe-parallel.mdc
@@ -0,0 +1,5 @@
+---
+description:
+globs:
+alwaysApply: false
+---
diff --git a/.cursor/rules/pipe-sequence.mdc b/.cursor/rules/pipe-sequence.mdc
@@ -0,0 +1,58 @@
+---
+description: 
+globs: 
+alwaysApply: false
+---
+# PipeSequence Guide
+
+## Purpose
+PipeSequence executes multiple pipes in a defined order, where each step can use results from previous steps.
+
+## Basic Structure
+```toml
+[pipe.your_sequence_name]
+PipeSequence = "Description of what this sequence does"
+inputs = { input_name = "InputType" } # Every input the steps need, except those produced by earlier steps
+output = "OutputType"
+steps = [
+    { pipe = "first_pipe", result = "first_result" },
+    { pipe = "second_pipe", result = "second_result" },
+    { pipe = "final_pipe", result = "final_result" }
+]
+```
+
+## Key Components
+
+1. **Steps Array**: List of pipes to execute in sequence
+   - `pipe`: Name of the pipe to execute
+   - `result`: Name to assign to the pipe's output that will be in the working memory
+
+2. **Working Memory**: Each step can access:
+   - The original sequence inputs
+   - The results of all previous steps
+   - Reference either one by name (the input name or the step's `result` name)
+
+## Using PipeBatch in Steps
+
+You can use PipeBatch functionality within steps using `batch_over` and `batch_as`:
+
+```toml
+steps = [
+    # TOML 1.0 inline tables must fit on a single line, so keep each step on one line
+    { pipe = "process_items", batch_over = "input_list", batch_as = "current_item", result = "processed_items" }
+]
+```
+
+1. **batch_over**: Names the `ListContent` to iterate over. Each item in the list is processed individually and **in parallel** by the pipe.
+   - Must be a `ListContent` type containing the items to process
+   - Can reference inputs or results from previous steps
+
+2. **batch_as**: Defines the name that will be used to reference the current item being processed
+   - This name can be used in the pipe's input mappings
+   - Makes each item from the batch available as a single element
+
+The result of a batched step will be a `ListContent` containing the outputs from processing each item.
+
+## Important tip
+
+Always run the CLI command `pipelex validate` when you have finished writing pipelines. It checks them for errors; if it reports any, fix them and run it again.
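The working-memory rules in `pipe-sequence.mdc` can be made concrete with a plain-Python sketch. This is not Pipelex's implementation. Each step reads what it needs by name from a shared dict and stores its output under its `result` name, so every later step can use it. The `Step` type, the lambda "pipes", and the `audience` input are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class Step:
    pipe: Callable[[Dict[str, Any]], Any]  # hypothetical pipe: reads from working memory
    result: str  # name under which the output is stored


def run_sequence(inputs: Dict[str, Any], steps: List[Step]) -> Dict[str, Any]:
    # The working memory starts with the sequence inputs...
    working_memory: Dict[str, Any] = dict(inputs)
    for step in steps:
        # ...and grows with each step's result, visible to all later steps.
        working_memory[step.result] = step.pipe(working_memory)
    return working_memory


steps = [
    Step(pipe=lambda mem: mem["topic"].title(), result="first_result"),
    Step(pipe=lambda mem: f"Draft about {mem['first_result']}", result="second_result"),
    # The final step combines an original input with an earlier result.
    Step(pipe=lambda mem: f"{mem['second_result']}, revised for {mem['audience']}", result="final_result"),
]

memory = run_sequence({"topic": "pipelines", "audience": "developers"}, steps)
print(memory["final_result"])  # Draft about Pipelines, revised for developers
```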