Merged
10 changes: 5 additions & 5 deletions .cursor/rules/best_practices.mdc
@@ -15,13 +15,9 @@ This document outlines the core best practices and patterns used in our codebase
- Use type hints for all variables where type is not obvious

2. **StrEnum**
- Import StrEnum from pipelex.types
If you want to use StrEnum, import it from `pipelex.types`
```python
from pipelex.types import StrEnum

class ModelType(StrEnum):
    GPT4 = "gpt-4"
    GPT35 = "gpt-3.5-turbo"
```

## Factory Pattern
@@ -75,3 +71,7 @@ This document outlines the core best practices and patterns used in our codebase
"The fal-client SDK is required to use FAL models."
) from exc
```

## Pipelines

Always run the CLI command `pipelex validate` when you finish writing pipelines. It checks for errors; if it reports any, fix them and run it again.
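
One way to make this check hard to skip is to wrap it in a small script. The sketch below is only an illustration: it assumes the `pipelex` executable is on `PATH` and that it exits with a non-zero status when validation fails, neither of which is stated above. `run_validation` is a hypothetical helper name, not part of Pipelex.

```python
import subprocess
import sys
from typing import Sequence


def run_validation(command: Sequence[str] = ("pipelex", "validate")) -> bool:
    """Run the validation command and report whether it succeeded.

    Assumption: the command signals errors with a non-zero exit status.
    """
    completed = subprocess.run(list(command), capture_output=True, text=True)
    if completed.returncode != 0:
        # Surface the tool's own output so the errors can be fixed before rerunning.
        print(completed.stdout, end="")
        print(completed.stderr, end="", file=sys.stderr)
    return completed.returncode == 0
```

Calling `run_validation()` after each round of edits, and repeating until it returns `True`, matches the "iterate" advice above.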
4 changes: 2 additions & 2 deletions .cursor/rules/llms.mdc
@@ -1,5 +1,5 @@
---
description: Use LLM models with appropriate settings. Define LLM handles. Define LLM parameters directly in PipeLLM or through presets.
description:
globs:
alwaysApply: false
---
@@ -41,7 +41,7 @@ Here is an example of using an llm_handle to specify which LLM to use in a PipeL
PipeLLM = "Write text about Hello World."
output = "Text"
llm = { llm_handle = "gpt-4o-mini", temperature = 0.9, max_tokens = "auto" }
prompt = """
prompt_template = """
Write a haiku about Hello World.
"""
```
31 changes: 31 additions & 0 deletions .cursor/rules/pipe-batch.mdc
@@ -0,0 +1,31 @@
---
description:
globs:
alwaysApply: false
---
# PipeBatch Controller

The PipeBatch controller applies a pipe operation to each element of an input list in parallel. It is created via a PipeSequence step.

## Usage in TOML Configuration

```toml
[pipe.sequence_with_batch]
PipeSequence = "A Sequence of pipes"
inputs = { input_data = "ConceptName" }
output = "OutputConceptName"
steps = [
{ pipe = "pipe_to_apply", batch_over = "input_list", batch_as = "current_item", result = "batch_results" }
]
```

## Key Parameters

- `pipe`: The pipe operation to apply to each element in the batch
- `batch_over`: The name of the list in the context to iterate over
- `batch_as`: The name to use for the current element in the pipe's context
- `result`: Where to store the results of the batch operation
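
The four parameters above can be read as a parallel map over a list. The following is a conceptual sketch in plain Python, not Pipelex's implementation: `run_batch_step` and `shout` are hypothetical names, the context is modeled as a plain dictionary, and threads are used only as one possible way to run branches concurrently.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List


def run_batch_step(
    context: Dict[str, Any],
    pipe: Callable[[Dict[str, Any]], Any],
    batch_over: str,
    batch_as: str,
    result: str,
) -> Dict[str, Any]:
    """Apply `pipe` to every element of context[batch_over] and store the outputs."""

    def run_one(item: Any) -> Any:
        # Each branch sees the shared context plus the current item under `batch_as`.
        return pipe({**context, batch_as: item})

    with ThreadPoolExecutor() as pool:
        # Executor.map preserves input order even though the calls run concurrently.
        outputs: List[Any] = list(pool.map(run_one, context[batch_over]))
    return {**context, result: outputs}


def shout(branch_context: Dict[str, Any]) -> str:
    # Stand-in for "pipe_to_apply": it only reads the current item.
    return branch_context["current_item"].upper()


memory = run_batch_step(
    {"input_list": ["a", "b", "c"]},
    pipe=shout,
    batch_over="input_list",
    batch_as="current_item",
    result="batch_results",
)
print(memory["batch_results"])  # ['A', 'B', 'C']
```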

# Important tip

Always run the CLI command `pipelex validate` when you finish writing pipelines. It checks for errors; if it reports any, fix them and run it again.
50 changes: 50 additions & 0 deletions .cursor/rules/pipe-condition.mdc
@@ -0,0 +1,50 @@
---
description:
globs:
alwaysApply: false
---
# PipeCondition Controller

The PipeCondition controller allows you to implement conditional logic in your pipeline, choosing which pipe to execute based on an evaluated expression. It supports both direct expressions and expression templates.

## Usage in TOML Configuration

### Basic Usage with Direct Expression

```toml
[pipe.conditional_operation]
PipeCondition = "A conditional pipe to decide whether..."
inputs = { input_data = "CategoryInput" }
output = "native.Text"
expression = "input_data.category"

[pipe.conditional_operation.pipe_map]
small = "process_small"
medium = "process_medium"
large = "process_large"
```
or
```toml
[pipe.conditional_operation]
PipeCondition = "A conditional pipe to decide whether..."
inputs = { input_data = "CategoryInput" }
output = "native.Text"
expression_template = "{{ input_data.category }}" # Jinja2 code

[pipe.conditional_operation.pipe_map]
small = "process_small"
medium = "process_medium"
large = "process_large"
```

## Key Parameters

- `expression`: Direct boolean or string expression (mutually exclusive with expression_template)
- `expression_template`: Jinja2 template for more complex conditional logic (mutually exclusive with expression)
- `pipe_map`: Dictionary mapping expression results to pipe codes:
  1. The key on the left (`small`, `medium`, ...) is the result of `expression` or `expression_template`.
  2. The value on the right (`process_small`, `process_medium`, ...) is the name of the pipe to trigger.
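
The lookup that `pipe_map` describes can be pictured as a dictionary dispatch. This is a minimal sketch in plain Python, not Pipelex code: `choose_pipe` is a hypothetical helper, and raising an error for an unmapped result is an assumption of the sketch rather than documented behavior.

```python
from typing import Dict

# Mirrors the [pipe.conditional_operation.pipe_map] table from the TOML above.
PIPE_MAP: Dict[str, str] = {
    "small": "process_small",
    "medium": "process_medium",
    "large": "process_large",
}


def choose_pipe(expression_result: str, pipe_map: Dict[str, str]) -> str:
    """Return the pipe code mapped to the evaluated expression result."""
    try:
        return pipe_map[expression_result]
    except KeyError:
        # Assumption: an unmapped result is treated as an error in this sketch.
        raise ValueError(f"No pipe mapped for result {expression_result!r}") from None


print(choose_pipe("medium", PIPE_MAP))  # process_medium
```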

# Important tip

Always run the CLI command `pipelex validate` when you finish writing pipelines. It checks for errors; if it reports any, fix them and run it again.
Empty file added .cursor/rules/pipe-func.mdc
Empty file.
5 changes: 5 additions & 0 deletions .cursor/rules/pipe-imgg.mdc
@@ -0,0 +1,5 @@
---
description:
globs:
alwaysApply: false
---
107 changes: 107 additions & 0 deletions .cursor/rules/pipe-llm.mdc
@@ -0,0 +1,107 @@
---
description:
globs:
alwaysApply: false
---
# PipeLLM Guide

## Purpose

PipeLLM is used to:
1. Generate text or objects with LLMs
2. Process images with Vision LLMs

## Basic Usage

### Simple Text Generation
```toml
[pipe.write_story]
PipeLLM = "Write a short story"
output = "Text"
prompt_template = """
Write a short story about a programmer.
"""
```

### Structured Data Extraction
```toml
[pipe.extract_info]
PipeLLM = "Extract information"
inputs = { text = "Text" }
output = "PersonInfo"
prompt_template = """
Extract person information from this text:
@text
"""
```

### Where to Put Structured Objects
Place your Pydantic models in `pipelex_libraries/pipelines/your_models.py`:

```python
from pipelex.core.stuff_content import StructuredContent

class PersonInfo(StructuredContent):  # Output models must always subclass StructuredContent
    name: str
    age: int
    email: str
```

## Advanced Features

### LLM Settings

You can specify LLM settings in two ways:

1. **Directly in the pipe**:
```toml
[pipe.analyze]
PipeLLM = "Analyze text"
output = "Analysis"
llm = { llm_handle = "gpt-4", temperature = 0.7 }
prompt_template = "Analyze this text"
```

2. **Using predefined settings** from `pipelex_libraries/llm_deck/base_llm_deck.toml`:
```toml
[pipe.analyze]
PipeLLM = "Analyze text"
output = "Analysis"
llm = "llm_for_analysis" # References a preset from llm_deck
prompt_template = "Analyze this text"
```
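
The two forms differ only in where the settings live: an inline table carries them directly, while a string names a preset. As a rough mental model, assuming presets behave like a name-to-settings table, the resolution could be sketched as follows. `LLM_PRESETS` and `resolve_llm_setting` are hypothetical, and the preset values are placeholders rather than the real contents of `base_llm_deck.toml`.

```python
from typing import Any, Dict, Union

# Hypothetical preset table; the values are placeholders, not real deck contents.
LLM_PRESETS: Dict[str, Dict[str, Any]] = {
    "llm_for_analysis": {"llm_handle": "gpt-4", "temperature": 0.7},
}


def resolve_llm_setting(
    llm: Union[str, Dict[str, Any]],
    presets: Dict[str, Dict[str, Any]],
) -> Dict[str, Any]:
    """Return concrete settings for either an inline table or a preset name."""
    if isinstance(llm, str):
        # A string is treated as the name of a preset.
        if llm not in presets:
            raise KeyError(f"Unknown LLM preset: {llm}")
        return dict(presets[llm])
    # An inline table already carries the settings.
    return dict(llm)


print(resolve_llm_setting("llm_for_analysis", LLM_PRESETS))
print(resolve_llm_setting({"llm_handle": "gpt-4", "temperature": 0.7}, LLM_PRESETS))
```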

### System Prompts
Add system-level instructions:
```toml
[pipe.expert_analysis]
PipeLLM = "Expert analysis"
output = "Analysis"
system_prompt = "You are a data analysis expert"
prompt_template = "Analyze this data"
```

### Multiple Outputs
Generate multiple results:
```toml
[pipe.generate_ideas]
PipeLLM = "Generate ideas"
output = "Idea"
nb_output = 3 # Generate exactly 3 ideas
# OR
multiple_output = true # Let the LLM decide how many to generate
```

### Vision Tasks
Process images with VLMs:
```toml
[pipe.analyze_image]
PipeLLM = "Analyze image"
inputs = { image = "Image" } # `image` is the name of the stuff that contains the Image. If it is nested inside a stuff, you can use something like `{ "page.image" = "Image" }`
output = "ImageAnalysis"
prompt_template = "Describe what you see in this image"
```

# Important tip

Always run the CLI command `pipelex validate` when you finish writing pipelines. It checks for errors; if it reports any, fix them and run it again.
39 changes: 39 additions & 0 deletions .cursor/rules/pipe-ocr.mdc
@@ -0,0 +1,39 @@
---
description:
globs:
alwaysApply: false
---
# PipeOCR Guide

## Purpose

Extract text and images from an image or a PDF.

## Basic Usage

### Simple Text Extraction
```toml
[pipe.extract_info]
PipeOcr = "extract the information"
inputs = { ocr_input = "PDF" } # or { ocr_input = "Image" } if it's an image. This is the only input
output = "Page"
```

The output concept `Page` is a native concept with the structure `PageContent`. Each `Page` corresponds to one page, so PipeOcr outputs a `ListContent` of `Page`.

```python
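from typing import List, Optional

# StuffContent, StructuredContent, TextContent and ImageContent are Pipelex content classes.
class TextAndImagesContent(StuffContent):
    text: Optional[TextContent]
    images: Optional[List[ImageContent]]

class PageContent(StructuredContent):
    text_and_images: TextAndImagesContent
    page_view: Optional[ImageContent] = None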
```
- `text_and_images` holds the text and the related images found in the input image or PDF.
- `page_view` is a screenshot of the whole PDF page or image.

# Important tip

Always run the CLI command `pipelex validate` when you finish writing pipelines. It checks for errors; if it reports any, fix them and run it again.
5 changes: 5 additions & 0 deletions .cursor/rules/pipe-parallel.mdc
@@ -0,0 +1,5 @@
---
description:
globs:
alwaysApply: false
---
58 changes: 58 additions & 0 deletions .cursor/rules/pipe-sequence.mdc
@@ -0,0 +1,58 @@
---
description:
globs:
alwaysApply: false
---
# PipeSequence Guide

## Purpose
PipeSequence executes multiple pipes in a defined order, where each step can use results from previous steps.

## Basic Structure
```toml
[pipe.your_sequence_name]
PipeSequence = "Description of what this sequence does"
inputs = { input_name = "InputType" } # All the inputs of the sub pipes, except the ones generated by intermediate steps
output = "OutputType"
steps = [
{ pipe = "first_pipe", result = "first_result" },
{ pipe = "second_pipe", result = "second_result" },
{ pipe = "final_pipe", result = "final_result" }
]
```

## Key Components

1. **Steps Array**: List of pipes to execute in sequence
- `pipe`: Name of the pipe to execute
- `result`: Name under which the pipe's output is stored in the working memory

2. **Working Memory**: Each step can access:
- Original sequence inputs
- Results from previous steps, referenced in subsequent steps by their `result` names
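
The way steps share data through the working memory can be modeled with a dictionary that grows after each step. This is a conceptual sketch in plain Python, not Pipelex's implementation: `run_sequence`, the `Step` alias, and the three lambda "pipes" are hypothetical stand-ins.

```python
from typing import Any, Callable, Dict, List, Tuple

WorkingMemory = Dict[str, Any]
Step = Tuple[Callable[[WorkingMemory], Any], str]  # (pipe, result name)


def run_sequence(inputs: WorkingMemory, steps: List[Step]) -> WorkingMemory:
    """Run each step in order, storing its output under its result name."""
    memory = dict(inputs)  # start from the original sequence inputs
    for pipe, result_name in steps:
        # Each pipe can read the inputs and every earlier result.
        memory[result_name] = pipe(memory)
    return memory


steps: List[Step] = [
    (lambda m: m["input_name"].strip(), "first_result"),
    (lambda m: m["first_result"].title(), "second_result"),
    (lambda m: f"{m['second_result']}!", "final_result"),
]
memory = run_sequence({"input_name": "  hello world "}, steps)
print(memory["final_result"])  # Hello World!
```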

## Using PipeBatch in Steps

You can use PipeBatch functionality within steps using `batch_over` and `batch_as`:

```toml
steps = [
{ pipe = "process_items", batch_over = "input_list", batch_as = "current_item", result = "processed_items" }
]
```

1. **batch_over**: Specifies a `ListContent` field to iterate over. Each item in the list will be processed individually and IN PARALLEL by the pipe.
- Must be a `ListContent` type containing the items to process
- Can reference inputs or results from previous steps

2. **batch_as**: Defines the name that will be used to reference the current item being processed
- This name can be used in the pipe's input mappings
- Makes each item from the batch available as a single element

The result of a batched step will be a `ListContent` containing the outputs from processing each item.

# Important tip

Always run the CLI command `pipelex validate` when you finish writing pipelines. It checks for errors; if it reports any, fix them and run it again.