287 changes: 287 additions & 0 deletions docs/contribute/source/plugin/README.md
@@ -0,0 +1,287 @@
# wasmedge_ocr Plugin

Copilot AI Feb 3, 2026

The filename "README.md" is inconsistent with other plugin documentation files in this directory. All other plugin documentation files follow the pattern of using a descriptive name (e.g., "ebpf.md", "process.md", "rusttls.md", "wasi_logging.md"). This file should be renamed to "wasmedge_ocr.md" to follow the established convention in the codebase.

References:

  • docs/contribute/source/plugin/ebpf.md
  • docs/contribute/source/plugin/process.md
  • docs/contribute/source/plugin/rusttls.md
  • docs/contribute/source/plugin/wasi_logging.md

Copilot uses AI. Check for mistakes.

Copilot AI Feb 3, 2026

Missing frontmatter metadata. All other plugin documentation files in this directory include YAML frontmatter with at least a "sidebar_position" field. This file should include the frontmatter at the top of the file (before line 1) to be consistent with the established pattern.

Example from other plugin docs:

---
sidebar_position: X
---

References:

  • docs/contribute/source/plugin/ebpf.md:1-3
  • docs/contribute/source/plugin/process.md:1-3
  • docs/contribute/source/plugin/wasi_logging.md:1-3

Copilot AI Feb 3, 2026

The main title format is inconsistent with other plugin documentation files. Other plugin docs use titles that start with "Build..." (e.g., "Build with eBPF Plug-in", "Build WasmEdge With WasmEdge-Process Plug-in", "Build WasmEdge With WASI-Logging Plug-in"). This title should follow a similar pattern, such as "Build WasmEdge With wasmedge_ocr Plug-in" to maintain consistency.

References:

  • docs/contribute/source/plugin/ebpf.md:5
  • docs/contribute/source/plugin/process.md:5
  • docs/contribute/source/plugin/wasi_logging.md:5

Copilot uses AI. Check for mistakes.

The `wasmedge_ocr` plugin provides Optical Character Recognition (OCR) capabilities to WasmEdge applications by integrating with the Tesseract OCR engine. It allows WebAssembly modules to extract text and layout information from images located on the host filesystem.

Copilot AI Feb 3, 2026

Inconsistent terminology. The established convention in this codebase is to use "plug-in" (hyphenated) when referring to plugins in descriptive text. The word "plugin" (unhyphenated) should be changed to "plug-in" throughout the document for consistency, except in technical contexts like environment variable names (WASMEDGE_PLUGIN_PATH), file paths, or code identifiers where the unhyphenated form is required.

References:

  • docs/contribute/source/plugin/ebpf.md:7
  • docs/contribute/source/plugin/process.md:7
  • docs/contribute/source/plugin/wasi_logging.md:6

## Overview

This plugin exposes host functions that enable Wasm modules to:
1. Trigger OCR processing on a specified image file.
2. Retrieve the results in TSV (Tab-Separated Values) format, which includes recognized text, confidence scores, and bounding box coordinates.

### Quick Start

Get the plugin running immediately with these steps.

1. **Install Dependencies**
   - **Linux (Ubuntu/Debian)**:
     ```bash
     sudo apt-get install libtesseract-dev libleptonica-dev tesseract-ocr-eng
     ```
   - **macOS**:
     ```bash
     brew install tesseract leptonica
     ```

2. **Build Plugin**

Copilot AI Feb 3, 2026

Inconsistent capitalization in section heading. The word "Plugin" should be "Plug-in" (hyphenated and lowercase 'l') to match the established convention in other plugin documentation files.

References:

  • docs/contribute/source/plugin/ebpf.md:9 ("Build the eBPF Plug-in")
  • docs/contribute/source/plugin/process.md:13 ("Build WasmEdge with WasmEdge-Process Plug-in")

   ```bash
   # From WasmEdge root
   cmake -DWASMEDGE_PLUGIN_WASMEDGE_OCR:BOOL=TRUE -B ./build -G "Unix Makefiles"
   cmake --build ./build
   ```

3. **Set Plugin Path**
   ```bash
   export WASMEDGE_PLUGIN_PATH=$(pwd)/build/plugins/wasmedge_ocr/
   ```

4. **Run**
   ```bash
   wasmedge app.wasm
   ```

### Intended Use Cases
- Extracting text from images such as scanned documents and photos.
- Getting bounding box coordinates for text in images (layout analysis).
- processing images where the file resides on the host system.

Copilot AI Feb 3, 2026

Inconsistent capitalization. The word "processing" should be capitalized to match the style and parallel structure of the other items in this list.

Suggested change (current line, then proposed replacement):
- processing images where the file resides on the host system.
- Processing images where the file resides on the host system.

### Supported Image Formats
The plugin uses Leptonica for image loading, supporting formats such as:
- PNG
- JPEG
- TIFF
- BMP
- GIF
- WebP
- PNM

## Architecture

The plugin links against `libtesseract` and `libleptonica`. It exposes a module named `wasmedge_ocr` with two stateful host functions.

### Integration
- **Direct Host Access**: The plugin follows a "host-passthrough" model where the Wasm module provides a file path string. The plugin reads this file directly from the **host filesystem**, effectively bypassing WASI file system sandboxing for the input file.
- **Stateful Execution**: The process requires a two-step call sequence: triggering the extraction and then fetching the result.
- **Output Format**: Hardcoded to return TSV data (level `RIL_WORD`), providing detailed word-level data.

> [!WARNING]
> **Security Notice**: This plugin accesses files on the **Host Filesystem** using direct paths provided by the Wasm module. This explicitly **bypasses the WASI sandbox** isolation. Only use this plugin with properly reviewed and trusted Wasm modules, as they can probe for files on your host system.
Comment on lines +66 to +67
Copilot AI Feb 3, 2026

Inconsistent admonition syntax. This documentation uses Docusaurus, which has a different syntax for admonitions. The GitHub-style alert syntax (> [!WARNING]) should be changed to Docusaurus-style admonition syntax for consistency with other documentation files.

The correct format should be:

:::warning
**Security Notice**: This plug-in accesses files on the **Host Filesystem** using direct paths provided by the Wasm module. This explicitly **bypasses the WASI sandbox** isolation. Only use this plug-in with properly reviewed and trusted Wasm modules, as they can probe for files on your host system.
:::

References:

  • docs/contribute/source/plugin/wasi_logging.md:10-13
  • docs/contribute/source/plugin/process.md:26-28

### Dependencies

To build and run this plugin, the host system must have the development libraries for Tesseract and Leptonica installed, along with the English language data for Tesseract.

#### Linux (Debian/Ubuntu)
```bash
sudo apt-get install libtesseract-dev libleptonica-dev tesseract-ocr-eng
```

#### macOS
```bash
brew install tesseract leptonica
```

#### Minimum Versions
- **Tesseract**: 4.x or higher (requires `libtesseract`).
- **Leptonica**: A version compatible with the installed Tesseract.

## Build Instructions

This plugin is built as part of the WasmEdge project source tree.

### 1. Enable the Plugin in CMake
When configuring the WasmEdge build, you must enable the OCR plugin using the `WASMEDGE_PLUGIN_WASMEDGE_OCR` flag.

```bash
cmake -DWASMEDGE_PLUGIN_WASMEDGE_OCR:BOOL=TRUE -B ./build -G "Unix Makefiles"
```

### 2. Build WasmEdge
```bash
cmake --build ./build
```

### 3. Verify Build
After building, check that the plugin library exists:
```bash
ls ./build/plugins/wasmedge_ocr/libwasmedgePluginWasmEdgeOCR.so
# On macOS, it will be .dylib
```

## API Reference

**Module Name**: `wasmedge_ocr`

### API Call Flow
This API is **stateful** and must be called in a specific order:
1. **Initialize & Process**: Call `num_of_extractions` with the image path. This initializes Tesseract and runs the recognition.
2. **Buffer Preparation**: The return value tells you how many bytes to allocate.
3. **Retrieve Data**: Call `get_output` to copy the data into your buffer.
4. **Cleanup**: `get_output` automatically calls `TesseractApi->End()`, cleaning up resources. You cannot call it again for the same image.
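
The call order above can be sketched without the plug-in loaded. The following is a minimal illustration, not the plug-in's implementation: the `OcrHost` trait, `run_ocr`, and `MockHost` are hypothetical names introduced here, and the mock's error code `1` is an arbitrary placeholder rather than a documented value.

```rust
/// Hypothetical abstraction over the two host functions, so the call
/// sequence can be exercised without the plug-in loaded.
pub trait OcrHost {
    /// Mirrors `num_of_extractions`: runs OCR and returns the TSV length in bytes.
    fn num_of_extractions(&mut self, path: &str) -> u32;
    /// Mirrors `get_output`: copies the TSV into `buf` and returns 0 on success.
    fn get_output(&mut self, buf: &mut [u8]) -> u32;
}

/// Runs the two-step sequence once and returns the TSV text.
pub fn run_ocr<H: OcrHost>(host: &mut H, path: &str) -> Result<String, String> {
    // Step 1: trigger OCR and learn how many bytes to allocate.
    let len = host.num_of_extractions(path) as usize;
    if len == 0 {
        return Err(format!("no OCR output for {path}"));
    }
    // Step 2: allocate exactly that many bytes.
    let mut buf = vec![0u8; len];
    // Step 3: fetch the output; this also ends the session on the host side.
    match host.get_output(&mut buf) {
        0 => Ok(String::from_utf8_lossy(&buf).into_owned()),
        code => Err(format!("get_output failed with code {code}")),
    }
}

/// In-memory stand-in for the plug-in, used only for illustration.
pub struct MockHost {
    pending: Option<Vec<u8>>,
}

impl MockHost {
    pub fn new() -> Self {
        MockHost { pending: None }
    }
}

impl OcrHost for MockHost {
    fn num_of_extractions(&mut self, path: &str) -> u32 {
        if path.is_empty() {
            return 0;
        }
        // A single made-up TSV row standing in for real OCR output.
        let tsv = b"5\t1\t1\t1\t1\t1\t10\t20\t30\t12\t96\tHello\n".to_vec();
        let len = tsv.len() as u32;
        self.pending = Some(tsv);
        len
    }

    fn get_output(&mut self, buf: &mut [u8]) -> u32 {
        // `take()` models the one-shot behavior: a second call finds nothing.
        match self.pending.take() {
            Some(tsv) if buf.len() >= tsv.len() => {
                buf[..tsv.len()].copy_from_slice(&tsv);
                0
            }
            _ => 1, // placeholder error code, assumed for this sketch
        }
    }
}
```

In a real module, an `OcrHost` implementation would forward to the imported `num_of_extractions` and `get_output` functions; the mock only makes the ordering and the one-shot behavior easy to test.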

### 1. `num_of_extractions`

Triggers the OCR process on the image and returns the length of the result string.

```wasm
(func $num_of_extractions (param i32 i32) (result i32))
```

- **Parameters**:
  - `image_path_ptr` (i32): Pointer to the null-terminated string containing the absolute or relative path to the image file on the host.
  - `image_path_len` (i32): Length of the file path string.
- **Returns**:
  - `(i32)`: The length in bytes of the generated TSV output string. On failure, the call returns 0 or an error code instead of a length.

### 2. `get_output`

Retrieves the TSV output buffer generated by the previous call to `num_of_extractions`.

```wasm
(func $get_output (param i32 i32) (result i32))
```

- **Parameters**:
  - `out_buf_ptr` (i32): Pointer to the memory buffer where the result should be written.
  - `max_len` (i32): Maximum size of the buffer (should be at least the size returned by `num_of_extractions`).
- **Returns**:
  - `(i32)`: Returns 0 (`ErrNo::Success`) on success. Returns error codes otherwise.
- **Side Effect**:
  - **Clears State**: This function calls `TesseractApi->End()`, which cleans up the Tesseract instance. You cannot call `get_output` multiple times for the same extraction.

### TSV Output Format
The `get_output` function returns raw TSV (Tab-Separated Values) data generated by Tesseract at the `RIL_WORD` (Word) level.

| Column | Description |
| :--- | :--- |
| **level** | Hierarchy level (always 5 for Word) |
| **page_num** | Page number in the document |
| **block_num** | Block number |
| **par_num** | Paragraph number |
| **line_num** | Line number |
| **word_num** | Word number |
| **left** | X coordinate of the top-left corner |
| **top** | Y coordinate of the top-left corner |
| **width** | Width of the bounding box |
| **height** | Height of the bounding box |
| **conf** | Confidence score (0-100) |
| **text** | The recognized text string |
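
As a rough illustration of consuming these columns, the sketch below parses each row into a struct. `OcrWord`, `parse_row`, and `parse_tsv` are hypothetical names, not part of the plug-in. The parser assumes the twelve columns appear in the order listed above, and it skips any row whose numeric columns fail to parse, so a header row (if one is present) is ignored.

```rust
/// One parsed TSV row; field names follow the column table above.
#[derive(Debug, PartialEq)]
pub struct OcrWord {
    pub level: u32,
    pub page_num: u32,
    pub block_num: u32,
    pub par_num: u32,
    pub line_num: u32,
    pub word_num: u32,
    pub left: i32,
    pub top: i32,
    pub width: i32,
    pub height: i32,
    pub conf: f32,
    pub text: String,
}

/// Parses a single row, returning `None` for headers or malformed rows.
pub fn parse_row(line: &str) -> Option<OcrWord> {
    let line = line.trim_end_matches(|c| c == '\r' || c == '\n');
    // `splitn(12, ..)` keeps any further tabs inside the final text column.
    let cols: Vec<&str> = line.splitn(12, '\t').collect();
    if cols.len() != 12 {
        return None;
    }
    Some(OcrWord {
        level: cols[0].parse().ok()?,
        page_num: cols[1].parse().ok()?,
        block_num: cols[2].parse().ok()?,
        par_num: cols[3].parse().ok()?,
        line_num: cols[4].parse().ok()?,
        word_num: cols[5].parse().ok()?,
        left: cols[6].parse().ok()?,
        top: cols[7].parse().ok()?,
        width: cols[8].parse().ok()?,
        height: cols[9].parse().ok()?,
        // Parsed as a float so both integer and fractional scores are accepted.
        conf: cols[10].parse().ok()?,
        text: cols[11].to_string(),
    })
}

/// Parses every row of the TSV output, silently dropping unparseable rows.
pub fn parse_tsv(tsv: &str) -> Vec<OcrWord> {
    tsv.lines().filter_map(parse_row).collect()
}
```

A caller could then filter the result, for example keeping only words whose `conf` exceeds a chosen threshold, or use `left`, `top`, `width`, and `height` for layout analysis.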

## Usage Examples

### Step-by-Step Workflow

1. **Prepare Image**: Have an image file (e.g., `test.png`) on the host.
2. **Call `num_of_extractions`**: Pass the path to the image. Receive the result length.
3. **Allocate Memory**: Create a buffer of the received length.
4. **Call `get_output`**: Pass the buffer pointer and length to retrieve data.

### Rust Example

```rust
#[link(wasm_import_module = "wasmedge_ocr")]
extern "C" {
    pub fn num_of_extractions(path_ptr: *const u8, path_len: usize) -> u32;
    pub fn get_output(out_ptr: *mut u8, max_len: usize) -> u32;
}

pub fn main() {
    let image_path = "test.png";

    unsafe {
        // 1. Trigger OCR and get result length
        let len = num_of_extractions(image_path.as_ptr(), image_path.len());

        if len > 0 {
            // 2. Allocate buffer
            let mut buf = vec![0u8; len as usize];

            // 3. Retrieve output
            let res = get_output(buf.as_mut_ptr(), len as usize);

            if res == 0 {
                let output = String::from_utf8_lossy(&buf);
                println!("OCR Result (TSV):\n{}", output);
            } else {
                eprintln!("Failed to get output, error code: {}", res);
            }
        }
    }
}
```

### Execution

Run the compiled Wasm file using the WasmEdge CLI with the plugin path set.

```bash
# Set plugin path if installed in a custom location, otherwise default is used
export WASMEDGE_PLUGIN_PATH=./build/plugins/wasmedge_ocr/

# Run the wasm file
wasmedge app.wasm
```

## Performance & Limitations

### 1. Language Support
- **English Only**: The plugin hardcodes the initialization language to `"eng"`. It requires `tesseract-ocr-eng` data to be present on the host. Multi-language or custom trained data selection is **not** currently exposed via the API.

### 2. Output Format
- **TSV Fixed**: The output is strictly Tesseract's TSV (Tab-Separated Values) format, containing word confidence, bounding boxes, and text. Plain text extraction is not offered as a separate option; callers must parse the TSV themselves.
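
One possible way to recover plain text on the Wasm side is sketched below. `tsv_to_text` is a hypothetical helper, not something the plug-in provides. It assumes the twelve-column layout described in the API reference and that rows arrive in reading order, and it groups words into lines by their page, block, paragraph, and line numbers.

```rust
/// Rebuilds plain text from TSV output: words on the same line are joined
/// with spaces, and lines are joined with newlines.
pub fn tsv_to_text(tsv: &str) -> String {
    let mut lines: Vec<String> = Vec::new();
    let mut current_key: Option<(u32, u32, u32, u32)> = None;

    for row in tsv.lines() {
        let cols: Vec<&str> = row.splitn(12, '\t').collect();
        if cols.len() != 12 {
            continue; // not a data row
        }
        // Skip header or malformed rows whose leading numeric columns do not parse.
        let nums: Vec<u32> = match cols[..5]
            .iter()
            .map(|c| c.parse().ok())
            .collect::<Option<Vec<u32>>>()
        {
            Some(n) => n,
            None => continue,
        };
        let text = cols[11].trim();
        if text.is_empty() {
            continue;
        }
        // page_num, block_num, par_num, line_num identify one text line.
        let key = (nums[1], nums[2], nums[3], nums[4]);
        if current_key == Some(key) {
            let last = lines.last_mut().expect("a line exists once a key is set");
            last.push(' ');
            last.push_str(text);
        } else {
            lines.push(text.to_string());
            current_key = Some(key);
        }
    }
    lines.join("\n")
}
```

This discards confidence and position data, so it suits cases where only the recognized text matters.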

### 3. One-Shot Lifecycle
- The `get_output` function calls `TesseractApi->End()`. This destroys the Tesseract instance associated with the module environment.
- **Implication**: Processing multiple images with one module instance may fail, because the current code initializes Tesseract only in the constructor and does not re-initialize it after `End()`. Treat each module instance as single-use, or test sequential calls carefully before relying on them.

### 4. File Access
- The plugin uses `pixRead` with the path provided. This file must exist on the **Host Filesystem** where WasmEdge is running. It does not read from the Wasm virtual filesystem (WASI). If running in a container, map the image file into the container.

### Compatibility Matrix

| Feature | Support | Notes |
| :--- | :--- | :--- |
| **Interpreter Mode** | ✅ Supported | Standard execution |
| **AOT / JIT** | ✅ Supported | Validated on x86_64, aarch64 |
| **WASI Filesystem** | ❌ Not Supported | Files read directly from Host FS |
| **Host OS** | Linux, macOS | Windows support experimental/untested |

## Common Pitfalls

* **Calling `get_output` twice**: Will cause a crash or undefined behavior because the Tesseract instance is destroyed after the first call.
* **Reuse of Module Instance**: The module is designed for single-use per Tesseract session. Re-instantiate the module for processing new images if you encounter issues.
* **Relative Paths**: Paths are relative to the *working directory of the `wasmedge` process*, not the Wasm file location.
* **Missing Data**: Forgetting to install `tesseract-ocr-eng` will cause silent initialization failures (calls return 0).
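
The double-call pitfall can be ruled out at compile time on the Rust side. The sketch below is one possible guard, not part of the plug-in: `PendingOutput` is a hypothetical type whose `take` method consumes the value, so a second fetch for the same extraction does not compile. The `fetch` closure stands in for a call to `get_output`.

```rust
/// Holds the length returned by `num_of_extractions` plus a one-shot fetcher.
pub struct PendingOutput<F>
where
    F: FnOnce(&mut [u8]) -> u32,
{
    len: usize,
    fetch: F,
}

impl<F> PendingOutput<F>
where
    F: FnOnce(&mut [u8]) -> u32,
{
    /// `len` is the byte count reported by the first host call.
    pub fn new(len: usize, fetch: F) -> Self {
        PendingOutput { len, fetch }
    }

    /// Fetches the output exactly once. Taking `self` by value means the
    /// compiler rejects any attempt to fetch a second time.
    pub fn take(self) -> Result<Vec<u8>, u32> {
        let mut buf = vec![0u8; self.len];
        match (self.fetch)(&mut buf) {
            0 => Ok(buf),
            code => Err(code),
        }
    }
}
```

In a real module the closure would wrap the unsafe `get_output` import, for example `|buf: &mut [u8]| unsafe { get_output(buf.as_mut_ptr(), buf.len()) }`.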

## Troubleshooting

### Common Build Failures

**1. `Tesseract` or `Leptonica` not found**
- **Error**: `Could NOT find Tesseract (missing: TESSERACT_LIBRARIES TESSERACT_INCLUDE_DIRS)`
- **Fix**: Ensure development headers are installed.
  - Ubuntu: `sudo apt install libtesseract-dev libleptonica-dev`
  - macOS: `brew install tesseract leptonica`
- **Fix**: If using custom paths, set `PKG_CONFIG_PATH` to help CMake find the libraries.

### Runtime Failure Modes

**1. Initialization Error (Error Code 1 or 2)**
- **Symptom**: `num_of_extractions` returns 0 or logs `[WasmEdge-OCR] Error occurred when initializing tesseract.`
- **Cause**: Missing `tessdata` (specifically `eng.traineddata`).
- **Fix**: Install the language data packages (`sudo apt install tesseract-ocr-eng`) or set the `TESSDATA_PREFIX` environment variable to the directory containing `eng.traineddata`.

**2. File Not Found**
- **Symptom**: `pixRead` fails, and `num_of_extractions` may return 0 or an unexpected length.
- **Cause**: The path provided is relative to the *host's* current working directory, not the Wasm file location.
- **Fix**: Use absolute paths for images or ensure the WasmEdge runner is executed from the correct directory.

**3. "Symbol not found" when running Wasm**
- **Cause**: The plugin is not loaded.
- **Fix**: Ensure the `WASMEDGE_PLUGIN_PATH` environment variable points to the directory containing `libwasmedgePluginWasmEdgeOCR.so` (or `.dylib`).

## Future Improvements

* Expose Tesseract language selection via API.
* Implement direct memory buffer support (passing image bytes instead of paths).
* Add support for other output formats (text, HOCR, PDF).
* Allow re-initialization of Tesseract engine without destroying module instance.