feat: add directory parsing support to OpenViking (#194)
* feat: add directory parsing support to OpenViking
- Implemented DirectoryParser to handle local directories with mixed document types.
- Enhanced add_resource function to support directory imports with options for including, excluding, and ignoring specific directories.
- Updated client and service layers to forward additional parsing options.
- Added unit tests for DirectoryParser to ensure correct functionality and error handling.
- Improved user feedback with rich table summaries for processed, failed, unsupported, and skipped files during directory imports.
* docs: update README.md to include directory import instructions for add.py
* style: reformat files to pass CI code formatting
* style: reformat files to pass CI code formatting
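The commit message mentions include, exclude, and ignored-directory options for directory imports, but the `DirectoryParser` code is not shown on this page. The following is a minimal sketch of how such filtering could work, not the project's implementation. The function name `collect_files` and its semantics are assumptions: patterns match the file name only, an empty include list means "everything", exclude takes precedence over include, and ignored directories are pruned by exact name.

```python
import fnmatch
import os


def collect_files(root, include=None, exclude=None, ignore_dirs=None):
    """Return sorted relative paths under root that pass the filters.

    Hypothetical helper; the semantics below are assumptions, not
    confirmed behaviour of OpenViking's DirectoryParser.
    """
    include = list(include or [])
    exclude = list(exclude or [])
    ignored = set(ignore_dirs or [])
    selected = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into ignored directories.
        dirnames[:] = [d for d in dirnames if d not in ignored]
        for name in filenames:
            # An empty include list is treated as "include everything".
            if include and not any(fnmatch.fnmatch(name, p) for p in include):
                continue
            # Exclude patterns win over include patterns.
            if any(fnmatch.fnmatch(name, p) for p in exclude):
                continue
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            selected.append(rel.replace(os.sep, "/"))
    return sorted(selected)
```

Pruning `dirnames` in place avoids walking large trees such as `node_modules` at all, rather than walking them and discarding the results afterwards.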
examples/query/README.md: 31 additions & 1 deletion
@@ -20,6 +20,35 @@ uv run query.py "What do we have here?" --score-threshold 0.5
 mv data/ data.bak/ # or rm -rf if you want
 ```
 
+## Add Directory
+
+`add.py` supports adding an entire directory of documents at once. Files are automatically classified and parsed by their type (PDF, Markdown, Text, code, etc.). A summary table is printed after import showing which files were processed, failed, unsupported, or filtered.
+
+```bash
+# Add all supported files in a directory
+uv run add.py ~/Documents/research/
+
+# Only include specific file types
+uv run add.py ~/project/ --include '*.md' --include '*.pdf'
+
+# Exclude certain files
+uv run add.py ~/project/ --exclude 'test_*' --exclude '*.pyc'
+
+# Skip specific sub-directories
+uv run add.py ~/project/ --ignore-dirs node_modules --ignore-dirs .git
+
+# Combine options
+uv run add.py ~/project/ --include '*.md' --exclude 'draft_*' --ignore-dirs vendor
+```
+
+### Directory Options
+
+| Option | Description |
+|--------|-------------|
+|`--include PATTERN`| Glob pattern for files to include (can be repeated) |
+|`--exclude PATTERN`| Glob pattern for files to exclude (can be repeated) |
+|`--ignore-dirs NAME`| Directory names to skip (can be repeated) |
+
 ### Query Options
 
 | Option | Default | Description |
@@ -50,7 +79,7 @@ Edit `ov.conf` to configure:
 
 ```
 rag.py # RAG pipeline library
-add.py # Add documents CLI
+add.py # Add documents/directories CLI
 query.py # Query CLI
 q # Quick query wrapper
 logging_config.py # Logging configuration
@@ -64,3 +93,4 @@ data/ # Database storage
 - Use `uv run query.py` for more control
 - Set `OV_DEBUG=1` only when debugging
 - Resources are indexed once, query unlimited times
+- When adding directories, use `--include` / `--exclude` to control which files are imported
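
The README changes describe `--include`, `--exclude`, and `--ignore-dirs` as repeatable options, but the argument parsing in `add.py` is not part of this diff. As a rough sketch only, repeatable flags are commonly declared with `argparse` and `action="append"`. The `build_parser` function and the overall parser layout here are assumptions, not the project's code.

```python
import argparse


def build_parser():
    """Build a parser with repeatable directory-filter flags (hypothetical layout)."""
    parser = argparse.ArgumentParser(prog="add.py")
    parser.add_argument("path", help="File or directory to add")
    # action="append" collects every occurrence of a flag into a list.
    parser.add_argument("--include", action="append", default=[], metavar="PATTERN",
                        help="Glob pattern for files to include (can be repeated)")
    parser.add_argument("--exclude", action="append", default=[], metavar="PATTERN",
                        help="Glob pattern for files to exclude (can be repeated)")
    parser.add_argument("--ignore-dirs", action="append", default=[], metavar="NAME",
                        dest="ignore_dirs",
                        help="Directory names to skip (can be repeated)")
    return parser


if __name__ == "__main__":
    # Example invocation mirroring the "Combine options" command in the README.
    args = build_parser().parse_args(
        ["project/", "--include", "*.md", "--exclude", "draft_*", "--ignore-dirs", "vendor"]
    )
    print(args.path, args.include, args.exclude, args.ignore_dirs)
```

With this layout each flag yields a plain list, which can be passed through unchanged to whatever filtering layer consumes it.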