Commit 980a3a6

Update the organization
1 parent ea33373 commit 980a3a6

4 files changed (+156 / -136 lines)

Modules/Sources/WordPressIntelligence/README.md

Lines changed: 77 additions & 94 deletions
@@ -4,7 +4,7 @@ AI-powered content intelligence for WordPress using Apple Foundation Models.
 
 ## Features
 
-- **Excerpt Generation** - Generate 3 excerpt variations in multiple languages and styles
+- **Excerpt Generation** - Generate 3 excerpt variations in 8 languages with configurable length/style
 - **Tag Suggestions** - AI-powered tag recommendations
 - **Post Summaries** - Automatic content summarization
 
@@ -20,13 +20,16 @@ let generator = ExcerptGeneration(length: .medium, style: .engaging)
 let excerpts = try await generator.generate(for: postContent)
 ```
 
-**Languages**: English, Spanish, French, German, Japanese, Mandarin, Russian
+**Languages**: English, Spanish, French, German, Italian, Portuguese, Japanese, Chinese
 **Lengths**: Short (15-35 words), Medium (40-80 words), Long (90-130 words)
 **Styles**: Engaging, Professional, Conversational, Formal, Witty
 
 ## Testing
 
-### Standard Tests
+### Standard XCTest
+
+Run standard tests that verify language, length, and diversity:
+
 ```bash
 cd Modules
 xcodebuild test \
@@ -35,126 +38,106 @@ xcodebuild test \
 -only-testing:WordPressIntelligenceTests
 ```
 
-Tests excerpt generation for 8 languages with automatic language verification, length compliance, and performance measurement. Tests emit both formatted output and structured JSON markers for optional Claude-based evaluation.
+### Quality Evaluation
 
-### Quality Evaluation (Optional)
+Evaluate AI-generated content quality using Claude scoring. Requires [Claude CLI](https://github.com/anthropics/claude-cli).
 
-Evaluate AI-generated content quality with Claude CLI:
+**Location**: `Modules/Tests/WordPressIntelligenceTests/`
 
 ```bash
-# Setup (one-time)
-pip install claude-cli && claude configure
-
-# Run evaluations
+# Quick start
 cd Modules/Tests/WordPressIntelligenceTests
-
-# Run all excerpt tests (default)
-./evaluate-with-claude.sh
-
-# Run tag suggestion tests
-./evaluate-with-claude.sh --test-type tags
-
-# Run post summary tests
-./evaluate-with-claude.sh --test-type summary
-
-# Run a specific test
-./evaluate-with-claude.sh --only-testing "ExcerptGenerationTests/excerptGenerationEnglish(parameters:)"
-./evaluate-with-claude.sh --test-type tags --only-testing "IntelligenceSuggestedTagsTests/testEnglishPost()"
-./evaluate-with-claude.sh --test-type summary --only-testing "PostSummaryTests/testEnglishBlogPost()"
+make # Show all available commands
+make eval # Run full evaluation (all test types)
+make eval-quick # Run English excerpt evaluation
+make eval TESTS="excerpts" # Run only excerpt tests
+make eval TESTS="excerpts tags" # Run excerpt and tag tests
+make eval-tags # Evaluate tag suggestions
+make eval-summary # Evaluate post summaries
+make open # Open latest HTML report
 ```
 
-#### Options
+**Common targets**:
+- `make eval` - Run full evaluation for all test types (excerpts, tags, summary)
+- `make eval TESTS="excerpts"` - Run only specific test types
+- `make eval-quick` - Fast evaluation (English excerpts only)
+- `make rebuild-improve` - Regenerate HTML with mock improvements (for UI development)
+- `make open` - Open latest evaluation report
+- `make help` - Show all available commands
 
-| Option | Values | Description |
-|--------|--------|-------------|
-| `--test-type` | `excerpts`, `tags`, `summary` | Test type to run (default: `excerpts`) |
-| `--model` | `sonnet`, `opus`, `haiku` | Claude model for evaluation (default: `sonnet`) |
-| `--simulator` | Simulator name | iOS simulator (default: `iPhone 16 Pro`) |
-| `--only-testing` | Test identifier | Run specific test only |
-| `--skip-tests` | - | Skip test execution, re-evaluate previous results |
+For advanced options and HTML report development, see:
+- `Modules/Tests/WordPressIntelligenceTests/Makefile`
+- `Modules/Tests/WordPressIntelligenceTests/lib/DEVELOPMENT.md`
 
-#### Examples
+### Evaluation Output
 
-```bash
-# Use Claude Opus for evaluation
-./evaluate-with-claude.sh --model opus
+Results are saved to `/tmp/WordPressIntelligence-Tests/evaluation-<timestamp>/`:
 
-# Use different simulator
-./evaluate-with-claude.sh --simulator "iPhone 15"
+- **`evaluation-report.html`** - Interactive report with filtering, sorting, baseline comparison
+- **`evaluation-results.json`** - Machine-readable data for CI/CD
+- Console output with quick summary
 
-# Re-evaluate existing results
-./evaluate-with-claude.sh --skip-tests
-```
+**HTML Report Features**:
+- Sortable columns (test name, status, score, duration)
+- Filter by language, status, or comparison results
+- Baseline comparison with delta indicators (↑ improved, ↓ regressed, = unchanged)
+- Click any test to see detailed scores, generated content, and Claude feedback
+- Score distribution dots (●●●) show pass/warn/fail for each excerpt
 
-#### Evaluation Output
+### Scoring
 
-Results saved to `/tmp/WordPressIntelligence-Tests/evaluation-<timestamp>/`:
+Quality scores use weighted criteria (1-10 scale):
 
-- **`evaluation-results.json`** - Machine-readable data for CI/CD integration
-- **`evaluation-report.html`** - Interactive report with filters, sorting, and baseline comparison
-- **Console output** - Quick summary with statistics and category averages
+**Excerpt Generation**:
+- Language Match (3.0×), Grammar (2.0×), Relevance (2.0×) - critical factors
+- Hook Quality (1.5×), Key Info (1.5×), Length, Style, Standalone, Engagement (1.0× each)
+- Diversity: structural, angle, length, lexical variation
 
-Open HTML report: `open /tmp/WordPressIntelligence-Tests/evaluation-*/evaluation-report.html`
+**Pass criteria**: Overall ≥ 7.0 AND no critical failures
+**Needs Improvement**: 6.0-6.9 OR any score < 4.0
+**Failed**: Language < 8.0 OR Grammar < 6.0 OR Overall < 6.0
 
-#### HTML Report Features
+*Note: Tag and summary evaluations use different criteria optimized for their use cases.*
 
-The interactive HTML report displays evaluation results in a sortable table:
+## Extending Tests
 
-**Table Columns:**
-- **Test Name** - Test identifier and parameters
-- **Status** - Pass/fail badge with score distribution indicators
-- For excerpt tests: colored dots (●●●) show individual excerpt scores
-- Green (●): score ≥ 7.0 (passed)
-- Yellow (●): 6.0 ≤ score < 7.0 (needs improvement)
-- Red (●): score < 6.0 (failed)
-- Hover over dots to see individual scores
-- **Score** - Average score across all excerpts/attempts
-- **Δ Baseline** - Score change vs. baseline (shown only when comparison enabled)
-- Green (↑): improvement
-- Red (↓): regression
-- Gray (=): unchanged
-- **Duration** - Test execution time
+### Adding Test Cases
 
-**Baseline Comparison:**
-1. Load a baseline JSON file using the file picker
-2. Table automatically shows the Δ Baseline column with score deltas
-3. Click "Clear Comparison" to hide the baseline column and return to single-report view
+1. Add test data to `lib/config.py`:
+```python
+"new_test_case": TestConfig(
+original_content="...",
+language="english",
+# ... other parameters
+)
+```
 
-## Evaluating Results
+2. Update `Makefile` if adding new test type:
+```makefile
+eval-newtype:
+@./lib/evaluate-with-claude.sh --test-type newtype
+```
 
-**Standard Tests** - Automatic verification for excerpts:
-- ✅ Language matches input (via NLLanguageRecognizer)
-- ✅ Word count within target range (with warnings for minor deviations)
-- ✅ 3 diverse variations generated (Levenshtein distance ≥ 15%)
-- ✅ HTML properly stripped
+### Customizing Evaluation Criteria
 
-**Evaluation Tests** - Weighted quality scores (1-10, example for excerpts):
+Edit scoring logic in `lib/evaluators.py`. Each test type has its own evaluator class with weighted criteria and thresholds.
 
-Status thresholds:
-- **Passed**: Overall ≥ 7.0 AND no critical failures
-- ⚠️ **Needs Improvement**: 6.0 ≤ Overall < 7.0 OR any score < 4.0
-- **Failed**: Language < 8.0 OR Grammar < 6.0 OR Overall < 6.0
+### Developing HTML Report
 
-Weighted criteria (higher weight = more important):
-- **Language Match** (3.0×) - Correct language (critical)
-- **Grammar** (2.0×) - No grammatical errors (critical)
-- **Relevance** (2.0×) - Captures main message
-- **Hook Quality** (1.5×) - Enticing opening
-- **Key Info** (1.5×) - Preserves critical facts
-- Length, Style, Standalone, Engagement (1.0× each)
+For fast iteration on HTML report UI without re-running tests:
 
-Diversity evaluation:
-- Structural (opening styles)
-- Angle (different emphases)
-- Length (sentence variety)
-- Lexical (vocabulary variation)
+```bash
+make rebuild-improve # Regenerate with mock improvements
+# Edit lib/evaluation-viewer.html
+make rebuild-improve # Instant preview
+```
 
-*Note: Tags and summary evaluations use different criteria tailored to their specific requirements.*
+See `lib/DEVELOPMENT.md` for complete HTML development workflow.
 
 ## Troubleshooting
 
 **Tests skipped**: Missing iOS 26 or Apple Intelligence support
-**Language issues**: Check prompt in `ExcerptGeneration.swift`
-**Evaluation fails**: Run `claude configure` to authenticate
+**Language issues**: Check prompt in `Sources/WordPressIntelligence/ExcerptGeneration.swift`
+**Evaluation fails**: Install/configure Claude CLI: `pip install claude-cli && claude configure`
 
-See `CLAUDE.md` for project guidelines.
+See `CLAUDE.md` for project development guidelines.
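
The README's new **Scoring** section states the excerpt thresholds in prose only. A minimal Python sketch of how those rules could combine is below. It assumes the overall score is a weighted mean of the listed criteria, that the **Failed** conditions take precedence over everything else, and that "any score < 4.0" applies to each individual criterion. The criterion keys and function names are hypothetical and are not the actual API of `lib/evaluators.py`.

```python
from __future__ import annotations

# Excerpt criterion weights as listed in the README "Scoring" section.
# The snake_case keys are hypothetical names chosen for this sketch.
WEIGHTS: dict[str, float] = {
    "language_match": 3.0,
    "grammar": 2.0,
    "relevance": 2.0,
    "hook_quality": 1.5,
    "key_info": 1.5,
    "length": 1.0,
    "style": 1.0,
    "standalone": 1.0,
    "engagement": 1.0,
}


def weighted_overall(scores: dict[str, float]) -> float:
    """Weighted mean of 1-10 criterion scores (assumed aggregation rule)."""
    total_weight = sum(WEIGHTS.values())
    return sum(weight * scores[name] for name, weight in WEIGHTS.items()) / total_weight


def classify(scores: dict[str, float]) -> str:
    """Map criterion scores to passed / needs_improvement / failed."""
    overall = weighted_overall(scores)
    # Hard failures first: a wrong language or poor grammar fails the excerpt
    # no matter how well it scores elsewhere.
    if scores["language_match"] < 8.0 or scores["grammar"] < 6.0 or overall < 6.0:
        return "failed"
    # A mid-range overall score, or any single criterion below 4.0, is a warning.
    if overall < 7.0 or min(scores.values()) < 4.0:
        return "needs_improvement"
    # Remaining case: overall >= 7.0 with no critical problems.
    return "passed"
```

With every criterion at 8.0 the result is `"passed"`; lowering only `language_match` to 7.0 returns `"failed"`, because the language floor is checked before the weighted mean is considered.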

Modules/Sources/WordPressIntelligence/UseCases/ExcerptGeneration.swift

Lines changed: 1 addition & 2 deletions
@@ -54,8 +54,7 @@ public struct ExcerptGeneration {
 \(PromptHelper.makeLocaleInstructions())
 
 **CRITICAL Requirements (MUST be followed exactly)**
-1. ⚠️ LANGUAGE: Generate excerpts in the SAME language as POST_CONTENT. If the post is in French, excerpts must be in French.
-If in Japanese, excerpts must be in Japanese. NO translation. NO defaulting to English. Match input language EXACTLY.
+1. ⚠️ LANGUAGE: Generate excerpts in the SAME language as POST_CONTENT. NO translation. NO defaulting to English. Match input language EXACTLY.
 
 2. ⚠️ LENGTH: Each excerpt MUST match the TARGET_LENGTH specification.
 - PRIMARY: Match the sentence count (e.g., "1-2 sentences" means write 1 or 2 complete sentences)
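
`ExcerptGeneration` produces three variations per post, and the standard tests described in the pre-commit README (the removed "Evaluating Results" section) required them to differ by a Levenshtein distance of at least 15%. The Python sketch below shows one way such a check could work. It assumes the percentage is the edit distance divided by the length of the longer excerpt and that every pair of variations must clear the threshold; `normalized_distance` and `are_diverse` are hypothetical names, not functions from the test target.

```python
from __future__ import annotations

from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute; cost 1 each)."""
    if len(a) < len(b):
        a, b = b, a  # keep the row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if chars match)
            ))
        previous = current
    return previous[-1]


def normalized_distance(a: str, b: str) -> float:
    """Edit distance as a fraction of the longer string (assumed normalization)."""
    longest = max(len(a), len(b))
    return 0.0 if longest == 0 else levenshtein(a, b) / longest


def are_diverse(excerpts: list[str], threshold: float = 0.15) -> bool:
    """True if every pair of excerpts differs by at least `threshold`."""
    return all(normalized_distance(x, y) >= threshold for x, y in combinations(excerpts, 2))
```

Checking every pair (rather than each excerpt against the first one) means a single near-duplicate anywhere in the set is enough to fail the check.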
Lines changed: 71 additions & 33 deletions
@@ -1,25 +1,31 @@
-.PHONY: help rebuild rebuild-improve rebuild-regress rebuild-subtle rebuild-dramatic open-latest clean
+.PHONY: help eval eval-quick eval-tags eval-summary rebuild rebuild-improve rebuild-regress rebuild-subtle rebuild-dramatic open clean
 
-# Default target
-help:
-@echo "HTML Reporter Development Commands"
-@echo ""
-@echo "Quick rebuild (fast iteration):"
-@echo " make rebuild - Rebuild without baseline"
-@echo " make rebuild-improve - Rebuild showing improvements"
-@echo " make rebuild-regress - Rebuild showing regressions"
-@echo " make rebuild-subtle - Rebuild with small changes"
-@echo " make rebuild-dramatic - Rebuild with large changes"
-@echo ""
-@echo "Utilities:"
-@echo " make open-latest - Open latest evaluation report"
-@echo " make clean - Clean generated reports"
-@echo ""
-@echo "Full evaluation:"
-@echo " make eval - Run full evaluation pipeline"
-@echo " make eval-quick - Run English tests only"
+# =============================================================================
+# Evaluation Targets - Run full evaluations with Claude scoring
+# =============================================================================
+
+# Default test types (all of them)
+TESTS ?= excerpts tags summary
+
+eval:
+@for test_type in $(TESTS); do \
+echo "Running evaluation for $$test_type..."; \
+./lib/evaluate-with-claude.sh --test-type $$test_type || exit 1; \
+done
+
+eval-quick:
+@./lib/evaluate-with-claude.sh --only-testing "ExcerptGenerationTests/excerptGenerationEnglish(parameters:)"
+
+eval-tags:
+@./lib/evaluate-with-claude.sh --test-type tags
+
+eval-summary:
+@./lib/evaluate-with-claude.sh --test-type summary
+
+# =============================================================================
+# HTML Report Development - Fast iteration without re-running tests
+# =============================================================================
 
-# Quick rebuild commands
 rebuild:
 @./lib/quick-rebuild-report.sh
 
@@ -35,26 +41,58 @@ rebuild-subtle:
 rebuild-dramatic:
 @./lib/quick-rebuild-report.sh --regress --variation 2.0
 
-# Open latest report
-open-latest:
-@LATEST=$$(ls -t "$(TMPDIR)WordPressIntelligence-Tests/quick-rebuild/evaluation-report.html" 2>/dev/null | head -1); \
+# =============================================================================
+# Utilities
+# =============================================================================
+
+open:
+@LATEST=$$(ls -t "$(TMPDIR)WordPressIntelligence-Tests/*/evaluation-report.html" 2>/dev/null | head -1); \
 if [ -n "$$LATEST" ]; then \
 echo "Opening: $$LATEST"; \
 open "$$LATEST"; \
 else \
-echo "No report found. Run 'make rebuild' first."; \
+echo "No report found. Run 'make eval' or 'make rebuild' first."; \
 fi
 
-# Clean generated files
 clean:
-@echo "Cleaning generated reports..."
-@rm -rf "$(TMPDIR)WordPressIntelligence-Tests/quick-rebuild"
-@rm -f "$(TMPDIR)WordPressIntelligence-Tests/mock-baseline.json"
+@echo "Cleaning generated files..."
+@rm -rf "$(TMPDIR)WordPressIntelligence-Tests"
 @echo "✓ Cleaned"
 
-# Full evaluation
-eval:
-@./evaluate-with-claude.sh
+# =============================================================================
+# Help
+# =============================================================================
 
-eval-quick:
-@./evaluate-with-claude.sh --only-testing "ExcerptGenerationTests/excerptGenerationEnglish(parameters:)"
+help:
+@echo "┌─────────────────────────────────────────────────────────┐"
+@echo "│ WordPressIntelligence Evaluation Test Suite │"
+@echo "└─────────────────────────────────────────────────────────┘"
+@echo ""
+@echo "━━━ Evaluation (Run tests + Claude scoring) ━━━"
+@echo ""
+@echo " make eval Run all test types (excerpts, tags, summary)"
+@echo " make eval TESTS=\"excerpts\" Run only excerpt tests"
+@echo " make eval TESTS=\"excerpts tags\" Run excerpt and tag tests"
+@echo " make eval-quick Run English excerpt tests only"
+@echo " make eval-tags Run tag suggestion tests"
+@echo " make eval-summary Run post summary tests"
+@echo ""
+@echo "━━━ HTML Report Development (No test runs) ━━━"
+@echo ""
+@echo " make rebuild Rebuild report (no baseline)"
+@echo " make rebuild-improve Rebuild with mock improvements"
+@echo " make rebuild-regress Rebuild with mock regressions"
+@echo " make rebuild-subtle Rebuild with subtle changes (±0.3)"
+@echo " make rebuild-dramatic Rebuild with dramatic changes (±2.0)"
+@echo ""
+@echo "━━━ Utilities ━━━"
+@echo ""
+@echo " make open Open latest HTML report"
+@echo " make clean Clean all generated files"
+@echo " make help Show this help"
+@echo ""
+@echo "For HTML development workflow: lib/DEVELOPMENT.md"
+@echo "For evaluation CLI options: ./lib/evaluate-with-claude.sh --help"
+@echo ""
+
+.DEFAULT_GOAL := help
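
One detail worth checking in the new `open` target: the `*` sits inside double quotes, and the shell does not perform pathname expansion on quoted text, so `ls -t` receives the literal pattern, fails (its error goes to `/dev/null`), and the target falls through to "No report found." Keeping the variable quoted but the glob unquoted, e.g. `ls -t "$(TMPDIR)"WordPressIntelligence-Tests/*/evaluation-report.html`, lets the pattern expand. For reference, the intended lookup (the newest `evaluation-report.html` one directory below the results root) is sketched in Python below; `latest_report` is a hypothetical helper, and the root directory is passed in rather than read from `TMPDIR`.

```python
from __future__ import annotations

from pathlib import Path


def latest_report(results_root: Path) -> Path | None:
    """Return the most recently modified evaluation-report.html one level below results_root."""
    # Path.glob expands the wildcard itself, so no shell quoting is involved.
    candidates = list(results_root.glob("*/evaluation-report.html"))
    if not candidates:
        return None
    # Same ordering `ls -t | head -1` aims for: newest modification time wins.
    return max(candidates, key=lambda p: p.stat().st_mtime)
```

Because both `evaluation-<timestamp>/` and `quick-rebuild/` sit one level below the root, a single pattern covers reports from `make eval` and `make rebuild` alike.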
