@@ -4,7 +4,7 @@ AI-powered content intelligence for WordPress using Apple Foundation Models.

 ## Features

--  **Excerpt Generation** - Generate 3 excerpt variations in multiple languages and styles
+-  **Excerpt Generation** - Generate 3 excerpt variations in 8 languages with configurable length/style
 -  **Tag Suggestions** - AI-powered tag recommendations
 -  **Post Summaries** - Automatic content summarization

@@ -20,13 +20,16 @@ let generator = ExcerptGeneration(length: .medium, style: .engaging)
 let excerpts = try await generator.generate(for: postContent)
 ```

-**Languages**: English, Spanish, French, German, Japanese, Mandarin, Russian
+**Languages**: English, Spanish, French, German, Italian, Portuguese, Japanese, Chinese
 **Lengths**: Short (15-35 words), Medium (40-80 words), Long (90-130 words)
 **Styles**: Engaging, Professional, Conversational, Formal, Witty

 ## Testing

-### Standard Tests
+### Standard XCTest
+
+Run standard tests that verify language, length, and diversity:
+
 ```bash
 cd Modules
 xcodebuild test \
@@ -35,126 +38,106 @@ xcodebuild test \
   -only-testing:WordPressIntelligenceTests
 ```

-Tests excerpt generation for 8 languages with automatic language verification, length compliance, and performance measurement. Tests emit both formatted output and structured JSON markers for optional Claude-based evaluation.
+### Quality Evaluation

-### Quality Evaluation (Optional)
+Evaluate AI-generated content quality using Claude scoring. Requires [Claude CLI](https://github.com/anthropics/claude-cli).

-Evaluate AI-generated content quality with Claude CLI:
+**Location**: `Modules/Tests/WordPressIntelligenceTests/`

 ```bash
-# Setup (one-time)
-pip install claude-cli && claude configure
-
-# Run evaluations
+# Quick start
 cd Modules/Tests/WordPressIntelligenceTests
-
-# Run all excerpt tests (default)
-./evaluate-with-claude.sh
-
-# Run tag suggestion tests
-./evaluate-with-claude.sh --test-type tags
-
-# Run post summary tests
-./evaluate-with-claude.sh --test-type summary
-
-# Run a specific test
-./evaluate-with-claude.sh --only-testing "ExcerptGenerationTests/excerptGenerationEnglish(parameters:)"
-./evaluate-with-claude.sh --test-type tags --only-testing "IntelligenceSuggestedTagsTests/testEnglishPost()"
-./evaluate-with-claude.sh --test-type summary --only-testing "PostSummaryTests/testEnglishBlogPost()"
+make                             # Show all available commands
+make eval                        # Run full evaluation (all test types)
+make eval-quick                  # Run English excerpt evaluation
+make eval TESTS="excerpts"       # Run only excerpt tests
+make eval TESTS="excerpts tags"  # Run excerpt and tag tests
+make eval-tags                   # Evaluate tag suggestions
+make eval-summary                # Evaluate post summaries
+make open                        # Open latest HTML report
 ```

-#### Options
+**Common targets**:
+- `make eval` - Run full evaluation for all test types (excerpts, tags, summary)
+- `make eval TESTS="excerpts"` - Run only specific test types
+- `make eval-quick` - Fast evaluation (English excerpts only)
+- `make rebuild-improve` - Regenerate HTML with mock improvements (for UI development)
+- `make open` - Open latest evaluation report
+- `make help` - Show all available commands

-| Option | Values | Description |
-|--------|--------|-------------|
-| `--test-type` | `excerpts`, `tags`, `summary` | Test type to run (default: `excerpts`) |
-| `--model` | `sonnet`, `opus`, `haiku` | Claude model for evaluation (default: `sonnet`) |
-| `--simulator` | Simulator name | iOS simulator (default: `iPhone 16 Pro`) |
-| `--only-testing` | Test identifier | Run specific test only |
-| `--skip-tests` | - | Skip test execution, re-evaluate previous results |
+For advanced options and HTML report development, see:
+- `Modules/Tests/WordPressIntelligenceTests/Makefile`
+- `Modules/Tests/WordPressIntelligenceTests/lib/DEVELOPMENT.md`

-#### Examples
+### Evaluation Output

-```bash
-# Use Claude Opus for evaluation
-./evaluate-with-claude.sh --model opus
+Results are saved to `/tmp/WordPressIntelligence-Tests/evaluation-<timestamp>/`:

-# Use different simulator
-./evaluate-with-claude.sh --simulator "iPhone 15"
+- **`evaluation-report.html`** - Interactive report with filtering, sorting, and baseline comparison
+- **`evaluation-results.json`** - Machine-readable data for CI/CD
+- Console output with a quick summary

-# Re-evaluate existing results
-./evaluate-with-claude.sh --skip-tests
-```
+**HTML Report Features**:
+- Sortable columns (test name, status, score, duration)
+- Filter by language, status, or comparison results
+- Baseline comparison with delta indicators (↑ improved, ↓ regressed, = unchanged)
+- Click any test to see detailed scores, generated content, and Claude feedback
+- Score distribution dots (●●●) show pass/warn/fail for each excerpt

-#### Evaluation Output
+### Scoring

-Results saved to `/tmp/WordPressIntelligence-Tests/evaluation-<timestamp>/`:
+Quality scores use weighted criteria (1-10 scale):

-- **`evaluation-results.json`** - Machine-readable data for CI/CD integration
-- **`evaluation-report.html`** - Interactive report with filters, sorting, and baseline comparison
-- **Console output** - Quick summary with statistics and category averages
+**Excerpt Generation**:
+- Language Match (3.0×), Grammar (2.0×), Relevance (2.0×) - critical factors
+- Hook Quality (1.5×), Key Info (1.5×), Length, Style, Standalone, Engagement (1.0× each)
+- Diversity: structural, angle, length, lexical variation

-Open HTML report: `open /tmp/WordPressIntelligence-Tests/evaluation-*/evaluation-report.html`
+**Passed**: Overall ≥ 7.0 AND no critical failures
+**Needs Improvement**: 6.0-6.9 OR any score < 4.0
+**Failed**: Language < 8.0 OR Grammar < 6.0 OR Overall < 6.0

-#### HTML Report Features
+*Note: Tag and summary evaluations use different criteria optimized for their use cases.*
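+
+How the excerpt weights and status thresholds combine can be sketched in a few lines of Python (the criterion keys and function names below are illustrative, not the actual `lib/evaluators.py` API):
+
+```python
+# Illustrative sketch only; the real scoring logic lives in lib/evaluators.py.
+WEIGHTS = {
+    "language_match": 3.0, "grammar": 2.0, "relevance": 2.0,
+    "hook_quality": 1.5, "key_info": 1.5,
+    "length": 1.0, "style": 1.0, "standalone": 1.0, "engagement": 1.0,
+}
+
+def overall_score(scores: dict[str, float]) -> float:
+    """Weighted average of per-criterion scores (each on a 1-10 scale)."""
+    return sum(scores[k] * w for k, w in WEIGHTS.items()) / sum(WEIGHTS.values())
+
+def status(scores: dict[str, float]) -> str:
+    """Apply the documented Passed / Needs Improvement / Failed thresholds."""
+    overall = overall_score(scores)
+    if scores["language_match"] < 8.0 or scores["grammar"] < 6.0 or overall < 6.0:
+        return "failed"
+    if overall < 7.0 or any(s < 4.0 for s in scores.values()):
+        return "needs_improvement"
+    return "passed"
+```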

-The interactive HTML report displays evaluation results in a sortable table:
+## Extending Tests

-**Table Columns:**
-- **Test Name** - Test identifier and parameters
-- **Status** - Pass/fail badge with score distribution indicators
-  - For excerpt tests: colored dots (●●●) show individual excerpt scores
-    - Green (●): score ≥ 7.0 (passed)
-    - Yellow (●): 6.0 ≤ score < 7.0 (needs improvement)
-    - Red (●): score < 6.0 (failed)
-  - Hover over dots to see individual scores
-- **Score** - Average score across all excerpts/attempts
-- **Δ Baseline** - Score change vs. baseline (shown only when comparison enabled)
-  - Green (↑): improvement
-  - Red (↓): regression
-  - Gray (=): unchanged
-- **Duration** - Test execution time
+### Adding Test Cases

-**Baseline Comparison:**
-1. Load a baseline JSON file using the file picker
-2. Table automatically shows the Δ Baseline column with score deltas
-3. Click "Clear Comparison" to hide the baseline column and return to single-report view
+1. Add test data to `lib/config.py`:
+```python
+"new_test_case": TestConfig(
+    original_content="...",
+    language="english",
+    # ... other parameters
+)
+```

-## Evaluating Results
+2. Update the `Makefile` if adding a new test type:
+```makefile
+eval-newtype:
+	@./lib/evaluate-with-claude.sh --test-type newtype
+```

-**Standard Tests** - Automatic verification for excerpts:
-- ✅ Language matches input (via NLLanguageRecognizer)
-- ✅ Word count within target range (with warnings for minor deviations)
-- ✅ 3 diverse variations generated (Levenshtein distance ≥ 15%)
-- ✅ HTML properly stripped
+### Customizing Evaluation Criteria

-**Evaluation Tests** - Weighted quality scores (1-10, example for excerpts):
+Edit scoring logic in `lib/evaluators.py`. Each test type has its own evaluator class with weighted criteria and thresholds.
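+
+A new evaluator might follow the same shape as the existing ones (this is a hypothetical outline, not the actual base class in `lib/evaluators.py`):
+
+```python
+# Hypothetical outline of an evaluator for a new test type; consult the
+# existing classes in lib/evaluators.py for the real structure and hooks.
+class NewTypeEvaluator:
+    weights = {"relevance": 2.0, "grammar": 2.0, "clarity": 1.0}
+    pass_threshold = 7.0
+
+    def score(self, criterion_scores: dict[str, float]) -> float:
+        """Combine Claude's per-criterion scores into a weighted overall score."""
+        weighted = sum(criterion_scores[k] * w for k, w in self.weights.items())
+        return weighted / sum(self.weights.values())
+
+    def passed(self, criterion_scores: dict[str, float]) -> bool:
+        """Pass when the weighted overall score clears the threshold."""
+        return self.score(criterion_scores) >= self.pass_threshold
+```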

-Status thresholds:
-- ✅ **Passed**: Overall ≥ 7.0 AND no critical failures
-- ⚠️ **Needs Improvement**: 6.0 ≤ Overall < 7.0 OR any score < 4.0
-- ❌ **Failed**: Language < 8.0 OR Grammar < 6.0 OR Overall < 6.0
+### Developing HTML Report

-Weighted criteria (higher weight = more important):
-- **Language Match** (3.0×) - Correct language (critical)
-- **Grammar** (2.0×) - No grammatical errors (critical)
-- **Relevance** (2.0×) - Captures main message
-- **Hook Quality** (1.5×) - Enticing opening
-- **Key Info** (1.5×) - Preserves critical facts
-- Length, Style, Standalone, Engagement (1.0× each)
+For fast iteration on HTML report UI without re-running tests:

-Diversity evaluation:
-- Structural (opening styles)
-- Angle (different emphases)
-- Length (sentence variety)
-- Lexical (vocabulary variation)
+```bash
+make rebuild-improve   # Regenerate with mock improvements
+# Edit lib/evaluation-viewer.html
+make rebuild-improve   # Instant preview
+```

-*Note: Tags and summary evaluations use different criteria tailored to their specific requirements.*
+See `lib/DEVELOPMENT.md` for the complete HTML development workflow.

 ## Troubleshooting

 **Tests skipped**: Missing iOS 26 or Apple Intelligence support
-**Language issues**: Check prompt in `ExcerptGeneration.swift`
-**Evaluation fails**: Run `claude configure` to authenticate
+**Language issues**: Check prompt in `Sources/WordPressIntelligence/ExcerptGeneration.swift`
+**Evaluation fails**: Install and configure the Claude CLI: `pip install claude-cli && claude configure`

-See `CLAUDE.md` for project guidelines.
+See `CLAUDE.md` for project development guidelines.