An archival system for blog posts that organizes content along three axes: chronological, topical, and series-based. It categorizes posts by keyword, detects multi-part series, and keeps related content connected through symbolic links.
- Keyword-based categorization into topical categories
- Chronological organization by year and quarter
- Automatic series detection and organization
- Cross-referencing through symbolic links
- Automated organization using Python
- Preservation of original HTML content
- Relationship tracking between related posts
/original-content/              # Original blog posts and metadata
    /posts/                     # Original HTML files
    posts.csv                   # Metadata for organization
/blog-archive/                  # Main organized content
    /posts/                     # Chronological organization
        /YYYY/                  # Year folders
            /Q1-Q4/             # Quarter folders
                *.html          # Symlinks to blog posts
    /categories/                # Topic-based organization
        /talmud/                # Talmudic studies and interpretation
        /history/               # Historical analysis and research
        /tech-and-ai/           # Technology and AI related content
        /biblical/              # Biblical commentary and analysis
        /methodology/           # Study methods and analytical approaches
    /series/                    # Multi-part post collections
        /{series-name}/         # Auto-detected series groupings
    /assets/                    # Static assets (pending implementation)
    index.md                    # Main chronological index
    categories.md               # Category-based index
    series.md                   # Series/multi-part posts index
/scripts/                       # Organization tools
    organize_posts.py           # Main organization script
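As an illustration of the chronological layout above, the following is a minimal sketch of how a post could be linked into its year and quarter folder. The function names `quarter_dir` and `link_post` are hypothetical and are not taken from `organize_posts.py`; the sketch also assumes the publication date is already known and that the platform supports symbolic links.

```python
import os
from datetime import date
from pathlib import Path


def quarter_dir(archive_root: Path, published: date) -> Path:
    """Return <archive_root>/posts/YYYY/Qn for a publication date."""
    quarter = (published.month - 1) // 3 + 1  # Jan-Mar -> 1, ..., Oct-Dec -> 4
    return archive_root / "posts" / str(published.year) / f"Q{quarter}"


def link_post(source: Path, archive_root: Path, published: date) -> Path:
    """Create (or refresh) a symlink to `source` in its quarter folder."""
    target_dir = quarter_dir(archive_root, published)
    target_dir.mkdir(parents=True, exist_ok=True)
    link = target_dir / source.name
    if link.is_symlink() or link.exists():
        link.unlink()  # replace a stale link from an earlier run
    # A relative target keeps the link valid if the whole tree is moved.
    link.symlink_to(os.path.relpath(source, start=target_dir))
    return link
```

Because the archive entry is a link rather than a copy, the original HTML file stays the single source of content.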
The system uses keyword detection to sort content into the following categories:
- Talmud: Posts about Talmudic studies and interpretation
- History: Historical analysis and research
- Tech & AI: Technology, automation, and artificial intelligence
- Biblical: Biblical commentary and analysis
- Methodology: Study methods and analytical approaches
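Keyword-based categorization of this kind can be sketched as follows. This is an assumption-laden illustration, not the script's actual logic: the keyword sets are tiny placeholders (the real sets are described as much larger), and `CATEGORY_KEYWORDS`, `categorize`, and `min_hits` are hypothetical names.

```python
import re

# Placeholder keyword sets for illustration only; the real sets are larger.
CATEGORY_KEYWORDS = {
    "talmud": {"talmud", "gemara", "mishnah"},
    "history": {"history", "historical", "ancient"},
    "tech-and-ai": {"ai", "automation", "software"},
    "biblical": {"bible", "biblical", "torah"},
    "methodology": {"method", "methodology", "analysis"},
}


def categorize(text: str, min_hits: int = 1) -> list[str]:
    """Return matching categories, strongest match first."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    # Score each category by how many of its keywords occur in the text.
    scores = {cat: len(words & kws) for cat, kws in CATEGORY_KEYWORDS.items()}
    matches = [cat for cat, score in scores.items() if score >= min_hits]
    # Sort by descending score, then alphabetically for a stable order.
    return sorted(matches, key=lambda cat: (-scores[cat], cat))
```

A post may match several categories in this sketch; whether the real script assigns one category or many is not stated here.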
The archive uses:
- Python for automated organization
- Category detection based on extensive keyword sets
- Regex-based series detection
- CSV-based metadata management
- Symbolic links for cross-referencing
- Markdown indices for navigation
- Quarterly organization for better content management
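For the CSV-based metadata, a loader might look like the sketch below. The column names `filename`, `title`, and `date` (ISO format) are assumptions made for illustration; the actual layout of `posts.csv` is not documented here, and `PostMeta` and `read_metadata` are hypothetical names.

```python
import csv
from dataclasses import dataclass
from datetime import date
from typing import Iterable


@dataclass
class PostMeta:
    filename: str
    title: str
    published: date


def read_metadata(lines: Iterable[str]) -> list[PostMeta]:
    """Parse CSV rows (assumed columns: filename, title, date) into records."""
    return [
        PostMeta(
            filename=row["filename"],
            title=row["title"].strip(),
            published=date.fromisoformat(row["date"]),  # expects YYYY-MM-DD
        )
        for row in csv.DictReader(lines)
    ]
```

An open file object can be passed directly, for example `read_metadata(open("posts.csv", newline="", encoding="utf-8"))`.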
- Place the blog post HTML files in /original-content/posts/ and the metadata file posts.csv in /original-content/
- Run the organization script:
python scripts/organize_posts.py
- Access content through chronological, categorical, or series-based navigation
- Category Detection: Matches each post against a set of more than 50 keywords per category
- Series Detection: Uses regular expressions to identify multi-part posts and group them into series
- Content Preservation: Maintains original HTML formatting while providing organized access
- Multi-dimensional Access: Provides multiple ways to discover and access content
- Relationship Management: Maintains connections between related posts
- Index Generation: Creates navigable markdown indices for easy content discovery
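Regex-based series detection could work along these lines. The pattern below only recognizes titles that end in "Part N" and is an assumption; the patterns used by the real script are not shown in this document, and `PART_PATTERN`, `slugify`, and `group_series` are hypothetical names.

```python
import re
from collections import defaultdict

# Matches titles such as "Name, Part 2", "Name (Part 1)" or "Name - Part 3".
PART_PATTERN = re.compile(
    r"^(?P<name>.+?)\s*[,:(\-]?\s*\bpart\s+(?P<num>\d+)\)?\s*$",
    re.IGNORECASE,
)


def slugify(name: str) -> str:
    """Turn a series name into a folder-friendly slug."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def group_series(titles: list[str], min_parts: int = 2) -> dict[str, list[str]]:
    """Group titles by series slug, ordered by part number."""
    parts = defaultdict(list)
    for title in titles:
        match = PART_PATTERN.match(title)
        if match:
            slug = slugify(match.group("name"))
            parts[slug].append((int(match.group("num")), title))
    # Keep only groups with enough parts to count as a series.
    return {
        slug: [title for _, title in sorted(entries)]
        for slug, entries in parts.items()
        if len(entries) >= min_parts
    }
```

The resulting slugs correspond to the `/{series-name}/` folders; requiring at least two parts avoids creating a series folder for a single stray "Part 1".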
- HTML files maintain original formatting
- Generated indices update automatically
- Symbolic links ensure content consistency
- Quarter-based organization for scalability
- Category system expandable through keyword updates
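Regenerating an index on every run is one simple way to keep it current. The sketch below builds a chronological markdown index grouped by year and quarter; the heading text and entry format are assumptions, not the format the script actually emits, and `build_index` is a hypothetical name.

```python
from collections import defaultdict
from datetime import date


def build_index(posts: list[tuple[str, date, str]]) -> str:
    """Build a markdown index from (title, published, relative_path) tuples."""
    by_period = defaultdict(list)
    for title, published, path in posts:
        quarter = (published.month - 1) // 3 + 1
        by_period[(published.year, quarter)].append((published, title, path))

    lines = ["# Chronological Index", ""]
    # Newest period first, and newest post first within each period.
    for year, quarter in sorted(by_period, reverse=True):
        lines.append(f"## {year} Q{quarter}")
        lines.append("")
        for published, title, path in sorted(by_period[(year, quarter)], reverse=True):
            lines.append(f"- [{title}]({path}) ({published.isoformat()})")
        lines.append("")
    return "\n".join(lines)
```

Writing the returned string to `index.md` after each organization run would overwrite the previous index with the current state of the archive.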
Future enhancements planned:
- Asset processing implementation
- Enhanced content validation
- Extended test coverage
- Search functionality
- Tag system implementation
- Enhanced metadata extraction