EzraBrand/blog-archive

Blog Archive

An archival system for blog posts that organizes content along three axes: chronological (by year and quarter), topical (by category), and series-based. Posts are categorized by keyword matching, multi-part series are detected automatically, and symbolic links connect related content across these views.

Features

  • Keyword-based categorization of posts
  • Chronological organization by year and quarter
  • Automatic series detection and organization
  • Cross-referencing through symbolic links
  • Automated organization using Python
  • Preservation of original HTML content
  • Relationship management that keeps related posts connected

Directory Structure

/original-content/     # Original blog posts and metadata
  /posts/             # Original HTML files
  posts.csv           # Metadata for organization

/blog-archive/        # Main organized content
  /posts/            # Chronological organization
    /YYYY/          # Year folders
      /Q1-Q4/      # Quarter folders
        *.html     # Symlinks to blog posts
  /categories/      # Topic-based organization
    /talmud/       # Talmudic studies and interpretation
    /history/      # Historical analysis and research
    /tech-and-ai/  # Technology and AI related content
    /biblical/     # Biblical commentary and analysis
    /methodology/  # Study methods and analytical approaches
  /series/          # Multi-part post collections
    /{series-name}/ # Auto-detected series groupings
  /assets/          # Static assets (pending implementation)
  index.md         # Main chronological index
  categories.md    # Category-based index
  series.md        # Series/multi-part posts index

/scripts/           # Organization tools
  organize_posts.py # Main organization script

Categories

The system uses keyword detection to sort content into the following categories:

  • Talmud: Posts about Talmudic studies and interpretation
  • History: Historical analysis and research
  • Tech & AI: Technology, automation, and artificial intelligence
  • Biblical: Biblical commentary and analysis
  • Methodology: Study methods and analytical approaches
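
As a rough illustration of keyword-based categorization, the sketch below assigns a post to every category whose keyword set overlaps the words in its title. The `CATEGORY_KEYWORDS` table and the `categorize` function are hypothetical and not taken from organize_posts.py; the keyword lists are shortened placeholders, and matching on the title alone is an assumption.

```python
import re

# Hypothetical, shortened keyword sets; the keys mirror the category
# folder names in the directory structure above.
CATEGORY_KEYWORDS = {
    "talmud": {"talmud", "gemara", "mishnah", "tractate"},
    "history": {"history", "historical", "ancient", "medieval"},
    "tech-and-ai": {"ai", "chatgpt", "python", "automation"},
    "biblical": {"bible", "biblical", "genesis", "psalms"},
    "methodology": {"methodology", "method", "approach"},
}


def categorize(title: str) -> list[str]:
    """Return every category whose keywords appear as whole words in the title."""
    words = set(re.findall(r"[a-z0-9]+", title.lower()))
    return sorted(
        category
        for category, keywords in CATEGORY_KEYWORDS.items()
        if words & keywords
    )
```

A post can land in more than one category under this scheme, which fits the symlink-based layout: the same file can be linked from several category folders.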

Implementation

The archive uses:

  • Python for automated organization
  • Category detection using extensive keyword sets
  • Regex-based series detection
  • CSV-based metadata management
  • Symbolic links for cross-referencing
  • Markdown indices for navigation
  • Quarterly organization for better content management
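
To make the CSV-driven, quarterly layout concrete, here is a minimal sketch that reads metadata rows and computes each post's destination under /posts/YYYY/Qn/. The column names `filename` and `date` (ISO format) are assumptions; the actual schema of posts.csv is not documented here, and both function names are hypothetical.

```python
import csv
import io
from datetime import date
from pathlib import Path


def quarter_dir(archive_root: Path, published: date) -> Path:
    """Map a publication date to <archive_root>/posts/<YYYY>/<Qn>."""
    quarter = (published.month - 1) // 3 + 1  # Jan-Mar -> 1, ..., Oct-Dec -> 4
    return archive_root / "posts" / str(published.year) / f"Q{quarter}"


def plan_destinations(csv_text: str, archive_root: Path) -> dict[str, Path]:
    """Return {filename: destination path} for every row of the metadata CSV.

    Assumes hypothetical columns 'filename' and 'date' (YYYY-MM-DD).
    """
    rows = csv.DictReader(io.StringIO(csv_text))
    return {
        row["filename"]: quarter_dir(archive_root, date.fromisoformat(row["date"]))
        / row["filename"]
        for row in rows
    }
```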

Usage

  1. Place the blog post HTML files and the metadata CSV in the source directory (/original-content/posts/ and /original-content/posts.csv in the structure above)
  2. Run the organization script:
    python scripts/organize_posts.py
  3. Access content through chronological, categorical, or series-based navigation
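
Navigation in step 3 goes through the generated Markdown indices. The sketch below shows one way a chronological index could be rendered, newest first and grouped by year. The output layout and the `render_index` function are assumptions for illustration; the real index.md may be formatted differently.

```python
def render_index(posts) -> str:
    """Render a Markdown index from (iso_date, title, relative_path) tuples.

    Posts are listed newest first under one heading per year.
    """
    lines = ["# Blog Archive Index", ""]
    current_year = None
    for iso_date, title, path in sorted(posts, reverse=True):
        year = iso_date[:4]
        if year != current_year:
            if current_year is not None:
                lines.append("")  # blank line between year sections
            lines += [f"## {year}", ""]
            current_year = year
        lines.append(f"- {iso_date}: [{title}]({path})")
    return "\n".join(lines) + "\n"
```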

Features Deep Dive

  • Category Detection: Matches posts against keyword sets of more than 50 keywords per category
  • Series Detection: Implements regex patterns to identify and group related posts
  • Content Preservation: Maintains original HTML formatting while providing organized access
  • Multi-dimensional Access: Provides multiple ways to discover and access content
  • Relationship Management: Maintains connections between related posts
  • Index Generation: Creates navigable markdown indices for easy content discovery
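
The regex-based series detection could look roughly like the sketch below. It assumes series posts share a title followed by "Part N" or "Pt. N"; that naming convention, the pattern itself, and the `group_series` and `slugify` helpers are hypothetical rather than the script's actual rules.

```python
import re
from collections import defaultdict

# Assumed convention: "<series name> - Part 2", "<series name>: Pt. 3", etc.
SERIES_RE = re.compile(
    r"^(?P<name>.+?)\s*[-:,(]?\s*\b(?:part|pt\.?)\s*(?P<num>\d+)\)?\s*$",
    re.IGNORECASE,
)


def slugify(name: str) -> str:
    """Turn a series name into a folder-friendly slug."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def group_series(titles: list[str]) -> dict[str, list[str]]:
    """Group titles into series keyed by slug, ordered by part number.

    Only groups with at least two parts are treated as a series.
    """
    groups = defaultdict(list)
    for title in titles:
        match = SERIES_RE.match(title)
        if match:
            groups[slugify(match["name"])].append((int(match["num"]), title))
    return {
        slug: [title for _, title in sorted(parts)]
        for slug, parts in groups.items()
        if len(parts) > 1
    }
```

Each resulting slug would correspond to one /series/{series-name}/ folder.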

Maintenance

  • HTML files maintain original formatting
  • Generated indices update automatically
  • Symbolic links ensure content consistency
  • Quarter-based organization for scalability
  • Category system expandable through keyword updates
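
For the symbolic links, a minimal sketch of creating (or refreshing) a relative link from an organized folder back to the original HTML file is shown below. The `link_post` helper is hypothetical; using relative rather than absolute link targets is a design assumption that keeps the archive valid if the repository is moved or cloned elsewhere. Creating symlinks on Windows may require elevated privileges.

```python
import os
from pathlib import Path


def link_post(source: Path, link_dir: Path) -> Path:
    """Create or refresh a relative symlink in link_dir that points at source."""
    link_dir.mkdir(parents=True, exist_ok=True)
    link = link_dir / source.name
    # Remove a stale link or file so reruns of the script stay idempotent.
    if link.is_symlink() or link.exists():
        link.unlink()
    # A relative target survives moving the whole repository.
    target = os.path.relpath(source, start=link_dir)
    link.symlink_to(target)
    return link
```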

Roadmap

Planned enhancements:

  • Asset processing implementation
  • Enhanced content validation
  • Extended test coverage
  • Search functionality
  • Tag system implementation
  • Enhanced metadata extraction
