HireMate – AI Assistant for Finding and Applying to Jobs

Features

  • Searches LinkedIn jobs via Apify actors
  • Tailors your CV to each role using OpenAI
  • Emails you the job links with tailored CV attachments
  • Optionally lists 1st-degree connections at target companies (from a CSV export)

Tools Used

  • Apify Client: Fetches LinkedIn job results from your scheduled actor runs (no re-triggering).
  • OpenAI: LLM for structured CV extraction, edit plan generation, JD topic and priority extraction, and deterministic rewrite to structured JSON.
  • SMTP (email): Sends the job links by email with the tailored resumes attached.
  • Pydantic: Models and settings validation.

Notes

  • Requires an Apify token (APIFY_TOKEN) and an OpenAI API key for tailoring.
  • Email uses SMTP; set DRY_RUN=true to preview without sending.
  • Without OPENAI_API_KEY, tailoring falls back to your original resume text.
  • Connections CSV is optional; export from LinkedIn and set LINKEDIN_CONNECTIONS_CSV.
  • Duplicate protection: processed job links are stored in state/sent_jobs.json and skipped in future cycles.
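
A minimal sketch of that duplicate-protection idea, assuming sent links are stored as a flat JSON list in state/sent_jobs.json (the helper names and job shape are illustrative, not the repo's actual code):

import json
from pathlib import Path

STATE_FILE = Path("state/sent_jobs.json")

def load_sent_links() -> set:
    # Links already processed in earlier cycles.
    if STATE_FILE.exists():
        return set(json.loads(STATE_FILE.read_text()))
    return set()

def mark_sent(links) -> None:
    # Persist the updated set so future cycles skip these jobs.
    STATE_FILE.parent.mkdir(parents=True, exist_ok=True)
    STATE_FILE.write_text(json.dumps(sorted(links)))

# Usage: keep only jobs whose links have not been emailed before.
jobs = [{"link": "https://www.linkedin.com/jobs/view/123"}]  # from the search step
sent = load_sent_links()
new_jobs = [j for j in jobs if j["link"] not in sent]
mark_sent(sent | {j["link"] for j in new_jobs})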

Setup

  1. Create and fill .env based on .env.example.
  2. Install dependencies:
pip install -r requirements.txt
  3. Run commands:
# Search-only to CSV (uses settings from .env for filters)
python get_jobs.py search --max-jobs 5 --out jobs.csv

# Full pipeline: search, tailor resumes, and email
python get_jobs.py pipeline --max-jobs 3

# Watch mode: poll every 60 minutes and process only new jobs
python get_jobs.py watch --minutes 60 --max-jobs 5

Logic Used

  1. Resume Text Extraction

    • Detects file type (PDF/DOCX/TXT) and extracts raw text (see the sketch after this list).
  2. LLM-based Structured Extraction

    • Parses the resume into a structured entity (name, contacts, summary, skills, experience/academic).
    • Results are cached on-disk keyed by resume hash.
  3. Job Understanding

    • LLM extracts canonical JD topics and selects the top priorities that are evidentially supported by the resume and EXTRA_KEYWORDS.
  4. Pass A – Evidence-backed Edit Plan

    • Generates an evidence-backed resume edit plan under a strict zero-fabrication constraint.
    • Returns JSON with edit suggestions plus unsupported_requirements: job-description requirements the resume cannot support.
  5. Pass B – Deterministic Rewrite to Structured JSON

    • Applies ONLY the approved plan to the structured resume, producing a final structured payload for the template builder.
  6. Template-based DOCX Generation

    • Renders contact, summary, skills, experience, and academic items, and outputs the final DOCX.
  7. Review Artifacts

    • For each job, a Company_Title.review.json is written to the output folder with gaps and unsupported_requirements for quick audit of missing evidence.
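
A minimal sketch of the file-type detection in step 1, assuming pypdf and python-docx as the extraction libraries (the repo's actual dependencies may differ):

from pathlib import Path

def extract_resume_text(path: str) -> str:
    # Detect the resume file type by extension and return its raw text.
    suffix = Path(path).suffix.lower()
    if suffix == ".pdf":
        from pypdf import PdfReader
        return "\n".join(page.extract_text() or "" for page in PdfReader(path).pages)
    if suffix == ".docx":
        from docx import Document
        return "\n".join(p.text for p in Document(path).paragraphs)
    if suffix == ".txt":
        return Path(path).read_text(encoding="utf-8")
    raise ValueError(f"Unsupported resume format: {suffix}")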

Design Decisions

Why Two-Pass Architecture (Pass A + Pass B)?

  • Pass A (Plan): LLM generates evidence-backed edit plans with strict constraints. This prevents fabrication and ensures all changes cite resume text or EXTRA_KEYWORDS.
  • Pass B (Apply): Deterministic, rule-based application of the plan to structured data (sketched below). No LLM calls, just mechanical string replacement and array manipulation.
  • Benefits: Eliminates scope creep, reduces cost (one LLM call per job instead of two), speeds execution, and guarantees consistency.
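
A minimal sketch of the Pass B idea; the plan fields (replacements, skill_additions) and resume fields are assumptions, not the repo's actual schema:

import copy

def apply_plan(resume: dict, plan: dict) -> dict:
    # Pass B: mechanically apply an approved edit plan to the structured resume.
    # No LLM calls here, so the same plan always yields the same output.
    result = copy.deepcopy(resume)
    for edit in plan.get("replacements", []):
        # "find" text was already validated to exist during sanitization.
        result["summary"] = result["summary"].replace(edit["find"], edit["replace"])
    for skill in plan.get("skill_additions", []):
        if skill not in result["skills"]:
            result["skills"].append(skill)
    return result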

Why Sanitization (_sanitize_plan)?

  • Model drift protection: LLMs sometimes return malformed JSON or unexpected field shapes. Sanitization normalizes these before processing.
  • Safety guardrails: Prevents title overwrites, validates that "find" text exists before replacement, and gates additions to wanted/extra topics only.
  • Deterministic safety: Ensures Pass B receives clean, validated input for reliable rule-based application.
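
A sketch of what such sanitization can look like; the real _sanitize_plan likely covers more cases, and the field names are the same assumed ones as above:

def _sanitize_plan(plan: dict, resume_text: str, allowed_topics: set) -> dict:
    # Normalize an LLM-produced plan and drop anything malformed or unsafe.
    clean = {"replacements": [], "skill_additions": []}
    for edit in plan.get("replacements", []):
        find, replace = edit.get("find"), edit.get("replace")
        # Drop entries with missing/non-string fields (model drift protection).
        if not (isinstance(find, str) and isinstance(replace, str)):
            continue
        # Validate that the "find" text actually exists before replacement.
        if find not in resume_text:
            continue
        clean["replacements"].append({"find": find, "replace": replace})
    # Gate additions to the wanted/extra topics only; a fuller version
    # would also block edits that overwrite job titles.
    clean["skill_additions"] = [
        s for s in plan.get("skill_additions", []) if s in allowed_topics
    ]
    return clean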

Why Template-Based DOCX Generation?

  • Consistency: Clean, predictable layout regardless of input resume format.
  • Maintainability: No complex in-place editing or style preservation logic.
  • Skills grouping: Easy to render organized skills with headings and comma-joined items.
  • Alternative rejected: In-place DOCX editing was fragile and hard to debug.
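
A minimal python-docx sketch of the template approach, with an illustrative structured-resume shape (grouped skills render as "Heading: item1, item2, ..."):

from docx import Document

def build_docx(resume: dict, out_path: str) -> None:
    # Render the structured resume into a fresh DOCX; field names illustrative.
    doc = Document()
    doc.add_heading(resume["name"], level=0)
    doc.add_paragraph(resume["contacts"])
    doc.add_heading("Summary", level=1)
    doc.add_paragraph(resume["summary"])
    doc.add_heading("Skills", level=1)
    for group, items in resume["skills"].items():
        doc.add_paragraph(f"{group}: {', '.join(items)}")
    doc.add_heading("Experience", level=1)
    for job in resume["experience"]:
        doc.add_paragraph(f"{job['title']}, {job['company']}", style="List Bullet")
    doc.save(out_path)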

Why LLM-Only Extraction?

  • Accuracy: Heuristic parsing (regex, keyword matching) was unreliable for varied resume formats.
  • Structured output: LLM provides consistent JSON schema for experience/academic items.
  • Caching: Expensive LLM calls are cached by resume hash to avoid redundant API costs.
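
A sketch combining the structured extraction and the hash-keyed cache; the model name, prompt, cache path, and JSON keys are assumptions:

import hashlib
import json
from pathlib import Path
from openai import OpenAI

CACHE_DIR = Path("state/extract_cache")  # hypothetical location

def extract_structured(resume_text: str) -> dict:
    # Parse raw resume text into structured JSON, caching by content hash.
    key = hashlib.sha256(resume_text.encode()).hexdigest()
    cached = CACHE_DIR / f"{key}.json"
    if cached.exists():  # skip the API call on repeat runs
        return json.loads(cached.read_text())
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        response_format={"type": "json_object"},
        messages=[
            {"role": "system", "content": "Extract the resume as JSON with keys: "
             "name, contacts, summary, skills, experience, academic."},
            {"role": "user", "content": resume_text},
        ],
    )
    data = json.loads(response.choices[0].message.content)
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    cached.write_text(json.dumps(data))
    return data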

Why Evidence-Based Planning?

  • Zero fabrication: Every addition must cite resume text or EXTRA_KEYWORDS.
  • Audit trail: Gaps and unsupported requirements are logged for review.
  • Quality control: Prevents the model from inventing experience or upgrading seniority.

Why Apify Over SerpAPI?

  • Scheduled runs: Apify actors can run automatically every hour, eliminating API rate limits.
  • Data freshness: Fetches latest results without triggering new scrapes.
  • Cost efficiency: No per-query charges, just actor subscription.
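
A minimal apify-client sketch of reading the latest successful run's dataset without triggering a new scrape (the actor ID and item fields are placeholders):

import os
from apify_client import ApifyClient

client = ApifyClient(os.environ["APIFY_TOKEN"])

# Fetch the dataset of the most recent successful run of the scheduled actor.
last_run = client.actor("username~linkedin-jobs-scraper").last_run(status="SUCCEEDED")
items = last_run.dataset().list_items().items
for job in items[:5]:
    print(job.get("title"), job.get("link"))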
