HireMate – AI Assistant for Finding and Applying to Jobs

Features

  • Searches LinkedIn jobs via Apify actors
  • Tailors your CV to each role using OpenAI
  • Emails you the job links with tailored CV attachments
  • Optionally lists 1st-degree connections at target companies (from a CSV export)

Tools Used

  • Apify Client: Fetches LinkedIn job results from your scheduled actor runs (no re-triggering).
  • OpenAI: LLM for structured CV extraction, edit plan generation, JD topic and priority extraction, and deterministic rewrite to structured JSON.
  • SMTP (email): Sends the job links by email with the tailored resumes attached.
  • Pydantic: Models and settings validation.

Notes

  • Requires an Apify token (APIFY_TOKEN) and an OpenAI API key for tailoring.
  • Email uses SMTP; set DRY_RUN=true to preview without sending.
  • Without OPENAI_API_KEY, tailoring falls back to your original resume text.
  • Connections CSV is optional; export from LinkedIn and set LINKEDIN_CONNECTIONS_CSV.
  • Duplicate protection: processed job links are stored in state/sent_jobs.json and skipped in future cycles.
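
A minimal sketch of that duplicate-protection idea, assuming sent links are stored as a flat JSON list in state/sent_jobs.json (the helper names and job shape are illustrative, not the repo's actual code):

import json
from pathlib import Path

STATE_FILE = Path("state/sent_jobs.json")

def load_sent_links() -> set:
    # Links already processed in earlier cycles.
    if STATE_FILE.exists():
        return set(json.loads(STATE_FILE.read_text()))
    return set()

def mark_sent(links) -> None:
    # Persist the updated set so future cycles skip these jobs.
    STATE_FILE.parent.mkdir(parents=True, exist_ok=True)
    STATE_FILE.write_text(json.dumps(sorted(links)))

# Usage: keep only jobs whose links have not been emailed before.
jobs = [{"link": "https://www.linkedin.com/jobs/view/123"}]  # from the search step
sent = load_sent_links()
new_jobs = [j for j in jobs if j["link"] not in sent]
mark_sent(sent | {j["link"] for j in new_jobs})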

Setup

  1. Create and fill .env based on .env.example.
  2. Install dependencies:
pip install -r requirements.txt
  3. Run commands:
# Search-only to CSV (uses settings from .env for filters)
python get_jobs.py search --max-jobs 5 --out jobs.csv

# Full pipeline: search, tailor resumes, and email
python get_jobs.py pipeline --max-jobs 3

# Watch mode: poll every 60 minutes and process only new jobs
python get_jobs.py watch --minutes 60 --max-jobs 5

Logic Used

  1. Resume Text Extraction

    • Detects file type (PDF/DOCX/TXT) and extracts raw text (see the sketch after this list).
  2. LLM-based Structured Extraction

    • Parses the resume into a structured entity (name, contacts, summary, skills, experience/academic).
    • Results are cached on-disk keyed by resume hash.
  3. Job Understanding

    • LLM extracts canonical JD topics and selects the top priorities that are evidentially supported by the resume and EXTRA_KEYWORDS.
  4. Pass A – Evidence-backed Edit Plan

    • Generates an evidence-backed resume edit plan under a strict zero-fabrication constraint.
    • Returns JSON with edit suggestions plus unsupported_requirements: job-description requirements the resume cannot support.
  5. Pass B – Deterministic Rewrite to Structured JSON

    • Applies ONLY the approved plan to the structured resume, producing a final structured payload for the template builder.
  6. Template-based DOCX Generation

    • Renders contact, summary, skills, experience, and academic items, and outputs the final DOCX.
  7. Review Artifacts

    • For each job, a Company_Title.review.json is written to the output folder with gaps and unsupported_requirements for quick audit of missing evidence.
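
A minimal sketch of the file-type detection in step 1, assuming pypdf and python-docx as the extraction libraries (the repo's actual dependencies may differ):

from pathlib import Path

def extract_resume_text(path: str) -> str:
    # Detect the resume file type by extension and return its raw text.
    suffix = Path(path).suffix.lower()
    if suffix == ".pdf":
        from pypdf import PdfReader
        return "\n".join(page.extract_text() or "" for page in PdfReader(path).pages)
    if suffix == ".docx":
        from docx import Document
        return "\n".join(p.text for p in Document(path).paragraphs)
    if suffix == ".txt":
        return Path(path).read_text(encoding="utf-8")
    raise ValueError(f"Unsupported resume format: {suffix}")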

Design Decisions

Why Two-Pass Architecture (Pass A + Pass B)?

  • Pass A (Plan): LLM generates evidence-backed edit plans with strict constraints. This prevents fabrication and ensures all changes cite resume text or EXTRA_KEYWORDS.
  • Pass B (Apply): Deterministic, rule-based application of the plan to structured data (sketched below). No LLM calls, just mechanical string replacement and array manipulation.
  • Benefits: Eliminates scope creep, reduces cost (one LLM call per job instead of two), speeds execution, and guarantees consistency.
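
A minimal sketch of the Pass B idea; the plan fields (replacements, skill_additions) and resume fields are assumptions, not the repo's actual schema:

import copy

def apply_plan(resume: dict, plan: dict) -> dict:
    # Pass B: mechanically apply an approved edit plan to the structured resume.
    # No LLM calls here, so the same plan always yields the same output.
    result = copy.deepcopy(resume)
    for edit in plan.get("replacements", []):
        # "find" text was already validated to exist during sanitization.
        result["summary"] = result["summary"].replace(edit["find"], edit["replace"])
    for skill in plan.get("skill_additions", []):
        if skill not in result["skills"]:
            result["skills"].append(skill)
    return result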

Why Sanitization (_sanitize_plan)?

  • Model drift protection: LLMs sometimes return malformed JSON or unexpected field shapes. Sanitization normalizes these before processing.
  • Safety guardrails: Prevents title overwrites, validates that "find" text exists before replacement, and gates additions to wanted/extra topics only.
  • Deterministic safety: Ensures Pass B receives clean, validated input for reliable rule-based application.
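
A sketch of what such sanitization can look like; the real _sanitize_plan likely covers more cases, and the field names are the same assumed ones as above:

def _sanitize_plan(plan: dict, resume_text: str, allowed_topics: set) -> dict:
    # Normalize an LLM-produced plan and drop anything malformed or unsafe.
    clean = {"replacements": [], "skill_additions": []}
    for edit in plan.get("replacements", []):
        find, replace = edit.get("find"), edit.get("replace")
        # Drop entries with missing/non-string fields (model drift protection).
        if not (isinstance(find, str) and isinstance(replace, str)):
            continue
        # Validate that the "find" text actually exists before replacement.
        if find not in resume_text:
            continue
        clean["replacements"].append({"find": find, "replace": replace})
    # Gate additions to the wanted/extra topics only; a fuller version
    # would also block edits that overwrite job titles.
    clean["skill_additions"] = [
        s for s in plan.get("skill_additions", []) if s in allowed_topics
    ]
    return clean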

Why Template-Based DOCX Generation?

  • Consistency: Clean, predictable layout regardless of input resume format.
  • Maintainability: No complex in-place editing or style preservation logic.
  • Skills grouping: Easy to render organized skills with headings and comma-joined items.
  • Alternative rejected: In-place DOCX editing was fragile and hard to debug.
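
A minimal python-docx sketch of the template approach, with an illustrative structured-resume shape (grouped skills render as "Heading: item1, item2, ..."):

from docx import Document

def build_docx(resume: dict, out_path: str) -> None:
    # Render the structured resume into a fresh DOCX; field names illustrative.
    doc = Document()
    doc.add_heading(resume["name"], level=0)
    doc.add_paragraph(resume["contacts"])
    doc.add_heading("Summary", level=1)
    doc.add_paragraph(resume["summary"])
    doc.add_heading("Skills", level=1)
    for group, items in resume["skills"].items():
        doc.add_paragraph(f"{group}: {', '.join(items)}")
    doc.add_heading("Experience", level=1)
    for job in resume["experience"]:
        doc.add_paragraph(f"{job['title']}, {job['company']}", style="List Bullet")
    doc.save(out_path)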

Why LLM-Only Extraction?

  • Accuracy: Heuristic parsing (regex, keyword matching) was unreliable for varied resume formats.
  • Structured output: LLM provides consistent JSON schema for experience/academic items.
  • Caching: Expensive LLM calls are cached by resume hash to avoid redundant API costs.
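
A sketch combining the structured extraction and the hash-keyed cache; the model name, prompt, cache path, and JSON keys are assumptions:

import hashlib
import json
from pathlib import Path
from openai import OpenAI

CACHE_DIR = Path("state/extract_cache")  # hypothetical location

def extract_structured(resume_text: str) -> dict:
    # Parse raw resume text into structured JSON, caching by content hash.
    key = hashlib.sha256(resume_text.encode()).hexdigest()
    cached = CACHE_DIR / f"{key}.json"
    if cached.exists():  # skip the API call on repeat runs
        return json.loads(cached.read_text())
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        response_format={"type": "json_object"},
        messages=[
            {"role": "system", "content": "Extract the resume as JSON with keys: "
             "name, contacts, summary, skills, experience, academic."},
            {"role": "user", "content": resume_text},
        ],
    )
    data = json.loads(response.choices[0].message.content)
    CACHE_DIR.mkdir(parents=True, exist_ok=True)
    cached.write_text(json.dumps(data))
    return data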

Why Evidence-Based Planning?

  • Zero fabrication: Every addition must cite resume text or EXTRA_KEYWORDS.
  • Audit trail: Gaps and unsupported requirements are logged for review.
  • Quality control: Prevents the model from inventing experience or upgrading seniority.

Why Apify Over SerpAPI?

  • Scheduled runs: Apify actors can run automatically every hour, eliminating API rate limits.
  • Data freshness: Fetches latest results without triggering new scrapes.
  • Cost efficiency: No per-query charges, just actor subscription.
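
A minimal apify-client sketch of reading the latest successful run's dataset without triggering a new scrape (the actor ID and item fields are placeholders):

import os
from apify_client import ApifyClient

client = ApifyClient(os.environ["APIFY_TOKEN"])

# Fetch the dataset of the most recent successful run of the scheduled actor.
last_run = client.actor("username~linkedin-jobs-scraper").last_run(status="SUCCEEDED")
items = last_run.dataset().list_items().items
for job in items[:5]:
    print(job.get("title"), job.get("link"))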
