- Searches LinkedIn jobs via Apify actors
- Tailors your CV to each role using OpenAI
- Emails you the job links with tailored CV attachments
- Optionally lists 1st-degree connections at target companies (from a CSV export)
- Apify Client: Fetches LinkedIn job results from your scheduled actor runs (no re-triggering).
- OpenAI: LLM for structured CV extraction, edit plan generation, JD topic and priority extraction, and deterministic rewrite to structured JSON.
- SMTP (email): Sends tailored resumes with job links as attachments.
- Pydantic: Models and settings validation.
- Requires an Apify token (APIFY_TOKEN) and an OpenAI API key for tailoring.
- Email uses SMTP; set DRY_RUN=true to preview without sending.
- Without OPENAI_API_KEY, tailoring falls back to your original resume text.
- Connections CSV is optional; export it from LinkedIn and set `LINKEDIN_CONNECTIONS_CSV`.
- Duplicate protection: processed job links are stored in `state/sent_jobs.json` and skipped in future cycles.
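For reference, a minimal `.env` assembled from the variables mentioned in this README might look like the following (all values are placeholders, and the comma-separated format for `EXTRA_KEYWORDS` is an assumption; treat `.env.example` as the authoritative list):

```ini
APIFY_TOKEN=your_apify_token
OPENAI_API_KEY=sk-your-openai-key
DRY_RUN=true
LINKEDIN_CONNECTIONS_CSV=connections.csv
EXTRA_KEYWORDS=keyword1,keyword2
```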
- Create and fill `.env` based on `.env.example`.
- Install dependencies: `pip install -r requirements.txt`
- Run commands:
```bash
# Search-only to CSV (uses settings from .env for filters)
python get_jobs.py search --max-jobs 5 --out jobs.csv

# Full pipeline: search, tailor resumes, and email
python get_jobs.py pipeline --max-jobs 3

# Watch mode: poll every 60 minutes and process only new jobs
python get_jobs.py watch --minutes 60 --max-jobs 5
```
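Watch mode's "only new jobs" behavior relies on the duplicate-protection state file. A minimal sketch of that logic (function names here are illustrative, not the actual ones in `get_jobs.py`):

```python
import json
from pathlib import Path

STATE_FILE = Path("state/sent_jobs.json")

def load_sent_links(state_file: Path = STATE_FILE) -> set:
    """Return the set of job links already processed in earlier cycles."""
    if state_file.exists():
        return set(json.loads(state_file.read_text()))
    return set()

def filter_new_jobs(jobs: list, sent: set) -> list:
    """Keep only jobs whose link has not been emailed before."""
    return [job for job in jobs if job["link"] not in sent]

def mark_sent(links: list, state_file: Path = STATE_FILE) -> None:
    """Persist newly processed links so future cycles skip them."""
    sent = load_sent_links(state_file) | set(links)
    state_file.parent.mkdir(parents=True, exist_ok=True)
    state_file.write_text(json.dumps(sorted(sent)))
```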
- Resume Text Extraction
  - Detects the file type (PDF/DOCX/TXT) and extracts raw text.
- LLM-based Structured Extraction
  - Parses the resume into a structured entity (name, contacts, summary, skills, experience/academic items).
  - Results are cached on disk, keyed by resume hash.
- Job Understanding
  - The LLM extracts canonical JD topics and selects the top priorities that are evidentially supported by the resume.
- Pass A – Evidence-backed Edit Plan
  - Enforces a zero-fabrication, evidence-backed resume edit plan.
  - Returns JSON with suggestions and `unsupported_requirements` (job description requirements the resume does not cover).
- Pass B – Deterministic Rewrite to Structured JSON
  - Applies ONLY the approved plan to the structured resume, producing a final structured payload for the template builder.
- Template-based DOCX Generation
  - Renders contact, summary, skills, experience, and academic items into the output DOCX.
- Review Artifacts
  - For each job, a `Company_Title.review.json` is written to the output folder with `gaps` and `unsupported_requirements` for a quick audit of missing evidence.
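The per-job review artifact amounts to a small JSON write; a sketch (the real writer and its filename sanitization in the codebase may differ):

```python
import json
from pathlib import Path

def write_review(out_dir: Path, company: str, title: str,
                 gaps: list, unsupported: list) -> Path:
    """Write Company_Title.review.json with the audit fields."""
    name = f"{company}_{title}".replace(" ", "_")
    path = out_dir / f"{name}.review.json"
    path.write_text(json.dumps(
        {"gaps": gaps, "unsupported_requirements": unsupported}, indent=2))
    return path
```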
Why Two-Pass Architecture (Pass A + Pass B)?
- Pass A (Plan): LLM generates evidence-backed edit plans with strict constraints. This prevents fabrication and ensures all changes cite resume text or EXTRA_KEYWORDS.
- Pass B (Apply): deterministic rule-based application of the plan to structured data. No LLM calls - just mechanical string replacement and array manipulation.
- Benefits: Eliminates scope creep, reduces costs (1 LLM call instead of 2), faster execution, and guaranteed consistency.
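The deterministic Pass B step can be pictured as plain dictionary and string manipulation; an illustrative sketch (the real plan schema is richer than this):

```python
import json

def apply_plan(resume: dict, plan: list) -> dict:
    """Mechanically apply approved replace edits; no LLM involved."""
    result = json.loads(json.dumps(resume))  # cheap deep copy
    for edit in plan:
        text = result.get(edit["section"], "")
        # Only rewrite when the "find" text is really present.
        if edit["find"] in text:
            result[edit["section"]] = text.replace(edit["find"], edit["replace"])
    return result
```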
Why Sanitization (_sanitize_plan)?
- Model drift protection: LLMs sometimes return malformed JSON or unexpected field shapes. Sanitization normalizes these before processing.
- Safety guardrails: Prevents title overwrites, validates that "find" text exists before replacement, and gates additions to wanted/extra topics only.
- Deterministic safety: Ensures Pass B receives clean, validated input for reliable rule-based application.
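A hedged sketch of what a sanitizer like `_sanitize_plan` might check (the actual rules and plan schema in the codebase may differ):

```python
def sanitize_plan(plan, resume_text: str, wanted_topics: set) -> list:
    """Drop malformed items and enforce the guardrails described above."""
    clean = []
    for item in plan:
        if not isinstance(item, dict) or "op" not in item:
            continue  # model drift: unexpected shapes are dropped
        if item["op"] == "replace":
            # Validate that the "find" text exists before replacement.
            if item.get("find") and item["find"] in resume_text:
                clean.append(item)
        elif item["op"] == "add_skill":
            # Gate additions to wanted/extra topics only.
            if item.get("value", "").lower() in wanted_topics:
                clean.append(item)
        # Anything else (e.g. a title overwrite) is silently rejected.
    return clean
```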
Why Template-Based DOCX Generation?
- Consistency: Clean, predictable layout regardless of input resume format.
- Maintainability: No complex in-place editing or style preservation logic.
- Skills grouping: Easy to render organized skills with headings and comma-joined items.
- Alternative rejected: In-place DOCX editing was fragile and hard to debug.
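The skills-grouping step above is straightforward in pure Python; this sketch shows only the grouping logic (the actual DOCX rendering with python-docx is omitted, and the function name is illustrative):

```python
def render_skill_groups(groups: dict) -> list:
    """One line per group: a heading followed by comma-joined items,
    ready to drop into a template paragraph."""
    return [f"{heading}: {', '.join(items)}"
            for heading, items in groups.items() if items]
```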
Why LLM-Only Extraction?
- Accuracy: Heuristic parsing (regex, keyword matching) was unreliable for varied resume formats.
- Structured output: LLM provides consistent JSON schema for experience/academic items.
- Caching: Expensive LLM calls are cached by resume hash to avoid redundant API costs.
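The cache-by-hash pattern might look like this (cache location and key format are assumptions, not the codebase's actual layout):

```python
import hashlib
import json
from pathlib import Path

def resume_hash(text: str) -> str:
    """Stable key for a given resume's raw text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def cached_extract(text: str, extract_fn, cache_dir: Path) -> dict:
    """Return the cached structured resume if seen before,
    otherwise make the expensive LLM call once and store it."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / f"{resume_hash(text)}.json"
    if path.exists():
        return json.loads(path.read_text())
    result = extract_fn(text)  # expensive LLM call happens only here
    path.write_text(json.dumps(result))
    return result
```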
Why Evidence-Based Planning?
- Zero fabrication: Every addition must cite resume text or EXTRA_KEYWORDS.
- Audit trail: Gaps and unsupported requirements are logged for review.
- Quality control: Prevents the model from inventing experience or upgrading seniority.
Why Apify Over SerpAPI?
- Scheduled runs: Apify actors can run automatically every hour, eliminating API rate limits.
- Data freshness: Fetches latest results without triggering new scrapes.
- Cost efficiency: No per-query charges, just actor subscription.