phase1-ingest-resolve
This commit is contained in:
@@ -178,7 +178,9 @@ pytest -q
|
||||
## Gotchas (append confirmed findings here as you build — this section is durable memory across /clear)
|
||||
|
||||
- Verify ATS JSON field names against live responses before trusting them: Greenhouse `jobs[].absolute_url`; Lever `[].hostedUrl`; Ashby `jobs[].jobUrl`; Workday varies by tenant. Fix in code AND note the confirmed shape here.
|
||||
- JobSpy populates the company's own site (`company_url_direct`) only sometimes; `resolve.py` must cover the gap. Record the observed fill rate here after the first live fetch.
|
||||
- **JobSpy `company_url_direct` fill rate: 0% observed** (5/5 jobs had `website=None` in a live fetch on 2026-06-17, search: "software engineer", United States, `linkedin_fetch_description=False`). `resolve.py` is essential for **every** job, not just a gap-filler. Do not assume any job arrives with a website pre-populated.
|
||||
- **JobSpy `date_posted` / `listed_at` fill rate: ~40% observed** (2/5 jobs had a date; 3/5 were `None`). This is because `linkedin_fetch_description=False` (our default for speed) means LinkedIn's posted date is often absent. `listed_at` is best-effort metadata only; do not gate pipeline logic on it.
|
||||
- **JobSpy confirmed column names** (verified 2026-06-17): `job_url` (full LinkedIn URL incl. tracking params), `company` (display name), `company_url_direct` (company own site — always `None` in practice so far), `date_posted` (sparse when `linkedin_fetch_description=False`), `title`, `location`, `id` (may be `None`; always parse job_id from `job_url` instead). `company_url` is the LinkedIn *company page* URL — never use it as the company website.
|
||||
- LinkedIn parses the numeric job id from `/jobs/view/{id}`; strip tracking query params.
|
||||
- Browser Use needs Chromium installed (`playwright install chromium`) and an LLM key; without them the tier must degrade gracefully.
|
||||
- LinkedIn rate-limits aggressively; keep batches small while testing.
|
||||
|
||||
Reference in New Issue
Block a user