"""Deterministic careers-page heuristics: URL probing, homepage scan, sitemap (Stage 2, tiers 2–4). Scaffold stub -- not implemented yet. """ # TODO (Stage 2, tiers 2–4): implement per CLAUDE.md "Stage 2 — URL patterns / homepage / sitemap". # Tier 2 — URL patterns: probe /careers, /career, /jobs, /join-us, /join, # careers.{domain}, jobs.{domain} via HTTP HEAD (or GET if HEAD fails). # Tier 3 — Homepage link scan: fetch homepage HTML, parse with BeautifulSoup + lxml, # rank anchors by career/job keywords in href/text, return highest-ranked. # Tier 4 — Sitemap: fetch sitemap.xml (and sitemap index if present), scan for career/job URLs. # Each function returns (url: str | None) so cascade.py can return early on first hit.