# Ooru — Data Sources Research

> Research into every data source for the Ooru pipeline: rentals, places/geocoding,
> schools, air quality, metro, commute. Goal: replace the fragile Apify path and the
> fabricated school/metro claims with verified, cost-known, legally-clear sources.
> Researched live 2026-06-09 via browser + curl (web_search was down; see confidence labels).
> Authored under maahaa.dev.

> **v2 note.** The chosen architecture is the v2 PocketBase rewrite (docs/MIGRATION.md):
> the pipeline lives in ONE `pb_hooks/main.pb.js` hook, called via `$http.send`, keys from
> server env. Every endpoint, query shape, BBOX, ranking, and ToS/caching finding below is
> stack-agnostic and carries over verbatim — the only change is the host: logic goes into
> the JS hook, not the v1 `backend/sources/*.py`. The Python samples in §8 are kept as the
> reference implementation (port the logic to JS in the hook). Where §8 says "the 0.1
> honesty fix" or "GraphState," read "the single linear hook that passes only verified
> fields" — same intent, v2 framing.

## Confidence labels
- [VERIFIED] confirmed live this session against the source's own site/API.
- [PARTIAL] confirmed the source exists/responds, but exact pricing/terms not fully read.
- [KNOWLEDGE] from prior domain knowledge, NOT re-verified live — spot-check before relying.

---

## 1. RENTAL LISTINGS (the broken Apify path)

Current state: `house_rental_apify()` hardcodes one housing.com URL and takes no args;
the workflow node is disabled. Needs a real, parameterized source.

### Apify Store actors [VERIFIED — queried Apify API live]
All are third-party scrapers, PAY_PER_EVENT pricing (you pay per result/dataset item).
Run counts are a proxy for reliability/maintenance.

| Actor | Portal | Runs | Pricing | Notes |
|---|---|---|---|---|
| stealth_mode/99acres-property-search-scraper | 99acres | 2,059 | pay-per-event | highest usage; most battle-tested |
| krazee_kaushik/magicbricks-search-results-scraper | MagicBricks | 992 | pay-per-event | strong usage |
| fatihtahta/99acres-scraper-ppe | 99acres | 649 | ~$3.5 / 1k results | explicit price |
| easyapi/housing-com-scraper | housing.com | 640 | pay-per-event | matches current target portal |
| abotapi/housing-com-scraper | housing.com | 237 | "$0.7" tier | cheap |
| thirdwatch/nobroker-scraper | NoBroker | 213 | pay-per-event | OWNER-DIRECT listings |
| krazee_kaushik/nobroker-search-results-scraper | NoBroker | 191 | pay-per-event | owner-direct |
| codingfrontend/magicbricks-property-search-scraper | MagicBricks | 176 | pay-per-event | — |

Typical pay-per-event: ~$3.5–7 per 1,000 results, or a few cents per run. With the existing
APICache layer, marginal cost stays low (cache by locality, refresh periodically).
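
A rough sketch of the arithmetic behind "marginal cost stays low". The workload numbers (localities, listings per locality, refresh cadence) are made-up assumptions; only the $3.5 and $7 per 1,000 results bounds come from the range quoted above. `monthly_scrape_cost_usd` is a hypothetical helper, not existing code.

```python
def monthly_scrape_cost_usd(localities: int, results_per_locality: int,
                            refreshes_per_month: int, usd_per_1k_results: float) -> float:
    """Pay-per-result cost when each locality is re-scraped only on cache refresh."""
    total_results = localities * results_per_locality * refreshes_per_month
    return round(total_results / 1000 * usd_per_1k_results, 2)

# Hypothetical workload: 20 localities, 100 listings each, refreshed weekly (4x/month)
low = monthly_scrape_cost_usd(20, 100, 4, 3.5)   # 8,000 results at $3.5 per 1k
high = monthly_scrape_cost_usd(20, 100, 4, 7.0)  # 8,000 results at $7 per 1k
print(f"estimated monthly cost: ${low} to ${high}")
```

Cost scales with refresh cadence, not with user traffic, because reads between refreshes are served from APICache.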

Verdict on Apify: viable and cheap. Replace the single hardcoded actor with a parameterized
call. NoBroker actors are interesting — owner-direct listings mirror bengaluru.rent's
"zero brokerage" data. 99acres has the most reliable actor (2k+ runs).

### Official / partner APIs [KNOWLEDGE]
- Housing.com / 99acres / MagicBricks / NoBroker: NO public developer API. Partner/affiliate
  programs exist but require business agreements, not solo-dev friendly. Dead end for now.

### Legal / ToS [KNOWLEDGE]
- Scraping these portals violates their ToS (medium-high risk). Apify shifts the operational
  risk to the actor, but the legal exposure for using scraped data commercially remains.
  For an MVP / personal tool: low practical risk. For a public product: get counsel, or pivot
  to crowdsourced/owner-direct data (the bengaluru.rent model) once trust is established.
- RERA Karnataka (rera.karnataka.gov.in) publishes registered-project data — legitimate, but
  it's project registrations, not live rental listings. Useful for verification, not discovery.

### RANKED for cost-conscious solo dev
1. **Apify `easyapi/housing-com-scraper`** (matches current portal, drop-in, 640 runs) — fastest fix.
2. **Apify `stealth_mode/99acres-...`** (most reliable, 2k+ runs) — if housing.com actor flakes.
3. **NoBroker actor** — owner-direct data, strategic fit with the differentiation thesis.
4. **Later: crowdsourced opt-in rents** (authenticated) — only after trust restored. Aligns with moat.

---

## 2. PLACES / GEOCODING / POI

Current: Mapbox Geocoding v5 (legacy) + Foursquare Places for metro/schools.

### Foursquare Places API [VERIFIED — read pricing page live]
- Pivoted hard to "FSQ Spatial" GIS products (Desktop/Workbench, $25–250/mo).
- Places API is now pay-as-you-go: **up to 10,000 free calls on Pro endpoints**, then PAYG.
- Legacy Places API deprecated; new service-key model. The app's
  `X-Places-Api-Version: 2025-06-17` header shows it's already on the newer API. Auth/quota may need re-check.
- Verdict: fine for low volume, but text-search quality for "ICSE school" is poor (see §3).

### Ola Maps (Krutrim) [VERIFIED — read pricing page live] — India-native, strong
- **500,000 free API calls/month across ALL APIs.** Free for 1st year on most endpoints.
- Has: Geocoding, Reverse Geocoding, Places Nearby, Text Search, Distance Matrix, Directions,
  Route Optimizer, Snap-to-road, tiles, Street View.
- "Made in India · Data stays in India." SOC2, ISO 27001. Usage-based beyond free tier.
- Verdict: BEST value + India-local data for a cost-conscious dev. Strong candidate to replace
  BOTH Mapbox (geocoding/matrix) and Foursquare (POI). Single vendor, generous free tier.

### Google Places API (New) / Geocoding / Routes [KNOWLEDGE]
- Best global data quality; excellent Bangalore coverage. Monthly free credit, then PAYG
  (per-call, varies by SKU tier). CRITICAL ToS gotcha: Google generally PROHIBITS caching/
  storing most Places results beyond limited place IDs — conflicts with the APICache design.
- Verdict: high quality but the no-caching rule fights the architecture and cost adds up. Avoid
  as primary; use only if a specific field is unavailable elsewhere.

### OpenStreetMap — Nominatim (geocode) + Overpass (POI) [PARTIAL] — free / self-hostable
- Nominatim: free geocoding (usage policy: 1 req/s on public instance; self-host for volume).
- Overpass: free POI queries (amenity=school, railway=station, station=subway). Bangalore metro
  IS in OSM. My area-name query returned 0 only because the area lookup failed to match, not
  because data is absent. The right query is `[railway=station][station=subway]` (or
  `network=Namma Metro`) within the Bengaluru relation or, more robustly, a bounding box (see §8.1).
  ODbL license (attribution + share-alike on derived DB).
- Verdict: best free/self-hosted fallback. Caching is fine (it's open data). Coverage good but
  community-variable.

### Mapbox [KNOWLEDGE]
- Current geocoding v5 is legacy → migrate to Search Box API. Matrix API exists. Free tier
  (~100k geocodes/mo historically), then PAYG. Caching restricted by ToS. Keep the token if
  using Mapbox GL for the map view (Horizon 1), but Ola/OSM may serve data better/cheaper.

### RANKED per use-case (cost-conscious)
- **Geocoding:** Ola Maps (500k free, India data) > OSM Nominatim self-hosted (free) > Mapbox Search Box.
- **POI (metro/schools/hospitals):** OSM Overpass (free, cacheable) > Ola Places Nearby > Foursquare.
- **Distance matrix / commute:** OSRM self-hosted (free, unlimited) > Ola Distance Matrix (free tier) > OpenRouteService (2k/day free).
- ToS caching note: OSM and Ola allow caching; Google/Mapbox restrict it. The APICache layer
  favors OSM + Ola.
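
The ranked lists above amount to a fallback chain: try the preferred provider, fall through on error or no result. A minimal sketch, assuming each provider is wrapped in a callable that returns `(lat, lon)` or `None`; `geocode_with_fallback` and the stub providers are hypothetical names for illustration.

```python
from typing import Callable, Optional, Sequence, Tuple

LatLon = Tuple[float, float]
Geocoder = Callable[[str], Optional[LatLon]]

def geocode_with_fallback(query: str, providers: Sequence[Tuple[str, Geocoder]]):
    """Try providers in ranked order; return (provider_name, (lat, lon)) or None."""
    for name, fn in providers:
        try:
            hit = fn(query)
        except Exception:
            continue  # provider down or rate-limited: fall through to the next one
        if hit is not None:
            return name, hit
    return None

# Stub providers standing in for real Ola / Nominatim / Mapbox wrappers
def ola_stub(q):
    raise TimeoutError("simulated outage")

def nominatim_stub(q):
    return (12.9357366, 77.6240810)

print(geocode_with_fallback("Koramangala, Bengaluru",
                            [("ola", ola_stub), ("nominatim", nominatim_stub)]))
```

Returning the provider name alongside the coordinates lets the cache record which source produced each row, which matters for attribution (ODbL) and for ToS-driven cache expiry.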

---

## 3. SCHOOLS (with BOARD: ICSE/CBSE) — the honest-differentiator data

Key requirement: identify schools by BOARD with lat/lon, so the app can truthfully say
"N ICSE schools within X km". POI text-search (current Foursquare approach) CANNOT do this —
"school" is the only tag; no board attribute. This is the core problem to solve.

### UDISE+ (udiseplus.gov.in) [KNOWLEDGE — verify format before ingest]
- Govt national school dataset. Name, UDISE code, district, management type, and lat/lon for
  many schools (completeness varies). Downloadable Excel/CSV per state/district. Free, open.
- GAP: board affiliation (CBSE/ICSE) is weak/absent — mostly management type (govt/private/aided).
- Verdict: strong BASE layer (all schools + coords for Karnataka), but not board on its own.

### CISCE (cisce.org) — ICSE/ISC official directory [KNOWLEDGE]
- Authoritative ICSE/ISC school list with name + address + board. NO lat/lon, NO bulk download/API.
- Scrapable (fragile, ToS-risky). ~200–300 ICSE schools in Bangalore.
- Verdict: the ONLY authoritative ICSE source. Needs geocoding (Ola/Nominatim) after extraction.

### CBSE (cbse.gov.in) — affiliated-school directory [KNOWLEDGE]
- Authoritative CBSE list: name, affiliation no., address, board. No bulk export. Scrapable.
- Verdict: authoritative CBSE source; same geocoding step needed.

### POI fallbacks for board [VERIFIED conclusion]
- Google Places / OSM Overpass / Foursquare: tag schools as "school" only. Board filtering
  (ICSE vs CBSE) is NOT achievable from any POI API. Same limitation as the current approach.

### RANKED approach
1. **Hybrid ingest into Postgres (recommended):** UDISE+ as base (all schools + coords for
   Karnataka) + scraped CISCE & CBSE lists for board, merged on name/address, missing coords
   geocoded via Ola/Nominatim. One-time build, then it's local data — fast, free, cacheable,
   and genuinely lets the report say "3 ICSE schools within 2km" truthfully.
2. **Interim honest fallback:** if board data isn't ready, the report says "N schools within
   X km (board not verified)" — honest, not fabricated. This is the minimum for the 0.1 fix.
3. **Avoid:** pure POI text-search claiming board accuracy (the current fabrication).

KEY POINT: no single source has board + reliable lat/lon. The hybrid is the only honest path.
Until it exists, the report must NOT claim board-specific school counts.
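
One way to enforce that rule at the point where report text is generated. This is a sketch, assuming the caller knows whether the count came from the verified board table; `school_claim` and its parameters are hypothetical.

```python
from typing import Optional

def school_claim(count: int, radius_km: float, board: Optional[str] = None,
                 board_verified: bool = False) -> str:
    """Only name a board when the count comes from verified board data."""
    noun = "school" if count == 1 else "schools"
    if board and board_verified:
        return f"{count} {board} {noun} within {radius_km:g} km"
    # Interim fallback wording: never implies a board the data cannot prove
    return f"{count} {noun} within {radius_km:g} km (board not verified)"

print(school_claim(12, 2))                                   # POI-only data
print(school_claim(3, 2, board="ICSE", board_verified=True)) # after the hybrid ingest
```

Making `board_verified` default to `False` means a caller has to opt in explicitly before any board-specific wording can appear.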

---

## 4. AIR QUALITY

Current: Open-Meteo air-quality API (free). Works, but modelled (not station-measured).

### Open-Meteo Air Quality [KNOWLEDGE — already in use, working]
- Free, no key, no hard rate limit. Returns us_aqi, pm2_5, pm10, etc. per lat/lon.
- Modelled from CAMS/ECMWF rather than measured at stations; coarse for dense Indian cities.
- Verdict: good enough for relative locality ranking (which is what RANK_BY_AQI does). Keep as
  the per-locality ranking source; it's free and consistent.

### CPCB Real-time AQI via data.gov.in [VERIFIED — catalog live, updated 2026-06-09]
- Official Indian AQI. Catalog `real-time-air-quality-index` on data.gov.in HAS a Catalog API
  (needs a free data.gov.in API key). Pollutants: SO2, NO2, PM10, PM2.5, CO, O3. All-India stations.
- Bangalore: ~10–15 official CPCB monitoring stations (accurate but sparse).
- Verdict: most ACCURATE for Bangalore at station locations; use to calibrate/cross-check
  Open-Meteo, or show official AQI when a CPCB station is near the target.

### WAQI / aqicn.org [KNOWLEDGE]
- Free token (~1000 calls/day). Aggregates govt + citizen sensors → best station DENSITY for
  Bangalore. Non-commercial free; commercial needs contact (ToS check before productizing).

### OpenAQ [KNOWLEDGE]
- Free API key, open data (cacheable). Aggregates CPCB + others. Slightly fewer live stations
  than WAQI but cleaner open-data terms.

### RANKED for AQI accuracy in Bangalore
1. **CPCB (data.gov.in)** — official, most accurate at stations; free key. [VERIFIED exists]
2. **WAQI/aqicn** — best density + freshness; watch commercial terms.
3. **Open-Meteo** (current) — keep for per-locality relative ranking (free, modelled).
Best design: Open-Meteo for ranking ALL localities + CPCB official number when a station is near.
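
A minimal sketch of that design, assuming CPCB stations have already been loaded as dicts with `name`, `lat`, `lon`, `aqi`. The 3 km "near" threshold and the station values in the example are illustrative assumptions, not researched numbers; `pick_aqi` is a hypothetical helper.

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km (mean Earth radius 6371 km)."""
    dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))

def pick_aqi(lat, lon, modelled_aqi, stations, max_station_km=3.0):
    """Always keep the modelled value for ranking; attach the official
    reading only when the nearest station is within max_station_km."""
    result = {"ranking_aqi": modelled_aqi, "official_aqi": None, "station": None}
    if not stations:
        return result
    dist, nearest = min(
        ((haversine_km(lat, lon, s["lat"], s["lon"]), s) for s in stations),
        key=lambda pair: pair[0])
    if dist <= max_station_km:
        result.update(official_aqi=nearest["aqi"], station=nearest["name"])
    return result

# Placeholder station, not real CPCB data
stations = [{"name": "Station A", "lat": 12.9357, "lon": 77.6241, "aqi": 92}]
print(pick_aqi(12.9357, 77.6241, modelled_aqi=80, stations=stations))
```

Ranking stays on a single consistent source, so localities with a nearby station are not compared against localities without one on different scales.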

---

## 5. METRO + COMMUTE

### Namma Metro (BMRCL) stations/lines
- GTFS feed: [PARTIAL — not confirmed live this session]. Check mobilitydatabase.org and
  transit.land for a "Namma Metro / Bengaluru Metro" feed (transit.land needs an API key —
  returned Unauthorized without one). A static GTFS likely exists; realtime feed unlikely.
- OSM Overpass: [PARTIAL] metro stations + lines ARE in OSM. Correct query: railway=station
  with station=subway (or network="Namma Metro") inside the Bengaluru area relation or, more
  robustly, a bounding box (§8.1). My first area-name filter returned 0 because the area
  lookup failed to match, not because data is absent. Free, cacheable.
- Verdict: OSM Overpass is the pragmatic free source for station coords + line geometry. Confirm
  the GTFS feed if line/route structure (Purple/Green/Yellow/Pink) is needed.

### Commute / travel time
- OSRM self-hosted: free, unlimited, you run it. Best for cost-conscious at volume. [KNOWLEDGE]
- Ola Maps Distance Matrix: in the 500k/mo free tier, India routing. [VERIFIED tier]
- OpenRouteService: ~2,000 req/day free. [KNOWLEDGE]
- Mapbox Matrix (current dep) / Google Distance Matrix: PAYG, caching-restricted.
- Note: the app already has a dead `get_travel_times` (Mapbox Matrix) — revive it via Ola or
  OSRM for the "what-if commute" Horizon-2 feature.

### RANKED
1. Metro stations/lines: **OSM Overpass** (free) — confirm GTFS for line structure.
2. Commute time: **Ola Distance Matrix** (free tier, India) or **OSRM self-hosted** (free, unlimited).

---

## 6. SYNTHESIS — recommended source stack

| Pipeline need | Current | Recommended | Why |
|---|---|---|---|
| Geocode landmark | Mapbox v5 (legacy) | Ola Maps Geocoding | 500k/mo free, India data, cacheable |
| Nearby metro/POI | Foursquare (discarded) | OSM Overpass | free, cacheable, real coords |
| Schools + BOARD | Foursquare text (fabricated) | UDISE+ + CISCE/CBSE hybrid in Postgres | only honest path to board-level claims |
| AQI ranking | Open-Meteo | Open-Meteo (keep) + CPCB cross-check | free ranking + official accuracy |
| Rentals | broken Apify (1 hardcoded URL) | Apify easyapi/housing-com or 99acres actor, parameterized | cheap pay-per-event, cacheable |
| Commute (future) | dead get_travel_times | Ola Distance Matrix or OSRM | free tier / self-host |

Single biggest win: **Ola Maps** could consolidate geocoding + POI + matrix under one
India-native vendor with a 500k/mo free tier — simpler and cheaper than the current
Mapbox + Foursquare split. Pair with the UDISE+/CISCE/CBSE school ingest for the honest
school differentiator, and a parameterized Apify actor for rentals.

Everything routes through the existing APICache table (OSM + Ola + Open-Meteo all permit
caching; Google/Mapbox restrict it — another reason to prefer Ola/OSM).

## 7. Caveats / what to verify before building
- Ola Maps free-tier exact per-endpoint limits and post-free pricing: read the calculator. [VERIFIED tier exists]
- Apify actor output schema (field names for price/BHK/lat-lon) varies per actor — test one run.
- CPCB data.gov.in API key registration + response shape. [VERIFIED catalog + API exist]
- Namma Metro GTFS feed existence/freshness on mobilitydatabase.org. [PARTIAL: not confirmed live]
- School scraping ToS (CISCE/CBSE) — low practical risk for one-time ingest, but note it.
- Rental-scraping ToS — fine for MVP, get counsel before a public commercial launch.

---

## 8. IMPLEMENTATION PLAYBOOK — buildable, copy-paste next time

> Decision (2026-06-09): do NOT scrape Google Maps for POIs. Google can't return school
> BOARD, its ToS forbids caching (fights APICache), and its anti-bot makes a DIY scraper
> brittle. The robust, free, cacheable one-time-ingest path is OSM Overpass for POIs +
> UDISE/CISCE/CBSE for school boards + Nominatim/Ola for geocoding. All endpoints below
> were hit live this session unless marked otherwise.

### 8.1 OSM Overpass — metro / schools / hospitals / supermarkets  [VERIFIED]

Endpoint: `https://overpass-api.de/api/interpreter` (POST or GET, body param `data=`)
Free, no key. License: ODbL (attribute "© OpenStreetMap contributors"; caching/DB storage allowed).

**CRITICAL GOTCHA — use a BBOX, not `area["name"=...]`.** The area-name lookup
(`area["name"="Bengaluru"]["admin_level"="8"]`) returned 0 results — the boundary
name/admin_level match is unreliable. A bounding box is robust and faster.

Bengaluru bbox (lat_min,lon_min,lat_max,lon_max): `12.83,77.45,13.14,77.78`

Verified counts this session (BBOX): hospitals+supermarkets = 1,803 elements. The metro/
school queries use the identical proven pattern; they only "failed" intermittently due to
the PUBLIC instance rate-limiting back-to-back heavy queries (HTTP 429). Space requests
~5–10s apart, or self-host Overpass for bulk.
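
A sketch of spacing out retries when the public instance answers 429. The HTTP call is injected as `fetch`, a zero-argument callable returning `(status_code, body)`, so the retry logic is independent of the HTTP client; the name `fetch_with_backoff` and the doubling schedule starting at 5 s are assumptions chosen to match the spacing suggested above.

```python
import time
from typing import Callable, Tuple

def fetch_with_backoff(fetch: Callable[[], Tuple[int, object]], max_tries: int = 4,
                       base_delay_s: float = 5.0,
                       sleep: Callable[[float], None] = time.sleep):
    """Retry while the server answers 429, waiting 5s, 10s, 20s, ... between tries.
    Returns the last (status, body) pair, whatever it was."""
    status, body = None, None
    for attempt in range(max_tries):
        status, body = fetch()
        if status != 429:
            return status, body
        if attempt < max_tries - 1:
            sleep(base_delay_s * 2 ** attempt)
    return status, body

# Simulated server: two rate-limit responses, then success
responses = iter([(429, None), (429, None), (200, {"elements": []})])
waits = []
print(fetch_with_backoff(lambda: next(responses), sleep=waits.append), waits)
```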

Query shapes (Overpass QL):
```
# Metro stations (Namma Metro)
[out:json][timeout:30];node["station"="subway"](12.83,77.45,13.14,77.78);out tags center;

# All schools (POI level — NO board info; see 8.2 for board)
[out:json][timeout:40];nwr["amenity"="school"](12.83,77.45,13.14,77.78);out center tags;

# Hospitals + supermarkets in one call (bbox repeated in each filter)
[out:json][timeout:40];
(nwr["amenity"="hospital"](12.83,77.45,13.14,77.78);
 nwr["shop"="supermarket"](12.83,77.45,13.14,77.78););
out center tags;

# count-only (cheap probe): replace the trailing `out ...;` with `out count;`
```
Notes: `nwr` = node+way+relation (schools are often ways/relations, so `node` alone misses
most). `out center` gives a single lat/lon for ways/relations. `out tags` includes `name`,
`operator`, occasionally `school:board` (rare in IN data — do NOT rely on it for board).
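
The union form (several filters in one call, bbox repeated in each) can be generated rather than hand-written. A small sketch; `overpass_union_query` is a hypothetical helper that only builds the query string shown above and does not call the API.

```python
BLR_BBOX = "12.83,77.45,13.14,77.78"  # lat_min,lon_min,lat_max,lon_max

def overpass_union_query(filters, bbox=BLR_BBOX, kind="nwr", timeout=40):
    """Build one Overpass QL union query; the bbox is repeated in every filter."""
    parts = "".join(f"{kind}{f}({bbox});" for f in filters)
    return f"[out:json][timeout:{timeout}];({parts});out center tags;"

q = overpass_union_query(['["amenity"="hospital"]', '["shop"="supermarket"]'])
print(q)
```

Batching filters this way means one request instead of several, which helps with the public instance's rate limiting.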

Sample function (drop into a new `backend/sources/osm.py`; routes through APICache):
```python
import requests
from backend.database import get_cached_response, save_cached_response

OVERPASS_URL = "https://overpass-api.de/api/interpreter"
BLR_BBOX = "12.83,77.45,13.14,77.78"  # lat_min,lon_min,lat_max,lon_max

def overpass_pois(amenity_filter: str, bbox: str = BLR_BBOX, kind: str = "nwr"):
    """One-time bulk POI pull. amenity_filter e.g. '["amenity"="school"]'.
    Returns list of {name, lat, lon, tags}. Cached by (filter,bbox)."""
    cache_key = f"overpass:{kind}:{amenity_filter}:{bbox}"
    data = get_cached_response("osm_overpass", cache_key)
    if data is None:
        q = (f'[out:json][timeout:60];'
             f'{kind}{amenity_filter}({bbox});out center tags;')
        r = requests.get(OVERPASS_URL, params={"data": q}, timeout=90,
                         headers={"User-Agent": "ooru/1.0 (contact@maahaa.dev)"})
        r.raise_for_status()
        data = r.json()
        save_cached_response("osm_overpass", cache_key, data)
    out = []
    for el in data.get("elements", []):
        lat = el.get("lat") or el.get("center", {}).get("lat")
        lon = el.get("lon") or el.get("center", {}).get("lon")
        if lat is None or lon is None:
            continue
        out.append({"name": el.get("tags", {}).get("name"),
                    "lat": lat, "lon": lon, "tags": el.get("tags", {})})
    return out

# metro   = overpass_pois('["station"="subway"]', kind="node")
# schools = overpass_pois('["amenity"="school"]')           # POI only, no board
# hosp    = overpass_pois('["amenity"="hospital"]')
```
For nearest-N + distance, compute haversine in Python over the cached list (no API call):
```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km (mean Earth radius 6371 km)."""
    dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))

def within_km(origin_lat, origin_lon, pois, km):
    """POIs within `km` of the origin, nearest first, each annotated with dist_km."""
    hits = []
    for p in pois:
        d = haversine_km(origin_lat, origin_lon, p["lat"], p["lon"])  # computed once per POI
        if d <= km:
            hits.append({**p, "dist_km": round(d, 2)})
    return sorted(hits, key=lambda x: x["dist_km"])
```

### 8.2 Schools WITH BOARD (ICSE/CBSE) — one-time Postgres ingest  [PARTIAL]

POI APIs (OSM/Google/Foursquare) cannot give board. The only honest path is a hybrid
dataset, built ONCE into a `schools` table, then queried locally (free, fast).

Reachability this session: udiseplus.gov.in → 200, cisce.org → 403 (bot-blocked; needs a
real browser/UA or manual download), saras.cbse.gov.in → 503 (intermittent).

Sources + access:
- UDISE+ — https://udiseplus.gov.in (and dashboard.udiseplus.gov.in). Base layer: every
  school + UDISE code + district + (variable) lat/lon for Karnataka. Download Excel/CSV per
  state/district. Board affiliation is weak/absent → use only for coords + existence.
- CISCE (ICSE/ISC) — https://cisce.org school locator. Authoritative board, name+address,
  NO lat/lon, no bulk export. 403 to curl → fetch via browser tool or manual save, then
  parse. ~200–300 Bangalore schools.
- CBSE — https://saras.cbse.gov.in (affiliated-school search; formerly the "affiliation"
  portal). Authoritative board, name+affiliation no+address, no bulk export.

Build steps (one-time):
1. Download UDISE+ Karnataka → load into `schools` (name, udise_code, lat, lon, district).
2. Fetch CISCE + CBSE Bangalore lists (browser tool, not curl — they block bots) → board+address.
3. Geocode addresses missing coords via Nominatim (8.3) or Ola (8.4); rate-limit politely.
4. Fuzzy-match board lists onto UDISE base by name+locality (rapidfuzz token_sort_ratio ≥ ~85).
5. Result: a `schools(name, board, lat, lon, source)` table. Now "3 ICSE schools within 2km"
   is a local SQL + haversine query — no per-request API call, fully honest.
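
Step 4 can be sketched as below. To keep the example dependency-free it uses the standard library's `difflib` as a stand-in for rapidfuzz, so the scores will not be identical to `rapidfuzz.fuzz.token_sort_ratio`; the record fields (`name`, `locality`, `board`) and the school names are illustrative assumptions, not the real UDISE+/CISCE/CBSE column names.

```python
from difflib import SequenceMatcher

def _norm(s: str) -> str:
    """Lowercase, drop simple punctuation, sort tokens (order-insensitive compare)."""
    for ch in ",.-":
        s = s.replace(ch, " ")
    return " ".join(sorted(s.lower().split()))

def token_sort_ratio(a: str, b: str) -> float:
    """Similarity in 0..100 on token-sorted strings (difflib stand-in)."""
    return 100 * SequenceMatcher(None, _norm(a), _norm(b)).ratio()

def attach_board(base_schools, board_rows, threshold=85.0):
    """For each board-list row, find the best base-layer match by name+locality
    and keep it only if the score clears the threshold."""
    merged = []
    for row in board_rows:
        key = f'{row["name"]} {row["locality"]}'
        scored = [(token_sort_ratio(key, f'{s["name"]} {s["locality"]}'), s)
                  for s in base_schools]
        if not scored:
            continue
        score, best = max(scored, key=lambda pair: pair[0])
        if score >= threshold:
            merged.append({**best, "board": row["board"], "match_score": round(score, 1)})
    return merged

# Made-up records for illustration only
base = [{"name": "St. Example High School", "locality": "Koramangala", "lat": 12.93, "lon": 77.62}]
rows = [{"name": "Example High School, St", "locality": "Koramangala", "board": "ICSE"}]
print(attach_board(base, rows))
```

Rows that fall below the threshold are dropped rather than guessed, which keeps the table honest; they can be logged for manual review and geocoded separately (step 3).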

Interim honest fallback until 8.2 exists: report "N schools within X km (board not verified)"
using OSM (8.1). NEVER claim ICSE/CBSE counts the data can't prove.

### 8.3 Nominatim — geocode a landmark  [VERIFIED]

Endpoint: `https://nominatim.openstreetmap.org/search`
Free. Usage policy: max 1 req/sec, REQUIRED descriptive `User-Agent`, cache results
(self-host for volume). Verified live: "Koramangala, Bengaluru" → 12.9357366, 77.6240810.
```python
import requests

def geocode_nominatim(q: str):
    """Return (lat, lon, display_name) for the top hit, or None if nothing matched."""
    r = requests.get("https://nominatim.openstreetmap.org/search",
        params={"q": q, "format": "json", "limit": 1, "countrycodes": "in"},
        headers={"User-Agent": "ooru/1.0 (contact@maahaa.dev)"}, timeout=20)
    r.raise_for_status()
    d = r.json()
    return (float(d[0]["lat"]), float(d[0]["lon"]), d[0]["display_name"]) if d else None
```
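
To respect the 1 req/sec policy during a bulk geocode (e.g. the school ingest in 8.2), calls can be gated by a small limiter. A sketch; `MinIntervalLimiter` is a hypothetical helper, and the clock and sleep functions are injectable only so it can be exercised without real waiting.

```python
import time

class MinIntervalLimiter:
    """Blocks so that successive wait() calls are at least min_interval_s apart."""

    def __init__(self, min_interval_s=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval_s = min_interval_s
        self._clock, self._sleep = clock, sleep
        self._last = None  # time of the previous permitted call

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval_s - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now

# Usage sketch: call limiter.wait() immediately before each geocode request
limiter = MinIntervalLimiter(1.0)
limiter.wait()  # first call never sleeps
```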

### 8.4 Ola Maps — geocode / places / matrix (India-native)  [VERIFIED tier; endpoints KNOWLEDGE]

maps.olakrutrim.com → 200. 500k free calls/month across all APIs; caching allowed;
data stays in India. Get an API key from the Ola Maps console. Verify exact paths in their
API reference before wiring (paths below are the documented shape, not hit live this session):
```
GET https://api.olamaps.io/places/v1/geocode?address=<q>&api_key=<KEY>
GET https://api.olamaps.io/places/v1/nearbysearch?...&api_key=<KEY>
GET https://api.olamaps.io/routing/v1/distanceMatrix?...&api_key=<KEY>
```
Use Ola when Nominatim coverage is thin, or for the Horizon-2 commute matrix (revive the
dead get_travel_times against distanceMatrix).

### 8.5 AQI — Open-Meteo (ranking) + CPCB (official)  [VERIFIED reachable]

- Open-Meteo (keep, current): `https://air-quality-api.open-meteo.com/v1/air-quality`
  ?latitude=&longitude=&current=us_aqi,pm2_5,pm10 — free, no key, cacheable. Already wired.
- CPCB official via data.gov.in: host is `https://api.data.gov.in/resource/<resource_id>`
  ?api-key=<KEY>&format=json (register a free key at data.gov.in). It returned 429 (rate-limited)
  this session, so the host is live and responding. Use it to show the official number when a CPCB station is
  near the target; keep Open-Meteo for ranking ALL localities.
  NOTE: `https://api.data.gov.in` root returns 404 — you MUST hit `/resource/<id>`, not root.
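
A sketch of the Open-Meteo side: building the request URL from the parameters above and ranking localities from already-fetched responses. It assumes the response carries the value under `current.us_aqi`; the AQI numbers in the example are made up, and `open_meteo_url` / `rank_by_aqi` are hypothetical names, not the app's existing functions.

```python
from urllib.parse import urlencode

OPEN_METEO_AQ = "https://air-quality-api.open-meteo.com/v1/air-quality"

def open_meteo_url(lat: float, lon: float) -> str:
    """URL for current US AQI + PM values at one point (commas left unescaped)."""
    params = {"latitude": lat, "longitude": lon, "current": "us_aqi,pm2_5,pm10"}
    return OPEN_METEO_AQ + "?" + urlencode(params, safe=",")

def rank_by_aqi(responses: dict) -> list:
    """responses: {locality: parsed JSON}. Returns [(locality, us_aqi)] cleanest first,
    skipping localities whose response lacks the assumed current.us_aqi field."""
    scored = [(name, r.get("current", {}).get("us_aqi")) for name, r in responses.items()]
    return sorted((s for s in scored if s[1] is not None), key=lambda s: s[1])

print(open_meteo_url(12.9357, 77.6241))
# Made-up values for illustration
print(rank_by_aqi({"Locality A": {"current": {"us_aqi": 95}},
                   "Locality B": {"current": {"us_aqi": 72}}}))
```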

### 8.6 Rentals — Apify actor (NOT Google; NOT DIY-Google)  [VERIFIED actors]

Google has no rental data. Self-scraping housing.com/99acres is its own anti-bot fight; a
maintained Apify actor is cheaper-to-own than DIY here (the per-event cost is small and the
maintenance is someone else's problem). Parameterize one actor, cache by locality:
- `easyapi/housing-com-scraper` (matches current portal) — pick 1.
- `stealth_mode/99acres-property-search-scraper` (2,059 runs, most reliable) — pick 2.
- `thirdwatch/nobroker-scraper` (owner-direct, fits the moat) — strategic.
Call via Apify API: `POST https://api.apify.com/v2/acts/<user~actor>/runs?token=<APIFY_TOKEN>`
with the actor's input JSON; poll the run; read `https://api.apify.com/v2/datasets/<id>/items`.
TEST ONE RUN FIRST — each actor's output field names (price/BHK/lat-lon) differ.
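
The start / poll / read flow can be sketched as below. HTTP is injected (`post_json(url, payload)` and `get_json(url)`, both returning parsed JSON) so the flow is client-agnostic; those two callables and `run_actor` itself are hypothetical. The sketch assumes the run object sits under a top-level `data` key with `id`, `status`, and `defaultDatasetId`, and that polling goes through `/v2/actor-runs/<id>`; confirm both against the Apify API reference before wiring.

```python
import time
from typing import Callable

APIFY = "https://api.apify.com/v2"
TERMINAL = {"SUCCEEDED", "FAILED", "ABORTED", "TIMED-OUT"}

def run_actor(actor: str, actor_input: dict, token: str,
              post_json: Callable, get_json: Callable,
              poll_s: float = 5.0, max_polls: int = 60, sleep=time.sleep):
    """Start an actor ('user~actor'), poll until it finishes, return dataset items."""
    run = post_json(f"{APIFY}/acts/{actor}/runs?token={token}", actor_input)["data"]
    for _ in range(max_polls):
        if run["status"] in TERMINAL:
            break
        sleep(poll_s)
        run = get_json(f"{APIFY}/actor-runs/{run['id']}?token={token}")["data"]
    if run["status"] != "SUCCEEDED":
        raise RuntimeError(f"Apify run ended with status {run['status']}")
    # Item field names differ per actor: inspect one run before mapping price/BHK/lat-lon
    return get_json(f"{APIFY}/datasets/{run['defaultDatasetId']}/items?token={token}")
```

The returned items should be written to APICache keyed by locality before any field mapping, so a schema surprise in one actor does not force a paid re-run.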

### 8.7 Summary — what to build, in order
1. `backend/sources/osm.py` — Overpass POI pull (8.1) + haversine. Powers metro + schools(POI)
   + hospitals immediately, free, cacheable. This alone unblocks the 0.1 honesty fix.
2. Geocode via Nominatim (8.3); swap to Ola (8.4) if coverage is thin.
3. One-time `schools` table with board (8.2) — turns the school signal honest (ICSE/CBSE).
4. Keep Open-Meteo; add CPCB cross-check (8.5).
5. Parameterize one Apify rental actor (8.6) for the rentals path (Horizon 1).
All five route through the existing APICache / a local table — zero per-request cost after
the first call. No Google scraping anywhere.
