/ 01 — Research · 2026-06-09
Every source the pipeline needs — rentals, geocoding, schools, air quality, metro, commute — researched live with cost, legal, and reliability notes. Confidence is labelled honestly; nothing here is dressed up as certain when it isn't.
/ 02 — Supply
The broken Apify path. Current: one hardcoded housing.com URL, node disabled. Apify Store actors queried live — all pay-per-event, run counts shown as a reliability proxy.
easyapi/housing-com-scraper (verified): matches the current target portal. Drop-in replacement for the hardcoded actor.
Highest usage of any India rental actor — most battle-tested.
Owner-direct listings — mirrors bengaluru.rent's zero-brokerage data.
Housing / 99acres / MagicBricks / NoBroker have no public dev API. Affiliate deals need a business agreement.
Scraping these portals violates their ToS (med–high risk). Apify shifts operational risk to the actor; commercial-use legal exposure remains. Fine for an MVP / personal tool — get counsel before a public launch, or pivot to crowdsourced owner-direct data once trust is restored.
/ 03 — Geocoding
Current: Mapbox geocoding v5 (legacy) + Foursquare for metro/schools. Foursquare pricing read live; Ola Maps pricing read live.
Ola Maps: India-native. Geocoding, Places Nearby, Distance Matrix, Directions, tiles. "Data stays in India", SOC2/ISO27001. Caching allowed.
OpenStreetMap: free POI + geocoding. ODbL (attribution). Cacheable. Bangalore metro is in OSM. Community-variable coverage.
Foursquare: pivoted to GIS "Spatial" products. New service-key model (app already on it). Text-search quality for "ICSE school" is poor.
Google Maps: best global data, but the ToS prohibits caching most results, which fights the APICache design. Cost adds up.
/ 04 — Board data
The honest-differentiator data. Requirement: schools by BOARD with lat/lon. POI text-search cannot do this — "school" is the only tag, no board attribute. This is the core problem behind the fabricated claims.
UDISE+: national school dataset with name, UDISE code, district, lat/lon (variable). Board affiliation weak/absent.
CISCE: authoritative ICSE/ISC list with board. No lat/lon, no bulk download. ~200–300 Bangalore schools.
CBSE: authoritative affiliated-school list. Name, affiliation no., address, board. No bulk export.
Google / OSM / Foursquare tag schools as "school" only. Board filtering is NOT achievable from any POI API.
No single source has board + reliable lat/lon. The honest path is a one-time hybrid ingest into Postgres: UDISE+ base + scraped CISCE/CBSE board lists, merged and geocoded via Ola/Nominatim. Until that exists, the report must say "N schools within X km (board not verified)" — never claim ICSE counts it can't prove.
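The wording rule above can be enforced in code rather than left to the report template. The following is a minimal sketch, assuming a hypothetical `School` record whose `board` stays `None` until a CISCE/CBSE list confirms it; the names `School` and `school_claim` are illustrative and not part of the existing pipeline.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class School:
    name: str
    board: Optional[str]  # None until a CISCE/CBSE list confirms the board
    distance_km: float    # distance from the listing, computed elsewhere


def school_claim(schools: list[School], board: str, radius_km: float) -> str:
    """Return the strongest sentence the data can actually support."""
    nearby = [s for s in schools if s.distance_km <= radius_km]
    if nearby and all(s.board is not None for s in nearby):
        # Every nearby school has a verified board, so a per-board count is defensible.
        count = sum(1 for s in nearby if s.board == board)
        return f"{count} {board} within {radius_km:g} km"
    # At least one board is unknown (or nothing is nearby): fall back to the hedged wording.
    return f"{len(nearby)} schools within {radius_km:g} km (board not verified)"
```

With one unverified school in range the function refuses to print a board count, which is the behaviour the report needs until the hybrid ingest exists.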
/ 05 — Environment
AQI already works (Open-Meteo). CPCB official confirmed live on data.gov.in. Metro + commute options below.
Open-Meteo: us_aqi, pm2_5, pm10 per lat/lon. Modelled (CAMS/ECMWF), coarse but consistent for relative ranking.
CPCB (data.gov.in): official Indian AQI, all-India stations, ~10–15 in Bangalore. Catalog updated 2026-06-09, has an API.
Metro: station coords + line geometry in OSM. Confirm a GTFS feed (mobilitydatabase.org) if Purple/Green/Yellow/Pink route structure is needed.
Commute: revive the dead get_travel_times via Ola Distance Matrix (free tier) or self-hosted OSRM (free, unlimited).
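For the self-hosted OSRM option, a revived `get_travel_times` could look roughly like the sketch below. It assumes an `osrm-routed` instance on `localhost:5000` with a driving profile loaded (both assumptions, not something confirmed for this project) and uses OSRM's table service, which takes coordinates in lon,lat order and returns durations in seconds. The function signatures are hypothetical.

```python
import json
from urllib.request import urlopen

OSRM_BASE = "http://localhost:5000"  # assumption: a self-hosted osrm-routed instance


def table_url(origin, destinations, base=OSRM_BASE):
    """Build an OSRM table request; points are (lat, lon) tuples."""
    points = [origin, *destinations]
    # OSRM expects lon,lat order, with points separated by semicolons.
    coords = ";".join(f"{lon},{lat}" for lat, lon in points)
    dest_idx = ";".join(str(i) for i in range(1, len(points)))
    return (f"{base}/table/v1/driving/{coords}"
            f"?sources=0&destinations={dest_idx}&annotations=duration")


def minutes_from_response(payload):
    """Convert the single durations row (seconds) to minutes; unroutable pairs stay None."""
    if payload.get("code") != "Ok":
        raise ValueError(f"OSRM error: {payload.get('code')}")
    return [None if s is None else round(s / 60, 1) for s in payload["durations"][0]]


def get_travel_times(origin, destinations):
    """Network call against the assumed local OSRM server."""
    with urlopen(table_url(origin, destinations), timeout=10) as resp:
        return minutes_from_response(json.load(resp))
```

Keeping the URL builder and the response parser separate from the network call makes both easy to test and lets the result be stored in the existing cache keyed by the request URL.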
/ 06 — Build steps
Buildable next time — exact endpoints, working query shapes, sample functions. Decision: no Google Maps scraping (can't return school board, ToS bans caching, brittle anti-bot). The robust free path is OSM Overpass + UDISE/CISCE/CBSE + Nominatim/Ola. Everything caches.
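As a starting point for the Overpass part of that path, here is a minimal sketch of a query builder. It assumes the public `overpass-api.de` interpreter and a Bangalore bounding box of 12.83,77.45,13.14,77.78; the helper names are hypothetical, and the tag filters passed in are the caller's responsibility.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen
import json

OVERPASS_URL = "https://overpass-api.de/api/interpreter"
BANGALORE_BBOX = (12.83, 77.45, 13.14, 77.78)  # lat_min, lon_min, lat_max, lon_max


def overpass_query(tag_filters, bbox=BANGALORE_BBOX, timeout=25):
    """Build Overpass QL selecting nodes and ways that match all tag filters."""
    box = ",".join(str(v) for v in bbox)
    tags = "".join(f'["{k}"="{v}"]' for k, v in tag_filters.items())
    # "out center" gives ways a single representative coordinate.
    return (f"[out:json][timeout:{timeout}];"
            f"(node{tags}({box});way{tags}({box}););"
            "out center;")


def overpass_body(query):
    """Form-encode the query as the data= field the interpreter expects."""
    return urlencode({"data": query}).encode("utf-8")


def fetch(query, user_agent="rental-report-pipeline/0.1"):
    """Network call; the User-Agent string is a placeholder."""
    req = Request(OVERPASS_URL, data=overpass_body(query),
                  headers={"User-Agent": user_agent})
    with urlopen(req, timeout=60) as resp:
        return json.load(resp)["elements"]
```

Calling `overpass_query({"amenity": "school"})` yields a schools query. Metro stations are commonly tagged `railway=station` plus `station=subway` in OSM, but that tagging should be checked against the actual Bangalore data before relying on it.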
Free, no key, ODbL (caching allowed). Endpoint + Bangalore bbox:
POST https://overpass-api.de/api/interpreter (body: data=<Overpass QL>)
bbox (lat_min,lon_min,lat_max,lon_max) = 12.83,77.45,13.14,77.78
No POI API gives board. Build a schools(name, board, lat, lon, source) table ONCE, then query locally. Reachability: udiseplus.gov.in 200, cisce.org 403 (bot-blocked, use browser), saras.cbse.gov.in 503 (intermittent).
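The schools table could be sketched as follows. This uses SQLite from the Python standard library purely as a stand-in so the example runs anywhere; the target store named elsewhere in this document is Postgres, and the column types and nullability shown here are assumptions.

```python
import sqlite3

# board and lat/lon are nullable on purpose: a row can exist before its
# board is matched or its address is geocoded.
SCHEMA = """
CREATE TABLE IF NOT EXISTS schools (
    id     INTEGER PRIMARY KEY,
    name   TEXT NOT NULL,
    board  TEXT,
    lat    REAL,
    lon    REAL,
    source TEXT NOT NULL
)
"""


def open_schools_db(path=":memory:"):
    """Open (or create) the database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def add_school(conn, name, board, lat, lon, source):
    """Insert one school; source records which dataset the row came from."""
    conn.execute(
        "INSERT INTO schools (name, board, lat, lon, source) VALUES (?, ?, ?, ?, ?)",
        (name, board, lat, lon, source),
    )
```

Keeping `source` mandatory means every board claim in the report can be traced back to the list it came from.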
Steps: (1) UDISE+ Karnataka CSV → base (coords + existence). (2) CISCE + CBSE Bangalore lists via browser → board + address. (3) geocode missing coords (Nominatim/Ola). (4) fuzzy-match board onto UDISE base (rapidfuzz ≥85). (5) "3 ICSE within 2km" becomes a local SQL + haversine — fully honest.
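Steps (4) and (5) can be sketched as below. To keep the example dependency-free it uses `difflib.SequenceMatcher` from the standard library with a 0.85 threshold as a stand-in for the rapidfuzz ≥85 match named above; the two scorers are not identical, so the threshold would need re-tuning when swapping in rapidfuzz. The school names in the usage note are made up.

```python
import math
from difflib import SequenceMatcher


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres on a sphere of radius 6371 km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlam = math.radians(lon2 - lon1)
    a = math.sin((p2 - p1) / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def normalise(name):
    """Lower-case, drop dots and commas, collapse whitespace."""
    return " ".join(name.lower().replace(".", " ").replace(",", " ").split())


def best_board(name, board_list, threshold=0.85):
    """Return the board of the closest-named entry, or None below the threshold.

    board_list holds (school_name, board) pairs from the CISCE/CBSE lists.
    """
    best_score, best = 0.0, None
    for other, board in board_list:
        score = SequenceMatcher(None, normalise(name), normalise(other)).ratio()
        if score > best_score:
            best_score, best = score, board
    return best if best_score >= threshold else None


def count_within(schools, lat, lon, radius_km, board):
    """Count schools with a verified matching board inside the radius."""
    return sum(
        1 for s in schools
        if s["board"] == board
        and haversine_km(lat, lon, s["lat"], s["lon"]) <= radius_km
    )
```

For example, `best_board("St. Mary's School", [("St Marys School", "ICSE")])` returns `"ICSE"`, while an unrelated name returns `None` and the row keeps an unverified board.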
Free; 1 req/sec, descriptive User-Agent required, cacheable. Live: "Koramangala, Bengaluru" → 12.9357, 77.6241.
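A small client can respect both constraints (one request per second and a descriptive User-Agent). The sketch below assumes the public Nominatim search endpoint with `format=json`, `limit=1` and `countrycodes=in`; the User-Agent string and contact address are placeholders to be replaced with real ones.

```python
import json
import time
from urllib.parse import urlencode
from urllib.request import Request, urlopen

NOMINATIM_URL = "https://nominatim.openstreetmap.org/search"
USER_AGENT = "rental-report-pipeline/0.1 (contact@example.com)"  # placeholder
_last_call = 0.0  # monotonic timestamp of the previous request


def search_url(landmark):
    """Build the search URL for a single best match inside India."""
    params = {"q": landmark, "format": "json", "limit": 1, "countrycodes": "in"}
    return f"{NOMINATIM_URL}?{urlencode(params)}"


def parse_first(results):
    """Nominatim returns lat/lon as strings; convert the first hit to floats."""
    if not results:
        return None
    return float(results[0]["lat"]), float(results[0]["lon"])


def geocode(landmark, min_interval=1.0):
    """Network call, throttled to at most one request per min_interval seconds."""
    global _last_call
    wait = min_interval - (time.monotonic() - _last_call)
    if wait > 0:
        time.sleep(wait)
    req = Request(search_url(landmark), headers={"User-Agent": USER_AGENT})
    with urlopen(req, timeout=10) as resp:
        results = json.load(resp)
    _last_call = time.monotonic()
    return parse_first(results)
```

Because results are cacheable, `geocode` would normally sit behind the cache so each landmark is looked up only once.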
GET https://nominatim.openstreetmap.org/search?q=<landmark>&format=json&limit=1&countrycodes=in
India-native. 500k free calls/month all APIs, caching allowed, data stays in India. maps.olakrutrim.com 200. Verify exact paths in their API ref before wiring:
GET https://api.olamaps.io/places/v1/geocode?address=<q>&api_key=<KEY>
GET https://api.olamaps.io/routing/v1/distanceMatrix?...&api_key=<KEY> (revives dead get_travel_times)
Keep Open-Meteo for ranking ALL localities. Add CPCB official number when a station is near.
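One way to express "when a station is near" is a nearest-station lookup with a distance cut-off. The sketch below assumes station records with `name`, `lat`, `lon` and `aqi` keys and a 5 km cut-off; both the record shape and the cut-off are assumptions, not values taken from the CPCB feed. The modelled and official figures are kept in separate fields because they may not be on the same scale.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres on a sphere of radius 6371 km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlam = math.radians(lon2 - lon1)
    a = math.sin((p2 - p1) / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlam / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def aqi_for_locality(lat, lon, modelled_aqi, stations, max_km=5.0):
    """Always return the modelled value; attach the official one only if a station is close."""
    result = {"modelled_us_aqi": modelled_aqi, "cpcb_aqi": None, "cpcb_station": None}
    nearest = min(
        stations,
        key=lambda s: haversine_km(lat, lon, s["lat"], s["lon"]),
        default=None,
    )
    if nearest is not None:
        distance = haversine_km(lat, lon, nearest["lat"], nearest["lon"])
        if distance <= max_km:
            result["cpcb_aqi"] = nearest["aqi"]
            result["cpcb_station"] = nearest["name"]
    return result
```

Localities without a nearby station still get a modelled value, so the ranking across all localities stays complete.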
GET https://air-quality-api.open-meteo.com/v1/air-quality?latitude=&longitude=&current=us_aqi,pm2_5,pm10
GET https://api.data.gov.in/resource/<resource_id>?api-key=<KEY>&format=json (CPCB; 429=live)
Google has no rental data. A maintained Apify actor is cheaper-to-own than fighting housing.com anti-bot yourself. Parameterize one, cache by locality. Test one run first: output field names differ per actor.
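Parameterizing the actor, caching by locality and coping with per-actor field names might look like the sketch below. The run URL follows the Apify pattern quoted in this section; the cache-key scheme and every name in `FIELD_MAP` are hypothetical and must be replaced with the field names seen in a real test run.

```python
import hashlib
import json

APIFY_BASE = "https://api.apify.com/v2"

# Hypothetical candidates: the real names depend on the actor's output.
FIELD_MAP = {
    "rent": ("price", "rent", "monthlyRent"),
    "title": ("title", "name"),
}


def run_url(actor, token):
    """actor is written 'user/actor'; the API path uses 'user~actor'."""
    return f"{APIFY_BASE}/acts/{actor.replace('/', '~')}/runs?token={token}"


def cache_key(actor, locality):
    """Stable key so repeated requests for one locality reuse the cached run."""
    raw = json.dumps({"actor": actor, "locality": locality.strip().lower()}, sort_keys=True)
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()


def normalise_item(item, field_map=FIELD_MAP):
    """Map one raw dataset item onto the pipeline's own field names."""
    out = {}
    for target, candidates in field_map.items():
        # Take the first candidate key present in the item, else None.
        out[target] = next((item[c] for c in candidates if c in item), None)
    return out
```

Swapping actors then only means editing `FIELD_MAP` after inspecting one run's output, while the rest of the pipeline keeps reading `rent` and `title`.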
POST https://api.apify.com/v2/acts/<user~actor>/runs?token=<APIFY_TOKEN>
GET https://api.apify.com/v2/datasets/<id>/items (easyapi/housing-com-scraper · pick 1)
/ 07 — Verdict
What to swap, and why. Everything routes through the existing APICache — OSM, Ola and Open-Meteo all permit caching; Google and Mapbox restrict it.
| Pipeline need | Current | Recommended | Why |
|---|---|---|---|
| Geocode landmark | Mapbox v5 | Ola Maps Geocoding | 500k/mo free, India data, cacheable |
| Nearby metro / POI | Foursquare (discarded) | OSM Overpass | free, cacheable, real coords |
| Schools + board | Foursquare text (fabricated) | UDISE+ + CISCE/CBSE hybrid | only honest path to board claims |
| AQI ranking | Open-Meteo (keep) | + CPCB cross-check | free ranking + official accuracy |
| Rentals | broken Apify (1 URL) | Apify housing/99acres actor | cheap pay-per-event, cacheable |
| Commute (future) | dead get_travel_times | Ola Matrix / OSRM | free tier / self-host |
One India-native vendor — Ola Maps — could consolidate geocoding, POI and matrix under a 500k/month free tier. Simpler and cheaper than the Mapbox + Foursquare split it replaces.
/ 08 — Read next