PerimeterComps

PerimeterComps is a system that assesses property values. Its model combines long-running real estate trends at the neighborhood level with shorter-term trends in the larger market to provide a likely price range for any individual property in a covered area.

Design Criteria

The big problem was the combination of a multi-stage processing pipeline and an individual property target. The analysis was simply too complex to provide the required data as anything resembling an on-demand service operating on raw data.

Implementation

The data pipeline I came up with was based on a framework of a key-value store, data products, and primers. No data processing occurred unless a data product that required it had been requested. When the order-fulfillment code needed a data product, it requested that product and, if it was already defined, it was simply returned with no further work done. Data products were built from other data products, which in turn were built from operations on raw data and perhaps yet more processed data products.

In this way, I was able to split the processing into preprocessing work to be completed for the entire dataset and per-target-property work to be completed on demand.
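
The request-driven idea described above can be illustrated with a minimal sketch. This is not the original PerimeterComps code: the class ProductStore, its methods, the product names, and the sample figures are all hypothetical, the key-value store is assumed to be a plain in-memory dictionary, and primers are not modeled because their role is not described here.

```python
from typing import Callable, Dict, List, Sequence, Tuple


class ProductStore:
    """Hypothetical sketch: data products built on request and cached by key."""

    def __init__(self) -> None:
        # Stand-in for the key-value store (assumption: in-memory dict).
        self._values: Dict[str, object] = {}
        # Product name -> (names of products it depends on, build function).
        self._builders: Dict[str, Tuple[Tuple[str, ...], Callable[..., object]]] = {}
        # Order in which products were actually built, for inspection.
        self.build_log: List[str] = []

    def define(self, name: str, deps: Sequence[str], build: Callable[..., object]) -> None:
        """Register how to build a product; no processing happens here."""
        self._builders[name] = (tuple(deps), build)

    def get(self, name: str) -> object:
        """Return a product, building it and its dependencies only if missing."""
        if name in self._values:
            return self._values[name]  # already stored: no further work
        deps, build = self._builders[name]
        inputs = [self.get(dep) for dep in deps]  # recursive requests
        value = build(*inputs)
        self._values[name] = value
        self.build_log.append(name)
        return value


# Illustrative products only; the sale prices are made-up sample data.
store = ProductStore()
store.define("raw_sales", (), lambda: [310_000, 295_000, 330_000])
store.define(
    "neighborhood_median",
    ("raw_sales",),
    lambda sales: sorted(sales)[len(sales) // 2],
)
store.define(
    "price_range",
    ("neighborhood_median",),
    lambda median: (median * 95 // 100, median * 105 // 100),
)

price_range = store.get("price_range")  # builds all three products
price_range_again = store.get("price_range")  # served from the store
```

In this sketch nothing is computed when a product is defined. The first request for price_range pulls in neighborhood_median and raw_sales, and the second request returns the stored value without rebuilding anything. Requesting dataset-wide products ahead of time would correspond to the preprocessing half of the split, leaving only per-property products to be built on demand.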

Highlights

The most interesting things about this project were:

  1. This was my first automated deployment system and my first serious TDD effort. It was simply too big to manage any other way.
  2. The initial end-to-end run from raw data to finished product ran for 23 hours before producing the first report. This was ultimately reduced to just 2 minutes.