Analysis Engine

The Analysis Engine was a Priceonomics experiment in making its crawling infrastructure available as a service. The goal was to expose a RESTful API that could be sold as a subscription service.

Design Criteria

Priceonomics was already being paid to perform a considerable amount of web crawling, so the first design criterion was that the service had to work for every existing crawl task.

Beyond that, it only needed an authentication mechanism so that it could be offered by subscription.

Implementation

We (Fred was part of the team at Priceonomics) outfitted the Analysis Engine with a fully automated deployment system and good test coverage.

Highlights

There is something very cool about a single POST request causing fifteen servers to spring into action to fulfill it. This was my first foray into autoscaling server groups and it was awesome.

From an algorithmic perspective, the highlights are:

  • easy to chain tasks into sequences and distributed pipelines
  • composable plugins to keep things flexible
  • a "pipeline" plugin to enable massively parallel execution at the direction of a URL
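
To make the three points above concrete, here is a minimal sketch of how chaining, composable plugins, and parallel fan-out can fit together. It is not the Analysis Engine's actual code: the names chain, fan_out, list_pages, fetch, and measure are all hypothetical, the plugins are stand-ins that never touch the network, and a local thread pool stands in for what was a group of servers.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from typing import Callable, List

# Assumption: a "plugin" is any callable that takes one value and returns one.
Plugin = Callable[[object], object]


def chain(*plugins: Plugin) -> Plugin:
    """Compose plugins left to right into a single plugin."""
    def run(value):
        return reduce(lambda acc, plugin: plugin(acc), plugins, value)
    return run


def fan_out(plugin: Plugin, max_workers: int = 4) -> Plugin:
    """Wrap a plugin so it is applied to every item of a list in parallel."""
    def run(items):
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            # pool.map returns results in the same order as the inputs.
            return list(pool.map(plugin, items))
    return run


# Hypothetical plugins standing in for real crawl steps.
def list_pages(seed_url: str) -> List[str]:
    return [f"{seed_url}/page/{n}" for n in range(1, 4)]


def fetch(url: str) -> dict:
    # A real plugin would issue an HTTP request here; this one fakes a body.
    return {"url": url, "body": f"<html>{url}</html>"}


def measure(page: dict) -> dict:
    return {"url": page["url"], "length": len(page["body"])}


# One seed URL drives the whole run: expand it into pages, then process
# each page through its own fetch -> measure chain in parallel.
crawl = chain(list_pages, fan_out(chain(fetch, measure)))
results = crawl("http://example.com")
```

Because every plugin has the same one-in, one-out shape, a chain is itself a plugin and can be nested inside fan_out or another chain, which is what keeps the pieces composable.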