Coupon Site Analyzer

This automated crawler queried major search engines for a list of phrases and determined how popular specific coupon sites were based on the results of the queries.

Design Criteria

The task was simple enough, but the search engines failed quite often, so a fair amount of retry logic was needed for the crawler to work as requested.
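The retry logic might have looked something like the sketch below. It is a minimal illustration, not the original code: the function name with_retries, the attempt count, and the exponential backoff with jitter are all assumptions.

```python
import random
import time


def with_retries(fetch, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it succeeds or the attempts run out.

    fetch is any zero-argument callable that raises on failure
    (hypothetical here: one search engine query).
    """
    for attempt in range(attempts):
        try:
            return fetch()
        except Exception:
            # Out of attempts: let the last error propagate to the caller.
            if attempt == attempts - 1:
                raise
            # Wait longer after each failure, with a little random jitter
            # so parallel workers do not retry in lockstep.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)
```

Passing sleep as a parameter keeps the sketch easy to exercise without real waiting.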

The phrase list was maintained as a spreadsheet, so the crawler's behavior was driven by the spreadsheet's contents.

The output of this crawler was a spreadsheet with a phrase in the first column and some analytics in the columns that followed.
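As a rough sketch of how one output row could be assembled, the code below counts how often each coupon site appears among the result URLs for a phrase. The site names, the helper names site_of and build_row, and the choice of a simple appearance count as the analytic are all assumptions; the original analytics columns are not described here.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical coupon sites to track; the real list is not given.
COUPON_SITES = ["coupons.example", "deals.example"]


def site_of(url):
    """Return the host of a result URL without a leading 'www.'."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def build_row(phrase, result_urls, sites=COUPON_SITES):
    """Build one row: the phrase, then one count per tracked site."""
    counts = Counter(site_of(u) for u in result_urls)
    return [phrase] + [counts[s] for s in sites]
```

Each row returned by build_row maps directly onto one spreadsheet row, with the phrase in the first column.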

Implementation

This was a self-contained crawler written in Python. It drove Selenium or PhantomJS through a small army of proxy servers to query the search engines.
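One way to spread queries across the proxies is simple round-robin assignment, sketched below. The proxy addresses and function names are hypothetical, and Chrome stands in for the browser because current Selenium releases no longer ship a PhantomJS driver; the original setup may well have differed.

```python
from itertools import cycle

# Hypothetical proxy addresses; the real pool is not described.
PROXIES = ["10.0.0.1:8080", "10.0.0.2:8080"]


def proxy_argument(proxy):
    """Build the Chrome command-line switch that routes traffic via proxy."""
    return f"--proxy-server={proxy}"


def make_driver(proxy):
    """Start a browser session that sends its traffic through proxy."""
    # Imported here so the rest of the sketch runs without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument(proxy_argument(proxy))
    return webdriver.Chrome(options=options)


def assign_proxies(phrases, proxies=PROXIES):
    """Pair each phrase with a proxy, cycling through the pool in order."""
    return list(zip(phrases, cycle(proxies)))
```

Round-robin keeps the load on each proxy even; a failed query could be retried on the next proxy in the cycle.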

The spreadsheet code was built around Python's xlwt package, which writes .xls files; reading the input list requires a companion reader such as xlrd.

Highlights

The most interesting things about this project were:

  1. This crawler produced some pretty neat insights into how a publicly traded company was affected by a change in a Google algorithm.