Side, Side Projects – Web Scraping For Deals

Screenshot of App
While learning WordPress development, I came across an article on web scraping. I had first learned about it from Adnan Kukic’s article on Scraping the Web With Node.js. It got me interested in the idea and the possibilities. I for one would love to use it as a deals watcher on multiple sites (I’m always looking out for the opportunity to grab a new case for my iPhone and review it on Excessorize Me).


Adnan’s article sparked my interest and I tried it out myself, but it was a bit too advanced for me as a starting point for learning web scraping. I couldn’t fully grasp everything he was describing, so I left the idea on the back burner.

It wasn’t until today that I picked the idea back up, when I read an article from Artiom Dashinsky, who did an analysis of Designer News to better understand the community (what a UX thing to do). He used a platform to build crawlers that comb through the Designer News site; it can basically “turn any website into a table of data or a structured API.” That actually sounds very neat and surprisingly useful. The platform has its own YouTube channel that offers tutorials on how to use it, so let’s give it a shot.


I’m not going to go into detail about how the platform works, since they’ve got some great videos and the application guides you thoroughly through the process. Using it, I gathered the following elements from The Source’s product pages that I thought would be useful:

The Source Product Page

  • A – Web ID
  • B – Product name
  • C – Original price
  • D – Savings
  • E – Final price
  • F – Icon image for if it’s available online
  • G – Icon image for if it’s available in-store
  • H – Image link

They have different layouts for different types of sales. I found the following five, which differ in how the prices are listed:


The Source Product Page - Original

Sale Item

The Source Product Page - Sale


The Source Product Page - Refurbished

Special Sale Items

The Source Product Page - Special Sale The Source Product Page - Special Sale

So I trained the crawler on the different layouts (except the last one, the Canon EOS Rebel T5 example, because there was no way to differentiate between a refurbished product and a sale product listed in that format).

The crawl takes a while, and it gathered about 2,700 listings at the time, nowhere near the full list of products. I exported the results as a JSON file, but I wasn’t sure how the JSON was structured. The file was so large that it wouldn’t even open for me, yet I needed to know how it was organized to pull the right data from it. I initially tried logging the output to the console in Chrome DevTools, but with my limited knowledge at the time I couldn’t make sense of what it was showing me.
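One way to get a feel for a file that is too large to read comfortably is to summarize its shape instead of printing all of it. The sketch below is only an illustration: describeShape is a hypothetical helper, not something from the project, and the sample record is made up, with field names borrowed from the ones used later in this post.

```javascript
// Hypothetical helper: summarize the shape of a parsed JSON value
// without printing its full contents.
function describeShape(value, depth) {
  if (depth === undefined) { depth = 2; }
  if (Array.isArray(value)) {
    if (value.length === 0 || depth === 0) { return "array(" + value.length + ")"; }
    // Only the first element is described, assuming all listings share one shape.
    return "array(" + value.length + ") of " + describeShape(value[0], depth - 1);
  }
  if (value !== null && typeof value === "object") {
    if (depth === 0) { return "object"; }
    var parts = Object.keys(value).map(function (key) {
      return key + ": " + describeShape(value[key], depth - 1);
    });
    return "{ " + parts.join(", ") + " }";
  }
  return value === null ? "null" : typeof value;
}

// Made-up sample record; the real export may be nested differently.
var sample = [
  { title: "Example case", price_orig: 20, price_savings: 5, price_final: 15 }
];
console.log(describeShape(sample));
```

In the browser, the same call could go inside a getJSON callback, for example console.log(describeShape(data)), to see the top-level keys before trying to read any values.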

So after a few frustrating hours of failing to pull the right values from each listing, I decided to recrawl the site for a much smaller set of data. I ended up with about 20 listings, which let me open the file, view the structure, and finally output the correct values. I used getJSON to load the data and then generated my HTML output:

var total = 0;
var the_source_listings = [];

$.getJSON("the_source_deals_20140601.json", function(data){
  // Assumption: the listings are the top-level collection in the JSON file.
  // The original argument was lost here, so adjust this if they are nested under a key.
  $.each(data, function(key, val){
    total += 1;

    // Discount as a percentage of the original price
    var percentage_off = (val.price_savings/val.price_orig)*100;
    var percentage_class;

    // Bucket the discount so rows can be colour coded and filtered
    if(percentage_off < 26){ percentage_class = "percentage_25"; }
    else if(percentage_off < 51){ percentage_class = "percentage_50"; }
    else if(percentage_off < 76){ percentage_class = "percentage_75"; }
    else { percentage_class = "percentage_100"; }

    if(!isNaN(percentage_off) && val.price_orig > 0){
      the_source_listings.push("<tr class='the_source_sale_item " + percentage_class + "'><td><img src='" + val.product_img + "' alt='" + val.title + "' height='50px' width='50px'></td><td><a href='" + val._pageUrl + "' target='_blank'>" + val.title + "</a></td><td>$" + val.price_orig + "</td><td>$" + val.price_savings + "</td><td>" + Number(percentage_off).toFixed(2) + "%</td><td>$" + val.price_final + "</td></tr>");
    } else if(val.savings == undefined && val.price_orig == 0) {
      the_source_listings.push("<tr class='the_source_refurbished_item '><td><img src='" + val.product_img + "' alt='" + val.title + "' height='50px' width='50px'></td><td><a href='" + val._pageUrl + "' target='_blank'>" + val.title + "</a></td><td>Refurbished</td><td>Refurbished</td><td>Refurbished</td><td>$" + val.price_final + "</td></tr>");
    }
  });

  // Assumed final step: write the rows into the table body and show the count
  $("#the_source_table").html(the_source_listings.join(""));
  $(".total").text(total);
});
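The percentage and bucketing logic in the snippet above can also be pulled out into small functions, which makes it easier to check on its own. This is a minimal sketch, not the project's code: percentageOff, discountClass and escapeHtml are hypothetical names. The escaping helper is an addition of mine, on the assumption that product titles may contain quotes or angle brackets that would break the generated HTML.

```javascript
// Hypothetical helpers; the names are illustrative, not from the original project.

// Percentage saved relative to the original price, or null when it cannot be computed.
function percentageOff(priceOrig, priceSavings) {
  var orig = Number(priceOrig);
  var savings = Number(priceSavings);
  if (!(orig > 0) || isNaN(savings)) { return null; }
  return (savings / orig) * 100;
}

// Map a percentage to the CSS class names used in the snippet above.
function discountClass(pct) {
  if (pct < 26) { return "percentage_25"; }
  if (pct < 51) { return "percentage_50"; }
  if (pct < 76) { return "percentage_75"; }
  return "percentage_100";
}

// Escape text before placing it into HTML attributes or table cells.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

console.log(discountClass(percentageOff(20, 5)));
console.log(escapeHtml("7\" Tablet <Black>"));
```

With these in place, a row could be built from escapeHtml(val.title) instead of the raw title, and a null result from percentageOff could be treated as "no valid discount".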



I took the opportunity to learn more about Bootstrap, so I implemented that as my template. Using the fluid grid system, I assigned 2 columns for the stacked filtering buttons and 10 for the table of data. Simple and to the point, for now:

<div class="container-fluid">
  <div class="row">
    <nav id="filter" class="col-md-2">
      <ul class="nav nav-pills nav-stacked">
        <li id="nav_all"><a href="#">ALL <span class="total"></span> DEALZ</a></li>
        <li id="nav_25" class="active"><a href="#">0% - 25%</a></li>
        <li id="nav_50" class="active"><a href="#">26% - 50%</a></li>
        <li id="nav_75" class="active"><a href="#">51% - 75%</a></li>
        <li id="nav_100" class="active"><a href="#">76% - 100%</a></li>
        <li id="nav_refurbished" class="active"><a href="#">REFURBZ</a></li>
      </ul>
    </nav>
    <div class="col-md-10">
      <table class="table">
        <thead>
          <tr>
            <td id="col_image">Image</td>
            <td id="col_product">Product</td>
            <td id="col_price_orig">Original Price</td>
            <td id="col_price_savings">Savings</td>
            <td id="col_percentage_off">Percentage</td>
            <td id="col_price_final">Final Price</td>
          </tr>
        </thead>
        <tbody id="the_source_table"></tbody>
      </table>
    </div>
  </div>
</div>


I didn’t style the page much. Most of the styling comes from the Bootstrap theme, and anything beyond that wasn’t a priority at such an early stage of development.


I wanted to be able to filter by discount, and I colour-coded the rows to give precedence to the higher discounts. Using jQuery, I did some simple toggling so the buttons show when they are active. I also used jQuery to show and hide the rows that the filter buttons toggle on and off, but with a larger dataset I started to see significant lag. I’ll have to figure out another way to do the filtering. An example of how each button works:

  } else{
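The lag mentioned above may come from showing and hiding thousands of rows one by one in the DOM, though that is a guess rather than something measured. One alternative is to keep the listings in memory, filter the array, and rebuild the table body once per click. The sketch below illustrates that idea only: toggleFilter, visibleListings and the bucket field are hypothetical, and the sample listings are made up.

```javascript
// Hypothetical sketch: filter listings in memory, then render once.
// Each listing is assumed to carry a precomputed `bucket` such as
// "percentage_25" or "refurbished"; that field is not in the original data.

// Return a copy of the active-filter map with one bucket flipped on or off.
function toggleFilter(activeBuckets, bucket) {
  var next = {};
  Object.keys(activeBuckets).forEach(function (key) {
    next[key] = activeBuckets[key];
  });
  next[bucket] = !next[bucket];
  return next;
}

// Keep only the listings whose bucket is currently switched on.
function visibleListings(listings, activeBuckets) {
  return listings.filter(function (item) {
    return activeBuckets[item.bucket] === true;
  });
}

// Made-up sample data for illustration.
var listings = [
  { title: "Case A", bucket: "percentage_25" },
  { title: "Case B", bucket: "percentage_75" },
  { title: "Case C", bucket: "refurbished" }
];
var active = { percentage_25: true, percentage_75: true, refurbished: true };

// Switch the 51% - 75% filter off and list what remains visible.
active = toggleFilter(active, "percentage_75");
console.log(visibleListings(listings, active).map(function (item) {
  return item.title;
}));
```

On the page, the filtered array could then be turned into row strings and written with a single call that sets the HTML of #the_source_table, instead of touching each row. Whether that actually removes the lag would need to be measured with the full dataset.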


In future iterations, I’m hoping for a few things:

  • A nav bar on top of the table for different retailers, like a spreadsheet (could also be implemented in the side bar)
  • A search function for particular products
  • UI improvements
  • Automate the data generation (already in the works as I’ve discussed with a friend on using Node.js as a back-end solution)
  • Update filtering process (also discussed to possibly use Angular.js)

My current file directory for the project is very simple:

You can check out the current version here:
