The Source’s Source Code

[Screenshot of the app]
So after three hours of figuring out whether I could simplify the process by just accepting links to products whose prices I wanted to watch (like a wish-list feature), I ran into several problems. I still wanted to keep it simple and reflect that in the UI, though (minus any styling).

ONE STEP FORWARD, FOUR STEPS BACK

The initial setup is simple: input a link, figure out which style of page it is (as outlined in my previous post), and then gather the relevant data. I went through the HTML of each page type to find the unique IDs I could grab content from, and broke it down to this:

Product title: #tbldepth > tbody > tr > td > last span
Product image url: #product_productimage
Availability online icon url: #product_lblonline prev img
Availability in-store icon url: #product_lblinstore prev img
Original
     container, div, #product_productPricing_pnlReg
     you pay price, span, #product_productPricing_price

Refurbished 1
     container, div, #product_productPricing_pnlDiscount
     special price, span, #product_productPricing_endprice

Refurbished 2
     container, div, #product_productPricing_pnlOriginal
     original price, span, #product_productPricing_lblOriginalAmt
     container, div, #product_productPricing_pnlSale
     regular price, span, #product_productPricing_regamt
     container, div, #product_productPricing_pnlDiscount
     save off original, span, #product_productPricing_saveamt
     you pay price, span, #product_productPricing_endprice

Saving
     container, div, #product_productPricing_pnlSale
     regular price, span, #product_productPricing_regamt
     you save, span, #product_productPricing_saveamt
     container, div, #product_productPricing_pnlDiscount
     you pay price, span, #product_productPricing_endprice

Special 1
     container, div, #product_productPricing_pnlDiscount
     special price, span, #product_productPricing_endprice

Special 2
     container, div, #product_productPricing_pnlOriginal
     original price, span, #product_productPricing_lblOriginalAmt
     container, div, #product_productPricing_pnlSale
     regular price, span, #product_productPricing_regamt
     save off original, span, #product_productPricing_saveamt
     container, div, #product_productPricing_pnlDiscount
     you pay price, span, #product_productPricing_endprice
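The tables above boil down to a lookup keyed by page style. A minimal sketch of how I'd represent a couple of them in code (the key names like `saving` and `youPay` are my own labels, not anything from the site's markup; only a subset of the styles is shown):

```javascript
// Selector tables from the notes above, as a plain lookup object.
// Keys ("original", "saving", etc.) are my own labels for the page styles.
const pricingSelectors = {
  original: {
    container: '#product_productPricing_pnlReg',
    youPay: '#product_productPricing_price',
  },
  refurbished1: {
    container: '#product_productPricing_pnlDiscount',
    specialPrice: '#product_productPricing_endprice',
  },
  saving: {
    saleContainer: '#product_productPricing_pnlSale',
    regularPrice: '#product_productPricing_regamt',
    youSave: '#product_productPricing_saveamt',
    discountContainer: '#product_productPricing_pnlDiscount',
    youPay: '#product_productPricing_endprice',
  },
};

// Guess the page style by checking which pricing container exists.
// `$` is assumed to be a cheerio (or jQuery) root loaded with the page HTML.
function detectPageStyle($) {
  if ($('#product_productPricing_pnlReg').length) return 'original';
  if ($('#product_productPricing_pnlSale').length) return 'saving';
  return 'refurbished1';
}
```

With the style detected, extraction is just `$(pricingSelectors[style].youPay).text()` and friends.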

Using these identifiers, I was hoping it'd be simple to extract the content, store it in a JSON file, and have the page update nightly (automation was planned for after I got the data extracted and stored correctly, probably using Node). Unfortunately, I ran into several issues:

  • Using <input type="url"> ensured that whatever was entered was a URL, but I also wanted to check that the URL was from The Source. I tried solutions using regular expressions and jQuery's find() and search(), with no success, so I gave up on that for now. Since this is a tool I'll be using myself, I'll just be careful to enter the URL correctly, and add proper validation once I've figured out the application's other functions.
  • Moving on to grabbing values from the entered URL, I used jQuery's load() function to load the page, only to find out that the browser blocks cross-domain requests (the same-origin policy, explained thoroughly and simply on Stack Overflow). I thought this route would work because in Adnan's tutorial, where he uses Node.js to build a web scraper, "var $ = cheerio.load(html);" works; that code, though, runs server-side in Node, where the same-origin policy doesn't apply.
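Both problems point the same way: do the work server-side in Node, where there's no same-origin policy and validation is a plain function. A minimal sketch of the URL check, using Node's built-in URL class (the hostnames are my assumption; adjust to the real domain):

```javascript
// Server-side check that an entered link actually points at The Source
// before we try to scrape it. Hostnames here are assumptions on my part.
function isSourceUrl(input) {
  try {
    const { hostname, protocol } = new URL(input);
    if (protocol !== 'http:' && protocol !== 'https:') return false;
    return hostname === 'www.thesource.ca' || hostname === 'thesource.ca';
  } catch (err) {
    return false; // not a parsable URL at all
  }
}

// After validating, fetch the page in Node (no same-origin policy there)
// and hand the HTML to cheerio, as in Adnan's tutorial:
//   const $ = cheerio.load(html);
```

The try/catch is the whole trick: `new URL()` throws on anything that isn't a well-formed URL, so there's no regex to maintain.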

WELL THEN…

Moving forward, I've decided to scrap the jQuery-only approach and pursue building this into a full-fledged MEAN stack application (not sure if those words are combined correctly). I'm thinking that in the long run it'll let me work with a lot more data, thanks to MongoDB.

My starting point will be Adnan's tutorial; from there I'll move on to Chris's tutorial to get a complete front-end solution as well.
