So after three hours of trying to figure out whether I could simplify the process by just accepting links whose prices I wanted to watch (like a wish-list feature), I ran into several problems. I still wanted to keep it simple and reflect that in the UI, though (minus any styling).
ONE STEP FORWARD, FOUR STEPS BACK
The initial setup is simple: input a link, figure out which style of page it is (as outlined in my previous post), and then gather the relevant data. I went through the HTML of each type of page to find the unique IDs I could grab the content from, and broke it down to this:
Common to all pages:
- Product title: `#tbldepth > tbody > tr > td` > last span
- Product image URL: `#product_productimage`
- Availability online icon URL: `#product_lblonline` prev img
- Availability in-store icon URL: `#product_lblinstore` prev img

Original
- container, div, `#product_productPricing_pnlReg`
  - you pay price, span, `#product_productPricing_price`

Refurbished 1
- container, div, `#product_productPricing_pnlDiscount`
  - special price, span, `#product_productPricing_endprice`

Refurbished 2
- container, div, `#product_productPricing_pnlOriginal`
  - original price, span, `#product_productPricing_lblOriginalAmt`
- container, div, `#product_productPricing_pnlSale`
  - regular price, span, `#product_productPricing_regamt`
- container, div, `#product_productPricing_pnlDiscount`
  - save off original, span, `#product_productPricing_saveamt`
  - you pay price, span, `#product_productPricing_endprice`

Saving
- container, div, `#product_productPricing_pnlSale`
  - regular price, span, `#product_productPricing_regamt`
  - you save, span, `#product_productPricing_saveamt`
- container, div, `#product_productPricing_pnlDiscount`
  - you pay price, span, `#product_productPricing_endprice`

Special 1
- container, div, `#product_productPricing_pnlDiscount`
  - special price, span, `#product_productPricing_endprice`

Special 2
- container, div, `#product_productPricing_pnlOriginal`
  - original price, span, `#product_productPricing_lblOriginalAmt`
- container, div, `#product_productPricing_pnlSale`
  - regular price, span, `#product_productPricing_regamt`
  - save off original, span, `#product_productPricing_saveamt`
- container, div, `#product_productPricing_pnlDiscount`
  - you pay price, div, `#product_productPricing_endprice`
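Laid out as data, that breakdown becomes a single lookup table instead of a chain of if/else blocks. Here's a sketch of how I'd encode it; the key names and the descendant-selector combinations are my own, and since IDs are unique per page, scoping the inner ID under its container is mostly for readability. (The availability icons need a `.prev()` traversal rather than a plain selector, so they're left out of the map.)

```javascript
// Selector map for each pricing layout on The Source's product pages.
// 'span:last-of-type' is my reading of "last span" in the table above.
const SELECTORS = {
  common: {
    title: '#tbldepth > tbody > tr > td span:last-of-type',
    image: '#product_productimage',
  },
  layouts: {
    original: {
      youPay: '#product_productPricing_pnlReg #product_productPricing_price',
    },
    refurbished1: {
      specialPrice: '#product_productPricing_pnlDiscount #product_productPricing_endprice',
    },
    refurbished2: {
      originalPrice: '#product_productPricing_pnlOriginal #product_productPricing_lblOriginalAmt',
      regularPrice: '#product_productPricing_pnlSale #product_productPricing_regamt',
      saveOffOriginal: '#product_productPricing_pnlDiscount #product_productPricing_saveamt',
      youPay: '#product_productPricing_pnlDiscount #product_productPricing_endprice',
    },
    saving: {
      regularPrice: '#product_productPricing_pnlSale #product_productPricing_regamt',
      youSave: '#product_productPricing_pnlSale #product_productPricing_saveamt',
      youPay: '#product_productPricing_pnlDiscount #product_productPricing_endprice',
    },
    special1: {
      specialPrice: '#product_productPricing_pnlDiscount #product_productPricing_endprice',
    },
    special2: {
      originalPrice: '#product_productPricing_pnlOriginal #product_productPricing_lblOriginalAmt',
      regularPrice: '#product_productPricing_pnlSale #product_productPricing_regamt',
      saveOffOriginal: '#product_productPricing_pnlSale #product_productPricing_saveamt',
      youPay: '#product_productPricing_pnlDiscount #product_productPricing_endprice',
    },
  },
};
```

With this in place, the scraper only needs to detect which layout it's looking at and then loop over the matching entry.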
Using these identifiers, I was hoping it would be simple to extract the content, store it in a JSON file, and have the page update the content nightly (automation was planned for after I got the data extracted and stored correctly, probably using Node). Unfortunately, I ran into several issues:
- Using `<input type="url">` ensured that whatever was entered was a URL, but I also wanted to check that the URL was from The Source. I tried solutions using regular expressions and jQuery's find() and search(), with no success, so I gave up on that for now. Since this is a tool I'll be using myself, I'll just accept whatever is entered and be careful to input the URL correctly. I decided to add validation after figuring out the application's other functions first.
- Moving on to grabbing values from the URL that was entered, I used jQuery's load() function to load the page, only to find out that I can't make cross-domain requests (explained thoroughly and simply on Stack Overflow). I thought this route would work because in Adnan's tutorial, where he uses Node.js to build a web scraper, `var $ = cheerio.load(html);` works. But that works because his code runs on the server, where the browser's same-origin policy doesn't apply.
Moving forward, I've decided to scrap the jQuery-only approach and pursue building this into a full-fledged MEAN stack application. I'm thinking that in the long run it'll allow me to work with a lot more data, thanks to MongoDB.