web scraping best buy
Web scraping is the extraction of data from a web page by a computer program. Most languages have libraries for it: Guzzle is an HTTP client for PHP, and in Python the requests and BeautifulSoup libraries are commonly paired, for example to fetch and scrape the content of IMDB's homepage. BeautifulSoup is a Python library for pulling data out of HTML and XML files; it provides methods to navigate the document's tree structure and extract its content. Command-line tools such as wget and curl can also fetch pages, although they do not parse them. The Best Buy script in this gist is written in Ruby and uses Nokogiri for the same purpose.
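The tree navigation described above can be sketched in Ruby without installing any gems. This is a minimal illustration, not the gist's method: it uses REXML (shipped with Ruby) in place of BeautifulSoup or Nokogiri, and the sample markup is invented and assumed to be well-formed XML. Real pages rarely are, which is why a forgiving HTML parser such as Nokogiri is the usual choice.

```ruby
require 'rexml/document'

# Invented, well-formed sample markup (an assumption, not a real product page).
SAMPLE = <<~'HTML'
  <html>
    <head><meta property="og:title" content="Example Controller"/></head>
    <body>
      <div class="item-price">$49.99</div>
      <span class="average-score">4.6</span>
    </body>
  </html>
HTML

doc = REXML::Document.new(SAMPLE)

# Each XPath query walks the parsed tree to one element.
# The title lives in an attribute, the price and rating in element text.
title  = REXML::XPath.first(doc, "//meta[@property='og:title']").attributes['content']
price  = REXML::XPath.first(doc, "//div[@class='item-price']").text
# Safe navigation covers pages where the rating element is missing.
rating = REXML::XPath.first(doc, "//span[@class='average-score']")&.text || 'No ratings yet'

puts "#{title}: #{price} (#{rating})"
```

The same three lookups map onto CSS selectors in Nokogiri; only the parser and the query syntax change.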
require 'nokogiri'
require 'open-uri'
require 'httparty'
REQUEST_URL = 'https://www.googleapis.com/urlshortener/v1/url?key=AIzaSyAUZup_oRJzsR7Ze2zcDJ6Sq-6wRX2wRoE'
url = [
  'http://www.bestbuy.com/site/microsoft-xbox-one-wireless-controller-black/7948025.p?id=1219687244063&skuId=7948025',
  'http://www.bestbuy.com/site/apple-iphone-6s-64gb-space-gray-verizon-wireless/4447801.p?id=bb4447801&skuId=4447801',
  'http://www.bestbuy.com/site/samsung-galaxy-s7-32gb-black-onyx-at-t/4897502.p?id=bb4897502&skuId=4897502',
  'http://www.bestbuy.com/site/nikon-d3300-dslr-camera-with-18-55mm-and-55-200mm-vr-ii-lenses-black/4437132.p?id=1219627834758&skuId=4437132',
  'http://www.bestbuy.com/site/insignia-40-class-40-diag--led-1080p-smart-hdtv-roku-tv-black/4204502.p?id=1219711477972&skuId=4204502',
  'http://www.bestbuy.com/site/lenovo-yoga-3-pro-2-in-1-13-3-touch-screen-laptop-intel-core-m-8gb-memory-512gb-solid-state-drive-platinum-silver/9644004.p?id=1219705744555&skuId=9644004',
  'http://www.bestbuy.com/site/canon-pixma-mx922-network-ready-wireless-all-in-one-printer-black/7919046.p?id=1218862932553&skuId=7919046',
  'http://www.bestbuy.com/site/garmin-nuvi-55lm-5-gps-with-lifetime-map-updates-black/3979874.p?id=1219094936231&skuId=3979874',
  'http://www.bestbuy.com/site/apple-ipad-pro-with-wi-fi-128gb-gold/4262700.p?id=1219747522322&skuId=4262700',
  'http://www.bestbuy.com/site/google-chromecast-2015-model-black/4397400.p?id=1219757973565&skuId=4397400',
  'http://www.bestbuy.com/site/beats-by-dr-dre-solo2-wireless-headphones-active-collection-red/4580000.p?id=1219775846388&skuId=4580000',
  'http://www.bestbuy.com/site/protocol-videodrone-4-channel-remote-controlled-video-quad-copter-chrome-black/7981011.p?id=1219691059881&skuId=7981011',
  'http://www.bestbuy.com/site/braven-850-wireless-bluetooth-speaker-silver/8229894.p?id=1219320179706&skuId=8229894',
  'http://www.bestbuy.com/site/samsung-galaxy-tab-4-7-8gb-black/5420045.p?id=1219127073673&skuId=5420045',
  'http://www.bestbuy.com/site/fitbit-surge-fitness-watch-large-black/8681597.p?id=1219357518160&skuId=8681597'
]
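# Start the report with a header line, overwriting any previous output.
File.write('output.txt', "Web Scraping \n\n")
File.open('output.txt', 'a') do |f|
  url.each do |u|
    # open-uri provides URI.open; a bare open(u) no longer fetches URLs on Ruby 3+.
    html = Nokogiri::HTML(URI.open(u))
    title = html.at_css("meta[property='og:title']")['content']
    price = html.css('div.item-price')[0].text
    # Google retired the goo.gl URL Shortener API in 2019, so this request is unlikely to succeed today.
    short_url = HTTParty.post(REQUEST_URL, body: { longUrl: u }.to_json, headers: { 'Content-Type' => 'application/json' })['id']
    # Fall back to a placeholder when the page has no rating element.
    rating = html.at_css('span.average-score')&.text || 'No ratings yet'
    f.puts "Title: #{title} \n"
    f.puts "Price: #{price}\n"
    f.puts "Rating: #{rating} out of 5 stars \n"
    f.puts ':::REVIEWS:::'
    # Print at most five reviews, and only as many as have both an author and a description.
    authors = html.css("span[itemprop='author']")
    reviews = html.css("span[itemprop='description']")
    count = [authors.length, reviews.length, 5].min
    f.puts 'No Reviews Yet' if count.zero?
    count.times do |i|
      f.puts "Review No.#{i + 1}"
      f.puts "Reviewer: #{authors[i].text}"
      f.puts "Description:\n\t #{reviews[i].text}"
    end
    f.puts "\nShort URL: #{short_url}"
    f.puts "\n\n"
    # The original separator string was lost in extraction; this line of dashes is an assumption.
    f.puts '-' * 40
    f.puts "\n\n"
  end
end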
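The script above mixes fetching, parsing, and report writing in one loop, so it cannot be checked without hitting the live site. One possible refactoring, sketched here under stated assumptions, moves the report layout into a pure function. The name `format_product` and its keyword arguments are hypothetical and not part of the gist, and the sample data is made up. Unlike the script, this sketch prints "No ratings yet" without the "out of 5 stars" suffix.

```ruby
# Hypothetical helper: turns already-extracted fields into the text block
# that the script appends to output.txt. It performs no network or file I/O.
def format_product(title:, price:, rating: nil, reviews: [])
  lines = []
  lines << "Title: #{title}"
  lines << "Price: #{price}"
  lines << (rating ? "Rating: #{rating} out of 5 stars" : 'Rating: No ratings yet')
  lines << ':::REVIEWS:::'
  if reviews.empty?
    lines << 'No Reviews Yet'
  else
    # Each review is an [author, text] pair; keep at most five, as the script does.
    reviews.first(5).each_with_index do |(author, text), i|
      lines << "Review No.#{i + 1}"
      lines << "Reviewer: #{author}"
      lines << "Description:\n\t #{text}"
    end
  end
  lines.join("\n") + "\n"
end

# Made-up sample data, not real Best Buy content.
report = format_product(
  title: 'Example Controller',
  price: '$49.99',
  rating: '4.6',
  reviews: [['alice', 'Works well.']]
)
puts report
```

With a helper like this, the scraping loop would only need to collect the fields and call `f.puts format_product(...)`, and the layout could be tested with fixed strings.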