Consider the case of an e-commerce website run by a small startup of 15 people. Their starting budget was around $14,000. Even so, they managed to double their revenue and compete with retail giants.
Since we cannot disclose the name, let’s call this company “Shopmania”. They compare prices and help their clients find the cheapest option. But how do they find it themselves? Through scraping, of course. Let’s see exactly how they find trending items and anything else that’s discounted on Amazon!
Attempts with Parsehub and an Open Source Project
An employee at “Shopmania” first tried a well-known scraping tool, Parsehub. However, it was a minefield: Parsehub was expensive, it did not handle complex logic, and its output required a lot of editing. There had to be a better way.
What about reworking an open source project? It made sense – it was fast and free. The downside was that it required coding and scraping experience, and that not all open source solutions were properly maintained.
Let’s see how he did it.
The go-to method: Python and proxies
Step one. The specialist needed the trifecta: Python, Selenium, and proxies. Selenium is a browser automation tool with Python bindings and is widely used by programmers. When it comes to proxies, it’s up to you: Amazon is not too careful, so you can try datacenter proxies. On the other hand, residential proxies are more reliable, and that’s why they were “Shopmania’s” preferred choice. Here’s the magic formula: Regular residential proxy plan + Python + Selenium.
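As a rough illustration of how a proxy can be wired into Chrome, here is a minimal sketch that builds Chrome's `--proxy-server` command-line flag. The helper name `chrome_proxy_args` and the endpoint `proxy.example.com:8000` are placeholders invented for this example; they are not “Shopmania’s” actual setup, which relied on a separate helper module.

```python
def chrome_proxy_args(host, port, scheme="http"):
    """Build the Chrome command-line arguments that route traffic through a proxy.

    Hypothetical helper for illustration; host and port come from your proxy plan.
    """
    if not host:
        raise ValueError("host must not be empty")
    if not 0 < int(port) < 65536:
        raise ValueError("port must be between 1 and 65535")
    # --proxy-server is a standard Chromium switch
    return [f"--proxy-server={scheme}://{host}:{port}"]


# Placeholder endpoint (assumption, not a real proxy address)
proxy_args = chrome_proxy_args("proxy.example.com", 8000)
print(proxy_args)
```

Each returned string can be passed to `chrome_options.add_argument(...)` in the same way as the other Chrome flags. Note that this flag carries no username or password, so it is best suited to endpoints that authenticate by whitelisted IP.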
Step two. Here is the script that “Shopmania” used to scrape Amazon (in its final form):
import json
import re

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By
from selenium_python import smartproxy  # helper module that returns the proxy capabilities

# Matches prices such as $59.99 (the dot is escaped so it only matches a literal ".")
PRICE_REGEX = re.compile(r'\$\d{1,10}\.\d{1,2}')


def scraper():
    chrome_options = Options()
    chrome_options.add_argument('--disable-gpu')
    chrome_options.add_argument('--no-sandbox')
    chrome_options.add_argument('--disable-dev-shm-usage')
    desired_capabilities = smartproxy()
    # Selenium 3 style constructor; recent Selenium 4 releases no longer
    # accept desired_capabilities here
    driver = webdriver.Chrome(
        'selenium-scraper/chromedriver',
        options=chrome_options,
        desired_capabilities=desired_capabilities,
    )
    items_xpath = "//div[starts-with(@id, '100_dealView')]"
    items = {}
    try:
        driver.get('https://www.amazon.com/international-sales-offers/b/?ie=UTF8&node=15529609011&ref_=nav_navm_intl_deal_btn&nocache=1569852387822')
        elems = driver.find_elements(By.XPATH, items_xpath)
        # The original matched "N% off" with two identical regexes and then
        # processed the tile in both branches, so every tile is simply processed.
        for i, item in enumerate(elems):
            product = strip_data(item.text)
            if product is not None:
                items[i] = product
    finally:
        driver.quit()
    # Write once at the end so earlier items are not overwritten
    with open('products.json', 'w') as f:
        f.write(json.dumps(items, sort_keys=True, indent=4))


def strip_data(data):
    """Return a product dict for one deal tile, or None if the tile is skipped."""
    price = PRICE_REGEX.findall(data)
    # assuming those with price are legit; ignore tiles with no price or one price
    if len(price) < 2:
        return None
    if len(price) == 2:
        new_price = price[0]
        list_price = price[1]
    elif len(price) >= 4:
        # price range gotcha: "low - high" for both the deal and the list price
        new_price = f"{price[0]} - {price[1]}"
        list_price = f"{price[2]} - {price[3]}"
    else:
        # exactly three prices is ambiguous (the original raised IndexError here)
        return None
    # Skip countdown tiles; the original left the title undefined for them
    if "Ends in" in data:
        return None
    lines = data.split('\n')
    # Check for misplaced items: the title must exist and have length >= 10
    if len(lines) < 3 or len(lines[2]) < 10:
        return None
    return {
        'product_title': lines[2],
        'new_price': new_price,
        'list_price': list_price,
    }


if __name__ == '__main__':
    scraper()
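To see how the price-matching step behaves without launching a browser, here is a minimal sketch that runs the script's price pattern (with the dot escaped) over a made-up deal tile. The sample text and the function name `extract_prices` are assumptions for illustration; real Amazon tiles may be laid out differently.

```python
import re

# Same price pattern as the scraper, with the decimal point escaped
PRICE_REGEX = re.compile(r'\$\d{1,10}\.\d{1,2}')


def extract_prices(tile_text):
    """Return every dollar price found in the text of one deal tile."""
    return PRICE_REGEX.findall(tile_text)


# Made-up tile text: the title sits on the third line, as the script expects
sample_tile = (
    "Deal of the Day\n"
    "25% off\n"
    "Wireless Noise Cancelling Headphones\n"
    "$59.99\n"
    "$79.99"
)

prices = extract_prices(sample_tile)
title = sample_tile.split('\n')[2]
print(title, prices)
```

With two prices found, the script treats the first as the deal price and the second as the list price.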
Of course, this script is not the final version of what you should be using to make a fortune, but it’s a step in the right direction. Simply edit it, adjust it to your needs, and send us a thank you note.
Market data is valuable: 112.13% revenue boost
This code is one of the last stages of the quest. Expect a lot of intel, and a lot of material for your hustle.
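One way to turn that intel into something actionable is to rank the scraped items by discount. The sketch below assumes records shaped like the script's `products.json` output (`product_title`, `new_price`, `list_price`); the sample data and the helper names `price_to_float` and `discount_percent` are invented for illustration.

```python
import json


def price_to_float(price):
    """Convert a string such as '$1,299.00' to a float."""
    return float(price.lstrip('$').replace(',', ''))


def discount_percent(new_price, list_price):
    """Return the discount of new_price relative to list_price, in percent."""
    new, full = price_to_float(new_price), price_to_float(list_price)
    if full <= 0:
        raise ValueError("list price must be positive")
    return round((full - new) / full * 100, 2)


# Made-up records in the same shape as products.json
sample = json.loads("""
{
    "0": {"product_title": "Stainless Steel Water Bottle", "new_price": "$45.00", "list_price": "$50.00"},
    "1": {"product_title": "Bluetooth Portable Speaker", "new_price": "$30.00", "list_price": "$40.00"}
}
""")

# Skip price ranges such as "$10.00 - $15.00", which have no single discount
single_priced = [p for p in sample.values() if " - " not in p["new_price"]]
ranked = sorted(
    single_priced,
    key=lambda p: discount_percent(p["new_price"], p["list_price"]),
    reverse=True,
)
for product in ranked:
    print(product["product_title"],
          discount_percent(product["new_price"], product["list_price"]))
```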
As you can imagine, “Shopmania” really took off – and all the business needed was a custom script and some proxies. Now they work as affiliates with the retail giants that they wanted to compete with originally.
The company ended up boosting their revenue by 112.13%, cutting down on staffing costs, and providing more value to new users.
Impressive, isn’t it?
Are you ready for your takeoff?