#MercadoLibreDataScraper | Explore Tumblr posts and blogs

webscreenscraping · 4 years ago

How To Scrape MercadoLibre With Python And Beautiful Soup?

This post aims to stay up to date, so you can follow along and see each result in real time.

First, make sure you have Python 3 installed. If you don’t, install Python 3 before you proceed. Then install Beautiful Soup with pip3 install beautifulsoup4.

We will also need the libraries requests, lxml, and soupsieve to fetch the page, parse the HTML, and use CSS selectors. Install them with:

pip3 install requests soupsieve lxml

Once they are installed, open an editor and type in:

# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup
import requests

Now let’s go to the MercadoLibre search page and inspect the data we can get.

This is how it looks.

Back to our code now. Let’s try to get this data by pretending to be a browser, like this:

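# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup
import requests

# Send a browser-like User-Agent so the request looks like it comes from Safari
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9'}
url = 'https://listado.mercadolibre.com.mx/phone#D[A:phone]'
response = requests.get(url, headers=headers)
# response.text holds the HTML; print(response) alone would only show the status
print(response.text)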

Save this as scrapeMercado.py.

If you run it:

python3 scrapeMercado.py

You will see the whole HTML page.
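Before parsing, it can help to confirm that the request actually succeeded. The sketch below is one possible way to do that and is not part of the original script: fetch_html is a hypothetical helper name, the optional session argument and the 10-second timeout are assumptions made for illustration. It relies on raise_for_status() from requests, which raises an exception on 4xx and 5xx responses.

```python
import requests

# Same browser-like User-Agent as in the script above
HEADERS = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) '
                  'AppleWebKit/601.3.9 (KHTML, like Gecko) '
                  'Version/9.0.2 Safari/601.3.9',
}


def fetch_html(url, session=None, timeout=10):
    """Return the page body as text (hypothetical helper, not from the original).

    Raises requests.HTTPError when the server answers with a 4xx or 5xx status.
    """
    # Use the given session if there is one, otherwise the requests module itself
    client = session if session is not None else requests
    response = client.get(url, headers=HEADERS, timeout=timeout)
    response.raise_for_status()
    return response.text
```

Calling fetch_html with the search URL used above would then return the HTML as a string, or fail loudly if MercadoLibre rejected the request.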

Now, let’s use CSS selectors to get to the data we want. To do that, let’s go back to Chrome and open the inspect tool. We first need to get to all the product listings. We notice that the class ‘.results-item’ holds all the individual product details together.

Notice that the product title is contained in an h2 element inside the results-item class, so we can get to it like this:

# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup
import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.11 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9',
    'Accept-Encoding': 'identity',
}
url = 'https://listado.mercadolibre.com.mx/phone#D[A:phone]'
response = requests.get(url, headers=headers)
# print(response.content)
soup = BeautifulSoup(response.content, 'lxml')

# Each .results-item block holds one product
for item in soup.select('.results-item'):
    try:
        print('---------------------------')
        print(item.select('h2')[0].get_text())
    except Exception as e:
        # raise e
        print('')

This selects all the results-item blocks and runs through them, looking for the h2 element and printing its text.
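To see the selector logic in isolation, here is a minimal sketch that runs against a small inline HTML sample instead of the live page. The sample markup and the function name extract_titles are assumptions made for illustration, and the real page structure may differ. It uses select_one, which returns None when nothing matches, as an alternative to wrapping the lookup in try/except, and Python's built-in 'html.parser' so the example runs without lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that mimics the structure described above;
# the HTML served by the real site may look different.
SAMPLE_HTML = """
<ol>
  <li class="results-item"><h2><a href="/p/1">Phone A</a></h2></li>
  <li class="results-item"><div>No title here</div></li>
  <li class="results-item"><h2><a href="/p/2">Phone B</a></h2></li>
</ol>
"""


def extract_titles(html):
    """Return the text of the first <h2> in every .results-item block."""
    soup = BeautifulSoup(html, 'html.parser')
    titles = []
    for item in soup.select('.results-item'):
        heading = item.select_one('h2')  # None if the item has no <h2>
        if heading is not None:
            titles.append(heading.get_text(strip=True))
    return titles


print(extract_titles(SAMPLE_HTML))  # prints ['Phone A', 'Phone B']
```

Items without a title are simply skipped here, which is roughly what the empty print in the except branch achieves in the script above.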

So when you run it, you get the product titles.

Now, with the same process, we get the class names of all the other data, like the product image, the link, and the price:

# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup
import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.11 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9',
    'Accept-Encoding': 'identity',
}
url = 'https://listado.mercadolibre.com.mx/phone#D[A:phone]'
response = requests.get(url, headers=headers)
# print(response.content)
soup = BeautifulSoup(response.content, 'lxml')

for item in soup.select('.results-item'):
    try:
        print('---------------------------')
        print(item.select('h2')[0].get_text())                              # title
        print(item.select('h2 a')[0]['href'])                               # link
        print(item.select('.price__container .item__price')[0].get_text())  # price
        print(item.select('.image-content a img')[0]['data-src'])           # image URL
    except Exception as e:
        # raise e
        print('')

When we run this, it should print everything we need from each product.
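Printing is fine for a first look, but you may prefer to keep the results. The sketch below is one possible approach using only the standard library: it assumes each product has already been collected into a dict, and the field names and the records_to_csv helper are hypothetical. The sample values are made up.

```python
import csv
import io

# Hypothetical column names for the four values printed above
FIELDS = ['title', 'link', 'price', 'image']


def records_to_csv(records):
    """Serialize a list of product dicts to CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Made-up sample record, only to show the output shape
sample = [{
    'title': 'Phone A',
    'link': 'https://example.com/a',
    'price': '$ 3,499',
    'image': 'https://example.com/a.jpg',
}]
print(records_to_csv(sample))
```

In the loop above, you would append one such dict per item instead of calling print, then write the returned text to a file.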

If you need to use this in production and want to scale to thousands of links, you will find that MercadoLibre blocks your IP quickly. In this scenario, using a rotating proxy service to rotate IPs is a must. You can use a service like Proxies API to route your calls through a pool of millions of residential proxies.
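As a rough illustration of IP rotation, the sketch below cycles through a list of proxy addresses and builds the mapping that requests accepts in its proxies parameter. This is a generic pattern and not the interface of Proxies API or any other provider; the proxy addresses are placeholders and proxy_rotation is a hypothetical helper name.

```python
from itertools import cycle

# Placeholder proxy addresses; replace them with the ones your provider gives you.
PROXY_URLS = [
    'http://proxy1.example.com:8080',
    'http://proxy2.example.com:8080',
]


def proxy_rotation(proxy_urls):
    """Yield requests-style proxies mappings, looping over the pool forever."""
    for proxy in cycle(proxy_urls):
        yield {'http': proxy, 'https': proxy}


rotation = proxy_rotation(PROXY_URLS)
# Each request would then take the next mapping, for example:
#   requests.get(url, headers=headers, proxies=next(rotation), timeout=10)
```

A round-robin order is the simplest choice; a real setup would probably also drop proxies that keep failing.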

If you need to scale the crawling speed and don’t want to set up your own infrastructure, you can use the cloud-based crawler by Web Screen Scraping to crawl thousands of URLs at high speed from our network of crawlers.

If you are looking for the best way to scrape MercadoLibre with Python and Beautiful Soup, you can contact Web Screen Scraping for all your requirements.

#mercadolibredatascraping #mercadobibredataextraction

retailgators · 4 years ago

How to Extract Mercadolibre with Python and BeautifulSoup

Introduction

Let’s look at how to extract MercadoLibre product data with Python and BeautifulSoup in a simple, clean way.

First, make sure you have Python 3 installed. If you don’t, install Python 3 before going any further. Then install BeautifulSoup with:

pip3 install beautifulsoup4

In addition, we need the libraries requests, lxml, and soupsieve to fetch the page, parse the HTML, and use CSS selectors. Install them with:

pip3 install requests soupsieve lxml

After the installation, open an editor and type:

# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup
import requests

Then, visit the MercadoLibre search page and study the data we can get.

It will look like this:

Now, let’s come back to the code we have created and fetch this data by pretending to be a browser, like this:

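# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup
import requests

# Send a browser-like User-Agent so the request looks like it comes from Safari
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.9 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9'}
url = 'https://listado.mercadolibre.com.mx/phone#D[A:phone]'
response = requests.get(url, headers=headers)
# Print the HTML so there is something to see when the script runs
print(response.text)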

Then, save it as scrapeMercado.py.

If you run it:

python3 scrapeMercado.py

You will see the full HTML page.
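One detail about the URL used above: the part after the # is a fragment, which is handled on the client side and is not sent to the server as part of an HTTP request. The short sketch below, using only the standard library, splits the URL to show which part identifies the page being requested. The variable names are just for illustration.

```python
from urllib.parse import urldefrag

url = 'https://listado.mercadolibre.com.mx/phone#D[A:phone]'

# urldefrag splits a URL into the part before '#' and the fragment after it
base_url, fragment = urldefrag(url)

print(base_url)   # prints https://listado.mercadolibre.com.mx/phone
print(fragment)   # prints D[A:phone]
```

So the search term that matters for the request is the path segment, phone in this case.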

Now, let’s use CSS selectors to find the data we need. To do that, go back to Chrome and open the inspect tool. We can see that the class ‘.results-item’ holds all the individual product data together.

Notice that the product title sits in an h2 element inside each results-item block, so we can get it by looping over soup.select('.results-item') and reading item.select('h2')[0].get_text() for each item.

This selects all the results-item blocks and runs through them, looking for the h2 element and printing its text.

So, every time you run it, you will get the titles printed.

Bingo!!! We have the product titles.

Now, with the same process, we get the class names for all the other information, like the product’s image, link, and price.

# -*- coding: utf-8 -*-
from bs4 import BeautifulSoup
import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_2) AppleWebKit/601.3.11 (KHTML, like Gecko) Version/9.0.2 Safari/601.3.9',
    'Accept-Encoding': 'identity',
}
url = 'https://listado.mercadolibre.com.mx/phone#D[A:phone]'
response = requests.get(url, headers=headers)
# print(response.content)
soup = BeautifulSoup(response.content, 'lxml')

for item in soup.select('.results-item'):
    try:
        print('---------------------------')
        print(item.select('h2')[0].get_text())                              # title
        print(item.select('h2 a')[0]['href'])                               # link
        print(item.select('.price__container .item__price')[0].get_text())  # price
        print(item.select('.image-content a img')[0]['data-src'])           # image URL
    except Exception as e:
        # raise e
        print('')

So, whenever we run it, it should print everything we need from each product.
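The price comes back as display text, so you may want to turn it into a number before comparing products. The sketch below is only one way to do that and rests on assumptions: that prices look like '$ 3,499', with ',' as the thousands separator and '.' as the decimal point. parse_price is a hypothetical helper name, and the real text on the site may be formatted differently.

```python
import re
from decimal import Decimal, InvalidOperation


def parse_price(text):
    """Turn a price string such as '$ 3,499' into a Decimal.

    Assumes ',' is a thousands separator and '.' a decimal point.
    Returns None when the text cannot be read as a number.
    """
    # Keep digits and dots only; this drops '$', spaces and commas
    cleaned = re.sub(r'[^\d.]', '', text)
    if not re.search(r'\d', cleaned):
        return None
    try:
        return Decimal(cleaned)
    except InvalidOperation:
        return None


print(parse_price('$ 3,499'))      # prints 3499
print(parse_price('$ 12,999.50'))  # prints 12999.50
```

Decimal is used instead of float so that amounts with cents keep their exact value.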

If you wish to use this in production or scale up to thousands of links, you will quickly find your IP getting blocked by MercadoLibre. In that scenario, rotating IPs through a rotating proxy service is a necessity. You can use services such as Proxies API to route your calls through a pool of millions of residential proxies.

If you want to increase the crawling speed or don’t want to set up your own infrastructure, you can use cloud-based crawlers to extract MercadoLibre product data at high speed from a network of crawlers.

Still not sure whether this will meet your requirements? Then contact RetailGators and we will solve all your problems.

Source: https://www.retailgators.com/how-to-extract-mercadolibre-with-python-and-beautifulsoup.php

#ExtractMercadolibreProductData #ScrapeMercadolibreProductData #MercadolibreProductDataScraping #MercadolibreDataScraping #ScrapeMercadolibreData
