WebHarvy : 2 new methods of handling pagination

The latest version of WebHarvy Web Scraper supports 2 new pagination styles for scraping data from multiple pages of a website.

Pages where pagination links are shown in sets

In pages of this type, the pagination links are provided in sets. For example, the bottom of the page has direct links to each of the first 5 pages. To load pages 6 to 10, an additional link must be clicked. Each of pages 6 to 10 then shows, at the bottom of the page, direct links to the pages in that set and also a link to load the next set of 5 pages.
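To make the grouping concrete, here is a minimal Python sketch of which pages have direct links at any point. It only illustrates the arithmetic of the example above (sets of 5) and is not how WebHarvy works internally; the function names `visible_page_links` and `has_direct_link` are made up for this illustration.

```python
def visible_page_links(current_page: int, set_size: int = 5) -> range:
    """Return the page numbers that have direct links while viewing current_page.

    Assumes pages are numbered from 1 and grouped into consecutive sets of
    set_size pages (1-5, 6-10, ...), as in the example above.
    """
    if current_page < 1 or set_size < 1:
        raise ValueError("current_page and set_size must be positive")
    set_index = (current_page - 1) // set_size   # 0 for pages 1-5, 1 for 6-10, ...
    first = set_index * set_size + 1             # first page number in this set
    return range(first, first + set_size)


def has_direct_link(current_page: int, target_page: int, set_size: int = 5) -> bool:
    """True if target_page can be reached from current_page without changing sets."""
    return target_page in visible_page_links(current_page, set_size)


if __name__ == "__main__":
    print(list(visible_page_links(3)))   # pages linked directly while on page 3
    print(has_direct_link(5, 6))         # page 6 belongs to the next set
```

From page 5, page 6 has no direct link, which is why the additional "next set" link has to be clicked before pages 6 to 10 become reachable.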

WebHarvy Online Help : Scraping pages where pagination links are displayed in sets

The following video demonstrates how these types of pages can be configured and mined using WebHarvy.

When each page URL contains the page number

Suppose the pages from which you need to scrape multiple listings of data have the following format.

http://www.example.com/search/listing?keywords&pageNumber=1
http://www.example.com/search/listing?keywords&pageNumber=2
http://www.example.com/search/listing?keywords&pageNumber=3
http://www.example.com/search/listing?keywords&pageNumber=4
etc.

Pagination in this case can be handled easily by following the method below:

1. Open WebHarvy and load http://www.example.com/search/listing?keywords&pageNumber=1.
2. Start Config
3. Select the required data from the page; follow links and select data if required.
4. Select Edit menu > Edit Options > Add/Remove URLs from Configuration
5. Paste the following URL and click Apply.

http://www.example.com/search/listing?keywords&pageNumber=%%pagenumber%%

Note that the actual page number is replaced by %%pagenumber%% in the above string.

6. Stop Config
7. Start Mine. Specify the number of pages to mine, since the ‘Mine all pages’ option will be disabled. WebHarvy will then automatically load each subsequent page and extract data.
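As a rough illustration of what the %%pagenumber%% placeholder amounts to, the Python sketch below expands the URL from step 5 into the list of page URLs for a given number of pages. This is only a sketch of the idea, assuming page numbers start at 1 and increase by 1; it is not WebHarvy's own code, and the helper name `expand_page_urls` is hypothetical.

```python
from typing import List

PLACEHOLDER = "%%pagenumber%%"  # the token used in the URL from step 5


def expand_page_urls(template: str, num_pages: int, start: int = 1) -> List[str]:
    """Replace the placeholder with consecutive page numbers.

    Assumption: pages are numbered start, start + 1, ... with no gaps.
    """
    if PLACEHOLDER not in template:
        raise ValueError("template does not contain " + PLACEHOLDER)
    if num_pages < 0:
        raise ValueError("num_pages must not be negative")
    return [
        template.replace(PLACEHOLDER, str(page))
        for page in range(start, start + num_pages)
    ]


if __name__ == "__main__":
    template = "http://www.example.com/search/listing?keywords&pageNumber=%%pagenumber%%"
    for url in expand_page_urls(template, 4):
        print(url)
```

Because the URLs are generated from a count rather than discovered by following a "next" link, there is no natural stopping point, which fits with the number of pages having to be specified in step 7.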

WebHarvy Online Help : URL page-number based auto pagination

The latest version of WebHarvy Visual Web Scraper can be downloaded from https://www.webharvy.com/download.html. Try it, and if you need any assistance, please do not hesitate to contact our support team.
