Use the ‘Capture Following Text’ option to scrape data from details pages

When extracting data from details pages (pages reached by following a link from the start page), it is recommended that you use the ‘Capture Following Text’ option whenever possible so that data is scraped correctly and consistently.

This is because the layout and the amount of data shown on details pages may not be consistent. For example, if you are scraping an Amazon product listing, the data displayed on each product details page (the page reached by clicking a product link in the search results) may vary slightly from product to product. In this case, to extract the Shipping Weight under Product Details, do not click on the value itself (for example, ‘1.2 pounds’). Instead, click on the heading ‘Shipping Weight’ and apply the ‘Capture Following Text’ option under the ‘More Options’ button.

Watch the demo:

In summary, if the data to be extracted appears under a heading, always click the heading and apply the ‘Capture Following Text’ option. Because the heading text stays the same from page to page, this helps ensure the data is scraped from all similar pages without missing any, even when the page contents vary slightly.
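The idea behind ‘Capture Following Text’ can also be illustrated outside WebHarvy. The Python sketch below is not WebHarvy's implementation (WebHarvy is configured through its point-and-click interface, not code). It assumes simplified, hypothetical product-page HTML and uses hypothetical helpers (`capture_following_text`, `capture_by_position`) to contrast anchoring on a heading with picking a value by its position on the page.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects the non-empty text chunks of a page in document order."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append(text)


def collect_text(html):
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    return parser.chunks


def capture_by_position(html, index):
    """Naive approach: take the text chunk at a fixed position (breaks if rows change)."""
    chunks = collect_text(html)
    return chunks[index] if index < len(chunks) else None


def capture_following_text(html, heading):
    """Hypothetical helper: return the text that follows `heading`, or None if absent."""
    chunks = collect_text(html)
    for i, chunk in enumerate(chunks):
        if chunk.startswith(heading):
            # The value may share the heading's text node, e.g. "Shipping Weight: 2 pounds".
            rest = chunk[len(heading):].lstrip(" :").strip()
            if rest:
                return rest
            # Otherwise the value is the next chunk, e.g. "<b>Shipping Weight:</b> 1.2 pounds".
            return chunks[i + 1] if i + 1 < len(chunks) else None
    return None


# Two hypothetical details pages: page B lacks the "Product Dimensions" row.
page_a = """<ul>
<li><b>Product Dimensions:</b> 8 x 5 x 1 inches</li>
<li><b>Shipping Weight:</b> 1.2 pounds</li>
</ul>"""
page_b = """<ul>
<li><b>Shipping Weight:</b> 3 pounds</li>
</ul>"""

for name, page in (("A", page_a), ("B", page_b)):
    print(name, capture_by_position(page, 3), capture_following_text(page, "Shipping Weight"))
# A 1.2 pounds 1.2 pounds
# B None 3 pounds
```

The positional lookup works on page A but misses the value on page B, while the heading-anchored lookup finds the weight on both, which is the behaviour the summary above describes.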


This entry was posted in WebHarvy, WebHarvy Feature.
