Scrape HTML

WebHarvy allows you to scrape the HTML of page content in addition to plain text. In the Capture window, click the ‘More Options’ button and select the ‘Capture HTML’ option to scrape the HTML of the selected content.

To capture only a portion of the displayed HTML, you may select and highlight the required portion before clicking the Capture button.

Regular Expressions are usually applied to the captured HTML source to extract the data of interest, such as an image URL or a hidden field like a phone number.
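To illustrate the idea outside of WebHarvy, here is a minimal Python sketch of applying Regular Expressions to a captured HTML fragment. The HTML snippet, the attribute names and the patterns are all made up for illustration; the HTML you capture from a real page will differ, and the expressions you enter in WebHarvy may need adjusting accordingly.

```python
import re

# Hypothetical HTML fragment, standing in for what 'Capture HTML' might return.
html = '''
<div class="listing">
  <img class="photo" src="https://example.com/images/item-42.jpg" alt="Item">
  <input type="hidden" name="phone" value="+1-555-0100">
</div>
'''

# Capture the src attribute value of the first <img> tag.
image_match = re.search(r'<img[^>]*\bsrc="([^"]+)"', html)

# Capture the value attribute of the hidden input named "phone" (assumed name).
phone_match = re.search(r'<input[^>]*\bname="phone"[^>]*\bvalue="([^"]+)"', html)

image_url = image_match.group(1) if image_match else None
phone = phone_match.group(1) if phone_match else None

print(image_url)
print(phone)
```

In both patterns the part inside the parentheses is the group that gets extracted; everything around it only serves to locate the right place in the HTML.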

The following video shows how the ‘Capture HTML’ option is used along with Regular Expressions to correctly extract the product price.
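As a rough illustration of why HTML capture can help with prices, the Python sketch below assumes a page where the visible text carries extra words while the clean numeric value sits in an attribute. The markup and the `data-price` attribute are hypothetical and are not taken from the video.

```python
import re

# Hypothetical captured HTML: the visible text is noisy, the attribute is clean.
html = '<span class="price" data-price="1299.00">From $1,299.00 (incl. tax)</span>'

# Capture the digits and decimal point inside the assumed data-price attribute.
match = re.search(r'data-price="([\d.]+)"', html)

price = float(match.group(1)) if match else None
print(price)
```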

Try out the free evaluation copy of WebHarvy from https://www.webharvy.com/download.html.
