Scraping data from HTML by applying Regular Expressions

WebHarvy can scrape data from the HTML source of a selected area of a web page, or of the whole page, by applying Regular Expressions.

During configuration, after you click an item, the ‘Capture HTML’ option under ‘More Options’ in the Capture window captures the item's HTML and displays it in the preview area. A Regular Expression can then be applied (More Options > Apply Regular Expression) to select data from a portion of the displayed HTML.
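
To get a feel for what a Regular Expression does with captured HTML, here is a minimal sketch in Python, run outside WebHarvy. It is only an illustration: the HTML snippet, the variable names, and the pattern are assumptions made up for this example, and WebHarvy's own regular expression engine may differ from Python's in its details.

```python
import re

# Hypothetical HTML, standing in for what 'Capture HTML' might
# display in the preview area for a selected item.
captured_html = """
<div class="listing">
  <a class="title" href="https://example.com/item/1">First item</a>
  <a class="title" href="https://example.com/item/2">Second item</a>
</div>
"""

# The parenthesised group selects only the text between the quotes
# of each href attribute, i.e. the URL itself.
href_pattern = re.compile(r'href="([^"]+)"')

# findall returns the contents of the group for every match.
urls = href_pattern.findall(captured_html)

for url in urls:
    print(url)
```

The part of the pattern inside the parentheses is the portion of the HTML that gets selected; everything outside it only anchors the match to the right place.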

The following video shows how this feature can be applied to scrape URLs from HTML.

Download and try the 15-day evaluation version
