Scrape with Regular Expressions using WebHarvy

WebHarvy is designed as a ‘point and click’ visual Web Scraper. The design concentrates on ease of use, so that you can start scraping data within a few minutes of downloading the software.

But if you need more control over what gets extracted, you can use Regular Expressions (RegEx) with WebHarvy. WebHarvy lets you extract data by matching RegEx strings against the text content of the web page as well as against its HTML source.
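The difference between matching on text content and matching on HTML source is easiest to see on a single element. The sketch below is not WebHarvy code. It uses Python's standard `re` and `html.parser` modules on a hypothetical fragment (`fragment`, `text_of` and the URL/price values are all made up for illustration). A RegEx run on the HTML source can see attributes such as `href`. The same kind of pattern run on the text content only sees what a visitor would read on the page.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML fragment, dropping all tags."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def text_of(html: str) -> str:
    """Hypothetical helper: return only the visible text of `html`."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return "".join(parser.parts)


# Hypothetical fragment standing in for an element selected on a page.
fragment = '<a href="/item/42" class="title">Blue Widget - $19.99</a>'

# RegEx on the HTML source: tags and attributes are part of the input.
href = re.search(r'href="([^"]*)"', fragment).group(1)

# RegEx on the text content: only "Blue Widget - $19.99" is available.
price = re.search(r'\$(\d+\.\d{2})', text_of(fragment)).group(1)

print(href)   # /item/42
print(price)  # 19.99
```

Running the `href` pattern against `text_of(fragment)` instead would find nothing, because the attribute is not part of the text content. The choice of source therefore decides which data a pattern can reach.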

If you are new to Regular Expressions, see http://en.wikipedia.org/wiki/Regular_expression.

The following video shows how WebHarvy can be used to scrape an image URL from a web page by applying a Regular Expression.

The ‘Capture More Content’ feature comes in handy here (as shown in the video): it widens the selection so that the selected text contains the data of interest (text or HTML code) before the RegEx string is applied.
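Here is a rough outside-WebHarvy illustration of the image-URL case, written in Python's `re` module. The HTML in `captured_html` is hypothetical and stands in for a selection that has been widened until it includes the `<img>` tag. The pattern puts the URL in a capture group (the part in parentheses). It is an assumption here that WebHarvy keeps the group's text rather than the whole match, so check this against WebHarvy's own documentation. The pattern syntax used is common to most regex engines, but test it inside WebHarvy before relying on it.

```python
import re

# Hypothetical HTML captured after widening the selection so that the
# <img> tag is included (the step 'Capture More Content' is used for).
captured_html = '''
<div class="product">
  <img class="main" src="https://example.com/images/widget-large.jpg" alt="Widget">
  <span class="name">Widget</span>
</div>
'''

# The parenthesised group isolates just the URL inside src="..."; the rest
# of the pattern only anchors the match to an <img> tag.
IMG_SRC = re.compile(r'<img\b[^>]*\bsrc="([^"]+)"', re.IGNORECASE)

match = IMG_SRC.search(captured_html)
image_url = match.group(1) if match else None
print(image_url)  # https://example.com/images/widget-large.jpg
```

If the selection covered only the visible name ("Widget"), the same pattern would return no match. That is why the selection has to be widened before the RegEx is applied.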

Regular Expressions can also be applied directly to the text content of the page, as shown in the following video.
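As a second hedged sketch, again using Python's `re` module rather than WebHarvy itself, a pattern applied to text content can pull a single value out of a longer phrase. The sample `text` is hypothetical, and so are the `REVIEWS` pattern and the review-count use case. The capture group keeps the digits and drops the surrounding words.

```python
import re

# Hypothetical visible text of a selected element.
text = "Rated 4.5 out of 5 stars (1,204 customer reviews)"

# Capture the digits (and thousands separators) that precede "review(s)";
# the optional non-capturing group allows the word "customer" in between.
REVIEWS = re.compile(r'([\d,]+)\s+(?:customer\s+)?reviews?', re.IGNORECASE)

m = REVIEWS.search(text)
review_count = int(m.group(1).replace(",", "")) if m else None
print(review_count)  # 1204
```

Because only the text content is searched, the pattern does not need to account for any surrounding HTML tags or attributes.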

To explore further, download the latest version of WebHarvy from https://www.webharvy.com/download.html.

This entry was posted in WebHarvy, WebHarvy Feature.