WebHarvy supports command line arguments so that you can run the software directly from the command line. This allows you to run WebHarvy from script or batch files, or to invoke it via code from your own applications.
To know more, read: Running WebHarvy Web Scraper from Command Line
WebHarvy comes with a built-in scheduler that you can use to schedule your scraping tasks. The scheduler window can be opened from the Mine menu.
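As a rough illustration of invoking WebHarvy from your own application, the Python sketch below builds a command line and hands it to `subprocess`. The helper names, the executable path and the argument values are hypothetical placeholders; the actual arguments WebHarvy accepts are described in the command line documentation referenced above and are not reproduced here.

```python
import subprocess


def build_command(exe_path, args):
    """Return the argument list used to launch a program.

    Passing a list (rather than one long string) avoids quoting problems
    when paths contain spaces.
    """
    return [exe_path] + [str(arg) for arg in args]


def run_webharvy(exe_path, args, timeout=None):
    """Launch the executable and wait for it to finish.

    `args` is passed through unchanged; this sketch does not assume
    any particular WebHarvy argument order or meaning.
    """
    command = build_command(exe_path, args)
    # check=True raises CalledProcessError on a non-zero exit code.
    return subprocess.run(command, check=True, timeout=timeout)
```

A call such as `run_webharvy(r"C:\path\to\WebHarvy.exe", [r"C:\configs\products.xml"])` would then start one run; both paths here are made up for the example.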
The scheduler enables you to run scraping tasks periodically – daily, weekly or monthly.
Know More about WebHarvy Scheduler
Download and try the free 15-day evaluation version of WebHarvy Web Data Extraction Software.
In the latest update of WebHarvy, the Visual Web Scraping Software, the newly introduced ‘Capture following text’ option allows you to capture the text, block or paragraph that follows a heading within a webpage.
On many websites the data to be scraped is not located at the same position on every page, but is guaranteed to be found under a given heading (Example: “Technical Details”, “Product Specification” etc). Sometimes the text under a given heading cannot be selected as a single item during configuration. In such scenarios the ‘Capture following text’ option in the capture window will prove helpful.
How to ?
While in configuration mode, click on the heading and select the ‘Capture following text’ option in the capture window. Provide a suitable name for the field and hit OK. In the preview pane you will be able to see the text following the heading captured.
Refer to http://www.webharvy.com/tour1.html#ScrapeFollowingText for more details.
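To make the idea concrete, here is a minimal Python sketch of capturing the text that follows a heading. It is not how WebHarvy works internally; it assumes the heading is an `h1` to `h6` element and that the text of interest ends at the next heading. The class and function names are made up for this example.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class FollowingTextParser(HTMLParser):
    """Collect text between a matching heading and the next heading."""

    def __init__(self, heading_text):
        super().__init__()
        self.target = heading_text.strip().lower()
        self.in_heading = False
        self.heading_parts = []
        self.capturing = False
        self.done = False
        self.captured = []

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            if self.capturing:
                # A new heading ends the block being captured.
                self.capturing = False
                self.done = True
            self.in_heading = True
            self.heading_parts = []

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS and self.in_heading:
            self.in_heading = False
            heading = "".join(self.heading_parts).strip().lower()
            # Only the first matching heading is used.
            if heading == self.target and not self.done:
                self.capturing = True

    def handle_data(self, data):
        if self.in_heading:
            self.heading_parts.append(data)
        elif self.capturing and data.strip():
            self.captured.append(data.strip())


def capture_following_text(html, heading_text):
    """Return the text found under the given heading, or '' if absent."""
    parser = FollowingTextParser(heading_text)
    parser.feed(html)
    parser.close()
    return " ".join(parser.captured)
```

Because the lookup is keyed on the heading text rather than on a position in the page, the same call works even when the block moves around between pages.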
The latest version (V184.108.40.206) of WebHarvy Visual Web Scraper is available for download. The changes in this update are :
- New option: ‘Capture following text’ added in capture form.
- Web Miner has been improved to handle HTML errors in target websites.
- Allows exporting scraped data while mining is paused.
- For CSV, TSV exports, column names are added as the first row.
- Option to input keywords in CSV format.
- Option to manually set page load timeout value in application settings.
The ‘Capture following text’ feature helps to scrape text following a given heading within the page. This feature is useful when the data to be scraped does not occur at a fixed position within the page, but is guaranteed to follow a heading text (Example: ‘Product Details:’ or ‘Specification’).
The option to manually set the page load timeout value from the settings window helps to scrape data from websites with slow response times or from those which employ AJAX.
We recommend that you download and try the 15-day free evaluation version.
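The export change listed above (column names added as the first row of CSV and TSV files) can be pictured with a short Python sketch. This only illustrates the resulting file layout using the standard `csv` module; the function name and the sample columns are invented for the example and say nothing about WebHarvy's own export code.

```python
import csv
import io


def export_rows(rows, columns, delimiter=","):
    """Serialise rows (a list of dicts) with the column names as row one.

    Use delimiter="," for CSV and delimiter="\t" for TSV.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=columns,
        delimiter=delimiter,
        lineterminator="\n",
    )
    writer.writeheader()      # first row: the column names
    writer.writerows(rows)    # remaining rows: the scraped records
    return buffer.getvalue()
```

With the header row in place, spreadsheet tools and scripts that read the file can refer to fields by name instead of by position.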
WebHarvy Web Scraper allows you to scrape data from remote websites anonymously with the help of proxy servers. This prevents remote web servers from blocking or blacklisting your computer’s IP address.
WebHarvy provides you the option to specify either a single proxy server address or a list of proxy server addresses through which the remote website will be scraped. If you provide a list of proxy server addresses, WebHarvy will cycle through the list periodically.
Please follow this link to know more about this feature.
Download WebHarvy Web Scraper FREE Trial !
In most cases the data to be scraped is the result of performing a search operation from the main page of the website. Often you need to extract data from the search results for a list of input keywords.
The ‘Keyword Scraping’ feature of WebHarvy allows you to perform this task with ease. You can specify a list of input keywords and WebHarvy will automatically scrape data from the search results corresponding to each keyword in the specified list.
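The keyword loop can be sketched in a few lines of Python. The sketch assumes, purely for illustration, that the target site accepts the keyword as a `q` query parameter and that the keywords arrive as comma-separated text; the function names and the example URL are hypothetical, and WebHarvy itself may submit keywords differently.

```python
import csv
from urllib.parse import urlencode


def parse_keywords(csv_text):
    """Read keywords from comma-separated text, skipping empty cells."""
    keywords = []
    for row in csv.reader(csv_text.splitlines(), skipinitialspace=True):
        keywords.extend(cell.strip() for cell in row if cell.strip())
    return keywords


def search_urls(base_url, keywords, param="q"):
    """Build one search URL per keyword (the parameter name is an assumption)."""
    return [f"{base_url}?{urlencode({param: keyword})}" for keyword in keywords]
```

Each generated URL would then be scraped with the same configuration, which is what lets a single setup cover the whole keyword list.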
Please follow this link to know more about ‘Keyword based Scraping’.
Video Demonstration : Keyword based Scraping
We recommend that you download and try the evaluation version of our Web Scraper to know more about the features.
The ‘Category Scraping’ feature of WebHarvy allows you to scrape, with a single configuration, a list of links which lead to similarly formatted pages within a website. This helps to scrape data from sections and subsections listed under the main page of a website.
Please follow this link to know more about Category Scraping.
Category Scraping : Video demonstration
You may download and try the free evaluation version of WebHarvy, the visual Web Scraper software, from http://www.webharvy.com/download.html.
The latest update of WebHarvy (version 220.127.116.11) has gone live and is available for download at www.webharvy.com/download.html.
- [New Feature] Keyword based Scraping : Allows you to run the same configuration for a set of input keywords (Read more : http://www.webharvy.com/tour71.html)
- Edit Configuration : Allows you to edit an already saved WebHarvy configuration XML file (Read more : http://www.webharvy.com/tour41.html)
- Option to contact us (WebHarvy Support) directly from the application (See Help menu)
- Option to check for new updates directly from the application (See Help menu)
- Miner performance improvement : Web mining performance while following links from the main page has been improved
- Minor improvements and bug fixes
- Miner window remembers its last position/size/state
- Issue with Auto Scroll fixed
- Issue with loading ‘Next Page’ and ‘Following Links’ in certain scenarios while mining has been fixed
- Issue which resulted in application crash while parsing HTML of certain websites has been fixed
WebHarvy allows you to scrape websites anonymously via proxy servers. You can either configure WebHarvy to scrape through a single proxy server or to use a list of proxy server addresses which are cycled automatically after a specified time interval.
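Cycling through a proxy list after a fixed time interval can be sketched as follows. This is an illustration of the idea only, not WebHarvy's implementation; the class name, the injectable `clock` parameter and the sample addresses are assumptions made for the example.

```python
import itertools
import time


class ProxyRotator:
    """Hand out proxies from a list, moving on after a fixed interval."""

    def __init__(self, proxies, interval_seconds, clock=time.monotonic):
        if not proxies:
            raise ValueError("at least one proxy address is required")
        self._cycle = itertools.cycle(proxies)
        self._interval = interval_seconds
        self._clock = clock
        self._current = next(self._cycle)
        self._switched_at = clock()

    def current(self):
        """Return the proxy to use now, switching if the interval elapsed.

        The rotator advances by one proxy per call at most, even if
        several intervals have passed since the last call.
        """
        now = self._clock()
        if now - self._switched_at >= self._interval:
            self._current = next(self._cycle)
            self._switched_at = now
        return self._current
```

For example, `ProxyRotator(["192.0.2.10:8080", "192.0.2.11:8080"], interval_seconds=60)` would return the first address for a minute, then the second, then wrap around to the first again.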
You may download the 15-day evaluation copy of WebHarvy Web Scraper from http://www.webharvy.com/download.html.