WebHarvy Version 3.0 Released!

We are happy to announce the release of WebHarvy 3.0. This major update adds many new features, and its list of features and changes is the longest of any product update we have released to date. Here we go.

  • Added the following options in the Capture Window (grouped under ‘More Options’)
    • Capture following text: Improved by using brute force search for all elements in the page
    • Capture HTML: Option to scrape HTML of selected element
    • Capture Text as File: Option to scrape text and save it as a local file (useful while scraping articles and blog posts)
    • Click: Ability to scrape hidden (partially displayed) fields in webpages which require a click from the user to be displayed in full. Examples include phone numbers or email addresses which are displayed completely only when you click them.
    • Apply Regular Expression: Option to apply Regular Expressions (RegEx) on captured text. RegEx can be applied even after applying ‘Capture following text’, ‘Capture HTML’ & ‘Capture More Content’ options.
    • Capture More Content: Option to capture more text than the selected text by capturing the parent element’s text. For example, this would capture the entire article if you apply this option after having selected the first paragraph.
  • Option to individually select categories/links (one by one) for Category Scraping (Mine menu – Scrape a list of similar links)
  • Export captured data as JSON
  • Ability to mine data from tables (row-column / grid layout)
  • Ability to mine pages which have fewer (less than 10) data items
  • Option to test proxies before using them (Edit menu – Settings – Proxy Settings)
  • Non-responsive proxies are skipped during mining. Mining does not stop because of a bad/non-responsive proxy in the list.
  • Option to manually add URLs to an existing configuration (Edit menu – Add URLs to configuration)
  • Option to remove duplicates while mining (Edit menu – Settings – Miner)
  • Added ‘Hourly’ frequency option in Scheduler (Mine menu – Scheduler)
  • Added option to export data directly to database for scheduled mining tasks & command line
  • Added ‘Clear’ option in Edit menu which will clear both the browser and data preview pane
  • Language encoding defaulted to ‘utf-8’ for file exports (XML, CSV etc)
  • CSV/Database export: handles delimiters (comma, quotes etc) in captured data
  • Keyword/Category scraping allowed for 2 entries in evaluation version
  • Rendering issues with in-built browser fixed – defaults to IE 9 rendering
  • New Installer built with InstallShield
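
As a rough illustration of what applying a regular expression to captured text amounts to, the Python sketch below pulls a phone number out of a captured string. This is not WebHarvy code: the function name `apply_regex`, the sample text, and the pattern are all assumptions made for this example.

```python
import re


def apply_regex(captured_text, pattern):
    """Return the first capture group if the pattern defines one,
    otherwise the whole match. Returns None when nothing matches."""
    match = re.search(pattern, captured_text)
    if match is None:
        return None
    # Prefer the first parenthesised group when the pattern has one.
    return match.group(1) if match.groups() else match.group(0)


# Hypothetical text captured from a page.
captured = "Call us: (555) 010-4477 between 9am and 5pm"

# Hypothetical pattern for a US-style phone number.
phone = apply_regex(captured, r"(\(\d{3}\)\s*\d{3}-\d{4})")
print(phone)
```

The same idea applies whether the input is plain captured text or captured HTML: the expression is run over whatever string the earlier capture step produced.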

Download the latest installation of WebHarvy Web Scraper from https://www.webharvy.com/download.html.

2 Responses to WebHarvy Version 3.0 Released!

  1. Hi, I tried your demo version as well, and I was able to capture HTML elements. The problem is that when I save them as a CSV file, HTML code containing ” or ‘ causes errors in the CSV file.

    How can I preserve the HTML code when saving to CSV, so that I can import it as raw HTML?

    I was trying to scrape the product name and description of a product. I chose HTML for the description because, when I saved it as text, the CSV did not keep the formatting and the description was not shown correctly (no line breaks, no paragraphs, everything shown as one sentence). So I chose to scrape it as HTML, hoping that when I import it into WordPress as post content, it will display properly.

    Thanks & Regards
    Stan

    • sysnucleus says:

      Please try by exporting the data as an XML file, instead of CSV.

      For further assistance, please save and send the configuration file which you have created to support(at)sysnucleus(dot)com. Also, please mention the web site from which you are trying to extract data and the details of data to be extracted.

      -SN WebHarvy Support-
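
The quoting problem raised in this thread comes down to how quotes and commas inside a field are escaped. As a minimal sketch, and not WebHarvy's own export code, the Python example below writes an HTML fragment to CSV with the standard `csv` module, which doubles embedded double quotes and wraps the field in quotes, then reads it back unchanged. The helper names `rows_to_csv` and `csv_to_rows` and the sample data are assumptions made for this example.

```python
import csv
import io


def rows_to_csv(rows):
    """Serialise rows to CSV text, quoting only the fields that need it."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer, quoting=csv.QUOTE_MINIMAL)
    writer.writerows(rows)
    return buffer.getvalue()


def csv_to_rows(text):
    """Parse CSV text back into a list of rows."""
    return list(csv.reader(io.StringIO(text, newline="")))


# Hypothetical scraped data: the description holds HTML with quotes and a comma.
rows = [
    ["Product name", "Description"],
    ["Widget", '<p class="desc">It\'s 5" wide, light</p>'],
]

csv_text = rows_to_csv(rows)
print(csv_text)

# Reading the text back recovers the original HTML exactly.
print(csv_to_rows(csv_text) == rows)
```

Whether an importer accepts such a file depends on it following the same quoting convention, which is one reason an XML export can be the simpler route for HTML content.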
