WebHarvy 4.0.2.125 – Multi-level Category / Multi-list Keyword scraping

This release introduces support for scraping multi-level categories (a tree of main categories and subcategories) and for multiple input keyword lists. The main features are:

True multi-level Category Scraping

WebHarvy can now automatically navigate a website's category and subcategory lists and extract data from the final listing pages.
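
Conceptually, this is a depth-first walk of the category tree in which only pages without further subcategories are treated as listing pages to extract data from. The Python sketch below models a site as a hypothetical in-memory dictionary (`SITE` and `listing_pages` are illustrative names, not part of WebHarvy). It shows the idea only and does not describe how WebHarvy is implemented.

```python
from typing import Dict, List

# Hypothetical site map: each category URL maps to its subcategory URLs.
# An empty list marks a final listing page (a leaf of the category tree).
SITE: Dict[str, List[str]] = {
    "/": ["/electronics", "/books"],
    "/electronics": ["/electronics/phones", "/electronics/laptops"],
    "/electronics/phones": [],
    "/electronics/laptops": [],
    "/books": [],
}


def listing_pages(url: str, site: Dict[str, List[str]]) -> List[str]:
    """Depth-first walk of the category tree, returning leaf listing pages in order."""
    children = site.get(url, [])  # unknown URLs are treated as leaves
    if not children:
        return [url]  # no subcategories: this is a page to extract data from
    pages: List[str] = []
    for child in children:
        pages.extend(listing_pages(child, site))
    return pages


if __name__ == "__main__":
    for page in listing_pages("/", SITE):
        print(page)
```

For the sample map, the walk returns `/electronics/phones`, `/electronics/laptops` and `/books`, in that order; intermediate category pages are only used for navigation.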

Support for multiple input keywords

During configuration, any number of input text fields can be populated with lists of strings or keywords. During the mining phase, WebHarvy automatically applies every combination of the provided keywords.
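
Because every combination is applied, the number of searches is the product of the list sizes: two lists with 2 and 3 entries yield 6 searches. A minimal Python sketch of generating those combinations with `itertools.product`, assuming two hypothetical text fields named `what` and `where`; WebHarvy's own ordering of the combinations may differ.

```python
from itertools import product
from typing import Dict, Iterator, List

# Hypothetical keyword lists, one per input text field on a search form.
KEYWORDS: Dict[str, List[str]] = {
    "what": ["plumber", "electrician"],
    "where": ["Boston", "Denver", "Austin"],
}


def keyword_combinations(fields: Dict[str, List[str]]) -> Iterator[Dict[str, str]]:
    """Yield one {field: keyword} mapping per combination (Cartesian product)."""
    names = list(fields)
    for values in product(*(fields[name] for name in names)):
        yield dict(zip(names, values))


if __name__ == "__main__":
    combos = list(keyword_combinations(KEYWORDS))
    print(len(combos))  # 2 x 3 = 6 searches
    for combo in combos:
        print(combo)
```

Since the count grows multiplicatively, adding a third list of 10 keywords to this example would raise the total to 60 searches.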

Capture window with new options

Run JavaScript on Page

Run specified JavaScript code on the page. This option can be used to load page elements that cannot be loaded with WebHarvy's default navigation options (following links, clicking).

Input strings to text input fields

Strings to be entered into text fields can now be made part of the configuration. Previously, such parameters were taken automatically from the configuration's PostData. With some websites, however, the PostData does not contain the submitted input strings; this option ensures that the page displaying the data loads correctly during the mining phase.

Extract data from Popups

This option extracts data by clicking each listing link or button and reading the data from the resulting popup window, or from a view on the same page that gets populated with data. It differs from the ‘Follow this link’ option because the data is loaded on the same page (no page navigation), and from the ‘Click’ option because, after each link is clicked, the data must be extracted from the page before the next link is clicked.
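
The key point is the ordering: click one item, extract, then move on, because each click overwrites what the popup shows. The Python sketch below simulates this with a hypothetical `FakePage` class standing in for a browser page; it illustrates the sequence only and is not WebHarvy code.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class FakePage:
    """Hypothetical page whose popup/detail view is refilled in place on each click."""

    details: Dict[str, str]  # listing id -> text shown in the popup after clicking it
    popup: Optional[str] = None  # what the popup currently displays

    def click(self, listing_id: str) -> None:
        # No navigation happens: the click only replaces the popup's contents.
        self.popup = self.details[listing_id]


def extract_from_popups(page: FakePage, listing_ids: List[str]) -> List[str]:
    rows: List[str] = []
    for listing_id in listing_ids:
        page.click(listing_id)
        # Extract immediately; the next click overwrites this popup's data.
        rows.append(page.popup or "")
    return rows


if __name__ == "__main__":
    page = FakePage({"a": "Alpha: $10", "b": "Beta: $12"})
    print(extract_from_popups(page, ["a", "b"]))
```

Clicking every item first and extracting afterwards would leave only the last item's data (`Beta: $12`) in the popup, which is why extraction has to happen between clicks.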

Option to smoothly scroll the page during mining to load all content (lazy loading)

Smoothly scrolls to the end of the page so that elements which load only when they become visible by scrolling down (for example, lazily loaded images) are loaded during mining.
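
One common way to implement this kind of scrolling is to move down in small steps and stop only once the viewport has reached the bottom and no new content has appeared. The sketch below simulates a lazily loading page with a hypothetical `FakeLazyPage` class (batch sizes, step and viewport height are made-up values); it illustrates the stopping condition and does not describe WebHarvy's internal behavior.

```python
from typing import List


class FakeLazyPage:
    """Hypothetical page that appends a batch of items whenever its bottom becomes visible."""

    def __init__(self, batches: List[int], batch_height: int = 1000) -> None:
        self.pending = list(batches)  # item counts of batches not yet loaded
        self.batch_height = batch_height  # pixels added per loaded batch
        self.items = 0
        self.height = 0
        self._load_next()  # the first batch is present on initial page load

    def _load_next(self) -> None:
        if self.pending:
            self.items += self.pending.pop(0)
            self.height += self.batch_height

    def scroll_to(self, y: int, viewport: int) -> None:
        # Content below the fold loads only once the page bottom is visible.
        if y + viewport >= self.height:
            self._load_next()


def smooth_scroll_to_end(page: FakeLazyPage, viewport: int = 800, step: int = 200) -> int:
    """Scroll down in small steps until the bottom is reached and nothing new loads."""
    y = 0
    while True:
        previous_height = page.height
        page.scroll_to(y, viewport)
        at_bottom = y + viewport >= page.height
        if at_bottom and page.height == previous_height:
            return page.items  # all lazily loaded items are now present
        # Advance one step, but never past the last scrollable position.
        y = min(y + step, max(page.height - viewport, 0))


if __name__ == "__main__":
    page = FakeLazyPage([10, 10, 5])
    print(smooth_scroll_to_end(page))
```

Checking both conditions matters: stopping as soon as the bottom is first reached would miss batches that load in response to that very scroll.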

Select drop-down/list-box options

Select drop-down, list-box and combo-box options during configuration and mining. As with the options above, this allows navigation to result pages when the normal configuration cannot make these selections and load the result page.

Other minor additions include:

  1. Improvements in automatic scraping of multiple product images
  2. Support for loading keyword lists directly from file
  3. Automatic enabling of the ‘Capture Image’ option via the HTML/RegEx method where applicable.
  4. Option to name downloaded image files using the value of a column/cell in the miner data table.
  5. Support for applying ‘Capture More Content’ after selecting ‘Capture HTML’.
  6. Quick access to items under ‘More Options’ in Capture window via toolbar buttons.
  7. Minor bug fixes.
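
Naming image files after a column value (item 4) typically requires replacing characters that are not valid in file names and disambiguating rows that share the same value. The Python sketch below assumes Windows file-name rules and a hypothetical `Product` column; the `_1`, `_2` suffix scheme is an illustrative choice, not necessarily what WebHarvy does.

```python
import re
from typing import Dict, List

# Characters Windows does not allow in file names (assumed target platform).
INVALID_CHARS = re.compile(r'[\\/:*?"<>|]+')


def image_file_names(rows: List[Dict[str, str]], column: str, ext: str = ".jpg") -> List[str]:
    """Build one image file name per row from the value in `column` (hypothetical helper)."""
    seen: Dict[str, int] = {}
    names: List[str] = []
    for row in rows:
        # Sanitize the cell value; fall back to "image" when it is empty.
        base = INVALID_CHARS.sub("_", row.get(column, "").strip()) or "image"
        count = seen.get(base, 0)
        seen[base] = count + 1
        # The first occurrence keeps the plain name; repeats get _1, _2, ...
        names.append(f"{base}{ext}" if count == 0 else f"{base}_{count}{ext}")
    return names


if __name__ == "__main__":
    rows = [{"Product": "Widget A/B"}, {"Product": "Gadget"}, {"Product": "Gadget"}]
    print(image_file_names(rows, "Product"))
```

For the sample rows this yields `Widget A_B.jpg`, `Gadget.jpg` and `Gadget_1.jpg`.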

Please download and try the latest version from https://www.webharvy.com/download.html.

This entry was posted in Release update, Uncategorized, WebHarvy.
