
The Benefits of Using Source Data Over a Traditional Scrape

Posted by Vanessa Vary on Mar 21, 2016 4:31:03 PM

Here are some key benefits of using source data and associated plugins over a traditional scraping tool. We hope this helps explain the difference and enables you to improve your data quality.

1. Better control over which data is exported

During development, scrapes (or spiders) are engineered to match what is currently live on the site. Although they are written to be generic enough to retrieve as much data as possible, what they retrieve still depends on the site structure at setup time. Plugins for platforms such as Magento or Demandware, by contrast, give merchants total control over the data that is exported, so they can be sure the entire inventory is exported regardless of any navigation changes on the site.

2. Develop the export once

When a spider is developed, engineers have to specify selectors to pick up data. Whether they are XPath or CSS selectors, each one is dependent on the current site setup. If the website interface is altered, whether through a change in the CSS stylesheet or in the template structure, engineers have to update the spider to reflect these changes. This impacts feed generation, as time has to be allocated to replicate the change. An associated plugin sources the information directly from the site's internal database, so changes made to the UI have no impact, and the feed export is consistently generated with the most accurate data possible.
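To make the selector problem concrete, here is a minimal sketch using only the Python standard library. The page markup, the XPath expression, and the function names scrape_price and export_price are all hypothetical, invented for illustration; they do not describe any real spider or plugin. The sketch only shows why a selector written against one template stops matching after a redesign, while a value read from a database row is unaffected.

```python
import xml.etree.ElementTree as ET

# Hypothetical product page markup before and after a template redesign.
OLD_TEMPLATE = """
<html><body>
  <div class="product">
    <span class="price">19.99</span>
  </div>
</body></html>
"""

NEW_TEMPLATE = """
<html><body>
  <section class="product-card">
    <p class="amount">19.99</p>
  </section>
</body></html>
"""

# Hypothetical selector, written against the old template only.
PRICE_XPATH = ".//div[@class='product']/span[@class='price']"


def scrape_price(page_source):
    """Return the price text matched by PRICE_XPATH, or None if nothing matches."""
    root = ET.fromstring(page_source)
    node = root.find(PRICE_XPATH)
    return node.text if node is not None else None


def export_price(product_row):
    """Stand-in for a plugin export: read the price from a database row (a dict here)."""
    return product_row["price"]


if __name__ == "__main__":
    print(scrape_price(OLD_TEMPLATE))   # selector matches the old markup
    print(scrape_price(NEW_TEMPLATE))   # selector no longer matches anything
    print(export_price({"sku": "A1", "price": "19.99"}))  # unaffected by markup
```

The price itself never changed between the two templates; only the markup around it did. That is the maintenance cost the spider carries and the plugin export avoids.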

3. Full data exported almost instantly

As spiders have to navigate to each product page, the total running time can be quite long. We try to retrieve a site's information within a 24-hour window so that prices are as accurate as possible and stock information matches what a user would see on the site. However, crawling a site depends on multiple factors, such as web server response time, page load time, and page count. In other words, the busier a web server is and the higher the page count, the longer it takes for the data to refresh in the system. While we try to retrieve 100% of the product information available on merchants' sites, discrepancies may appear, as pages can fail to load or fail to respond within the timeout thresholds. Plugins, such as those for Magento or Demandware, export straight from the site database, which removes these limitations and leads to a quicker turnaround for refreshing the data in product feeds.
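As a rough illustration of how page count and response time drive crawl duration, the sketch below estimates total crawl time with simple arithmetic. The function name estimate_crawl_hours, the idea of a fixed number of parallel workers, and all of the figures used are assumptions made for the example, not measurements from any real site or a description of how any particular spider is scheduled.

```python
def estimate_crawl_hours(page_count, avg_response_seconds, workers=1):
    """Estimate crawl duration in hours.

    Assumes (hypothetically) that every page takes the same average time
    to fetch and that the work is split evenly across parallel workers.
    """
    if workers < 1:
        raise ValueError("workers must be at least 1")
    total_seconds = page_count * avg_response_seconds / workers
    return total_seconds / 3600


def fits_in_window(page_count, avg_response_seconds, workers=1, window_hours=24):
    """Return True if the estimated crawl finishes within the refresh window."""
    return estimate_crawl_hours(page_count, avg_response_seconds, workers) <= window_hours


if __name__ == "__main__":
    # Hypothetical catalogue of 200,000 product pages crawled by 4 workers.
    for response_time in (1.5, 2.0):
        hours = estimate_crawl_hours(200_000, response_time, workers=4)
        ok = fits_in_window(200_000, response_time, workers=4)
        print(f"{response_time}s per page: {hours:.1f} hours, within 24h: {ok}")
```

With these made-up numbers, a slowdown from 1.5 to 2.0 seconds per page is enough to push the crawl past the 24-hour window, which is how a busy web server translates into stale prices and stock levels in the feed.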

If you want to find out more about how we can help you better manage your data, don't hesitate to contact us. We would be more than happy to help.

Topics: ecommerce platform, scraping, scrape, benefits, plugin, data management, ecommerce