
Data De-dupe now available for your SAN - May 11, 2008

In the past few years, data reduction technologies such as compression and, more recently, data de-duplication have become quite popular, especially for backup and archiving. Can this trend continue into primary storage?

Backup, where there is a great deal of redundant data, has seen mass adoption of data reduction technologies. In just a few short years, data de-duplication has gone from an obscure term to a well-known one in the data center. Its ability to eliminate redundant segments of data has greatly benefited backup storage and some types of archive storage. For backup data, assuming a weekly full backup, a storage efficiency ratio of 20X is not uncommon.
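To see where a figure like 20X can come from, here is a minimal back-of-the-envelope sketch. It assumes a deliberately simple model that is not taken from any product: every retained full backup is the same size, and a fixed fraction of the data (the hypothetical change_rate parameter) is new each week. Compression and incremental backups are ignored.

```python
def dedupe_ratio(full_backups: int, change_rate: float) -> float:
    """Estimate the de-duplication ratio for a set of retained full backups.

    Assumed model (illustrative only): each full backup has the same size,
    and change_rate is the fraction of that size that is new data compared
    with the previous full backup.
    """
    if full_backups < 1:
        raise ValueError("full_backups must be at least 1")
    if not 0.0 <= change_rate <= 1.0:
        raise ValueError("change_rate must be between 0 and 1")

    # Logical data: what the backup application believes it has written,
    # measured in units of one full backup.
    logical = float(full_backups)

    # Physical data: the first full is stored once, and each later full
    # adds only its changed fraction.
    physical = 1.0 + (full_backups - 1) * change_rate

    return logical / physical


if __name__ == "__main__":
    # Hypothetical inputs: one year of weekly fulls, 3% weekly change.
    print(f"{dedupe_ratio(52, 0.03):.1f}X")
```

With those made-up inputs the result is a little over 20X. The same function shows how quickly the ratio falls when fewer copies are retained or when more of the data changes between copies.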

Primary storage is different
Unfortunately, moving de-duplication into primary storage isn’t as simple as shifting its location. The following requirements of primary storage need to be considered when planning de-duplication:

1. Primary storage is performance sensitive. Primary storage is active, and a de-duplication implementation that slows the production environment will not be acceptable. Either the de-duplication technology must be fast and efficient enough to have no noticeable effect, or it has to run out of band on files that are not immediately active.

The ideal is a near-production data set that is de-duplicated as a background process, removing the possibility of any performance impact. It would also make sense for the technology to de-duplicate and compress at different levels of efficiency, because the greater the data reduction level, the greater the chance of a performance impact when the data is read back in. While it would be great to have an inline system fast enough to reduce the data set without affecting performance, that technology does not exist today.
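One way to picture the out-of-band approach is a background job that only considers files that have been idle for a while. The sketch below is an assumption about how such a selection step might look, not a description of any vendor's implementation. The function name inactive_files and the min_idle_days parameter are hypothetical, and "idle" is approximated by the file's last modification time.

```python
import os
import time
from typing import Iterator, Optional


def inactive_files(root: str, min_idle_days: float,
                   now: Optional[float] = None) -> Iterator[str]:
    """Yield paths under root that have not been modified for min_idle_days.

    Idleness is judged by modification time only (an assumption of this
    sketch); a real system might also track reads.
    """
    now = time.time() if now is None else now
    cutoff = now - min_idle_days * 86400  # 86400 seconds per day

    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mtime = os.stat(path).st_mtime
            except OSError:
                # The file vanished or is unreadable; skip it.
                continue
            if mtime <= cutoff:
                yield path


if __name__ == "__main__":
    # Example: list files under the current directory idle for 30+ days.
    for candidate in inactive_files(".", 30):
        print(candidate)
```

Only the files this step returns would be handed to the reduction process, so active data is never touched while it is in use.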

2. Primary storage is unique. The other challenge to reducing data on primary storage is that the data is generally unique. This is very different from backup data. In a backup, especially when a full backup is taken every day or week, there is a high level of data redundancy. While production data may have some commonality (for example, “extra” copies of the same database), for the most part it is not nearly as redundant as backup data or even archive data.

As disk-based archiving and disk backups become more common, they are actually causing even less redundant data to be kept on primary storage. In the past there was value in keeping a couple of extra copies of a database or set of files on primary storage “just in case.” Now those copies can be very easily sent to disk archives or disk backup devices. (This is a good thing!)

Note: The storage efficiencies of 20X or more that users have come to expect from backup should not be expected on primary storage. A more realistic goal might be 3X to at most 5X.
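How redundant a given data set really is can be measured rather than guessed. The following is a minimal sketch, assuming fixed-size blocks and SHA-256 fingerprints; both choices, and the name estimate_dedupe_ratio, are illustrative and not drawn from any product. It counts only exact duplicate blocks and ignores compression, so it reflects just one part of what a data reduction product might achieve.

```python
import hashlib
from typing import Iterable


def estimate_dedupe_ratio(blobs: Iterable[bytes], chunk_size: int = 4096) -> float:
    """Return total bytes divided by bytes in unique fixed-size chunks."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be positive")

    seen = set()       # fingerprints of chunks already counted
    total_bytes = 0    # bytes before de-duplication
    unique_bytes = 0   # bytes that would actually need to be stored

    for blob in blobs:
        for offset in range(0, len(blob), chunk_size):
            chunk = blob[offset:offset + chunk_size]
            total_bytes += len(chunk)
            fingerprint = hashlib.sha256(chunk).digest()
            if fingerprint not in seen:
                seen.add(fingerprint)
                unique_bytes += len(chunk)

    # An empty data set has nothing to reduce.
    return total_bytes / unique_bytes if unique_bytes else 1.0


if __name__ == "__main__":
    # Two hypothetical "files": the second repeats half of the first.
    files = [b"A" * 8192, b"A" * 4096 + b"B" * 4096]
    print(estimate_dedupe_ratio(files))
```

Run against a sample of production data, a number close to 1 would confirm the point above: mostly unique data leaves little for de-duplication alone to remove.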

3. Primary storage is compressed. In addition to being unique, much primary storage data is already stored in a compressed format. Images, media files, and industry-specific data sets such as SEG-Y fall into this category. Even the data files from the latest versions of popular office productivity applications are pre-compressed. These pre-compressed files often represent the largest data set in the enterprise and the one with the fastest data growth.

To deal with this uniqueness and the pre-compressed nature of production data, a successful primary data storage reducer will have to “dig a little deeper.” While inline data reduction has the clear advantage in the backup and archive categories, production storage is an area where out-of-band management of the process might be more valuable.

Without the pressure to reduce data at line speed, time can be taken to examine a complex compound document and look for similarities both within a file and across the millions of files in the storage environment. This behind-the-scenes treatment of data also allows time to be invested in understanding how specific formats -- .jpg, for example -- are stored; how that data becomes embedded into another document (for instance, a PowerPoint presentation); and how both the original data and its embedded occurrences might best be optimized for data reduction.
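The idea of looking inside compound documents can be illustrated with a small sketch. It assumes the documents are ZIP-based containers, as the Office Open XML formats (.docx, .pptx) are, and fingerprints each embedded member after decompression so that the same image is recognized wherever it appears. The function name find_shared_objects is hypothetical, and this is only one possible approach, not a description of how any product works.

```python
import hashlib
import zipfile
from collections import defaultdict
from typing import IO, Dict, List, Mapping, Tuple, Union


def find_shared_objects(
    containers: Mapping[str, Union[str, IO[bytes]]]
) -> Dict[str, List[Tuple[str, str]]]:
    """Map content fingerprints to every (container, member) holding them.

    containers maps a label to a path or binary file object for a ZIP-based
    document. Only fingerprints seen more than once are returned.
    """
    seen: Dict[str, List[Tuple[str, str]]] = defaultdict(list)

    for label, source in containers.items():
        with zipfile.ZipFile(source) as zf:
            for info in zf.infolist():
                # Skip directory entries and empty members.
                if info.is_dir() or info.file_size == 0:
                    continue
                # zf.read() returns the decompressed bytes, so identical
                # objects match even if they were compressed differently.
                digest = hashlib.sha256(zf.read(info)).hexdigest()
                seen[digest].append((label, info.filename))

    return {d: locs for d, locs in seen.items() if len(locs) > 1}
```

Called with, say, a presentation and a report that embed the same picture, the result lists both locations under a single fingerprint, which is the kind of redundancy a block-level comparison of the compressed files would probably miss.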

4. Primary storage is getting cheaper. The final challenge to data de-duplication on primary storage is the continual erosion of disk drive prices. The very condition that essentially killed hierarchical storage management (HSM) and later information lifecycle management (ILM) may also work against the implementation of data reduction on primary storage. With 1 TB SATA drives becoming available from the top-tier storage manufacturers, it may be deemed easier to simply buy larger-capacity shelves of storage.

For more information please call (407) 265-6293 or visit us at: http://www.sencilo.com/storage-data-deduplication.php

About Us

Sencilo Solutions is a Florida-based integrator specializing in storage and security solutions. Sencilo delivers a comprehensive portfolio of best-of-breed hardware and software from multiple manufacturers including VMware, EMC, Juniper Networks, Hitachi, Symantec, Barracuda Networks, and HP. Its technical expertise is known throughout the storage and security industry. Clients include leading corporations, major financial institutions, top universities, and government facilities, as well as small to medium-sized businesses. Sencilo's professional services include consulting, integration, project management, installation, maintenance and knowledge transfer.

Sencilo has offices throughout Florida including: Jacksonville, Daytona Beach, Miami, Tampa, St. Petersburg, Orlando, Hialeah, St. Augustine, Gainesville, Ocala, Palm Coast, Clearwater, Kissimmee, Lakeland, Maitland and Cape Canaveral



