Friday, November 8, 2013

What is De-duplication

What is deduplication:
Data deduplication is a specialized form of compression in which redundant data is eliminated, typically to improve storage utilization. In the deduplication process, duplicate data is removed so that only one copy of the data is stored. An index of all the data is still retained, so every deduplicated instance can be restored should it ever be required. Because only the unique data is stored, deduplication reduces the required storage capacity.
Practical Scenario:
For example, a typical email system might contain 100 instances of the same 1 MB file attachment. If the email platform is backed up or archived without deduplication, all 100 instances are saved, requiring 100 MB of storage space. With data deduplication, only one instance of the attachment is actually stored; each subsequent instance is just a reference back to the one saved copy. In this example, a 100 MB storage demand is reduced to about 1 MB, plus a small amount of index metadata. Different applications have different levels of data redundancy. Backup applications generally benefit the most from deduplication, because repeated full backups of an existing file system contain mostly unchanged data.
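The email scenario above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular backup product works: the `DedupStore` class, its method names, and the mailbox paths are all hypothetical, and it deduplicates whole files by their SHA-256 digest.

```python
import hashlib


class DedupStore:
    """Toy whole-file deduplicating store (hypothetical, for illustration only)."""

    def __init__(self):
        self.blobs = {}  # SHA-256 digest -> the single stored copy of the data
        self.index = {}  # logical name -> digest, kept for every instance

    def put(self, name, data):
        digest = hashlib.sha256(data).hexdigest()
        # Store the bytes only the first time this content is seen.
        self.blobs.setdefault(digest, data)
        # Always record the reference so the instance can be restored later.
        self.index[name] = digest

    def get(self, name):
        # Follow the reference back to the one saved copy.
        return self.blobs[self.index[name]]

    def stored_bytes(self):
        return sum(len(blob) for blob in self.blobs.values())


MB = 1024 * 1024
attachment = b"x" * MB  # stand-in for one 1 MB attachment

store = DedupStore()
for i in range(100):
    store.put(f"mailbox{i}/report.pdf", attachment)

logical_bytes = 100 * len(attachment)
print(logical_bytes // MB, "MB logical")       # 100 MB logical
print(store.stored_bytes() // MB, "MB stored")  # 1 MB stored
```

All 100 names remain in the index, so any mailbox can still retrieve its attachment with `store.get(...)` even though the bytes exist only once.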
Pros:
- Deduplication can improve data protection, speed up service, and reduce costs. With less data to write, copy, and replicate, backups and replication can finish sooner, more restore points fit on the same disk, and overall data protection costs go down.
- Deduplication can reduce the amount of disk needed for backup by 90 percent or more, depending on how redundant the data is.
- Deduplication is a very valuable tool with virtual servers: it can deduplicate the VMDK files needed to deploy virtual environments, as well as snapshot files such as VMSN and VMSD in VMware.
- By reducing the amount of data stored, deduplication can also provide significant savings in energy, space, cooling, and cost.
Cons:
- Deduplication ultimately reduces redundancy. If this is not expected and planned for, it can undermine the underlying reliability of the system, because losing or corrupting the single stored copy affects every instance that references it.
- Interaction with compression and encryption. Although deduplication is a form of compression, it works in tension with traditional compression. Deduplication achieves better efficiency with smaller data chunks, whereas compression achieves better efficiency with larger chunks. The goal of encryption is to eliminate any discernible patterns in the data, so data that is encrypted before it reaches the deduplication system will typically gain nothing (close to 0%) from deduplication, even though the underlying data may be redundant.
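The encryption point can be shown with a small experiment. The sketch below assumes each copy is encrypted with a fresh random nonce, as real ciphers normally are. The `toy_encrypt` function is a made-up stand-in (an XOR with a SHA-256 counter keystream) used only to keep the example free of external libraries. It is not secure and not any real cipher's API.

```python
import hashlib
import os


def toy_encrypt(key, nonce, data):
    """XOR data with a SHA-256 counter keystream. NOT secure; illustration only."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        block = hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        stream += block
        counter += 1
    # zip stops at len(data), so any extra keystream bytes are ignored.
    return bytes(a ^ b for a, b in zip(data, stream))


def unique_blobs(blobs):
    """Count how many distinct blobs a hash-based deduplicator would store."""
    return len({hashlib.sha256(b).digest() for b in blobs})


key = os.urandom(32)
plain = [b"same attachment " * 4096] * 10  # ten identical 64 KiB copies
encrypted = [toy_encrypt(key, os.urandom(16), p) for p in plain]

print(unique_blobs(plain))      # 1: the plaintext copies deduplicate to one
print(unique_blobs(encrypted))  # 10: each ciphertext looks unique
```

The ten plaintext copies collapse to a single stored blob, while the ten ciphertexts share no detectable duplicates. This is why deduplication is usually applied before encryption rather than after it.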
