Nutanix incorporates a wide range of storage optimization technologies that work in concert to make efficient use of available capacity for any workload. These technologies are intelligent and adaptive to workload characteristics, eliminating the need for manual configuration and fine-tuning.
Nutanix delivers two types of data deduplication to accelerate application performance and to optimize storage capacity. Performance tier deduplication removes duplicate data in the content cache (SSD and memory) to reduce the footprint of an application’s working set. This enables more working data to be managed in the content cache, thus yielding significant performance improvements. In addition, global, post-process MapReduce deduplication reduces repetitive data in the capacity tier to increase the effective storage capacity of a cluster.
For applications with large common working sets, performance tier deduplication increases effective flash and memory resources by up to 10x, delivering nearly instantaneous application response times. MapReduce deduplication is global and distributed across all nodes in the cluster, minimizing any performance overhead. MapReduce deduplication is particularly useful for virtual desktops with full clones.
When deduplication is enabled, data is fingerprinted on ingest using a SHA-1 hash, with fingerprint information stored in the Nutanix metadata layer. Deduplication operations are software-driven, and leverage the hardware-assist capabilities of the Intel chipset for SHA-1 fingerprint generation for the fastest possible performance.
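The fingerprint-and-lookup flow described above can be sketched as follows. This is an illustrative model only, not Nutanix's implementation: the chunk size, the in-memory dictionary standing in for the metadata layer, and the function names are all assumptions.

```python
import hashlib

CHUNK_SIZE = 16 * 1024  # assumed fixed chunk size, for illustration only


def fingerprint(chunk: bytes) -> str:
    """SHA-1 fingerprint of a data chunk, as described above."""
    return hashlib.sha1(chunk).hexdigest()


def write_with_dedup(data: bytes, store: dict) -> list:
    """Store only unique chunks; return the list of fingerprints
    (a 'recipe') needed to reassemble the data later."""
    recipe = []
    for off in range(0, len(data), CHUNK_SIZE):
        chunk = data[off:off + CHUNK_SIZE]
        fp = fingerprint(chunk)
        if fp not in store:       # first time we see this chunk: store it once
            store[fp] = chunk
        recipe.append(fp)         # duplicates cost only a fingerprint reference
    return recipe


def read_with_dedup(recipe: list, store: dict) -> bytes:
    """Reassemble the original data from its fingerprint recipe."""
    return b"".join(store[fp] for fp in recipe)
```

Writing the same data twice consumes the capacity of one copy plus two recipes, which is the effect that lets a larger working set fit in the content cache.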
Both types of deduplication can be easily configured and managed at vdisk granularity for fine-grained control.
Deduplication works in concert with compression and erasure coding in a Nutanix cluster to make efficient use of available capacity for any workload.
Data in a Nutanix cluster can be compressed to increase the effective storage capacity by up to 4x. Administrators can enable data compression as an inline capability as data is written to the system, or post-process after the data has been written. Post-process compression is executed as a series of MapReduce jobs, and because it is performed after the initial write it eliminates any impact on write path performance.
Compression in a Nutanix hyperconverged system is a distributed process that runs on each node in the cluster to leverage all system compute and memory resources. Unlike traditional storage architectures where compression operations run on just one or two CPUs, Nutanix web-scale architecture scales overall compression power as the cluster grows.
Nutanix uses the efficient Snappy compression algorithm to compress a variety of data types at minimal CPU cost. Unlike legacy storage devices that compress data across entire LUNs or disks, Nutanix compresses data at the sub-block level for increased efficiency and greater simplicity.
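A minimal sketch of sub-block compression follows. Nutanix uses Snappy; zlib is substituted here only because it ships with Python's standard library, and the block size and the keep-only-if-smaller policy are illustrative assumptions, not product behavior.

```python
import zlib

BLOCK = 32 * 1024  # assumed sub-block size, for illustration only


def compress_blocks(data: bytes) -> list:
    """Compress data block by block, storing a block compressed only
    when compression actually shrinks it (incompressible data is
    kept as-is rather than growing)."""
    out = []
    for off in range(0, len(data), BLOCK):
        raw = data[off:off + BLOCK]
        packed = zlib.compress(raw)
        if len(packed) < len(raw):
            out.append((True, packed))   # flag marks a compressed block
        else:
            out.append((False, raw))
    return out


def decompress_blocks(blocks: list) -> bytes:
    """Reassemble the original data from (compressed?, payload) pairs."""
    return b"".join(zlib.decompress(b) if c else b for c, b in blocks)
```

Operating on small independent blocks rather than a whole LUN means each block can be compressed, rewritten, or skipped on its own, which is what makes the post-process MapReduce jobs described above practical to distribute.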
Compression works in concert with deduplication and erasure coding in a Nutanix cluster to make efficient use of available capacity for any workload.
Erasure Coding with Nutanix EC-X
In addition to data replication for redundancy, Nutanix systems use an innovative erasure coding technology, Nutanix EC-X, that delivers resilience along with capacity efficiency. Erasure coding computes parity blocks from a set of data blocks; the parity can then be used to reconstruct data in the event of a failure. EC-X overcomes the capacity cost of data replication without taking away any of the resilience benefits. Nutanix systems switch between data replication for hot data and erasure coding for cold data based on I/O frequency to optimize performance and storage.
Nutanix EC-X uses an improved patent-pending algorithm that offers more flexibility and speed than traditional Reed-Solomon implementations. EC-X imposes low computational overhead on any single node because coding and rebuild operations are distributed across the entire cluster; distributing rebuilds also speeds them up, shortening the vulnerability window after a failure. EC-X also maintains data locality, which is crucial to delivering high performance in Nutanix systems. Unlike deduplication and compression, which vary in efficacy depending on workload, erasure coding offers deterministic capacity savings regardless of workload characteristics.
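To show the recovery principle behind erasure coding, here is a minimal single-parity (XOR) sketch over a strip of four data blocks. This is not EC-X, whose algorithm is proprietary and tolerates configurable failure counts; a single XOR parity block is simply the smallest example of trading replication for parity.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


def encode(data_blocks):
    """Return the strip: the data blocks plus one parity block.

    Four 1 MB blocks protected by 1 MB of parity cost 1.25x raw
    capacity, versus 2x for a second full replica."""
    return data_blocks + [xor_blocks(data_blocks)]


def recover(strip, lost_index):
    """Rebuild any single lost block by XOR-ing the survivors."""
    survivors = [b for i, b in enumerate(strip) if i != lost_index]
    return xor_blocks(survivors)
```

Because each surviving block can live on a different node, every node contributes one read to a rebuild, which is the distributed-rebuild property the paragraph above describes.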