

PDSI Events @ UCSC

Petascale Data Storage Institute @ UC Santa Cruz

Data deduplication is an essential and critical component of backup systems. It is essential because it reduces storage space requirements, and critical because the performance of the entire backup operation depends on its throughput. Traditional backup workloads consist of large data streams with high locality, and existing deduplication techniques depend on that locality to provide reasonable throughput.

We have developed Extreme Binning, a scalable deduplication technique for non-traditional backup workloads that are made up of individual files with no locality among consecutive files in a given window of time. Due to lack of locality, existing techniques perform poorly on these workloads. Extreme Binning exploits file similarity, and makes only one disk access for chunk lookup per file, which gives reasonable throughput. Multi-node backup systems built with Extreme Binning scale gracefully with the amount of input data; more backup nodes can be added to boost throughput. Each file is allocated using a stateless routing algorithm to only one node, allowing for maximum parallelization, and each backup node is autonomous with no dependency across nodes, making data management tasks robust with low overhead.
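The following is a minimal sketch of the ideas described above, not the actual Extreme Binning implementation. It assumes fixed-size chunking, SHA-1 chunk IDs, and the smallest chunk ID as each file's "representative" for similarity detection; none of these details are stated in the text above. The names `chunk_ids`, `representative_id`, `route`, and `BackupNode` are hypothetical. A dictionary stands in for the on-disk bins, and a counter records how many bin loads (the one simulated disk access per file) occur.

```python
import hashlib

CHUNK_SIZE = 4096  # assumption: fixed-size chunks keep the sketch short


def chunk_ids(data):
    """Return the SHA-1 hex digest of each fixed-size chunk of data."""
    chunks = [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]
    if not chunks:
        chunks = [b""]  # treat an empty file as a single empty chunk
    return [hashlib.sha1(c).hexdigest() for c in chunks]


def representative_id(ids):
    """Assumption: similar files tend to share their smallest chunk ID."""
    return min(ids)


def route(rep_id, num_nodes):
    """Stateless routing: the target node depends only on the file's own ID."""
    return int(rep_id, 16) % num_nodes


class BackupNode:
    """An autonomous node; it never consults any other node."""

    def __init__(self):
        self.bins = {}       # stand-in for on-disk bins: rep ID -> chunk IDs
        self.bin_loads = 0   # simulated disk accesses for chunk lookup

    def backup(self, data):
        """Deduplicate one file against a single bin; return new chunk count."""
        ids = chunk_ids(data)
        rep = representative_id(ids)
        if rep in self.bins:
            self.bin_loads += 1  # the only lookup access made for this file
        known = self.bins.setdefault(rep, set())
        new = [c for c in dict.fromkeys(ids) if c not in known]
        known.update(new)
        return len(new)


def backup_file(nodes, data):
    """Send a file to exactly one node, chosen without any shared state."""
    rep = representative_id(chunk_ids(data))
    node = nodes[route(rep, len(nodes))]
    return node.backup(data)
```

Backing up the same file twice through `backup_file` sends it to the same node both times, and the second call stores no new chunks. Because each file is compared against one bin only, some duplicate chunks shared by dissimilar files may go undetected in this sketch; that is the price of avoiding a full chunk index lookup per chunk.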

The Storage Systems Research Center (SSRC) is part of the Petascale Data Storage Institute, a Department of Energy-funded institute exploring techniques to make high-performance storage faster and more usable.

PDSI Research at UC Santa Cruz

PDSI topics the SSRC is investigating include security for petascale storage, new approaches to distributed metadata, the use of storage class memories in high-performance systems, and archival storage.

PDSI Organization

The Petascale Data Storage Institute is led by Carnegie Mellon University. The PDSI institutions are:

  • Carnegie Mellon University
  • Lawrence Berkeley National Laboratory and the National Energy Research Scientific Computing Center
  • Los Alamos National Laboratory
  • Oak Ridge National Laboratory
  • Pacific Northwest National Laboratory
  • Sandia National Laboratories
  • University of California, Santa Cruz
  • University of Michigan
Further information about the overall PDSI organization is available at

Last modified 4 Feb 2008