Archival Storage
Description
We have several active projects in archival storage, all of which contribute to building more
efficient, reliable, and secure long-term storage systems.
- Deep Store: building more efficient archival storage using deduplication to take
advantage of intra-file and inter-file redundancy.
- POTSHARDS: long-term secure storage, which allows the secure preservation of data for decades without relying upon traditional encryption to prevent information leakage.
- Pergamum: long-term evolvable storage built from intelligent network-attached bricks
with both disk and NVRAM such as flash.
Digital reference data is produced at ever higher rates, increasing storage requirements, while users demand ever lower access times. On-line deep storage, with sub-second latency, is far faster than robot-loaded near-line media, which can take minutes to access. Disk-based deep storage is becoming practical because magnetic disks are rapidly becoming as inexpensive as magnetic tape and optical storage, the traditional media for backup and archiving. The Deep Store architecture uses inter-file (differential) and intra-file (sliding dictionary) data compression to increase storage density, and adds distribution and redundancy to improve request bandwidth and robustness. As a result, the expected media costs will be much lower than those of traditional backup and archival storage.
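The difference between intra-file and inter-file compression can be illustrated with a minimal sketch. It assumes Python's standard `zlib` module as a stand-in for the sliding-dictionary compressor and uses a preset dictionary (`zdict`) to approximate differential compression against a similar, already-stored file. The function names are hypothetical, and this is not the Deep Store implementation.

```python
import hashlib
import zlib


def compress_intra(data: bytes) -> bytes:
    """Intra-file: the sliding dictionary only ever sees this one file."""
    return zlib.compress(data, 9)


def compress_inter(data: bytes, reference: bytes) -> bytes:
    """Inter-file: prime the dictionary with a similar, already-stored file."""
    comp = zlib.compressobj(level=9, zdict=reference)
    return comp.compress(data) + comp.flush()


def decompress_inter(blob: bytes, reference: bytes) -> bytes:
    """Restoring the file requires the same reference file as the dictionary."""
    decomp = zlib.decompressobj(zdict=reference)
    return decomp.decompress(blob) + decomp.flush()


if __name__ == "__main__":
    # Assumed test data: a 2 KB file with little internal redundancy,
    # and a second version of it with a small edit in the middle.
    base = b"".join(hashlib.sha256(str(i).encode()).digest() for i in range(64))
    new = base[:1000] + b"EDIT" + base[1000:]

    alone = compress_intra(new)
    delta = compress_inter(new, base)
    print(len(new), len(alone), len(delta))
    assert decompress_inter(delta, base) == new
```

In this sketch the second version compresses poorly on its own but shrinks substantially when encoded against the first, which is the effect inter-file compression relies on. Note that DEFLATE's window is limited to 32 KB, so a real system would need a different differential encoder for large files.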
The goal of the POTSHARDS project is to securely preserve data by breaking it into pieces (shards) and spreading them across multiple archives, so that no individual archive can reconstruct the data or even know which shards it must steal from other archives to rebuild the data. However, a user who gathers all of the shards must be able to reconstruct the original data with no additional information (including encryption keys). We accomplish this using multiple levels of secret splitting and approximate pointers, which limit the space that must be searched for related shards while requiring an attacker to obtain exponential numbers of shards that may not be identified in advance. Because it relies on secret splitting, this approach has information-theoretic security, unlike encryption, which might be broken by advances in algorithms or computer hardware. We believe that this approach will become common as the need to securely store data for decades becomes more pressing.
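A single level of secret splitting can be sketched as follows. This is a minimal illustration assuming simple XOR-based splitting, in which all n shares are required for reconstruction. The function names are hypothetical, and the sketch omits the multiple levels, approximate pointers, and redundancy that POTSHARDS itself uses.

```python
import secrets
from functools import reduce
from typing import List


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def split_secret(data: bytes, n: int) -> List[bytes]:
    """Split data into n shares; any n-1 of them are uniformly random noise."""
    if n < 2:
        raise ValueError("need at least two shares")
    # n-1 shares are pure randomness; no key is kept anywhere.
    shares = [secrets.token_bytes(len(data)) for _ in range(n - 1)]
    # The last share is the data XORed with all of the random shares.
    shares.append(reduce(xor_bytes, shares, data))
    return shares


def combine_shares(shares: List[bytes]) -> bytes:
    """XOR all shares together to recover the original data."""
    return reduce(xor_bytes, shares)


if __name__ == "__main__":
    record = b"archival record"
    shards = split_secret(record, 3)   # e.g. one shard per archive
    assert combine_shares(shards) == record
```

Because each share on its own is indistinguishable from random bytes, an archive holding one share learns nothing about the data, and no advance in algorithms or hardware changes that.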
Pergamum was created to explore evolvable archival storage. The project's goal is to develop a long-term system that controls the major contributors to storage cost: static, operational, and management costs. Pergamum consists of a fully distributed network of intelligent storage devices. Each node, called a tome, consists of a SATA hard drive, a low-power processor, NVRAM, and a standardized network interface. Reliability is provided through two levels of redundancy encoding: intra-tome redundancy handles latent sector errors, and inter-tome redundancy handles lost devices. By keeping most of the devices spun down and by using commodity hardware, Pergamum provides cost efficiency on par with tape-based systems while providing superior random-access performance. Further cost savings are realized through hierarchical consistency checking, staged rebuilds, and NVRAM-based metadata stores; reducing disk spin-ups results in dramatic energy savings.
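The idea behind inter-tome redundancy can be sketched with the simplest possible code. The text above does not say which redundancy encoding is used, so this example assumes single XOR parity as a stand-in, with hypothetical function names. The same routine could be applied within a tome, across blocks on one disk, to recover from a latent sector error.

```python
from functools import reduce
from typing import List


def xor_parity(blocks: List[bytes]) -> bytes:
    """XOR equal-length blocks together, one byte column at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(survivors: List[bytes], parity: bytes) -> bytes:
    """Recover the single missing block from the survivors and the parity."""
    return xor_parity(survivors + [parity])


if __name__ == "__main__":
    # Assumed layout: one equal-length data block per tome, parity on a fourth.
    tomes = [b"tome-0 data.", b"tome-1 data.", b"tome-2 data."]
    parity = xor_parity(tomes)

    # Tome 1 is lost; rebuild its block from the other tomes plus parity.
    rebuilt = rebuild_missing([tomes[0], tomes[2]], parity)
    assert rebuilt == tomes[1]
```

Single parity tolerates only one lost block per group; tolerating more failures would require a stronger erasure code.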
Status
We are currently developing a scalable system architecture that addresses new problems: searching for similar files in a very large corpus to improve compression, maximizing storage throughput, distributing a large system for throughput and reliability, and managing file similarity data for billions of files. Because archival (or reference) data is immutable, content-based addressing can be used to identify and locate entire files or portions of files. Our work currently focuses on organizing similar files containing arbitrary data using data fingerprinting and summarization. We are characterizing reference data and determining the suitability of Deep Store for various problem domains, such as scientific computing, simulations, and enterprise and organizational computing. We have experimented with chunk-based storage (variable-sized blocks) and delta-encoded storage to evaluate the relative merits of each technique for storage efficiency, performance, and workload applicability.
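Chunk-based storage with content-based addressing can be sketched as follows. This is a minimal illustration, not the Deep Store implementation: it assumes a Gear-style rolling hash to pick variable-sized chunk boundaries and SHA-256 digests as content addresses. The `chunk` function, the `ChunkStore` class, and all parameter values are hypothetical.

```python
import hashlib
from typing import Dict, List

# Assumed per-byte table of pseudo-random 32-bit values for the rolling hash.
GEAR = [int.from_bytes(hashlib.sha256(bytes([i])).digest()[:4], "big")
        for i in range(256)]


def chunk(data: bytes, mask: int = 0x3F,
          min_size: int = 16, max_size: int = 256) -> List[bytes]:
    """Cut data into variable-sized chunks at content-defined boundaries."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        size = i - start + 1
        # Cut where the low hash bits are zero, within the size limits.
        if (size >= min_size and (h & mask) == 0) or size >= max_size:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks


class ChunkStore:
    """Stores each distinct chunk once, keyed by the hash of its content."""

    def __init__(self) -> None:
        self.blobs: Dict[str, bytes] = {}

    def put(self, data: bytes) -> List[str]:
        """Store a file and return its recipe: the ordered chunk addresses."""
        recipe = []
        for piece in chunk(data):
            key = hashlib.sha256(piece).hexdigest()
            self.blobs.setdefault(key, piece)  # duplicates are stored once
            recipe.append(key)
        return recipe

    def get(self, recipe: List[str]) -> bytes:
        """Rebuild a file by concatenating its chunks in recipe order."""
        return b"".join(self.blobs[key] for key in recipe)


if __name__ == "__main__":
    data = b"".join(hashlib.sha256(bytes([i])).digest() for i in range(40))
    store = ChunkStore()
    recipe = store.put(data)
    assert store.get(recipe) == data
```

Because boundaries depend on the content rather than on fixed offsets, a file that shares long runs with a stored file would typically reuse many of its chunks even after insertions or deletions. Immutability is what makes the content hash safe to use as an address.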
We have also implemented a prototype POTSHARDS system, and have tested its performance on both local clusters and the PlanetLab wide-area testbed. We have demonstrated the ability to reconstruct data from just the shards stored in the system; while this can be done relatively quickly if all of the shards are present, it is impossible to do using just the shards from a single archive. We are currently exploring different redundancy techniques and approaches that will reduce the storage overhead while maintaining a high level of security and resistance to attack.
Publications
-
Mark W. Storer,
Kevin Greenan,
Darrell D. E. Long,
Ethan L. Miller,
Secure Data Deduplication,
Proceedings of the 4th International Workshop on Storage Security and Survivability (StorageSS 2008), held in conjunction with the 15th ACM Conference on Computer and Communications Security (CCS 2008),
October 2008.
-
Kevin Greenan,
Ethan L. Miller,
Thomas Schwarz,
Optimizing Galois Field Arithmetic for Diverse Processor Architectures,
Proceedings of the 16th Annual IEEE International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS 2008),
September 2008.
-
Casey Marshall,
Efficient and safe data backup with Arrow,
Technical Report UCSC-SSRC-08-02,
June 2008.
Master's project report.
-
Mark W. Storer,
Kevin Greenan,
Ethan L. Miller,
Kaladhar Voruganti,
Pergamum: Energy-efficient Archival Storage with Disk Instead of Tape,
;login: — The USENIX Magazine 33(3),
June 2008.
-
Mark W. Storer,
Kevin Greenan,
Ethan L. Miller,
Kaladhar Voruganti,
Pergamum: Replacing Tape with Energy Efficient, Reliable, Disk-Based Archival Storage,
Proceedings of the 6th USENIX Conference on File and Storage Technologies (FAST '08),
February 2008, pages 1-16.
-
Kevin Greenan,
Ethan L. Miller,
Thomas Schwarz,
Darrell D. E. Long,
Disaster Recovery Codes: Increasing Reliability with Large-Stripe Error Correction Codes,
Proceedings of the 3rd International Workshop on Storage Security and Survivability (StorageSS 2007), held in conjunction with the 14th ACM Conference on Computer and Communications Security (CCS 2007),
October 2007.
-
Kevin Greenan,
Ethan L. Miller,
Thomas Schwarz,
Analysis and Construction of Galois Fields for Efficient Storage Reliability,
Technical Report UCSC-SSRC-07-09,
August 2007.
Revised version published in MASCOTS 2008.
-
Deepavali Bhagwat,
Kave Eshghi,
Pankaj Mehra,
Content-based Document Routing and Index Partitioning for Scalable Similarity-based Searches in a Large Corpus,
Proceedings of the 13th ACM SIGKDD international conference on Knowledge Discovery and Data Mining (KDD '07),
August 2007, pages 105-112.
-
Mark W. Storer,
Kevin Greenan,
Ethan L. Miller,
Kaladhar Voruganti,
POTSHARDS: Secure Long-Term Storage Without Encryption,
Proceedings of the 2007 USENIX Technical Conference,
June 2007.
-
Jehan-François Pâris,
Thomas Schwarz,
Darrell D. E. Long,
Self-Adaptive Two-Dimensional RAID Arrays,
Proceedings of the International Performance Conference on Computers and Communication (IPCCC '07),
April 2007.
-
Mark W. Storer,
Kevin Greenan,
Ethan L. Miller,
Long-Term Threats to Secure Archives,
Proceedings of the 2nd ACM Workshop on Storage Security and Survivability (StorageSS 2006),
October 2006.
-
Mark W. Storer,
Kevin Greenan,
Ethan L. Miller,
Kaladhar Voruganti,
POTSHARDS: Secure Long-Term Archival Storage Without Encryption,
Technical Report UCSC-SSRC-06-03, Storage Systems Research Center, University of California, Santa Cruz,
September 2006.
Later version published in USENIX 2007.
-
Deepavali Bhagwat,
Kristal Pollack,
Darrell D. E. Long,
Thomas Schwarz,
Ethan L. Miller,
Jehan-François Pâris,
Providing High Reliability in a Minimum Redundancy Archival Storage System,
Proceedings of the 14th IEEE International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS '06),
September 2006, pages 413-421.
-
Thomas Schwarz,
Ethan L. Miller,
Store, forget, and check: Using algebraic signatures to check remotely administered storage,
Proceedings of the IEEE Int'l Conference on Distributed Computing Systems (ICDCS '06),
July 2006.
-
Mark W. Storer,
Kevin Greenan,
Ethan L. Miller,
Carlos Maltzahn,
POTSHARDS: Storing Data for the Long-Term Without Encryption,
Proceedings of the 3rd International IEEE Security in Storage Workshop,
December 2005.
-
Lawrence You,
Kristal Pollack,
Darrell D. E. Long,
Deep Store: An Archival Storage System Architecture,
Proceedings of the 21st International Conference on Data Engineering (ICDE '05),
April 2005.
-
Joerg Meyer,
Large-Scale Multi-Type Inverted List Indexing,
Master's thesis, University of California, Santa Cruz,
March 2005.
-
Thomas Schwarz,
Qin Xin,
Ethan L. Miller,
Darrell D. E. Long,
Andy Hospodor,
Spencer Ng,
Disk Scrubbing in Large Archival Storage Systems,
Proceedings of the 12th International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS '04),
October 2004, pages 409-418.
Won Best Paper award.
-
Lawrence You,
Christos Karamanolis,
Evaluation of efficient archival storage techniques,
Proceedings of the 21st IEEE / 12th NASA Goddard Conference on Mass Storage Systems and Technologies,
April 2004.
Last modified 10 May 2008