ITR/IM: Data Management Using Smart Storage Systems

Elizabeth Varki
Department of Computer Science
University of New Hampshire
Durham, NH 03824

Eugene Freuder (Collaborator)
Cork Constraint Computation Centre
University College Cork
Cork, Ireland

Phone: (603) 862-2319
Fax: (603) 862-3493
Email: varki@cs.unh.edu
www: www.cs.unh.edu/~varki

WWW Page
www.cs.unh.edu/~varki/me/itr.html

Project Award Information

Keywords
storage data management, enterprise storage systems, disk arrays, constraint programming, performance evaluation, analytical performance models.

Project Summary
Storage systems have evolved from small disk systems under the control of a file server to large, independent disk-array systems. These storage devices have powerful controllers capable of running complex algorithms and making decisions regarding data placement. The storage controllers also have array caches capable of holding large amounts of data for quick access. As a result, the placement and movement of storage data are largely controlled by the storage controllers, independent of file systems. These smart storage devices outperform older storage devices. However, their performance is far lower than that promised by storage manufacturers, and far lower than the current state of hardware and software technology should allow. Since storage devices are the slowest components in a computer system, an improvement in the performance of storage devices would result in an overall improvement in the performance of the computer systems accessing these devices. This project addresses this issue and attempts to improve the performance of enterprise storage systems.

We first analyzed enterprise storage systems and evaluated their internal algorithms. Based on our analysis, we concluded that storage software has not kept pace with storage hardware. As a result, storage devices are not able to perform to their potential. Some of the problem areas this project identified are: (a) the storage caching algorithms dealing with adaptive prefetching of sequential streams; (b) the load balancing algorithms; (c) the disk scheduling algorithms for mirrored disk configurations; and (d) the workload-dependent adaptive algorithms. Based on this analysis, we developed new algorithms to address some of these areas. We also developed detailed performance models of enterprise storage devices and techniques to quickly evaluate the performance of storage devices. These algorithms and models can be implemented in storage devices so that they can quickly adapt to workload changes and deliver the performance promised by the hardware technology.
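As a rough illustration of problem area (c), disk scheduling for mirrored configurations, the sketch below serves each read from whichever replica has its head closest to the requested block. It is not the algorithm developed in this project. The names Disk, pick_replica, and serve_reads are hypothetical, and seek cost is assumed to be the absolute distance in blocks, which ignores rotational latency and queueing.

```python
from dataclasses import dataclass


@dataclass
class Disk:
    """One replica in a mirrored set; head is the current block position."""
    name: str
    head: int = 0


def pick_replica(disks, block):
    """Return the replica whose head is closest to the requested block."""
    return min(disks, key=lambda d: abs(d.head - block))


def serve_reads(disks, blocks):
    """Serve each read from the nearest replica; return total seek distance.

    Assumption: seek cost equals the absolute block distance travelled.
    """
    total_seek = 0
    for block in blocks:
        disk = pick_replica(disks, block)
        total_seek += abs(disk.head - block)
        disk.head = block  # the head now rests at the block just read
    return total_seek
```

Under these assumptions, comparing serve_reads on a two-disk mirror against a single disk for the same request sequence gives a simple measure of how much seek distance the replica choice saves.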

Publications and Products

Project Contributions

(a) The detailed storage system model that can be used to evaluate the performance of 'real' storage devices.
(b) The queueing techniques developed for parallel systems.
(c) The identification of areas of storage devices that under-perform and reasons for this poor performance. Based on this analysis, we developed new algorithms for some of the areas we identified.
(d) The educational and research development of students who participated in this work and/or took related courses.
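To give a sense of what an analytical performance model computes, the sketch below evaluates the textbook M/M/1 queue, in which the mean response time is 1 / (service rate - arrival rate). This is a standard single-server formula used here only as an example; it is not the detailed storage model or the parallel-system queueing technique listed above. The function name mm1_response_time is hypothetical.

```python
def mm1_response_time(arrival_rate, service_rate):
    """Mean response time of an M/M/1 queue, in the time unit of the rates.

    Assumptions: Poisson arrivals, exponential service times, one server,
    and a stable queue (arrival_rate < service_rate).
    """
    if arrival_rate < 0 or service_rate <= 0:
        raise ValueError("rates must be non-negative, service rate positive")
    if arrival_rate >= service_rate:
        raise ValueError("queue is unstable: arrival rate >= service rate")
    return 1.0 / (service_rate - arrival_rate)
```

For example, a device that can serve 100 requests per second and receives 50 requests per second has a mean response time of 1 / (100 - 50) seconds under this model.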

Area Background
Storage systems represent a growing market. In recent years there has been an explosion of applications (which include scientific "grand-challenge" programs, multi-media systems, and large transaction-based information systems) with varying performance needs that use enormous amounts of data. These applications have high Quality of Service (QoS) requirements from storage devices, irrespective of the location of the data and its users and the problems that could interfere with data access. It is very difficult to coordinate the storage, network, and computation resources required for these heterogeneous applications. A solution to this problem of storage data management is to have the storage system manage its data. This approach was first proposed by Gelb [Gelb89], who referred to it as system-managed storage. Attribute-managed storage [Borowsky98], a formalization of the system-managed approach, is currently being studied at Hewlett-Packard Storage Labs. In addition, companies like EMC, Veritas, and IBM and research labs like the NASD Lab at CMU are investigating the design and development of smarter storage systems.

Area References

Acknowledgement Of Support And Disclaimer

"This material is based upon work supported by the National Science Foundation under Grant No. ITR/IM 0082399. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation."