IO-SIG Project

Thank you for visiting!

Updates 02/27/2014

The online repository has moved to GitHub because Google Code has discontinued its download service.

Get to know IO-SIG project and software:

For a quick intro, please read the rest of this page and this flyer: IO-SIG's flyer [PDF].

Ready to try it? Find the latest source code at IOSIG's repo on GitHub!

How to use it? See the user guide.

More research? Read the SC'08 paper and the CCGrid'12 paper.

IO-SIG stands for the research project on IO-Signature based data access optimization, one of the projects in the Scalable Computing Laboratory at Illinois Institute of Technology.

In high performance computing (HPC) systems, data access delay has been a major reason for poor sustained system performance (SSP). Multiple levels of memory hierarchy have been incorporated into computer architectures to exploit locality among data accesses and narrow the gap between peak and sustained performance. Even with these advantages, an HPC system may serve some applications well yet work inefficiently with others.

To make the design, implementation, and optimization of HPC systems more efficient, we propose the I/O Signature notation and a series of tools that help people understand the characteristics of their applications, especially the I/O characteristics.

IOSIG software allows users to characterize the I/O access patterns of an application in two steps: 1) a trace collecting tool records a trace of all the I/O operations of the application; 2) through offline analysis of the trace, an analyzing tool derives the I/O Signature. Using the information in I/O Signatures, we can apply several optimizations to I/O systems, such as data prefetching, I/O scheduling, and cost-model-based data access optimization.
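To make the two steps concrete, here is a minimal Python sketch of the offline-analysis idea. It is not the IOSIG tool's actual algorithm or trace format: the `Access` and `Run` record types and the `summarize_contiguous` function are hypothetical names introduced for illustration, and the trace is synthetic rather than collected from a real application. The sketch only recognizes the simplest pattern, back-to-back accesses of equal size.

```python
from typing import List, NamedTuple


class Access(NamedTuple):
    """One traced I/O operation (hypothetical record layout)."""
    op: str      # operation name, e.g. "MPI_READ"
    offset: int  # byte offset into the file
    size: int    # number of bytes requested


class Run(NamedTuple):
    """A summary of consecutive, same-size, back-to-back accesses."""
    op: str
    start: int   # offset of the first access in the run
    size: int    # request size shared by every access in the run
    count: int   # number of accesses in the run


def summarize_contiguous(trace: List[Access]) -> List[Run]:
    """Collapse contiguous fixed-size accesses into runs."""
    runs: List[Run] = []
    for acc in trace:
        if runs:
            last = runs[-1]
            # Offset at which the current run would continue.
            next_offset = last.start + last.size * last.count
            if (acc.op, acc.size, acc.offset) == (last.op, last.size, next_offset):
                runs[-1] = last._replace(count=last.count + 1)
                continue
        # Otherwise start a new run with this access.
        runs.append(Run(acc.op, acc.offset, acc.size, 1))
    return runs


# Stand-in for step 1: a synthetic trace of 247 contiguous 1 MiB reads.
trace = [Access("MPI_READ", i * 1048576, 1048576) for i in range(247)]

# Step 2: offline analysis of the trace.
runs = summarize_contiguous(trace)
print(runs)
```

A single run in the result means the whole trace is one contiguous pattern; a gap between accesses or a change in request size starts a new run.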

Keywords: MPI-IO, I/O tracing, I/O profiling, Trace analysis.

IO-Signature

IO-Signature is a pre-defined notation that provides a simple and clear representation of data access patterns.

For example, the following trace signature says that an application issues "MPI_READ" on a data file 247 times, starting from offset 0; the size of each read is 1048576 bytes (1 MiB). The data access pattern is contiguous.

{MPI_READ, 0, 1, ({1048576, 247}), 1}
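As an illustration of how such a signature could be consumed, the Python sketch below parses the example and expands it into the individual accesses it describes. It interprets only the fields explained on this page (operation, starting offset, request size, repetition count). The two fields with value 1 are not explained here, so the sketch skips over them without assigning a meaning. The function names `parse_contiguous_signature` and `expand` are hypothetical, and the regular expression handles only signatures shaped like this example, not the full notation.

```python
import re
from typing import Iterator, Tuple

SIGNATURE = "{MPI_READ, 0, 1, ({1048576, 247}), 1}"


def parse_contiguous_signature(text: str) -> Tuple[str, int, int, int]:
    """Return (operation, start_offset, request_size, repetitions).

    The third and last fields are matched but deliberately not captured,
    because their meaning is not described on this page.
    """
    m = re.fullmatch(
        r"\{\s*(\w+)\s*,\s*(\d+)\s*,\s*\d+\s*,"
        r"\s*\(\{\s*(\d+)\s*,\s*(\d+)\s*\}\)\s*,\s*\d+\s*\}",
        text.strip(),
    )
    if m is None:
        raise ValueError(f"unsupported signature: {text!r}")
    op, start, size, reps = m.groups()
    return op, int(start), int(size), int(reps)


def expand(start: int, size: int, reps: int) -> Iterator[Tuple[int, int]]:
    """Yield (offset, size) for each access of a contiguous pattern."""
    for i in range(reps):
        yield start + i * size, size


op, start, size, reps = parse_contiguous_signature(SIGNATURE)
accesses = list(expand(start, size, reps))
total_bytes = size * reps

print(op, len(accesses), total_bytes)
```

Because each read begins where the previous one ended, the expanded accesses cover 247 MiB of the file without gaps.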

For more about the definition of the I/O Signature notation, please see the documents.

Project motivation

Parallel I/O prefetching is considered to be effective in improving I/O performance. However, the effectiveness depends on determining patterns among future I/O accesses swiftly and fetching data in time, which is difficult to achieve in general. In this study, we propose an I/O signature-based prefetching strategy. The idea is to use a predetermined I/O signature of an application to guide prefetching. To put this idea to work, we first derived a classification of patterns and introduced a simple and effective signature notation to represent patterns. We then developed a toolkit to trace and generate I/O signatures automatically. Finally, we designed and implemented a thread-based client-side collective prefetching cache layer for the MPI-IO library to support prefetching. A prefetching thread reads the I/O signatures of an application and adjusts them by observing I/O accesses at runtime.

Although I/O Signature was first designed for signature-based data prefetching, it has further potential to be explored for improving overall I/O performance.

Scalable Computing Laboratory | Department of Computer Science | Illinois Institute of Technology
Last modified: 11/15/2010