  • PortHadoop: Support Direct HPC Data Access for Hadoop Applications
  • PortHadoop is software developed to support direct HPC (High Performance Computing) data access in Hadoop environments. Data generated by HPC systems often needs the data processing power of Cloud computing for processing and analysis, while Cloud computing applications such as Deep Learning need the compute power of HPC. This creates a demand for the convergence of HPC and Cloud computing, and PortHadoop is designed and developed to meet it. A key component of PortHadoop is the concept of the 'virtual block', which virtually maps files on a remote HPC PFS (Parallel File System) to HDFS (Hadoop Distributed File System) in a Cloud environment, enabling Hadoop applications to access PFS-resident data directly, transparently, and seamlessly. PortHadoop has recently been extended to PortHadoop-R, which provides a more user-friendly interface and brings in R's capability and versatility in data analysis and visualization. PortHadoop-R is equipped with a novel, efficient strategy for diagnosing, subsetting, and visualizing HPC data under Hadoop and Spark environments, and it supports overlapping data processing with data transfer. More information about the PortHadoop project can be found here.
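
The virtual-block idea can be sketched as a pure metadata mapping: a remote PFS file is divided into fixed-size byte ranges, and each range is presented as an HDFS-style block that a map task could then read directly from the PFS, with no copy into HDFS. This is a minimal illustration under stated assumptions, not PortHadoop's implementation: the `VirtualBlock` record, the `map_virtual_blocks` function, and the path `/pfs/sim/output.dat` are hypothetical, and the 128 MiB default only mirrors the usual HDFS block size.

```python
from dataclasses import dataclass
from typing import List

MIB = 1024 * 1024


@dataclass(frozen=True)
class VirtualBlock:
    """Hypothetical record binding an HDFS-style block ID to a byte range of a remote PFS file."""
    block_id: int
    pfs_path: str
    offset: int  # starting byte within the PFS file
    length: int  # number of bytes covered by this block


def map_virtual_blocks(pfs_path: str, file_size: int,
                       block_size: int = 128 * MIB) -> List[VirtualBlock]:
    """Split a remote PFS file into fixed-size virtual blocks without moving any data."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    blocks = []
    for block_id, offset in enumerate(range(0, file_size, block_size)):
        length = min(block_size, file_size - offset)  # the last block may be shorter
        blocks.append(VirtualBlock(block_id, pfs_path, offset, length))
    return blocks


if __name__ == "__main__":
    # Hypothetical 300 MiB file on the PFS: prints block ID, offset (MiB), length (MiB).
    for block in map_virtual_blocks("/pfs/sim/output.dat", 300 * MIB):
        print(block.block_id, block.offset // MIB, block.length // MIB)
```

For a 300 MiB file this yields three virtual blocks of 128, 128, and 44 MiB. Because only metadata is created, the mapping is cheap; the actual bytes would be fetched from the PFS when a task reads its block.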

  • IOSIG: I/O Signatures Based Data Access Optimization
  • An I/O Signature is a predefined notation that provides a simple and clear representation of data access patterns. The IOSIG software characterizes the I/O access patterns of an application in two steps: 1) a trace collection tool records all the I/O operations of the application; 2) an analysis tool derives the I/O Signature through offline analysis of the trace. Using the information in I/O Signatures, several optimizations can be applied to I/O systems, such as data prefetching, I/O scheduling, and cost-model-based data access optimization. (IOSIG flyer [PDF])
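
The offline analysis step can be illustrated with one simple pattern class. The sketch below assumes a trace of `(offset, size)` pairs in issue order and recognizes only fixed-size, fixed-stride access; the `StridedSignature` fields are a hypothetical simplification, not the actual I/O Signature notation, and a real analyzer would handle many more pattern classes. It also shows how a signature can drive prefetching by predicting the next offsets.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class StridedSignature:
    """Simplified signature: `count` requests of `request_size` bytes, `stride` bytes apart."""
    start: int
    stride: int
    request_size: int
    count: int


def detect_strided(trace: List[Tuple[int, int]]) -> Optional[StridedSignature]:
    """Return a signature if the (offset, size) trace is a fixed-size, fixed-stride pattern."""
    if len(trace) < 2:
        return None
    sizes = {size for _, size in trace}
    strides = {nxt[0] - cur[0] for cur, nxt in zip(trace, trace[1:])}
    if len(sizes) != 1 or len(strides) != 1:
        return None  # irregular access; a fuller analyzer would try other pattern classes
    return StridedSignature(trace[0][0], strides.pop(), trace[0][1], len(trace))


def prefetch_offsets(sig: StridedSignature, n: int) -> List[int]:
    """Offsets of the next n requests predicted by the signature, e.g. for prefetching."""
    return [sig.start + sig.stride * (sig.count + i) for i in range(n)]


if __name__ == "__main__":
    trace = [(0, 4096), (16384, 4096), (32768, 4096), (49152, 4096)]
    sig = detect_strided(trace)
    print(sig)  # StridedSignature(start=0, stride=16384, request_size=4096, count=4)
    if sig is not None:
        print(prefetch_offsets(sig, 2))  # [65536, 81920]
```

Separating collection from analysis, as the two-step process does, keeps tracing overhead low at run time while allowing the more expensive pattern matching to happen offline.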

  • PFS-IOC: Server-side I/O-Coordination in Parallel File System
  • Parallel file systems have become a common component of modern high-end computers to mask the ever-increasing gap between disk data access speed and CPU computing power. Recognizing that an I/O request cannot complete until every file server involved in the parallel file system has completed its part, we propose a server-side I/O coordination scheme for parallel file systems. The basic idea is to coordinate the file servers so that they serve one application at a time, reducing the average completion time while maintaining server utilization and fairness. To achieve this, we introduce the concept of window-wide coordination.
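
Why coordination helps can be shown with a toy model. The sketch assumes each application sends one unit-time sub-request to every server, and that an application's request finishes only when its slowest sub-request does. The `coordinate` function is a hypothetical stand-in for window-wide coordination: within each window, every server reorders its queue into the same agreed order (here simply sorted by application ID, which only lines up across servers when their windows hold the same applications).

```python
from typing import Dict, List

Schedule = Dict[int, List[str]]  # server ID -> queue of application IDs, one unit request each


def completion_times(schedule: Schedule, service_time: float = 1.0) -> Dict[str, float]:
    """An application's request completes only when its slowest server finishes its part."""
    finish: Dict[str, float] = {}
    for queue in schedule.values():
        clock = 0.0
        for app in queue:
            clock += service_time
            finish[app] = max(finish.get(app, 0.0), clock)
    return finish


def coordinate(schedule: Schedule, window: int) -> Schedule:
    """Within each window of queued requests, every server adopts the same application order."""
    if window < 1:
        raise ValueError("window must be at least 1")
    coordinated: Schedule = {}
    for server, queue in schedule.items():
        reordered: List[str] = []
        for start in range(0, len(queue), window):
            reordered.extend(sorted(queue[start:start + window]))  # common order: by app ID
        coordinated[server] = reordered
    return coordinated


if __name__ == "__main__":
    uncoordinated = {0: ["A", "B"], 1: ["B", "A"]}
    print(completion_times(uncoordinated))                 # {'A': 2.0, 'B': 2.0}
    print(completion_times(coordinate(uncoordinated, 2)))  # {'A': 1.0, 'B': 2.0}
```

In this example the mean completion time drops from 2.0 to 1.5 units, while each server stays busy for the same two units, which is the trade-off the scheme aims for.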

  • ORCHECK: Orchestrated Checkpointing
  • ORCHECK stands for "ORCHEstrated CHECKpointing". Motivated by the recognition that I/O contention is a dominant factor impeding the performance of parallel checkpointing, ORCHECK takes a systematic approach to improving parallel checkpointing performance. The main idea of ORCHECK is to orchestrate concurrent checkpoints in an optimized and controllable way so as to minimize I/O contention. ORCHECK targets large-scale parallel computing systems with multi-core architectures and parallel file systems such as PVFS2.
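
One way to picture "controllable" checkpoint orchestration is to split checkpoint writers into rounds. The sketch below assumes a hypothetical policy, not ORCHECK's actual algorithm: in each round at most one process per multi-core node writes, so cores on the same node do not compete for its I/O path, and at most `max_writers` processes write in total, limiting contention at the file servers.

```python
from collections import defaultdict
from typing import Dict, List


def orchestrate(proc_to_node: Dict[int, int], max_writers: int) -> List[List[int]]:
    """Group checkpointing processes into rounds (hypothetical policy):
    at most one writer per node and at most `max_writers` writers per round."""
    if max_writers < 1:
        raise ValueError("max_writers must be at least 1")
    pending: Dict[int, List[int]] = defaultdict(list)
    for proc, node in sorted(proc_to_node.items()):
        pending[node].append(proc)
    rounds: List[List[int]] = []
    while any(pending.values()):
        current: List[int] = []
        for node in sorted(pending):
            if pending[node] and len(current) < max_writers:
                current.append(pending[node].pop(0))  # one process from this node
        rounds.append(current)
    return rounds


if __name__ == "__main__":
    # Four processes on two dual-core nodes, with at most two concurrent writers.
    print(orchestrate({0: 0, 1: 0, 2: 1, 3: 1}, max_writers=2))  # [[0, 2], [1, 3]]
```

A round-based schedule trades some parallelism for predictable, lower contention; the right limits would depend on the node's I/O bandwidth and the number of file servers.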

  • GHS: Grid Harvest Service
  • GHS stands for Grid Harvest Service. It is a performance evaluation and task scheduling system for running large-scale applications in a shared environment. GHS is based on a novel performance prediction model and a set of task scheduling algorithms. GHS supports three classes of task scheduling: single task, parallel processing, and meta-task. The Grid Harvest Service system comprises five primary subsystems: performance evaluation, performance measurement, task allocation, task scheduling, and execution management. Working together, these subsystems provide the services needed to harvest the computing power of the Grid.
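
The link between performance prediction and single-task scheduling can be sketched with a deliberately simple model. GHS's own prediction model is not reproduced here; the sketch assumes a hypothetical stand-in in which a machine of speed `speed` whose owner uses a fraction `owner_utilization` of it delivers the remaining capacity to the guest task, and the scheduler picks the machine with the smallest predicted completion time.

```python
from typing import Dict, Tuple


def predicted_time(workload: float, speed: float, owner_utilization: float) -> float:
    """Stand-in prediction: the task gets the capacity left over by the machine's owner."""
    if speed <= 0.0 or not 0.0 <= owner_utilization < 1.0:
        raise ValueError("need speed > 0 and 0 <= owner_utilization < 1")
    return workload / (speed * (1.0 - owner_utilization))


def schedule_single_task(workload: float,
                         machines: Dict[str, Tuple[float, float]]) -> Tuple[str, float]:
    """Pick the machine (name -> (speed, owner_utilization)) with the lowest predicted time."""
    best = min(machines, key=lambda name: predicted_time(workload, *machines[name]))
    return best, predicted_time(workload, *machines[best])


if __name__ == "__main__":
    # Hypothetical machines: m1 is faster but busier, m2 is slower but mostly idle.
    print(schedule_single_task(400.0, {"m1": (100.0, 0.5), "m2": (80.0, 0.2)}))  # ('m2', 6.25)
```

The example shows why prediction matters in a shared environment: the nominally faster machine loses once the owner's load is accounted for.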

  • Network Bandwidth Predictor (NBP)
  • NBP is an online network performance forecasting system based on neural network technology. Combined with GHS, NBP provides a full-function task scheduling system for distributed computing.
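
Online forecasting with a neural model can be illustrated by its smallest case: a single linear neuron that predicts the next bandwidth sample from the last few and updates its weights after every new measurement. The sketch uses the normalized LMS (Widrow-Hoff) rule as a swapped-in technique; it is far simpler than NBP's network, and the class name and parameters are hypothetical.

```python
from typing import List


class OnlineBandwidthPredictor:
    """One linear neuron trained online with the normalized LMS rule (a simple stand-in)."""

    def __init__(self, history: int = 3, step: float = 0.5) -> None:
        if history < 1 or not 0.0 < step < 2.0:
            raise ValueError("need history >= 1 and 0 < step < 2 for NLMS stability")
        self.history = history
        self.step = step
        self.weights = [1.0 / history] * history  # start out as a moving average
        self.bias = 0.0
        self.window: List[float] = []  # most recent samples, oldest first

    def predict(self) -> float:
        """Forecast the next bandwidth sample from the current window."""
        if len(self.window) < self.history:
            return self.window[-1] if self.window else 0.0
        return sum(w * x for w, x in zip(self.weights, self.window)) + self.bias

    def observe(self, value: float) -> None:
        """Update the weights using the new measurement, then slide the window."""
        if len(self.window) == self.history:
            error = value - self.predict()
            norm = 1.0 + sum(x * x for x in self.window)  # includes the bias input 1.0
            for i, x in enumerate(self.window):
                self.weights[i] += self.step * error * x / norm
            self.bias += self.step * error / norm
            self.window.pop(0)
        self.window.append(value)


if __name__ == "__main__":
    predictor = OnlineBandwidthPredictor()
    for sample in [95.0, 97.0, 96.0, 98.0, 97.5]:  # illustrative Mbit/s values, not measurements
        print(f"predicted {predictor.predict():.2f}, observed {sample}")
        predictor.observe(sample)
```

Normalizing the update by the input energy keeps learning stable regardless of whether bandwidth is measured in Kbit/s or Gbit/s, which matters for a predictor that must adapt online without manual tuning.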

  • HPCM: High Performance Computing Mobility
  • HPCM stands for High Performance Computing Mobility. It is a middleware that supports user-level heterogeneous process migration of legacy code written in C, Fortran, or other stack-based programming languages by annotating the source code. It consists of several subsystems that support the main functions of heterogeneous process migration, including source code pre-compiling, data collection and restoration, communication coordination and redirection, process monitoring, process scheduling, I/O redirection, and user-friendly interfaces.
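
The data collection and restoration step can be illustrated in miniature: execution stops only at a designated migration point, the live state is written out in a machine-independent format, and a new process (possibly on a different architecture) rebuilds that state and resumes. The Python sketch below is an analogy only, assuming JSON as the neutral format; HPCM itself works on C and Fortran code through a pre-compiler, and the function `run_sum` and its migration point are hypothetical.

```python
import json
from typing import Optional, Tuple


def run_sum(n: int, stop_at: Optional[int] = None,
            state_json: Optional[str] = None) -> Tuple[Optional[int], Optional[str]]:
    """Sum 0..n-1. If stop_at is reached, return (None, snapshot) instead of the result;
    if state_json is given, restore that snapshot and resume from it."""
    if state_json is not None:
        state = json.loads(state_json)  # restoration: rebuild live variables
        i, total = state["i"], state["total"]
    else:
        i, total = 0, 0
    while i < n:
        if stop_at is not None and i == stop_at:  # designated migration point
            return None, json.dumps({"i": i, "total": total})  # collection
        total += i
        i += 1
    return total, None


if __name__ == "__main__":
    _, snapshot = run_sum(10, stop_at=4)
    print(snapshot)                          # {"i": 4, "total": 6}
    print(run_sum(10, state_json=snapshot))  # (45, None)
```

Restricting migration to known points is what makes the saved state well defined: at those points the pre-compiled program knows exactly which variables are live and how to encode them independently of the host's byte order and word size.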

  • Scarlet: A Context Aware Infrastructure
  • Pervasive Computing is one of the most challenging research areas in Computer Science. Its ultimate goal is to provide 'Human Centered Computing' by understanding the context of the user's environment. Most computer programs are controlled strictly by program parameters and user input. They either ignore useful context information completely or are very difficult to extend to new platforms because they are tightly coupled to context sources and underlying platforms. Scarlet is a context-aware infrastructure designed to capture context from devices in the environment and deliver it to context-aware applications, providing modularity, platform independence, and extensibility. It is implemented in Python and has been tested under Linux and Windows.
    For more information refer to: Scarlet: A Framework for Context Aware Computing
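
The decoupling Scarlet aims for can be sketched as a publish/subscribe broker sitting between context sources and applications: sources publish typed context values, applications subscribe by context type, and neither side references the other or the underlying platform. This is a minimal sketch assuming a hypothetical `ContextBroker` class with `subscribe` and `publish` methods; it is not Scarlet's actual API.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, Dict, List

Handler = Callable[[str, Any], None]


class ContextBroker:
    """Hypothetical broker decoupling context sources from context-aware applications."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)
        self._latest: Dict[str, Any] = {}  # last known value per context type

    def subscribe(self, context_type: str, handler: Handler) -> None:
        """Register interest; a late subscriber immediately receives the latest value."""
        self._subscribers[context_type].append(handler)
        if context_type in self._latest:
            handler(context_type, self._latest[context_type])

    def publish(self, context_type: str, value: Any) -> None:
        """Called by a context source (e.g. a sensor wrapper) when the context changes."""
        self._latest[context_type] = value
        for handler in list(self._subscribers[context_type]):
            handler(context_type, value)


if __name__ == "__main__":
    broker = ContextBroker()
    broker.subscribe("location", lambda kind, value: print(f"{kind} changed to {value}"))
    broker.publish("location", "Room 101")  # location changed to Room 101
```

Because applications depend only on context types, a new device or platform can be supported by writing a new source that publishes to the broker, without changing the applications.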

Illinois Institute of Technology