Interconnect networks with Dragonfly and fat-tree configurations are dominant in high-performance computing facilities and data centers. A key challenge in managing these shared networks is workload interference. In a multi-user computing environment, contention among applications for shared network resources can cause a vicious cycle in which workload interference, low productivity, selfish user behavior, and poor scheduling aggravate one another. This project aims to address this fundamental problem on massively parallel systems by developing the IRON (Interference ReductiON) framework. The project consists of three research thrusts: (1) network simulation, to gain insight into communication interference among applications and to explore what-if questions about workload interference; (2) interference-aware scheduling, to develop intelligent scheduling strategies that avoid or mitigate network contention among applications; and (3) real-world experiments, to quantitatively characterize workload interference and to assess the interference-aware scheduling design.
Completion of the project will create novel interference-aware scheduling policies and scalable software tools for interference analysis and reduction on massively parallel systems with shared network configurations. The data and tools resulting from the simulations and experiments will be made available to the broader community under an open-source license. An integrated education and outreach plan will enhance the Computer Science curriculum, broaden participation by underrepresented groups, and reach out to the surrounding communities, which are predominantly African-American and Latino.