The NEXTGenIO project addresses a key challenge not only for Exascale, but also for HPC and data-intensive computing in general: the challenge of I/O performance. As core counts have increased massively over the past few years, the performance of I/O subsystems has struggled to keep up with computational performance and has become a key bottleneck on today’s largest systems. NEXTGenIO will develop a prototype computing platform that uses on-node non-volatile memory, bridging the latency gap between DRAM and disk and thus removing this bottleneck. In addition to the hardware that will be built as part of the project, NEXTGenIO will develop the software stack (from OS and runtime support to programming models and tools) that goes hand in hand with this new hardware architecture. Two particular focal points are a data- and power-aware job scheduling system, and an I/O workload and workflow simulator that will allow us to stress-test our hardware and software developments. We believe that the new platform being developed in NEXTGenIO will be capable of delivering transformational performance across high performance and data-intensive computing.
The impact of improving the I/O performance of HPC systems, in particular with a view to moving to the Exascale, is considerable and will transform scientific and data-centric computing, as well as computational workflows. Application areas such as weather forecasting, engineering and data analytics will be able to compute using unprecedented volumes of data close to the processor, improving not only the time to solution but also the quality of the output.
The overall aim of the Next Generation I/O Project (NEXTGenIO) is to design and prototype a new, scalable, high-performance, energy-efficient computing platform that addresses the challenge of delivering the necessary scalable I/O performance to applications at the Exascale. The key objectives are described below:
1. Hardware platform prototype: a new prototype HPC hardware platform will be developed by Fujitsu utilising the latest NVDIMM and processor technology from Intel. There are many different ways of using this fascinating technology in computing. The NEXTGenIO hardware architecture has been produced from a detailed requirements-capture process in WP2, and steps towards producing the platform have been taken in WP6. More detailed information is given in Sections 1.2.2 and 1.2.6.
2. Exascale I/O investigation: as the NVDIMM technology is new, the project will investigate different methods of utilising its functionality to support the most efficient I/O performance in HPC and data centre environments. This technology will be truly transformative in terms of HPC workloads, and understanding how best to utilise it is a key research objective of the project. Investigations have been carried out into the different modes in which the NVDIMMs can operate and the different options for their usage, e.g. object storage, distributed file systems, or an extended memory hierarchy. Initial implementations of various solutions are ongoing.
3. Systemware development: the software components that will support the use of the NVDIMM technology by applications, the operating system, and the debugging and performance tools will be developed. These will include I/O libraries, new energy- and data-aware schedulers, enhancements to programming model libraries, and tools development. Architectural design decisions will be taken using the results of the Exascale I/O investigation and the co-design process. Producing the software needed to enable Exascale application execution on the hardware platform is therefore a key objective. The systemware architecture has been finalised, and development of the systemware components is ongoing as part of WPs 4, 5 and 6.
4. Application co-design: any new I/O platform needs to meet the needs of today’s highly parallel applications, as these will be tomorrow’s Exascale applications. Understanding individual applications' I/O profiles and typical I/O workloads on shared systems running multiple different applications is key to ensuring that the decisions we make in hardware design and the I/O models we investigate are relevant to the real world. The use of co-design to inform these choices is therefore a key objective of this project. Co-design has been a continued strong focus throughout the project, and usage scenarios and usage requirements have driven the architecture specifications. Monitoring data describing the I/O workload behaviour have been collected from three different data centres and have driven the design of the Kronos workload simulator.
NEXTGenIO is advancing the state of the art by developing a new computing platform that uses a transformational new non-volatile memory technology to address the I/O challenge faced by HPC and HPDA today. Not only is the project designing and developing the hardware, but it is also putting in place the system software that applications need to exploit this hardware. One of the key objectives of the project is to discover how the new technology can benefit scientific and data-intensive computing, and to support its use as widely as possible.
More info: http://www.nextgenio.eu.