The focus of the RePhrase project is on producing new software engineering tools, techniques and methodologies for developing data-intensive applications in C++, targeting heterogeneous multicore/manycore systems that combine CPUs and GPUs into a coherent parallel platform. Data-intensive applications are one of the most important and commonly encountered classes of industrial application. Such applications are often potentially highly parallel and are a clear match to emerging heterogeneous parallel architectures. However, exploiting this potential effectively can be difficult: it is even harder to obtain good performance for parallel data-intensive applications than for compute-intensive applications, since many additional issues related to data management need to be taken into account. These include structuring the data so that it is efficient to access and process, placing, migrating and replicating the data to allow fast parallel access, and ensuring data consistency. The RePhrase project tackles these issues directly.
The RePhrase project directly targets the challenge outlined for ICT 9. It aims to achieve a breakthrough in simplifying the problems of programming complex, parallel data-intensive systems. By using a new RePhrase software development methodology combined with new and advanced tools that address the whole software lifecycle, we aim to achieve significantly enhanced levels of reliability, robustness, resilience and software integrity. Moreover, our methodology will incorporate automatic adaptivity as an intrinsic part of the design, implementation and maintenance of parallel data-intensive applications. Use of this new methodology and the associated tools will help to foster increased growth and a more competitive EU software industry. The approach will be demonstrated through large-scale applications, taken from several application domains. RePhrase focuses directly on software tools and methods for large, complex and data-intensive systems.
We have identified initial and extended sets of patterns that are suitable for data-intensive parallel programming, based on existing pattern sets (D2.1). We have extended the IBM FOCUS test planning tool and the ExpliSAT formal verification tool to provide test planning and verification for parallel, and specifically patterned, applications. We have produced new mechanisms for detecting violations of extra-functional requirements, including performance and energy usage, and for detecting race conditions in lock-free structures (D3.1). We have developed new parallel scheduling and mapping mechanisms, plus performance monitoring to support pattern-based software development (D4.1). We have produced new processes for requirements capture (D5.1), and developed new coding and data standards to support pattern-based parallel program development (D5.2).
We have implemented the initial set of patterns for C++ (D2.1). We have designed a simple and effective domain-specific language (DSL) to support the initial phases of patterned application design (D2.1). We have designed a new C++ general-purpose pattern API and produced a prototype implementation (D2.4). We have developed new program shaping techniques that improve the possibilities for parallelization (D2.3). We have developed new automated refactorings to exploit this work and to automatically introduce parallelism into suitable source code under programmer control (D2.2). We have extended IBM’s ExpliSAT tool to support concurrent programs, extended PRL’s QA-Verify tool to support multithreaded applications (D3.1), and developed new tooling to detect violations of extra-functional properties (D3.2). We have developed an initial static mapping tool for the basic pattern set, an initial dynamic scheduling tool, and a tool for monitoring the performance of parallel code (D4.1).
We have shown how IBM’s ExpliSAT tool can determine functional correctness for concurrent programs, and how PRL’s QA-Verify tool can determine whether code complies with the new coding standards from WP5 (D3.1). We have identified requirements on the project use cases to support evaluation against the software lifecycle (D6.1).
We have identified detailed metrics for robustness etc. (D3.1) and requirements on the use case applications (D6.1), and have selected appropriate use cases based on these requirements (D6.3).
We have carried out a wide range of dissemination activities, including publishing 32 research papers and organising five technical workshops and one tutorial on RePhrase research activities and tools (D7.4).
The RePhrase project will revolutionise the process of designing, implementing and maintaining parallel data-intensive software by:
• introducing a new software engineering methodology for developing parallel data-intensive applications;
• building on and extending our existing pattern-based programming methodology to assist the design and implementation of parallel data-intensive software, in state-of-the-art parallel programming frameworks;
• developing automated tools for tuning and deploying applications on a wide range of different heterogeneous hardware platforms, and in a range of deployment settings (e.g. with and without other applications competing for resources);
• developing new methods for automatic discovery of parallel patterns in existing sequential C++ code and mechanisms for reshaping existing C++ code to prepare it for the introduction of patterns;
• developing new methods and tools for testing and verification of functional and extra-functional properties and requirements for data-intensive applications;
• developing novel semi-automated refactoring tools for introducing, rewriting and tuning parallel patterns in both new and existing C++ applications;
• integrating tools that address all aspects of the software development process into a coherent software engineering methodology for data-intensive applications, and ensuring their inter-operability;
• defining standards for programming pattern-based data-intensive applications, and developing new tools for automatically checking compliance with these standards.
RePhrase will provide a significant productivity increase in the development, testing, verification, deployment and maintenance of parallel systems. This will have an impact on a number of areas, including data-intensive systems (which are often naturally parallel) and distributed/cloud systems (where execution clusters will need to be parallel, and where elastic scalability depends on easy decomposition of software into independent parallel components, as in the RePhrase project). Productivity improvements will be verifiable through reductions in coding errors, reductions in bugs, improvements in robustness, and improved portability and evolvability of code that is developed using RePhrase techniques.
More info: http://rephrase-ict.eu/index.html.