Large-scale computing systems are today built as distributed systems, with servers hosted in data centres for reasons of scale, heterogeneity, cost, and energy efficiency. In this scenario, software components and software services are distributed as well and accessed remotely over the Internet through clients and other IT devices. Despite recent advances, computational resources and network capacity are often provisioned using best-effort models, even in state-of-the-art data centres. This limitation is a major hindrance to the coming evolution of the IoT and the networked society, and it already manifests today in limited cloud adoption for systems whose demands exceed best effort.
RECAP goes beyond the current state of the art and aims at developing the next generation of cloud and edge computing capacity provisioning. The project performs targeted research advances in cloud infrastructure optimisation, simulation, and automation, building on advanced machine learning. The overarching result of RECAP is a framework for the realisation of the next generation of agile and optimised cloud computing systems.
The figure shows the overall, abstracted operation and optimisation loop targeted by the project. Through the collection of monitoring data from real-world applications and real-world infrastructure, the project gathers data traces that are processed by means of statistical analyses and machine learning. This step yields models of the infrastructure, the workload, and the application, which other RECAP tools can then use as input either to improve (optimise and adapt) the configuration of an existing infrastructure or to run simulations and what-if analyses on hypothetical or envisioned future applications and infrastructures. Finally, optimisation and adaptation steps are enacted.
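As a minimal sketch of this loop (all names below are hypothetical illustrations, not actual RECAP interfaces), the cycle of monitoring, model learning, optimisation, and enactment can be expressed as follows:

```python
# Hypothetical sketch of the RECAP operation and optimisation loop.
# Function names and model fields are illustrative assumptions.
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Models:
    infrastructure: dict  # e.g. node capacities and link latencies
    workload: dict        # e.g. arrival-rate distributions per service
    application: dict     # e.g. component topology and load propagation

def optimisation_loop(collect_metrics: Callable[[], List[dict]],
                      fit_models: Callable[[List[dict]], Models],
                      optimise: Callable[[Models], dict],
                      enact: Callable[[dict], None]) -> None:
    """One iteration: gather traces, learn models, optimise, enact."""
    traces = collect_metrics()        # monitoring data from apps and infrastructure
    models = fit_models(traces)       # statistical analysis / machine learning
    configuration = optimise(models)  # alternatively: feed models into a simulator
    enact(configuration)              # the adaptation step closes the loop
```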
In order to achieve the overall goal of RECAP, the work is decomposed into several tracks that tackle the following aspects: (a) the optimisation of the infrastructure and of load distribution on the infrastructure; (b) the optimisation of individual distributed applications by understanding load propagation within the application topology. Gaining an understanding of infrastructure and applications requires collecting and analysing data traces, so (c) the collection of infrastructure and application monitoring data and (d) the generation of insights from that data form further tracks of work. In addition, (e) the use of simulation supports the validation of RECAP and the execution of what-if scenarios, as well as the other tracks of work. Finally, (f) RECAP is supported by four industrial use cases that help the project with the definition and prioritisation of requirements as well as with the validation of the RECAP framework and tools.
The first quarter of the project was oriented towards gaining an understanding of the use cases and identifying their requirements. Further, effort was put into the preparation of project-internal testbeds and early discussions on the architecture. Finally, the creation of visibility through a website and flyers (D2.1, MS.2) and the creation of data management and quality assurance plans (D1.1, MS.1) were finalised. The simulation work package WP7 started in M4 and targeted the design of an early simulation architecture as well as the definition of infrastructure models for the use cases. The remaining technical work of the project in WP5, WP6, and WP8 started in M9.
The second quarter of the project started with the provisioning of the project testbed enabling the collection of monitoring data (D4.1, MS.3). A major goal in the second quarter was to develop initial models for describing aspects of a (geo-)distributed infrastructure; distributed, elastic applications; the users accessing them; and the constraints imposed by their operators. These models then guided work in tracks (a)-(e), resulting in prototypes for each individual track of work (D6.1, D7.1, D7.2, D8.1). The initial models were complemented by prototypes of the data collection framework (D5.1) and of infrastructure orchestration and optimisation (D8.2). The availability of the prototypes, in turn, led to an architecture defining an integrated environment with interactions between the available prototypes (D4.2), closing milestone MS.4. The integration of the prototypes themselves is the subject of the second period of the project and will be completed after the second prototype for each track of work has been released.
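To make the scope of these initial models concrete, the sketch below outlines the kinds of entities they describe. The field names are assumptions chosen for illustration, not taken from the project deliverables:

```python
# Illustrative entity sketch for the four modelled aspects; all fields
# are assumptions, not the actual RECAP model definitions.
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class Site:                     # one site of a (geo-)distributed infrastructure
    name: str
    location: Tuple[float, float]  # (latitude, longitude)
    cpu_cores: int
    memory_gb: int

@dataclass
class Component:                # one component of a distributed, elastic application
    name: str
    depends_on: List[str] = field(default_factory=list)  # load propagates along these

@dataclass
class UserGroup:                # users accessing the application
    region: str
    request_rate: float         # requests per second

@dataclass
class OperatorConstraint:       # a constraint imposed by an operator
    description: str            # e.g. "component X must remain in region Y"
```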
Besides the technical development, by mid-project RECAP results had been presented at 9 conferences, 7 workshops, and 9 further events. The consortium produced 11 videos and reached several hundred citizens, policy makers, and industry representatives.
The outcomes of the project will pave the way for a radically novel concept in the provision of cloud services, where services are instantiated and provisioned close to the users that actually need them by self-configurable cloud computing systems. The primary expected result of RECAP is both a methodology and a framework for the realisation of the next generation of agile and optimised cloud computing systems. In order to achieve this result, RECAP goes beyond the state of the art in several domains and aims to achieve the following impacts:
The RECAP Simulator will allow for reproducible and controllable experimentation, aiding in identifying deployment targets for components and solving the placement problem optimally prior to actual deployment in a real cloud environment. Simulation can thereby aid capacity planning for the cloud, aid the understanding of performance, and provide "what-if" analyses that enable system managers to evaluate and experiment with application performance under various deployment configurations and workload settings.
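As a hedged illustration of such a what-if analysis, the toy sketch below compares two hypothetical deployment configurations under the same workload. The queueing-style latency model and all figures are invented for this sketch and do not reflect the actual RECAP simulation engine:

```python
# What-if sketch: estimate mean response time for candidate deployments.
# Deployment data, the M/M/1-style slowdown, and all numbers are assumptions.
import random

def simulate(deployment: dict, request_rate: float, runs: int = 10000) -> float:
    """Estimate mean response time (ms) for a deployment under a workload."""
    per_site_rate = request_rate / len(deployment)   # uniform load balancing
    total = 0.0
    for _ in range(runs):
        site = random.choice(list(deployment))       # site serving this request
        base = deployment[site]["base_latency_ms"]
        utilisation = min(per_site_rate / deployment[site]["capacity_rps"], 0.99)
        total += base / (1.0 - utilisation)          # slowdown grows with utilisation
    return total / runs

edge_heavy = {"edge-a": {"base_latency_ms": 5,  "capacity_rps": 150},
              "edge-b": {"base_latency_ms": 6,  "capacity_rps": 150}}
cloud_only = {"dc-1":   {"base_latency_ms": 40, "capacity_rps": 1000}}

for name, cfg in (("edge-heavy", edge_heavy), ("cloud-only", cloud_only)):
    print(f"{name}: {simulate(cfg, request_rate=100.0):.1f} ms")
```

Even this crude model lets a system manager rank configurations before touching real infrastructure, which is the point of simulation-driven capacity planning.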
RECAP modelling tools will investigate and enhance existing performance modelling approaches in order to fulfil the requirements of DevOps for highly connected and dynamic Cloud applications. Workload characterisation and the inference of behaviour models will be carried out with the aim of a comprehensive and robust solution. Through the analysis of the correlation between workload patterns and application behaviour, the RECAP analytics methods will provide application developers with the methodology and tools to identify, characterise, and understand complex application behaviours.
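One simple form such workload characterisation can take is clustering monitored intervals into recurring load patterns and relating each pattern to observed application behaviour. The sketch below uses scikit-learn's KMeans on an invented trace; the features and data are assumptions for illustration, not RECAP's analytics methods:

```python
# Illustrative workload characterisation: cluster per-interval request
# rates into patterns and report the application behaviour seen in each.
import numpy as np
from sklearn.cluster import KMeans

# toy trace: (requests/s, mean response time in ms) per 5-minute interval
trace = np.array([[ 20,  35], [ 25,  38], [ 22,  36],    # quiet periods
                  [180, 120], [190, 140], [175, 115]])   # burst periods

# cluster on the load feature only; behaviour is summarised per cluster
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(trace[:, :1])
for pattern in range(2):
    rows = trace[kmeans.labels_ == pattern]
    print(f"pattern {pattern}: mean load {rows[:, 0].mean():.0f} req/s, "
          f"mean response {rows[:, 1].mean():.0f} ms")
```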
The RECAP application models will take advantage of the DevOps way of defining applications, as well as of model-driven deployment approaches, to automatically identify the dependencies and interactions among application components, tightly coupled with live infrastructure monitoring information. The application structure will be a starting point for the workload models that help developers and operators identify and characterise key performance indicators for applications and components.
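As a sketch of how such a component topology can seed a workload model, load entering the application can be propagated along call dependencies. The graph, the fan-out factors, and the function below are illustrative assumptions, not RECAP's model format:

```python
# Propagate an entry request rate through an acyclic component topology.
# Edge weights are calls issued per incoming request (illustrative values).
def propagate_load(entry_rate: float, edges: dict, entry: str = "frontend") -> dict:
    """Return the request rate arriving at each component."""
    # Process components in topological order so each component's full
    # input rate is known before its outgoing calls are expanded.
    indegree = {}
    for caller, callees in edges.items():
        indegree.setdefault(caller, 0)
        for callee, _ in callees:
            indegree[callee] = indegree.get(callee, 0) + 1
    rates = {entry: entry_rate}
    ready = [node for node, degree in indegree.items() if degree == 0]
    while ready:
        current = ready.pop()
        for callee, calls_per_request in edges.get(current, []):
            rates[callee] = rates.get(callee, 0.0) + rates.get(current, 0.0) * calls_per_request
            indegree[callee] -= 1
            if indegree[callee] == 0:
                ready.append(callee)
    return rates

# frontend issues 2 catalogue calls and 1 order call per request, and so on
edges = {"frontend":  [("catalogue", 2.0), ("orders", 1.0)],
         "orders":    [("database", 3.0)],
         "catalogue": [("database", 1.0)]}
print(propagate_load(100.0, edges))
# {'frontend': 100.0, 'catalogue': 200.0, 'orders': 100.0, 'database': 500.0}
```

Per-component rates of this kind are a natural starting point for the key performance indicators mentioned above, since they expose where load concentrates in the topology.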
The RECAP Optimiser will not target unilateral optimisation actions from the point of view of either the applications or the infrastructure, but will consider both aspects at the same time. The reasoning process behind resource selection is the main target of RECAP. Most systems today are not context-aware and ignore input parameters that should be considered for highly dynamic Cloud applications, such as user locations or the available networking infrastructure.
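The joint view can be illustrated by scoring candidate placements on an application-side metric (user-perceived latency, driven by user locations and network distance) together with an infrastructure-side metric (hosting cost). All site names, numbers, and the weighting below are assumptions for this sketch, not the RECAP Optimiser's actual reasoning process:

```python
# Joint application/infrastructure placement sketch: exhaustively score
# candidate placements; data and weighting are illustrative assumptions.
from itertools import product

sites = {"edge-dublin":  {"cost": 3.0, "latency_ms": {"ireland": 5,  "germany": 30}},
         "dc-frankfurt": {"cost": 1.0, "latency_ms": {"ireland": 25, "germany": 5}}}
users = {"ireland": 0.6, "germany": 0.4}   # share of requests per user location
components = ["api", "cache"]

def score(placement: dict, weight: float = 0.5) -> float:
    """Weighted sum of mean user latency and total infrastructure cost."""
    latency = sum(share * sites[placement[c]]["latency_ms"][region]
                  for c in components
                  for region, share in users.items()) / len(components)
    cost = sum(sites[placement[c]]["cost"] for c in components)
    return weight * latency + (1 - weight) * cost

best = min((dict(zip(components, choice))
            for choice in product(sites, repeat=len(components))),
           key=score)
print(best, round(score(best), 2))
```

Because the score couples user locations (context the paragraph above notes most systems ignore) with infrastructure cost, neither side can be optimised unilaterally at the expense of the other.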
More info: http://recap-project.eu/.