Programming computers is error-prone: software errors are pervasive, with results ranging from mere annoyance (a crashing Web browser) to life-threading system failure in critical systems (such as planes and cars). In addition to spectacular, but rare, injuries caused by...
Programming computers is error-prone: software errors are pervasive, with results ranging from mere annoyance (a crashing Web browser) to life-threading system failure in critical systems (such as planes and cars). In addition to spectacular, but rare, injuries caused by software malfunction, the cumulative effect of software errors amounts to a large economic loss (for instance, eWEEK predicted a US$500 million starting point for the cost of the Heartbleed SSL bug uncovered in 2014). It is thus not surprising that a significant effort and budget in a software development (over 50% for critical systems) is devoted to ensuring that software are correctly implemented. Currently, software correctness is mainly ensured through testing, i.e., monitoring the program on a selected set of executions. Tests are effective at uncovering bugs quickly, but can never ensure the freedom from bugs, because the tested executions only account for a fraction of all possible executions. Bugs can still lurk in untested corners of the software. Even achieving a significant code coverage requires a large effort, and will not scale up to the size and complexity of future software.
The Mopsa project aims to improve the effectiveness and cost of program verification, which would directly benefit our software-reliant society. To achieve this goal, the project departs from testing and considers formal verification methods, which employ logic to reason on programs and provide rigorous mathematical guarantees on their correctness. Due to the inherent indecidability of the verification problem, several flavors of formal methods have been proposed, with various trade-offs in generality and automation. The focus of the project is on semantic static analysis, which relies on approximations to provide tools that are fully automated and work directly on the source code without requiring user help. Full automation makes static analysis very attractive in an industrial context, as it is usable by engineers not versed in formal methods. Moreover, the approximation is sound: by erring on the safe side (for instance, over-approximating the range of variables when exact computation is not possible or impractical), the analysis infers true facts on the program behavior, and can provably demonstrate its correctness with respect to well-defined criteria. But approximations cause incompleteness: not every true fact about a program can be inferred, so that the analysis can fail to prove that a correct program is correct, an unfortunate effect we strive to minimize. The project is based on the theory of abstract interpretation, a formal framework to design static analyses that are sound by construction and parameterized by the choice of an abstract (i.e., approximate) domain of interpretation, providing a fine tuning between cost and precision. Abstract interpretation consists in defining formally the semantics of the programs and the correctness properties of interest, and then designing abstract domains that achieve a sufficient precision while ignoring (for efficiency) aspects that are not relevant to the verification problem.
In the last two decades, static analyzers by abstract interpretation have enjoyed a growing success and moved form academic research to industry, with the availability of tools such as Polyspace (The Mathworks) and Astrée (AbsInt). While successful in the embedded critical industry, such tools remain focused on a narrow set of programming languages (statically typed languages, such as C, Java, or Ada), highly specific applications and development practices (the embedded software industry), and low-level correctness properties (the absence of run-time errors). Other classes of tools analyze larger code bases but at the cost of relaxing soundness guarantees, reducing the confidence in the analysis results.
The objective of the Mopsa project is to improve the state of sound semantic static analysis methods to the point where they can be used by pr
We have worked along several directions to advance the state of the art in static analysis by abstract interpretation: analyses for dynamic languages, modular analyses for scalability, the analysis of novel data-structures, and the analysis of novel properties to ease software evolution.
Firstly, we considered the static analysis of programs in Python, a recent and popular language featuring dynamic typing, complex control-flow (such as generators), and a dynamically-alterable class and object system. It also has no formally defined semantics. These aspects make the formal static analysis of Python programs very challenging, compared to the more static languages considered in existing static analyses. We started by formalizing the semantics of a representative fragment of Python, using a form of semantics able to express properties of variable values and exceptions, that is also amenable to approximate computation by abstract interpretation. We then proposed abstract domains leading to effective analyses: a domain for simple scalar and object values, a domain to handle generators, and a more complex domain to represent numeric relations. We are pursuing ongoing work on a type-based domain that merges ideas from ML-style parametric type systems, gradual typing, and class and object based typing.
We also considered the scalability problem, choosing here a more classic language target (C). We developed modular methods able to analyze a program one component at a time, and reuse and compose the results into a full-program analysis. This work includes techniques to lift existing non-modular analyses to modular ones, and novel abstractions dedicated to the specific needs of modular analyses. In particular, we obtained preliminary results for framing methods that ease the reuse of analysis results by isolating the part of a program state relevant to a function analysis.
We considered the analysis of two pervasive data-structures: C-style null-terminating strings, and matrices as used in linear algebra libraries. For both types, we developed specific abstract domains that can be plugged into an analyzer to enhance the analysis precision of programs featuring these data-structures.
We started the study of novel program properties beyond safety, useful for the maintenance and evolution of software. We built the first steps for a fully semantic and sound analysis of software patches. It is based on a novel notion of semantic difference between program versions, and it can exploit existing and novel abstract domains to design effective analyzers.
Finally, throughout the project, we are developing a novel software platform to integrate these analyses, and analyze C and Python software.
The static analysis of dynamic languages aiming towards soundness is a recent field. Past results focus mainly on JavaScript. We have proposed the first formal semantics, and the first static analysis, for a realistic subset of Python, that can already handle small programs without modification. We plan to add support for the extensive Python standard library in order to demonstrate the analysis on actual applications. The developed analysis framework will allow a wide spectrum of analyses, from precise value analyses to efficient type analyses.
Additionally, the state of the art uses costly monolithic analyses to achieve the degree of precision required for program verification, while very scalable modular analyses exist but are not well adapted to sound and precise formal verification. We are bridging the gap between these two extremes by designing new classes of modular analyses with a high level of precision, leveraging novel abstract domains. We have demonstrated that modular analyses can be designed to verify string and matrix manipulation routines on programs with a realistic, low-level, C semantics. We expect to develop this method to the point where it can analyze full C applications, designing new frame abstractions in the process. We will also propose similar methods adapted to the analysis of Python programs.
Analyses taking into account the evolution of software are also relatively news. Our analysis of software patches has the advantage compared to the state of the art of being based on a formal semantics in abstract interpretation style, permitting the use of classic abstraction techniques. We will consider realistic classes of software patches and design novel abstractions adapted to their efficient and precise analysis. We also plan to design a novel portability analysis, which will complement the current analysis by taking into account the evolution of the execution environment of the software, in addition to the evolution of the software code.
The design of our static analysis platform advances the state of the art by allowing a higher flexibility on the definition and composition of abstractions, up to the point where analyses for as widely different languages as C and Python can be constructed from a common pool of components sharing the same interface. The platform will continue integrating the abstractions developed in the future. We plan to release it as open-source software and showcase the available analyses through a web interface.
More info: http://mopsa.lip6.fr.