Automatic reproducibility of COMPSs experiments through the integration of RO-Crate in Chameleon
- Topics: Provenance, reproducibility, standards, image creation
- Skills: Python, JSON, Bash scripting, Linux, image creation and deployment
- Difficulty: Medium
- Size: Large (350 hours)
- Mentors: Raül Sirvent
Project Idea Description
The COMPSs programming model provides an interface for writing a sequential application that is transformed into a workflow, which the COMPSs runtime then schedules on the available computing resources. Bindings make the model available in several languages: Java, C/C++ and Python (the latter named PyCOMPSs). COMPSs can generate Workflow Provenance information after an experiment has run. The resulting artifact (code + data + recorded metadata) makes it possible to share results through tools such as the WorkflowHub portal, which can generate a DOI for the results so that they can be included as permanent references in scientific papers.
The metadata generated in COMPSs experiments follows the RO-Crate specification and, more specifically, two of its profiles: Workflow RO-Crate and Workflow Run Crate. This metadata enables not only the sharing of results, but also their reproducibility.
This project proposes the creation of a service that enables the automatic reproducibility of COMPSs experiments in the Chameleon infrastructure. The service will take a COMPSs crate (an artifact that follows the RO-Crate specification) and, by parsing the available metadata, build a Chameleon-compatible image for reproducing the experiment on the testbed. Small modifications to the COMPSs RO-Crate are foreseen (e.g. the inclusion of third-party software required by the application).
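As a rough illustration of the metadata-parsing step, the sketch below reads an RO-Crate metadata document (`ro-crate-metadata.json`), follows the metadata descriptor to the root dataset and its `mainEntity` (the workflow), and collects what an image build would need. It is only a sketch under stated assumptions: the function names, the sample crate contents (`matmul.py`, `#numpy`) and the use of `softwareRequirements` to carry third-party dependencies are hypothetical, since how dependencies will be recorded in the COMPSs crate is precisely one of the modifications this project has to design.

```python
import json

# Hypothetical, heavily trimmed crate metadata used only for illustration.
EXAMPLE_METADATA = json.dumps({
    "@context": "https://w3id.org/ro/crate/1.1/context",
    "@graph": [
        {"@id": "ro-crate-metadata.json", "@type": "CreativeWork",
         "about": {"@id": "./"}},
        {"@id": "./", "@type": "Dataset",
         "mainEntity": {"@id": "matmul.py"}},
        {"@id": "matmul.py",
         "@type": ["File", "SoftwareSourceCode", "ComputationalWorkflow"],
         # ASSUMPTION: this property is not confirmed as the COMPSs approach.
         "softwareRequirements": [{"@id": "#numpy"}]},
        {"@id": "#numpy", "@type": "SoftwareApplication", "name": "numpy"},
    ],
})


def load_graph(metadata_text):
    """Index the entities of an RO-Crate metadata document by their @id."""
    document = json.loads(metadata_text)
    return {entity["@id"]: entity for entity in document["@graph"]}


def find_root(entities):
    """Locate the root dataset through the metadata descriptor's `about` link."""
    descriptor = entities["ro-crate-metadata.json"]
    return entities[descriptor["about"]["@id"]]


def build_plan(metadata_text):
    """Return the workflow entry point and the packages an image would need."""
    entities = load_graph(metadata_text)
    workflow = entities[find_root(entities)["mainEntity"]["@id"]]
    requirements = workflow.get("softwareRequirements", [])
    if isinstance(requirements, dict):  # JSON-LD allows a single value or a list
        requirements = [requirements]
    packages = [entities[ref["@id"]]["name"] for ref in requirements]
    return {"entry_point": workflow["@id"], "packages": packages}


if __name__ == "__main__":
    print(build_plan(EXAMPLE_METADATA))
```

A real service would read the metadata file from the unpacked crate and turn the resulting plan into provisioning steps for a Chameleon image; both parts are left out here because they depend on design decisions the project still has to make.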
Project Deliverables
- Study the different environments and specifications (COMPSs, RO-Crate, Chameleon, Trovi, …).
- Design the most appropriate integration, considering all the elements involved.
- Integrate the reproducibility of basic PyCOMPSs experiments in Chameleon.
- Integrate the reproducibility of complex PyCOMPSs experiments in Chameleon (e.g. with third-party software dependencies).