A Web-Based Tool for Executing Computational Pipelines in HPC Environments


Introduction

Workflow management software is now widely used across many data-processing fields. By providing a graphical interface for configuring and running computational tools, such software reduces coding errors and helps users from different disciplines analyze their data. Workflow tools designed for managing and executing computational pipelines streamline pipeline setup and support precise analysis of outputs, error tracking, pipeline reuse, reporting, and more; achieving these results manually with command-line tools is often too complex for non-specialist users. Access levels allow users, whether experts or novices, to create, manage, or execute processing workflows as appropriate.

Overview

The tool presented in this study is a web-based graphical interface adhering to workflow standards. It serves users with varying levels of expertise, enabling them to create and execute computational pipelines in HPC environments as directed acyclic graphs (DAGs), with a focus on big data in bioinformatics. DAGs simplify and accelerate many data-processing algorithms. In bioinformatics, these graphs, which contain no cyclic dependencies, are widely used to process large datasets that require various tools for normalization, sorting, searching, and so on. Each tool functions as a graph node, and nodes are processed in the order given by their directed edges. DAGs exclude self-referencing nodes and ensure unidirectional data flow, much like a pipeline in which water flows in one direction. For example, a simple pipeline might be modeled as follows:

  • Node A fetches data from an API.
  • Node B removes duplicate data using a suitable tool.
  • Node D sorts the data with a sorting tool.
  • Node E prepares the data for database storage.
  • Node F saves the data to the database.

When a computational pipeline is converted into a DAG, each node represents a step carried out by a single tool. The output of one tool serves as the input of the next, and this chain can extend to *n* steps as required.
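
The execution order implied by such a graph can be made explicit with a topological sort. The following is a minimal sketch in Python, not the tool's actual implementation; the node names and the run_step helper are hypothetical and stand in for the configured tools.

```python
# Minimal sketch: a pipeline modeled as a DAG and executed in
# topological order. Node names and run_step are hypothetical.
from graphlib import TopologicalSorter  # Python 3.9+

# Each key lists the nodes it depends on (its incoming edges).
pipeline = {
    "fetch":       set(),             # Node A: fetch data from an API
    "deduplicate": {"fetch"},         # Node B: remove duplicate data
    "sort":        {"deduplicate"},   # Node D: sort the data
    "prepare":     {"sort"},          # Node E: prepare data for storage
    "save":        {"prepare"},       # Node F: save the data to the database
}

def run_step(name: str) -> None:
    # Placeholder for invoking the tool bound to this node.
    print(f"running step: {name}")

# static_order() yields each node only after all of its predecessors;
# it raises CycleError if the graph contains a cycle.
for step in TopologicalSorter(pipeline).static_order():
    run_step(step)
```

Because the graph is acyclic, every step has a well-defined position in this order, mirroring the one-directional flow described above.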

Implementation

Our tool consists of three main components: the frontend, the backend, and the HPC computational cluster. The frontend, which renders the user interface, was developed with HTML5 and CSS3, together with Bootstrap, JavaScript, and Vue.js, to improve usability and user experience. The backend, the core of the tool, was developed in PHP 7. We adopted the MVC (Model-View-Controller) pattern for readability, maintainability, testability, and scalability, using Laravel as the PHP framework. MySQL serves as the default database for storing user inputs and execution histories, though other databases such as PostgreSQL or SQLite can be configured if needed.
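
The text does not state how the backend hands work to the computational cluster; purely as an illustration, the sketch below assumes a Slurm-managed cluster and shows how a backend process could submit one pipeline step as a batch job. The sbatch invocation, script path, and job-name convention are assumptions, not documented behavior of the tool.

```python
# Hypothetical sketch: a backend submitting one pipeline step to an
# HPC scheduler. Assumes Slurm (sbatch) is available on the host;
# the script path and naming scheme are illustrative only.
import subprocess

def submit_step(job_name: str, script_path: str) -> str:
    """Submit a batch script and return the scheduler's job id."""
    result = subprocess.run(
        ["sbatch", "--parsable", "--job-name", job_name, script_path],
        capture_output=True, text=True, check=True,
    )
    # With --parsable, sbatch prints only the job id (optionally followed
    # by ";cluster"), which can be stored alongside the execution history.
    return result.stdout.strip().split(";")[0]

if __name__ == "__main__":
    job_id = submit_step("deduplicate-step", "steps/deduplicate.sh")
    print(f"submitted job {job_id}")
```

A job id returned this way could be recorded in the MySQL execution history so that the web interface can later report the status of each step.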

Features

Our tool provides a range of features through its web-based graphical interface. Creating computational tools is straightforward for expert users who are familiar with the computational cluster and the tools configured on the server: they design forms for inputs, parameters, and outputs through the graphical interface, and can also specify commands, parameter fields, and execution methods, making their tools available to less technical users. Created pipelines are stored in the tool's database for reuse, so regular users can obtain outputs simply by providing the required inputs and parameters. Notably, every step, including tool creation, pipeline design, and execution, is accessible through the web interface, with no need to write code or use the command line.
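
The schema the tool uses to describe commands, inputs, parameters, and outputs is not given in the text; the sketch below is a hypothetical illustration of the idea, showing how a form-like tool definition could be rendered into a runnable command line. The field names and the samtools example are assumptions.

```python
# Hypothetical illustration: a form-based tool definition rendered into a
# command line. The field names and the "samtools sort" example are
# assumptions, not the tool's actual schema.

# An expert user might define something like this through the web forms:
tool_definition = {
    "name": "sort-bam",
    "command": "samtools sort -@ {threads} -o {output} {input}",
    "parameters": {"threads": 4},   # default values shown in the form
    "inputs": ["input"],
    "outputs": ["output"],
}

def render_command(definition: dict, values: dict) -> str:
    """Fill the command template with user-supplied inputs and parameters."""
    merged = {**definition["parameters"], **values}
    return definition["command"].format(**merged)

# A regular user only supplies inputs (and, optionally, parameters):
cmd = render_command(tool_definition, {"input": "reads.bam", "output": "sorted.bam"})
print(cmd)  # samtools sort -@ 4 -o sorted.bam reads.bam
```

In the same spirit, stored definitions let non-expert users run a pipeline by filling in inputs and parameters alone, without touching the command line.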
