Soma-workflow: A unified and simple interface to parallel computing resources
Parallel computing resources are now widely available: multi-core machines, clusters and grids. Soma-workflow is a unified and simple interface to parallel computing resources, which aims to make them easier to use for non-expert users and software.
Soma-workflow is an open source Python application.
Contents
- Soma-workflow concepts
- Important changes in Soma-Workflow 3
- Graphical User Interface
- Examples
- MPI workflow runner
- Tips, Good Practices and Errors
- Status list
- Workflow creation API
- Python API
- Supporting new computing resources: writing a Scheduler implementation
- Installation and configuration
- Changelog
Licence
Soma-Workflow is free software, distributed under the CeCILL-B licence, which is similar to the BSD licence (with clarifications for French law).
Download
Quick start on a multi-core machine
Requirements: Python 3 or later. For the GUI: Qt 5 or later, PyQt or PySide, and optionally matplotlib <http://matplotlib.sourceforge.net/>. As of Soma-workflow 3.2, Python 2.7 is no longer supported.
We recommend installing Soma-workflow in a local directory: no special rights are required, and you can clean up at any time simply by removing the directory.
Create a local directory such as ~/.local/lib/python3.10/site-packages and create the bin directory: ~/.local/bin
Set up the environment variables with the following commands:
$ export PYTHONPATH=$HOME/.local/lib/python3.10/site-packages:$PYTHONPATH
$ export PATH=$HOME/.local/bin:$PATH
You can copy these lines into your ~/.bashrc so the variables are set automatically at login.
Download the latest tarball from PyPI and extract it.
Install Soma-workflow in the ~/.local directory:
$ python setup.py install --user
If you chose a different name for your local directory (e.g. ~/mylocal), use the following command instead:
$ python setup.py install --prefix ~/mylocal
Run the GUI:
$ soma_workflow_gui
Run the documentation examples.
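For a first test of the Python API, the minimal sketch below submits a single job to the local machine and waits for it to finish. The command and names are placeholder examples; the calls follow the workflow creation and Python APIs documented in these pages:

from soma_workflow.client import Job, Workflow, WorkflowController, Helper

# A job wraps a command line; "sleep 5" is only a placeholder example.
job = Job(command=["sleep", "5"], name="example job")

# A workflow groups jobs; this one holds a single job and no dependencies.
workflow = Workflow(jobs=[job], name="example workflow")

# With no arguments, the controller uses the local machine as computing resource.
controller = WorkflowController()

workflow_id = controller.submit_workflow(workflow=workflow,
                                         name="quick start example")

# Block until every job of the workflow has finished.
Helper.wait_workflow(workflow_id, controller)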
Main Features
- Unified interface to multiple computing resources:
Jobs and workflows are submitted through a single interface to various parallel resources: multi-core machines or clusters managed by various systems (such as Grid Engine, Condor, Torque/PBS, LSF...).
- Workflow management:
Soma-workflow makes it possible to submit a set of tasks (called jobs) with execution dependencies, without dealing with individual task submission (see the first example after this list).
- Python API and Graphical User Interface:
The Python API was designed to be easy to use for non-expert users, while remaining complete enough to meet the needs of external software: submission, control and monitoring of jobs and workflows. The GUI provides an easy and quick way of monitoring workflows on various computing resources. Workflows can also be submitted and controlled using the GUI.
- Quick start on multi-core machines:
Soma-workflow works out of the box on any multi-core machine.
- Transparent remote access to computing resources:
When the computing resource is remote, Soma-workflow can be used as a client-server application. Communication with the remote computing resource goes transparently through an SSH port-forwarding tunnel. The client-server architecture lets the user close the client application at any time: workflow and job execution is not stopped, and a client can be opened again at any time to check the status of the work.
- File transfer and file path mapping tools:
If the user’s machine and the remote computing resource do not share a file system, Soma-workflow provides tools to handle file transfers and/or path name mapping (see the second example after this list).
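As an illustration of the workflow management and Python API features above, the following sketch builds a small workflow of dependent jobs, submits it and monitors it. The commands and names are placeholder examples; the local machine is used by default, and a configured resource id, login and password can be passed to WorkflowController to reach a remote resource instead:

from soma_workflow.client import Job, Workflow, WorkflowController, Helper

# Three placeholder jobs: job3 must wait for job1 and job2 to finish.
job1 = Job(command=["echo", "step 1"], name="job1")
job2 = Job(command=["echo", "step 2"], name="job2")
job3 = Job(command=["echo", "step 3"], name="job3")

workflow = Workflow(jobs=[job1, job2, job3],
                    dependencies=[(job1, job3), (job2, job3)],
                    name="dependent jobs")

# Local machine by default; pass a configured resource id, login and
# password here to submit through the SSH tunnel to a remote cluster.
controller = WorkflowController()

workflow_id = controller.submit_workflow(workflow=workflow,
                                         name="dependency example")

Helper.wait_workflow(workflow_id, controller)

# Final status of the workflow (the same information the GUI displays).
print(controller.workflow_status(workflow_id))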
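When there is no shared file system, the file transfer tools mentioned above come into play. The sketch below declares client-side files as FileTransfer objects and lets Soma-workflow move them; the paths and names are hypothetical examples:

from soma_workflow.client import (Job, Workflow, WorkflowController, Helper,
                                  FileTransfer)

# Client-side files; the paths on the computing resource are handled by
# Soma-workflow.
input_file = FileTransfer(is_input=True,
                          client_path="/tmp/example_input",
                          name="input file")
output_file = FileTransfer(is_input=False,
                           client_path="/tmp/example_output",
                           name="output file")

# FileTransfer objects can be used directly in the command; they are replaced
# by the corresponding paths on the computing resource at execution time.
job = Job(command=["cp", input_file, output_file],
          name="copy with transfers",
          referenced_input_files=[input_file],
          referenced_output_files=[output_file])

workflow = Workflow(jobs=[job], name="file transfer example")

controller = WorkflowController()  # or a remote resource id, login and password
workflow_id = controller.submit_workflow(workflow=workflow,
                                         name="file transfer example")

# Push the input files to the resource, wait, then retrieve the outputs.
Helper.transfer_input_files(workflow_id, controller)
Helper.wait_workflow(workflow_id, controller)
Helper.transfer_output_files(workflow_id, controller)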
New
In version 3.2:
Python 2 support has been dropped.
Optimization for scalability: performance of the engine and database has been much improved in order to manage very large workflows, especially workflows of independent jobs (large iterations). Where Soma-workflow <= 3.1 used to reach a limit of around 5,000-10,000 jobs (with timeouts causing aborts and hangs in the engine), version 3.2 has been tested with up to 500,000 jobs in local and MPI modes. The MPI mode is by far the most efficient for such large workflows (processing 500,000 very small jobs in about 25 minutes on an 8-core laptop). Workflow submission, engine management and scheduling have all been much improved, and processing time has dropped by several orders of magnitude. For very large workflows around the limit, however, monitoring (using either soma_workflow_gui or the client API) may add an overhead that causes timeouts in the database, which may in turn hang or crash the engine.
Documentation additions and improvements.
Support for PyQt6 is on the way (not entirely tested yet).
Added an “isolated light mode” in which all Soma-workflow data (configuration, database, temporary files and transfers) live in a separate, potentially temporary directory and do not interfere with other Soma-workflow instances and databases.
Several bug fixes.
Communications
- Communications:
MICCAI 2011 workshop High Performance and Distributed Computing for Medical Imaging, September 22nd 2011, Toronto: paper, poster
Python in Neuroscience workshop satellite to Euroscipy, August 29-30 2011, Paris: abstract, presentation
HBM Annual Meeting 2011, Quebec: poster
- They use Soma-workflow:
A fast computational framework for genome-wide association studies with neuroimaging data by Benoit Da Mota et al.