capsul.engine module¶
Main module¶
This module defines the main API to interact with Capsul processes. In order to execute a process, it is mandatory to have an instance of CapsulEngine. Such an instance can be created with the factory function capsul_engine().
Classes¶
CapsulEngine
Functions¶
database_factory()
capsul_engine()
activate_configuration()
- class capsul.engine.CapsulEngine(self, database_location, database, config=None)[source]¶
A CapsulEngine is the mandatory entry point of all software using Capsul. It contains objects to store configuration and metadata, defines execution environment(s) (possibly remote) and executes pipelines.
A CapsulEngine must be created using the capsul.engine.capsul_engine function. For instance:
    from capsul.engine import capsul_engine
    ce = capsul_engine()
Or:
    from capsul.api import capsul_engine
    ce = capsul_engine()
By default, CapsulEngine only stores necessary configuration. But it may be necessary to modify the Python environment globally to apply this configuration. For instance, Nipype must be configured globally. If SPM is configured in CapsulEngine, it is necessary to explicitly activate the configuration in order to modify the global configuration of Nipype for SPM. This activation is done by explicitly activating the execution context of the capsul engine with the following code, inside a running process:
    from capsul.engine import capsul_engine, activate_configuration

    ce = capsul_engine()
    # Nipype is not configured here
    config = ce.settings.select_configurations(
        'global', {'nipype': 'any'})
    activate_configuration(config)
    # Nipype is configured here
Note
CapsulEngine is the replacement of the older StudyConfig, which is still present in Capsul 2.2 for backward compatibility, but will disappear in later versions. In Capsul 2.2 both objects exist and are synchronized internally: a StudyConfig object also creates a CapsulEngine and vice versa, and modifications in the StudyConfig object change the corresponding item in CapsulEngine and vice versa. Functionalities of StudyConfig are moving internally to CapsulEngine, StudyConfig being merely a wrapper.
Using CapsulEngine
A CapsulEngine is used to store configuration variables and to handle execution within the configured context. The configuration has two independent axes: configuration modules, which provide additional configuration variables, and “environments”, which typically represent computing resources.
Computing resources
Capsul uses Soma-Workflow to run processes, and is thus able to connect to and execute on a remote computing server. The remote computing resource may have a different configuration from the client one (paths for software or data, available external software, etc). Configurations specific to different computing resources should therefore be handled in CapsulEngine. For this, the configuration section is split into several configuration entries, one for each computing resource.
As this is a little bit complex to handle at first, a “global” configuration (what we call “environment”) is used to maintain all common configuration options. It is typically used to work on the local machine, especially for users who only work locally.
Configuration is stored in a database (either internal or persistent), through the Settings object found in CapsulEngine.settings. Access and modification of settings should occur within a session block using with capsul_engine.settings as session. See the Settings class for details.
    >>> from capsul.api import capsul_engine
    >>> ce = capsul_engine()
    >>> config = ce.settings.select_configurations('global')
    >>> config = ce.global_config
    >>> print(config)
    {'capsul_engine': {'uses': {'capsul.engine.module.fsl': 'ALL', 'capsul.engine.module.matlab': 'ALL', 'capsul.engine.module.spm': 'ALL'}}}
Whenever a new computing resource is used, it can be added as a new environment key to all configuration operations.
Note that the settings store all possible configurations for all environments (or computing resources), but are not “activated”: this is only done at runtime in specific process execution functions: each process may need to select and use a different configuration from other ones, and activate it individually.
Process subclasses or instances may provide their configuration requirements via their requirements() method. This method returns a dictionary of request strings (one element per needed module) that will be used to select one configuration amongst the available settings entries of each required module.
Configuration modules
The configuration is handled through a set of configuration modules. Each is dedicated to a topic (for instance handling the paths of a specific external software, or managing process parameters completion, etc). A module adds a settings table in the database, with its own variables, and is able to manage runtime configuration of programs, if needed, through its activate_configurations function. Capsul comes with a set of predefined modules: attributes, axon, fom, fsl, matlab, spm
Methods
The CapsulEngine constructor should not be called directly. Use the capsul_engine() factory function instead.
- connected_to()[source]¶
Return the name of the computing resource this capsul engine is connected to or None if it is not connected.
- detailed_information(execution_id)[source]¶
Return complete (and possibly big) information about a process execution.
- dispose(execution_id, conditional=False)[source]¶
Update the database with the current state of a process execution and free the resources used in the computing resource (i.e. remove the workflow from SomaWorkflow).
If conditional is set to True, then dispose is only done if the configuration does not specify to keep succeeded / failed workflows.
- executions()[source]¶
List the execution identifiers of all processes that have been started but not disposed in the connected computing resource. Raises an exception if the computing resource is not connected.
- get_iteration_pipeline(pipeline_name, node_name, process_or_id, iterative_plugs=None, do_not_export=None, make_optional=None, **kwargs)[source]¶
Create a pipeline with an iteration node iterating the given process.
- Parameters:
pipeline_name (str) – pipeline name
node_name (str) – iteration node name in the pipeline
process_or_id (process description) – as in get_process_instance()
iterative_plugs (list (optional)) – passed to Pipeline.add_iterative_process()
do_not_export (list) – passed to Pipeline.add_iterative_process()
make_optional (list) – passed to Pipeline.add_iterative_process()
- Returns:
pipeline
- Return type:
Pipeline instance
- get_process_instance(process_or_id, **kwargs)[source]¶
The only official way to get a process instance is to use this method. For now, it simply calls self.study_config.get_process_instance but it will change in the future.
- import_configs(environment, config_dict, cont_on_error=False)[source]¶
Import config values from a dictionary as given by Settings.select_configurations().
Compared to Settings.import_configs(), this method (at CapsulEngine level) also loads the required modules.
- interrupt(execution_id)[source]¶
Try to stop the execution of a process. Does not wait for the process to be terminated.
- load_module(module_name)[source]¶
Load a module if it has not already been loaded (in this case, nothing is done).
A module is a fully qualified name of a Python module (as accepted by Python import statement). Such a module must define the two following functions (and may define two others, see below):
    def load_module(capsul_engine, module_name): ...
    def set_environ(config, environ): ...
load_module of each module is called once before reading and applying the configuration. It can be used to add traits to the CapsulEngine in order to define the configuration options that are used by the module. Values of these traits are automatically stored in configuration in database when self.save() is used, and they are retrieved from database before initializing modules.
set_environ is called in the context of the processing (i.e. on the, possibly remote, machine that runs the pipelines). It receives the configuration as a JSON compatible dictionary (for instance a CapsulEngine attribute capsul_engine.spm.directory would be config['spm']['directory']). The function must modify the environ dictionary to set the environment variables that must be defined for pipeline configuration. These variables are typically used by modules in capsul.in_context module to manage running external software with appropriate configuration.
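As an illustration of the two mandatory functions, here is a minimal sketch of what such a configuration module could look like. Only the two function signatures come from the description above; the `mytool` topic, the `MYTOOL_DIR` variable name and the function bodies are assumptions made for this example.

```python
def load_module(capsul_engine, module_name):
    # Called once before the configuration is read and applied.
    # A real module would declare its configuration options here;
    # this sketch declares nothing.
    pass


def set_environ(config, environ):
    # `config` is a JSON compatible dictionary; `environ` is the dictionary
    # of environment variables to fill in for the processing context.
    directory = config.get('mytool', {}).get('directory')
    if directory is not None:
        environ['MYTOOL_DIR'] = directory  # hypothetical variable name


environ = {}
set_environ({'mytool': {'directory': '/opt/mytool'}}, environ)
print(environ)
```

Keeping set_environ free of side effects other than filling `environ` makes it easy to call on a remote machine with nothing but the serialized configuration.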
- load_modules(require)[source]¶
Call self.load_module for each required module. The list of modules to load is located in self.modules (if it is None, capsul.module.default_modules is used).
- raise_for_status(status, execution_id=None)[source]¶
Raise an exception if a process execution failed
- start(process, workflow=None, history=True, get_pipeline=False, **kwargs)[source]¶
Asynchronously start the execution of a process or pipeline in the connected computing environment. Returns an identifier of the process execution and can be used to get the status of the execution or wait for its termination.
TODO: if history is True, an entry of the process execution is stored in the database. The content of this entry is to be defined but it will contain the process parameters (to restart the process) and will be updated on process termination (for instance to store execution time if possible).
- Parameters:
process (Process or Pipeline instance)
workflow (Workflow instance (optional - if already defined before call))
history (bool (optional)) – TODO: not implemented yet.
get_pipeline (bool (optional)) – if True, start() will return a tuple (execution_id, pipeline). The pipeline is normally the input pipeline (process) if it is actually a pipeline. But if the input process is a “single process”, it will be inserted into a small pipeline for execution. This pipeline will be the one actually run, and may be passed to wait() to set output parameters.
- Returns:
execution_id (int) – execution identifier (actually a soma-workflow id)
pipeline (Pipeline instance (optional)) – only returned if get_pipeline is True.
- capsul.engine.activate_configuration(selected_configurations)[source]¶
Activate a selected configuration (set of modules) for runtime.
- capsul.engine.activate_module(module_name)[source]¶
Activate a module configuration for runtime. This function is called by activate_configuration() and assumes the global variable capsul.engine.configurations is properly set up.
- capsul.engine.capsul_engine(database_location=None, require=None)[source]¶
User factory for creating capsul engines.
If no database_location is given, it will default to an internal (in-memory) database with no persistent settings or history values.
Configuration is read from a dictionary stored in two database entries. The first entry has the key ‘global_config’ (i.e. database.json_value(‘global_config’)); it contains the configuration values that are shared by all processing engines. The second entry is ‘computing_config’. It contains a dictionary with one item per computing resource, where the key is the resource name and the value is the set of configuration values that are specific to this computing resource.
Before initialization of the CapsulEngine, modules are loaded. The list of loaded modules is searched in the ‘modules’ value in the database (i.e. in database.json_value(‘modules’)); if no list is defined in the database, capsul.module.default_modules is used.
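The layout of the two database entries can be pictured with plain dictionaries. The sketch below is not Capsul code: the helper name `config_for_resource`, the sample values, and the rule that resource-specific values are overlaid on the shared ones are all assumptions made for illustration.

```python
import copy

# Stand-in for the two database entries described above (sample values).
database = {
    'global_config': {'spm': {'version': '12'}},
    'computing_config': {
        'my_cluster': {'spm': {'directory': '/cluster/spm12'}},
    },
}


def config_for_resource(database, resource=None):
    """Return the shared values overlaid with those of one resource.

    The overlay rule is an assumption of this sketch.
    """
    config = copy.deepcopy(database.get('global_config', {}))
    specific = database.get('computing_config', {}).get(resource, {})
    for module, values in specific.items():
        config.setdefault(module, {}).update(values)
    return config


print(config_for_resource(database, 'my_cluster'))
print(config_for_resource(database))
```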
capsul.engine.database_json submodule¶
- class capsul.engine.database_json.JSONDBEngine(json_filename)[source]¶
A JSON dictionary implementation of
capsul.engine.database.DatabaseEngine
- named_directories()[source]¶
List the names of all named directories. This method may return any iterable value (list, generator, etc.)
- set_json_value(name, json_value)[source]¶
Store a json value and associate it with a key given in “name”. The value can be retrieved with method json_value().
@param name: unique key used to identify and retrieve the value
@type name: C{string}
@param json_value: a value to store in the database
@type json_value: any JSON compatible value
- set_named_directory(name, path)[source]¶
Associate an absolute path with a generic name. This allows a location-independent name, such as ‘spm_template’, to always be used for a directory, while the real absolute path is customized in the configuration. These named directories are used when setting/retrieving path metadata with set_path_metadata() and path_metadata().
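To make the interface above concrete, here is a minimal sketch of a dictionary-backed store saved as JSON. It is not the JSONDBEngine implementation: the class name `TinyJSONStore`, the `commit()` method and the internal layout of the file are hypothetical; only the four method names used in the example come from this page.

```python
import json
import os
import tempfile


class TinyJSONStore:
    """Hypothetical dict-backed store; not the Capsul implementation."""

    def __init__(self, json_filename):
        self.json_filename = json_filename
        self.data = {'json_values': {}, 'named_directories': {}}
        if os.path.exists(json_filename):
            with open(json_filename) as f:
                self.data = json.load(f)

    def set_json_value(self, name, json_value):
        self.data['json_values'][name] = json_value

    def json_value(self, name):
        return self.data['json_values'].get(name)

    def set_named_directory(self, name, path):
        self.data['named_directories'][name] = path

    def named_directories(self):
        # Any iterable is acceptable, here a dictionary keys view.
        return self.data['named_directories'].keys()

    def commit(self):
        # Hypothetical method: write the whole dictionary back to disk.
        with open(self.json_filename, 'w') as f:
            json.dump(self.data, f, indent=2)


with tempfile.TemporaryDirectory() as tmp:
    filename = os.path.join(tmp, 'engine.json')
    store = TinyJSONStore(filename)
    store.set_json_value('modules', ['spm', 'fsl'])
    store.set_named_directory('spm_template', '/opt/spm12/templates')
    store.commit()
    reloaded = TinyJSONStore(filename)
    print(reloaded.json_value('modules'), list(reloaded.named_directories()))
```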
capsul.engine.database_populse submodule¶
- class capsul.engine.database_populse.PopulseDBEngine(database_engine)[source]¶
- named_directories()[source]¶
List the names of all named directories. This method may return any iterable value (list, generator, etc.)
- set_json_value(name, json_value)[source]¶
Store a json value and associate it with a key given in “name”. The value can be retrieved with method json_value().
@param name: unique key used to identify and retrieve the value
@type name: C{string}
@param json_value: a value to store in the database
@type json_value: any JSON compatible value
- set_named_directory(name, path)[source]¶
Associate an absolute path with a generic name. This allows a location-independent name, such as ‘spm_template’, to always be used for a directory, while the real absolute path is customized in the configuration. These named directories are used when setting/retrieving path metadata with set_path_metadata() and path_metadata().
capsul.engine.database submodule¶
- class capsul.engine.database.DatabaseEngine[source]¶
A DatabaseEngine is the base class for all engines that store, retrieve and search metadata associated with a key that can be either a string or a path (i.e. a file or directory name).
To instantiate a DatabaseEngine, one must use the factory database_factory(). To date, two concrete DatabaseEngine implementations exist: JSONDBEngine and PopulseDBEngine.
- check_path(path, named_directory=None)[source]¶
Find a pair (named_directory, path) given a path and, optionally, a named_directory.
If named_directory is not given, path must be absolute or a ValueError is raised. Then, either the corresponding named directory is found or ‘absolute’ is used.
If named_directory is given, the path must be relative (unless named_directory == ‘absolute’) or begin with the path of the named directory.
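The rules above can be sketched as a standalone function. This is an illustration, not the DatabaseEngine code: passing the named directories as an explicit `named_dirs` dictionary and returning the path relative to the named directory are assumptions of this sketch.

```python
import posixpath


def check_path(named_dirs, path, named_directory=None):
    """Return a (named_directory, path) pair following the rules above.

    `named_dirs` maps names to absolute directories (assumed layout).
    """
    if named_directory is None:
        if not posixpath.isabs(path):
            raise ValueError('path must be absolute: %s' % path)
        # Look for a named directory containing the path.
        for name, directory in named_dirs.items():
            if path.startswith(directory.rstrip('/') + '/'):
                return name, posixpath.relpath(path, directory)
        return 'absolute', path
    if named_directory == 'absolute':
        return named_directory, path
    if posixpath.isabs(path):
        # An absolute path must lie inside the given named directory.
        directory = named_dirs[named_directory]
        if not path.startswith(directory.rstrip('/') + '/'):
            raise ValueError('%s is not in %s' % (path, directory))
        return named_directory, posixpath.relpath(path, directory)
    return named_directory, path


named_dirs = {'spm_template': '/opt/spm12/templates'}
print(check_path(named_dirs, '/opt/spm12/templates/T1.nii'))
print(check_path(named_dirs, '/data/sub01.nii'))
print(check_path(named_dirs, 'T1.nii', 'spm_template'))
```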
- named_directories()[source]¶
List the names of all named directories. This method may return any iterable value (list, generator, etc.)
- set_json_value(name, json_value)[source]¶
Store a json value and associate it with a key given in “name”. The value can be retrieved with method json_value().
@param name: unique key used to identify and retrieve the value
@type name: C{string}
@param json_value: a value to store in the database
@type json_value: any JSON compatible value
- set_named_directory(name, path)[source]¶
Associate an absolute path with a generic name. This allows a location-independent name, such as ‘spm_template’, to always be used for a directory, while the real absolute path is customized in the configuration. These named directories are used when setting/retrieving path metadata with set_path_metadata() and path_metadata().
capsul.engine.module submodule¶
attributes¶
Attributes completion config module
axon¶
Configuration module which links with Axon
- capsul.engine.module.axon.check_configurations()[source]¶
Checks if the activated configuration is valid to use BrainVisa and returns an error message if there is an error or None if everything is good.
- capsul.engine.module.axon.check_notably_invalid_config(conf)[source]¶
Checks if the given module config is obviously invalid, for instance if a mandatory path is not filled
- Returns:
invalid – list of invalid config keys
- Return type:
fom¶
Config module for File Organization models (FOMs)
Classes¶
FomConfig
fsl¶
- capsul.engine.module.fsl.activate_configurations()[source]¶
Activate the FSL module (set env variables) from the global configurations, in order to use them via capsul.in_context.fsl functions.
- capsul.engine.module.fsl.check_configurations()[source]¶
Checks if the activated configuration is valid to run FSL and returns an error message if there is an error or None if everything is good.
- capsul.engine.module.fsl.check_notably_invalid_config(conf)[source]¶
Checks if the given module config is obviously invalid, for instance if a mandatory path is not filled
- Returns:
invalid – list of invalid config keys
- Return type:
matlab¶
- capsul.engine.module.matlab.check_configurations()[source]¶
Check if the activated configuration is valid for Matlab and return an error message if there is an error or None if everything is good.
python¶
Python configuration module for CAPSUL
This config module allows the customization of python executable and python path in process execution. It can (as every config module) assign specific config values for different environments (computing resources, typically).
Python configuration is slightly different from other config modules in that it cannot be handled during execution inside a python library: the python executable and modules path have to be set up before starting python and loading modules. So the config here sometimes has to be prepared on the client side and hard-coded in the job to run.
For this reason, what we call here “python jobs” are handled specially. “Python jobs” are Process classes defining a _run_process() method, and not get_commandline(). Such processes are thus python functions or methods, and need the capsul library to run.
Python jobs are handled in workflow building (capsul.pipeline.pipeline_workflow), and jobs on the engine side should not have to bother about it.
The python config module is not mandatory: if no specific configuration is needed, jobs are run using the python command from the path, following the client sys.executable short name (if the client runs /usr/bin/python3, the engine will try to use python3 from the PATH).
The python config module is used optionally (if there is a config, it is used, otherwise no error is produced), and automatically for all jobs: there is no need to declare it in the jobs' requirements() method.
Inside process execution, the module is otherwise handled like any other.
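The “short name” rule for the default interpreter can be pictured as follows. How Capsul derives the name internally is not shown on this page, so the helper `engine_python_command` is a hypothetical illustration built on the standard library only.

```python
import posixpath
import shutil


def engine_python_command(client_executable):
    """Short name of the client interpreter, to be looked up in the PATH."""
    return posixpath.basename(client_executable)


short_name = engine_python_command('/usr/bin/python3')
print(short_name)                # python3
print(shutil.which(short_name))  # full path on this machine, or None
```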
- capsul.engine.module.python.activate_configurations()[source]¶
Activate the python module from the global configurations
spm¶
- capsul.engine.module.spm.activate_configurations()[source]¶
Activate the SPM module (set env variables) from the global configurations, in order to use them via capsul.in_context.spm functions.
capsul.engine.run submodule¶
Implementation of CapsulEngine processing methods.
They have been moved to this file for clarity.
Execution always uses Soma-Workflow.
- exception capsul.engine.run.WorkflowExecutionError(controller, workflow_id, status=None, workflow_kept=True, verbose=True)[source]¶
Exception class raised when a workflow execution fails. It holds references to the WorkflowController and the workflow id.
- capsul.engine.run.detailed_information(engine, execution_id)[source]¶
Return complete (and possibly big) information about a process execution.
- capsul.engine.run.dispose(engine, execution_id, conditional=False)[source]¶
Update the database with the current state of a process execution and free the resources used in the computing resource (i.e. remove the workflow from SomaWorkflow).
If conditional is set to True, then dispose is only done if the configuration does not specify to keep succeeded / failed workflows.
- capsul.engine.run.interrupt(engine, execution_id)[source]¶
Try to stop the execution of a process. Does not wait for the process to be terminated.
- capsul.engine.run.raise_for_status(engine, status, execution_id=None)[source]¶
Raise an exception if a process execution failed
- capsul.engine.run.start(engine, process, workflow=None, history=True, get_pipeline=False, **kwargs)[source]¶
Asynchronously start the execution of a process or pipeline in the connected computing environment. Returns an identifier of the process execution and can be used to get the status of the execution or wait for its termination.
TODO: if history is True, an entry of the process execution is stored in the database. The content of this entry is to be defined but it will contain the process parameters (to restart the process) and will be updated on process termination (for instance to store execution time if possible).
- Parameters:
engine (CapsulEngine)
process (Process or Pipeline instance)
workflow (Workflow instance (optional - if already defined before call))
history (bool (optional)) – TODO: not implemented yet.
get_pipeline (bool (optional)) – if True, start() will return a tuple (execution_id, pipeline). The pipeline is normally the input pipeline (process) if it is actually a pipeline. But if the input process is a “single process”, it will be inserted into a small pipeline for execution. This pipeline will be the one actually run, and may be passed to wait() to set output parameters.
- Returns:
execution_id (int) – execution identifier (actually a soma-workflow id)
pipeline (Pipeline instance (optional)) – only returned if get_pipeline is True.
capsul.engine.settings submodule¶
This module provides classes to store CapsulEngine settings for several execution environments and to choose a configuration for a given execution environment. Settings management in Capsul has several features that make it different from classical ways of dealing with configuration:
CapsulEngine must be able to deal with several configurations for the same software. For instance, one can configure both SPM 8 and SPM 12 and choose later the one to use.
A single pipeline may use various configurations of a software. For instance a pipeline could compare the results of SPM 8 and SPM 12.
Settings definition must be modular. It must be possible to define possible settings values either in Capsul (for well-known software, for instance) or in external modules that can be installed separately.
Capsul must deal with module dependencies. For instance the settings of SPM may depend on the settings of Matlab. But this dependency exists only if a non-standalone SPM version is used. Therefore, dependencies between modules may depend on settings values.
CapsulEngine settings must provide the possibility to express a requirement on settings. For instance a process may require a version of SPM greater than or equal to 12.
The configuration of a module can be defined for a specific execution environment. Settings must allow dealing with several execution environments (e.g. a local machine and a computing cluster). Each environment may have a different configuration (for instance the SPM installation directory is not the same on the local computer and on a computing cluster).
To implement all these features, it was necessary to have a settings storage system providing a query language to express requirements such as spm.version >= 12. Populse_db was thus chosen as the storage and query system for settings. Some of the settings API choices have been influenced by populse_db API.
CapsulEngine settings are organized in modules. Each module defines and documents the schema of values that can be set for its configuration. Typically, a module is dedicated to a software. For instance the module for SPM accepts configurations containing a version (a string), an install directory (a string), a standalone/matlab flag (a boolean), etc. This schema is used to record configuration documents for the module. There can be several configuration documents per module. Each document corresponds to a full configuration of the module (for instance a document for SPM 8 configuration and another for SPM 12, or one for SPM 12 standalone and another for SPM 12 with matlab).
Settings cannot be used directly to configure the execution of a software. It is necessary to first select a single configuration document for each module. This configuration selection step is done by the Settings.select_configurations() method.
- class capsul.engine.settings.Settings(populse_db)[source]¶
Main class for the management of CapsulEngine settings. Since these settings are always stored in a populse_db database, it is necessary to activate a settings session in order to read or modify settings. This is done by using a with clause:
    from capsul.api import capsul_engine

    # Create a CapsulEngine
    ce = capsul_engine()
    with ce.settings as settings:
        # Read or modify settings here
        conf = settings.new_config('spm', 'global',
                                   {'version': '12', 'standalone': True})
        # modify value
        conf.directory = '/usr/local/spm12-standalone'
Create a settings instance using the given populse_db instance
- import_configs(environment, config_dict, cont_on_error=False)[source]¶
Import config values from a dictionary as given by select_configurations().
Compared to CapsulEngine.import_configs(), this method (at Settings level) does not load the required modules.
- static module_name(module_name)[source]¶
Return a complete module name (which must be a valid Python module name) given a possibly abbreviated module name. This method must be used whenever a module name is written by a user (for instance in a configuration file). This method adds the prefix ‘capsul.engine.module.’ if the module name does not contain a dot.
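The expansion rule described above is small enough to sketch directly. This is a standalone re-statement of the documented behavior, not a copy of the Capsul source.

```python
def module_name(name):
    """Expand a possibly abbreviated settings module name."""
    if '.' not in name:
        return 'capsul.engine.module.' + name
    return name


print(module_name('spm'))
print(module_name('my_package.my_config_module'))
```

A name that already contains a dot is returned unchanged, which is what lets externally installed modules be referenced by their full Python path.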
- select_configurations(environment, uses=None, check_invalid_mods=False)[source]¶
Select a configuration for a given environment. A configuration is a dictionary whose keys are module names and values are configuration documents. The returned set of configurations per module can be activated with capsul.api.activate_configuration().
The uses parameter determines which modules must be included in the configuration. If not given, this method considers all configurations for every module defined in settings. This parameter is a dictionary whose keys are module names and values are populse_db queries used to select a configuration of the module.
The environment parameter defines the execution environment in which the configurations will be used. For each module, configurations are filtered with the query. First, values are searched in the given environment and, if no result is found, the ‘global’ environment (the value defined in Settings.global_environment) is used.
If check_invalid_mods is True, then each selected config module is checked for missing values and discarded if any are missing.
Example
To select a SPM version greater than 8 for an environment called ‘my_environment’ one could use the following code:
    config = ce.settings.select_configurations('my_environment', uses={'spm': 'version > 8'})
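The environment fallback described for select_configurations() can be illustrated without a database. In the sketch below, Python predicates stand in for populse_db query strings, the `documents` layout is invented for the example, and taking the first matching document is an assumption (this page does not say how several matches are resolved).

```python
def select_configurations(documents, environment, uses):
    """Pick one configuration document per module.

    documents: {module: {environment: [config dict, ...]}} (assumed layout)
    uses: {module: predicate taking a config dict}
    """
    selected = {}
    for module, accept in uses.items():
        per_env = documents.get(module, {})
        # Search the requested environment first, then fall back to 'global'.
        for env in (environment, 'global'):
            matches = [doc for doc in per_env.get(env, []) if accept(doc)]
            if matches:
                selected[module] = matches[0]
                break
    return selected


def newer_than_8(doc):
    return int(doc['version']) > 8


documents = {
    'spm': {
        'global': [{'version': '8'}, {'version': '12'}],
        'my_cluster': [{'version': '8'}],
    },
}
print(select_configurations(documents, 'my_cluster', {'spm': newer_than_8}))
```

Here the cluster only declares SPM 8, so the request falls back to the ‘global’ environment and selects the SPM 12 document.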
- class capsul.engine.settings.SettingsSession(populse_session, module_notifiers=None)[source]¶
Settings use/modification session, returned by “with settings as session:”
SettingsSession are created with Settings.__enter__ using a with statement.
- static collection_name(module)[source]¶
Return the name of the populse_db collection corresponding to a settings module. The result is the full name of the module prefixed by Settings.collection_prefix (i.e. ‘settings/’).
- config(module, environment, selection=None, any=True)[source]¶
Selects configurations (like in configs()) and ensures at most one is selected.
- Parameters:
- Returns:
config – None if no matching config is found, or if more than one is found and any is False
- Return type:
SettingsConfig instance or None
- configs(module, environment, selection=None)[source]¶
Returns a generator that iterates over all configuration documents created for the given module and environment.
- ensure_module_fields(module, fields)[source]¶
Make sure that the given module exists in settings and create the given fields if they do not exist. fields is a list of dictionaries with three items:
- name: the name of the key
- type: the data type of the field (in populse_db format)
- description: the documentation of the field
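As an illustration of the expected shape of fields, the sketch below builds such a list and checks that every entry carries the three items. The field names, the type strings (`'string'`, `'boolean'`) and the `missing_keys` helper are placeholders chosen for this example; the exact type values expected by populse_db should be taken from its documentation.

```python
REQUIRED_KEYS = {'name', 'type', 'description'}

# Field names, type strings and descriptions are placeholders for this sketch.
fields = [
    {'name': 'directory', 'type': 'string',
     'description': 'Directory where the software is installed'},
    {'name': 'standalone', 'type': 'boolean',
     'description': 'True if the standalone version is used'},
]


def missing_keys(field):
    """Return the required keys that a field description lacks, sorted."""
    return sorted(REQUIRED_KEYS - set(field))


for field in fields:
    assert not missing_keys(field), missing_keys(field)
print([field['name'] for field in fields])
```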
- new_config(module, environment, values)[source]¶
Creates a new configuration document for a module in the given environment. values is a dictionary used to set values for the document. The document must have a unique string identifier in the Settings.config_id_field (i.e. ‘config_id’); if None is given in values, a unique random value is created (with uuid.uuid4()).
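The identifier rule can be sketched in isolation. The helper `with_config_id` below is hypothetical and only mirrors the behavior described above (keep a given identifier, otherwise generate one with uuid.uuid4()); it does not touch a database.

```python
import uuid

CONFIG_ID_FIELD = 'config_id'  # field name as given in the text above


def with_config_id(values):
    """Return a copy of `values` that is guaranteed to carry an identifier."""
    document = dict(values)
    if document.get(CONFIG_ID_FIELD) is None:
        document[CONFIG_ID_FIELD] = str(uuid.uuid4())
    return document


named = with_config_id({'config_id': 'spm12-standalone', 'version': '12'})
generated = with_config_id({'version': '12', 'standalone': True})
print(named[CONFIG_ID_FIELD])
print(generated[CONFIG_ID_FIELD])
```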