XML 2.0 specification¶
Processes¶
The XML process specification makes it possible to use a standard Python function and to associate it with an XML string that enables the creation of a Process instance. This XML string defines the type and behaviour of the function parameters and return value(s).
In order to create a Process instance for a function, it is necessary to get some information about each parameter of the function and about the return value. This information is defined in an XML string, with the exception of the default values of the parameters, which are extracted from the function definition.
The process XML string contains a single <process> element, which may carry some global properties for the process. <process> may contain the following attributes:
capsul_xml (optional): version of the Capsul XML specification this process definition is compatible with. If omitted, the process definition is assumed to be compatible with the latest Capsul XML specification available.
role (optional): a role that is attached to the process. See “Process roles” below.
In the <process> element, one can find one <input> element per parameter of the function. If the process produces one or several outputs, it must use a <return> element. If <return> is not defined, the value returned by the Python function is ignored and cannot be used in pipelines. For a single output, the Python function must directly return the value, and the value name (an output value must always have a name), type and documentation must be given in the element’s attributes (see below). Here is an example of a process defined as a function returning a value:
from capsul.process.xml import xml_process

@xml_process('''
<process capsul_xml="2.0">
    <input name="a" type="int" doc="An integer"/>
    <input name="b" type="int" doc="Another integer"/>
    <return name="addition" type="int" doc="a + b"/>
</process>
''')
def add(a, b):
    return a + b
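The remark that default values come from the function definition rather than from the XML can be illustrated with the standard library alone. The following is a minimal sketch, not Capsul's implementation: describe_parameters is a hypothetical helper, and the threshold function and its XML are made up for the example. It reads the <input> elements with xml.etree.ElementTree and the defaults with inspect.signature.

```python
import inspect
import xml.etree.ElementTree as ET

# Illustrative XML, written for this sketch only.
PROCESS_XML = '''
<process capsul_xml="2.0">
    <input name="input_image" type="file" doc="Path of an image file."/>
    <input name="threshold" type="float" doc="Threshold value."/>
</process>
'''


def threshold(input_image, threshold=0.5):
    pass


def describe_parameters(function, xml_string):
    """Hypothetical helper: merge XML parameter info with signature defaults."""
    signature = inspect.signature(function)
    description = {}
    for element in ET.fromstring(xml_string).findall('input'):
        name = element.get('name')
        default = signature.parameters[name].default
        has_default = default is not inspect.Parameter.empty
        description[name] = {
            # Type and documentation come from the XML string.
            'type': element.get('type'),
            'doc': element.get('doc'),
            # The default value comes from the function definition.
            'has_default': has_default,
            'default': default if has_default else None,
        }
    return description


parameters = describe_parameters(threshold, PROCESS_XML)
print(parameters['threshold'])
```

In this sketch a parameter without a default in the signature is simply flagged with has_default set to False; how Capsul itself treats such parameters is not covered here.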
If the process needs to return several values, they must be declared with <output> elements located between <return> and </return>. The function must return the output values either in a list or in a dictionary. If it is a list, the order of the <output> elements is used to match the values in the list with the process parameter names. If it is a dictionary, each key must correspond to a name attribute in an <output> element. For instance:
from capsul.process.xml import xml_process

@xml_process('''
<process capsul_xml="2.0">
    <input name="a" type="int" doc="An integer"/>
    <input name="b" type="int" doc="Another integer"/>
    <return>
        <output name="quotient" type="int" doc="Quotient of a / b"/>
        <output name="remainder" type="int" doc="Remainder of a / b"/>
    </return>
</process>
''')
def divide(a, b):
    return {
        # Floor division keeps quotient * b + remainder == a for integers.
        'quotient': a // b,
        'remainder': a % b,
    }
    # From the process point of view, it would be equivalent to
    # use the following code:
    # return [a // b, a % b]
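The matching rule between returned values and <output> names can be sketched as follows. This is an illustration of the rule described above, not Capsul's code: name_outputs is a hypothetical helper, and accepting tuples as well as lists is an assumption of this sketch.

```python
def name_outputs(result, output_names):
    """Hypothetical helper: map a function result onto declared output names."""
    if isinstance(result, dict):
        # Each key must correspond to the name of a declared <output>.
        unknown = set(result) - set(output_names)
        if unknown:
            raise ValueError('undeclared outputs: %s' % sorted(unknown))
        return dict(result)
    # Tuples are accepted here as a convenience (assumption of this sketch).
    if isinstance(result, (list, tuple)):
        if len(result) != len(output_names):
            raise ValueError('expected %d values, got %d'
                             % (len(output_names), len(result)))
        # Values are matched to names by position, following <output> order.
        return dict(zip(output_names, result))
    raise TypeError('multiple outputs must be returned as a list or a dict')


declared = ['quotient', 'remainder']
from_list = name_outputs([3, 1], declared)
from_dict = name_outputs({'remainder': 1, 'quotient': 3}, declared)
print(from_list == from_dict)
```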
<input>, <output>, or <return> (for a single return with no child elements) can have the following attributes:
name: the name of the function parameter
type: the type of the parameter. See possible parameter types below.
allowed_extensions: for the file type, list of possible file extensions.
doc (optional): the documentation of the parameter
<input> is straightforward: it is always an input parameter.
<output> is normally an output parameter, except in some cases when it is a file: an output file may have its filename specified as input (the filename is not generated by the process). In this case an additional attribute input_filename specifies the parameter used to specify the filename. This parameter has the type File and is marked as output, but is actually an input to the processing function.
<return> is an output which is returned by the processing function. A single <return> is very similar to <output>, but only one <return> element is allowed in a process. The process should return a single value.
Parameter types¶
For <input>, <output> and <return> elements, the type attribute can have the following values:
int
float
string
unicode
file
directory
enum: when this type is used, there must be a values attribute that contains a Python literal representing a list of possible values for the parameter.
list_int
list_float
list_string
list_unicode
list_file
list_directory
When a parameter accepts multiple types, they must be separated by a |. For instance, a parameter accepting either a file or a list of files would use type="file|list_file".
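Splitting a multi-type attribute is simple enough to show in a few lines. The sketch below is only an illustration of the syntax described above: parse_type_attribute is a hypothetical helper, and rejecting unknown type names is a choice made for this example rather than documented Capsul behaviour.

```python
# Type names listed in this section.
KNOWN_TYPES = {
    'int', 'float', 'string', 'unicode', 'file', 'directory', 'enum',
    'list_int', 'list_float', 'list_string', 'list_unicode',
    'list_file', 'list_directory',
}


def parse_type_attribute(type_attribute):
    """Hypothetical helper: split a type attribute on '|' and check each name."""
    names = [name.strip() for name in type_attribute.split('|')]
    for name in names:
        if name not in KNOWN_TYPES:
            raise ValueError('unknown parameter type: %r' % name)
    return names


accepted = parse_type_attribute('file|list_file')
print(accepted)
```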
Process roles¶
The role of a process gives information about the expected execution context. It can be used to decide whether a process should be executed in a given context or not. The role can also be used to propose a specific GUI for the process. For instance, the role "viewer" indicates that the execution of the process will display something to the user. There is no need to execute such a process on a remote computer that is disconnected from the user environment.
The possible process roles are:
viewer: the process is used to display something to the user. It cannot be executed outside the user graphical environment. A viewer is not supposed to be blocking. It should terminate immediately and let the view live independently of the rest of the process. If blocking is required, use the dialog role.
dialog: a dialog is used to show something to the user and wait for a user action before ending its execution. Like a viewer, it cannot be executed outside the user graphical environment. The expected user action can be as simple as clicking on a single “ok” button; in that case, the process should have no output. But it can also be a complete form whose result must be returned via the process output parameter(s).
Association between a Python function and an XML string¶
There are two ways to perform the association between the function and the XML. The recommended method is to use a decorator to explicitly define the XML string associated with the function. Here is an example:
from capsul.process.xml import xml_process

@xml_process('''
<process capsul_xml="2.0">
    <input name="input_image" type="file" doc="Path of a NIFTI-1 image file."/>
    <input name="method" type="enum" values="['gt', 'ge', 'lt', 'le']" doc="Method for thresholding."/>
    <input name="threshold" type="float" doc="Threshold value."/>
    <output name="output_image" input_filename="output_location" type="file"
            doc="If set, define the output file name. Otherwise, the name is generated using a 'threshold_' prefix on the input file name."/>
</process>
''')
def threshold(input_image, method='gt', threshold=0, output_location=None):
    pass
It is also possible to put the XML in the docstring of the function. However, this method is not recommended and should be avoided if possible. Example:
def threshold(input_image, method='gt', threshold=0, output_location=None):
    '''
    <process capsul_xml="2.0">
        <input name="input_image" type="file" doc="Path of a NIFTI-1 image file."/>
        <input name="method" type="enum" values="['gt', 'ge', 'lt', 'le']" doc="Method for thresholding."/>
        <input name="threshold" type="float" doc="Threshold value."/>
        <output name="output_image" input_filename="output_location" type="file"
                doc="If set, define the output file name. Otherwise, the name is generated using a 'threshold_' prefix on the input file name."/>
    </process>
    '''
    pass
Processes examples¶
from capsul.process.xml import xml_process

@xml_process('''
<process capsul_xml="2.0">
    <input name="input_image" type="file" doc="Path of a NIFTI-1 image file."/>
    <input name="method" type="enum" values="['gt', 'ge', 'lt', 'le']"
           doc="Method for thresholding."/>
    <input name="threshold" type="float" doc="Threshold value."/>
    <output name="output_image" input_filename="output_image" type="file" doc="Output file name."/>
</process>
''')
def threshold(input_image, output_image, method='gt', threshold=0):
    pass

@xml_process('''
<process capsul_xml="2.0">
    <input name="input_image" type="file" doc="Path of a NIFTI-1 image file."/>
    <input name="mask" type="file" doc="Path of mask binary image."/>
    <output name="output_image" input_filename="output_location" type="file" doc="Output file name."/>
</process>
''')
def mask(input_image, mask, output_location=None):
    pass
Pipelines¶
An XML pipeline is an XML document containing a single <pipeline> element that may contain some global properties for the pipeline. Since a pipeline is also a process, the <pipeline> element may contain the same attributes as the <process> element (see above).
An XML pipeline contains a series of processes that are defined by <process> elements. The inputs and outputs of processes are connected by links that are defined in <link> elements. A pipeline may allow a user to select one group of processes among a series of process groups. The processes that are not selected are disabled (they will not be executed) whereas the selected processes are enabled. The <processes_selection> element is used to define a set of selectable process groups.
The <doc> element¶
This element has no attributes and contains the documentation of the process in a Sphinx-compatible format.
The <process> element¶
A <process> element adds a new process instance to the pipeline. This instance is given a name that can be used in other XML elements to reference it. The process instance references a module, which is the function that is called when the instance is run. The <process> element can have the following attributes:
name: a string that can be used to reference the process instance. This must be a valid Python variable name. It should follow the variable naming convention of Python’s PEP 8.
module: a valid Capsul process identifier. This is typically a fully qualified Python object name (i.e. containing the absolute dotted path of the Python module). But any string value accepted by capsul.loadre.get_process_instance() can be used.
role (optional): set the role of the process instance (see “Process roles” above). If a role has been defined on the process module, it is ignored and replaced by the one declared in the pipeline. It is possible to use an empty string to force the process instance in the pipeline to have no role.
iteration (optional): when this attribute is used, the process instance will be an iteration process. The iteration attribute contains a comma-separated list of parameter names (for instance "input1,input2,output1"). This list indicates the process parameter names on which the iteration will be performed. For each of these parameters, the actual type of the process instance parameter will be replaced by a list whose elements must have the process parameter type.
enabled (optional): used to explicitly mark a node as disabled (value: “false”)
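The effect of the iteration attribute can be pictured with a small stand-alone sketch. It is not how Capsul implements iteration processes: iterate is a hypothetical helper, and requiring all iterated lists to have the same length is an assumption made here to keep the example short.

```python
def iterate(function, iteration, **parameters):
    """Hypothetical helper: call `function` once per element of the iterated
    parameters, which are given as lists; other parameters are passed as is."""
    iterated = [name.strip() for name in iteration.split(',')]
    lengths = {len(parameters[name]) for name in iterated}
    if len(lengths) > 1:
        # Assumption of this sketch: iterated lists must have equal lengths.
        raise ValueError('iterated parameters must have the same length')
    results = []
    for values in zip(*(parameters[name] for name in iterated)):
        call_parameters = dict(parameters)
        # Replace each iterated list by its current element.
        call_parameters.update(zip(iterated, values))
        results.append(function(**call_parameters))
    return results


def add(a, b):
    return a + b


single = iterate(add, 'a', a=[1, 2, 3], b=10)
paired = iterate(add, 'a,b', a=[1, 2, 3], b=[10, 20, 30])
print(single, paired)
```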
The <process> element can contain the following elements:
<set>¶
The <set> element is used to set a fixed value to a parameter. It contains only two attributes:
name: the name of the parameter
value: the value of the parameter expressed as a Python literal. The use of a Python literal format enables the representation of structured values such as lists. Some examples of values:
integer: <set name="x" value="42"/>
float: <set name="x" value="4.2"/>
string: <set name="x" value="'a value'"/>
None (i.e. JSON null): <set name="x" value="None"/>
list: <set name="x" value="['one', 'two', 'three']"/>
When a value is set on a parameter, it becomes an optional parameter.
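Since values are written as Python literals, they can be decoded safely with ast.literal_eval from the standard library. The following sketch shows this on a made-up <process> element; the module name some.module.threshold is a placeholder, and this is an illustration rather than Capsul's own parsing code.

```python
import ast
import xml.etree.ElementTree as ET

# Illustrative element; "some.module.threshold" is a placeholder identifier.
PROCESS_XML = '''
<process name="threshold_gt_1" module="some.module.threshold">
    <set name="threshold" value="1"/>
    <set name="method" value="'gt'"/>
    <set name="labels" value="['one', 'two', 'three']"/>
    <set name="mask" value="None"/>
</process>
'''

# literal_eval only accepts literals, so arbitrary code is never executed.
fixed_values = {
    element.get('name'): ast.literal_eval(element.get('value'))
    for element in ET.fromstring(PROCESS_XML).findall('set')
}
print(fixed_values)
```

Note that a string value needs its own quotes inside the attribute (value="'gt'"); without them, the literal would not be a valid Python string.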
<nipype>¶
Capsul can use Nipype interfaces as process modules. These interfaces use traits types that have some parameters that need to be set in some contexts. The Nipype-specific <nipype> element contains a name attribute to identify a process parameter. For more information about these parameters, see the Nipype interface specification.
The following attributes can be used to customize Nipype traits:
usedefault: can be set to "true" or "false". Omitting the attribute is equivalent to "false".
copyfile: can be set to "true" or "false". Omitting the attribute is equivalent to "false". If the special value "discard" is used, the Nipype interface copyfile parameter will be set to True but the copied file will be deleted when the process terminates. This prevents some software (such as SPM) from modifying the input image: the software works on a copy, and only the original image remains at the end of the execution (the modified copy is deleted).
The <switch> element¶
Represents switch nodes. It may be replaced by process selection if that proves to fulfill all the needs, but for now “old-style” switches still exist, and are the only ones which can be saved.
Attributes:
name: node name in the pipeline (as in process elements)
switch_value (optional): value of the “switch” parameter: name of the active input
enabled (optional): as in process elements
Children:
<input>¶
Input name for the switch. Input plugs will be a combination of input/output names: <input>_switch_<output>.
Attributes:
name
optional (optional): "true" or "false"
<output>¶
Output plug for the switch.
Attributes:
name
optional (optional)
The <optional_output_switch> element¶
Represents a specific switch node which makes it possible to have optional output files in the pipeline parameters, while keeping them available for temporary values inside the pipeline if they are left undefined.
Attributes:
name: node name in the pipeline (as in process elements)
enabled (optional): as in process elements
Children:
<input>¶
Input name for the switch. Input plugs will be a combination of input/output names: <input>_switch_<output>. In an optional output switch, only one input is allowed.
Attributes:
name
optional (optional): "true" or "false"
<output>¶
Output plug for the switch. Only one output is allowed.
Attributes:
name
The <link> element¶
This element adds a link between an output parameter of a process and an input parameter of another process. It can also be used to “export” a process parameter. Exporting a process parameter means making it visible in the parameters of the pipeline. Unlike the default Pipeline behaviour in Capsul’s API, a pipeline defined in Capsul XML 2.0 does not automatically export the unconnected parameters of its processes. The <link> element contains no child elements and has two mandatory attributes and one optional attribute:
source: the parameter where the link starts.
dest: the parameter where the link ends.
weak_link (optional): "true" or "false"
The value of the source and dest attributes can be either a single identifier (e.g. "parameter_name") or two identifiers separated by a dot (e.g. "process_name.parameter_name"). A single identifier corresponds to a pipeline parameter, whereas two identifiers identify a process parameter: they must correspond to the name of a process and the name of one parameter of this process.
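The two identifier forms can be told apart by splitting on the dot. The helper below, parse_plug, is a hypothetical name used for this sketch; it only illustrates the naming rule and does not reproduce Capsul's parser.

```python
def parse_plug(identifier):
    """Hypothetical helper: return (process_name, parameter_name).

    process_name is None when the identifier designates a pipeline parameter.
    """
    parts = identifier.split('.')
    if len(parts) == 1:
        return None, parts[0]
    if len(parts) == 2:
        return parts[0], parts[1]
    raise ValueError('invalid link identifier: %r' % identifier)


exported = parse_plug('input_image')
internal = parse_plug('threshold_gt_1.input_image')
print(exported, internal)
```

A link whose source or dest has no process name is what “exports” the process parameter at the other end of the link.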
The <processes_selection> element¶
The <processes_selection> element defines a series of process groups. Each process group is composed of a series of processes added to the pipeline with the <process> element. Only one of these process groups can be executed in the pipeline. Therefore, a new parameter is added to the pipeline that allows the user to select the group to execute. All processes in the selected group are activated (i.e. will be executed) whereas all processes in other groups are disabled (i.e. will not be executed).
The <processes_selection> element has a single name attribute that is the name of the parameter that is added to the pipeline. It must contain two or more <processes_group> elements. Each <processes_group> contains one or more <process> elements having only a single name attribute. This attribute is the name of a process defined in the pipeline (see “The <process> element” above).
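The enable/disable rule can be sketched with the standard library. The helper enabled_states is hypothetical and the XML fragment is a shortened example; the sketch only computes which grouped processes would be enabled for a given selection and is not Capsul's implementation.

```python
import xml.etree.ElementTree as ET

SELECTION_XML = '''
<processes_selection name="select_method">
    <processes_group name="greater than">
        <process name="threshold_gt_1"/>
        <process name="threshold_gt_10"/>
    </processes_group>
    <processes_group name="lower than">
        <process name="threshold_lt_1"/>
        <process name="threshold_lt_10"/>
    </processes_group>
</processes_selection>
'''


def enabled_states(selection_xml, selected_group):
    """Hypothetical helper: map each grouped process name to True (enabled)
    or False (disabled) for the selected group."""
    root = ET.fromstring(selection_xml)
    groups = {
        group.get('name'): [p.get('name') for p in group.findall('process')]
        for group in root.findall('processes_group')
    }
    if selected_group not in groups:
        raise ValueError('unknown group: %r' % selected_group)
    return {
        process: group_name == selected_group
        for group_name, processes in groups.items()
        for process in processes
    }


states = enabled_states(SELECTION_XML, 'lower than')
print(states)
```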
The <pipeline_steps> element¶
Children:
<step>¶
Attributes:
name: name for the step
enabled (optional): "true" or "false"
Children:
<node>¶
Attributes:
name: name of an existing pipeline node which will be part of this step.
The <gui> element¶
The <gui> element makes it possible to define the position of nodes for a graphical representation. The position of a node is given by a <position> element that contains three attributes:
name: the name of the process (as given in the process element).
x: the x coordinate of the process.
y: the y coordinate of the process.
A single global zoom level can be given to the GUI with a <zoom> element that contains a single level attribute whose value is a floating-point number.
Pipeline example¶
<pipeline capsul_xml="2.0">
<process name="threshold_gt_1"
module="capsul.process.test.test_load_from_description.threshold">
<set name="threshold" value="1"/>
<set name="method" value="'gt'"/>
</process>
<process name="threshold_gt_10"
module="capsul.process.test.test_load_from_description.threshold">
<set name="threshold" value="10"/>
<set name="method" value="'gt'"/>
</process>
<process name="threshold_gt_100"
module="capsul.process.test.test_load_from_description.threshold">
<set name="threshold" value="100"/>
<set name="method" value="'gt'"/>
</process>
<process name="threshold_lt_1"
module="capsul.process.test.test_load_from_description.threshold">
<set name="threshold" value="1"/>
<set name="method" value="'lt'"/>
</process>
<process name="threshold_lt_10"
module="capsul.process.test.test_load_from_description.threshold">
<set name="threshold" value="10"/>
<set name="method" value="'lt'"/>
</process>
<process name="threshold_lt_100"
module="capsul.process.test.test_load_from_description.threshold">
<set name="threshold" value="100"/>
<set name="method" value="'lt'"/>
</process>
<process name="mask_1"
module="capsul.process.test.test_load_from_description.mask">
</process>
<process name="mask_10"
module="capsul.process.test.test_load_from_description.mask">
</process>
<process name="mask_100"
module="capsul.process.test.test_load_from_description.mask">
</process>
<link source="input_image" dest="threshold_gt_1.input_image"/>
<link source="input_image" dest="threshold_gt_10.input_image"/>
<link source="input_image" dest="threshold_gt_100.input_image"/>
<link source="input_image" dest="threshold_lt_1.input_image"/>
<link source="input_image" dest="threshold_lt_10.input_image"/>
<link source="input_image" dest="threshold_lt_100.input_image"/>
<link source="input_image" dest="mask_1.input_image"/>
<link source="input_image" dest="mask_10.input_image"/>
<link source="input_image" dest="mask_100.input_image"/>
<link source="threshold_gt_1.output_image" dest="mask_1.mask"/>
<link source="threshold_gt_10.output_image" dest="mask_10.mask"/>
<link source="threshold_gt_100.output_image" dest="mask_100.mask"/>
<link source="threshold_lt_1.output_image" dest="mask_1.mask"/>
<link source="threshold_lt_10.output_image" dest="mask_10.mask"/>
<link source="threshold_lt_100.output_image" dest="mask_100.mask"/>
<link source="mask_1.output_image" dest="output_1"/>
<link source="mask_10.output_image" dest="output_10"/>
<link source="mask_100.output_image" dest="output_100"/>
<processes_selection name="select_method">
<processes_group name="greater than">
<process name="threshold_gt_1"/>
<process name="threshold_gt_10"/>
<process name="threshold_gt_100"/>
</processes_group>
<processes_group name="lower than">
<process name="threshold_lt_1"/>
<process name="threshold_lt_10"/>
<process name="threshold_lt_100"/>
</processes_group>
</processes_selection>
<gui>
<position name="threshold_gt_100" x="386.0" y="403.0"/>
<position name="inputs" x="50.0" y="50.0"/>
<position name="mask_1" x="815.0" y="153.0"/>
<position name="threshold_gt_10" x="374.0" y="242.0"/>
<position name="threshold_lt_100" x="556.0" y="314.0"/>
<position name="threshold_gt_1" x="371.0" y="88.0"/>
<position name="mask_10" x="820.0" y="293.0"/>
<position name="mask_100" x="826.0" y="451.0"/>
<position name="threshold_lt_1" x="570.0" y="6.0"/>
<position name="threshold_lt_10" x="568.0" y="145.0"/>
<zoom level="1.0"/>
</gui>
</pipeline>
API¶
Definitions of processes and pipelines in Capsul XML 2.0 are recognised by get_process_instance. For an XML process, the identifier of the process is <module>.<function> where <module> is the fully qualified name of the Python module where the function is located and <function> is the name of the function as defined in the module. In order to work with get_process_instance, the module must be in the Python path. For instance, capsul.process.test.test_load_from_description.threshold is the identifier of the function threshold located in the module capsul.process.test.test_load_from_description.
For an XML pipeline, get_process_instance looks for the XML file defining the pipeline. The file name must end with .xml and the file must be located in a directory associated with a valid Python package (i.e. a module in a directory). The pipeline identifier is a string <module>.<name> where <module> is the fully qualified Python module name and <name> is the file name without the .xml extension. For instance, capsul.process.test.test_pipeline is the identifier for the pipeline defined in <python_path>/capsul/process/test/test_pipeline.xml.
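The correspondence between a pipeline identifier and its XML file can be written down in a few lines. The helper pipeline_xml_path is a hypothetical name for this sketch; it only computes the path relative to a Python path entry and does not search the file system as get_process_instance would.

```python
from pathlib import PurePosixPath


def pipeline_xml_path(identifier):
    """Hypothetical helper: path of the XML file for a pipeline identifier,
    relative to the Python path entry that contains the package."""
    *package, name = identifier.split('.')
    return PurePosixPath(*package) / (name + '.xml')


relative_path = pipeline_xml_path('capsul.process.test.test_pipeline')
print(relative_path)
```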
One can find all the process and pipeline identifiers defined in a module (and recursively in all its sub-modules) with the function find_processes(module_name) (in capsul.process.finder). For instance, to try to instantiate all processes and pipelines defined in the module clinfmri:
from capsul.api import get_process_instance, find_processes

for p in find_processes('clinfmri'):
    try:
        get_process_instance(p)
    except Exception:
        print('FAILED', p)
    else:
        print('GOOD', p)
XML validation¶
There is no validation of the XML document in get_process_instance. As a consequence, one will only get an error if the XML does not allow a process or pipeline class to be built (for instance if a mandatory attribute is missing). On the other hand, a misspelled element or attribute name may not raise an error (the unknown item is simply ignored). If there is a need for a validation feature for pipeline development, it will be added in separate functions that would be built to give precise errors and warnings to the user (including line numbers in the XML file).
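To give an idea of what such a separate check could look like for process definitions, here is a minimal sketch built on xml.etree.ElementTree. It is not an existing Capsul function: find_unknown_items and the ALLOWED_ATTRIBUTES table are hypothetical, the table is derived from the attributes listed in this document only, and line numbers are not reported.

```python
import xml.etree.ElementTree as ET

# Attribute names taken from the process specification in this document.
ALLOWED_ATTRIBUTES = {
    'process': {'capsul_xml', 'role'},
    'input': {'name', 'type', 'doc', 'values', 'allowed_extensions'},
    'output': {'name', 'type', 'doc', 'values', 'allowed_extensions',
               'input_filename'},
    'return': {'name', 'type', 'doc', 'values', 'allowed_extensions'},
}


def find_unknown_items(xml_string):
    """Hypothetical checker: list unknown elements and attributes."""
    warnings = []
    for element in ET.fromstring(xml_string).iter():
        allowed = ALLOWED_ATTRIBUTES.get(element.tag)
        if allowed is None:
            warnings.append('unknown element <%s>' % element.tag)
            continue
        for attribute in sorted(set(element.attrib) - allowed):
            warnings.append('unknown attribute %r in <%s>'
                            % (attribute, element.tag))
    return warnings


problems = find_unknown_items(
    '<process capsul_xml="2.0">'
    '<input name="a" type="int" desc="An integer"/>'
    '</process>'
)
print(problems)
```

Here the misspelled desc attribute (instead of doc) is reported, whereas it would be silently ignored when building the process.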