populse_mia.data_manager.project

Mia Project Module

This module provides the core functionality for managing projects and their associated databases within the populse_mia framework. It enables the creation, configuration, and manipulation of projects, including their metadata, data collections, processing history, and workflows.

Classes

Project(project_root_folder, new_project)

Class for managing Populse_mia projects and their associated databases.

class populse_mia.data_manager.project.Project(project_root_folder, new_project)[source]

Bases: object

Class for managing Populse_mia projects and their associated databases.

The Project class is the central component for handling all aspects of a project, including its directory structure, database schema, metadata, and workflows.

It provides methods to:

  • Initialize and configure project directories, database schemas, and default properties.

  • Manage project metadata (name, date, sorting tags, and clinical tags).

  • Interact with data collections (current, initial, bricks, and history) for storing, retrieving, and updating project data.

  • Track and clean up orphaned bricks, histories, and files to maintain database integrity.

  • Handle workflows and pipelines by retrieving finished bricks, their outputs, and processing history.

  • Support undo/redo functionality for user modifications to project data and metadata.

  • Apply and save filters to customize data visualization and analysis.

  • Update database paths when projects are renamed or moved.

This class integrates with DatabaseMIA for database operations, Filter for data filtering, and capsul for pipeline and workflow management.

Contains:

Methods:
  • add_clinical_tags: Add the clinical tags to the project.

  • cleanup_orphan_bricks: Remove orphan bricks from the database.

  • cleanup_orphan_history: Remove orphan histories from the database.

  • cleanup_orphan_nonexisting_files: Remove orphan files which do not exist from the database.

  • del_clinical_tags: Remove clinical tags from the project.

  • files_in_project: Return file / directory names within the project folder.

  • finished_bricks: Retrieve a dictionary of finished bricks from workflows.

  • get_data_history: Get the processing history for the given data file.

  • getDate: Return the date of creation of the project.

  • get_finished_bricks_in_pipeline: Retrieve a dictionary of finished bricks from a given pipeline.

  • get_finished_bricks_in_workflows: Retrieve a dictionary of finished bricks from workflows.

  • getFilter: Return a Filter object.

  • getFilterName: Input box to get the name of the filter to save.

  • getName: Return the name of the project.

  • get_orphan_bricks: Identify orphan bricks and their associated weak files.

  • get_orphan_history: Identifies orphan history entries.

  • get_orphan_nonexisting_files: Get orphan files which do not exist from the database.

  • getSortedTag: Return the sorted tag of the project.

  • getSortOrder: Return the sort order of the project.

  • hasUnsavedModifications: Return whether the project has unsaved modifications.

  • init_filters: Initialize the filters at project opening.

  • loadProperties: Load the properties file.

  • redo: Redo the last action made by the user on the project.

  • reput_values: Re-put the value objects in the database.

  • saveConfig: Save the changes in the properties file.

  • save_current_filter: Save the current filter.

  • saveModifications: Save the pending operations of the project (actions still not saved).

  • setCurrentFilter: Set the current filter of the project.

  • setDate: Set the date of the project.

  • setName: Set the name of the project.

  • setSortedTag: Set the sorted tag of the project.

  • setSortOrder: Set the sort order of the project.

  • undo: Undo the last action made by the user on the project.

  • unsavedModifications: Modify the window title depending on whether the project has unsaved modifications or not.

  • unsaveModifications: Unsaves the pending operations of the project.

  • update_db_for_paths: Update the history and brick tables with a new project file.

__init__(project_root_folder, new_project)[source]

Initialize a Mia project and its associated database.

This method sets up the project directory structure, initializes the database, and configures default properties. It also checks if the project is already opened in another instance to prevent conflicts.

Parameters:
  • project_root_folder – Path to the project’s root directory. If None, a temporary project directory is created.

  • new_project – Boolean indicating whether this is a new project (True) or an existing one (False). If True, the directory structure and database schema are initialized.

add_clinical_tags()[source]

Add new clinical tags to the project.

Returns:

List of clinical tags that were added.

cleanup_orphan_bricks(bricks=None)[source]

Remove orphan bricks and their associated files from the database.

This method performs the following cleanup operations:
  • Removes obsolete brick documents from the brick collection

  • Removes orphaned file documents from both current and initial collections

  • Deletes the corresponding physical files from the filesystem

Parameters:

bricks – (list of str) List of brick IDs to check for orphans. If None, checks all bricks in the database.

cleanup_orphan_history()[source]

Remove orphan histories, their associated bricks, and files from the database.

This method performs three cleanup operations:
  • Removes obsolete history documents from the history collection

  • Removes orphaned brick documents from the brick collection

  • Removes orphaned file documents from both current and initial collections, along with their corresponding physical files

cleanup_orphan_nonexisting_files(failed=False)[source]

Remove database entries for files that are missing on disk.

This method:
  • Retrieves filenames considered orphaned (see get_orphan_nonexisting_files),

  • Deletes their entries from both current and initial collections,

  • Attempts a defensive filesystem cleanup if the file still exists.

Parameters:

failed – (bool) Passed through to get_orphan_nonexisting_files to control orphan selection.

del_clinical_tags()[source]

Remove clinical tags from the project’s current and initial collections.

Iterates through predefined clinical tags and removes them from both collections if they exist in the current collection’s field names.

Returns:

(list) Clinical tags that were successfully removed.

files_in_project(files)[source]

Extract file/directory names from input that are within the project folder.

Recursively processes the input to find all file paths, handling nested data structures. Only paths within the project directory are included.

Parameters:

files

Input that may contain file paths. Can be:

  • str: A single file path

  • list/tuple/set: Collection of file paths or nested structures

  • dict: Only values are processed, keys are ignored

Returns:

(set) Relative file paths that exist within the project folder, with paths normalized and made relative to the project directory.
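
The recursive extraction described above can be illustrated with a minimal, stand-alone sketch. The function name files_in_folder and its folder argument are hypothetical and are not part of populse_mia; the sketch only normalizes paths and does not check that they exist on disk, which the real method may do differently.

```python
import os


def files_in_folder(files, folder):
    """Collect paths found in `files` that lie inside `folder`.

    Hypothetical stand-alone helper, not the populse_mia implementation.
    """
    folder = os.path.normpath(os.path.abspath(folder))
    found = set()

    def visit(item):
        if isinstance(item, str):
            path = os.path.normpath(os.path.abspath(item))
            # Keep the path only if it is located under the folder.
            if path.startswith(folder + os.sep):
                found.add(os.path.relpath(path, folder))
        elif isinstance(item, (list, tuple, set)):
            for sub in item:
                visit(sub)
        elif isinstance(item, dict):
            # Only values are inspected; keys are ignored.
            for sub in item.values():
                visit(sub)
        # Any other type (numbers, None, ...) is silently skipped.

    visit(files)
    return found


result = files_in_folder(
    {
        "in": "/proj/data/raw/s1.nii",
        "others": ["/tmp/x.nii", ("/proj/data/../log.txt",)],
    },
    "/proj",
)
```

Here result holds the two paths located under /proj, expressed relative to it; /tmp/x.nii is discarded because it lies outside the folder.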

finished_bricks(engine, pipeline=None, include_done=False)[source]

Retrieve and process finished bricks from workflows and pipelines.

This method:
  • Gets finished bricks from workflows and optionally a specific pipeline

  • Filters them based on their presence in the Mia database

  • Updates brick metadata with execution status and outputs

  • Collects all output files that are within the project directory

Parameters:
  • engine – Engine instance for retrieving finished bricks.

  • pipeline – Optional pipeline object to filter specific bricks.

  • include_done – (bool) If True, includes all bricks regardless of execution status. If False, only includes “Not Done” bricks.

Returns:

(dict) Dictionary containing:

  • ‘bricks’: Dict mapping brick IDs to their metadata

  • ‘outputs’: Set of output file paths relative to project directory

Contains:

Inner functions:

  • _update_dict: Merge two dictionaries by updating the first with the second.

  • _collect_outputs: Recursively collects file paths from output values that are within the project directory.

get_data_history(path)[source]

Get the processing history for the given data file.

The history dict contains several elements:
  • parent_files: Set of other data used (directly or indirectly) to produce the data.

  • processes: Processing bricks set from each ancestor data which lead to the given one. Elements are process (brick) UUIDs.

Parameters:

path – Path to the data file.

Returns:

(dict) History.
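
As an illustration of how such a history could be assembled, the sketch below walks back through the ancestors of a file. The producer and inputs mappings are hypothetical stand-ins for the brick and history collections; they are assumptions made for this example, not the actual database layout.

```python
def data_history(path, producer, inputs):
    """Return the ancestors of `path` and the bricks that led to it.

    producer: maps a file name to the ID of the brick that wrote it.
    inputs:   maps a brick ID to the list of files it read.
    Both mappings are hypothetical simplifications.
    """
    parent_files, processes = set(), set()
    seen = {path}
    todo = [path]
    while todo:
        current = todo.pop()
        brick = producer.get(current)
        if brick is None:
            # Raw data: no brick produced this file.
            continue
        processes.add(brick)
        for source in inputs.get(brick, ()):
            if source not in seen:
                seen.add(source)
                parent_files.add(source)
                todo.append(source)
    return {"parent_files": parent_files, "processes": processes}


producer = {"c.nii": "brick-2", "b.nii": "brick-1"}
inputs = {"brick-2": ["b.nii"], "brick-1": ["a.nii"]}
history = data_history("c.nii", producer, inputs)
```

In this example c.nii was produced from b.nii, itself produced from a.nii, so both files appear in parent_files and both brick IDs appear in processes.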

getDate()[source]

Return the date of creation of the project.

Returns:

(str) The date of creation of the project if it is not an unnamed project, otherwise an empty string.

get_finished_bricks_in_pipeline(pipeline)[source]

Retrieves a dictionary of finished processes (bricks) from a given pipeline, including nested pipelines, if any.

Parameters:

pipeline – (Pipeline or Process) The pipeline or single process to analyze. If a single process is provided, it will be treated as a minimal pipeline.

Returns:

(dict) A dictionary where keys are process UUIDs (brick IDs) and values are dictionaries containing the associated process instances.

get_finished_bricks_in_workflows(engine)[source]

Return finished Soma-Workflow jobs indexed by their brick UUID.

A job is considered successful if its termination status is "finished_regularly". Any other termination status is treated as a failure. A workflow is marked as failed if at least one of its jobs did not finish successfully.

Parameters:

engine – Engine providing access to the Soma-Workflow controller.

Returns:

(dict) Mapping brick_uuid -> job_info where job_info contains:

  • workflow (int): Workflow identifier.

  • job: Soma-Workflow job instance.

  • job_id (int): Job identifier.

  • swf_status (tuple): Raw Soma-Workflow status tuple.

  • running (bool): True if any job in the workflow is running.

  • failed (bool): True if any job in the workflow failed.

Contains:

Inner functions:

  • parse_status: Parse a Soma-Workflow job status tuple into structured information.
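
The success and failure rules above can be sketched without Soma-Workflow. In the sketch below each job is a plain dict; the keys and the state values "running" and "done" are assumptions made for illustration, and only the "finished_regularly" termination status comes from the description above.

```python
def index_finished_jobs(jobs):
    """Index terminated jobs by brick UUID with per-workflow flags.

    `jobs` is an iterable of dicts with the hypothetical keys
    'brick_uuid', 'workflow', 'job_id', 'state' and 'termination'.
    """
    jobs = list(jobs)

    # First pass: compute the flags shared by all jobs of a workflow.
    flags = {}
    for job in jobs:
        wf = flags.setdefault(job["workflow"], {"running": False, "failed": False})
        if job["state"] == "running":
            wf["running"] = True
        elif job["state"] == "done" and job["termination"] != "finished_regularly":
            # Any termination status other than "finished_regularly" is a failure.
            wf["failed"] = True

    # Second pass: keep only the jobs that have terminated.
    finished = {}
    for job in jobs:
        if job["state"] != "done":
            continue
        finished[job["brick_uuid"]] = {
            "workflow": job["workflow"],
            "job_id": job["job_id"],
            "running": flags[job["workflow"]]["running"],
            "failed": flags[job["workflow"]]["failed"],
        }
    return finished


jobs = [
    {"brick_uuid": "u1", "workflow": 1, "job_id": 10,
     "state": "done", "termination": "finished_regularly"},
    {"brick_uuid": "u2", "workflow": 1, "job_id": 11,
     "state": "running", "termination": None},
    {"brick_uuid": "u3", "workflow": 2, "job_id": 20,
     "state": "done", "termination": "aborted"},
]
info = index_finished_jobs(jobs)
```

Job u2 is still running, so it is absent from info, but it sets the running flag for every finished job of workflow 1. Job u3 did not finish regularly, so workflow 2 is flagged as failed.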

getFilter(target_filter)[source]

Return a Filter object from its name.

Parameters:

target_filter – (str) Filter name.

Returns:

(Filter) Filter object corresponding to the given name, or None if not found.

getFilterName()[source]

Input box to type the name of the filter to save.

Returns:

(str) The name typed by the user, or None if cancelled.

getName()[source]

Return the name of the project.

Returns:

(str) The name of the project if it is not an unnamed project, otherwise an empty string.

get_orphan_bricks(bricks=None)[source]

Identifies orphan bricks and their associated weak files.

Parameters:

bricks – (list or set) A list or set of brick IDs to filter the search. If None, all bricks in the database are considered. Defaults to None.

Returns:

(tuple) A tuple containing two sets:

  • orphan: (set) Brick IDs considered orphaned, meaning they have no valid or existing outputs linked to the current database.

  • orphan_weak_files: (set) Paths to weak files associated with orphaned bricks, such as script files or files that no longer exist.

get_orphan_history()[source]

Identifies orphaned history entries, their associated orphan bricks, and weak files.

Returns:

(tuple) A tuple containing three sets:

  • orphan_hist: (set) IDs of history entries that are no longer linked to any current document in the database.

  • orphan_bricks: (set) IDs of bricks associated with orphaned history entries.

  • orphan_weak_files: (set) Paths to weak files (e.g., script files or non-existent files) linked to orphaned history entries.

get_orphan_nonexisting_files(failed)[source]

Return filenames that are recorded in the database but missing on disk.

A file is considered “orphaned” if:
  • It does not exist on the filesystem, and

  • It is not associated with any existing bricks, unless failed is True (in which case brick association is ignored).

Parameters:

failed – (bool) If True, include files even if they are linked to existing bricks. If False, exclude such files.

Returns:

(set) Filenames from the database that are not found on the filesystem and, unless failed is True, are not associated with existing bricks.
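
The selection rule can be summarized by the following sketch. The function select_orphan_files and its arguments are hypothetical; in particular, the existence check and the file-to-bricks mapping stand in for filesystem and database queries.

```python
def select_orphan_files(db_files, exists, bricks_of, failed=False):
    """Apply the orphan rule to the filenames recorded in a database.

    db_files:  iterable of filenames recorded in the database.
    exists:    callable returning True if a filename is present on disk.
    bricks_of: maps a filename to the existing bricks linked to it.
    All three arguments are hypothetical stand-ins.
    """
    orphans = set()
    for name in db_files:
        if exists(name):
            # Files present on disk are never orphans.
            continue
        # A missing file is kept when brick links are ignored (failed=True)
        # or when no existing brick refers to it.
        if failed or not bricks_of.get(name):
            orphans.add(name)
    return orphans


db_files = ["a.nii", "b.nii", "c.nii"]
on_disk = {"a.nii"}
bricks_of = {"b.nii": ["brick-1"]}

strict = select_orphan_files(db_files, on_disk.__contains__, bricks_of)
relaxed = select_orphan_files(db_files, on_disk.__contains__, bricks_of, failed=True)
```

With failed left to False, only c.nii is selected because b.nii is still linked to a brick. With failed set to True, both missing files are selected.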

getSortedTag()[source]

Return the sorted tag of the project.

Returns:

(str) Sorted tag of the project if it is not an unnamed project, otherwise an empty string.

getSortOrder()[source]

Return the sort order of the project.

Returns:

(str) Sort order of the project if it is not an unnamed project, otherwise an empty string.

hasUnsavedModifications()[source]

Return whether the project has unsaved modifications.

Returns:

(bool) True if the project has pending modifications, False otherwise.

init_filters()[source]

Initializes project filters by loading them from stored JSON files.

This method sets the currentFilter to a default empty filter and populates the filters list with Filter objects created from the stored JSON files.
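
A minimal sketch of loading filter definitions from JSON files is given below. The folder layout, the file names, and the "name" key are assumptions made for this example; the sketch returns plain dicts rather than Filter objects.

```python
import json
import os
import tempfile


def load_filters(folder):
    """Return the content of every JSON file in `folder`, sorted by file name."""
    loaded = []
    for name in sorted(os.listdir(folder)):
        if not name.endswith(".json"):
            # Ignore anything that is not a JSON file.
            continue
        with open(os.path.join(folder, name), encoding="utf-8") as handle:
            loaded.append(json.load(handle))
    return loaded


# Demonstration in a temporary folder with one filter file and one other file.
with tempfile.TemporaryDirectory() as folder:
    with open(os.path.join(folder, "adults.json"), "w", encoding="utf-8") as handle:
        json.dump({"name": "adults"}, handle)
    with open(os.path.join(folder, "notes.txt"), "w", encoding="utf-8") as handle:
        handle.write("not a filter")
    filters = load_filters(folder)
```

Only the JSON file contributes an entry to filters; the text file is skipped.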

loadProperties()[source]

Loads the project properties from the ‘properties.yml’ file.

This method reads the project’s YAML properties file and returns its contents as a Python dictionary.

Returns:

(dict) A dictionary containing the project properties if successfully loaded, or None if an error occurs.

redo(table)[source]

Redo the last action made by the user on the project.

Parameters:

table – (QTableWidget) The table on which to apply the modifications.

Actions that can be redone:
  • add_tag

  • remove_tags

  • add_scans

  • modified_values

  • modified_visibilities

Raises:

ValueError – If an unknown action type is encountered.

reput_values(values)[source]

Re-put the value objects in the database.

Parameters:

values – (list) List of value objects.

saveConfig()[source]

Save the changes in the properties file.

save_current_filter(custom_filters)[source]

Save the current filter.

Parameters:

custom_filters – The customized filter.

saveModifications()[source]

Save the pending operations of the project (actions still not saved).

setCurrentFilter(new_filter)[source]

Set the current filter of the project.

Parameters:

new_filter – New Filter object.

setDate(date)[source]

Set the date of the project.

Parameters:

date – New date of the project.

setName(name)[source]

Set the name of the project if it is not an unnamed project; otherwise do nothing.

Parameters:

name – (str) New name of the project.

setSortedTag(tag)[source]

Set the sorted tag of the project.

Parameters:

tag – New sorted tag of the project.

setSortOrder(order)[source]

Set the sort order of the project.

Parameters:

order – New sort order of the project (ascending or descending).

undo(table)[source]

Undo the last action made by the user on the project.

Parameters:

table – Table on which to apply the modifications.

Actions that can be undone:
  • add_tag

  • remove_tags

  • add_scans

  • modified_values

  • modified_visibilities
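
The undo and redo entries can be pictured as two stacks of recorded actions. The sketch below is one possible arrangement, written under that assumption; the class ActionHistory and its record method are hypothetical, and only the action names and the ValueError on an unknown action type come from the descriptions of undo and redo.

```python
KNOWN_ACTIONS = {
    "add_tag",
    "remove_tags",
    "add_scans",
    "modified_values",
    "modified_visibilities",
}


class ActionHistory:
    """Two-stack undo/redo history (hypothetical illustration)."""

    def __init__(self):
        self.undos = []
        self.redos = []

    def record(self, action, payload=None):
        # A new user action invalidates whatever could be redone.
        self.undos.append((action, payload))
        self.redos.clear()

    def undo(self):
        return self._move(self.undos, self.redos)

    def redo(self):
        return self._move(self.redos, self.undos)

    @staticmethod
    def _move(source, target):
        action, payload = source[-1]
        if action not in KNOWN_ACTIONS:
            raise ValueError(f"Unknown action type: {action}")
        # Move the entry only after its type has been validated.
        target.append(source.pop())
        return action, payload


history = ActionHistory()
history.record("add_tag", "Age")
history.record("modified_values", {"scan": "s1", "tag": "Age", "value": 30})
undone = history.undo()
redone = history.redo()
```

After the undo and the redo, both recorded actions are back on the undo stack and the redo stack is empty.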

property unsavedModifications

Getter for _unsavedModifications.

unsaveModifications()[source]

Unsave the pending operations of the project.

update_db_for_paths(new_path=None)[source]

Update database paths when renaming or loading a project.

This method updates path references in the database when a project is renamed or loaded from a different location. It scans the HISTORY and BRICK collections to identify the old project path, then systematically replaces it with the new path.

The method looks for the old path in brick input/output fields and history pipeline XML data. If the old path contains ‘data/derived_data’, the method uses the portion before this segment as the base path.

Parameters:

new_path – (str) The new project path. If not provided, the current project folder path is used.

Contains:

Inner functions:

  • _update_json_data: Helper function to update paths in JSON data structures.
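
The path replacement can be illustrated with the following sketch. The functions base_path and replace_paths are hypothetical and only mirror the behaviour described above: the part of the old path before 'data/derived_data' is taken as the base, and every string in a nested JSON-like structure has the old base replaced with the new one.

```python
def base_path(old_path, marker="data/derived_data"):
    """Return the part of `old_path` located before `marker`, if present."""
    index = old_path.find(marker)
    if index == -1:
        return old_path
    return old_path[:index].rstrip("/")


def replace_paths(data, old, new):
    """Return a copy of `data` where `old` is replaced by `new` in strings."""
    if isinstance(data, str):
        return data.replace(old, new)
    if isinstance(data, list):
        return [replace_paths(item, old, new) for item in data]
    if isinstance(data, dict):
        return {key: replace_paths(value, old, new) for key, value in data.items()}
    # Numbers, booleans and None are returned unchanged.
    return data


old_base = base_path("/old/proj/data/derived_data/b.nii")
brick = {
    "inputs": {"in": "/old/proj/data/raw/a.nii"},
    "outputs": ["/old/proj/data/derived_data/b.nii", 3],
}
updated = replace_paths(brick, old_base, "/new/place")
```

The sketch returns a new structure instead of modifying the input in place, which keeps the original document available if the update has to be abandoned.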