Index

Documentation

GitHub

Introduction

populse_db is a small Python module aiming is to store and query efficiently JSON data with the following constraints:

populse_db offers a simple API on top of a SQLite backend that allows to respect all the requirements.

Dependencies

Install tools take care of all the required dependencies for you but here is a summary of what populse_db needs:

  • Python >= 3.9. populse_db uses some features introduced in Python 3.9 (such as the possibility to subscript list type as in list[str]). Therefore it can’t work with Python 3.8 or earlier releases.

  • dateutil

  • Lark-parser >= 0.7.0.

  • [optional] Sphinx allows to build the documentation from source code.

Get started

Installation

Populse_db can be installed by standard Python tool. For instance:

pip install populse_db

The documentation is embedded in the project. However, if one needs to rebuild the documentation, use:

pip install populse_db[doc]

Basic usage

Populse_db is organized in collections and documents. A document is a JSON object represented by a dictionary in Python. A collection is a named container in which documents can be stored. Internally, collections corresponds to database tables and documents are stored in table rows (one document per row). But populse_db can hide this internal database stuff. The only thing user must do is to declare its collections if it uses an empty database. This is illustrated in the following example:

from datetime import date, datetime
from populse_db import Storage
from pprint import pprint

# Create a storage using a file name
storage = Storage('/tmp/populse_db.sqlite')
# Create a read/write session
with storage.session(write=True) as db:
   db.last_modified = datetime.now()

   # Store documents in the database
   db['subject'] = Storage.Collection('subject_id')
   db['subject']['rbndt001'] = {
      'name': 'Eléa',
      'sex': 'f',
      'birth_date': date(1968, 3, 3),
   }
   db['subject']['rbndt002'] = {
      'name': 'Païkan',
      'sex': 'm',
      'birth_date': date(1963, 12, 7),
   }
   db['acquisition'] = Storage.Collection()
   db['acquisition']['rbndt001_t1'] = {
      'subject_id': 'rbndt001',
      'type': ['image', 'mri', 'T1'],
      'format': 'DICOM',
      'files': [
            '/somewhere/t1/acq0001.dcm',
            '/somewhere/t1/acq0002.dcm',
            '/somewhere/t1/acq0003.dcm',
            '/somewhere/t1/acq0004.dcm',
      ],
      'date': date(2022, 3, 28),
   }
   db['acquisition']['rbndt001_t2'] = {
      'subject_id': 'rbndt001',
      'type': ['image', 'mri', 'T2'],
      'format': 'DICOM',
      'files': [
            '/somewhere/t2/acq0001.dcm',
            '/somewhere/t2/acq0002.dcm',
            '/somewhere/t2/acq0003.dcm',
            '/somewhere/t2/acq0004.dcm',
      ],
      'date': date(2022, 3, 28),
   }
   db['acquisition']['rbndt002_t1'] = {
      'subject_id': 'rbndt002',
      'type': ['image', 'mri', 'T1'],
      'format': 'NIFTI',
      'files': [
            '/elsewhere/sub-rbndt001.nii',
      ],
      'date': date(2022, 3, 29),
   }

   # Retrieve a single value from storage
   print('Last modified:', db.last_modified.get())

   # Retrieve a single document from collection 'subject'
   pprint(db['subject']['rbndt001'].get())

   # Retrieve all documents from collection 'subject' respecting the
   # following conditions:
   #   - the "subject" field equals to "rbndt001"
   #   - the "type" field is a list containing the value "T1"
   for doc in db['acquisition'].search('subject=="rbndt001" and "T1" in type'):
      pprint(doc)

   # Retrieve all possible types of acquisition
   print('Subjects:', ', '.join(db.subject.distinct_values('subject_id')))

The example above illustrate how to use populse_db to store and retrieve JSON objects without wondering about the underlying SQL database engine. This script produces the following result:

{'birth_date': datetime.date(1968, 3, 3),
 'name': 'Eléa',
 'primary_key': 'rbndt001',
 'sex': 'f'}
{'date': datetime.date(2022, 3, 28),
 'files': ['/somewhere/t1/acq0001.dcm',
         '/somewhere/t1/acq0002.dcm',
         '/somewhere/t1/acq0003.dcm',
         '/somewhere/t1/acq0004.dcm'],
 'format': 'DICOM',
 'primary_key': 'rbndt001_t1',
 'subject': 'rbndt001',
 'type': ['image', 'mri', 'T1']}

To further understand how to use populse_db, see the tutorial

Indices and tables