Managing Data in DERIVA with deriva-client

The deriva-client package bundles an application suite of Python-based client software for use with the DERIVA platform. These tools provide functions such as:

  1. Authentication services for programmatic and non-browser-based application access.
  2. Bulk import and export of catalog assets and (meta)data.
  3. Catalog configuration, mutation and administration.
  4. Tools for working with bdbags, a file container format used by DERIVA for the import and export of data.

Installed Applications

Command-Line Interface (CLI) applications

Executable Name | Description
bdbag | The bdbag application provides a variety of functions for working with BagIt file archives, a file packaging format used by DERIVA for data export. Archives in this format are created by the DERIVA web applications when exporting data sets using the BDBAG option.
bdbag-utils | The bdbag-utils application simplifies some of the more repetitive and programmable tasks associated with creating and maintaining bags.
deriva-acl-config | The deriva-acl-config utility reads a configuration file and uses it to set ACLs for an ERMRest catalog (or for a schema or table within that catalog).
deriva-annotation-config | The deriva-annotation-config utility reads a configuration file and uses it to set annotations for an ERMRest catalog (or for a schema or table within that catalog).
deriva-annotation-dump | Outputs the current set of annotations in use for the specified catalog in JSON format.
deriva-annotation-rollback | Rolls back the entire annotation hierarchy for the specified catalog to a point in time specified by a catalog snapshot ID.
deriva-catalog-config | The deriva-catalog-config application provides functions to set up catalog schema and tables with a standard baseline annotation and ACL configuration.
deriva-catalog-dump | The deriva-catalog-dump application provides functions to dump the current configuration of a catalog as a set of deriva-py scripts. The scripts are pure deriva-py and have placeholder variables to set annotations, ACLs, and ACL bindings.
deriva-csv | The deriva-csv application provides functions to upload CSV or other table-like data to a catalog, with options to create a new table, validate input data, and upload data.
deriva-download-cli | The deriva-download-cli is used for orchestrating the bulk export of tabular data (stored in ERMRest catalogs) and the download of asset data (stored in Hatrac or another supported HTTP-accessible object store).
deriva-hatrac-cli | The deriva-hatrac-cli is a command-line utility for interacting directly with the DERIVA Hatrac object store.
deriva-upload-cli | The deriva-upload-cli provides batch upload functionality for both catalog (ERMRest) and asset (Hatrac) data. This application is generally used for automating the bulk transfer of data to DERIVA servers.
deriva-sitemap-cli | The deriva-sitemap-cli utility creates a sitemap containing record entries for all publicly readable rows in one or more ERMRest tables.
deriva-globus-auth-utils | The deriva-globus-auth-utils provides numerous utility functions for working with the Globus Auth API, in addition to Globus Auth Native App login functionality.
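
After installation, it can be useful to confirm that the command-line applications are actually reachable from the current shell. The following is a minimal sketch using only the Python standard library; the helper name missing_executables and the particular subset of names checked are illustrative choices, not part of deriva-client.

```python
import shutil

# A few of the executable names listed in the table above.
CLI_NAMES = ["bdbag", "deriva-upload-cli", "deriva-download-cli", "deriva-hatrac-cli"]


def missing_executables(names):
    """Return the names that cannot be found on the current PATH."""
    return [name for name in names if shutil.which(name) is None]


missing = missing_executables(CLI_NAMES)
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("All checked executables are on PATH.")
```

If names are reported as missing after a virtual environment installation, the most likely cause is that the environment has not been activated in the current shell.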

Graphical User Interface (GUI) applications

Executable Name | Application Name | Description
deriva-auth | DERIVA Authentication Agent | Provides credential authentication and refresh services for one or more DERIVA servers. This application is intended to be run in the background after the user completes the login sequence for each server.
deriva-upload | DERIVA Upload Utility | Provides batch upload functionality for both catalog and asset data. This application is an interactive tool used for the bulk transfer of data to DERIVA servers.

Installer packages for Windows and MacOSX

Pre-packaged installers of deriva-client for Windows and MacOSX are available. These installer packages include a bundled Python interpreter and all other software dependencies and are recommended for Windows and MacOSX users who are looking for a more traditional “turnkey” installation that does not require them to install Python and manage Python software package installations.

Download the installer packages here.

Installing deriva-client from PyPI via pip

For users who already have the base Python interpreter installed and are comfortable installing Python software via the pip application, deriva-client can be installed along with all of its dependencies directly from PyPI using basic pip commands. For users who wish to write programs against the various APIs included in deriva-client, this is the recommended installation method.

Installation Prerequisites

  • A Python 3.5.4 or greater system installation is required. The latest stable version of Python is recommended.
  • Verify that the appropriate Python 3 interpreter can be invoked from a command shell using the python3 command. This can be tested simply with the following command:
python3 --version
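
The same prerequisite can also be checked from inside Python, which is convenient in setup scripts. This is a minimal sketch; the helper name meets_minimum is an illustrative choice, and the minimum version is taken from the prerequisite stated above.

```python
import sys

MINIMUM_PYTHON = (3, 5, 4)  # minimum version stated in the prerequisites


def meets_minimum(version=None, minimum=MINIMUM_PYTHON):
    """Return True if `version` (default: the running interpreter) is new enough."""
    if version is None:
        version = sys.version_info
    # Compare only (major, minor, micro); tuples compare element by element.
    return tuple(version[:3]) >= tuple(minimum)


print("Python version OK" if meets_minimum() else "Python is older than 3.5.4")
```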

Installation Quickstart

The following commands create a venv-based virtual environment in the current working directory and install deriva-client into it.

Mac/Linux

The following commands assume a Bash (or compatible) command shell is used. For a different command interpreter (e.g. csh), invoke the source command on the appropriate activation script in the virtual environment’s bin directory (for csh, activate.csh).

python3 -m venv ./deriva-client-venv
source ./deriva-client-venv/bin/activate
python3 -m pip install --upgrade pip setuptools wheel
pip install deriva-client

Important Note: For MacOSX users running Python 3.5.x with pip version < 9.0.3

If you encounter the following error:

Could not fetch URL https://pypi.python.org/simple/pip/: 
There was a problem confirming the ssl certificate: 
[SSL: TLSV1_ALERT_PROTOCOL_VERSION] tlsv1 alert protocol version (_ssl.c:720) - skipping

This error means that you cannot update pip, setuptools, and wheel via the command provided above. You can work around this error by issuing the following commands instead, and then continue with the installation procedure as described.

curl https://bootstrap.pypa.io/get-pip.py | python3
pip install --upgrade setuptools

Windows

The following commands assume a Windows Command Prompt command shell is used. For a PowerShell shell, the Activate.ps1 activation script should be invoked instead.

python3 -m venv .\deriva-client-venv
.\deriva-client-venv\Scripts\activate
python3 -m pip install --upgrade pip setuptools wheel
pip install deriva-client

IMPORTANT NOTE: Python virtual environments versus user environments

While a virtual environment installation is generally the safest way to install and isolate multiple software packages, it also must be activated before use and deactivated after use. If this requirement is too cumbersome, the recommended alternative is to install the software into a user environment instead. See the complete installation procedure below for more information.
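
For scripted use, activation is not strictly required: invoking the interpreter located inside the environment directory uses that environment directly. The following is a minimal sketch built on the standard library venv module; the helper names venv_python and create_env are illustrative, and the directory name simply mirrors the quickstart commands above.

```python
import os
import venv
from pathlib import Path


def venv_python(env_dir, windows):
    """Return the path of the interpreter inside a venv directory."""
    # venv places executables in Scripts\ on Windows and bin/ elsewhere.
    subdir = "Scripts" if windows else "bin"
    exe = "python.exe" if windows else "python"
    return Path(env_dir) / subdir / exe


def create_env(env_dir):
    """Create a venv with pip available and return its interpreter path."""
    venv.create(env_dir, with_pip=True)
    return venv_python(env_dir, windows=(os.name == "nt"))


# Example usage (not run here, since it creates files and uses the network):
#   import subprocess
#   python = create_env("deriva-client-venv")
#   subprocess.run([str(python), "-m", "pip", "install", "deriva-client"], check=True)
```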


Installation Procedure

  • For MacOSX and Linux systems which include Python as a core part of the operating system, it is highly recommended to install this software into a virtual environment or a user environment, so that it does not interfere or conflict with the operating system’s Python installation. The native Python 3 venv module, the virtualenv package from PyPI, or the Anaconda Distribution environment are all suitable for use as virtual environments.
  • Instead of using a virtual environment, it is also possible to install the software into a user environment using the --user argument when invoking pip install.
  • Recent versions of pip, setuptools, and wheel are recommended. If these components are already installed, updating them to the latest versions available is optional.
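
When it is unclear which kind of environment a shell or script is running in, the interpreter itself can report it. This is a minimal sketch; the helper name in_virtual_env is an illustrative choice.

```python
import sys


def in_virtual_env():
    """Return True if the running interpreter belongs to a virtual environment."""
    # Older releases of the virtualenv package set sys.real_prefix.
    if hasattr(sys, "real_prefix"):
        return True
    # With the venv module, sys.prefix points at the environment while
    # sys.base_prefix points at the base Python installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if in_virtual_env():
    print("Virtual environment active:", sys.prefix)
else:
    print("No virtual environment active; consider a venv or pip install --user.")
```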

Installation Sequence

  1. Create and/or activate the target virtual environment, if any. This step is specific to the type of virtual environment being used.

  2. Update pip, setuptools, and wheel (optional).

    1. For virtual environments execute the following (ensure the environment is active):

      python -m pip install --upgrade pip setuptools wheel
      
    2. For user environments execute the following:

      python3 -m pip install --user --upgrade pip setuptools wheel
      
    3. For Linux system Python installations, it is recommended to use the system’s package manager, such as dnf, apt, or yum, to update the following packages: python3-pip, python3-setuptools, and python3-wheel.

  3. Install deriva-client directly from PyPI using the pip install command.

    1. For virtual environments execute the following (ensure the environment is active):

      pip install deriva-client
      
    2. For user environments execute the following:

      pip3 install --user deriva-client
      
    3. For system-wide Python installations (only do this if you understand the complexities involved):

      pip3 install deriva-client
      
IMPORTANT NOTES: Using pip to install software into system-wide Python locations
  • Many newer Linux distributions (as well as MacOSX) ship with Python2 and Python3 installed alongside each other. In these environments, the python and pip commands are symbolically linked to the system default version, which is generally Python2.
  • Python3 versions are commonly accessed via python3 and pip3. If you are working outside of a Python3 virtual environment and installing either to the system-wide Python location (not recommended) or a user-based location (e.g. with the pip --user argument), then you must substitute pip3 for pip when issuing pip installation commands.
  • Also note that when installing into the system Python location via pip on Linux/MacOSX, the commands must be run as root or the sudo command must be prefixed to the command line.
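
One way to sidestep the pip versus pip3 ambiguity in scripts is to run pip as a module of the exact interpreter that should receive the package, so the two can never point at different Python versions. The following is a minimal sketch; the helper name pip_install_command is an illustrative choice, not part of pip or deriva-client.

```python
import sys


def pip_install_command(package, user=False, python=sys.executable):
    """Build a pip install command bound to a specific interpreter."""
    command = [python, "-m", "pip", "install"]
    if user:
        # Install into the per-user location instead of the interpreter's own.
        command.append("--user")
    command.append(package)
    return command


# Show the command that would install into the current user's environment.
print(" ".join(pip_install_command("deriva-client", user=True)))
```

The resulting list can be passed to subprocess.run to perform the installation.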

Managing data with the datapath API (deriva-py)

The deriva-py package (part of deriva-client) also includes a Python API that provides a programmatic interface to ERMRest.

The datapath module in particular is an interface for building ERMRest “data paths” and retrieving data from ERMRest catalogs. It also supports data manipulation (insert, update, delete). In its present form, the module provides a limited programmatic interface to ERMRest.
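
As a rough illustration of retrieving data through a data path, the sketch below reads a few rows from one table. It assumes the deriva-py calls behave as in the versions this editor is familiar with (ErmrestCatalog, get_credential, getPathBuilder, and a table's path.entities().fetch()); verify them against the installed release. The function name fetch_rows and the host, catalog, schema, and table values in the usage comment are hypothetical placeholders.

```python
def fetch_rows(hostname, catalog_id, schema_name, table_name, limit=10):
    """Return up to `limit` rows of one table as a list of dicts."""
    # Imported here so this sketch can be loaded even where deriva-client
    # is not installed; normally these imports sit at module level.
    from deriva.core import ErmrestCatalog, get_credential

    # Credentials are looked up for the host (for example, those stored by
    # deriva-auth); if none are found, access is assumed to be anonymous.
    credential = get_credential(hostname)
    catalog = ErmrestCatalog("https", hostname, catalog_id, credentials=credential)

    # The path builder exposes the catalog's schemas and tables.
    path_builder = catalog.getPathBuilder()
    table = path_builder.schemas[schema_name].tables[table_name]

    # Build a data path rooted at the table and fetch whole entities (rows).
    results = table.path.entities().fetch(limit=limit)
    return list(results)


# Example usage (placeholder values; requires network access to a DERIVA server):
#   rows = fetch_rows("deriva.example.org", 1, "my_schema", "my_table", limit=5)
#   for row in rows:
#       print(row)
```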

Source Code

The source code for the primary components of deriva-client can be found at the links below: