Deriva Python APIs

This page describes three separate but related interfaces for DERIVA.

  1. ERMrest Query and Data Manipulation: interfaces for building simple to complex expressions that query or manipulate data in ERMrest, such as selects, joins, aggregates, inserts, updates, etc.
  2. ERMrest Model Management: essential model management for ERMrest catalogs, such as creating or dropping tables, columns, etc. Also, see CHiSEL for ERMrest schema evolution interfaces that build on ERMrest Model Management with operations that combine model management and annotation updates in one.
  3. ERMrest Catalog: primary interface to an ERMrest catalog and the starting point for accessing the above interfaces.

ERMrest Query and Data Manipulation

The datapath module is an interface for building ERMrest “data paths” and retrieving data from ERMrest catalogs. It also supports data manipulation (insert, update, delete). In its present form, the module provides a limited programmatic interface to ERMrest.

Features

  • Build ERMrest “data path” URLs with a Pythonic interface
  • Covers the essentials for data retrieval: link tables, filter on attributes, select attributes, alias tables
  • Retrieve entity sets; all or limited numbers of entities
  • Fetch computed aggregates or grouped aggregates
  • Insert and update entities of a table
  • Delete entities identified by a (potentially, complex) data path
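
As a rough illustration of the retrieval features above, the sketch below wraps a limited entity fetch in a small helper. The helper name fetch_some_entities and its parameters are hypothetical. The schemas[...], tables[...], path, entities(), and fetch(limit=...) calls are assumptions based on the datapath interface as shown in the tutorials, and should be checked against the installed version.

```python
def fetch_some_entities(catalog, schema_name, table_name, limit=10):
    """Return up to `limit` rows of one table as a list.

    `catalog` is expected to be a bound ErmrestCatalog (or snapshot).
    """
    path_builder = catalog.getPathBuilder()
    table = path_builder.schemas[schema_name].tables[table_name]
    # A table's `path` is the simplest data path: all entities of that table.
    results = table.path.entities().fetch(limit=limit)
    return list(results)
```

The helper only reads data; linking, filtering, and aggregation build on the same path object and are covered in the notebook tutorials.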

Limitations

  • Only supports the application/json content type, so the wire protocol is less efficient than it could be.
  • The ResultSet interface is a thin wrapper over a dictionary of a list of results.
  • Many user errors are caught by Python assert statements rather than by validating parameters and raising custom exceptions.

Tutorials

See the Jupyter Notebook tutorials in the docs/ folder.

Now, get started!

ERMrest Model Management

The deriva.core.ermrest_model module provides an interface for managing the model (schema definitions) of an ERMrest catalog. This library provides an (incomplete) set of helper routines for common model management idioms.

For some advanced scenarios supported by the server but not yet supported in this library, a client may need to resort to direct usage of the low-level deriva.core.ermrest_catalog.ErmrestCatalog HTTP access layer.

Features

  • Obtain an object hierarchy mirroring the model of a catalog or catalog snapshot.
  • Discover names of schemas, tables, and columns as well as definitions where applicable.
  • Discover names and definitions of key and foreign key constraints.
  • Discover annotations on catalog and model elements.
  • Discover policies on catalog and model elements (if sufficiently privileged).
  • Create model elements
    • Create new schemas.
    • Create new tables.
    • Create new columns on existing tables.
    • Create new key constraints over existing columns.
    • Create new foreign key constraints over existing columns and key constraints.
  • Reconfigure model elements
    • Change comment string on schema, table, column, key, or foreign key constraints.
    • Change acls on schema, table, column, and foreign key constraints.
    • Change acl_bindings on table, column, and foreign key constraints.
    • Change annotations on catalog, schema, table, column, key, or foreign key constraints.
  • Drop model elements
    • Drop schemas.
    • Drop tables.
    • Drop columns.
    • Drop key constraints.
    • Drop foreign key constraints.
  • Alter model elements
    • Rename a schema.
    • Rename a table or move the table between schemas.
    • Rename a column, or change column storage type, default value, or null-ok status.
    • Rename a key constraint.
    • Rename a foreign key constraint.

Limitations

Because the model management interface mirrors a complex remote catalog model with a hierarchy of local objects, it is possible for the local objects to get out of synchronization with the remote catalog and either represent model elements which no longer exist or lack model elements recently added.

The provided management methods can incrementally update the local representation with changes made to the server by the calling client. However, if other clients make concurrent changes, it is likely that the local representation will diverge.

The only robust solution to this problem is for the caller to discard its model representation, reconstruct it to match the latest server state, and retry whatever changes are intended.
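
The discard, reconstruct, and retry cycle described above might be sketched as follows. The helper name change_with_fresh_model is hypothetical, as are its change, attempts, and retry_on parameters. Which exception types actually signal a stale local model is an assumption left to the caller.

```python
def change_with_fresh_model(catalog, change, attempts=3, retry_on=(Exception,)):
    """Run `change(model_root)` against a freshly fetched model, retrying on failure.

    `change` is a caller-supplied function. `retry_on` lists the exception
    types the caller treats as "the local model was stale" (an assumption).
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        # Discard any earlier representation by fetching the model again.
        model_root = catalog.getCatalogModel()
        try:
            return change(model_root)
        except retry_on as error:
            last_error = error
    # Every attempt failed; surface the most recent error to the caller.
    raise last_error
```

Passing a narrow retry_on tuple keeps unrelated programming errors from being retried silently.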

Examples

For the following examples, we assume this common setup:

from deriva.core import ErmrestCatalog
import deriva.core.ermrest_model as em
from deriva.core.ermrest_model import builtin_types as typ


catalog = ErmrestCatalog(...)
model_root = catalog.getCatalogModel()

Also, when examples show keyword arguments, they illustrate a typical override value. If omitted, a default value will apply. Many parts of the model definition are immutable once set, but in general comment, acl, acl_binding, and annotation attributes can be modified after the fact through configuration management APIs.

Add Schema to Catalog

To create a new schema, call the model’s schema-creation method.

schema = model_root.create_schema({"schema_name": "My new schema"})

Add Table to Schema

To create a new table, build a table definition document and pass it to the table-creation method on the object representing an existing schema. The classes involved provide define(...) class methods to construct the constituent parts of the table definition:

# schema_name is assumed to hold the name of an existing schema.
column_defs = [
  em.Column.define("Col1", typ.text),
  em.Column.define("Col2", typ.int8),
]
key_defs = [
  em.Key.define(
    ["Col1"], # this is a list to allow for compound keys
    constraint_names=[ [schema_name, "My New Table_Col1_key"] ],
    comment="Col1 text values must be distinct.",
    annotations={},
  )
]
fkey_defs = [
  em.ForeignKey.define(
    ["Col2"], # this is a list to allow for compound foreign keys
    "Foreign Schema",
    "Referenced Table",
    ["Referenced Column"], # this is a list to allow for compound keys
    on_update='CASCADE',
    on_delete='SET NULL',
    constraint_names=[ [schema_name, "My New Table_Col2_fkey"] ],
    comment="Col2 must be a valid reference value from the domain table.",
    acls={},
    acl_bindings={},
    annotations={},
  )
]
table_def = em.Table.define(
  "My New Table",
  column_defs,
  key_defs=key_defs,
  fkey_defs=fkey_defs,
  comment="My new entity type.",
  acls={},
  acl_bindings={},
  annotations={},
  provide_system=True,
)
schema = model_root.schemas[schema_name]
new_table = schema.create_table(table_def)

By default, create_table(...) will add system columns to the table definition, so the caller does not need to reconstruct these standard elements of the column definitions nor the RID key definition.

Add a Vocabulary Term Table

A vocabulary term table is often useful to track a controlled vocabulary used as a domain table for foreign key values used in science data columns. A simple vocabulary term table can be created with a helper function:

schema = model_root.schemas[schema_name]
new_vocab_table = schema.create_table(
  em.Table.define_vocabulary(
    "My Vocabulary",
    "MYPROJECT:{RID}",
    "https://server.example.org/id/{RID}"
  )
)

The Table.define_vocabulary() method is a convenience wrapper around Table.define() to automatically generate core vocabulary table structures. It accepts other table definition parameters which a sophisticated caller can use to override or extend these core structures.
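
To show what the two template arguments in the example are meant to produce, the snippet below mimics the {RID} substitution with plain string formatting. This assumes the placeholder is replaced by each term's RID when the term is created. The RID value here is made up, and the snippet does not call the library.

```python
curie_template = "MYPROJECT:{RID}"
uri_template = "https://server.example.org/id/{RID}"

# Hypothetical RID, used only to show the shape of the resulting identifiers.
example_rid = "1-ABCD"

example_curie = curie_template.format(RID=example_rid)
example_uri = uri_template.format(RID=example_rid)

print(example_curie)  # MYPROJECT:1-ABCD
print(example_uri)    # https://server.example.org/id/1-ABCD
```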

Add Column to Table

To create a new column, you build a column definition document and pass it to the column-creation method on the object representing an existing table.

column_def = em.Column.define(
  "My New Column",
  typ.text,
  nullok=False,
  comment="A string representing my new stuff.",
  annotations={},
  acls={},
  acl_bindings={},
)
table = model_root.table(schema_name, table_name)
new_column = table.create_column(column_def)

The same pattern can be used to add a key or foreign key to an existing table via table.create_key(key_def) or table.create_fkey(fkey_def), respectively. Similarly, a schema can be added to a model with model.create_schema(schema_def).

Remove a Column from a Table

To remove or “drop” a column, you invoke the drop() method on the column object itself:

table = model_root.table(schema_name, table_name)
column = table.column_definitions[column_name]
column.drop()

The same pattern can be used to remove a key or foreign key from a table via key.drop() or foreign_key.drop(), respectively. Similarly, a schema or table can be removed with schema.drop() or table.drop(), respectively.

Alter a Table

To alter certain aspects of an existing table, you invoke the alter() method with optional keyword arguments for the aspects you wish to change. The default for omitted keyword arguments is a special nochange value which means to keep that aspect as it is currently defined:

table = model_root.table(orig_schema_name, orig_table_name)
table.alter(
  schema_name=destination_schema_name,
  table_name=new_table_name
)

The schema_name argument allows you to relocate an existing table from an original schema to a destination schema, where both named schemas already exist in the model. This also relocates key or foreign key constraints in the table at the same time. The table_name argument allows you to revise the name of an existing table in the model, while preserving other aspects of the table definition, content, and content history.

The same pattern can be used to alter schemas, columns, keys, and foreign keys:

schema.alter(schema_name=new_schema_name)

column.alter(
  name=new_column_name,
  type=new_column_type_obj,
  nullok=new_nullok_value,
  default=new_default_value
)

key.alter(constraint_name=new_unqualified_name_str)

foreign_key.alter(
  constraint_name=new_unqualified_name_str,
  on_update=new_action_string,
  on_delete=new_action_string
)

The key and foreign key alterations accept only the unqualified constraint name string, because it is not possible to change the schema qualification other than by relocating the parent table to a different schema. The foreign key alteration also supports changes to the on_update and on_delete action, e.g. NO ACTION, SET NULL, or CASCADE.

As a convenience, alter() also accepts optional arguments to reconfigure comment, acls, and acl_bindings, where those arguments exist in the define() method for the same class of object. They are omitted from the preceding examples for the sake of brevity. These arguments have an effect similar to mutating the local configuration fields and then invoking the apply() method to send them to the server, except that configuration changes included in an alter() request happen atomically with the other indicated alterations.

ERMrest Catalog

The deriva.core.ermrest_catalog.ErmrestCatalog class provides HTTP bindings to an ERMrest catalog as a thin wrapper around the Python Requests library. Likewise, the deriva.core.ermrest_catalog.ErmrestSnapshot class provides HTTP bindings to an ERMrest catalog snapshot. While catalogs permit mutation of stored content, a snapshot is mostly read-only and only permits retrieval of content representing the state of the catalog at a specific time in the past.

Instances of ErmrestCatalog or ErmrestSnapshot represent a particular remote catalog or catalog snapshot, respectively. They allow the client to perform HTTP requests against individual ERMrest resources, but require clients to know how to formulate those requests in terms of URL paths and resource representations.

Other, higher-level client APIs are layered on top of this implementation class and exposed via factory-like methods integrated into each catalog instance.

Catalog Binding

A catalog is bound using the class constructor, given parameters necessary for binding:

from deriva.core.ermrest_catalog import ErmrestCatalog
from deriva.core import get_credential

scheme = "https"
server = "myserver.example.com"
catalog_id = "1"
credentials = get_credential(server)

catalog = ErmrestCatalog(scheme, server, catalog_id, credentials=credentials)

Client Credentials

In the preceding example, a credential is obtained from the filesystem assuming that the user has activated the deriva-auth authentication agent prior to executing this code. For catalogs allowing anonymous access, the optional credentials parameter can be omitted to establish an anonymous binding.

The same client credentials (or anonymous access) are applied to all HTTP operations performed by subsequent calls to the catalog object’s methods. A calling program that needs to issue requests with different credentials should create multiple catalog objects and choose the appropriate object for each request scenario.

High-Level API Factories

Several optional access APIs are layered on top of ErmrestCatalog and/or ErmrestSnapshot and may be accessed by invoking convenient factory methods on a catalog or snapshot object:

  • catalog_snapshot = catalog.latest_snapshot()
    • ErmrestSnapshot binding for latest known revision of catalog
  • path_builder = catalog.getPathBuilder()
    • deriva.core.datapath.Catalog path builder for catalog (or snapshot)
    • Allows higher-level data access idioms as described previously.
  • model_root = catalog.getCatalogModel()
    • deriva.core.ermrest_model.Model object for catalog (or snapshot)
    • The model_root object roots a tree of objects isomorphic to the catalog model, organizing model definitions according to each part of the model.
    • Allows inspection of catalog/snapshot models (schemas, tables, columns, constraints)
    • Allows inspection of catalog/snapshot annotations and policies.
    • Allows configuration field mutation to draft a new configuration objective.
    • Draft changes are applied with model_root.apply()
    • Many model management idioms are exposed as methods on individual objects in the model hierarchy.
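
The draft-then-apply idiom listed above could look roughly like this for a single table annotation. The helper name set_table_annotation is hypothetical, and the sketch assumes each table object exposes its annotations as a mutable dictionary attribute named annotations.

```python
def set_table_annotation(model_root, schema_name, table_name, tag, value):
    """Draft one annotation on a table locally, then send the draft to the server."""
    table = model_root.table(schema_name, table_name)
    # Mutating the local `annotations` dictionary only changes the draft.
    table.annotations[tag] = value
    # Nothing reaches the server until apply() is called on the model root.
    model_root.apply()
    return table
```

Several drafted changes can be made before a single apply() call, so the helper is mainly useful for one-off edits.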

Low-Level HTTP Methods

When the client understands the URL structuring conventions of ERMrest, they can use basic Python Requests idioms on a catalog instance:

  • resp = catalog.get(path)
  • resp = catalog.delete(path)
  • resp = catalog.put(path, json=data)
  • resp = catalog.post(path, json=data)

Unlike Python Requests, the path argument to each of these methods should exclude the static prefix of the catalog itself. For example, assuming catalog has been bound to https://myserver.example.com/ermrest/catalog/1 as in the constructor example above, an attempt to access table content at https://myserver.example.com/ermrest/catalog/1/entity/MyTable would call catalog.get("/entity/MyTable") and the catalog binding would prepend the complete catalog prefix.
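
The relationship between the catalog prefix and a catalog-relative path can be sketched with plain string handling. The helper names catalog_prefix and entity_path are hypothetical; the binding performs the prepending internally, and this snippet only mirrors the resulting URL. Percent-encoding the table name is a general URL precaution added here, not something stated above.

```python
from urllib.parse import quote


def catalog_prefix(scheme, server, catalog_id):
    """Build the static prefix that the binding is described as prepending."""
    return f"{scheme}://{server}/ermrest/catalog/{catalog_id}"


def entity_path(table_name):
    """Build a catalog-relative path for a table's entities.

    Percent-encoding the name keeps spaces and slashes from breaking the path.
    """
    return "/entity/" + quote(table_name, safe="")


prefix = catalog_prefix("https", "myserver.example.com", "1")
print(prefix + entity_path("MyTable"))
# https://myserver.example.com/ermrest/catalog/1/entity/MyTable
print(entity_path("My New Table"))
# /entity/My%20New%20Table
```

Only the value returned by entity_path would be passed to catalog.get; the prefix is shown purely to make the full URL visible.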

The json input to the catalog.put and catalog.post methods behaves just as in Python Requests. The data is supplied as native Python lists, dictionaries, numbers, strings, and booleans. The method implicitly serializes the data to JSON format and sets the appropriate Content-Type header to inform the server we are sending JSON content.

All of these HTTP methods return a requests.Response object which must be further interrogated to determine request status or to retrieve any content produced by the server:

  • resp.status_code: the HTTP response status code
  • resp.raise_for_status(): raise a Python exception for non-success codes
  • resp.json(): deserialize JSON content from server response
  • resp.headers: a dictionary of HTTP headers from the server response

Low-level usage errors may raise exceptions directly from the HTTP methods. However, normal server-indicated errors will produce a response object and the caller must interrogate the status_code field or use the raise_for_status() helper to determine whether the request was successful.
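
A minimal sketch of the interrogation pattern just described, using only the calls listed above (catalog.get, raise_for_status, and json). The helper name get_json is hypothetical.

```python
def get_json(catalog, path):
    """GET a catalog-relative path and return the decoded JSON body.

    Propagates whatever exception raise_for_status() raises on a
    non-success status code.
    """
    resp = catalog.get(path)
    # Server-indicated errors arrive as a normal response, so check explicitly.
    resp.raise_for_status()
    return resp.json()
```

A caller that prefers to branch on specific status codes could inspect resp.status_code instead of calling raise_for_status().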

HTTP Caching

By default, the catalog binding uses HTTP caching for the catalog.get method: it will store previous responses, include appropriate If-None-Match headers in the new HTTP GET request, detect 304 Not Modified responses indicating that cached content is valid, and return the cached content to the caller. This mechanism can be disabled by specifying caching=False in the ErmrestCatalog constructor call.
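
The conditional-GET mechanism described above can be illustrated with a small in-memory cache. This is a sketch of the general technique, not the binding's actual implementation. The class name ConditionalGetCache and the send_get callable, which returns a (status, headers, body) tuple, are hypothetical stand-ins for the real HTTP layer.

```python
class ConditionalGetCache:
    """Tiny in-memory cache keyed by path, revalidated with If-None-Match."""

    def __init__(self, send_get):
        # send_get(path, headers) -> (status_code, response_headers, body)
        self._send_get = send_get
        self._entries = {}  # path -> (etag, body)

    def get(self, path):
        headers = {}
        cached = self._entries.get(path)
        if cached is not None:
            # Ask the server to skip the body if our copy is still current.
            headers["If-None-Match"] = cached[0]
        status, resp_headers, body = self._send_get(path, headers)
        if status == 304 and cached is not None:
            # Not Modified: the stored body is still valid.
            return cached[1]
        etag = resp_headers.get("ETag")
        if status == 200 and etag is not None:
            # Remember the new content and its validator for next time.
            self._entries[path] = (etag, body)
        return body
```

The first request for a path is sent without a validator; later requests carry the stored ETag, and a 304 response is answered from the cache.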