Data Resource Naming

The ERMrest data resource names always have a common structure:

  • service /catalog/ cid [ @ revision ] / api / path
  • service /catalog/ cid [ @ revision ] / api / path suffix
  • service /catalog/ cid [ @ revision ] / api / path suffix ? query parameters

where the components in this structure are:

  • service: the ERMrest service endpoint such as https://www.example.com/ermrest.
  • cid: the catalog identifier for one dataset such as 42.
  • revision: (optional) timestamp identifying a snapshot of the catalog to query.
  • api: the API or data resource space identifier such as entity, attribute, attributegroup, or aggregate.
  • path: the data path which identifies one filtered entity set with optional joined context.
  • suffix: additional content that depends on the api
    • the group keys associated with attributegroup resources
    • the projection associated with attribute, attributegroup, and aggregate resources
    • the @sort(...) modifier to order results of queries
    • the @before(...) or @after(...) modifier to select a paging position within sorted results
  • query parameters: optional parameters which may affect interpretation of the data name
    • the limit parameter to define query result paging length
    • the accept parameter to override HTTP Accept header for content negotiation
    • the defaults and nondefaults parameters to modify the behavior of POST operations to the entity API

Entity Resolution Names

The entity_rid resource space denotes resolvable RIDs using names of the form:

  • service /catalog/ cid [ @ revision ] /entity_rid/ rid

The single rid parameter is an entity reference to be resolved in the catalog or catalog snapshot. This rid should be a RID column value observed in an entity within the catalog. The resolved resource, when found, is a concise JSON record telling the client in which table to look for a given entity. The client must then use other data-access resources, described next, to actually retrieve the entity content associated with the rid.

Entity Names

The entity resource space denotes whole entities using names of the form:

  • service /catalog/ cid [ @ revision ] /entity/ path

The primary naming convention, without query parameters, denotes the final entity set referenced by path, as per the data path rules. The denoted entity set has the same tuple structure as the final table instance in path and may be a subset of the entities based on joining and filtering criteria encoded in path. The set of resulting tuples are distinct according to the key definitions of that table instance, i.e. any joins in the path may be used to filter out rows but do not cause duplicate rows.

Attribute Names

The attribute resource space denotes projected attributes of entities using names of the form:

  • service /catalog/ cid [ @ revision ] /attribute/ path / column reference ,

The path is interpreted identically to the entity resource space. However, rather than denoting a set of whole entities, the attribute resource space denotes specific fields projected from that set of entities. The projected column reference list elements can be in one of several forms:

  • [ out alias := ] column name
    • A field is projected from the final table instance of path.
    • An optional out alias can be assigned to rename the output column, and by default the output column will be named by the unqualified column name.
  • *
    • A wildcard that expands to all of the columns from the final table instance of path.
    • The output columns are automatically named by their unqualified column names.
  • [ out alias := ] alias : column name
    • A field is projected from a table instance bound to alias in path.
    • An optional out alias can be assigned to rename the output column, and by default the output column will be named by the unqualified column name.
  • alias : *
    • A wildcard that expands to all of the columns from a table instance bound to alias in path.
    • The output columns are automatically named by their alias qualified column names to prevent collisions between the multiple wildcard-expansions that are possible within one complex path. If a projection A:* is used for a table instance with a column named foo in it, the output data will then have a column with the literal name A:foo. Special attention must be paid when trying to reference such columns using the sort modifier, as this modifier uses the output name A:foo as a user-supplied literal and therefore the : must be escaped as in @sort(A%3Afoo).

Like in the entity resource space, joined tables may cause filtering but not duplication of rows in the final entity set. Thus, when projecting fields from aliased table instances in path, values are arbitrarily selected from one of the joined contextual rows if more than one such row was joined to the same final entity.

Aggregate Names

The aggregate resource space denotes computed (global) aggregates using names of the form:

  • service /catalog/ cid [ @ revision ] /aggregate/ path / aggregate ,

The path is interpreted slightly differently than in the attribute resource space. Rather than denoting a set of entities drawn from the final table instance in path, it denotes a set of entity combinations, meaning that there is a potential for a combinatoric number of intermediate records depending on how path entity elements are linked. This denoted set of entity combinations is reduced to a single aggregate tuple. The computed aggregate tuple elements can be in one of several forms:

  • out alias := function ( column name )
  • out alias := function (*)
  • out alias := function ( in alias : column name )
  • out alias := function ( in alias : * )

The out alias is the name given to the computed field. The function is one of a limited set of aggregate functions supported by ERMrest:

  • min: the minimum non-NULL value (or NULL)
  • max: the maximum non-NULL value (or NULL)
  • avg: the average non-NULL value (or NULL)
  • cnt_d: the count of distinct non-NULL values
  • cnt: the count of non-NULL values
  • array: an array containing all values (including NULL)
  • array_d: an array containing distinct values (including NULL)

These aggregate functions are evaluated over the set of values projected from the entity set denoted by path. The same column resolution rules apply as in other projection lists: a bare column name MUST reference a column of the final entity set while an alias-qualified column name MUST reference a column of a table instance bound to alias in the path. Some aggregate functions are only supported for a subset of available column types.

As a special case, the psuedo-column * can be used in several idiomatic forms:

  • cnt(*): a count of entities rather than of non-NULL values is computed
  • array(alias:*): an array of records rather than an array of values is computed
  • array_d(alias:*): an array of distinct records rather is computed

TODO: document other variants?

Attribute Group Names

The attributegroup resource space denotes groups of entities by arbitrary grouping keys and computed (group-level) aggregates using names of the form:

  • service /catalog/ cid [ @ revision ] /attributegroup/ path / group key ,
  • service /catalog/ cid [ @ revision ] /attributegroup/ path / group key ,; aggregate ,

The path is interpreted slightly differently than in the attribute resource space. Rather than denoting a set of entities drawn from the final table instance in path, it denotes a set of entity combinations, meaning that there is a potential for a combinatoric number of records depending on how path entity elements are linked. This denoted set of entity combinations is reduced to groups where each group represents a set of entities sharing the same group key tuple, and optional aggregate list elements are evaluated over this set of entities to produce a group-level aggregate value.

The group key list elements use the same notation as the column reference elements in the attribute resource space. The aggregate list elements use the same notation as the aggregate elements in the aggregate resource space or the column reference elements in the attribute resource space. An aggregate using column reference notation denotes an example value chosen from an arbitrary member of each group.

Attribute Binning

In order to group numerical values into bins, e.g. for histogram presentation, a special binning operator is allowed in attribute or group key projections in place of a bare column reference:

  • bin( column name ; nbins ; minval ; maxval )
  • bin( in alias : column name ; nbins ; minval ; maxval )

The binning operator determines which bucket the value in column name belongs to, dividing the requested range from minval (inclusive) to maxval (exclusive) into nbins equal-width intervals. The result is always a three-element JSON array [ bucket , lower , upper ] describing the bucket.

  • bucket: The bin number which the value falls into.
    • null: The null bin captures all NULL values.
    • 0: The zero bin captures all values below the requested range.
    • 1: The first bin in the requested range.
    • nbins: The last bin in the requested range.
    • nbins + 1: The final bin captures all values above the requested range.
  • lower: The lower bound (inclusive) of the bin, or null.
  • upper: The upper bound (exclusive) of the bin, or null.

If the client does not wish to consider NULL or out-of-range values, they MAY include an appropriate filter to exclude those rows from the query.

A useful idiom is to use binning as a group-key in the attributegroup API with cnt(*) to count all matching rows within each bin. The results will be sparse: only bins with a non-zero row count will appear as grouped output rows. The sort modifier MAY be applied to the binning group key.

Supported Column Types for Binning

The binning operator is only supported for certain basic column storage types:

  • date
  • float4, float8
  • int2, int4, int8
  • numeric
  • timestamptz, timestamp

Data Paths

ERMrest introduces a general path-based syntax for naming data resources with idioms for navigation and filtering of entity sets. The path element of the data resource name always denotes a set of entities or joined entities. The path must be interpreted from left to right in order to understand its meaning. The denoted entity set is understood upon reaching the right-most element of the path and may be modified by the resource space or api under which the path occurs.

Path Root

A path always starts with a direct table reference:

  • table name
  • schema name : table name

which must already be defined in the catalog under the corresponding model resource:

  • /schema/ schema name /table/ table name

The unqualified table name MAY be used in a path if it is the only occurrence of that table name across all schemata in the catalog, i.e. only if it is unambiguous.

A path consisting of only one table reference denotes the entities within that table.

Path Filters

A filter element can augment a path with a filter expression:

  • parent path / filter

after which the combined path denotes a filtered subset of the entities denoted by parent path where the filter expressed in the filter language evaluates to a true value. The accumulative affect of several filter path elements is a logical conjunction of all the filtering criteria in those elements. It is also intuitive to think of a chain of filter elements as a filtering pipeline or sieve, each eliminating data which does not match the filter criteria.

Table Instance Aliases

The root element or an entity link element may be decorated with an alias prefix:

  • alias := table name
  • parent path / alias := table name
  • parent path / alias :=( column name, … )

This denotes the same entity set as the plain element but also binds the alias as a way to reference a particular table instance from other path elements to the right of the alias binding. All aliases bound in a single path must be distinct. The alias can form a convenient short-hand to avoid repeating long table names, and also enables expression of more complex concepts not otherwise possible.

Path Context Reset

A path can be modified by resetting its denoted entity context:

  • parent path /$ alias

where alias name MUST be a table instance alias already bound by an element within parent path.

This has no effect on the overall joining structure nor filtering of the parent path but changes the denoted entity set to be that of the aliased table instance. It also changes the column resolution logic to attempt to resolve unqualified column names within the aliased table instance rather than right-most entity link element within parent path.

A path can chain a number of entity link elements from left to right to form long, linear joining structures. With the use of path context resets, a path can also form tree-shaped joining structures, i.e. multiple chains of links off a single ancestor table instance within the parent path. It can also be used to “invert” a tree to have several joined structures augmenting the final entity set denoted by the whole path.

Filter Language

The filter element of data paths uses a general filter language described here. There are unary and binary filter predicates, logical combinations, negation, and parenthetic grouping. Together, these language elements allow arbitrarily complex boolean logic functions to be expressed directly, in conjunctive normal form, or in disjunctive normal form.

The operator precedence is as follows:

  1. Parenthetic grouping overrides precedence, causing the expression inside the parenthetic group to be evaluated and its result used as the value of the parenthetic group.
  2. Negation has the highest precedence, negating the immediately following predicate or parenthetic group.
  3. Conjunction using the & operator has the next highest precedence, combining adjacent parenthetic groups, negated predicates, predicates, and conjunctions.
  4. Disjunction using the ; operator has the next highest precedence, combining adjacent parenthetic groups, negated predicates, predicated, conjunctions, and disjunctions.
  5. The path separator / has the lowest precedence, adding complete logical expressions to a path.

Unary Filter Predicate

A unary predicate has the form:

  • column reference operator

There is currently only one unary operator, ::null::, which evaluates True if and only if the column is NULL for the row being tested.

Binary Filter Predicate

A binary predicate as the form:

  • column reference operator literal value
operator meaning notes
= column equals value
::lt:: column less than value
::leq:: column less than or equal to value
::gt:: column greater than value
::geq:: column greater than or equal to value
::regexp:: column matches regular expression value also allowed on * free-text psuedo column
::ciregexp:: column matches regular expression value case-insensitively also allowed on * free-text psuedo column
::ts:: column matches text-search query value also allowed on * free-text psuedo column

Negated Filter

Any predicate or parenthetic filter may be prefixed with the ! negation operator to invert its logical value:

  • ! predicate
  • ! ( logical expression )

The negation operator has higher precedence than conjunctive or disjunctive operators, meaning it negates the nearest predicate or parenthetic expression on the right-hand side before logical operators apply.

Parenthetic Filter

Any predicate, conjunction, or disjunction may be wrapped in parentheses to override any implicit precedence for logical composition:

  • ( logical expression )

Conjunctive Filter

A conjunction (logical AND) uses the & separator:

  • predicate & conjunction
  • predicate & predicate
  • predicate & ! predicate
  • predicate & ( logical expression )
  • predicate & ! ( logical expression )

Individual filter elements in the path are also conjoined (logical AND), but the path separator / cannot appear in a parenthetic group.

Disjunctive Filter

A disjunction (logical OR) uses the ; separator:

  • predicate ; disjunction
  • predicate ; conjunction
  • predicate ; predicate
  • predicate ; ! predicate
  • predicate ; ( logical expression )
  • predicate ; ! ( logical expression )

Conjunctive Normal Form

A filter in conjunctive normal form (CNF) is a conjunction of disjunctions over a set of possibly negated predicate terms. To write a CNF filter in a data resource name, use a sequence of filter path elements, separated by /, to express the top-level conjunction. Use the disjunction separator ; and optional negation prefix ! on individual predicate terms in each disjunctive clause.

Disjunctive Normal Form

A filter in disjunctive normall form (DNF) is a disjunction of conjunctions over a set of possibly negated predicate terms. To write a DNF filter in a data resource name, use a single filter path element using the ; separator to express the top-level disjunction. Use the conjunction separator & and optional negation prefix ! on individual predicate terms in each conjunctive clause.

Sort Modifier

An optional sorting modifier can modify the ordering of elements in the set-based resources denoted by entity, attribute, and attributegroup resource names. This modifier applies sorting based on output columns available in the set-based resource representation and may increase service cost significantly. The modifier has the form:

  • @sort( output column ,)
  • @sort( output column ::desc:: ,)

where the optional ::desc:: direction indicator can apply a descending sort to that sort key to override the default ascending sort order. ERMrest by default sorts equivalently to the SQL sort order ASC NULLS LAST (ascending, with NULLs after non-NULLs) and switches to DESC NULLS FIRST (descending, with NULLs before non-NULLs) when the ::desc:: direction is selected.

The list of sort keys goes left-to-right from primary to secondary etc. The individual output column names are user-supplied values and therefore must be URL-escaped if they contain any special characters, including the : character in implicitly named output columns introduced using the alias : * wildcard syntax in projected attribute names or aggregate names.

The modifier appears as an optional suffix to data names, but before any query parameters in the URL:

  • service /catalog/ cid [ @ revision ] /entity/ path @sort( sort key ,)
    • Each sort key MUST be a column name in the denoted entities since no column renaming is supported in entity resources.
    • The sort modifies the order of the entity records in the external representation.
  • service /catalog/ cid [ @ revision ] /attribute/ path / projection ,@sort( sort key ,)
    • Each sort key MUST refer to a column in the external representation, i.e. after any renaming has been applied.
    • The sort modifies the order of the entity records in the external representation.
  • service /catalog/ cid [ @ revision ] /attributegroup/ path / group key ,; projection ,@sort( sort key ,)
    • Each sort key MUST refer to a column in the external representation, i.e. after any renaming has been applied.
    • The sort modifies the order of the group records in the external representation, i.e. groups are sorted after aggregation has occurred. Sorting by a projection value means sorting by a computed aggregate or an arbitrarily chosen example value when projecting bare columns.

The sort modifier is only meaningful on retrieval requests using the GET method described in Data Operations.

Paging Modifiers

Optional paging modifiers can designate results that come before or after a designated page key in a sorted sequence. A page key is a vector of values taken from a row that falls outside the page, with one component per field in the sort modifier.

The modifier MUST be accompanied by a sort modifier to define the ordering of rows in the result set as well as the ordering of fields of the page key vector. The paging modifiers support a special symbol ::null:: to represent a NULL column value in a page key. For determinism, page keys SHOULD include a non-null, unique key as the least significant key.

Supported combinations:

@after(...) @before(...) ?limit Result set
K1 absent absent All records after K1
K1 absent N First records after K1 limited by page size
K1 K2 N First records after K1 limited by page size or K2 whichever is smaller
K1 K2 absent All records between K1 and K2
absent K2 N Last records before K2 limited by page size

Before Modifier

The @before modifier designates a result set of rows immediately antecedent to the encoded page key, unless combined with the @after modifier:

  • @sort( output column)@before( ,) (i.e. empty string)
  • @sort( output column)@before( value ,) (i.e. literal string)
  • @sort( output column)@before( ::null:: ,) (i.e. NULL)

For each comma-separated output column named in the sort modifier, the corresponding comma-separated value represents a component in the page key vector. The denoted result MUST only include rows which come immediately before the page key according to the sorted sequence semantics (including ascending/descending direction). This means that at the time of evaluation, no rows exist between the returned set and the row identified by the page key vector.

The @before modifier MUST be combined with the @after modifier and/or the ?limit=N query parameter.

After Modifier

The @after modifier designates a result set of rows immediately subsequent to the encoded page key:

  • @sort( output column)@after( ,) (i.e. empty string)
  • @sort( output column)@after( value ,) (i.e. literal string)
  • @sort( output column)@after( ::null:: ,) (i.e. NULL)

For each comma-separated output column named in the sort modifier, the corresponding comma-separated value represents a component in the page key vector. The denoted result MUST only include rows which come immediately after the page key according to the sorted sequence semantics (including ascending/descending direction). This means that at the time of evaluation, no rows exist between the returned set and the row identified by the page key vector.

The @after modifier MAY be combined with the @before modifier and/or the ?limit=N query parameter.

Accept Query Parameter

An optional accept query parameter can override the Accept HTTP header and content-negotiation in data access:

  • service /catalog/ cid [ @ revision ] /entity/ path?accept= t
  • service /catalog/ cid [ @ revision ] /attribute/ path / projection?accept= t
  • service /catalog/ cid [ @ revision ] /attributegroup/ path / group key ; projection?accept= t
  • service /catalog/ cid [ @ revision ] /aggregate/ path / projection?accept= t

If the specified MIME content-type t is one of those supported by the data API, it is selected in preference to normal content-negotiation rules. Otherwise, content-negotiation proceeds as usual. Two short-hand values are recognized:

  • accept=csv is interpreted as accept=text%2Fcsv
  • accept=json is interpreted as accept=application%2Fjson

Note that the content-type t MUST be URL-escaped to protect the / character unless using the short-hands above.

Download Query Parameter

An optional download query parameter can activate a Content-Disposition: attachment response header for GET operations on data resources.

  • service /catalog/ cid [ @ revision ] /entity/ path?download= bn
  • service /catalog/ cid [ @ revision ] /attribute/ path / projection?download= bn
  • service /catalog/ cid [ @ revision ] /attributegroup/ path / group key ; projection?download= bn
  • service /catalog/ cid [ @ revision ] /aggregate/ path / projection?download= bn

The specified file base-name bn MUST be non-empty and SHOULD NOT include a file-extension suffix to indicate the download file type. The bn, when URL-decoded, MUST be a valid UTF-8 string. The service SHOULD append an appropriate suffix based on the negotiated response content type, e.g. .json' or.csv`.

As an example:

GET /ermrest/catalog/1/entity/MyTable?download=My%20File

will produce a response like:

200 OK
Content-Type: application/json
Content-Length: 3
Content-Disposition: attachment; download*=UTF-8''My%20File.json

[]

which the browser will interpret to suggest a local filename such as My File.json.

Defaults Query Parameter

An optional defaults query parameter can be used with the POST operation on the entity API:

  • service /catalog/ cid /entity/ schema name : table name ?defaults= column name ,

A list of one or more column name indicates columns of the target table which should be populated using server-assigned defaults values, ignoring any values provided by the client. See the Entity Creation with Defaults operation documentation for more explanation.

Nondefaults Query Parameter

An optional nondefaults query parameter can be used with the POST operation on the entity API:

  • service /catalog/ cid /entity/ schema name : table name ?nondefaults= column name ,

A list of one or more column name indicates columns of the target table which should be populated using client-supplied values, overriding an implicit default behavior. This is primarily useful for administrative clients who are copying table data including existing RID values from one table or catalog to another. See the Entity Creation with Defaults operation documentation for more explanation.

Limit Query Parameter

An optional limit query parameter can truncate the length of set-based resource representations denoted by entity, attribute, and attributegroup resource names:

  • service /catalog/ cid [ @ revision ] /entity/ path ?limit= n
  • service /catalog/ cid [ @ revision ] /entity/ path @sort( sort key ,) ?limit= n
  • service /catalog/ cid [ @ revision ] /attribute/ path / projection ,?limit= n
  • service /catalog/ cid [ @ revision ] /attribute/ path / projection ,@sort( sort key ,) ?limit= n
  • service /catalog/ cid [ @ revision ] /attributegroup/ path / group key ,; projection ,?limit= n
  • service /catalog/ cid [ @ revision ] /attributegroup/ path / group key ,; projection ,@sort( sort key ,) ?limit= n

If the set denoted by the resource name (without the limit modifier) has k elements, the denoted limited subset will have n members if n < k and will otherwise have all k members. When combined with a sort modifier, the first n members will be returned, otherwise an arbitrary subset will be chosen.

The limit query parameter is only meaningful on retrieval requests using the GET method described in Data Operations.

Data Paging

The sort modifier, limit parameter, and paging modifiers can be combined to express paged access to set-based data resources:

  1. The sort order defines a stable sequence of set elements.
  2. The paging modifiers select set elements following (or preceding) the last-visited element.
  3. The limit parameter defines the number of set elements in the retrieved page.

This allows sequential paging or scrolling of large result sets with reversing/rewind to earlier pages. Because ERMrest supports concurrent retrieval and modification of data resources by multiple clients, it is not sensible to randomly access set elements by stream position offsets (whether by element or page count) because you might skip or repeat elements if preceding elements have been inserted or removed from the sequence in between page requests.

A client can choose an arbitrary application-oriented sort order with paging. However, the client SHOULD include row-level unique key material in the sort and page key to avoid hazards of missing rows that have identical sorting rank due to non-unique page keys. This can be achieved by appending unique key columns to the application sort as the lowest precedence sort criteria, i.e. sort first by an interesting but non-unique property and then finally break ties by a unique serial ID or similar property.

  1. Fetch first page:
    • service /catalog/ cid [ @ revision ] /entity/ path @sort( sort key ,) ?limit= n
  2. Fetch subsequent page by encoding a page key projected from the last row of the preceding page:
    • service /catalog/ cid [ @ revision ] /entity/ path @sort( sort key ,) @after( limit value ,) ?limit= n
  3. Fetch antecedent page by encoding a page key projected from the first row of the subsequent page:
    • service /catalog/ cid [ @ revision ] /entity/ path @sort( sort key ,) @before( limit value ,) ?limit= n

Realize that a sequence of forward and backward page requests through a dataset might not land on the same page boundaries on both visits!

  • Rows might be inserted during a traversal. An inserted row MAY appear in the traversal or MAY be skipped depending on where it falls in the sorted sequence.
  • Rows might be deleted during a traversal. A deleted row MAY appear in the traversal or MAY be skipped depending on where it falls in the sorted sequence.
  • Rows might be mutated such that they change positions in the sorted sequence during a traversal. A mutated table row contains one tuple of data before the mutation and another tuple of data after. A single traversal concurrent with that mutation MAY encounter zero, one, or two copies of the row depending on where they fall in the sorted sequence.