DataPath Example 4

This notebook covers somewhat more advanced examples for using DataPaths. It assumes that you understand the concepts presented in the previous example notebooks.

You should also read the ERMrest documentation and the derivapy wiki. There are more advanced concepts in this notebook that are demonstrated but not fully (re)explained here, as the concepts are explained in other documentation.

Exampe Data Model

The examples require that you understand a little bit about the example catalog data model, which in this case manages data for biological experiments.

Key tables

  • 'dataset' : represents a unit of data usually for a study or set of experiments;
  • 'biosample' : a biosample (describes biological details of a specimen);
  • 'replicate' : a replicate (describes both bio- and technical-replicates);
  • 'experiment' : a bioassay (any type of experiment or assay; e.g., imaging, RNA-seq, ChIP-seq, etc.).

Relationships

  • dataset <- biosample: A dataset may have one to many biosamples. I.e., there is a foreign key reference from biosample to dataset.
  • dataset <- experiment: A dataset may have one to many experiments. I.e., there is a foreign key reference from experiment to dataset.
  • experiment <- replicate: An experiment may have one to many replicates. I.e., there is a foreign key reference from replicate to experiment.
[1]:
# Import deriva modules and pandas DataFrame (for use in examples only)
from deriva.core import ErmrestCatalog, get_credential
from pandas import DataFrame
[2]:
# Connect with the deriva catalog
protocol = 'https'
hostname = 'www.facebase.org'
catalog_number = 1
credential = None
# If you need to authenticate, use Deriva Auth agent and get the credential
# credential = get_credential(hostname)
catalog = ErmrestCatalog(protocol, hostname, catalog_number, credential)
[3]:
# Get the path builder interface for this catalog
pb = catalog.getPathBuilder()

# Get some local variable handles to tables for convenience
dataset = pb.isa.dataset
experiment = pb.isa.experiment
biosample = pb.isa.biosample
replicate = pb.isa.replicate

Implicit DataPaths

Proceed with caution

For compactness, Table objects (and TableAlias objects) provide DataPath-like methods. E.g., link(...), filter(...), and entities(...), which will implicitly create DataPaths rooted at the table and return the newly created path. These operations return the new DataPath rather than mutating the Table (or TableAlias) objects.

[4]:
entities = dataset.filter(dataset.released == True).entities()
len(entities)
[4]:
1049

DataPath-like methods

The DataPath-like methods on Tables are essentially “wrapper” functions over the implicitly generated DataPath rooted at the Table instance. The wrappers include, link(...), filter(...), entities(...), attributes(...), aggregates(...), and groupby(...).

Attribute Examples

Example: selecting all columns of a table instance

Passing a table (or table instance) object to the attributes(...) method will project all (i.e., *) of its attributes.

[5]:
path = dataset.alias('D').path
path.link(experiment).link(replicate)
results = path.attributes(path.D)
print(len(results))
print(results.uri)
14390
https://www.facebase.org/ermrest/catalog/1/attribute/D:=isa:dataset/experiment:=isa:experiment/replicate:=isa:replicate/D:*

It is important to remember that the attributes(...) method returns a result set based on the entity type of the last elmenent of the path. In this example that means the number of results will be determined by the number of unique rows in the replicate table instance in the path created above, as the last link method used the replicate table.

Example: selecting from multiple table instances

More than one table instance may be selected in this manner and it can be mixed and matched with columns from other tables instances.

[6]:
results = path.attributes(path.D,
                          path.experiment.experiment_type,
                          path.replicate)
print(len(results))
print(results.uri)
14390
https://www.facebase.org/ermrest/catalog/1/attribute/D:=isa:dataset/experiment:=isa:experiment/replicate:=isa:replicate/D:*,experiment:experiment_type,replicate:*

If you want to base the results on a different entity, you can introduce a table instance alias into the end of the path, before calling the attributes function. In this case, even though we are asking for the same attributes, we are getting the set of datasets, not the set of replicates. Also, since we are including the attributes from dataset in our query, we know that we will not be seeing any duplicate rows.

[7]:
results = path.D.attributes(path.D,
                            path.experiment.experiment_type,
                            path.replicate)
print(len(results))
print(results.uri)
273
https://www.facebase.org/ermrest/catalog/1/attribute/D:=isa:dataset/experiment:=isa:experiment/replicate:=isa:replicate/$D/D:*,experiment:experiment_type,replicate:*

Filtering Examples

Example: filter on null attribute

To test for a null attribute value, do an equality comparison against the None identity.

[8]:
path = dataset.link(experiment).filter(experiment.molecule_type == None)
print(path.uri)
print(len(path.entities()))
https://www.facebase.org/ermrest/catalog/1/entity/dataset:=isa:dataset/experiment:=isa:experiment/molecule_type::null::
2398

Example: advanced text filters

Deriva supports advanced text filters for regular expressions (regexp), case-instansitive regexp (ciregexp), and text search (ts). You may have to review your text and full-text indexes in your ERMrest catalog before using these features.

[9]:
path = dataset.filter(dataset.description.ciregexp('palate'))
print(path.uri)
print(len(path.entities()))
https://www.facebase.org/ermrest/catalog/1/entity/dataset:=isa:dataset/description::ciregexp::palate
88

Example: negate a filter

Use the “inverse” (‘~’) operator to negate a filter. Negation works against simple comparison filters as demonstrated above as well as on logical operators to be discussed next. You must wrap the comparison or logical operators in an extra parens to use the negate operation, e.g., “~ (...)”.

[10]:
path = dataset.filter( ~ (dataset.description.ciregexp('palate')) )
print(path.uri)
print(len(path.entities()))
https://www.facebase.org/ermrest/catalog/1/entity/dataset:=isa:dataset/!(description::ciregexp::palate)
957

Example: filters with logical operators

This example shows how to combine two comparisons with a conjuncting (i.e., and operator). Because Python’s logical-and (and) keyword cannot be overloaded, we instead overload the bitwise-and (&) operator. This approach has become customary among many similar data access libraries.

[11]:
path = dataset.link(biosample).filter(
    ((biosample.species == 'NCBITAXON:10090') & (biosample.anatomy == 'UBERON:0002490')))

print(path.uri)
https://www.facebase.org/ermrest/catalog/1/entity/dataset:=isa:dataset/biosample:=isa:biosample/(species=NCBITAXON%3A10090)&(anatomy=UBERON%3A0002490)
[12]:
DataFrame(path.entities())
[12]:
RID dataset local_identifier species specimen gene genotype strain mutation stage ... litter collection_date _keywords RCB RMB RCT RMT cell_source cell_characterization treatment
0 1-4TT8 1-4TSP scWFE18_S197 NCBITAXON:10090 FACEBASE:1-4GNR None FACEBASE:1-4GV2 MGI:3028467 None FACEBASE:1-4GJA ... None None None https://auth.globus.org/8ae274db-d033-47eb-bd3... https://auth.globus.org/bb256144-d274-11e5-adb... 2018-12-11T03:42:13.870864+00:00 2021-10-07T17:51:24.42557+00:00 None None None
1 1-8DVG 1-4TSP scWFE18_S616 NCBITAXON:10090 FACEBASE:1-4GNR None FACEBASE:1-4GV2 MGI:3028467 None FACEBASE:1-4GJA ... None 2019-06-19 None https://auth.globus.org/8ae274db-d033-47eb-bd3... https://auth.globus.org/bb256144-d274-11e5-adb... 2020-02-13T16:45:26.843631+00:00 2021-10-07T17:51:24.42557+00:00 None None None
2 1-8DVM 1-4TSP scWFE18_S631 NCBITAXON:10090 FACEBASE:1-4GNR None FACEBASE:1-4GV2 MGI:3028467 None FACEBASE:1-4GJA ... None 2019-06-18 None https://auth.globus.org/8ae274db-d033-47eb-bd3... https://auth.globus.org/bb256144-d274-11e5-adb... 2020-02-13T17:03:19.454718+00:00 2021-10-07T17:51:24.42557+00:00 None None None
3 1-C3PR 1-4TSP scTFE18_S979 NCBITAXON:10090 FACEBASE:1-4GNR None FACEBASE:1-4GV0 MGI:3028467 None FACEBASE:1-4GJA ... None 2019-12-11 None https://auth.globus.org/8ae274db-d033-47eb-bd3... https://auth.globus.org/bb256144-d274-11e5-adb... 2020-03-17T16:51:19.48232+00:00 2021-10-07T17:51:24.42557+00:00 None None None
4 1-C3PT 1-4TSP scAFE18_S1075 NCBITAXON:10090 FACEBASE:1-4GNR None FACEBASE:1-4GT6 MGI:3028467 None FACEBASE:1-4GJA ... None 2019-12-24 None https://auth.globus.org/8ae274db-d033-47eb-bd3... https://auth.globus.org/bb256144-d274-11e5-adb... 2020-03-17T16:52:41.937568+00:00 2021-10-07T17:51:24.42557+00:00 None None None
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
67 3Q1E VHE W8IF1SM NCBITAXON:10090 FACEBASE:1-4GNG None FACEBASE:1-4GV2 MGI:3028467 None FACEBASE:1-4GJA ... 4/15/14WL1-1 2014-04-15 None https://auth.globus.org/b506963e-d274-11e5-99f... https://auth.globus.org/bb256144-d274-11e5-adb... 2018-03-16T19:03:38.228586+00:00 2021-10-07T17:51:24.42557+00:00 None None None
68 3Q1J VHE W8IF2SM NCBITAXON:10090 FACEBASE:1-4GNG None FACEBASE:1-4GV2 MGI:3028467 None FACEBASE:1-4GJA ... 4/15/14WL1-2 2014-04-15 None https://auth.globus.org/b506963e-d274-11e5-99f... https://auth.globus.org/bb256144-d274-11e5-adb... 2018-03-16T19:03:38.228586+00:00 2021-10-07T17:51:24.42557+00:00 None None None
69 3Q1P VHE W8IF3SM NCBITAXON:10090 FACEBASE:1-4GNG None FACEBASE:1-4GV2 MGI:3028467 None FACEBASE:1-4GJA ... 4/14/14WL1-6 2014-04-14 None https://auth.globus.org/b506963e-d274-11e5-99f... https://auth.globus.org/bb256144-d274-11e5-adb... 2018-03-16T19:03:38.228586+00:00 2021-10-07T17:51:24.42557+00:00 None None None
70 3Q1T VHE W8IF4SM NCBITAXON:10090 FACEBASE:1-4GNG None FACEBASE:1-4GV2 MGI:3028467 None FACEBASE:1-4GJA ... 4/15/14WL1-7 2014-04-15 None https://auth.globus.org/b506963e-d274-11e5-99f... https://auth.globus.org/bb256144-d274-11e5-adb... 2018-03-16T19:03:38.228586+00:00 2021-10-07T17:51:24.42557+00:00 None None None
71 3Q1Y VHE W8IF5SM NCBITAXON:10090 FACEBASE:1-4GNG None FACEBASE:1-4GV2 MGI:3028467 None FACEBASE:1-4GJA ... 4/28/14WL1-1 2014-04-28 None https://auth.globus.org/b506963e-d274-11e5-99f... https://auth.globus.org/bb256144-d274-11e5-adb... 2018-03-16T19:03:38.228586+00:00 2021-10-07T17:51:24.42557+00:00 None None None

72 rows × 25 columns

Example: combine conjunction and disjunctions in filters

Similar to the prior example, the filters allow combining of conjunctive and disjunctive operators. Like the bitwise-and operator, we also overload the bitwise-or (|) operator because the logical-or (or) operatar cannot be overloaded.

[13]:
path = dataset.link(biosample).filter(
    ((biosample.species == 'NCBITAXON:10090') & (biosample.anatomy == 'UBERON:0002490')) |
    ((biosample.specimen == 'FACEBASE:1-4GNR') & (biosample.stage == 'FACEBASE:1-4GJA')))

print(path.uri)
https://www.facebase.org/ermrest/catalog/1/entity/dataset:=isa:dataset/biosample:=isa:biosample/((species=NCBITAXON%3A10090)&(anatomy=UBERON%3A0002490));((specimen=FACEBASE%3A1-4GNR)&(stage=FACEBASE%3A1-4GJA))
[14]:
DataFrame(path.entities())
[14]:
RID dataset local_identifier species specimen gene genotype strain mutation stage ... litter collection_date _keywords RCB RMB RCT RMT cell_source cell_characterization treatment
0 1-4TT8 1-4TSP scWFE18_S197 NCBITAXON:10090 FACEBASE:1-4GNR None FACEBASE:1-4GV2 MGI:3028467 None FACEBASE:1-4GJA ... None None None https://auth.globus.org/8ae274db-d033-47eb-bd3... https://auth.globus.org/bb256144-d274-11e5-adb... 2018-12-11T03:42:13.870864+00:00 2021-10-07T17:51:24.42557+00:00 None None None
1 1-8DVG 1-4TSP scWFE18_S616 NCBITAXON:10090 FACEBASE:1-4GNR None FACEBASE:1-4GV2 MGI:3028467 None FACEBASE:1-4GJA ... None 2019-06-19 None https://auth.globus.org/8ae274db-d033-47eb-bd3... https://auth.globus.org/bb256144-d274-11e5-adb... 2020-02-13T16:45:26.843631+00:00 2021-10-07T17:51:24.42557+00:00 None None None
2 1-8DVM 1-4TSP scWFE18_S631 NCBITAXON:10090 FACEBASE:1-4GNR None FACEBASE:1-4GV2 MGI:3028467 None FACEBASE:1-4GJA ... None 2019-06-18 None https://auth.globus.org/8ae274db-d033-47eb-bd3... https://auth.globus.org/bb256144-d274-11e5-adb... 2020-02-13T17:03:19.454718+00:00 2021-10-07T17:51:24.42557+00:00 None None None
3 1-C3PR 1-4TSP scTFE18_S979 NCBITAXON:10090 FACEBASE:1-4GNR None FACEBASE:1-4GV0 MGI:3028467 None FACEBASE:1-4GJA ... None 2019-12-11 None https://auth.globus.org/8ae274db-d033-47eb-bd3... https://auth.globus.org/bb256144-d274-11e5-adb... 2020-03-17T16:51:19.48232+00:00 2021-10-07T17:51:24.42557+00:00 None None None
4 1-C3PT 1-4TSP scAFE18_S1075 NCBITAXON:10090 FACEBASE:1-4GNR None FACEBASE:1-4GT6 MGI:3028467 None FACEBASE:1-4GJA ... None 2019-12-24 None https://auth.globus.org/8ae274db-d033-47eb-bd3... https://auth.globus.org/bb256144-d274-11e5-adb... 2020-03-17T16:52:41.937568+00:00 2021-10-07T17:51:24.42557+00:00 None None None
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
71 3Q1P VHE W8IF3SM NCBITAXON:10090 FACEBASE:1-4GNG None FACEBASE:1-4GV2 MGI:3028467 None FACEBASE:1-4GJA ... 4/14/14WL1-6 2014-04-14 None https://auth.globus.org/b506963e-d274-11e5-99f... https://auth.globus.org/bb256144-d274-11e5-adb... 2018-03-16T19:03:38.228586+00:00 2021-10-07T17:51:24.42557+00:00 None None None
72 3Q1T VHE W8IF4SM NCBITAXON:10090 FACEBASE:1-4GNG None FACEBASE:1-4GV2 MGI:3028467 None FACEBASE:1-4GJA ... 4/15/14WL1-7 2014-04-15 None https://auth.globus.org/b506963e-d274-11e5-99f... https://auth.globus.org/bb256144-d274-11e5-adb... 2018-03-16T19:03:38.228586+00:00 2021-10-07T17:51:24.42557+00:00 None None None
73 3Q1Y VHE W8IF5SM NCBITAXON:10090 FACEBASE:1-4GNG None FACEBASE:1-4GV2 MGI:3028467 None FACEBASE:1-4GJA ... 4/28/14WL1-1 2014-04-28 None https://auth.globus.org/b506963e-d274-11e5-99f... https://auth.globus.org/bb256144-d274-11e5-adb... 2018-03-16T19:03:38.228586+00:00 2021-10-07T17:51:24.42557+00:00 None None None
74 3V06 3TYP scWCE18_S198 NCBITAXON:10090 FACEBASE:1-4GNR None FACEBASE:1-4GV2 MGI:3028467 None FACEBASE:1-4GJA ... 6/6/17WL1,L2,L3 2017-06-06 None https://auth.globus.org/f8ae714f-6015-48da-971... https://auth.globus.org/bb256144-d274-11e5-adb... 2018-05-02T18:27:13.657825+00:00 2021-10-07T17:51:24.42557+00:00 None None None
75 3V1P 3V0E scWLE18_S199 NCBITAXON:10090 FACEBASE:1-4GNR None FACEBASE:1-4GV2 MGI:3028467 None FACEBASE:1-4GJA ... 6/6/17WL1,L2,L3 2017-06-06 None https://auth.globus.org/8ae274db-d033-47eb-bd3... https://auth.globus.org/bb256144-d274-11e5-adb... 2018-05-02T18:50:12.225872+00:00 2021-10-07T17:51:24.42557+00:00 None None None

76 rows × 25 columns

Example: filtering at different stages of the path

Filtering a path does not have to be done at the end of a path. In fact, the initial intention of the ERMrest URI was to mimick “RESTful” semantics where a RESTful “resource” is identified, then filtered, then a “sub-resource” is identified, and then filtered, and so on.

[15]:
path = dataset.filter(dataset.release_date >= '2017-01-01') \
    .link(experiment).filter(experiment.experiment_type == 'OBI:0001271') \
    .link(replicate).filter(replicate.bioreplicate_number == 1)

print(path.uri)
https://www.facebase.org/ermrest/catalog/1/entity/dataset:=isa:dataset/release_date::geq::2017-01-01/experiment:=isa:experiment/experiment_type=OBI%3A0001271/replicate:=isa:replicate/bioreplicate_number=1
[16]:
DataFrame(path.entities())
[16]:
RID dataset biosample bioreplicate_number technical_replicate_number RCB RMB RCT RMT experiment
0 1-3T5A 1-3SWE 1-3T0A 1 1 https://auth.globus.org/f8ae714f-6015-48da-971... https://auth.globus.org/f8ae714f-6015-48da-971... 2018-06-13T23:41:16.162698+00:00 2018-06-13T23:41:16.162698+00:00 1-3SZA
1 1-3T5E 1-3SWE 1-3T0E 1 1 https://auth.globus.org/f8ae714f-6015-48da-971... https://auth.globus.org/f8ae714f-6015-48da-971... 2018-06-13T23:41:16.162698+00:00 2018-06-13T23:41:16.162698+00:00 1-3SZA
2 1-3T5J 1-3SWE 1-3T0J 1 1 https://auth.globus.org/f8ae714f-6015-48da-971... https://auth.globus.org/f8ae714f-6015-48da-971... 2018-06-13T23:41:16.162698+00:00 2018-06-13T23:41:16.162698+00:00 1-3SZA
3 1-3T5P 1-3SWE 1-3T0P 1 1 https://auth.globus.org/f8ae714f-6015-48da-971... https://auth.globus.org/f8ae714f-6015-48da-971... 2018-06-13T23:41:16.162698+00:00 2018-06-13T23:41:16.162698+00:00 1-3SZA
4 1-3T5T 1-3SWE 1-3T0T 1 1 https://auth.globus.org/f8ae714f-6015-48da-971... https://auth.globus.org/f8ae714f-6015-48da-971... 2018-06-13T23:41:16.162698+00:00 2018-06-13T23:41:16.162698+00:00 1-3SZA
... ... ... ... ... ... ... ... ... ... ...
359 3TSJ 2ARP 30GP 1 1 https://auth.globus.org/b506963e-d274-11e5-99f... None 2018-04-11T19:28:24.259798+00:00 2018-06-06T01:44:38.502116+00:00 3422
360 A-XH76 1-X42W A-XH6P 1 1 https://auth.globus.org/93416595-2bc4-42a8-b5b... https://auth.globus.org/93416595-2bc4-42a8-b5b... 2021-11-24T00:35:15.863829+00:00 2021-11-24T00:35:15.863829+00:00 A-XH72
361 A-XH7R 1-X42W A-XH6W 1 1 https://auth.globus.org/93416595-2bc4-42a8-b5b... https://auth.globus.org/93416595-2bc4-42a8-b5b... 2021-11-24T00:38:56.917633+00:00 2021-11-24T00:38:56.917633+00:00 A-XH7M
362 Z-2WMG Z-2NMR Z-2WMC 1 1 https://auth.globus.org/93416595-2bc4-42a8-b5b... https://auth.globus.org/b506963e-d274-11e5-99f... 2022-03-25T23:21:09.358366+00:00 2022-04-04T19:24:52.771093+00:00 Z-2WM6
363 Z-2WMM Z-2NMR Z-2WME 1 1 https://auth.globus.org/93416595-2bc4-42a8-b5b... https://auth.globus.org/b506963e-d274-11e5-99f... 2022-03-25T23:22:19.554979+00:00 2022-04-04T19:26:09.037904+00:00 Z-2WMJ

364 rows × 10 columns

Linking Examples

Faceting Examples

You may have noticed that in the examples above, the ‘species’ and ‘experiment_type’ attributes are identifiers (‘CURIE’s to be precise). We may want to construct filters on our datasets based on these categories. This can be used for “faceted search” modes and can be useful even within the context of programmatic access to data in the catalog.

Example: faceting on “related” tables

Let’s say we want to find all of the biosamples in our catalog where their species are ‘Mus musculus’ and their age stage are ‘E10.5’.

We need to extend our understanding of the data model with the following tables that are related to ‘biosample’. - isa.biosample.species -> vocab.species: the biosample table has a foreign key reference to the ‘species’ table. - isa.biosample.stage -> vocab.stage: the biosample table has a foreign key reference to the ‘stage’ table.

We may say that species and stage are related to the biosample table in the sense that biosample has a direct foreign key relationship from it to them.

For convenience, we will get local variables for the species and stage tables.

[22]:
species = pb.vocab.species
stage = pb.vocab.stage

First, let’s link samples with species and filter on the term “Mus musculus” (i.e., “mouse”).

[23]:
# Here we have to use the container `columns_definitions` because `name` is reserved
path = biosample.alias('S').link(species).filter(species.column_definitions['name'] == 'Mus musculus')
print(path.uri)
https://www.facebase.org/ermrest/catalog/1/entity/S:=isa:biosample/species:=vocab:species/name=Mus%20musculus

Now the context of the path is the species table instance, but we need to link from the biosample to the age stage table.

To do so, we reference the biosample table instance, in this case using its alias S. Then we link off of that table instance which updates the path itself.

[24]:
path.S.link(stage).filter(stage.column_definitions['name'] == 'E10.5')
print(path.uri)
https://www.facebase.org/ermrest/catalog/1/entity/S:=isa:biosample/species:=vocab:species/name=Mus%20musculus/$S/stage:=vocab:stage/name=E10.5

Now, the path context is the age stage table instance, but we wanted to get the entities for the biosample table. To do so, again we will reference the biosample table instance by the alias S we used. From there, we will call the entities(...) method to get the samples.

[25]:
results = path.S.attributes(path.S.RID,
                            path.S.collection_date,
                            path.species.column_definitions['name'].alias('species'),
                            path.species.column_definitions['uri'].alias('species_uri'),
                            path.stage.column_definitions['name'].alias('stage'),
                            path.stage.column_definitions['uri'].alias('stage_uri'))
print(results.uri)
https://www.facebase.org/ermrest/catalog/1/attribute/S:=isa:biosample/species:=vocab:species/name=Mus%20musculus/$S/stage:=vocab:stage/name=E10.5/$S/S:RID,S:collection_date,species:=species:name,species_uri:=species:uri,stage:=stage:name,stage_uri:=stage:uri
[26]:
DataFrame(results)
[26]:
RID collection_date species species_uri stage stage_uri
0 1-4TQ8 None Mus musculus http://purl.obolibrary.org/obo/NCBITaxon_10090 E10.5 https://www.facebase.org/id/1-4GJ0
1 1-4TR6 None Mus musculus http://purl.obolibrary.org/obo/NCBITaxon_10090 E10.5 https://www.facebase.org/id/1-4GJ0
2 1-4TRG None Mus musculus http://purl.obolibrary.org/obo/NCBITaxon_10090 E10.5 https://www.facebase.org/id/1-4GJ0
3 18-XYWP None Mus musculus http://purl.obolibrary.org/obo/NCBITaxon_10090 E10.5 https://www.facebase.org/id/1-4GJ0
4 18-XYWR None Mus musculus http://purl.obolibrary.org/obo/NCBITaxon_10090 E10.5 https://www.facebase.org/id/1-4GJ0
... ... ... ... ... ... ...
527 3-SHHP None Mus musculus http://purl.obolibrary.org/obo/NCBITaxon_10090 E10.5 https://www.facebase.org/id/1-4GJ0
528 3T3P None Mus musculus http://purl.obolibrary.org/obo/NCBITaxon_10090 E10.5 https://www.facebase.org/id/1-4GJ0
529 3T3T None Mus musculus http://purl.obolibrary.org/obo/NCBITaxon_10090 E10.5 https://www.facebase.org/id/1-4GJ0
530 3T3Y None Mus musculus http://purl.obolibrary.org/obo/NCBITaxon_10090 E10.5 https://www.facebase.org/id/1-4GJ0
531 6-2NNR None Mus musculus http://purl.obolibrary.org/obo/NCBITaxon_10090 E10.5 https://www.facebase.org/id/1-4GJ0

532 rows × 6 columns

Grouping Examples

Now support you would like to aggregate all of the vocabulary terms associated with a Dataset. Here, we examine what happens when you have a model such that dataset <- dataset_VOCAB -> VOCAB where VOCAB is a placeholder for a table that includes a vocabulary term set. These tables typically have a name column for the human-readable preferred label to go along with the formal URI or CURIE of the concept class.

[27]:
# We need to import the `ArrayD` aggregate function for this example.
from deriva.core.datapath import ArrayD

# For convenience, get python objects for the additional tables.
dataset_species = pb.isa.dataset_species
dataset_experiment_type = pb.isa.dataset_experiment_type
species = pb.vocab.species
experiment_type = pb.vocab.experiment_type

# Start by doing a couple left outer joins on the dataset-term association tables, then link
# (i.e., inner join) the associated vocabulary term table, then reset the context back to the
# dataset table.
path = dataset.link(dataset_species, on=dataset.id==dataset_species.dataset_id, join_type='left') \
              .link(species) \
              .dataset \
              .link(dataset_experiment_type, on=dataset.id==dataset_experiment_type.dataset_id, join_type='left') \
              .link(experiment_type)

# Again, notice that we reset the context to the `dataset` table alias so that we will retrieve
# dataset entities based on the groupings to be defined next. For the groupby key we will use the
# dataset.RID, but for this example any primary key would work. Then we will get aggregate arrays
# of the linked vocabulary tables.
results = path.dataset.groupby(dataset.RID).attributes(
    dataset.title,
    ArrayD(path.species.column_definitions['name']).alias('species'),
    ArrayD(path.experiment_type.column_definitions['name']).alias('experiment_type')
)

#results = path.dataset.entities()
print(results.uri)
print(len(results))
https://www.facebase.org/ermrest/catalog/1/attributegroup/dataset:=isa:dataset/dataset_species:=left(id)=(isa:dataset_species:dataset_id)/species:=vocab:species/$dataset/dataset_experiment_type:=left(id)=(isa:dataset_experiment_type:dataset_id)/experiment_type:=vocab:experiment_type/$dataset/RID;title,species:=array_d(species:name),experiment_type:=array_d(experiment_type:name)
1048
[28]:
DataFrame(results.fetch(limit=20))
[28]:
RID title species experiment_type
0 1-3SB2 Activity of human neural crest enhancer near G... [Mus musculus] [enhancer activity detection by reporter gene ...
1 1-3SGA Activity of chimp neural crest enhancer near G... [Mus musculus] [enhancer activity detection by reporter gene ...
2 1-3SM2 Activity of human neural crest enhancer near F... [Mus musculus] [enhancer activity detection by reporter gene ...
3 1-3SQJ Activity of chimp neural crest enhancer near F... [Mus musculus] [enhancer activity detection by reporter gene ...
4 1-3SVP FB0023_18mo male with hypertelorism_Candidate ... [Homo sapiens] [exome sequencing assay, genotyping assay, ima...
5 1-3SVT FB0043_3 male cousins with natal teeth and ano... [Homo sapiens] [exome sequencing assay, genotyping assay, ima...
6 1-3SVY FB0051_6 year old male with hypertelorism (mar... [Homo sapiens] [exome sequencing assay, genotyping assay, ima...
7 1-3SW2 FB0064_Male with Congenital craniosynostosis_C... [Homo sapiens] [exome sequencing assay, genotyping assay, ima...
8 1-3SW6 FB0115_11yo male with dysmorphic facial featur... [Homo sapiens] [exome sequencing assay, genotyping assay, ima...
9 1-3SWA FB0122_29yo female with bilateral hearing loss... [Homo sapiens] [exome sequencing assay, genotyping assay, ima...
10 1-3SWE Interpalatine suture, WT and Fgfr2+/S252W (Ape... [Mus musculus] [RNA-seq assay]
11 1-3SXY Maxillary-Palatine Suture, WT and Fgfr2+/S252W... [Mus musculus] [RNA-seq assay]
12 1-3V7A FB0011_9 year old female with single central m... [Homo sapiens] [exome sequencing assay, genotyping assay, ima...
13 1-3V7J FB0015_Male with bilateral cleft lip and palate [Homo sapiens] [exome sequencing assay, genotyping assay, ima...
14 1-3V7P FB0027_29 month old male_Multiple congenital a... [Homo sapiens] [exome sequencing assay, genotyping assay, ima...
15 1-3V8A FB0039_"Identical Male Twins"_Treacher Collins... [Homo sapiens] [exome sequencing assay, genotyping assay, ima...
16 1-3V8Y FB0041_8 yo male_Brachi-Oto-Renal (BOR) Syndrome [Homo sapiens] [genotyping assay, imaging assay, Morphometric...
17 1-3V9E FB0049_17 month old male with bilateral macros... [Homo sapiens] [exome sequencing assay, genotyping assay, ima...
18 1-3VAT FB0062_Proband Only Saliva_Blepharo-cheilo-don... [Homo sapiens] [exome sequencing assay, genotyping assay, ima...
19 1-3VBE FB0124_Trio Saliva_5 mo male with Hallermann S... [Homo sapiens] [exome sequencing assay, genotyping assay, ima...
[ ]: