DataPath Example 4¶
This notebook covers somewhat more advanced examples for using DataPath
s. It assumes that you understand the concepts presented in the previous example notebooks.
You should also read the ERMrest documentation and the derivapy wiki. There are more advanced concepts in this notebook that are demonstrated but not fully (re)explained here, as the concepts are explained in other documentation.
Exampe Data Model¶
The examples require that you understand a little bit about the example catalog data model, which in this case manages data for biological experiments.
Key tables¶
'dataset'
: represents a unit of data usually for a study or set of experiments;'biosample'
: a biosample (describes biological details of a specimen);'replicate'
: a replicate (describes both bio- and technical-replicates);'experiment'
: a bioassay (any type of experiment or assay; e.g., imaging, RNA-seq, ChIP-seq, etc.).
Relationships¶
dataset <- biosample
: A dataset may have one to many biosamples. I.e., there is a foreign key reference from biosample to dataset.dataset <- experiment
: A dataset may have one to many experiments. I.e., there is a foreign key reference from experiment to dataset.experiment <- replicate
: An experiment may have one to many replicates. I.e., there is a foreign key reference from replicate to experiment.
[1]:
# Import deriva modules and pandas DataFrame (for use in examples only)
from deriva.core import ErmrestCatalog, get_credential
from pandas import DataFrame
[2]:
# Connect with the deriva catalog
protocol = 'https'
hostname = 'www.facebase.org'
catalog_number = 1
credential = None
# If you need to authenticate, use Deriva Auth agent and get the credential
# credential = get_credential(hostname)
catalog = ErmrestCatalog(protocol, hostname, catalog_number, credential)
[3]:
# Get the path builder interface for this catalog
pb = catalog.getPathBuilder()
# Get some local variable handles to tables for convenience
dataset = pb.isa.dataset
experiment = pb.isa.experiment
biosample = pb.isa.biosample
replicate = pb.isa.replicate
Implicit DataPaths¶
Proceed with caution
For compactness, Table
objects (and TableAlias
objects) provide DataPath
-like methods. E.g., link(...)
, filter(...)
, and entities(...)
, which will implicitly create DataPath
s rooted at the table and return the newly created path. These operations return
the new DataPath
rather than mutating the Table
(or TableAlias
) objects.
[4]:
entities = dataset.filter(dataset.released == True).entities()
len(entities)
[4]:
1049
DataPath-like methods¶
The DataPath
-like methods on Table
s are essentially “wrapper” functions over the implicitly generated DataPath
rooted at the Table
instance. The wrappers include, link(...)
, filter(...)
, entities(...)
, attributes(...)
, aggregates(...)
, and groupby(...)
.
Attribute Examples¶
Example: selecting all columns of a table instance¶
Passing a table (or table instance) object to the attributes(...)
method will project all (i.e., *
) of its attributes.
[5]:
path = dataset.alias('D').path
path.link(experiment).link(replicate)
results = path.attributes(path.D)
print(len(results))
print(results.uri)
14390
https://www.facebase.org/ermrest/catalog/1/attribute/D:=isa:dataset/experiment:=isa:experiment/replicate:=isa:replicate/D:*
It is important to remember that the attributes(...)
method returns a result set based on the entity type of the last elmenent of the path. In this example that means the number of results will be determined by the number of unique rows in the replicate table instance in the path created above, as the last link method used the replicate table.
Example: selecting from multiple table instances¶
More than one table instance may be selected in this manner and it can be mixed and matched with columns from other tables instances.
[6]:
results = path.attributes(path.D,
path.experiment.experiment_type,
path.replicate)
print(len(results))
print(results.uri)
14390
https://www.facebase.org/ermrest/catalog/1/attribute/D:=isa:dataset/experiment:=isa:experiment/replicate:=isa:replicate/D:*,experiment:experiment_type,replicate:*
If you want to base the results on a different entity, you can introduce a table instance alias into the end of the path, before calling the attributes function. In this case, even though we are asking for the same attributes, we are getting the set of datasets, not the set of replicates. Also, since we are including the attributes from dataset in our query, we know that we will not be seeing any duplicate rows.
[7]:
results = path.D.attributes(path.D,
path.experiment.experiment_type,
path.replicate)
print(len(results))
print(results.uri)
273
https://www.facebase.org/ermrest/catalog/1/attribute/D:=isa:dataset/experiment:=isa:experiment/replicate:=isa:replicate/$D/D:*,experiment:experiment_type,replicate:*
Filtering Examples¶
Example: filter on null
attribute¶
To test for a null
attribute value, do an equality comparison against the None
identity.
[8]:
path = dataset.link(experiment).filter(experiment.molecule_type == None)
print(path.uri)
print(len(path.entities()))
https://www.facebase.org/ermrest/catalog/1/entity/dataset:=isa:dataset/experiment:=isa:experiment/molecule_type::null::
2398
Example: advanced text filters¶
Deriva supports advanced text filters for regular expressions (regexp
), case-instansitive regexp (ciregexp
), and text search (ts
). You may have to review your text and full-text indexes in your ERMrest catalog before using these features.
[9]:
path = dataset.filter(dataset.description.ciregexp('palate'))
print(path.uri)
print(len(path.entities()))
https://www.facebase.org/ermrest/catalog/1/entity/dataset:=isa:dataset/description::ciregexp::palate
88
Example: negate a filter¶
Use the “inverse” (‘~
’) operator to negate a filter. Negation works against simple comparison filters as demonstrated above as well as on logical operators to be discussed next. You must wrap the comparison or logical operators in an extra parens to use the negate operation, e.g., “~ (...)
”.
[10]:
path = dataset.filter( ~ (dataset.description.ciregexp('palate')) )
print(path.uri)
print(len(path.entities()))
https://www.facebase.org/ermrest/catalog/1/entity/dataset:=isa:dataset/!(description::ciregexp::palate)
957
Example: filters with logical operators¶
This example shows how to combine two comparisons with a conjuncting (i.e., and
operator). Because Python’s logical-and (and
) keyword cannot be overloaded, we instead overload the bitwise-and (&
) operator. This approach has become customary among many similar data access libraries.
[11]:
path = dataset.link(biosample).filter(
((biosample.species == 'NCBITAXON:10090') & (biosample.anatomy == 'UBERON:0002490')))
print(path.uri)
https://www.facebase.org/ermrest/catalog/1/entity/dataset:=isa:dataset/biosample:=isa:biosample/(species=NCBITAXON%3A10090)&(anatomy=UBERON%3A0002490)
[12]:
DataFrame(path.entities())
[12]:
RID | dataset | local_identifier | species | specimen | gene | genotype | strain | mutation | stage | ... | litter | collection_date | _keywords | RCB | RMB | RCT | RMT | cell_source | cell_characterization | treatment | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1-4TT8 | 1-4TSP | scWFE18_S197 | NCBITAXON:10090 | FACEBASE:1-4GNR | None | FACEBASE:1-4GV2 | MGI:3028467 | None | FACEBASE:1-4GJA | ... | None | None | None | https://auth.globus.org/8ae274db-d033-47eb-bd3... | https://auth.globus.org/bb256144-d274-11e5-adb... | 2018-12-11T03:42:13.870864+00:00 | 2021-10-07T17:51:24.42557+00:00 | None | None | None |
1 | 1-8DVG | 1-4TSP | scWFE18_S616 | NCBITAXON:10090 | FACEBASE:1-4GNR | None | FACEBASE:1-4GV2 | MGI:3028467 | None | FACEBASE:1-4GJA | ... | None | 2019-06-19 | None | https://auth.globus.org/8ae274db-d033-47eb-bd3... | https://auth.globus.org/bb256144-d274-11e5-adb... | 2020-02-13T16:45:26.843631+00:00 | 2021-10-07T17:51:24.42557+00:00 | None | None | None |
2 | 1-8DVM | 1-4TSP | scWFE18_S631 | NCBITAXON:10090 | FACEBASE:1-4GNR | None | FACEBASE:1-4GV2 | MGI:3028467 | None | FACEBASE:1-4GJA | ... | None | 2019-06-18 | None | https://auth.globus.org/8ae274db-d033-47eb-bd3... | https://auth.globus.org/bb256144-d274-11e5-adb... | 2020-02-13T17:03:19.454718+00:00 | 2021-10-07T17:51:24.42557+00:00 | None | None | None |
3 | 1-C3PR | 1-4TSP | scTFE18_S979 | NCBITAXON:10090 | FACEBASE:1-4GNR | None | FACEBASE:1-4GV0 | MGI:3028467 | None | FACEBASE:1-4GJA | ... | None | 2019-12-11 | None | https://auth.globus.org/8ae274db-d033-47eb-bd3... | https://auth.globus.org/bb256144-d274-11e5-adb... | 2020-03-17T16:51:19.48232+00:00 | 2021-10-07T17:51:24.42557+00:00 | None | None | None |
4 | 1-C3PT | 1-4TSP | scAFE18_S1075 | NCBITAXON:10090 | FACEBASE:1-4GNR | None | FACEBASE:1-4GT6 | MGI:3028467 | None | FACEBASE:1-4GJA | ... | None | 2019-12-24 | None | https://auth.globus.org/8ae274db-d033-47eb-bd3... | https://auth.globus.org/bb256144-d274-11e5-adb... | 2020-03-17T16:52:41.937568+00:00 | 2021-10-07T17:51:24.42557+00:00 | None | None | None |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
67 | 3Q1E | VHE | W8IF1SM | NCBITAXON:10090 | FACEBASE:1-4GNG | None | FACEBASE:1-4GV2 | MGI:3028467 | None | FACEBASE:1-4GJA | ... | 4/15/14WL1-1 | 2014-04-15 | None | https://auth.globus.org/b506963e-d274-11e5-99f... | https://auth.globus.org/bb256144-d274-11e5-adb... | 2018-03-16T19:03:38.228586+00:00 | 2021-10-07T17:51:24.42557+00:00 | None | None | None |
68 | 3Q1J | VHE | W8IF2SM | NCBITAXON:10090 | FACEBASE:1-4GNG | None | FACEBASE:1-4GV2 | MGI:3028467 | None | FACEBASE:1-4GJA | ... | 4/15/14WL1-2 | 2014-04-15 | None | https://auth.globus.org/b506963e-d274-11e5-99f... | https://auth.globus.org/bb256144-d274-11e5-adb... | 2018-03-16T19:03:38.228586+00:00 | 2021-10-07T17:51:24.42557+00:00 | None | None | None |
69 | 3Q1P | VHE | W8IF3SM | NCBITAXON:10090 | FACEBASE:1-4GNG | None | FACEBASE:1-4GV2 | MGI:3028467 | None | FACEBASE:1-4GJA | ... | 4/14/14WL1-6 | 2014-04-14 | None | https://auth.globus.org/b506963e-d274-11e5-99f... | https://auth.globus.org/bb256144-d274-11e5-adb... | 2018-03-16T19:03:38.228586+00:00 | 2021-10-07T17:51:24.42557+00:00 | None | None | None |
70 | 3Q1T | VHE | W8IF4SM | NCBITAXON:10090 | FACEBASE:1-4GNG | None | FACEBASE:1-4GV2 | MGI:3028467 | None | FACEBASE:1-4GJA | ... | 4/15/14WL1-7 | 2014-04-15 | None | https://auth.globus.org/b506963e-d274-11e5-99f... | https://auth.globus.org/bb256144-d274-11e5-adb... | 2018-03-16T19:03:38.228586+00:00 | 2021-10-07T17:51:24.42557+00:00 | None | None | None |
71 | 3Q1Y | VHE | W8IF5SM | NCBITAXON:10090 | FACEBASE:1-4GNG | None | FACEBASE:1-4GV2 | MGI:3028467 | None | FACEBASE:1-4GJA | ... | 4/28/14WL1-1 | 2014-04-28 | None | https://auth.globus.org/b506963e-d274-11e5-99f... | https://auth.globus.org/bb256144-d274-11e5-adb... | 2018-03-16T19:03:38.228586+00:00 | 2021-10-07T17:51:24.42557+00:00 | None | None | None |
72 rows × 25 columns
Example: combine conjunction and disjunctions in filters¶
Similar to the prior example, the filters allow combining of conjunctive and disjunctive operators. Like the bitwise-and operator, we also overload the bitwise-or (|
) operator because the logical-or (or
) operatar cannot be overloaded.
[13]:
path = dataset.link(biosample).filter(
((biosample.species == 'NCBITAXON:10090') & (biosample.anatomy == 'UBERON:0002490')) |
((biosample.specimen == 'FACEBASE:1-4GNR') & (biosample.stage == 'FACEBASE:1-4GJA')))
print(path.uri)
https://www.facebase.org/ermrest/catalog/1/entity/dataset:=isa:dataset/biosample:=isa:biosample/((species=NCBITAXON%3A10090)&(anatomy=UBERON%3A0002490));((specimen=FACEBASE%3A1-4GNR)&(stage=FACEBASE%3A1-4GJA))
[14]:
DataFrame(path.entities())
[14]:
RID | dataset | local_identifier | species | specimen | gene | genotype | strain | mutation | stage | ... | litter | collection_date | _keywords | RCB | RMB | RCT | RMT | cell_source | cell_characterization | treatment | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1-4TT8 | 1-4TSP | scWFE18_S197 | NCBITAXON:10090 | FACEBASE:1-4GNR | None | FACEBASE:1-4GV2 | MGI:3028467 | None | FACEBASE:1-4GJA | ... | None | None | None | https://auth.globus.org/8ae274db-d033-47eb-bd3... | https://auth.globus.org/bb256144-d274-11e5-adb... | 2018-12-11T03:42:13.870864+00:00 | 2021-10-07T17:51:24.42557+00:00 | None | None | None |
1 | 1-8DVG | 1-4TSP | scWFE18_S616 | NCBITAXON:10090 | FACEBASE:1-4GNR | None | FACEBASE:1-4GV2 | MGI:3028467 | None | FACEBASE:1-4GJA | ... | None | 2019-06-19 | None | https://auth.globus.org/8ae274db-d033-47eb-bd3... | https://auth.globus.org/bb256144-d274-11e5-adb... | 2020-02-13T16:45:26.843631+00:00 | 2021-10-07T17:51:24.42557+00:00 | None | None | None |
2 | 1-8DVM | 1-4TSP | scWFE18_S631 | NCBITAXON:10090 | FACEBASE:1-4GNR | None | FACEBASE:1-4GV2 | MGI:3028467 | None | FACEBASE:1-4GJA | ... | None | 2019-06-18 | None | https://auth.globus.org/8ae274db-d033-47eb-bd3... | https://auth.globus.org/bb256144-d274-11e5-adb... | 2020-02-13T17:03:19.454718+00:00 | 2021-10-07T17:51:24.42557+00:00 | None | None | None |
3 | 1-C3PR | 1-4TSP | scTFE18_S979 | NCBITAXON:10090 | FACEBASE:1-4GNR | None | FACEBASE:1-4GV0 | MGI:3028467 | None | FACEBASE:1-4GJA | ... | None | 2019-12-11 | None | https://auth.globus.org/8ae274db-d033-47eb-bd3... | https://auth.globus.org/bb256144-d274-11e5-adb... | 2020-03-17T16:51:19.48232+00:00 | 2021-10-07T17:51:24.42557+00:00 | None | None | None |
4 | 1-C3PT | 1-4TSP | scAFE18_S1075 | NCBITAXON:10090 | FACEBASE:1-4GNR | None | FACEBASE:1-4GT6 | MGI:3028467 | None | FACEBASE:1-4GJA | ... | None | 2019-12-24 | None | https://auth.globus.org/8ae274db-d033-47eb-bd3... | https://auth.globus.org/bb256144-d274-11e5-adb... | 2020-03-17T16:52:41.937568+00:00 | 2021-10-07T17:51:24.42557+00:00 | None | None | None |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
71 | 3Q1P | VHE | W8IF3SM | NCBITAXON:10090 | FACEBASE:1-4GNG | None | FACEBASE:1-4GV2 | MGI:3028467 | None | FACEBASE:1-4GJA | ... | 4/14/14WL1-6 | 2014-04-14 | None | https://auth.globus.org/b506963e-d274-11e5-99f... | https://auth.globus.org/bb256144-d274-11e5-adb... | 2018-03-16T19:03:38.228586+00:00 | 2021-10-07T17:51:24.42557+00:00 | None | None | None |
72 | 3Q1T | VHE | W8IF4SM | NCBITAXON:10090 | FACEBASE:1-4GNG | None | FACEBASE:1-4GV2 | MGI:3028467 | None | FACEBASE:1-4GJA | ... | 4/15/14WL1-7 | 2014-04-15 | None | https://auth.globus.org/b506963e-d274-11e5-99f... | https://auth.globus.org/bb256144-d274-11e5-adb... | 2018-03-16T19:03:38.228586+00:00 | 2021-10-07T17:51:24.42557+00:00 | None | None | None |
73 | 3Q1Y | VHE | W8IF5SM | NCBITAXON:10090 | FACEBASE:1-4GNG | None | FACEBASE:1-4GV2 | MGI:3028467 | None | FACEBASE:1-4GJA | ... | 4/28/14WL1-1 | 2014-04-28 | None | https://auth.globus.org/b506963e-d274-11e5-99f... | https://auth.globus.org/bb256144-d274-11e5-adb... | 2018-03-16T19:03:38.228586+00:00 | 2021-10-07T17:51:24.42557+00:00 | None | None | None |
74 | 3V06 | 3TYP | scWCE18_S198 | NCBITAXON:10090 | FACEBASE:1-4GNR | None | FACEBASE:1-4GV2 | MGI:3028467 | None | FACEBASE:1-4GJA | ... | 6/6/17WL1,L2,L3 | 2017-06-06 | None | https://auth.globus.org/f8ae714f-6015-48da-971... | https://auth.globus.org/bb256144-d274-11e5-adb... | 2018-05-02T18:27:13.657825+00:00 | 2021-10-07T17:51:24.42557+00:00 | None | None | None |
75 | 3V1P | 3V0E | scWLE18_S199 | NCBITAXON:10090 | FACEBASE:1-4GNR | None | FACEBASE:1-4GV2 | MGI:3028467 | None | FACEBASE:1-4GJA | ... | 6/6/17WL1,L2,L3 | 2017-06-06 | None | https://auth.globus.org/8ae274db-d033-47eb-bd3... | https://auth.globus.org/bb256144-d274-11e5-adb... | 2018-05-02T18:50:12.225872+00:00 | 2021-10-07T17:51:24.42557+00:00 | None | None | None |
76 rows × 25 columns
Example: filtering at different stages of the path¶
Filtering a path does not have to be done at the end of a path. In fact, the initial intention of the ERMrest URI was to mimick “RESTful” semantics where a RESTful “resource” is identified, then filtered, then a “sub-resource” is identified, and then filtered, and so on.
[15]:
path = dataset.filter(dataset.release_date >= '2017-01-01') \
.link(experiment).filter(experiment.experiment_type == 'OBI:0001271') \
.link(replicate).filter(replicate.bioreplicate_number == 1)
print(path.uri)
https://www.facebase.org/ermrest/catalog/1/entity/dataset:=isa:dataset/release_date::geq::2017-01-01/experiment:=isa:experiment/experiment_type=OBI%3A0001271/replicate:=isa:replicate/bioreplicate_number=1
[16]:
DataFrame(path.entities())
[16]:
RID | dataset | biosample | bioreplicate_number | technical_replicate_number | RCB | RMB | RCT | RMT | experiment | |
---|---|---|---|---|---|---|---|---|---|---|
0 | 1-3T5A | 1-3SWE | 1-3T0A | 1 | 1 | https://auth.globus.org/f8ae714f-6015-48da-971... | https://auth.globus.org/f8ae714f-6015-48da-971... | 2018-06-13T23:41:16.162698+00:00 | 2018-06-13T23:41:16.162698+00:00 | 1-3SZA |
1 | 1-3T5E | 1-3SWE | 1-3T0E | 1 | 1 | https://auth.globus.org/f8ae714f-6015-48da-971... | https://auth.globus.org/f8ae714f-6015-48da-971... | 2018-06-13T23:41:16.162698+00:00 | 2018-06-13T23:41:16.162698+00:00 | 1-3SZA |
2 | 1-3T5J | 1-3SWE | 1-3T0J | 1 | 1 | https://auth.globus.org/f8ae714f-6015-48da-971... | https://auth.globus.org/f8ae714f-6015-48da-971... | 2018-06-13T23:41:16.162698+00:00 | 2018-06-13T23:41:16.162698+00:00 | 1-3SZA |
3 | 1-3T5P | 1-3SWE | 1-3T0P | 1 | 1 | https://auth.globus.org/f8ae714f-6015-48da-971... | https://auth.globus.org/f8ae714f-6015-48da-971... | 2018-06-13T23:41:16.162698+00:00 | 2018-06-13T23:41:16.162698+00:00 | 1-3SZA |
4 | 1-3T5T | 1-3SWE | 1-3T0T | 1 | 1 | https://auth.globus.org/f8ae714f-6015-48da-971... | https://auth.globus.org/f8ae714f-6015-48da-971... | 2018-06-13T23:41:16.162698+00:00 | 2018-06-13T23:41:16.162698+00:00 | 1-3SZA |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
359 | 3TSJ | 2ARP | 30GP | 1 | 1 | https://auth.globus.org/b506963e-d274-11e5-99f... | None | 2018-04-11T19:28:24.259798+00:00 | 2018-06-06T01:44:38.502116+00:00 | 3422 |
360 | A-XH76 | 1-X42W | A-XH6P | 1 | 1 | https://auth.globus.org/93416595-2bc4-42a8-b5b... | https://auth.globus.org/93416595-2bc4-42a8-b5b... | 2021-11-24T00:35:15.863829+00:00 | 2021-11-24T00:35:15.863829+00:00 | A-XH72 |
361 | A-XH7R | 1-X42W | A-XH6W | 1 | 1 | https://auth.globus.org/93416595-2bc4-42a8-b5b... | https://auth.globus.org/93416595-2bc4-42a8-b5b... | 2021-11-24T00:38:56.917633+00:00 | 2021-11-24T00:38:56.917633+00:00 | A-XH7M |
362 | Z-2WMG | Z-2NMR | Z-2WMC | 1 | 1 | https://auth.globus.org/93416595-2bc4-42a8-b5b... | https://auth.globus.org/b506963e-d274-11e5-99f... | 2022-03-25T23:21:09.358366+00:00 | 2022-04-04T19:24:52.771093+00:00 | Z-2WM6 |
363 | Z-2WMM | Z-2NMR | Z-2WME | 1 | 1 | https://auth.globus.org/93416595-2bc4-42a8-b5b... | https://auth.globus.org/b506963e-d274-11e5-99f... | 2022-03-25T23:22:19.554979+00:00 | 2022-04-04T19:26:09.037904+00:00 | Z-2WMJ |
364 rows × 10 columns
Linking Examples¶
Example: explicit column links¶
Up until now, the examples have shown how to link entities via implicit join predicates. That is, we knew there existed a foriegn key reference constraint between foreign keys of one entity and keys of another entity. We needed only to ask ERMrest to link the entities in order to get the linked set.
The problem with implicit links is that it become ambiguous if there are more than one foreign key reference between tables. To support these situations, ERMrest and the DataPath
’s link(...)
method can specify the columns to use for the link condition, explicitly.
The structure of the on
clause is: - an equality comparison operation where - the left operand is a column of the left table instance which is also the path context before the link method is called, and - the right operand is a column of the right table instance which is the table to be linked to the path.
[17]:
path = dataset.link(experiment, on=(dataset.RID==experiment.dataset))
print(path.uri)
https://www.facebase.org/ermrest/catalog/1/entity/dataset:=isa:dataset/experiment:=(RID)=(isa:experiment:dataset)
IMPORTANT Not all tables are related by foreign key references. ERMrest does not allow arbitrary relational joins. Tables must be related by a foreign key reference in order to link them in a data path.
[18]:
DataFrame(path.entities().fetch(limit=3))
[18]:
RID | dataset | local_identifier | experiment_type | molecule_type | strandedness | rnaseq_selection | target_of_assay | chromatin_modifier | transcription_factor | histone_modification | control_assay | protocol | RCB | RMB | RCT | RMT | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1-3SD2 | 1-3SB2 | 3XhET4_pHsp68-lacZ-tdTomato | OBI:0002083 | None | None | None | None | None | None | None | None | None | https://auth.globus.org/a1d30d14-b3b0-49de-854... | https://auth.globus.org/b506963e-d274-11e5-99f... | 2018-06-08T19:40:48.98831+00:00 | 2018-11-28T00:54:17.199655+00:00 |
1 | 1-3SHT | 1-3SGA | 3XcET4_pHsp68-lacZ-tdTomato | OBI:0002083 | None | None | None | None | None | None | None | None | None | https://auth.globus.org/a1d30d14-b3b0-49de-854... | https://auth.globus.org/b506963e-d274-11e5-99f... | 2018-06-08T21:29:49.540747+00:00 | 2018-11-28T00:54:17.199655+00:00 |
2 | 1-3SNY | 1-3SM2 | 3XhET7_pHsp68-lacZ-tdTomato | OBI:0002083 | None | None | None | None | None | None | None | None | None | https://auth.globus.org/a1d30d14-b3b0-49de-854... | https://auth.globus.org/b506963e-d274-11e5-99f... | 2018-06-08T21:47:53.103435+00:00 | 2018-11-28T00:54:17.199655+00:00 |
Example: explicit column links combined with table aliasing¶
As usual, table instances are generated automatically unless we provide a table alias.
[19]:
path = dataset.link(biosample.alias('S'), on=(dataset.RID==biosample.dataset))
print(path.uri)
https://www.facebase.org/ermrest/catalog/1/entity/dataset:=isa:dataset/S:=(RID)=(isa:biosample:dataset)
Notice that we cannot use the alias right away in the on
clause because it was not bound to the path until after the link(...)
operation was performed.
Example: links with “outer join” semantics¶
Up until now, the examples have shown “link
s” with inner join semantics. Outer join semantics can be expressed as part of explicit column links, and only when using explicit column links.
The link(...)
method accepts a “join_type
” parameter, i.e., “.link(... join_type=TYPE)
”, where TYPE may be 'left'
, 'right'
, 'full'
, and defaults to ''
which indicates inner join type.
By ‘left
’ outer joining in the link from 'dataset'
to 'experiment
’ and to 'biosample'
, and then reseting the context of the path to 'dataset'
, the following path gives us a reference to 'dataset'
entities that whether or not they have any experiments or biosamples.
[20]:
# Notice in between `link`s that we have to reset the context back to `dataset` so that the
# second join is also left joined from the dataset table instance.
path = dataset.link(experiment.alias('E'), on=dataset.RID==experiment.dataset, join_type='left') \
.dataset \
.link(biosample.alias('S'), on=dataset.RID==biosample.dataset, join_type='left') \
# Notice that we have to perform the attribute fetch from the context of the `path.dataset`
# table instance.
results = path.dataset.attributes(path.dataset.RID,
path.dataset.title,
path.E.experiment_type,
path.S.species)
print(results.uri)
len(results)
https://www.facebase.org/ermrest/catalog/1/attribute/dataset:=isa:dataset/E:=left(RID)=(isa:experiment:dataset)/$dataset/S:=left(RID)=(isa:biosample:dataset)/$dataset/dataset:RID,dataset:title,E:experiment_type,S:species
[20]:
1049
We can see above that we have a full set of datasets whether or not they have any experiments with biosamples. For further evidence, we can convert to a DataFrame and look at a slice of its entries. Note that the biosample’s ‘species’ and ‘stage’ attributes do not exist for some results (i.e., NaN
) because those attributes did not exist for the join condition.
[21]:
DataFrame(results)[:10]
[21]:
RID | title | experiment_type | species | |
---|---|---|---|---|
0 | TZP | Gene expression microarray - mouse E10.5 maxil... | None | None |
1 | VTG | Gene expression microarray - mouse E10.5 mandi... | None | None |
2 | V4G | miRNAs expressed in the E13.5 mouse mandible | None | None |
3 | TMP | miRNAs expressed in the 72hpf zebrafish head | None | None |
4 | TVA | Gene expression microarray - mouse E10.5 media... | None | None |
5 | V1C | Gene expression microarray - mouse E11.5 singl... | None | None |
6 | V8A | Gene expression microarray - mouse E10.5 later... | None | None |
7 | VM6 | Gene expression microarray - mouse E10.5 media... | None | None |
8 | VHR | Gene expression profiling of palatal and mandi... | None | None |
9 | V9A | Gene expression microarray - mouse E10.5 olfac... | None | None |
Faceting Examples¶
You may have noticed that in the examples above, the ‘species’ and ‘experiment_type’ attributes are identifiers (‘CURIE’s to be precise). We may want to construct filters on our datasets based on these categories. This can be used for “faceted search” modes and can be useful even within the context of programmatic access to data in the catalog.
Grouping Examples¶
Now support you would like to aggregate all of the vocabulary terms associated with a Dataset. Here, we examine what happens when you have a model such that dataset <- dataset_VOCAB -> VOCAB
where VOCAB
is a placeholder for a table that includes a vocabulary term set. These tables typically have a name
column for the human-readable preferred label to go along with the formal URI or CURIE of the concept class.
[27]:
# We need to import the `ArrayD` aggregate function for this example.
from deriva.core.datapath import ArrayD
# For convenience, get python objects for the additional tables.
dataset_species = pb.isa.dataset_species
dataset_experiment_type = pb.isa.dataset_experiment_type
species = pb.vocab.species
experiment_type = pb.vocab.experiment_type
# Start by doing a couple left outer joins on the dataset-term association tables, then link
# (i.e., inner join) the associated vocabulary term table, then reset the context back to the
# dataset table.
path = dataset.link(dataset_species, on=dataset.id==dataset_species.dataset_id, join_type='left') \
.link(species) \
.dataset \
.link(dataset_experiment_type, on=dataset.id==dataset_experiment_type.dataset_id, join_type='left') \
.link(experiment_type)
# Again, notice that we reset the context to the `dataset` table alias so that we will retrieve
# dataset entities based on the groupings to be defined next. For the groupby key we will use the
# dataset.RID, but for this example any primary key would work. Then we will get aggregate arrays
# of the linked vocabulary tables.
results = path.dataset.groupby(dataset.RID).attributes(
dataset.title,
ArrayD(path.species.column_definitions['name']).alias('species'),
ArrayD(path.experiment_type.column_definitions['name']).alias('experiment_type')
)
#results = path.dataset.entities()
print(results.uri)
print(len(results))
https://www.facebase.org/ermrest/catalog/1/attributegroup/dataset:=isa:dataset/dataset_species:=left(id)=(isa:dataset_species:dataset_id)/species:=vocab:species/$dataset/dataset_experiment_type:=left(id)=(isa:dataset_experiment_type:dataset_id)/experiment_type:=vocab:experiment_type/$dataset/RID;title,species:=array_d(species:name),experiment_type:=array_d(experiment_type:name)
1048
[28]:
DataFrame(results.fetch(limit=20))
[28]:
RID | title | species | experiment_type | |
---|---|---|---|---|
0 | 1-3SB2 | Activity of human neural crest enhancer near G... | [Mus musculus] | [enhancer activity detection by reporter gene ... |
1 | 1-3SGA | Activity of chimp neural crest enhancer near G... | [Mus musculus] | [enhancer activity detection by reporter gene ... |
2 | 1-3SM2 | Activity of human neural crest enhancer near F... | [Mus musculus] | [enhancer activity detection by reporter gene ... |
3 | 1-3SQJ | Activity of chimp neural crest enhancer near F... | [Mus musculus] | [enhancer activity detection by reporter gene ... |
4 | 1-3SVP | FB0023_18mo male with hypertelorism_Candidate ... | [Homo sapiens] | [exome sequencing assay, genotyping assay, ima... |
5 | 1-3SVT | FB0043_3 male cousins with natal teeth and ano... | [Homo sapiens] | [exome sequencing assay, genotyping assay, ima... |
6 | 1-3SVY | FB0051_6 year old male with hypertelorism (mar... | [Homo sapiens] | [exome sequencing assay, genotyping assay, ima... |
7 | 1-3SW2 | FB0064_Male with Congenital craniosynostosis_C... | [Homo sapiens] | [exome sequencing assay, genotyping assay, ima... |
8 | 1-3SW6 | FB0115_11yo male with dysmorphic facial featur... | [Homo sapiens] | [exome sequencing assay, genotyping assay, ima... |
9 | 1-3SWA | FB0122_29yo female with bilateral hearing loss... | [Homo sapiens] | [exome sequencing assay, genotyping assay, ima... |
10 | 1-3SWE | Interpalatine suture, WT and Fgfr2+/S252W (Ape... | [Mus musculus] | [RNA-seq assay] |
11 | 1-3SXY | Maxillary-Palatine Suture, WT and Fgfr2+/S252W... | [Mus musculus] | [RNA-seq assay] |
12 | 1-3V7A | FB0011_9 year old female with single central m... | [Homo sapiens] | [exome sequencing assay, genotyping assay, ima... |
13 | 1-3V7J | FB0015_Male with bilateral cleft lip and palate | [Homo sapiens] | [exome sequencing assay, genotyping assay, ima... |
14 | 1-3V7P | FB0027_29 month old male_Multiple congenital a... | [Homo sapiens] | [exome sequencing assay, genotyping assay, ima... |
15 | 1-3V8A | FB0039_"Identical Male Twins"_Treacher Collins... | [Homo sapiens] | [exome sequencing assay, genotyping assay, ima... |
16 | 1-3V8Y | FB0041_8 yo male_Brachi-Oto-Renal (BOR) Syndrome | [Homo sapiens] | [genotyping assay, imaging assay, Morphometric... |
17 | 1-3V9E | FB0049_17 month old male with bilateral macros... | [Homo sapiens] | [exome sequencing assay, genotyping assay, ima... |
18 | 1-3VAT | FB0062_Proband Only Saliva_Blepharo-cheilo-don... | [Homo sapiens] | [exome sequencing assay, genotyping assay, ima... |
19 | 1-3VBE | FB0124_Trio Saliva_5 mo male with Hallermann S... | [Homo sapiens] | [exome sequencing assay, genotyping assay, ima... |
[ ]: