DataPath Example 2
This notebook gives a very basic example of how to access data. It assumes that you understand the concepts presented in the Example 1 notebook.
[1]:
# Import deriva modules
from deriva.core import ErmrestCatalog, get_credential
[2]:
# Connect with the deriva catalog
protocol = 'https'
hostname = 'www.facebase.org'
catalog_number = 1
credential = get_credential(hostname)
catalog = ErmrestCatalog(protocol, hostname, catalog_number, credential)
[3]:
# Get the path builder interface for this catalog
pb = catalog.getPathBuilder()
DataPaths
The PathBuilder object allows you to begin DataPaths from the base Tables. A DataPath begins with a Table (or a TableAlias, to be discussed later) as its “root”, from which one can “link”, “filter”, and fetch its “entities”.
Start a path rooted at a table from the catalog
We will reference a table from the PathBuilder pb variable from above. Using the PathBuilder, we will reference the “isa” schema, then the “dataset” table, and from that table start a path.
[4]:
path = pb.schemas['isa'].tables['dataset'].path
We could have used the more compact dot-notation to start the same path.
[5]:
path = pb.isa.dataset.path
Getting the URI of the current path
All DataPaths have URIs for the referenced resources in ERMrest. The URI identifies the resources which are available through “RESTful” Web protocols supported by ERMrest.
[6]:
print(path.uri)
https://www.facebase.org/ermrest/catalog/1/entity/dataset:=isa:dataset
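To make the structure of the printed URI easier to read, here is a minimal sketch that rebuilds it from the connection values used earlier. The entity_uri function is a hypothetical helper, not part of deriva-py, and the layout (catalog base, the entity interface, then a root of the form alias:=schema:table with the table name reused as the alias) is only inferred from the single URI shown above.

```python
from urllib.parse import quote


def entity_uri(protocol, hostname, catalog_number, schema, table):
    """Hypothetical helper (not a deriva-py API): rebuild the URI printed above."""
    base = f"{protocol}://{hostname}/ermrest/catalog/{catalog_number}"
    # Assumption: the path root is "<alias>:=<schema>:<table>" and the alias
    # defaults to the table name, as in the printed example.
    root = f"{quote(table)}:={quote(schema)}:{quote(table)}"
    return f"{base}/entity/{root}"


print(entity_uri('https', 'www.facebase.org', 1, 'isa', 'dataset'))
# https://www.facebase.org/ermrest/catalog/1/entity/dataset:=isa:dataset
```

In practice, path.uri remains the authoritative source; the sketch is only meant to show which part of the URI comes from which setting.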
ResultSets
The data from a DataPath are accessed through a pythonic container object, the ResultSet. The ResultSet is returned by the DataPath’s entities() and other methods.
[7]:
results = path.entities()
Fetch entities from the catalog
Now we can get entities from the server using the ResultSet’s fetch() method.
[8]:
results.fetch()
[8]:
<deriva.core.datapath._ResultSet at 0x1039f3c70>
ResultSets behave like Python containers. For example, we can check the count of rows in this ResultSet.
[9]:
len(results)
[9]:
1131
Note: If we had not explicitly called the fetch() method, then it would have been called implicitly on the first container operation, such as len(...), list(...), iter(...), or get item [...].
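The implicit-fetch behavior can be illustrated with a small stand-in class. This is a sketch of the general lazy-container pattern only; LazyResults and fake_fetcher are hypothetical names, and nothing here is deriva-py's actual implementation.

```python
class LazyResults:
    """Illustrative stand-in for a result set (not deriva-py code)."""

    def __init__(self, fetcher):
        self._fetcher = fetcher  # callable(limit) -> list of dict rows
        self._rows = None        # None means "nothing fetched yet"

    def fetch(self, limit=None):
        # An explicit fetch replaces whatever rows were held before.
        self._rows = self._fetcher(limit)
        return self

    def _ensure_fetched(self):
        # Called by every container operation: fetch once, on first use.
        if self._rows is None:
            self.fetch()

    def __len__(self):
        self._ensure_fetched()
        return len(self._rows)

    def __getitem__(self, index):
        self._ensure_fetched()
        return self._rows[index]

    def __iter__(self):
        self._ensure_fetched()
        return iter(self._rows)


calls = []  # records every request made to the fake server


def fake_fetcher(limit):
    """Hypothetical data source standing in for the remote catalog."""
    calls.append(limit)
    rows = [{'accession': f'ACC{i}'} for i in range(10)]
    return rows if limit is None else rows[:limit]


results = LazyResults(fake_fetcher)
print(len(results))     # 10 (the first container operation triggers the fetch)
results.fetch(limit=3)
print(len(results))     # 3 (the explicit fetch replaced the earlier rows)
```

The calls list ends up as [None, 3], showing that repeated container operations reuse the rows already held and only an explicit fetch() contacts the source again.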
Get an entity
To get one entity from the set, use the usual container operator to get an item.
[10]:
results[9]
[10]:
{'id': 14130,
'accession': 'FB00000807.2',
'title': 'ChIP-seq of multiple histone marks and RNA-seq from CS22 human embryonic face tissue',
'project': 305,
'funding': 'PI: Axel Visel. This work was supported by NIDCR grant U01-DE024427',
'summary': 'ChIP-seq, RNA-seq and transgenic assays to identify non-coding regulatory elements (enhancers) active during craniofacial development',
'description': 'ChIP-seq, RNA-seq and transgenic assays to identify non-coding regulatory elements (enhancers) active during craniofacial development\n\nThis is restricted-access human data. To gain access to this data, you must first go through the [process outlined here](/odocs/data-guidelines/).\n\nThe listing and corresponding checksums of all data files included in this dataset can be found in the [manifest file](https://www.facebase.org/id/3C-QJHW) in the Supplementary Files Section below.',
'mouse_genetic': None,
'human_anatomic': None,
'study_design': None,
'release_date': '2017-04-12',
'show_in_jbrowse': True,
'_keywords': 'Homo RNA-seq sapiens face female 22 male Visel organism ChIP-seq Axel TTR Carnegie stage assay',
'RID': 'TTR',
'RCB': None,
'RMB': 'https://www.facebase.org/webauthn_robot/fb_cron',
'RCT': '2017-09-23T00:33:18.797126+00:00',
'RMT': '2023-09-15T03:41:39.424591+00:00',
'released': True,
'Requires_DOI?': True,
'DOI': '10.25550/TTR',
'protected_human_subjects': True,
'cellbrowser_uri': None}
Get a specific attribute value from an entity
To get one attribute value from an entity, get the item using its Column’s name property.
[11]:
dataset = pb.schemas['isa'].tables['dataset']
print(results[9][str(dataset.accession)])
FB00000807.2
Fetch a Limited Number of Results
To limit the number of results fetched from the catalog, call fetch(limit=...) explicitly with the desired upper bound.
[12]:
results.fetch(limit=3)
len(results)
[12]:
3
Iterate over the ResultSet
ResultSets are iterable like a typical container.
[13]:
for entity in results:
    print(entity[str(dataset.accession)])
FB00000933
FB00000382.01
FB00001315
Convert to Pandas DataFrame
ResultSets can be transformed into the popular pandas DataFrame.
[14]:
from pandas import DataFrame
DataFrame(results)
[14]:
| | id | accession | title | project | funding | summary | description | mouse_genetic | human_anatomic | study_design | ... | RID | RCB | RMB | RCT | RMT | released | Requires_DOI? | DOI | protected_human_subjects | cellbrowser_uri |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 14177 | FB00000933 | ChIP-seq of multiple histone marks and RNA-seq... | 153 | None | ChIP-seq of multiple histone marks and RNA-seq... | ChIP-seq, RNA-seq and ATAC-seq to identify non... | None | None | None | ... | 2A7J | https://auth.globus.org/f226978f-e0be-4f47-a57... | https://www.facebase.org/webauthn_robot/fb_cron | 2018-02-15T19:07:28.598486+00:00 | 2023-09-15T03:41:39.424591+00:00 | False | True | 10.25550/2A7J | True | None |
| 1 | 6430 | FB00000382.01 | microMRI images of skulls of Wnt1Cre * Tgfbr2F... | 151 | PIs: Scott Fraser and Seth Ruffins. This work ... | microMRI images of skulls of Wnt1Cre * Tgfbr2... | microMRI images of skulls of Wnt1Cre * Tgfbr2... | None | None | None | ... | VK2 | None | https://www.facebase.org/webauthn_robot/fb_cron | 2017-09-23T00:33:18.797126+00:00 | 2023-09-15T03:41:39.424591+00:00 | True | True | 10.25550/VK2 | False | None |
| 2 | 14570 | FB00001315 | microCT scan of 12.0 mm hyperthyroid Danio rer... | 354 | None | None | 3D microCT scan of a hyperthyroid D. rerio hea... | None | None | 8.0 um scanning resolution, 0.1 degree rotatio... | ... | 2E-YW0R | https://auth.globus.org/f6d7e728-21bf-4034-80d... | https://www.facebase.org/webauthn_robot/fb_cron | 2023-01-06T18:02:29.345965+00:00 | 2023-09-15T03:41:39.424591+00:00 | False | True | 10.25550/2E-YW0R | False | None |
3 rows × 23 columns
Selecting Attributes
It is also possible to fetch only a subset of attributes from the catalog. The attributes(...) method accepts a variable argument list followed by keyword arguments. Each argument must be a Column object from the table’s columns container.
Renaming selected attributes
To rename the selected attributes, use the alias(...) method on the column object. For example, attributes(table.column.alias('new_name')) will rename table.column to new_name in the entities returned from the server. (It will not change anything in the stored catalog data.)
[15]:
results = path.attributes(dataset.accession, dataset.title, dataset.released.alias('is_released')).fetch(limit=5)
Convert to list
Now we can look at the results from the above fetch. To demonstrate a different access mode, we can convert the entities to a standard Python list and dump it to the console.
[16]:
list(results)
[16]:
[{'accession': 'FB00000933',
'title': 'ChIP-seq of multiple histone marks and RNA-seq from CS18 human embryonic face tissue',
'is_released': False},
{'accession': 'FB00000382.01',
'title': 'microMRI images of skulls of Wnt1Cre * Tgfbr2F mice at E16.5',
'is_released': True},
{'accession': 'FB00001315',
'title': 'microCT scan of 12.0 mm hyperthyroid Danio rerio head',
'is_released': False},
{'accession': 'FB00000784',
'title': 'Sample to subject mapping file for the 3D Facial Images-Tanzania dataset',
'is_released': True},
{'accession': 'FB00000009',
'title': 'Gene expression microarray - mouse E10.5 mandibular arch ',
'is_released': True}]
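As a rough mental model of what selection and aliasing do to each returned row, the following sketch applies the same projection to a plain Python dict. The project function is a hypothetical local helper, not a deriva-py API; in the real library the projection is requested from the server rather than computed on the client. The sample row reuses a few values from the entity shown earlier in this notebook.

```python
def project(rows, *names, **renamed):
    """Hypothetical helper: keep the columns in `names` as they are and emit
    each `new_name=old_name` pair in `renamed` under its new name."""
    return [
        {
            **{name: row[name] for name in names},
            **{new: row[old] for new, old in renamed.items()},
        }
        for row in rows
    ]


# A cut-down copy of the entity printed earlier (only four of its columns).
rows = [{
    'accession': 'FB00000807.2',
    'title': 'ChIP-seq of multiple histone marks and RNA-seq from CS22 human embryonic face tissue',
    'project': 305,
    'released': True,
}]

selected = project(rows, 'accession', 'title', is_released='released')
print(selected[0]['is_released'])   # True
print(sorted(selected[0]))          # ['accession', 'is_released', 'title']
print('released' in rows[0])        # True (the source row is left untouched)
```

The source rows keep their original column names, mirroring the point that aliasing affects only the returned entities and not the stored data.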