DataPath Example 2

This notebook gives a basic example of how to access data. It assumes that you understand the concepts presented in the Example 1 notebook.

[1]:
# Import deriva modules
from deriva.core import ErmrestCatalog, get_credential
[2]:
# Connect with the deriva catalog
protocol = 'https'
hostname = 'www.facebase.org'
catalog_number = 1
credential = get_credential(hostname)
catalog = ErmrestCatalog(protocol, hostname, catalog_number, credential)
[3]:
# Get the path builder interface for this catalog
pb = catalog.getPathBuilder()

DataPaths

The PathBuilder object allows you to begin DataPaths from the base Tables. A DataPath begins with a Table (or a TableAlias, to be discussed later) as its “root”, from which one can “link”, “filter”, and fetch its “entities”.

Start a path rooted at a table from the catalog

Using the PathBuilder variable pb defined above, we reference the “isa” schema, then the “dataset” table, and start a path from that table.

[4]:
path = pb.schemas['isa'].tables['dataset'].path

We could have used the more compact dot-notation to start the same path.

[5]:
path = pb.isa.dataset.path

Getting the URI of the current path

All DataPaths have URIs for the referenced resources in ERMrest. The URI identifies the resources which are available through “RESTful” Web protocols supported by ERMrest.

[6]:
print(path.uri)
https://www.facebase.org/ermrest/catalog/1/entity/dataset:=isa:dataset
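The printed URI can be read as a composition of the connection settings and the root table. The following is a minimal sketch of that composition, not deriva's own code: entity_uri is a hypothetical helper, and the "alias:=schema:table" layout is inferred from the URI printed above. In practice, read path.uri rather than building the string yourself.

```python
from urllib.parse import quote


def entity_uri(protocol, hostname, catalog_number, schema, table):
    """Compose an entity URI for a path rooted at schema:table.

    Hypothetical helper for illustration only; the layout is inferred
    from the URI that path.uri printed in this notebook.
    """
    # Percent-encode the names so unusual characters stay URL-safe.
    schema_q = quote(schema, safe='')
    table_q = quote(table, safe='')
    base = f"{protocol}://{hostname}/ermrest/catalog/{catalog_number}"
    # "<alias>:=<schema>:<table>" binds the root table to an alias; in the
    # printed URI the alias is simply the table name.
    return f"{base}/entity/{table_q}:={schema_q}:{table_q}"


uri = entity_uri('https', 'www.facebase.org', 1, 'isa', 'dataset')
print(uri)
```

With the connection values used in this notebook, the sketch reproduces the URI shown above.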

ResultSets

The data from a DataPath are accessed through a Pythonic container object, the ResultSet. The ResultSet is returned by the DataPath’s entities() and other methods.

[7]:
results = path.entities()

Fetch entities from the catalog

Now we can get entities from the server using the ResultSet’s fetch() method.

[8]:
results.fetch()
[8]:
<deriva.core.datapath._ResultSet at 0x1039f3c70>

ResultSets behave like Python containers. For example, we can check the count of rows in this ResultSet.

[9]:
len(results)
[9]:
1131

Note: If we had not explicitly called the fetch() method, it would have been called implicitly on the first container operation, such as len(...), list(...), iter(...), or item access ([...]).
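The implicit fetch described in the note can be pictured with a toy container. This is a sketch of the general lazy-loading pattern only, not deriva's implementation: LazyResults, fake_fetcher, and fetch_count are hypothetical names, and the rows are a local stand-in for a catalog request.

```python
class LazyResults:
    """Toy container that fetches on first use (not deriva's ResultSet)."""

    def __init__(self, fetcher):
        self._fetcher = fetcher  # callable returning a list of dict rows
        self._rows = None        # None means "nothing fetched yet"
        self.fetch_count = 0     # how many times fetch() has run

    def fetch(self):
        self._rows = self._fetcher()
        self.fetch_count += 1
        return self

    def _ensure_fetched(self):
        # Fetch only if no earlier operation has done so already.
        if self._rows is None:
            self.fetch()

    def __len__(self):
        self._ensure_fetched()
        return len(self._rows)

    def __iter__(self):
        self._ensure_fetched()
        return iter(self._rows)

    def __getitem__(self, index):
        self._ensure_fetched()
        return self._rows[index]


def fake_fetcher():
    # Stand-in for a request to the catalog.
    return [{'accession': 'FB00000933'},
            {'accession': 'FB00000382.01'},
            {'accession': 'FB00001315'}]


lazy = LazyResults(fake_fetcher)
before = lazy.fetch_count  # nothing fetched yet
n = len(lazy)              # first container operation triggers fetch()
first = lazy[0]            # later operations reuse the fetched rows
after = lazy.fetch_count
print(before, n, after)    # prints: 0 3 1
```

The point of the pattern is that the first container operation pays the cost of the request and later operations reuse the rows already held.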

Get an entity

To get one entity from the set, use the usual container operator to get an item.

[10]:
results[9]
[10]:
{'id': 14130,
 'accession': 'FB00000807.2',
 'title': 'ChIP-seq of multiple histone marks and RNA-seq from CS22 human embryonic face tissue',
 'project': 305,
 'funding': 'PI: Axel Visel. This work was supported by NIDCR grant U01-DE024427',
 'summary': 'ChIP-seq, RNA-seq and transgenic assays to identify non-coding regulatory elements (enhancers) active during craniofacial development',
 'description': 'ChIP-seq, RNA-seq and transgenic assays to identify non-coding regulatory elements (enhancers) active during craniofacial development\n\nThis is restricted-access human data. To gain access to this data, you must first go through the [process outlined here](/odocs/data-guidelines/).\n\nThe listing and corresponding checksums of all data files included in this dataset can be found in the [manifest file](https://www.facebase.org/id/3C-QJHW) in the Supplementary Files Section below.',
 'mouse_genetic': None,
 'human_anatomic': None,
 'study_design': None,
 'release_date': '2017-04-12',
 'show_in_jbrowse': True,
 '_keywords': 'Homo RNA-seq sapiens face female 22 male Visel organism ChIP-seq Axel TTR Carnegie stage assay',
 'RID': 'TTR',
 'RCB': None,
 'RMB': 'https://www.facebase.org/webauthn_robot/fb_cron',
 'RCT': '2017-09-23T00:33:18.797126+00:00',
 'RMT': '2023-09-15T03:41:39.424591+00:00',
 'released': True,
 'Requires_DOI?': True,
 'DOI': '10.25550/TTR',
 'protected_human_subjects': True,
 'cellbrowser_uri': None}

Get a specific attribute value from an entity

To get one attribute value from an entity, index the entity with the column’s name. In the cell below, str(...) applied to the Column object supplies that name.

[11]:
dataset = pb.schemas['isa'].tables['dataset']
print(results[9][str(dataset.accession)])
FB00000807.2
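Because the entity printed earlier looks like a plain dictionary, ordinary dict operations apply to it. The sketch below assumes exactly that; it works on a trimmed local copy of that entity rather than a live result, and 'no_such_column' is a hypothetical name used only to show a missing key.

```python
# Trimmed copy of the entity printed earlier (values taken from that output).
entity = {
    'accession': 'FB00000807.2',
    'release_date': '2017-04-12',
    'mouse_genetic': None,
}

accession = entity['accession']          # key present, value populated
mouse_genetic = entity['mouse_genetic']  # key present, value is None

# .get() avoids a KeyError when a key is absent from the entity.
missing = entity.get('no_such_column', '<absent>')

print(accession, mouse_genetic, missing)
```

Note the difference between a key whose value is None and a key that is not in the entity at all: only the latter raises KeyError under plain indexing.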

Fetch a Limited Number of Results

To limit the number of results fetched from the catalog, call fetch(limit=...) explicitly with the desired upper bound. An explicit fetch replaces any previously fetched entities, so len(results) below reports 3 rather than the earlier 1131.

[12]:
results.fetch(limit=3)
len(results)
[12]:
3

Iterate over the ResultSet

ResultSets are iterable like a typical container.

[13]:
for entity in results:
    print(entity[str(dataset.accession)])
FB00000933
FB00000382.01
FB00001315

Convert to Pandas DataFrame

ResultSets can be transformed into the popular Pandas DataFrame.

[14]:
from pandas import DataFrame
DataFrame(results)
[14]:
|   | id | accession | title | project | funding | summary | description | mouse_genetic | human_anatomic | study_design | ... | RID | RCB | RMB | RCT | RMT | released | Requires_DOI? | DOI | protected_human_subjects | cellbrowser_uri |
| 0 | 14177 | FB00000933 | ChIP-seq of multiple histone marks and RNA-seq... | 153 | None | ChIP-seq of multiple histone marks and RNA-seq... | ChIP-seq, RNA-seq and ATAC-seq to identify non... | None | None | None | ... | 2A7J | https://auth.globus.org/f226978f-e0be-4f47-a57... | https://www.facebase.org/webauthn_robot/fb_cron | 2018-02-15T19:07:28.598486+00:00 | 2023-09-15T03:41:39.424591+00:00 | False | True | 10.25550/2A7J | True | None |
| 1 | 6430 | FB00000382.01 | microMRI images of skulls of Wnt1Cre * Tgfbr2F... | 151 | PIs: Scott Fraser and Seth Ruffins. This work ... | microMRI images of skulls of Wnt1Cre * Tgfbr2... | microMRI images of skulls of Wnt1Cre * Tgfbr2... | None | None | None | ... | VK2 | None | https://www.facebase.org/webauthn_robot/fb_cron | 2017-09-23T00:33:18.797126+00:00 | 2023-09-15T03:41:39.424591+00:00 | True | True | 10.25550/VK2 | False | None |
| 2 | 14570 | FB00001315 | microCT scan of 12.0 mm hyperthyroid Danio rer... | 354 | None | None | 3D microCT scan of a hyperthyroid D. rerio hea... | None | None | 8.0 um scanning resolution, 0.1 degree rotatio... | ... | 2E-YW0R | https://auth.globus.org/f6d7e728-21bf-4034-80d... | https://www.facebase.org/webauthn_robot/fb_cron | 2023-01-06T18:02:29.345965+00:00 | 2023-09-15T03:41:39.424591+00:00 | False | True | 10.25550/2E-YW0R | False | None |

3 rows × 23 columns

Selecting Attributes

It is also possible to fetch only a subset of attributes from the catalog. The attributes(...) method accepts a variable argument list followed by keyword arguments. Each argument must be a Column object from the table’s columns container.

Renaming selected attributes

To rename the selected attributes, use the alias(...) method on the column object. For example, attributes(table.column.alias('new_name')) will rename table.column to new_name in the entities returned from the server. (It will not change anything in the stored catalog data.)

[15]:
results = path.attributes(dataset.accession, dataset.title, dataset.released.alias('is_released')).fetch(limit=5)

Convert to list

Now we can look at the results from the above fetch. To demonstrate a different access mode, we can convert the entities to a standard Python list and dump it to the console.

[16]:
list(results)
[16]:
[{'accession': 'FB00000933',
  'title': 'ChIP-seq of multiple histone marks and RNA-seq from CS18 human embryonic face tissue',
  'is_released': False},
 {'accession': 'FB00000382.01',
  'title': 'microMRI images of skulls of Wnt1Cre * Tgfbr2F mice at E16.5',
  'is_released': True},
 {'accession': 'FB00001315',
  'title': 'microCT scan of 12.0 mm hyperthyroid Danio rerio head',
  'is_released': False},
 {'accession': 'FB00000784',
  'title': 'Sample to subject mapping file  for the 3D Facial Images-Tanzania dataset',
  'is_released': True},
 {'accession': 'FB00000009',
  'title': 'Gene expression microarray - mouse E10.5 mandibular arch ',
  'is_released': True}]
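The effect of selecting and renaming attributes can be mimicked locally on a list of dictionaries. This is only a client-side analogy under stated assumptions: project is a hypothetical helper, not part of deriva, and with attributes(...) the selection is requested from the server instead. The row values are taken from the outputs shown in this notebook.

```python
def project(rows, *columns, **aliases):
    """Keep only `columns`, plus `aliases` given as new_name='source_column'.

    Hypothetical local helper; it returns new dicts and leaves `rows` alone.
    """
    projected = []
    for row in rows:
        out = {name: row[name] for name in columns}
        out.update({new: row[src] for new, src in aliases.items()})
        projected.append(out)
    return projected


rows = [
    {'id': 14177, 'accession': 'FB00000933', 'released': False},
    {'id': 6430, 'accession': 'FB00000382.01', 'released': True},
]

selected = project(rows, 'accession', is_released='released')
print(selected)
```

As with alias(...), the rename affects only the returned dictionaries; the source rows keep their original 'released' key.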