{ "cells": [ { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "# DataPath Example 2\n", "This notebook gives a very basic example of how to access data. \n", "It assumes that you understand the concepts presented in the \n", "example 1 notebook." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# Import deriva modules\n", "from deriva.core import ErmrestCatalog, get_credential" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "# Connect with the deriva catalog\n", "protocol = 'https'\n", "hostname = 'www.facebase.org'\n", "catalog_number = 1\n", "credential = get_credential(hostname)\n", "catalog = ErmrestCatalog(protocol, hostname, catalog_number, credential)" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "# Get the path builder interface for this catalog\n", "pb = catalog.getPathBuilder()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## DataPaths\n", "The `PathBuilder` object allows you to begin `DataPath`s from the base `Table`s. A `DataPath` begins with a `Table` (or an `TableAlias` to be discussed later) as its \"root\" from which one can \"`link`\", \"`filter`\", and fetch its \"`entities`\"." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Start a path rooted at a table from the catalog\n", "We will reference a table from the PathBuilder `pb` variable from above. Using the PathBuilder, we will reference the \"isa\" schema, then the \"dataset\" table, and from that table start a path." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "path = pb.schemas['isa'].tables['dataset'].path" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We could have used the more compact dot-notation to start the same path." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "path = pb.isa.dataset.path" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Getting the URI of the current path\n", "All DataPaths have URIs for the referenced resources in ERMrest. The URI identifies the resources which are available through \"RESTful\" Web protocols supported by ERMrest." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "https://www.facebase.org/ermrest/catalog/1/entity/dataset:=isa:dataset\n" ] } ], "source": [ "print(path.uri)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## ResultSets\n", "The data from a DataPath are accessed through a pythonic container object, the `ResultSet`. The `ResultSet` is returned by the DataPath's `entities()` and other methods." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "results = path.entities()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Fetch entities from the catalog\n", "Now we can get entities from the server using the ResultSet's `fetch()` method." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "results.fetch()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`ResultSet`s behave like python containers. For example, we can check the count of rows in this ResultSet." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1131" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(results)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**Note**: If we had not explicitly called the `fetch()` method, then it would have been called implicitly on the first container operation such as `len(...)`, `list(...)`, `iter(...)` or get item `[...]`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Get an entity\n", "To get one entity from the set, use the usual container operator to get an item." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'id': 14130,\n", " 'accession': 'FB00000807.2',\n", " 'title': 'ChIP-seq of multiple histone marks and RNA-seq from CS22 human embryonic face tissue',\n", " 'project': 305,\n", " 'funding': 'PI: Axel Visel. This work was supported by NIDCR grant U01-DE024427',\n", " 'summary': 'ChIP-seq, RNA-seq and transgenic assays to identify non-coding regulatory elements (enhancers) active during craniofacial development',\n", " 'description': 'ChIP-seq, RNA-seq and transgenic assays to identify non-coding regulatory elements (enhancers) active during craniofacial development\\n\\nThis is restricted-access human data. To gain access to this data, you must first go through the [process outlined here](/odocs/data-guidelines/).\\n\\nThe listing and corresponding checksums of all data files included in this dataset can be found in the [manifest file](https://www.facebase.org/id/3C-QJHW) in the Supplementary Files Section below.',\n", " 'mouse_genetic': None,\n", " 'human_anatomic': None,\n", " 'study_design': None,\n", " 'release_date': '2017-04-12',\n", " 'show_in_jbrowse': True,\n", " '_keywords': 'Homo RNA-seq sapiens face female 22 male Visel organism ChIP-seq Axel TTR Carnegie stage assay',\n", " 'RID': 'TTR',\n", " 'RCB': None,\n", " 'RMB': 'https://www.facebase.org/webauthn_robot/fb_cron',\n", " 'RCT': '2017-09-23T00:33:18.797126+00:00',\n", " 'RMT': '2023-09-15T03:41:39.424591+00:00',\n", " 'released': True,\n", " 'Requires_DOI?': True,\n", " 'DOI': '10.25550/TTR',\n", " 'protected_human_subjects': True,\n", " 'cellbrowser_uri': None}" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "results[9]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Get a specific attribute value from an entity\n", "To get one attribute value from an entity get the item using its `Column`'s `name` property." ] }, { "cell_type": "code", "execution_count": 11, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "FB00000807.2\n" ] } ], "source": [ "dataset = pb.schemas['isa'].tables['dataset']\n", "print(results[9][str(dataset.accession)])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Fetch a Limited Number of Results\n", "To set a limit on the number of results to be fetched from the catalog, use the explicit `fetch(limit=...)` method with the desired upper limit to fetch from the catalog." ] }, { "cell_type": "code", "execution_count": 12, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "3" ] }, "execution_count": 12, "metadata": {}, "output_type": "execute_result" } ], "source": [ "results.fetch(limit=3)\n", "len(results)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Iterate over the ResultSet\n", "`ResultSet`s are iterable like a typical container." ] }, { "cell_type": "code", "execution_count": 13, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "FB00000933\n", "FB00000382.01\n", "FB00001315\n" ] } ], "source": [ "for entity in results:\n", " print(entity[str(dataset.accession)])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Convert to Pandas DataFrame\n", "ResultSets can be transformed into the popular Pandas DataFrame." ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
idaccessiontitleprojectfundingsummarydescriptionmouse_genetichuman_anatomicstudy_design...RIDRCBRMBRCTRMTreleasedRequires_DOI?DOIprotected_human_subjectscellbrowser_uri
014177FB00000933ChIP-seq of multiple histone marks and RNA-seq...153NoneChIP-seq of multiple histone marks and RNA-seq...ChIP-seq, RNA-seq and ATAC-seq to identify non...NoneNoneNone...2A7Jhttps://auth.globus.org/f226978f-e0be-4f47-a57...https://www.facebase.org/webauthn_robot/fb_cron2018-02-15T19:07:28.598486+00:002023-09-15T03:41:39.424591+00:00FalseTrue10.25550/2A7JTrueNone
16430FB00000382.01microMRI images of skulls of Wnt1Cre * Tgfbr2F...151PIs: Scott Fraser and Seth Ruffins. This work ...microMRI images of skulls of Wnt1Cre * Tgfbr2...microMRI images of skulls of Wnt1Cre * Tgfbr2...NoneNoneNone...VK2Nonehttps://www.facebase.org/webauthn_robot/fb_cron2017-09-23T00:33:18.797126+00:002023-09-15T03:41:39.424591+00:00TrueTrue10.25550/VK2FalseNone
214570FB00001315microCT scan of 12.0 mm hyperthyroid Danio rer...354NoneNone3D microCT scan of a hyperthyroid D. rerio hea...NoneNone8.0 um scanning resolution, 0.1 degree rotatio......2E-YW0Rhttps://auth.globus.org/f6d7e728-21bf-4034-80d...https://www.facebase.org/webauthn_robot/fb_cron2023-01-06T18:02:29.345965+00:002023-09-15T03:41:39.424591+00:00FalseTrue10.25550/2E-YW0RFalseNone
\n", "

3 rows × 23 columns

\n", "
" ], "text/plain": [ " id accession title \n", "0 14177 FB00000933 ChIP-seq of multiple histone marks and RNA-seq... \\\n", "1 6430 FB00000382.01 microMRI images of skulls of Wnt1Cre * Tgfbr2F... \n", "2 14570 FB00001315 microCT scan of 12.0 mm hyperthyroid Danio rer... \n", "\n", " project funding \n", "0 153 None \\\n", "1 151 PIs: Scott Fraser and Seth Ruffins. This work ... \n", "2 354 None \n", "\n", " summary \n", "0 ChIP-seq of multiple histone marks and RNA-seq... \\\n", "1 microMRI images of skulls of Wnt1Cre * Tgfbr2... \n", "2 None \n", "\n", " description mouse_genetic \n", "0 ChIP-seq, RNA-seq and ATAC-seq to identify non... None \\\n", "1 microMRI images of skulls of Wnt1Cre * Tgfbr2... None \n", "2 3D microCT scan of a hyperthyroid D. rerio hea... None \n", "\n", " human_anatomic study_design ... \n", "0 None None ... \\\n", "1 None None ... \n", "2 None 8.0 um scanning resolution, 0.1 degree rotatio... ... \n", "\n", " RID RCB \n", "0 2A7J https://auth.globus.org/f226978f-e0be-4f47-a57... \\\n", "1 VK2 None \n", "2 2E-YW0R https://auth.globus.org/f6d7e728-21bf-4034-80d... \n", "\n", " RMB \n", "0 https://www.facebase.org/webauthn_robot/fb_cron \\\n", "1 https://www.facebase.org/webauthn_robot/fb_cron \n", "2 https://www.facebase.org/webauthn_robot/fb_cron \n", "\n", " RCT RMT \n", "0 2018-02-15T19:07:28.598486+00:00 2023-09-15T03:41:39.424591+00:00 \\\n", "1 2017-09-23T00:33:18.797126+00:00 2023-09-15T03:41:39.424591+00:00 \n", "2 2023-01-06T18:02:29.345965+00:00 2023-09-15T03:41:39.424591+00:00 \n", "\n", " released Requires_DOI? DOI protected_human_subjects \n", "0 False True 10.25550/2A7J True \\\n", "1 True True 10.25550/VK2 False \n", "2 False True 10.25550/2E-YW0R False \n", "\n", " cellbrowser_uri \n", "0 None \n", "1 None \n", "2 None \n", "\n", "[3 rows x 23 columns]" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from pandas import DataFrame\n", "DataFrame(results)" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "## Selecting Attributes\n", "It is also possible to fetch only a subset of attributes from the catalog. The `attributes(...)` method accepts a variable argument list followed by keyword arguments. Each argument must be a `Column` object from the table's `columns` container.\n", "\n", "### Renaming selected attributes\n", "To rename the selected attributes, use the `alias(...)` method on the column object. For example, `attributes(table.column.alias('new_name'))` will rename `table.column` with `new_name` in the entities returned from the server. (It will not change anything in the stored catalog data.)" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [], "source": [ "results = path.attributes(dataset.accession, dataset.title, dataset.released.alias('is_released')).fetch(limit=5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Convert to list\n", "Now we can look at the results from the above fetch. To demonstrate a different access mode, we can convert the entities to a standard python list and dump to the console." ] }, { "cell_type": "code", "execution_count": 16, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[{'accession': 'FB00000933',\n", " 'title': 'ChIP-seq of multiple histone marks and RNA-seq from CS18 human embryonic face tissue',\n", " 'is_released': False},\n", " {'accession': 'FB00000382.01',\n", " 'title': 'microMRI images of skulls of Wnt1Cre * Tgfbr2F mice at E16.5',\n", " 'is_released': True},\n", " {'accession': 'FB00001315',\n", " 'title': 'microCT scan of 12.0 mm hyperthyroid Danio rerio head',\n", " 'is_released': False},\n", " {'accession': 'FB00000784',\n", " 'title': 'Sample to subject mapping file for the 3D Facial Images-Tanzania dataset',\n", " 'is_released': True},\n", " {'accession': 'FB00000009',\n", " 'title': 'Gene expression microarray - mouse E10.5 mandibular arch ',\n", " 'is_released': True}]" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "list(results)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.6" } }, "nbformat": 4, "nbformat_minor": 2 }