{ "cells": [ { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "# DataPath Example 1\n", "This notebook gives an example of how to access the model elements (e.g.,\n", "schemas, tables, columns) that are used when building data paths." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "# Import deriva modules\n", "from deriva.core import ErmrestCatalog, get_credential" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "# Connect to the deriva catalog\n", "protocol = 'https'\n", "hostname = 'www.facebase.org'\n", "catalog_number = 1\n", "credential = get_credential(hostname)\n", "catalog = ErmrestCatalog(protocol, hostname, catalog_number, credential)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## The PathBuilder Interface\n", "The path builder interface gives you access to a representation of the catalog's data model beginning with the catalog's schemas. The path builder does *not* record any state about your paths; it is just an entry point to begin building paths." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "# Get the path builder interface for this catalog\n", "pb = catalog.getPathBuilder()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Access to Schemas\n", "The `.schemas` property acts like a Python dictionary. Use its `keys()` method to\n", "get a listing of the schema names." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dict_keys(['public', '_acl_admin', 'isa', 'viz', 'vocab', 'Imaging'])" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "pb.schemas.keys()" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "Here we will get a handle to the `isa` schema." 
] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "isa = pb.schemas['isa']" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "**PROTIP**: Jupyter Notebook supports `<TAB>` completion. Press the `<TAB>` key\n", "after typing the brackets of a dictionary to see the keys. Typing\n", "`pb.schemas[<TAB>]` will give you a dropdown of schema names. Note that the notebook\n", "interpreter will not know anything about your objects until you have executed a\n", "step with them. So instantiate an object in one step, and then you can use tab-completion\n", "in the following steps." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Alternative access method for schemas and other model objects\n", "\n", "An alternative way to get a handle to the same schema object is to access it\n", "directly as a property of the path builder object itself. However,\n", "this only works for schema names that are _valid Python identifiers_.\n", "\n", "A valid Python identifier must start with '`_`' or a letter and may contain\n", "only '`_`', letters, or digits for the rest of its characters.\n", "- **Valid Python identifiers**: `'dataset'`, `'assay'`, `'Molecule_Type'`, etc.\n", "- **Invalid Python identifiers**: `'Sample 1 Type'`, `'Control?'`, `'# of reads'`, etc.\n", "\n", "**IMPORTANT**\n", "Similar access methods will be demonstrated for tables and columns\n", "below. Since not all catalog model names are valid Python identifiers,\n", "you may not see your catalog's complete data model when you use\n", "this method. However, the notation is more compact and ideal for\n", "cases where your model uses (all or mostly) valid Python identifiers\n", "in its model element names." 
] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [], "source": [ "isa = pb.isa # same schema object we got from the previous step" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Access to Tables\n", "Similarly, a schema object has a `tables` property that gives you access\n", "to a representation of the catalog schema's tables. Again, use its `keys()`\n", "method to list the table names in the schema." ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dict_keys(['enhancer', 'project_member', 'dataset_sex', 'dataset_data_type', 'icon', 'biosample_cell_characterization', 'file', 'project_publication', 'track_data', 'dataset_syndrome', 'imaging_data', 'clinical_assay', 'dataset_phenotype', 'dataset_enhancer', 'array_data', 'dataset_contributor', 'sample_replicate_group', 'dataset_human_age', 'sample', 'dataset_somite_count', 'dataset_mouse_genetic_background', 'dataset_chromosome', 'data_access_request', 'replicate', 'library', 'external_reference', 'dataset_geo', 'dataset_anatomy', 'protocol', 'clinical_assay_syndrome', 'processed_data', 'mesh_data', 'dataset_cell_source', 'dataset_gene', 'alignment', 'human_subjects_classification', 'biosample', 'related_dataset', 'experiment', 'dataset_cell_characterization', 'public_key', 'experiment_protocol', 'dataset_dar', 'person', 'publication', 'dar_status', 'project_investigator', 'biosample_cell_source', 'pipeline', 'dataset_experiment_type', 'sequencing_data', 'tracks', 'project', 'dataset_strain', 'dataset_mutation', 'dataset_data_use_limitation', 'dataset_stage', 'thumbnail', 'track_data_visibility', 'dataset_qc_issue', 'dataset_genotype', 'previews', 'dataset', 'dataset_species', 'dataset_instrument'])" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "isa.tables.keys()" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "Similarly we can get a 
table from the schema using\n", "either of the methods demonstrated above." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "dataset = isa.tables['dataset']\n", "# or\n", "dataset = isa.dataset" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Access to Columns\n", "A table has a `column_definitions` dictionary. Use its `keys()` method to list the column names." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "dict_keys(['id', 'accession', 'title', 'project', 'funding', 'summary', 'description', 'mouse_genetic', 'human_anatomic', 'study_design', 'release_date', 'show_in_jbrowse', '_keywords', 'RID', 'RCB', 'RMB', 'RCT', 'RMT', 'released', 'Requires_DOI?', 'DOI', 'protected_human_subjects', 'cellbrowser_uri'])" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "dataset.column_definitions.keys()" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "Again, either of the following methods gives us a handle to one of the table's\n", "column objects." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [], "source": [ "accession = dataset.column_definitions['accession']\n", "# or\n", "accession = dataset.accession" ] }, { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "## Final Thought\n", "\n", "The model introspection provided in the datapath\n", "module (i.e., the PathBuilder) is intended for the narrowly scoped\n", "usage required for building paths and accessing data from ERMrest\n", "catalogs. It is _not_ intended for general introspection of catalogs\n", "and therefore _does not_ include details such as constraints,\n", "annotations, ACLs, column data types, etc." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.6" } }, "nbformat": 4, "nbformat_minor": 2 }