REST API reference: deriva/export

The deriva/export service endpoint provides export functionality from an ERMrest catalog via a REST API. The service is generally configured to be co-located on the same server with the ERMrest catalog that it is providing data services for.

Exporting Files

This API endpoint is used to perform one or more queries to an ERMrest catalog and create corresponding individual result files which can then be retrieved via GET. URLs for result files are returned in the response body as Content-Type:text/uri-list.


Export file(s)

Executes one more ERMrest queries and writes the results to individual files.

URL

/deriva/export/file

Method:

POST

URL Params

None

Data Params

The input data is composed of a JSON object with the following form:

root (object)

Variable Type Inclusion Description
catalog catalog required A catalog object. See below.

catalog (object)

Variable Type Inclusion Description
host string required The hostname (and port) of the ERMrest service to query.
catalog_id string optional The catalog identifier, e.g. 1.
username string optional If username is not specified, authentication is assumed to come from the webauthn authentication context stored in the caller's cookie.
password string optional The password field is only used when username is specifed.
queries array[query] required An array of query objects. See below.

query (object)

Parent Object Variable Type Inclusion Description Interpolatable
query processor string required This is a string value used to select from one of the built-in query output processor formats. Valid values are env, csv, json, json-stream, download, or fetch. No
query processor_type string optional A fully qualified Python class name declaring an external processor class instance to use. If this parameter is present, it OVERRIDES the default value mapped to the specified processor. This class MUST be derived from the base class deriva.transfer.download.processors.BaseDownloadProcessor. For example, "processor_type": "deriva.transfer.download.processors.CSVDownloadProcessor". No
query processor_params object required This is an extensible JSON Object that contains processor implementation-specific parameters. No
processor_params query_path string required This is string representing the actual ERMRest query path to be used in the HTTP(S) GET request. It SHOULD already be percent-encoded per RFC 3986 if it contains any characters outside of the unreserved set. Yes
processor_params output_path string required This is a POSIX-compliant path fragment indicating the target location of the retrieved data relative to the specified base download directory. Yes
processor_params output_filename string optional This is a POSIX-compliant path fragment indicating the OVERRIDE filename of the retrieved data relative to the specified base download directory and the value of output_path, if any. Yes
Success Response:

Code: 200

Content:

http://localhost:8080/deriva/export/file/9ad15e5b-9c2c-4faf-8829-05fa8252c8bc/genotypes.csv
http://localhost:8080/deriva/export/file/9ad15e5b-9c2c-4faf-8829-05fa8252c8bc/phenotypes.csv
Error Responses:
  • 404: NOT FOUND
  • 401: UNAUTHORIZED
  • 400: BAD REQUEST
  • 500: INTERNAL SERVER ERROR
Sample Call:
var exportParameters =
{
    "catalog":
    {
        "host": "http://localhost:8080",
        "catalog_id": "1",
        "username": "devuser",
        "password": "devpass",
        "query_processors": [
            {
                "processor": "csv",
                "processor_params": {
                    "query_path": "/entity/A:=dev:subject/A1:=snp_v/snp_id=rs6265",
                    "output_path": "genotypes",
                }
            },
            {
                "processor": "csv",
                "processor_params": {
                    "query_path": "/entity/A:=dev:subject/A1:=snp_v/snp_id=rs6265/$A/B:=dev:subject_phenotypes_v",
                    "output_path": "phenotypes",
                }
            }
        ]
    }
};

$.ajax({
    url: "/deriva/export/file",
    dataType: "json",
    type : "POST",
    data: exportParameters,
    success : function(r) {
      console.log(r);
    }
});

Retrieve exported file(s)

Retrieves a file previously created by a POST.

URL

/deriva/export/file/<id>/<filename>

Method:

GET

URL Params

Required:

id=[string]

Optional:

filename=[string] - This argument is required when the uri-list returned from POST contains more than one entry. If it is not specified and there is more than one file result, a 400 Bad Request is returned.

Data Params

None

Success Response:

Code: 200

Content: The file content.

Error Responses:

  • 404: NOT FOUND
  • 403: FORBIDDEN
  • 401: UNAUTHORIZED
  • 400: BAD REQUEST
  • 500: INTERNAL SERVER ERROR

Sample Call:

$.ajax({
    url: "/deriva/export/file/9ad15e5b-9c2c-4faf-8829-05fa8252c8bc/genotypes.csv",
    type : "GET",
    success : function(r) {
      console.log(r);
    }
});

Retrieve log file for exported file(s)

Retrieves the log file generated by an invocation of POST.

URL

/deriva/export/file/<id>/log

Method:

GET

URL Params

Required:

id=[string]

Data Params

None

Success Response:

Code: 200

Content: The file content.

Error Responses:

  • 404: NOT FOUND
  • 403: FORBIDDEN
  • 401: UNAUTHORIZED
  • 400: BAD REQUEST
  • 500: INTERNAL SERVER ERROR

Sample Call:

$.ajax({
    url: "/deriva/export/file/9ad15e5b-9c2c-4faf-8829-05fa8252c8bc/log",
    type : "GET",
    success : function(r) {
      console.log(r);
    }
});

Exporting Bags

This API endpoint is used to perform one or more queries to an ERMrest catalog and create a BDBag archive file which can then be retrieved via GET. The URL for the result bag is returned in the response body as Content-Type:text/uri-list.

Export bag

Executes one more ERMrest queries and writes the results to a BDBag.

URL

/deriva/export/bdbag

Method:

POST

URL Params

None

Data Params

The input data is composed of a JSON object with the following form:

root (object)

Variable Type Inclusion Description
bag bag required A bag object. See below.
catalog catalog required A catalog object. See below.

bag (object)

Variable Type Inclusion Description
bag_name string, enum ["zip","tgz","bz2","tar"] required The base file name of the bag. An appropriate extension will be added to the base depending on the archive type selected.
bag_archiver string required The archive format used to serialize the result bag.
bag_metadata object optional A simple 'dictionary' object consisting of key-value pairs. The only supported primitive type for value pairs is string. The metdata object will be written directly to the bag's bag-info.txt file.

catalog (object)

Variable Type Inclusion Description
host string required The hostname (and port) of the ERMrest service to query.
catalog_id string optional The catalog identifier, e.g. 1.
username string optional If username is not specified, authentication is assumed to come from the webauthn authentication context stored in the caller's cookie.
password string optional The password field is only used when username is specifed.
queries array[query] required An array of query objects. See below.

query (object)

Parent Object Variable Type Inclusion Description Interpolatable
query processor string required This is a string value used to select from one of the built-in query output processor formats. Valid values are env, csv, json, json-stream, download, or fetch. No
query processor_type string optional A fully qualified Python class name declaring an external processor class instance to use. If this parameter is present, it OVERRIDES the default value mapped to the specified processor. This class MUST be derived from the base class deriva.transfer.download.processors.BaseDownloadProcessor. For example, "processor_type": "deriva.transfer.download.processors.CSVDownloadProcessor". No
query processor_params object required This is an extensible JSON Object that contains processor implementation-specific parameters. No
processor_params query_path string required This is string representing the actual ERMRest query path to be used in the HTTP(S) GET request. It SHOULD already be percent-encoded per RFC 3986 if it contains any characters outside of the unreserved set. Yes
processor_params output_path string required This is a POSIX-compliant path fragment indicating the target location of the retrieved data relative to the specified base download directory. Yes
processor_params output_filename string optional This is a POSIX-compliant path fragment indicating the OVERRIDE filename of the retrieved data relative to the specified base download directory and the value of output_path, if any. Yes
Success Response:

Code: 200

Content:

http://localhost:8080/deriva/export/file/9ad15e5b-9c2c-4faf-8829-05fa8252c8bc/sample-bag.zip
Error Responses:
  • 404: NOT FOUND
  • 401: UNAUTHORIZED
  • 400: BAD REQUEST
  • 500: INTERNAL SERVER ERROR
Sample Call:
var exportParameters =
{
    "bag":
    {
      "bag_name": "sample-bag",
      "bag_archiver":"zip",
      "bag_metadata":
      {
        "Source-Organization": "USC Information Sciences Institute, Informatics Systems Research Division",
        "Contact-Name": "Mike D'Arcy",
        "External-Description": "A bag containing a sample PheWas cohort for downstream analysis.",
        "Internal-Sender-Identifier": "USC-ISI-IRSD"
      }
    },
    "catalog":
    {
      "host": "https://localhost:8080",
      "path": "/ermrest/catalog/1",
      "username": "",
      "password": "",
      "queries":
      [
        {
          "processor": "csv",
          "query_path": "/entity/A:=pnc:subject/A1:=snp_v/snp_id=rs6265/$A/B:=pnc:metrics_v",
          "output_path": "metrics"
        },
        {
          "processor": "csv",
          "processor_params": {
            "query_path": "/entity/A:=pnc:subject/A1:=snp_v/snp_id=rs6265",
            "output_path": "genotypes"
          }
        },
        {
          "processor": "csv",
          "processor_params": {
            "query_path": "/entity/A:=pnc:subject/A1:=snp_v/snp_id=rs6265/$A/B:=pnc:subject_phenotypes_v",
            "output_path": "phenotypes"
          }
        },
        {
          "processor": "fetch",
          "processor_params": {
            "query_path": "/attribute/A:=pnc:subject/A1:=snp_v/snp_id=rs6265/$A/B:=pnc:image_files/filename::ciregexp::0mm.mgh/url:=B:uri,length:=B:bytes,filename:=B:filepath,sha256:=B:sha256sum",
            "output_path": "images"
          }
        }
      ]
    }
};

$.ajax({
    url: "/deriva/export/file",
    dataType: "json",
    type : "POST",
    data: exportParameters,
    success : function(r) {
      console.log(r);
    }
});

Retrieve an exported bag

Retrieves a bag previously created by a POST.

URL

/deriva/export/bdbag/<id>

Method:

GET

URL Params

Required:

id=[string]

Data Params

None

Success Response:

Code: 200

Content: The file content.

Error Responses:

  • 404: NOT FOUND
  • 403: FORBIDDEN
  • 401: UNAUTHORIZED
  • 400: BAD REQUEST
  • 500: INTERNAL SERVER ERROR

Sample Call:

$.ajax({
    url: "/deriva/export/file/9ad15e5b-9c2c-4faf-8829-05fa8252c8bc",
    type : "GET",
    success : function(r) {
      console.log(r);
    }
});

Retrieve log file for exported bag

Retrieves the log file generated by an invocation of POST.

URL

/deriva/export/bdbag/<id>/log

Method:

GET

URL Params

Required:

id=[string]

Data Params

None

Success Response:

Code: 200

Content: The file content.

Error Responses:

  • 404: NOT FOUND
  • 403: FORBIDDEN
  • 401: UNAUTHORIZED
  • 400: BAD REQUEST
  • 500: INTERNAL SERVER ERROR

Sample Call:

$.ajax({
    url: "/deriva/export/file/9ad15e5b-9c2c-4faf-8829-05fa8252c8bc/log",
    type : "GET",
    success : function(r) {
      console.log(r);
    }
});