Stemformatics

API access

All Stemformatics data can be accessed via the API. Use your favourite tool, such as Python or R, to search for relevant datasets and download them for your own analyses. Or simply go to the API server and view the data in raw form.


# cURL example
curl https://api.stemformatics.org/datasets/2000/metadata

# Python examples
import pandas, requests
r = requests.get('https://api.stemformatics.org/datasets/2000/samples')
df = pandas.DataFrame(r.json())
print(df.head())
        sample_id    cell_type parental_cell_type  ... developmental_stage treatment external_source_id
0  2000_1787466030_H  neurosphere         epithelium  ...
1  2000_1787466065_A  neurosphere         epithelium  ...
2  2000_1787466030_E  neurosphere         epithelium  ...
3  2000_1787466065_D  neurosphere         epithelium  ...
4  2000_1699538158_H  neurosphere         epithelium  ...

# Note that spaces are safe inside the query_string value: requests encodes them for you
r = requests.get('https://api.stemformatics.org/search/samples?query_string=%s&field=tissue_of_origin,dataset_id' % 'dendritic cell')
print(r.json()[:2])
[{'sample_id': '7277_GSM2067549', 'dataset_id': 7277, 'tissue_of_origin': 'umbilical cord blood'}, 
    {'sample_id': '7277_GSM2067548', 'dataset_id': 7277, 'tissue_of_origin': 'umbilical cord blood'}]

# To fetch the expression matrix as a file and read it straight into pandas
import io
r = requests.get('https://api.stemformatics.org/datasets/6756/expression?as_file=true')
df = pandas.read_csv(io.StringIO(r.text), sep='\t', index_col=0)
print(df.head())
            GSM741192.CEL  GSM741193.CEL  GSM741194.CEL  GSM741195.CEL  \
1415670_at         8.209027       8.262415       8.557468       9.205204   
1415671_at        10.852328      11.100999      10.912304      10.836298   
1415672_at        10.431524      10.364212      10.517259      11.122440   

# R example
library(httr)
library(jsonlite)
response = GET("https://api.stemformatics.org/datasets/2000/metadata")
print(content(response))

Full list of APIs

Where parameters are shown, default values are given and these can be left out. For example, /datasets/2000/samples works the same as /datasets/2000/samples?orient=records&as_file=false. If no default value is given, the parameter is required, and this is explained in the endpoint's description.
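
As a minimal sketch of the default-value rule, the helper below drops any parameter whose value equals its documented default, so the short and explicit forms resolve to the same URL. The names build_url and SAMPLES_DEFAULTS are hypothetical and not part of the API; only the URL is constructed and no request is sent.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.stemformatics.org"

# Defaults copied from the /datasets/{dataset_id}/samples entry in the API list
SAMPLES_DEFAULTS = {"orient": "records", "as_file": "false"}

def build_url(path, params=None, defaults=None):
    """Return the full URL for path, dropping parameters that equal their default.

    Hypothetical helper for illustration only.
    """
    params = params or {}
    defaults = defaults or {}
    kept = {k: v for k, v in params.items() if defaults.get(k) != v}
    query = urlencode(kept)
    return f"{BASE_URL}{path}?{query}" if query else f"{BASE_URL}{path}"

short = build_url("/datasets/2000/samples")
explicit = build_url("/datasets/2000/samples",
                     {"orient": "records", "as_file": "false"}, SAMPLES_DEFAULTS)
as_file = build_url("/datasets/2000/samples",
                    {"orient": "records", "as_file": "true"}, SAMPLES_DEFAULTS)

print(short)     # same URL as explicit, because every parameter was at its default
print(as_file)   # only the non-default as_file=true is kept
```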

The 'orient' parameter accepts the same values as the orient argument of the to_dict() function in the Python pandas package.
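
To make the effect of 'orient' concrete, here is a plain-Python sketch that reshapes two rows from the default 'records' form into the 'list' and 'index' forms described in the pandas to_dict() documentation. The rows are the ones shown in the search example earlier on this page; using sample_id as the row label for the 'index' form is an assumption, and the API may label rows differently.

```python
# Two rows in the 'records' shape (the API default): a list of {column -> value}
records = [
    {"sample_id": "7277_GSM2067549", "dataset_id": 7277,
     "tissue_of_origin": "umbilical cord blood"},
    {"sample_id": "7277_GSM2067548", "dataset_id": 7277,
     "tissue_of_origin": "umbilical cord blood"},
]

# 'list' shape: {column -> [values]}
columns = list(records[0])
as_list = {col: [row[col] for row in records] for col in columns}

# 'index' shape: {row label -> {column -> value}}
# Assumption: sample_id is used as the row label.
as_index = {
    row["sample_id"]: {c: v for c, v in row.items() if c != "sample_id"}
    for row in records
}

print(as_list["dataset_id"])
print(as_index["7277_GSM2067548"])
```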

/datasets/{dataset_id}/metadata

/datasets/{dataset_id}/samples?orient=records&as_file=false

/datasets/{dataset_id}/expression?gene_id={Ensembl_gene_id}&key=cpm&log2=false&orient=records&as_file=false

/datasets/{dataset_id}/pca?orient=records&dims=20

/datasets/{dataset_id}/correlated-genes?gene_id={Ensembl_gene_id}&cutoff=30

/datasets/{dataset_id}/ttest?gene_id={Ensembl_gene_id}&sample_group={sample_group}&sample_group_item1={item1}&sample_group_item2={item2}

/search/datasets

Fetches dataset metadata for datasets matching the search parameters. See below for parameters. Note that some sample metadata are included in the results here, unlike /datasets/{dataset_id}/metadata, which only contains dataset metadata. Use query_string=* to return metadata for all datasets. Multiple parameters are combined with "AND", so platform_type=Microarray&projects=myeloid_atlas,blood_atlas will fetch Microarray datasets under the myeloid and blood atlas projects.


Optional parameters:
        dataset_id: comma separated list of dataset ids to restrict the search to
        query_string: text to search for in dataset metadata (and in sample metadata if include_samples_query is true)
        include_samples_query: if true, query_string will search sample metadata as well as dataset metadata

        platform_type: comma separated list of platform types to restrict the search to
        projects: comma separated list of projects to restrict the search to
        organism: comma separated list of organisms to restrict the search to

        -- The parameters below are probably more useful for the Stemformatics website UI than for general use,
        and the output format is also slightly different if pagination_limit is specified.
        -- parameters for returning data formatted for a Plotly sunburst plot
        sunburst_inner: sample group (eg. 'cell_type') for inner wheel of sunburst
        sunburst_outer: sample group for outer wheel of sunburst
        sunburst_inner_cutoff: max number of items in the inner wheel
        sunburst_outer_cutoff: max number of items in the outer wheel

        -- parameters for filtering data after the search; if present, these work like a sub-query and do not affect the counts
        filter_Project: comma separated list of projects to apply filter on (note capital 'P')
        filter_platform_type: platform type to apply filter on (not comma separated list)
        filter_cell_type: comma separated list of cell types to apply filter on
        filter_tissue_of_origin: comma separated list of tissues to apply filter on

        -- parameters for sorting
        sort_field: sort the list of datasets based on this field; default is 'name'
        sort_ascending: default is true

        -- parameters for pagination
        pagination_limit: number of items per page
        pagination_start: start page

    Example data returned
    [
        {
            "dataset_id": 6741,
            "title": "Transcriptional specialization of human dendritic cell subsets in response to microbial vaccines",
            "authors": "Banchereau R, Baldwin N, Cepika AM, Athale S, Xue Y, Yu CI, Metang P, Cheruku A, Berthier I, Gayet I, Wang Y, Ohouo M, Snipes L, Xu H, Obermoser G, Blankenship D, Oh S, Ramilo O, Chaussabel D, Banchereau J, Palucka K, Pascual V",
            "description": "The mechanisms by which microbial vaccines interact with human APCs remain elusive. Herein, we describe the transcriptional programs induced in human DCs by pathogens, innate receptor ligands and vaccines. Exposure of DCs to influenza, Salmonella enterica and Staphylococcus aureus allows us to build a modular framework containing 204 transcript clusters. We use this framework to characterize the responses of human monocytes, monocyte-derived DCs and blood DC subsets to 13 vaccines. Different vaccines induce distinct transcriptional programs based on pathogen type, adjuvant formulation and APC targeted. Fluzone, Pneumovax and Gardasil, respectively, activate monocyte-derived DCs, monocytes and CD1c+ blood DCs, highlighting APC specialization in response to vaccines. Finally, the blood signatures from individuals vaccinated with Fluzone or infected with influenza reveal a signature of adaptive immunity activation following vaccination and symptomatic infections, but not asymptomatic infections. These data, offered with a web interface, may guide the development of improved vaccines. Abstract from Nat Commun. 2014 Oct 22;5:5283.",
            "platform": "Illumina HumanHT-12 v4.0 Expression BeadChip",
            "private": false,
            "pubmed_id": "25335753",
            "name": "Banchereau_2014_25335753_d",
            "accession": "GSE56744",
            "version": 1.0,
            "platform_type": "Microarray",
            "status": "passed",
            "projects": [
                "myeloid_atlas",
                "blood_atlas",
                "dc_atlas"
            ],
            "samples": 88,
            "cell_type": "CD1c+ dendritic cell,CD141+ dendritic cell",
            "display_name": "Banchereau (2014)"
        },
        ...
    ]
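
A request URL for /search/datasets can be assembled as in the sketch below, which joins list values with commas and combines separate parameters with "&" (the "AND" behaviour described above). The function name search_datasets_url is hypothetical; the parameter names come from the list above. Only the URL is built here, and the result could then be passed to requests.get() as in the earlier examples.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://api.stemformatics.org"

def search_datasets_url(**filters):
    """Build a /search/datasets URL; list values become comma separated strings.

    Hypothetical helper for illustration only.
    """
    params = {}
    for name, value in filters.items():
        if isinstance(value, (list, tuple)):
            value = ",".join(str(v) for v in value)
        params[name] = value
    # safe="," keeps commas readable; quote_via=quote encodes spaces as %20
    query = urlencode(params, safe=",", quote_via=quote)
    return f"{BASE_URL}/search/datasets?{query}"

atlas_search = search_datasets_url(platform_type="Microarray",
                                   projects=["myeloid_atlas", "blood_atlas"])
text_search = search_datasets_url(query_string="dendritic cell",
                                  include_samples_query="true")

print(atlas_search)
print(text_search)
```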

/search/samples?limit=50&orient=records

/values/datasets/{key}?include_count=false

/values/samples/{key}?include_count=false

/download?dataset_id={comma separated dataset ids}

/genes/sample-group-to-genes?sample_group={sample_group}&sample_group_item={sample_group_item}&cutoff=10

/genes/gene-to-sample-groups?gene_id={Ensembl_gene_id}&sample_group=cell_type

/atlas-types

/atlases/{atlas_type}/{item}?version=''&orient=records&filtered=false&query_string=''&gene_id=''&as_file=false

Fetches atlas data for atlas_type (one of myeloid, blood, dc). Additional parameters apply depending on the item. version specifies a particular version of the atlas to fetch.
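
The path and parameters of this endpoint can be put together as in the following sketch. The helper atlas_url is hypothetical; it checks atlas_type against the three values named above and leaves out parameters that are empty, on the assumption that an empty value behaves the same as an omitted one. No request is sent.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://api.stemformatics.org"
ATLAS_TYPES = ("myeloid", "blood", "dc")  # values named in the description above

def atlas_url(atlas_type, item, **params):
    """Build an /atlases/{atlas_type}/{item} URL, skipping empty parameters.

    Hypothetical helper for illustration only.
    """
    if atlas_type not in ATLAS_TYPES:
        raise ValueError(f"atlas_type must be one of {ATLAS_TYPES}, got {atlas_type!r}")
    path = f"{BASE_URL}/atlases/{atlas_type}/{item}"
    kept = {k: v for k, v in params.items() if v not in ("", None)}
    query = urlencode(kept, quote_via=quote)
    return f"{path}?{query}" if query else path

coords_url = atlas_url("myeloid", "coordinates")
expr_url = atlas_url("blood", "expression-values", gene_id="ENSG00000118513", version="")

print(coords_url)
print(expr_url)
```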


    For item=coordinates, returns PCA coordinates used by the atlas:
    [
        {
            "index": "1000_1674120023_B",
            "0": -0.3089953429523829,
            "1": -2.331254360290466,
            "2": -3.708666728816238
        },
        {
            "index": "1000_1674120023_F",
            "0": -1.6988969119278687,
            "1": -3.539426275611378,
            "2": -3.028939281330146
        },
        ...
    ]

    For item=samples, returns the sample table:
    [
        {
            "index": "7268_GSM2360259",
            "Cell Type": "hematopoietic multipotent progenitor",
            "Sample Source": "in vitro",
            "Progenitor Type": "pluripotent stem cell",
            "Activation Status": "growth factor",
            "Tissue": "in vitro",
            "Disease State": "normal",
            "Platform Category": "RNASeq"
        },
        ...
    ]

    For item=expression-values, returns expression values for gene_id:
    [
        {
            "index": "ENSG00000118513",
            "1000_1674120023_B": 0.9276657500763824,
            "1000_1674120023_F": 0.6500152765047357,
            "1000_1674120053_B": 0.7045142071494043,
            ...
        }
    ]

    For item=expression-file, serves the expression matrix as a file download.
    For item=genes, serves the genes matrix as a file download.

    For item=colours-and-ordering, returns the dictionary of colours and ordering used for the atlas:
    {
        "colours": {
            "Sample Source": {
                "in vivo": "#54E4AD",
                "ex vivo": "#d87e22",
                "in vitro": "#a6611a",
                "in vivo (HuMouse)": "#4c9282"
            },
            ...
        },
        "ordering": {
            "Sample Source": [
                "in vivo",
                "ex vivo",
                "in vitro",
                "in vivo (HuMouse)"
            ],
            ...
        }
    }

    For item=possible-genes, returns matching genes in the atlas for query_string:
    [
        {
            "ensembl": "ENSG00000118513",
            "inclusion": false,
            "symbol": "MYB"
        },
        ...
    ]
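
One way the item=samples and item=colours-and-ordering responses might be used together is sketched below: each sample is given the colour of its group value and the rows are sorted by the atlas ordering. The data are excerpts copied from the examples above; the function colour_samples is hypothetical, and it assumes every group value in the sample table has an entry in both "colours" and "ordering".

```python
# Excerpt of the item=samples response shown above
samples = [
    {"index": "7268_GSM2360259",
     "Cell Type": "hematopoietic multipotent progenitor",
     "Sample Source": "in vitro"},
]

# Excerpt of the item=colours-and-ordering response shown above
colours_and_ordering = {
    "colours": {"Sample Source": {"in vivo": "#54E4AD", "ex vivo": "#d87e22",
                                  "in vitro": "#a6611a", "in vivo (HuMouse)": "#4c9282"}},
    "ordering": {"Sample Source": ["in vivo", "ex vivo", "in vitro", "in vivo (HuMouse)"]},
}

def colour_samples(samples, colours_and_ordering, group):
    """Return (sample id, group value, colour) tuples sorted by the atlas ordering.

    Hypothetical helper for illustration only.
    """
    colours = colours_and_ordering["colours"][group]
    rank = {value: i for i, value in enumerate(colours_and_ordering["ordering"][group])}
    rows = [(s["index"], s[group], colours[s[group]]) for s in samples]
    return sorted(rows, key=lambda row: rank[row[1]])

coloured = colour_samples(samples, colours_and_ordering, "Sample Source")
print(coloured)
```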

/atlas-projection/{atlas_type}/{data_source}