This is the Help and Frequently Asked Questions page for Stemformatics

This page provides hands-on tutorials and answers to frequently asked questions to help you get started with Stemformatics.

How to Cite Us

Please ensure that you cite the original publications describing the datasets hosted by Stemformatics.

To cite Stemformatics, please use:
Wells CA et al. Stemformatics: Visualisation and sharing of stem cell gene expression. Stem Cell Research, DOI


What is the best screen resolution and browser to use with Stemformatics?

The minimum recommended screen resolution for Stemformatics is 1152px (width) x 864px (height). Stemformatics is 100% compatible with the latest Google Chrome and Mozilla Firefox.

Please note that Internet Explorer 11 is the only version of Internet Explorer currently supported. Internet Explorer 10 and below are not supported. Firefox and Google Chrome are both free to download.

How can I get my favourite dataset into Stemformatics?

You can suggest a dataset that goes straight into our dataset queue, Agile_org.

We will then review your request and provide some feedback on when this might be put into Stemformatics. Unfortunately, due to limited resources and potential technical issues, we may not always be able to process your request.

How can I get my private dataset into Stemformatics?

We are currently working on the security needed to handle private datasets, with access based on individual or group permissions. In the meantime, please send us an email (our details are on the Contact page) with your details and we will keep you informed of our progress.

Why do I get multiple genes back from a gene search?

We use gene annotations from Ensembl and Entrez. In cases where Ensembl has provided multiple Entrez IDs for a gene, we retrieve all associated gene symbols. The first gene symbol retrieved in such cases is the more trusted (canonical) symbol, usually sourced from HGNC (HUGO Gene) or MGI (Mouse Genome Informatics) for human and mouse gene annotations, respectively. To disambiguate a gene, please refer to the Ensembl and Entrez links for your gene of interest.

Why is there sometimes no data available for my favourite stem cell gene (such as NANOG, SOX2, OCT4, MYC), or other genes for a given dataset?

Currently, we rely on microarray probe mappings provided by Ensembl for the various microarray platforms used in the experiments associated with our expression datasets. On some platforms, there are no reliable probe mappings to the genomic sequences associated with these genes of interest. In such cases, we unfortunately cannot show expression for these genes. Note that this does not mean that these genes are not present or not expressed in the available data - only that we have no means to detect their presence (or absence) for a given platform.

Are gene expression results accurate for my gene of interest?

The accuracy of gene detection and expression for a particular gene is constrained by the accuracy of a given microarray platform's probe sequences and of the probe set mappings to transcript sequences. Furthermore, we rely upon the accuracy of Ensembl's probe-to-transcript and transcript-to-gene mappings. Most probes map to a single gene, but some (about 7%) map to multiple genes (usually two, rarely more). If in doubt, refer to Ensembl's probe mapping pipeline for more information.

How do you handle probes that map to multiple genes?

We show expression for all probes. Multi-mapping probes are highlighted in our graphs. To see which other genes map to a highlighted probe, click on its probe ID to access the multi-mapping probe summary page for that probe.
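As an illustration of what "multi-mapping" means, here is a minimal Python sketch that picks out probes mapped to more than one gene. It is not the Stemformatics implementation: the function name, the input format (a list of probe ID and gene ID pairs) and the example identifiers are all hypothetical.

```python
from collections import defaultdict


def find_multi_mapping_probes(probe_gene_pairs):
    """Return {probe_id: sorted gene IDs} for probes mapped to more than one gene.

    probe_gene_pairs is an iterable of (probe_id, gene_id) tuples. This input
    format is an assumption made for the sketch.
    """
    genes_by_probe = defaultdict(set)
    for probe_id, gene_id in probe_gene_pairs:
        genes_by_probe[probe_id].add(gene_id)
    # Keep only the probes that map to two or more distinct genes.
    return {
        probe_id: sorted(genes)
        for probe_id, genes in genes_by_probe.items()
        if len(genes) > 1
    }


# Hypothetical identifiers, for illustration only.
example_pairs = [
    ("probe_1", "GENE_A"),
    ("probe_2", "GENE_A"),
    ("probe_2", "GENE_B"),
    ("probe_3", "GENE_C"),
]
multi_mapping = find_multi_mapping_probes(example_pairs)
print(multi_mapping)
```

In this example only probe_2 is reported, because it is the only probe associated with two different genes.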

How are biological sample replicates treated in our expression data and statistics?

Across the site, we use and display data pertaining to biological replicates that have been collapsed (averaged), with few exceptions. In our expression results, only scatter plots show un-collapsed sample expression. The bar graphs and box plots aggregate biological replicate samples for a given sample or chip type (or other experimental metadata). In these cases, we provide error bars and standard deviations for sample expression.
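The collapsing described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the code used on the site: the function name, the (group, value) input format and the example values are hypothetical, and the sketch uses the sample standard deviation, which the text above does not specify.

```python
from collections import defaultdict
from statistics import mean, stdev


def collapse_replicates(samples):
    """Average replicate expression values per group.

    samples is an iterable of (group, expression_value) tuples, where group
    stands for a sample type or other metadata field (an assumed format).
    Returns {group: (mean, standard_deviation)}.
    """
    values_by_group = defaultdict(list)
    for group, value in samples:
        values_by_group[group].append(value)

    summary = {}
    for group, values in values_by_group.items():
        # The sample standard deviation needs at least two replicates;
        # report 0.0 for a single replicate (a choice made for this sketch).
        spread = stdev(values) if len(values) > 1 else 0.0
        summary[group] = (mean(values), spread)
    return summary


# Hypothetical expression values, for illustration only.
example_samples = [("iPSC", 8.0), ("iPSC", 10.0), ("fibroblast", 4.0)]
summary = collapse_replicates(example_samples)
print(summary)
```

Each group ends up with one averaged value plus a measure of spread, which is what a bar graph with error bars needs, while a scatter plot would use the raw per-sample values.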

What do the lines in the expression graphs represent?

Two measures were taken for each dataset. The blue line is representative of the detection threshold (minimum level for this dataset where the gene is said to be detected) and the green line is representative of the median of normalized detection scores for all genes in the dataset.

In the public gene lists, why are some genes missing from KEGG pathways?

These KEGG pathways were downloaded using the R Bioconductor library KEGG.db. This library uses Entrez identifiers, which were converted to Ensembl identifiers via the Stemformatics mapping and stored in the database. Around 5% of the mouse Entrez identifiers and 2% of the human Entrez identifiers could not be converted to Ensembl, so the corresponding genes are missing from the pathways.
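To show how an identifier conversion can drop genes from a pathway, here is a minimal Python sketch. It is not the Stemformatics conversion code: the function name, the dictionary-based mapping and the placeholder identifiers are hypothetical.

```python
def convert_pathway_genes(entrez_ids, entrez_to_ensembl):
    """Convert Entrez IDs to Ensembl IDs, keeping track of failures.

    entrez_to_ensembl is a plain dict standing in for a mapping table
    (an assumption made for the sketch).
    Returns (converted_ensembl_ids, unmapped_entrez_ids).
    """
    converted = []
    unmapped = []
    for entrez_id in entrez_ids:
        ensembl_id = entrez_to_ensembl.get(entrez_id)
        if ensembl_id is None:
            # No Ensembl equivalent: this gene drops out of the pathway.
            unmapped.append(entrez_id)
        else:
            converted.append(ensembl_id)
    return converted, unmapped


# Placeholder identifiers, for illustration only.
example_mapping = {"ENTREZ_1": "ENSEMBL_1", "ENTREZ_2": "ENSEMBL_2"}
example_pathway = ["ENTREZ_1", "ENTREZ_2", "ENTREZ_3"]

converted, unmapped = convert_pathway_genes(example_pathway, example_mapping)
unmapped_fraction = len(unmapped) / len(example_pathway)
print(converted, unmapped, unmapped_fraction)
```

Here ENTREZ_3 has no entry in the mapping, so it is absent from the converted pathway even though it belongs to the original one.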

How can I access the probe-to-gene (or other reporter-to-gene) mappings for an assay platform?

You can download the mapping files from here.