1. Introduction

Primary Usage: Identification of cis-regulatory elements. Candidate sites are first detected by matrix scoring and then additionally scored on 7 other relevant contextual data points. The analysis is based on protein-coding transcripts in the Ensembl database.

Note

TFBS_footprinting is now available in a Docker image.

Predict TFBSs in the promoters of anywhere from 1 to 80,000 human protein-coding transcripts in the Ensembl database. TFBS predictions can also be made for 87 unique non-human species (including model organisms such as mouse and zebrafish), present in the following groups:

  • 70 Eutherian mammals
  • 24 Primates
  • 11 Fish
  • 7 Sauropsids

The TFBS footprinting method computationally predicts transcription factor binding sites (TFBSs) in a target species (e.g. Homo sapiens) using 575 position weight matrices (PWMs) based on binding data from the JASPAR database. Additional experimental data from a variety of sources are used to support or detract from these predictions:

  • DNA sequence conservation in homologous sequences from mammal species
  • proximity to CAGE-supported transcription start sites (TSSs)
  • correlation of expression between target gene and predicted transcription factor (TF) across 1800+ samples
  • proximity to ChIP-Seq determined TFBSs (GTRD project)
  • proximity to expression quantitative trait loci (eQTLs) affecting expression of the target gene (GTEx project)
  • proximity to CpGs
  • proximity to ATAC-Seq peaks (ENCODE project)
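
The matrix-scoring step that produces the initial predictions can be illustrated with a minimal sketch. This is not the TFBS_footprinting implementation: the count matrix, the uniform background frequency, the pseudocount value, and the function names `build_pwm`, `scan`, and `best_hit` are all assumptions made for illustration. The sketch scans the forward strand only and does not apply any of the contextual scores listed above.

```python
import math

# Hypothetical 4-bp count matrix (10 aligned sites); not a real JASPAR profile.
COUNTS = {
    "A": [8, 0, 0, 1],
    "C": [0, 9, 0, 1],
    "G": [1, 0, 10, 0],
    "T": [1, 1, 0, 8],
}


def build_pwm(counts, background=0.25, pseudocount=0.8):
    """Convert per-position base counts into log2-odds weights.

    The background and pseudocount defaults are assumed values.
    """
    length = len(counts["A"])
    pwm = []
    for i in range(length):
        total = sum(counts[base][i] for base in "ACGT")
        column = {}
        for base in "ACGT":
            # The pseudocount avoids log(0) for bases never observed at a position.
            freq = (counts[base][i] + pseudocount * background) / (total + pseudocount)
            column[base] = math.log2(freq / background)
        pwm.append(column)
    return pwm


def scan(sequence, pwm):
    """Score every forward-strand window; return (start, score) pairs."""
    sequence = sequence.upper()
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows containing ambiguous bases such as N
        score = sum(pwm[i][base] for i, base in enumerate(window))
        hits.append((start, score))
    return hits


def best_hit(sequence, pwm):
    """Return the highest-scoring (start, score) pair, or None if no window fits."""
    hits = scan(sequence, pwm)
    return max(hits, key=lambda hit: hit[1]) if hits else None


pwm = build_pwm(COUNTS)
print(best_hit("TTACGTAA", pwm))
```

In a real analysis, each candidate hit from a scan like this would then be weighed against the supporting data sources, and the reverse strand would be scanned as well.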