Overview
This website provides several statistics for position frequency
matrices (PFMs). You can either run the calculations on the website or
download the source code. Below is a short description of each method,
followed by instructions for compilation. Please note that all
statistics assume an i.i.d. sequence model. The readme file contains
more information, including references on how to check whether a
sequence is i.i.d.
- Count Statistic: Annotating a sequence with binding sites based on a PFM can yield more than one hit. Furthermore, hits may overlap, especially for palindromic DNA motifs. Therefore, a dedicated statistic is required to compute the significance of an enrichment of binding sites in a given region. Here, one can plug in a PFM, and the parameters and p-values for the statistic are computed. Annotation of a sequence can be done here.
- Similarity Statistic: The discovery of new PFMs can yield PFMs that are redundant with already known ones. Computing the similarity between all TFs in a set uncovers such redundancies.
- Clustering: The discovery of new PFMs can yield PFMs that are redundant with already known ones. Clustering such a set of TFs to obtain a representative for each subset of very similar TFs is a typical post-processing step.
- Co-Occurrences: Calculating p-values for CRMs (cis-regulatory modules) has to take the similarity between PFMs into account. Therefore, we offer co-occurrence and co-operativity statistics that incorporate the similarity between PFMs.
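To illustrate why hits of a palindromic motif overlap, here is a minimal Python sketch of annotating a sequence with a PFM on both strands. It is not the code offered on this website: the example PFM (a made-up matrix for the palindrome GATC), the uniform i.i.d. background, the log-odds scoring, the threshold, and all function names are assumptions chosen for illustration only.

```python
import math

# Assumed uniform i.i.d. background model (illustrative only).
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

# Hypothetical PFM for the palindromic motif GATC.
# Counts are assumed to already include pseudocounts (no zeros).
PFM = {
    "A": [1, 8, 1, 1],
    "C": [1, 1, 1, 8],
    "G": [8, 1, 1, 1],
    "T": [1, 1, 8, 1],
}


def log_odds_matrix(pfm, background):
    """Turn raw counts into per-position log-odds scores."""
    width = len(pfm["A"])
    matrix = []
    for pos in range(width):
        total = sum(pfm[base][pos] for base in "ACGT")
        matrix.append(
            {
                base: math.log((pfm[base][pos] / total) / background[base])
                for base in "ACGT"
            }
        )
    return matrix


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def window_score(window, matrix):
    """Sum the log-odds scores of a window the same width as the motif."""
    return sum(column[base] for column, base in zip(matrix, window))


def find_hits(seq, pfm, threshold, background=BACKGROUND):
    """List (start, strand) for every window scoring at least `threshold`."""
    matrix = log_odds_matrix(pfm, background)
    width = len(matrix)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if window_score(window, matrix) >= threshold:
            hits.append((start, "+"))
        if window_score(reverse_complement(window), matrix) >= threshold:
            hits.append((start, "-"))
    return hits


hits = find_hits("TTGATCAA", PFM, threshold=4.0)
print(hits)
```

In this sketch the single GATC occurrence is reported twice, once per strand, because the motif equals its own reverse complement. Such dependent hits are the reason a naive count cannot be treated as a sum of independent events.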
Download and Compilation
Contact
Utz J. Pape, pape..at.molgen.mpg.de