Brief Description
-----------------
BioOptimizer is an algorithm designed to clean up motif-finding output by
finding the configuration of motif start sites that maximizes a scoring
function based on the log-posterior distribution given in Jensen et al
(200X). In addition, the algorithm allows the motif width to vary and
attempts to find the best motif width, again in terms of maximizing the
scoring function.

You must have results from one of these motif-finding programs
(BioProspector, Consensus, AlignACE, MEME) before running BioOptimizer.
The primary program "BioOptimizer.biop" takes BioProspector results as
input, while the programs "BioOptimizer.aa", "BioOptimizer.con" and
"BioOptimizer.meme" take AlignACE, Consensus and MEME results as input,
respectively. The two-block version of BioOptimizer only accepts
BioProspector results as input.

Reference
---------
JENSEN, S.T., LIU, X.S., ZHOU, Q. and LIU, J.S. (200X). Computational
discovery of gene regulatory binding motifs: a Bayesian perspective.
Accepted in Statistical Science.

Copyright
---------
BioOptimizer, version 1.0, is copyrighted to Shane T. Jensen (2003)

Software Requirements
---------------------
The "Math::SpecFun::Gamma" perl library must be installed before using
BioOptimizer.

You may need to make your BioOptimizer programs executable by using the
commands

   chmod +x BioOptimizer.biop
   chmod +x BioOptimizer.aa
   etc.

Input Information Needed
------------------------
1. file of DNA sequences: seqfile
   Note that sequences should be in the following format:

   >genename1
   acagctagctagcatcgatctagctgctacgat
   >genename2
   agacgtacgatcgatcgactgcgtcatgactac

2. output file from a motif-finding program:
      BioProspector: biopfile
      Consensus:     consfile
      AlignACE:      aafile
      MEME:          memefile

3. number of different motifs in motif-finding output: nummotif

4. should the reverse complement of the input sequences also be searched?
   rc = 1 if yes, rc = 0 if no

5. a priori expectation of motif width: w0

Command to Start Program
------------------------
If using BioProspector as input, command line is:

   ./BioOptimizer.biop seqfile biopfile nummotif rc w0

Example:

   ./BioOptimizer.biop example.upstream example.biop 5 1 7

Output:

   biopfile.opt.all  - file with the optimized version of each input
                       motif, along with site predictions
   biopfile.opt.best - file with the optimized motif that had the best
                       final score compared to the other motifs
   biopfile.opt.sum  - file with a summary of each optimization: starting
                       motif width and score compared to final motif
                       width, score and consensus

Notes:

1. Each motif is described in terms of a consensus matrix and the
   corresponding consensus sequence. The consensus sequence is formed by
   taking the dominant nucleotide in each column of the motif matrix. A
   capital letter indicates that the nucleotide has over 75%
   conservation, and a small letter indicates conservation < 75%.

2. Sites are given as the starting position of the site (counted from the
   start of the input sequence), as well as the strand: "f" for forward,
   "r" for reverse.

3. If the BioProspector input file contains only one motif,
   biopfile.opt.all and biopfile.opt.best will be the same file.

4. The null score is given for each motif. This is the score a motif
   would get if it had the same width and background nucleotide
   frequencies but no sites at all. It is included for rough comparison
   only: if the null score is greater than the final motif score, the
   final motif is not strong.
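
The consensus rule in Note 1 can be illustrated with a short Python
sketch. This is not BioOptimizer's own code: the function name
consensus_from_sites is hypothetical, the input is assumed to be a list
of equal-width aligned sites rather than a motif matrix, ties are broken
alphabetically, and a column with exactly 75% conservation is written in
lowercase (the note does not say which case applies there).

```python
from collections import Counter


def consensus_from_sites(sites, threshold=0.75):
    """Build a consensus string from equal-width aligned sites.

    Hypothetical helper: uppercase means the dominant nucleotide is
    conserved in more than `threshold` of the sites, lowercase otherwise.
    """
    if not sites:
        raise ValueError("need at least one site")
    width = len(sites[0])
    if any(len(site) != width for site in sites):
        raise ValueError("all sites must have the same width")

    consensus = []
    for col in range(width):
        counts = Counter(site[col].lower() for site in sites)
        # Dominant nucleotide; ties are broken alphabetically (assumption).
        base, count = min(counts.items(), key=lambda kv: (-kv[1], kv[0]))
        conservation = count / len(sites)
        # Exactly 75% is treated as lowercase (assumption).
        consensus.append(base.upper() if conservation > threshold else base)
    return "".join(consensus)


if __name__ == "__main__":
    example_sites = ["acgt", "acga", "acgc", "atgg", "acgt"]
    print(consensus_from_sites(example_sites))
```

In the example, the second column is "c" in 4 of 5 sites (80%), so it is
capitalized, while the last column has no nucleotide above 75% and stays
lowercase.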

If using AlignACE as input, command line is:

   ./BioOptimizer.aa seqfile aafile nummotif rc w0

If using Consensus as input, command line is:

   ./BioOptimizer.con seqfile consfile nummotif rc w0

If using MEME as input, command line is:

   ./BioOptimizer.meme seqfile memefile nummotif rc w0

If using the two-block version of BioOptimizer, you must start with
BioProspector results, so command line is:

   ./BioOptimizer.twoblock seqfile biopfile nummotif rc w1 w2 g1 g2

where
   w1 = expected width of block 1
   w2 = expected width of block 2
   g1 = minimum length of gap between blocks
   g2 = maximum length of gap between blocks
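
When running BioOptimizer from a script, the command lines can be
assembled as in the Python sketch below. It only builds the argument
lists and does not run anything. The helper names build_command and
build_twoblock_command are hypothetical; the program names are those
listed in the Brief Description, and the check that g1 does not exceed g2
is an added sanity check, not documented BioOptimizer behavior.

```python
# Program names as listed in the Brief Description of this README.
PROGRAMS = {
    "bioprospector": "./BioOptimizer.biop",
    "alignace": "./BioOptimizer.aa",
    "consensus": "./BioOptimizer.con",
    "meme": "./BioOptimizer.meme",
}


def build_command(tool, seqfile, motif_file, nummotif, rc, w0):
    """Return the one-block command line as a list of strings."""
    if tool not in PROGRAMS:
        raise ValueError("unknown motif-finding program: %s" % tool)
    if rc not in (0, 1):
        raise ValueError("rc must be 1 (search reverse complement) or 0")
    return [PROGRAMS[tool], seqfile, motif_file,
            str(nummotif), str(rc), str(w0)]


def build_twoblock_command(seqfile, biopfile, nummotif, rc, w1, w2, g1, g2):
    """Return the two-block command line (BioProspector input only)."""
    if rc not in (0, 1):
        raise ValueError("rc must be 1 (search reverse complement) or 0")
    if g1 > g2:
        # Added sanity check (assumption): minimum gap <= maximum gap.
        raise ValueError("g1 (minimum gap) must not exceed g2 (maximum gap)")
    return ["./BioOptimizer.twoblock", seqfile, biopfile, str(nummotif),
            str(rc), str(w1), str(w2), str(g1), str(g2)]


if __name__ == "__main__":
    # Reproduces the BioProspector example given in this README.
    cmd = build_command("bioprospector", "example.upstream",
                        "example.biop", 5, 1, 7)
    print(" ".join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) from
the directory that contains the BioOptimizer programs.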