Syllabus for STAT 315b / BIO-277cd

http://www.fas.harvard.edu/~junliu/

Fundamentals of Computational Biology (II)

Friday 1:30-3:20 PM

Kresge 502, HSPH, 655 Huntington Avenue.

Professors Jun Liu and Wing Hung Wong

Departments of Statistics and Biostatistics

Harvard University

Course Description: A substantial core of computational biology (or bioinformatics) methods has been developed during the past two decades to meet the needs of biological scientists for data storage, data retrieval, and data analysis. A main problem that motivated early research in computational biology was protein sequence analysis. Recently, because of the dramatic increase in many types of biological data due to the Human Genome Project and other high-throughput projects, the scope of bioinformatics research has been extended to embrace diverse topics such as microarray analysis, protein classification, regulatory motif analysis, RNA analysis, structural and functional prediction, and gene prediction. This one-year course is intended to cover the developments in bioinformatics over the past thirty years, with an emphasis on topics of recent interest. It is widely recognized that research in this field is interdisciplinary in nature and requires knowledge of computational algorithms, statistics, and molecular biology. Students in this class are expected to spend a substantial amount of time reading research articles and monographs ranging from statistics to biology.

Course Meetings: Every Friday from 1:30 PM to 3:25 PM in Kresge 502, School of Public Health, 655 Huntington Ave, Boston, MA.

Course requirements: presentation of readings and research related to designated articles (students can work in teams). This course is a continuation of Stat-315 offered in Fall 2000. The following is a tentative list of topics to be covered.

  1. Introduction to microarray experiments: principles and experimental design.

  2. Low-level analysis of microarray data: feature extraction, normalization, ratio statistics, two-sample comparisons.

  3. Cluster analysis (I): hierarchical clustering, k-means, self-organizing maps, gene shaving, plaid models.

  4. Cluster analysis (II): dimension reduction, principal component analysis, singular value decomposition, correspondence analysis, multi-dimensional scaling.

  5. Supervised learning (I): discriminant analysis, neural networks, error-rate concepts.

  6. Supervised learning (II): support vector machines, tree-based methods, bagging, boosting, AdaBoost.

  7. Bayesian networks: graphical probabilistic models and related computational algorithms. Applications to genetic networks.

  8. Introduction to protein structure prediction: folding and threading algorithms.

  9. Introduction to comparative genomics: examples of large-scale human-mouse genome sequence analysis.

Main References/Textbooks:

Not available.

Recommended Readings:

Baxevanis, A.D., Ouellette, B.F.F. (1998). Bioinformatics: A Practical Guide to the Analysis of Genes and Proteins. Wiley-Interscience.

Lodish et al. (2000). Molecular Cell Biology (4th Edition). W.H. Freeman & Co.