CSE5290: Algorithms for Bioinformatics
Course Objectives
- Familiarity with Computational Problems in Biology
- Applying algorithmic ideas
- See real-life computational challenges
- Improve understanding of algorithms
News:
Administrivia
- Instructor: Suprakash Datta ( email: [lastname in lowercase] @ cse.yorku.ca , web: http://www.cse.yorku.ca/~[lastname in lowercase])
-
Textbook: [available at the University bookstore] An Introduction to Bioinformatics Algorithms, by Neil C. Jones and Pavel A. Pevzner
August 2004, ISBN-10: 0-262-10106-8, ISBN-13: 978-0-262-10106-6
publisher site. and book site maintained by the authors.
-
Class hours: Th 19:00-22:00 at CB 122.
-
Office hours: By appointment, at CSEB 3043.
References:
- Gusfield D (1997) Algorithms on Strings, Trees and Sequences
- A. Tozeren, S. Byers (2004):
New Biology for Engineers and Computer Scientists. Pearson Prentice Hall.
Evaluation:
- Midterm: 15%: October 27, in class
- Final : 30%: Scheduled by the registrar's office.
- HW: 30%
- Project: 25%
Your marks are here
Project:
- Choose a topic
- Read a few (typically 2-3) papers
- Critique them
- Implement with modifications/combination of ideas (the instructor will offer suggestions)
- Present an overview as well as your results.
Possible project topics:
- Probe design
- Protein-protein interactions
- Phylogenetic trees
- Clustering
- Exon prediction
- RNA secondary structure
- Retroviruses
- Other topics can be agreed to by the student and instructor.
List of topics and Lecture schedule (tentative):
- Introduction:
Lecture 1 (Sep 08):
Administrivia: Objectives, course overview. My slides are
here.
Please consider participating in the York Programming contests. Details are
here.
Some notes on installing and using R are here.
- Lecture 2 (Sep 15). Finish introduction,
Genome assembly. My slides are here.
- Lecture 3 (Sep 28): Exhaustive search algorithms. Same slides as the last lecture. Fourier methods for genome analysis.
- Lecture 4 (Oct 4): Greedy algorithms (Ch 5). Intro to dynamic programming
(Ch 6). My slides are here.
- Lecture 5 (Oct 18): Sequence alignment using dynamic programming (Ch 6). My slides are here.
- Lecture 6 (Oct 25): Sequence assembly using graph algorithms (Ch 7). My slides are here.
- Lecture 7 (Nov 2): midterm. Suffix trees and repeat finding. My slides are here.
- Lecture 8 (Nov 9): Clustering, phyllogenetic trees. My slides are here.
A r-file that demonstrates some problems with the k-means algorithm is
here.
- Lecture 9 (Nov 16): phyllogenetic trees contd. Markov chains and hidden markov models. My slides are here.
- Lecture 10 (Nov 23): Hidden markov models - contd. Randomized algorithms.
My slides are here.
- Lecture 11 (Nov 30): Extra topics. My slides are here.
Assignments:
- HW 1. R Basics: pdf.
- HW 2. Ab initio exon prediction: pdf. If you wish you can use this example of the use of the sapply() function.
R resources
-
The starting point for R is its homepage.
- Read the introduction here. Then check out some of the documentation available here. If you are looking for a smaller document, here is one I like.
-
To search for R functionality on a certain topic, use google or this search engine.
-
Some tips on R are here.
-
Another resource on R is here.
Other links
While it is not required for the course, this is a good opportunity to
start using LaTeX. A good starting point is this page.