CSE5290: Algorithms for Bioinformatics

Course Objectives

News:


The course is over. I thank you for your participation and contribution in this course. It was a great experience for me. I have put brief comments on your project in ePost, along with all the grades.

Administrivia

References:
  1. Gusfield D (1997) Algorithms on Strings, Trees and Sequences
  2. A. Tozeren, S. Byers (2004): New Biology for Engineers and Computer Scientists. Pearson Prentice Hall.
Evaluation: Your marks are here
Project: Possible project topics:

List of topics and Lecture schedule (tentative):

  1. Introduction:
  2. Lecture 1 (Sep 10, 09): Administrivia: Objectives, course overview.
    R-demo
    My slides are here. Some notes on installing and using R are here.
    Please consider participating in the York Programming contests. Details are here.
  3. Biology overview
    Lecture 2: Slides are here.
    A good article is here. If you are looking for more detailed information, read the book by Tozeren and Byers.
    The genetic code is "almost" universal. There is more information and references here.
    Lecture 3: Biology overview - contd. Manipulating DNA. Measuring gene expression levels.
    We will use slides from the previous class as well as these.
  4. Ch 4: Exhaustive search Lecture 4: the Partial Digest problem.
    My slides are here.
    Lecture 5: The motif finding problem.
    My slides are here.
  5. Ch 5: Greedy algorithms: genome rearrangements
    Lecture 6: My slides are here.
  6. Ch 6: Dynamic programming: sequence alignment.
    Lecture 7: My slides are here.
    Lecture 8: DP continued. Local and global alignment. We will use the last set of sides and (if time permits) these.
    Lecture 9: DP continued. Multiple alignment. Slides from the last set will be used.
    This wiki page contains a good summary of multiple alignment.
    Gene finding slides.
  7. Ch 7: Divide and Conquer: more efficient alignment. My slides are here.
  8. Ch 8: Graph Algorithms: Sequence assembly algorithms. My slides are here.
  9. Ch 9: Pattern Matching: different variants. My slides are here.
    BLAST: My slides are here.
  10. Ch 10: Clustering: My slides are here.
    Note: The above slides were updated slightly on Nov 17.
    The r-script I ran in class is here.
    Phyllogenetic trees: My slides are here.
  11. Ch 11: Hidden Markov Models: Intro to Markov Chains and Hidden Markov Models. My slides are here.
    The last set of slides on HMMs are here.
Assignments R resources Other links

While it is not required for the course, this is a good opportunity to start using LaTeX. A good starting point is this page.