CSCI 4490-6490 Algorithms for Computational Biology Spring 2016

Homework Assignment 2


This homework is an exercise in dynamic programming. Implement the LCS algorithm and its associated function PrintLCS in a programming language. Some requirements are:

  1. The input to your program is two DNA sequences. They can be read from the keyboard or (preferably) from a text file in FASTA format;

  2. The program first calls LCS to accomplish the major task;

  3. The LCS algorithm should be modified slightly to output the dynamic programming table of s_{i,j} values once the table is computed;

  4. The program should then call PrintLCS to output the longest common subsequence found for the two input sequences;

  5. Outputs can be to the screen or (preferably) to a text file.
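
The five requirements above can be sketched as follows. This is only a minimal Python illustration, not a reference solution: the function names read_fasta, lcs, format_table and print_lcs, the backtracking labels "diag", "up" and "left", and the tie-breaking rule are all assumptions made for this example, and the FASTA reader keeps only the first record in its input.

```python
def read_fasta(text):
    """Return the first sequence found in FASTA-formatted text.

    Assumption: header lines start with '>' and sequence lines follow.
    """
    seq = []
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            if seq:          # a second record starts: keep only the first
                break
            continue
        seq.append(line.upper())
    return "".join(seq)


def lcs(v, w):
    """Fill the table s, where s[i][j] is the LCS length of v[:i] and w[:j].

    Also returns the backtracking table b used by print_lcs.
    """
    n, m = len(v), len(w)
    s = [[0] * (m + 1) for _ in range(n + 1)]
    b = [[""] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if v[i - 1] == w[j - 1]:
                s[i][j] = s[i - 1][j - 1] + 1
                b[i][j] = "diag"
            elif s[i - 1][j] >= s[i][j - 1]:   # tie-break: prefer "up"
                s[i][j] = s[i - 1][j]
                b[i][j] = "up"
            else:
                s[i][j] = s[i][j - 1]
                b[i][j] = "left"
    return s, b


def format_table(s):
    """Render the table as text, one row per line."""
    return "\n".join(" ".join(str(x) for x in row) for row in s)


def print_lcs(b, v, i, j):
    """Follow the backtracking pointers from (i, j) and return one LCS."""
    out = []
    while i > 0 and j > 0:
        if b[i][j] == "diag":
            out.append(v[i - 1])
            i -= 1
            j -= 1
        elif b[i][j] == "up":
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))
```

A driver would read both sequences with read_fasta, call lcs(v, w), write format_table(s) to the screen or a file, and then call print_lcs(b, v, len(v), len(w)). When several longest common subsequences exist, which one is returned depends on the tie-breaking rule chosen here.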

Alternatively, you may choose to code the algorithm GlobalPairwiseAlignment instead. For this task, the input remains the same, but the output is an alignment between the two input sequences. You also have several options for the scoring scheme, which covers substitutions (including matches) and gap penalties: (a) the scheme given in Eddy's short paper; (b) some biologically more meaningful scheme that you may find; (c) as simple as '1 for a match', '0 for everything else' as in the LCS algorithm. You may NOT want to hard-code the scoring scheme (matrix) in the program; instead, try to have your program read the scoring matrix from a text file.
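
The alternative task can be sketched in the same spirit. The Python code below is a minimal illustration of global alignment with a linear gap penalty, under stated assumptions: the names parse_matrix and global_align are hypothetical, the matrix file layout (a header row of symbols, then one row per symbol) is an assumed format, and the example scores in the usage note are purely illustrative and are not the scheme from Eddy's paper.

```python
def parse_matrix(text):
    """Parse a whitespace-separated substitution matrix.

    Assumed layout: first non-comment line lists the column symbols;
    each later line starts with a row symbol followed by integer scores.
    Lines starting with '#' are treated as comments.
    """
    rows = [ln.split() for ln in text.splitlines()
            if ln.strip() and not ln.lstrip().startswith("#")]
    cols = rows[0]
    score = {}
    for row in rows[1:]:
        a = row[0]
        for c, val in zip(cols, row[1:]):
            score[(a, c)] = int(val)
    return score


def global_align(v, w, score, gap=-1):
    """Return (best score, aligned v, aligned w) for a global alignment.

    gap is a linear per-symbol gap penalty (an assumption of this sketch).
    """
    n, m = len(v), len(w)
    s = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        s[i][0] = i * gap
    for j in range(1, m + 1):
        s[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s[i][j] = max(s[i - 1][j - 1] + score[(v[i - 1], w[j - 1])],
                          s[i - 1][j] + gap,
                          s[i][j - 1] + gap)
    # Trace back from the bottom-right cell to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0
                and s[i][j] == s[i - 1][j - 1] + score[(v[i - 1], w[j - 1])]):
            top.append(v[i - 1])
            bottom.append(w[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and s[i][j] == s[i - 1][j] + gap:
            top.append(v[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(w[j - 1])
            j -= 1
    return s[n][m], "".join(reversed(top)), "".join(reversed(bottom))
```

As a usage example, a matrix file that scores 1 for every match and -1 for every mismatch over A, C, G, T, combined with gap=-1, could be loaded with parse_matrix(open(path).read()) and passed to global_align together with the two sequences. Option (c) corresponds to a matrix with 1 on the diagonal and 0 elsewhere, with gap=0.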

This homework is due on Tuesday February 16, 2016.