#
4.5. [응용1] Longest Common Subsequence (LCS)
In
- [T. Cormen et al., Introduction to Algorithms (3rd ed.), The MIT Press, 2009. 16.3]
- Definitions
- Given a sequence X = <x_1, x_2, ..., xm > another sequence Z = <z_1, z_2, ..., zk > is a subsequence of X if there exists a strictly increasing sequence <_i1, i_2, ..., i_k > of indices of X such that \forall j = 1, 2, ..., k, we have x_{ij} = z_j.
- A subsequence of a given sequence is just the given sequence with some elements (possibly none) left out.
- Ex: \\ X=<A,B,C,B,D,A,B>, \\ Z=<B,C,D,B>(<2,3,5,7>)
- Given two sequences X and Y, we say that a sequence Z is a common subsequence of X and Y if Z is a subsequence of both X and Y.
- Ex: X = <A, B, C, B, D, A, B>, \\ Y = <B, D, C, A, B, A>,\\ Z_1 = <B, C, A>,\\ Z_2 = <B, C, B, A>,\\ Z_3 = <B, D, A, B>
- Given a sequence X = <x_1, x_2, ..., x_m >, X_i = <x_1, x_2, ..., x_i > is the ith prefix of X, for i = 0, 1, ..., m.
- Ex: X = <A, B, C, B, D, A, B>, \\ X_4 = <A, B, C, B>,\\ X_0 = null sequence
- Given a sequence X = <x_1, x_2, ..., xm > another sequence Z = <z_1, z_2, ..., zk > is a subsequence of X if there exists a strictly increasing sequence <_i1, i_2, ..., i_k > of indices of X such that \forall j = 1, 2, ..., k, we have x_{ij} = z_j.
- Problem
- Given two sequences X = <x_1, x_2, ..., x_m > and Y = <y_1, y_2, ..., y_n >
- find a longest common subsequence of X and Y.
- Naïve approach
- Enumerate all subsequences of X and check each subsequence to see if it is also a subsequence of Y, keeping track of the longest subsequence found. → Exponential algorithm!
- The LCS problem can be solved efficiently using dynamic programming.
- Optimal substructure of an LCS
- Let X = <x_1, x_2, ..., x_m > and $Y = <y_1, y_2, ..., y_n > $be sequences, and let Z = <z_1, z_2, ..., z_k > be any LCS of X and Y.
- If x_m = y_n, then z_k = x_m = y_n, and Z_{k-1} is an LCS of X_{m-1} and Y_{n-1}.
- If x_m \neq y_n, then an LCS of X and Y is either an LCS of X_{m-1} and Y or an LCS of X and Y_{n-1}.
- Let X = <x_1, x_2, ..., x_m > and $Y = <y_1, y_2, ..., y_n > $be sequences, and let Z = <z_1, z_2, ..., z_k > be any LCS of X and Y.
- Let c[i, j] be the length of an LCS of the sequences X_i and Y_j
- Optimal substructure for computing c[i, j]
#
4.5.1. O(mn) Algorithm
- Filling the table
- Printing the LCS
#
4.5.2. C Implementation
- Courtesy of link
/** Copyright (C) 2005 Neil Jones. **/ #include <stdio.h> char *LCS(char *a, char *b); #define NEITHER 0 #define UP 1 #define LEFT 2 #define UP_AND_LEFT 3 int main(int argc, char *argv[]) { printf("%s\n", LCS(argv[1], argv[2])); } char *LCS(char *a, char *b) { int n = strlen(a); int m = strlen(b); int **S; int **R; int ii; int jj; int pos; char *lcs; S = (int **)malloc((n + 1) * sizeof(int *)); R = (int **)malloc((n + 1) * sizeof(int *)); for (ii = 0; ii <= n; ++ii) { S[ii] = (int *)malloc((m + 1) * sizeof(int)); R[ii] = (int *)malloc((m + 1) * sizeof(int)); } for (ii = 0; ii <= n; ++ii) { S[ii][0] = 0; R[ii][0] = UP; } for (jj = 0; jj <= m; ++jj) { S[0][jj] = 0; R[0][jj] = LEFT; } for (ii = 1; ii <= n; ++ii) { for (jj = 1; jj <= m; ++jj) { if (a[ii - 1] == b[jj - 1]) { S[ii][jj] = S[ii - 1][jj - 1] + 1; R[ii][jj] = UP_AND_LEFT; } else { S[ii][jj] = S[ii - 1][jj - 1] + 0; R[ii][jj] = NEITHER; } if (S[ii - 1][jj] >= S[ii][jj]) { S[ii][jj] = S[ii - 1][jj]; R[ii][jj] = UP; } if (S[ii][jj - 1] >= S[ii][jj]) { S[ii][jj] = S[ii][jj - 1]; R[ii][jj] = LEFT; } } } ii = n; jj = m; pos = S[ii][jj]; lcs = (char *)malloc((pos + 1) * sizeof(char)); lcs[pos--] = (char)NULL; while (ii > 0 || jj > 0) { if (R[ii][jj] == UP_AND_LEFT) { ii--; jj--; lcs[pos--] = a[ii]; } else if (R[ii][jj] == UP) { ii--; } else if (R[ii][jj] == LEFT) { jj--; } } for (ii = 0; ii <= n; ++ii) { free(S[ii]); free(R[ii]); } free(S); free(R); return lcs; }