1: Longest Ascending Sub-Sequence
Take an unsorted array of N ints. Each int except for the first one may be less than,
equal to, or greater than the one before it. Sometimes there will be long stretches where
each int is greater than the one before it. These are "Ascending Sub-Sequences", and
we want to find the longest one. Example:
int A[] = { 7, 3, 5, 2, 5, 8, 5, 3, 1, 5, 7, 8, 9, 4 };
here, the longest ascending sub-sequence is <1,5,7,8,9>, starting at position
8 (remember that arrays start from zero).
To solve this problem, we can consider a similar but much simpler sub-problem:
What is the length of the ascending sub-sequence starting from position X?
when X is 8, the answer is 5 (the ascending sub-sequence is <1,5,7,8,9>); when
X is 3 the answer is 3 (for <2,5,8>), when X is 4 the answer is 2 (for <5,8>)
when X is 5 the answer is 1 (for <8>). All we have to do is ask this sub-question
for all possible values of X within the length of the array, and the biggest answer
is the answer to the real question.
Finding the length of the ascending sub-sequence starting from some position is very
easy; it just needs a loop that watches for a non-increasing next entry:
int LongestAscSubStartingAt(int X, int A[], int N)
{ int answer=1;
while (X<N && A[X]<A[X+1])
{ answer+=1;
X+=1; }
return answer; }
Using that, the original question becomes easy to answer:
int LengthOfLongestAscSub(int A[], int N)
{ int best=0;
for (int i=0; i<N; i+=1)
{ int len=LongestAscSubStartingAt(i, A, N);
if (len>best)
best=len; }
return best; }
However, a quick examination reveals that the time for this
algorithm is O(N2). You should do that quick examination yourself and make sure you
agree. With dynamic programming we can do better.
First, observe that if we already know the length of the longest ascending sub-sequence
starting from position X+1 (i.e. next=LongestAscSubStartingAt(X+1, A, N)),
the sub-problem is even easier to answer. If A[X]<A[X+1] then the entry A[X] is just another
one on the front of the old subsequence, giving a length of next+1. On the other hand, if
A[X]>=A[X+1] then A[X] is a sequence all on its own, with length 1, because the next element
does not continue increasing. So we have an alternate formulation:
int LongestAscSubStartingAt(int X, int A[], int N)
{ if (X>=N)
return 0;
else if (A[X]<A[X+1])
return 1+LongestAscSubStartingAt(X+1, A, N);
else
return 1; }
Memoising this function would be easy, if we knew the maximum possible value for N (the length of
the array):
int answers[MAX+1] = { 0 };
int LongestAscSubStartingAt(int X, int A[], int N)
{ if (X>=N)
return 0;
if (answers[X]!=0)
return answers[X];
int result;
if (A[X]<A[X+1])
result=1+LongestAscSubStartingAt(X+1, A, N);
else
result=1;
answers[X]=result;
return result; }
And then we could go for a proper Dynamic Programming solution by just filling in the table
once,without ever using the memoised function. So long as we work out answers[MAX] first,
and work backwards towards answers[0], everything will be easy. Having to start at the end
and work backwards toward the beginning is a very common feature of dynamic programming solutions.
int table[MAX+1];
void FillInTable(int A[], int N)
{ table[N]=0;
for (int i=N-1; i>=0; i-=1)
{ if (A[i]<A[i+1])
table[i]=1+table[i+1];
else
table[i]=1; } }
int LongestAscSubStartingAt(int X, int A[], int N)
{ return table[X]; }
int LengthOfLongestAscSub(int A[], int N)
{ FillInTable(A, N);
int best=0;
for (int X=0; X<N; X+=1)
{ int len=LongestAscSubStartingAt(X, A, N);
if (len>best)
best=len; }
return best; }
With that, after the table is filled in, the LongestAscSubStartingAt
function is constant time O(1), so the main LengthOfLongestAscSub function becomes
linear O(N). The initialisation function FillInTable is linear too O(N) and only used once,
so the whole solution is now O(N).
We can't do better than linear time for this problem, so what we've got is an Optimal algorithm.
However, A little bit of thinking reveals that we can save a lot of memory because we don't really need
table at all. Each time round the loop, FillInTable only ever looks at the one result that it
just worked out: no need for a table to do that. The LengthOfLongestAscSub function really
just finds the biggest number in the table: we could instead remember the largest while filling in
the table. This gives an even better solution:
int LengthOfLongestAscSub(int A[], int N)
{ int previous=0, next;
int best=0;
for (int i=N-1; i>=0; i-=1)
{ if (A[i]<A[i+1])
next=1+previous;
else
next=1;
if (next>best)
best=next; }
return best; }
This algorithm is linear in time and constant in space. Tbis is very unusual for dynamic programming,
and is really because this was quite a simple problem, but it does illustrate quite well the power of
this method, and the things you have to think about when designing a dynamic programming algorithm.
2: Longest Common Unbroken Sub-String
Take two strings. They may have absolutely no characters in common, but usually there will
be a number of characters, and perhaps even whole sequences that appear unbroken in both strings.
We are to find the longest of these unbroken common sub-strings. Some examples:
- LCS("abcdefghijklmn", "OOOOdefOOOabghiOOefghOOO") is "efgh".
- LCS("hello", "sailor") is "l".
- LCS("the cat sat on the mat", "the dog sat on the hat") is " sat on the ".
- LCS("one two three", "abcde fghijk") is "".
This problem turns out almost to be just a two-dimensional version of the previous one.
First, we identify a simpler sub-problem: instead of finding the longest of all possible
common sub-strings, we will find length of the longest common sub-string given its exact starting position
in both strings. In this case, LLCS("abcdefghijkl", 3, "defabbcghijabc", 6) is supposed to
find the longest exactly shared substring that starts at position 3 of "abcdefghijkl" and
position 6 of "defabbcghiabc". Remembering that arrays and strings start at position zero,
position 3 of "abcdefghijkl" contains the letter 'd', and position 6 of "defabbcghiabc" contains the
letter 'c'. As these letters are different, no common substring can start at these positions,
so the answer is 0. As another example, LLCS("abcdefghijkl", 6, "defabbcghijabc", 7) finds the length of the
longest shared substring startig at position 6 of "abcdefghijkl" and position 7 of "defabbcghijabc".
"abcdefghijkl"[6] is 'g' and "defabbcghijabc"[7] is also 'g', so a common substring does start at
these positions. How long is it? No need to calculate, it must be one character longer than
the common substring starting from the next positions (7 and 8). This gives a simple recursive
function:
int LLCS(char *s1, int p1, char *s2, int p2)
{ if (p1>=strlen(s1))
return 0;
else if (p2>=strlen(s2))
return 0;
else if (s1[p1]!=s2[p2])
return 0;
else
return 1+LLCS(s1, p1+1, s2, p2+1); }
To solve the original problem (find the longest common substring, no matter where it appears)
we have to use this function at all possible positions within the two strings:
char *LCS(char *s1, char *s2)
{ int n1=strlen(s1), n2=strlen(s2);
int longest=0, best1, best2;
for (int i=0; i<n1; i+=1)
{ for (int j=0; j<n2; j+=1)
{ int r=LLCS(s1, i, s2, j);
if (r>longest)
{ longest=r;
best1=i;
best2=j; } } }
/* at this point we have the answer. Making the string is just the cherry on top */
char *answer=new char[longest+1];
for (int i=0; i<longest; i+=1)
answer[i]=s1[best1+i];
answer[longest]=0;
return answer; }
Now we can see how inefficient this algorithm is. When the two strings contain a fair number
of near-matches, LLCS will take linear time (let's simplify things just a little by assuming
that the two strings have roughly the same length, N). LLCS is used inside two nested loops
in LCS, giving a total time of O(N3). You won't be surprised to find that dynamic
programming can improve this.
The dynamic programming table this time is two dimensional. At table[i][j], we store the
value of LLCS(s1, i, s2, j). LCS just finds the biggest value in this table (to get longest),
and the position of that biggest value gives best1 and best2.
How do we fill in the table? The function LLCS completely gives it away. If two characters are different,
there entry is zero. If they are the same, table[i][j] is 1+table[i+1][j+1]. We must go through
the table in such an order that we will always know table[i+1][j+1] before trying to find out
table[i][j]. Easy.
int table[MAX+1][MAX+1];
char *LCS(char *s1, char *s2)
{ int n1=strlen(s1), n2=strlen(s2);
int longest=0, best1, best2;
for (int i=n1; i>=0; i-=1)
{ for (int j=n2; j>=0; j-=1)
{ if (i1==n1 || i2==n2)
table[i][j]=0;
else if (s1[i]!=s2[j])
table[i][j]=0;
else
{ int x=1+table[i+1][j+1];
table[i][j]=x;
if (x>longest)
{ longest=x;
best1=i;
best2=j; } } } }
/* create the string exactly as before */ }
This time, I can't see any way to eliminate the table, and the time is clearly O(N2).
Still a big improvement.
5: Maximum-Sum Unbroken Sub-Sequence
This is almost the one of the examples we did in class, I have only included it as a suggested
exercise if you need more practice.
Given an array of ints, some of which may be negative, some positive, find the largest number
you can possibly get by adding together a single unbroken run of numbers from the array. For
example:
int A[] = { 1, -7, 3, 2, -4, -6, 1, 4, -2, -1, 2, 1, 4, -5, 4 };
Unless I have mis-calculated, for this array, the answer should be 9. The unbroken run of six
numbers <1,4,-2,-1,2,1,4> adds up to 9, and no other unbroken run gives a higher sum.