Some Dynamic Programming Examples

This contains a couple of completely worked examples, a couple with the beginning done to get you going, and a couple of suggested problems to try. They are not the same as the ones we looked at in class.

For the maximum benefit, don't just read this. Each time you read something about a problem, try to solve it, or at least go on to the next step yourself, before reading further.

Contents:

1: Longest Ascending Sub-Sequence

Take an unsorted array of N ints. Each int except for the first one may be less than, equal to, or greater than the one before it. Sometimes there will be long stretches where each int is greater than the one before it. These are "Ascending Sub-Sequences", and we want to find the longest one. Example:

int A[] = { 7, 3, 5, 2, 5, 8, 5, 3, 1, 5, 7, 8, 9, 4 };

here, the longest ascending sub-sequence is <1,5,7,8,9>, starting at position 8 (remember that arrays start from zero).

To solve this problem, we can consider a similar but much simpler sub-problem: What is the length of the ascending sub-sequence starting from position X? when X is 8, the answer is 5 (the ascending sub-sequence is <1,5,7,8,9>); when X is 3 the answer is 3 (for <2,5,8>), when X is 4 the answer is 2 (for <5,8>) when X is 5 the answer is 1 (for <8>). All we have to do is ask this sub-question for all possible values of X within the length of the array, and the biggest answer is the answer to the real question.

Finding the length of the ascending sub-sequence starting from some position is very easy; it just needs a loop that watches for a non-increasing next entry:

int LongestAscSubStartingAt(int X, int A[], int N)
{ int answer=1;
  while (X<N && A[X]<A[X+1])
  { answer+=1;
    X+=1; }
  return answer; }

Using that, the original question becomes easy to answer:

int LengthOfLongestAscSub(int A[], int N)
{ int best=0;
  for (int i=0; i<N; i+=1)
  { int len=LongestAscSubStartingAt(i, A, N);
    if (len>best)
      best=len; }
  return best; }

However, a quick examination reveals that the time for this algorithm is O(N²). You should do that quick examination yourself and make sure you agree. With dynamic programming we can do better.

First, observe that if we already know the length of the longest ascending sub-sequence starting from position X+1 (i.e. next=LongestAscSubStartingAt(X+1, A, N)), the sub-problem is even easier to answer. If A[X]<A[X+1] then the entry A[X] is just another one on the front of the old subsequence, giving a length of next+1. On the other hand, if A[X]>=A[X+1] then A[X] is a sequence all on its own, with length 1, because the next element does not continue increasing. So we have an alternate formulation:

int LongestAscSubStartingAt(int X, int A[], int N)
{ if (X>=N)
    return 0;
  else if (A[X]<A[X+1])
    return 1+LongestAscSubStartingAt(X+1, A, N);
  else
    return 1; }

Memoising this function would be easy, if we knew the maximum possible value for N (the length of the array):

int answers[MAX+1] = { 0 };

int LongestAscSubStartingAt(int X, int A[], int N)
{ if (X>=N)
    return 0;
  if (answers[X]!=0)
    return answers[X];
  int result;
  if (A[X]<A[X+1])
    result=1+LongestAscSubStartingAt(X+1, A, N);
  else
    result=1;
  answers[X]=result;
  return result; }

And then we could go for a proper Dynamic Programming solution by just filling in the table once,without ever using the memoised function. So long as we work out answers[MAX] first, and work backwards towards answers[0], everything will be easy. Having to start at the end and work backwards toward the beginning is a very common feature of dynamic programming solutions.

int table[MAX+1];

void FillInTable(int A[], int N)
{ table[N]=0;
  for (int i=N-1; i>=0; i-=1)
  { if (A[i]<A[i+1])
      table[i]=1+table[i+1];
    else
      table[i]=1; } }

int LongestAscSubStartingAt(int X, int A[], int N)
{ return table[X]; }

int LengthOfLongestAscSub(int A[], int N)
{ FillInTable(A, N);
  int best=0;
  for (int X=0; X<N; X+=1)
  { int len=LongestAscSubStartingAt(X, A, N);
    if (len>best)
      best=len; }
  return best; }

With that, after the table is filled in, the LongestAscSubStartingAt function is constant time O(1), so the main LengthOfLongestAscSub function becomes linear O(N). The initialisation function FillInTable is linear too O(N) and only used once, so the whole solution is now O(N).

We can't do better than linear time for this problem, so what we've got is an Optimal algorithm. However, A little bit of thinking reveals that we can save a lot of memory because we don't really need table at all. Each time round the loop, FillInTable only ever looks at the one result that it just worked out: no need for a table to do that. The LengthOfLongestAscSub function really just finds the biggest number in the table: we could instead remember the largest while filling in the table. This gives an even better solution:

int LengthOfLongestAscSub(int A[], int N)
{ int previous=0, next;
  int best=0;
  for (int i=N-1; i>=0; i-=1)
  { if (A[i]<A[i+1])
      next=1+previous;
    else
      next=1;
    if (next>best)
      best=next; }
  return best; }

This algorithm is linear in time and constant in space. Tbis is very unusual for dynamic programming, and is really because this was quite a simple problem, but it does illustrate quite well the power of this method, and the things you have to think about when designing a dynamic programming algorithm.

2: Longest Common Unbroken Sub-String

Take two strings. They may have absolutely no characters in common, but usually there will be a number of characters, and perhaps even whole sequences that appear unbroken in both strings. We are to find the longest of these unbroken common sub-strings. Some examples:

LCS("abcdefghijklmn", "OOOOdefOOOabghiOOefghOOO") is "efgh".
LCS("hello", "sailor") is "l".
LCS("the cat sat on the mat", "the dog sat on the hat") is " sat on the ".
LCS("one two three", "abcde fghijk") is "".

This problem turns out almost to be just a two-dimensional version of the previous one. First, we identify a simpler sub-problem: instead of finding the longest of all possible common sub-strings, we will find length of the longest common sub-string given its exact starting position in both strings. In this case, LLCS("abcdefghijkl", 3, "defabbcghijabc", 6) is supposed to find the longest exactly shared substring that starts at position 3 of "abcdefghijkl" and position 6 of "defabbcghiabc". Remembering that arrays and strings start at position zero, position 3 of "abcdefghijkl" contains the letter 'd', and position 6 of "defabbcghiabc" contains the letter 'c'. As these letters are different, no common substring can start at these positions, so the answer is 0. As another example, LLCS("abcdefghijkl", 6, "defabbcghijabc", 7) finds the length of the longest shared substring startig at position 6 of "abcdefghijkl" and position 7 of "defabbcghijabc". "abcdefghijkl"[6] is 'g' and "defabbcghijabc"[7] is also 'g', so a common substring does start at these positions. How long is it? No need to calculate, it must be one character longer than the common substring starting from the next positions (7 and 8). This gives a simple recursive function:

int LLCS(char *s1, int p1, char *s2, int p2)
{ if (p1>=strlen(s1))
    return 0;
  else if (p2>=strlen(s2))
    return 0;
  else if (s1[p1]!=s2[p2])
    return 0;
  else
    return 1+LLCS(s1, p1+1, s2, p2+1); }

To solve the original problem (find the longest common substring, no matter where it appears) we have to use this function at all possible positions within the two strings:

char *LCS(char *s1, char *s2)
{ int n1=strlen(s1), n2=strlen(s2);
  int longest=0, best1, best2;
  for (int i=0; i<n1; i+=1)
  { for (int j=0; j<n2; j+=1)
    { int r=LLCS(s1, i, s2, j);
      if (r>longest)
      { longest=r;
        best1=i;
        best2=j; } } }
  /* at this point we have the answer. Making the string is just the cherry on top */
  char *answer=new char[longest+1];
  for (int i=0; i<longest; i+=1)
    answer[i]=s1[best1+i];
  answer[longest]=0;
  return answer; }

Now we can see how inefficient this algorithm is. When the two strings contain a fair number of near-matches, LLCS will take linear time (let's simplify things just a little by assuming that the two strings have roughly the same length, N). LLCS is used inside two nested loops in LCS, giving a total time of O(N³). You won't be surprised to find that dynamic programming can improve this.

The dynamic programming table this time is two dimensional. At table[i][j], we store the value of LLCS(s1, i, s2, j). LCS just finds the biggest value in this table (to get longest), and the position of that biggest value gives best1 and best2.

How do we fill in the table? The function LLCS completely gives it away. If two characters are different, there entry is zero. If they are the same, table[i][j] is 1+table[i+1][j+1]. We must go through the table in such an order that we will always know table[i+1][j+1] before trying to find out table[i][j]. Easy.

int table[MAX+1][MAX+1];

char *LCS(char *s1, char *s2)
{ int n1=strlen(s1), n2=strlen(s2);
  int longest=0, best1, best2;
  for (int i=n1; i>=0; i-=1)
  { for (int j=n2; j>=0; j-=1)
    { if (i1==n1 || i2==n2)
        table[i][j]=0;
      else if (s1[i]!=s2[j])
        table[i][j]=0;
      else
      { int x=1+table[i+1][j+1];
        table[i][j]=x;
        if (x>longest)
        { longest=x;
          best1=i;
          best2=j; } } } }
  /* create the string exactly as before */ }

This time, I can't see any way to eliminate the table, and the time is clearly O(N²). Still a big improvement.

3: Longest Common Sub-String

This is very similar to 2: Longest Common Unbroken Sub-String, but not quite the same. For this version of the problem (let's just find the length of the sequence, no need for all the bells and whistles this time) LCS("abcde", "abXXcXcXXeX") should be 4, the length of the string "abce", which exists as a substring of "abcde" and of "abXXcXcXXeX", even though other letters get in the way. In this version of the problem, the substring doesn't have to be unbroken, but the letters must appear in the same order.

You solve it.

Clue: It really is very similar to the earlier unbroken substring problem. The difference is that when filling in the table, if two letters are not the same, the value is not zero. If s1[i] is different from s2[j], there could be further matches with later values of i but the same j, OR with later values of j but the same i, OR with later values of both i and j. You need to take the maximum of a few neighbouring table entries. That's all the clue I can give without completely writing the solution. It is still quite tricky to solve, so don't devote too much time to it.

4: Biggest Empty Square

Imagine a digitised satellite photograph showing an area of land. The photograph is rpesented to you as a two dimensional array of ints, each entry represents one square yard of land. The number is the amount dangerous stuff (land mines, poison, and killer robots) hidden in that square yard. You need to find the largest possible square of land that is completely safe for building a special home for orphaned nuns and their fluffy kittens. In other words, looking at an array of ints, find the largest square that contains only zeros.

1 2 3 4 5 7 4 2 3 4 6
2 3 0 0 0 5 3 2 6 8 9
1 2 0 0 0 6 7 2 4 6 8
2 3 0 0 0 0 0 0 0 5 6
1 2 3 4 0 0 0 0 0 4 3
0 0 0 0 0 0 0 0 0 4 6
9 9 9 9 0 0 0 0 1 2 3
8 7 6 8 0 0 0 0 0 3 4
3 4 0 0 0 0 3 4 5 6 8

If you examine that array, you'll see that there are a few squares of zeros in there. Near the top left, there is a 3x3 square whose bottom-right corner overlaps with a 4x4 square. The 4x4 is almost a 5x5, except for a single 1 slipping in on the right hand edge, and that just won't do. In this array, the largest square of zeros is 4x4, so the function we write should return 4 as its answer at the very least, or even better, tell us both the size of the square and where it is by the co-ordinates (row, column) of its top left corner. Our 4x4 square starts at row 3, column 4 (remember to start counting from 0).

How is it to be done?

Suppose we have a table, exactly the same size as the original array. Table[r][c] is supposed to contain the size of the largest square whose top left corner is in row r, column c.
How can we fill in this table? You work it out.

The basic idea is that if there is a square of size N+1 starting at position (r,c), then there MUST be smaller squares of size N, at three neighbouring positions. That is all you need to know.

5: Maximum-Sum Unbroken Sub-Sequence

This is almost the one of the examples we did in class, I have only included it as a suggested exercise if you need more practice.

Given an array of ints, some of which may be negative, some positive, find the largest number you can possibly get by adding together a single unbroken run of numbers from the array. For example:

int A[] = { 1, -7, 3, 2, -4, -6, 1, 4, -2, -1, 2, 1, 4, -5, 4 };

Unless I have mis-calculated, for this array, the answer should be 9. The unbroken run of six numbers <1,4,-2,-1,2,1,4> adds up to 9, and no other unbroken run gives a higher sum.

6: Best Way to Make Change

void GreedyChangeMaker(int amt)
{ while (amt>=25)
  { printf(" 25");
    amt-=25; }
  while (amt>=10)
  { printf(" 10");
    amt-=10; }
  while (amt>=5)
  { printf(" 5");
    amt-=5; }
  while (amt>=1)
  { printf(" 1");
    amt-=1; }
  printf("\n"); }

The problem is that although the greedy algorithm does always work for Americal currency (and just about all others currently used throughout the world), it is not guaranteed always to work. Consider a currency in which the values of the available coins are: {16, 12, 1} and you need to give change for 24 centimes. The greedy algorithm would immediately jump for the biggest coin it could handle (the 16) and then be forced to make the remaining 8 using only pennies, giving this solution 16+1+1+1+1+1+1+1+1 using a total of 9 coins. The correct solution is 12+12, using only 2 coins.

This is not a totally artificial problem. When I was a lad in England in the 60's, the coins in existence had these values (in units of pennies) {240, 60, 30, 24, 12, 6, 3, 1, ½, ¼}. If you had to give change for 4 shillings, which is 48 pence, the greedy algorithm provides 30+12+6 using 3 coins, whereas the optimal solution is 24+24 using only 2 coins.

See what you can come up with. All those hippies can't do arithmetic with numbers beyond 20 (they wear sandals, so they can count on fingers and toes), they need your help.