Thursday, January 10, 2019

More Roman Numerals and Bash

·   0

When in Rome: finishing the Roman numeral converter script.
In my last article, I started digging in to a classic computer science puzzle: converting Roman numerals to Arabic numerals. First off, it more accurately should be called Hindu-Arabic, and it's worth mentioning that it's believed to have been invented somewhere between the first and fourth century—a counting system based on 0..9 values.
The script I ended up with last time offered the basics of parsing a specified Roman numeral and converted each value into its decimal equivalent with this simple function:

mapit() {
   case $1 in
     I|i) value=1 ;;
     V|v) value=5 ;;
     X|x) value=10 ;;
     L|l) value=50 ;;
     C|c) value=100 ;;
     D|d) value=500 ;;
     M|m) value=1000 ;;
      * ) echo "Error: Value $1 unknown" >&2 ; exit 2 ;;
   esac
}

Then I demonstrated a slick way to use the underutilized seq command to parse a string character by character, but the sad news is that you won't be able to use it for the final Roman numeral to Arabic numeral converter. Why? Because depending on the situation, the script sometimes will need to jump two ahead, and not just go left to right linearly, one character at a time.
Instead, you can build the main loop as a while loop:

while [ $index -lt $length ] ; do

    our code

    index=$(( $index + 1 ))
done

There are two basic cases to think about in terms of solving this algorithmic puzzle: the subsequent value is greater than the current value, or it isn't—for example, IX versus II. The first is 9 (literally 1 subtracted from 10), and the second is 2. That's no surprise; you'll need to know both the current and next values within the script.
Sharp readers already will recognize that the last character in a sequence is a special case, because there won't be a next value available. I'm going to ignore the special case to start with, and I'll address it later in the code development. Stay tuned, sharpies!
Because Bash shell scripts don't have elegant in-line functions, the code to get the current and next values won't be value=mapit(romanchar), but it'll be a smidge clumsy with its use of the global variable value:

mapit ${romanvalue:index-1:1}
currentval=$value

mapit ${romanvalue:index:1}
nextval=$value

It's key to realize that in the situation where the next value isn't greater than the current value (for example, MC), you can't automatically conclude that the next value isn't going to be part of a complex two-value sequence anyway. Like this: MCM. You can't just say M=1000 and C=500, so let's just convert it to 1500 and process the second M when we get to it. MCM=1900, not 2500!
The basic logic turns out to be pretty straightforward:

if [ $nextval -gt $currentval ] ; then
  sum=$(( $sum + $nextval - $currentval ))
else
  sum=$(( $sum + currentval ))
fi
Done!
Or, um, not. The problem with the conditional code above is that in the situation where you've referenced both the current and next value, you need to ensure that the next value isn't again processed the next time through the loop.
In other words, when the sequence "CM" is converted, the M shouldn't be converted yet again the second time through the loop.
This is precisely why I stepped away from the for loop, so you can have some passes through the loop be a +1 iteration but others be a +2 iteration as appropriate.
With that in mind, let's add the necessary line to the conditional statement:

if [ $nextval -gt $currentval ] ; then
  sum=$(( $sum + $nextval - $currentval ))
  index=$(( $index + 1 ))
else
  sum=$(( $sum + currentval ))
fi

Remember that the very bottom of the while loop still has the index value increment +1. The above addition to the conditional statement is basically that when the situation of next > current is encountered, the script will process both values and jump ahead an extra character.
This means that for any given Roman numeral, the number of times through the loop will be equal to or less than the total number of characters in the sequence.
Which means the problem is now solved except for the very last value in the sequence. What happens if it isn't part of a next-current pair? At its most simple, how do you parse "X"?
That turns out to require a bunch of code to sidestep both the conversion of nextval from the string (which will fail as it's reaching beyond the length of the string) and any test reference to nextval.
That suggests a simple solution: wrap the entire if-then-else code block in a conditional that tests for the last character:

if [ $index -lt $length ] ; then
  if-then-else code block
else
  sum=$(( $sum + $currentval ))
fi

That's it. By George, I think you've got it! Here's the full while statement, so you can see how this fits into the overall program logic:

while [ $index -le $length ] ; do

  mapit ${romanvalue:index-1:1}
  currentval=$value

  if [ $index -lt $length ] ; then
    mapit ${romanvalue:index:1}
    nextval=$value

    if [ $nextval -gt $currentval ] ; then
      sum=$(( $sum + $nextval - $currentval ))
      index=$(( $index + 1 ))
    else
      sum=$(( $sum + $currentval ))
    fi
  else
    sum=$(( $sum + $currentval ))
  fi

  index=$(( $index + 1 ))

done

It turns out not to be particularly complex after all. The key is to recognize that you need to parse the Roman number in a rather clumped manner, not letter by letter.
Let's give this script a few quick tests:

$ sh roman.sh DXXV
Roman number DXXV converts to Arabic value 525
$ sh roman.sh CMXCIX
Roman number CMXCIX converts to Arabic value 999
$ sh roman.sh MCMLXXII
Roman number MCMLXXII converts to Arabic value 1972

Mission accomplished.

Subscribe to this Blog via Email :