For problems designated FSM/DFA only (problem numbers in the 9000
range), programs written in normal programming langauges will
be rejected. All submissions must be specifications of finite
state machines that perform the required task. FSM specifications
are interpreted, not compiled in the normal way. The source code
for the FSM interpreter used may be down-loaded
here. This is not standard software; the only way to test FSM
specifications before submitting them is by running this interpreter.
The remainder of this file explains the use of the interpreter
and the format of an FSM specification.
fsm.
The FSM specification should be stored in a perfectly normal
human-readable text file. If your FSM specification is in
a file called p9001.fsm, then the simple way to run
it is "fsm p9001". (if the filename ends with .fsm
the .fsm may be elided). The interpreter first checks that the
FSM specification is valid and deterministic, then executes
it, using the stream of characters from standard input as the
input to the machine. If the option -list is specified
(e.g. "fsm -list p9001") the FSM rules are
printed as they are read (but in a different format) for
checking that it is understood correctly.
If the option -trace is specified
(e.g. "fsm -trace p9001") each transition
of the FSM is printed as it happens.
Input to a Running FSM
Once the FSM is accepted as syntactically correct and deterministic,
it immediately starts running, taking its input from standard input.
Each individual character is used as an input to drive a single
transition of the FSM, with no exceptions. If the FSM is in a state
which can not accept the input character, the result is a run-time
error UNLESS that input character is EOF, the End-Of-input-File
marker. An FSM may ignore the EOF symbol without penalty. After the
EOF symbol has been read or ignored, the FSM is stopped (this is
unlike the behaviour of EOF in standard C input functions).
FSMs for problems that provide a single line of input may safely
assume that the single line will be properly terminated with a
newline character which is itself immediately followed by EOF, so
such programs may safely stop on receiving the input \n
FSMs for problems that provide mutliple lines of input may safely
assume that the input follows the format specified by the problem
statement. Every line will end with a \n character, and
the last input line's \n will be immediately followed
by the EOF mark. No stray characters are allowed into the input
data.
FSM Specifications
Example Problem: The input will be a single line (terminated by \n)
containing nothing but ones and zeros. If the number of
ones in the line is even but not zero, output 'YES';
otherwise output 'NO'. The output should be one properly \n-terminated
line of text.
An FSM solving this problem would have three main states, one
for when no ones have been read, one for when an odd number of
ones have been read, and one for when a non-zero even number of
ones have been read. Let's call those states SZ, SO, and SE respectively.
The transitions for this FSM would be very simple. In state SZ, an
incoming 0 makes no difference, and an incoming one makes the number of
ones odd, so it should transition to state SO. Similarly, in SO,
and zero makes no difference, but a one makes the number even,
so SE is the new state. In state SE, an incoming one makes the
number odd again, so SO is returned to. In any of these three states,
it is possible to receive not just ones and zeros, but also the
end-of-line character \n. If \n is received in state
SE, the machine should stop after outputting 'YES'; from either of the
other two states, it should stop after outputting 'NO'.
Programs that only output either YES or NO present an especially
simple case. Any state(s) of an FSM may be designated as "Accepting
States". If the machine stops while it is in an accepting state,
it will print the string 'YES' followed by a newline character
before stopping. If the FSM has any accepting states, but is not
in one of them when it stops, it prints 'NO' before stopping.
(Only if an FSM has no designated accepting states will it stop
silently in the way programs normally do.
So, to make the FSM correctly output YES or NO, another state is
required: when a newline character is received in the SE state,
the machine should transition into an accepting state. When a
newline character is received in the SZ or SO states, something
must be done (not specifying what should be done when a particular
character arrives causes a run-time error if that character
does arrive); while it is not strictly necessary to create a fifth
state to represent non-accepting termination, it does at least
make the design simpler.
The design of the FSM then has five states: SZ, SO, SE, GOOD, and BAD.
From SZ the transitions for each possible input are: 0->SZ, 1->SO,
\n->BAD. From SO the transitions are 0->SO, 1->SE, \n->BAD. From
SE the transitions are 0->SE, 1->SO, \n->GOOD. From BAD and GOOD
there are no transitions, as no further input will arrive, but
it is necessary to state that GOOD is an accepting state, and that
SZ is the starting state.
The FSM specification for this machine, exactly as submitted to
the automatic judge, is:
// id=1111 code=CODE lang=FSM prob=9000
START=SZ
SZ: 0 -> SZ
1 -> SO
\n -> BAD
SO: 0 -> SO
1 -> SE
\n -> BAD
SE: 0 -> SE
1 -> SO
\n -> GOOD
GOOD(OK):
The general rules are:
- Blank lines are permitted and are ignored.
- The two character sequence // introduces a comment;
the rest of the line is ignored.
- States may be given any names that consist of a squence of
letters, digits, dollar-signs, underlines, and dots. Case is
not relevant in distinguishing states.
- An FSM specification may contain one line saying START=state.
If such a line is present, the FSM begins execution in the indicated
state. Otherwise the start state is the first state mentioned in
the specification.
- The transitions for each state are started with the name
of that state followed by a colon. This sequence may be alone
on a line, or it may appear at the beginning of one of the
transitions.
- The transitions for each state do not need to be grouped together,
there may be more than one name: entry for
each state. All transitions following name: are
taken as transitions for the named state and added to any others
already provided.
- Between the name of a state and its colon, the word OK
in parentheses may appear (e.g. LAST(OK):). This deisgnates
the named state as an accepting state. There may be any number
(including zero) of accepting states. A state only needs to
be marked (OK) once, not every time it appears.
- There may be any number (including zero) of transitions
following name: or name(OK):.
- A transition consists of the input character that triggers it,
followed by an arrow ->, followed by the name of the
resulting state.
- Any number of input characters may used to trigger a transition.
If there is more than one, each should be separated from the others
by spaces or tabs. For example, the one transition
a b x -> S2
is equivalent to this sequence of three transitions
a -> S2
b -> S2
x -> S2
- A contiguous range of characters (for example c d e f g h)
may be represented by writing the first and last separated by a dash
(for example c-h). This notation may only be used when both
given characters are visible (ascii 33 to 126) and in the right order
(e.g. not h-c).
- Ranges and individual characters may be combined in a single
transition, for example
a-z A-Z 0-9 -> S2
defines a transition that is triggered by any letter or digit.
- When used as an input character, the star * means
"any character not explicitly listed in any other transition
from this state".
- Transitions must be deterministically selectable, so no two
transitions from the same state may include the same input character,
and no two transitions from the same state may include a star.
- When used as an input character
- \s represents a space,
- \t represents a tab,
- \n represents a newline,
- \* represents a star (i.e. Not "any other character"),
- \\ represents the character \,
- \abc (where abc are octal digits) represents the
character with ascii code abc8,
- EOF represents the End-Of-Input marker.
- After the -> and the new state name, an optional single
output character may be specified. Whenever the transition occurs, that
character is printed as output. The same special \x character
sequences are used for output as for input. EOF may not be used for output.
- The character * when used to specify output, represents
whatever input character caused the transition. For example
a-z A-Z -> S2 *
means that any letter causes a transition to state S2, at the same time
printing that same letter.
- In a transition, the word none may appear instead of any
input characters. This means that as soon as the FSM enters this state,
this transition will occur, without consuming any input. The only practical
use of none is to enable more than one character of output
to be produced for a single character of input.
For example,
S0: none -> S1 C
S1: none -> S2 a
S2: none -> S3 t
S3: none -> S4 \n
is a program that prints the line "Cat" without requiring any input
(although the interpreter still does not terminate until the
end-of-file character is received (type ctrl-D under unix)).
- Transitions must be deterministically selectable, so if any
state has a none transition, it may not have any other
transitions at all.
- When an FSM is running, all possible none transitions
are performed before any input character is read. This rule is only
relevant when the end of input is reached (EOF is considered to
be an input character, so the FSM does not stop at EOF if there
is a none transition that could be used).
Another Example
This is an FSM that reads a line of input, printing 'Y' if it
contains the letters C, A, T in that order, but not necessarily
consecutively, and 'N' otherwise:
wantC: C -> wantA
\n -> end N
* -> wantC
wantA: A -> wantT
\n -> end N
* -> wantA
wantT: T -> happy
\n -> end N
* -> wantT
happy: \n -> end Y
* -> happy
end: none -> dead \n
Note that the state dead is left undefined because if
the input is correct (just a single line with the \n
followed by EOF), the FSM will stop as soon as it reaches
the dead state, so its behaviour is irrelevant.
This example may easily be extended into a program that processes
a sequence of lines, for each printing Y if the letters C, A, T
appear in order, and N if they don't, stopping only at the end of the
input file.
wantC: C -> wantA
\n -> end N
* -> wantC
wantA: A -> wantT
\n -> end N
* -> wantA
wantT: T -> happy
\n -> end N
* -> wantT
happy: \n -> end Y
* -> happy
end: none -> again \n
again: EOF -> dead
C -> wantA
\n -> end N
* -> wantC