Abstractions of FSTs with Monads
Finite state machines (FSTs) are usually formulated as a tuple comprising of the following components:
- : The set of states;
- : The input alphabet;
- : The output alphabet;
- : The set of initial states;
- : The set of final states;
- : The transition relation.
We encode this in Scala:
trait Transducer[-A, +B] {
type S
def initialStates: Set[S]
def finalStates: Set[S]
def next(s: S, a: A): Set[(S, B)]
}
Here we encode the state type as an existential type: they are internal and of no interest to the users of FSTs. Note that the output of the next
function is a Set[(S, B)]
: it encodes multiple possible transition outcomes given a state and input .
One of the most interesting computational aspects of FSTs is that they can be composed (unlike RNNs). Mathematically, given FST and , we can compose these two into one FST , where strings is accepted only if there exists such that .
From a functional programming perspective, FSTs form a category here!
implicit object TransducerCategory extends Category[Transducer] {
def id[A]: Transducer[Id, Id] =
new Transducer[Id, Id] {
type S = Unit
def initialStates = Set(())
def finalStates = Set(())
def next(s: S, a: A): Set((s, a))
}
def compose[A, B, C](
t2: Transducer[B, C],
t1: Transducer[A, B]
): Transducer[A, C] =
new Transducer[A, C] {
type S = (t1.S, t2.S)
def initialStates = for {
s1 <- t1.initialStates
s2 <- t2.initialStates
} yield (s1, s2)
def finalStates = for {
s1 <- t1.finalStates
s2 <- t2.finalStates
} yield (s1, s2)
def next(s: S, a: A): (S, C) = {
val (s1, s2) = s
for {
(s1_, b) <- t1.next(s1, a)
(s2_, c) <- t2.next(s2, b)
} yield ((s1_, s2_), c)
}
}
}
However, the output encoding of the next
function, Set[(S, B)]
is not ideal: what if each output is assigned a weight (weighted FSTs)? What if the output given a specific state and input form a distribution (stochastic FSTs)?
Note that the next
function in compose
is implemented elegantly through a monadic comprehension. Let’s abstract this monad out:
trait Transducer[F[_], -A, +B] {
type S
def initialStates: F[S]
def finalStates: F[S]
def next(s: S, a: A): F[(S, B)]
}
We call this F
-transducer: the F
higher-kinded type encapsulates the effect of moving the transducer state to another given an input! F
could be
Id
: Deterministic FSTs;Option
: Determinstic FSTs with some edges lead to failure case;Set
: Non-deterministic FSTs;Stochastic
: Stochastic FSTs;Weighted
: Weighted FSTs.
Let’s take the weighted case as an example, since weighted FSTs (WFSTs) are of utmost importance in a variety of domains, e.g. automatic speech recognition.
\[ W = \{ (a, r) \}; a \in A, r \in R \]
type Weighted[A, R] = Map[A, R]
Weighted
form a monad given a semiring on the weight type R
(this is how WFSTs work in speech recognition): \[ \begin{align} \textrm{pure}(a) &= \left\{ (a, \mathbb{1}) \right\} \\\
\textrm{map}(W, f) &= \left\{ (f(a), r) \mid (a, r) \in W \right\} \\\
\textrm{flatMap}(W, f) &= \left\{ \left(b, \bigoplus_{(a,r)\in W} \bigoplus_{(b, r^\prime) \in f(a)} r \odot r^\prime \right) \middle| b \in B \right\} \end{align} \]
implicit def WeightedMonad(implicit R: Semiring[R])
extends Monad[Weighted[_, R]] {
// let's pretend Scala has type lambda syntax here
def pure[A](a: A) = Map((a, R.one))
def map[A, B](wa: Map[A, R])(f: A => B) =
wa.map { case (a, r) => (f(a), r) }.toMap
def flatMap[A, B](wa: Map[A, R])(f: A => Map[B, R]) =
(
for {
(a, r0) <- wa
(b, r1) <- f(a)
} yield (b, R.times(r0, r1))
).groupBy {
case (b, r) => b
}.mapValues(_.map(_._2).sum)
}
Hence we have the Weighted
-Transducer, therefore we have a category over WFSTs. Essentially, we have the following implicit proving relations:
Semiring[R] => Monad[Weighted[_, R]]
Monad[F] => Category[Transducer[F, _, _]]
I think that this is really illuminating, as it shows the structure of FSTs through the lens of abstracted types.