Mutable Algorithms in Immutable Languages, Part 2
13 Jul 2014See also part 1. All of the code in this post is available in the blog repository on Github.
Last time we saw a way to implement Union/Find, an algorithm which
depends critically on observable mutable memory, within a particular
abstract monad called Mem. Monads implementing Mem model mutable
memory (say that 10 times fast) and we saw that’s sufficient
to recover Union/Find.
But we didn’t actually see any implementations of Mem. So maybe
we’re hosed. Are there any interesting implementations of Mem we can
run Union/Find in?
In case of emergency break ivory
Okay, I’m being dramatic. There is, of course, at least one such
model. If we’re willing to infect our program with IO then we can
always use IORefs to model mutable memory. It’s almost too simple,
but we’ll do it anyway for completeness and to explain some nits
involved in working with the Mem class.
Essentially, we’d like to have, more or less, the following correspondences
refisnewIORefderefisreadIORefsetiswriteIORef
Hopefully, this also clears up why I called pointers “refs” back in part 1.
Diving in
The code for this section is available here.
So, we could do exactly what was just suggested and implement IO
into Mem (almost) directly. We’ll need a newtype wrapper around
Ref IO but it’s pretty trivial.
instance Mem IO where
newtype Ref IO = IORef' { unwrapIORef' :: IORef Int }
type Val IO = Int
ref = fmap IORef . newIORef
deref = readIORef . unwrapIORef'
set r v = writeIORef (unwrapIORef' r) v
Hopefully the major problem with this approach is obvious—Mem
restricts us to only store one type of thing in memory and IORefs
are more general than that. I had to pretty arbitrarily pick Ints to
store for this interface.
So this clarifies a point about using Mem. It’s always going to be
an internal, hidden class where instances, probably created using
newtype wrappers, are specific to both a particular use and a
particular algorithm.
(Well, actually, we ought to be able to get around this by using GADTs and existential types, but I’ll leave that for another day.)
We’ll thus make an entirely new wrapper called, say, UfIO for
Union/Find in IO and specialize the value to be wrapped in the
Node_ layer as needed according to the internal AbstractUnionFind
interface.
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
{-# LANGUAGE TypeFamilies #-}
newtype UfIO v a =
UfIO { runUfIO :: IO a }
deriving ( Functor, Applicative, Monad )
instance Mem (UfIO v) where
newtype Ref (UfIO v) =
UfIORef { getUfIORef :: IORef (Node_ (UfIO v) v) } deriving ( Eq )
type Val (UfIO v) = Node_ (UfIO v) v
ref a = UfIO (UfIORef <$> newIORef a)
deref r = UfIO (readIORef $ getUfIORef r)
set r v = UfIO (writeIORef (getUfIORef r) v)
Just a little bit more code and some noise as we wrap and unwrap all
of the newtypes (but note that all of that noise is free at
runtime).
Using UfIO
Now that we have something which implements Mem with Val r ~ Node_
r v we can use it to run our Union/Find algorithm. All we need to do
is construct a computation using the Union/Find interface and then run
it with runUfIO.
exIO = runUfIO computation where
computation :: UF r Int => r Bool
computation = do
n1 <- node 1
n2 <- node 2
link n1 n2
connected n1 n2
>>> :t exIO
exIO :: IO Bool
>>> exIO
True
Phantom parameters
It’s worth spending a few seconds focusing one particular trick used
above. In order to leave the type stored in the Union/Find graph
unspecified we’ve used a phantom type parameter on the type of
UfIO, the v.
The parameter v is called “phantom” because it is not used in the
implementation of UfIO at all. Instead, the information carried by
v is only used in the instantiation of Mem.
This phantom parameter is almost entirely what allows us to avoid the
problem we encountered above with the Mem instance of IO. The only
way for the associated type Val (UfIO v) to change is if we use
information about some extra parameter like v.
Phantom parameters will return in this series. Generally, their ability to constrain values and subsequently expose more information about them is a critical trick.
Recovering purity
The code for this section is available here.
Whew! With all that IO out of the way we can get back to solving
real fake problems like how to emulate mutable memory.
We’ve seen that what we need to do is establish the ability to store
objects of type Val r and produce references, Ref r, to them which
can be read from or written to. Without beating around the bush too
much, this behavior is very similar to that of storing and accessing
values from a finite Map. Better, if we just let our Refs be
integer identifiers we can use the efficient Data.IntMap.IntMap
from the containers package
(There’s a question as to whether or not we should use the lazy form of IntMap, but I’m going to ignore it.)
If we wrap one of those maps up into a State monad then monadic
operations which edit the finite map by using integer references is
basically what Mem asks for. Better yet, we can just “snapshot” our
memory at any point (using get) to escape back into purity.
Sounds perfect. Let’s build it!
Implementing purely mutable memory
As described, we’re going to implement a Monad which is not a whole
lot more than a State monad carrying around our IntMap
representing memory.
newtype UfIntMap v a =
UfIntMap { unUfIntMap :: State (Uf v) a }
deriving ( Functor, Applicative, Monad )
and as long as we can generate some initial state (uf0) then we can
“run” a UfIntMap computation purely, shedding its monadic layer.
runUfIntMap :: UfIntMap v a -> a
runUfIntMap = flip evalState uf0 . unUfIntMap where
That state type Uf v isn’t exactly what I may have led you to
believe it would be. Instead of only holding an IntMap we must also
keep a “source” of IDs to ensure we don’t collide.
data Uf v =
Uf { count :: Int
, mem :: IntMap (Node_ (UfIntMap v) v)
}
uf0 :: Uf v
uf0 = Uf { count = 0, mem = IM.empty }
But that’s really just a small technical detail. All that’s left now
is to implement the Mem instance. We use the same phantom variable
trick from UfIO v, but now references are just wrappers over Ints
in our map.
instance Mem (UfIntMap v) where
newtype Ref (UfIntMap v) = UfIntMapRef { getId :: Int } deriving ( Eq )
type Val (UfIntMap v) = Node_ (UfIntMap v) v
The methods of pure Mem
We’re close now. We can implement set very easily atop monadic
operations in the State monad and IntMap operations using our
Int references.
set r v = UfIntMap $ do
modify (\s -> s { mem = IM.insert (getId r) v (mem s) })
Implementing ref is pretty easy as well, although we must be careful
to update the count stored in Uf to ensure that we never cause
collisions.
ref v = UfIntMap $ do
c <- gets count
modify (\s -> s { count = c + 1, mem = IM.insert c v (mem s) })
return (UfIntMapRef c)
Finally, we implement deref which is pretty simple as well.
deref r = UfIntMap $ do
Just v <- gets (IM.lookup (getId r) . mem) -- WHOA!
return v
Right?
Well, it’s a little scary that we have an incomplete pattern match
(marked by WHOA above). But we never delete anything from the
IntMap so our references should always be valid. Our sanity test
passes after all.
exPure = runUfIntMap computation where
computation :: UF r Int => r Bool
computation = do
n1 <- node 1
n2 <- node 2
link n1 n2
connected n1 n2
>>> :t exPure
exPure :: Bool
>>> exPure
True
So this is fine, right?
Well, no, otherwise I wouldn’t be harping on it.
A subtle bug
Here’s how we can use our Union/Find API to expose an error. If you’re not familiar with Haskell errors are bad because you usually expect all failure conditions to be reified in types and values.
bug :: Bool
bug =
let
n1 = runUfIntMap $ do
node ()
node ()
node ()
conn = runUfIntMap $ do
n2 <- node ()
connected n1 n2
in
conn
>>> bug
*** Exception: Pattern match failure in do expression at /.../UnionFind/IntMap.hs:43:5-10
We can do even worse, too. Here’s the same error bug triggered such that it just silently, unpredictably provides the wrong answer without even throwing an error.
bug2 :: Bool
bug2 =
let
n1 = runUfIntMap (node ())
conn = runUfIntMap $ do
n2 <- node ()
connected n1 n2 -- this can't possibly be True can it?
in
conn
>>> bug2
True
Ugh.
Border control problems
What we’ve got is a border control problem. Because running our
UfIntMap computation can return values like Ref (UfIntMap v) we’re
able to produce IDs on Vals outside of the context where they make
any sense.
In the first bug described we observe this problem when trying to
reference an id in a context where nothing has yet been storeed at
that location. In the second bug2 we create a reference which,
despite being no longer valid, collides with a value in a different
context.
What this ought to tell you is that (a) there’s a concept of a region
of validity that’s defined by “run” calls like runUfIntMap and (b)
it’s dangerous to export references from that region since they can
then contaminate other regions.
This isn’t really a problem in languages where mutability is allowed to happen everywhere. In some sense they only have a single region—the entire runtime. The only way to “execute” this region is to execute the problem itself.
Can we do better?
So we’ve come quite far now. We have two separate implementations of
Mem and both appear to work correctly for Union/Find. One of them,
based on IO, is impure and so using the results of Union/Find
generated with that algorithm will pollute code with IO. The other,
based on a State monad containing an IntMap modeling mutable
memory, can be executed purely… but when we do we run into the
opportunity for subtle bugs to arise.
In some languages you might just have to shrug your shoulders and write a note in the documentation.
Never run a Union/Find computation which returns a
Nodeas those nodes will no longer be meaningful and can be used to violate preconditions in other Union/Find computations.
But we’re fighting this whole problem in order to preserve the strength of the type sytem. Can we make the type system pay its dues?
Turns out that, yes, we can. And will.