26 Jun 2017
This is a debugging story told completely out of order. In order to understand the ultimate bug, why it seemed to occur arbitrarily, and the ultimate resolution, there's lots of backstory to cover. If you're already deeply familiar with the inner workings of the monad-control package, you can probably look at a demonstration of the bad instance and move on. Otherwise, prepare for a fun ride!
As usual, if you want to play along, we're going to be using Stack's
script interpreter feature. Just
save the snippets contents to a file and run with stack
filename.hs
. (It works with any snippet that begins with
#!/usr/bin/env stack
.)
Oh, and also: the confusion that this blog post demonstrates is one of the reasons why I strongly recommend sticking to a ReaderT env IO
monad transformer stack.
Trying in StateT
Let's start with some broken code (my favorite kind). It uses the
StateT
transformer and a function which may throw a runtime
exception.
#!/usr/bin/env stack -- stack --resolver lts-8.12 script import Control.Monad.State.Strict import Control.Exception import Data.Typeable data OddException = OddException !Int -- great name :) deriving (Show, Typeable) instance Exception OddException mayThrow :: StateT Int IO Int mayThrow = do x <- get if odd x then lift $ throwIO $ OddException x else do put $! x + 1 return $ x `div` 2 main :: IO () main = runStateT (replicateM 2 mayThrow) 0 >>= print
Our problem is that we'd like to be able to recover from a thrown
exception. Easy enough we think, we'll just use
Control.Exception.try
to attempt to run the mayThrow
action. Unfortunately, if I wrap up mayThrow
with a try
, I get
this highly informative error message:
Main.hs:21:19: error:
• Couldn't match type ‘IO’ with ‘StateT Integer IO’
Expected type: StateT Integer IO ()
Actual type: IO ()
• In the first argument of ‘runStateT’, namely
‘(replicateM 2 (try mayThrow))’
In the first argument of ‘(>>=)’, namely
‘runStateT (replicateM 2 (try mayThrow)) 0’
In the expression:
runStateT (replicateM 2 (try mayThrow)) 0 >>= print
Oh, that makes sense: try
is specialized to IO
, and our function
is StateT Int IO
. Our first instinct is probably to keep throwing
lift
calls into our program until it compiles, since lift
seems to
always fix monad transformer compilation errors. However, try as you
might, you'll never succeed. To understand why, let's look at the
(slightly specialized) type signature for try
:
try :: IO a -> IO (Either OddException a)
If I apply lift
to this, I could end up with:
try :: IO a -> StateT Int IO (Either OddException a)
But there's no way to use lift
to modify the type of the IO a
input. This is generally the case with the lift
and liftIO
functions: they can deal with monad values that are the output of a
function, but not the input to the function. (More precisely: the
functions are covariant and work on values in positive positions. We'd
need something contravariant to work on vlaues in negative
positions. You can
read more on this nomenclature
in another blog post.)
Huh, I guess we're stuck. But then I remember that StateT
is just
defined as newtype StateT s m a = StateT { runStateT :: s -> m
(a,s)}
. So maybe I can write a version of try
that works for a
StateT
using the internals of the type.
tryStateT :: StateT Int IO a -> StateT Int IO (Either OddException a) tryStateT (StateT f) = StateT $ \s0 -> do eres <- try (f s0) return $ case eres of Left e -> (Left e, s0) Right (a, s1) -> (Right a, s1)
Go ahead and plug that into our previous example, and you should get the desired output:
([Right 0,Left (OddException 1)],1)
Let's break down in nauseating detail what that tryStateT
function
did:
- Unwrap the
StateT
data constructor from the provided action to get a functionf :: Int -> IO (a, Int)
- Construct a new
StateT
value on the right hand side by using theStateT
data constructor, and capturing the initial state in the values0 :: Int
. - Pass
s0
tof
to get an actionIO :: (a, Int)
, which will give the result and the new, updated state. - Wrap
f s0
withtry
to allow us to detect and recover from a runtime exception. eres
has typeEither OddException (a, Int)
, and we pattern match on it.- If we receive a
Right
/success value, we simply wrap up thea
value in aRight
constructor together with the updated state. - If we receive a
Left
/exception value, we wrap it up the exception with aLeft
. However, we need to return some new state. Since we have no such state available to us from the action, we return the only thing we can: the initials0
state value.
Lesson learned We can use try
in a StateT
with some
difficulty, but we need to be aware of what happens to our monadic
state.
Catching in StateT
It turns out that it's trivial to implement the try
function in
terms of catch
, and the catch
function in terms of try
, at least
when sticking to the IO
-specialized versions:
try' :: Exception e => IO a -> IO (Either e a) try' action = (Right <$> action) `catch` (return . Left) catch' :: Exception e => IO a -> (e -> IO a) -> IO a catch' action onExc = do eres <- try action case eres of Left e -> onExc e Right a -> return a
It turns out that by just changing the type signatures and replacing
try
with tryStateT
, we can do the same thing for StateT
:
catchStateT :: Exception e => StateT Int IO a -> (e -> StateT Int IO a) -> StateT Int IO a catchStateT action onExc = do eres <- tryStateT action case eres of Left e -> onExc e Right a -> return a
NOTE Pay close attention to that type signature, and think about how monadic state is being shuttled through this function.
Well, if we can implement catchStateT
in terms of tryStateT
,
surely we can implement it directly as well. Let's do the most
straightforward thing I can think of (or at least the thing that
continues my narrative here):
catchStateT :: Exception e => StateT Int IO a -> (e -> IO a) -> StateT Int IO a catchStateT (StateT action) onExc = StateT $ \s0 -> action s0 `catch` \e -> do a <- onExc e return (a, s0)
Here, we're basing our implementation on top of the catch
function
instead of the try
function. We do the same unwrap-the-StateT,
capture-the-s0 trick we did before. Now, in the lambda we've created
for the catch
call, we pass the e
exception value to the
user-supplied onExc
function, and then like tryStateT
wrap up the
result in a tuple with the initial s0
.
Who noticed the difference in type signature? Instead of e -> StateT
Int IO a
, our onExc
handler has type e -> IO a
. I told you to pay
attention to how the monadic states were being shuttled around; let's
analyze it:
- In the first function, we use
tryStateT
, which as we mentioned will reconstitute the originals0
state when it returns. If the action succeeded, nothing else happens. But in the exception case, that originals0
is now passed into theonExc
function, and the final monadic state returned will be the result of theonExc
function. - In the second function, we never give the
onExc
function a chance to play with monadic state, since it just lives inIO
. So we always return the original state at the end if an exception occurred.
Which behavior is best? I think most people would argue that the first
function is better: it's more general in allowing onExc
to access
and modify the monadic state, and there's not really any chance for
confusion. Fair enough, I'll buy that argument (that I just made on
behalf of all of my readers).
Bonus exercise Modify this implementation of catchStateT
to have
the same type signature as the original one.
Finally
This is fun, let's keep reimplementing functions from
Control.Exception
! This time, let's do finally
, which will ensure
that some action (usually a cleanup action) is run after an initial
action, regardless of whether an exception was thrown.
finallyStateT :: StateT Int IO a -> IO b -> StateT Int IO a finallyStateT (StateT action) cleanup = StateT $ \s0 -> action s0 `finally` cleanup
That was really easy. Ehh, but one problem: look at that type
signature! We just agreed (or I agreed for you) that in the case of
catch
, it was better to have the second argument also live in
StateT Int IO
. Here, our argument lives in IO
. Let's fix that:
finallyStateT :: StateT Int IO a -> StateT Int IO b -> StateT Int IO a finallyStateT (StateT action) (StateT cleanup) = StateT $ \s0 -> action s0 `finally` cleanup s0
Huh, also pretty simple. Let's analyze the monadic state behavior
here: our cleanup action is given the initial state, regardless of the
result of action s0
. That means that, even if the action succeeded,
we'll ignore the updated state. Furthermore, because finally
ignores
the result of the second argument, we will ignore any updated monadic
state. Want to see what I mean? Try this out:
#!/usr/bin/env stack -- stack --resolver lts-8.12 script import Control.Exception import Control.Monad.State.Strict finallyStateT :: StateT Int IO a -> StateT Int IO b -> StateT Int IO a finallyStateT (StateT action) (StateT cleanup) = StateT $ \s0 -> action s0 `finally` cleanup s0 action :: StateT Int IO () action = modify (+ 1) cleanup :: StateT Int IO () cleanup = do get >>= lift . print modify (+ 2) main :: IO () main = execStateT (action `finallyStateT` cleanup) 0 >>= print
You may expect the output of this to be the numbers 1 and 3, but in
fact the output is 0 and 1: cleanup
looks at the initial state value
of 0, and its + 2
modification is thrown away. So can we implement a
version of our function that keeps the state? Sure (slightly
simplified to avoid async exception/mask noise):
finallyStateT :: StateT Int IO a -> StateT Int IO b -> StateT Int IO a finallyStateT (StateT action) (StateT cleanup) = StateT $ \s0 -> do (a, s1) <- action s0 `onException` cleanup s0 (_b, s2) <- cleanup s1 return (a, s2)
This has the expected output of 1 and 3. Looking at how it works: we
follow our same tricks, and pass in s0
to action
. If an exception
is thrown there, we once again pass in s0
to cleanup
and ignore
its updated state (since we have no choice). However, in the success
case, we now pass in the updated state (s1
) to cleanup
. And
finally, our resulting state is the result of cleanup
(s2
) instead
of the s1
produced by action
.
We have three different implementations of finallyStateT
and two
different type signatures. Let's compare them:
- The first one (the
IO
version) has the advantage that its type tells us exactly what's happening: the cleanup has no access to the state at all. However, you can argue like we did withcatchStateT
that this is limiting and not what people would expect the type signature to be. - The second one (use the initial state for
cleanup
and then throw away its modified state) has the advantage that it's logically consistent: whethercleanup
is called from a success or exception code path, it does the exact same thing. On the other hand, you can argue that it is surprising behavior that state updates that can be preserved are being thrown away. - The third one (keep the state) has the reversed arguments of the second one.
So unlike catchStateT
, I would argue that there's not nearly as
clear a winner with finallyStateT
. Each approach has its relative
merits.
One final point that seems almost not worth mentioning (hint: epic
foreshadowment incoming). The first version (IO
specialized) has an
additional benefit of being ever-so-slightly more efficient than the
other two, since it doesn't need to deal with the additional monadic
state in cleanup
. With a simple monad transformer like StateT
this
performance difference is hardly even worth thinking about. However,
if we were in a tight inner loop, and our monad stack was
significantly more complicated, you could imagine a case where the
performance difference was significant.
Implementing for other transformers
It's great that we understand StateT
so well, but can we do anything
for other transformers? It turns out that, yes, we can for many
transformers. (An exception is continuation-based transformers, which
you can read a bit about in passing in
my ResourceT blog post from last week.)
Let's look at a few other examples of finally
:
import Control.Exception import Control.Monad.Writer import Control.Monad.Reader import Control.Monad.Except import Data.Monoid finallyWriterT :: Monoid w => WriterT w IO a -> WriterT w IO b -> WriterT w IO a finallyWriterT (WriterT action) (WriterT cleanup) = WriterT $ do (a, w1) <- action `onException` cleanup (_b, w2) <- cleanup return (a, w1 <> w2) finallyReaderT :: ReaderT r IO a -> ReaderT r IO b -> ReaderT r IO a finallyReaderT (ReaderT action) (ReaderT cleanup) = ReaderT $ \r -> do a <- action r `onException` cleanup r _b <- cleanup r return a finallyExceptT :: ExceptT e IO a -> ExceptT e IO b -> ExceptT e IO a finallyExceptT (ExceptT action) (ExceptT cleanup) = ExceptT $ do ea <- action `onException` cleanup eb <- cleanup return $ case (ea, eb) of (Left e, _) -> Left e (Right _a, Left e) -> Left e (Right a, Right _b) -> Right a
The WriterT
case is very similar to the StateT
case, except (1)
there's no initial state s0
to contend with, and (2) instead of
receiving an updated s2
state from cleanup
, we need to monoidally
combine the w1
and w2
values. The ReaderT
case is also very
similar to StateT
, but in the opposite way: we receive an immutable
environment r
which is passed into all functions, but there is no
updated state. To put this in other words: WriterT
has no context
but has mutable monadic state, whereas ReaderT
has a context but
no mutable monadic state. StateT
, by contrast, has both. (This is
important to understand, so reread it a few times to get comfortable
with the concept.)
The ExceptT
case is interesting: it has no context (like WriterT
),
but it does have mutable monadic state, just not like StateT
and
WriterT
. Instead of returning an extra value with each result (as a
product), ExceptT
returns either a result value or an e
value (as
a sum). The case
expression at the end of finallyExceptT
is very
informative: we need to figure out how to combine the various monadic
states together. Our implementation here says that if action
returns
e
, we take that result. Otherwise, if cleanup
fails, we take
that value. And if they both return Right
values, then we use
action
's result. But there are at least two other valid choices:
- Prefer
cleanup
'se
value toaction
'se
value, if both are available. - Completely ignore the
e
value returned bycleanup
, and just useaction
's result.
There's also a fourth, invalid option: if action
returns a Left
,
return that immediately and don't call cleanup
. This has been a
perenniel source of bugs in many libraries dealing with exceptions in
monad transformers like ErrorT
, ExceptT
, and EitherT
. This
invalidates the contract of finally
, namely that cleanup
will
always be run. I've seen some arguments for why this can make sense,
but I consider it nothing more than a buggy implementation.
And finally, like with StateT
, we could avoid all of these questions
for ExceptT
if we just modify our type signature to use IO b
for
cleanup
:
finallyExceptT :: ExceptT e IO a -> IO b -> ExceptT e IO a finallyExceptT (ExceptT action) cleanup = ExceptT $ do ea <- action `onException` cleanup _b <- cleanup return ea
So our takeaway: we can implement finally
for various monad
transformers. In some cases this leads to questions of semantics, just
like with StateT
. And all of these transformers fall into a pattern
of optionally capturing some initial context, and optionally shuttling
around some monadic state.
(And no, I haven't forgotten that the title of this blog post talks
about bracket
. We're getting there, ever so slowly. I hope I've
piqued your curiosity.)
Generalizing the pattern
It's wonderful that we can implement all of these functions that take
monad transformers as arguments. But do any of us actually want to go
off and implement catch
, try
, finally
, forkIO
, timeout
, and
a dozen other functions for every possible monad transformer stack
imagineable? I doubt it. So just as we have MonadTrans
and MonadIO
for dealing with transformers in output/positive position, we can
construct some kind of typeclass that handles the two concepts we
mentioned above: capture the context, and deal with the monadic state.
Let's start by playing with this for just StateT
.
#!/usr/bin/env stack -- stack --resolver lts-8.12 script {-# LANGUAGE RankNTypes #-} {-# LANGUAGE ScopedTypeVariables #-} import Control.Exception import Control.Monad.State.Strict type Run s = forall b. StateT s IO b -> IO (b, s) capture :: forall s a. (Run s -> IO a) -> StateT s IO a capture withRun = StateT $ \s0 -> do let run :: Run s run (StateT f) = f s0 a <- withRun run return (a, s0) restoreState :: (a, s) -> StateT s IO a restoreState stateAndResult = StateT $ \_s0 -> return stateAndResult finally1 :: StateT s IO a -> IO b -> StateT s IO a finally1 action cleanup = do x <- capture $ \run -> run action `finally` cleanup restoreState x finally2 :: StateT s IO a -> StateT s IO b -> StateT s IO a finally2 action cleanup = do x <- capture $ \run -> run action `finally` run cleanup restoreState x -- Not async exception safe! finally3 :: StateT s IO a -> StateT s IO b -> StateT s IO a finally3 action cleanup = do x <- capture $ \run -> run action `onException` run cleanup a <- restoreState x _b <- cleanup return a main :: IO () main = do flip evalStateT () $ lift (putStrLn "here1") `finally1` putStrLn "here2" flip evalStateT () $ lift (putStrLn "here3") `finally2` lift (putStrLn "here4") flip evalStateT () $ lift (putStrLn "here5") `finally2` lift (putStrLn "here6")
That's a lot, let's step through it slowly:
type Run s = forall b. StateT s IO b -> IO (b, s)
This is a helper type to make the following bit simpler. It represents
the concept of capturing the initial state in a general manner. Given
an action living in our transformer, it turns an action in our base
monad, returning the entire monadic state with the return value (i.e.,
(b, s)
instead of just b
). This allows use to define our capture
function:
capture :: forall s a. (Run s -> IO a) -> StateT s IO a capture withRun = StateT $ \s0 -> do let run :: Run s run (StateT f) = f s0 a <- withRun run return (a, s0)
This function says "you give me some function that needs to be able to
run monadic actions with the initial context, and I'll give it that
initial context running function (Run s
)." The implementation isn't
too bad: we just capture the s0
, create a run
function out of it,
pass that into the user-provided argument, and then return the result
with the original state.
Now we need some way to update the monadic state based on a result
value. We call it restoreState
:
restoreState :: (a, s) -> StateT s IO a restoreState stateAndResult = StateT $ \_s0 -> return stateAndResult
Pretty simple too: we ignore our original monadic state and replace it
with the state contained in the argument. Next we use these two
functions to implement three versions of finally
. The first two are
able to reuse the finally
from Control.Exception
. However, both of
them suffer from the inability to retain monadic state. Our third
implementation fixes that, at the cost of having to reimplement the
logic of finally
. And as my comment there mentions, our
implementation is not in fact async exception safe.
So all of our original trade-offs apply from our initial StateT
discussion, but now there's an additional downside to option 3: it's
significantly more complicated to implement correctly.
The MonadIOControl type class
Alright, we've established that it's possible to capture this idea for
StateT
. Let's generalize to a typeclass. We'll need three
components:
- A capture function. We'll call it
liftIOWith
, to match nomenclature in monad-control. - A restore function, which we'll call
restoreM
. - An associated type (type family) to represent what the monadic state for the given monad stack is.
We end up with:
type RunInIO m = forall b. m b -> IO (StM m b) class MonadIO m => MonadIOControl m where type StM m a liftIOWith :: (RunInIO m -> IO a) -> m a restoreM :: StM m a -> m a
Let's write an instance for IO
:
instance MonadIOControl IO where type StM IO a = a liftIOWith withRun = withRun id restoreM = return
The type StM IO a = a
says that, for an IO
action returning a
,
the full monadic state is just a
. In other words, there is no
additional monadic state hanging around. That's good, as we know that
there isn't. liftIOWith
is able to just use id
as the RunInIO
function, since you can run an IO
action in IO
directly. And
finally, since there is no monadic state to update, restoreM
just
wraps up the result value in IO
via return
. (More foreshadowment:
what this instance is supposed to look like is actually at the core of
the bug this blog post will eventually talk about.)
Alright, let's implement this instance for StateT s IO
:
instance MonadIOControl (StateT s IO) where type StM (StateT s IO) a = (a, s) liftIOWith withRun = StateT $ \s0 -> do a <- withRun $ \(StateT f) -> f s0 return (a, s0) restoreM stateAndResult = StateT $ \_s0 -> return stateAndResult
This is basically identical to the functions we defined above, so I
won't dwell on it here. But here's an interesting observation: the
same way we define MonadIO
instance as instance MonadIO m =>
MonadIO (StateT s m)
, it would be great to do the same thing for
MonadIOControl
. And, in fact, we can do just that!
instance MonadIOControl m => MonadIOControl (StateT s m) where type StM (StateT s m) a = StM m (a, s) liftIOWith withRun = StateT $ \s0 -> do a <- liftIOWith $ \run -> withRun $ \(StateT f) -> run $ f s0 return (a, s0) restoreM x = StateT $ \_s0 -> restoreM x
We use the underlying monad's liftIOWith
and restoreM
functions
within our own definitions, and thereby get context and state passed
up and down the stack as needed. Alright, let's go ahead and do this
for all of the transformers we've been discussing:
#!/usr/bin/env stack -- stack --resolver lts-8.12 script {-# LANGUAGE RankNTypes #-} {-# LANGUAGE ScopedTypeVariables #-} {-# LANGUAGE TypeFamilies #-} {-# LANGUAGE FlexibleInstances #-} {-# LANGUAGE UndecidableInstances #-} import Control.Exception import Control.Monad.State.Strict import Control.Monad.Writer import Control.Monad.Reader import Control.Monad.Except import Data.Monoid import Data.IORef type RunInIO m = forall b. m b -> IO (StM m b) class MonadIO m => MonadIOControl m where type StM m a liftIOWith :: (RunInIO m -> IO a) -> m a restoreM :: StM m a -> m a instance MonadIOControl IO where type StM IO a = a liftIOWith withRun = withRun id restoreM = return instance MonadIOControl m => MonadIOControl (StateT s m) where type StM (StateT s m) a = StM m (a, s) liftIOWith withRun = StateT $ \s0 -> do a <- liftIOWith $ \run -> withRun $ \(StateT f) -> run $ f s0 return (a, s0) restoreM x = StateT $ \_s0 -> restoreM x instance (MonadIOControl m, Monoid w) => MonadIOControl (WriterT w m) where type StM (WriterT w m) a = StM m (a, w) liftIOWith withRun = WriterT $ do a <- liftIOWith $ \run -> withRun $ \(WriterT f) -> run f return (a, mempty) restoreM x = WriterT $ restoreM x instance MonadIOControl m => MonadIOControl (ReaderT r m) where type StM (ReaderT r m) a = StM m a liftIOWith withRun = ReaderT $ \r -> liftIOWith $ \run -> withRun $ \(ReaderT f) -> run $ f r restoreM x = ReaderT $ \r -> restoreM x instance MonadIOControl m => MonadIOControl (ExceptT e m) where type StM (ExceptT e m) a = StM m (Either e a) liftIOWith withRun = ExceptT $ do a <- liftIOWith $ \run -> withRun $ \(ExceptT f) -> run f return $ Right a restoreM x = ExceptT $ restoreM x control :: MonadIOControl m => (RunInIO m -> IO (StM m a)) -> m a control f = do x <- liftIOWith f restoreM x checkControl :: MonadIOControl m => m () checkControl = control $ \run -> do ref <- newIORef (0 :: Int) let ensureIs :: MonadIO m => Int -> m () ensureIs expected = liftIO $ do putStrLn $ "ensureIs " ++ show expected curr <- atomicModifyIORef ref $ \curr -> (curr + 1, curr) unless (curr == expected) $ error $ show ("curr /= expected", curr, expected) ensureIs 0 Control.Exception.mask $ \restore -> do ensureIs 1 res <- restore (ensureIs 2 >> run (ensureIs 3) `finally` ensureIs 4) ensureIs 5 return res main :: IO () main = do checkControl runStateT checkControl () >>= print runWriterT checkControl >>= (print :: ((), ()) -> IO ()) runReaderT checkControl () runExceptT checkControl >>= (print :: Either () () -> IO ())
I encourage you to inspect each of the instances above and make sure
you're comfortable with their implementation. I've added a function
here, checkControl
, as a basic sanity check of our
implementation. We start with the control
helper function, which
runs some action with a RunInIO
argument, and then restores the
monadic state. Then we use this function in checkControl
to ensure
that a series of actions are all run in the correct order. As you can
see, all of our test monads pass (again, foreshadowment).
The real monad-control package looks pretty similar to this, except:
- Instead of
MonadIOControl
, which is hard-coded to usingIO
as a base monad, it provides aMonadBaseControl
typeclass, which allows arbitrary base monads (likeST
orSTM
). - Just as
MonadBaseControl
is an analogue ofMonadIO
, the package providesMonadTransControl
as an analogue ofMonadTrans
, allowing you to unwrap one layer in a monad stack.
With all of this exposition out of the way—likely the longest exposition I've ever written in any blog post—we can start dealing with the actual bug. I'll show you the full context eventually, but I was asked to help debug a function that looked something like this:
fileLen1 :: (MonadThrow m, MonadBaseControl IO m, MonadIO m) => FilePath -> m Int fileLen1 fp = runResourceT $ runConduit $ sourceFile fp .| lengthCE
This is fairly common in Conduit code. We're going to use
sourceFile
, which needs to allocate some resources. Since we can't
safely allocate resources from within a Conduit pipeline, we start off
with runResourceT
to allow Conduit to register cleanup
actions. (This combination is so common that we have a helper function
runConduitRes = runResourceT . runConduit
.)
Unfortunately, this innocuous-looking like of code was generating an error message:
Control.Monad.Trans.Resource.register': The mutable state is being accessed after cleanup. Please contact the maintainers.
The "Please contact the maintainers." line should probably be removed from the resourcet package; it was from back in a time when we thought this bug was most likely to indicate an implementation bug within resourcet. That's no longer the case... which hopefully this debugging adventure will help demonstrate.
Anyway, as last week's blog post on ResourceT explained,
runResourceT
creates a mutable variable to hold a list of cleanup
actions, allows the inner action to register cleanup values into that
mutable variable, and then when runResourceT
is exiting, it calls
all those cleanup actions. And as a last sanity check, it replaces the
value inside that mutable variable with a special value indicating
that the state has already been closed, and it is therefore invalid to
register further cleanup actions.
In well-behaved code, the structure of our runResourceT
function
should prevent the mutable state from being accessible after it's
closed, though I mention some cases last week that could cause that to
happen (specifically, misuse of concurrency and the transPipe
function). However, after thoroughly exploring the codebase, I could
find no indication that either of these common bugs had occurred.
Internally, runResourceT
is essentially a bracket
call, using the
createInternalState
function to allocate the mutable variable, and
closeInternalState
to clean it up. So I figured I could get a bit
more information about this bug by using the bracket
function from
Control.Exception.Lifted
and implementing:
fileLen2 :: (MonadThrow m, MonadBaseControl IO m, MonadIO m) => FilePath -> m Int fileLen2 fp = Lifted.bracket createInternalState closeInternalState $ runInternalState $ runConduit $ sourceFile fp .| lengthCE
Much to my chagrin, the bug disappeared! Suddenly the code worked
perfectly. Beginning to question my sanity, I decided to look at the
implementation of runResourceT
, and found this:
runResourceT :: MonadBaseControl IO m => ResourceT m a -> m a runResourceT (ResourceT r) = control $ \run -> do istate <- createInternalState E.mask $ \restore -> do res <- restore (run (r istate)) `E.onException` stateCleanup ReleaseException istate stateCleanup ReleaseNormal istate return res
Ignoring the fact that we differentiate between exception and normal
cleanup in the stateCleanup
function, I was struck by one question:
why did I decide to implement this with control
in a manual,
error-prone way instead of using the bracket
function directly? I
began to worry that there was a bug in this implementation leading to
all of the problems.
However, after reading through this implementation many times, I
convinced myself that it was, in fact, correct. And then I realized
why I had done it this way. Both createInternalState
and
stateCleanup
are functions that can live in IO
directly, without
any need of a monad transformer state. The only function that needed
the monad transformer logic was that contained in the ResourceT
itself.
If you remember our discussion above, there were two major advantages
of the implementation of finally
which relied upon IO
for the
cleanup function instead of using the monad transformer state:
- It was much more explicit about how monadic state was going to be handled.
- It gave a slight performance advantage.
With the downside being that the type signature wasn't quite what people normally expected. Well, that downside didn't apply in my case: I was working on an internal function in a library, so I was free to ignore what a user-friendly API would look like. The advantage of explicitness around monadic state certainly appealed in a library that was so sensitive to getting things right. And given how widely used this function is, and the deep monadic stacks it was sometimes used it, any performance advantage was worth pursuing.
Alright, I felt good about the fact that runResourceT
was
implemented correctly. Just to make sure I wasn't crazy, I
reimplemented fileLen
to use an explicit control
instead of
Lifted.bracket
, and the bug reappeared:
-- I'm ignoring async exception safety. This needs mask. fileLen3 :: forall m. (MonadThrow m, MonadBaseControl IO m, MonadIO m) => FilePath -> m Int fileLen3 fp = control $ \run -> do istate <- createInternalState res <- run (runInternalState inner istate) `onException` closeInternalState istate closeInternalState istate return res where inner :: ResourceT m Int inner = runConduit $ sourceFile fp .| lengthCE
And as one final sanity check, I implemented fileLen4
to use the
generalized style of bracket
, where the allocation and cleanup
functions live in the monad stack instead of just IO
, and as
expected the bug disappeared again. (Actually, I didn't really do
this. I'm doing it now for the purpose of this blog post.)
fileLen4 :: forall m. (MonadThrow m, MonadBaseControl IO m, MonadIO m) => FilePath -> m Int fileLen4 fp = control $ \run -> bracket (run createInternalState) (\st -> run $ restoreM st >>= closeInternalState) (\st -> run $ restoreM st >>= runInternalState inner) where inner :: ResourceT m Int inner = runConduit $ sourceFile fp .| lengthCE
Whew, OK! So it turns out that my blog post title was correct: this is a tale of two brackets. And somehow, one of them triggers a bug, and one of them doesn't. But I still didn't know quite how that happened.
The culprit
Another member of the team tracked down the ultimate problem to a
datatype that looked like this (though not actually named Bad
, that
would have been too obvious):
newtype Bad a = Bad { runBad :: IO a } deriving (Functor, Applicative, Monad, MonadIO, MonadThrow, MonadBase IO) instance MonadBaseControl IO Bad where type StM Bad a = IO a liftBaseWith withRun = Bad $ withRun $ return . runBad restoreM = Bad
That's the kind of code that can easily pass a code review without
anyone noticing a thing. With all of the context from this blog post,
you may be able to understand why I've called this type Bad
. Go
ahead and give it a few moments to try and figure it out.
OK, ready to see how this plays out? The StM Bad a
associated type
is supposed to contain the result value of the underlying monad,
together with any state introduced by this monad. Since we just have a
newtype around IO
, there should be no monadic state, and we should
just have a
. However, we've actually defined it as IO a
, which
means "my monadic state for a value a
is an IO
action which will
return an a
." The implementation of liftBaseWith
and restoreM
are simply in line with making the types work out.
Let's look at fileLen3
understanding that this is the instance in
question. I'm also going to expand the control
function to make it
easier to see what's happening.
res <- liftBaseWith $ \run -> do istate <- createInternalState res <- run (runInternalState inner istate) `onException` closeInternalState istate closeInternalState istate return res restoreM res
If we play it a little loose with newtype wrappers, we can substitute
in the implementations of liftBaseWith
and restoreM
to get:
res <- Bad $ do let run = return . runBad istate <- createInternalState res <- run (runInternalState inner istate) `onException` closeInternalState istate closeInternalState istate return res Bad res
Let's go ahead and substitute in our run
function in the one place
it's used:
res <- Bad $ do istate <- createInternalState res <- return (runBad (runInternalState inner istate)) `onException` closeInternalState istate closeInternalState istate return res Bad res
If you look at the code return x `onException` foo
, it's pretty
easy to establish that return
itself will never throw an exception
in IO
, and therefore the onException
it useless. In other words,
the code is equivalent to just return x
. So again substituting:
res <- Bad $ do istate <- createInternalState res <- return (runBad (runInternalState inner istate)) closeInternalState istate return res Bad res
And since foo <- return x
is just let foo = x
, we can turn this
into:
res <- Bad $ do istate <- createInternalState closeInternalState istate return (runBad (runInternalState inner istate)) Bad res
And then:
Bad $ do istate <- createInternalState closeInternalState istate Bad (runBad (runInternalState inner istate))
And finally, just to drive the point home:
istate <- Bad createInternalState Bad $ closeInternalState istate runInternalState inner istate
So who wants to take a guess why the mutable variable was closed
before we ever tried to register? Because that's exactly what our
MonadBaseControl
instance said! The problem is that instead of our
monadic state just being some value, it was the entire action we
needed to run, which was now being deferred until after we called
closeInternalState
. Oops.
What about the other bracket?
Now let's try to understand why fileLen4
worked, despite the broken
MonadBaseControl
instance. Again, starting with the original code
after replacing control
with liftBaseWith
and restoreM
:
res <- liftBaseWith $ \run -> bracket (run createInternalState) (\st -> run $ restoreM st >>= closeInternalState) (\st -> run $ restoreM st >>= runInternalState inner) restoreM res
This turns into:
res <- Bad $ bracket (return $ runBad createInternalState) (\st -> return $ runBad $ Bad st >>= closeInternalState) (\st -> return $ runBad $ Bad st >>= runInternalState inner) Bad res
Since this case is a bit more involved than the previous one, let's
strip off the noise of Bad
and runBad
calls, since they're just
wrapping/unwrapping a newtype:
res <- bracket (return createInternalState) (\st -> return $ st >>= closeInternalState) (\st -> return $ st >>= runInternalState inner) res
To decompose this mess, let's look at the actual implementation of
bracket
from base
:
bracket before after thing = mask $ \restore -> do a <- before r <- restore (thing a) `onException` after a _ <- after a return r
We're going to ignore async exceptions for now, and therefore just
mentally delete the mask $ \restore
bit. We end up with:
res <- do a <- return createInternalState r <- return (a >>= runInternalState inner) `onException` return (a >>= closeInternalState) _ <- return (a >>= closeInternalState) return r res
As above, we know that our return x `onException` foo
will never
actually trigger the exception case. Also, a <- return x
is the same
as let a = x
. So we can simplify to:
res <- do let a = createInternalState let r = a >>= runInternalState inner _ <- return (a >>= closeInternalState) return r res
Also, _ <- return x
has absolutely no impact at all, so we can
delete that line (and any mention of closeInternalState
):
res <- do let a = createInternalState let r = a >>= runInternalState inner return r res
And then with a few more simply conversions, we end up with:
createInternalState >>= runInternalState inner
No wonder this code "worked": it never bothered trying to clean up!
This could have easily led to complete leaking of resources in the
application. Only the fact that our runResourceT
function thankfully
stressed the code in a different way did we reveal the problem.
What's the right instance?
It's certainly possible to define a correct newtype wrapper around
IO
:
newtype Good a = Good { runGood :: IO a } deriving (Functor, Applicative, Monad, MonadIO, MonadThrow, MonadBase IO) instance MonadBaseControl IO Good where type StM Good a = a liftBaseWith withRun = Good $ withRun runGood restoreM = Good . return
Unfortunately we can't simply use GeneralizedNewtypeDeriving
to make
this instance due to the associated type family. But the explicitness
here helps us understand what we did wrong before. Note that our type
StM Good a
is just a
, not IO a
. We then implement the helper
functions in terms of that. If you go through the same substitution
exercise I did above, you'll see that—instead of passing around
values which contain the actions to actually perform—our
fileLen3
and fileLen4
functions will be performing the actions at
the appropriate time.
I'm including the full test program at the end of this post for you to play with.
Takeaways
So that blog post was certainly all over the place. I hope the primary
thing you take away from it is a deeper understanding of how monad
transformer stacks interact with operations in the base monad, and how
monad-control works in general. In particular, next time you call
finally
on some five-layer-deep stack, maybe you'll think twice
about the implication of calling modify
or tell
in your cleanup
function.
Another possible takeaway you may have is "Haskell's crazy
complicated, this bug could happen to anyone, and it's almost
undetectable." It turns out that there's a really simple workaround
for that: stick to standard monad transformers whenever
possible. monad-control is a phenomonal library, but I don't think
most people should ever have to interact with it directly. Like async
exceptions and unsafePerformIO
, there are parts of our library
ecosystem that require them, but you should stick to higher-level
libraries that hide that insanity from you, the same way we use
higher-level languages to avoid having to write assembly.
Finally, having to think about all of the monadic state stuff in my
code gives me a headache. It's possible for us to have a library like
lifted-base
, but which constrains functions to only taking one
argument in the m
monad and the rest in IO
to avoid the
multiple-state stuff. However, my preferred solution is to avoid
wherever possible monad transformers that introduce monadic state, and
stick to ReaderT
like things for the majority of my
application. (Yes, this is another pitch for my
ReaderT design pattern.)
Full final source code
#!/usr/bin/env stack -- stack --resolver lts-8.12 script {-# LANGUAGE GeneralizedNewtypeDeriving #-} {-# LANGUAGE MultiParamTypeClasses #-} {-# LANGUAGE TypeFamilies #-} {-# LANGUAGE FlexibleContexts #-} {-# LANGUAGE ScopedTypeVariables #-} import Control.Monad.Trans.Control import Control.Monad.Trans.Resource import Control.Exception.Safe import qualified Control.Exception.Lifted as Lifted import Conduit newtype Bad a = Bad { runBad :: IO a } deriving (Functor, Applicative, Monad, MonadIO, MonadThrow, MonadBase IO) instance MonadBaseControl IO Bad where type StM Bad a = IO a liftBaseWith withRun = Bad $ withRun $ return . runBad restoreM = Bad newtype Good a = Good { runGood :: IO a } deriving (Functor, Applicative, Monad, MonadIO, MonadThrow, MonadBase IO) instance MonadBaseControl IO Good where type StM Good a = a liftBaseWith withRun = Good $ withRun runGood restoreM = Good . return fileLen1 :: (MonadThrow m, MonadBaseControl IO m, MonadIO m) => FilePath -> m Int fileLen1 fp = runResourceT $ runConduit $ sourceFile fp .| lengthCE fileLen2 :: (MonadThrow m, MonadBaseControl IO m, MonadIO m) => FilePath -> m Int fileLen2 fp = Lifted.bracket createInternalState closeInternalState $ runInternalState $ runConduit $ sourceFile fp .| lengthCE -- I'm ignoring async exception safety. This needs mask. fileLen3 :: forall m. (MonadThrow m, MonadBaseControl IO m, MonadIO m) => FilePath -> m Int fileLen3 fp = control $ \run -> do istate <- createInternalState res <- run (runInternalState inner istate) `onException` closeInternalState istate closeInternalState istate return res where inner :: ResourceT m Int inner = runConduit $ sourceFile fp .| lengthCE fileLen4 :: forall m. (MonadThrow m, MonadBaseControl IO m, MonadIO m) => FilePath -> m Int fileLen4 fp = control $ \run -> bracket (run createInternalState) (\st -> run $ restoreM st >>= closeInternalState) (\st -> run $ restoreM st >>= runInternalState inner) where inner :: ResourceT m Int inner = runConduit $ sourceFile fp .| lengthCE main :: IO () main = do putStrLn "fileLen1" tryAny (fileLen1 "/usr/share/dict/words") >>= print tryAny (runBad (fileLen1 "/usr/share/dict/words")) >>= print tryAny (runGood (fileLen1 "/usr/share/dict/words")) >>= print putStrLn "fileLen2" tryAny (fileLen2 "/usr/share/dict/words") >>= print tryAny (runBad (fileLen2 "/usr/share/dict/words")) >>= print tryAny (runGood (fileLen2 "/usr/share/dict/words")) >>= print putStrLn "fileLen3" tryAny (fileLen3 "/usr/share/dict/words") >>= print tryAny (runBad (fileLen3 "/usr/share/dict/words")) >>= print tryAny (runGood (fileLen3 "/usr/share/dict/words")) >>= print putStrLn "fileLen4" tryAny (fileLen4 "/usr/share/dict/words") >>= print tryAny (runBad (fileLen4 "/usr/share/dict/words")) >>= print tryAny (runGood (fileLen4 "/usr/share/dict/words")) >>= print
Bonus exercise Take the checkControl
function I provided above,
and use it in the Good
and Bad
monads. See what the result is, and
if you can understand why that's the case.