(cache) A Tale of Two Brackets

26 Jun 2017 Michael Snoyman

This is a debugging story told completely out of order. In order to understand the ultimate bug, why it seemed to occur arbitrarily, and the ultimate resolution, there's lots of backstory to cover. If you're already deeply familiar with the inner workings of the monad-control package, you can probably look at a demonstration of the bad instance and move on. Otherwise, prepare for a fun ride!

As usual, if you want to play along, we're going to be using Stack's script interpreter feature. Just save the snippets contents to a file and run with stack filename.hs. (It works with any snippet that begins with #!/usr/bin/env stack.)

Oh, and also: the confusion that this blog post demonstrates is one of the reasons why I strongly recommend sticking to a ReaderT env IO monad transformer stack.

Trying in StateT

Let's start with some broken code (my favorite kind). It uses the StateT transformer and a function which may throw a runtime exception.

#!/usr/bin/env stack
-- stack --resolver lts-8.12 script
import Control.Monad.State.Strict
import Control.Exception
import Data.Typeable

data OddException = OddException !Int -- great name :)
  deriving (Show, Typeable)
instance Exception OddException

mayThrow :: StateT Int IO Int
mayThrow = do
  x <- get
  if odd x
    then lift $ throwIO $ OddException x
    else do
      put $! x + 1
      return $ x `div` 2

main :: IO ()
main = runStateT (replicateM 2 mayThrow) 0 >>= print

Our problem is that we'd like to be able to recover from a thrown exception. Easy enough we think, we'll just use Control.Exception.try to attempt to run the mayThrow action. Unfortunately, if I wrap up mayThrow with a try, I get this highly informative error message:

Main.hs:21:19: error:
    • Couldn't match type ‘IO’ with ‘StateT Integer IO’
      Expected type: StateT Integer IO ()
        Actual type: IO ()
    • In the first argument of ‘runStateT’, namely
        ‘(replicateM 2 (try mayThrow))’
      In the first argument of ‘(>>=)’, namely
        ‘runStateT (replicateM 2 (try mayThrow)) 0’
      In the expression:
        runStateT (replicateM 2 (try mayThrow)) 0 >>= print

Oh, that makes sense: try is specialized to IO, and our function is StateT Int IO. Our first instinct is probably to keep throwing lift calls into our program until it compiles, since lift seems to always fix monad transformer compilation errors. However, try as you might, you'll never succeed. To understand why, let's look at the (slightly specialized) type signature for try:

try :: IO a -> IO (Either OddException a)

If I apply lift to this, I could end up with:

try :: IO a -> StateT Int IO (Either OddException a)

But there's no way to use lift to modify the type of the IO a input. This is generally the case with the lift and liftIO functions: they can deal with monad values that are the output of a function, but not the input to the function. (More precisely: the functions are covariant and work on values in positive positions. We'd need something contravariant to work on vlaues in negative positions. You can read more on this nomenclature in another blog post.)

Huh, I guess we're stuck. But then I remember that StateT is just defined as newtype StateT s m a = StateT { runStateT :: s -> m (a,s)}. So maybe I can write a version of try that works for a StateT using the internals of the type.

tryStateT :: StateT Int IO a -> StateT Int IO (Either OddException a)
tryStateT (StateT f) = StateT $ \s0 -> do
  eres <- try (f s0)
  return $ case eres of
    Left e -> (Left e, s0)
    Right (a, s1) -> (Right a, s1)

Go ahead and plug that into our previous example, and you should get the desired output:

([Right 0,Left (OddException 1)],1)

Let's break down in nauseating detail what that tryStateT function did:

Unwrap the StateT data constructor from the provided action to get a function f :: Int -> IO (a, Int)
Construct a new StateT value on the right hand side by using the StateT data constructor, and capturing the initial state in the value s0 :: Int.
Pass s0 to f to get an action IO :: (a, Int), which will give the result and the new, updated state.
Wrap f s0 with try to allow us to detect and recover from a runtime exception.
eres has type Either OddException (a, Int), and we pattern match on it.
If we receive a Right/success value, we simply wrap up the a value in a Right constructor together with the updated state.
If we receive a Left/exception value, we wrap it up the exception with a Left. However, we need to return some new state. Since we have no such state available to us from the action, we return the only thing we can: the initial s0 state value.

Lesson learned We can use try in a StateT with some difficulty, but we need to be aware of what happens to our monadic state.

Catching in StateT

It turns out that it's trivial to implement the try function in terms of catch, and the catch function in terms of try, at least when sticking to the IO-specialized versions:

try' :: Exception e => IO a -> IO (Either e a)
try' action = (Right <$> action) `catch` (return . Left)

catch' :: Exception e => IO a -> (e -> IO a) -> IO a
catch' action onExc = do
  eres <- try action
  case eres of
    Left e -> onExc e
    Right a -> return a

It turns out that by just changing the type signatures and replacing try with tryStateT, we can do the same thing for StateT:

catchStateT :: Exception e
            => StateT Int IO a
            -> (e -> StateT Int IO a)
            -> StateT Int IO a
catchStateT action onExc = do
  eres <- tryStateT action
  case eres of
    Left e -> onExc e
    Right a -> return a

NOTE Pay close attention to that type signature, and think about how monadic state is being shuttled through this function.

Well, if we can implement catchStateT in terms of tryStateT, surely we can implement it directly as well. Let's do the most straightforward thing I can think of (or at least the thing that continues my narrative here):

catchStateT :: Exception e
            => StateT Int IO a
            -> (e -> IO a)
            -> StateT Int IO a
catchStateT (StateT action) onExc = StateT $ \s0 ->
  action s0 `catch` \e -> do
    a <- onExc e
    return (a, s0)

Here, we're basing our implementation on top of the catch function instead of the try function. We do the same unwrap-the-StateT, capture-the-s0 trick we did before. Now, in the lambda we've created for the catch call, we pass the e exception value to the user-supplied onExc function, and then like tryStateT wrap up the result in a tuple with the initial s0.

Who noticed the difference in type signature? Instead of e -> StateT Int IO a, our onExc handler has type e -> IO a. I told you to pay attention to how the monadic states were being shuttled around; let's analyze it:

In the first function, we use tryStateT, which as we mentioned will reconstitute the original s0 state when it returns. If the action succeeded, nothing else happens. But in the exception case, that original s0 is now passed into the onExc function, and the final monadic state returned will be the result of the onExc function.
In the second function, we never give the onExc function a chance to play with monadic state, since it just lives in IO. So we always return the original state at the end if an exception occurred.

Which behavior is best? I think most people would argue that the first function is better: it's more general in allowing onExc to access and modify the monadic state, and there's not really any chance for confusion. Fair enough, I'll buy that argument (that I just made on behalf of all of my readers).

Bonus exercise Modify this implementation of catchStateT to have the same type signature as the original one.

Finally

This is fun, let's keep reimplementing functions from Control.Exception! This time, let's do finally, which will ensure that some action (usually a cleanup action) is run after an initial action, regardless of whether an exception was thrown.

finallyStateT :: StateT Int IO a
              -> IO b
              -> StateT Int IO a
finallyStateT (StateT action) cleanup = StateT $ \s0 ->
  action s0 `finally` cleanup

That was really easy. Ehh, but one problem: look at that type signature! We just agreed (or I agreed for you) that in the case of catch, it was better to have the second argument also live in StateT Int IO. Here, our argument lives in IO. Let's fix that:

finallyStateT :: StateT Int IO a
              -> StateT Int IO b
              -> StateT Int IO a
finallyStateT (StateT action) (StateT cleanup) = StateT $ \s0 ->
  action s0 `finally` cleanup s0

Huh, also pretty simple. Let's analyze the monadic state behavior here: our cleanup action is given the initial state, regardless of the result of action s0. That means that, even if the action succeeded, we'll ignore the updated state. Furthermore, because finally ignores the result of the second argument, we will ignore any updated monadic state. Want to see what I mean? Try this out:

#!/usr/bin/env stack
-- stack --resolver lts-8.12 script
import Control.Exception
import Control.Monad.State.Strict

finallyStateT :: StateT Int IO a
              -> StateT Int IO b
              -> StateT Int IO a
finallyStateT (StateT action) (StateT cleanup) = StateT $ \s0 ->
  action s0 `finally` cleanup s0

action :: StateT Int IO ()
action = modify (+ 1)

cleanup :: StateT Int IO ()
cleanup = do
  get >>= lift . print
  modify (+ 2)

main :: IO ()
main = execStateT (action `finallyStateT` cleanup) 0 >>= print

You may expect the output of this to be the numbers 1 and 3, but in fact the output is 0 and 1: cleanup looks at the initial state value of 0, and its + 2 modification is thrown away. So can we implement a version of our function that keeps the state? Sure (slightly simplified to avoid async exception/mask noise):

finallyStateT :: StateT Int IO a
              -> StateT Int IO b
              -> StateT Int IO a
finallyStateT (StateT action) (StateT cleanup) = StateT $ \s0 -> do
  (a, s1) <- action s0 `onException` cleanup s0
  (_b, s2) <- cleanup s1
  return (a, s2)

This has the expected output of 1 and 3. Looking at how it works: we follow our same tricks, and pass in s0 to action. If an exception is thrown there, we once again pass in s0 to cleanup and ignore its updated state (since we have no choice). However, in the success case, we now pass in the updated state (s1) to cleanup. And finally, our resulting state is the result of cleanup (s2) instead of the s1 produced by action.

We have three different implementations of finallyStateT and two different type signatures. Let's compare them:

The first one (the IO version) has the advantage that its type tells us exactly what's happening: the cleanup has no access to the state at all. However, you can argue like we did with catchStateT that this is limiting and not what people would expect the type signature to be.
The second one (use the initial state for cleanup and then throw away its modified state) has the advantage that it's logically consistent: whether cleanup is called from a success or exception code path, it does the exact same thing. On the other hand, you can argue that it is surprising behavior that state updates that can be preserved are being thrown away.
The third one (keep the state) has the reversed arguments of the second one.

So unlike catchStateT, I would argue that there's not nearly as clear a winner with finallyStateT. Each approach has its relative merits.

One final point that seems almost not worth mentioning (hint: epic foreshadowment incoming). The first version (IO specialized) has an additional benefit of being ever-so-slightly more efficient than the other two, since it doesn't need to deal with the additional monadic state in cleanup. With a simple monad transformer like StateT this performance difference is hardly even worth thinking about. However, if we were in a tight inner loop, and our monad stack was significantly more complicated, you could imagine a case where the performance difference was significant.

Implementing for other transformers

It's great that we understand StateT so well, but can we do anything for other transformers? It turns out that, yes, we can for many transformers. (An exception is continuation-based transformers, which you can read a bit about in passing in my ResourceT blog post from last week.) Let's look at a few other examples of finally:

import Control.Exception
import Control.Monad.Writer
import Control.Monad.Reader
import Control.Monad.Except
import Data.Monoid

finallyWriterT :: Monoid w
               => WriterT w IO a
               -> WriterT w IO b
               -> WriterT w IO a
finallyWriterT (WriterT action) (WriterT cleanup) = WriterT $ do
  (a, w1) <- action `onException` cleanup
  (_b, w2) <- cleanup
  return (a, w1 <> w2)

finallyReaderT :: ReaderT r IO a
               -> ReaderT r IO b
               -> ReaderT r IO a
finallyReaderT (ReaderT action) (ReaderT cleanup) = ReaderT $ \r -> do
  a <- action r `onException` cleanup r
  _b <- cleanup r
  return a

finallyExceptT :: ExceptT e IO a
               -> ExceptT e IO b
               -> ExceptT e IO a
finallyExceptT (ExceptT action) (ExceptT cleanup) = ExceptT $ do
  ea <- action `onException` cleanup
  eb <- cleanup
  return $ case (ea, eb) of
    (Left e, _) -> Left e
    (Right _a, Left e) -> Left e
    (Right a, Right _b) -> Right a

The WriterT case is very similar to the StateT case, except (1) there's no initial state s0 to contend with, and (2) instead of receiving an updated s2 state from cleanup, we need to monoidally combine the w1 and w2 values. The ReaderT case is also very similar to StateT, but in the opposite way: we receive an immutable environment r which is passed into all functions, but there is no updated state. To put this in other words: WriterT has no context but has mutable monadic state, whereas ReaderT has a context but no mutable monadic state. StateT, by contrast, has both. (This is important to understand, so reread it a few times to get comfortable with the concept.)

The ExceptT case is interesting: it has no context (like WriterT), but it does have mutable monadic state, just not like StateT and WriterT. Instead of returning an extra value with each result (as a product), ExceptT returns either a result value or an e value (as a sum). The case expression at the end of finallyExceptT is very informative: we need to figure out how to combine the various monadic states together. Our implementation here says that if action returns e, we take that result. Otherwise, if cleanup fails, we take that value. And if they both return Right values, then we use action's result. But there are at least two other valid choices:

Prefer cleanup's e value to action's e value, if both are available.
Completely ignore the e value returned by cleanup, and just use action's result.

There's also a fourth, invalid option: if action returns a Left, return that immediately and don't call cleanup. This has been a perenniel source of bugs in many libraries dealing with exceptions in monad transformers like ErrorT, ExceptT, and EitherT. This invalidates the contract of finally, namely that cleanup will always be run. I've seen some arguments for why this can make sense, but I consider it nothing more than a buggy implementation.

And finally, like with StateT, we could avoid all of these questions for ExceptT if we just modify our type signature to use IO b for cleanup:

finallyExceptT :: ExceptT e IO a
               -> IO b
               -> ExceptT e IO a
finallyExceptT (ExceptT action) cleanup = ExceptT $ do
  ea <- action `onException` cleanup
  _b <- cleanup
  return ea

So our takeaway: we can implement finally for various monad transformers. In some cases this leads to questions of semantics, just like with StateT. And all of these transformers fall into a pattern of optionally capturing some initial context, and optionally shuttling around some monadic state.

(And no, I haven't forgotten that the title of this blog post talks about bracket. We're getting there, ever so slowly. I hope I've piqued your curiosity.)

Generalizing the pattern

It's wonderful that we can implement all of these functions that take monad transformers as arguments. But do any of us actually want to go off and implement catch, try, finally, forkIO, timeout, and a dozen other functions for every possible monad transformer stack imagineable? I doubt it. So just as we have MonadTrans and MonadIO for dealing with transformers in output/positive position, we can construct some kind of typeclass that handles the two concepts we mentioned above: capture the context, and deal with the monadic state.

Let's start by playing with this for just StateT.

#!/usr/bin/env stack
-- stack --resolver lts-8.12 script
{-# LANGUAGE RankNTypes #-}
{-# LANGUAGE ScopedTypeVariables #-}
import Control.Exception
import Control.Monad.State.Strict

type Run s = forall b. StateT s IO b -> IO (b, s)

capture :: forall s a.
           (Run s -> IO a)
        -> StateT s IO a
capture withRun = StateT $ \s0 -> do
  let run :: Run s
      run (StateT f) = f s0
  a <- withRun run
  return (a, s0)

restoreState :: (a, s) -> StateT s IO a
restoreState stateAndResult = StateT $ \_s0 -> return stateAndResult

finally1 :: StateT s IO a
         -> IO b
         -> StateT s IO a
finally1 action cleanup = do
  x <- capture $ \run -> run action `finally` cleanup
  restoreState x

finally2 :: StateT s IO a
         -> StateT s IO b
         -> StateT s IO a
finally2 action cleanup = do
  x <- capture $ \run -> run action `finally` run cleanup
  restoreState x

-- Not async exception safe!
finally3 :: StateT s IO a
         -> StateT s IO b
         -> StateT s IO a
finally3 action cleanup = do
  x <- capture $ \run -> run action `onException` run cleanup
  a <- restoreState x
  _b <- cleanup
  return a

main :: IO ()
main = do
  flip evalStateT () $ lift (putStrLn "here1") `finally1`
                       putStrLn "here2"
  flip evalStateT () $ lift (putStrLn "here3") `finally2`
                       lift (putStrLn "here4")
  flip evalStateT () $ lift (putStrLn "here5") `finally2`
                       lift (putStrLn "here6")

That's a lot, let's step through it slowly:

type Run s = forall b. StateT s IO b -> IO (b, s)

This is a helper type to make the following bit simpler. It represents the concept of capturing the initial state in a general manner. Given an action living in our transformer, it turns an action in our base monad, returning the entire monadic state with the return value (i.e., (b, s) instead of just b). This allows use to define our capture function:

capture :: forall s a.
           (Run s -> IO a)
        -> StateT s IO a
capture withRun = StateT $ \s0 -> do
  let run :: Run s
      run (StateT f) = f s0
  a <- withRun run
  return (a, s0)

This function says "you give me some function that needs to be able to run monadic actions with the initial context, and I'll give it that initial context running function (Run s)." The implementation isn't too bad: we just capture the s0, create a run function out of it, pass that into the user-provided argument, and then return the result with the original state.

Now we need some way to update the monadic state based on a result value. We call it restoreState:

restoreState :: (a, s) -> StateT s IO a
restoreState stateAndResult = StateT $ \_s0 -> return stateAndResult

Pretty simple too: we ignore our original monadic state and replace it with the state contained in the argument. Next we use these two functions to implement three versions of finally. The first two are able to reuse the finally from Control.Exception. However, both of them suffer from the inability to retain monadic state. Our third implementation fixes that, at the cost of having to reimplement the logic of finally. And as my comment there mentions, our implementation is not in fact async exception safe.

So all of our original trade-offs apply from our initial StateT discussion, but now there's an additional downside to option 3: it's significantly more complicated to implement correctly.

The MonadIOControl type class

Alright, we've established that it's possible to capture this idea for StateT. Let's generalize to a typeclass. We'll need three components:

A capture function. We'll call it liftIOWith, to match nomenclature in monad-control.
A restore function, which we'll call restoreM.
An associated type (type family) to represent what the monadic state for the given monad stack is.

We end up with:

type RunInIO m = forall b. m b -> IO (StM m b)

class MonadIO m => MonadIOControl m where
  type StM m a

  liftIOWith :: (RunInIO m -> IO a) -> m a
  restoreM :: StM m a -> m a

Let's write an instance for IO:

instance MonadIOControl IO where
  type StM IO a = a

  liftIOWith withRun = withRun id
  restoreM = return

The type StM IO a = a says that, for an IO action returning a, the full monadic state is just a. In other words, there is no additional monadic state hanging around. That's good, as we know that there isn't. liftIOWith is able to just use id as the RunInIO function, since you can run an IO action in IO directly. And finally, since there is no monadic state to update, restoreM just wraps up the result value in IO via return. (More foreshadowment: what this instance is supposed to look like is actually at the core of the bug this blog post will eventually talk about.)

Alright, let's implement this instance for StateT s IO:

instance MonadIOControl (StateT s IO) where
  type StM (StateT s IO) a = (a, s)

  liftIOWith withRun = StateT $ \s0 -> do
    a <- withRun $ \(StateT f) -> f s0
    return (a, s0)

  restoreM stateAndResult = StateT $ \_s0 -> return stateAndResult

This is basically identical to the functions we defined above, so I won't dwell on it here. But here's an interesting observation: the same way we define MonadIO instance as instance MonadIO m => MonadIO (StateT s m), it would be great to do the same thing for MonadIOControl. And, in fact, we can do just that!

instance MonadIOControl m => MonadIOControl (StateT s m) where
  type StM (StateT s m) a = StM m (a, s)

  liftIOWith withRun = StateT $ \s0 -> do
    a <- liftIOWith $ \run -> withRun $ \(StateT f) -> run $ f s0
    return (a, s0)

  restoreM x = StateT $ \_s0 -> restoreM x

We use the underlying monad's liftIOWith and restoreM functions within our own definitions, and thereby get context and state passed up and down the stack as needed. Alright, let's go ahead and do this for all of the transformers we've been discussing:

#!/usr/bin/env stack
-- stack --resolver lts-8.12 script
{-# LANGUAGE RankNTypes #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE UndecidableInstances #-}
import Control.Exception
import Control.Monad.State.Strict
import Control.Monad.Writer
import Control.Monad.Reader
import Control.Monad.Except
import Data.Monoid
import Data.IORef

type RunInIO m = forall b. m b -> IO (StM m b)

class MonadIO m => MonadIOControl m where
  type StM m a

  liftIOWith :: (RunInIO m -> IO a) -> m a
  restoreM :: StM m a -> m a

instance MonadIOControl IO where
  type StM IO a = a

  liftIOWith withRun = withRun id
  restoreM = return

instance MonadIOControl m => MonadIOControl (StateT s m) where
  type StM (StateT s m) a = StM m (a, s)

  liftIOWith withRun = StateT $ \s0 -> do
    a <- liftIOWith $ \run -> withRun $ \(StateT f) -> run $ f s0
    return (a, s0)

  restoreM x = StateT $ \_s0 -> restoreM x

instance (MonadIOControl m, Monoid w) => MonadIOControl (WriterT w m) where
  type StM (WriterT w m) a = StM m (a, w)

  liftIOWith withRun = WriterT $ do
    a <- liftIOWith $ \run -> withRun $ \(WriterT f) -> run f
    return (a, mempty)

  restoreM x = WriterT $ restoreM x

instance MonadIOControl m => MonadIOControl (ReaderT r m) where
  type StM (ReaderT r m) a = StM m a

  liftIOWith withRun = ReaderT $ \r ->
    liftIOWith $ \run -> withRun $ \(ReaderT f) -> run $ f r

  restoreM x = ReaderT $ \r -> restoreM x

instance MonadIOControl m => MonadIOControl (ExceptT e m) where
  type StM (ExceptT e m) a = StM m (Either e a)

  liftIOWith withRun = ExceptT $ do
    a <- liftIOWith $ \run -> withRun $ \(ExceptT f) -> run f
    return $ Right a

  restoreM x = ExceptT $ restoreM x

control :: MonadIOControl m => (RunInIO m -> IO (StM m a)) -> m a
control f = do
  x <- liftIOWith f
  restoreM x

checkControl :: MonadIOControl m => m ()
checkControl = control $ \run -> do
  ref <- newIORef (0 :: Int)
  let ensureIs :: MonadIO m => Int -> m ()
      ensureIs expected = liftIO $ do
        putStrLn $ "ensureIs " ++ show expected
        curr <- atomicModifyIORef ref $ \curr -> (curr + 1, curr)
        unless (curr == expected) $ error $ show ("curr /= expected", curr, expected)

  ensureIs 0
  Control.Exception.mask $ \restore -> do
    ensureIs 1
    res <- restore (ensureIs 2 >> run (ensureIs 3) `finally` ensureIs 4)
    ensureIs 5
    return res

main :: IO ()
main = do
  checkControl
  runStateT checkControl () >>= print
  runWriterT checkControl >>= (print :: ((), ()) -> IO ())
  runReaderT checkControl ()
  runExceptT checkControl >>= (print :: Either () () -> IO ())

I encourage you to inspect each of the instances above and make sure you're comfortable with their implementation. I've added a function here, checkControl, as a basic sanity check of our implementation. We start with the control helper function, which runs some action with a RunInIO argument, and then restores the monadic state. Then we use this function in checkControl to ensure that a series of actions are all run in the correct order. As you can see, all of our test monads pass (again, foreshadowment).

The real monad-control package looks pretty similar to this, except:

Instead of MonadIOControl, which is hard-coded to using IO as a base monad, it provides a MonadBaseControl typeclass, which allows arbitrary base monads (like ST or STM).
Just as MonadBaseControl is an analogue of MonadIO, the package provides MonadTransControl as an analogue of MonadTrans, allowing you to unwrap one layer in a monad stack.

With all of this exposition out of the way—likely the longest exposition I've ever written in any blog post—we can start dealing with the actual bug. I'll show you the full context eventually, but I was asked to help debug a function that looked something like this:

fileLen1 :: (MonadThrow m, MonadBaseControl IO m, MonadIO m)
         => FilePath
         -> m Int
fileLen1 fp = runResourceT
           $ runConduit
           $ sourceFile fp
          .| lengthCE

This is fairly common in Conduit code. We're going to use sourceFile, which needs to allocate some resources. Since we can't safely allocate resources from within a Conduit pipeline, we start off with runResourceT to allow Conduit to register cleanup actions. (This combination is so common that we have a helper function runConduitRes = runResourceT . runConduit.)

Unfortunately, this innocuous-looking like of code was generating an error message:

Control.Monad.Trans.Resource.register': The mutable state is being accessed after cleanup. Please contact the maintainers.

The "Please contact the maintainers." line should probably be removed from the resourcet package; it was from back in a time when we thought this bug was most likely to indicate an implementation bug within resourcet. That's no longer the case... which hopefully this debugging adventure will help demonstrate.

Anyway, as last week's blog post on ResourceT explained, runResourceT creates a mutable variable to hold a list of cleanup actions, allows the inner action to register cleanup values into that mutable variable, and then when runResourceT is exiting, it calls all those cleanup actions. And as a last sanity check, it replaces the value inside that mutable variable with a special value indicating that the state has already been closed, and it is therefore invalid to register further cleanup actions.

In well-behaved code, the structure of our runResourceT function should prevent the mutable state from being accessible after it's closed, though I mention some cases last week that could cause that to happen (specifically, misuse of concurrency and the transPipe function). However, after thoroughly exploring the codebase, I could find no indication that either of these common bugs had occurred.

Internally, runResourceT is essentially a bracket call, using the createInternalState function to allocate the mutable variable, and closeInternalState to clean it up. So I figured I could get a bit more information about this bug by using the bracket function from Control.Exception.Lifted and implementing:

fileLen2 :: (MonadThrow m, MonadBaseControl IO m, MonadIO m)
         => FilePath
         -> m Int
fileLen2 fp = Lifted.bracket
  createInternalState
  closeInternalState
  $ runInternalState
  $ runConduit
  $ sourceFile fp
 .| lengthCE

Much to my chagrin, the bug disappeared! Suddenly the code worked perfectly. Beginning to question my sanity, I decided to look at the implementation of runResourceT, and found this:

runResourceT :: MonadBaseControl IO m => ResourceT m a -> m a
runResourceT (ResourceT r) = control $ \run -> do
    istate <- createInternalState
    E.mask $ \restore -> do
        res <- restore (run (r istate)) `E.onException`
            stateCleanup ReleaseException istate
        stateCleanup ReleaseNormal istate
        return res

Ignoring the fact that we differentiate between exception and normal cleanup in the stateCleanup function, I was struck by one question: why did I decide to implement this with control in a manual, error-prone way instead of using the bracket function directly? I began to worry that there was a bug in this implementation leading to all of the problems.

However, after reading through this implementation many times, I convinced myself that it was, in fact, correct. And then I realized why I had done it this way. Both createInternalState and stateCleanup are functions that can live in IO directly, without any need of a monad transformer state. The only function that needed the monad transformer logic was that contained in the ResourceT itself.

If you remember our discussion above, there were two major advantages of the implementation of finally which relied upon IO for the cleanup function instead of using the monad transformer state:

It was much more explicit about how monadic state was going to be handled.
It gave a slight performance advantage.

With the downside being that the type signature wasn't quite what people normally expected. Well, that downside didn't apply in my case: I was working on an internal function in a library, so I was free to ignore what a user-friendly API would look like. The advantage of explicitness around monadic state certainly appealed in a library that was so sensitive to getting things right. And given how widely used this function is, and the deep monadic stacks it was sometimes used it, any performance advantage was worth pursuing.

Alright, I felt good about the fact that runResourceT was implemented correctly. Just to make sure I wasn't crazy, I reimplemented fileLen to use an explicit control instead of Lifted.bracket, and the bug reappeared:

-- I'm ignoring async exception safety. This needs mask.
fileLen3 :: forall m.
            (MonadThrow m, MonadBaseControl IO m, MonadIO m)
         => FilePath
         -> m Int
fileLen3 fp = control $ \run -> do
  istate <- createInternalState
  res <- run (runInternalState inner istate)
          `onException` closeInternalState istate
  closeInternalState istate
  return res
  where
    inner :: ResourceT m Int
    inner = runConduit $ sourceFile fp .| lengthCE

And as one final sanity check, I implemented fileLen4 to use the generalized style of bracket, where the allocation and cleanup functions live in the monad stack instead of just IO, and as expected the bug disappeared again. (Actually, I didn't really do this. I'm doing it now for the purpose of this blog post.)

fileLen4 :: forall m.
            (MonadThrow m, MonadBaseControl IO m, MonadIO m)
         => FilePath
         -> m Int
fileLen4 fp = control $ \run -> bracket
  (run createInternalState)
  (\st -> run $ restoreM st >>= closeInternalState)
  (\st -> run $ restoreM st >>= runInternalState inner)
  where
    inner :: ResourceT m Int
    inner = runConduit $ sourceFile fp .| lengthCE

Whew, OK! So it turns out that my blog post title was correct: this is a tale of two brackets. And somehow, one of them triggers a bug, and one of them doesn't. But I still didn't know quite how that happened.

The culprit

Another member of the team tracked down the ultimate problem to a datatype that looked like this (though not actually named Bad, that would have been too obvious):

newtype Bad a = Bad { runBad :: IO a }
  deriving (Functor, Applicative, Monad, MonadIO, MonadThrow, MonadBase IO)
instance MonadBaseControl IO Bad where
  type StM Bad a = IO a

  liftBaseWith withRun = Bad $ withRun $ return . runBad
  restoreM = Bad

That's the kind of code that can easily pass a code review without anyone noticing a thing. With all of the context from this blog post, you may be able to understand why I've called this type Bad. Go ahead and give it a few moments to try and figure it out.

OK, ready to see how this plays out? The StM Bad a associated type is supposed to contain the result value of the underlying monad, together with any state introduced by this monad. Since we just have a newtype around IO, there should be no monadic state, and we should just have a. However, we've actually defined it as IO a, which means "my monadic state for a value a is an IO action which will return an a." The implementation of liftBaseWith and restoreM are simply in line with making the types work out.

Let's look at fileLen3 understanding that this is the instance in question. I'm also going to expand the control function to make it easier to see what's happening.

res <- liftBaseWith $ \run -> do
  istate <- createInternalState
  res <- run (runInternalState inner istate)
          `onException` closeInternalState istate
  closeInternalState istate
  return res
restoreM res

If we play it a little loose with newtype wrappers, we can substitute in the implementations of liftBaseWith and restoreM to get:

res <- Bad $ do
  let run = return . runBad
  istate <- createInternalState
  res <- run (runInternalState inner istate)
          `onException` closeInternalState istate
  closeInternalState istate
  return res
Bad res

Let's go ahead and substitute in our run function in the one place it's used:

res <- Bad $ do
  istate <- createInternalState
  res <- return (runBad (runInternalState inner istate))
          `onException` closeInternalState istate
  closeInternalState istate
  return res
Bad res

If you look at the code return x `onException` foo, it's pretty easy to establish that return itself will never throw an exception in IO, and therefore the onException it useless. In other words, the code is equivalent to just return x. So again substituting:

res <- Bad $ do
  istate <- createInternalState
  res <- return (runBad (runInternalState inner istate))
  closeInternalState istate
  return res
Bad res

And since foo <- return x is just let foo = x, we can turn this into:

res <- Bad $ do
  istate <- createInternalState
  closeInternalState istate
  return (runBad (runInternalState inner istate))
Bad res

And then:

Bad $ do
  istate <- createInternalState
  closeInternalState istate
Bad (runBad (runInternalState inner istate))

And finally, just to drive the point home:

istate <- Bad createInternalState
Bad $ closeInternalState istate
runInternalState inner istate

So who wants to take a guess why the mutable variable was closed before we ever tried to register? Because that's exactly what our MonadBaseControl instance said! The problem is that instead of our monadic state just being some value, it was the entire action we needed to run, which was now being deferred until after we called closeInternalState. Oops.

What about the other bracket?

Now let's try to understand why fileLen4 worked, despite the broken MonadBaseControl instance. Again, starting with the original code after replacing control with liftBaseWith and restoreM:

res <- liftBaseWith $ \run -> bracket
  (run createInternalState)
  (\st -> run $ restoreM st >>= closeInternalState)
  (\st -> run $ restoreM st >>= runInternalState inner)
restoreM res

This turns into:

res <- Bad $ bracket
  (return $ runBad createInternalState)
  (\st -> return $ runBad $ Bad st >>= closeInternalState)
  (\st -> return $ runBad $ Bad st >>= runInternalState inner)
Bad res

Since this case is a bit more involved than the previous one, let's strip off the noise of Bad and runBad calls, since they're just wrapping/unwrapping a newtype:

res <- bracket
  (return createInternalState)
  (\st -> return $ st >>= closeInternalState)
  (\st -> return $ st >>= runInternalState inner)
res

To decompose this mess, let's look at the actual implementation of bracket from base:

bracket before after thing =
  mask $ \restore -> do
    a <- before
    r <- restore (thing a) `onException` after a
    _ <- after a
    return r

We're going to ignore async exceptions for now, and therefore just mentally delete the mask $ \restore bit. We end up with:

res <- do
  a <- return createInternalState
  r <- return (a >>= runInternalState inner) `onException`
    return (a >>= closeInternalState)
  _ <- return (a >>= closeInternalState)
  return r
res

As above, we know that our return x `onException` foo will never actually trigger the exception case. Also, a <- return x is the same as let a = x. So we can simplify to:

res <- do
  let a = createInternalState
  let r = a >>= runInternalState inner
  _ <- return (a >>= closeInternalState)
  return r
res

Also, _ <- return x has absolutely no impact at all, so we can delete that line (and any mention of closeInternalState):

res <- do
  let a = createInternalState
  let r = a >>= runInternalState inner
  return r
res

And then with a few more simply conversions, we end up with:

createInternalState >>= runInternalState inner

No wonder this code "worked": it never bothered trying to clean up! This could have easily led to complete leaking of resources in the application. Only the fact that our runResourceT function thankfully stressed the code in a different way did we reveal the problem.

What's the right instance?

It's certainly possible to define a correct newtype wrapper around IO:

newtype Good a = Good { runGood :: IO a }
  deriving (Functor, Applicative, Monad, MonadIO, MonadThrow, MonadBase IO)
instance MonadBaseControl IO Good where
  type StM Good a = a

  liftBaseWith withRun = Good $ withRun runGood
  restoreM = Good . return

Unfortunately we can't simply use GeneralizedNewtypeDeriving to make this instance due to the associated type family. But the explicitness here helps us understand what we did wrong before. Note that our type StM Good a is just a, not IO a. We then implement the helper functions in terms of that. If you go through the same substitution exercise I did above, you'll see that—instead of passing around values which contain the actions to actually perform—our fileLen3 and fileLen4 functions will be performing the actions at the appropriate time.

I'm including the full test program at the end of this post for you to play with.

Takeaways

So that blog post was certainly all over the place. I hope the primary thing you take away from it is a deeper understanding of how monad transformer stacks interact with operations in the base monad, and how monad-control works in general. In particular, next time you call finally on some five-layer-deep stack, maybe you'll think twice about the implication of calling modify or tell in your cleanup function.

Another possible takeaway you may have is "Haskell's crazy complicated, this bug could happen to anyone, and it's almost undetectable." It turns out that there's a really simple workaround for that: stick to standard monad transformers whenever possible. monad-control is a phenomonal library, but I don't think most people should ever have to interact with it directly. Like async exceptions and unsafePerformIO, there are parts of our library ecosystem that require them, but you should stick to higher-level libraries that hide that insanity from you, the same way we use higher-level languages to avoid having to write assembly.

Finally, having to think about all of the monadic state stuff in my code gives me a headache. It's possible for us to have a library like lifted-base, but which constrains functions to only taking one argument in the m monad and the rest in IO to avoid the multiple-state stuff. However, my preferred solution is to avoid wherever possible monad transformers that introduce monadic state, and stick to ReaderT like things for the majority of my application. (Yes, this is another pitch for my ReaderT design pattern.)

Full final source code

#!/usr/bin/env stack
-- stack --resolver lts-8.12 script
{-# LANGUAGE GeneralizedNewtypeDeriving #-}
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE ScopedTypeVariables #-}
import Control.Monad.Trans.Control
import Control.Monad.Trans.Resource
import Control.Exception.Safe
import qualified Control.Exception.Lifted as Lifted
import Conduit

newtype Bad a = Bad { runBad :: IO a }
  deriving (Functor, Applicative, Monad, MonadIO, MonadThrow, MonadBase IO)
instance MonadBaseControl IO Bad where
  type StM Bad a = IO a

  liftBaseWith withRun = Bad $ withRun $ return . runBad
  restoreM = Bad

newtype Good a = Good { runGood :: IO a }
  deriving (Functor, Applicative, Monad, MonadIO, MonadThrow, MonadBase IO)
instance MonadBaseControl IO Good where
  type StM Good a = a

  liftBaseWith withRun = Good $ withRun runGood
  restoreM = Good . return

fileLen1 :: (MonadThrow m, MonadBaseControl IO m, MonadIO m)
         => FilePath
         -> m Int
fileLen1 fp = runResourceT
           $ runConduit
           $ sourceFile fp
          .| lengthCE

fileLen2 :: (MonadThrow m, MonadBaseControl IO m, MonadIO m)
         => FilePath
         -> m Int
fileLen2 fp = Lifted.bracket
  createInternalState
  closeInternalState
  $ runInternalState
  $ runConduit
  $ sourceFile fp
 .| lengthCE

-- I'm ignoring async exception safety. This needs mask.
fileLen3 :: forall m.
            (MonadThrow m, MonadBaseControl IO m, MonadIO m)
         => FilePath
         -> m Int
fileLen3 fp = control $ \run -> do
  istate <- createInternalState
  res <- run (runInternalState inner istate)
          `onException` closeInternalState istate
  closeInternalState istate
  return res
  where
    inner :: ResourceT m Int
    inner = runConduit $ sourceFile fp .| lengthCE

fileLen4 :: forall m.
            (MonadThrow m, MonadBaseControl IO m, MonadIO m)
         => FilePath
         -> m Int
fileLen4 fp = control $ \run -> bracket
  (run createInternalState)
  (\st -> run $ restoreM st >>= closeInternalState)
  (\st -> run $ restoreM st >>= runInternalState inner)
  where
    inner :: ResourceT m Int
    inner = runConduit $ sourceFile fp .| lengthCE

main :: IO ()
main = do
  putStrLn "fileLen1"
  tryAny (fileLen1 "/usr/share/dict/words") >>= print
  tryAny (runBad (fileLen1 "/usr/share/dict/words")) >>= print
  tryAny (runGood (fileLen1 "/usr/share/dict/words")) >>= print

  putStrLn "fileLen2"
  tryAny (fileLen2 "/usr/share/dict/words") >>= print
  tryAny (runBad (fileLen2 "/usr/share/dict/words")) >>= print
  tryAny (runGood (fileLen2 "/usr/share/dict/words")) >>= print

  putStrLn "fileLen3"
  tryAny (fileLen3 "/usr/share/dict/words") >>= print
  tryAny (runBad (fileLen3 "/usr/share/dict/words")) >>= print
  tryAny (runGood (fileLen3 "/usr/share/dict/words")) >>= print

  putStrLn "fileLen4"
  tryAny (fileLen4 "/usr/share/dict/words") >>= print
  tryAny (runBad (fileLen4 "/usr/share/dict/words")) >>= print
  tryAny (runGood (fileLen4 "/usr/share/dict/words")) >>= print

Bonus exercise Take the checkControl function I provided above, and use it in the Good and Bad monads. See what the result is, and if you can understand why that's the case.