My brilliant colleague Jacob Stanley has finally released his property-testing library, Hedgehog. You can grab it right now on Hackage.
I've been using Hedgehog's predecessor, Jack, for several months, and have come to rely on it for quick quality assurance. Since I've had the benefit of this extended technical preview, I thought I'd spend a few words on the ways this little library has improved my working conditions.
This post will largely concerned with retraversing property-testing basics using Hedgehog, such that a beginner or QuickCheck user can get started reasonably quickly and understand why things are the way they are. Eventually, I'll get to a more detailed post about using Hedgehog to test a typed programming language.
Hedgehog is ideal for complex generators. It has allowed me to fly by the seat of my pants, identifying capture bugs, colossal mistakes in the type checker, broken map merges, mistakes with code generation, and parser / pretty-printer ambiguities. In my opinion, it greatly improves the ergonomics of property testing, and is worth considering over existing alternatives.
If you're not familiar with property testing, now's a great time to learn. The idea is pretty simple: rather than writing unit tests, which apply some predicate to predefined points in the state space, we simply write a random generator for all good inputs, take a large sample, and apply the predicate to the entire sample.
Here's a Hedgehog version of the Hello World of property testing: reversing a list twice should produce the original list.
import qualified Data.List as List
import Hedgehog
import qualified Hedgehog.Gen as Gen
import qualified Hedgehog.Range as Range
genIntList :: Monad m => Gen m [Int]
genIntList =
let listLength = Range.linear 0 100000
in Gen.list listLength Gen.enumBounded
prop_reverse :: Property
prop_reverse =
property $ do
xs <- forAll genIntList
List.reverse (List.reverse xs) === xs
We generate a list of Int
, where the list has anywhere from 0 to 100000 elements, and each Int
lies somewhere between that type's minBound
and maxBound
as defined by the Enum
class. So, INT_MIN
and INT_MAX
. We can run this property with the check
function:
λ> check prop_reverse
✓ prop_reverse passed 100 tests.
Although it takes some practice, property testing is an effective and cheap way to gain confidence that your program works.
When a property fails, the generated value becomes a counterexample, and it must be communicated to the operator. Since it came from a random generator, though, it is likely to be a colossal and inane example. Printing it to standard output and handing it off to the operator immediately is going to ruin their life for the next few hours.
For this reason, most property-testing libraries will include some notion of shrinking, a deterministic search that reduces the counterexample until the property stops failing. The user only sees the smallest counterexample.
In Hedgehog, a reasonably good1 shrink function is constructed for free alongside every generator. Shrink functions are coupled with every generator, and a lazy tree of all possible shrinks is constructed alongside each generated value. Shrinking is then little more than traversing the tree. This is Hedgehog's theoretical point of difference from QuickCheck, though it also differs in a few practical ways.
Let's play to my strengths here and write a piece of bad code, to demonstrate how genIntList
shrinks. My bad code will be a subtly broken implementation of reverse
. We can test it against an oracle function that we know to be correct, List.reverse
.
-- Drops an element somewhere around the middle of the list.
fauxReverse :: [a] -> [a]
fauxReverse xs =
let sx = List.reverse xs
mp = length xs `div` 2
(as, bs) = List.splitAt mp sx
in as <> List.drop 1 bs
prop_fauxReverse :: Property
prop_fauxReverse =
property $ do
xs <- forAll genIntList
fauxReverse xs === List.reverse xs
Recall that genIntList
has a linear Range
from 0 to 100000 for its length, and the elements are generated using enumBounded
, which uses a Range
from minBound
to maxBound
. When this property inevitably fails, we'd expect to see this information used to inform the counterexample search, and indeed we do:
λ> check prop_fauxReverse
✗ <interactive> failed after 2 tests and 11 shrinks.
┏━━ Hedge.hs ━━━
30 ┃ prop_fauxReverse :: Property
31 ┃ prop_fauxReverse =
32 ┃ property $ do
33 ┃ xs <- forAll genIntList
┃ │ [ -9223372036854775808 ]
34 ┃ fauxReverse xs === List.reverse xs
┃ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
┃ │ Failed (- lhs =/= + rhs)
┃ │ - []
┃ │ + [ -9223372036854775808 ]
This failure can be reproduced by running:
> recheck (Size 1) (Seed 6043815965182080260 5082204861531945465) <property>
False
λ> minBound :: Int
-9223372036854775808
Generators made out of Hedgehog.Range
are constructive; Range
contains enough information to generate a correct value every time, and each Range implies a shrink step and gradient. Ranges are good!
We might like to write a more precise property to protect against dropped elements. A reverse function should preserve every element in the list!
λ> check prop_fauxReverse_length
✗ <interactive> failed after 2 tests and 1 shrink.
┏━━ Hedge.hs ━━━
36 ┃ prop_fauxReverse_length :: Property
37 ┃ prop_fauxReverse_length =
38 ┃ property $ do
39 ┃ xs <- forAll genIntList
┃ │ [ -9223372036854775808 ]
40 ┃ length (fauxReverse xs) === length xs
┃ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
┃ │ Failed (- lhs =/= + rhs)
┃ │ - 0
┃ │ + 1
This failure can be reproduced by running:
> recheck (Size 1) (Seed 4416742318514417762 31116095073768267) <property>
False
While Range
helps us construct generators in terms of bounds and steps, we may also wish to write generators in terms of negative invariants. It may be simpler to generate terms, apply some predicates to them, and discard the invalid ones. This gives us the flexibility to write much more complicated generators, though they may take longer to run, or diverge.
Generators that discard heavily are certain to be less productive than other constructive generators. They can generate an arbitrary number of invalid values for each valid one, at the expense of your CPU cycles and precious seconds you probably needed for your life. So, this is a tool to be used gently and infrequently.
In practice, many invariants can be expressed constructively in terms of ranges and bind. Regardless, negative generators that discard are necessary when a constructive generator is not feasible.
In QuickCheck (and Jack) we'd use the suchThat
combinator for this.
Hedgehog's Gen
implements Alternative
and MonadPlus
, and the user is free to use empty
/ mzero
/ (<|>)
/ asum
to indicate failure. There are also a few helpful named wrappers and functions over these instances, like Gen.filter
(which is similar to suchThat
) and discard
. Discarding is a first-class notion in Hedgehog, and invariants enforced this way are also enforced during shrinking.
I use discard
all the time, usually during large choice
/ asum
blocks. A good example is in Hedgehog's STLC example ; if no values of a certain type are currently bound in scope, the known-value generator fails, and the choice
block higher up will generate a literal instead.
Here's a quite vacuous example: a generator for sized lists of unique elements. To provide a better demonstration, I'll use some fairly constraining invariants: elements must be valued between 0 and 10, and the list's length must be between 0 and 10:
genUniqueIntList :: Monad m => Gen m [Int]
genUniqueIntList = do
let listLength = Range.linear 0 10
genInt = Gen.int (Range.linear 0 10)
xs <- Gen.list listLength genInt
if List.nub xs == xs
then pure xs
else Gen.discard -- empty / mzero
Using Gen.sample
to inspect some random values from this generator, we see that it works:
λ> Gen.sample genUniqueIntList
[[0],[1],[1],[1],[3],[3,0],[2]]
λ> Gen.sample genUniqueIntList
[[3,0],[1],[0],[2],[3,1],[3,0],[0,1],[3],[2,1,3]]
λ> Gen.sample genUniqueIntList
[[3,1],[1,3,2],[3,1,2],[2],[2,0]]
λ> Gen.sample genUniqueIntList
[[3],[0,2],[0,2],[0,1,3],[3]]
If you were to run the generator just once and inspect its shrinks using Gen.print,
you would observe that this generator fails nearly all the time. We get slightly better results when rewritten with Gen.filter
, which scales the ranges behind the scenes:
genUniqueIntList :: Monad m => Gen m [Int]
genUniqueIntList = do
let listLength = Range.linear 0 10
genInt = Gen.int (Range.linear 0 10)
Gen.filter (\xs -> List.nub xs == xs) (Gen.list listLength genInt)
λ> Gen.print genUniqueIntList
=== Outcome ===
[1,2]
=== Shrinks ===
[]
[2]
[1]
[0,2]
[1,0]
<discard>
Hedgehog.Gen
contains useful combinators for various containers, including maps and sets. These are pretty handy, and much more convenient than rolling it yourself:
λ> Gen.sample (Gen.set (Range.linear 0 10) (Gen.int (Range.linear 0 10)))
[ fromList []
, fromList [ 3 ]
, fromList [ 3 ]
, fromList [ 0 , 3 ]
, fromList [ 0 , 2 ]
, fromList [ 0 , 2 ]
, fromList [ 3 ]
, fromList [ 0 ]
, fromList []
, fromList []
]
In short, discarding in Hedgehog is easy, it still leads to meaningful and correct shrinking, and there are some helper functions in Hedgehog.Gen
you should consult with before doing it yourself. As with QuickCheck, keep an eye on the values coming out of your generator, as discarding can negatively impact coverage.
Note that shrinking exists in QuickCheck, and is reasonably good. Every Arbitrary
instance carries a shrink :: a -> [a]
function. There's also forAllShrink
if you wish to avoid Arbitrary
and all its orphan works. All the Arbitrary
instances in popular use come with shrinking functions. However, since shrinking is not fundamental to Gen
, we can't compose generators without also manually constructing a shrink function. If the generator has few invariants, we may get away with genericShrink
, but in practice engineers often opt to forego shrinking altogether when working with complicated generators.
Hedgehog supports custom shrinking in the QuickCheck style via Hedgehog.Gen.shrink
.
Hedgehog's predecessor, Jack, was built as a QuickCheck utility library. It reused QuickCheck's runner and notion of Property
, and leaned on the wealth of generators out there on Hackage.
QuickCheck is beautiful, long-lived, and well-maintained, and I've used it a lot to great benefit. It's just a relatively old codebase, with lots of subtle niggles.
Hedgehog does not depend on QuickCheck. Starting from scratch with a new dependency base and no backwards-compatibility requirements allows a few fun improvements:
Hedgehog allows first-class effectful generators and effectful tests. The user must track their effects in the type, as you would in ordinary Haskell. There's no unsafePerformIO
under the hood.
The type of generators, Gen
, forms a monad transformer. It has instances for all the classes you'd expect, like MonadIO
and MonadTrans
. It also has MTL-style instances, MonadState
and the like. When those are insufficient, it implements MFunctor
, so you can hoist
your way to victory.
Here's a pretty silly IO generator:
import Data.Text (Text)
import qualified Data.Text as T
import qualified Data.Text.IO as T
import Hedgehog
import qualified Hedgehog.Gen as Gen
import qualified Hedgehog.Range as Range
-- Generate a dictionary word by sampling from words.
-- Don't actually do this.
genWord :: Gen IO Text
genWord = do
ws <- T.lines <$> liftIO (T.readFile "/usr/share/dict/words")
Gen.element ws
λ> Gen.sample genWord
["tenoner","unsophistically","suggest","catechistic","podoscapher","seallike","colonoscopy","Gemara","eosinophilic","monomyarian"]
It will be interesting to see what kind of effectful generators people find useful. I've mostly been using this transformer redundantly, managing environments with ReaderT
to clean up complicated code. I can't think of a use for an IO generator, but nor can I think of a reason they should be prohibited.
The type of tests, Test
, also forms a monad transformer with the similar instances to Gen
.
To write an IO test, we simply use liftIO
to our heart's content:
prop_got_words :: Property
prop_got_words =
property $ do
b <- liftIO (doesFileExist "/usr/share/dict/words")
assert b
To test functions in Either
or ExceptT
, liftEither
and liftExceptT
functions are provided. These use pretty-show
on the error, and use GHC call stacks to isolate the line of code responsible for the failure:
prop_got_words_except :: Property
prop_got_words_except =
property $
liftExceptT gotWords
gotWords :: MonadIO m => ExceptT Text m ()
gotWords = do
b <- liftIO (doesFileExist "/usr/share/dict/words")
if b then pure () else throwError "don't got words"
The keen observer will notice that property :: Test IO () -> Property
requires the user to finagle their monad stack into IO
before it can be run via mmorph
. This is one place the interface could be improved! Hedgehog is just getting started.
In addition to checkSequential
, which behaves much like quickCheckAll
, Hedgehog supplies checkConcurrent
. It uses concurrent-output
to wrangle the terminal.
testsSeq :: IO Bool
testsSeq =
checkSequential $$(discover)
testsCon :: IO Bool
testsCon =
checkConcurrent $$(discover)
Note that concurrently checking IO properties may lead to nondeterministic failure. The user must identify and sequentialise tests that cannot be run concurrently.
The API for this is in slight flux at the time of writing, so please check out the examples and documentation for more.
There is nothing worse than recreating a complex counterexample from stdout. Lots of libraries (including time
) have Show
instances that perform pretty-printing, rather than the derived version, and these can't be easily pasted into GHCi. If you're bad at writing manual shrink functions, a QuickCheck counterexample will often be huge, perhaps larger than your clipboard buffer! This is a bad place to be.
When a property fails, Hedgehog will print the seed used by the random number generator. This seed is portable across platforms, and can be pasted directly into GHCi:
This failure can be reproduced by running:
> recheck (Size 1) (Seed 861472791889809000 7503307783496352651) prop_fauxReverse_length
Hedgehog uses the new call stack API in GHC 8 to great effect when reporting property failures. Most combinators carry a HasCallStack
constraint. This enables the inline display of generated values:
✗ prop_fauxReverse_length failed after 2 tests and 19 shrinks.
┏━━ Hedge.hs ━━━
48 ┃ prop_fauxReverse_length :: Property
49 ┃ prop_fauxReverse_length =
50 ┃ property $ do
51 ┃ xs <- forAll genIntList
┃ │ [ -9223372036854775808
┃ │ , -9223372036854775808
┃ │ , -9223372036854775808
┃ │ , -9223372036854775808
┃ │ , -9223372036854775808
┃ │ , -9223372036854775808
┃ │ , -9223372036854775808
┃ │ , -9223372036854775808
┃ │ , -9223372036854775808
┃ │ , -9223372036854775808
┃ │ ]
52 ┃ length (fauxReverse xs) === length xs
┃ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
┃ │ Failed (- lhs =/= + rhs)
┃ │ - 9
┃ │ + 10
This failure can be reproduced by running:
> recheck (Size 1) (Seed 861472791889809000 7503307783496352651) prop_fauxReverse_length
Hedgehog
depends on pretty-show
, and uses it to print counterexamples where possible. The Show
constraints on various combinators like (===)
and liftExceptT
are used in this way. You may still get a wall of text if your counterexample doesn't reduce well, but it'll be a heck of a lot neater.
This is a simple trick we can extend to regular ol' QuickCheck, by simply reimplementing (===)
. You should absolutely do this. Life changing.
This is a minor aesthetic difference, but Hedgehog has no equivalent to the Arbitrary
typeclass. Since test frameworks typically live outside of a project's dependency tree, and are never in base
, reliance on a typeclass to provide test coverage creates a slightly awkward situation. Users have to choose between orphan instances and spurious dependencies.
Arbitrary
is a little bit questionable anyway, since it is a lawless class used mostly for overloading. Living without it is fairly easy; we just give all our generators names, and summon them by name using forAll
. It does come at a price: we lose the pretty prop_abc x y z = ...
shorthand.
As noted on Reddit by Koen Claessen and previously on the QuickCheck issue tracker, the integrated approach is not perfect. In a monadic generator, terms depend on one another, and thus need to be shrunk in their bound order. This is something like a depth-first search of the counterexample space, halting at a local minima. Hedgehog provides invariant-preserving shrinks for free, but does not necessarily produce better shrinks than QuickCheck. Adapting Hedgehog's integrated shrinking for alternative search procedures is something of an open question.↩