I'm working on a project right now where I'm dealing with the Prim typeclass and I need to ensure that a particular function I've written is specialized. That is, I need to make sure that when I call it, I get a specialized version of the function in which the Prim dictionaries get inlined into the specialized definition instead of being passed at runtime.
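To make the distinction concrete, here is a rough sketch of the two shapes I have in mind. It does not use the real Prim class: AddDict, sumWithDict, and sumInts are hypothetical names, and the record only approximates how GHC represents a class dictionary.

```haskell
-- Hypothetical stand-in for a class dictionary. The real Prim class has
-- different methods; this only illustrates the shape of the problem.
data AddDict a = AddDict
  { zeroD :: a
  , plusD :: a -> a -> a
  }

-- Unspecialized shape: the dictionary is an ordinary runtime argument,
-- and every use of plusD goes through a record field.
sumWithDict :: AddDict a -> [a] -> a
sumWithDict d = foldr (plusD d) (zeroD d)

-- Specialized shape: the element type is fixed, so the operations are
-- known statically and no dictionary is passed.
sumInts :: [Int] -> Int
sumInts = foldr (+) 0

intDict :: AddDict Int
intDict = AddDict { zeroD = 0, plusD = (+) }

main :: IO ()
main = do
  print (sumWithDict intDict [1, 2, 3])
  print (sumInts [1, 2, 3])
```

Both calls compute the same sum. What I care about is that the second shape lets GHC see the concrete operations inside a hot loop.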
Fortunately, this is a pretty well-understood thing in GHC. You can just write:
{-# SPECIALIZE foo :: ByteArray Int -> Int #-}
foo :: Prim a => ByteArray a -> Int
foo = ...
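For anyone who wants something compilable to experiment with, here is a minimal, self-contained sketch of the same pattern. It is not my actual code: sumSquares is a made-up function, and Num stands in for Prim so that only base is needed.

```haskell
-- Hypothetical example: Num stands in for Prim so that only base is needed.
sumSquares :: Num a => [a] -> a
sumSquares xs = sum [x * x | x <- xs]

-- Definition-site specialization: ask GHC to also generate a copy of
-- sumSquares at Int, with the Num Int dictionary resolved statically.
{-# SPECIALIZE sumSquares :: [Int] -> Int #-}

main :: IO ()
main = print (sumSquares [1, 2, 3 :: Int])
```

The pragma only affects the generated code, not the result, so the program prints the same value with or without it.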
And in my code, this approach is working fine. But, since typeclasses are open, there can be Prim instances that I don't know about yet when the library is being written. This brings me to the problem at hand.
The GHC user guide's documentation of SPECIALIZE provides two ways to use it. The first is putting SPECIALIZE at the site of the definition, as I did in the example above. The second is putting the SPECIALIZE pragma in another module where the function is imported.
For reference, the example the user manual provides is:
module Map( lookup, blah blah ) where
lookup :: Ord key => [(key,a)] -> key -> Maybe a
lookup = ...
{-# INLINABLE lookup #-}
module Client where
import Map( lookup )
data T = T1 | T2 deriving( Eq, Ord )
{-# SPECIALISE lookup :: [(T,a)] -> T -> Maybe a #-}
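The same import-site pattern can be tried in a single file against a library function. The sketch below is only an illustration and is not taken from my project: it assumes the containers package is available and that Data.Map.Strict.insert is marked INLINABLE there (as far as I can tell it is); T and build are hypothetical names.

```haskell
import Data.Map.Strict (Map, empty, insert, toList)

-- Hypothetical key type, mirroring T from the user guide example.
data T = T1 | T2 deriving (Eq, Ord, Show)

-- Import-site specialization: insert is defined in containers, not in
-- this module, so GHC needs its unfolding to build the copy at key type T.
{-# SPECIALISE insert :: T -> v -> Map T v -> Map T v #-}

-- Small helper that exercises insert at the specialized key type.
build :: [(T, v)] -> Map T v
build = foldr (\(k, v) m -> insert k v m) empty

main :: IO ()
main = print (toList (build [(T2, "two"), (T1, "one")]))
```

If the imported function had no unfolding available, I would expect GHC to warn about the pragma rather than silently honor it.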
The problem I'm having is that this is not working in my code. The project is on GitHub, and the relevant lines are:
To run the benchmark, run these commands:
git submodule init && git submodule update
cabal new-build bench && ./dist-newstyle/build/btree-0.1.0.0/build/bench/bench
When I run the benchmark as is, there is a part of the output that reads:
Off-heap tree, Amount of time taken to build:
0.293197796
If I uncomment line 151 of BTree.Compact, that part of the benchmark runs about five times faster:
Off-heap tree, Amount of time taken to build:
5.626834e-2
It's worth pointing out that the function in question, modifyWithM, is enormous. Its implementation is over 100 lines, but I do not think this should make a difference. The docs claim:
... mark the definition of f as INLINABLE, so that GHC guarantees to expose an unfolding regardless of how big it is.
So, my understanding is that, if specializing at the definition site works, it should always be possible to instead specialize at the call site. I would appreciate any insights from people who understand this machinery better than I do, and I'm happy to provide more information if something is unclear. Thanks.
EDIT: I've realized that in the git commit I linked to in this post, there is a problem with the benchmark code. It repeatedly inserts the same value. However, even after fixing this, the specialization problem is still happening.
[...] forall from modifyWithM. It looks like there were several bugs related to cross-module specialization that were fixed by some changes in 8.2.1, which you might try (if that's not what you are using): see related tickets on ghc.haskell.org/trac/ghc/ticket/13376. If you are using 8.2.1 I think you should report a bug. – jberryman Jun 24 at 17:46
[...] forall, but that did not resolve it either. – Andrew Thaddeus Martin Jun 24 at 19:49
[...] INLINABLE to the definition site? I think that's almost certainly necessary. – dfeuer Jun 24 at 21:27
[...] INLINABLE at the definition site. – Andrew Thaddeus Martin Jun 24 at 21:37