First, the extensions and libraries that we're going to use:
{-# LANGUAGE RankNTypes, TypeOperators #-}
import Control.Monad (join)
Of these, RankNTypes is the only one that's absolutely essential to the below. I once wrote an explanation of RankNTypes that some people seem to have found useful, so I'll refer to that.
Quoting Tom Crockett's excellent answer, we have:
  A monad is...
  
  
  - An endofunctor, T : X -> X
- A natural transformation, μ : T × T -> T, where × means functor composition
- A natural transformation, η : I -> T, where I is the identity endofunctor on X
...satisfying these laws:
  
  
  - μ(μ(T × T) × T)) = μ(T × μ(T × T))
- μ(η(T)) = T = μ(T(η))
How do we translate this to Haskell code? Well, let's start with the notion of a natural transformation:
-- | A natural transformations between two 'Functor' instances.  Law:
--
-- > fmap f . eta g == eta g . fmap f
--
-- Neat fact: the type system actually guarantees this law.
--
newtype f :-> g =
    Natural { eta :: forall x. f x -> g x }
A type of the form f :-> g is analogous to a function type, but instead of thinking of it as a function between two types (of kind *), think of it as a morphism between two functors (each of kind * -> *). Examples:
listToMaybe :: [] :-> Maybe
listToMaybe = Natural go
    where go [] = Nothing
          go (x:_) = Just x
maybeToList :: Maybe :-> []
maybeToList = Natural go
    where go Nothing = []
          go (Just x) = [x]
reverse' :: [] :-> []
reverse' = Natural reverse
Basically, in Haskell, natural transformations are functions from some type f x to another type g x such that the x type variable is "inaccessible" to the caller. So for example, sort :: Ord a => [a] -> [a] cannot be made into a natural transformation, because it's "picky" about which types we may instantiate for a. One intuitive way I often use to think of this is the following:
- A functor is a way of operating on the content of something without touching the structure.
- A natural transformation is a way of operating on the structure of something without touching or looking at the content.
Now, with that out of the way, let's tackle the clauses of the definition.
The first clause is "an endofunctor, T : X -> X." Well, every Functor in Haskell is an endofunctor in what people call "the Hask category," whose objects are Haskell types (of kind *) and whose morphisms are Haskell functions. This sounds like a complicated statement, but it's actually a very trivial one. All it means is that that a Functor f :: * -> * gives you the means of constructing a type f a :: * for any a :: * and a function fmap f :: f a -> f b out of any f :: a -> b, and that these obey the functor laws.
Second clause: the Identity functor in Haskell (which comes with the Platform, so you can just import it) is defined this way:
newtype Identity a = Identity { runIdentity :: a }
instance Functor Identity where
    fmap f (Identity a) = Identity (f a)
So the natural transformation η : I -> T from Tom Crockett's definition can be written this way for any Monad instance t:
return' :: Monad t => Identity :-> t
return' = Natural (return . runIdentity)
Third clause: The composition of two functors in Haskell can be defined this way (which also comes with the Platform):
newtype Compose f g a = Compose { getCompose :: f (g a) }
-- | The composition of two 'Functor's is also a 'Functor'.
instance (Functor f, Functor g) => Functor (Compose f g) where
    fmap f (Compose fga) = Compose (fmap (fmap f) fga)
So the natural transformation μ : T × T -> T from Tom Crockett's definition can be written like this:
join' :: Monad t => Compose t t :-> t
join' = Natural (join . getCompose)
The statement that this is a monoid in the category of endofunctors then means that Compose (partially applied to just its first two parameters) is associative, and that Identity is its identity element. I.e., that the following isomorphisms hold:
- Compose f (Compose g h) ~= Compose (Compose f g) h
- Compose f Identity ~= f
- Compose Identity g ~= g
These are very easy to prove because Compose and Identity are both defined as newtype, and the Haskell Reports define the semantics of newtype as an isomorphism between the type being defined and the type of the argument to the newtype's data constructor. So for example, let's prove Compose f Identity ~= f:
Compose f Identity a
    ~= f (Identity a)                 -- newtype Compose f g a = Compose (f (g a))
    ~= f a                            -- newtype Identity a = Identity a
Q.E.D.