Generating Types from JSON Samples

When I write data types to parse JSON into, the easiest approach is usually to look at a sample of the data and work out the types from that. I thought we could automate this process.

For example, the Riot Games API has a lot of methods. I don’t want to write out types and parsers for all of them, or any of them. Here is a sample response:

{"champions": [
   {
      "botMmEnabled": true,
      "bestItem": [],
      "id": 123,
      "rankedPlayEnabled": false,
      "botEnabled": false,
      "active": true,
      "freeToPlay": true
   },
   {
      "botMmEnabled": true,
      "bestItem": [1, 2],
      "id": 456,
      "rankedPlayEnabled": false,
      "botEnabled": false,
      "active": true,
      "freeToPlay": true
   }
]}

The algorithm we use is similar to subtyping. Each field of the object starts with no information known about it (we call this U). When we see some information about that field, we combine it with what we previously knew. So if it was U and we find a type, we can now assume it is exactly that type. If we later find a conflicting type, we can only assume it is top, which we call T.

The lattice L describes this:

[Figure: the lattice L]

Note: a vector contains an L and an object is a mapping from a key to an L.
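The post does not show the definition of the intermediate type, so here is a minimal sketch of what it might look like. The constructors U, T, S, N, V and O are the names used in this post; B for booleans is an assumed name, and Data.Map with String keys stands in for the HashMap used in the real code so that the sketch needs nothing beyond the libraries shipped with GHC.

```haskell
import qualified Data.Map.Strict as M

-- Intermediate type lattice (called EL in the instance later in the post).
data EL
  = U                     -- nothing known yet (bottom)
  | T                     -- conflicting information (top)
  | S                     -- string
  | N                     -- number
  | B                     -- boolean (assumed constructor name)
  | V EL                  -- vector whose elements have type EL
  | O (M.Map String EL)   -- object: maps each key to the type of its field
  deriving (Eq, Show)

-- What we know after looking only at the first champion in the sample:
-- bestItem is an empty array, so its element type is still unknown.
firstChampion :: EL
firstChampion = O (M.fromList
  [ ("botMmEnabled", B), ("bestItem", V U), ("id", N)
  , ("rankedPlayEnabled", B), ("botEnabled", B)
  , ("active", B), ("freeToPlay", B) ])
```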

I used a monoid to represent this relation.

instance Monoid EL where
  mempty = U
  U `mappend` U = U
  T `mappend` _ = T
  _ `mappend` T = T

Next we add the rules for learning a type where we previously knew nothing. The cases for strings (S) are:

U `mappend` S = S
S `mappend` U = S

The two special cases we must consider are objects and arrays. If we see an empty array in an object, we don’t know the type yet. But if we see that array again with elements, we may know what its type is.

For objects, we take the union of the fields of the left and right objects. When a field appears in both objects, we use mappend recursively to get a type that satisfies both occurrences. In the Data.HashMap module this function is called unionWith, so the code is:

O m `mappend` O n = O (H.unionWith mappend m n)

For arrays, we simply combine the element types of the two vectors with mappend.

V x `mappend` V y = V (x `mappend` y)

In our example, the first time we see bestItem we record that it is a vector whose element type is not yet known, V U. The next time, we see that it contains a number, so we refine it to V N.
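Putting the rules above together, the whole instance can be sketched as follows. This is my reconstruction rather than the exact source: it uses Data.Map instead of HashMap, an assumed B constructor for booleans, a catch-all clause for equal and conflicting primitives, and a Semigroup instance, which GHC 8.4 and later require before Monoid can be declared.

```haskell
import qualified Data.Map.Strict as M

data EL = U | T | S | N | B | V EL | O (M.Map String EL)
  deriving (Eq, Show)

instance Semigroup EL where
  U <> x = x                          -- unknown plus anything: take what we learned
  x <> U = x
  T <> _ = T                          -- top absorbs everything
  _ <> T = T
  V x <> V y = V (x <> y)             -- combine element types
  O m <> O n = O (M.unionWith (<>) m n)  -- combine fields key by key
  x <> y
    | x == y    = x                   -- same primitive seen twice, e.g. S <> S
    | otherwise = T                   -- conflicting types

instance Monoid EL where
  mempty = U

-- The bestItem example, cut down to two fields.
firstSeen, secondSeen, merged :: EL
firstSeen  = O (M.fromList [("id", N), ("bestItem", V U)])
secondSeen = O (M.fromList [("id", N), ("bestItem", V N)])
merged     = mconcat [firstSeen, secondSeen]
-- merged == O (M.fromList [("bestItem", V N), ("id", N)])
```

The first two clauses generalise the U rules shown earlier, so the per-type cases such as U `mappend` S = S do not need to be written out one by one.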

Then we write a function to map Aeson values to our intermediate types, and another from our intermediate types to a Template Haskell [Dec]. For the second we use the Writer monad, since each time we find a new object we need to “write” a new data type for it. We also need to pass in the name for the top-level type (in my code, it is “API”). This function looks like:

format :: Text -> EL -> Writer [Dec] Type
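To make the shape of these two functions concrete, here is a simplified sketch, not the real implementation. It makes several assumptions: a small JSON type stands in for Aeson's Value, declarations are emitted as strings rather than Template Haskell Decs, Data.Map replaces HashMap, the names infer, JSON and B are mine, and U and T fall back to a plain Value.

```haskell
import Control.Monad.Trans.Writer (Writer, runWriter, tell)
import Data.List (intercalate)
import qualified Data.Map.Strict as M

-- Hypothetical stand-in for Aeson's Value.
data JSON = JString String | JNumber Double | JBool Bool
          | JArray [JSON] | JObject [(String, JSON)]

data EL = U | T | S | N | B | V EL | O (M.Map String EL)
  deriving (Eq, Show)

instance Semigroup EL where
  U <> x = x
  x <> U = x
  T <> _ = T
  _ <> T = T
  V x <> V y = V (x <> y)
  O m <> O n = O (M.unionWith (<>) m n)
  x <> y = if x == y then x else T

instance Monoid EL where
  mempty = U

-- Map a sample value to what it tells us about its type.
infer :: JSON -> EL
infer (JString _)  = S
infer (JNumber _)  = N
infer (JBool _)    = B
infer (JArray xs)  = V (foldMap infer xs)   -- an empty array gives V U
infer (JObject kv) = O (M.fromList [(k, infer v) | (k, v) <- kv])

-- Return the name of a type, "writing" one record per object on the way.
format :: String -> EL -> Writer [String] String
format _ S = pure "Text"
format _ N = pure "Scientific"
format _ B = pure "Bool"
format name (V x) = do
  inner <- format name x
  pure ("Vector " ++ (if ' ' `elem` inner then "(" ++ inner ++ ")" else inner))
format name (O m) = do
  fields <- traverse field (M.toList m)
  let tname = 'T' : name
  tell ["data " ++ tname ++ " = " ++ tname ++ " {" ++ intercalate ", " fields ++ "}"]
  pure tname
  where
    field (k, t) = (\ty -> k ++ " :: " ++ ty) <$> format k t
format _ _ = pure "Value"   -- U or T: assumed fallback to untyped JSON

-- A cut-down version of the champions sample.
sample :: JSON
sample = JObject [("champions", JArray
  [ JObject [("id", JNumber 123), ("bestItem", JArray []), ("active", JBool True)]
  , JObject [("id", JNumber 456), ("bestItem", JArray [JNumber 1, JNumber 2]), ("active", JBool True)]
  ])]

generated :: (String, [String])
generated = runWriter (format "API" (infer sample))
```

Because the inner object is formatted before the outer record is written, the declaration for Tchampions comes out ahead of the one for TAPI.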

Finally we put a QuasiQuoter interface over it so we can use it. This outputs the types:

data Tchampions
   = Tchampions {botMmEnabled :: Bool,
                 bestItem :: Vector Scientific,
                 freeToPlay :: Bool,
                 botEnabled :: Bool,
                 active :: Bool,
                 id :: Scientific,
                 rankedPlayEnabled :: Bool}
   deriving (Show, Generic)
instance FromJSON Tchampions
instance ToJSON Tchampions

data TAPI
   = TAPI {champions :: Vector Tchampions}
   deriving (Show, Generic)
instance FromJSON TAPI
instance ToJSON TAPI

We give the fields exactly the same names as the keys in the JSON object so that the instances derived via GHC.Generics can parse the JSON. Easy!

Source code here

Thanks Thomas for the Graphviz help and Mark for the proof reading.

Note: What doesn’t work

  • Optional data (Maybes)
  • Different types of objects
  • If an API has different fields with the same name
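A small sketch of why the first limitation follows from the rules above, as I read them, using the same assumptions as before (Data.Map instead of HashMap, a reconstructed instance). A key that is missing from one sample is simply kept by the union, so nothing records that it was absent and it is never inferred as a Maybe. The second value shows a field whose type differs between samples collapsing to T.

```haskell
import qualified Data.Map.Strict as M

data EL = U | T | S | N | V EL | O (M.Map String EL)
  deriving (Eq, Show)

instance Semigroup EL where
  U <> x = x
  x <> U = x
  T <> _ = T
  _ <> T = T
  V x <> V y = V (x <> y)
  O m <> O n = O (M.unionWith (<>) m n)
  x <> y = if x == y then x else T

instance Monoid EL where
  mempty = U

-- "b" is missing from the first sample, but the union keeps it as a plain S.
optionalLost :: EL
optionalLost = O (M.fromList [("a", N)]) <> O (M.fromList [("a", N), ("b", S)])
-- optionalLost == O (M.fromList [("a", N), ("b", S)])

-- "a" is a number in one sample and a string in the other, so it becomes T.
conflict :: EL
conflict = O (M.fromList [("a", N)]) <> O (M.fromList [("a", S)])
-- conflict == O (M.fromList [("a", T)])
```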