Re: Statically Typing Big Erratic JSON

Related to my previous post, Re: The User Wizard Scenario, I’d like to address another scenario cited as a reason to favor dynamic types:

Some APIs have a huge JSON format. But we might only need a few fields. Some fields could be false sometimes but a list of strings other times. Some fields should even be parsed depending on some other fields in the JSON. To statically type this is very tedious and the type might not be understandable anymore.

When it comes to parsing JSON, there are reasons to favor dynamic types (who doesn’t like JSON.parse when exploring data?), but I think the reasons above are actually reasons to favor static types.

1. “Some APIs have a huge JSON format. But we might only need a few fields.”

In a dynamically typed language, we’d simply ignore the rest of the JSON and use what we need:

function handle(jsonString) {
    let user = JSON.parse(jsonString);
    store(user);
}

function store({ email }) {
    // do stuff with `email`
}

In a statically typed language, we’d simply ignore the rest of the JSON and decode only what we need:

import Json.Decode exposing (decodeString, field, string)

userDecoder =
    Json.Decode.map (\s -> { email = s })
        (field "email" string) -- only decode the `"email"` field

handle jsonString =
    case decodeString userDecoder jsonString of
        Ok user ->
            store user

        Err jsonError ->          -- a runtime error in
            handleError jsonError -- dynamically typed langs
                                  -- (`handleError` is a placeholder)

store { email } =
    -- do stuff with `email`

The added benefit: if the email field is anything but a String, we get a decoding error right away and must write code to deal with it upfront. In the JS example, by contrast, we might not discover that our email is null or 42 until much later in the system (a background job 3 days later?).
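For comparison, here is a minimal JavaScript sketch of both paths. parseUnchecked is the plain JSON.parse route, and decodeEmailUser is a hypothetical hand-rolled decoder (not a library API) that checks only the one field we need and returns either { ok: true, value } or { ok: false, error }, loosely mirroring Elm’s Result.

```javascript
// Plain JSON.parse: nothing is checked, so whatever "email" holds flows onward.
function parseUnchecked(jsonString) {
  const { email } = JSON.parse(jsonString);
  return { email };
}

// Hypothetical hand-rolled decoder: checks only the field we need and
// reports a failure immediately, loosely mirroring Elm's Result type.
function decodeEmailUser(jsonString) {
  let raw;
  try {
    raw = JSON.parse(jsonString);
  } catch (e) {
    return { ok: false, error: `invalid JSON: ${e.message}` };
  }
  if (raw === null || typeof raw !== "object") {
    return { ok: false, error: "expected an object" };
  }
  if (typeof raw.email !== "string") {
    return { ok: false, error: `expected "email" to be a string, got ${JSON.stringify(raw.email)}` };
  }
  // Every other field in the (possibly huge) payload is ignored.
  return { ok: true, value: { email: raw.email } };
}

const bad = '{"email": 42, "name": "Ann"}';
console.log(parseUnchecked(bad));  // { email: 42 }
console.log(decodeEmailUser(bad)); // { ok: false, error: 'expected "email" to be a string, got 42' }
```

The unchecked path hands 42 to whatever runs next; the decoder path forces the caller to look at ok before it can get at a value.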

Tracing a crash back to the source of a bad value is tedious and, given that the decoder pattern exists, unnecessary trouble imo.

Even if we implemented decoders in JS, we couldn’t reap their benefits in the rest of our dynamically typed system. We’d need to manually add assertions everywhere it matters (wherever we remember to). welcomeEmail({ email })? Hmm, add an assertion just to be safe:

function welcomeEmail({ email }) {
    if (typeof email !== 'string') {
        // ...now what?
    }
    // send the welcome email
}

(Btw, what can we meaningfully do with an invalid value here? Dealing with errors deep in the system is awkward.)

Isn’t writing type signatures everywhere equivalent to scattering assertions everywhere?

No. Types are checked at compile time across the entire codebase, while assertions are checked at runtime… and only when that line runs. An assertion on the 5th page of a form wizard? The code has to run all the way there, under the right conditions, before we find out.
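A minimal JavaScript sketch of that last point, using a hypothetical renderWizardPage function that is not from the post: the assertion guarding page 5 only runs once page 5 is actually reached, so bad data passes through pages 1 to 4 unnoticed.

```javascript
// Hypothetical wizard step renderer. The assertion lives on one branch,
// so it is only evaluated when that branch actually executes.
function renderWizardPage(page, user) {
  if (page === 5) {
    // Runtime assertion: checked only when page 5 is rendered.
    if (typeof user.email !== "string") {
      throw new TypeError(`page 5 expected a string email, got ${typeof user.email}`);
    }
    return `Confirm: ${user.email.toLowerCase()}`;
  }
  return `Page ${page}`;
}

const user = { email: null }; // bad data that slipped in earlier

// Pages 1 to 4 render fine; nothing notices the bad email.
for (let page = 1; page <= 4; page++) {
  console.log(renderWizardPage(page, user));
}

// Only now, on page 5, does the assertion fire.
try {
  renderWizardPage(5, user);
} catch (e) {
  console.log(e.message); // page 5 expected a string email, got object
}
```

In a language like Elm, where email would be typed as String, a possibly missing email would have to be a Maybe String, and passing it where a String is expected is a compile error at every call site, whether or not that page is ever rendered.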

We can add type signatures in some dynamically typed languages

Ironically, type-inferred languages like Haskell have long allowed type signatures to be omitted entirely while keeping the benefit!

Basically, one asks “where do you want to type check?” while the other guarantees “everything is type checked”. I think it’s safe to say that while we might like some control over what gets type checked in our own code, we’d prefer other people’s code to be fully type checked wherever possible 😆 … as they say, “Software engineering is what happens to programming when you add time and other programmers.”

2. “Some fields could be false sometimes but a list of strings other times. Some fields should even be parsed depending on some other fields in the JSON.”

Since this JSON is beyond our control, we just have to deal with it. There is no escape.

In a dynamically typed language, we do whatever is necessary:

function handle(jsonString) {
    let user = JSON.parse(jsonString);

    // `false` just means no preferences
    if (user.preferences === false) user.preferences = []

    // `state` determines what `date` actually means
    if (user.state === 'deleted') user.deletedAt = user.date
    if (user.state === 'active')  user.lastLoginAt = user.date

    store(user);
}

function store({ email, preferences, deletedAt, lastLoginAt }) {
    // do stuff with fields
}

In a statically typed language, we also do whatever is necessary, but inside the decoders:

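import Json.Decode exposing (..)

-- one way to define `decodeFalse`: succeed only on the JSON literal `false`
decodeFalse =
    bool |> andThen (\b -> if b then fail "expected `false`" else succeed False)

preferenceDecoder =
    Json.Decode.oneOf
        [ map (always []) decodeFalse -- `false` found? decode as an empty list []
        , list string                 -- otherwise, decode as list of string
        ]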

dateDecoder =
    Json.Decode.map2 datesFromState
        -- decode both json fields `state` and `date`
        -- then hand it off to `datesFromState` to decide
        (Json.Decode.field "state" string)
        (Json.Decode.field "date" isoDate)

datesFromState stateString date =
    case stateString of
        "deleted" ->
            { deletedAt = Just date, lastLoginAt = Nothing   }
        "active" ->
            { deletedAt = Nothing,   lastLoginAt = Just date }
        _ ->
            { deletedAt = Nothing,   lastLoginAt = Nothing   }

-- We compose our `userDecoder` with these decoders

userDecoder =
    Json.Decode.map3 buildUser
        -- decode the fields then assemble with `buildUser`
        (field "email" string)
        (field "preferences" preferenceDecoder)
        (dateDecoder)

buildUser email preferences { deletedAt, lastLoginAt } =
    { email = email
    , preferences = preferences
    , deletedAt = deletedAt
    , lastLoginAt = lastLoginAt
    }

-- Then we update the original snippet to mention the new fields

handle jsonString =
    case decodeString userDecoder jsonString of
        Ok user ->
            store user

        Err jsonError ->
            handleError jsonError -- placeholder error handler

store { email, preferences, deletedAt, lastLoginAt } =
    -- do stuff
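The same decoders can be mimicked in plain JavaScript to poke at the mechanics without an Elm setup. This is a rough sketch, not a real library: ok, fail, string, decodeFalse, isoDate, list, field, map, oneOf and decodeString are hypothetical hand-rolled stand-ins named after their Elm counterparts, a single variadic mapN replaces Elm’s map2 and map3, and null stands in for Nothing. isoDate assumes dates arrive as strings that Date.parse accepts.

```javascript
// Hypothetical hand-rolled combinators named after Elm's Json.Decode functions.
// A decoder is a function: parsed value -> { ok: true, value } | { ok: false, error }.
const ok = (value) => ({ ok: true, value });
const fail = (error) => ({ ok: false, error });

const string = (v) =>
  typeof v === "string" ? ok(v) : fail(`expected a string, got ${JSON.stringify(v)}`);
const decodeFalse = (v) => (v === false ? ok(false) : fail("expected false"));
// Assumption: dates arrive as strings Date.parse accepts (e.g. ISO 8601).
const isoDate = (v) =>
  typeof v === "string" && !Number.isNaN(Date.parse(v))
    ? ok(new Date(v))
    : fail(`expected an ISO date, got ${JSON.stringify(v)}`);
const list = (item) => (v) => {
  if (!Array.isArray(v)) return fail("expected a list");
  const out = [];
  for (const x of v) {
    const r = item(x);
    if (!r.ok) return r; // first bad element fails the whole list
    out.push(r.value);
  }
  return ok(out);
};
const field = (name, dec) => (v) =>
  v !== null && typeof v === "object" && Object.prototype.hasOwnProperty.call(v, name)
    ? dec(v[name])
    : fail(`missing field "${name}"`);
const map = (f, d) => (v) => {
  const r = d(v);
  return r.ok ? ok(f(r.value)) : r;
};
// mapN: run every decoder on the same value, stop at the first failure.
const mapN = (f, ...decoders) => (v) => {
  const values = [];
  for (const d of decoders) {
    const r = d(v);
    if (!r.ok) return r;
    values.push(r.value);
  }
  return ok(f(...values));
};
const oneOf = (decoders) => (v) => {
  for (const d of decoders) {
    const r = d(v);
    if (r.ok) return r; // first decoder that succeeds wins
  }
  return fail("no decoder in oneOf matched");
};
const decodeString = (d, s) => {
  let parsed;
  try {
    parsed = JSON.parse(s);
  } catch (e) {
    return fail(`invalid JSON: ${e.message}`);
  }
  return d(parsed);
};

// Application-specific decoders, mirroring the Elm version.
const preferenceDecoder = oneOf([map(() => [], decodeFalse), list(string)]);

const datesFromState = (state, date) => {
  switch (state) {
    case "deleted": return { deletedAt: date, lastLoginAt: null };
    case "active":  return { deletedAt: null, lastLoginAt: date };
    default:        return { deletedAt: null, lastLoginAt: null };
  }
};
const dateDecoder = mapN(datesFromState, field("state", string), field("date", isoDate));

const buildUser = (email, preferences, { deletedAt, lastLoginAt }) =>
  ({ email, preferences, deletedAt, lastLoginAt });
const userDecoder = mapN(
  buildUser,
  field("email", string),
  field("preferences", preferenceDecoder),
  dateDecoder
);

const result = decodeString(
  userDecoder,
  '{"email":"a@b.c","preferences":false,"state":"deleted","date":"2020-01-02T00:00:00Z","extra":1}'
);
console.log(result.ok, result.value.preferences, result.value.deletedAt.toISOString());
// true [] 2020-01-02T00:00:00.000Z
```

Each decoder is a plain function, so preferenceDecoder or dateDecoder can be exercised on its own with a handful of JSON snippets. What JavaScript cannot do is stop the rest of the program from reading result.value without checking result.ok first; that guarantee is what the Elm compiler adds.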

3. “To statically type this is very tedious and the type might not be understandable anymore.”

While there are obviously more lines of code, note that the User type is just as clean. The messy rules of the JSON are captured and compartmentalized inside individual decoder functions, which are very testable out of the box too. The rest of the system can use this clean User type with abandon, with the compiler ensuring there are no stray threads.

UPDATE: a follow-up post, Re: REPL