Re: Statically Typing Big Erratic JSON
Related to my previous post Re: The User Wizard Scenario, I’d like to address another scenario cited as a reason to favor dynamic types:
Some APIs have a huge JSON format. But we might only need a few fields. Some fields could be `false` sometimes but a list of strings other times. Some fields should even be parsed depending on some other fields in the JSON. To statically type this is very tedious and the type might not be understandable anymore.
When it comes to parsing JSON, there are reasons to favor dynamic types (who doesn’t like `JSON.parse` when exploring data?), but I think the reasons above are actually reasons to favor static types.
1. “Some APIs have a huge JSON format. But we might only need a few fields.”
In a dynamically typed language, we’d simply ignore the rest of the JSON and use what we need:
function handle(jsonString) {
  let user = JSON.parse(jsonString)
  store(user);
}

function store({ email }) {
  // do stuff with `email`
}
In a statically typed language, we’d simply ignore the rest of the JSON and decode what we need:
userDecoder =
    Json.Decode.map (\s -> { email = s })
        (field "email" string) -- only decode `"email"` field

handle jsonString =
    case decodeString userDecoder jsonString of
        Ok user ->
            store user

        Err jsonError ->          -- runtime error for
            handleError jsonError -- dynamically typed langs

store { email } =
    -- do stuff with `email`
The added benefit is that if the `email` field is anything but a `String`, we would’ve gotten a parsing error early and have code to deal with it upfront. Whereas, in the JS example, we might not know that our `email` is `null` or `42` until much later in the system (a background job 3 days later?).
Troubleshooting the source of bad values that cause a crash is tedious and (given that the decoder pattern exists) unnecessary trouble, imo.
Even if we implemented decoders in JS, we still can’t reap their benefits in the rest of our dynamically typed system. We need to manually add assertions everywhere that matters (and that we remember to). `welcomeEmail({ email })`? Hmm, add an assertion just to be safe:
function welcomeEmail({ email }) {
  if (typeof email !== 'string') {
    // (Btw, what can we effectively do with an invalid value here?
    // Dealing with errors deep in the system is awkward)
  }
}
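To make "decoders in JS" concrete, here is a minimal sketch of the idea. `decodeUser` and its `{ ok, value, error }` result shape are hypothetical, hand-rolled for illustration rather than taken from any library. It validates `email` once, at the boundary, and drops everything else:

```javascript
// Hypothetical hand-rolled decoder: validate once, at the boundary.
// Returns { ok: true, value } or { ok: false, error } instead of throwing.
function decodeUser(jsonString) {
  let raw
  try {
    raw = JSON.parse(jsonString)
  } catch (e) {
    return { ok: false, error: 'invalid JSON: ' + e.message }
  }
  if (raw === null || typeof raw !== 'object') {
    return { ok: false, error: 'expected a JSON object' }
  }
  if (typeof raw.email !== 'string') {
    return { ok: false, error: 'expected "email" to be a string' }
  }
  // Only the field we asked for survives; the rest of the JSON is ignored.
  return { ok: true, value: { email: raw.email } }
}
```

Nothing stops other code from calling `welcomeEmail({ email: 42 })` directly, though. The guarantee ends where the decoder ends, which is why the assertions creep back in.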
Isn’t writing type signatures everywhere equivalent to scattering assertions everywhere?
No. Types are checked at compile time over the entire codebase, while assertions are checked at runtime… and only when that line is run. 5th page of a form wizard? The code has to run all the way there, under the right conditions, for us to find out about it.
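A small sketch of that difference, using a made-up `wizardStep` function (the step numbers and return strings are assumptions for illustration). The assertion only fires when execution actually reaches step 5:

```javascript
// Hypothetical form wizard: the assertion lives on the 5th page only.
function wizardStep(step, { email }) {
  if (step === 5) {
    // This check runs only if we actually reach step 5.
    if (typeof email !== 'string') {
      throw new TypeError('email must be a string')
    }
    return 'sending welcome email to ' + email
  }
  return 'showing step ' + step
}

// Steps 1 to 4 happily accept the bad value...
wizardStep(1, { email: 42 })

// ...and the problem only surfaces once step 5 runs.
try {
  wizardStep(5, { email: 42 })
} catch (err) {
  console.log(err.message)
}
```

A test suite that never drives the wizard to step 5 with a bad `email` would pass without ever touching the assertion.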
We can add type signatures in some dynamically typed languages
Ironically, type-inferred languages like Haskell have long allowed type signatures to be removed while still keeping the benefit!
Basically, one offers “where do you want to type check?” while the other offers “everything is type checked”. I think it’s safe to say that while we might prefer some control over what gets type checked in our own code, we’d also prefer other people’s code to be fully type checked where possible 😆 … as they say, “Software engineering is what happens to programming when you add time and other programmers.”
2. “Some fields could be `false` sometimes but a list of strings other times. Some fields should even be parsed depending on some other fields in the JSON.”
Since the said JSON is beyond our control, we just have to deal with it. There is no escape.
In a dynamically typed language, we do whatever is necessary
function handle(jsonString) {
  let user = JSON.parse(jsonString)

  // `false` just means no preferences
  if (user.preferences === false) user.preferences = []

  // `state` determines what `date` actually means
  if (user.state === 'deleted') user.deletedAt = user.date
  if (user.state === 'active') user.lastLoginAt = user.date

  store(user);
}

function store({ email, preferences, deletedAt, lastLoginAt }) {
  // do stuff with fields
}
In a statically typed language, we do whatever is necessary, but inside the decoders:
preferenceDecoder =
    Json.Decode.oneOf
        [ map (always []) decodeFalse -- `false` found? decode as an empty list []
        , list string                 -- otherwise, decode as list of string
        ]
dateDecoder =
    Json.Decode.map2 datesFromState
        -- decode both json fields `state` and `date`
        -- then hand it off to `datesFromState` to decide
        (Json.Decode.field "state" string)
        (Json.Decode.field "date" isoDate)

datesFromState stateString date =
    case stateString of
        "deleted" ->
            { deletedAt = Just date, lastLoginAt = Nothing }

        "active" ->
            { deletedAt = Nothing, lastLoginAt = Just date }

        _ ->
            { deletedAt = Nothing, lastLoginAt = Nothing }
-- We compose our `userDecoder` with these decoders
userDecoder =
    Json.Decode.map3 buildUser
        -- decode the fields then assemble with `buildUser`
        (field "email" string)
        (field "preferences" preferenceDecoder)
        (dateDecoder)

buildUser email preferences { deletedAt, lastLoginAt } =
    { email = email
    , preferences = preferences
    , deletedAt = deletedAt
    , lastLoginAt = lastLoginAt
    }
-- Then we update the original snippet to mention the new fields
handle jsonString =
    case decodeString userDecoder jsonString of
        Ok user ->
            store user

        Err jsonError ->
            handleError jsonError

store { email, preferences, deletedAt, lastLoginAt } =
    -- do stuff
3. “To statically type this is very tedious and the type might not be understandable anymore.”
While there are obviously more lines of code, note that the `User` type is just as clean. The messy rules of the JSON are captured and compartmentalized inside individual decoder functions, which are very testable out of the box too. The rest of the system can use this clean `User` type with abandon, with the compiler ensuring there are no loose threads.
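As a rough illustration of the "very testable" claim, here are the same two rules written as small standalone JS functions with a few checks. The function names, the use of `null` in place of `Nothing`, and the choice to throw on unexpected shapes are assumptions for this sketch, not part of the decoders above.

```javascript
// `false` means "no preferences"; otherwise expect a list of strings.
function decodePreferences(value) {
  if (value === false) return []
  if (Array.isArray(value) && value.every(v => typeof v === 'string')) {
    return value
  }
  throw new TypeError('preferences must be false or a list of strings')
}

// `state` decides what `date` means; unknown states set neither field.
function datesFromState(state, date) {
  switch (state) {
    case 'deleted': return { deletedAt: date, lastLoginAt: null }
    case 'active': return { deletedAt: null, lastLoginAt: date }
    default: return { deletedAt: null, lastLoginAt: null }
  }
}

// Each rule can be checked in isolation, without the rest of the system.
console.assert(decodePreferences(false).length === 0)
console.assert(decodePreferences(['a', 'b']).length === 2)
console.assert(datesFromState('deleted', '2020-01-01').deletedAt === '2020-01-01')
console.assert(datesFromState('unknown', '2020-01-01').lastLoginAt === null)
```

Because each rule is a pure function of its input, the checks need no fixtures, no network, and no database.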
UPDATE: a followup post Re: REPL