Issue 05
Jul 20, 2022 · 5 minute read
Jason is an actively maintained JSON parser and generator for Elixir - if you're building JSON structures to send to your clients or just shepherding data around your Elixir application, you're likely to reach for it or Poison - and the contents of this post apply to both.
The following isn’t required - feel free to skip to the content below for the tutorial proper. Reading this, however, will give you an understanding and real-life example of when you might want to use custom Jason encoding.
At Multiverse we’re working on developing a new version of an existing third-party integration and taking it in-house as part of our core platform. The feature in question isn’t important, it’s just important to know that there’s a lot of data - think millions of records.
One problem is that we don’t have direct access to the database of the existing system - what we do have is the ability to export the data into a CSV! Thankfully there are only 8 or so fields for each record meaning we can resort to a good, old-fashioned CSV import to bring the data over.
Simple right? Wrong.
Importing 1m+ rows of data into Postgres isn't too bad, but it will slow down and potentially lock up the rest of our application while the import runs - and that's bad! We also need to transform the data into a more reasonable structure for our new implementation - the old version had some inconsistencies and type issues we'd like to do away with, for example dates represented in epoch time and data fields we no longer need.
Epoch time (a.k.a. Unix time) is a system for describing a point in time as the number of seconds (excluding leap seconds) that have elapsed since the Unix epoch - 00:00:00 UTC on 1 January 1970.
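To get a feel for that in Elixir, here's a small sketch using only the standard library (the timestamp value is just illustrative):
# An epoch timestamp exported from the old system, read from the CSV as a string
raw = "1658275200"

# String.to_integer/1 then DateTime.from_unix!/1 give us a proper DateTime
raw
|> String.to_integer()
|> DateTime.from_unix!()
#=> ~U[2022-07-20 00:00:00Z]

# ...and back again, should we ever need to
DateTime.to_unix(~U[2022-07-20 00:00:00Z])
#=> 1658275200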
So now we need to transform and import each record, something that'll take time. To mitigate this, we're going to use RabbitMQ, an open source message broker: we'll package each of the rows from our CSV into a JSON message, fire them off asynchronously, and let our RabbitMQ consumer deal with them concurrently as they come in - if you're interested in learning more about Rabbit, let me know!
So where does Jason encoding come in?
Remember that we want to transform our data - the issue with exporting everything into a CSV file is that when we read it into our application, every value comes back as a string. Converting our date fields now involves calling String.to_integer/1 on each one by hand - something we don't want to have to do manually. Imagine we had 100 fields to import!
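To illustrate the pain, here's a rough sketch of the by-hand version (the field names are just a sample of the real ones):
# A row read from the CSV - every value arrives as a string
row = %{"id" => "1", "time" => "300", "date" => "1658275200"}

# Converting each field by hand - fine for a few fields, miserable for 100
%{
  id: String.to_integer(row["id"]),
  time: String.to_integer(row["time"]),
  date: String.to_integer(row["date"])
  # ...and so on for every numeric column
}
Instead, we'll push that conversion into a custom Jason encoder later on. First, a struct to represent each record: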
defmodule LegacyLog do
defstruct [:id, :user_id, :legacy_log_id, :time, :notes, :date, :inserted_at, :updated_at]
end
All we have here is a simple Elixir struct with a field for each of the values we want to handle and parse from the CSV import.
Using it in our CSV import:
defp serialise(data) do
data
|> Enum.map(fn row ->
[id, user_id, logtype_id, time, _target, notes, date, timecreated, timemodified, _, _] = row
%LegacyLog{
id: id,
user_id: user_id,
legacy_log_id: logtype_id,
time: time,
notes: notes,
date: date,
inserted_at: timecreated,
updated_at: timemodified
}
end)
end
Yes, we could transform the data here as we import it, but good RabbitMQ practice involves pretending that our messages are coming from somewhere external we don't have direct access to - even though we're producing and consuming them ourselves in this instance.
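To make the producing side concrete, here's a rough sketch of how the rows could be fired off as messages. It assumes the amqp Hex package, a queue called "legacy_logs", and the custom encoder we'll define below - treat it as an outline rather than production code:
defp publish_logs(data) do
  # Connection details and queue name are illustrative
  {:ok, connection} = AMQP.Connection.open()
  {:ok, channel} = AMQP.Channel.open(connection)
  {:ok, _} = AMQP.Queue.declare(channel, "legacy_logs", durable: true)

  data
  |> serialise()
  |> Enum.each(fn log ->
    # Jason.encode!/1 will use the custom Jason.Encoder implementation below
    AMQP.Basic.publish(channel, "", "legacy_logs", Jason.encode!(log))
  end)

  AMQP.Connection.close(connection)
end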
The bit you’re really here for.
We need to take advantage of the way Jason has defined Protocols to define a custom implementation of how to handle our encoding. Following on from our previously defined struct, we're going to add a defimpl for the struct itself and specify what we want to do:
defmodule LegacyLog do
defstruct [:id, :user_id, :legacy_log_id, :time, :notes, :date, :inserted_at, :updated_at]
defimpl Jason.Encoder, for: LegacyLog do
@impl Jason.Encoder
def encode(value, opts) do
{notes, remaining_log} = Map.pop(value, :notes)
remaining_log
|> Map.from_struct()
|> Map.new(fn {k, v} -> {k, String.to_integer(v)} end)
|> Map.put(:notes, notes)
|> Jason.Encode.map(opts)
end
end
end
In the above code I'm defining an implementation of Jason.Encoder for the LegacyLog type, meaning that when a Jason.encode/2 call comes its way, it knows to call my custom code - any other encoding call will use the default implementation (which is fine for most use cases).
The custom code itself:
- Pops the :notes field out, since that's the one value we want to keep as a string
- Converts the remaining Struct into a Map
- Converts each remaining string value into an integer
- Puts :notes back in
- Hands the result to Jason.Encode, which guarantees valid JSON

Protocols are a mechanism to achieve polymorphism in Elixir when you want behaviour to vary depending on the data type. It can be pretty powerful. Again, reach out if you'd like to hear more about Protocols.
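To see the custom encoder in action, here's what an encode looks like in iex (the output is hand-written here, so treat the exact key order as illustrative):
iex> log = %LegacyLog{
...>   id: "1",
...>   user_id: "42",
...>   legacy_log_id: "7",
...>   time: "300",
...>   notes: "A free-text note",
...>   date: "1658275200",
...>   inserted_at: "1658275200",
...>   updated_at: "1658275200"
...> }
iex> Jason.encode!(log)
"{\"date\":1658275200,\"id\":1,\"inserted_at\":1658275200,\"legacy_log_id\":7,\"notes\":\"A free-text note\",\"time\":300,\"updated_at\":1658275200,\"user_id\":42}"
Every numeric field comes out as an integer, while notes stays a string - exactly the shape our consumer wants.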
We’re done!
Subscribe to my Substack below for similar content and follow me on Twitter for more Elixir (and general programming) tips.
Want to learn and master LiveView? Check out the book I'm writing, The Phoenix LiveView Cookbook.