June 24, 2015 / by Jörg Rathlev / Software engineer / @
Migrating to JSON in PostgreSQL
PostgreSQL offers a JSON column type, but how do you migrate existing data into a JSON-based structure?
We recently had a case where we wanted to migrate some data in an existing table from a fixed set of columns into a more flexible JSON column.
Consider the following example:
This table is now modified by adding an additional column of type JSON. The
goal is to migrate the data from the name
column into a field in the JSON
object, ideally with a different field name.
Adding the new column is simple:
However, how can you migrate the existing data into the new JSON column?
What doesn’t work?
Unfortunately, creating arbitrary JSON structures with the JSON functions
offered by PostgreSQL is not always straightforward. The main problem is that
the row
constructor loses the column names, which become the field names of
the JSON object created with row_to_json
:
Note how this creates JSON objects where the field name is simply “f1”:
id | name | target
----+------+---------------
1 | test | {"f1":"test"}
2 | foo | {"f1":"foo"}
3 | bar | {"f1":"bar"}
If you’re simply selecting data, you can use a common table expression to first
select the columns that you need, and then use row_to_json
on its rows, as
shown below. However, this does not work for an update statement, at least not
if none of the columns have a unique constraint.
Solution
For PostgreSQL versions before 9.4, the only way to create a JSON structure with arbitrary field names seems to be to construct the JSON structure as a string and then cast it to JSON (if you’re aware of a better solution, please let me know!):
id | name | target
----+------+----------------------------
1 | test | {"json_field_name":"test"}
2 | foo | {"json_field_name":"foo"}
3 | bar | {"json_field_name":"bar"}
The to_json
function converts the value of the source column to a JSON value
and ensures that, for example, any quote characters occuring in strings will be
properly escaped in the resulting JSON object.
If you’re using PostgreSQL 9.4, and only if all your values are strings, you
can also use the new json_object
function:
The result is (in this case) the same. However, this does not work if you want
to migrate a column of, for example, integers because json_object
requires an
array of strings as its input.
Conclusion
PostgreSQL’s JSON support is pretty good (and getting better with each release), but constructing JSON objects with arbitrary structure and field names is not as simple as it could be, especially not within the constraints of an update statement. A simple workaround is to create the JSON object as a string and then explicitly cast it to JSON.