Tutorial: Converting JSON Lines

Sometimes a data source delivers multiple JSON documents in a single file, one per line. This format is known as JSON Lines. This tutorial describes how to split up a JSON Lines document and process each row of JSON individually.

An example of a JSON Lines file.

Here is an example of a playbook that takes a file with multiple lines of JSON in it, reifies each line into a graph, and then publishes the graph to the database.

The area in red is the part that does the conversion; the Ingest and Publish steps could be replaced by any other method of getting the data into and out of the playbook.

Split Text

The first thing we need to do is get the big block of text (with one line of JSON per row) into a form that has separate rows we can reference later. We'll do that with the Split Text step.

  1. This playbook uses the Ingest step, which always stores the incoming data in a variable called rawInput (with a capital letter I). The icon next to this field tells us that we need to format it using FreeMarker syntax, so we enter ${rawInput}.
  2. The output variable can be called anything we want, and we'll be using it in the Split Package step next.
  3. This data only has one column per row, so we'll check the box to convert all columns.
  4. There are some choices for the Delimiter: we could use \r or \n depending on which line ending the data uses, but to cover both cases we'll use the regular expression anchor $, which means "the end of the line."
  5. Since we want the Delimiter to be processed using Regex (and not a literal "$" character), we check the Use Regex box.
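
Outside the playbook, the effect of the Split Text step can be sketched in a few lines of Python. This is only an illustration of the idea, not how the step is implemented: the sample data and the names raw_input and split_text are made up for this example, and the sketch splits on an explicit \r?\n pattern rather than the $ anchor so that it handles both line-ending styles in plain Python.

```python
import re

# Hypothetical sample data: three JSON documents, one per line,
# deliberately mixing Windows (\r\n) and Unix (\n) line endings.
raw_input = (
    '{"id": 1, "name": "alpha"}\r\n'
    '{"id": 2, "name": "beta"}\n'
    '{"id": 3, "name": "gamma"}\n'
)


def split_text(text):
    # Split on either line-ending style, then drop blank rows such as
    # the empty string left behind by a trailing newline.
    rows = re.split(r"\r?\n", text)
    return [row for row in rows if row.strip()]


# "result" plays the role of the step's output variable.
result = split_text(raw_input)

for index, row in enumerate(result):
    print(index, row)
```

Each element of the list corresponds to one row of the original text and can be addressed by its position.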

Split Text Output

In the Inventory View, we can see that the data came in as rawInput and came out as result[*]. (The empty _default_ at the top means the package does not have a graph in it yet.)

Although the contents of these two variables look the same, the split now allows us to reference each row by index, as the [*] in the name indicates. For example, if we wanted to reference the first row of data, we could use ${result[0]}.

Split Package

Next we need to run each row of data through the playbook individually, as if each had come in as a separate item in the first place. We'll do this with the Split Package step.

  1. We want to use the output of the previous step, so we'll use the variable result here. Note that there is no FreeMarker icon, which means we do not have to use the FreeMarker syntax to reference the name of the variable.
  2. This step will create a new package for each of the rows in the result variable. The individual row will be named singleItem and we'll use that name in the next step.
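
As a rough mental model, the Split Package step behaves like a loop that emits one new package per row. The Python sketch below assumes a package can be pictured as a plain dictionary of variables; the split_package function and the sample rows are hypothetical and only illustrate the fan-out, not the step's real behavior in every detail.

```python
def split_package(package, variable, item_name="singleItem"):
    # Yield one new package for every row held in the given variable.
    # Each new package carries its row under item_name.
    for row in package[variable]:
        yield {item_name: row}


# Hypothetical parent package holding the output of the Split Text step.
parent = {"result": ['{"id": 1}', '{"id": 2}']}

packages = list(split_package(parent, "result"))

for package in packages:
    print(package["singleItem"])
```

Every step placed after the split then runs once per package, seeing only its own singleItem.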

Reify JSON Object

Now that we have one line of JSON separated from the rest, we can send it to a Reify Configuration that knows how to convert it into a graph.

  1. This example uses a specific reify configuration, but it could also use the Match Pattern feature.
  2. The Input Variable is the same as the output of the Split Package step: singleItem.
  3. The Output Graph is left at its default value of _default_, which is the graph that the Publish to Knowledge Base step uses.
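
To make the data flow concrete, here is a loose Python analogy for this step: parse the single line of JSON and store a graph-like result under the output graph name. The triple representation, the reify function, and the "item/" subject prefix are all assumptions made for this sketch; how a real Reify Configuration maps JSON to a graph depends entirely on that configuration.

```python
import json


def reify(package, input_variable="singleItem", output_graph="_default_"):
    # Parse the one line of JSON held in the input variable.
    record = json.loads(package[input_variable])
    # Hypothetical mapping: one (subject, key, value) triple per field.
    subject = f"item/{record['id']}"
    triples = [(subject, key, value) for key, value in record.items()]
    # Return a copy of the package with the graph stored under output_graph.
    return {**package, output_graph: triples}


# Hypothetical package as produced by the Split Package step.
package = {"singleItem": '{"id": 1, "name": "alpha"}'}
reified = reify(package)

for triple in reified["_default_"]:
    print(triple)
```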

What About the Reify Multiple JSON Objects Step?

If you're wondering why we can't just use the Reify Multiple JSON Objects step to do this, it's because that step is meant for a single parent JSON object that encloses multiple JSON objects. This is common when a query to a database such as Elasticsearch returns multiple results. A JSON Lines file, by contrast, contains a complete JSON document on each line.
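
The difference between the two shapes is easy to see with Python's standard json module. The sample strings below are made up for illustration, and the "hits" key is only a stand-in for whatever field a real query response would use.

```python
import json

# Shape 1: a single parent JSON object that encloses several objects.
parent_text = '{"hits": [{"id": 1}, {"id": 2}]}'

# Shape 2: JSON Lines, one complete JSON document per line.
json_lines_text = '{"id": 1}\n{"id": 2}'

# The parent object parses in one call; the children sit inside it.
parent_objects = json.loads(parent_text)["hits"]

# Parsing the whole JSON Lines text in one call fails, because the
# text is not a single JSON document.
try:
    json.loads(json_lines_text)
    whole_parse_ok = True
except json.JSONDecodeError:
    whole_parse_ok = False

# Parsing line by line works.
line_objects = [
    json.loads(line) for line in json_lines_text.splitlines() if line.strip()
]

print(parent_objects, whole_parse_ok, line_objects)
```

This is why the text has to be split into rows before each row is handed to a step that expects one JSON document.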
  • Last modified: 2019/04/18 22:17