Reify: Turning Data into a Graph

Reification is the process of turning structured raw data into a semantic graph. The Reify Configuration view lets you decide how each kind of data that enters the system should be represented as a graph. It also defines the options used by the Reify JSON and Reify Table Row Playbook steps.

The Reify Configuration view is part of the built-in "Reify" perspective, and it can also be added to any perspective by choosing Window → Show View.

Related Info: Managing Views and Perspectives

Video length: 17 minutes

Topics Covered in the video:

  • Opening the Reify Perspective
  • Duplicating a View in a Different Perspective
  • Creating a new JSON Reify Config
  • Adding and removing properties to the config
  • Adding attached objects to the graph
  • Referencing the reify config from a Playbook

Step-By-Step CSV

Use this tutorial to walk through creating a folder data source, a CSV reify configuration, and a playbook to process and publish the data. Learn tips and tricks for building and troubleshooting playbooks. Tutorial: Data Ingest for CSV Sources

A Reify Config can be used to turn a JSON object or a Table Row into a graph. Before creating a configuration, you will need to know which type of data you will be creating the graph from, and have a sample of that data representing all of the properties you would like the resulting graph to have.

  • Click on the + icon in the top-right corner of the view and choose between JSON and TABLE. (Once this choice is made, it cannot be changed for this configuration.)
  • Click the Load/Paste button to open a dialog that will let you either paste in JSON or CSV text or choose a file with the Browse button. Paste or open a file that will be the sample file for this configuration. (Once this sample has been added and the configuration saved, it cannot be changed.)
  • Add a custom name for the ingestor. (This can also be changed later.)
  • (optional) Click the Configure button next to Match Pattern to open a dialog where you can pick the conditions that determine which configuration an incoming piece of data should use. This option is only necessary if a Playbook Reify step is set to "[Use Match Pattern]".
    • Possible values to use in configuring a match pattern are: has any value (exists), equals (exact match), contains (partial match), matches regex (regular expression), starts with, and ends with.
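
The condition types listed above can be illustrated with a short Python sketch. This is not DarkLight code: the operator names, the `(field, operator, expected)` tuple layout, and the rule that every condition must hold are assumptions made for illustration only.

```python
import re

def condition_holds(record, field, op, expected=None):
    """Check one hypothetical match-pattern condition against a flat record."""
    if field not in record or record[field] is None:
        return False  # a missing field satisfies no condition
    value = str(record[field])
    if op == "exists":
        return True
    if op == "equals":
        return value == expected
    if op == "contains":
        return expected in value
    if op == "regex":
        return re.search(expected, value) is not None
    if op == "starts_with":
        return value.startswith(expected)
    if op == "ends_with":
        return value.endswith(expected)
    raise ValueError(f"unknown operator: {op}")

def matches(record, conditions):
    # Assumption: a configuration is selected only if ALL conditions hold.
    return all(condition_holds(record, *cond) for cond in conditions)

event = {"EventID": "4688", "Channel": "Security"}
pattern = [("EventID", "equals", "4688"), ("Channel", "starts_with", "Sec")]
print(matches(event, pattern))
```

Each incoming record would be tested against the pattern of every configuration, and the one that matches is used to reify it.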

Configuring the Graph Object (center node)

  • Choose the primary type (class) of the graph object by clicking the Choose Class and ID button on the top row of the configuration table.
    • The Class you choose will be the type of object at the center of the graph. Its name depends on the ID setting you choose:
    • Choose Generated ("Class IRI-UUID") for the ID when you want a unique graph as output each time the reify step runs.
      • Example: a Windows event that uses winevent:T4688 would show up as T4688-49abe01b-e932-40db-b94a-abb6127e75f2
    • Choose Class + JSONPath Value ("Class IRI-JSONPath Value") for the ID when you want to uniquely identify a graph, allowing other pieces of data to also be added to the same graph.
      • Example: an employee data source that uses emp:Employee and $.AccountName would show up as Employee-tbarnes. Any other pieces of data that were then reified that also created a graph with the same name would merge their data into the first graph.
    • Choose JSONPath Value ("JSONPath Value") to use a JSON path containing the IRI ID of the object.
      • Some data sources will already have a full IRI as the value and you don't want to prepend the class name in front of it. This option will use whatever the value of the item you choose is.
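
The three ID options can be compared with a minimal Python sketch. The function names, the rule for shortening a class IRI to its local name, and the `graphs` dictionary standing in for the graph store are all hypothetical; only the resulting name shapes (such as Employee-tbarnes) come from the examples above.

```python
import re
import uuid

def local_name(class_iri):
    # Assumption: the name shown is the part after the last ':', '#', or '/'.
    return re.split(r"[:#/]", class_iri)[-1]

def generated_id(class_iri):
    # "Class IRI-UUID": a fresh UUID on every run, so names never collide.
    return f"{local_name(class_iri)}-{uuid.uuid4()}"

def class_plus_value_id(class_iri, value):
    # "Class IRI-JSONPath Value": the same value always yields the same name.
    return f"{local_name(class_iri)}-{value}"

def value_only_id(value):
    # "JSONPath Value": the value is already the full ID and is used as-is.
    return str(value)

graphs = {}  # hypothetical store: graph name -> merged data properties

def add_to_graph(graph_id, properties):
    # Records that resolve to the same name merge into one graph.
    graphs.setdefault(graph_id, {}).update(properties)
    return graphs[graph_id]

hr = {"AccountName": "tbarnes", "Department": "Finance"}
badge = {"AccountName": "tbarnes", "BadgeNumber": "1138"}
for record in (hr, badge):
    add_to_graph(class_plus_value_id("emp:Employee", record["AccountName"]), record)

print(sorted(graphs))  # both records landed in the single graph Employee-tbarnes
```

With `generated_id` the two records would have produced two separate graphs, which is the behavior you want for event-style data.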

If you chose an IRI Option other than Generated and the JSON path you selected has any array structure within it, you will have the option to specify the indexes of the arrays in the path. Since only one item will be named, the only option for specifying an array item is a whole number. The default value is 0.
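
As a rough illustration of why a single whole-number index is needed, the following Python sketch walks a path through nested data and picks one element whenever it meets an array. The `resolve` helper and its `indexes` argument are hypothetical and are not part of the product.

```python
def resolve(data, keys, indexes=None):
    """Follow keys through nested dicts, picking one element from each list."""
    indexes = indexes or {}
    current = data
    for key in keys:
        current = current[key]
        if isinstance(current, list):
            # Only one item can name the graph, so one index is chosen.
            # The default index is 0, matching the default described above.
            current = current[indexes.get(key, 0)]
    return current

record = {"accounts": [{"name": "tbarnes"}, {"name": "jdoe"}]}
print(resolve(record, ["accounts", "name"]))                   # first array item
print(resolve(record, ["accounts", "name"], {"accounts": 1}))  # second array item
```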

Adding Data Properties to the Object

  • Any property in the sample list can be dragged to the configuration table to create a new data property on the graph object. The ordering of these properties does not matter, except for nesting them inside new objects (see next section).
  • Click the <Choose Data Property> button to open a dialog listing all of the Data Properties in your ontologies. Choose the one that should link this piece of data to the graph, then click OK.
  • If the Data Property you need is not in one of your ontologies, click the + button at the bottom of the dialog and add a new one to an existing ontology.
  • If you need to create a new Ontology, use the Ontology Manager view.

Plan Ahead for Filtering

If you plan on using a Graph Filter step (e.g. Graph String Filter), make sure the values you want to filter on are data properties attached to the primary object rather than to a nested object. You can also duplicate a property by dragging it over twice from the left side: once onto the primary object for filtering, and once into an attached object for structure.

Important Information about Dates

Date formats are very important when reifying data. DarkLight uses ISO 8601 with a timezone specified [See Date Codes (MM/dd/yyyy)]. It has some built-in common patterns that it tries to match against incoming data, but it will always output in the ISO form. If your incoming data does not specify a timezone, then it will appear in local time. If needed, use the JSON Path step to pull the date into a variable, then convert the date using the Normalize Date step. Attach the date to the reified graph using the Query Package step.
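
To make the date handling concrete, here is a minimal Python sketch of normalizing incoming dates to ISO 8601 with a timezone. The `PATTERNS` list is a made-up stand-in, since the product's actual built-in patterns are not listed here, and `to_iso8601` is a hypothetical helper rather than the Normalize Date step itself.

```python
from datetime import datetime, timezone

# Hypothetical patterns, tried in order; NOT the product's built-in list.
PATTERNS = [
    "%Y-%m-%dT%H:%M:%S%z",  # already ISO-like, with a UTC offset
    "%m/%d/%Y %H:%M:%S",    # US-style date and time, no timezone
    "%m/%d/%Y",             # US-style date only
]

def to_iso8601(text, assume_tz=None):
    """Parse text with the first matching pattern and return an ISO 8601 string."""
    for pattern in PATTERNS:
        try:
            parsed = datetime.strptime(text, pattern)
        except ValueError:
            continue  # pattern did not match, try the next one
        if parsed.tzinfo is None:
            if assume_tz is None:
                # No timezone in the data: interpret the value as local time.
                parsed = parsed.astimezone()
            else:
                parsed = parsed.replace(tzinfo=assume_tz)
        return parsed.isoformat()
    raise ValueError(f"no known date pattern matches {text!r}")

print(to_iso8601("03/15/2018 08:30:00", assume_tz=timezone.utc))
print(to_iso8601("2018-03-15T08:30:00-0700"))
```

Passing an explicit `assume_tz` shows why timezone-less data is risky: the same text maps to different instants depending on which zone is assumed.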

Removing Data Properties from the Configuration

  • To remove a property that has been dragged from the left side to the right side, select the item(s) and press the Delete key on your keyboard. Note that on many keyboards this is a different key than the Backspace key. (Laptop and small keyboard users may have to use Fn-Backspace).

Connecting New Nodes to the Center Object

If you drag a JSON object (shown with a { } icon) to the configuration side, it will create a new object in the graph (shown as a circle), connected to the primary object via an Object Property. Additional properties from anywhere in the JSON tree can be dragged onto that object and added as properties of the new object.

In the video above, a JSON object ( {}geoip ) is shown being added to the configuration side where it creates a new object in the graph. JSON properties can be dragged on top of the geoip entry and they will be added as child properties of that object. Children can be moved in and out of objects by dragging them to a new position in the graph. The geoip and timezone entries are then selected with Shift clicks, and then deleted by pressing the Delete key on the keyboard. Multiple items can be selected at once using the Shift key for consecutive items, and the Control/Command key for individual items. When a JSON object and its properties are dragged to the configuration, they retain their hierarchy.
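
The structure described in this section, a nested JSON object becoming its own node linked to the center object, can be sketched in Python as a list of (subject, predicate, object) triples. The `reify` function, the child node naming, and the `hasLocation` property are hypothetical and only illustrate the shape of the result.

```python
def reify(record, subject, object_properties):
    """Flatten a JSON-like dict into (subject, predicate, object) triples."""
    triples = []
    for key, value in record.items():
        if isinstance(value, dict):
            # A nested object becomes a new node, joined to its parent
            # by an object property (hypothetical naming scheme).
            child = f"{subject}/{key}"
            triples.append((subject, object_properties.get(key, key), child))
            triples.extend(reify(value, child, object_properties))
        else:
            # A plain value becomes a data property on the current node.
            triples.append((subject, key, value))
    return triples

record = {
    "src_ip": "10.0.0.5",
    "geoip": {"country": "US", "timezone": "America/Denver"},
}
for triple in reify(record, "Event-1", {"geoip": "hasLocation"}):
    print(triple)
```

The child properties keep their place under geoip, which mirrors how a dragged JSON object retains its hierarchy in the configuration.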

  • Last modified: 2018/04/30 21:06