SPARQL Queries

SPARQL (SPARQL Protocol and RDF Query Language) is a query language for asking questions of, and manipulating, data stored as RDF graphs. Most of the data DarkLight deals with is graph-based, so SPARQL is commonly used in playbook steps.

Start with this excellent video, from the author of the Learning SPARQL O'Reilly book (Sample First Chapter pdf)

Common Query Types

ASK - This type returns a boolean (true or false) result, based on whether or not there are any solutions to the query. The step exits from the True (+) or False (-) side depending on the result.

CONSTRUCT - This type creates an RDF graph of specified "sentences," called triples. The result is stored as a graph in the package.

SELECT - This type returns specifically requested values for any solutions to the query. The result is stored in a package variable.

UPDATE - This type changes the graph. Use it with DELETE to remove triples from the graph or with INSERT to add triples to an existing object.
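
To make the four types a little more concrete, here is a rough Python sketch of what each one hands back. It is only a toy model, not how DarkLight or any SPARQL engine is implemented: the "graph" is a plain Python set of tuples, and the ex: names and all four function names are made up for the illustration.

```python
# Toy model only: a "graph" is a set of (subject, predicate, object) tuples.
# The ex: names are invented for this sketch.
graph = {("ex:router", "ex:hasName", "champtc_guest")}


def ask(graph, predicate):
    """ASK: is there at least one solution? Returns True or False."""
    return any(p == predicate for _, p, _ in graph)


def select(graph, predicate):
    """SELECT: return the requested values as a table of rows."""
    return sorted([o] for _, p, o in graph if p == predicate)


def construct(graph, predicate, new_predicate):
    """CONSTRUCT: build brand new triples from the solutions."""
    return {(s, new_predicate, o) for s, p, o in graph if p == predicate}


def update(graph, insert=(), delete=()):
    """UPDATE: remove some triples, then add others, changing the graph itself."""
    graph.difference_update(delete)
    graph.update(insert)


print(ask(graph, "ex:hasName"))     # True
print(select(graph, "ex:hasName"))  # [['champtc_guest']]
```

Note that only update touches the graph it is given; the other three just read from it.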

SELECTing data from a Graph (Query Package)

Let's use the following graph of a wireless network as our sample. The subject of this graph is the node in the middle. Each of the arrows coming out of it is a predicate that points to an object or a text value (string). This subject-predicate-object triple is commonly written in SPARQL queries as ?s ?p ?o, where the question mark marks an unknown piece of the triple. The core:id of this node is its IRI, and it uniquely identifies this individual as "unifi:UnifiWLANConf-5a2096e650730a87792ea1a5". The rdf:type of this node is the ontological Class that it is a member of, specifically "unifi:UnifiWLANConf".

(Note: If you haven't watched the video at the top of the page, you really should before you keep going on this page.)

  SELECT ?wifiName
  WHERE {
    ?s <tag:darklight:unifi#hasName> ?wifiName
  }

  1. First, the SELECT keyword is followed by the variable we want returned in the results. In this case, out of everything in the graph, we only want to return ?wifiName. The name of this variable can be anything you want.
  2. Second, the WHERE clause lists the graph patterns (subject - predicate - object) that we want the query to match. Graph patterns are always surrounded by curly brackets. Variables begin with a ? character, and in this example, we want to look for triples of the form: ?s (any subject) has a data property called unifi:hasName with a value of ?wifiName.
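
If it helps to see the matching spelled out, the following is a minimal Python sketch of what "match this pattern against the graph" means. It is an illustration under assumptions, not DarkLight's engine: the triples are sample data modelled on the node described above, and match and select are hypothetical helper names.

```python
# Toy illustration only: a real SPARQL engine does far more than this.
# The triples are sample data modelled on the example node.
NODE = "unifi:UnifiWLANConf-5a2096e650730a87792ea1a5"
HAS_NAME = "tag:darklight:unifi#hasName"

triples = [
    (NODE, "rdf:type", "unifi:UnifiWLANConf"),
    (NODE, HAS_NAME, "champtc_guest"),
]


def match(pattern, triple):
    """Return the variable bindings if the triple fits the pattern, else None."""
    bindings = {}
    for want, have in zip(pattern, triple):
        if want.startswith("?"):
            # A variable matches anything, but must keep the same value.
            if bindings.setdefault(want, have) != have:
                return None
        elif want != have:
            # A fixed term has to match exactly.
            return None
    return bindings


def select(variable, pattern, graph):
    """Mimic SELECT ?variable WHERE { pattern } over a list of triples."""
    rows = []
    for triple in graph:
        found = match(pattern, triple)
        if found is not None:
            rows.append([found[variable]])
    return rows


results = select("?wifiName", ("?s", HAS_NAME, "?wifiName"), triples)
print(results)  # [['champtc_guest']]
```

The rdf:type triple is skipped because its predicate does not equal the fixed predicate in the pattern; only the variables are free to take any value.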

If the step runs this query against the graph, and you told the step to output the results to a variable called "results", it will return a table that looks like this:

results

     0
0    champtc_guest

If you wanted to later use the value "champtc_guest" in some other step, you would refer to it as the first row and first column (which in this case is row 0, column 0), so the FreeMarker would be ${results[0][0]}.
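
As a rough picture of what that expression does, here is a small Python sketch that fills in ${name[row][col]} references from a table. This is not FreeMarker and not DarkLight code; it only imitates this one form of expression, and package_vars and render are hypothetical names. It also assumes a result table is a list of rows, each row being a list of column values.

```python
import re

# Assumed shape of a package variable: a list of rows, each a list of columns.
package_vars = {"results": [["champtc_guest"]]}

# Matches only the ${name[row][col]} form used on this page.
REFERENCE = re.compile(r"\$\{(\w+)\[(\d+)\]\[(\d+)\]\}")


def render(template, variables):
    """Replace each ${name[row][col]} with the matching table cell."""
    def lookup(m):
        name, row, col = m.group(1), int(m.group(2)), int(m.group(3))
        return str(variables[name][row][col])
    return REFERENCE.sub(lookup, template)


message = render("Guest network is ${results[0][0]}", package_vars)
print(message)  # Guest network is champtc_guest
```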

SELECTing multiple items from the same graph

Now let's say we want to also extract the WPA Mode of the router. We do this by adding another line in the same form as the first, but using a different data property and a different variable name.

  ${PREFIX("unifi")}

  SELECT ?wifiName ?wpa
  WHERE {
    ?s unifi:hasName ?wifiName .
    ?s unifi:hasWPAMode ?wpa
  }

results

     0                1
0    champtc_guest    wpa2

${results[0][0]} = "champtc_guest" and ${results[0][1]} = "wpa2"

Now let's play SPOT THE DIFFERENCES:

  1. Prefix FreeMarker: Instead of using the full ontology name of <tag:darklight… we use a FreeMarker expression at the top to establish "unifi" as a PREFIX of the IRI. The prefix lets us say unifi:hasName (with a : and not a #) instead of the whole IRI.
    1. Note that this FreeMarker is not a SPARQL thing, but a DarkLight function. If you were writing this in some other SPARQL editor, like Stardog Studio, you would use PREFIX unifi: <tag:darklight:unifi#> at the top of the query.
  2. Two Variable Names After SELECT: The variable names listed after the SELECT statement will be returned in that order in the results, so we've added ?wpa.
  3. A Period After the First Triple: In SPARQL, a period . separates one triple pattern from the next, and every pattern in the group has to match, so you can read it as AND. If you get a SPARQL error, the dots at the end of your lines are the first thing to check.
  4. A Second Triple Pattern: Since we said in the SELECT line that we want to return a value for ?wpa, we need to define in the WHERE clause where to look for it.
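
To show what the prefix is doing behind the scenes, here is a minimal Python sketch of prefix expansion. It is only an illustration: the PREFIXES table and the expand function are hypothetical, and the unifi namespace is the one taken from the full IRI in the first query on this page.

```python
# Hypothetical prefix table; the unifi namespace comes from the full IRI
# <tag:darklight:unifi#hasName> used in the first query.
PREFIXES = {"unifi": "tag:darklight:unifi#"}


def expand(term, prefixes=PREFIXES):
    """Turn prefix:local into <full IRI>; leave every other term untouched."""
    if term.startswith(("?", "<", '"')):
        return term  # a variable, an already-full IRI, or a string literal
    prefix, sep, local = term.partition(":")
    if sep and prefix in prefixes:
        return "<" + prefixes[prefix] + local + ">"
    return term


print(expand("unifi:hasName"))     # <tag:darklight:unifi#hasName>
print(expand("unifi:hasWPAMode"))  # <tag:darklight:unifi#hasWPAMode>
print(expand("?wifiName"))         # ?wifiName
```

The colon never ends up in the expanded IRI; it is only the divider between the prefix and the local name, which is why the # has to be part of the namespace itself.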

SELECTing Items from a Specific Object

Sometimes you'll have more than one object (node) in your package graph and you want to make sure you're getting the right one. This example adds a way to search for only the rdf:type that we specify.

  ${PREFIX("unifi")}

  SELECT ?wifiName ?wpa
  WHERE {
    ?s unifi:hasName ?wifiName .
    ?s unifi:hasWPAMode ?wpa .
    ?s a unifi:UnifiWLANConf
  }

results

     0                1
0    champtc_guest    wpa2

${results[0][0]} = "champtc_guest" and ${results[0][1]} = "wpa2"

Not much has changed here, and we really didn't have to do it, since the only node we have that could be a subject is the UnifiWLANConf. The new line (after the period on the previous line to say AND) uses the letter a, which is shorthand for rdf:type. It specifies that the node bound to ?s has to be of type unifi:UnifiWLANConf.
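
To see why the type line starts to matter once a second node shows up, here is a small Python sketch. It is a simplified stand-in for the query, not the real thing: the second node (unifi:ExampleOther-1) and its values are invented, rows_for is a hypothetical helper, and the sketch assumes each node has at most one value per predicate.

```python
WLAN = "unifi:UnifiWLANConf-5a2096e650730a87792ea1a5"
OTHER = "unifi:ExampleOther-1"  # invented second node, for illustration only

graph = [
    (WLAN, "rdf:type", "unifi:UnifiWLANConf"),
    (WLAN, "unifi:hasName", "champtc_guest"),
    (WLAN, "unifi:hasWPAMode", "wpa2"),
    (OTHER, "rdf:type", "unifi:ExampleOther"),
    (OTHER, "unifi:hasName", "example_net"),
    (OTHER, "unifi:hasWPAMode", "wpa"),
]


def rows_for(graph, required_type=None):
    """Return [name, wpa] rows for every node that matches all the patterns."""
    # Group the triples by subject: every pattern shares the same ?s, so all
    # of them must hold for one and the same node.
    by_subject = {}
    for s, p, o in graph:
        by_subject.setdefault(s, {})[p] = o

    rows = []
    for props in by_subject.values():
        if "unifi:hasName" not in props or "unifi:hasWPAMode" not in props:
            continue
        if required_type is not None and props.get("rdf:type") != required_type:
            continue  # the extra "?s a ..." pattern filters this node out
        rows.append([props["unifi:hasName"], props["unifi:hasWPAMode"]])
    return rows


print(rows_for(graph))
# [['champtc_guest', 'wpa2'], ['example_net', 'wpa']]
print(rows_for(graph, "unifi:UnifiWLANConf"))
# [['champtc_guest', 'wpa2']]
```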

SELECTing data from a Graph (Query Knowledge Base)

There is a very important difference when you are writing SPARQL in DarkLight and the place you are searching is not the package but one of your RDF Data Feeds that DarkLight has published to. By default, DarkLight publishes into a named graph that has an IRI the same as the primary individual, also called the publish object.

Everything is the same as in the above queries, except we need to wrap the triple patterns inside the WHERE {} clause in a GRAPH {} clause, like this:

  ${PREFIX("unifi")}

  SELECT ?wifiName ?wpa
  WHERE {
    GRAPH ?g {
      ?s unifi:hasName ?wifiName .
      ?s unifi:hasWPAMode ?wpa .
      ?s a unifi:UnifiWLANConf
    }
  }

In this case, I'm using a variable of ?g to mean "any named graph" which would let the query look through all of the named graphs in, say, Working Memory.
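
One way to picture named graphs is as a dictionary that maps each graph's IRI to its own list of triples. The Python sketch below is only that picture, under assumptions: the second graph and its values are invented, and wifi_names is a hypothetical helper. Passing no graph IRI plays the role of GRAPH ?g, while passing one IRI restricts the search to that single graph.

```python
G1 = "unifi:UnifiWLANConf-5a2096e650730a87792ea1a5"
G2 = "unifi:UnifiWLANConf-example"  # invented second publish object

# Each named graph has its own IRI and its own triples.
dataset = {
    G1: [(G1, "unifi:hasName", "champtc_guest")],
    G2: [(G2, "unifi:hasName", "example_net")],
}


def wifi_names(dataset, graph_iri=None):
    """graph_iri=None acts like GRAPH ?g; an IRI limits the search to one graph."""
    rows = []
    for g, triples in dataset.items():
        if graph_iri is not None and g != graph_iri:
            continue
        for s, p, o in triples:
            if p == "unifi:hasName":
                rows.append([g, o])
    return rows


print(wifi_names(dataset))      # one row per named graph that has a name
print(wifi_names(dataset, G2))  # only the row from the second graph
```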

Find an Employee and Bring that Graph into the Package (Query Knowledge Base)

  ${PREFIX("ent")}

  CONSTRUCT { ?s ?p ?o }
  WHERE {
    GRAPH ?g {
      ?s a ent:Employee .
      ?s ent:hasAccountName "${username[0][0]}" .
      ?s ?p ?o
    }
  }

This SPARQL example retrieves contextual data that matches the username data already in a DarkLight package.

  1. First, the CONSTRUCT keyword indicates that the results should be a graph.
  2. Next the block {?s ?p ?o} specifies which new statements should be created from the query results. In this example, we want to construct everything we can (?s ?p ?o).
  3. Next, GRAPH ?g directs the query to search across all named graphs in the database.
  4. Finally the WHERE clause lists several graph patterns (triples) that must be matched
    1. In this case, we're searching for triples of the form ?s "is of type" "Employee".
    2. ?s "has account name" "${username[0][0]}". The object of this triple is a template expression to be replaced with a result from a previous DarkLight package query. Since it will return as a text string, it must be in quotes.
    3. The last line, ?s ?p ?o, will match every other sentence about any particular ?s matched by the previous two lines.
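
The steps above can be sketched in plain Python as follows. This is a loose imitation for illustration only, not how DarkLight runs the query: the employee data, the ent:hasDepartment property, and the construct_employee helper are all invented, and the username is passed in directly where the real query would use the template expression.

```python
E1 = "ent:Employee-example-1"  # invented sample individuals
E2 = "ent:Employee-example-2"

# Named graphs in the knowledge base: graph IRI -> list of triples.
dataset = {
    E1: [
        (E1, "rdf:type", "ent:Employee"),
        (E1, "ent:hasAccountName", "jdoe"),
        (E1, "ent:hasDepartment", "Finance"),  # made-up property
    ],
    E2: [
        (E2, "rdf:type", "ent:Employee"),
        (E2, "ent:hasAccountName", "asmith"),
    ],
}


def construct_employee(dataset, username):
    """Copy every triple about the employee whose account name is username."""
    package = []
    for triples in dataset.values():  # GRAPH ?g: look in every named graph
        named = {
            s for s, p, o in triples
            if p == "ent:hasAccountName" and o == username
        }
        employees = {
            s for s, p, o in triples
            if s in named and p == "rdf:type" and o == "ent:Employee"
        }
        # ?s ?p ?o: keep everything said about the matched subject(s)
        package.extend(t for t in triples if t[0] in employees)
    return package


for triple in construct_employee(dataset, "jdoe"):
    print(triple)
```

The result includes the type and account-name triples themselves, because ?s ?p ?o matches every sentence about the employee, not just the "other" ones.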

Create the "Stix:Incident" Object

  CONSTRUCT {
    ?PTHE a <http://stix.mitre.org/Incident#Incident> .
    ?PTHE <tag:champtc:core#hasEventTime> ?NOW .
  } WHERE {
    BIND(IRI(CONCAT("http://stix.mitre.org/Incident#Incident-", STRUUID())) AS ?PTHE) .
    BIND(NOW() AS ?NOW)
  }

This SPARQL query pattern creates a brand new object. Using a CONSTRUCT statement, we create two new "sentences":

  • ?PTHE (PTHE for Pass The Hash Event) "is of type" Stix:Incident, and
  • ?PTHE "has Event Time" ?NOW.

The WHERE clause of this query does the following:

  1. STRUUID generates a new UUID.
  2. CONCAT prepends the namespace string to the UUID.
  3. IRI converts the concatenated string into an IRI (a plain string cannot appear as the subject of a sentence in a graph database).
  4. BIND sets the value of a new variable, ?PTHE, to the newly created IRI.
  5. BIND(NOW() AS ?NOW) sets the value of a new variable, ?NOW, to the current date/time.
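
For comparison, the same five steps can be walked through in a few lines of Python. This is just a sketch to show the order of operations, not something DarkLight runs: new_incident is a hypothetical function, the triples are plain tuples of strings, and the timestamp format is simply whatever Python's isoformat produces.

```python
import uuid
from datetime import datetime, timezone

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"  # what "a" stands for
INCIDENT_CLASS = "http://stix.mitre.org/Incident#Incident"
EVENT_TIME = "tag:champtc:core#hasEventTime"


def new_incident():
    """Build the two sentences of the CONSTRUCT for a freshly minted incident."""
    fresh = str(uuid.uuid4())              # STRUUID(): a new UUID as a string
    pthe = INCIDENT_CLASS + "-" + fresh    # CONCAT(...), then IRI(...) AS ?PTHE
    # In this sketch an IRI is just a str; a real store keeps IRIs and
    # string literals apart.
    now = datetime.now(timezone.utc).isoformat()  # NOW() AS ?NOW
    return [
        (pthe, RDF_TYPE, INCIDENT_CLASS),
        (pthe, EVENT_TIME, now),
    ]


for triple in new_incident():
    print(triple)
```

Every call produces a different subject IRI, which is the point of using a UUID: two incidents created by the same playbook never collide.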

The resulting sentences of this construct query might look something like this:

SPARQL Queries can use Built-in FreeMarker Functions

  • ${trigger} - the IRI of the incoming event, the one that triggered the playbook
  • ${uuid()} - creates a randomly generated UUID
  • ${pkgdata("myprefix:dataProperty")} - returns the value of the data property from the default graph (more details below)
  • ${packageAsJSON()} - returns the complete DarkLight package from the current playbook. Can be used with Web Request step to send a package to another instance of DarkLight (via the HTTP Post Data Feed).
  • ${prefix("myprefix")} - Use at the top of a SPARQL query to convert a prefix into an IRI. For example, ${prefix("attack")} is equivalent to PREFIX attack: <tag:darklight:attack#> and then in the body of the query you can use the prefix and predicate (attack:T1037) instead of the IRI version (<tag:darklight:attack#T1037>). Note that the separator between prefix and predicate is a colon : when using a prefix and the number sign # when using an IRI. Also note that when using a prefix in the query, you do not use the angle brackets.
  • Last modified: 2019/08/15 16:23