I'm transforming the data in one data store to another form / ontology / schema, using SPARQL.

The data is actually provenance data, but it can be simplified to a series of relationships like this: A produces D, and B consumes D.

:A0 :consumes :D0 ;
    :produces :D1, :D2 .
:A1 :produces :D3 .
:A2 :consumes :D1, :D2 ;
    :produces :D4 .
:A3 :consumes :D3, :D4 ;
    :produces :D5, :D6 .

(There is no guarantee D is always produced by some A, or will be consumed by some other A. But every D will only be produced by one A.)

I would like to extract the data dependencies between stages. An example query looks like this:

CONSTRUCT {
    ?producer :hasNextStage ?consumer .
}
WHERE {
    ?producer :produces ?data .
    OPTIONAL {
        ?consumer :consumes ?data .
        FILTER (?producer != ?consumer)
    }
}

Everything works up to this point. However, I would like to include more information, namely which A is connected to which other A by what data, something like this:

:A0 :hasInfluence :INFLUENCE .
:INFLUENCE :stage :A2 ;
    :data :D1, :D2 .

As demonstrated, this requires me to construct a new variable (:INFLUENCE) and assign triples to it. Is there a way to do this in SPARQL?

------ UPDATED SECONDARY QUESTION ------

According to cygri's answer, I changed the query to this:

CONSTRUCT {
    ?producer :hasInfluence ?influence .
    ?influence :stage ?consumer ;
        :data ?data .
}
WHERE {
    ?producer :produces ?data .
    OPTIONAL {
        ?consumer :consumes ?data .
        FILTER (?producer != ?consumer)
        BIND (IRI(CONCAT("http://my/ns/#", CONCAT(STRAFTER(STR(?producer), "#"), STRAFTER(STR(?consumer), "#")))) AS ?influence)
    }
}

However, the BIND clause does not seem to have any effect. After shortening the expression, I found that the problem is the ?producer variable: if I use this variable in the BIND, it doesn't work. It seems ?producer is not bound at that point? (But the FILTER, which also uses ?producer, does work.)

If I move the BIND clause out of the OPTIONAL, everything works fine. But this is not intuitive. Why doesn't it work inside the OPTIONAL?

  • "construct a new variable" - I guess you mean create a new URI. So why not just add the URI there? You can use the IRI() function to create a URI resource from a string, or just write the IRI directly if you know it. – UninformedUser Apr 1 '19 at 17:08
  • OPTIONAL is a left-outer join during evaluation. The clause is evaluated separately. Read about scope of variables: w3.org/TR/sparql11-query/#variableScope – UninformedUser Apr 2 '19 at 12:10
  • Here is the algebra tree: (leftjoin (bgp (triple ?producer <http://ex.org/test/produces> ?data)) (extend ((?influence (iri (concat "http://my/ns/#" (concat (strafter (str ?producer) "#") (strafter (str ?consumer) "#")))))) (bgp (triple ?consumer <http://ex.org/test/consumes> ?data))) (!= ?producer ?consumer)) – UninformedUser Apr 2 '19 at 12:17
  • and here with BIND outside of the OPTIONAL: (extend ((?influence (iri (concat "http://my/ns/#" (concat (strafter (str ?producer) "#") (strafter (str ?consumer) "#")))))) (conditional (bgp (triple ?producer <http://ex.org/test/produces> ?data)) (filter (!= ?producer ?consumer) (bgp (triple ?consumer <http://ex.org/test/consumes> ?data))))) – UninformedUser Apr 2 '19 at 12:18
  • and here some more literature about evaluation of BIND and the scope: blog.blazegraph.com/?p=954 - and I agree, without knowing this, it is definitely confusing – UninformedUser Apr 2 '19 at 12:19

The simplest solution would be to avoid a new variable in the CONSTRUCT template altogether and just use a blank node:

CONSTRUCT {
    ?producer :hasInfluence [
        :stage ?consumer;
        :data ?data
    ]
}

This should produce the desired graph structure. If you insist on an IRI instead of a blank node for the influence node (as you probably should), then you would want something like:

CONSTRUCT {
    ?producer :hasInfluence ?influence.
    ?influence :stage ?consumer;
        :data ?data.
}
WHERE {
    ...
    BIND (IRI(xxx) AS ?influence)
}

This assigns a new IRI to variable ?influence and uses that variable in the CONSTRUCT template.

Now, xxx is just a placeholder for the expression that calculates the IRI. You don’t provide enough detail to say what should go in there. Would there be one influence node for each data node? If so, you could take the string form of the data IRI: str(?data) and do some string replacement using replace(s, search, replace) to make a nice unique IRI for the influence node.
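
As a rough sketch of that option, assuming one influence node per data node, the placeholder could be filled in as below. The PREFIX declaration and the "influence-" naming are assumptions made for illustration; the question does not show the real namespace.

```sparql
# Assumed namespace; adjust to the one actually used in the data.
PREFIX : <http://my/ns/#>

CONSTRUCT {
    ?producer :hasInfluence ?influence .
    ?influence :stage ?consumer ;
        :data ?data .
}
WHERE {
    ?producer :produces ?data .
    ?consumer :consumes ?data .
    FILTER (?producer != ?consumer)
    # Derive the influence IRI from the data IRI, e.g.
    # <http://my/ns/#D1> becomes <http://my/ns/#influence-D1>.
    BIND (IRI(REPLACE(STR(?data), "#", "#influence-")) AS ?influence)
}
```

With this naming, a data node consumed by several stages ends up with a single influence node carrying several :stage values, so it only fits if that is the structure you want.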

  • Thank you very much. The second approach is indeed what I'm looking for. To clarify: each pair of producer and consumer shares the same variable. I'm thinking of using their concatenation to form the IRI, but can't get the ?producer in the OPTIONAL clause (see the updated code). What document should I read to find the answer? – renyuneyun Apr 2 '19 at 9:50
  • Sorry I didn't see this comment earlier. This is a quirk in SPARQL. A FILTER in OPTIONAL has access to variables from outside the curly-bracket group, but BIND does not. I don't know why that is, and it doesn't make sense to me, as they both just evaluate an expression. But it's written that way in the SPARQL specification. – cygri Apr 9 '19 at 8:58
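
To make the workaround concrete, here is a sketch of the updated query with the BIND moved out of the OPTIONAL group, which the question reports as working. The PREFIX declaration is an assumption, since the question does not show the namespace; the IRI expression is the one from the question, unchanged.

```sparql
# Assumed namespace; adjust to the one actually used in the data.
PREFIX : <http://my/ns/#>

CONSTRUCT {
    ?producer :hasInfluence ?influence .
    ?influence :stage ?consumer ;
        :data ?data .
}
WHERE {
    ?producer :produces ?data .
    OPTIONAL {
        ?consumer :consumes ?data .
        FILTER (?producer != ?consumer)
    }
    # Placed after the OPTIONAL, the BIND sees both ?producer and ?consumer.
    BIND (IRI(CONCAT("http://my/ns/#", CONCAT(STRAFTER(STR(?producer), "#"), STRAFTER(STR(?consumer), "#")))) AS ?influence)
}
```

When a data node has no consumer, ?consumer is unbound, the BIND expression raises an error, and ?influence stays unbound. CONSTRUCT skips any template triple that contains an unbound variable, so no partial influence triples are emitted for those rows.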
