Thanks, Bridger. I agree this seems like a use case for graph technologies
(RDF/SPARQL or labeled property graphs). SPARQL 1.1 includes property
paths, which make it possible to query on transitive properties (e.g., A
contains B, B contains C). One example from Wikidata:
https://twitter.com/andre_ourednik/status/1427264453217763336.
There are also document-based representations, as Bridger mentions:
JSON-LD, RDF/XML, and TriX are supported by RDF tools; there's also GraphML
and GEXF, supported by the Gephi platform[1] for network visualization and
also Python tools like NetworkX[2].
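As a rough illustration of the document-based option, here is a minimal sketch that writes a list of direct links out as GraphML using only the Python standard library. The function name `links_to_graphml` and the topic IDs are made up for the example, and real link data would presumably come from the where-used index rather than a hard-coded list.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"


def links_to_graphml(edges):
    """Serialize (source, target) link pairs as a directed GraphML document."""
    # Emit GraphML elements in the default namespace (no prefix).
    ET.register_namespace("", GRAPHML_NS)
    root = ET.Element(f"{{{GRAPHML_NS}}}graphml")
    graph = ET.SubElement(
        root,
        f"{{{GRAPHML_NS}}}graph",
        {"id": "links", "edgedefault": "directed"},
    )
    # Declare each node once, sorted so the output is stable.
    for node_id in sorted({n for edge in edges for n in edge}):
        ET.SubElement(graph, f"{{{GRAPHML_NS}}}node", {"id": node_id})
    # One <edge> per direct link.
    for source, target in edges:
        ET.SubElement(
            graph,
            f"{{{GRAPHML_NS}}}edge",
            {"source": source, "target": target},
        )
    return ET.tostring(root, encoding="unicode")


# Hypothetical topic IDs: A links to B, B links to C.
sample_edges = [("topicA", "topicB"), ("topicB", "topicC")]
graphml_text = links_to_graphml(sample_edges)
```

The resulting string can be saved to a `.graphml` file, which Gephi and NetworkX should both be able to read.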
Would be interesting to test how a recursive approach using db:attribute,
etc., over a link index would scale in BaseX.
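To make the idea concrete, here is a small sketch of the traversal in Python rather than XQuery. It assumes the where-used index can be read as a mapping from each topic to the set of topics that link to it directly; the mapping, the topic IDs, and the function name `ultimately_links_to` are hypothetical. It walks the index breadth-first and keeps a visited set, so each topic is expanded at most once even when the links form a cycle.

```python
from collections import deque


def ultimately_links_to(where_used, target):
    """Return every topic that links to `target` directly or indirectly."""
    seen = set()
    queue = deque([target])
    while queue:
        current = queue.popleft()
        # Look up the direct referrers of the topic being expanded.
        for referrer in where_used.get(current, ()):
            # Skip the target itself and anything already collected.
            if referrer != target and referrer not in seen:
                seen.add(referrer)
                queue.append(referrer)
    return seen


# Hypothetical where-used index: topic -> topics that link to it directly.
where_used = {
    "topicC": {"topicB"},
    "topicB": {"topicA"},
    "topicA": {"topicC"},  # a cycle back to the target
}
result = ultimately_links_to(where_used, "topicC")
```

The cost of one query is proportional to the number of topics and links reachable backwards from the target, so whether this is fast enough at 45K topics is exactly what would need testing.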
Tim
[1] https://gephi.org/
[2] https://networkx.org/
--
Tim A. Thompson (he, him)
Librarian for Applied Metadata Research
Yale University Library
On Thu, Jun 23, 2022 at 1:09 PM Bridger Dyson-Smith <bdysonsmith@gmail.com> wrote:
> Hi Eliot -
>
> I've wondered (but never tested or explored) about leveraging some form
> of JSON-LD [1] (or serialized Turtle, or something similar) and passing
> those values to Apache Jena [2] (or another SPARQL processor) to use it as
> an inference engine. I'm deep in Speculation Territory here - I don't know
> what anything would look like - but you're describing an interesting
> problem, and it seems doable. Martynas Jusevicius (and his colleagues) have
> a project, LinkedDataHub [3], that may provide another avenue for exploring
> this. I haven't used AtomGraph's applications, but he's active on the
> XML.com Slack, and it looks like there are some interesting visualization
> capabilities in their work.
>
> Our listserv friend and neighbor, Tim Thompson of Yale, may have some
> ideas along these lines, too.
> Sorry that I can't provide anything concrete, but I hope some of this is
> somewhat helpful.
> Best,
> Bridger
>
> [1] https://json-ld.org/
> [2] https://jena.apache.org/
> [3] https://github.com/AtomGraph/LinkedDataHub
>
> On Thu, Jun 23, 2022 at 10:35 AM Eliot Kimber <eliot.kimber@servicenow.com> wrote:
>
>> In the context of our Project Mirabel system that manages DITA content, I
>> need to be able to answer the question “for topic X, what other topics link
>> to it directly or indirectly?”
>>
>> That is, say Topic A links to Topic B, which links to Topic C.
>>
>> Asking the question “What topics ultimately link to topic C?” I would
>> like to get the answer “Topic A, Topic B”.
>>
>> Getting the answer for direct references is easy—I already build a
>> where-used index that captures, for each DITA map or topic, what other maps
>> and topics link directly to it.
>>
>> But to get the Topic A part of the answer I need some kind of link graph
>> index and I’m not sure how best to go about calculating this or capturing
>> it in some index or set of indexes.
>>
>> In our content the fan-out from a single Topic C to the set of topics
>> that ultimately reference it could be tens of thousands of topics. We have about
>> 45K topics in the content for each version of the ServiceNow Platform and a
>> number of topics that are used by a large number of other topics, so the
>> explosion can be quite large. That suggests that a simple
>> topic-to-ultimately-referenced-topics index would be very inefficient in
>> that the entry for any given topic could potentially have 45K – 1 entries
>> (we don’t care that a topic references itself).
>>
>> On the other hand, working backwards through chains of direct references
>> can also be expensive and is probably too slow, so maybe the brute-force
>> index is the best option?
>>
>> At the same time, I would like to be able to quickly visualize the link
>> graph extending from or ending in any given topic or simply the link graph
>> for the entire information set, which requires capturing the nodes and
>> edges.
>>
>> My question: does anyone either have experience with or insight into this
>> kind of link graph challenge or know of relevant papers or general
>> discussion of graph processing I might look at?
>>
>> Thanks,
>>
>> Eliot
>> _____________________________________________
>> *Eliot Kimber*
>> Sr Staff Content Engineer
>