Hello again,

If this is the wrong forum for this type of question, let me know. By the way, Liam, I picked up your book last night. I like the flavor, as it differs from my other reads, such as those by Kay. Although I have been using XML for years and understand the core concepts, it should be a great refresher. If you have time to read this and respond, I'd appreciate it... If not, I understand. :)

The Overall Organizational Problem:

There aren't many people in my industry who use XQuery and XML the way they were intended (IMHO). In fact, most developers in my organization are rather uneducated in it, and as you know there is some irrational backlash, as many associate XML with the DOM and XPath/XSLT 1.0, and see it as a competitor to JSON, which is ludicrous. The DOM has scalability issues, which our products are currently running into. This isn't really XML's or the DOM's problem but simply poor implementation. As you know, though, all that matters is perception.

I am having to work with a large number of schema-less, basically hack-job XML documents and workflows. A lot of our product uses XSLT for reporting and transformation; however, only a few on the team understand the concepts, and due to MS/managerial BS we are stuck with a .NET XSLT 1.0 processor. Also, almost all of our XSLT scripts use a pull pattern, which I find overly verbose and inefficient, but that is a personal opinion. I have been researching alternatives to the DOM and .NET's standard processor (I have used Saxon too) because I personally find XML useful and the query semantics of XQuery 3.0 awesome.

Proposed Solutions:

We have two primary use cases:

1) As a local db to replace the context DOM for our 'documents', which in our case are utility GIS designs of circuits, subdivisions, fiber, etc. I am thinking BaseX coupled with RestXQ could replace our DOM for local installs, allow us to decouple from the Geodatabase, and provide a browser-based UI.
(Currently our XML documents are stored as a blob in the GIS database.) The GIS we are using is ESRI; however, we are also interested in supporting OpenStreetMap. (I noticed the Geo Module.)

2) As a service for hosting and uploading, allowing users/delivery and support to view, query and modify complex sets of interrelated XML configuration files. Some of our applications have hundreds. Again, all these documents follow similar semantics; however, there is no defined schema for any of them. Currently the client has to read a 50-page manual and edit the files one at a time. Often there are nodes in several files which must match exactly or the entire application fails... It's a nightmare, and there is no concept of generalization in my organization when it comes to development. Every tool we do have to 'configure' is hand-crafted and unique... It's abysmal! Because of this, our tools probably cover only 20% of the configuration.

I think we can accomplish both of the above tasks using a single codebase and RestXQ. I have written an XQuery expression which, using our 'common' XML semantics, can ascertain entities/properties/relationships and distill them into metadata; using RestXQ, that metadata then drives an API for manipulating data-centric XML documents. This is similar to how you can derive an XSD from a well-formed XML document; however, our metadata format complements XSD but serves broader purposes, for example annotating triggers/mappings etc. This metadata is then consumed by each RestXQ operation to allow for a generalized API, and all markup is applied on the client to allow for more efficient caching, since users can share resources between different representations. We currently use Knockout.js on the client for this. Here is an example of a generalized API endpoint for retrieving a unique entity by type:

    (: Removed error handling and some other stuff for brevity. It's not fully
       functional/validated but is more a representation of the concept.
       Assumes the module declares the page namespace and $page:database. :)
    declare namespace restxq = "http://exquery.org/ns/restxq";

    declare %restxq:path("api/{$entityListName}/{$entityId}")
            %restxq:GET
    function page:GetEntity($entityListName as xs:string, $entityId as xs:string) {
      let $entityMetadata := $page:database/metadata/entity[@type = $entityListName]
      let $keyNames := $entityMetadata/property[@key = 'true']/@name
      let $entity :=
        if ($keyNames) then
          (: Use the properties marked in the metadata as the key :)
          $page:database//*[name() = $entityListName]
                           [data((*|@*)[name() = $keyNames]) = $entityId]
        else () (: fallback lookup removed for brevity :)
      return
        <entity>{
          (: Construct the representation based on the metadata :)
          for $propMeta in $entityMetadata/property
          return attribute { $propMeta/@name } {
            data($entity/(*|@*)[name() = $propMeta/@name])
          },
          (: Include a link to all related entities for further discovery by the client :)
          for $relMeta in $entityMetadata/relationship
          return
            if ($relMeta/@multiplicity = '*') then
              (: creates an href like api/worklocation/23/cost, which would return
                 the sequence of cost items associated with work location 23 :)
              element link {
                attribute href { string-join(('api', $entityListName, $entityId, $relMeta/@with), '/') },
                attribute rel { $relMeta/@with }
              }
            else
              (: creates an href to a single related entity :)
              let $relatedEntityMeta := $page:database/metadata/entity[@type = $relMeta/@with]
              let $relatedKeyNames := $relatedEntityMeta/property[@key = 'true']/@name
              let $related := $entity/(@*|*)[name() = $relMeta/@type]
              return element link {
                attribute href {
                  string-join(('api', $relMeta/@with,
                               data($related/(@*|*)[name() = $relatedKeyNames])), '/')
                },
                attribute rel { $relMeta/@with }
              }
        }</entity>
    };

With a metadata mapping of the document, we can convert the 'tree' structure into a virtual 'relational table' structure, allowing for granular node modification/addition/removal in a generalized way.
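A minimal sketch of the 'virtual relational table' idea in plain XQuery 3.0, run against inline sample data rather than a BaseX database. It reuses the metadata vocabulary from the endpoint above (`entity/@type`, `property/@name`, `property/@key`); the sample design, the `local:to-table` function and the `<table>`/`<row>` output shape are hypothetical, chosen only for illustration.

```xquery
xquery version "3.0";

(: Hypothetical sample design: metadata plus entities whose properties are
   stored sometimes as attributes, sometimes as child elements, at any depth. :)
declare variable $database :=
  <design>
    <metadata>
      <entity type="transformer">
        <property name="id" key="true"/>
        <property name="kva"/>
      </entity>
    </metadata>
    <circuit>
      <transformer id="T1" kva="50"/>
      <feeder>
        <transformer id="T2"><kva>75</kva></transformer>
      </feeder>
    </circuit>
  </design>;

(: One "table" per entity type: one <row> per matching element, wherever it
   sits in the tree, and one column per property listed in the metadata. :)
declare function local:to-table($db as element(), $entityMeta as element(entity))
  as element(table) {
  <table name="{ $entityMeta/@type }">{
    for $e in $db//*[name() = $entityMeta/@type]
    return
      <row>{
        for $p in $entityMeta/property
        (: a property may be an attribute or a child element; take the first match :)
        return element { string($p/@name) } {
          string(($e/@*, $e/*)[name() = $p/@name][1])
        }
      }</row>
  }</table>
};

for $meta in $database/metadata/entity
return local:to-table($database, $meta)
```

Because each row is computed from the metadata, the same function works for any entity type the metadata describes, whether a property is stored as an attribute or as a child element.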
Once the metadata is distilled, we can use the same API, with metadata describing the metadata, to let the client further manipulate the view/aliasing/permissions/triggers/mappings of each entity archetype's metadata, allowing for quick UI/workflow generation. At least that is the idea... This obviously requires a client-side library which understands this metadata orchestration.

The XML/BaseX Question:

We need to be able to query effectively across relationships. Are there any facilities in XML/XQuery/third-party tools that do this? I was hoping ID and IDREF could accomplish this, but as you stated, that is not the case... Basically, I won't know beforehand whether a child node is inline or a reference, but I would like to be able to query both as though they were the same. Say I have two nodes (I will call them parents), both with the same child. One has the actual inline child node and the other has just a reference to it. Let's say the child's name attribute is 'Tom'. I would like to be able to query like this and get both parent nodes back:

    //parents[child/@name = 'Tom']

Anything? If not, is this a weird use case? I wouldn't think so. I could imagine, however, that this use case may be hard to support 'generally' in XML given referential loops etc., especially in a schema-less/unvalidated document...

- James
[I think this thread is getting further away from BaseX, and might belong on query-talk instead, but on the other hand the use of XQuery as a back-end for Web Apps is definitely on the increase]
On Tue, 2013-05-14 at 11:14 -0600, James Wright wrote:
> Hello again,
> If this is the wrong forum for this type of question, let me know. By the way, Liam, I picked up your book last night. I like the flavor, as it differs from my other reads, such as those by Kay. Although I have been using XML for years and understand the core concepts, it should be a great refresher.
Thanks, I wrote the boring chapters :-)
> The Overall Organizational Problem: There aren't many people in my industry who use XQuery and XML the way they were intended (IMHO). In fact, most developers in my organization are rather uneducated in it, and as you know there is some irrational backlash, as many associate XML with the DOM and XPath/XSLT 1.0, and see it as a competitor to JSON, which is ludicrous.
You're right, it's crazy and unfortunate.
XML was originally designed as an interoperable way to put SGML technical documentation on the Web in Netscape plugins!
> The DOM has scalability issues, which our products are currently running into. This isn't really XML's or the DOM's problem but simply poor implementation. As you know, though, all that matters is perception.
If it helps, XQuery, XPath 2 and later, and XSLT 2 and later are not DOM-based, but have an abstract data model, and are designed with performance very much in mind.
[...]
> we are stuck with a .NET XSLT 1.0 processor.
There are at least two .NET-based XSLT 2 processors, and another in development. But I think that's maybe off-topic for this list ;)
> We have two primary use cases:
> - as a local db to replace the context DOM for our 'documents', which in our case are utility GIS designs of circuits, subdivisions, fiber, etc. I am thinking BaseX coupled with RestXQ could replace our DOM for local installs, allow us to decouple from the Geodatabase, and provide a browser-based UI.
Yes, that will likely make sense.
> - as a service for hosting and uploading, allowing users/delivery and support to view, query and modify complex sets of interrelated XML configuration files. Some of our applications have hundreds. Again, all these documents follow similar semantics; however, there is no defined schema for any of them.
You might want to look at W3C SML as a way of orchestrating validation for configuration management.
> I think we can accomplish both of the above tasks using a single codebase and RestXQ.
It's likely, although obviously you'll want separate database instances. Note also that BaseX today has size/performance issues if you have a lot of data - "a lot" is subjective, but if it's multiple terabytes you'll probably need multiple database instances. The good news is that it's relatively easy to move to a different XQuery engine if needed, and also that BaseX keeps improving, so you might well not need to move :-) I do know of people with petabyte XQuery databases.
> I have written an XQuery expression which, using our 'common' XML semantics, can ascertain entities/properties/relationships and distill them into metadata; using RestXQ, that metadata then drives an API for manipulating data-centric XML documents.
This pattern is rather like creating a persistent "view" in SQL.
[...]
> The XML/BaseX Question: We need to be able to query effectively across relationships. Are there any facilities in XML/XQuery/third-party tools that do this? I was hoping ID and IDREF could accomplish this, but as you stated, that is not the case...
ID and IDREF support is almost certainly irrelevant here.
Given $doc1 with

    <student sn="3016"><name>Simon</name></student>

and $doc2 with

    <course><enrolled>3016</enrolled>....

you can easily do

    for $student in $doc1//student
    return $doc2//course[enrolled = $student/@sn]

to get a list of courses with students from $doc1.
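This join can be tried as a self-contained query. Only the `<student sn="3016">` / `<enrolled>3016</enrolled>` shapes and the `for ... return $doc2//course[enrolled = $student/@sn]` pattern come from the message; the extra students, the `@code` attribute and the output wrapper are made up for illustration.

```xquery
xquery version "3.0";

(: Sample documents: the first <student> and <enrolled> values come from the
   example above; everything else is made up for illustration. :)
declare variable $doc1 := document {
  <students>
    <student sn="3016"><name>Simon</name></student>
    <student sn="4020"><name>Ada</name></student>
  </students>
};
declare variable $doc2 := document {
  <courses>
    <course code="XML101"><enrolled>3016</enrolled></course>
    <course code="GIS200"><enrolled>4020</enrolled><enrolled>3016</enrolled></course>
    <course code="SQL300"><enrolled>9999</enrolled></course>
  </courses>
};

(: Value-based join: '=' is a general comparison, so a course matches if ANY
   of its <enrolled> values equals the student's @sn. No ID/IDREF needed. :)
for $student in $doc1//student
return
  <student name="{ $student/name }">{
    for $course in $doc2//course[enrolled = $student/@sn]
    return <course>{ string($course/@code) }</course>
  }</student>
```

With this sample data Simon is listed under two courses and Ada under one; the course with an unknown student number matches nobody.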
> Basically, I won't know beforehand whether a child node is inline or a reference, but I would like to be able to query both as though they were the same. Say I have two nodes (I will call them parents), both with the same child. One has the actual inline child node and the other has just a reference to it. Let's say the child's name attribute is 'Tom'. I would like to be able to query like this and get both parent nodes back: //parents[child/@name = 'Tom']
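Write a function to do it.

    declare function local:get-children($input as element(*)) as element(*)* {
      (: element children only: text nodes would not match element(*)* :)
      for $child in $input/*
      return if (local:is-reference($child))  (: define this test for your own vocabulary :)
             then local:get-reference($child)
             else $child
    };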
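and maybe

    declare function local:get-reference($input as element(*)) as element(*)* {
      (: a function body has no context item, so start from collection()
         (e.g. the opened BaseX database) rather than from "/" :)
      collection()/documents[@id eq $input/@doc]//*[@id eq $input/@ref]
    };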
> Anything? If not, is this a weird use case? I wouldn't think so.
I haven't encountered it, but RDF people often want something similar.
After this you can write e.g.

    /documents[@id = "36"]//parents[local:get-children(.)/@name = 'Tom']
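Putting the get-children/get-reference idea together, here is a self-contained sketch that returns both parents from James's example. The reference convention (a `child` element with `doc` and `ref` attributes), the `local:is-reference` test, and the `$store` variable, which stands in for the database that `/documents` would address, are assumptions for illustration.

```xquery
xquery version "3.0";

(: Hypothetical sample: parent A holds Tom inline, parent B only points at him
   through doc/ref attributes, parent C has a different child. :)
declare variable $store := (
  <documents id="36">
    <parents id="A"><child id="c1" name="Tom"/></parents>
    <parents id="B"><child doc="37" ref="c2"/></parents>
    <parents id="C"><child id="c3" name="Ann"/></parents>
  </documents>,
  <documents id="37">
    <child id="c2" name="Tom"/>
  </documents>
);

(: Assumed convention: an element carrying a @ref attribute is a reference. :)
declare function local:is-reference($e as element()) as xs:boolean {
  exists($e/@ref)
};

(: Look up the referenced element by document id and element id. :)
declare function local:get-reference($e as element()) as element()* {
  $store[@id eq string($e/@doc)]//*[@id eq string($e/@ref)]
};

(: Inline children are returned as-is; references are replaced by their targets. :)
declare function local:get-children($input as element()) as element()* {
  for $child in $input/*
  return if (local:is-reference($child))
         then local:get-reference($child)
         else $child
};

$store[@id = "36"]//parents[local:get-children(.)/@name = 'Tom']/@id/string()
```

With this sample data the final expression returns the ids `A` and `B`: `A` holds Tom inline, `B` reaches him through the reference, and `C` is excluded.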
> I could imagine, however, that this use case may be hard to support 'generally' in XML given referential loops etc., especially in a schema-less/unvalidated document.
The nature of XML is that you don't necessarily know what's a reference when you create a document.
Liam
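The worry about referential loops can be handled by remembering which ids have already been followed. The sketch below is one possible approach, not something proposed in the thread: it assumes a reference is an element with a `ref` attribute pointing at another element's `id`, and that references may chain; the sample data and `local:resolve` are hypothetical.

```xquery
xquery version "3.0";

(: Hypothetical data: r1 -> r2 -> r1 is a loop; r3 -> t resolves to a real node. :)
declare variable $nodes :=
  <documents>
    <child id="r1" ref="r2"/>
    <child id="r2" ref="r1"/>
    <child id="r3" ref="t"/>
    <child id="t" name="Tom"/>
  </documents>;

(: Follow @ref chains until a non-reference element is reached.
   $seen holds the ids already visited; revisiting one means a loop, so stop. :)
declare function local:resolve($e as element(), $seen as xs:string*) as element()* {
  if (empty($e/@ref)) then $e
  else if (string($e/@ref) = $seen) then ()  (: loop detected: give up on this chain :)
  else
    for $target in $nodes//*[@id = string($e/@ref)]
    return local:resolve($target, ($seen, string($e/@ref)))
};

for $c in $nodes/child
return <resolved from="{ $c/@id }">{ local:resolve($c, string($c/@id))/@name/string() }</resolved>
```

Here `r1` and `r2` point at each other, so they resolve to nothing instead of recursing forever, while `r3` follows its reference to Tom. The recursion depth is bounded by the number of distinct ids.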
Liam,

Thanks again. Luckily we are not talking terabytes. The larger documents I have seen are in the 1-5 GB range, with most under 15 MB. I would think each 'document' would be a resource in the database. It's the 1+ GB DOMs that are our problem. Also, having to load the entire blob/DOM at once is horrible for load performance. The RestXQ/hypermedia approach would allow lazy loading via hypermedia-driven discoverability, but still support the full XPath/XQuery-level querying that was previously relegated to the DOM alone.

The nice thing about our metadata approach is that we actually have several levels of metadata, which are then 'combined' into user/group-specific metadata to allow for user/group-level configuration without undermining core system functions and caching mechanisms. The RestXQ endpoints simply consume the 'system-wide' metadata for building entity representations, and each client consumes its own 'personal' metadata for client-side representation. This personal metadata is a mashup of the system metadata, the metadata of any groups the user is a part of, and personal metadata alterations. For example, Bob could make the 'NeedsRepair' field invisible on Transformer Inspection records. This 'custom' metadata is consumed by the client, and the HTML markup is generated from it. Again, we use client-side JS via Knockout.js for this.

The key benefit here is that the data representation does not change and can be consumed by all clients regardless of their metadata alterations. This single representation can then be cached efficiently across the network with caching headers that are configurable via the system-wide entity metadata. For example, Transformers might have a freshness of 1 minute, while Inspection Records are fresh for 2 days. Although there is only 'one' representation of each resource, private/sensitive data is always present, encrypted with a secret key per resource, and the secret key is only available to those with privileges.
So, for example, api/employees/2 may return:

    { name: 'Bob', salary: 'encryptedstringhere', isActive: 'true' }

This resource can be 'shared' by everyone. If someone has permission to see 'salary', they can request:

    api/employees/2/keys

This would return the secret keys for every encrypted property in the resource that the user has permission to see.

Anyway, I think I got carried away... I am just excited! I really just wanted to say thanks again; there is a ton of documentation on XML, and it's hard to wade through it all efficiently. Your input was invaluable. The approach you outlined is in line with the way I was thinking I would implement queryable relationships should XML not have those facilities built in. Also, thanks for the SML reference. It looks promising. I'll leave you alone and check out that query-talk group. Once again, thanks, and have a great rest of your week! Hopefully in the coming weeks I'll know whether this will all work as we envision or fail miserably. :)

- James
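The layered metadata James describes (system-wide, then group, then personal) can be sketched as an attribute-level merge in which the most specific layer wins. This is a minimal XQuery 3.0 sketch, not the project's implementation: the layer format, the `visible`/`label` attributes, the rule that only system-defined properties are merged, and the function names are all assumptions for illustration.

```xquery
xquery version "3.0";

(: Hypothetical metadata layers for one entity type, most general first. :)
declare variable $system :=
  <entity type="TransformerInspection">
    <property name="InspectionDate" visible="true" label="Date"/>
    <property name="NeedsRepair" visible="true" label="Needs repair"/>
  </entity>;
declare variable $group :=
  <entity type="TransformerInspection">
    <property name="InspectionDate" label="Inspected on"/>
  </entity>;
declare variable $bob :=
  <entity type="TransformerInspection">
    <property name="NeedsRepair" visible="false"/>
  </entity>;

(: For each attribute of a property, the most specific layer that sets it wins.
   FLWOR keeps the layers in the order given; a plain path such as
   $layers/property would sort by document order, which is
   implementation-dependent across separately constructed trees. :)
declare function local:merge-property($name as xs:string, $layers as element(entity)*)
  as element(property) {
  let $props := for $layer in $layers return $layer/property[@name = $name]
  let $atts := for $p in $props return $p/@*
  return
    <property>{
      for $attName in distinct-values(for $a in $atts return name($a))
      return $atts[name() = $attName][last()]
    }</property>
};

(: Only properties defined in the system layer are merged, so group or
   personal layers can adjust them but not invent new ones. :)
declare function local:merge-entity($layers as element(entity)+) as element(entity) {
  <entity type="{ $layers[1]/@type }">{
    for $name in $layers[1]/property/@name/string()
    return local:merge-property($name, $layers)
  }</entity>
};

local:merge-entity(($system, $group, $bob))
```

In the merged result Bob's `NeedsRepair` is invisible and the group's label for `InspectionDate` replaces the system one, while the shared data representation itself is untouched.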
Subject: Re: [basex-talk] Referential Queries
From: liam@w3.org
To: james.jw@hotmail.com
CC: basex-talk@mailman.uni-konstanz.de
Date: Tue, 14 May 2013 15:13:59 -0400
-- Liam Quin - XML Activity Lead, W3C, http://www.w3.org/People/Quin/ Pictures from old books: http://fromoldbooks.org/ Ankh: irc.sorcery.net irc.gnome.org freenode/#xml