I am loading page view data exported from Adobe Analytics as CSV and converted to XML with a structure like:

<csv timestamp="2024-01-24">
  <record language="en-US" family="vancouver" bundlename="bundle-platcap-platform-capabilities" topicpath="administer/general/concept/capabilities-bundle-landingpage.dita" notviewed="viewed" viewcount="120587">
    <entry name="URL">docs.servicenow.com/bundle/vancouver-servicenow-platform/page/administer/general/concept/capabilities-bundle-landingpage.html</entry>
    <entry name="Page Views">120587</entry>
    <entry name="Unique Visitors">50863</entry>
    <entry name="Bounce Rate">0.255523003</entry>
    <entry name="Content Velocity">3.289644475</entry>
    <entry name="Docs Engagement Rate">0.289549233</entry>
  </record>
  … 50K more records …
</csv>

For this database I have a token index, and the MAXLEN value is set to 255 to ensure that the @topicpath values are tokenized (none should be longer than 255 characters).
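
For reference, rebuilding the index with those options would look something like this (a sketch using db:optimize; 'analytics' stands in for the actual database name):

db:optimize('analytics', true(), map {
  'tokenindex': true(),  (: build the token index :)
  'maxlen': 255          (: include strings up to 255 characters in the index :)
})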

I’m measuring a consistent 0.3 seconds for this query:

let $docRecords as element()* := db:token($analyticsmgmt:analyticsDatabase, $docPath, 'topicpath')/..

where $docPath is a value that will match tokens in @topicpath attributes.
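
A self-contained version of the measurement might look like this (a sketch using prof:time from the Profiling Module; the database name and path value here are placeholders):

(: Time the token lookup in isolation; names are placeholders. :)
let $docPath := 'administer/general/concept/capabilities-bundle-landingpage.dita'
return prof:time(
  db:token('analytics', $docPath, 'topicpath')/..
)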

Based on other token-based optimizations I’ve done, I would expect results at least 10 times faster.

My questions:

  1. What can I look for that might be making this lookup slower than expected? Conversely, how can I prove that this is as fast as the query can run?
  2. Is there a better way to manage this kind of tabular data with many thousands of records within BaseX? For example, does it matter if the records are part of a single document, or would it be better to have each record be a separate document (see the sketch after this list)?
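
To make question 2 concrete, splitting the records into one document each at load time might look something like this (a hypothetical sketch using db:create; the database name and path scheme are made up):

(: Store each record as its own document rather than one large file.
   Database name and path scheme are hypothetical. :)
let $records := doc('pageviews.xml')/csv/record
return db:create(
  'analytics-split',
  $records,
  $records ! ('records/' || replace(@topicpath, '/', '_') || '.xml'),
  map { 'tokenindex': true(), 'maxlen': 255 }
)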

Thanks,

Eliot

_____________________________________________

Eliot Kimber

Sr Staff Content Engineer

Digital Content & Design

O: 512 554 9368

M: 512 554 9368

servicenow.com
