Hello everyone,
First of all thank you for building such a great product!
Unfortunately I encounter extremely slow queries using a basex instance. My xml files are distributed over 256 databases (00-ff) with roughly 250MB each. All share a structure similar to [3]. I executed OPTIMIZE ALL for all databases, see [4]. [5] shows the info index output. The linux server which is running the basex instance shows great amounts of io. I replicated the data to my local desktop and see the same amount of high io usage across all database files. For example: [data\81\tbl.basex, data\81\txt.basex, ..., etc]
Querying a single database is fast and the info output window shows the following [2], but a query across all dbs just stalls and does not return a result in a reasonable time.
Could somebody give me a hint how to speed things up?
regards, Björn
[1]
Example query:
``` declare namespace mets = "http://www.loc.gov/METS/"; declare namespace lmerFile = "http://www.ddb.de/LMERfile";
count( for $db in db:list() for $doc in db:open($db) where $doc/koala/mets:mets/mets:amdSec/mets:techMD/mets:mdWrap/mets:xmlData/lmerFi le:format = 'urn:123' return $doc) ```
[2] ``` Compiling: - inline $db_0 - pre-evaluate db:open(database[,path]) to document-node() sequence: db:open("fe") -> (db:open-pre("fe", 0), ...) - rewrite cached filter to iter path: ((db:open-pre("fe", 0), ...))[(element(koala)/element(mets:mets)/element(mets:amdSec)/element(mets:t ... -> (db:open-pre("fe", 0), ...)/.[(element(koala)/element(mets:mets)/element(mets:amdSec)/element(mets: t... - rewrite where clause(s) - simplify FLWOR expression: (db:open-pre("fe", 0), ...)/.[(element(koala)/element(mets:mets)/element(mets:amdSec)/element(mets: t... Optimized Query: count((db:open-pre("fe", 0), ...)/.[(element(koala)/element(mets:mets)/element(mets:amdSec)/element(mets: techMD)/element(mets:mdWrap)/element(mets:xmlData)/element(lmerFile:format) = "urn:123")]) Query: declare namespace mets = "http://www.loc.gov/METS/"; declare namespace lmerFile = "http://www.ddb.de/LMERfile"; count( let $db := "fe" for $doc in db:open($db) where $doc/koala/mets:mets/mets:amdSec/mets:techMD/mets:mdWrap/mets:xmlData/lmerFi le:format = 'urn:123' return $doc) Result: - Hit(s): 1 Item - Updated: 0 Items - Printed: 4 b - Read Locking: (global) - Write Locking: (none) Timing: - Parsing: 0.41 ms - Compiling: 161.91 ms - Evaluating: 2155.26 ms - Printing: 0.86 ms - Total Time: 2318.44 ms Query Plan: <QueryPlan compiled="true" updating="false"> <FnCount name="count" type="xs:integer" size="1"> <IterPath type="document-node()*" database="fe"> <DBNodeSeq type="document-node()+" size="29196" database="fe"> <DBNode pre="0" type="document-node()" size="1" database="fe"/> <DBNode pre="245" type="document-node()" size="1" database="fe"/> <DBNode pre="417" type="document-node()" size="1" database="fe"/> <DBNode pre="679" type="document-node()" size="1" database="fe"/> <DBNode pre="945" type="document-node()" size="1" database="fe"/> </DBNodeSeq> <IterStep axis="self" test="node()" type="node()?"> <CmpG op="=" type="xs:boolean" size="1"> <IterPath type="element()*"> <IterStep axis="child" test="element(koala)" type="element()*"/> <IterStep axis="child" test="element(mets:mets)" type="element()*"/> <IterStep axis="child" test="element(mets:amdSec)" type="element()*"/> <IterStep axis="child" test="element(mets:techMD)" type="element()*"/> <IterStep axis="child" test="element(mets:mdWrap)" type="element()*"/> <IterStep axis="child" test="element(mets:xmlData)" type="element()*"/> <IterStep axis="child" test="element(lmerFile:format)" type="element()*"/> </IterPath> <Str type="xs:string" size="1">urn:123</Str> </CmpG> </IterStep> </IterPath> </FnCount> </QueryPlan>
```
[3]
<?xml version="1.0" encoding="UTF-8"?> <koala> <aip> <parent_asset_id /> <asset_ingest_timestamp>2016-05-04 13-57-17-943000</asset_ingest_timestamp> <external_asset_id>urn:nbn:de:xyz</external_asset_id> <internal_asset_id>4e1f-82-45c5-37-1678e81f</internal_asset_id> <aip_is_deleted>false</aip_is_deleted> <aip_deletion_timestamp>2016-05-04 13-57-17-943000</aip_deletion_timestamp> <storage_backend>hdd</storage_backend> <aip_size>179810</aip_size> <asset_version>1</asset_version> </aip> <mets OBJID="" TYPE="ko" xsi:schemaLocation="http://www.loc.gov/METS/ http://www.loc.gov/standards/mets/mets.xsd http://www.ddb.de/LMERfile http://files.dnb.de/standards/lmer/lmer-file.xsd http://www.ddb.de/LMERobject http://files.dnb.de/standards/lmer/lmer-object.xsd http://www.ddb.de/LMERprocess http://files.dnb.de/standards/lmer/lmer-process.xsd http://purl.org/dc/elements/1.1/ http://dublincore.org/schemas/xmls/qdc/2003/04/02/dc.xsd" xmlns="http://www.loc.gov/METS/" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:lmerFile="http://www.ddb.de/LMERfile" xmlns:lmerObject="http://www.ddb.de/LMERobject" xmlns:lmerProcess="http://www.ddb.de/LMERprocess" xmlns:dc="http://purl.org/dc/elements/1.1/"> <metsHdr CREATEDATE="2015-03-23T11:03:19.079+01:00"> <agent ROLE="ARCHIVIST" TYPE="ORGANIZATION"> <name>test</name> <note>test</note> </agent> </metsHdr> <amdSec ID="AmdSec-0001"> <techMD ID="TechMD-LMER-Object"> <mdWrap ID="TechMD-LMER-Object-MdWrap" MIMETYPE="text/xml" MDTYPE="OTHER" OTHERMDTYPE="lmerObject" LABEL="LMERobject"> <xmlData> lmerObject:persistentIdentifierurn:nbn:de:test</lmerObject:persistentIdent ifier> <lmerObject:transferChecksum CHECKSUMTYPE="xor of sha1 file checksums">ac38416b34a7b1b55875798819220c092a77f3b7</lmerObject:transferChec ksum> lmerObject:masterCreationDate2015-03-23T11:03:19.080+01:00</lmerObject:mas terCreationDate> lmerObject:metadataCreationDate2015-03-23T11:03:19.080+01:00</lmerObject:m etadataCreationDate> lmerObject:metadataRecordCreatortest</lmerObject:metadataRecordCreator> lmerObject:numberOfFiles1</lmerObject:numberOfFiles> </xmlData> </mdWrap> </techMD> <techMD ID="TechMD-File--0"> <mdWrap ID="TechMD-File--0-MDWRAP" MIMETYPE="text/xml" MDTYPE="OTHER" OTHERMDTYPE="lmerFile"> <xmlData> lmerFile:format REGISTRYNAME="DIAS"urn:123</lmerFile:format> <lmerFile:xmlData MDTYPE="FITS"> <fits xsi:schemaLocation="http://hul.harvard.edu/ois/xml/ns/fits/fits_output http://files.dnb.de/standards/fits/fits_output_dnb.xsd" version="0.6.1" timestamp="23.03.15 09:24" xmlns="http://hul.harvard.edu/ois/xml/ns/fits/fits_output"> <identification status="SINGLE_RESULT"> <identity format="Portable Document Format" mimetype="application/pdf" toolname="FITS" toolversion="0.6.1"> <tool toolname="Droid" toolversion="3.0"/> <tool toolname="Jhove" toolversion="1.11"/> <tool toolname="file utility" toolversion="5.17"/> <tool toolname="Exiftool" toolversion="7.74"/> <tool toolname="NLNZ Metadata Extractor" toolversion="3.4GA"/> <tool toolname="ffident" toolversion="0.2"/> <version toolname="Droid" toolversion="3.0">1.6</version> <externalIdentifier toolname="Droid" toolversion="3.0" type="puid">fmt/20</externalIdentifier> </identity> </identification> <fileinfo> <size toolname="Jhove" toolversion="1.11">77973</size> <creatingApplicationName toolname="Jhove" toolversion="1.11">Acrobat 9.1.1 /InDesign: pictwpstops filter 1.0</creatingApplicationName> <lastmodified toolname="Exiftool" toolversion="7.74" status="SINGLE_RESULT">2015:03:23 09:24:57+01:00</lastmodified> <created toolname="Exiftool" toolversion="7.74" status="SINGLE_RESULT">2009:06:12 11:53:28+02:00</created> <filepath toolname="OIS File Information" toolversion="0.1" status="SINGLE_RESULT">/data/var/tmp/tomcatP-6/./impservice/GU_42_2009-S23-2 5.pdf</filepath> <filename toolname="OIS File Information" toolversion="0.1" status="SINGLE_RESULT">./impservice/GU_42_2009-S23-25.pdf</filename> <md5checksum toolname="OIS File Information" toolversion="0.1" status="SINGLE_RESULT">a2a1ff970c0f82dfce7e208b35eb5783</md5checksum> <fslastmodified toolname="OIS File Information" toolversion="0.1" status="SINGLE_RESULT">1427099097000</fslastmodified> </fileinfo> <filestatus> <well-formed toolname="Jhove" toolversion="1.11" status="SINGLE_RESULT">true</well-formed> <valid toolname="Jhove" toolversion="1.11" status="SINGLE_RESULT">true</valid> <message toolname="Jhove" toolversion="1.11" status="SINGLE_RESULT">Page information is not displayed; to display remove param value of p from the config file</message> <ingestLevel toolname="DidigO" toolversion="0.9" status="SINGLE_RESULT">4</ingestLevel> </filestatus> <metadata> <document> <pageCount toolname="Exiftool" toolversion="7.74" status="SINGLE_RESULT">3</pageCount> <isTagged toolname="Jhove" toolversion="1.11">no</isTagged> <hasOutline toolname="Jhove" toolversion="1.11">no</hasOutline> <hasAnnotations toolname="Jhove" toolversion="1.11" status="SINGLE_RESULT">no</hasAnnotations> <isRightsManaged toolname="Exiftool" toolversion="7.74" status="SINGLE_RESULT">no</isRightsManaged> <isProtected toolname="Exiftool" toolversion="7.74">no</isProtected> <isPdfA toolname="Exiftool" toolversion="7.74" status="SINGLE_RESULT">no</isPdfA> <hasForms toolname="NLNZ Metadata Extractor" toolversion="3.4GA" status="SINGLE_RESULT">no</hasForms> </document> </metadata> </fits> </lmerFile:xmlData> </xmlData> </mdWrap> </techMD> </amdSec> <fileSec> <fileGrp ID="ASSET" ADMID="TechMD-LMER-Object"> <file ID="FILE-0" ADMID="TechMD-File--0" CREATED="2015-08-13T11:03:19.006+01:00" SIZE="77973" CHECKSUM="ac38416b34a7bc092a77f3b7" CHECKSUMTYPE="SHA-1" MIMETYPE="application/pdf"> <FLocat LOCTYPE="URL" xlink:type="simple" xlink:href="file://./content/big.pdf"/> </file> </fileGrp> </fileSec> <structMap TYPE="ASSET"> <div ORDER="1" LABEL="File list" TYPE="ASSET"> <fptr FILEID="FILE-0"/> </div> </structMap> </mets> </koala>
[4] Example for db 00:
``` BaseX 9.0.2 [Standalone] Try 'help' to get more information.
open 00
Database '00' was opened in 2040.58 ms.
info database
Database Properties NAME: 00 SIZE: 269 MB NODES: 8782619 DOCUMENTS: 29471 BINARIES: 0 TIMESTAMP: 2020-01-09T13:31:43.000Z UPTODATE: true
Resource Properties INPUTPATH: INPUTSIZE: 0 b INPUTDATE: 2020-01-09T06:37:10.793Z
Indexes TEXTINDEX: true ATTRINDEX: true TOKENINDEX: false FTINDEX: true TEXTINCLUDE: ATTRINCLUDE: TOKENINCLUDE: FTINCLUDE: LANGUAGE: English STEMMING: false CASESENS: false DIACRITICS: false STOPWORDS: UPDINDEX: true AUTOOPTIMIZE: false MAXCATS: 100 MAXLEN: 96 SPLITSIZE: 0
```
[5] Info Index Output:
info index
Elements - Structure: Hash - Entries: 147 tool 185692x, leaf message 162967x, strings, leaf mdWrap 68088x xmlData 68088x techMD 68088x version 52735x, 22 distinct strings, leaf externalIdentifier 52319x, 28 distinct strings, leaf lmerFile:format 38617x, 15 distinct strings, leaf lmerFile:xmlData 38617x fits 38617x identification 38617x identity 38617x fileinfo 38617x filename 38617x, strings, leaf filepath 38617x, strings, leaf size 38617x, integers [948, 344172844], leaf md5checksum 38617x, strings, leaf fslastmodified 38617x, integers [1.354674301E12, 1.578101482E12], leaf filestatus 38617x ingestLevel 38617x, 4 distinct integers [1, 4], leaf metadata 38617x file 38617x FLocat 38617x, leaf fptr 38617x, leaf document 31004x well-formed 30515x, 2 distinct strings, leaf valid 30515x, 2 distinct strings, leaf lmerObject:metadataCreationDate 29471x, strings, leaf lmerObject:metadataRecordCreator 29471x, string, leaf lmerObject:numberOfFiles 29471x, 61 distinct integers [1, 235], leaf parent_asset_id 29471x, string, leaf fileSec 29471x fileGrp 29471x structMap 29471x div 29471x aip 29471x koala 29471x aip_is_deleted 29471x, leaf aip_deletion_timestamp 29471x, leaf storage_backend 29471x, string, leaf aip_size 29471x, integers [863, 1683060591], leaf asset_version 29471x, integer [1, 1], leaf mets 29471x metsHdr 29471x agent 29471x name 29471x, string, leaf note 29471x, string, leaf amdSec 29471x internal_asset_id 29471x, strings, leaf external_asset_id 29471x, strings, leaf asset_ingest_timestamp 29471x, strings, leaf lmerObject:persistentIdentifier 29471x, strings, leaf lmerObject:transferChecksum 29471x, strings, leaf lmerObject:masterCreationDate 29471x, strings, leaf isTagged 29315x, 2 distinct strings, leaf hasOutline 29315x, 2 distinct strings, leaf lastmodified 28499x, strings, leaf creatingApplicationName 28145x, strings, leaf isProtected 27796x, 2 distinct strings, leaf hasAnnotations 26362x, 2 distinct strings, leaf isRightsManaged 25577x, 4 distinct strings, leaf created 23912x, strings, leaf isPdfA 23902x, 2 distinct strings, leaf pageCount 23673x, integers [1, 3128], leaf hasForms 22050x, 2 distinct strings, leaf title 19639x, strings, leaf graphicsCount 19028x, integers [1, 20435], leaf author 12483x, strings, leaf pdfAVersion 11292x, 5 distinct strings, leaf language 8013x, 51 distinct strings, leaf lmerFile:category 7366x, string, leaf image 2118x imageWidth 2115x, integers [200, 7462], leaf bitsPerSample 2115x, 3 distinct strings, leaf imageHeight 2115x, integers [304, 10713], leaf YCbCrSubSampling 2084x, 4 distinct strings, leaf xSamplingFrequency 2083x, 55 distinct doubles [1, 1200], leaf ySamplingFrequency 2083x, 55 distinct doubles [1, 1200], leaf samplingFrequencyUnit 2082x, 3 distinct strings, leaf lightSource 1739x, 3 distinct strings, leaf channelMode 1649x, 3 distinct strings, leaf channels 1649x, 2 distinct integers [1, 2], leaf bitRate 1649x, 3 distinct integers [80000, 320000], leaf sampleRate 1649x, 2 distinct integers [44100, 48000], leaf audio 1649x text 1231x charset 1002x, 6 distinct strings, leaf markupBasis 940x, 3 distinct strings, leaf markupBasisVersion 939x, double [1, 1], leaf markupLanguage 735x, 2 distinct strings, leaf iccProfileName 708x, 15 distinct strings, leaf iccProfileVersion 708x, 6 distinct strings, leaf compressionScheme 578x, 3 distinct strings, leaf orientation 561x, 2 distinct strings, leaf scanningSoftwareName 533x, 66 distinct strings, leaf copyrightNote 481x, strings, leaf colorSpace 328x, 3 distinct strings, leaf samplesPerPixel 323x, 3 distinct integers [3, 5], leaf exifVersion 155x, 4 distinct integers [210, 230], leaf byteOrder 71x, string, leaf linebreak 58x, string, leaf YCbCrPositioning 21x, 2 distinct integers [1, 2], leaf apertureValue 16x, 11 distinct doubles [2.2, 13.4], leaf exposureBiasValue 16x, 2 distinct strings, leaf shutterSpeedValue 15x, 12 distinct strings, leaf flash 15x, 6 distinct strings, leaf digitalCameraModelName 15x, 13 distinct strings, leaf digitalCameraManufacturer 15x, 8 distinct strings, leaf isoSpeedRating 15x, 7 distinct integers [32, 1600], leaf focalLength 14x, 12 distinct doubles [4.2, 137], leaf exposureTime 14x, 11 distinct strings, leaf fNumber 14x, 9 distinct doubles [2.2, 6.3], leaf meteringMode 12x, 4 distinct strings, leaf maxApertureValue 10x, 5 distinct doubles [2.7, 4.9], leaf captureDevice 9x, string, leaf exposureProgram 8x, 4 distinct strings, leaf sensingMethod 8x, string, leaf inhibitorType 7x, 3 distinct strings, leaf scannerManufacturer 5x, 4 distinct strings, leaf scannerModelName 5x, 4 distinct strings, leaf creatingos 3x, string, leaf duration 2x, 2 distinct strings, leaf milliseconds 2x, 2 distinct integers [2209067, 6200496], leaf brightnessvalue 2x, 2 distinct doubles [3737910, 3737956.3692762186], leaf subjectDistance 2x, 2 distinct strings, leaf brightnessValue 2x, 2 distinct doubles [6.369276219, 9.768303187], leaf cfaPattern 2x, 2 distinct strings, leaf referenceBlackWhite 1x, string, leaf exposureIndex 1x, integer [100, 100], leaf usesStyleSheets 1x, string, leaf characterCount 1x, integer [451, 451], leaf wordCount 1x, integer [32, 32], leaf paragraphCount 1x, integer [1, 1], leaf gpsDateStamp 1x, string, leaf gpsLatitudeRef 1x, string, leaf gpsLatitude 1x, string, leaf gpsLongitudeRef 1x, string, leaf gpsLongitude 1x, string, leaf gpsAltitudeRef 1x, string, leaf gpsAltitude 1x, string, leaf gpsTimeStamp 1x, string, leaf gpsSpeedRef 1x, string, leaf gpsSpeed 1x, integer [0, 4.9E-324], leaf gpsImgDirectionRef 1x, string, leaf gpsImgDirection 1x, double [352.1332378, 352.1332378], leaf gpsDestBearingRef 1x, string, leaf gpsDestBearing 1x, double [352.1332378, 352.1332378], leaf
Attributes - Structure: Hash - Entries: 31 toolversion 1175464x, 20 distinct strings, leaf toolname 1175464x, 11 distinct strings, leaf status 773429x, 2 distinct strings, leaf ID 233735x, strings, leaf TYPE 117884x, 3 distinct strings, leaf MIMETYPE 106705x, 11 distinct strings, leaf MDTYPE 106705x, 2 distinct strings, leaf xsi:schemaLocation 68088x, strings, leaf CHECKSUMTYPE 68088x, 2 distinct strings, leaf OTHERMDTYPE 68088x, 2 distinct strings, leaf ADMID 68088x, strings, leaf LABEL 58942x, 2 distinct strings, leaf type 52319x, string, leaf CREATED 38617x, strings, leaf SIZE 38617x, integers [948, 344172844], leaf CHECKSUM 38617x, strings, leaf LOCTYPE 38617x, string, leaf xlink:type 38617x, string, leaf xlink:href 38617x, strings, leaf FILEID 38617x, strings, leaf REGISTRYNAME 38617x, string, leaf version 38617x, string, leaf timestamp 38617x, strings, leaf format 38617x, 14 distinct strings, leaf mimetype 38617x, 11 distinct strings, leaf ROLE 29471x, string, leaf CREATEDATE 29471x, strings, leaf PROFILE 29471x, string, leaf OBJID 29471x, strings, leaf RECORDSTATUS 29471x, string, leaf ORDER 29471x, integer [1, 1], leaf
Text Index - Structure: Sorted List - Names: - Size: 5866 kB - Entries: 446427 no 157290x 1 69949x true 52426x 4 31437x .... <removed for privacy reasons> ....
Attribute Index - Structure: Sorted List - Names: - Size: 8288 kB - Entries: 242140 SINGLE_RESULT 738299x Jhove 442253x 1.11 435034x Exiftool 215455x 7.74 215455x OIS File Information 156783x 0.1 156783x Droid 129111x 3.0 129111x ASSET 88413x FITS 77234x 0.6.1 77234x text/xml 70424x NLNZ Metadata Extractor 68965x 3.4GA 68965x OTHER 68088x application/pdf 58972x TechMD-LMER-Object 58942x TechMD-File--0 58942x FILE-0 58942x puid 52319x simple 38617x lmerFile 38617x URL 38617x SHA-1 38617x 0.9 38617x 0.2 35663x CONFLICT 35130x ... 00472a0cefc564ddeeaf01ef4eedefa648ae10a7 1x 00445341-1a94-45d1-96e0-bfbd285d0d9b 1x 0042e488-c19a-4824-9619-d396230236ae 1x 0041e4bb7a4ab62e00a9292e5f99254c260fab9d 1x 0040b05044d6242c7c746b82ff20ba793ae95e83 1x 0040782b-c857-495f-a7b0-83aa0e359452 1x 004004d7946daa934ac3313bbe719a19204ac982 1x 003df86b-035b-44fa-9c3e-a0dfdcf385b7 1x 003db1fa9bf4eb1cc721e2944d3fc4128e18691b 1x 003d1d59ef25783c8b0b611c485ab94101419dae 1x 003a6afc-3d12-4e0a-afd6-a0fa27babc75 1x 003a273f76a52e69167bd7663ca5ba2333a51e8e 1x 0038393fd7440e962ab44397700290f67708322b 1x 00381c23-1181-4cdb-a389-280cee83fa8c 1x 003564e47d4b24eb4e36a29d951b0ba77ed36a41 1x 003493e5e743a8cb209170cd3545ab47281bf1fc 1x 0032449e-1641-4d7b-8eb6-c0b6ce4bd73e 1x 00306c1e-67f1-4653-b1af-6c9cc8f466e3 1x 002ef043-303b-4372-8747-d424ad5d06ab 1x 002e19fe325f1a79f9c07929ed9649e5a88598e2 1x 002d9889ac5e27206fbe44bf204ddf77ec6d825a 1x 0028ee044ab6547ff0d2d0f5cfdefee2202653d0 1x 0028a8563f280fc2f6d2234c8bd5a374348bea4b 1x 002834455c3cc319a94b5801770f4f4a7bdb6d5f 1x 0024e7afb42e3466305c8688df8b26e9722596fe 1x 002275c7fb5dbb62291ea1db105f6ddf354cd67a 1x 002122dacfe6e2c268421f9f5fcdc708fa64ba28 1x 00200878d71f7184518266e6962d868c6c341523 1x 001fa27f1c3c2aec81f2257ac975e0d24a825333 1x 001f194b917c9312038b8f9cba4a46b5c007d1aa 1x
Token Index - Not available
Full-Text Index - Names: - Size: 39 MB - Entries: 386830 pdf 183610x 1 181572x <removed for privacy reasons>
Path Index doc(): 29471x koala: 29471x aip: 29471x parent_asset_id: 29471x, leaf text(): 25457x, 2 distinct strings, leaf asset_ingest_timestamp: 29471x, leaf text(): 29471x, strings, leaf external_asset_id: 29471x, leaf text(): 29471x, strings, leaf internal_asset_id: 29471x, leaf text(): 29471x, strings, leaf aip_is_deleted: 29471x, leaf aip_deletion_timestamp: 29471x, leaf storage_backend: 29471x, leaf text(): 29471x, string, leaf aip_size: 29471x, leaf text(): 29471x, integers [863, 1683060591], leaf asset_version: 29471x, leaf text(): 29471x, integer [1, 1], leaf mets: 29471x @OBJID: 29471x, strings, leaf @TYPE: 29471x, string, leaf @PROFILE: 29471x, string, leaf @xsi:schemaLocation: 29471x, strings, leaf metsHdr: 29471x @CREATEDATE: 29471x, strings, leaf @RECORDSTATUS: 29471x, string, leaf agent: 29471x @ROLE: 29471x, string, leaf @TYPE: 29471x, string, leaf name: 29471x, leaf text(): 29471x, string, leaf note: 29471x, leaf text(): 29471x, string, leaf amdSec: 29471x @ID: 29471x, string, leaf techMD: 68088x @ID: 68088x, strings, leaf mdWrap: 68088x @ID: 68088x, strings, leaf @MIMETYPE: 68088x, string, leaf @MDTYPE: 68088x, string, leaf @OTHERMDTYPE: 68088x, 2 distinct strings, leaf @LABEL: 29471x, string, leaf xmlData: 68088x lmerObject:persistentIdentifier: 29471x, leaf text(): 29471x, strings, leaf lmerObject:transferChecksum: 29471x, leaf @CHECKSUMTYPE: 29471x, string, leaf text(): 29471x, strings, leaf lmerObject:masterCreationDate: 29471x, leaf text(): 29471x, strings, leaf lmerObject:metadataCreationDate: 29471x, leaf text(): 29471x, strings, leaf lmerObject:metadataRecordCreator: 29471x, leaf text(): 29471x, string, leaf lmerObject:numberOfFiles: 29471x, leaf text(): 29471x, 61 distinct integers [1, 235], leaf lmerFile:format: 38617x, leaf @REGISTRYNAME: 38617x, string, leaf text(): 38617x, 15 distinct strings, leaf lmerFile:category: 7366x, leaf text(): 7366x, string, leaf lmerFile:xmlData: 38617x @MDTYPE: 38617x, string, leaf fits: 38617x @xsi:schemaLocation: 38617x, strings, leaf @version: 38617x, string, leaf @timestamp: 38617x, strings, leaf identification: 38617x @status: 38617x, string, leaf identity: 38617x @format: 38617x, 14 distinct strings, leaf @mimetype: 38617x, 11 distinct strings, leaf @toolname: 38617x, string, leaf @toolversion: 38617x, string, leaf tool: 185692x, leaf @toolname: 185692x, 8 distinct strings, leaf @toolversion: 185692x, 17 distinct strings, leaf version: 52735x, leaf @toolname: 52735x, 5 distinct strings, leaf @toolversion: 52735x, 10 distinct strings, leaf @status: 35130x, string, leaf text(): 52735x, 22 distinct strings, leaf externalIdentifier: 52319x, leaf @toolname: 52319x, string, leaf @toolversion: 52319x, double [3, 3], leaf @type: 52319x, string, leaf text(): 52319x, 28 distinct strings, leaf fileinfo: 38617x filename: 38617x, leaf @toolname: 38617x, 2 distinct strings, leaf @toolversion: 38617x, 4 distinct strings, leaf text(): 38617x, strings, leaf @status: 35296x, string, leaf filepath: 38617x, leaf @toolname: 38617x, 2 distinct strings, leaf @toolversion: 38617x, 4 distinct strings, leaf text(): 38617x, strings, leaf @status: 35296x, string, leaf size: 38617x, leaf @toolname: 38617x, 2 distinct strings, leaf @toolversion: 38617x, 3 distinct doubles [0.1, 1.5], leaf @status: 8957x, string, leaf text(): 38617x, integers [948, 344172844], leaf md5checksum: 38617x, leaf @toolname: 38617x, string, leaf @toolversion: 38617x, double [0.1, 0.1], leaf @status: 38617x, string, leaf text(): 38617x, strings, leaf fslastmodified: 38617x, leaf @toolname: 38617x, string, leaf @toolversion: 38617x, double [0.1, 0.1], leaf @status: 38617x, string, leaf text(): 38617x, integers [1.354674301E12, 1.578101482E12], leaf creatingApplicationName: 28145x, leaf @toolname: 28145x, 3 distinct strings, leaf @toolversion: 28145x, 4 distinct strings, leaf text(): 28145x, strings, leaf @status: 8983x, string, leaf lastmodified: 28499x, leaf @toolname: 28499x, string, leaf @toolversion: 28499x, double [7.74, 7.74], leaf @status: 28499x, string, leaf text(): 28499x, strings, leaf created: 23912x, leaf @toolname: 23912x, 2 distinct strings, leaf @toolversion: 23912x, 2 distinct strings, leaf @status: 23463x, string, leaf text(): 23912x, strings, leaf inhibitorType: 7x, leaf @toolname: 7x, string, leaf @toolversion: 7x, double [7.74, 7.74], leaf @status: 7x, string, leaf text(): 7x, 3 distinct strings, leaf copyrightNote: 481x, leaf @toolname: 481x, string, leaf @toolversion: 481x, double [7.74, 7.74], leaf @status: 481x, string, leaf text(): 481x, strings, leaf creatingos: 3x, leaf @toolname: 3x, string, leaf @toolversion: 3x, 3 distinct doubles [5.09, 5.32], leaf @status: 3x, string, leaf text(): 3x, string, leaf filestatus: 38617x well-formed: 30515x, leaf @toolname: 30515x, 2 distinct strings, leaf @toolversion: 30515x, 5 distinct strings, leaf @status: 30515x, string, leaf text(): 30515x, 2 distinct strings, leaf valid: 30515x, leaf @toolname: 30515x, 2 distinct strings, leaf @toolversion: 30515x, 5 distinct strings, leaf @status: 30515x, string, leaf text(): 30515x, 2 distinct strings, leaf ingestLevel: 38617x, leaf @toolname: 38617x, string, leaf @toolversion: 38617x, double [0.9, 0.9], leaf @status: 38617x, string, leaf text(): 38617x, 4 distinct integers [1, 4], leaf message: 162967x, leaf @toolname: 162967x, 2 distinct strings, leaf @toolversion: 162967x, 5 distinct strings, leaf @status: 162967x, string, leaf text(): 162967x, strings, leaf metadata: 38617x document: 31004x author: 12483x, leaf @toolname: 12483x, 3 distinct strings, leaf @toolversion: 12483x, 6 distinct strings, leaf @status: 5555x, string, leaf text(): 12467x, strings, leaf language: 8013x, leaf @toolname: 8013x, 3 distinct strings, leaf @toolversion: 8013x, 6 distinct strings, leaf @status: 3013x, string, leaf text(): 8013x, 51 distinct strings, leaf title: 19639x, leaf @toolname: 19639x, 3 distinct strings, leaf @toolversion: 19639x, 6 distinct strings, leaf @status: 10692x, string, leaf text(): 19402x, strings, leaf isRightsManaged: 25577x, leaf @toolname: 25577x, 2 distinct strings, leaf @toolversion: 25577x, 4 distinct strings, leaf @status: 25577x, string, leaf text(): 25577x, 4 distinct strings, leaf pageCount: 23673x, leaf @toolname: 23673x, 2 distinct strings, leaf @toolversion: 23673x, 2 distinct doubles [1.5, 7.74], leaf @status: 23534x, string, leaf text(): 23673x, integers [1, 3128], leaf graphicsCount: 19028x, leaf @toolname: 19028x, string, leaf @toolversion: 19028x, 2 distinct doubles [1.11, 1.5], leaf @status: 19028x, string, leaf text(): 19028x, integers [1, 20435], leaf isTagged: 29315x, leaf @toolname: 29315x, 2 distinct strings, leaf @toolversion: 29315x, 3 distinct strings, leaf text(): 29315x, 2 distinct strings, leaf @status: 10218x, string, leaf hasOutline: 29315x, leaf @toolname: 29315x, 2 distinct strings, leaf @toolversion: 29315x, 3 distinct strings, leaf text(): 29315x, 2 distinct strings, leaf @status: 10218x, string, leaf hasAnnotations: 26362x, leaf @toolname: 26362x, string, leaf @toolversion: 26362x, 2 distinct doubles [1.11, 1.5], leaf @status: 26362x, string, leaf text(): 26362x, 2 distinct strings, leaf isProtected: 27796x, leaf @toolname: 27796x, 2 distinct strings, leaf @toolversion: 27796x, 2 distinct strings, leaf text(): 27796x, 2 distinct strings, leaf @status: 9640x, string, leaf isPdfA: 23902x, leaf @toolname: 23902x, string, leaf @toolversion: 23902x, double [7.74, 7.74], leaf @status: 23902x, string, leaf text(): 23902x, 2 distinct strings, leaf hasForms: 22050x, leaf @toolname: 22050x, string, leaf @toolversion: 22050x, string, leaf @status: 22050x, string, leaf text(): 22050x, 2 distinct strings, leaf pdfAVersion: 11292x, leaf @toolname: 11292x, string, leaf @toolversion: 11292x, double [7.74, 7.74], leaf @status: 11292x, string, leaf text(): 11292x, 5 distinct strings, leaf text: 1231x charset: 1002x, leaf @toolname: 1002x, 3 distinct strings, leaf @toolversion: 1002x, 4 distinct strings, leaf text(): 1002x, 6 distinct strings, leaf @status: 204x, string, leaf markupBasis: 940x, leaf @toolname: 940x, 3 distinct strings, leaf @toolversion: 940x, 4 distinct strings, leaf text(): 940x, 3 distinct strings, leaf @status: 191x, string, leaf markupBasisVersion: 939x, leaf @toolname: 939x, 2 distinct strings, leaf @toolversion: 939x, 3 distinct strings, leaf text(): 939x, double [1, 1], leaf @status: 200x, string, leaf markupLanguage: 735x, leaf @toolname: 735x, string, leaf @toolversion: 735x, double [0.2, 0.2], leaf @status: 735x, string, leaf text(): 735x, 2 distinct strings, leaf linebreak: 58x, leaf @toolname: 58x, string, leaf @toolversion: 58x, double [1.11, 1.11], leaf @status: 58x, string, leaf text(): 58x, string, leaf usesStyleSheets: 1x, leaf @toolname: 1x, string, leaf @toolversion: 1x, string, leaf @status: 1x, string, leaf text(): 1x, string, leaf characterCount: 1x, leaf @toolname: 1x, string, leaf @toolversion: 1x, string, leaf @status: 1x, string, leaf text(): 1x, integer [451, 451], leaf wordCount: 1x, leaf @toolname: 1x, string, leaf @toolversion: 1x, string, leaf @status: 1x, string, leaf text(): 1x, integer [32, 32], leaf paragraphCount: 1x, leaf @toolname: 1x, string, leaf @toolversion: 1x, string, leaf @status: 1x, string, leaf text(): 1x, integer [1, 1], leaf image: 2118x imageWidth: 2115x, leaf @toolname: 2115x, 3 distinct strings, leaf @toolversion: 2115x, 3 distinct strings, leaf text(): 2115x, integers [200, 7462], leaf @status: 441x, string, leaf imageHeight: 2115x, leaf @toolname: 2115x, 3 distinct strings, leaf @toolversion: 2115x, 3 distinct strings, leaf text(): 2115x, integers [304, 10713], leaf @status: 441x, string, leaf iccProfileName: 708x, leaf @toolname: 708x, string, leaf @toolversion: 708x, double [7.74, 7.74], leaf @status: 708x, string, leaf text(): 708x, 15 distinct strings, leaf YCbCrSubSampling: 2084x, leaf @toolname: 2084x, string, leaf @toolversion: 2084x, double [7.74, 7.74], leaf @status: 2084x, string, leaf text(): 2084x, 4 distinct strings, leaf samplingFrequencyUnit: 2082x, leaf @toolname: 2082x, 3 distinct strings, leaf @toolversion: 2082x, 3 distinct strings, leaf text(): 2082x, 3 distinct strings, leaf @status: 394x, string, leaf xSamplingFrequency: 2083x, leaf @toolname: 2083x, 3 distinct strings, leaf @toolversion: 2083x, 3 distinct strings, leaf text(): 2083x, 55 distinct doubles [1, 1200], leaf @status: 394x, string, leaf ySamplingFrequency: 2083x, leaf @toolname: 2083x, 3 distinct strings, leaf @toolversion: 2083x, 3 distinct strings, leaf text(): 2083x, 55 distinct doubles [1, 1200], leaf @status: 394x, string, leaf iccProfileVersion: 708x, leaf @toolname: 708x, string, leaf @toolversion: 708x, double [7.74, 7.74], leaf @status: 708x, string, leaf text(): 708x, 6 distinct strings, leaf bitsPerSample: 2115x, leaf @toolname: 2115x, 3 distinct strings, leaf @toolversion: 2115x, 3 distinct strings, leaf text(): 2115x, 3 distinct strings, leaf @status: 441x, string, leaf lightSource: 1739x, leaf @toolname: 1739x, 2 distinct strings, leaf @toolversion: 1739x, 2 distinct strings, leaf @status: 1734x, string, leaf text(): 1739x, 3 distinct strings, leaf compressionScheme: 578x, leaf @toolname: 578x, 2 distinct strings, leaf @toolversion: 578x, 2 distinct doubles [1.5, 7.74], leaf @status: 577x, string, leaf text(): 578x, 3 distinct strings, leaf orientation: 561x, leaf @toolname: 561x, 2 distinct strings, leaf @toolversion: 561x, 2 distinct strings, leaf text(): 561x, 2 distinct strings, leaf @status: 76x, string, leaf scanningSoftwareName: 533x, leaf @toolname: 533x, string, leaf @toolversion: 533x, double [7.74, 7.74], leaf @status: 533x, string, leaf text(): 533x, 66 distinct strings, leaf colorSpace: 328x, leaf @toolname: 328x, 2 distinct strings, leaf @toolversion: 328x, 2 distinct doubles [1.5, 7.74], leaf @status: 327x, string, leaf text(): 328x, 3 distinct strings, leaf samplesPerPixel: 323x, leaf @toolname: 323x, 2 distinct strings, leaf @toolversion: 323x, 2 distinct doubles [1.5, 7.74], leaf @status: 322x, string, leaf text(): 323x, 3 distinct integers [3, 5], leaf YCbCrPositioning: 21x, leaf @toolname: 21x, string, leaf @toolversion: 21x, double [7.74, 7.74], leaf text(): 21x, 2 distinct integers [1, 2], leaf captureDevice: 9x, leaf @toolname: 9x, string, leaf @toolversion: 9x, double [7.74, 7.74], leaf @status: 9x, string, leaf text(): 9x, string, leaf digitalCameraManufacturer: 15x, leaf @toolname: 15x, string, leaf @toolversion: 15x, double [7.74, 7.74], leaf @status: 15x, string, leaf text(): 15x, 8 distinct strings, leaf digitalCameraModelName: 15x, leaf @toolname: 15x, string, leaf @toolversion: 15x, double [7.74, 7.74], leaf @status: 15x, string, leaf text(): 15x, 13 distinct strings, leaf fNumber: 14x, leaf @toolname: 14x, string, leaf @toolversion: 14x, double [7.74, 7.74], leaf text(): 14x, 9 distinct doubles [2.2, 6.3], leaf exposureTime: 14x, leaf @toolname: 14x, string, leaf @toolversion: 14x, double [7.74, 7.74], leaf text(): 14x, 11 distinct strings, leaf isoSpeedRating: 15x, leaf @toolname: 15x, string, leaf @toolversion: 15x, double [7.74, 7.74], leaf @status: 15x, string, leaf text(): 15x, 7 distinct integers [32, 1600], leaf exifVersion: 155x, leaf @toolname: 155x, string, leaf @toolversion: 155x, double [7.74, 7.74], leaf text(): 155x, 4 distinct integers [210, 230], leaf @status: 24x, string, leaf shutterSpeedValue: 15x, leaf @toolname: 15x, string, leaf @toolversion: 15x, double [7.74, 7.74], leaf @status: 15x, string, leaf text(): 15x, 12 distinct strings, leaf apertureValue: 16x, leaf @toolname: 16x, string, leaf @toolversion: 16x, double [7.74, 7.74], leaf text(): 16x, 11 distinct doubles [2.2, 13.4], leaf exposureBiasValue: 16x, leaf @toolname: 16x, string, leaf @toolversion: 16x, double [7.74, 7.74], leaf @status: 16x, string, leaf text(): 16x, 2 distinct strings, leaf maxApertureValue: 10x, leaf @toolname: 10x, string, leaf @toolversion: 10x, double [7.74, 7.74], leaf text(): 10x, 5 distinct doubles [2.7, 4.9], leaf meteringMode: 12x, leaf @toolname: 12x, string, leaf @toolversion: 12x, double [7.74, 7.74], leaf text(): 12x, 4 distinct strings, leaf flash: 15x, leaf @toolname: 15x, string, leaf @toolversion: 15x, double [7.74, 7.74], leaf text(): 15x, 6 distinct strings, leaf focalLength: 14x, leaf @toolname: 14x, string, leaf @toolversion: 14x, double [7.74, 7.74], leaf text(): 14x, 12 distinct doubles [4.2, 137], leaf sensingMethod: 8x, leaf @toolname: 8x, string, leaf @toolversion: 8x, double [7.74, 7.74], leaf text(): 8x, string, leaf referenceBlackWhite: 1x, leaf @toolname: 1x, string, leaf @toolversion: 1x, double [7.74, 7.74], leaf text(): 1x, string, leaf scannerManufacturer: 5x, leaf @toolname: 5x, string, leaf @toolversion: 5x, double [7.74, 7.74], leaf @status: 5x, string, leaf text(): 5x, 4 distinct strings, leaf scannerModelName: 5x, leaf @toolname: 5x, string, leaf @toolversion: 5x, double [7.74, 7.74], leaf @status: 5x, string, leaf text(): 5x, 4 distinct strings, leaf exposureProgram: 8x, leaf @toolname: 8x, string, leaf @toolversion: 8x, double [7.74, 7.74], leaf text(): 8x, 4 distinct strings, leaf subjectDistance: 2x, leaf @toolname: 2x, string, leaf @toolversion: 2x, double [7.74, 7.74], leaf text(): 2x, 2 distinct strings, leaf cfaPattern: 2x, leaf @toolname: 2x, string, leaf @toolversion: 2x, double [7.74, 7.74], leaf text(): 2x, 2 distinct strings, leaf byteOrder: 71x, leaf @toolname: 71x, string, leaf @toolversion: 71x, double [1.5, 1.5], leaf @status: 71x, string, leaf text(): 71x, string, leaf exposureIndex: 1x, leaf @toolname: 1x, string, leaf @toolversion: 1x, double [7.74, 7.74], leaf text(): 1x, integer [100, 100], leaf brightnessValue: 2x, leaf @toolname: 2x, string, leaf @toolversion: 2x, double [7.74, 7.74], leaf @status: 2x, string, leaf text(): 2x, 2 distinct doubles [6.369276219, 9.768303187], leaf brightnessvalue: 2x, leaf @toolname: 2x, string, leaf @toolversion: 2x, string, leaf @status: 2x, string, leaf text(): 2x, 2 distinct doubles [3737910, 3737956.3692762186], leaf gpsLatitudeRef: 1x, leaf @toolname: 1x, string, leaf @toolversion: 1x, double [7.74, 7.74], leaf @status: 1x, string, leaf text(): 1x, string, leaf gpsLatitude: 1x, leaf @toolname: 1x, string, leaf @toolversion: 1x, double [7.74, 7.74], leaf @status: 1x, string, leaf text(): 1x, string, leaf gpsLongitudeRef: 1x, leaf @toolname: 1x, string, leaf @toolversion: 1x, double [7.74, 7.74], leaf @status: 1x, string, leaf text(): 1x, string, leaf gpsLongitude: 1x, leaf @toolname: 1x, string, leaf @toolversion: 1x, double [7.74, 7.74], leaf @status: 1x, string, leaf text(): 1x, string, leaf gpsAltitudeRef: 1x, leaf @toolname: 1x, string, leaf @toolversion: 1x, double [7.74, 7.74], leaf @status: 1x, string, leaf text(): 1x, string, leaf gpsAltitude: 1x, leaf @toolname: 1x, string, leaf @toolversion: 1x, double [7.74, 7.74], leaf @status: 1x, string, leaf text(): 1x, string, leaf gpsTimeStamp: 1x, leaf @toolname: 1x, string, leaf @toolversion: 1x, double [7.74, 7.74], leaf @status: 1x, string, leaf text(): 1x, string, leaf gpsSpeedRef: 1x, leaf @toolname: 1x, string, leaf @toolversion: 1x, double [7.74, 7.74], leaf @status: 1x, string, leaf text(): 1x, string, leaf gpsSpeed: 1x, leaf @toolname: 1x, string, leaf @toolversion: 1x, double [7.74, 7.74], leaf @status: 1x, string, leaf text(): 1x, integer [0, 4.9E-324], leaf gpsImgDirectionRef: 1x, leaf @toolname: 1x, string, leaf @toolversion: 1x, double [7.74, 7.74], leaf @status: 1x, string, leaf text(): 1x, string, leaf gpsImgDirection: 1x, leaf @toolname: 1x, string, leaf @toolversion: 1x, double [7.74, 7.74], leaf @status: 1x, string, leaf text(): 1x, double [352.1332378, 352.1332378], leaf gpsDestBearingRef: 1x, leaf @toolname: 1x, string, leaf @toolversion: 1x, double [7.74, 7.74], leaf @status: 1x, string, leaf text(): 1x, string, leaf gpsDestBearing: 1x, leaf @toolname: 1x, string, leaf @toolversion: 1x, double [7.74, 7.74], leaf @status: 1x, string, leaf text(): 1x, double [352.1332378, 352.1332378], leaf gpsDateStamp: 1x, leaf @toolname: 1x, string, leaf @toolversion: 1x, double [7.74, 7.74], leaf @status: 1x, string, leaf text(): 1x, string, leaf audio: 1649x bitRate: 1649x, leaf @toolname: 1649x, string, leaf @toolversion: 1649x, double [7.74, 7.74], leaf @status: 1647x, string, leaf text(): 1649x, 3 distinct integers [80000, 320000], leaf sampleRate: 1649x, leaf @toolname: 1649x, string, leaf @toolversion: 1649x, double [7.74, 7.74], leaf @status: 1647x, string, leaf text(): 1649x, 2 distinct integers [44100, 48000], leaf channels: 1649x, leaf @toolname: 1649x, string, leaf @toolversion: 1649x, double [7.74, 7.74], leaf @status: 1647x, string, leaf text(): 1649x, 2 distinct integers [1, 2], leaf channelMode: 1649x, leaf @toolname: 1649x, string, leaf @toolversion: 1649x, double [7.74, 7.74], leaf @status: 1649x, string, leaf text(): 1649x, 3 distinct strings, leaf duration: 2x, leaf @toolname: 2x, string, leaf @toolversion: 2x, string, leaf @status: 2x, string, leaf text(): 2x, 2 distinct strings, leaf milliseconds: 2x, leaf @toolname: 2x, string, leaf @toolversion: 2x, string, leaf @status: 2x, string, leaf text(): 2x, 2 distinct integers [2209067, 6200496], leaf fileSec: 29471x fileGrp: 29471x @ID: 29471x, string, leaf @ADMID: 29471x, string, leaf file: 38617x @ID: 38617x, strings, leaf @ADMID: 38617x, strings, leaf @CREATED: 38617x, strings, leaf @SIZE: 38617x, integers [948, 344172844], leaf @CHECKSUM: 38617x, strings, leaf @CHECKSUMTYPE: 38617x, string, leaf @MIMETYPE: 38617x, 11 distinct strings, leaf FLocat: 38617x, leaf @LOCTYPE: 38617x, string, leaf @xlink:type: 38617x, string, leaf @xlink:href: 38617x, strings, leaf structMap: 29471x @TYPE: 29471x, string, leaf div: 29471x @ORDER: 29471x, integer [1, 1], leaf @LABEL: 29471x, string, leaf @TYPE: 29471x, string, leaf fptr: 38617x, leaf @FILEID: 38617x, strings, leaf
Dear Björn,
Our Article on Indexes may give you some hints [1]:
• Via the 'enforceindex' pragma, you can help the optimizer by indicating that all of your database have up-to-date index structures. • With db:text, you can directly access the database indexes and rewrite the remaining steps of your query in reverse order:
db:text($db, 'urn:123')/parent::lmerFile:format/...
Hope this helps Christian
[1] http://docs.basex.org/wiki/Indexes
On Mon, Jan 13, 2020 at 11:56 AM Braunschweig, Björn bjoern.braunschweig@gwdg.de wrote:
Hello everyone,
First of all thank you for building such a great product!
Unfortunately I encounter extremely slow queries using a basex instance. My xml files are distributed over 256 databases (00-ff) with roughly 250MB each. All share a structure similar to [3]. I executed OPTIMIZE ALL for all databases, see [4]. [5] shows the info index output. The linux server which is running the basex instance shows great amounts of io. I replicated the data to my local desktop and see the same amount of high io usage across all database files. For example: [data\81\tbl.basex, data\81\txt.basex, ..., etc]
Querying a single database is fast and the info output window shows the following [2], but a query across all dbs just stalls and does not return a result in a reasonable time.
Could somebody give me a hint how to speed things up?
regards, Björn
Dear Christian,
Thank you for your quick response.
Could you tell me why the following two queries have a difference in behaviour so that the first one apparently uses the text index [1] and the second one [2] does not?
Your second suggestion with db:text is nice but I cannot figure out how to efficiently extract other informations like something from the root element without crawling the whole database and performing a full table scan so to say.
Regards, Björn
[1] declare namespace lmerFile = "http://www.ddb.de/LMERfile"; count( (# db:enforceindex #) { let $dbs := ("00") for $db in $dbs for $doc in db:open($db) where $doc//lmerFile:format/text() = 'urn:diasid:fty:kopal:0000000000000000000000' return $doc })
Compiling: - merge steps: descendant::element(lmerFile:format) - rewrite for to let: for $db_1 in $dbs_0 - inline $dbs_0 - inline $db_1 - pre-evaluate db:open(database[,path]) to document-node() sequence: db:open("00") -> (db:open-pre("00", 0), ...) - apply text index for "urn:123" - rewrite cached filter to cached path: ((db:open-pre("00", 0), ...))[(descendant::element(lmerFile:format)/text() = "urn:123... -> db:text("00", "urn:123")/parent::element(lmerFile:format)/ancest... - rewrite where clause(s) - simplify FLWOR expression: db:text("00", "urn:123")/parent::element(lmerFile:format)/ancest... Optimized Query: count((# Q{http://basex.org/modules/db%7Denforceindex #) { db:text("00", "urn:123")/parent::element(lmerFile:format)/ancestor::node()[self::document-node()] }) Query: declare namespace lmerFile = "http://www.ddb.de/LMERfile"; count( (# db:enforceindex #) { let $dbs := ("00") for $db in $dbs for $doc in db:open($db) where $doc//lmerFile:format/text() = 'urn:123' return $doc }) Result: - Hit(s): 1 Item - Updated: 0 Items - Printed: 4 b - Read Locking: (global) - Write Locking: (none) Timing: - Parsing: 0.26 ms - Compiling: 55.69 ms - Evaluating: 60.66 ms - Printing: 0.37 ms - Total Time: 116.98 ms Query Plan: <QueryPlan compiled="true" updating="false"> <FnCount name="count" type="xs:integer" size="1"> <Extension type="document-node()*"> <DBPragma value=""> <QNm type="xs:QName" size="1">db:enforceindex</QNm> </DBPragma> <CachedPath type="document-node()*"> <ValueAccess index="text" type="text()*"> <IndexStaticDb type="item()*" database="00"/> <Str type="xs:string" size="1">urn:123</Str> </ValueAccess> <IterStep axis="parent" test="element(lmerFile:format)" type="element()?"/> <CachedStep axis="ancestor" test="node()" type="node()*"> <CachedPath type="item()*"> <IterStep axis="self" test="document-node()" type="document-node()?"/> </CachedPath> </CachedStep> </CachedPath> </Extension> </FnCount> </QueryPlan>
[2]
Compiling: - pre-evaluate expression list to xs:string sequence: ("00", "01") - merge steps: descendant::element(lmerFile:format) - inline $dbs_0 - rewrite where clause(s) Optimized Query: count((# Q{http://basex.org/modules/db%7Denforceindex #) { for $db_1 in ("00", "01") return (db:open($db_1))[(descendant::element(lmerFile:format)/text() = "urn:123")] }) Query: declare namespace lmerFile = "http://www.ddb.de/LMERfile"; count( (# db:enforceindex #) { let $dbs := ("00", "01") for $db in $dbs for $doc in db:open($db) where $doc//lmerFile:format/text() = 'urn:123' return $doc }) Result: - Hit(s): 1 Item - Updated: 0 Items - Printed: 4 b - Read Locking: (global) - Write Locking: (none) Timing: - Parsing: 0.27 ms - Compiling: 0.54 ms - Evaluating: 1778.21 ms - Printing: 0.31 ms - Total Time: 1779.33 ms Query Plan: <QueryPlan compiled="true" updating="false"> <FnCount name="count" type="xs:integer" size="1"> <Extension type="document-node()*"> <DBPragma value=""> <QNm type="xs:QName" size="1">db:enforceindex</QNm> </DBPragma> <GFLWOR type="document-node()*"> <For type="xs:string" size="1" name="$db" id="1"> <StrSeq type="xs:string+" size="2"> <Str type="xs:string" size="1">00</Str> <Str type="xs:string" size="1">01</Str> </StrSeq> </For> <IterFilter type="document-node()*"> <DbOpen name="db:open" type="document-node()*"> <VarRef type="xs:string" size="1" name="$db" id="1"/> </DbOpen> <CmpG op="=" type="xs:boolean" size="1"> <CachedPath type="text()*"> <IterStep axis="descendant" test="element(lmerFile:format)" type="element()*"/> <IterStep axis="child" test="text()" type="text()*"/> </CachedPath> <Str type="xs:string" size="1">urn:123</Str> </CmpG> </IterFilter> </GFLWOR> </Extension> </FnCount> </QueryPlan>
-----Ursprüngliche Nachricht----- Von: Christian Grün christian.gruen@gmail.com Gesendet: Montag, 13. Januar 2020 13:07 An: Braunschweig, Björn bjoern.braunschweig@gwdg.de Cc: basex-talk@mailman.uni-konstanz.de Betreff: Re: [basex-talk] slow search performance across multiple databases
Dear Björn,
Our Article on Indexes may give you some hints [1]:
• Via the 'enforceindex' pragma, you can help the optimizer by indicating that all of your database have up-to-date index structures. • With db:text, you can directly access the database indexes and rewrite the remaining steps of your query in reverse order:
db:text($db, 'urn:123')/parent::lmerFile:format/...
Hope this helps Christian
[1] http://docs.basex.org/wiki/Indexes
On Mon, Jan 13, 2020 at 11:56 AM Braunschweig, Björn bjoern.braunschweig@gwdg.de wrote:
Hello everyone,
First of all thank you for building such a great product!
Unfortunately I encounter extremely slow queries using a basex instance. My xml files are distributed over 256 databases (00-ff) with roughly 250MB each. All share a structure similar to [3]. I executed OPTIMIZE ALL for all databases, see [4]. [5] shows the info index output. The linux server which is running the basex instance shows great amounts of io. I replicated the data to my local desktop and see the same amount of high io usage across all database files. For example: [data\81\tbl.basex, data\81\txt.basex, ..., etc]
Querying a single database is fast and the info output window shows the following [2], but a query across all dbs just stalls and does not return a result in a reasonable time.
Could somebody give me a hint how to speed things up?
regards, Björn
Hi,
I have trouble seeing the original second query.
But if I got it correct then the problem is that to have BaseX automatically rewrite for indexing you have to supply the DB you want to search in as string literal. That is db:open("DB") and not let $db := "DB" db:open($db). Or for with multiple dbs for that matter.
Maybe I am wrong and the enforceindex pragma solves that now. I have to admit my code predates the pragma and always uses string literals when accessing a database.
Best regards
Omar
Am 14.01.2020 um 13:33 schrieb Braunschweig, Björn:
Dear Christian,
Thank you for your quick response.
Could you tell me why the following two queries have a difference in behaviour so that the first one apparently uses the text index [1] and the second one [2] does not?
Your second suggestion with db:text is nice but I cannot figure out how to efficiently extract other informations like something from the root element without crawling the whole database and performing a full table scan so to say.
Regards, Björn
[1] declare namespace lmerFile = "http://www.ddb.de/LMERfile"; count( (# db:enforceindex #) { let $dbs := ("00") for $db in $dbs for $doc in db:open($db) where $doc//lmerFile:format/text() = 'urn:diasid:fty:kopal:0000000000000000000000' return $doc })
Compiling:
- merge steps: descendant::element(lmerFile:format)
- rewrite for to let: for $db_1 in $dbs_0
- inline $dbs_0
- inline $db_1
- pre-evaluate db:open(database[,path]) to document-node() sequence: db:open("00") -> (db:open-pre("00", 0), ...)
- apply text index for "urn:123"
- rewrite cached filter to cached path: ((db:open-pre("00", 0), ...))[(descendant::element(lmerFile:format)/text() = "urn:123... -> db:text("00", "urn:123")/parent::element(lmerFile:format)/ancest...
- rewrite where clause(s)
- simplify FLWOR expression: db:text("00", "urn:123")/parent::element(lmerFile:format)/ancest...
Optimized Query: count((# Q{http://basex.org/modules/db%7Denforceindex #) { db:text("00", "urn:123")/parent::element(lmerFile:format)/ancestor::node()[self::document-node()] }) Query: declare namespace lmerFile = "http://www.ddb.de/LMERfile"; count( (# db:enforceindex #) { let $dbs := ("00") for $db in $dbs for $doc in db:open($db) where $doc//lmerFile:format/text() = 'urn:123' return $doc }) Result:
- Hit(s): 1 Item
- Updated: 0 Items
- Printed: 4 b
- Read Locking: (global)
- Write Locking: (none)
Timing:
- Parsing: 0.26 ms
- Compiling: 55.69 ms
- Evaluating: 60.66 ms
- Printing: 0.37 ms
- Total Time: 116.98 ms
Query Plan:
<QueryPlan compiled="true" updating="false"> <FnCount name="count" type="xs:integer" size="1"> <Extension type="document-node()*"> <DBPragma value=""> <QNm type="xs:QName" size="1">db:enforceindex</QNm> </DBPragma> <CachedPath type="document-node()*"> <ValueAccess index="text" type="text()*"> <IndexStaticDb type="item()*" database="00"/> <Str type="xs:string" size="1">urn:123</Str> </ValueAccess> <IterStep axis="parent" test="element(lmerFile:format)" type="element()?"/> <CachedStep axis="ancestor" test="node()" type="node()*"> <CachedPath type="item()*"> <IterStep axis="self" test="document-node()" type="document-node()?"/> </CachedPath> </CachedStep> </CachedPath> </Extension> </FnCount> </QueryPlan>
[2]
Compiling:
- pre-evaluate expression list to xs:string sequence: ("00", "01")
- merge steps: descendant::element(lmerFile:format)
- inline $dbs_0
- rewrite where clause(s)
Optimized Query: count((# Q{http://basex.org/modules/db%7Denforceindex #) { for $db_1 in ("00", "01") return (db:open($db_1))[(descendant::element(lmerFile:format)/text() = "urn:123")] }) Query: declare namespace lmerFile = "http://www.ddb.de/LMERfile"; count( (# db:enforceindex #) { let $dbs := ("00", "01") for $db in $dbs for $doc in db:open($db) where $doc//lmerFile:format/text() = 'urn:123' return $doc }) Result:
- Hit(s): 1 Item
- Updated: 0 Items
- Printed: 4 b
- Read Locking: (global)
- Write Locking: (none)
Timing:
- Parsing: 0.27 ms
- Compiling: 0.54 ms
- Evaluating: 1778.21 ms
- Printing: 0.31 ms
- Total Time: 1779.33 ms
Query Plan:
<QueryPlan compiled="true" updating="false"> <FnCount name="count" type="xs:integer" size="1"> <Extension type="document-node()*"> <DBPragma value=""> <QNm type="xs:QName" size="1">db:enforceindex</QNm> </DBPragma> <GFLWOR type="document-node()*"> <For type="xs:string" size="1" name="$db" id="1"> <StrSeq type="xs:string+" size="2"> <Str type="xs:string" size="1">00</Str> <Str type="xs:string" size="1">01</Str> </StrSeq> </For> <IterFilter type="document-node()*"> <DbOpen name="db:open" type="document-node()*"> <VarRef type="xs:string" size="1" name="$db" id="1"/> </DbOpen> <CmpG op="=" type="xs:boolean" size="1"> <CachedPath type="text()*"> <IterStep axis="descendant" test="element(lmerFile:format)" type="element()*"/> <IterStep axis="child" test="text()" type="text()*"/> </CachedPath> <Str type="xs:string" size="1">urn:123</Str> </CmpG> </IterFilter> </GFLWOR> </Extension> </FnCount> </QueryPlan>
-----Ursprüngliche Nachricht----- Von: Christian Grün christian.gruen@gmail.com Gesendet: Montag, 13. Januar 2020 13:07 An: Braunschweig, Björn bjoern.braunschweig@gwdg.de Cc: basex-talk@mailman.uni-konstanz.de Betreff: Re: [basex-talk] slow search performance across multiple databases
Dear Björn,
Our Article on Indexes may give you some hints [1]:
• Via the 'enforceindex' pragma, you can help the optimizer by indicating that all of your database have up-to-date index structures. • With db:text, you can directly access the database indexes and rewrite the remaining steps of your query in reverse order:
db:text($db, 'urn:123')/parent::lmerFile:format/...
Hope this helps Christian
[1] http://docs.basex.org/wiki/Indexes
On Mon, Jan 13, 2020 at 11:56 AM Braunschweig, Björn bjoern.braunschweig@gwdg.de wrote:
Hello everyone,
First of all thank you for building such a great product!
Unfortunately I encounter extremely slow queries using a basex instance. My xml files are distributed over 256 databases (00-ff) with roughly 250MB each. All share a structure similar to [3]. I executed OPTIMIZE ALL for all databases, see [4]. [5] shows the info index output. The linux server which is running the basex instance shows great amounts of io. I replicated the data to my local desktop and see the same amount of high io usage across all database files. For example: [data\81\tbl.basex, data\81\txt.basex, ..., etc]
Querying a single database is fast and the info output window shows the following [2], but a query across all dbs just stalls and does not return a result in a reasonable time.
Could somebody give me a hint how to speed things up?
regards, Björn
Dear Omar,
Sorry I forgot to include the second query and just copied the info output.
Here are the two queries again:
[1] Uses index declare namespace lmerFile = "http://www.ddb.de/LMERfile"; count( (# db:enforceindex #) { for $doc in db:open("00") where $doc//lmerFile:format/text() = 'urn:123' return $doc })
[2] Does not use index declare namespace lmerFile = "http://www.ddb.de/LMERfile"; count( (# db:enforceindex #) { let $dbs := ("00", "01") for $db in $dbs for $doc in db:open($db) where $doc//lmerFile:format/text() = 'urn:123' return $doc })
I have either a limited amount of databases I would like to access or all databases. Using anything but db:open with a string literal results in no indices are being used. Is there a solution to circumvent that problem?
Regards, Björn
-----Ursprüngliche Nachricht----- Von: BaseX-Talk basex-talk-bounces@mailman.uni-konstanz.de Im Auftrag von Omar Siam Gesendet: Dienstag, 14. Januar 2020 17:06 An: basex-talk@mailman.uni-konstanz.de Betreff: Re: [basex-talk] slow search performance across multiple databases
Hi,
I have trouble seeing the original second query.
But if I got it correct then the problem is that to have BaseX automatically rewrite for indexing you have to supply the DB you want to search in as string literal. That is db:open("DB") and not let $db := "DB" db:open($db). Or for with multiple dbs for that matter.
Maybe I am wrong and the enforceindex pragma solves that now. I have to admit my code predates the pragma and always uses string literals when accessing a database.
Best regards
Omar
Am 14.01.2020 um 13:33 schrieb Braunschweig, Björn:
Dear Christian,
Thank you for your quick response.
Could you tell me why the following two queries have a difference in behaviour so that the first one apparently uses the text index [1] and the second one [2] does not?
Your second suggestion with db:text is nice but I cannot figure out how to efficiently extract other informations like something from the root element without crawling the whole database and performing a full table scan so to say.
Regards, Björn
[1] declare namespace lmerFile = "http://www.ddb.de/LMERfile"; count( (# db:enforceindex #) { let $dbs := ("00") for $db in $dbs for $doc in db:open($db) where $doc//lmerFile:format/text() = 'urn:diasid:fty:kopal:0000000000000000000000' return $doc })
Compiling:
- merge steps: descendant::element(lmerFile:format)
- rewrite for to let: for $db_1 in $dbs_0
- inline $dbs_0
- inline $db_1
- pre-evaluate db:open(database[,path]) to document-node() sequence:
db:open("00") -> (db:open-pre("00", 0), ...)
- apply text index for "urn:123"
- rewrite cached filter to cached path: ((db:open-pre("00", 0), ...))[(descendant::element(lmerFile:format)/text() = "urn:123... -> db:text("00", "urn:123")/parent::element(lmerFile:format)/ancest...
- rewrite where clause(s)
- simplify FLWOR expression: db:text("00", "urn:123")/parent::element(lmerFile:format)/ancest...
Optimized Query: count((# Q{http://basex.org/modules/db%7Denforceindex #) { db:text("00", "urn:123")/parent::element(lmerFile:format)/ancestor::node()[self::doc ument-node()] }) Query: declare namespace lmerFile = "http://www.ddb.de/LMERfile"; count( (# db:enforceindex #) { let $dbs := ("00") for $db in $dbs for $doc in db:open($db) where $doc//lmerFile:format/text() = 'urn:123' return $doc }) Result:
- Hit(s): 1 Item
- Updated: 0 Items
- Printed: 4 b
- Read Locking: (global)
- Write Locking: (none)
Timing:
- Parsing: 0.26 ms
- Compiling: 55.69 ms
- Evaluating: 60.66 ms
- Printing: 0.37 ms
- Total Time: 116.98 ms
Query Plan:
<QueryPlan compiled="true" updating="false"> <FnCount name="count" type="xs:integer" size="1"> <Extension type="document-node()*"> <DBPragma value=""> <QNm type="xs:QName" size="1">db:enforceindex</QNm> </DBPragma> <CachedPath type="document-node()*"> <ValueAccess index="text" type="text()*"> <IndexStaticDb type="item()*" database="00"/> <Str type="xs:string" size="1">urn:123</Str> </ValueAccess> <IterStep axis="parent" test="element(lmerFile:format)" type="element()?"/> <CachedStep axis="ancestor" test="node()" type="node()*"> <CachedPath type="item()*"> <IterStep axis="self" test="document-node()" type="document-node()?"/> </CachedPath> </CachedStep> </CachedPath> </Extension> </FnCount> </QueryPlan>
[2]
Compiling:
- pre-evaluate expression list to xs:string sequence: ("00", "01")
- merge steps: descendant::element(lmerFile:format)
- inline $dbs_0
- rewrite where clause(s)
Optimized Query: count((# Q{http://basex.org/modules/db%7Denforceindex #) { for $db_1 in ("00", "01") return (db:open($db_1))[(descendant::element(lmerFile:format)/text() = "urn:123")] }) Query: declare namespace lmerFile = "http://www.ddb.de/LMERfile"; count( (# db:enforceindex #) { let $dbs := ("00", "01") for $db in $dbs for $doc in db:open($db) where $doc//lmerFile:format/text() = 'urn:123' return $doc }) Result:
- Hit(s): 1 Item
- Updated: 0 Items
- Printed: 4 b
- Read Locking: (global)
- Write Locking: (none)
Timing:
- Parsing: 0.27 ms
- Compiling: 0.54 ms
- Evaluating: 1778.21 ms
- Printing: 0.31 ms
- Total Time: 1779.33 ms
Query Plan:
<QueryPlan compiled="true" updating="false"> <FnCount name="count" type="xs:integer" size="1"> <Extension type="document-node()*"> <DBPragma value=""> <QNm type="xs:QName" size="1">db:enforceindex</QNm> </DBPragma> <GFLWOR type="document-node()*"> <For type="xs:string" size="1" name="$db" id="1"> <StrSeq type="xs:string+" size="2"> <Str type="xs:string" size="1">00</Str> <Str type="xs:string" size="1">01</Str> </StrSeq> </For> <IterFilter type="document-node()*"> <DbOpen name="db:open" type="document-node()*"> <VarRef type="xs:string" size="1" name="$db" id="1"/> </DbOpen> <CmpG op="=" type="xs:boolean" size="1"> <CachedPath type="text()*"> <IterStep axis="descendant" test="element(lmerFile:format)" type="element()*"/> <IterStep axis="child" test="text()" type="text()*"/> </CachedPath> <Str type="xs:string" size="1">urn:123</Str> </CmpG> </IterFilter> </GFLWOR> </Extension> </FnCount> </QueryPlan>
-----Ursprüngliche Nachricht----- Von: Christian Grün christian.gruen@gmail.com Gesendet: Montag, 13. Januar 2020 13:07 An: Braunschweig, Björn bjoern.braunschweig@gwdg.de Cc: basex-talk@mailman.uni-konstanz.de Betreff: Re: [basex-talk] slow search performance across multiple databases
Dear Björn,
Our Article on Indexes may give you some hints [1]:
• Via the 'enforceindex' pragma, you can help the optimizer by indicating that all of your database have up-to-date index structures. • With db:text, you can directly access the database indexes and rewrite the remaining steps of your query in reverse order:
db:text($db, 'urn:123')/parent::lmerFile:format/...
Hope this helps Christian
[1] http://docs.basex.org/wiki/Indexes
On Mon, Jan 13, 2020 at 11:56 AM Braunschweig, Björn bjoern.braunschweig@gwdg.de wrote:
Hello everyone,
First of all thank you for building such a great product!
Unfortunately I encounter extremely slow queries using a basex instance. My xml files are distributed over 256 databases (00-ff) with roughly 250MB each. All share a structure similar to [3]. I executed OPTIMIZE ALL for all databases, see [4]. [5] shows the info index output. The linux server which is running the basex instance shows great amounts of io. I replicated the data to my local desktop and see the same amount of high io usage across all database files. For example: [data\81\tbl.basex, data\81\txt.basex, ..., etc]
Querying a single database is fast and the info output window shows the following [2], but a query across all dbs just stalls and does not return a result in a reasonable time.
Could somebody give me a hint how to speed things up?
regards, Björn
Hi Björn,
[2] Does not use index declare namespace lmerFile = "http://www.ddb.de/LMERfile"; count( (# db:enforceindex #) { let $dbs := ("00", "01") for $db in $dbs for $doc in db:open($db) where $doc//lmerFile:format/text() = 'urn:123' return $doc })
The index rewriting should take effect if you e.g. rewrite db:open($db)/* instead of db:open($db). The background:
* The where clause will be attached to the for expression as predicate * As thefor expression is a simple function call, the result will be a filter expression: db:open($db)[.//lmerFile:format/text() = 'urn:123'] * If the for expression is a path expression, the where clause will be attached to the last predicate of this path. The result will be a path again. * Only path expressions are rewritten to index access, no filters.
But thanks for your observation. We will check if we can implicitly rewrite your function call to a path expression if it’s embedded in an iteration like yours.
Best, Christian
Von: BaseX-Talk basex-talk-bounces@mailman.uni-konstanz.de Im Auftrag von Omar Siam Gesendet: Dienstag, 14. Januar 2020 17:06 An: basex-talk@mailman.uni-konstanz.de Betreff: Re: [basex-talk] slow search performance across multiple databases
Hi,
I have trouble seeing the original second query.
But if I got it correct then the problem is that to have BaseX automatically rewrite for indexing you have to supply the DB you want to search in as string literal. That is db:open("DB") and not let $db := "DB" db:open($db). Or for with multiple dbs for that matter.
Maybe I am wrong and the enforceindex pragma solves that now. I have to admit my code predates the pragma and always uses string literals when accessing a database.
Best regards
Omar
Am 14.01.2020 um 13:33 schrieb Braunschweig, Björn:
Dear Christian,
Thank you for your quick response.
Could you tell me why the following two queries have a difference in behaviour so that the first one apparently uses the text index [1] and the second one [2] does not?
Your second suggestion with db:text is nice but I cannot figure out how to efficiently extract other informations like something from the root element without crawling the whole database and performing a full table scan so to say.
Regards, Björn
[1] declare namespace lmerFile = "http://www.ddb.de/LMERfile"; count( (# db:enforceindex #) { let $dbs := ("00") for $db in $dbs for $doc in db:open($db) where $doc//lmerFile:format/text() = 'urn:diasid:fty:kopal:0000000000000000000000' return $doc })
Compiling:
- merge steps: descendant::element(lmerFile:format)
- rewrite for to let: for $db_1 in $dbs_0
- inline $dbs_0
- inline $db_1
- pre-evaluate db:open(database[,path]) to document-node() sequence:
db:open("00") -> (db:open-pre("00", 0), ...)
- apply text index for "urn:123"
- rewrite cached filter to cached path: ((db:open-pre("00", 0), ...))[(descendant::element(lmerFile:format)/text() = "urn:123... -> db:text("00", "urn:123")/parent::element(lmerFile:format)/ancest...
- rewrite where clause(s)
- simplify FLWOR expression: db:text("00", "urn:123")/parent::element(lmerFile:format)/ancest...
Optimized Query: count((# Q{http://basex.org/modules/db%7Denforceindex #) { db:text("00", "urn:123")/parent::element(lmerFile:format)/ancestor::node()[self::doc ument-node()] }) Query: declare namespace lmerFile = "http://www.ddb.de/LMERfile"; count( (# db:enforceindex #) { let $dbs := ("00") for $db in $dbs for $doc in db:open($db) where $doc//lmerFile:format/text() = 'urn:123' return $doc }) Result:
- Hit(s): 1 Item
- Updated: 0 Items
- Printed: 4 b
- Read Locking: (global)
- Write Locking: (none)
Timing:
- Parsing: 0.26 ms
- Compiling: 55.69 ms
- Evaluating: 60.66 ms
- Printing: 0.37 ms
- Total Time: 116.98 ms
Query Plan:
<QueryPlan compiled="true" updating="false"> <FnCount name="count" type="xs:integer" size="1"> <Extension type="document-node()*"> <DBPragma value=""> <QNm type="xs:QName" size="1">db:enforceindex</QNm> </DBPragma> <CachedPath type="document-node()*"> <ValueAccess index="text" type="text()*"> <IndexStaticDb type="item()*" database="00"/> <Str type="xs:string" size="1">urn:123</Str> </ValueAccess> <IterStep axis="parent" test="element(lmerFile:format)" type="element()?"/> <CachedStep axis="ancestor" test="node()" type="node()*"> <CachedPath type="item()*"> <IterStep axis="self" test="document-node()" type="document-node()?"/> </CachedPath> </CachedStep> </CachedPath> </Extension> </FnCount> </QueryPlan>
[2]
Compiling:
- pre-evaluate expression list to xs:string sequence: ("00", "01")
- merge steps: descendant::element(lmerFile:format)
- inline $dbs_0
- rewrite where clause(s)
Optimized Query: count((# Q{http://basex.org/modules/db%7Denforceindex #) { for $db_1 in ("00", "01") return (db:open($db_1))[(descendant::element(lmerFile:format)/text() = "urn:123")] }) Query: declare namespace lmerFile = "http://www.ddb.de/LMERfile"; count( (# db:enforceindex #) { let $dbs := ("00", "01") for $db in $dbs for $doc in db:open($db) where $doc//lmerFile:format/text() = 'urn:123' return $doc }) Result:
- Hit(s): 1 Item
- Updated: 0 Items
- Printed: 4 b
- Read Locking: (global)
- Write Locking: (none)
Timing:
- Parsing: 0.27 ms
- Compiling: 0.54 ms
- Evaluating: 1778.21 ms
- Printing: 0.31 ms
- Total Time: 1779.33 ms
Query Plan:
<QueryPlan compiled="true" updating="false"> <FnCount name="count" type="xs:integer" size="1"> <Extension type="document-node()*"> <DBPragma value=""> <QNm type="xs:QName" size="1">db:enforceindex</QNm> </DBPragma> <GFLWOR type="document-node()*"> <For type="xs:string" size="1" name="$db" id="1"> <StrSeq type="xs:string+" size="2"> <Str type="xs:string" size="1">00</Str> <Str type="xs:string" size="1">01</Str> </StrSeq> </For> <IterFilter type="document-node()*"> <DbOpen name="db:open" type="document-node()*"> <VarRef type="xs:string" size="1" name="$db" id="1"/> </DbOpen> <CmpG op="=" type="xs:boolean" size="1"> <CachedPath type="text()*"> <IterStep axis="descendant" test="element(lmerFile:format)" type="element()*"/> <IterStep axis="child" test="text()" type="text()*"/> </CachedPath> <Str type="xs:string" size="1">urn:123</Str> </CmpG> </IterFilter> </GFLWOR> </Extension> </FnCount> </QueryPlan>
-----Ursprüngliche Nachricht----- Von: Christian Grün christian.gruen@gmail.com Gesendet: Montag, 13. Januar 2020 13:07 An: Braunschweig, Björn bjoern.braunschweig@gwdg.de Cc: basex-talk@mailman.uni-konstanz.de Betreff: Re: [basex-talk] slow search performance across multiple databases
Dear Björn,
Our Article on Indexes may give you some hints [1]:
• Via the 'enforceindex' pragma, you can help the optimizer by indicating that all of your database have up-to-date index structures. • With db:text, you can directly access the database indexes and rewrite the remaining steps of your query in reverse order:
db:text($db, 'urn:123')/parent::lmerFile:format/...
Hope this helps Christian
[1] http://docs.basex.org/wiki/Indexes
On Mon, Jan 13, 2020 at 11:56 AM Braunschweig, Björn bjoern.braunschweig@gwdg.de wrote:
Hello everyone,
First of all thank you for building such a great product!
Unfortunately I encounter extremely slow queries using a basex instance. My xml files are distributed over 256 databases (00-ff) with roughly 250MB each. All share a structure similar to [3]. I executed OPTIMIZE ALL for all databases, see [4]. [5] shows the info index output. The linux server which is running the basex instance shows great amounts of io. I replicated the data to my local desktop and see the same amount of high io usage across all database files. For example: [data\81\tbl.basex, data\81\txt.basex, ..., etc]
Querying a single database is fast and the info output window shows the following [2], but a query across all dbs just stalls and does not return a result in a reasonable time.
Could somebody give me a hint how to speed things up?
regards, Björn
Hi Björn, hi Omar,
All filter expressions that yield document nodes in distinct document order will now be rewritten to path expressions. Feel free to check out the latest stable snapshot [1].
Christian
[1] http://files.basex.org/releases/latest/
On Wed, Jan 15, 2020 at 9:22 AM Christian Grün christian.gruen@gmail.com wrote:
Hi Björn,
[2] Does not use index declare namespace lmerFile = "http://www.ddb.de/LMERfile"; count( (# db:enforceindex #) { let $dbs := ("00", "01") for $db in $dbs for $doc in db:open($db) where $doc//lmerFile:format/text() = 'urn:123' return $doc })
The index rewriting should take effect if you e.g. rewrite db:open($db)/* instead of db:open($db). The background:
- The where clause will be attached to the for expression as predicate
- As thefor expression is a simple function call, the result will be a
filter expression: db:open($db)[.//lmerFile:format/text() = 'urn:123']
- If the for expression is a path expression, the where clause will be
attached to the last predicate of this path. The result will be a path again.
- Only path expressions are rewritten to index access, no filters.
But thanks for your observation. We will check if we can implicitly rewrite your function call to a path expression if it’s embedded in an iteration like yours.
Best, Christian
Von: BaseX-Talk basex-talk-bounces@mailman.uni-konstanz.de Im Auftrag von Omar Siam Gesendet: Dienstag, 14. Januar 2020 17:06 An: basex-talk@mailman.uni-konstanz.de Betreff: Re: [basex-talk] slow search performance across multiple databases
Hi,
I have trouble seeing the original second query.
But if I got it correct then the problem is that to have BaseX automatically rewrite for indexing you have to supply the DB you want to search in as string literal. That is db:open("DB") and not let $db := "DB" db:open($db). Or for with multiple dbs for that matter.
Maybe I am wrong and the enforceindex pragma solves that now. I have to admit my code predates the pragma and always uses string literals when accessing a database.
Best regards
Omar
Am 14.01.2020 um 13:33 schrieb Braunschweig, Björn:
Dear Christian,
Thank you for your quick response.
Could you tell me why the following two queries have a difference in behaviour so that the first one apparently uses the text index [1] and the second one [2] does not?
Your second suggestion with db:text is nice but I cannot figure out how to efficiently extract other informations like something from the root element without crawling the whole database and performing a full table scan so to say.
Regards, Björn
[1] declare namespace lmerFile = "http://www.ddb.de/LMERfile"; count( (# db:enforceindex #) { let $dbs := ("00") for $db in $dbs for $doc in db:open($db) where $doc//lmerFile:format/text() = 'urn:diasid:fty:kopal:0000000000000000000000' return $doc })
Compiling:
- merge steps: descendant::element(lmerFile:format)
- rewrite for to let: for $db_1 in $dbs_0
- inline $dbs_0
- inline $db_1
- pre-evaluate db:open(database[,path]) to document-node() sequence:
db:open("00") -> (db:open-pre("00", 0), ...)
- apply text index for "urn:123"
- rewrite cached filter to cached path: ((db:open-pre("00", 0), ...))[(descendant::element(lmerFile:format)/text() = "urn:123... -> db:text("00", "urn:123")/parent::element(lmerFile:format)/ancest...
- rewrite where clause(s)
- simplify FLWOR expression: db:text("00", "urn:123")/parent::element(lmerFile:format)/ancest...
Optimized Query: count((# Q{http://basex.org/modules/db%7Denforceindex #) { db:text("00", "urn:123")/parent::element(lmerFile:format)/ancestor::node()[self::doc ument-node()] }) Query: declare namespace lmerFile = "http://www.ddb.de/LMERfile"; count( (# db:enforceindex #) { let $dbs := ("00") for $db in $dbs for $doc in db:open($db) where $doc//lmerFile:format/text() = 'urn:123' return $doc }) Result:
- Hit(s): 1 Item
- Updated: 0 Items
- Printed: 4 b
- Read Locking: (global)
- Write Locking: (none)
Timing:
- Parsing: 0.26 ms
- Compiling: 55.69 ms
- Evaluating: 60.66 ms
- Printing: 0.37 ms
- Total Time: 116.98 ms
Query Plan:
<QueryPlan compiled="true" updating="false"> <FnCount name="count" type="xs:integer" size="1"> <Extension type="document-node()*"> <DBPragma value=""> <QNm type="xs:QName" size="1">db:enforceindex</QNm> </DBPragma> <CachedPath type="document-node()*"> <ValueAccess index="text" type="text()*"> <IndexStaticDb type="item()*" database="00"/> <Str type="xs:string" size="1">urn:123</Str> </ValueAccess> <IterStep axis="parent" test="element(lmerFile:format)" type="element()?"/> <CachedStep axis="ancestor" test="node()" type="node()*"> <CachedPath type="item()*"> <IterStep axis="self" test="document-node()" type="document-node()?"/> </CachedPath> </CachedStep> </CachedPath> </Extension> </FnCount> </QueryPlan>
[2]
Compiling:
- pre-evaluate expression list to xs:string sequence: ("00", "01")
- merge steps: descendant::element(lmerFile:format)
- inline $dbs_0
- rewrite where clause(s)
Optimized Query: count((# Q{http://basex.org/modules/db%7Denforceindex #) { for $db_1 in ("00", "01") return (db:open($db_1))[(descendant::element(lmerFile:format)/text() = "urn:123")] }) Query: declare namespace lmerFile = "http://www.ddb.de/LMERfile"; count( (# db:enforceindex #) { let $dbs := ("00", "01") for $db in $dbs for $doc in db:open($db) where $doc//lmerFile:format/text() = 'urn:123' return $doc }) Result:
- Hit(s): 1 Item
- Updated: 0 Items
- Printed: 4 b
- Read Locking: (global)
- Write Locking: (none)
Timing:
- Parsing: 0.27 ms
- Compiling: 0.54 ms
- Evaluating: 1778.21 ms
- Printing: 0.31 ms
- Total Time: 1779.33 ms
Query Plan:
<QueryPlan compiled="true" updating="false"> <FnCount name="count" type="xs:integer" size="1"> <Extension type="document-node()*"> <DBPragma value=""> <QNm type="xs:QName" size="1">db:enforceindex</QNm> </DBPragma> <GFLWOR type="document-node()*"> <For type="xs:string" size="1" name="$db" id="1"> <StrSeq type="xs:string+" size="2"> <Str type="xs:string" size="1">00</Str> <Str type="xs:string" size="1">01</Str> </StrSeq> </For> <IterFilter type="document-node()*"> <DbOpen name="db:open" type="document-node()*"> <VarRef type="xs:string" size="1" name="$db" id="1"/> </DbOpen> <CmpG op="=" type="xs:boolean" size="1"> <CachedPath type="text()*"> <IterStep axis="descendant" test="element(lmerFile:format)" type="element()*"/> <IterStep axis="child" test="text()" type="text()*"/> </CachedPath> <Str type="xs:string" size="1">urn:123</Str> </CmpG> </IterFilter> </GFLWOR> </Extension> </FnCount> </QueryPlan>
-----Ursprüngliche Nachricht----- Von: Christian Grün christian.gruen@gmail.com Gesendet: Montag, 13. Januar 2020 13:07 An: Braunschweig, Björn bjoern.braunschweig@gwdg.de Cc: basex-talk@mailman.uni-konstanz.de Betreff: Re: [basex-talk] slow search performance across multiple databases
Dear Björn,
Our Article on Indexes may give you some hints [1]:
• Via the 'enforceindex' pragma, you can help the optimizer by indicating that all of your database have up-to-date index structures. • With db:text, you can directly access the database indexes and rewrite the remaining steps of your query in reverse order:
db:text($db, 'urn:123')/parent::lmerFile:format/...
Hope this helps Christian
[1] http://docs.basex.org/wiki/Indexes
On Mon, Jan 13, 2020 at 11:56 AM Braunschweig, Björn bjoern.braunschweig@gwdg.de wrote:
Hello everyone,
First of all thank you for building such a great product!
Unfortunately I encounter extremely slow queries using a basex instance. My xml files are distributed over 256 databases (00-ff) with roughly 250MB each. All share a structure similar to [3]. I executed OPTIMIZE ALL for all databases, see [4]. [5] shows the info index output. The linux server which is running the basex instance shows great amounts of io. I replicated the data to my local desktop and see the same amount of high io usage across all database files. For example: [data\81\tbl.basex, data\81\txt.basex, ..., etc]
Querying a single database is fast and the info output window shows the following [2], but a query across all dbs just stalls and does not return a result in a reasonable time.
Could somebody give me a hint how to speed things up?
regards, Björn
Hi Christian,
Awesome, thank you very much. I followed your suggestion and it works well.
Björn
-----Ursprüngliche Nachricht----- Von: Christian Grün christian.gruen@gmail.com Gesendet: Mittwoch, 15. Januar 2020 11:24 An: Braunschweig, Björn bjoern.braunschweig@gwdg.de Cc: Omar Siam Omar.Siam@oeaw.ac.at; basex-talk@mailman.uni-konstanz.de Betreff: Re: [basex-talk] slow search performance across multiple databases
Hi Björn, hi Omar,
All filter expressions that yield document nodes in distinct document order will now be rewritten to path expressions. Feel free to check out the latest stable snapshot [1].
Christian
[1] http://files.basex.org/releases/latest/
On Wed, Jan 15, 2020 at 9:22 AM Christian Grün christian.gruen@gmail.com wrote:
Hi Björn,
[2] Does not use index declare namespace lmerFile = "http://www.ddb.de/LMERfile"; count( (# db:enforceindex #) { let $dbs := ("00", "01") for $db in $dbs for $doc in db:open($db) where $doc//lmerFile:format/text() = 'urn:123' return $doc })
The index rewriting should take effect if you e.g. rewrite db:open($db)/* instead of db:open($db). The background:
- The where clause will be attached to the for expression as predicate
- As thefor expression is a simple function call, the result will be a
filter expression: db:open($db)[.//lmerFile:format/text() = 'urn:123']
- If the for expression is a path expression, the where clause will be
attached to the last predicate of this path. The result will be a path again.
- Only path expressions are rewritten to index access, no filters.
But thanks for your observation. We will check if we can implicitly rewrite your function call to a path expression if it’s embedded in an iteration like yours.
Best, Christian
Von: BaseX-Talk basex-talk-bounces@mailman.uni-konstanz.de Im Auftrag von Omar Siam Gesendet: Dienstag, 14. Januar 2020 17:06 An: basex-talk@mailman.uni-konstanz.de Betreff: Re: [basex-talk] slow search performance across multiple databases
Hi,
I have trouble seeing the original second query.
But if I got it correct then the problem is that to have BaseX automatically rewrite for indexing you have to supply the DB you want to search in as string literal. That is db:open("DB") and not let $db := "DB" db:open($db). Or for with multiple dbs for that matter.
Maybe I am wrong and the enforceindex pragma solves that now. I have to admit my code predates the pragma and always uses string literals when accessing a database.
Best regards
Omar
Am 14.01.2020 um 13:33 schrieb Braunschweig, Björn:
Dear Christian,
Thank you for your quick response.
Could you tell me why the following two queries have a difference in behaviour so that the first one apparently uses the text index [1] and the second one [2] does not?
Your second suggestion with db:text is nice but I cannot figure out how to efficiently extract other informations like something from the root element without crawling the whole database and performing a full table scan so to say.
Regards, Björn
[1] declare namespace lmerFile = "http://www.ddb.de/LMERfile"; count( (# db:enforceindex #) { let $dbs := ("00") for $db in $dbs for $doc in db:open($db) where $doc//lmerFile:format/text() = 'urn:diasid:fty:kopal:0000000000000000000000' return $doc })
Compiling:
- merge steps: descendant::element(lmerFile:format)
- rewrite for to let: for $db_1 in $dbs_0
- inline $dbs_0
- inline $db_1
- pre-evaluate db:open(database[,path]) to document-node() sequence:
db:open("00") -> (db:open-pre("00", 0), ...)
- apply text index for "urn:123"
- rewrite cached filter to cached path: ((db:open-pre("00", 0), ...))[(descendant::element(lmerFile:format)/text() = "urn:123... -> db:text("00", "urn:123")/parent::element(lmerFile:format)/ancest...
- rewrite where clause(s)
- simplify FLWOR expression: db:text("00", "urn:123")/parent::element(lmerFile:format)/ancest...
Optimized Query: count((# Q{http://basex.org/modules/db%7Denforceindex #) { db:text("00", "urn:123")/parent::element(lmerFile:format)/ancestor::node()[self: :doc ument-node()] }) Query: declare namespace lmerFile = "http://www.ddb.de/LMERfile"; count( (# db:enforceindex #) { let $dbs := ("00") for $db in $dbs for $doc in db:open($db) where $doc//lmerFile:format/text() = 'urn:123' return $doc }) Result:
- Hit(s): 1 Item
- Updated: 0 Items
- Printed: 4 b
- Read Locking: (global)
- Write Locking: (none)
Timing:
- Parsing: 0.26 ms
- Compiling: 55.69 ms
- Evaluating: 60.66 ms
- Printing: 0.37 ms
- Total Time: 116.98 ms
Query Plan:
<QueryPlan compiled="true" updating="false"> <FnCount name="count" type="xs:integer" size="1"> <Extension type="document-node()*"> <DBPragma value=""> <QNm type="xs:QName" size="1">db:enforceindex</QNm> </DBPragma> <CachedPath type="document-node()*"> <ValueAccess index="text" type="text()*"> <IndexStaticDb type="item()*" database="00"/> <Str type="xs:string" size="1">urn:123</Str> </ValueAccess> <IterStep axis="parent" test="element(lmerFile:format)" type="element()?"/> <CachedStep axis="ancestor" test="node()" type="node()*"> <CachedPath type="item()*"> <IterStep axis="self" test="document-node()" type="document-node()?"/> </CachedPath> </CachedStep> </CachedPath> </Extension> </FnCount> </QueryPlan>
[2]
Compiling:
- pre-evaluate expression list to xs:string sequence: ("00", "01")
- merge steps: descendant::element(lmerFile:format)
- inline $dbs_0
- rewrite where clause(s)
Optimized Query: count((# Q{http://basex.org/modules/db%7Denforceindex #) { for $db_1 in ("00", "01") return (db:open($db_1))[(descendant::element(lmerFile:format)/text() = "urn:123")] }) Query: declare namespace lmerFile = "http://www.ddb.de/LMERfile"; count( (# db:enforceindex #) { let $dbs := ("00", "01") for $db in $dbs for $doc in db:open($db) where $doc//lmerFile:format/text() = 'urn:123' return $doc }) Result:
- Hit(s): 1 Item
- Updated: 0 Items
- Printed: 4 b
- Read Locking: (global)
- Write Locking: (none)
Timing:
- Parsing: 0.27 ms
- Compiling: 0.54 ms
- Evaluating: 1778.21 ms
- Printing: 0.31 ms
- Total Time: 1779.33 ms
Query Plan:
<QueryPlan compiled="true" updating="false"> <FnCount name="count" type="xs:integer" size="1"> <Extension type="document-node()*"> <DBPragma value=""> <QNm type="xs:QName" size="1">db:enforceindex</QNm> </DBPragma> <GFLWOR type="document-node()*"> <For type="xs:string" size="1" name="$db" id="1"> <StrSeq type="xs:string+" size="2"> <Str type="xs:string" size="1">00</Str> <Str type="xs:string" size="1">01</Str> </StrSeq> </For> <IterFilter type="document-node()*"> <DbOpen name="db:open" type="document-node()*"> <VarRef type="xs:string" size="1" name="$db" id="1"/> </DbOpen> <CmpG op="=" type="xs:boolean" size="1"> <CachedPath type="text()*"> <IterStep axis="descendant" test="element(lmerFile:format)" type="element()*"/> <IterStep axis="child" test="text()" type="text()*"/> </CachedPath> <Str type="xs:string" size="1">urn:123</Str> </CmpG> </IterFilter> </GFLWOR> </Extension> </FnCount> </QueryPlan>
-----Ursprüngliche Nachricht----- Von: Christian Grün christian.gruen@gmail.com Gesendet: Montag, 13. Januar 2020 13:07 An: Braunschweig, Björn bjoern.braunschweig@gwdg.de Cc: basex-talk@mailman.uni-konstanz.de Betreff: Re: [basex-talk] slow search performance across multiple databases
Dear Björn,
Our Article on Indexes may give you some hints [1]:
• Via the 'enforceindex' pragma, you can help the optimizer by indicating that all of your database have up-to-date index structures. • With db:text, you can directly access the database indexes and rewrite the remaining steps of your query in reverse order:
db:text($db, 'urn:123')/parent::lmerFile:format/...
Hope this helps Christian
[1] http://docs.basex.org/wiki/Indexes
On Mon, Jan 13, 2020 at 11:56 AM Braunschweig, Björn bjoern.braunschweig@gwdg.de wrote:
Hello everyone,
First of all thank you for building such a great product!
Unfortunately I encounter extremely slow queries using a basex instance. My xml files are distributed over 256 databases (00-ff) with roughly 250MB each. All share a structure similar to [3]. I executed OPTIMIZE ALL for all databases, see [4]. [5] shows the info index output. The linux server which is running the basex instance shows great amounts of io. I replicated the data to my local desktop and see the same amount of high io usage across all database files. For example: [data\81\tbl.basex, data\81\txt.basex, ..., etc]
Querying a single database is fast and the info output window shows the following [2], but a query across all dbs just stalls and does not return a result in a reasonable time.
Could somebody give me a hint how to speed things up?
regards, Björn
basex-talk@mailman.uni-konstanz.de