Hi Easy,
so you want to know if eXist-db or BaseX is the better system? There's clearly no answer to this, as XML as a technology is just too diverse. Also, the histories of the two systems are simply too different. It would be easy for us to write benchmarks in which we outperform all other XML databases, but the eXist-db guys could probably manage to do the same.
If you have the impression anyway that BaseX serves your needs better, then my advice is to just stick with it ;) If you would rather know how to handle this amount of data, the following may help:
• First of all, you could have a look at the data size you want to store in BaseX. My naïve computation results in 4,000 bytes per file * 100 files * 500,000 persons ≈ 200 GB (and I assume the amount will increase over the years).
• It may theoretically be possible to store all data in a single database, but there are several reasons for partitioning your data into multiple databases (they can still be accessed by a single XQuery expression). There is no ideal size for a single database, but things like indexing get easier if it does not exceed 10 GB.
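To make the arithmetic above concrete, here is a minimal sketch in Python. It only repeats the naïve estimate and derives how many databases you would need if each one stays below roughly 10 GB. The `database_for` helper and the `ehr-NN` naming scheme are purely hypothetical illustrations of one way to assign a person to a database; they are not part of BaseX.

```python
import math
import zlib

# Figures taken from the naive estimate above; adjust them to your real data.
BYTES_PER_FILE = 4_000          # rough size of one dataset file
FILES_PER_PERSON = 100
PERSONS = 500_000
MAX_DB_BYTES = 10 * 10**9       # the 10 GB rule of thumb per database

total_bytes = BYTES_PER_FILE * FILES_PER_PERSON * PERSONS
db_count = math.ceil(total_bytes / MAX_DB_BYTES)


def database_for(person_id: str, count: int = db_count) -> str:
    """Map a person ID to a database name.

    Hypothetical scheme: a CRC32 hash of the ID, modulo the number of
    databases, so the same person always lands in the same database.
    """
    bucket = zlib.crc32(person_id.encode("utf-8")) % count
    return f"ehr-{bucket:02d}"


if __name__ == "__main__":
    print(f"estimated total: {total_bytes / 10**9:.0f} GB")
    print(f"databases needed at 10 GB each: {db_count}")
    print("example assignment:", database_for("some-person-id"))
```

A hash-based assignment spreads persons evenly, but it does not let you grow the number of databases without moving data. If the data is expected to grow over the years, partitioning by a stable property such as region or year of registration may be easier to maintain.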
Do you simply need to store all data, or do you also want to perform updates, create backups, etc.?
Christian
___________________________
I have a question for you. I am planning to use BaseX to manage our citizens' EHR data (more than 500,0000 people). Every person has more than 100 dataset XML files like this one:
<BasicDataset>
  <DataSet code="HDSD00.01.01" codeSystem="WS365-2011" codeSystemName="城乡居民健康档案基本数据集"/>
  <HDSN00.01.001 dataelementName="记录日期" value="20110411160000"/>
  <HDSN00.01.002 dataelementName="记录者姓名" value="少华"/>
  <HDSN00.01.003 dataelementName="记录人编号" value=""/>
  <HDSN00.01.004 dataelementName="记录人联系电话" value="13518272323"/>
  <HDSN00.01.005 dataelementName="记录人所属机构名称" value="卫生局"/>
  <HDSN00.01.006 dataelementName="记录人所属机构代码" value="PDY87041051068311X1119"/>
  <HDSN00.01.007 dataelementName="记录人所属机构地址" value="和路135号"/>
  <HDSN00.01.008 dataelementName="文档保管机构名称" value="卫生局"/>
  <HDSN00.01.009 dataelementName="文档保管机构代码" value="PDY87041051068311X1119"/>
  <HDSN00.01.010 dataelementName="文档保管机构地址" value="路135号"/>
  <HDSN00.01.011 dataelementName="文档保管机构联系电话" value="13518272323"/>
  <HDSN00.01.012 dataelementName="" value=""/>
  <HDSN00.01.013 dataelementName="文档编号" value="0000A28072174D28A92AC5716D39C544"/>
  <HDSN00.01.014 dataelementName="文档生成日期" value="20131101195713"/>
  <HDSN00.01.015 dataelementName="参与人电话" value="xxxxx"/>
  <HDSN00.01.016 dataelementName="参与人姓名" value="华"/>
  <HDSN00.01.032 dataelementName="源ID" value="0000A28072174D28A92AC5716D39C544"/>
  <HDSD00.01.376 value=""/>
  <HDSD00.01.377 value=""/>
  <HDSD00.01.378 value=""/>
  <HDSD00.01.379 value=""/>
  <HDSD00.01.380 value=""/>
  <HDSD00.01.381 value=""/>
  <HDSD00.01.424 value=""/>
  <HDSD00.01.008 value="18781045619"/>
  <HDSD00.01.018 code="4" displayName="不详" codeSystem="CV07.10.003" codeSystemName="医疗费用来源类别代码表"/>
  <HDSD00.01.041 code="1" displayName="卫生厕所" codeSystem="CV03.00.304" codeSystemName="厕所类别代码表"/>
  <HDSD00.01.014 code="3" displayName="不详" codeSystem="" codeSystemName=""/>
  <HDSD00.01.040 code="1" displayName="自来水" codeSystem="CV03.00.115" codeSystemName="饮水类别代码表"/>
  <HDSD00.01.006 value="51062219500408121X"/>
  <HDSD00.01.042 code="1" displayName="不详" codeSystem="" codeSystemName=""/>
  <HDSD00.01.002 value="杨国安"/>
  <HDSD00.01.013 code="5" displayName="不详" codeSystem="CV04.50.005" codeSystemName="ABO血型代码表"/>
  <HDSD00.01.007 value="清泉村"/>
  <HDSD00.01.019 value="true"/>
  <HDSD00.01.009 value="唐公健"/>
  <HDSD00.01.003 code="1" displayName="男" codeSystem="dicSex" codeSystemName="性别"/>
  <HDSD00.01.017 code="2" displayName="已婚" codeSystem="dicMaritalStatus" codeSystemName="婚姻"/>
  <HDSD00.01.001 value="51068310420305002501"/>
  <HDSD00.01.010 value="13981093150"/>
  <HDSD00.01.038 code="2" displayName="换气扇" codeSystem="CV03.00.302" codeSystemName="厨房排风设施类别代码表"/>
  <HDSD00.01.030 value="false"/>
  <HDSD00.01.024 value="false"/>
  <HDSD00.01.015 code="1" displayName="研究生" codeSystem="dicEducational" codeSystemName="学历"/>
  <HDSD00.01.027 value="false"/>
  <HDSD00.01.039 code="1" displayName="液化气" codeSystem="CV03.00.303" codeSystemName="燃料类型类别代码表"/>
  <HDSD00.01.016 code="5" displayName="农林牧渔水利业生产人员" codeSystem="Classifications" codeSystemName="职业"/>
  <HDSD00.01.012 code="01" displayName="不详" codeSystem="dicNationality" codeSystemName="民族"/>
  <HDSD00.0l.011 value="true"/>
  <HDsD00.01.004 value="19500407160000"/>
  <HDSD00.01.037 value="false"/>
  <HDSD00.01.01.02> <row/> </HDSD00.01.01.02>
  <HDSD00.01.01.03> <row/> </HDSD00.01.01.03>
  <HDSD00.01.01.04> <row/> </HDSD00.01.01.04>
  <HDSD00.01.01.06> <row/> </HDSD00.01.01.06>
  <HDSD00.01.01.08> <row/> </HDSD00.01.01.08>
  <HDSD00.01.01.09> <row/> </HDSD00.01.01.09>
  <HDSD00.01.01.05> <row/> </HDSD00.01.01.05>
  <HDSD00.01.01.07> <row/> </HDSD00.01.01.07>
</BasicDataset>
Before, I used eXist-db for this, but I found it difficult to work with: queries on large data sets often bring the system down. So I would like to know whether BaseX is better suited to this, because I found that BaseX builds its indexes much more quickly, responds well to queries, and imports data files quickly. Can you give me some advice?