Hi,
I have a huge XML document containing around 70,000,000 child nodes. I want to implement paging on it, so that I can fetch a chunk of nodes at a time, but once the start position grows past 10,000,000 each query takes more than a second to execute. Is there an optimized way I can follow?
I have written the following code:
import javax.xml.namespace.QName;
import javax.xml.xquery.*;

import net.xqj.basex.BaseXXQDataSource;

public class XQJ {
    public static void main(String[] args) throws XQException {
        XQDataSource xqs = new BaseXXQDataSource();
        xqs.setProperty("serverName", "localhost");
        xqs.setProperty("port", "1984");

        // Change USERNAME and PASSWORD values
        XQConnection conn = xqs.getConnection("admin", "admin");

        // Page through the children of the root element with subsequence()
        XQPreparedExpression xqpe = conn.prepareExpression(
              "declare namespace xbrli='http://www.xbrl.org/2003/instance';"
            + " declare variable $doc as xs:string external;"
            + " declare variable $start as xs:integer external;"
            + " declare variable $pageSize as xs:integer external;"
            + " let $allMatches := doc($doc)/xbrli:xbrl/*"
            + " return subsequence($allMatches, $start, $pageSize)");
            // + " return $matches"

        xqpe.bindString(new QName("doc"), "1389962424906/1389962424906/facts.xml", null);

        int totalRecords = 70000000;
        int pageSize = 10000;
        int noOfPages = totalRecords / pageSize;
        long startTime = 0;
        int start = 0;
        for (int i = 0; i < noOfPages; i++) {
            startTime = System.currentTimeMillis();
            // note: subsequence() positions are 1-based, so i * pageSize + 1 may be intended
            start = i * pageSize;
            xqpe.bindInt(new QName("start"), start, null);
            xqpe.bindInt(new QName("pageSize"), pageSize, null);
            XQResultSequence rs = xqpe.executeQuery();
            // while (rs.next()) {}
            // System.out.println(rs.getItemAsString(null));
            // System.out.println(start + " : " + (System.currentTimeMillis() - startTime));
        }
        System.out.println(start + " : " + (System.currentTimeMillis() - startTime));
        conn.close();
    }
}
Hi Geet,
I can’t think of an easy way to page results with the current database/query architecture. One spontaneous idea: you could request all node ids in a first query [1] and request the hits in a second query [2]. When requesting 70 million nodes, however, that list would get pretty huge.
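Just to illustrate, a rough sketch of that idea via XQJ might look like the following. It is untested and makes some assumptions: the document is stored in a BaseX database, the database name '1389962424906' is guessed from the doc() path in your code, and db:open-pre is used to turn the collected pre values back into nodes.

// Sketch only, reusing the XQConnection "conn" from the code above:
// collect the pre values of all child nodes once, then resolve a single
// page of them back into nodes in a second query.
XQExpression idQuery = conn.createExpression();
XQResultSequence ids = idQuery.executeQuery(
      "declare namespace xbrli='http://www.xbrl.org/2003/instance';"
    + " db:node-pre(doc('1389962424906/1389962424906/facts.xml')/xbrli:xbrl/*)");

// Keep the pre values on the client. For 70 million entries this list is
// already several hundred MB; a primitive long[] (or a server-side
// solution) would be needed in practice.
java.util.List<Long> pres = new java.util.ArrayList<>();
while (ids.next()) {
    pres.add(Long.parseLong(ids.getItemAsString(null)));
}

// Fetch one page by opening only the nodes behind its pre values
// (assumes the database is called '1389962424906', as the doc() path suggests).
int start = 0, pageSize = 10000;
StringBuilder page = new StringBuilder();
for (int i = start; i < Math.min(start + pageSize, pres.size()); i++) {
    if (page.length() > 0) page.append(',');
    page.append(pres.get(i));
}
XQExpression pageQuery = conn.createExpression();
XQResultSequence rs = pageQuery.executeQuery(
      "for $pre in (" + page + ")"
    + " return db:open-pre('1389962424906', $pre)");

Whether this is actually faster mainly depends on how cheap db:open-pre is compared to letting subsequence() skip over the first $start items, so it would need measuring. Also note that pre values are only stable as long as the database is not updated; the node ids from [1] would be the stable alternative.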
Christian
[1] http://docs.basex.org/wiki/Database_Module#db:node-id
[2] http://docs.basex.org/wiki/Database_Module#db:node-pre