Hello, I have a question about unmarshaling a query result into JAXB object(s). My database collection contains documents that all contain the same type of root element, let's call it Element1.
Originally I was doing something like this (I know it's silly but it worked just fine for small data sets):
    String CLOSING_TAG = "</Element1>";
    Context context = ...
    Unmarshaller um = ...

    String predicate1 = ...
    String predicate2 = ...
    String query = "//Element1[" + predicate1 + " and " + predicate2 + "]";

    String result = new XQuery(query).execute(context);
    if (result != null && !result.isEmpty()) {
        ArrayList<Element1> elements = new ArrayList<>();
        int index = -1;
        int beginIndex = 0;
        while ((index = result.indexOf(CLOSING_TAG, beginIndex)) != -1) {
            int endIndex = index + CLOSING_TAG.length();
            String element1 = result.substring(beginIndex, endIndex);
            beginIndex = endIndex + 1;
            elements.add(((Element1) um.unmarshal(new ByteArrayInputStream(element1.getBytes()))));
        }
        // do something with the elements
    }
But when I got into larger data sets (my DB collection is currently approx. 1 GB total size and has just over 20k documents), this started to fail, apparently because the entire result was being converted to a string at one time (out of memory error, despite setting -Xmx16384m). So I dug through the examples to find a better way and changed my code to this:
    ByteArrayOutputStream baos = new ByteArrayOutputStream();
    Iter iter = null;
    Serializer ser = null;
    QueryProcessor proc = new QueryProcessor(query, context);
    iter = proc.iter();
    ser = proc.getSerializer(baos);
    proc.close();

    if (iter != null && ser != null) {
        ArrayList<Element1> elements = new ArrayList<>();
        for (Item item; (item = iter.next()) != null;) {
            baos.reset();
            ser.serialize(item);
            elements.add(((Element1) um.unmarshal(new ByteArrayInputStream(baos.toByteArray()))));
        }
        ser.close();
        // do something with the elements
    }
This appears to work fine, but I am wondering if there is a better/faster way to do it. At first glance, serializing to a ByteArrayOutputStream only to then turn around and use a ByteArrayInputStream to unmarshal with JAXB seems wasteful.
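To make the suspected overhead concrete, here is a minimal JDK-only sketch of the same buffer round trip. It is not BaseX or JAXB code: the class name RoundTripSketch, the hard-coded items, and the plain SAX handler standing in for the unmarshaller are all assumptions made for illustration. For every item, the text is encoded to bytes, the buffer is copied by toByteArray(), and the bytes are then decoded and tokenized again by a parser.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: JDK classes only, no BaseX or JAXB involved.
class RoundTripSketch {

    /** Serializes each item to a reused byte buffer, then parses it back. */
    static List<String> run() {
        String[] texts = {"first", "second"};   // stand-ins for query result items
        List<String> out = new ArrayList<>();
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            for (String text : texts) {
                // Step 1: "serialize" the item into the reused buffer.
                baos.reset();
                baos.write(("<Element1>" + text + "</Element1>")
                        .getBytes(StandardCharsets.UTF_8));

                // Step 2: copy the buffer and parse the bytes again.
                StringBuilder collected = new StringBuilder();
                parser.parse(new ByteArrayInputStream(baos.toByteArray()),
                        new DefaultHandler() {
                            @Override
                            public void characters(char[] ch, int start, int length) {
                                collected.append(ch, start, length);
                            }
                        });
                out.add(collected.toString());
            }
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Whether this extra encode/copy/parse cycle matters in practice depends on how expensive the consumer is, which only measurement can show.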
Thanks for taking the time to read, Zach
Okay, I found another approach, which is about 30% faster than the one at the end of my last email:
    QueryProcessor proc = new QueryProcessor(query, context);
    Iter iter = proc.iter();
    proc.close();

    if (iter != null) {
        ArrayList<Element1> elements = new ArrayList<>();
        SAXSerializer ser = null;
        SAXSource source = new SAXSource(ser, null);
        for (Item item; (item = iter.next()) != null;) {
            ser = new SAXSerializer(item);
            source.setXMLReader(ser);
            elements.add(((Element1) um.unmarshal(source)));
        }
        // do something with the elements
    }
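The general idea behind the SAXSource variant can be shown with JDK classes alone. The sketch below is an illustration under stated assumptions, not the BaseX SAXSerializer or the JAXB unmarshaller: ItemReader is a hypothetical XMLReader that emits SAX events straight from an in-memory item, and TextCollector is a hypothetical consumer that plays the role of the unmarshaller. No bytes are written or parsed; the producer calls the consumer's callbacks directly.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.transform.sax.SAXSource;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical sketch: JDK classes only, no BaseX or JAXB involved.
class DirectSaxSketch {

    /** Hypothetical reader that turns one in-memory item into SAX events. */
    static final class ItemReader extends XMLFilterImpl {
        private final String name;
        private final String text;

        ItemReader(String name, String text) {
            this.name = name;
            this.text = text;
        }

        @Override
        public void parse(InputSource ignored) throws SAXException {
            // XMLFilterImpl forwards these calls to the registered ContentHandler.
            startDocument();
            startElement("", name, name, new AttributesImpl());
            characters(text.toCharArray(), 0, text.length());
            endElement("", name, name);
            endDocument();
        }
    }

    /** Hypothetical consumer standing in for the unmarshaller. */
    static final class TextCollector extends DefaultHandler {
        final StringBuilder text = new StringBuilder();

        @Override
        public void characters(char[] ch, int start, int length) {
            text.append(ch, start, length);
        }
    }

    /** Pulls the reader out of the source and lets it drive the handler. */
    static String consume(SAXSource source) throws Exception {
        TextCollector handler = new TextCollector();
        XMLReader reader = source.getXMLReader();
        reader.setContentHandler(handler);
        reader.parse(source.getInputSource());
        return handler.text.toString();
    }

    static List<String> run() {
        String[][] items = {{"Element1", "first"}, {"Element1", "second"}};
        List<String> out = new ArrayList<>();
        try {
            for (String[] item : items) {
                SAXSource source =
                        new SAXSource(new ItemReader(item[0], item[1]), new InputSource());
                out.add(consume(source));
            }
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Because the events never pass through a byte buffer, the per-item encoding, copying, and re-parsing steps disappear, which is one plausible explanation for the speed-up reported above.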
Can it be done even better than this?
Thanks, Zach
On Fri, Jun 6, 2014 at 1:38 PM, Zachary DeLuca zadeluca@gmail.com wrote: [...]
Hi Zach,
you may be disappointed, but I would also have proposed the use of the SAXSerializer, and nothing else. To get even better performance, I would suggest doing some profiling (e.g. using -Xrunhprof:cpu=samples on the command line) to see which component is the current bottleneck. I invite you to send the results to the list.
Thanks, Christian
On Sat, Jun 7, 2014 at 5:41 AM, Zachary DeLuca zadeluca@gmail.com wrote: [...]
Thanks for the advice, I have done what you said and the results are attached (I removed a few things to protect the innocent, but they were all from the very bottom of the list). This is from a single query that ends up returning >22k documents, almost the entire current database.
It looks like the vast majority (>80%) of the time is spent on String interning called by JAXB components. I don't really know much about that, so I will have to research whether it can be improved or avoided (or whether that even makes sense).
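For context, here is a minimal sketch of what string interning does in Java. The class name InternSketch is hypothetical, and the example says nothing about where or why JAXB calls intern(); it only shows the language-level behaviour. Interning maps equal strings to one shared instance, so later comparisons can use identity, at the price of a lookup in the JVM's string table on every call. SAX also defines a feature, http://xml.org/sax/features/string-interning, through which a reader can report that its names are already interned; whether that feature plays any role in the profile above is not established in this thread.

```java
// Hypothetical sketch of String.intern() semantics; unrelated to JAXB internals.
class InternSketch {

    /** Returns {distinct before interning, same after interning, same as literal}. */
    static boolean[] compare() {
        // Two separate heap objects with equal content.
        String a = new String("Element1");
        String b = new String("Element1");

        boolean distinctBefore = a != b;
        // intern() returns the canonical instance from the JVM's string table.
        boolean sameAfter = a.intern() == b.intern();
        // String literals are interned automatically.
        boolean sameAsLiteral = a.intern() == "Element1";

        return new boolean[] {distinctBefore, sameAfter, sameAsLiteral};
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(compare()));
    }
}
```

All three values are true: the two objects differ before interning, and afterwards they resolve to the same instance as the literal.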
Thanks, Zach
On Sat, Jun 7, 2014 at 4:17 PM, Christian Grün christian.gruen@gmail.com wrote: [...]
Hi Zach,
thanks for the attached profiling result. Good for us; this confirms my assumption that most of the time is spent in JAXB. If you find out how to further speed up the conversion, you are welcome to report back to the list.
Thanks, Christian
On Sun, Jun 8, 2014 at 4:32 AM, Zachary DeLuca zadeluca@gmail.com wrote: [...]
basex-talk@mailman.uni-konstanz.de