Thanks for the advice. I have done what you said, and the results are attached (I removed a few entries to protect the innocent, but they were all from the very bottom of the list). This is from a single query that returns >22k documents, almost the entire current database.

It looks like the vast majority (>80%) of the time is spent on String interning called by JAXB components. I don't know much about that, so I will have to research whether it can be improved or avoided (or whether that even makes sense).
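
For anyone following along, here is a minimal sketch of what String interning means in plain Java. It does not touch JAXB at all: `InternDemo` and `sameObject` are hypothetical names made up for this illustration. My assumption, which I have not checked against the JAXB sources, is that the unmarshaller interns element and attribute names so that it can compare them by identity.

```java
// Hypothetical demo class, not part of JAXB or BaseX.
final class InternDemo {
    // True only when both references point to the very same String object.
    static boolean sameObject(String a, String b) {
        return a == b;
    }

    public static void main(String[] args) {
        String literal = "Element1";            // string literals are interned by the JVM
        String parsed = new String("Element1"); // same text, but a freshly allocated object

        // Equal text, different objects.
        System.out.println(sameObject(literal, parsed));
        // intern() returns the canonical copy, so identity comparison now succeeds.
        System.out.println(sameObject(literal, parsed.intern()));
    }
}
```

Each `intern()` call is a lookup in the JVM-wide string table, so doing it for every name in >22k documents could plausibly add up the way the profile suggests.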

Thanks,
Zach


On Sat, Jun 7, 2014 at 4:17 PM, Christian Grün <christian.gruen@gmail.com> wrote:
Hi Zach,

you may be disappointed, but I would also have proposed the use of the
SAXSerializer, and nothing else. To get even better performance, I
would suggest doing some profiling (e.g. using -Xrunhprof:cpu=samples
on the command line) to see which component is the current bottleneck.
I invite you to send the results to the list.
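
As a rough illustration of why the SAXSerializer route avoids the intermediate byte array, here is a JDK-only sketch. It is not BaseX or JAXB code: `OneElementReader` is a hypothetical stand-in for the SAXSerializer (it simply fires SAX events for one element), and an identity Transformer stands in for the JAXB Unmarshaller as the consumer of the SAXSource.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical stand-in for SAXSerializer: an XMLReader that produces SAX
// events from in-memory data instead of parsing bytes.
final class OneElementReader extends XMLFilterImpl {
    private final String text;

    OneElementReader(String text) {
        this.text = text;
    }

    @Override
    public void parse(InputSource ignored) throws SAXException {
        // The inherited event methods forward to whatever ContentHandler
        // the consumer registered via setContentHandler().
        startDocument();
        startElement("", "Element1", "Element1", new AttributesImpl());
        characters(text.toCharArray(), 0, text.length());
        endElement("", "Element1", "Element1");
        endDocument();
    }
}

final class SaxPipeDemo {
    // Feeds the reader's events straight into a consumer, with no
    // ByteArrayOutputStream/ByteArrayInputStream pair in between.
    static String render(String text) {
        try {
            Transformer identity = TransformerFactory.newInstance().newTransformer();
            identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            SAXSource source = new SAXSource(new OneElementReader(text), new InputSource());
            identity.transform(source, new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(render("hello"));
    }
}
```

The consumer only ever sees callbacks such as startElement and characters, which is the same shape as handing a SAXSource to um.unmarshal(source) in your code.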

Thanks,
Christian


On Sat, Jun 7, 2014 at 5:41 AM, Zachary DeLuca <zadeluca@gmail.com> wrote:
> Okay, I found another approach, which is about 30% faster than what was
> written at the end of my last email:
>
> QueryProcessor proc = new QueryProcessor(query, context);
> Iter iter = proc.iter();
> proc.close();
>
> if (iter != null) {
>     ArrayList<Element1> elements = new ArrayList<>();
>     SAXSerializer ser = null;
>     SAXSource source = new SAXSource(ser, null);
>     for (Item item; (item = iter.next()) != null;) {
>         ser = new SAXSerializer(item);
>         source.setXMLReader(ser);
>         elements.add((Element1) um.unmarshal(source));
>     }
>     // do something with the elements
> }
>
>
> Can it be done even better than this?
>
> Thanks,
> Zach
>
>
> On Fri, Jun 6, 2014 at 1:38 PM, Zachary DeLuca <zadeluca@gmail.com> wrote:
>>
>> Hello, I have a question about unmarshaling a query result into JAXB
>> object(s). My database collection contains documents that all contain the
>> same type of root element, let's call it Element1.
>>
>> Originally I was doing something like this (I know it's silly but it
>> worked just fine for small data sets):
>>
>>
>> String CLOSING_TAG = "</Element1>";
>> Context context = ...
>> Unmarshaller um = ...
>>
>> String predicate1 = ...
>> String predicate2 = ...
>> String query = "//Element1[" + predicate1 + " and " + predicate2 + "]";
>>
>> String result = new XQuery(query).execute(context);
>> if (result != null && !result.isEmpty()) {
>>     ArrayList<Element1> elements = new ArrayList<>();
>>     int index = -1;
>>     int beginIndex = 0;
>>     while ((index = result.indexOf(CLOSING_TAG, beginIndex)) != -1) {
>>         int endIndex = index + CLOSING_TAG.length();
>>         String element1 = result.substring(beginIndex, endIndex);
>>         beginIndex = endIndex + 1;
>>         elements.add((Element1) um.unmarshal(
>>                 new ByteArrayInputStream(element1.getBytes())));
>>     }
>>     // do something with the elements
>> }
>>
>>
>> But when I got into larger data sets (my DB collection is currently
>> approx. 1GB total size and has just over 20k documents), this started to
>> fail, apparently because the entire result is converted to a single string
>> at once (out of memory error, despite setting -Xmx16384m). So I dug through
>> the examples to find a better way and changed my code to this:
>>
>>
>> ByteArrayOutputStream baos = new ByteArrayOutputStream();
>> Iter iter = null;
>> Serializer ser = null;
>> QueryProcessor proc = new QueryProcessor(query, context);
>> iter = proc.iter();
>> ser = proc.getSerializer(baos);
>> proc.close();
>>
>> if (iter != null && ser != null) {
>>     ArrayList<Element1> elements = new ArrayList<>();
>>     for (Item item; (item = iter.next()) != null;) {
>>         baos.reset();
>>         ser.serialize(item);
>>         elements.add((Element1) um.unmarshal(
>>                 new ByteArrayInputStream(baos.toByteArray())));
>>     }
>>     ser.close();
>>     // do something with the elements
>> }
>>
>>
>> This appears to work fine, but I am wondering whether there is a
>> better/faster way to do it. At first glance, serializing to a
>> ByteArrayOutputStream only to then turn around and use a ByteArrayInputStream
>> to unmarshal with JAXB seems wasteful.
>>
>>
>> Thanks for taking the time to read,
>> Zach
>
>