Can you point me to the server source where the socket read is being done to take the xml off the socket, please?
Sure. Here, the input stream is requested (and wrapped into a buffered input stream):
https://github.com/BaseXdb/basex/blob/master/basex-core/src/main/java/org/ba...
Hope this helps, Christian
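For readers following along, here is a minimal sketch of what "requesting the input stream and wrapping it into a buffered input stream" looks like in plain Java. It is not the BaseX server code behind the link above; the class and method names (BufferedSocketReadSketch, readAll, roundTrip) are made up for illustration, and the loopback connection only stands in for a real client and server.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not taken from the BaseX sources.
class BufferedSocketReadSketch {

  /** Reads everything the peer sends, one byte at a time, through a buffer. */
  static byte[] readAll(Socket connection) throws IOException {
    // The buffer turns many single-byte read() calls into few socket reads.
    InputStream in = new BufferedInputStream(connection.getInputStream());
    ByteArrayOutputStream received = new ByteArrayOutputStream();
    for (int b; (b = in.read()) != -1;) received.write(b);
    return received.toByteArray();
  }

  /** Sends the payload over a loopback connection and returns what the server side read. */
  static byte[] roundTrip(byte[] payload) throws IOException, InterruptedException {
    try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
      // Client side: connect, write the payload, then close to signal end of stream.
      Thread client = new Thread(() -> {
        try (Socket socket = new Socket(server.getInetAddress(), server.getLocalPort());
             OutputStream out = socket.getOutputStream()) {
          out.write(payload);
        } catch (IOException e) {
          throw new UncheckedIOException(e);
        }
      });
      client.start();
      // Server side: accept the connection and read through the buffered stream.
      try (Socket connection = server.accept()) {
        byte[] received = readAll(connection);
        client.join();
        return received;
      }
    }
  }

  public static void main(String[] args) throws Exception {
    byte[] xml = "<xml>hello</xml>".getBytes(StandardCharsets.UTF_8);
    byte[] received = roundTrip(xml);
    System.out.println(new String(received, StandardCharsets.UTF_8));
  }
}
```

Without the BufferedInputStream wrapper, each single-byte read() would go to the socket directly, which is usually much slower for documents of several megabytes.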
-----Original Message----- From: Christian Grün [mailto:christian.gruen@gmail.com] Sent: 16 March 2015 11:27 To: Jonathan Clarke Cc: Lizzi, Vincent; BaseX Subject: Re: [basex-talk] Large Document Upload Performance
Hi Jonathan,
fBaseXClient.replace(fPathName, fXMLSource.getBytes());
It should probably look as follows?
fBaseXClient.replace(fPathName, fInputStream);
The following code snippet may be a bit faster...
import org.basex.io.in.ArrayInput;
...
String xml = "<xml>...</xml>";
fBaseXClient.replace(fPathName, new ArrayInput(xml));
However, I assume that the bottleneck is not really BaseX, but rather the environment in which it is used.
Hope this helps, Christian
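As a rough sketch of the point about passing an input stream to replace() rather than a byte array: the code below assumes a hypothetical DocumentSink interface standing in for the client's replace(path, input) call, and a CountingSink test double in place of a real server, so it runs without BaseX. Only the shape of the call is taken from the message above; everything else is illustrative.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration only; DocumentSink is not a BaseX type.
class StreamingUploadSketch {

  /** Stand-in for a client offering replace(path, input). */
  interface DocumentSink {
    void replace(String path, InputStream input) throws IOException;
  }

  /** Test double that only counts the bytes it is handed. */
  static final class CountingSink implements DocumentSink {
    long bytes;
    String lastPath;

    @Override
    public void replace(String path, InputStream input) throws IOException {
      lastPath = path;
      byte[] chunk = new byte[8192];
      for (int n; (n = input.read(chunk)) != -1;) bytes += n;
    }
  }

  /** Streams a file to the sink without loading it fully into memory. */
  static void uploadFile(DocumentSink sink, String path, Path file) throws IOException {
    try (InputStream in = new BufferedInputStream(Files.newInputStream(file))) {
      sink.replace(path, in);
    }
  }

  /** Wraps an in-memory string as a stream, with the encoding made explicit. */
  static void uploadString(DocumentSink sink, String path, String xml) throws IOException {
    sink.replace(path, new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
  }

  public static void main(String[] args) throws IOException {
    Path tmp = Files.createTempFile("doc", ".xml");
    Files.write(tmp, "<xml>hello</xml>".getBytes(StandardCharsets.UTF_8));
    CountingSink sink = new CountingSink();
    uploadFile(sink, "docs/doc.xml", tmp);
    System.out.println(sink.bytes + " bytes streamed to " + sink.lastPath);
    Files.delete(tmp);
  }
}
```

Calling getBytes() with an explicit charset avoids depending on the platform default encoding, which matters when the XML declaration says UTF-8.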
On Mon, Mar 16, 2015 at 11:55 AM, Jonathan Clarke jonathan.m.clarke@dsl.pipex.com wrote:
Hi Vincent,
Many thanks for this. As you may see, I've just posted a response to Christian, with source that's pretty similar to yours already, aside from the libraries themselves. My findings suggest that it's a socket buffer problem, but I'll wait to hear what Christian says before replacing my implementation with your suggestions below.
Jonathan.
-----Original Message----- From: Lizzi, Vincent [mailto:Vincent.Lizzi@taylorandfrancis.com] Sent: 13 March 2015 21:30 To: Jonathan Clarke; 'Christian Grün' Cc: 'BaseX' Subject: RE: [basex-talk] Large Document Upload Performance
Hi Jonathan,
A few months ago I needed to import XML documents that were over 50 MB into BaseX. After a few attempts to speed up the process, I found that using Saxon's s9api and Xerces2 as shown below performed the best. The bottleneck appeared to be not in BaseX itself but in sending the data to BaseX efficiently. Here is the Java code.
protected void loadXmlDocument(BaseXClient client, File xmlFile) throws Exception {
  // sxProcessor (a Saxon Processor) and path (the target database path)
  // are assumed to be fields defined elsewhere in the class.
  DocumentBuilder docBuilder = sxProcessor.newDocumentBuilder();
  SAXSource source = prepareSaxSource(xmlFile);
  XdmNode doc = docBuilder.build(source);
  try (ByteArrayOutputStream baos = new ByteArrayOutputStream()) {
    // Serialize the parsed document to UTF-8 bytes in memory.
    Serializer ser = new Serializer(baos);
    ser.setOutputProperty(Serializer.Property.ENCODING, "UTF-8");
    ser.serializeNode(doc);
    // Hand the serialized bytes to BaseX as an input stream.
    try (InputStream is = new ByteArrayInputStream(baos.toByteArray())) {
      client.replace(path, is);
    }
  }
}

protected SAXSource prepareSaxSource(File xmlFile)
    throws ParserConfigurationException, SAXException, MalformedURLException {
  // Namespace-aware, XInclude-aware, non-validating SAX parser.
  SAXParserFactory saxFactory = SAXParserFactory.newInstance();
  saxFactory.setNamespaceAware(true);
  saxFactory.setXIncludeAware(true);
  saxFactory.setValidating(false);
  SAXParser saxParser = saxFactory.newSAXParser();
  XMLReader reader = saxParser.getXMLReader();

  // catalogManager is likewise assumed to be a field defined elsewhere.
  CatalogResolver resolver = new CatalogResolver(catalogManager);
  reader.setEntityResolver(resolver);
  SAXSource source = new SAXSource();
  source.setInputSource(new InputSource(xmlFile.toURI().toURL().toExternalForm()));
  source.setXMLReader(reader);
  return source;
}
I tried to make the above code self-contained by cobbling together relevant parts of the code, so this is untested but carries the idea.
I hope this helps.
Vincent
-----Original Message----- From: basex-talk-bounces@mailman.uni-konstanz.de [mailto:basex-talk-bounces@mailman.uni-konstanz.de] On Behalf Of Jonathan Clarke Sent: Friday, March 13, 2015 3:50 PM To: 'Christian Grün' Cc: 'BaseX' Subject: Re: [basex-talk] Large Document Upload Performance
Hi Christian,
I wouldn't be able to provide you with the data itself, but I'm not using a query; I'm simply using the BaseXClient that's provided on your site. It's just a connection opened to the server, followed by a call to the replace function. What's the typical time you would expect to see for a file of that size? Some research online has suggested that the delay is caused by the document indexing that gets underway at the point of update. In the meantime, I'll try to construct a nondescript file of similar size that we can use. Are there any other performance-enhancing settings that you've recommended to others for similar reports? As with the flushing, am I able to postpone or turn off the document indexing until I'm ready to call the function explicitly?
Jonathan.
-----Original Message----- From: Christian Grün [mailto:christian.gruen@gmail.com] Sent: 13 March 2015 19:12 To: Jonathan Clarke Cc: BaseX Subject: Re: [basex-talk] Large Document Upload Performance
Hi Jonathan,
I hope you can help me. I am using BaseX to underpin a complex distributed system, which also requires storage of XML documents in soft real time. At the moment, I’m getting storage times of about 500 ms for a 4 MB XML file. Can you advise how I might be able to bring that down by at least 75%, please?
We'll probably need more information on your queries etc.
I also tried to use AddCache, and that just crashed the latest production release of the server.
If you find out how we can reproduce this, your feedback is welcome.
Best, Christian