Hi,
I have no idea if it is used by others, but last March the most recent version of my RbaseX library was accepted by CRAN. To my knowledge there are no errors (all tests pass). The only problem is that performance is bad ;-(. Uploading a file or downloading the result of a query can take several minutes. I can understand why it takes so long. According to the server protocol, the end of a stream is indicated by a terminating 0-byte. To distinguish a 'regular' 0-byte in a binary stream from the terminating 0-byte, 0-bytes (and FF-bytes) are preceded by an extra FF-byte. The only way I found to deal with these FF-bytes in R was to process each byte separately, and that takes a lot of time.
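The escaping rule described above can be sketched in C++ as a single pass over a buffer that has already been read from the connection. This is only an illustration under that assumption: `Decoded` and `decode_escaped` are hypothetical names, not part of RbaseX or of any existing BaseX client.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Result of decoding one 0-terminated, FF-escaped item from a buffer.
struct Decoded {
    std::vector<std::uint8_t> payload; // bytes with the escape markers removed
    std::size_t consumed;              // input bytes used, including the terminator
    bool complete;                     // false if no terminating 0-byte was found
};

// Hypothetical helper: an FF-byte means "take the next byte literally",
// an unescaped 0-byte ends the item, every other byte is copied as-is.
Decoded decode_escaped(const std::vector<std::uint8_t>& data) {
    Decoded out{{}, 0, false};
    out.payload.reserve(data.size());
    std::size_t i = 0;
    while (i < data.size()) {
        const std::uint8_t b = data[i];
        if (b == 0xFF) {
            if (i + 1 >= data.size()) break;    // escape marker at end of buffer: more input needed
            out.payload.push_back(data[i + 1]); // escaped 0x00 or 0xFF
            i += 2;
        } else if (b == 0x00) {
            out.consumed = i + 1;               // terminator reached
            out.complete = true;
            return out;
        } else {
            out.payload.push_back(b);
            ++i;
        }
    }
    out.consumed = i;                           // incomplete item: caller must read more
    return out;
}
```

The loop still looks at every byte, but it does so in compiled code on one contiguous buffer, which is the part that is slow when done byte by byte in R.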
I am trying to speed everything up by using C++ for all direct read/write operations. But I have never worked with C++ before, nor do I understand exactly how streams are supposed to be used. According to some posts on the internet, when reading from a stream the first 8 bytes are used to pass information on the length of the stream.
My question is whether this is a standard way to pass information on that length. Or is it specific to C++ or Java?
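For comparison, a length prefix is a framing convention that an individual protocol or library chooses; C++ streams themselves do not add one. The sketch below assumes a made-up format with an 8-byte big-endian length followed by the payload. `frame_with_length` and `read_frame` are hypothetical names, and this is not the framing described above for the BaseX server, which uses a terminating 0-byte instead.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical framing: 8-byte big-endian payload length, then the payload.
std::vector<std::uint8_t> frame_with_length(const std::vector<std::uint8_t>& payload) {
    std::vector<std::uint8_t> out;
    out.reserve(payload.size() + 8);
    const std::uint64_t n = payload.size();
    for (int shift = 56; shift >= 0; shift -= 8) {
        out.push_back(static_cast<std::uint8_t>((n >> shift) & 0xFF));
    }
    out.insert(out.end(), payload.begin(), payload.end());
    return out;
}

// Returns true and fills `payload` if `data` holds one complete frame.
bool read_frame(const std::vector<std::uint8_t>& data,
                std::vector<std::uint8_t>& payload) {
    if (data.size() < 8) return false;          // length prefix not complete yet
    std::uint64_t n = 0;
    for (std::size_t i = 0; i < 8; ++i) {
        n = (n << 8) | data[i];
    }
    if (data.size() - 8 < n) return false;      // payload not complete yet
    const auto first = data.begin() + 8;
    payload.assign(first, first + static_cast<std::ptrdiff_t>(n));
    return true;
}
```

With a length prefix the reader knows in advance how many bytes to expect and needs no escaping; with a terminator the sender can start writing before the total size is known.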
Ben
Hi Ben,
The BaseX server protocol was specified without focus on any particular programming language.
If there is no way to speed up stream processing with R, you could have a look at the existing C++ client implementation [1]. Maybe you’ve done so already?
Cheers, Christian
[1] https://docs.basex.org/wiki/Clients
Hi Christian,
R provides a package which makes it rather easy to use C++ code. That is why I focused on C++. I first tried to understand the BaseXCPPAPI as provided by Jean-Marc Mercier, but for a complete novice in C++, that code was way too complicated for an old man like me (I'm retiring TODAY ;-)). The C code from Alexander Holupirek is much easier to understand, and for the moment I'm trying to convert his code to a C++ variant that can be used both by my RbaseX and by a new C++ client.
Usually, I first experiment in the GUI to learn which statements I have to use for a query. After that I use the same statements in my client. I noticed that execution in the GUI often took only milliseconds, while execution in the client could take minutes (depending on the size of the input or the results). My guess is that this boils down to the read/write operations on the connection.
In R, I have isolated all actions on the stream in one R class. My first goal is to create a C++ class that is functionally equivalent. Hopefully that will improve performance. If I manage that, I am halfway to building a C++ client that offers the same functionality as my RbaseX client. Who knows if I'll succeed in that ;-) .
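The write side of such a stream class could look roughly like the sketch below. It assumes the escaping rule from the first message (0-bytes and FF-bytes get an FF-byte in front, and a 0-byte ends the item); `encode_escaped` is a hypothetical name, not taken from any existing client.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: escapes a binary payload and appends the terminator,
// so the whole message can be handed to the connection in one write.
std::vector<std::uint8_t> encode_escaped(const std::vector<std::uint8_t>& payload) {
    std::vector<std::uint8_t> out;
    out.reserve(payload.size() + 1);     // lower bound; grows if bytes need escaping
    for (const std::uint8_t b : payload) {
        if (b == 0x00 || b == 0xFF) {
            out.push_back(0xFF);         // escape marker before a 0-byte or FF-byte
        }
        out.push_back(b);
    }
    out.push_back(0x00);                 // terminating 0-byte
    return out;
}
```

Building the escaped message in a single buffer first, and only then writing it, avoids one write call per byte, which is likely where much of the time goes.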
Cheers, Ben
(In reply to Christian Grün's message of 30-06-2020, 11:56.)