Hi,
A deadlock occurs in the following situation: a first client program opens an iterative query. For each iteration, this program does some processing and sends another read request to BaseX (using another BaseX session). Everything works fine until a second client program (or another thread) sends an updating command to BaseX (such as optimize). This locks the BaseX server. To unlock it, you have to kill the first program.
I have read the BaseX server code and found the reason for this behavior in the class org.basex.core.Lock:
- With the iterative query, there is always at least one reader alive (readers=1).
- When the updating query is received, it is put in the queue (index 0) and remains there as long as a reading query is running (that is to say, as long as the iterative reading query is running).
- Then a second reading request is received and put in the queue (index 1, as the updating query is already in the queue). As it is only the second item of the queue, it stays there until the first item (the updating query) has been processed (BaseX processes requests in order of arrival, as a FIFO queue). But this first item cannot be processed because the iterative reading query is still running, and that query cannot finish because its client is waiting for the second reading request. All queries are thus locked.
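The sequence above can be replayed with a small single-threaded model. This is only a sketch of the admission rule as it is described in this message, not the BaseX source: the class FifoLockModel, its method names and the reader limit of 8 are assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Single-threaded model of a strict FIFO readers/writer admission rule (illustrative only). */
final class FifoLockModel {
  /** Pending requests in arrival order; true marks an updating request. */
  final Deque<Boolean> queue = new ArrayDeque<>();
  /** Number of readers currently running. */
  int readers;
  /** True while a writer is running. */
  boolean writer;
  /** Assumed limit for parallel readers. */
  final int parallel = 8;

  /** Adds a request to the end of the queue. */
  void arrive(final boolean write) {
    queue.add(write);
  }

  /** Admits the head of the queue if the rule allows it; returns true on success. */
  boolean admitHead() {
    if(queue.isEmpty() || writer) return false;
    final boolean write = queue.peek();
    if(write) {
      if(readers != 0) return false;
      writer = true;
    } else {
      if(readers >= parallel) return false;
      ++readers;
    }
    queue.remove();
    return true;
  }

  /** Replays the reported scenario; returns true if no queued request can be admitted. */
  static boolean replay() {
    final FifoLockModel m = new FifoLockModel();
    m.arrive(false);  // iterative query arrives
    m.admitHead();    // and starts running: readers == 1
    m.arrive(true);   // updating command, queued at index 0
    m.arrive(false);  // nested read of the first client, queued at index 1
    return !m.admitHead();
  }

  public static void main(final String[] args) {
    System.out.println("stuck: " + replay());
  }
}
```

Running main prints "stuck: true": the update at the head needs readers == 0, and the only request that would let the running reader finish is queued behind it.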
Some may say that we should not send another query while inside the loop of an iterative query. But in our context, with many sites being developed by several developers, it is possible that a developer codes this, and we do not want BaseX to be locked in this case (whether or not it is a mistake by the developer).
I have found a solution to this problem by modifying the org.basex.core.Lock class. You will find my code below. I no longer use a queue, and I use a static mutex (called queueMutex) to synchronize all pending queries (threads). The "drawback" of this solution is that queries are no longer processed in order of arrival: which waiting query runs next is effectively random.
What do you think of this solution? Do you plan to update the BaseX locking mechanism?
I'm using BaseX 6.7.1, but I have seen that Lock.java has not changed in BaseX 6.7.2.
Here is my code:
package org.basex.core;

import java.util.Date;
//import java.util.LinkedList;
import java.util.Random;

import org.basex.util.Util;

/**
 * Management of executing read/write processes.
 * Supports multiple readers, limited by {@link MainProp#PARALLEL},
 * and single writers (readers/writer lock).
 *
 * @author BaseX Team 2005-11, BSD License
 * @author Christian Gruen
 */
final class Lock {
  /** Queue for all waiting processes. */
  // private final LinkedList<Object> queue = new LinkedList<Object>();
  /** Mutex object. */
  private final Object mutex = new Object();
  /** Database context. */
  private final Context ctx;
  /** Static mutex used to synchronize all pending queries. **/
  private final static Object queueMutex = new Object();

  /** Number of active readers. */
  private int readers;
  /** Writer flag. */
  private boolean writer;

  /**
   * Default constructor.
   * @param c context
   */
  Lock(final Context c) {
    ctx = c;
  }

  /**
   * Modifications before executing a command.
   * @param w writing flag
   */
  void lock(final boolean w) {
    synchronized(mutex) {
      int code = new Random(new Date().getTime()).nextInt();
      // final Object o = new Object();
      // queue.add(o);

      try {
        while(true) {
          synchronized(queueMutex) {
            // if(o == queue.get(0) && !writer) {
            if(!writer) {
              if(w) {
                if(readers == 0) {
                  writer = true;
                  break;
                }
              } else if(readers < Math.max(ctx.mprop.num(MainProp.PARALLEL), 1)) {
                ++readers;
                break;
              }
            }
          }
          mutex.wait();
        }
      } catch(final InterruptedException ex) {
        Util.stack(ex);
      }

      // queue.remove(0);
    }
  }

  /**
   * Modifications after executing a command.
   * @param w writing flag
   */
  synchronized void unlock(final boolean w) {
    synchronized(mutex) {
      if(w) {
        writer = false;
      } else {
        --readers;
      }
      mutex.notifyAll();
    }
  }
}
Dear Laurent,
thanks for the elaborate description of the problem and the proposed bug fix. Yes, I agree that this issue should not be ignored. I've just added your observation to our bug tracker:
https://github.com/BaseXdb/basex/issues/173
I'm not sure, however, how you're planning to utilize the randomly generated number, as the "code" variable isn't used in the subsequent lines of code. Did I miss something? Next, a random selection of incoming requests could mix up subsequent write operations, which would cause unexpected results in other use cases of BaseX (but maybe I missed your point here, regarding the notion of random execution).
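Why the order of writes matters can be shown with a toy example. It is a sketch under stated assumptions (Java 8 or later): the two updates, the document strings and the class WriteOrderDemo are hypothetical and have nothing to do with the BaseX API. They only show that two dependent updates give different results when the lock admits them in a different order than they were sent.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;

/** Shows that two dependent updates give different results when their order changes. */
final class WriteOrderDemo {
  /** Hypothetical update that empties the document. */
  static final UnaryOperator<String> CLEAR = doc -> "";
  /** Hypothetical update that appends one element. */
  static final UnaryOperator<String> INSERT = doc -> doc + "<item/>";

  /** Applies the updates to the initial document in list order. */
  static String apply(final String doc, final List<UnaryOperator<String>> updates) {
    String state = doc;
    for(final UnaryOperator<String> update : updates) state = update.apply(state);
    return state;
  }

  /** Result when the updates run in the order in which they were sent. */
  static String inArrivalOrder() {
    return apply("<old/>", Arrays.asList(CLEAR, INSERT));
  }

  /** Result when the second update happens to be admitted first. */
  static String swapped() {
    return apply("<old/>", Arrays.asList(INSERT, CLEAR));
  }

  public static void main(final String[] args) {
    System.out.println("arrival order: [" + inArrivalOrder() + "]");
    System.out.println("swapped order: [" + swapped() + "]");
  }
}
```

In arrival order the document ends up as "<item/>"; with the two updates swapped it ends up empty.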
I'll give some more thought to potential alternatives (which is always better than just questioning others' proposals). If you have more ideas on how to resolve the issue, you are welcome to tell us.
Christian ____________________
On Mon, Aug 29, 2011 at 9:50 AM, Laurent Chevalier l.chevalier@cyim.com wrote:
[...]
Hi Christian,
Thanks for your quick answer. Sorry, I forgot to remove the "code" variable, which I used for logging.
Regarding the random execution, I understand now that it could be a problem. I will try to find another solution that keeps the order but fixes the deadlock.
Laurent
-----Original Message-----
From: Christian Grün [mailto:christian.gruen@gmail.com]
Sent: Tuesday, 30 August 2011 00:24
To: Laurent Chevalier
Cc: basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] BaseX server deadlock
[...]
I have found a new solution that preserves the order of arrival of the requests and avoids the deadlock. Below you will find the Lock class and the new LockedCommand class used for the queue. Instead of taking only the first item of the queue, I take the first item that can be processed.
Regards, Laurent
package org.basex.core;

import java.util.LinkedList;
import org.basex.util.Util;

/**
 * Management of executing read/write processes.
 * Supports multiple readers, limited by {@link MainProp#PARALLEL},
 * and single writers (readers/writer lock).
 *
 * @author BaseX Team 2005-11, BSD License
 * @author Christian Gruen
 */
final class Lock {
  /** Queue for all waiting processes. */
  private final LinkedList<LockedCommand> queue = new LinkedList<LockedCommand>();
  /** Mutex object. */
  private final Object mutex = new Object();
  /** Database context. */
  private final Context ctx;
  /** Mutex used to synchronize pending commands (waiting processes). */
  private final Object queueMutex = new Object();

  /** Number of active readers. */
  private int readers;
  /** Writer flag. */
  private boolean writer;

  /**
   * Default constructor.
   * @param c context
   */
  Lock(final Context c) {
    ctx = c;
  }

  /**
   * Next pending command from the queue that can be processed.
   * @return Command or null if they are all locked.
   */
  LockedCommand next() {
    synchronized(queueMutex) {
      if ( writer ) return null; // Currently writing. No other command allowed.
      for(int i=0; i<queue.size(); i++) {
        LockedCommand cmd = queue.get(i);
        if(cmd.writer) {
          // Writing can be done only if no reading command is running.
          if(readers == 0) {
            writer = true;
            return cmd;
          }
        } else if(readers < Math.max(ctx.mprop.num(MainProp.PARALLEL), 1)) {
          // Additional readers are still allowed.
          ++readers;
          return cmd;
        }
      }
      return null;
    }
  }

  /**
   * Modifications before executing a command.
   * @param w writing flag
   */
  void lock(final boolean w) {
    synchronized(mutex) {
      final LockedCommand cmd = new LockedCommand(w);
      queue.add(cmd);

      try {
        while(true) {
          synchronized(queueMutex) {
            LockedCommand nextCmd = next();
            if ( nextCmd!=null && cmd.equals(nextCmd) ) break;
          }
          mutex.wait();
        }
      } catch(final InterruptedException ex) {
        Util.stack(ex);
      }

      queue.remove(cmd);
    }
  }

  /**
   * Modifications after executing a command.
   * @param w writing flag
   */
  synchronized void unlock(final boolean w) {
    synchronized(mutex) {
      if(w) {
        writer = false;
      } else {
        --readers;
      }
      mutex.notifyAll();
    }
  }
}

package org.basex.core;

/**
 * Locked command stored in Lock class queue.
 * @author Laurent Chevalier
 */
final class LockedCommand {
  /** Writer flag. */
  public boolean writer;

  /**
   * Default constructor.
   * @param w Writer flag. Tells whether it is an updating command (true) or not (false).
   */
  LockedCommand(final boolean w) {
    this.writer = w;
  }
}
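The selection rule described in this message ("take the first item that can be processed") can be replayed on the scenario from the first message with a single-threaded model. Again this is only a sketch: ScanLockModel, its method names and the reader limit of 8 are assumptions made for illustration, and the model leaves out all threading.

```java
import java.util.ArrayList;
import java.util.List;

/** Single-threaded model of the "first processable item" rule (illustrative only). */
final class ScanLockModel {
  /** Pending requests in arrival order; true marks an updating request. */
  final List<Boolean> queue = new ArrayList<>();
  /** Number of readers currently running. */
  int readers;
  /** True while a writer is running. */
  boolean writer;
  /** Assumed limit for parallel readers. */
  final int parallel = 8;

  /** Returns the index of the first request that may run now, or -1; changes no state. */
  int firstEligible() {
    if(writer) return -1;
    for(int i = 0; i < queue.size(); i++) {
      if(queue.get(i)) {
        // A writer may only start when no reader is running.
        if(readers == 0) return i;
      } else if(readers < parallel) {
        return i;
      }
    }
    return -1;
  }

  /** Admits the first eligible request and returns its queue index, or -1 if none may run. */
  int admit() {
    final int i = firstEligible();
    if(i < 0) return -1;
    final boolean write = queue.remove(i);
    if(write) writer = true; else ++readers;
    return i;
  }

  /** Replays the reported scenario; returns the index admitted while the update is waiting. */
  static int replay() {
    final ScanLockModel m = new ScanLockModel();
    m.queue.add(false);  // iterative query arrives
    m.admit();           // and starts running: readers == 1
    m.queue.add(true);   // updating command, index 0
    m.queue.add(false);  // nested read, index 1
    return m.admit();
  }

  public static void main(final String[] args) {
    System.out.println("admitted index: " + replay());
  }
}
```

The model admits index 1: the nested read overtakes the waiting update, so the first client can finish its loop and the update runs afterwards. The trade-off of such a rule is that a steady stream of readers can keep overtaking a queued writer.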
Hi Laurent,
thanks again for offering another solution for the Lock class. Just an interim note: before I commit any changes, I'll have to make sure that the update doesn't cause any other side effects, as the code under discussion is very sensitive. It's a somewhat irritating fact that this Java class has undergone many iterations (considering that it's pretty compact and lightweight), but the positive effect is that the existing code is very stable.
Be sure I'll keep you updated, Christian __________________________
On Tue, Aug 30, 2011 at 11:14 AM, Laurent Chevalier l.chevalier@cyim.com wrote:
[...]
Hi Christian,
I understand that you have to test this sensitive code exhaustively. I can also understand that you may be thinking of other solutions. No problem. BTW, reading my code this morning, I found that the synchronization was not complete: I have removed the queueMutex and made better use of the existing mutex. I'm sending you the fixed code for your information.
Regards, Laurent
package org.basex.core;

import java.util.LinkedList;
import org.basex.util.Util;

/**
 * Management of executing read/write processes.
 * Supports multiple readers, limited by {@link MainProp#PARALLEL},
 * and single writers (readers/writer lock).
 *
 * @author BaseX Team 2005-11, BSD License
 * @author Christian Gruen
 */
final class Lock {
  /** Queue for all waiting processes. */
  private final LinkedList<LockedCommand> queue = new LinkedList<LockedCommand>();
  /** Mutex object. */
  private final Object mutex = new Object();
  /** Database context. */
  private final Context ctx;

  /** Number of active readers. */
  private int readers;
  /** Writer flag. */
  private boolean writer;

  /**
   * Default constructor.
   * @param c context
   */
  Lock(final Context c) {
    ctx = c;
  }

  /**
   * Next pending command from the queue that can be processed.
   * @return Command or null if they are all locked.
   */
  private LockedCommand next() {
    synchronized(mutex) {
      if(writer) return null; // Currently writing. No other command allowed.
      for(int i = 0; i < queue.size(); i++) {
        LockedCommand cmd = queue.get(i);
        if(cmd.writer) {
          // Writing can be done only if no reading command is running.
          if(readers == 0) {
            writer = true;
            return cmd;
          }
        } else if(readers < Math.max(ctx.mprop.num(MainProp.PARALLEL), 1)) {
          // Additional readers are still allowed.
          ++readers;
          return cmd;
        }
      }
      return null;
    }
  }

  /**
   * Modifications before executing a command.
   * @param w writing flag
   */
  void lock(final boolean w) {
    synchronized(mutex) {
      final LockedCommand cmd = new LockedCommand(w);
      queue.add(cmd);

      try {
        while(true) {
          LockedCommand nextCmd = next();
          if(nextCmd != null && cmd.equals(nextCmd)) break;
          mutex.wait();
        }
      } catch(final InterruptedException ex) {
        Util.stack(ex);
      }

      synchronized(mutex) {
        queue.remove(cmd);
      }
    }
  }

  /**
   * Modifications after executing a command.
   * @param w writing flag
   */
  synchronized void unlock(final boolean w) {
    synchronized(mutex) {
      if(w) {
        writer = false;
      } else {
        --readers;
      }
      mutex.notifyAll();
    }
  }
}

package org.basex.core;

/**
 * Locked command stored in Lock class queue.
 * @author Laurent Chevalier
 */
final class LockedCommand {
  /** Writer flag. */
  public boolean writer;

  /**
   * Default constructor.
   * @param w Writer flag. Tells whether it is an updating command (true) or not (false).
   */
  LockedCommand(final boolean w) {
    this.writer = w;
  }
}
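For readers who want to try the idea outside BaseX, the difference between the strict FIFO rule and the "first command that can be processed" rule can be shown with a small single-threaded model. This is only a sketch under assumptions: `LockModel`, `nextFifo` and `nextScan` are hypothetical names, the `parallel` constructor argument stands in for `MainProp.PARALLEL`, and unlike `next()` above the model only reports which queue entry could start, without changing any counters.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified, single-threaded model of the queue scan; not BaseX code. */
final class LockModel {
  /** Pending commands in arrival order; true marks a writer. */
  final List<Boolean> queue = new ArrayList<>();
  /** Maximum number of parallel readers (stands in for MainProp.PARALLEL). */
  final int parallel;
  /** Number of active readers. */
  int readers;
  /** Writer flag. */
  boolean writer;

  LockModel(final int parallel) {
    this.parallel = Math.max(parallel, 1);
  }

  /** Tells whether a command of the given kind could start right now. */
  private boolean runnable(final boolean w) {
    if(writer) return false;
    return w ? readers == 0 : readers < parallel;
  }

  /** Strict FIFO: only the head of the queue may start. Returns its index or -1. */
  int nextFifo() {
    return !queue.isEmpty() && runnable(queue.get(0)) ? 0 : -1;
  }

  /** Scan policy: the first queued command that can start. Returns its index or -1. */
  int nextScan() {
    for(int i = 0; i < queue.size(); i++) {
      if(runnable(queue.get(i))) return i;
    }
    return -1;
  }
}

final class LockModelDemo {
  public static void main(final String[] args) {
    final LockModel m = new LockModel(8);
    m.readers = 1;      // the iterative query is still open
    m.queue.add(true);  // the updating command arrives first
    m.queue.add(false); // the nested read request arrives second
    System.out.println("FIFO: " + m.nextFifo()); // prints "FIFO: -1" (nothing can start)
    System.out.println("scan: " + m.nextScan()); // prints "scan: 1" (the reader can start)
  }
}
```

With one reader active and a writer at the head of the queue, the FIFO rule finds nothing to run, which is the reported deadlock. The scan rule lets the queued reader pass the waiting writer.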
-----Original Message----- From: Christian Grün [mailto:christian.gruen@gmail.com] Sent: Wednesday, August 31, 2011 00:17 To: Laurent Chevalier Cc: basex-talk@mailman.uni-konstanz.de Subject: Re: [basex-talk] BaseX server deadlock
Hi Laurent,
thanks again for offering another solution for the Lock class. Just an interim note: Before I commit any changes, I'll have to make sure that the update doesn't cause any other side effects, as the discussed code is very sensitive. It's a somewhat irritating fact that this Java class has undergone many iterations (considering the fact that it's pretty compact and light-weight), but the positive effect of that is that the existing code is very stable.
Be sure I'll keep you updated, Christian __________________________
On Tue, Aug 30, 2011 at 11:14 AM, Laurent Chevalier l.chevalier@cyim.com wrote:
I have found a new solution that preserves the order of arrival of the requests and avoids the deadlock. You will find hereafter the Lock class and the new LockedCommand class used for the queue. Actually, instead of taking only the first item of the queue, I take the first item that can be processed.
Regards, Laurent
package org.basex.core;

import java.util.LinkedList;
import org.basex.util.Util;

/**
 * Management of executing read/write processes.
 * Supports multiple readers, limited by {@link MainProp#PARALLEL},
 * and single writers (readers/writer lock).
 *
 * @author BaseX Team 2005-11, BSD License
 * @author Christian Gruen
 */
final class Lock {
  /** Queue for all waiting processes. */
  private final LinkedList<LockedCommand> queue = new LinkedList<LockedCommand>();
  /** Mutex object. */
  private final Object mutex = new Object();
  /** Database context. */
  private final Context ctx;
  /** Mutex used to synchronize pending commands (waiting processes). */
  private final Object queueMutex = new Object();

  /** Number of active readers. */
  private int readers;
  /** Writer flag. */
  private boolean writer;

  /**
   * Default constructor.
   * @param c context
   */
  Lock(final Context c) {
    ctx = c;
  }

  /**
   * Next pending command from the queue that can be processed.
   * @return Command or null if they are all locked.
   */
  LockedCommand next() {
    synchronized(queueMutex) {
      if(writer) return null; // Currently writing. No other command allowed.
      for(int i = 0; i < queue.size(); i++) {
        LockedCommand cmd = queue.get(i);
        if(cmd.writer) {
          // Writing can be done only if no reading command is running.
          if(readers == 0) {
            writer = true;
            return cmd;
          }
        } else if(readers < Math.max(ctx.mprop.num(MainProp.PARALLEL), 1)) {
          // Additional readers are still allowed.
          ++readers;
          return cmd;
        }
      }
      return null;
    }
  }

  /**
   * Modifications before executing a command.
   * @param w writing flag
   */
  void lock(final boolean w) {
    synchronized(mutex) {
      final LockedCommand cmd = new LockedCommand(w);
      queue.add(cmd);

      try {
        while(true) {
          synchronized(queueMutex) {
            LockedCommand nextCmd = next();
            if(nextCmd != null && cmd.equals(nextCmd)) break;
          }
          mutex.wait();
        }
      } catch(final InterruptedException ex) {
        Util.stack(ex);
      }
      queue.remove(cmd);
    }
  }

  /**
   * Modifications after executing a command.
   * @param w writing flag
   */
  synchronized void unlock(final boolean w) {
    synchronized(mutex) {
      if(w) {
        writer = false;
      } else {
        --readers;
      }
      mutex.notifyAll();
    }
  }
}

package org.basex.core;

/**
 * Locked command stored in Lock class queue.
 * @author Laurent Chevalier
 */
final class LockedCommand {
  /** Writer flag. */
  public boolean writer;

  /**
   * Default constructor.
   * @param w Writer flag. Tells whether it is an updating command (true) or not (false).
   */
  LockedCommand(final boolean w) {
    this.writer = w;
  }
}
-----Original Message----- From: Christian Grün [mailto:christian.gruen@gmail.com] Sent: Tuesday, August 30, 2011 00:24 To: Laurent Chevalier Cc: basex-talk@mailman.uni-konstanz.de Subject: Re: [basex-talk] BaseX server deadlock
Dear Laurent,
thanks for the elaborate description of the problem and the proposed bug fix. Yes, I agree that this issue should not be ignored. I've just added your observation in our bug tracker:
https://github.com/BaseXdb/basex/issues/173
I'm not sure, however, how you're planning to utilize the randomly generated number, as the "code" variable isn't used in the subsequent lines of code. Did I miss something? Next, a random selection of incoming requests could mix up subsequent write operations, which would cause unexpected results in other use cases of BaseX (but maybe I missed your point here, regarding the notion of random execution).
I'll have some more thoughts on potential alternatives (which is always better than just questioning others' proposals). If you have some more ideas how to resolve the issue, you are welcome to tell us.
Christian ____________________
Dear Laurent,
a little update: as we're accompanying all critical changes in BaseX with new JUnit tests, it would be great if you could create a test class that allows us to reproduce the behavior you were initially describing.
Regarding your latest solution, it's important to add that update operations may become subject to starvation if many read operations take place (see e.g. [1] for interested readers).
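A toy simulation can make the starvation concern concrete. The sketch below is not BaseX code: `StarvationDemo` and `writerStart` are hypothetical names, time is counted in abstract ticks, one reader is assumed to be active at tick 0, and the limit on parallel readers is assumed to be large enough never to apply. A writer waits at the head of the queue, and every newly arriving reader is admitted because it can be processed immediately.

```java
import java.util.ArrayDeque;

/** Deterministic toy simulation of writer starvation; not BaseX code. */
final class StarvationDemo {
  /**
   * Simulates one waiting writer while readers keep arriving.
   * @param arrivalGap a new reader arrives every arrivalGap ticks
   * @param readerTicks number of ticks each reader stays active
   * @param maxTicks number of ticks to simulate
   * @return tick at which the writer starts, or -1 if it never starts
   */
  static int writerStart(final int arrivalGap, final int readerTicks, final int maxTicks) {
    // End times of the active readers, in increasing order.
    final ArrayDeque<Integer> ends = new ArrayDeque<>();
    ends.add(readerTicks); // one reader is already active at tick 0
    for(int t = 1; t <= maxTicks; t++) {
      // Readers whose end time has been reached release their lock.
      while(!ends.isEmpty() && ends.peek() <= t) ends.poll();
      // The writer is checked first and may start only if no reader is active.
      if(ends.isEmpty()) return t;
      // Otherwise an arriving reader passes the waiting writer.
      if(t % arrivalGap == 0) ends.add(t + readerTicks);
    }
    return -1;
  }

  public static void main(final String[] args) {
    System.out.println(writerStart(1, 2, 1000)); // prints -1: reads overlap, the writer starves
    System.out.println(writerStart(5, 2, 1000)); // prints 2: a gap between reads lets the writer in
  }
}
```

As soon as reads overlap continuously, the active reader count never drops to zero and the writer is never selected. A strict FIFO queue avoids this, at the price of the deadlock discussed in this thread.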
Thanks again, Christian
[1] http://en.wikipedia.org/wiki/Readers-writer_lock ___________________________
Hi Laurent,
while I didn't manage to reproduce the deadlock that you described a while ago, I came across some other potential scenarios in which our locking implementation could cause deadlocks. The simplest example looks as follows:
- Client1 creates an iterator and requests the first result
- Client2 sends an updating command
- Client1 requests no further results, thus blocking Client2
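The same three steps can be reproduced with a plain readers/writer lock from the JDK. This is only an illustration under assumptions: `AbandonedIteratorDemo` and `writerGetsLock` are hypothetical names, `ReentrantReadWriteLock` stands in for the BaseX Lock class, and the 200 ms timeout exists only so that the demo terminates instead of hanging.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Toy reproduction of the abandoned-iterator scenario; not BaseX code. */
final class AbandonedIteratorDemo {
  /**
   * Client1 takes a read lock; Client2 then waits up to 200 ms for the write lock.
   * @param readerFinishes whether Client1 releases its read lock before Client2 starts
   * @return true if Client2 obtained the write lock
   */
  static boolean writerGetsLock(final boolean readerFinishes) {
    final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    // Client1 creates an iterator and requests the first result.
    lock.readLock().lock();
    // Client1 either iterates to the end or requests no further results.
    if(readerFinishes) lock.readLock().unlock();

    // Client2 sends an updating command on its own thread.
    final boolean[] acquired = new boolean[1];
    final Thread client2 = new Thread(() -> {
      try {
        if(lock.writeLock().tryLock(200, TimeUnit.MILLISECONDS)) {
          acquired[0] = true;
          lock.writeLock().unlock();
        }
      } catch(final InterruptedException ex) {
        Thread.currentThread().interrupt();
      }
    });
    client2.start();
    try {
      client2.join();
    } catch(final InterruptedException ex) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(ex);
    }
    return acquired[0];
  }

  public static void main(final String[] args) {
    System.out.println("iterator left open: " + writerGetsLock(false)); // prints false
    System.out.println("iterator closed:    " + writerGetsLock(true));  // prints true
  }
}
```

Without the timeout, the writer would wait for as long as the first client keeps its iterator open, which is the behavior described above.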
Instead of modifying the delicate Lock algorithm itself, we decided to go one step further and rewrite our client architecture. From now on, the clients are responsible for iterating through their query items, and an iterator request to the server triggers the complete execution and transmission of a query. This has several advantages:
- The server will only perform atomic operations and is not dependent on the clients' behavior anymore
- The iterative evaluation of a query will only trigger a single socket request, leading to a considerable speedup if network latency is high
The obvious drawback is that intermediate results need to be cached. The most straightforward alternative to bypass this problem is to send several queries to the server, or restrict the number of iterated results in the XQuery expression if not all requested results are actually needed.
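The "several queries" workaround could look roughly like the following client-side sketch. Everything here is an assumption for illustration: `ChunkedResults` and `page` are hypothetical names, and `run` is a placeholder for whatever client call executes an XQuery string and returns all of its items; it is not part of any BaseX API. The only XQuery feature relied on is the standard `subsequence()` function with 1-based positions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.Function;

/** Iterates over a large result in fixed-size chunks; a sketch, not a BaseX API. */
final class ChunkedResults implements Iterator<String> {
  private final String expr;
  private final int size;
  /** Hypothetical executor: runs an XQuery string and returns all items. */
  private final Function<String, List<String>> run;
  private List<String> chunk = new ArrayList<>();
  private int pos;        // position inside the current chunk
  private int start = 1;  // XQuery positions are 1-based
  private boolean done;   // set once a chunk comes back short

  ChunkedResults(final String expr, final int size, final Function<String, List<String>> run) {
    this.expr = expr;
    this.size = size;
    this.run = run;
  }

  /** Wraps an expression so that only size items from position start are returned. */
  static String page(final String expr, final int start, final int size) {
    return "subsequence((" + expr + "), " + start + ", " + size + ")";
  }

  @Override
  public boolean hasNext() {
    if(pos < chunk.size()) return true;
    if(done) return false;
    // Fetch the next chunk; only this chunk is held in memory.
    chunk = run.apply(page(expr, start, size));
    start += size;
    pos = 0;
    done = chunk.size() < size;
    return !chunk.isEmpty();
  }

  @Override
  public String next() {
    if(!hasNext()) throw new NoSuchElementException();
    return chunk.get(pos++);
  }
}

final class ChunkedResultsDemo {
  public static void main(final String[] args) {
    // In-memory stand-in for the server: serves five items, two per request.
    final List<String> data = Arrays.asList("a", "b", "c", "d", "e");
    final int[] calls = { 0 };
    final ChunkedResults it = new ChunkedResults("//item", 2, query -> {
      final int from = Math.min(calls[0]++ * 2, data.size());
      return data.subList(from, Math.min(from + 2, data.size()));
    });
    while(it.hasNext()) System.out.print(it.next());
    System.out.println(); // prints "abcde"
  }
}
```

Memory use on the client is bounded by the chunk size, but each chunk is a separate query: the expression is evaluated again for every request, and updates that happen between two requests can shift the positions.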
We have added another Wiki page to better document our server protocol [1]. Next, I have closed the GitHub issue related to your locking problem [2], as it should now be fixed as well.
Hope this helps, Christian
[1] http://docs.basex.org/wiki/Server_Protocol
[2] https://github.com/BaseXdb/basex/issues/173
Hi Christian,
I've not forgotten your request for code to reproduce the deadlock I found. I intend to send it to you later, but first I have to finish some development work to show tangible results to my boss... That said, the scenario you describe is very close.
I've found a synchronization bug in the last version of the Lock class I sent you. I've fixed it, and the corrected code follows.
I know your TCP protocol very well, as I have developed a C# client implementation. I find the iterative query very useful because it avoids caching all the results. This is particularly important for me, and it is the main benefit of the iterative query. So I'm a bit worried when you say that "The obvious drawback is that intermediate results need to be cached." But I'm not sure I have fully understood the changes you plan to make.
Regards, Laurent
package org.basex.core;

import java.util.LinkedList;

import org.basex.util.Util;

/**
 * Management of executing read/write processes.
 * Supports multiple readers, limited by {@link MainProp#PARALLEL},
 * and single writers (readers/writer lock).
 *
 * @author BaseX Team 2005-11, BSD License
 * @author Christian Gruen
 */
final class Lock {
  /** Queue for all waiting processes. */
  private final LinkedList<LockedCommand> queue = new LinkedList<LockedCommand>();
  /** Mutex object. */
  private final Object mutex = new Object();
  /** Database context. */
  private final Context ctx;

  /** Number of active readers. */
  private int readers;
  /** Writer flag. */
  private boolean writer;

  /**
   * Default constructor.
   * @param c context
   */
  Lock(final Context c) {
    ctx = c;
  }

  /**
   * Tells whether this command can be processed now or not.
   * @param o Pending command.
   * @return Permission to run command.
   */
  private boolean ok(LockedCommand o) {
    synchronized(mutex) {
      if(writer) return false;
      for(int i = 0; i < queue.size(); i++) {
        if(o == queue.get(i)) {
          if(o.writer) {
            if(readers == 0) {
              writer = true;
              return true;
            }
          } else if(readers < Math.max(ctx.mprop.num(MainProp.PARALLEL), 1)) {
            ++readers;
            return true;
          }
        }
      }
      return false;
    }
  }

  /**
   * Modifications before executing a command.
   * @param w writing flag
   */
  void lock(final boolean w) {
    synchronized(mutex) {
      final LockedCommand o = new LockedCommand(w);
      queue.add(o);

      try {
        while(true) {
          if(ok(o)) break;
          mutex.wait();
        }
      } catch(final InterruptedException ex) {
        Util.stack(ex);
      }

      synchronized(mutex) {
        queue.remove(o);
      }
    }
  }

  /**
   * Modifications after executing a command.
   * @param w writing flag
   */
  synchronized void unlock(final boolean w) {
    synchronized(mutex) {
      if(w) {
        writer = false;
      } else {
        --readers;
      }
      mutex.notifyAll();
    }
  }
}

package org.basex.core;

/**
 * Locked command stored in Lock class queue.
 * @author Laurent Chevalier
 */
final class LockedCommand {
  /** Writer flag. */
  public boolean writer;

  /**
   * Default constructor.
   * @param w Writer flag. Tells whether it is an updating command (true) or not (false).
   */
  LockedCommand(final boolean w) {
    this.writer = w;
  }
}
-----Original Message----- From: Christian Grün [mailto:christian.gruen@gmail.com] Sent: Monday, September 19, 2011 16:13 To: Laurent Chevalier Cc: basex-talk@mailman.uni-konstanz.de Subject: Re: [basex-talk] BaseX server deadlock
Hi Laurent,
while I didn't manage to reproduce the deadlock that you described a while ago, I came across some other potential scenarios in which our locking implementation could cause deadlocks. The simplest example looks as follows:
- Client1 creates an iterator and requests the first result
- Client2 sends an updating command
- Client1 requests no further results, thus blocking Client2
Instead of modifying the delicate Lock algorithm itself, we decided to go one step further and rewrite our client architecture. From now on, the clients are responsible for iterating through their query items, and an iterator request to the server triggers the complete execution and transmission of a query. This has several advantages:
- The server will only perform atomic operations and is not dependent on the clients' behavior anymore
- The iterative evaluation of a query will only trigger a single socket request, leading to a considerable speedup if network latency is high
The obvious drawback is that intermediate results need to be cached. The most straightforward alternative to bypass this problem is to send several queries to the server, or restrict the number of iterated results in the XQuery expression if not all requested results are actually needed.
We have added another Wiki page to better document our server protocol [1]. Next, I have closed the GitHub issue related to your locking problem, as it should now be fixed as well.
Hope this helps, Christian
[1] http://docs.basex.org/wiki/Server_Protocol [2] https://github.com/BaseXdb/basex/issues/173
In fact, the changes have already been made in version 6.8... That's a serious problem for me, as we have to minimize the memory consumption of our web applications, which is already high.
-----Original Message----- From: Christian Grün [mailto:christian.gruen@gmail.com] Sent: Monday, September 19, 2011 16:13 To: Laurent Chevalier Cc: basex-talk@mailman.uni-konstanz.de Subject: Re: [basex-talk] BaseX server deadlock
Hi Laurent,
while I didn't manage to reproduce the deadlock that you described a while ago, I came across some other potential scenarios in which our locking implementation could cause deadlocks. The simplest example looks as follows:
- Client1 creates an iterator and requests the first result
- Client2 sends an updating command
- Client1 requests no further results, thus blocking Client2
Instead of modifying the delicate Lock algorithm itself, we decided to go one step further and rewrite our client architecture. From now on, the clients are responsible for iterating through their query items, and an iterator request to the server triggers the complete execution and transmission of a query. This has several advantages:
- The server will only perform atomic operations and is not dependent on the clients' behavior anymore
- The iterative evaluation of a query will only trigger a single socket request, leading to a considerable speedup if network latency is high
The obvious drawback is that intermediate results need to be cached. The most straightforward alternative to bypass this problem is to send several queries to the server, or restrict the number of iterated results in the XQuery expression if not all requested results are actually needed.
We have added another Wiki page to better document our server protocol [1]. Next, I have closed the GitHub issue related to your locking problem, as it should now be fixed as well.
Hope this helps, Christian
[1] http://docs.basex.org/wiki/Server_Protocol [2] https://github.com/BaseXdb/basex/issues/173
On Mon, Aug 29, 2011 at 9:50 AM, Laurent Chevalier l.chevalier@cyim.com wrote:
Hi,
A deadlock occurs in the following situation: a first client program opens an iterative query. For each iteration, this program does some processing and sends another reading request to BaseX (using another BaseX session). All works fine until a second client program (or another thread) sends an updating command to BaseX (like optimize, for instance). This locks the BaseX server. To unlock it, you have to kill the first program.
I have read the BaseX server code and found the reason for this behavior in the class org.basex.core.Lock:
- with the iterative query, there is always at least one reader alive (readers=1).
- when the updating query is received, it is put in the queue (index 0) and remains in it as long as there is a reading query running (that is to say, as long as the iterative reading query is running).
- then a second reading request is received; it is put in the queue (index 1, as the updating query is already in the queue). As it is only the second item of the queue, it remains there as long as the first item (the updating query) has not been processed (BaseX processes the requests in order of arrival, as a FIFO queue). But this first item cannot be processed because the iterative reading query is still running. All queries are thus blocked.
Some may say that we should not send another query while we are in the loop of an iterative query, but in our context of many sites being developed by several developers, it is possible that a developer codes this, and we do not want BaseX to be locked in this case (whether or not it is a mistake by the developer).
I have found a solution to this problem by modifying the org.basex.core.Lock class. You will find my code hereafter. I no longer use a queue, and I use a static mutex (called queueMutex) to synchronize all pending queries (threads). The "drawback" of this solution is that the queries are no longer processed in order of arrival but in an arbitrary order.
What do you think of this solution? Do you plan to update the BaseX locking mechanism?
I'm using BaseX 6.7.1, but I have seen that Lock.java has not been changed in BaseX 6.7.2.
Here is my code:
package org.basex.core;

import java.util.Date;
//import java.util.LinkedList;
import java.util.Random;

import org.basex.util.Util;

/**
 * Management of executing read/write processes.
 * Supports multiple readers, limited by {@link MainProp#PARALLEL},
 * and single writers (readers/writer lock).
 *
 * @author BaseX Team 2005-11, BSD License
 * @author Christian Gruen
 */
final class Lock {
  /** Queue for all waiting processes. */
  // private final LinkedList<Object> queue = new LinkedList<Object>();

  /** Mutex object. */
  private final Object mutex = new Object();
  /** Database context. */
  private final Context ctx;
  /** Static mutex used to synchronize all pending queries. **/
  private final static Object queueMutex = new Object();

  /** Number of active readers. */
  private int readers;
  /** Writer flag. */
  private boolean writer;

  /**
   * Default constructor.
   * @param c context
   */
  Lock(final Context c) {
    ctx = c;
  }

  /**
   * Modifications before executing a command.
   * @param w writing flag
   */
  void lock(final boolean w) {
    synchronized(mutex) {
      int code = new Random(new Date().getTime()).nextInt();
      // final Object o = new Object();
      // queue.add(o);

      try {
        while(true) {
          synchronized(queueMutex) {
            // if(o == queue.get(0) && !writer) {
            if(!writer) {
              if(w) {
                if(readers == 0) {
                  writer = true;
                  break;
                }
              } else if(readers < Math.max(ctx.mprop.num(MainProp.PARALLEL), 1)) {
                ++readers;
                break;
              }
            }
          }
          mutex.wait();
        }
      } catch(final InterruptedException ex) {
        Util.stack(ex);
      }

      // queue.remove(0);
    }
  }

  /**
   * Modifications after executing a command.
   * @param w writing flag
   */
  synchronized void unlock(final boolean w) {
    synchronized(mutex) {
      if(w) {
        writer = false;
      } else {
        --readers;
      }
      mutex.notifyAll();
    }
  }
}
_______________________________________________
BaseX-Talk mailing list
BaseX-Talk@mailman.uni-konstanz.de
https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk
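The queueing problem described in the message above can be reproduced with a small single-threaded model. This is only a sketch under simplifying assumptions: `FifoLockModel` and its methods are hypothetical names, no real threads are involved, the PARALLEL limit is ignored, and it is not the BaseX implementation. It only mimics the rule that requests are admitted strictly from the head of a FIFO queue.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Single-threaded model of a FIFO readers/writer lock (illustrative, not BaseX code). */
final class FifoLockModel {
  /** Waiting requests in order of arrival; true marks an updating command. */
  private final Deque<Boolean> queue = new ArrayDeque<>();
  /** Number of active readers. */
  private int readers;
  /** Writer flag. */
  private boolean writer;

  /** Registers a reader that is already running, such as an open iterative query. */
  public void startReader() { readers++; }

  /** Ends a running reader. */
  public void finishReader() { readers--; }

  /** Appends a request to the end of the queue. */
  public void enqueue(final boolean write) { queue.addLast(write); }

  /** Number of requests still waiting. */
  public int waiting() { return queue.size(); }

  /** Admits requests strictly from the head of the queue; returns how many got the lock. */
  public int admit() {
    int admitted = 0;
    while(!queue.isEmpty() && !writer) {
      if(queue.peekFirst()) {
        if(readers > 0) break; // a writer at the head waits for all readers
        writer = true;
      } else {
        readers++;
      }
      queue.removeFirst();
      admitted++;
    }
    return admitted;
  }

  public static void main(final String[] args) {
    final FifoLockModel lock = new FifoLockModel();
    lock.startReader();   // client 1 opens the iterative query
    lock.enqueue(true);   // client 2 sends an updating command (index 0)
    lock.enqueue(false);  // client 1 sends a nested read request (index 1)
    System.out.println("admitted=" + lock.admit() + ", waiting=" + lock.waiting());
    lock.finishReader();  // client 1 is killed, so its reader goes away
    System.out.println("admitted=" + lock.admit() + ", waiting=" + lock.waiting());
  }
}
```

The first line printed is `admitted=0, waiting=2`: the writer waits for the open reader, and the nested read waits behind the writer, while the open reader in turn waits for the nested read. Only after the first reader is removed does the second line show `admitted=1, waiting=1`, which matches the observation that the first program has to be killed.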
Hi Laurent,
yes, the code has already been rewritten to reflect the new Client API. As there were too many potential conflicts with the old solution, this would have happened sooner or later anyway.
I'm sorry that you believe the new solution might conflict with your existing architecture. I'd be interested in a few things to get a better feeling for whether this problem can be solved in a different way:
-- how much data do you iterate through (KB, MB or even more)?
-- how expensive are your queries?
-- note that the data will be cached by the client. Do you use the same machine for clients and servers?
-- I'd be interested in your first test results to see whether your worries come true.
As the data will be transferred much faster than before (because a single request fetches all the data), the new architecture might turn out to be beneficial even in your case. Indeed, I'm quite convinced that, after all, most users will benefit from the changes.
Salutations, Christian
___________________________
On Mon, Sep 19, 2011 at 5:16 PM, Laurent Chevalier l.chevalier@cyim.com wrote:
In fact, the changes have already been made in version 6.8... That's a serious problem for me, as we have to minimize the memory consumption of our web applications, which is already high.
Hi Christian,
We are building web applications in MS .NET. The data is made of a hierarchy of containers. A container is a directory containing an XML file and attachments (static resources like pictures, videos, etc.). These containers are "indexed" in a database for better performance. Currently, we are using an SQL Server database with XQuery/XPath and full-text search. I'm currently working on a new implementation with BaseX. Our goal is to simplify XQuery writing (today, we have to mix XQuery into SQL queries, which is a bit complicated) and, if possible, to get better performance.
The biggest database I have to deal with today contains around 25,000 containers and continues to grow. It holds medical event data, HTML articles, news, agendas, a members directory, etc. The size of the BaseX database directory, including indexes, is 160 MB.
We want to keep the database synchronized with the file system hierarchy. For instance, if you manually add a container in the file system, you can launch a "re-indexing" process that will update the database automatically. For this process, I iterate over all containers in the database and check whether each one has to be updated. I'm using an iterative query for this. This query is very basic, as it only returns a list of strings (the identifiers of the containers) of at most 255 characters each. But if you multiply 255 by the number of containers (up to about 6.4 million characters for 25,000 containers), it starts to add up.
We have other uses of iterative queries. Another example: access control data is not stored in the database. So if I want, for instance, the first 10 accessible containers in a given website section, I loop over the containers published in this section in the database and return the results as soon as I have found 10 accessible containers, ignoring the remaining ones (provided by the BaseX query).
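The "first 10 accessible containers" loop described above might look roughly like the following sketch in Java. The names `FirstAccessible` and `take` are hypothetical, the result iterator is a simple stand-in for an iterative query, and the access rule (even identifiers) is invented purely for the demonstration.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

/** Early-exit filtering over a lazily produced result sequence (illustrative only). */
final class FirstAccessible {
  /** Pulls items one by one and stops as soon as limit of them pass the check. */
  public static <T> List<T> take(final Iterator<T> results,
      final Predicate<T> accessible, final int limit) {
    final List<T> hits = new ArrayList<>();
    while(hits.size() < limit && results.hasNext()) {
      final T item = results.next();
      if(accessible.test(item)) hits.add(item);
    }
    return hits;
  }

  public static void main(final String[] args) {
    // Stand-in for an iterative query: container ids 1..1000, produced lazily.
    final int[] pulled = { 0 };
    final Iterator<Integer> ids = new Iterator<Integer>() {
      private int next = 1;
      @Override public boolean hasNext() { return next <= 1000; }
      @Override public Integer next() { pulled[0]++; return next++; }
    };
    // Invented access rule for the demo: only even ids are accessible.
    final List<Integer> firstTen = take(ids, id -> id % 2 == 0, 10);
    System.out.println(firstTen + " after pulling " + pulled[0] + " items");
  }
}
```

In this demo only 20 of the 1000 available items are pulled before the loop stops, which is the memory and time saving that a non-caching iterator is meant to provide.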
With SQL Server clients, we have an equivalent of BaseX iterative queries that avoids caching the whole result of a request. Memory consumption is a very serious issue for web applications in MS .NET or Java.
With JDBC drivers, the fetch size can be set (http://www.oracle.com/technetwork/database/enterprise-edition/memory.pdf). With the PostgreSQL JDBC driver, cursors are used and multiple queries may be fired to get all results (http://abhirama.wordpress.com/2009/01/07/postgresql-jdbc-and-large-result-se...).
I think the iterative query without client caching (as it was implemented in BaseX until version 6.7) was a really great feature and addressed a very common memory consumption issue.
BTW, I'm exploring two ways of using BaseX:
- either in client/server mode: the client (web site) communicates with the BaseX server through TCP,
- or embedded: I have generated a .NET assembly (DLL) with IKVM.NET and thus I can embed BaseX in a .NET application.
The client/server mode would be used for portals. The embedded mode might be interesting for single sites that do not share a database with others.
I hope we'll find a good solution solving both the deadlock issue and the client memory consumption issue.
Regards, Laurent
Laurent,
thanks for the elaborate description of your system architecture. I'm still quite positive that our new architecture shouldn't seriously set you back, and I'd claim that our caching architecture is pretty memory-efficient, so I would suggest first running some tests with the new iterator to evaluate whether caching really is the main issue (sorry for insisting; maybe you've already spent enough time on this anyway).
If the client-side caching turns out to waste too many resources, you could easily adjust the lightweight client code to fit your needs. All you have to do is interpret the incoming results directly and skip the remaining results once you have finished querying (see [1] for the Java client). In both cases, querying should at least be much faster than before, and the client-based adjustments won't raise many of the complicated issues that would otherwise have to be resolved on the server side.
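The adjustment suggested above (interpret results as they arrive, then skip the rest) could be sketched as follows. Note the assumptions: the framing used here, in which each item is a UTF-8 string terminated by a zero byte and an empty item marks the end, is a simplification chosen for the example and is not claimed to be the exact BaseX server protocol. `StreamingResults`, `readItem` and `consume` are hypothetical names, and an in-memory stream stands in for the socket.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.function.Predicate;

/** Processes results one at a time instead of caching them (illustrative only). */
final class StreamingResults {
  /** Reads the next zero-terminated item; returns null at the end marker or end of stream. */
  public static String readItem(final InputStream in) {
    try {
      final ByteArrayOutputStream buf = new ByteArrayOutputStream();
      for(int b; (b = in.read()) > 0;) buf.write(b);
      return buf.size() == 0 ? null : new String(buf.toByteArray(), StandardCharsets.UTF_8);
    } catch(final IOException ex) {
      throw new UncheckedIOException(ex);
    }
  }

  /**
   * Hands items to the callback one at a time. Once the callback returns false,
   * the remaining items are read and discarded so that the stream stays in sync.
   * Returns the number of discarded items.
   */
  public static int consume(final InputStream in, final Predicate<String> wantMore) {
    boolean more = true;
    int skipped = 0;
    for(String item; (item = readItem(in)) != null;) {
      if(more) more = wantMore.test(item);
      else skipped++;
    }
    return skipped;
  }

  public static void main(final String[] args) {
    // Simulated server response: four items followed by the end marker.
    final byte[] data = "a\0b\0c\0d\0\0".getBytes(StandardCharsets.UTF_8);
    final int[] seen = { 0 };
    final int skipped = consume(new ByteArrayInputStream(data), item -> ++seen[0] < 2);
    System.out.println("processed=" + seen[0] + ", skipped=" + skipped);
  }
}
```

Running the demo prints `processed=2, skipped=2`. Only one item is held in memory at a time; the price is that the skipped items are still transferred, and that in this simplified framing an empty result item cannot be told apart from the end marker.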
Hope this helps; more feedback is welcome, Christian
[1] https://github.com/BaseXdb/basex-api/blob/master/src/main/java/BaseXClient.j...
___________________________
Hi Christian,
Well, I know that memory consumption is an issue, as we are already fighting with it in our current system. It is simply our main issue... So I will adjust the client code. It's good to have a performance improvement. I hope I will not run into problems reading data from the socket chunk by chunk over a long period.
Regards, Laurent
-----Original Message-----
From: Christian Grün [mailto:christian.gruen@gmail.com]
Sent: Monday, 19 September 2011 22:12
To: Laurent Chevalier
Cc: basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] BaseX server deadlock
Laurent,
thanks for the elaborate description of your system architecture. I'm still quite positive that our new architecture shouldn't seriously set you back, and I'd claim that our caching architecture is pretty memory efficient, so I would suggest first running some tests with the new iterator to evaluate whether caching is the main issue (sorry for persisting; maybe you've already spent enough time on this anyway).
If the client-side caching turns out to waste too many resources, you could easily adjust the light-weight client code to fit your needs. All you have to do is to directly interpret the incoming results, and skip the remaining results once you have finished querying (see [1] for the Java client). In both cases, querying should at least be much faster than before, and the client-based adjustments won't raise many complicated issues that would have to be resolved server-side.
Hope this helps; more feedback is welcome, Christian
[1] https://github.com/BaseXdb/basex-api/blob/master/src/main/java/BaseXClient.java
___________________________
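The approach described in the message above, interpreting results as they arrive and stopping early, might look like the following sketch. It deliberately works on a plain java.util.Iterator and does not use any BaseX client class; FirstMatches, take and the access check in main are hypothetical names and data chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical helper: consume results lazily and stop as soon as possible. */
final class FirstMatches {
  /** Pulls items one at a time and stops once 'limit' items passed the filter. */
  static <T> List<T> take(Iterator<T> results, Predicate<T> accept, int limit) {
    List<T> out = new ArrayList<>();
    while (out.size() < limit && results.hasNext()) {
      T item = results.next();
      if (accept.test(item)) out.add(item);
    }
    return out;
  }

  public static void main(String[] args) {
    Iterator<String> ids = List.of("c1", "c2", "c3", "c4", "c5").iterator();
    // Made-up access check: pretend only odd-numbered containers are accessible.
    List<String> firstTwo = take(ids, id -> (id.charAt(1) - '0') % 2 == 1, 2);
    System.out.println(firstTwo);       // [c1, c3]
    System.out.println(ids.hasNext());  // true: c4 and c5 were never read
  }
}
```

Only the accepted items are kept in memory. With a socket-based client, the unread remainder of the response would still have to be skipped or the connection closed, as the message above points out.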
On Mon, Sep 19, 2011 at 7:04 PM, Laurent Chevalier l.chevalier@cyim.com wrote:
Hi Christian,
We are building web applications in MS .NET. The data is organized as a hierarchy of containers. A container is a directory containing an XML file and attachments (static resources like pictures, videos, etc.). These containers are "indexed" in a database for better performance. Currently, we are using a SQL Server database with XQuery/XPath and full-text search. I'm working on a new implementation with BaseX. Our goal is to simplify XQuery writing (today, we have to embed XQuery in SQL queries, which is a bit complicated) and, if possible, to get better performance.
The biggest database I have to deal with today holds around 25,000 containers and continues to grow. It contains medical events data, HTML articles, news, agenda, members directory, etc. The size of the BaseX database directory with indexes is 160 MB.
We want to keep the database synchronized with the file system hierarchy. For instance, if you manually add a container to the file system, you can launch a "re-indexing" process that will update the database automatically. For this process, I iterate over all containers in the database and check whether each one has to be updated. I'm using an iterative query for this. This query is very basic, as it only returns a list of strings (the identifiers of the containers) of 255 characters max. But if you multiply 255 by the number of containers, it starts to add up.
We have other uses of iterative queries. Another example: access control data is not stored in the database. So if I want, for instance, the first 10 accessible containers in a given website section, I loop over the containers published in this section in the database and return results as soon as I have found 10 accessible containers, ignoring the remaining ones (provided by the BaseX query).
With SQL Server clients, we have an equivalent of BaseX iterative queries that avoids caching the whole result of a request. Memory consumption is a very serious issue for web applications in MS .NET or Java.
With JDBC drivers, the fetch size can be set (http://www.oracle.com/technetwork/database/enterprise-edition/memory.pdf). With the PostgreSQL JDBC driver, cursors are used and multiple queries may be fired to get all results (http://abhirama.wordpress.com/2009/01/07/postgresql-jdbc-and-large-result-sets/).
I think the iterative query without client caching (as it was implemented in BaseX until version 6.7) was a really great feature and addressed a very common memory consumption issue.
BTW, I'm exploring two ways of using BaseX:
- either in client/server mode: the client (web site) communicates with the BaseX server through TCP,
- or embedded: I have generated a .NET assembly (DLL) with IKVM.NET and thus I can embed BaseX in a .NET application.
The client/server mode would be used for portals. The embedded mode might be of interest for single sites that do not share a database with others.
I hope we'll find a good solution solving both the deadlock issue and the client memory consumption issue.
Regards, Laurent
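For the re-indexing query described in the message above, the size of a fully cached result can be bounded with a short calculation. The sketch below assumes two bytes per character (UTF-16, as used by .NET strings) and ignores per-object overhead; ResultSizeEstimate and utf16Bytes are hypothetical names.

```java
/** Hypothetical back-of-the-envelope estimate; ignores per-object overhead. */
final class ResultSizeEstimate {
  /** Upper bound in bytes for 'count' strings of at most 'maxChars' characters in UTF-16. */
  static long utf16Bytes(long count, long maxChars) {
    return count * maxChars * 2L;
  }

  public static void main(String[] args) {
    // 25,000 containers, identifiers of at most 255 characters each.
    long bytes = utf16Bytes(25_000, 255);
    System.out.println(bytes);  // 12750000
  }
}
```

That is 6,375,000 characters, or roughly 12.75 MB as an upper bound under these assumptions, and the figure grows linearly with the number of containers.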
-----Original Message-----
From: Christian Grün [mailto:christian.gruen@gmail.com]
Sent: Monday, 19 September 2011 17:38
To: Laurent Chevalier
Cc: basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] BaseX server deadlock
Hi Laurent,
yes, the code has already been rewritten to reflect the new Client API. As there were too many potential conflicts with the old solution, this would have happened sooner or later anyway.
I'm sorry that you believe that the new solution might conflict with your existing architecture. I'd be interested in a few things to get a better feeling for whether this problem can be solved in a different way:
-- how much data do you iterate through (KB, MB or even more)?
-- how expensive are your queries?
-- note that the data will be cached by the client. Do you use the same machine for clients and servers?
-- I'd be interested in your first test results to see if your worries come true. As the data will be transferred much faster than before (because of the single request to get the data), the new architecture might turn out to be beneficial even in your case. Indeed I'm quite convinced, after all, that most users will profit from the changes.
Salutations, Christian
On Mon, Sep 19, 2011 at 5:16 PM, Laurent Chevalier l.chevalier@cyim.com wrote:
In fact, the changes have already been made in version 6.8... That's a serious problem for me as we have to minimize the memory consumption of our web applications, which is already high.
-----Original Message-----
From: Christian Grün [mailto:christian.gruen@gmail.com]
Sent: Monday, 19 September 2011 16:13
To: Laurent Chevalier
Cc: basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] BaseX server deadlock
Hi Laurent,
while I didn't manage to reproduce the deadlock that you described a while ago, I came across some other potential scenarios in which our locking implementation could cause deadlocks. The simplest example looks as follows:
- Client1 creates an iterator and requests the first result
- Client2 sends an updating command
- Client1 requests no further results, thus blocking Client2
Instead of modifying the delicate Lock algorithm itself, we decided to go one step further and rewrite our client architecture. From now on, the clients are responsible for iterating through their query items, and an iterator request to the server triggers the complete execution and transmission of a query. This has several advantages:
- The server will only perform atomic operations and is not dependent on the clients' behavior anymore
- The iterative evaluation of a query will only trigger a single socket request, leading to a considerable speedup if network latency is high
The obvious drawback is that intermediate results need to be cached. The most straightforward alternative to bypass this problem is to send several queries to the server, or restrict the number of iterated results in the XQuery expression if not all requested results are actually needed.
We have added another Wiki page to better document our server protocol [1]. Next, I have closed the GitHub issue related to your locking problem [2], as it should now be fixed as well.
Hope this helps, Christian
[1] http://docs.basex.org/wiki/Server_Protocol
[2] https://github.com/BaseXdb/basex/issues/173
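The alternative mentioned in the message above, sending several queries or restricting the number of results inside the XQuery expression, can be sketched as a small query builder. It relies on the standard XQuery function subsequence(); PagedQuery, page and the sample query with its element and attribute names are hypothetical and not taken from the BaseX code base.

```java
/** Hypothetical helper that restricts an XQuery expression to one page of results. */
final class PagedQuery {
  /** Wraps an XQuery expression so that only one page of its result is returned. */
  static String page(String expr, int pageIndex, int pageSize) {
    if (pageIndex < 0 || pageSize < 1) {
      throw new IllegalArgumentException("invalid page");
    }
    long start = (long) pageIndex * pageSize + 1;  // XQuery positions are 1-based
    return "subsequence((" + expr + "), " + start + ", " + pageSize + ")";
  }

  public static void main(String[] args) {
    // Placeholder query; element and attribute names are made up.
    String ids = "for $c in //container return string($c/@id)";
    System.out.println(page(ids, 0, 100));
    System.out.println(page(ids, 1, 100));
  }
}
```

Each page is a separate, complete query, so the client never holds more than one page in memory. The trade-off is that the expression is evaluated again for every page, and pages may shift if the database is updated between two requests.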
On Mon, Aug 29, 2011 at 9:50 AM, Laurent Chevalier
l.chevalier@cyim.com wrote:
[...]
Dear Laurent,
now that we've officially released our new iterator concept: have you been successful in optimizing the BaseX client for your system architecture? What are the current bottlenecks?
Christian ___________________________
On Tue, Sep 20, 2011 at 9:25 AM, Laurent Chevalier l.chevalier@cyim.com wrote:
[...]
Hi Christian,
Thank you for your mail. Last week I updated my client code to use the new BaseX 7.0 release. It was not too painful. I'm not caching the results. With my application, I still have the deadlock problem with BaseX 7.0, so I'm still using the Lock class fix, but I failed to reproduce the problem with a small Java test program. So it's not yet certain that the problem comes from BaseX; it may come from my .NET client. Tomorrow, I will translate the Java test code to VB.NET and keep you informed.
If you want, I can also send you an updated version of the ClientQuery class that does not use a cache.
Best regards, Laurent
-----Original Message-----
From: Christian Grün [mailto:christian.gruen@gmail.com]
Sent: Thursday, 27 October 2011 00:27
To: Laurent Chevalier
Cc: basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] BaseX server deadlock
Dear Laurent,
now that we've officially released our new iterator concept.. Have you been successful with optimizing the BaseX client for your system architecture? What are the current bottlenecks?
Christian ___________________________
On Tue, Sep 20, 2011 at 9:25 AM, Laurent Chevalier l.chevalier@cyim.com wrote:
Hi Christian,
Well, I know that the memory consumption is an issue as we are
already fighting with it in our current system. It's just our main issue... So, I will adjust the client code. It's good to have a performance improvement. I hope I will not have problem with reading data from the socket chunk by chunk for a long time.
Regards, Laurent
-----Message d'origine----- De : Christian Grün [mailto:christian.gruen@gmail.com] Envoyé : lundi 19 septembre 2011 22:12 À : Laurent Chevalier Cc : basex-talk@mailman.uni-konstanz.de Objet : Re: [basex-talk] BaseX server deadlock
Laurent,
thanks for the elaborate description of your system architecture.
I'm
still quite positive that our new architecture shouldn't seriously
set
you back, and I'd claim that our caching architecture is pretty
memory
efficient, so I would suggest to first do some tests with the new iterator to evaluate if caching is the main issue (sorry for persisting; maybe you've already spent enough time in this anyway).
If the client-side caching turns out to waste too many resources,
you
could easily adjust the light-weight client code to fit your needs. All you have to do is to directly interpret the incoming results,
and
skip the remaining results if you have finished querying (see [1]
for
the Java client). In both cases, querying should at least be much faster than before, and the client-based adjustments won't open many sophisticated issues that would have to be resolved server-side.
Hope this helps; more feedback is welcome, Christian
[1] https://github.com/BaseXdb/basex- api/blob/master/src/main/java/BaseXClient.java ___________________________
On Mon, Sep 19, 2011 at 7:04 PM, Laurent Chevalier l.chevalier@cyim.com wrote:
Hi Christian,
We are building web applications in MS .NET. The data is made of a
hierarchy of containers. A container is a directory containing an
XML
file and attachments (static resources like pictures, videos, etc.). These containers are "indexed" in a database for better performance. Currently, we are using an SQL Server database with XQuery/XPath and fulltext search. I'm currently working on a new implementation with BaseX. Our goal is to simplify xquery writing (today, we have to mix xquery in sql queries which is a bit complicated), and, if possible,
we
would like to get better performances.
The biggest database I have to deal with today counts around 25000
containers and continues to grow. It contains medical events data,
html
articles, news, agenda, members directory, etc. The size of the
BaseX
database directory with indexes is 160 Mo.
We want to keep the database synchronized with the file system
hierarchy. For instance, if you manually add a container in the file system, you can launch a "re-indexing" process that will update the database automatically. For this process, I iterate over all
containers
in database, and check if it has to be updated or not. I'm using an iterative query for this. This query is very basic as it only
returns a
list of string (the identifiers of the containers) of 255 characters max. But, if you multiply 255 by the number of containers, it's starting to do much.
We have other usages of iterative queries. Another example :
control
access data is not stored in the database. So, if I want, for
instance,
the first 10 accessible containers in a given website section, I
will
loop over the containers published in this section in the database,
and
return results as soon as I have found 10 accessible containers, ignoring the remaining ones (provided by the BaseX query).
With SQL Server clients, we have an equivalent of BaseX iterative
queries that avoid caching the whole request results. The memory consumption is a very serious issue for web applications in MS .NET
or
Java.
With JDBC drivers, the fetch size can be set
(http://www.oracle.com/technetwork/database/enterprise- edition/memory.pdf). With PostgreSQL JDBC driver, cursors are used
and
multiple queries may be fired to get all results (http://abhirama.wordpress.com/2009/01/07/postgresql-jdbc-and-large- result-sets/).
I think the iterative query without client caching (as it was
implemented in BaseX until version 6.7) was a really great feature
and
addressed a very common memory consumption issue.
BTW, I'm exploring two ways of using BaseX :
- either in client/server mode : the client (web site)
communicates
with the BaseX server through TCP,
- or embedded : I have generated a .NET assembly (DLL) with
IKVM.NET
and thus I can embed BaseX in a .NET application.
The client/server mode would be used for portals. The embedded mode might be interested for single sites that do not
share database with others.
I hope we'll find a good solution solving both the deadlock issue
and
the client memory consumption issue.
Regards, Laurent
-----Original Message----- From: Christian Grün [mailto:christian.gruen@gmail.com] Sent: Monday, 19 September 2011 17:38 To: Laurent Chevalier Cc: basex-talk@mailman.uni-konstanz.de Subject: Re: [basex-talk] BaseX server deadlock
Hi Laurent,
yes, the code has already been rewritten to reflect the new Client API. As there were too many potential conflicts with the old solution, this would have happened sooner or later anyway.
I'm sorry that you believe that the new solution might conflict with your existing architecture. I'd be interested in a few things to get a better feeling for whether this problem can be solved in a different way:
-- how much data do you iterate through (kb, mb or even more)?
-- how expensive are your queries?
-- note that the data will be cached by the client.. do you use the same machine for clients and servers?
-- I'd be interested in your first test results to see if your worries come true.. As the data will be transferred much faster than before (because of the single request to get the data), the new architecture might turn out to be beneficial even in your case. Indeed I'm quite convinced, after all, that most users will profit from the changes.
Salutations, Christian
On Mon, Sep 19, 2011 at 5:16 PM, Laurent Chevalier l.chevalier@cyim.com wrote:
In fact, the changes have already been made in version 6.8... That's a serious problem for me as we have to minimize the memory consumption of our web applications, which is already high.
> -----Original Message-----
> From: Christian Grün [mailto:christian.gruen@gmail.com]
> Sent: Monday, 19 September 2011 16:13
> To: Laurent Chevalier
> Cc: basex-talk@mailman.uni-konstanz.de
> Subject: Re: [basex-talk] BaseX server deadlock
>
> Hi Laurent,
>
> while I didn't manage to reproduce the deadlock that you described a while ago, I came across some other potential scenarios in which our locking implementation could cause deadlocks. The simplest example looks as follows:
>
> - Client1 creates an iterator and requests the first result
> - Client2 sends an updating command
> - Client1 requests no further results, thus blocking Client2
>
> Instead of modifying the delicate Lock algorithm itself, we decided to go one step further and rewrite our client architecture. From now on, the clients are responsible for iterating through their query items, and an iterator request to the server triggers the complete execution and transmission of a query. This has several advantages:
>
> - The server will only perform atomic operations and is not dependent on the clients' behavior anymore
> - The iterative evaluation of a query will only trigger a single socket request, leading to a considerable speedup if network latency is high
>
> The obvious drawback is that intermediate results need to be cached. The most straightforward alternative to bypass this problem is to send several queries to the server, or restrict the number of iterated results in the XQuery expression if not all requested results are actually needed.
>
> We have added another Wiki page to better document our server protocol [1]. Next, I have closed the GitHub issue related to your locking problem, as it should now be fixed as well.
>
> Hope this helps,
> Christian
>
> [1] http://docs.basex.org/wiki/Server_Protocol
> [2] https://github.com/BaseXdb/basex/issues/173
>
> > __________________________
> >
> > On Mon, Aug 29, 2011 at 9:50 AM, Laurent Chevalier l.chevalier@cyim.com wrote:
> >> Hi,
> >>
> >> A deadlock occurs in the following situation: a first client program opens an iterative query. For each iteration, this program does some processing and sends another reading request to BaseX (using another BaseX session). All works fine until a second client program (or another thread) sends an updating command to BaseX (like optimize for instance). This locks BaseX server. To unlock it, you have to kill the first program.
> >>
> >> I have read BaseX server code and found the reason for this behavior in the class org.basex.core.Lock:
> >> - with the iterative query, there is always at least one reader alive (readers=1).
> >> - when the updating query is received, it is put in the queue (index 0) and remains in it as long as there is a reading query running (that is to say, as long as the iterative reading query is running).
> >> - then a second reading request is received, it is put in the queue (index 1 as there is already the updating query in the queue). As it is only the second item of the queue, it remains in the queue as long as the first item in the queue (the updating query) has not been processed (BaseX processes the requests in the order of arrival, FIFO queue). But this first item can not be processed because there is the iterative reading query running. All queries are thus locked.
> >>
> >> Some may say that we should not send another query while we are in the loop of an iterative query but in our context of many sites being developed by several developers, it is possible that a developer codes this and we do not want BaseX to be locked in this case (whatever it is a mistake of the developer or not).
> >>
> >> I have found a solution to this problem by modifying the org.basex.core.Lock class. You will find my code hereafter. I do not use a queue anymore and i use a static mutex (called queueMutex) to synchronize all pending queries (threads). The "drawback" of this solution is that the queries are not processed anymore in the order of arrival but randomly.
> >>
> >> What do you think of this solution ? Do you plan to update BaseX locking mechanism ?
> >>
> >> I'm using BaseX 6.7.1 but I have seen that Lock.java has not been changed in BaseX 6.7.2.
> >>
> >> Here is my code :
> >>
> >> package org.basex.core;
> >>
> >> import java.util.Date;
> >> //import java.util.LinkedList;
> >> import java.util.Random;
> >>
> >> import org.basex.util.Util;
> >>
> >> /**
> >>  * Management of executing read/write processes.
> >>  * Supports multiple readers, limited by {@link MainProp#PARALLEL},
> >>  * and single writers (readers/writer lock).
> >>  *
> >>  * @author BaseX Team 2005-11, BSD License
> >>  * @author Christian Gruen
> >>  */
> >> final class Lock {
> >>   /** Queue for all waiting processes. */
> >>   // private final LinkedList<Object> queue = new LinkedList<Object>();
> >>   /** Mutex object. */
> >>   private final Object mutex = new Object();
> >>   /** Database context. */
> >>   private final Context ctx;
> >>   /** Static mutex used to synchronize all pending queries. **/
> >>   private final static Object queueMutex = new Object();
> >>
> >>   /** Number of active readers. */
> >>   private int readers;
> >>   /** Writer flag. */
> >>   private boolean writer;
> >>
> >>   /**
> >>    * Default constructor.
> >>    * @param c context
> >>    */
> >>   Lock(final Context c) {
> >>     ctx = c;
> >>   }
> >>
> >>   /**
> >>    * Modifications before executing a command.
> >>    * @param w writing flag
> >>    */
> >>   void lock(final boolean w) {
> >>     synchronized(mutex) {
> >>       int code = new Random(new Date().getTime()).nextInt();
> >>       // final Object o = new Object();
> >>       // queue.add(o);
> >>
> >>       try {
> >>         while(true) {
> >>           synchronized(queueMutex) {
> >>             // if(o == queue.get(0) && !writer) {
> >>             if(!writer) {
> >>               if(w) {
> >>                 if(readers == 0) {
> >>                   writer = true;
> >>                   break;
> >>                 }
> >>               } else if(readers < Math.max(ctx.mprop.num(MainProp.PARALLEL), 1)) {
> >>                 ++readers;
> >>                 break;
> >>               }
> >>             }
> >>           }
> >>           mutex.wait();
> >>         }
> >>       } catch(final InterruptedException ex) {
> >>         Util.stack(ex);
> >>       }
> >>
> >>       // queue.remove(0);
> >>     }
> >>   }
> >>
> >>   /**
> >>    * Modifications after executing a command.
> >>    * @param w writing flag
> >>    */
> >>   synchronized void unlock(final boolean w) {
> >>     synchronized(mutex) {
> >>       if(w) {
> >>         writer = false;
> >>       } else {
> >>         --readers;
> >>       }
> >>       mutex.notifyAll();
> >>     }
> >>   }
> >> }
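For reference, the queueing behavior described in the quoted original report can be modelled without any threads. The following is only a minimal sketch and not BaseX code: the class FifoLockModel and its methods are hypothetical names, and the MainProp#PARALLEL reader limit is left out for brevity. It applies the rule "only the head of the queue may be admitted, and a writer needs readers == 0" and shows that once a writer sits behind an open reader, a later reader can no longer get in.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Single-threaded model of a FIFO readers/writer admission rule (hypothetical, not BaseX code). */
public class FifoLockModel {
  /** Pending requests in arrival order: true = writer, false = reader. */
  private final Deque<Boolean> queue = new ArrayDeque<>();
  /** Number of admitted readers that have not been released yet. */
  private int readers;
  /** True while a writer is admitted. */
  private boolean writer;

  /** Registers a request at the tail of the queue. */
  public void enqueue(boolean write) {
    queue.addLast(write);
  }

  /** Tries to admit the request at the head of the queue; returns true on success. */
  public boolean admitHead() {
    if (queue.isEmpty() || writer) return false;
    boolean write = queue.peekFirst();
    if (write && readers > 0) return false; // a writer must wait for all readers
    queue.removeFirst();
    if (write) writer = true; else readers++;
    return true;
  }

  /** Releases one admitted reader. */
  public void releaseReader() {
    readers--;
  }

  /** Number of requests still waiting. */
  public int pending() {
    return queue.size();
  }

  public static void main(String[] args) {
    FifoLockModel lock = new FifoLockModel();
    lock.enqueue(false);   // iterative query, stays open during the loop
    lock.admitHead();      // admitted: readers = 1
    lock.enqueue(true);    // updating command (e.g. optimize)
    lock.enqueue(false);   // nested read sent from inside the iteration
    System.out.println(lock.admitHead()); // head is the writer, blocked by the open reader
    System.out.println(lock.pending());   // the nested read is stuck behind it
  }
}
```

In this model the nested read can only proceed after the first reader is released, but the first reader is waiting for the nested read to finish, which is the cycle described in the report.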
Laurent,
thanks for your feedback. I am surprised that the deadlock problem hasn't been fixed with 7.0, as all API database operations should now be atomic. Just in case.. Could you tell me if you are also encountering the locking issue with the unmodified client?
All the best, Christian _________________________________________
On Thu, Oct 27, 2011 at 6:51 PM, Laurent Chevalier l.chevalier@cyim.com wrote:
Hi Christian,
Thank you for your mail. I updated my client code to use the new BaseX 7.0 release last week. It was not too painful. I'm not caching the results. With my application, I still have the deadlock problem with BaseX 7.0, so I'm still using the Lock class fix, but I failed to reproduce the problem with a small piece of Java test code. So it's not yet certain that the problem comes from BaseX; it may come from my .NET client. Tomorrow, I will translate the Java test code into VB.NET and keep you informed.
If you want, I can also send you an updated version of the ClientQuery class that does not use the cache.
Best regards, Laurent
-----Original Message----- From: Christian Grün [mailto:christian.gruen@gmail.com] Sent: Thursday, 27 October 2011 00:27 To: Laurent Chevalier Cc: basex-talk@mailman.uni-konstanz.de Subject: Re: [basex-talk] BaseX server deadlock
Dear Laurent,
now that we've officially released our new iterator concept.. Have you been successful with optimizing the BaseX client for your system architecture? What are the current bottlenecks?
Christian ___________________________
On Tue, Sep 20, 2011 at 9:25 AM, Laurent Chevalier l.chevalier@cyim.com wrote:
Hi Christian,
Well, I know that memory consumption is an issue, as we are already fighting with it in our current system. It's simply our main issue... So, I will adjust the client code. It's good to have a performance improvement. I hope I will not have problems with reading data from the socket chunk by chunk over a long period.
Regards, Laurent
-----Original Message----- From: Christian Grün [mailto:christian.gruen@gmail.com] Sent: Monday, 19 September 2011 22:12 To: Laurent Chevalier Cc: basex-talk@mailman.uni-konstanz.de Subject: Re: [basex-talk] BaseX server deadlock
Laurent,
thanks for the elaborate description of your system architecture. I'm still quite positive that our new architecture shouldn't seriously set you back, and I'd claim that our caching architecture is pretty memory efficient, so I would suggest first doing some tests with the new iterator to evaluate whether caching is the main issue (sorry for persisting; maybe you've already spent enough time on this anyway).
If the client-side caching turns out to waste too many resources, you could easily adjust the light-weight client code to fit your needs. All you have to do is to directly interpret the incoming results, and skip the remaining results if you have finished querying (see [1] for the Java client). In both cases, querying should at least be much faster than before, and the client-based adjustments won't open many sophisticated issues that would have to be resolved server-side.
Hope this helps; more feedback is welcome, Christian
[1] https://github.com/BaseXdb/basex-api/blob/master/src/main/java/BaseXClient.java ___________________________
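The suggestion above (interpret incoming results directly, then skip the rest once done) might look roughly like the following. This is only a sketch under stated assumptions: LazyItemReader is a hypothetical class, not part of the BaseX client, and the framing used here (each item terminated by a zero byte, an empty item marking the end of the result) is an assumption made for the example rather than a description of the actual server protocol.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Reads zero-terminated UTF-8 items one at a time instead of caching them all (hypothetical). */
public class LazyItemReader implements AutoCloseable {
  private final InputStream in;
  /** Set once the end marker or the end of the stream has been seen. */
  private boolean exhausted;

  public LazyItemReader(InputStream in) {
    this.in = in;
  }

  /** Returns the next item, or null after the end marker (empty item) or end of stream. */
  public String next() {
    if (exhausted) return null;
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    try {
      int b;
      while ((b = in.read()) > 0) buf.write(b); // stop at the 0 terminator or at EOF (-1)
      if (b < 0) exhausted = true;
    } catch (IOException ex) {
      throw new UncheckedIOException(ex);
    }
    if (buf.size() == 0) {
      exhausted = true;
      return null;
    }
    return new String(buf.toByteArray(), StandardCharsets.UTF_8);
  }

  /** Skips all items the caller did not consume, leaving the stream ready for the next request. */
  @Override
  public void close() {
    while (next() != null) { /* discard */ }
  }

  public static void main(String[] args) throws IOException {
    byte[] wire = "a\0b\0c\0\0".getBytes(StandardCharsets.UTF_8);
    InputStream in = new ByteArrayInputStream(wire);
    try (LazyItemReader reader = new LazyItemReader(in)) {
      System.out.println(reader.next()); // only the first item is materialized
    }
    System.out.println(in.available()); // remaining items were skipped on close
  }
}
```

Only one item is held in memory at a time, and closing the reader early drains the unread items so that the connection is not left in the middle of a response.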
On Mon, Sep 19, 2011 at 7:04 PM, Laurent Chevalier l.chevalier@cyim.com wrote:
Hi Christian,
We are building web applications in MS .NET. The data is made of a hierarchy of containers. A container is a directory containing an XML file and attachments (static resources like pictures, videos, etc.). These containers are "indexed" in a database for better performance. Currently, we are using an SQL Server database with XQuery/XPath and full-text search. I'm currently working on a new implementation with BaseX. Our goal is to simplify XQuery writing (today, we have to mix XQuery into SQL queries, which is a bit complicated) and, if possible, we would like to get better performance.
The biggest database I have to deal with today counts around 25000 containers and continues to grow. It contains medical events data, HTML articles, news, agenda, members directory, etc. The size of the BaseX database directory with indexes is 160 MB.
We want to keep the database synchronized with the file system hierarchy. For instance, if you manually add a container in the file system, you can launch a "re-indexing" process that will update the database automatically. For this process, I iterate over all containers in the database and check whether each one has to be updated or not. I'm using an iterative query for this. This query is very basic as it only returns a list of strings (the identifiers of the containers) of 255 characters max. But if you multiply 255 by the number of containers, it starts to add up.
We have other usages of iterative queries. Another example: access control data is not stored in the database. So, if I want, for instance, the first 10 accessible containers in a given website section, I will loop over the containers published in this section in the database, and return results as soon as I have found 10 accessible containers, ignoring the remaining ones (provided by the BaseX query).
With SQL Server clients, we have an equivalent of BaseX iterative queries that avoids caching the whole request results. Memory consumption is a very serious issue for web applications in MS .NET or Java.
With JDBC drivers, the fetch size can be set (http://www.oracle.com/technetwork/database/enterprise-edition/memory.pdf). With the PostgreSQL JDBC driver, cursors are used and multiple queries may be fired to get all results (http://abhirama.wordpress.com/2009/01/07/postgresql-jdbc-and-large-result-sets/).
I think the iterative query without client caching (as it was implemented in BaseX until version 6.7) was a really great feature and addressed a very common memory consumption issue.
BTW, I'm exploring two ways of using BaseX:
- either in client/server mode: the client (web site) communicates with the BaseX server through TCP,
- or embedded: I have generated a .NET assembly (DLL) with IKVM.NET and thus I can embed BaseX in a .NET application.
The client/server mode would be used for portals. The embedded mode might be interesting for single sites that do not share a database with others.
I hope we'll find a good solution solving both the deadlock issue and the client memory consumption issue.
Regards, Laurent
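The "first 10 accessible containers" use case in the message above comes down to an early-exit loop over a lazily evaluated result. A minimal sketch in Java, assuming the query results are exposed as a plain java.util.Iterator and the access check as a Predicate (FirstAccessible and take are hypothetical names, not BaseX or application code):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

/** Collects the first matching items from an iterator and stops pulling as soon as possible. */
public class FirstAccessible {
  /**
   * Pulls items from {@code results} until {@code limit} items pass the
   * {@code accessible} check or the iterator is exhausted. Items beyond the
   * last accepted one are never requested.
   */
  public static <T> List<T> take(Iterator<T> results, Predicate<T> accessible, int limit) {
    List<T> out = new ArrayList<>();
    while (out.size() < limit && results.hasNext()) {
      T item = results.next();
      if (accessible.test(item)) out.add(item);
    }
    return out;
  }

  public static void main(String[] args) {
    // Container identifiers stand in for query results; the predicate stands in
    // for an access check performed outside the database.
    Iterator<String> ids = Arrays.asList("c1", "c2", "c3", "c4", "c5").iterator();
    List<String> firstTwo = take(ids, id -> !id.equals("c2"), 2);
    System.out.println(firstTwo);      // accepted items
    System.out.println(ids.hasNext()); // the rest was never fetched
  }
}
```

With a caching client the whole result is transferred before this loop starts; with a streaming client only the items actually pulled are read, which is the memory difference discussed in this thread.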
Hi Christian,
I have finally succeeded in reproducing the deadlock problem with a unit test in Java, which you will find enclosed. I recommend reading notes 1, 2 and 3 in DeadlockTest.java. I had difficulty reproducing the problem because it happens only if the iterative query returns a minimum amount of data (see Note 3, line 69 in DeadlockTest.java).
The deadlock does not happen with the unmodified client. This seems normal, as that client fetches all results in one go and caches them.
However, I have to modify the client to avoid caching (to save memory). You will find the modified client enclosed. Four classes need to be changed: ClientQuery, ClientSession, Query and Session.
Best regards, Laurent
> -----Message d'origine----- > De : Christian Grün [mailto:christian.gruen@gmail.com] > Envoyé : lundi 19 septembre 2011 17:38 > À : Laurent Chevalier > Cc : basex-talk@mailman.uni-konstanz.de > Objet : Re: [basex-talk] BaseX server deadlock > > Hi Laurent, > > yes, the code has already been rewritten to reflect the new
Client
> API. As there were too many potential conflicts with the old
solution,
> this would have been happened sooner or later anyway. > > I'm sorry that you believe that the new solution might
conflict
with
> your existing architecture. I'd be interested in a few things
to
get
a
> better feeling if this problem cannot be solved in a different
way:
> > -- how much data do you iterate through (kb, mb or even more)? > -- how expensive are your queries? > -- note that the data will be cached by the client.. do you
use
the
> same machine for clients and servers? > -- I'd be interested in your first test results to see if your
worries
> get true.. As the data will be transferred much faster than
before
> (because of the single request to get the data), the new
architecture
> might turn out to be beneficial even in your case. Indeed I'm
quite
> convinced, after all, that most users will profit from the
changes.
Laurent,
thanks for putting some effort into a reproducible test case (the example also causes a deadlock on my machine). Your modified client code basically runs into the same kind of problem as the old iterative solution. What you would probably need to do is discard all pending results of an iterator before you launch a new updating query. There are several reasons why it's advisable to fetch all results before performing another update. One of them is that the internal database pointers used in one query might become invalid if an updating query is performed at the same time.
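The suggestion above, discarding the pending results of an iterator before starting an update, could look roughly like this on the client side. It is only a minimal sketch in plain Java: `PendingResults`, `discard` and `updateAfterDiscard` are hypothetical names, and a `java.util.Iterator` stands in for whatever result iterator the client really exposes.

```java
import java.util.Arrays;
import java.util.Iterator;

/** Hypothetical helper, not part of the BaseX API. */
final class PendingResults {
  /** Consumes and drops everything the iterator still has to deliver; returns the count. */
  static <T> int discard(final Iterator<T> pending) {
    int dropped = 0;
    while(pending.hasNext()) {
      pending.next();
      ++dropped;
    }
    return dropped;
  }

  /** Runs the update only after the read iterator has been exhausted. */
  static <T> void updateAfterDiscard(final Iterator<T> pending, final Runnable update) {
    discard(pending);
    update.run();
  }

  public static void main(final String[] args) {
    // Stand-in for an iterative query whose first result was already read.
    final Iterator<String> it = Arrays.asList("a", "b", "c").iterator();
    it.next();
    updateAfterDiscard(it, () -> System.out.println("update runs, reader finished"));
  }
}
```

The cost of this approach is that the unwanted results are still fetched before the update can start.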
Another solution would be to first cache all query results on the server before they are sent over to the client. This means, however, that the whole query has to be evaluated before the results can be sent over the network, which would introduce another delay. Moreover, the server-side caching might turn out to be a memory hog if numerous clients communicate with the server at the same time, or never fetch their results.
Maybe a related question: What are your criteria for canceling an iterative query? In other words, could you decide how many query results are needed before executing a query?
Christian ___________________________
Hi Christian,
Actually, my real code no longer sends any query within the iterative query loop. I can inform other developers but, in my view, it should not be possible to deadlock the BaseX server whatever the developers do. We intend to use BaseX in a production environment.
I don't know BaseX well enough to tell whether my fix of the Lock class can be taken into account or not. But, with this fix, updating queries are still blocked while reading queries are running. So, there should not be any problem with the internal database pointers you mention in your mail. Just to be sure that we are talking about the same code, I am sending you my fix again (working with BaseX 7.0).
To answer your last question regarding the cancellation of an iterative query, there are two use cases:
- when possible, we use position() or subsequence() in the query to get only a part of the sequence, and so we iterate over all results of the iterative query,
- but, sometimes, we have to iterate until we get a given number of items depending on a condition that cannot be included in the XQuery, because the condition depends on data that is not stored in the database (user rights in our case).
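The second use case can be sketched as follows. This is an illustration only: `EarlyStop` and `firstAccessible` are hypothetical names, the `Iterator` stands in for the results of an iterative query, and the `Predicate` stands in for the access-rights check that lives outside the database.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical helper, not part of the BaseX API. */
final class EarlyStop {
  /**
   * Pulls results one by one and stops as soon as 'limit' items pass the check.
   * Results after the last accepted item are never requested from the iterator.
   */
  static <T> List<T> firstAccessible(final Iterator<T> results,
      final Predicate<T> accessible, final int limit) {
    final List<T> out = new ArrayList<>();
    while(out.size() < limit && results.hasNext()) {
      final T item = results.next();
      if(accessible.test(item)) out.add(item);
    }
    return out;
  }

  public static void main(final String[] args) {
    // Container identifiers as they might be returned by the query.
    final Iterator<String> ids = Arrays.asList("c1", "c2", "c3", "c4", "c5").iterator();
    // Stand-in for the external rights check: only even-numbered containers are accessible.
    final Predicate<String> check = id -> (id.charAt(1) - '0') % 2 == 0;
    System.out.println(firstAccessible(ids, check, 2));
  }
}
```

Because the number of results that must be read is only known while iterating, this case cannot be expressed with position() or subsequence() alone, and the rest of the iterator is left unread when the loop stops.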
Best regards, Laurent
package org.basex.core;

import java.util.LinkedList;
import org.basex.util.Util;

/**
 * Management of executing read/write processes.
 * Supports multiple readers, limited by {@link MainProp#PARALLEL},
 * and single writers (readers/writer lock).
 *
 * @author BaseX Team 2005-11, BSD License
 * @author Christian Gruen
 */
final class Lock {
  /** Queue for all waiting processes. */
  private final LinkedList<LockedCommand> queue = new LinkedList<LockedCommand>();
  /** Mutex object. */
  private final Object mutex = new Object();
  /** Database context. */
  private final Context ctx;

  /** Number of active readers. */
  private int readers;
  /** Writer flag. */
  private boolean writer;

  /**
   * Default constructor.
   * @param c context
   */
  Lock(final Context c) {
    ctx = c;
  }

  /**
   * Tells whether this command can be processed now or not.
   * @param o Pending command.
   * @return Permission to run command.
   */
  private boolean ok(LockedCommand o) {
    synchronized(mutex) {
      if ( writer ) return false;
      for(int i=0; i<queue.size(); i++) {
        if(o == queue.get(i)) {
          if(o.writer) {
            if(readers == 0) {
              writer = true;
              return true;
            }
          } else if(readers < Math.max(ctx.mprop.num(MainProp.PARALLEL), 1)) {
            ++readers;
            return true;
          }
        }
      }
      return false;
    }
  }

  /**
   * Modifications before executing a command.
   * @param w writing flag
   */
  void lock(final boolean w) {
    synchronized(mutex) {
      final LockedCommand o = new LockedCommand(w);
      queue.add(o);

      try {
        while(true) {
          if ( ok(o) ) break;
          mutex.wait();
        }
      } catch(final InterruptedException ex) {
        Util.stack(ex);
      }

      synchronized(mutex) {
        queue.remove(o);
      }
    }
  }

  /**
   * Modifications after executing a command.
   * @param w writing flag
   */
  synchronized void unlock(final boolean w) {
    synchronized(mutex) {
      if(w) {
        writer = false;
      } else {
        --readers;
      }
      mutex.notifyAll();
    }
  }
}
package org.basex.core;

/**
 * Locked command stored in Lock class queue.
 * @author Laurent Chevalier
 */
final class LockedCommand {
  /** Writer flag. */
  public boolean writer;

  /**
   * Default constructor.
   * @param w Writer flag. Tells whether it is an updating command (true) or not (false).
   */
  LockedCommand(final boolean w) {
    this.writer = w;
  }
}
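
The difference between the original FIFO rule and the fix above can be checked with a small single-threaded model. This is a sketch under assumptions, not BaseX code: `LockModel`, `admissible` and the `fifo` flag are hypothetical, the queue holds plain booleans instead of LockedCommand objects, and the value 8 for the reader limit is arbitrary.

```java
import java.util.ArrayList;
import java.util.List;

/** Single-threaded model of the admission rule; hypothetical, not BaseX code. */
final class LockModel {
  /** Waiting requests in arrival order: true = writer, false = reader. */
  final List<Boolean> queue = new ArrayList<>();
  /** Number of active readers. */
  int readers;
  /** Writer flag. */
  boolean writer;
  /** Maximum number of parallel readers (stands in for MainProp.PARALLEL). */
  final int parallel;

  LockModel(final int parallel) {
    this.parallel = Math.max(parallel, 1);
  }

  /**
   * Tells whether the request at queue position i may start now.
   * With fifo = true only the head of the queue is considered.
   */
  boolean admissible(final int i, final boolean fifo) {
    if(writer) return false;
    if(fifo && i != 0) return false;
    return queue.get(i) ? readers == 0 : readers < parallel;
  }

  public static void main(final String[] args) {
    final LockModel m = new LockModel(8);
    m.readers = 1;       // the iterative query is still open
    m.queue.add(true);   // index 0: updating command
    m.queue.add(false);  // index 1: second read request
    System.out.println("FIFO:    writer=" + m.admissible(0, true)
        + " reader=" + m.admissible(1, true));
    System.out.println("no FIFO: writer=" + m.admissible(0, false)
        + " reader=" + m.admissible(1, false));
  }
}
```

In this model the FIFO rule admits neither request while the first reader is open, whereas the position-independent rule lets the second reader through and keeps the writer waiting until no reader is left. The trade-off is that, without ordering, a steady stream of readers could in principle postpone a writer for a long time.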
-----Message d'origine----- De : Christian Grün [mailto:christian.gruen@gmail.com] Envoyé : vendredi 28 octobre 2011 18:26 À : Laurent Chevalier Cc : basex-talk@mailman.uni-konstanz.de Objet : Re: [basex-talk] BaseX server deadlock
Laurent,
thanks for putting some effort into a reproducible test case (the example also causes a deadlock on my machine). Your modified client code basically runs into similar problems as the old iterative solution. What you would probably need to do is discard all pending results of an iterator before you launch a new updating query. There are several reasons why it's advisable to fetch all results before performing another update. One of them is that the internal database pointers used in one query might get invalid if an updating query is performed at the same time.
Another solution would be to first cache all query results on the server before they are sent over to the client. This means, however, that the whole query has to be evaluated before the results can be sent over the network, which would introduce another delay (next, the server-side caching might turn out to be a memory hog if numerous clients communicate with the server at the same time, or don't even fetch their results).
Maybe a related question: What are your criterias for canceling an iterative query? In other words, could you decide how many query result are needed before executing a query?
Christian ___________________________
On Fri, Oct 28, 2011 at 4:00 PM, Laurent Chevalier l.chevalier@cyim.com wrote:
Hi Christian,
I have finally succeeded to reproduce the deadlock problem with a
unitary test code in Java that you will find enclosed. I recommend to read the notes 1, 2 and 3 in DeadlockTest.java. I encountered difficulties to reproduce the problem as it happens only if the iterative query returns a minimum amount of data (see Note 3, line 69 in DeadlockTest.java).
The deadlock problem does not happen with the unmodified client. It
seems normal as this client gets all results at one stroke and caches them.
But, I have to modify the client to avoid caching (to save memory).
You will find enclosed the modified client. Four classes need to be changed : ClientQuery, ClientSession, Query and Session.
Best regards, Laurent
-----Message d'origine----- De : Christian Grün [mailto:christian.gruen@gmail.com] Envoyé : jeudi 27 octobre 2011 19:02 À : Laurent Chevalier Cc : basex-talk@mailman.uni-konstanz.de Objet : Re: [basex-talk] BaseX server deadlock
Laurent,
thanks for your feedback. I am wondering that the deadlock problem hasn't been fixed with 7.0, as all API database operations should
now
be atomic. Just in case.. Could you tell me if you are also encountering the locking issue with the unmodified client?
All the best, Christian _________________________________________
On Thu, Oct 27, 2011 at 6:51 PM, Laurent Chevalier l.chevalier@cyim.com wrote:
Hi Christian,
Thank you for your mail. I have updated my client code to use the
new
BaseX 7.0 release last week. It was not too painful. I'm not caching the results. With my application, I still have the deadlock problem with BaseX 7.0 and so I'm still using the Lock class fix but I
failed
to reproduce the problem with a small test code in Java. So, it's
not
sure that the problem is coming from BaseX yet, it may come from my .NET client. Tomorrow, I will translate the Java test code in VB.NET and I keep you inform.
I you want, I can also send you an updated version of the
ClientQuery
class that is not using cache.
Best regards, Laurent
-----Message d'origine----- De : Christian Grün [mailto:christian.gruen@gmail.com] Envoyé : jeudi 27 octobre 2011 00:27 À : Laurent Chevalier Cc : basex-talk@mailman.uni-konstanz.de Objet : Re: [basex-talk] BaseX server deadlock
Dear Laurent,
now that we've officially released our new iterator concept..
Have
you
been successful with optimizing the BaseX client for your system architecture? What are the current bottlenecks?
Christian ___________________________
On Tue, Sep 20, 2011 at 9:25 AM, Laurent Chevalier l.chevalier@cyim.com wrote:
Hi Christian,
Well, I know that the memory consumption is an issue as we are
already fighting with it in our current system. It's just our
main
issue... So, I will adjust the client code. It's good to have a performance improvement. I hope I will not have problem with
reading
data from the socket chunk by chunk for a long time.
Regards, Laurent
Hi Laurent,
Thank you for your analysis and proposed solution.
First of all, I'd like to point out that the deadlock is not in the server, because there are no server sessions which wait for each other. There is just one session (the open iterator), which holds the lock and blocks all other sessions.
I'd say the deadlock is rather in the client application. Imagine a program which holds a lock (in our case the lock would be the BaseX database), then starts a new thread which wants the lock too, and waits for that thread to finish. Obviously, the program will never finish. Of course this is a very simplified description (no reader/writer locks), but it basically describes what you do.
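To make the reader/writer variant of this concrete, here is a minimal sketch in plain Java. It does not use BaseX at all: a fair `ReentrantReadWriteLock` from `java.util.concurrent` stands in for the FIFO behaviour of the server lock, and the class and method names (`FifoLockDemo`, `secondReaderGetsLock`) are made up for the illustration. The main thread plays the open iterator, one thread plays the updating command, and another thread plays the nested read request.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

final class FifoLockDemo {
  /** Returns true if a new reader obtains the lock while a writer is queued. */
  static boolean secondReaderGetsLock() {
    // Fair mode grants the lock roughly in arrival order (stand-in for a FIFO queue).
    final ReentrantReadWriteLock lock = new ReentrantReadWriteLock(true);
    lock.readLock().lock(); // the open iterator: a reader that stays alive

    // The updating command: queues up behind the active reader.
    final Thread writer = new Thread(() -> {
      try {
        lock.writeLock().lockInterruptibly();
        lock.writeLock().unlock();
      } catch (final InterruptedException ex) {
        // cancelled at the end of the demo
      }
    });
    writer.setDaemon(true);
    writer.start();

    try {
      // Wait until the writer is really waiting in the queue.
      while (!lock.hasQueuedThreads()) Thread.sleep(5);

      // The nested read request: arrives after the writer.
      final boolean[] acquired = new boolean[1];
      final Thread reader = new Thread(() -> {
        try {
          // The timed tryLock honours the fair ordering; the untimed one would barge.
          acquired[0] = lock.readLock().tryLock(200, TimeUnit.MILLISECONDS);
          if (acquired[0]) lock.readLock().unlock();
        } catch (final InterruptedException ex) {
          // leave acquired[0] as false
        }
      });
      reader.start();
      reader.join();
      return acquired[0];
    } catch (final InterruptedException ex) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(ex);
    } finally {
      writer.interrupt();        // give up the pending update
      lock.readLock().unlock();  // close the "iterator"
    }
  }

  public static void main(final String[] args) {
    System.out.println("second reader acquired lock: " + secondReaderGetsLock());
  }
}
```

Under these assumptions the second reader should time out: it waits behind the queued writer, and the writer waits for the first reader. If the first reader itself waits for the second one, nobody can make progress. The timeout is only there so that the sketch terminates.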
The problem which you describe can happen in a relational database system which preserves the order of transactions. However, an RDBMS normally has finer-grained locks (usually block-level locks), which significantly increases the concurrency of the system. In contrast, BaseX has a global server lock (which is necessary due to the wide variety of data manipulation operations which can be executed in a single transaction with XQuery).
Your implementation of the Lock class basically gives higher priority to the reader processes. While it solves your problem, it makes the server prone to starvation of the writer processes. Maybe we could add an interface for the locking implementation, so that the database administrator can select the appropriate implementation with a startup argument, but I think that the current implementation should remain the default.
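The starvation risk can be shown with a stripped-down reader-preference lock. This is only a sketch and not the BaseX class: `ReaderPreferenceLock` and its methods are hypothetical names, the PARALLEL limit on readers is left out, and the demo runs in a single thread so that it stays deterministic.

```java
final class ReaderPreferenceLock {
  /** Number of active readers. */
  private int readers;
  /** Writer flag. */
  private boolean writer;

  /** Blocks until the caller may run; readers never wait for pending writers. */
  synchronized void lock(final boolean w) throws InterruptedException {
    if (w) {
      while (writer || readers > 0) wait();
      writer = true;
    } else {
      while (writer) wait();
      ++readers;
    }
  }

  /** Releases the lock and wakes up all waiting threads. */
  synchronized void unlock(final boolean w) {
    if (w) writer = false;
    else --readers;
    notifyAll();
  }

  /** Probe for the demo: could a writer start right now? */
  synchronized boolean writerCouldStart() {
    return !writer && readers == 0;
  }

  /** Overlapping readers keep the reader count above zero the whole time. */
  static boolean writerStarvedWhileReadersOverlap() {
    try {
      final ReaderPreferenceLock lock = new ReaderPreferenceLock();
      boolean starved = true;
      lock.lock(false);                        // a first reader starts
      for (int round = 0; round < 5; round++) {
        lock.lock(false);                      // the next reader starts ...
        starved &= !lock.writerCouldStart();
        lock.unlock(false);                    // ... before the previous one ends
        starved &= !lock.writerCouldStart();
      }
      lock.unlock(false);                      // the last reader ends
      return starved && lock.writerCouldStart();
    } catch (final InterruptedException ex) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(ex);
    }
  }

  public static void main(final String[] args) {
    System.out.println("writer starved: " + writerStarvedWhileReadersOverlap());
  }
}
```

As long as read requests overlap, there is no moment at which a writer may start. A FIFO queue avoids this by making later readers wait behind the writer, which is exactly what leads to the blocking described in this thread. Neither policy is free.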
As you can see, the world of XML databases although similar to the RDBMS world has some peculiarities, which should be taken into account. However, IMHO, two approaches from the RDBMS world could be useful in your case:
1) holding locks for a long time should be avoided
2) don't start a read query for each record from the iterator; instead do the whole data processing in the database server (XQuery is much more powerful than SQL, and database servers are designed exactly for the purpose of data processing). Additionally, this approach will further decrease your memory consumption requirements, because the iterator records will not be sent to the client.
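When part of the condition cannot be expressed in XQuery (for example access rights that live outside the database), one way to follow point 1 is to replace the long-lived iterator with several short queries, each returning one window of the result via `subsequence()`. The sketch below assumes a hypothetical `runQuery` function that sends a query string to the server and returns the items; here it is replaced by an in-memory fake, and the query expression and all class and method names are placeholders.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

final class PagedLookup {
  /** Builds an XQuery that returns one window of the result sequence. */
  static String windowQuery(final String expr, final int start, final int size) {
    return "subsequence(" + expr + ", " + start + ", " + size + ")";
  }

  /** Collects the first 'wanted' accessible items, one short query per page. */
  static List<String> firstAccessible(final String expr, final int wanted,
      final int pageSize, final Function<String, List<String>> runQuery,
      final Predicate<String> accessible) {
    final List<String> hits = new ArrayList<>();
    for (int start = 1; hits.size() < wanted; start += pageSize) {
      // Each call is a complete query, so no lock is held between pages.
      final List<String> page = runQuery.apply(windowQuery(expr, start, pageSize));
      for (final String id : page) {
        if (accessible.test(id)) {
          hits.add(id);
          if (hits.size() == wanted) break;
        }
      }
      if (page.size() < pageSize) break; // last page reached
    }
    return hits;
  }

  /** Demo with an in-memory fake instead of a real server. */
  static List<String> demo() {
    final List<String> ids = new ArrayList<>();
    for (int i = 1; i <= 10; i++) ids.add("c" + i);

    // Fake server: parses start and size back out of the generated query.
    final Function<String, List<String>> fakeServer = q -> {
      final String[] parts = q.substring(0, q.length() - 1).split(", ");
      final int from = Integer.parseInt(parts[1]) - 1;
      final int to = Math.min(ids.size(), from + Integer.parseInt(parts[2]));
      final List<String> page = new ArrayList<>();
      if (from < to) page.addAll(ids.subList(from, to));
      return page;
    };
    // Fake access check: only even-numbered containers are accessible.
    final Predicate<String> accessible = id -> Integer.parseInt(id.substring(1)) % 2 == 0;
    return firstAccessible("//container/@id/string()", 3, 4, fakeServer, accessible);
  }

  public static void main(final String[] args) {
    System.out.println(demo());
  }
}
```

The trade-off is that the pages are not evaluated in one transaction: if an update runs between two pages, positions may shift, so this only fits cases where that is acceptable.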
I hope my comments will be helpful.
Best regards, Dimitar
On Wednesday, November 2, 2011, 01:57:22, Laurent Chevalier wrote:
Hi Christian,
Actually, my real code no longer sends any query within the iterative query loop. I can inform other developers, but in my view it should not be possible to deadlock the BaseX server, whatever the developers do. We intend to use BaseX in a production environment.
I don't know BaseX well enough to tell whether my fix of the Lock class can be taken into account or not. But with this fix, updating queries are still blocked while reading queries are running. So there should not be any problem with the internal database pointers you mention in your mail. Just to be sure that we are talking about the same code, I am sending you my fix again (working with BaseX 7.0).
To answer your last question regarding the cancellation of an iterative query, there are two use cases:
- when possible, we use position() or subsequence() in the query to get only a part of the sequence, and so we iterate over all results of the iterative query,
- but sometimes we have to iterate until we get a given number of items depending on a condition that cannot be included in the XQuery, because the condition depends on data that is not stored in the database (user rights in our case).
Best regards, Laurent
package org.basex.core;

import java.util.LinkedList;

import org.basex.util.Util;

/**
 * Management of executing read/write processes.
 * Supports multiple readers, limited by {@link MainProp#PARALLEL},
 * and single writers (readers/writer lock).
 *
 * @author BaseX Team 2005-11, BSD License
 * @author Christian Gruen
 */
final class Lock {
  /** Queue for all waiting processes. */
  private final LinkedList<LockedCommand> queue = new LinkedList<LockedCommand>();
  /** Mutex object. */
  private final Object mutex = new Object();
  /** Database context. */
  private final Context ctx;

  /** Number of active readers. */
  private int readers;
  /** Writer flag. */
  private boolean writer;

  /**
   * Default constructor.
   * @param c context
   */
  Lock(final Context c) {
    ctx = c;
  }

  /**
   * Tells whether this command can be processed now or not.
   * @param o Pending command.
   * @return Permission to run command.
   */
  private boolean ok(LockedCommand o) {
    synchronized(mutex) {
      if(writer) return false;
      // Any queued command may proceed, not only the head of the queue
      // (the earlier version tested o == queue.get(0)).
      for(int i = 0; i < queue.size(); i++) {
        if(o == queue.get(i)) {
          if(o.writer) {
            if(readers == 0) {
              writer = true;
              return true;
            }
          } else if(readers < Math.max(ctx.mprop.num(MainProp.PARALLEL), 1)) {
            ++readers;
            return true;
          }
        }
      }
      return false;
    }
  }

  /**
   * Modifications before executing a command.
   * @param w writing flag
   */
  void lock(final boolean w) {
    synchronized(mutex) {
      final LockedCommand o = new LockedCommand(w);
      queue.add(o);

      try {
        while(true) {
          if(ok(o)) break;
          mutex.wait();
        }
      } catch(final InterruptedException ex) {
        Util.stack(ex);
      }
      synchronized(mutex) {
        queue.remove(o);
      }
    }
  }

  /**
   * Modifications after executing a command.
   * @param w writing flag
   */
  synchronized void unlock(final boolean w) {
    synchronized(mutex) {
      if(w) {
        writer = false;
      } else {
        --readers;
      }
      mutex.notifyAll();
    }
  }
}

package org.basex.core;

/**
 * Locked command stored in Lock class queue.
 *
 * @author Laurent Chevalier
 */
final class LockedCommand {
  /** Writer flag. */
  public boolean writer;

  /**
   * Default constructor.
   * @param w Writer flag. Tells whether it is an updating command (true) or
   * not (false).
   */
  LockedCommand(final boolean w) {
    this.writer = w;
  }
}

-----Original Message-----
From: Christian Grün [mailto:christian.gruen@gmail.com]
Sent: Friday, October 28, 2011 18:26
To: Laurent Chevalier
Cc: basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] BaseX server deadlock
Laurent,
thanks for putting some effort into a reproducible test case (the example also causes a deadlock on my machine). Your modified client code basically runs into similar problems as the old iterative solution. What you would probably need to do is discard all pending results of an iterator before you launch a new updating query. There are several reasons why it's advisable to fetch all results before performing another update. One of them is that the internal database pointers used in one query might become invalid if an updating query is performed at the same time.
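The "discard all pending results" idea can be sketched on the client side as a small wrapper that reads and throws away whatever the caller did not consume when it is closed. This is not part of the BaseX client API; `DrainingResults` is a hypothetical name, and a plain `Iterator<String>` stands in for the stream of results coming from the server.

```java
import java.util.Arrays;
import java.util.Iterator;

final class DrainingResults implements Iterator<String>, AutoCloseable {
  /** Underlying result stream (stand-in for the server connection). */
  private final Iterator<String> source;
  /** Number of items that were skipped on close. */
  private int discarded;

  DrainingResults(final Iterator<String> source) {
    this.source = source;
  }

  @Override
  public boolean hasNext() {
    return source.hasNext();
  }

  @Override
  public String next() {
    return source.next();
  }

  /** Consumes all unread items, so that nothing stays pending. */
  @Override
  public void close() {
    while (source.hasNext()) {
      source.next();
      ++discarded;
    }
  }

  int discarded() {
    return discarded;
  }

  /** Reads one of four items, then closes; returns how many were skipped. */
  static int demo() {
    final DrainingResults results =
        new DrainingResults(Arrays.asList("a", "b", "c", "d").iterator());
    try (DrainingResults r = results) {
      r.next(); // the caller only needs the first item
    }
    return results.discarded();
  }

  public static void main(final String[] args) {
    System.out.println("discarded: " + demo());
  }
}
```

With try-with-resources the draining happens even if the loop is left early or an exception is thrown, so an updating query sent afterwards does not find a half-read iterator.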
Another solution would be to first cache all query results on the server before they are sent over to the client. This means, however, that the whole query has to be evaluated before the results can be sent over the network, which would introduce another delay (next, the server-side caching might turn out to be a memory hog if numerous clients communicate with the server at the same time, or don't even fetch their results).
Maybe a related question: What are your criteria for canceling an iterative query? In other words, could you decide how many query results are needed before executing a query?
Christian ___________________________
On Fri, Oct 28, 2011 at 4:00 PM, Laurent Chevalier l.chevalier@cyim.com wrote:
Hi Christian,
I have finally succeeded in reproducing the deadlock problem with a Java unit test, which you will find enclosed. I recommend reading notes 1, 2 and 3 in DeadlockTest.java. I had difficulty reproducing the problem because it happens only if the iterative query returns a minimum amount of data (see Note 3, line 69 in DeadlockTest.java).
The deadlock problem does not happen with the unmodified client. This seems normal, as this client gets all results in one go and caches them.
But I have to modify the client to avoid caching (to save memory). You will find the modified client enclosed. Four classes need to be changed: ClientQuery, ClientSession, Query and Session.
Best regards, Laurent
-----Original Message-----
From: Christian Grün [mailto:christian.gruen@gmail.com]
Sent: Thursday, October 27, 2011 19:02
To: Laurent Chevalier
Cc: basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] BaseX server deadlock
Laurent,
thanks for your feedback. I am surprised that the deadlock problem hasn't been fixed with 7.0, as all API database operations should now be atomic. Just in case.. Could you tell me if you are also encountering the locking issue with the unmodified client?
All the best, Christian _________________________________________
On Thu, Oct 27, 2011 at 6:51 PM, Laurent Chevalier l.chevalier@cyim.com wrote:
Hi Christian,
Thank you for your mail. I updated my client code last week to use the new BaseX 7.0 release. It was not too painful. I'm not caching the results. With my application, I still have the deadlock problem with BaseX 7.0, so I'm still using the Lock class fix, but I failed to reproduce the problem with a small Java test. So it is not yet certain that the problem comes from BaseX; it may come from my .NET client. Tomorrow, I will translate the Java test code to VB.NET and keep you informed.
If you want, I can also send you an updated version of the ClientQuery class that does not use the cache.
Best regards, Laurent
-----Original Message-----
From: Christian Grün [mailto:christian.gruen@gmail.com]
Sent: Thursday, October 27, 2011 00:27
To: Laurent Chevalier
Cc: basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] BaseX server deadlock
Dear Laurent,
now that we've officially released our new iterator concept.. Have you been successful with optimizing the BaseX client for your system architecture? What are the current bottlenecks?
Christian ___________________________
On Tue, Sep 20, 2011 at 9:25 AM, Laurent Chevalier l.chevalier@cyim.com wrote:
> Hi Christian,
>
> Well, I know that memory consumption is an issue, as we are already fighting with it in our current system. It is simply our main issue... So I will adjust the client code. It's good to have a performance improvement. I hope I will not have problems with reading data from the socket chunk by chunk over a long time.
> Regards, > Laurent > >> -----Message d'origine----- >> De : Christian Grün [mailto:christian.gruen@gmail.com] >> Envoyé : lundi 19 septembre 2011 22:12 >> À : Laurent Chevalier >> Cc : basex-talk@mailman.uni-konstanz.de >> Objet : Re: [basex-talk] BaseX server deadlock >> >> Laurent, >> >> thanks for the elaborate description of your system
architecture.
I'm
>> still quite positive that our new architecture >> shouldn't
seriously
set
>> you back, and I'd claim that our caching architecture >> is
pretty
memory
>> efficient, so I would suggest to first do some tests >> with the
new
>> iterator to evaluate if caching is the main issue >> (sorry for >> persisting; maybe you've already spent enough time in >> this
anyway).
>> If the client-side caching turns out to waste too many
resources,
you
>> could easily adjust the light-weight client code to >> fit your
needs.
>> All you have to do is to directly interpret the >> incoming
results,
and
>> skip the remaining results if you have finished >> querying (see
[1]
for
>> the Java client). In both cases, querying should at >> least be
much
>> faster than before, and the client-based adjustments >> won't
open
many
>> sophisticated issues that would have to be resolved >> server-
side.
>> Hope this helps; more feedback is welcome, >> Christian >> >> [1] https://github.com/BaseXdb/basex- >> api/blob/master/src/main/java/BaseXClient.java >> ___________________________ >> >> On Mon, Sep 19, 2011 at 7:04 PM, Laurent Chevalier >> >> l.chevalier@cyim.com wrote: >> > Hi Christian, >> > >> > We are building web applications in MS .NET. The >> > data is
made
of a
>> hierarchy of containers. A container is a directory >> containing
an
XML
>> file and attachments (static resources like pictures, >> videos,
etc.).
>> These containers are "indexed" in a database for >> better
performance.
>> Currently, we are using an SQL Server database with
XQuery/XPath
and
>> fulltext search. I'm currently working on a new >> implementation
with
>> BaseX. Our goal is to simplify xquery writing (today, >> we have
to
mix
>> xquery in sql queries which is a bit complicated), >> and, if
possible,
we
>> would like to get better performances. >> >> > The biggest database I have to deal with today >> > counts around
25000
>> containers and continues to grow. It contains medical >> events
data,
html
>> articles, news, agenda, members directory, etc. The >> size of
the
BaseX
>> database directory with indexes is 160 Mo. >> >> > We want to keep the database synchronized with the >> > file
system
>> hierarchy. For instance, if you manually add a >> container in
the
file
>> system, you can launch a "re-indexing" process that >> will
update
the
>> database automatically. For this process, I iterate >> over all
containers
>> in database, and check if it has to be updated or not. >> I'm
using
an
>> iterative query for this. This query is very basic as >> it only
returns a
>> list of string (the identifiers of the containers) of >> 255
characters
>> max. But, if you multiply 255 by the number of >> containers,
it's
>> starting to do much. >> >> > We have other usages of iterative queries. Another example : control
>> access data is not stored in the database. So, if I >> want, for
instance,
>> the first 10 accessible containers in a given website >> section,
I
will
>> loop over the containers published in this section in >> the
database,
and
>> return results as soon as I have found 10 accessible
containers,
>> ignoring the remaining ones (provided by the BaseX >> query). >> >> > With SQL Server clients, we have an equivalent of >> > BaseX
iterative
>> queries that avoid caching the whole request results. >> The
memory
>> consumption is a very serious issue for web >> applications in MS
.NET
or
>> Java. >> >> > With JDBC drivers, the fetch size can be set >> >> (http://www.oracle.com/technetwork/database/enterprise >> - >> edition/memory.pdf). With PostgreSQL JDBC driver, >> cursors are
used
and
>> multiple queries may be fired to get all results >> (http://abhirama.wordpress.com/2009/01/07/postgresql-j >> dbc-and-
large-
>> result-sets/). >> >> > I think the iterative query without client caching >> > (as it
was
>> implemented in BaseX until version 6.7) was a really >> great
feature
and
>> addressed a very common memory consumption issue. >> >> > BTW, I'm exploring two ways of using BaseX : >> > - either in client/server mode : the client (web >> > site)
communicates
>> with the BaseX server through TCP, >> >> > - or embedded : I have generated a .NET assembly >> > (DLL) with
IKVM.NET
>> and thus I can embed BaseX in a .NET application. >> >> > The client/server mode would be used for portals. >> > The embedded mode might be interested for single >> > sites that
do
not
>> share database with others. >> >> > I hope we'll find a good solution solving both the >> > deadlock
issue
and
>> the client memory consumption issue. >> >> > Regards, >> > Laurent >> > >> >> -----Message d'origine----- >> >> De : Christian Grün >> >> [mailto:christian.gruen@gmail.com] >> >> Envoyé : lundi 19 septembre 2011 17:38 >> >> À : Laurent Chevalier >> >> Cc : basex-talk@mailman.uni-konstanz.de >> >> Objet : Re: [basex-talk] BaseX server deadlock >> >> >> >> Hi Laurent, >> >> >> >> yes, the code has already been rewritten to >> >> reflect the new
Client
>> >> API. As there were too many potential conflicts >> >> with the
old
>> solution, >> >> >> this would have been happened sooner or later >> >> anyway. >> >> >> >> I'm sorry that you believe that the new solution >> >> might
conflict
with
>> >> your existing architecture. I'd be interested in >> >> a few
things
to
get
>> a >> >> >> better feeling if this problem cannot be solved >> >> in a
different
way: >> >> -- how much data do you iterate through (kb, mb >> >> or even
more)?
>> >> -- how expensive are your queries? >> >> -- note that the data will be cached by the >> >> client.. do you
use
the
Hi Dimitar,
Thank you for taking the time to look at this issue. Regarding your proposals:
1) "holding locks for long time should be avoided": Sure, that is better. We could also count the number of results and fetch them one by one without using an iterative query, but doing so would complicate the code and degrade performance.
2) "don't start a read query for each record from the iterator": I agree that this kind of practice should be discouraged.
2b) "instead do the whole data processing in the database server": To be considered on a case-by-case basis.
Best regards, Laurent
-----Original Message-----
From: Dimitar Popov [mailto:Dimitar.Popov@uni-konstanz.de]
Sent: Thursday, 3 November 2011 14:31
To: basex-talk@mailman.uni-konstanz.de
Cc: Laurent Chevalier; Christian Grün
Subject: Re: [basex-talk] BaseX server deadlock
Hi Laurent,
Thank you for your analysis and proposed solution.
First of all, I'd like to point out that the deadlock is not in the server, because there are no server sessions which wait for each other. There is just one session (the open iterator), which holds the lock and blocks all other sessions.
I'd say the deadlock is rather in the client application. Imagine a program which holds a lock (in our case, the lock would be the BaseX database) and then starts a new thread which wants the lock too, and waits for that thread to finish. Obviously, the program will never finish. Of course, this is a very simplified description (no reader/writer locks), but it basically describes what you do.
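The simplified scenario above can be sketched in a few lines of Java. This is an illustration only and not BaseX code: the `ReentrantLock` stands in for the global database lock, and the class and method names are made up for the example. The helper uses a timed `tryLock` so that the demonstration terminates; with a plain `lock()` call the program would hang exactly as described.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantLock;

class ClientSideDeadlockSketch {
  /** Returns true if the helper thread managed to take the lock (it should not). */
  static boolean helperGotLock(final long timeoutMs) {
    // Stand-in for the global database lock (assumption, not the BaseX Lock class).
    final ReentrantLock databaseLock = new ReentrantLock();
    final AtomicBoolean acquired = new AtomicBoolean(false);

    databaseLock.lock(); // the "open iterator" holds the lock
    try {
      final Thread helper = new Thread(() -> {
        try {
          // A plain lock() would block forever here; the timeout keeps the demo finite.
          if (databaseLock.tryLock(timeoutMs, TimeUnit.MILLISECONDS)) {
            acquired.set(true);
            databaseLock.unlock();
          }
        } catch (final InterruptedException ex) {
          Thread.currentThread().interrupt();
        }
      });
      helper.start();
      try {
        helper.join(); // the lock holder waits for the helper while still holding the lock
      } catch (final InterruptedException ex) {
        Thread.currentThread().interrupt();
      }
    } finally {
      databaseLock.unlock();
    }
    return acquired.get();
  }

  public static void main(final String[] args) {
    System.out.println("helper acquired lock: " + helperGotLock(200));
  }
}
```

The helper never obtains the lock because its owner does not release it before waiting, which is the client-side pattern being discussed.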
The problem which you describe can happen in a relational database system which preserves the order of transactions. However, RDBMSs normally use finer-grained locks (usually block-level locks), which significantly increases the concurrency of the system. In contrast, BaseX has a global server lock (which is necessary due to the wide variety of data manipulation operations which can be executed in a single transaction with XQuery).
Your implementation of the Lock class basically gives higher priority to the reader processes. While it solves your problem, it makes the server prone to resource starvation of the writer processes. Maybe we could add an interface for the locking implementation, so that the database administrator can select the appropriate implementation with a startup argument, but I think that the current implementation should remain the default.
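The starvation risk can be made concrete with a small deterministic simulation. This is only a sketch under stated assumptions and not BaseX code: time advances in ticks, one reader arrives per tick and holds the lock for two ticks, and a single writer arrives at tick 1. The names `WriterStarvationSketch` and `writerAdmittedAt` are hypothetical. The FIFO mode mimics the original queue (nobody overtakes the first waiting entry), while the other mode admits any waiting command that is compatible with the current state, roughly like the proposed patch.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;

class WriterStarvationSketch {
  /**
   * Simulates the given number of ticks and returns the tick at which the
   * writer is admitted, or -1 if it is never admitted.
   */
  static int writerAdmittedAt(final boolean fifo, final int ticks) {
    final int readTicks = 2;                             // assumed duration of each read
    final Deque<Boolean> queue = new ArrayDeque<>();     // true = writer, false = reader
    final List<Integer> readerEnds = new ArrayList<>();  // end ticks of active readers

    for (int now = 0; now < ticks; now++) {
      final int t = now;
      readerEnds.removeIf(end -> end <= t);  // finished readers release the lock
      queue.add(false);                      // a new reader arrives every tick
      if (now == 1) queue.add(true);         // the single writer arrives at tick 1

      final Iterator<Boolean> it = queue.iterator();
      while (it.hasNext()) {
        final boolean isWriter = it.next();
        if (isWriter) {
          if (readerEnds.isEmpty()) return now;  // no active readers: writer admitted
          if (fifo) break;                       // FIFO: later entries may not overtake
        } else {
          readerEnds.add(now + readTicks);       // reader admitted (no writer is active)
          it.remove();
        }
      }
    }
    return -1;
  }

  public static void main(final String[] args) {
    System.out.println("FIFO queue:        writer admitted at tick " + writerAdmittedAt(true, 50));
    System.out.println("any-order admission: writer admitted at tick " + writerAdmittedAt(false, 50));
  }
}
```

In this model the FIFO queue admits the writer at tick 3, once the readers that were already running have finished. With any-order admission, the overlapping readers keep the reader count above zero and the writer is never admitted within the simulated window.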
As you can see, the world of XML databases although similar to the RDBMS world has some peculiarities, which should be taken into account. However, IMHO, two approaches from the RDBMS world could be useful in your case:
1) holding locks for long time should be avoided
2) don't start a read query for each record from the iterator; instead do the whole data processing in the database server (XQuery is much more powerful than SQL, and database servers are designed exactly for the purpose of data processing). Additionally, this approach will further decrease your memory consumption requirements, because the iterator records will not be sent to the client.
I hope my comments will be helpful.
Best regards, Dimitar
On Wednesday, 2 November 2011, 01:57:22, Laurent Chevalier wrote:
Hi Christian,
Actually, my real code no longer sends any query within the iterative query loop. I can inform other developers but, in my view, it should not be possible to deadlock the BaseX server, whatever the developers do. We intend to use BaseX in a production environment.
I don't know BaseX well enough to tell whether my fix of the Lock class can be taken into account or not. But, with this fix, updating queries are still blocked while reading queries are running. So there should not be any problem with the internal database pointers you mention in your mail. Just to be sure that we are talking about the same code, I am sending you my fix again (working with BaseX 7.0).
To answer your last question regarding the cancellation of an iterative query, there are two use cases:
- when possible, we use position() or subsequence() in the query to get only a part of the sequence, and so we iterate over all results of the iterative query,
- but sometimes we have to iterate until we get a given number of items, depending on a condition that cannot be included in the XQuery because it depends on data that is not stored in the database (user rights in our case).
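The second use case can be sketched as follows. This is a minimal illustration and not the actual client code: the `Iterator` stands in for an iterative query result, the `Predicate` for the external access check, and the names `AccessFilterSketch` and `takeAccessible` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

class AccessFilterSketch {
  /**
   * Pulls items from the result iterator until {@code limit} items have passed
   * the external check or the results are exhausted. Remaining results are
   * simply not requested.
   */
  static <T> List<T> takeAccessible(final Iterator<T> results,
      final Predicate<T> isAccessible, final int limit) {
    final List<T> accepted = new ArrayList<>();
    while (accepted.size() < limit && results.hasNext()) {
      final T item = results.next();
      if (isAccessible.test(item)) accepted.add(item);
    }
    return accepted;
  }

  public static void main(final String[] args) {
    // Container identifiers as they might be returned by the query (made-up data).
    final Iterator<String> ids = Arrays.asList("c1", "c2", "c3", "c4", "c5", "c6").iterator();
    // Stand-in for the user-rights check that lives outside the database.
    final Predicate<String> allowed = id -> !id.equals("c2");
    System.out.println(takeAccessible(ids, allowed, 3)); // [c1, c3, c4]
    System.out.println(ids.hasNext());                   // true: c5 and c6 were never fetched
  }
}
```

The number of results that must be fetched is only known while iterating, which is why it cannot be expressed with position() or subsequence() in the query itself.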
Best regards, Laurent
package org.basex.core;

import java.util.LinkedList;

import org.basex.util.Util;

/**
 * Management of executing read/write processes.
 * Supports multiple readers, limited by {@link MainProp#PARALLEL},
 * and single writers (readers/writer lock).
 *
 * @author BaseX Team 2005-11, BSD License
 * @author Christian Gruen
 */
final class Lock {
  /** Queue for all waiting processes. */
  private final LinkedList<LockedCommand> queue = new LinkedList<LockedCommand>();
  /** Mutex object. */
  private final Object mutex = new Object();
  /** Database context. */
  private final Context ctx;

  /** Number of active readers. */
  private int readers;
  /** Writer flag. */
  private boolean writer;

  /**
   * Default constructor.
   * @param c context
   */
  Lock(final Context c) {
    ctx = c;
  }

  /**
   * Tells whether this command can be processed now or not.
   * @param o Pending command.
   * @return Permission to run command.
   */
  private boolean ok(LockedCommand o) {
    synchronized(mutex) {
      if(writer) return false;
      for(int i = 0; i < queue.size(); i++) {
        if(o == queue.get(i)) {
          if(o.writer) {
            if(readers == 0) {
              writer = true;
              return true;
            }
          } else if(readers < Math.max(ctx.mprop.num(MainProp.PARALLEL), 1)) {
            ++readers;
            return true;
          }
        }
      }
      return false;
    }
  }

  /**
   * Modifications before executing a command.
   * @param w writing flag
   */
  void lock(final boolean w) {
    synchronized(mutex) {
      final LockedCommand o = new LockedCommand(w);
      queue.add(o);

      try {
        while(true) {
          if(ok(o)) break;
          mutex.wait();
        }
      } catch(final InterruptedException ex) {
        Util.stack(ex);
      }
      synchronized(mutex) {
        queue.remove(o);
      }
    }
  }

  /**
   * Modifications after executing a command.
   * @param w writing flag
   */
  synchronized void unlock(final boolean w) {
    synchronized(mutex) {
      if(w) {
        writer = false;
      } else {
        --readers;
      }
      mutex.notifyAll();
    }
  }
}
package org.basex.core;

/**
 * Locked command stored in Lock class queue.
 *
 * @author Laurent Chevalier
 */
final class LockedCommand {
  /** Writer flag. */
  public boolean writer;

  /**
   * Default constructor.
   * @param w Writer flag. Tells whether it is an updating command (true) or
   *   not (false).
   */
  LockedCommand(final boolean w) {
    this.writer = w;
  }
}
-----Original Message-----
From: Christian Grün [mailto:christian.gruen@gmail.com]
Sent: Friday, 28 October 2011 18:26
To: Laurent Chevalier
Cc: basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] BaseX server deadlock
Laurent,
thanks for putting some effort into a reproducible test case (the example also causes a deadlock on my machine). Your modified client code basically runs into similar problems as the old iterative solution. What you would probably need to do is discard all pending results of an iterator before you launch a new updating query. There are several reasons why it's advisable to fetch all results before performing another update. One of them is that the internal database pointers used in one query might become invalid if an updating query is performed at the same time.
Another solution would be to first cache all query results on the server before they are sent over to the client. This means, however, that the whole query has to be evaluated before the results can be sent over the network, which would introduce another delay. In addition, the server-side caching might turn out to be a memory hog if numerous clients communicate with the server at the same time, or don't even fetch their results.
Maybe a related question: What are your criteria for canceling an iterative query? In other words, could you decide how many query results are needed before executing a query?
Christian
___________________________
On Fri, Oct 28, 2011 at 4:00 PM, Laurent Chevalier l.chevalier@cyim.com wrote:
Hi Christian,
I have finally succeeded in reproducing the deadlock problem with a Java unit test, which you will find enclosed. I recommend reading notes 1, 2 and 3 in DeadlockTest.java. I had difficulties reproducing the problem, as it happens only if the iterative query returns a minimum amount of data (see Note 3, line 69 in DeadlockTest.java).
The deadlock problem does not happen with the unmodified client. This seems normal, as this client fetches all results in one go and caches them.
However, I have to modify the client to avoid caching (to save memory). You will find the modified client enclosed. Four classes need to be changed: ClientQuery, ClientSession, Query and Session.
Best regards, Laurent
-----Original Message-----
From: Christian Grün [mailto:christian.gruen@gmail.com]
Sent: Thursday, 27 October 2011 19:02
To: Laurent Chevalier
Cc: basex-talk@mailman.uni-konstanz.de
Subject: Re: [basex-talk] BaseX server deadlock
Laurent,
thanks for your feedback. I am surprised that the deadlock problem hasn't been fixed with 7.0, as all API database operations should now be atomic. Just in case: could you tell me if you are also encountering the locking issue with the unmodified client?
All the best,
Christian
_________________________________________
On Thu, Oct 27, 2011 at 6:51 PM, Laurent Chevalier l.chevalier@cyim.com wrote:
Hi Christian,
Thank you for your mail. I updated my client code to use the new BaseX 7.0 release last week. It was not too painful. I'm not caching the results. With my application, I still have the deadlock problem with BaseX 7.0, and so I'm still using the Lock class fix, but I failed to reproduce the problem with a small Java test program. So it is not yet certain that the problem comes from BaseX; it may come from my .NET client. Tomorrow, I will translate the Java test code to VB.NET, and I will keep you informed.
If you want, I can also send you an updated version of the ClientQuery class that does not use a cache.
Best regards, Laurent
queries.
> **/ > > >> >> >> >> private final static Object > >> >> >> >> queueMutex = new
Object();
> >> >> >> >> /** Number of active readers. */ > >> >> >> >> private int readers; > >> >> >> >> /** Writer flag. */ > >> >> >> >> private boolean writer; > >> >> >> >> > >> >> >> >> /** > >> >> >> >> > >> >> >> >> * Default constructor. > >> >> >> >> * @param c context > >> >> >> >> */ > >> >> >> >> > >> >> >> >> Lock(final Context c) { > >> >> >> >> > >> >> >> >> ctx = c; > >> >> >> >> > >> >> >> >> } > >> >> >> >> > >> >> >> >> /** > >> >> >> >> > >> >> >> >> * Modifications before executing > >> >> >> >> a command. > >> >> >> >> * @param w writing flag > >> >> >> >> */ > >> >> >> >> > >> >> >> >> void lock(final boolean w) { > >> >> >> >> > >> >> >> >> synchronized(mutex) { > >> >> >> >> > >> >> >> >> int code = new Random(new
Date().getTime()).nextInt();
> >> >> >> >> // final Object o = new > >> >> >> >> Object(); > >> >> >> >> // queue.add(o); > >> >> >> >> > >> >> >> >> try { > >> >> >> >> > >> >> >> >> while(true) { > >> >> >> >> > >> >> >> >> synchronized(queue > >> >> >> >> Mutex) { > >> >> >> >> > >> >> >> >> // if(o == queue.get(0) > >> >> >> >> && !writer) { > >> >> >> >> > >> >> >> >> if(!writer) { > >> >> >> >> > >> >> >> >> if(w) { > >> >> >> >> > >> >> >> >> if(rea > >> >> >> >> ders > >> >> >> >> == 0) > >> >> >> >> { > >> >> >> >> > >> >> >> >> wr > >> >> >> >> it > >> >> >> >> er > >> >> >> >> = > >> >> >> >> t > >> >> >> >> ru > >> >> >> >> e; > >> >> >> >> br > >> >> >> >> ea > >> >> >> >> k; > >> >> >> >> > >> >> >> >> } > >> >> >> >> > >> >> >> >> } else > >> >> >> >> if(reader > >> >> >> >> s < > >> >> >> > >> >> >> Math.max(ctx.mprop.num(MainProp.PARALLEL), > >> >> >> 1)) { > >> >> >> > >> >> >> >> ++read > >> >> >> >> ers; > >> >> >> >> break; > >> >> >> >> > >> >> >> >> } > >> >> >> >> > >> >> >> >> } > >> >> >> >> > >> >> >> >> } > >> >> >> >> mutex.wait(); > >> >> >> >> > >> >> >> >> } > >> >> >> >> > >> >> >> >> } catch(final > >> >> >> >> InterruptedException ex) > >> >> >> >> { > >> >> >> >> > >> >> >> >> Util.stack(ex); > >> >> >> >> > >> >> >> >> } > >> >> >> >> > >> >> >> >> // queue.remove(0); > >> >> >> >> > >> >> >> >> } > >> >> >> >> > >> >> >> >> } > >> >> >> >> > >> >> >> >> /** > >> >> >> >> > >> >> >> >> * Modifications after executing > >> >> >> >> a command. 
> >> >> >> >> * @param w writing flag > >> >> >> >> */ > >> >> >> >> > >> >> >> >> synchronized void unlock(final > >> >> >> >> boolean w) { > >> >> >> >> > >> >> >> >> synchronized(mutex) { > >> >> >> >> > >> >> >> >> if(w) { > >> >> >> >> > >> >> >> >> writer = false; > >> >> >> >> > >> >> >> >> } else { > >> >> >> >> > >> >> >> >> --readers; > >> >> >> >> > >> >> >> >> } > >> >> >> >> mutex.notifyAll(); > >> >> >> >> > >> >> >> >> } > >> >> >> >> > >> >> >> >> } > >> >> >> >> > >> >> >> >> } > >> >> >> >> ____________________________________ > >> >> >> >> ___________ > >> >> >> >> BaseX-Talk mailing list > >> >> >> >> BaseX-Talk@mailman.uni-konstanz.de > >> >> >> >> https://mailman.uni-
konstanz.de/mailman/listinfo/basex-
talk
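
For readers following the quoted analysis, the scenario can be replayed without threads. The sketch below is not BaseX code: the class `LockModel`, its fields, and the fixed `PARALLEL` value of 8 are assumptions made up for illustration. It only models the admission rule described in the quoted message (a writer needs zero readers, a reader needs a free reader slot, and with a FIFO queue only the head may start) and compares it with the queue-less variant.

```java
import java.util.ArrayList;
import java.util.List;

/** Single-threaded model of the admission rules discussed above (hypothetical, not BaseX code). */
public class LockModel {
  /** Assumed reader limit; stands in for MainProp.PARALLEL. */
  static final int PARALLEL = 8;

  int readers;                // number of active readers
  boolean writer;             // true while a writer is active
  final boolean fifo;         // true: only the head of the waiting list may start
  final List<Boolean> waiting = new ArrayList<>(); // true = writer, false = reader

  LockModel(boolean fifo) {
    this.fifo = fifo;
  }

  /** Readers/writer rule: a writer needs zero readers, a reader needs a free slot. */
  boolean canStart(boolean write) {
    if (writer) return false;
    return write ? readers == 0 : readers < PARALLEL;
  }

  /** Starts every waiting request the rules allow and returns how many were started. */
  int admit() {
    int started = 0;
    int i = 0;
    while (i < waiting.size()) {
      boolean w = waiting.get(i);
      if (canStart(w)) {
        waiting.remove(i);    // remove(int index)
        if (w) writer = true; else readers++;
        started++;
        i = 0;                // rescan from the head
      } else if (fifo) {
        break;                // the head is blocked, so everything behind it is blocked too
      } else {
        i++;                  // no ordering: look at the next waiting request
      }
    }
    return started;
  }

  /** Builds the scenario from the report: one iterative reader, then a writer, then a nested read. */
  static LockModel scenario(boolean fifo) {
    LockModel m = new LockModel(fifo);
    m.readers = 1;            // the iterative query is still open
    m.waiting.add(true);      // updating command (e.g. optimize)
    m.waiting.add(false);     // nested reading request from inside the iteration
    return m;
  }

  public static void main(String[] args) {
    LockModel queued = scenario(true);
    System.out.println("FIFO queue: started " + queued.admit()
        + ", still waiting " + queued.waiting.size());
    // prints "FIFO queue: started 0, still waiting 2"

    LockModel unordered = scenario(false);
    System.out.println("No queue: started " + unordered.admit()
        + ", still waiting " + unordered.waiting.size());
    // prints "No queue: started 1, still waiting 1"
  }
}
```

In the FIFO case nothing can start: the writer at the head waits for the iterative reader, which in turn waits for the nested read stuck behind the writer. Without the ordering constraint the nested read starts, and the writer runs once both readers have finished. The trade-off, as noted in the quoted message, is that arrival order is no longer respected, so a writer could in principle be delayed for as long as new readers keep arriving.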
BaseX-Talk mailing list BaseX-Talk@mailman.uni-konstanz.de https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk