LREQ> Your perl substitution is putting <wbr/> after the first non-ascii
LREQ> character on the line, and 你 is for sure not an ascii character,
LREQ> so you get <wbr/> after it.
Not exactly after it. It lands 1/3 of the way through it, i.e., after the first of its three UTF-8 bytes, so the UTF-8 is shattered. I was just curious whether there is a way in BaseX to do the equivalent of s!<wbr/>!!g, as I can in Perl, to restore the damaged UTF-8 characters.
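To make the damage concrete, here is a minimal sketch in Python (not BaseX, and not the Perl substitution that caused it) of what "shattered" means at the byte level and how stripping the tag repairs it. The names `WBR`, `repair`, and `damaged` are made up for illustration, and the sample string is my own, assuming the input is otherwise valid UTF-8.

```python
# Hypothetical illustration: a <wbr/> tag inserted after the first byte
# of a three-byte UTF-8 character, and the byte-level repair.
WBR = b"<wbr/>"


def repair(data: bytes) -> str:
    """Remove every <wbr/> at the byte level, then decode as UTF-8.

    Like s!<wbr/>!!g, this also removes any <wbr/> that was placed
    legitimately between characters.
    """
    return data.replace(WBR, b"").decode("utf-8")


good = "你好".encode("utf-8")          # 你 is the three bytes E4 BD A0
damaged = good[:1] + WBR + good[1:]    # tag lands after byte E4

try:
    damaged.decode("utf-8")
    decodes_cleanly = True
except UnicodeDecodeError:
    # E4 must be followed by a continuation byte, not "<"
    decodes_cleanly = False

restored = repair(damaged)
```

The point is that the repair has to happen on raw bytes, before anything tries to decode the text as UTF-8. Whether BaseX will hand you the raw bytes of such a file is exactly what I don't know.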
http://www.couchsurfing.org/group_read.html?gid=430&post=13998575