Thanks, Bridger! `file:write-text-lines` seems to be the issue. For example, this query doesn’t run in parallel.
Is this expected behavior?
declare variable $PATH := "";

xquery:fork-join(
  for $_ in (1 to 8)
  return fn() {
    file:write-text-lines(
      $PATH || $_ || ".json",
      for $i in (1 to 1000000)
      return serialize(
        <fn:map><fn:string key="n">{$i}</fn:string></fn:map>,
        {
          "method": "json",
          "escape-solidus": "no",
          "json": { "format": "basic", "indent": "no" }
        }
      )
    )
  },
  { "parallel": "8" }
)
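
One way to check whether the forked functions overlap at all is to time a fork-join whose tasks only sleep. This is a minimal sketch, not a diagnosis: it assumes the Profiling Module functions `prof:time` and `prof:sleep` are available, and the task count and sleep duration are arbitrary illustration values. If the four tasks run concurrently, the reported time should be close to one second; if they run one after another, closer to four. Replacing the sleep with the `file:write-text-lines` call and comparing the two timings would show whether the write itself is what serializes.

```xquery
(: Baseline: four tasks that each sleep for one second.
   prof:time reports the elapsed time for the whole fork-join.
   The label, task count and sleep duration are illustrative only. :)
prof:time(
  xquery:fork-join(
    for $n in (1 to 4)
    return fn() {
      prof:sleep(1000),  (: stands in for the real work :)
      $n
    },
    { "parallel": "4" }
  ),
  "fork-join baseline: "
)
```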
--
Tim A. Thompson (he, him)
Librarian for Applied Metadata Research
Interim Manager, Metadata Services Unit
www.linkedin.com/in/timathompson
From: Bridger Dyson-Smith <bdysonsmith@gmail.com>
Date: Wednesday, October 2, 2024 at 1:05 PM
To: Thompson, Timothy <timothy.thompson@yale.edu>
Cc: BaseX <basex-talk@mailman.uni-konstanz.de>
Subject: Re: [basex-talk] Write files in parallel?

Hi Tim, hope you are well.

In the past I have used something like the following for web requests (I don't remember whether it was perfectly parallel; it was just "parallel enough"):
xquery:fork-join(
  for $xml in (
    'calq.xqm', 'factbook.xml', 'filesystem.xml', 'locations.xml',
    'wiki1.zip', 'wiki2.zip', 'xmark.xml'
  )
  let $url := 'https://files.basex.org/xml/'
  return fn() {
    file:write(
      '/tmp/fork-test/' || $xml,
      http:send-request(<http:request method='get'/>, $url || $xml)
    )
  },
  map { 'parallel': '3' }
)

Hopefully that's helpful (and apologies to the BaseX team's file server)!

Best,
Bridger
$ ls -l --time-style=full-iso
total 11640
-rw-r--r-- 1 bridger bridger    1593 2024-10-02 17:02:51.321251082 +0000 calq.xqm
-rw-r--r-- 1 bridger bridger 1763070 2024-10-02 17:02:52.301261520 +0000 factbook.xml
-rw-r--r-- 1 bridger bridger 2770290 2024-10-02 17:02:53.331272491 +0000 filesystem.xml
-rw-r--r-- 1 bridger bridger 1566322 2024-10-02 17:02:52.497263608 +0000 locations.xml
-rw-r--r-- 1 bridger bridger  512686 2024-10-02 17:02:52.670265451 +0000 wiki1.zip
-rw-r--r-- 1 bridger bridger 5133340 2024-10-02 17:02:54.046280106 +0000 wiki2.zip
-rw-r--r-- 1 bridger bridger  155448 2024-10-02 17:02:52.859267464 +0000 xmark.xml
On Tue, Oct 1, 2024 at 5:32 PM Thompson, Timothy <timothy.thompson@yale.edu> wrote:

Hello,
Is it possible to call file:write-text-lines in parallel inside a fork-join operation? I have multiple databases that I would like to run a query over, in parallel, and write the results as JSON Lines to a file per database. When I try this, it doesn’t seem to parallelize.
Thanks in advance, Tim
--
Tim A. Thompson (he, him)
Librarian for Applied Metadata Research
Interim Manager, Metadata Services Unit
Yale University Library
www.linkedin.com/in/timathompson