Hey Tim -

On Fri, Oct 4, 2024 at 5:53 PM Thompson, Timothy <timothy.thompson@yale.edu> wrote:

Thanks, Bridger! `file:write-text-lines` seems to be the issue. For example, this query doesn’t run in parallel.
You're right - apologies for missing this key point in your initial email.
Is this expected behavior?
declare variable $PATH := "";
xquery:fork-join(
  for $_ in (1 to 8)
  return fn() {
    file:write-text-lines(
      $PATH||$_||".json",
      for $i in (1 to 1000000)
      return
        serialize(
          <fn:map>
            <fn:string key="n">{$i}</fn:string>
          </fn:map>, {"method": "json", "escape-solidus": "no", "json": {
            "format": "basic", "indent": "no"
          }}
        )
    )
  },
  { "parallel": "8"}
)
It does seem that the writes in `file:write-text-lines` are *not* parallel when compared with a sequential use of the same function. I did the following comparison:
using your example,
ls -l --time-style=full-iso /tmp/fork-test
total 130860
-rw-r--r-- 1 bridger bridger 14888896 2024-10-04 20:39:57.926518544 +0000 1.json
-rw-r--r-- 1 bridger bridger 14888896 2024-10-04 20:40:02.849576119 +0000 2.json
-rw-r--r-- 1 bridger bridger 14888896 2024-10-04 20:40:07.799634010 +0000 3.json
-rw-r--r-- 1 bridger bridger 14888896 2024-10-04 20:40:28.652877890 +0000 4.json
-rw-r--r-- 1 bridger bridger 14888896 2024-10-04 20:40:12.892693574 +0000 5.json
-rw-r--r-- 1 bridger bridger 14888896 2024-10-04 20:40:18.140754950 +0000 6.json
-rw-r--r-- 1 bridger bridger 14888896 2024-10-04 20:40:23.569818443 +0000 7.json
-rw-r--r-- 1 bridger bridger 14888896 2024-10-04 20:40:39.098000046 +0000 8.json
-rw-r--r-- 1 bridger bridger 14888896 2024-10-04 20:40:33.779937851 +0000 9.json

vs

using a sequential write:

declare variable $PATH := "/tmp/fork-test/sequential/";
for $i in (1 to 9)
return
  file:write-text-lines(
    $PATH || $i || ".json",
    for $n in (1 to 1000000)
    return
      serialize(
        <fn:map>
          <fn:string key="n">{$n}</fn:string>
        </fn:map>,
        { "method": "json", "escape-solidus": "no",
          "json": { "format": "basic", "indent": "no" }
        }
      )
  )

ls -l --time-style=full-iso /tmp/fork-test/sequential
total 130860
-rw-r--r-- 1 bridger bridger 14888896 2024-10-04 20:49:19.841259435 +0000 1.json
-rw-r--r-- 1 bridger bridger 14888896 2024-10-04 20:49:24.820319704 +0000 2.json
-rw-r--r-- 1 bridger bridger 14888896 2024-10-04 20:49:29.838380446 +0000 3.json
-rw-r--r-- 1 bridger bridger 14888896 2024-10-04 20:49:35.041443427 +0000 4.json
-rw-r--r-- 1 bridger bridger 14888896 2024-10-04 20:49:40.182505657 +0000 5.json
-rw-r--r-- 1 bridger bridger 14888896 2024-10-04 20:49:45.305567669 +0000 6.json
-rw-r--r-- 1 bridger bridger 14888896 2024-10-04 20:49:50.535630977 +0000 7.json
-rw-r--r-- 1 bridger bridger 14888896 2024-10-04 20:49:55.703693534 +0000 8.json
-rw-r--r-- 1 bridger bridger 14888896 2024-10-04 20:50:00.948757024 +0000 9.json

Going by the timestamps, each file in both attempts takes about 5 seconds to write. The only difference is that the files are not written in numerical order in the fork-join example. I wonder if it's due to the appending in `file:write-text-lines`?

Maybe Christian can chime in and let us know :)

Have a nice weekend!

Best,
Bridger
--
Tim A. Thompson (he, him)
Librarian for Applied Metadata Research
Interim Manager, Metadata Services Unit