Performance on batch processes with http requests.

France Baril

9 Feb 2015

2:12 p.m.

Hi,

I have an item that I would like to bring to your attention. We have developed a web controller that lets users manage translation processes for BaseX content. Our process is something like this:

- Users select the content to translate (1 to 500 small files) and the languages to translate to (1 to 32 selections).
- For each language, for each file:
  - The system transforms the content to XLIFF and sets the segments to translate="yes" if they have changed since the last translation for this language. The content is saved because we'll need to query it, and we redirect to the next task.
  - For a file without new segments to translate, the content is processed and saved to the target languages (because attributes might have changed and/or segments might have been deleted). The XLIFF file is deleted, and we redirect to the next task because we'll need the new information for the next query.
- We query the server to offer the users stats on the items to translate.

This is a simplification of the process, but it shows the logic. Redirects occur after each step for each language. We have grouped operations to keep commits/redirects to a minimum. We apply them:

- After each language, to avoid running out of memory.
- Before each operation that needs to query files based on the changes from the previous steps.

We have also split groups of tasks into smaller groups where too many tasks have led us to run out of memory in the past.
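
The grouping described above can be sketched roughly as follows. This is only an illustration in Python, not the actual XQuery code: the names `chunked`, `plan_groups` and `max_group_size` are hypothetical, and the sketch assumes that one commit/redirect happens after each yielded group.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Tuple


def chunked(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive lists of at most `size` items."""
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def plan_groups(
    files: List[str], languages: List[str], max_group_size: int
) -> Iterator[Tuple[str, List[str]]]:
    """Plan one commit group per language, split further by group size.

    Assumption: each yielded (language, files) pair is processed and
    committed before the next pair starts, so memory use is bounded by
    max_group_size rather than by len(files) * len(languages).
    """
    for language in languages:
        for chunk in chunked(files, max_group_size):
            yield language, chunk


if __name__ == "__main__":
    for language, group in plan_groups(["a.xml", "b.xml", "c.xml"], ["fr", "de"], 2):
        print(language, group)
```

A smaller `max_group_size` lowers peak memory but increases the number of commits/redirects, which is the trade-off described in this message.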

Our request would be for a way to force changes to commit without having to redirect. Refreshing the browser has a big impact on performance. Or maybe you have suggestions to improve batch processing when using a web interface for process management.

Thank you in advance for your input!

-- France Baril Architecte documentaire / Documentation architect france.baril@architextus.com

Christian Grün

11 Feb

1:47 a.m.

Hi France,

I guess there is no simple answer to your question; the best solution and the further steps mostly depend on the architecture of your approach. I'm also not quite sure what the major challenge is. Is it performance, is it technical restrictions, is it the overall concept?

As you were mentioning that you are working with a web interface, our approach would be to provide a RESTXQ function that triggers all the transformations whenever a user requests it. Have you thought about that? What language is your web controller built on?

Best, Christian

France Baril

7:01 p.m.

Hi,

- Our issue is with performance.
- Performing all the transformations at once leads to 2 issues:
  - We run out of memory.
  - We sometimes need to query the content that has been transformed after it has gone through some of the transformations.
- The web controller is jQuery, but we redirect from XQuery. jQuery just says: translate these files into these languages. The RESTXQ function handles the steps, and calls itself back with an incremented step number when a group of transformations that could be handled without committing content (without a need to query the saved files and without running out of memory) is completed.
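
The step-driven pattern described above can be modelled roughly like this. It is a minimal sketch in Python, not the RESTXQ function itself: `handle_step`, `run_all`, `to_xliff` and `count_xliff` are hypothetical names, a plain dictionary stands in for the database, and the loop in `run_all` stands in for the redirects.

```python
from typing import Callable, Dict, List, Optional

Store = Dict[str, str]
# A step reads the committed store and writes its changes to `pending`.
Step = Callable[[Store, Store], None]


def handle_step(steps: List[Step], store: Store, step: int) -> Optional[int]:
    """Run one group of transformations, commit it, return the next step."""
    pending: Store = {}
    steps[step](store, pending)  # changes are not visible to this step
    store.update(pending)        # the "commit" that the redirect forces
    next_step = step + 1
    return next_step if next_step < len(steps) else None


def run_all(steps: List[Step], store: Store) -> int:
    """Drive the steps one by one and return the number of round trips."""
    round_trips = 0
    step: Optional[int] = 0 if steps else None
    while step is not None:
        step = handle_step(steps, store, step)
        round_trips += 1
    return round_trips


def to_xliff(committed: Store, pending: Store) -> None:
    # Hypothetical transformation: one .xlf entry per .xml entry.
    for name, content in committed.items():
        if name.endswith(".xml"):
            pending[name[:-4] + ".xlf"] = "xliff:" + content


def count_xliff(committed: Store, pending: Store) -> None:
    # Needs the committed result of to_xliff, hence the separate step.
    pending["stats"] = str(sum(1 for n in committed if n.endswith(".xlf")))


if __name__ == "__main__":
    store = {"a.xml": "<topic/>"}
    print(run_all([to_xliff, count_xliff], store), store["stats"])
```

In this model every step costs one round trip, which is why grouping transformations that do not depend on each other's saved output reduces the total number of redirects.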

-- France Baril Architecte documentaire / Documentation architect france.baril@architextus.com

Christian Grün

13 Feb

5:29 a.m.

> We run out of memory.

Who/what is responsible for the OOM? Could you give us some more information on the exact step in the process that causes the bottleneck?

France Baril

19 Feb

7:05 p.m.

Hi, here is an example: a process that aggregates a few hundred topics and transforms the aggregated content into one large HTML file, so that reviewers can see all content together, works fine. Try to do it for 32 languages, and you run out of memory.

I'm trying to build a small sample. Our real processes also resolve GUI values from the developers' library of strings and filter content based on audiences and product numbers. It may take a while before I can get this to work.

-- France Baril Architecte documentaire / Documentation architect france.baril@architextus.com
