To the BaseX Community,
I am a 68-year-old #PayItForward cancer-survivor independent #CitizenScientist doing applied research in #DigitalHumanities and #MachineLearning, and I need your help, please. Please forgive the length of this post.
I am neither lazy nor a dilettante; I am simply under tight time pressure to get some development done on my Python-based metadata discovery and curation toolkit, which my fellow cancer-surviving wife, Timlynn, and I will be showcasing in our poster presentation accepted for next month's #DATeCH2019 conference in Brussels (http://datech.digitisation.eu/).
Our poster is entitled "#MAGAZINEgts and #dhSegment: Using a Metamodel Subgraph to Generate Synthetic Data of Under-Sampled Complex Document Structures for Machine-Learning" (ResearchGate preprint: https://is.gd/factminers_datech2019_poster). #MAGAZINEgts is the XML-based ground-truth storage format Timlynn and I are developing based on an ontological "stack" of #cidocCRM/FRBRoo/PRESSoo using a metamodel subgraph design pattern. The goal of our design is to support an integrated complex document structure and content depiction model for digital collections of print magazines and newspapers. (For more, see our #DATeCH2017 poster: https://is.gd/factminers_datech2017_poster)
We are evolving a reference implementation of the #MAGAZINEgts format for the collection of Softalk magazine at the Internet Archive. The collection is here: https://archive.org/details/softalkapple?&sort=date, and the MAGAZINEgts file (~10+ MB) is linked from the About page of the collection but is provided here as a shortened link: https://is.gd/softalk_magazinegts_xml_file.
MY IMMEDIATE GOAL: Rather than keeping the awkward workflow of generating intermediary JSON metadata files, converting them to XML in batches, and copy-pasting the results into the appropriate positions in the master publication file, I want to update fine-grained #MAGAZINEgts metamodels, metadata, and their associated source-document-specific datasets directly and incrementally by integrating BaseX into the FactMiners Toolkit (fmtk). We would _really_ like to show this significant enhancement of our toolkit at the DATeCH conference (8-10 May).
MY CURRENT CONTEXT: I have the latest BaseX installed, working well, and I have done as much "fast track" learning as I can to come up to "toddler" speed on BaseX and its Python-based API. I have the Python API extension installed and working within my PyCharm IDE, and I am on Windows 10.
MY CURRENT NEED: I have used the BaseX GUI to develop a sample XQuery for updating/adding a machine-learning data spec for curating the bounding boxes of advertisements on a page within a magazine. The query below is not parameterized for programmatic dynamic execution. It is simply a hard-coded test of my evolving understanding of BaseX interactions, so the dataset name ("all_ads"), the issue-page filename (softalk_v2n02pg002.png), the various dimension numbers, and so on are explicit values rather than variables. When I run this query and export the MAGgts master file, the update is there and looks great. My knowledge of and skills with BaseX are small but growing, and I feel I have enough grip on things to forge ahead and at least get BaseX integrated for the #ML image training dataset feature that we will showcase at #DATeCH2019. (BTW, I stripped the #MAGgts schema's XML namespace during BaseX database creation to make things easier during learning. I expect to simply restore it in the header after exporting and before uploading a new release of Softalk's reference implementation of this ground-truth format. Either that, or I will tweak the eventual Python implementation of the queries to include the namespace and just leave it intact when importing into the local BaseX database.)
HERE IS THE SAMPLE QUERY:
===
declare option db:WRITEBACK 'true';
(: Hard-coded training-image spec for one page; these values will be parameterized later. :)
declare variable $new_spec :=
  <ML_training_img_spec file_name="softalk_v2n02pg002.png">
    <ML_image_dim width="940" height="1280"/>
    <ML_label_bbox label="ad" status="predicted" left="500" top="680" width="444" height="580"/>
    <ML_label_bbox label="ad" status="actual" left="490" top="620" width="440" height="575"/>
  </ML_training_img_spec>;
(: Append the spec to the "all_ads" dataset and report success. :)
update:output("Update successful."),
insert node $new_spec as last into doc("MAGgts")
  //Metadata//ML_maxpixel_datasets[@max_pixels = "1000000"]
  /ML_dataset[@name="all_ads"]//ML_training_img_specs
===
MY REQUEST FOR ASSISTANCE: It would be _extraordinarily_ helpful, and Timlynn and I would be most grateful, if someone in the BaseX community who is familiar with Python-based BaseX integration could provide a brief implementation, similar to the examples supplied in the Client integration samples, that shows me how to take a query developed in the BaseX GUI and convert it into a usable form in a Python program.
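A minimal sketch of the kind of harness requested here might look like the following. It assumes the BaseXClient module shipped with the BaseX Python client examples, a server on localhost:1984, and the default admin/admin credentials. The helper names run_xquery and main are hypothetical. The helper accepts any session-like object, so it can be tried without a running server.

```python
# Hypothetical helper: run a query developed in the BaseX GUI from Python.
# `session` is assumed to behave like BaseXClient.Session: session.query(text)
# returns an object with execute() and close().

GUI_QUERY = '''
declare option db:WRITEBACK 'true';
declare variable $new_spec :=
  <ML_training_img_spec file_name="softalk_v2n02pg002.png">
    <ML_image_dim width="940" height="1280"/>
    <ML_label_bbox label="ad" status="predicted" left="500" top="680" width="444" height="580"/>
    <ML_label_bbox label="ad" status="actual" left="490" top="620" width="440" height="575"/>
  </ML_training_img_spec>;
update:output("Update successful."),
insert node $new_spec as last into doc("MAGgts")
  //Metadata//ML_maxpixel_datasets[@max_pixels = "1000000"]
  /ML_dataset[@name="all_ads"]//ML_training_img_specs
'''


def run_xquery(session, query_text):
    """Send query_text to the server and return whatever execute() yields."""
    query = session.query(query_text)
    try:
        return query.execute()
    finally:
        # Always release the server-side query, even if execute() raises.
        query.close()


def main():
    # Assumptions: BaseXClient.py from the BaseX client examples is importable
    # and a BaseX server is running locally with these credentials.
    from BaseXClient import BaseXClient

    session = BaseXClient.Session("localhost", 1984, "admin", "admin")
    try:
        print(run_xquery(session, GUI_QUERY))
    finally:
        session.close()
```

Calling main() with a server running should submit the query text exactly as it was written in the GUI; depending on the client version, the import may need to be a plain `import BaseXClient` instead.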
BY WAY OF THANK YOU: If anyone can help us short-circuit my path to integrating BaseX into our #DATeCH2019 poster presentation, we will gladly cite you and your assistance in the acknowledgements of the poster and its 2-page companion handout.
Again, I am sorry for hitting folks with such a long and detailed request for nooby assistance.
In advance, thank you for any help you Good Folks may provide. I look forward to significantly improving the functionality of the FactMiners Toolkit by incorporating BaseX into its core platform.
Happy-Healthy Vibes from Colorado USA,
-: Jim Salmons :-
On Wed, 2019-04-10 at 14:52 -0600, Jim Salmons wrote:
[...]
I _think_ what you are asking is, how do I interpolate values into a string in Python.
If that is correct, then the first Google result for "interpolate values into a string in Python" is https://www.programiz.com/python-programming/string-interpolation
The main thing to remember is that $ and { are special in XQuery, so it can be easier to use substitution with a regular expression than direct interpolation.
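As a rough illustration of the regular-expression route, the sketch below fills placeholders written as @@name@@, a made-up marker chosen only because it clashes with neither `$` nor `{`. The names PLACEHOLDER, QUERY_TEMPLATE, and fill are hypothetical; only the Python standard library is used.

```python
import re
from xml.sax.saxutils import escape

# Hypothetical placeholder syntax: @@name@@.
PLACEHOLDER = re.compile(r"@@(\w+)@@")

QUERY_TEMPLATE = '''
declare variable $new_spec :=
  <ML_training_img_spec file_name="@@file_name@@">
    <ML_image_dim width="@@width@@" height="@@height@@"/>
  </ML_training_img_spec>;
$new_spec
'''


def fill(template, values):
    """Replace each @@name@@ with an attribute-safe string form of values[name]."""
    def lookup(match):
        raw = str(values[match.group(1)])
        # Escape &, <, > and double quotes for XML, and double the curly
        # braces, which would otherwise start an enclosed XQuery expression.
        return escape(raw, {'"': "&quot;", "{": "{{", "}": "}}"})
    return PLACEHOLDER.sub(lookup, template)


query_text = fill(
    QUERY_TEMPLATE,
    {"file_name": "softalk_v2n02pg002.png", "width": 940, "height": 1280},
)
print(query_text)
```

Because the pattern only matches the @@...@@ markers, the XQuery variable $new_spec passes through untouched.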
https://stackoverflow.com/questions/3877623/in-python-can-you-have-variables... may also help.
If this is meaningless technobabble or I have misunderstood, please feel free to ask again but more directly.
Liam
Hi Liam,
Thank you! And not at all confusing in terms of babbleness.
Those links will certainly help with the nitty-gritty of creating the Python side of preparing the parameterized string of a new entry in the dataset. I will take your thoughts into account when trying to make a next step in this "solution discovery" process that I am on.
Still remaining is the basic "harness" of a Python-originated update transaction. I'm still trying to bridge the gap between all the fine-grained info of the Server Protocol page (http://docs.basex.org/wiki/Server_Protocol) and proper transformation of those bits into Python-submitted API calls. (I hope this makes sense.)
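One possible shape for such a harness, sketched under assumptions: the query declares external variables and the Python side binds values to them, in the style of the client's QueryBindExample.py. The session object is assumed to behave like BaseXClient.Session, whose query objects offer bind(), execute(), and close(). BOUND_QUERY and insert_image_spec are hypothetical names, and the bounding-box elements are omitted for brevity.

```python
# Assumption: `session` behaves like BaseXClient.Session, whose query objects
# offer bind(name, value), execute() and close().

BOUND_QUERY = '''
declare variable $dataset external;
declare variable $max_pixels external;
declare variable $file_name external;
declare variable $img_width external;
declare variable $img_height external;
insert node
  <ML_training_img_spec file_name="{$file_name}">
    <ML_image_dim width="{$img_width}" height="{$img_height}"/>
  </ML_training_img_spec>
as last into doc("MAGgts")
  //Metadata//ML_maxpixel_datasets[@max_pixels = $max_pixels]
  /ML_dataset[@name = $dataset]//ML_training_img_specs
'''


def insert_image_spec(session, bindings):
    """Bind each value to its external variable, then run the update."""
    query = session.query(BOUND_QUERY)
    try:
        for name, value in bindings.items():
            # The client examples pass the variable name with its leading `$`.
            query.bind("$" + name, str(value))
        return query.execute()
    finally:
        query.close()


EXAMPLE_BINDINGS = {
    "dataset": "all_ads",
    "max_pixels": "1000000",
    "file_name": "softalk_v2n02pg002.png",
    "img_width": 940,
    "img_height": 1280,
}
```

With a real session this would be called as insert_image_spec(session, EXAMPLE_BINDINGS). Since the values travel as bound variables rather than as pasted text, the query string itself never changes.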
BTW, in the meantime, I have downloaded Christian Grun et al.'s awesome 2012 paper, "A framework for retrieval and annotation in digital humanities using XQuery full text and update in BaseX" (PDF: https://ids-pub.bsz-bw.de/frontdoor/index/index/year/2015/docId/3715). I am digesting this incredibly inspirational resource as fast as I can. :-)
-: Jim :-
-----Original Message----- From: Liam R. E. Quin liam@fromoldbooks.org Sent: Wednesday, April 10, 2019 7:06 PM To: Jim Salmons jim.salmons@factminers.org; basex-talk@mailman.uni-konstanz.de Subject: Re: [basex-talk] Nooby/#CitizenScientist REQUEST for HELP: Python implementation of this XQuery
-- Liam Quin, https://www.delightfulcomputing.com/ Available for XML/Document/Information Architecture/XSLT/ XSL/XQuery/Web/Text Processing/A11Y training, work & consulting. Web slave for vintage clipart http://www.fromoldbooks.org/
Hey Liam and BaseX Community,
With Liam's helpful pointers I was able to create a modification of the QueryBindExample.py sample script and use the String Template class to do the Python version of my machine learning dataset update XQuery expression for adding new training image specs to the #MAGAZINEgts ground-truth storage document in BaseX! :-)
Here is a link to this sandbox script: https://1drv.ms/u/s!AtML1v0eUlpEiJYTjUDhkFmDyXoK1A
The one "hinky" thing about the String Template approach was a "double-take". Because Template uses `$` as its own placeholder marker, the name of the XQuery variable, $new_spec in this case, had to be replaced in the template by a Template variable, $query_var, whose substituted value is the string '$new_spec'. That way the Template substitution "puts back" the query's original variable within the constructed XQuery expression that is submitted to BaseX. All the other Template substitution values, which are hardcoded in this sandbox exploration script, will simply be pulled from the advertisement bounding box being ground-truthed once I work this query into the FactMiners Toolkit.
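For what it is worth, Python's string.Template treats a doubled `$$` as an escaped literal `$`, which may offer a way around the double-take. The sketch below is not the sandbox script linked above; the template text and the substituted values are illustrative assumptions.

```python
from string import Template

# `$$new_spec` is emitted as the literal XQuery variable `$new_spec`;
# single-`$` names such as `$file_name` are Template placeholders.
QUERY = Template('''
declare variable $$new_spec :=
  <ML_training_img_spec file_name="$file_name">
    <ML_image_dim width="$img_width" height="$img_height"/>
  </ML_training_img_spec>;
insert node $$new_spec as last into doc("MAGgts")
  //ML_dataset[@name="$dataset"]//ML_training_img_specs
''')

query_text = QUERY.substitute(
    file_name="softalk_v2n02pg002.png",
    img_width=940,
    img_height=1280,
    dataset="all_ads",
)
print(query_text)
```

Template.safe_substitute is another option: it leaves any placeholder without a supplied value, such as a bare $new_spec, unchanged instead of raising an error.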
Thank you, Liam, those pointers had all the info I needed when put together with the supplied BaseX Python client API examples to move forward. :D
Happy-Healthy Vibes from Colorado USA, -: Jim :-
-----Original Message----- From: BaseX-Talk basex-talk-bounces@mailman.uni-konstanz.de On Behalf Of Jim Salmons Sent: Wednesday, April 10, 2019 8:30 PM To: 'Liam R. E. Quin' liam@fromoldbooks.org; basex-talk@mailman.uni-konstanz.de Subject: Re: [basex-talk] Nooby/#CitizenScientist REQUEST for HELP: Python implementation of this XQuery