forums.tropy.org

Import/export Tropy items to/from Transkribus

Hi,
I have a question regarding the integration of Tropy with Transkribus.

I am working in a team of research navigators at the University of Basel where we advise research projects from the humanities and social sciences on digital methods. We often recommend Tropy for image organization because of its flexible metadata options and its easy export to Omeka S for presentation. A scenario we have encountered frequently is the following: images are organized and described in Tropy, but then have to be transferred to Transkribus for ATR (automated text recognition) and sometimes for textual annotation. The corresponding workflow is cumbersome:

  1. Manually import a Tropy item into Transkribus.
  2. Manually map the ATR output from Transkribus to the items/images in Tropy (either as an item/image’s metadata field, or note, or external reference).

Skipping the second step entails significant information loss and is therefore not recommended. Ideally, there would be an option to export/import projects/items between Tropy and Transkribus, with the export side perhaps similar to the Omeka S plugin for Tropy. But before we start developing such a solution (or a workaround), I wanted to ask here whether we are missing a simpler or more obvious approach.

A Transkribus plugin with round-trip support would be great! Currently Tropy supports export plugins and, in the latest release, import plugins. I think this is still not enough for your scenario, because an import always creates new items. We’re also currently working on a third plugin type (tentatively called ‘process’ plugins) that can process/manipulate a range of selected items/photos. The idea is to use this for OCR or similar requirements (e.g., you select a photo and then the OCR plugin, which adds a note with the transcription to the photo). However, we haven’t decided how best to support asynchronous processes such as your case: essentially, we’d need a plugin infrastructure where the plugin starts processing the items (e.g., uploads them to Transkribus), then waits until it is notified, by some means, that the processing is done, and finally updates the processed items accordingly.
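One way to picture that asynchronous round trip is a simple polling loop. The sketch below is only illustrative: `submit`, `is_done`, `fetch_results` and `add_note` are hypothetical callbacks standing in for a Transkribus upload, a job-status check, a results download and a note write in Tropy. None of them is an existing Tropy or Transkribus API, and polling is just one assumed alternative to the notification mechanism that is still undecided.

```python
import time
from collections.abc import Callable, Iterable


def process_async(
    photo_ids: Iterable[int],
    submit: Callable[[list[int]], str],                # hypothetical: start an external job
    is_done: Callable[[str], bool],                    # hypothetical: ask whether the job finished
    fetch_results: Callable[[str], dict[int, str]],    # hypothetical: photo id -> transcription
    add_note: Callable[[int, str], None],              # hypothetical: write a note back to Tropy
    interval: float = 5.0,
    timeout: float = 3600.0,
) -> int:
    """Submit photos, wait for the external job, then update the processed photos.

    Returns the number of notes that were added.
    """
    ids = list(photo_ids)
    job_id = submit(ids)
    deadline = time.monotonic() + timeout
    # Polling stands in for a push notification from the external service.
    while not is_done(job_id):
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} did not finish within {timeout} s")
        time.sleep(interval)
    results = fetch_results(job_id)
    added = 0
    for photo_id in ids:
        if photo_id in results:  # photos without a result are left untouched
            add_note(photo_id, results[photo_id])
            added += 1
    return added
```

Keeping the service-specific parts in callbacks means the same waiting logic would work for Transkribus or any other slow, job-based service.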

I’m confident that we could finalize the simple processing plugins very quickly. The asynchronous processing will require more planning, but perhaps this is something on which we could collaborate.

Meanwhile, I think step 1 could already be implemented using Tropy’s export plugins. Basically, the plugin receives the JSON representation of the selected items and can convert and upload the data to Transkribus (similar to the Omeka S plugin).
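The conversion part of such a plugin might look roughly like the following Python sketch; the logic would be the same in whatever language the plugin itself is written. It assumes the selected items arrive as Tropy's JSON-LD export, with an `@graph` array of items that each carry an optional `title` and a `photo` array whose entries have a local file `path`. That shape should be checked against a real export. `load_export` and `upload_manifest` are hypothetical helpers, and the actual Transkribus upload is left out because its API is not covered in this thread.

```python
import json
from pathlib import Path


def load_export(path: str) -> dict:
    """Read a Tropy JSON-LD export from disk."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def upload_manifest(export: dict) -> list[dict]:
    """One record per item: its title and the local image files to upload.

    Assumed input shape: {"@graph": [{"title": ..., "photo": [{"path": ...}]}]}.
    Items without any photo path are skipped.
    """
    manifest = []
    for item in export.get("@graph", []):
        images = [p["path"] for p in item.get("photo", []) if "path" in p]
        if images:
            manifest.append({"title": item.get("title", "Untitled"), "images": images})
    return manifest
```

Grouping images per item keeps the item/photo structure intact, which would make it easier to map the ATR output back to the right Tropy items in step 2.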

Step 2 could probably be automated using the HTTP API – this is something that exists but has not been documented yet. Basically, Tropy can expose an HTTP API for the current project, so it should be possible to write a script which fetches information from Transkribus and then adds notes or metadata to Tropy items via the API (obviously step 1 could also be implemented this way; the main advantage of the plugin is that it allows you to select the items to export in the GUI). You can enable the API by adding "api":true to the state.json file in the user data directory (Help->Show user data folder in the menu). When you then restart Tropy the API should be enabled (on port 2019 by default). For example, to retrieve a list of items in the project you could use http://localhost:2019/project/items. If you would like to explore this further I can post all the endpoints currently supported.
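A minimal sketch of enabling the API and fetching the item list, assuming Python's standard library, the default port 2019, and that `/project/items` answers with JSON. `enable_api` and `list_items` are hypothetical helper names; the `state.json` path has to be taken from Help->Show user data folder, and it is presumably safest to edit the file while Tropy is not running.

```python
import json
import urllib.request
from pathlib import Path
from typing import Any

API = "http://localhost:2019"  # default port of the (undocumented) HTTP API


def enable_api(state_file: str) -> None:
    """Set "api": true in Tropy's state.json, keeping all other settings."""
    path = Path(state_file)
    state = json.loads(path.read_text(encoding="utf-8"))
    state["api"] = True
    # Assumption: Tropy is not running, so it will not overwrite this change on exit.
    path.write_text(json.dumps(state, indent=2), encoding="utf-8")


def list_items(base: str = API) -> Any:
    """Fetch /project/items from the currently open project (response assumed JSON)."""
    with urllib.request.urlopen(f"{base}/project/items") as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Usage: call `enable_api("/path/to/state.json")`, restart Tropy, then `print(list_items())`.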

Thank you very much for taking the time to reply to my question. I have two small follow-ups:

Is there already a rough release schedule for the (simple) processing plugins?

I tried out the HTTP API and it does indeed look very promising for what we had in mind, so yes, posting the currently supported endpoints would be most welcome!

We’re looking to add processing plugins this year.

As for the API, the main open question is how to support different project files. Ideally, we’d like to avoid encoding the full path to the file as a parameter (though that’s an option) so we’re likely going to add project file aliases. Essentially, this would be a preference pane where you can set a unique name for a given project file. You can then use that name in the URL path to address a certain project (and you need to update the name-path mapping if you move the project file).
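To make the alias idea concrete, here is a hypothetical sketch of the name-path mapping and of how a request path could be split into a project file and an API route. The `/<alias>/...` URL layout, the URL-safe naming rule and the `ProjectAliases` class are assumptions for illustration only; the feature does not exist yet.

```python
import re
from pathlib import Path

ALIAS = re.compile(r"[A-Za-z0-9_-]+")  # assumed rule: names must be URL-safe


class ProjectAliases:
    """Hypothetical name -> project-file mapping, as a preference pane might store it."""

    def __init__(self) -> None:
        self._paths: dict[str, Path] = {}

    def set(self, name: str, project_file: str) -> None:
        """Register a name, or re-point it after the project file has moved."""
        if not ALIAS.fullmatch(name):
            raise ValueError(f"alias must be URL-safe: {name!r}")
        self._paths[name] = Path(project_file)

    def route(self, url_path: str) -> tuple[Path, str]:
        """Split '/<alias>/rest' into the project file and the remaining API route."""
        name, _, rest = url_path.lstrip("/").partition("/")
        if name not in self._paths:
            raise KeyError(f"unknown project alias: {name!r}")
        return self._paths[name], "/" + rest
```

Because the alias is the only thing clients put in the URL, moving a project file only requires calling `set` again with the new path; scripts that use the alias keep working.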

Currently, the API should support the following endpoints:

POST /project/import # Supported parameters: file (path or URL), list (id)

GET /project/items # Find items; supported parameters: q, tag, list, sort, reverse
GET /project/items/:id # Show specific item
GET /project/items/:id/photos # Show item's photos
GET /project/items/:id/tags # Show item's tags
POST /project/items/:id/tags # Add tags via parameter: name, color
DELETE /project/items/:id/tags # Delete tag via parameter: tag

GET /project/list/:id/items # Find items (see parameters above)

POST /project/tags # Add tags via parameter: name, color
DELETE /project/tags # Delete tags via parameter: tag
GET /project/tags # Find tags
GET /project/tags/:id # Show tag

GET /project/data/:id # Show item, photo, or selection metadata
POST /project/data/:id # Save item, photo, or selection metadata (key, value in body; key is property id)

GET /project/notes/:id # Show item, photo, or selection note; parameters: note (id), format (json, html, plain, text, markdown)
POST /project/notes # Create note; parameters: html, language, photo, selection

GET /project/photos/:id # Show photo data
GET /project/photos/:id/raw # Get the original photo file
GET /project/photos/:id/file.:format # Get the photo file converted to the given image format

GET /project/selections/:id # Show selection data
GET /project/selections/:id/file.:format # Get the selection as an image file in the given format

GET /version # Prints Tropy version
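As a starting point for step 2, the helpers below build requests for three of the endpoints listed above: searching items, attaching a transcription as a note to a photo, and writing a metadata field. They assume that parameters are sent form-encoded in the request body (the listing does not say whether form or JSON encoding is expected) and that responses are JSON. All function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Optional

API = "http://localhost:2019"  # default port; adjust if configured differently
FORM = {"Content-Type": "application/x-www-form-urlencoded"}  # assumed encoding


def items_url(base: str = API, **params: str) -> str:
    """GET /project/items, optionally filtered by q, tag, list, sort, reverse."""
    query = urllib.parse.urlencode(params)
    return f"{base}/project/items" + (f"?{query}" if query else "")


def note_request(photo_id: int, html: str, language: Optional[str] = None,
                 base: str = API) -> urllib.request.Request:
    """POST /project/notes: attach an HTML note (e.g. ATR output) to a photo."""
    params = {"photo": str(photo_id), "html": html}
    if language:
        params["language"] = language
    body = urllib.parse.urlencode(params).encode("utf-8")
    return urllib.request.Request(f"{base}/project/notes", data=body,
                                  headers=FORM, method="POST")


def metadata_request(subject_id: int, property_id: str, value: str,
                     base: str = API) -> urllib.request.Request:
    """POST /project/data/:id: set one field of an item, photo or selection."""
    body = urllib.parse.urlencode({"key": property_id, "value": value}).encode("utf-8")
    return urllib.request.Request(f"{base}/project/data/{subject_id}", data=body,
                                  headers=FORM, method="POST")


def send(req: Any) -> Any:
    """Send a Request (or plain URL) and decode the JSON response, if any."""
    with urllib.request.urlopen(req) as resp:
        raw = resp.read().decode("utf-8")
    return json.loads(raw) if raw else None
```

For example, `send(note_request(photo_id, html))` would attach a transcription fetched from Transkribus to the photo with id `photo_id`; the photo ids themselves can be looked up via `GET /project/items/:id/photos`.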

Please let me know if you have any questions and also if you have any suggestions or if something is missing.