forums.tropy.org

Importing notes via jsonld?

I see the latest version of Tropy allows json-ld import by dragging and dropping, which is cool.

I have several images that I have ocr’d for text. All of that text is now in one great big txt file. I want to add the text for each image to the image’s notes pane in Tropy, so I can clean it up. My txt file has each image title and image path in the text, so with a bit of regex, I should be able to just paste in the jsonld formatting around that, and import? So far, no…

I’ve studied the way Tropy exports jsonld with notes, and I’m finding that unless I include checksum and all the full metadata, I can’t get things to import. But at that point, I might as well just copy and paste the text one image at a time.

But shouldn’t the following work?

{
  "@graph": [
    {
      "@type": "Item",
      "template": "https://tropy.org/v1/templates/generic",
      "title": "e000000422",
      "photo": [
        {
          "@type": "Photo",
          "path": "/Users/shawngraham/Desktop/tropy-experiment/gamble/e000000422.jpg",
          "template": "https://tropy.org/v1/templates/photo",
          "title": "e000000422",
          "note": [
            {
              "@type": "Note",
              "text": {
                "@value": "blah blah blah ocr'd text goes here",
                "@language": "en"
              },
              "html": {
                "@value": "",
                "@language": "en"
              }
            }
          ]
        }
      ]
    }
  ],
  "version": "1.7.0"
}

Ideally, I’d just like to be able to import a file with

title:"001.jpg" ,
path:"path/to/file/001.jpg",
note:"ocrd text goes here"

Thanks!

Are your text files arranged exactly like in that last example? And do I understand correctly that the file contains many of these title-path-note triples? If that’s the case, we’ll make it work: I’ll post a quick script which you can run to generate a suitable JSON file.

Could you also let me know what operating system you’re using?

Hiya,

That would be fantastic, if it’s not too much trouble. Yes, my workflow for the automatic transcription ends up with a single file, which I’d clean up with regex to have many of those title-path-note triples.

I’m on Mac, Catalina 10.15.3

Thank you!

Right, here is a quick first go at it:

Please note that, as is, this is extremely simple: it assumes you really have these three lines for every item, that the strings are JSON-formatted, and so on. We can obviously make this much smarter, but that might not even be necessary here.
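As an illustration of those assumptions, here is a minimal Python sketch of reading the title-path-note triples. It is not the script posted in this thread; the function name `parse_triples` and the regular expression are hypothetical, and it assumes each field sits on its own line with a JSON-quoted value, as in the example in the question.

```python
import json
import re

# Matches lines such as:  title:"001.jpg" ,
# Assumption: one field per line, value written as a JSON string literal.
FIELD = re.compile(r'^\s*(title|path|note)\s*:\s*(".*")\s*,?\s*$')


def parse_triples(text):
    """Collect title/path/note lines into one dict per item (hypothetical helper)."""
    items, current = [], {}
    for line in text.splitlines():
        match = FIELD.match(line)
        if not match:
            continue  # skip blank or unrelated lines
        key, raw = match.groups()
        current[key] = json.loads(raw)  # decode the JSON string literal
        if len(current) == 3:  # title, path and note all seen
            items.append(current)
            current = {}
    return items
```

Because the values are decoded with `json.loads`, escaped quotes inside the OCR text are handled, but a note that spans several lines would be skipped; that is the kind of thing a smarter version would need to deal with.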

You’re right that Tropy currently expects the mimetype and checksum of the photo to be set; this is a bit silly, because Tropy will check those and update them if necessary on import anyway. We’ll update Tropy not to require those fields; for the time being, it’s fine to just put in placeholder values.

Similarly, I realized we should make the template fields optional (if missing, Tropy should use the default templates you set in the preferences).

Note also that this isn’t even a full JSON-LD object: Tropy will inject a default @context (the downside is that we need to pass a full property id for the title). You could even simplify this further and just import a JSON array – I left the @graph syntax in order not to make this too confusing, but it could be left off.
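To make the points above concrete, here is a rough Python sketch of building the import file from already-parsed triples. It is not the script from this thread. The names `to_tropy_item` and `build_graph` are hypothetical, the Dublin Core URI used as the full property id for the title is an assumption, the `mimetype` and `checksum` values are placeholders as suggested above, and filling `html` with an escaped copy of the text is also an assumption; the other keys mirror the snippet in the question.

```python
import json
from html import escape

# Assumption: the "full property id" for the title is the Dublin Core one.
DC_TITLE = "http://purl.org/dc/elements/1.1/title"


def to_tropy_item(title, path, note, language="en"):
    """Build one item with a single photo and a single note (hypothetical helper)."""
    return {
        "@type": "Item",
        "template": "https://tropy.org/v1/templates/generic",
        DC_TITLE: title,
        "photo": [{
            "@type": "Photo",
            "template": "https://tropy.org/v1/templates/photo",
            DC_TITLE: title,
            "path": path,
            "mimetype": "image/jpeg",   # placeholder value
            "checksum": "placeholder",  # placeholder value
            "note": [{
                "@type": "Note",
                "text": {"@value": note, "@language": language},
                # Assumption: a plain escaped paragraph is acceptable as html.
                "html": {"@value": "<p>" + escape(note) + "</p>",
                         "@language": language},
            }],
        }],
    }


def build_graph(triples):
    """Wrap all items in the @graph syntax used in the question."""
    return {
        "@graph": [to_tropy_item(t["title"], t["path"], t["note"]) for t in triples],
        "version": "1.7.0",  # copied from the snippet in the question
    }


if __name__ == "__main__":
    demo = [{"title": "001.jpg", "path": "path/to/file/001.jpg",
             "note": "ocrd text goes here"}]
    print(json.dumps(build_graph(demo), indent=2))
```

Writing the result of `build_graph` to a `.json` file with `json.dump` should give something that can be dragged into Tropy, though the exact fields Tropy requires may differ between versions.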

The photo paths need to be absolute paths or relative to the JSON file during import.
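Since a wrong path is an easy mistake to make here, a small check before importing can help. The following Python sketch resolves each photo path the way the rule above describes (absolute, or relative to the JSON file) and lists the ones that do not exist. Both function names are hypothetical, and this only approximates how Tropy itself resolves paths.

```python
from pathlib import Path


def resolve_photo_path(json_file, photo_path):
    """Return photo_path as-is if absolute, else relative to the JSON file's folder."""
    photo = Path(photo_path)
    if photo.is_absolute():
        return photo
    return Path(json_file).parent / photo


def missing_photos(json_file, photo_paths):
    """List the photo paths that do not point at an existing file."""
    return [p for p in photo_paths
            if not resolve_photo_path(json_file, p).is_file()]
```

For example, `missing_photos("items.json", ["gamble/e000000422.jpg"])` returns an empty list only if that image exists in a `gamble` folder next to `items.json`.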

Right, please give this a spin and let me know if anything does not work as expected!

hot damn! it works!

I’m going to write up my entire process for using cognitive services to do the handwriting recognition, and from there into Tropy. I was thinking of submitting this to Programming Historian, if that’s ok? Make you a coauthor?

For anyone who is interested in the larger workflow, I wrote it up here: https://github.com/shawngraham/handwriting-to-tropy/blob/master/index.md

Awesome, and thanks for sharing!