Checking our planned use-case

mankoff · September 15, 2020, 8:36pm

Hello,

I think Tropy might be the right tool for our project, but I wanted to confirm here.

We maintain a weather station network on Greenland and would like to make photos of the stations searchable and browseable by ourselves and other researchers (and the public). Each year we visit ~20 stations and take ~100 photos per station, so we have 20,000 photos from the past decade.

We’re going to host these photos on a Dataverse website (each photo gets a DOI) and my vision is that we share a single Tropy project that accesses each file from the website. Each photo is tagged by station, year, what is in the photo (sensor name or ID), and perhaps other metadata. Example photos: https://demo.dataverse.org/dataset.xhtml?persistentId=doi:10.70122/FK2/AETL0A

Each year after our field season we’ll add ~2,000 more photos, update the master Tropy file, and re-upload that file for anyone to download.

Some questions:

Do others work with this scale (i.e. >10k) of photos?
Can I build the Tropy DB by code? I’d rather not drag-and-drop all our photos. Can I crawl them via code and inject them into Tropy that way? Can I add tags via code? More generally, is there an API? It looks like maybe I just build a JSON-LD file and then import that?
I assume that the Tropy file we distribute should be read-only, or come with a warning that any changes users make on a local copy will be lost when the file is replaced next year.
Are there any better tools you can think of, or issues I’m not yet aware of?

Thanks,

-k.

inukshuk · September 16, 2020, 7:57am

I think this approach should work, yes. I guess my main question would be what problem you’re trying to solve by adding Tropy to the mix? If I understand correctly, the photos are already available via the website and you would create the Tropy project via code so Tropy would be mainly used to view the data which is also available on the website, right? In this case, I guess it’s a good solution if either Tropy’s UI makes the data more accessible or if you’re anticipating users to continue working on the data (or parts of the data) themselves in Tropy.

Regarding the questions: I’ve definitely seen and used projects at this scale (around 30k photos for sure). The UI should absolutely hold up. Some of the database queries have not yet been optimized for larger projects, so there is room to improve this further, but I think it should be OK. Another consideration, obviously, is disk space.

Importing remote photos currently only works via drag-and-drop and via the JSON-LD import, so, yes, generating suitable JSON to import your data is definitely the way to go. You can add tags via code (not lists though).
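As a rough illustration of generating import data by code, here is a minimal Python sketch that turns crawled photo records into a JSON-LD-style document. The key names (`photo`, `path`, `tag`), the `@context` value, the example URL, and the helper names are all assumptions made for illustration, not Tropy's confirmed schema. Exporting a small hand-made project from Tropy and mirroring its structure would be the safer way to find the real field names.

```python
import json


def build_item(url, station, year, sensors):
    """Return one item dict holding a single remote photo and its tags."""
    return {
        "@type": "Item",
        "title": f"{station} {year}",
        # Assumed shape: one photo per item, referenced by its remote URL.
        "photo": [{"@type": "Photo", "path": url}],
        # Tags are plain strings: station, year, then each sensor name.
        "tag": [station, str(year), *sensors],
    }


def build_document(records):
    """Wrap all items in a JSON-LD-style graph (the context is an assumption)."""
    return {
        "@context": {"@vocab": "https://tropy.org/v1/tropy#"},
        "@graph": [build_item(**record) for record in records],
    }


# Hypothetical crawl result; real records would come from the photo host.
records = [
    {
        "url": "https://example.org/photos/station-X-2018-001.jpg",
        "station": "station-X",
        "year": 2018,
        "sensors": ["sensor-Y"],
    }
]

document = build_document(records)
json_text = json.dumps(document, indent=2)
```

The resulting `json_text` could be written to a file and tried with the JSON-LD import on a throwaway project before scaling up to the full photo set.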

If the project file is read-only, Tropy should switch to read-only mode. This way you can signal that a project is not supposed to be changed (users could still, e.g., copy and paste items into a writable project, of course). Since these are just file attributes, there’s no way for you to ensure what the file permissions are, of course.
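For reference, clearing the write bits on a file before distributing it can be sketched in Python as below. The function names are hypothetical, and this only sets the attribute on the local copy. Whether the attribute survives upload and download depends on how the file is transferred, which is the caveat mentioned above.

```python
import os
import stat

# All three write-permission bits (owner, group, others).
WRITE_BITS = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH


def make_read_only(path):
    """Clear every write-permission bit on path, leaving other bits unchanged."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    os.chmod(path, mode & ~WRITE_BITS)


def is_read_only(path):
    """Return True if no write-permission bit is set on path."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode & WRITE_BITS == 0
```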

Omeka might be worth a look as well.

mankoff · September 16, 2020, 11:24am

Hi inukshuk,

Thank you for the feedback.

The thing that Tropy provides is taggable browsing. That isn’t available on the website. With tags we can easily say “I’d like all images from station X during these 3 years that include sensor Y”. Or “Sensor Y at all stations in 2018”, etc.
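To make those two queries concrete, here is a small Python sketch of the filtering logic over tagged photo records. It only illustrates the intended query semantics and is not how Tropy implements tag filtering. The record layout, station names, and sensor names are hypothetical.

```python
def select(photos, station=None, years=None, sensor=None):
    """Return the photos matching every filter that is not None."""
    matches = []
    for photo in photos:
        if station is not None and photo["station"] != station:
            continue
        if years is not None and photo["year"] not in years:
            continue
        if sensor is not None and sensor not in photo["sensors"]:
            continue
        matches.append(photo)
    return matches


# Hypothetical tagged records standing in for the real photo collection.
PHOTOS = [
    {"station": "station-X", "year": 2017, "sensors": {"sensor-Y"}},
    {"station": "station-X", "year": 2018, "sensors": {"sensor-Z"}},
    {"station": "station-W", "year": 2018, "sensors": {"sensor-Y", "sensor-Z"}},
    {"station": "station-X", "year": 2019, "sensors": {"sensor-Y"}},
]

# "All images from station X during these 3 years that include sensor Y"
station_x_sensor_y = select(
    PHOTOS, station="station-X", years={2017, 2018, 2019}, sensor="sensor-Y"
)

# "Sensor Y at all stations in 2018"
sensor_y_2018 = select(PHOTOS, years={2018}, sensor="sensor-Y")
```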

Does Tropy download all the images? In that case the project might be too big. I assumed it downloads them all on my machine as I build the DB (that will take a while), but only to create the thumbnails, without storing all 20,000 images. Then we might distribute a file of a few hundred MB?

Thanks,

-k.

inukshuk · September 16, 2020, 3:00pm

For remote images Tropy currently downloads the original files to create thumbnails, but also a full-size variant (this is skipped only for images which are available locally). This happens whenever Tropy tries to display an item (no matter which size) and there is a load error. Therefore, when you open the project for the first time, Tropy will instantly fetch the first photo of approximately the first 40 items (this depends on the zoom level and the size of your window). As you scroll through the project, it will keep fetching more photos and, ultimately, all of them.

While this will likely continue to be the default behavior, we’ve been considering adding ways to fine-tune this for cases where you have lots of remote images, or images on external drives, for example, where it’s to be expected that the images are not available all the time – this is mainly to avoid Tropy showing warnings for missing images, but for remote images this could be a way to stop fetching them.

Another option we could add is not to generate full size copies of remote images if they are natively supported by WebGL – this would probably be the best for your scenario. That way Tropy would fetch images automatically as needed and create thumbnails, and then fetch the original (or cached version) only when you open a photo in the item view.