Help choosing properties, data types (beginner question)

Hello everyone,

First of all, sorry in advance if this is a very basic question — I’m new to Tropy and I don’t have any background in programming or metadata standards. I’m still learning what “properties” and “data types” really mean in practice.

I’m using Tropy for a historical research project, collecting archival documents (primary sources) related to my hometown (Orpesa, Spain). I’m currently creating a custom Object template, and I’m a bit lost when choosing options in the Template Editor.

When adding a field, Tropy asks me to choose:

•	a property (from a long list with codes like dc:…, dcterms:…, bf:…, foaf:…)
•	and a data type, which I also don’t fully understand yet

I’ve read the documentation, but I’m still unsure how to apply it to my real use case.

These are the labels I would like to have in my Object template, and what I intend to use them for:
• Date: date of the document shown in the image
• Archive: institution where the document is held
• Call number: archival reference code
• Pages / folios: pages or folios of the physical document (e.g. 45v-70r)
• Images: image number or range in a PDF or online viewer (255-301)
• Place: place of creation or production
• Brief description: short description written by me
• Archive description: description provided by the archive
• Secondary descriptions: descriptions from bibliography (multiples)
• Copy date: date when the copy was made (if it’s a copy)
• Document type: type of document
• Language: language of the document
• Transcribed: whether it has been transcribed (yes / no / some parts…)
• Link: URL to the online resource
• Notes: free notes
• References: bibliographic references (multiples)
• Identifier: internal project identifier
• Rights: rights or usage information

In addition, I’d like to ask about file organization and naming strategy, to check whether I’m doing something reasonable or if there’s a better approach.

My current idea is:
• keep all jpg/pdf files in a single folder
• name them sequentially as ORP-00001, ORP-00002, ORP-00003, etc.
• the numbering reflects order of entry, not date or content
• all actual description, chronology, and organization is done inside Tropy, not through file names or folders

Does this sound like a good practice for long-term work in Tropy?
Would you improve this approach in any way?

Related to this:
• In this scenario, would you keep an “Identifier” label in the Object template?
• If yes, is there any way in Tropy to automatically populate the Identifier field from the file name when importing images?

Again, I’m not aiming for a perfect or highly technical setup — just something solid and sensible for a beginner.

Thanks a lot for your patience, and also thank you very much for Tropy itself. It’s honestly the tool I had been looking for for years, even if I’m still learning how to use it properly.

Best regards,
Pau, a 21-year-old clueless Spanish researcher :sweat_smile:

I would recommend restricting yourself to the Dublin Core vocabularies (dc / dcterms). These should already cover most of your fields. For convenience you can add labels to your template. For example, the Tropy Generic template uses ‘Archive’ as the label for dc:source. The label is really only for you (Tropy will show it when the template is used for an item); if you share or process the data later, it’s just dc:source. So that’s what I’d do first. Only if you’re left with fields that you need and that can’t be expressed using Dublin Core terms would I go looking into other vocabularies (and there I’d orient myself by which vocabularies are popular in your area of research).
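
To make the label/property split concrete, here is a minimal sketch of how some of the labels from the question could be paired with Dublin Core terms. Only ‘Archive’ → dc:source comes from the Tropy Generic template; every other pairing, and the names `FIELD_MAP` and `unmapped`, are illustrative assumptions to review rather than Tropy defaults.

```python
# Hypothetical label -> property pairing for a custom template.
# Only "Archive" -> dc:source is taken from the Tropy Generic template;
# all other pairings are suggestions, not something Tropy prescribes.
FIELD_MAP = {
    "Date": "dc:date",
    "Archive": "dc:source",
    "Call number": "dc:identifier",
    "Place": "dc:coverage",
    "Brief description": "dc:description",
    "Document type": "dc:type",
    "Language": "dc:language",
    "References": "dcterms:bibliographicCitation",
    "Rights": "dc:rights",
}


def unmapped(labels, field_map=FIELD_MAP):
    """Return the labels that have no Dublin Core property assigned yet."""
    return [label for label in labels if label not in field_map]


wanted = ["Date", "Archive", "Transcribed", "Link", "Notes"]
print(unmapped(wanted))  # → ['Transcribed', 'Link', 'Notes']
```

Whatever is left over after such a pass is the short list of fields for which it may be worth looking at other vocabularies.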

The data type in the template describes what kind of value you want to put into a field. This is not utilized much in Tropy yet; conceivably we might use it to show custom controls/widgets for certain fields in the future. Currently, you can just assume everything is a string, with the exception that if you use tropy:date, Tropy will parse extended ISO date/time values and display them formatted according to your locale.
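
As a rough illustration of what ISO-style date values look like, the sketch below checks the three plain calendar forms (year, year-month, year-month-day). It is an assumption-laden helper for tidying your own data before typing it in, not a description of Tropy’s parser, which may accept more forms than these; `looks_like_iso_date` is a hypothetical name.

```python
import re

# Assumption: only the plain calendar forms YYYY, YYYY-MM and YYYY-MM-DD.
# Tropy's own date parsing may accept more than this.
ISO_DATE = re.compile(r"\d{4}(-(0[1-9]|1[0-2])(-(0[1-9]|[12]\d|3[01]))?)?")


def looks_like_iso_date(value: str) -> bool:
    """True if the whole string is one of the three calendar forms above."""
    return ISO_DATE.fullmatch(value) is not None


print(looks_like_iso_date("1789-07-14"))  # → True
print(looks_like_iso_date("14/07/1789"))  # → False

# Values written this way also sort chronologically as plain text.
print(sorted(["1789-07-14", "1650", "1789-07"]))
# → ['1650', '1789-07', '1789-07-14']
```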

If you want to control how the files are named and organized on your disk, make sure to use an advanced project (standard projects make their own copy of each file). Having sequentially ordered names is helpful for sorting items in Tropy so that’s a good idea.
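
A small sketch of why zero-padded sequential names sort well: padded names keep their numeric order when sorted as text, while unpadded ones do not. The function name `sequential_name` and its defaults are hypothetical and simply follow the ORP-00001 scheme from the question.

```python
def sequential_name(n: int, prefix: str = "ORP", width: int = 5) -> str:
    """Build a zero-padded name such as ORP-00001 (scheme from the question)."""
    return f"{prefix}-{n:0{width}d}"


padded = sorted(sequential_name(n) for n in (10, 2, 1))
print(padded)  # → ['ORP-00001', 'ORP-00002', 'ORP-00010']

# Without padding, text sorting puts 10 before 2.
print(sorted(["ORP-2", "ORP-10"]))  # → ['ORP-10', 'ORP-2']
```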

JPEG is supported very well in Tropy. Since Tropy doesn’t alter the original files, I’d also suggest optimizing the images before importing them to save disk space. For PDFs, Tropy has to render each page (a full-size image of each page is stored in the image cache). This is fine for PDF documents. However, if your PDFs are just a collection of images, I would recommend extracting the images first and importing those into Tropy (this way you skip the need for full-size variants in the image cache).

Tropy has internal ids for each item in a project. An extra identifier field is useful if you already have meaningful outside identifiers. I would only add it in that case (or if you anticipate linking the items to some outside source using an identifier). Currently there is no way to fill in fields automatically based on the filename. You could embed the identifier in each image using XMP, and then Tropy would pick it up during import. That said, it would be easy to write a script to add an identifier to each item based on the filename. One way to do this is via Tropy’s developer API (which is also easily accessible to today’s coding agents).
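
The filename half of such a script could look roughly like the sketch below. It only derives the identifier from a file name; writing the value into Tropy (via XMP or the developer API) is deliberately left out, since the exact calls are not covered in this thread. The pattern assumes the ORP-00001 scheme from the question, and `identifier_from_filename` is a hypothetical helper.

```python
import re
from pathlib import Path
from typing import Optional

# Assumption: names follow the scheme from the question, e.g. ORP-00001.jpg,
# optionally followed by a page suffix such as ORP-00001-0002.jpg.
ID_PATTERN = re.compile(r"[A-Z]+-\d{5}")


def identifier_from_filename(path: str) -> Optional[str]:
    """Return the leading identifier of a file name, or None if it has none."""
    match = ID_PATTERN.match(Path(path).stem)
    return match.group(0) if match else None


print(identifier_from_filename("scans/ORP-00001.jpg"))  # → ORP-00001
print(identifier_from_filename("ORP-00001-0002.jpg"))   # → ORP-00001
print(identifier_from_filename("notes.txt"))            # → None
```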


Many thanks for your explanations, especially regarding the use of Dublin Core vocabularies. I really appreciate the recommendation to stick to dc/dcterms as much as possible. In that regard, I wanted to ask whether there is any practical difference or preferred usage between dc and dcterms.

Thank you as well for clarifying the role of data types in templates. I will follow your suggestion and use string for all fields, using tropy:date only for date values.

Regarding file organization, I’ll use that sequential naming scheme such as ORP-00000, ORP-00001, etc., which seems very helpful for sorting and identifying items. I will also switch to an advanced project in order to have full control over how files are named and stored on disk.

I also recall that at some point Tropy asked whether I wanted to import files or link a folder, and I chose to link the folder where I will store everything. In that folder I will keep all files together (JPG and PDF), without any internal subfolders.

This leads me to a practical question: when a single document consists of multiple images (for instance, a PDF that I have split into individual images), how would you recommend organizing them?
• Using the same identifier with a sequential suffix, for example ORP-00001-0001, ORP-00001-0002, etc.
• Or creating a folder ORP-00001 and placing all images belonging to that document inside it.

Any advice on this would be greatly appreciated.

Many thanks again for all your help!

I think either nested folders or a suffix is fine (or even both, if you preserve the naming convention in the sub-folder). It depends on how closely the PDF structure reflects your intended item/photo structure in Tropy. If you may want to split up the photos across different items, I’d recommend a single folder.
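
With the suffix approach in a single folder, the document grouping can still be recovered from the names alone. A minimal sketch, assuming names like ORP-00001-0001.jpg as proposed in the question (`group_by_document` is a hypothetical helper):

```python
from collections import defaultdict
from pathlib import Path


def group_by_document(filenames):
    """Group file names by their document identifier (the first two parts)."""
    groups = defaultdict(list)
    for name in sorted(filenames):
        parts = Path(name).stem.split("-")
        doc_id = "-".join(parts[:2])  # e.g. ORP-00001, page suffix dropped
        groups[doc_id].append(name)
    return dict(groups)


files = ["ORP-00001-0002.jpg", "ORP-00002.jpg", "ORP-00001-0001.jpg"]
print(group_by_document(files))
# → {'ORP-00001': ['ORP-00001-0001.jpg', 'ORP-00001-0002.jpg'],
#    'ORP-00002': ['ORP-00002.jpg']}
```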


With regard to Dublin Core, dcterms is pretty much a newer and extended vocabulary. For historical reasons, Tropy uses dc:title and similar fields. dcterms:title is equivalent to dc:title, though, so some people will recommend just adopting dcterms.
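
In case it helps to see what the two prefixes stand for: each is shorthand for a namespace URI, and the property name is appended to it. The sketch below expands a prefixed name into its full URI; the `expand` helper is hypothetical, while the two namespace URIs are the ones published by the Dublin Core Metadata Initiative.

```python
# Namespace URIs as published by the Dublin Core Metadata Initiative.
NAMESPACES = {
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}


def expand(prefixed: str) -> str:
    """Turn a prefixed name like dc:title into its full property URI."""
    prefix, _, local = prefixed.partition(":")
    return NAMESPACES[prefix] + local


print(expand("dc:title"))       # → http://purl.org/dc/elements/1.1/title
print(expand("dcterms:title"))  # → http://purl.org/dc/terms/title
```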
