Understanding Vocabularies

Dave_C · February 7, 2023, 7:59am

Hi all, apologies if this is very newbie but I am somewhat confused about the use of vocabularies and their connection with the ‘Datatype’ parameter.

In the help guide, there is a comment: “In order to effectively organize and find sources based on type, use consistent terms or a controlled vocabulary.”

This suggests that by having (say) dc:type as my property, the metadata item will be constrained to a predetermined set of values, but it doesn’t seem to matter what value I fill in for that attribute; it just accepts it.

When I go to the link for dc:Type vocabulary, it tells me that the values include things like: ‘Collection’, ‘Dataset’, ‘Event’, ‘Image’, ‘InteractiveResource’, ‘Service’ etc. but these are not being constrained in the parameter field.

In a related question, along with the ‘Property’, there is a ‘Datatype’ field and it is not clear that there is any connection between them. E.g. if I set the Property to dc:type, the datatype seems to be ‘xsd:string’.

I feel like I am missing something obvious?

inukshuk · February 7, 2023, 9:24am

Tropy uses the datatype indicated by template fields as a hint for how to interpret values. At the moment only xsd:integer and tropy:date are handled separately (everything else is treated like xsd:string as a text value).

For xsd:integer the input is constrained to numbers; for tropy:date extended ISO dates are parsed and displayed according to your locale. Going forward we’d like to add a custom date input widget with support for ranges, approximate dates and so forth. We might also add support for more datatypes in the future.
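
To make the datatype hint concrete, here is a minimal sketch of that kind of dispatch. It is not Tropy's code: the function name `interpret` and the returned `(kind, value)` pairs are made up for illustration, and the date branch only passes the text along where a real implementation would parse it.

```python
import re


def interpret(value: str, datatype: str):
    """Hypothetical helper: pick a handling based on the template datatype."""
    text = value.strip()
    if datatype == "xsd:integer":
        # Input is constrained to whole numbers.
        if not re.fullmatch(r"[+-]?\d+", text):
            raise ValueError(f"not an integer: {value!r}")
        return ("integer", int(text))
    if datatype == "tropy:date":
        # A real implementation would parse an extended ISO date here.
        return ("date", text)
    # Every other datatype is treated as plain text, like xsd:string.
    return ("text", value)


print(interpret("42", "xsd:integer"))
print(interpret("Image", "xsd:string"))
```

Under these assumptions a dc:type field with datatype xsd:string accepts any text, which matches the behaviour described in the question.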

Dave_C · February 7, 2023, 10:04pm

Hi Inukshuk & thanks for your speedy reply.

From this, I understand that the data types currently supported are date (YYYY-MM-DD), integer, and string - these are the main ones that I would need and I now understand their purpose so thanks.

Can you also comment on the other part of my question about how vocabularies work?

E.g. according to the Dublin Core specs the property dc:type has set values (‘Collection’, ‘Dataset’, ‘Event’, ‘Image’, ‘InteractiveResource’, ‘Service’ etc.) but I am not constrained to only those values when I enter a value in the Type metadata field.

In this example, if I start to type a value (e.g. “Im…”) in the Type field, I was half expecting that the only autocomplete options I would be presented with would be from that vocabulary (in this case ‘Image’) but that is not happening.

Am I missing something?

More generally, you and your team are doing a great job - I’ve been accumulating a lot of information about 4 interconnecting stories that peak in Australia in the 1880s and was finding it increasingly difficult to keep the information organised and accessible - Tropy is doing this brilliantly!! It’s a great tool and I particularly like the user experience: it is intuitive and fast, with everything organised where you would expect it to be.

I do like your proposal to look at a more sophisticated date widget. I have some material that I can only place to the approximate year (let alone month or day), so something that enables that ambiguity to be captured, while retaining the ability to list material in date order will be great (I guess the open question is what estimated date is used when a range is provided - I started using 1 January as not a lot happens then)

inukshuk · February 8, 2023, 10:31am

You’re not missing anything – this part (dealing with specific value/data types) is not yet where we’d like it to be. The idea is very close to what you were expecting: we’d like to make the templates and the metadata inputs expressive enough to let you constrain a field’s datatype for example to classes of the dcmitype vocabulary and automatically suggest the appropriate values using localized labels (i.e., for English you would see only ‘Collection’ but Tropy would ensure that the full dcmitype:Collection id would be set internally).
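
As a rough illustration of that idea (a sketch only, since the feature is not implemented yet), a constrained field could keep a mapping from vocabulary ids to display labels, suggest labels while the user types, and store the id. The names `DCMI_TYPES`, `suggest` and `resolve` are hypothetical, the labels are illustrative English labels, and only the terms mentioned in this thread are listed.

```python
# Vocabulary ids mapped to English display labels (subset, for illustration).
DCMI_TYPES = {
    "dcmitype:Collection": "Collection",
    "dcmitype:Dataset": "Dataset",
    "dcmitype:Event": "Event",
    "dcmitype:Image": "Image",
    "dcmitype:InteractiveResource": "Interactive Resource",
    "dcmitype:Service": "Service",
}


def suggest(prefix: str) -> list[str]:
    """Return the labels that start with what the user typed so far."""
    typed = prefix.strip().lower()
    return sorted(
        label for label in DCMI_TYPES.values()
        if label.lower().startswith(typed)
    )


def resolve(label: str) -> str:
    """Map a chosen label back to the id that would be stored internally."""
    wanted = label.strip().lower()
    for term_id, term_label in DCMI_TYPES.items():
        if term_label.lower() == wanted:
            return term_id
    raise ValueError(f"{label!r} is not in the vocabulary")


print(suggest("Im"))
print(resolve("Collection"))
```

With this subset, typing “Im” offers only ‘Image’, and a value outside the vocabulary is rejected instead of being stored as free text.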

We’re hoping to embrace RDF / linked-data more generally for all metadata in Tropy but keeping the UI itself as straightforward as possible. For the time being though most values are just literal strings as far as Tropy is concerned.

With regard to dates, Tropy’s date implementation is based on the extended ISO format (originally called EDTF) which supports approximate dates and also different levels of precision. For example, you could use a date such as 1880 which would cover the full year (no need to pick a day and month); or ~1880 which covers approximately that year. Currently Tropy will parse such dates and display them accordingly. In the future we hope to add a dedicated widget to make inputting these dates easier. More importantly though, with the advanced search feature we’re planning to take these extended dates into account. In essence, each extended date stands for a time range and so when searching, we want to be able to use that information (e.g., if you search for 1 January 1880 both 1880 and ~1880 would match because that day is covered by those date ranges). Again, this is something we hope to add in the future, but if you already enter valid ISO dates now your data will already be in good shape.
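
The “each extended date stands for a time range” idea can be sketched as follows. This is a simplified illustration, not Tropy's parser or its planned search: it only understands YYYY, YYYY-MM and YYYY-MM-DD with an optional ~ before or after, and it assumes an approximate date covers the same nominal range as the exact one. The names `to_range` and `covers` are hypothetical.

```python
import calendar
import re
from datetime import date

# Year, optional month, optional day, with an optional "~" on either side.
PATTERN = re.compile(r"^(~?)(\d{4})(?:-(\d{2}))?(?:-(\d{2}))?(~?)$")


def to_range(text: str) -> tuple[date, date, bool]:
    """Return (first day, last day, is_approximate) for a simple extended date."""
    match = PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"unsupported date: {text!r}")
    before, year_s, month_s, day_s, after = match.groups()
    year = int(year_s)
    approximate = bool(before or after)
    if month_s is None:
        # Year precision: the whole year is covered.
        return date(year, 1, 1), date(year, 12, 31), approximate
    month = int(month_s)
    if day_s is None:
        # Month precision: the whole month is covered.
        last_day = calendar.monthrange(year, month)[1]
        return date(year, month, 1), date(year, month, last_day), approximate
    day = date(year, month, int(day_s))
    return day, day, approximate


def covers(text: str, day: date) -> bool:
    """True if the given day falls inside the range the extended date stands for."""
    start, end, _ = to_range(text)
    return start <= day <= end


query = date(1880, 1, 1)
print(covers("1880", query), covers("~1880", query), covers("1880-02", query))
```

Under these assumptions a search for 1 January 1880 matches both 1880 and ~1880, but not 1880-02.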

Dave_C · February 8, 2023, 11:23pm

Hi inukshuk,

Once again thanks for your speedy reply. This helps and my read is that some of the notes in the user manual should be seen as end design goals and not necessarily all implemented at this stage.

I look forward to the more complete implementation of the vocabularies when your team is able to get around to it.

Thanks also for the additional notes on dates - I hadn’t clicked that I could just enter something like “1888” or “~1888”.

While this is very helpful, I just noticed that any item with a circa date (e.g. “c. 1880” or “ca. Jan 1, 1885”) is not being listed in chronological order but is being placed at the bottom of the normal date range. Is this by design?

My question is because I am trying to see a full chronology of events and being able to see the circa dates in sequence with the known dates would be very helpful. I appreciate that doing that would require that some assumption is made about the actual date; however, the fact that these dates are tagged with “c.” or “ca.” makes them easy to interpret.

inukshuk · February 9, 2023, 4:50pm

Sorting that takes datatypes into account is another open issue. However, in this case you can probably get close enough by changing the value to 1880~ which is equally valid and will be sorted more in the way you’d expect.
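
A small illustration of why the two spellings sort differently, assuming the values are compared as plain text (an assumption made for this sketch; the actual sort in Tropy may differ in detail):

```python
values = ["1885-03-02", "c. 1880", "1879", "1880~", "ca. Jan 1, 1885"]

# Plain text comparison: digits sort before letters, so anything starting
# with "c." or "ca." ends up after every value that starts with a year.
ordered = sorted(values)

for value in ordered:
    print(value)
```

Here 1880~ lands between 1879 and 1885-03-02, while the “c.” and “ca.” values collect at the end of the list.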

Dave_C · February 9, 2023, 10:16pm

Hi Inukshuk, once again thanks for the speedy reply.

That worked brilliantly - thanks!

I’ve learnt a lot through these recent exchanges. The use of the tilde for date approximation was not obvious and I went back to the User Guide to see if I had missed it - which I had.

While it is in there, it might be useful to make it clearer (e.g. under a subheading), and the User Guide is not clear on two points:

The example used “(e.g., 1802, or 1802-01-01; 1802 or 1802-01-01 to indicate uncertainty about the day)” does not include the tilde - perhaps it could read (e.g., ~1802, or ~1802-01-01; 1802~, etc.).
On my browser, the example dates are coming up with a strikethrough, which is confusing.

Hopefully this will be the last you will hear from me for a while - once again, great product