Methods/organizational dilemma - looking for advice from fellow historians!

I started this as a comment on the thread inquiring about pdfs, but it occurs to me that maybe I was asking the wrong question by assuming that PDFs were the way to do it.

At its heart, this is an organizational and retrieval problem: organize a large number of files by labels for theme/subject (tags), and then be able to quickly and easily generate a list of all files with tag Y.

I am a few months into my research year in the archives and have already amassed A LOT of potential sources for my PhD dissertation. Some I’ve photographed in the archives myself, others I have been able to procure from digitized collections online. Almost anything I got online is a PDF. Anything I have taken in the archives I have as jpegs, and many of those I have converted to pdfs as well.

Most files I look at are organized by topic. They might span one week or 30 years. They might have a single sheet of paper or 300 pages of letters, reports, and internal memos. As an example, there is this file from the British Library:

Title: File Gen 184/9 Correspondence on the disposal of war grave monuments formerly in St Sepulchre’s cemetery, Poona
Collection Area: India Office Records and Private Papers
Reference: IOR/R/4/523
Creation Date: Jul 1956-Jun 1959

The ability to work with sources on such a granular level with Tropy is truly fantastic and something that should not be underappreciated. That said, at this point in my research I am trying to organize WHOLE files in their entirety so that I can cluster them by category (and in turn, by potential dissertation chapter). Then I want to go through and reread them more closely. At this early stage, I do not, for instance, want to hop between files, reading a letter from this file and then a letter from that file. That will probably come later, but for now I want to read all the files on topic Y, from beginning to end.

There are a few ways I imagine I might approach this, but none of them seem quite right or quite possible, thus my current inquiry.

Idea #1
Import each archival file into Tropy and then create a list for that particular file reference. Cluster as normal within the list. Tag everything in that list as necessary. I started doing it this way, but realized having a couple hundred LISTS in the left-hand panel would get unwieldy VERY quickly.

Idea #2
Import all the jpegs for each archival file into Tropy clustered as a SINGLE item (even if it is composed of 200+ images of letters, memos, etc). Tag each item. Then read. This is probably the most doable at the moment, but Tropy’s inability to make subitems within a larger item means that I worry how much of a mess I will have on my hands once I begin to break things up on a more granular level. And “exploding”/unclustering items has proved a fairly messy process in my experience thus far.

Idea #3 (currently not supported by Tropy)
Import to Tropy as PDFs so each archival file, regardless of how many pages/images it is, is only a single item to be tagged in Tropy. Not sure how later in the research I’d break this into a more granular option either, but perhaps after organizing and rereading I’d bring in select jpegs instead of ALL OF THEM?

Idea #4
Something else entirely? Organise them in Zotero as PDFs first somehow? Excel?

The ability to use a “tag” function is really what I think is necessary here, but I’m not sure how to implement it.

Any advice, Tropy or otherwise is most welcome!

Addendum for those w database and coding experience:
In a perfect world I’d have a form for a many-to-many table in the sqlite database that I have been developing to organize my archival requests (this is where I keep track of what I’ve looked at and when, what I’ve requested but hasn’t shown up, what I’d like to look at eventually). However, my self-taught technical skills fall short of building a GUI that lets me build a form to do this. Using an ODBC driver w LibreOffice’s Base I can make forms, but subforms won’t work when using sqlite. MSAccess doesn’t play particularly nicely w sqlite either. (I do most of my querying at present w DB Browser for SQLite and most of my record entries of archival files are created using BeautifulSoup4, so leaving sqlite is not really an option at this pt). Ideally I could take time off and properly learn something like how to use Django or Flask to make an app, but I feel like that’s not time I can really spare during archival trips abroad.
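For anyone wanting to try the many-to-many idea without a GUI, here is a minimal sketch using Python's built-in sqlite3 module. Everything in it is an assumption for illustration: the table names (archival_file, tag, file_tag), the helper functions, and the sample tags are hypothetical and are not taken from the actual database described above.

```python
import sqlite3

# Hypothetical schema: all table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE archival_file (
    id        INTEGER PRIMARY KEY,
    reference TEXT NOT NULL UNIQUE
);
CREATE TABLE tag (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
-- Junction table: one row per (file, tag) pairing.
CREATE TABLE file_tag (
    file_id INTEGER NOT NULL REFERENCES archival_file(id),
    tag_id  INTEGER NOT NULL REFERENCES tag(id),
    PRIMARY KEY (file_id, tag_id)
);
""")


def tag_file(conn, reference, tag_name):
    """Attach a tag to a file, creating the file and tag rows if missing."""
    conn.execute(
        "INSERT OR IGNORE INTO archival_file (reference) VALUES (?)",
        (reference,),
    )
    conn.execute("INSERT OR IGNORE INTO tag (name) VALUES (?)", (tag_name,))
    conn.execute(
        """INSERT OR IGNORE INTO file_tag (file_id, tag_id)
           SELECT f.id, t.id
           FROM archival_file f, tag t
           WHERE f.reference = ? AND t.name = ?""",
        (reference, tag_name),
    )


def files_with_tag(conn, tag_name):
    """Return the references of all files carrying the given tag."""
    rows = conn.execute(
        """SELECT f.reference
           FROM archival_file f
           JOIN file_tag ft ON ft.file_id = f.id
           JOIN tag t ON t.id = ft.tag_id
           WHERE t.name = ?
           ORDER BY f.reference""",
        (tag_name,),
    )
    return [row[0] for row in rows]


# Sample tags are made up for the example.
tag_file(conn, "IOR/R/4/523", "war graves")
tag_file(conn, "IOR/R/4/523", "military")
tag_file(conn, "IOR/L/PJ/7/746", "military")
print(files_with_tag(conn, "military"))
# ['IOR/L/PJ/7/746', 'IOR/R/4/523']
```

A small script like this could sit alongside DB Browser for querying, so the tagging step does not depend on getting subforms to work.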

I’m curious to hear what others think about this, too! In the meantime, I just wanted to share a few quick thoughts, coming from a Tropy-dev perspective, which might help inform the discussion:

  • We recently added support for nested lists. If your topics are in any way hierarchical, this might help and make even hundreds of lists manageable in the sidebar. (But it’s still true that lots of lists will get unwieldy at some point.)
  • There are some conceptual difficulties involved, but we’ll improve Tropy’s handling of a large number of tags one way or the other soon.
  • Searching / filtering within an item: we’re aware that this will be very useful, but this is still a long time away.
  • Importing multiple pictures from a single PDF will likely create a single item for the PDF with multiple photos. You’ll be able to then break out the individual photos and re-group them any way you like, but initially, each PDF will be a single item.

Please correct me if I’m misunderstanding what you want here, but here’s how I’ve done things similarly. I think you should use tags to identify items within a file (e.g., all items in the file you mentioned above, you could tag “File Gen 184/9” or something more memorable). You should still break out all the discrete pieces of the file into their own items, because you’ll want that later, and as you mentioned, unmerging can be messy. Once you’ve tagged your files like that, you can organize them into lists more easily. You can select all the items that have that tag and drag them into a category or thematic list. So, basically, I’m suggesting the exact opposite of your Idea #1.

This method does mean you’ll have lots of tags, which can be annoying. But the nice thing about tags (as opposed to lists) is that they’re ordered alphabetically, so you don’t have to remember where they are in the hierarchy of your lists.

As Sylvester mentioned, we’re working on dealing with scale for tags, so that process will likely only get better as new versions of Tropy roll out.

Perhaps an even easier way is to make sure you record all your metadata properly, then do a search for a unique term that only appears in that metadata (again, probably your File Gen 184/9 would suffice here). Then all of those file items would show up in your item table and you could read through them, categorize them individually with tags or dragging into lists, or categorize them in bulk by selecting all your items (Ctrl + A) and moving them to the appropriate list or tag.

Again, apologies if I’ve misunderstood what you’re trying to do, but I think these two ways will work for you (and I’d start with doing the metadata search).


Thanks so much for your thoughts! And no, I think for the most part you’ve understood what I’m trying to do. So, in effect I tried something similar at one point. When importing photos for an archival file, in the metadata field for the Box, I always immediately input the reference number since that is how you’d locate the item in the online catalogue, request to view the item, and it would have to be included in any citation.

HOWEVER, I found that the search function in Tropy does not work with special characters. So if an item’s metadata shows it is from Box “IOR-L-PJ-7-746”, I can’t search by “IOR-L-PJ-7-746”, or even by “7-746”.

As an example, you can see here that this item has the above reference in its metadata:

But as soon as I put a special character in the search bar, it tells me that there are no items that contain “7-”.

There are a couple of other things I’d need to work out, but this has been a big barrier because it means I have no good way to pull up the entire contents of a box.

We will definitely fix that search error – thanks for letting us know!

The search index works, in part, by breaking down longer strings into words. If you have a - or similar delimiters, you can also try to just ignore them in your query. For example, I hope that a query such as “7 746” should match your item (and not match IOR-L-PJ-7-747 for instance).


Dropping the delimiter will work sometimes, but I’d worry that it would also catch extraneous items. I have worked w a number of files whose reference numbers include numbers that could easily be years.

As an example, I’ll be looking later today at IOR-L-PJ-6-1928. Since I’ve looked at A LOT of files in that series (IOR-L-PJ-6), if I did a search without the delimiters for “IOR L PJ 6 1928”, would it then also capture items from that series that include 1928 elsewhere in the metadata but from different files?

It would include items that match all the terms: so, in other words, yes, it would match all items with field “IOR-L-PJ-6-XXXX” that also have 1928 in any of the fields or notes.

Come to think of it, here is a quick way around this: you can search for "PJ-6-1928" for example, that is, including the quotes – then it should work as you expected it to originally.
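To illustrate the difference between the two kinds of query, here is a rough sketch in Python. It assumes a simplified index that splits text on anything that is not a letter or digit; it is not Tropy's actual search code, and the function names and sample metadata are made up for the example.

```python
import re


def tokenize(text):
    """Split on anything that is not a letter or digit, lowercased."""
    return [t for t in re.split(r"[^0-9A-Za-z]+", text.lower()) if t]


def matches_all_terms(query, fields):
    """Unquoted query: every word must appear somewhere in any field."""
    words = set()
    for field in fields:
        words.update(tokenize(field))
    return all(term in words for term in tokenize(query))


def matches_phrase(phrase, fields):
    """Quoted query: the words must be adjacent and in order in one field."""
    wanted = tokenize(phrase)
    if not wanted:
        return False
    for field in fields:
        have = tokenize(field)
        for i in range(len(have) - len(wanted) + 1):
            if have[i:i + len(wanted)] == wanted:
                return True
    return False


# Illustrative metadata: a box field plus one other field per item.
target = ["IOR-L-PJ-6-1928", "sample note"]
other = ["IOR-L-PJ-6-1200", "letter dated 1928"]

print(matches_all_terms("IOR L PJ 6 1928", target))  # True
print(matches_all_terms("IOR L PJ 6 1928", other))   # True (1928 is in the note)
print(matches_phrase("PJ-6-1928", target))           # True
print(matches_phrase("PJ-6-1928", other))            # False
```

Under these assumptions the unquoted query catches the second item because 1928 appears in another field, while the quoted phrase only matches when the words sit next to each other in the same field.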


But I should still drop the delimiters and search for “PJ 6 1928”, yes? At least until the search error is fixed as you said above?

First of all, THANK YOU both for being willing to share your thoughts and recommendations, and respond to feedback. It is among the reasons why I keep talking up Tropy to colleagues.

Another followup question for once the search stuff is worked out (which will be awesome!) -
is there/can there be a way to effectively run a query for distinct entries? In other words, let’s say I have added all the relevant images, added all the metadata, and tagged everything by broad subject. NOW, I want to find out how many archive boxes (not items) have the tag, say, “military” and make a list of them.

Effectively, I want to run a query that might look something like this (but surely is more complicated under the Tropy hood):
SELECT DISTINCT box FROM item_table WHERE tag = 'military';

Except probably not actually that at all since there’s probably a many to many table somewhere in there and I’m not very good at working w those yet…

Nope, it should be OK to include the delimiters if you wrap the whole term in quotes. Try exactly "PJ-6-1928" like this and I hope you will get an exact match.

The queries there clearly need some sanitization (in particular, they should never produce errors!), but basically, you can use quotes to say that you want to match something exactly, with the words in that order. We’ll also be rolling out more advanced search functionality this development cycle, by the way.


As for the follow up: the archive box is defined as a field on the item, right? In that case, counting the distinct occurrences is something that’s not easy to do in Tropy right now. (I’m happy to help write a SQL query which you can run on the project file).

But we’ve had a solution for this in mind for a while, actually: the idea is to allow you to ‘group’ items based on a given field. So for instance, you would switch to a view grouped by the ‘Box’ field, select the tag ‘military’ – then you would see all items with that tag, grouped by ‘box’. Essentially this gives you the number of boxes (maybe plus 1 if you have items with an empty box field).
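As a rough picture of what such a grouped view would compute, here is a sketch using Python's sqlite3 module. The schema is hypothetical (item, tag, and item_tag are invented names, and the rows are sample data); it does not reflect the layout of a real Tropy project file, so treat it as an illustration of the grouping idea only.

```python
import sqlite3

# Hypothetical tables: NOT the real layout of a Tropy project file.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE item (id INTEGER PRIMARY KEY, box TEXT);
CREATE TABLE tag (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE item_tag (
    item_id INTEGER NOT NULL REFERENCES item(id),
    tag_id  INTEGER NOT NULL REFERENCES tag(id),
    PRIMARY KEY (item_id, tag_id)
);
""")

# Sample data: item 4 has an empty box field, item 5 has a different tag.
conn.executemany("INSERT INTO item (id, box) VALUES (?, ?)", [
    (1, "IOR-L-PJ-7-746"),
    (2, "IOR-L-PJ-7-746"),
    (3, "IOR-L-PJ-6-1928"),
    (4, None),
    (5, "IOR-L-PJ-6-1928"),
])
conn.executemany("INSERT INTO tag (id, name) VALUES (?, ?)", [
    (1, "military"),
    (2, "other"),
])
conn.executemany("INSERT INTO item_tag (item_id, tag_id) VALUES (?, ?)", [
    (1, 1), (2, 1), (3, 1), (4, 1), (5, 2),
])


def boxes_with_tag(conn, tag_name):
    """Return (box, tagged_item_count) pairs, one per distinct box value."""
    return conn.execute(
        """SELECT i.box, COUNT(*)
           FROM item i
           JOIN item_tag it ON it.item_id = i.id
           JOIN tag t ON t.id = it.tag_id
           WHERE t.name = ?
           GROUP BY i.box
           ORDER BY i.box""",
        (tag_name,),
    ).fetchall()


groups = boxes_with_tag(conn, "military")
named_boxes = [box for box, _ in groups if box is not None]
print(len(groups))       # 3 groups, including the one for the empty box field
print(len(named_boxes))  # 2 actual boxes
```

The extra group for the empty box field is the "plus 1" mentioned above: SQLite puts all NULL values into a single group, so it has to be filtered out to get the true number of boxes.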

Oh excellent! Once I get back to my desk, I’ll try it out. It would DEFINITELY make my life substantially easier :)

Looking forward to seeing what comes out on the next release. More advanced querying options will definitely be useful as the volume of material accumulates!