Revenge of the Brane: Unfinished Projects

Problem

One of my irregular chores is attempting to sort/filter/categorize the PDFs I’ve accumulated over time into something better than a rolling mess. I’m sitting at a somewhat sizable 31 GB and around 4500 files from just over 20 years of purchases, downloaded demos, and my own files – and that’s excluding novels, non-fiction, and comics. I’m looking at this problem – for the moment – entirely through the lens of tabletop roleplaying, board game, TCG, miniatures, and related products, since that’s the biggest part of my collection.

I find the problems with this task fall into three categories – usually all of them: Labeling, Organizing, and Storage.

Labeling

Labeling seems like it would be straightforward, but the reality is far from that for a number of reasons. For example, let’s look at a recent purchase of mine, from the Hurricane Harvey Relief Fund Charity Donation at RPGNow. One of the products in the bundle, [M&M3e] World Defenders: The Summit, has a filename of The_Summit_Final.pdf on the invoice, but actually shows up as M&M3e_World_Defenders_The_Summit.pdf in the download bundle. That doesn’t exactly match either the invoice filename or the product name. It appears that the RPGNow bundler massages single file products, relabelling the file to match the product name but ALSO removing some characters that can make file systems cranky. Additionally, the document itself contains no metadata that can be used to assist here.
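To make that guess concrete, here is a minimal Python sketch of the kind of massaging the bundler might be doing. It is reverse engineered from this one example only: the function name and the exact set of stripped characters are my assumptions, not anything RPGNow documents.

```python
import re


def bundle_style_filename(product_name: str, extension: str = ".pdf") -> str:
    """Guess at a bundler-style filename built from a product name (hypothetical)."""
    # Assumption: brackets, colons, and other filesystem-hostile characters are
    # dropped outright, while "&" survives, because that is what the one
    # observed example shows.
    cleaned = re.sub(r'[\[\]:<>"/\\|?*]', "", product_name)
    # Collapse each run of whitespace into a single underscore.
    return re.sub(r"\s+", "_", cleaned.strip()) + extension


print(bundle_style_filename("[M&M3e] World Defenders: The Summit"))
# prints M&M3e_World_Defenders_The_Summit.pdf
```

The transformation only runs one way. Given the filename alone there is no reliable route back to the original product name, which is part of why labeling from filenames is so fragile.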

In the past, I attempted to solve this with third party products, having the most success with Calibre and a free plugin I wrote for it that attempts to scrape One Bookshelf for metadata. With this example, it works out. We correctly match the title, publisher, author, cover, game system ( without edition ), approximate publishing date, and One Bookshelf ID number. That’s quite a bit. Not quite perfect, but a lot of the way there.  Let’s try another. Also in the bundle were the Core Rules to Prowlers and Paragons, a superhero RPG. It comes with two files, one full color, and one a bit more ink friendly which we’ll use as an example. PP_Core_Rules_Printer-Friendly.pdf has no metadata in the file that’s useful, nor does the Calibre plugin find it on a search.  Without opening it or looking at the download invoice, I’d have no idea what this file is. This is what happens with a good chunk of the files.

Those of you who use Calibre ( or a similar tool )  are probably saying “When you import the file into your collections, it shows you the cover so you can edit and then run the metadata collection.” This is true. And most of the time it does work. But it is time and effort that shouldn’t be needed.

 

Organizing

Organizing a collection of files falls into two separate spheres: how you store the collection on disk, and how you view and interact with that collection. I’m concentrating on the latter here and will cover the former in the Storage section.

Let’s continue with the [M&M3e] World Defenders: The Summit book as our example. When we think about a book we’re looking for, we might not always remember who it was by or the title. If I had ways to reference this, I could look the book up by Game System ( Mutants and Masterminds ) optionally with edition, a tag that it is third party or the publisher if I remember it, that it is a setting book, that it is a roster book, that it is a superheroes product, and/or that I got it from RPGNow/DriveThruRPG. On top of that, I might have some other search terms based on the genre ( Worldbeaters? ) or the setting if applicable. Storing these in a codified, consistent manner, makes it easy to find that PDF you’re looking for, even when you don’t know which exact one it is.
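As a rough illustration of what "codified, consistent" could look like, here is a small Python sketch of a record with fixed fields plus free-form tags, and a lookup that needs neither title nor author. The `BookRecord` fields, the tag spellings, and the `find` helper are all hypothetical choices of mine, not a settled schema.

```python
from dataclasses import dataclass, field


@dataclass
class BookRecord:
    """One PDF in the collection (hypothetical field names)."""
    title: str
    system: str
    edition: str = ""
    publisher: str = ""
    source: str = ""
    tags: set[str] = field(default_factory=set)


def find(records, system=None, tags=()):
    """Return records matching the game system (if given) and carrying every tag."""
    wanted = {t.lower() for t in tags}
    return [
        r for r in records
        if (system is None or r.system.lower() == system.lower())
        and wanted <= {t.lower() for t in r.tags}
    ]


library = [
    BookRecord(
        title="World Defenders: The Summit",
        system="Mutants and Masterminds",
        edition="3e",
        source="RPGNow",
        tags={"third-party", "setting", "roster", "superheroes"},
    ),
    BookRecord(
        title="Prowlers and Paragons Core Rules",
        system="Prowlers and Paragons",
        source="RPGNow",
        tags={"core rules", "superheroes"},
    ),
]

# "That superhero roster book, whatever it was called."
for hit in find(library, tags=["superheroes", "roster"]):
    print(hit.title)
```

Keeping the game system and edition as their own fields, rather than as tags, is one way to stop "M&M3e", "Mutants & Masterminds 3rd", and similar variants from fragmenting a search.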

But isn’t this labeling?  Sort of. This is how the labels are collected, stored, and referenced.

Storage

This one is a noodle bender, for me. There are so many different ways you can organize – or not – the files on disk.  Do you go by Publisher? Game System? Genre? Should you put Peacekeepers: Savage Worlds edition in Superhero Gaming, Savage Worlds, or GRAmel? I’ll be honest. I have yet to find an answer here I am happy with. I am of the view this should largely be decoupled from the first two categories and that the next parts of the problem are as or more important. Those are versioning and online storage.

One of the greatest things about PDF publishing – with the right sales partner – is the ability to revise and propagate updated texts to your customers. Versioning isn’t a huge deal with gaming, but it can be very handy to compare prior versions of rules when a clarification or disagreement comes up. In the case of my collection, the average PDF is about 10 MB. Keeping four or five copies of each would add up quickly. And then we’re back to the storage organizing issue: most filesystems aren’t version oriented and most versioning systems aren’t the most friendly. There are hybrid workarounds like gitfs, but these are not for the average user. Solutions like Dropbox and Google Drive can help here as well but complicate the mix.

Online storage dovetails with this neatly. Virtually all online file storage systems have versioning as a feature, though it might require some tweaking to get enabled. It is a little harder to find in the user-friendly options, like Dropbox, Box, and Google Drive, and it also exists in lower level systems like S3 or OpenStack Object Storage.

Proposed Solution

I have wasted a lot of time over the past decade at various points considering ways to solve these problems for myself, for others, and possibly as a service. My solution is broken into three parts that work together but should be independently usable.

Part 1: Metadata

The first part is based on the same idea as CDDB: a centralized, shared online database of metadata that can be compared against to label a PDF. This is accessed by a simple API call that returns the best matches based on the submitted metadata pulled from the PDF. In turn, end users can correct or create the metadata for other users.
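To give a feel for the "best matches" step, here is a small Python sketch that scores a submitted filename against a tiny in-memory catalog using the standard library's difflib. Everything here is a stand-in: `CATALOG`, `normalize`, `best_matches`, and the 0.5 cutoff are hypothetical, and a real service would sit behind an API and would likely compare more than titles (page counts, embedded metadata, hashes).

```python
import re
from difflib import SequenceMatcher

# Hypothetical catalog entries; a real service would hold far richer records.
CATALOG = [
    {"title": "World Defenders: The Summit", "system": "Mutants and Masterminds"},
    {"title": "Prowlers and Paragons Core Rules", "system": "Prowlers and Paragons"},
    {"title": "Conspiracy X", "system": "Conspiracy X"},
]


def normalize(text: str) -> str:
    """Lowercase and reduce punctuation, underscores, and spacing to single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def best_matches(submitted: str, catalog, limit: int = 3, cutoff: float = 0.5):
    """Return up to `limit` catalog entries whose titles resemble `submitted`."""
    query = normalize(submitted)
    scored = []
    for entry in catalog:
        score = SequenceMatcher(None, query, normalize(entry["title"])).ratio()
        if score >= cutoff:
            scored.append((score, entry))
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [entry for _, entry in scored[:limit]]


for match in best_matches("M&M3e_World_Defenders_The_Summit.pdf", CATALOG):
    print(match["title"], "/", match["system"])
```

Plain string similarity handles the relabelled bundle filename well, but it has nothing to work with when a filename is mostly abbreviation. That gap is where user-submitted corrections would earn their keep.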

The API part of this is straightforward. The trick is the data stored and how it may be attached to a book. Take for example the game Conspiracy X. Should the genre tags for it be Conspiracy, Science Fiction, and Horror or should they be hyphenated in some way? What other tags should exist? I will sketch out my ideas on this topic in my next post.

Part 2: Online Storage

My thinking here is that there is a distinct need for a service where you can store your files, similar to generic storage, but purpose built to match up with the metadata store, do versioning, and deduplicate against other users who have purchased the same content where possible (Damn you watermarking!). In an ideal world, publishers could push new versions in themselves, allowing automatic updates to all end users with minimal effort. All of this would sit behind a simple, secure API against a modern object storage back end.
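One way the versioning and deduplication could fit together is content addressing: store each unique file once under its hash, and keep a per-user history of hashes for each product. The Python sketch below is an in-memory toy under that assumption. `VersionedStore`, its method names, and the user and product identifiers are all hypothetical, and a real back end would be an object store rather than a dict.

```python
import hashlib


class VersionedStore:
    """Toy content-addressed store with per-user version history (hypothetical)."""

    def __init__(self):
        self.blobs = {}     # sha256 hex digest -> file bytes, stored once
        self.versions = {}  # (user, product_id) -> list of digests, oldest first

    def put(self, user: str, product_id: str, data: bytes) -> str:
        digest = hashlib.sha256(data).hexdigest()
        # Identical bytes from any user land on the same key, so they dedupe.
        self.blobs.setdefault(digest, data)
        history = self.versions.setdefault((user, product_id), [])
        # Only record a new version when the content actually changed.
        if not history or history[-1] != digest:
            history.append(digest)
        return digest

    def get(self, user: str, product_id: str, version: int = -1) -> bytes:
        """Fetch a version by index; the default of -1 means the latest."""
        return self.blobs[self.versions[(user, product_id)][version]]


store = VersionedStore()
first_printing = b"%PDF-1.4 rules, first printing"
store.put("alice", "summit", first_printing)
store.put("bob", "summit", first_printing)  # same bytes, no extra blob
store.put("alice", "summit", b"%PDF-1.4 rules, with errata")

print(len(store.blobs))                          # unique files held
print(len(store.versions[("alice", "summit")]))  # alice's version count
```

This also shows why watermarking hurts: a per-customer watermark changes the bytes, so every copy hashes differently and nothing dedupes unless the service can separate the watermark from the content.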

Part 3: The Client

This probably has the least thought attached to it as, frankly, I don’t want to do it. 🙂 Ultimately it would be something akin to Calibre with the ability to pull/push from the Metadata store, add/pull files from the online storage, and use your preferred PDF reader. With a small amount of effort, this could extend to include other formats, such as MOBI or EPUB.

 

 

