Reorganizing my Git writing repository

As some of you may know, I use Git to organize my writing. After years of accidentally overwriting a good chapter with an old one or trying to coordinate changes from two separate machines, I got into source control for writing; it worked for programming, why not my novels?

The submodule approach

Well, I've gone through a couple of iterations of trying to get the “perfect” Git setup for my writing. Earlier this year, I broke the novels out into submodules but left the bulk of my writing in the main repository (called stories). This meant I had sand-and-blood, sand-and-ash, and sand-and-bone repositories as submodules in the appropriate location of the stories repo (dmoonfire/fedran, if you are curious).

The reason came up while I was working on the Sand and Blood covers. Since I checked in as I went, the repository quickly became too large for my website to handle. I could clone a repository of up to 50 MiB without too much trouble, but when it got into the 900 MiB range, I couldn't clone it anymore.

I had already worked with submodules before, so I thought they would be a perfect fit for the novels. I spent a pair of nights pulling the five current WIP novels out into submodules, mainly by cloning the repo and using various commands to carve them out. It also took a while because I have a lot of project branches (41 beyond master), which represent every work-in-progress or semi-completed work I've done. Pulling out binaries from every branch was a painful process, to say the least.

The submodule approach worked out fairly well, but I quickly ran into some of its limitations. Because of how Git implements submodules, a submodule's directory inevitably shows up in other branches. Submodules also take additional work.

To give an example, assume I'm on my sand-and-ash branch and I'm happily working in the dmoonfire/fedran/sand-and-ash directory making changes. When I'm done, I commit them and push them up.

When I go up a level, to dmoonfire/fedran, I have to make a second commit to record the submodule's position in the stories repository. It is a little extra work, but it keeps the two isolated.
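The double commit described above can be sketched as a short Python helper that only builds the command lists. This is a rough illustration, not my exact routine: the function name and commit messages are made up, and it assumes the second commit is run from the root of the stories working directory.

```python
from pathlib import PurePosixPath

# Path of the submodule inside the stories repository (taken from this post).
SUBMODULE = PurePosixPath("dmoonfire/fedran/sand-and-ash")


def two_step_commit(message, submodule=SUBMODULE):
    """Return (working directory, command) pairs for the double commit.

    Hypothetical helper: it only builds the commands, it does not run them.
    """
    sub = str(submodule)
    return [
        # Step 1: commit and push inside the submodule's own repository.
        (sub, ["git", "add", "--all"]),
        (sub, ["git", "commit", "-m", message]),
        (sub, ["git", "push"]),
        # Step 2: from the root of stories, record the submodule's new commit.
        (".", ["git", "add", sub]),
        (".", ["git", "commit", "-m", f"Update {submodule.name} pointer"]),
        (".", ["git", "push"]),
    ]


for cwd, command in two_step_commit("Revise chapter 3"):
    print(f"(in {cwd}) {' '.join(command)}")
```

To actually run the steps, each pair could be handed to subprocess.run(command, cwd=cwd, check=True).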

The real problem came when I switched to the sand-and-blood branch. The directory dmoonfire/fedran/sand-and-ash is still there and still pointing to a repository (the sand-and-ash one), but I have to tell the sand-and-blood branch about it; otherwise, it will show up as an untracked file.

I had two choices. The first was to add the dmoonfire/fedran/sand-and-ash directory to the .gitignore file of the sand-and-blood branch. (Okay, there are a lot of filenames in this post, sorry about that.)

The other was to add the submodule to the other branches so it didn't show up as a change. That worked until I made another change to the submodule, and then I had to update it on every other branch to reflect the change.

Isolating covers instead

Last night, I got tired of jumping through the hoops of submodules. I realized the entire reason I wanted to isolate the novels was to handle the covers. So, I decided to make a covers repository instead, put it into the root of the stories working directory and then add it to the .gitignore. This means that the stories repository doesn't officially know about the covers repository, but I can still reference it via soft links into covers.
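For the curious, here is a minimal sketch of that layout in Python, run against a throwaway directory. The function name, the cover.png link name, and the sand-and-blood.png file are hypothetical, and a plain directory stands in for the real covers repository.

```python
import os
import tempfile
from pathlib import Path


def link_cover(stories_root, novel_dir, cover_name):
    """Ignore covers/ in stories and soft link one cover into a novel directory."""
    stories_root = Path(stories_root)
    covers = stories_root / "covers"
    covers.mkdir(exist_ok=True)  # stand-in for the separate covers repository

    # Tell the stories repository to ignore the nested covers directory.
    ignore_file = stories_root / ".gitignore"
    lines = ignore_file.read_text().splitlines() if ignore_file.exists() else []
    if "/covers/" not in lines:
        lines.append("/covers/")
        ignore_file.write_text("\n".join(lines) + "\n")

    # Use a relative link so it still resolves if the working directory moves.
    novel = stories_root / novel_dir
    novel.mkdir(parents=True, exist_ok=True)
    link = novel / "cover.png"  # hypothetical link name
    target = os.path.relpath(covers / cover_name, start=novel)
    if not link.is_symlink():
        link.symlink_to(target)
    return link


# Demonstration in a temporary directory that is deleted afterwards.
with tempfile.TemporaryDirectory() as tmp:
    demo = link_cover(tmp, "dmoonfire/fedran/sand-and-blood", "sand-and-blood.png")
    print(os.readlink(demo))
```

The link itself is an ordinary tracked entry in stories, so it dangles on any machine where covers has not been cloned. That is fine for something I only need when formatting ebooks.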

The advantage of this approach is that all the writing (the actual words) is still managed in the same repository. This means when I switch branches, the stuff in the sand-and-ash branch (not repo now) goes away until I go back. No cruft that has nothing to do with the current branch drags along between branches.

It isn't very elegant to have covers separated, but I only need covers when I'm formatting ebooks.

Losing history

One of the side effects of breaking the repository apart and pulling the pieces back together is that I'm losing history data. I kept most of the commit histories intact, but now I can't really graph the total words written over a month or over time. Since I can't tell if anyone actually read my posts when I documented those numbers, I decided to accept that loss.

BFG

I mentioned that splitting apart the repositories was a lot of work. When it came time to combine them back together, I was bracing myself for just as much. Then I found BFG Repo-Cleaner. It is a tool written in Scala (a language I don't know) that works better than git filter-branch in a lot of ways.

I ended up using BFG to remove most of the cover images from the repository along with the large files. This let me trim the final stories repository from 1.9 GiB to 20 MiB. The covers repository is at a nice 419 MiB, but that is also acceptable since I use it so infrequently.

If you have to remove files, directories, or large objects from your repository, it looks like BFG is something to seriously consider.
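As a rough idea of what a cleanup pass looks like, here is a sketch that builds the commands in Python. It is not the exact invocation I used: the function name, the bfg.jar location, the *.png pattern, and the 1M size limit are placeholders. The --delete-files and --strip-blobs-bigger-than options and the reflog/gc follow-up come from BFG's own documentation. Note that BFG leaves the files in your latest commit alone by default, so delete the unwanted files and commit before running it.

```python
def bfg_cleanup_commands(repo, bfg_jar="bfg.jar", pattern="*.png", max_blob="1M"):
    """Return the commands for a BFG pass followed by Git garbage collection.

    Hypothetical helper: the defaults are placeholders, not recommendations.
    """
    return [
        # Rewrite history to drop every file whose name matches the pattern.
        ["java", "-jar", bfg_jar, "--delete-files", pattern, repo],
        # Rewrite history again to drop any blob larger than the size limit.
        ["java", "-jar", bfg_jar, "--strip-blobs-bigger-than", max_blob, repo],
        # BFG only rewrites commits; these steps discard the old objects.
        ["git", "-C", repo, "reflog", "expire", "--expire=now", "--all"],
        ["git", "-C", repo, "gc", "--prune=now", "--aggressive"],
    ]


for command in bfg_cleanup_commands("stories.git"):
    print(" ".join(command))
```

It is safest to run these against a fresh mirror clone (git clone --mirror) rather than your only copy, since the rewrite changes commit hashes throughout the history.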
