State of writing with Markdown, YAML, and Git 2017

A year ago, at one of my more successful WisCon panels, I joined K. Tempest Bradford and Kristine Smith to talk about writing processes. I got to see a lot of cool gadgets, but I also got a chance to talk about my own process of writing with Markdown, YAML, and Git.

I'm not going to WisCon this year, but I thought this would be a good opportunity to write up where I am using all those technologies to write, both for my personal projects and as a publisher.

This is a long post in a single part. If you've read my blog before, some of this information is rehashed.

Markdown and YAML

Probably the one part that hasn't changed is my use of Markdown and YAML. I originally used Creole with a makeshift header but after playing with Jekyll for a while, I jumped whole-heartedly on Markdown with a YAML header.

Below is an example of one of the files. I'll be referencing it a bit in this post so please forgive the length.

---
availability: public
when: 1471/3/28 MTR
duration: 25 gm
date: 2012-02-18
title: Rutejìmo
locations:
  primary:
    - Shimusogo Valley
characters:
  primary:
    - Shimusogo Rutejìmo
  secondary:
    - Shimusogo Hyonèku
  referenced:
    - Funikogo Ganósho
    - Shimusogo Gemènyo
    - Shimusogo Chimípu
    - Shimusogo Yutsupazéso
concepts:
  referenced:
    - The Wait in the Valleys
purpose:
  - Introduce Rutejìmo
  - Introduce Hyonèku
  - Introduce naming conventions
  - Introduce formality rules
  - Introduce the basic rules of politeness
summary: >
  Rutejìmo was on top of the clan's shrine roof trying to sneak in and steal his grandfather's ashes. It was a teenage game, but also one to prove that he was capable of becoming an adult. He ended up falling off the roof.

  The shrine guard, Hyonèku, caught him before he hurt himself. After a few humiliating comments, he gave Rutejìmo a choice: tell the clan elder or tell his grandmother. Neither choice was good, but Rutejìmo decided to tell his grandmother.
---

> When a child is waiting to become an adult, they are subtly encouraged to prove themselves ready for the rites of passage. In public, however, they are to remain patient and respectful. --- Funikogo Ganóshyo, *The Wait in the Valleys*

Rutejìmo's heart slammed against his ribs as he held himself still. The cool desert wind blew across his face, teasing his short, dark hair. In the night, his brown skin was lost to the shadows, but he would be exposed if anyone shone a lantern toward the top of the small building. Fortunately, the shrine house was at the southern end of the Shimusogo Valley, the clan's ancestral home, and very few of the clan went there except for meetings and prayers.

I seem to be moving from the .markdown to .md extension. It's minor but I haven't quite jumped on it; I've only done it for the last two new projects.
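The header-plus-body layout in the example above is easy to take apart with a few lines of code. Below is a minimal Python sketch, assuming the header always starts on the first line and is closed by a line containing only three dashes. The function name split_front_matter is made up for this post; it is not part of Jekyll or any of my tools.

```python
from typing import Tuple


def split_front_matter(text: str) -> Tuple[str, str]:
    """Return (header, body); header is "" when there is no front matter."""
    lines = text.splitlines(keepends=True)
    # Front matter must start on the very first line with a bare "---".
    if not lines or lines[0].strip() != "---":
        return "", text
    # The header ends at the next line that is exactly "---".
    for index in range(1, len(lines)):
        if lines[index].strip() == "---":
            header = "".join(lines[1:index])
            body = "".join(lines[index + 1:])
            return header, body
    # No closing delimiter: treat the whole file as body.
    return "", text


sample = """---
title: Rutejìmo
when: 1471/3/28 MTR
---

Rutejìmo's heart slammed against his ribs as he held himself still.
"""

header, body = split_front_matter(sample)
print(header)
print(body.strip())
```

The header string can then be handed to whatever YAML parser you prefer. A line such as the epigraph's "--- Funikogo Ganóshyo" does not end the header because only a line consisting of the dashes alone counts as a delimiter.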

Atom

My current editor of choice for novels is Atom with a backup of Emacs. Unlike Emacs, Atom has standard keyboard shortcuts and looks good with the fonts I use. It is a little slow but, for the most part, works pretty well for belting out words.

I'm writing some extensions for it including a modified spell check (project-spell) that can handle project-specific dictionaries (language.json). This lets me tell the spell-checker not to mark character names as misspelled.

Later, I'll integrate this work with Author Intrusion, but that isn't nearly ready for prime time, even for my own purposes.

Metadata

Over the last year I noticed I am putting a lot more information in the header than before. Some of it is added long after the project is done but, even while writing, I'll put in notes about the characters, the reason I'm writing the chapter, the time of day, and even the outline, which I remove as I write.

I just redid the Fedran website, otherwise I'd show you how the characters and related fields show up in the sidebar. I'll get that working again in the next few months, but it lets me cross-link things into my wiki. Having it in the header also means I can query it to get a summary of the page using a TypeScript tool I wrote called markdowny.

$ markdowny table *.markdown -f _basename title when
| _basename           | title                   | when          |
| :------------------ | :---------------------- | :------------ |
| chapter-01.markdown | Rutejìmo                | 1471/3/28 MTR |
| chapter-02.markdown | Confession              | 1471/3/28 MTR |
| chapter-03.markdown | Morning                 | 1471/3/29 MTR |
| chapter-04.markdown | Rivals                  | 1471/3/29 MTR |
| chapter-05.markdown | Decisions               | 1471/3/29 MTR |
$
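To give a feel for what a table query like that involves, here is a rough Python sketch. It is not how markdowny is implemented (that is TypeScript and uses a real YAML parser); the helper names read_scalar_fields and markdown_table are invented for illustration, and the sketch only understands unindented "key: value" lines, so nested lists and folded blocks such as summary are skipped.

```python
import re
from typing import Dict, List


def read_scalar_fields(text: str) -> Dict[str, str]:
    """Collect top-level `key: value` pairs from the front matter."""
    fields: Dict[str, str] = {}
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return fields
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of the header
        # Only unindented "key: value" lines; nested entries are ignored.
        match = re.match(r"^([A-Za-z_][\w-]*):\s+(\S.*)$", line)
        if match and match.group(2).strip() not in (">", "|"):
            fields[match.group(1)] = match.group(2).strip()
    return fields


def markdown_table(rows: List[Dict[str, str]], columns: List[str]) -> str:
    """Render the chosen columns as a left-aligned Markdown table."""
    widths = [
        max([3, len(col)] + [len(row.get(col, "")) for row in rows])
        for col in columns
    ]

    def format_row(cells: List[str]) -> str:
        padded = (cell.ljust(width) for cell, width in zip(cells, widths))
        return "| " + " | ".join(padded) + " |"

    lines = [format_row(columns)]
    lines.append(format_row([":" + "-" * (width - 1) for width in widths]))
    for row in rows:
        lines.append(format_row([row.get(col, "") for col in columns]))
    return "\n".join(lines)


# Two tiny in-memory chapters standing in for files on disk.
chapters = {
    "chapter-01.markdown": "---\ntitle: Rutejìmo\nwhen: 1471/3/28 MTR\n---\nText.\n",
    "chapter-02.markdown": "---\ntitle: Confession\nwhen: 1471/3/28 MTR\n---\nText.\n",
}
rows = []
for name, text in chapters.items():
    fields = read_scalar_fields(text)
    fields["_basename"] = name
    rows.append(fields)
print(markdown_table(rows, ["_basename", "title", "when"]))
```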

Another nice thing about markdowny is that it also lets me show those YAML lists in a useful manner.

$ markdowny table -f _basename characters.secondary
| _basename           | characters.secondary
| :------------------ | :-------------------------------------------------
| chapter-01.markdown | Hyonèku
| chapter-02.markdown | Somiryòki, Tejíko, Gemènyo
| chapter-03.markdown | Desòchu, Gemènyo, Hyonèku, Mapábyo, Opōgyo, Panédo
| chapter-04.markdown | Desòchu, Karawàbi, Tsubàyo
$

I also hate writing a synopsis at the end, so what I can do is put each chapter's summary in its header (in my case, the summary field) and then use markdowny to pull them out. That way, I can write the synopsis as I go and not be overwhelmed at the end.

$ markdowny sections *.markdown | head -n 5
# Rutejìmo

Rutejìmo was on top of the clan's shrine roof trying to sneak in and steal his grandfather's ashes. It was a teenage game, but also one to prove that he was capable of becoming an adult. He ended up falling off the roof.

The shrine guard, Hyonèku, caught him before he hurt himself. After a few humiliating comments, he gave Rutejìmo a choice: tell the clan elder or tell his grandmother. Neither choice was good, but Rutejìmo decided to tell his grandmother.
$

The final bit is word counting. Because of the YAML metadata, I can't get good word counts using most tools because the header skews the number. So I use the tool to get counts of the content minus the YAML header.

$ markdowny count *.markdown
chapter-01.markdown:   1520
chapter-02.markdown:   2144
chapter-03.markdown:   2905
chapter-04.markdown:   1173
chapter-05.markdown:   2570

I also have an alias, mdwc, for the count command since I use it pretty heavily.
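The idea behind the count is simple enough to sketch in Python. This is only an illustration of skipping the header before counting, not markdowny's actual code; count_words is a made-up name, and it counts whitespace-separated tokens, so Markdown markers such as a leading ">" are counted as words.

```python
import sys


def count_words(text: str) -> int:
    """Count whitespace-separated words, ignoring a leading YAML header."""
    lines = text.splitlines()
    if lines and lines[0].strip() == "---":
        # Drop everything up to and including the closing "---".
        for index in range(1, len(lines)):
            if lines[index].strip() == "---":
                lines = lines[index + 1:]
                break
    return sum(len(line.split()) for line in lines)


if __name__ == "__main__":
    # Usage sketch: python count.py chapter-*.markdown
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as handle:
            print(f"{path}: {count_words(handle.read()):6d}")
```

If a file has an opening delimiter but no closing one, the sketch counts the whole file rather than guessing where the header ends.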

Directory Structure

I'm leaning toward a semi-standard layout for my projects. Some of this was built up over the last few years but even my short stories have been migrating over to it.

  • README.md contains a summary of the project and my task list.
  • chapters/ contains all the chapters, named chapter-01.md, chapter-02.md, and so on. I have never had a project with more than 89 chapters, so I stick with a two-digit zero pad so the files stay in alphabetical order.
  • frontmatter/ contains the frontmatter (dedication, legal) files.
  • backmatter/ contains the backmatter (colophon, about, also by).
  • characters/ has one Markdown file per character, for future use with Author Intrusion.
  • covers/v1 contains the first variant of the cover. I have a couple and each one goes into a different vX folder so I can keep them apart.

Even with short stories, I have a chapters/ folder. I'm just picking up a “muscle memory” of going into chapters. I have considered putting things into a src/ folder like most of my programming projects, but I haven't because I couldn't see the advantage other than having a neat folder.

Renumbering

One drawback of having chapter-01.md, chapter-02.md, etc., is what happens when I have to add a new chapter. I wrote a renumber script that lets me inject a chapter.

$ ls -l
total 8
-rw-r--r-- 1 dmoonfire dmoonfire  22 May 24 08:06 chapter-01.md
-rw-r--r-- 1 dmoonfire dmoonfire 127 May 24 08:06 chapter-02.md
$ touch chapter-01a.md
$ renumber *.md
$ ls -l
total 8
-rw-r--r-- 1 dmoonfire dmoonfire   0 May 24 08:07 chapter-01.md
-rw-r--r-- 1 dmoonfire dmoonfire  22 May 24 08:06 chapter-02.md
-rw-r--r-- 1 dmoonfire dmoonfire 127 May 24 08:06 chapter-03.md
$

As you can see, using 01a puts it before the first chapter. It works the same with deleting chapters. Sadly, this script isn't very clean but eventually I'll merge it into markdowny.
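For the curious, the renumbering logic can be sketched in Python as follows. This is not my actual script. It assumes names shaped like chapter-01.md or chapter-01a.md, that a lettered file sorts ahead of the plain file with the same number (matching the listing above), and a two-digit zero pad. The names plan_renumber and apply_renumber are invented for the example.

```python
import os
import re
from typing import List, Tuple

# prefix ending in "-", digits, optional letter suffix, extension
CHAPTER = re.compile(
    r"^(?P<prefix>.*?-)(?P<number>\d+)(?P<suffix>[a-z]*)(?P<ext>\.\w+)$"
)


def plan_renumber(names: List[str]) -> List[Tuple[str, str]]:
    """Return (old_name, new_name) pairs with consecutive numbering."""
    parsed = []
    for name in names:
        match = CHAPTER.match(name)
        if match:
            parsed.append((match, name))
    # Sort by number; a lettered file such as 01a goes before plain 01.
    parsed.sort(
        key=lambda item: (
            int(item[0]["number"]),
            0 if item[0]["suffix"] else 1,
            item[0]["suffix"],
        )
    )
    plan = []
    for position, (match, name) in enumerate(parsed, start=1):
        new_name = f"{match['prefix']}{position:02d}{match['ext']}"
        plan.append((name, new_name))
    return plan


def apply_renumber(directory: str, plan: List[Tuple[str, str]]) -> None:
    """Rename in two passes so 01a -> 01 cannot overwrite the old 01."""
    changes = [(old, new) for old, new in plan if old != new]
    for old, _ in changes:
        os.rename(
            os.path.join(directory, old),
            os.path.join(directory, old + ".renumber-tmp"),
        )
    for old, new in changes:
        os.rename(
            os.path.join(directory, old + ".renumber-tmp"),
            os.path.join(directory, new),
        )


print(plan_renumber(["chapter-01.md", "chapter-02.md", "chapter-01a.md"]))
```

Deleting a chapter falls out of the same plan: a gap in the numbers simply closes up when the files are numbered consecutively again.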

Git

One of the biggest advantages of using Markdown (text files in general) is how well it plays with source control, Git in particular. I can't stress enough how much Git has helped me over the years. I still remember accidentally losing a chapter because I copied the wrong file in the wrong direction. Or the painful way of tracking versions (chapter-01a.doc, chapter01b.doc, chapter-01b-final.doc, etc.).

At one point, I had a single master repository with all completed pieces in the master branch and branches off that for the works-in-progress. That actually worked out pretty well… until I started publishing. While working with text is great, the binary files from working through the covers ended up creating huge repositories which took forever to clone.

Last year, I hadn't quite split apart all of my repositories but that's pretty much done now. This also means I can give editors and others access to a single novel without exposing the other ones. I don't have to worry about Git submodules or jumping through hoops to have the binaries.

I tried Git LFS (Large File Storage) for a while but then dropped it. With it not baked in, it was a little difficult to coordinate with continuous integration servers. I think in another year or so, it could work out fairly well. I also suspect it has to do with my comfort with LFS more than technical issues.

Gitlab

Related to LFS is where I'm hosting Git. A year ago, I used Git over SSH with my ISP. However, Gitlab has been fantastic, both as their hosted environment and also on an instance running on Dreamhost (my provider). There are a lot of reasons to use Gitlab as a writer (and a publisher).

Probably the biggest is private repositories. Even if I didn't host it on my own site, being able to make each of the novels private is fantastic. Github, which is only slightly slicker, doesn't offer private repositories for free. I have over a hundred projects when you count works-in-progress, completed novels, and websites, so hosting that on Github would be expensive.

I also have my own Gitlab instance. The Broken Typewriter Press business and my private novels live there. This gives me full control over the site (though I trust Gitlab) and I like it. The only people on the site are me, authors, and editors.

Like Github, Gitlab has some pretty nice features. When publishing the last few books, we've used the issue tab for authors to ask me to order books or make corrections to their book. I can give them access to make their own typo corrections, which reduces the amount of work. At the same time, because of Git, I can be making other changes and I don't have to worry about losing or screwing up their work. For the latest book, Sins of Intent, the author did a fantastic job of using the features.

I use the milestones for the various tasks of publishing a book. That way I can set up milestones for when the book has to be done, when the release is done, and various conventions. The issues are assigned to the milestones for both the author and myself and we can track what is missing or remaining to do. In effect, we can use a public project management tool to handle a book's release.

I normally turn off the wiki and snippets, though; they usually don't help.

One of the things with using Gitlab this way is that the author has full access to the raw files. Typically, I get a Word document and break it into individual Markdown files. Corrections are done against the Markdown as my baseline format for everything else. If they want to leave BTP or something catastrophic happens, they can have exactly what I published. This is because I've been through cases where local edits were made but I couldn't get them back myself. In this case, they get and can see everything I do.

Which leads to the last feature of Gitlab which I use heavily, their Continuous Integration service. This was difficult to set up at first, mainly because I had to educate myself. For most novels, the .gitlab-ci.yml looks like this:

image: dmoonfire/mfgames-writing-js:0.4.0

stages:
    - review
    - publish

review:
    stage: review
    only:
        - master
    tags:
        - docker
    script:
        - git lfs pull # Yeah, I'm using LFS with this project.
        - npm install
        - npm run build
    artifacts:
        expire_in: 1 week
        paths:
            - "*.pdf"
            - "*.epub"
            - "*.mobi"

publish:
    stage: publish
    script: "echo published"
    when: manual
    artifacts:
        paths:
            - "*.pdf"
            - "*.epub"
            - "*.mobi"
    dependencies:
        - review

Every time I check in, this rebuilds the project. For the little changes, the “review” stage lets me and the authors test the resulting files, which show up underneath the build. They expire after a week but that's okay. “Releases” are done by manually running the “publish” job.

The nice part is I can grab the resulting PDF, MOBI, and EPUB about 5-10 minutes after I check in. This ensures not only that I have a reproducible build but also that I won't lose the ability to recover the output if my laptop catches on fire (it is almost seven years old, has four major cracks, and has a hernia).

This ability to see the final version is great because an author or I can make changes, see them, test them, and make sure the book is exactly what they want. If I change formatting, I can see the results without overloading my computer.

mfgames-writing-js

Of course, using CI requires some way of actually formatting the books. Markdown is fantastic for some things but there are relatively few tools to format it into good-looking EPUB, MOBI, and PDF files.

Over the years, I have had many variants of this. The original few were based on Makefiles and various Python or Perl tools. I've also integrated pandoc into the mix. Nothing ever worked quite the way I wanted for what I considered to be a “proper” book. Files were put in the wrong place, dedications got titles they don't need, and chapters didn't always start on the right-hand page. There were little things that pushed me closer to making something more specific.

I also had a problem that extending the features for a later book broke the generation of older books. I had to start over or rebuild (usually copy/paste/edit). This is a problem that NuGet and NPM have addressed by keeping the build process inside the project instead of relying on shared programs and libraries.

Between this and using the Gitlab CI runner, I decided to create a new framework that was specifically geared toward pinning versions so a book could be reproduced even if the underlying libraries were updated. I ended up using NPM, mainly because I've been learning TypeScript for the last few months. It also had a better story for installing (npm install), wasn't whitespace-based, and was specifically designed to be isolated to the project.

The result is mfgames-writing-js, a framework for creating EPUB and PDF files from Markdown files. This entire thing is controlled by a single file checked into the Git repository.

editions:
    epub:
        format:             mfgames-writing-epub
        theme:              ./lib/efferding
        outputDirectory:    .
        outputFilename:     sins-of-intent-{{edition.version}}.epub

    pdf:
        format:             mfgames-writing-weasyprint
        theme:              ./lib/efferding
        outputDirectory:    .
        outputFilename:     sins-of-intent-{{edition.version}}.pdf
        isbn:               978-1-940509-24-2

metadata:
    title:      Sins of Intent
    series:     Cletus Efferding
    author:     Randy Roeder
    language:   en

contents:
    - element: cover
      source: covers/v2/front.jpg
      linear: false
      exclude:
        editions: [pdf]
        toc: true
    - element: bastard
      source: frontmatter/bastard.html
      linear: false
      exclude:
        toc: true
    - element: title
      source: frontmatter/title.html
      linear: false
      exclude:
        toc: true
    - element: legal
      source: frontmatter/legal.md
      liquid: true
      linear: false
      exclude:
        toc: true
    - element: dedication
      source: frontmatter/dedication.md
      linear: false
      exclude:
        toc: true
    - element: toc
      linear: false
      exclude:
        editions: [pdf]
    - element: chapter
      number: 1
      directory: chapters
      source: /^chapter-\d+.md$/
      start: true
      page: 1
      pipeline: &pipelines
          - module: mfgames-writing-hyphen
            exceptions:
                # https://www.hyphenation24.com/word/driving/ says "driv-ing"
                - driving
                - drive
    - element: acknowledgement
      source: backmatter/acknowledgments.md
      pipeline: *pipelines

This actually has turned out better than I expected. Yeah, I had trouble with using WeasyPrint for PDF generation, but it has almost all the features I needed: right-side chapter openings, different headers on the first page, and stitching everything into a single PDF with properly embedded fonts.

The one thing I can't do is create the Microsoft Word file that Smashwords wants. I'm still trying to decide if it is worth it.
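In the configuration above, the chapter element's source is written as a regular expression between slashes. As a rough illustration of how an entry like that could expand into an ordered list of files, here is a small Python sketch. It is my simplified reading of the idea, not the framework's code: select_sources is a hypothetical helper, and treating a slash-wrapped value as a pattern with matches sorted by name is an assumption.

```python
import re
from typing import List


def select_sources(names: List[str], source: str) -> List[str]:
    """Expand a `source` value into the file names it selects."""
    # Assumption: a value wrapped in slashes is a regular expression.
    if len(source) >= 2 and source.startswith("/") and source.endswith("/"):
        pattern = re.compile(source[1:-1])
        # Sorting by name keeps zero-padded chapters in reading order.
        return sorted(name for name in names if pattern.search(name))
    # Otherwise treat the value as a single plain file name.
    return [source] if source in names else []


names = ["chapter-02.md", "chapter-01.md", "chapter-01.md.bak", "notes.md"]
print(select_sources(names, r"/^chapter-\d+\.md$/"))
```

One regular expression detail worth knowing: an unescaped dot matches any character, so a pattern ending in .md would also accept a name such as chapter-10xmd, while \.md only matches a literal dot. With a tidy chapters folder the difference never shows up.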

Website

I also release chapters every week. I used to use WordPress but it got cumbersome to add new chapters so I switched to a static site generator over the years. I started with DocPad, then Jekyll, and then extended Jekyll with Python and Perl programs to handle my requirements.

Well… I decided to write my own. This uses the same input files as the rest of my system (Markdown + YAML). It also lets me change a header to release the chapter and include it into the build.

One of the complexities I had to figure out is the different repository for each novel. In the building process, I actually clone every project I'm publishing or have published, then use scripts to pull them into a single website before generating it. Jekyll couldn't handle it easily which is one reason I created my own.

The tool for making the website is CobblestoneJS but it has no documentation and I'm still fumbling through it. On the other hand, it handles the ten or so repositories needed to build fedran.com without my conlang, world data, and individual novels.

As part of the weekly release, I've gotten the tasks down to:

  • Update a header in the YAML (or use the scheduler to do it once)
  • Write a blog post about the chapter
  • Copy the chapter and the post over to Ello
  • Copy the chapter from fedran.com to Wattpad
  • Copy the post over to Patreon
  • Run the diaspora sync post

It takes about 1-2 hours, which is much shorter than the 4 it used to be.

Summary

Well, there it is, my current state of writing with Git. You have everything from writing the chapters, supplying metadata, how to store it, various tools for getting through the publication process, and even formatting it for the various vendors.

I've gotten lost trying to automate and simplify a lot of this, but I think the current state is mostly usable by others and has been pretty solid for my own needs. I'm sure I'll improve and expand on it.

If you have questions or comments, please ask. I love talking about processes, looking for improvements, or explaining in more detail why I do these things.

Thank you.
