As part of my work formatting Sam's Dot Publishing's (SDP) books for the Kindle, I have a steady stream of books in Microsoft Word format that need to be converted to Mobi files. This is a fairly tedious process since SDP's formatting isn't conducive to just throwing the book through kindlegen and having everything magically ready.
Getting the SDP document into DocBook is the hardest part, but it only takes about 10-15 minutes. I mainly go through the document and change headings, add some keywords for poetry, and normalize the section breaks. Once I have a clean Word document, I run a Perl program I wrote that converts Word into DocBook 5 and tries to arrange everything according to DocBook's XML schema. This includes making articles, putting in authors, and parsing the legal page.
With the prompting of a co-worker (who suffers through my emoness) and a really nice person on Reddit, I decided to make most of this system public to get improvements and maybe save someone else some time and effort. Plus, I like giving back to the communities who helped me get this far.
With the work this weekend, I think I have a decent alpha-quality build system that takes a DocBook file and creates fairly clean PDF, MOBI, EPUB, DOC, DOCX, RTF, HTML, and ODT versions. Almost all of this is driven by XSLT stylesheets, so it is easy to change the format for aesthetic or branding reasons. It also has tools for uploading stories to WordPress for when you might want to make a story public.
Almost all of this work is put under the MfGames Writing umbrella of utilities. This includes programs for converting, manipulating, and querying Creole, DocBook, OPF, and NCX files. OPF and NCX are critical for EPUB and I found they make generating MOBI files a lot more pleasant than trusting kindlegen to convert it blindly.
There are two parts of MfGames Writing: command-line utilities and the build system.
The command-line utilities are Python programs written to do all the nifty work. For example, EPUB really wants a cover.html file, but MOBI doesn't. Since I like to use epubcheck to verify my files, I take the EPUB and then remove the parts that kindlegen complains about:
    $ cd tmp-mobi
    $ unzip ../content.epub
    ...
    $ rm cover.html
    $ mfgames-opf manifest-remove content.opf cover
    $ mfgames-opf spine-remove content.opf cover
    $ mfgames-ncx nav-remove toc.ncx cover
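To make the two OPF steps above a little more concrete, here is a minimal Python sketch of what removing an entry from the manifest and the spine amounts to. It is not the actual mfgames-opf code: the function name `remove_from_opf` and the sample package document are made up for illustration, and it only assumes the standard OPF 2.0 layout (`<item id="...">` in the manifest, `<itemref idref="...">` in the spine). The NCX side is not covered.

```python
import xml.etree.ElementTree as ET

OPF_NS = "http://www.idpf.org/2007/opf"


def remove_from_opf(opf_xml, item_id):
    """Drop one id from both the manifest and the spine of an OPF document.

    Hypothetical helper for illustration; not part of mfgames-opf.
    """
    # Keep the OPF namespace as the default one when serializing again.
    ET.register_namespace("", OPF_NS)
    root = ET.fromstring(opf_xml)
    manifest = root.find(f"{{{OPF_NS}}}manifest")
    spine = root.find(f"{{{OPF_NS}}}spine")

    # Manifest entries are <item id="...">; the spine refers to them by idref.
    if manifest is not None:
        for item in list(manifest):
            if item.get("id") == item_id:
                manifest.remove(item)
    if spine is not None:
        for itemref in list(spine):
            if itemref.get("idref") == item_id:
                spine.remove(itemref)
    return ET.tostring(root, encoding="unicode")


# Made-up sample input, loosely modeled on a small EPUB package file.
SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="2.0">
  <manifest>
    <item id="cover" href="cover.html" media-type="application/xhtml+xml"/>
    <item id="chapter-01" href="chapter-01.html" media-type="application/xhtml+xml"/>
  </manifest>
  <spine toc="ncx">
    <itemref idref="cover"/>
    <itemref idref="chapter-01"/>
  </spine>
</package>"""

if __name__ == "__main__":
    print(remove_from_opf(SAMPLE_OPF, "cover"))
```

Removing the entry from both places matters because a spine reference to an id that is no longer in the manifest would leave the package inconsistent.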
I always call files the same thing. Now, Kindle wants the JPG for the cover. Easy enough, you just use this:
    $ mfgames-opf cover-set content.opf cover-image
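For readers wondering what "setting the cover" means inside the OPF, the usual OPF 2.0 convention is a `<meta name="cover" content="..."/>` element in the metadata whose content is the manifest id of the image. The sketch below assumes that is what the command writes, which is a guess rather than something taken from the mfgames-opf source; `set_cover` and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

OPF_NS = "http://www.idpf.org/2007/opf"


def set_cover(opf_xml, image_id):
    """Point the OPF cover metadata at a manifest id.

    Hypothetical helper; assumes the <meta name="cover"> convention.
    """
    ET.register_namespace("", OPF_NS)
    root = ET.fromstring(opf_xml)
    metadata = root.find(f"{{{OPF_NS}}}metadata")
    if metadata is None:
        # Create a metadata block at the top if the package lacks one.
        metadata = ET.Element(f"{{{OPF_NS}}}metadata")
        root.insert(0, metadata)

    # Update an existing cover entry, or add one if none is present.
    for meta in metadata.findall(f"{{{OPF_NS}}}meta"):
        if meta.get("name") == "cover":
            meta.set("content", image_id)
            break
    else:
        ET.SubElement(
            metadata,
            f"{{{OPF_NS}}}meta",
            {"name": "cover", "content": image_id},
        )
    return ET.tostring(root, encoding="unicode")


# Made-up sample: the id "cover-image" matches the command shown above.
SAMPLE_COVER_OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="2.0">
  <metadata/>
  <manifest>
    <item id="cover-image" href="cover.jpg" media-type="image/jpeg"/>
  </manifest>
</package>"""

if __name__ == "__main__":
    print(set_cover(SAMPLE_COVER_OPF, "cover-image"))
```

Calling the function a second time with a different id updates the existing entry instead of adding a duplicate.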
And those two parts are fully automatable. And I like automating processes. So, the second part (mfgames-writing-make, actually) is a Makefile and a series of XSLT stylesheets to make this happen. I also have a number of Git repositories for my stories (most novels have one and each byline also has one), each with their own branding/appearance. To reduce the work I have to do, I have a Makefile in each repository with this line:
The rest of the per-repository stuff is setting up directories and branding.
The following programs are used in addition to the two mfgames-writing projects:
- XeLaTeX is used to make pretty PDFs. I like typography and I found that XeLaTeX (which is horrible on touch-typing) creates rather nice-looking PDFs.
- docbook2odf is actually something I'm expanding on. It isn't really mine, but I couldn't find an active maintainer. This is used to convert DocBook to ODF, which leads down the path of RTF, DOC, and DOCX via...
- LibreOffice or OpenOffice.org and their unoconv tool, which is a command-line utility for converting any file the two writer programs use into a different one.
- epubcheck is a utility to test the correctness of EPUB files. I like lint programs and this is the best I have for making sure I have a solid EPUB file.
- kindlegen is used to create the Mobi file. It is pretty simple to use, but I'm also using the files I feel generate the best Mobi file without significant user input.
- fop is used to create a fake cover if one isn't provided.
I actually tried to figure out how to make this cross-platform and failed. Since I'm writing these tools in Python, the most obvious choice is a Python-based build system like SCons. However, SCons isn't really designed to work with files this way. Unlike most programs, I don't have a limited number of outputs, since I want to be able to pick any one of hundreds of short stories to generate. Also, SCons scans everything, figures out how to build it, and then lets you choose a known path. I don't have known paths. I want to be able to pick an arbitrary story and build that one specific file... without deciding beforehand how to build that file.
Likewise, my input is drastically different. For short stories, I have a Creole file which is converted to DocBook and reformatted. For novels, my book is a DocBook file and the individual chapters are Creole. So, I needed something that goes into the DocBook file, figures out which XML files are needed and then converts the Creole file into DocBook so they can be included. Ideally without editing a build file.
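The "figure out which XML files are needed" step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it assumes the chapters are pulled in with XInclude (`xi:include`), and that each Creole source sits next to its XML file with a `.txt` suffix. Neither detail is confirmed above, and the function names are made up.

```python
import xml.etree.ElementTree as ET
from pathlib import PurePosixPath

XINCLUDE_NS = "http://www.w3.org/2001/XInclude"


def included_files(docbook_xml):
    """List the href of every xi:include element, in document order."""
    root = ET.fromstring(docbook_xml)
    return [
        element.get("href")
        for element in root.iter(f"{{{XINCLUDE_NS}}}include")
        if element.get("href")
    ]


def conversions_needed(docbook_xml, existing, source_suffix=".txt"):
    """Pair each missing include with the Creole file it would be built from.

    `existing` is the set of file names already on disk. The ".txt" suffix
    for Creole sources is an assumption, not something the post states.
    """
    plan = []
    for href in included_files(docbook_xml):
        if href not in existing:
            source = str(PurePosixPath(href).with_suffix(source_suffix))
            plan.append((source, href))
    return plan


# Made-up novel file with two chapters, one of which is already converted.
SAMPLE_BOOK = """<book xmlns="http://docbook.org/ns/docbook"
      xmlns:xi="http://www.w3.org/2001/XInclude" version="5.0">
  <title>Example Novel</title>
  <xi:include href="chapter-01.xml"/>
  <xi:include href="chapter-02.xml"/>
</book>"""

if __name__ == "__main__":
    for source, product in conversions_needed(SAMPLE_BOOK, {"chapter-02.xml"}):
        print(f"convert {source} -> {product}")
```

Because the list of chapters comes from the DocBook file itself, adding a chapter means adding an include, not editing a build file.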
I looked at a number of other build systems, including MSBuild/XBuild, but they had the same limitations as SCons. They can work, but only with more work than I was willing to put into it. Make, on the other hand, is pattern-based. It figures out the files as it goes. So, when I ask for fots.mobi, it does the following:
- Figures out it needs to build
- Figures out it needs
chapter-01.xml doesn't exist, so it needs to convert
- Includes the chapters into
- Wraps it into
- Manipulates the
It also does some work with covers, including faking them.
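The pattern-based behaviour is easier to see with a toy model. The Python sketch below mimics how chained pattern rules let an arbitrary target be resolved on demand, working backwards until it reaches a file that already exists. The suffix chain (`.mobi` from `.epub` from `.xml` from `.txt`) is a hypothetical simplification, not the real rule set in mfgames-writing-make, and `build_plan` is an invented name.

```python
# Hypothetical pattern rules: each target suffix is built from one source suffix.
RULES = {
    ".mobi": ".epub",
    ".epub": ".xml",
    ".xml": ".txt",
}


def build_plan(target, existing):
    """Return the (source, product) steps needed to build `target`.

    Works backwards through RULES until it reaches a file in `existing`,
    much like Make chaining pattern rules for a target it was never told about.
    """
    if target in existing:
        return []  # Nothing to do: the file is already there.
    stem, dot, extension = target.rpartition(".")
    source_suffix = RULES.get(dot + extension)
    if not dot or source_suffix is None:
        raise LookupError(f"no rule to make {target!r}")
    source = stem + source_suffix
    # Build the prerequisite first, then the target itself.
    return build_plan(source, existing) + [(source, target)]


if __name__ == "__main__":
    # A short story that only exists as a Creole file.
    for source, product in build_plan("fots.mobi", {"fots.txt"}):
        print(f"{source} -> {product}")
```

The same function handles a story that starts from Creole and a novel that starts from DocBook: with `{"fots.xml"}` as the existing files, the first conversion step simply drops out of the plan.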
And I couldn't figure out a cross-platform system that could do that. In theory, I could run this from a VM on Windows, but I'd like to find a universal approach. But, until then, this is a Unix-based system (it probably would work on OS X too).
I have a number of projects and they rise and fall as I bumble through life. I'm working toward the 1.0 release of MfGames Writing, but there are a number of features I want to get working. Most of them can be found on the issues pages for both the main and make projects.
The big one is that I don't use Norman Walsh's stylesheets. I had a reason, but I don't know if it was a lack of understanding or something else. The main trigger for writing my own was that those templates were rather complicated when I needed something simpler. Some of the automatic ID generation caused problems since both
kindlegen verify all links and I was getting no love there. I ended up changing how I do section breaks (instead of anonymous sections, I use bridgeheads), so some of the problems may have resolved themselves.