I decided to take a week or two off from writing. I was getting burned out, and I also have a few things that need to be done that I've been ignoring in my obsession to finish BAM (I just got the writing group feedback on the last three chapters).

Yesterday, I got to a stopping point on rewriting one piece of MfGames Writing from Python to C#. It was an interesting experiment that took about four days to work out (two of which were spent playing with coding style), but it is pretty critical for my build system, which both makes ebooks and collects data for typesetting print books.

File organization

I use a pretty sparse file organization when it comes to managing books for Sam's Dot Publishing. This lets me convert new books as I get them, and also allows me to reference them in later productions. I try pretty hard to make a good "Also By" page that links to every other Kindle book (I'm only doing Kindle for SDP).

For my example, I'll use Shannon Ryan's book, Fangs for Nothing. In the build folder, I have the following.

  • src/
    • fangs-1/ (contains all the assets for Fangs for Nothing)
      • abstract.xml (back page blurb)
      • content.jpg (huge version of the front cover)
      • ad.jpg (scaled down version of the front cover)
      • content.xml (refers to cover.jpg, abstract.xml, ../about/shannon-ryan.xml, ../also/shannon-ryan.xml)
      • ad.xml (sound bite for book, points to ad.jpg and abstract.xml)
    • minion-of-evil/ (contains all the assets for Minion of Evil)
      • abstract.xml
      • content.jpg
      • ad.jpg
      • content.xml
      • ad.xml
    • about/ (contains the "About the Author" appendixes)
      • shannon-ryan.xml (All about Shannon!)
    • also/ (contains the "Also by" appendixes)
      • shannon-ryan.xml (Because I am not original in names, points to fangs-1/ad.xml and minion-of-evil/ad.xml)

The organization above makes it easier to add a new book. When Shannon adds a new book, I just create a new directory structure along with the appropriate files, then update also/shannon-ryan.xml. From there, I just have to rebuild all the MOBI files and I have an updated ebook for every book he's written along with an updated "Also By" page.

make clean build/{fangs-1,minion-of-evil}/content.mobi

For the curious, I use XInclude to reference other XML files.

The Gathering

The drawback of this approach is the number of files. My entire build process works off a single XML file, and I need to have all the image files in a single place. Previously, I wrote a Python program called mfgames-docbook with a mode of "gather", but it choked on recursive XInclude elements.
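To make the recursion problem concrete, here is a minimal Python sketch of expanding nested XInclude elements into a single tree. It is an illustration only, not the code from either the Python or the C# version. The names load_with_includes and _expand are hypothetical, and it assumes every include is a plain href to a whole XML file (no xpointer, no text includes) and that the root element is not itself an include.

```python
import os
import xml.etree.ElementTree as ET

# Clark-notation tag for <xi:include> in the standard XInclude namespace.
XI_INCLUDE = "{http://www.w3.org/2001/XInclude}include"


def load_with_includes(path, _stack=()):
    """Parse path and replace each xi:include with the file it names.

    _stack holds the files currently being expanded so that a file that
    includes itself (directly or indirectly) raises instead of looping.
    """
    path = os.path.abspath(path)
    if path in _stack:
        raise ValueError("circular XInclude: " + path)
    root = ET.parse(path).getroot()
    _expand(root, os.path.dirname(path), _stack + (path,))
    return root


def _expand(parent, base_dir, stack):
    """Walk parent's children, swapping includes for the included root."""
    for index, child in enumerate(list(parent)):
        if child.tag == XI_INCLUDE:
            # Relative hrefs are resolved against the including file's
            # directory, which is what makes nested includes work.
            target = os.path.join(base_dir, child.get("href"))
            included = load_with_includes(target, stack)
            included.tail = child.tail
            parent[index] = included
        else:
            _expand(child, base_dir, stack)
```

The important detail is that each included file is expanded relative to its own directory before it is spliced in, so content.xml can pull in ../also/shannon-ryan.xml, which in turn pulls in ../fangs-1/ad.xml.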

When I rewrote it, I renamed the program to mfgames-writing so I can call it as mfgames-writing docbook gather inputFile outputFile. Slightly different, but it should work out better in the long run. More importantly, it lets me continue to use the Python version side-by-side until I can migrate all the functionality to C#.

The end result of the gathering process is to give me this:

  • build/
    • fangs-1/
      • content.xml (includes everything from abstract, about, and also)
      • images/ad.jpg (from Fangs)
      • images/ad.jpg (from Minion of Evil)

Now, you'll notice that ad.jpg is in there twice. This is why I'm rewriting the Python version. I need them to have two separate names, and it was just getting tedious to do that in my current Python code base. The current version (C#) produces this:

  • build/
    • fangs-1/
      • content.xml (includes everything from abstract, about, and also)
      • images/ad_a8ef.jpg (from Fangs)
      • images/ad.jpg (from Minion of Evil)

The "a8ef" is the first four characters of the SHA256 hash of the filename. It gives the file a unique name, while still letting two references to the same file end up with the exact same filename.

The other part of this process is to rewrite the references to the images so it says "images/ad_a8ef.jpg" instead of "ad.jpg". The logic is pretty smart since it will try 4, 8, 12, 16… 32 characters of the hash to create a short, unique name.
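The naming scheme can be sketched roughly like this in Python. This is not the C# implementation. The function name unique_name and the taken dictionary are hypothetical, and the sketch assumes the hash is taken over the absolute path of the source file (the post only says "the filename"). It also suffixes every file, whereas the real tool evidently leaves some names untouched.

```python
import hashlib
import os


def unique_name(source_path, taken):
    """Return a short, unique gathered filename for source_path.

    taken maps each gathered filename to the absolute source path that
    owns it. Asking again for the same source returns the same name.
    """
    source = os.path.abspath(source_path)
    stem, ext = os.path.splitext(os.path.basename(source))
    # Assumption: hash the absolute path so two different ad.jpg files differ.
    digest = hashlib.sha256(source.encode("utf-8")).hexdigest()
    # Try 4, 8, 12, ... 32 characters of the hash until one is free.
    for length in range(4, 33, 4):
        candidate = "{}_{}{}".format(stem, digest[:length], ext)
        owner = taken.setdefault(candidate, source)
        if owner == source:
            return candidate
    raise RuntimeError("no unique name for " + source)
```

Calling it with src/fangs-1/ad.jpg and src/minion-of-evil/ad.jpg and a shared taken dictionary gives two different ad_XXXX.jpg names, and those names are what would be written back into the image references in the gathered XML.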

But, once everything is gathered in a single place, the build process is pretty simple.

Gaping and missing holes

The current C# version has one big hole: untransformed files. For BAM (actually every novel I've written), I have this:

  • src/
    • book.xml (points to chapters/chapter-01.xml)
    • chapters/
      • chapter-01.txt

The current build process knows how to transform "chapter-01.txt" into "chapter-01.xml" but the current gathering process does not. I also don't like temporary files in the source directory, so I have a temporary path.

  • tmp/
    • chapters/
      • chapter-01.xml

When I get to that point, I need to add search directories to the gathering process so when it sees a reference for chapters/chapter-01.xml it knows to search both src/chapters/chapter-01.xml and tmp/chapters/chapter-01.xml.
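A search-directory lookup like that could be as small as the following Python sketch. Nothing like this exists in the tool yet, so the function name find_in_search_dirs and its return convention are purely hypothetical.

```python
import os


def find_in_search_dirs(reference, search_dirs):
    """Return the first existing file for reference, or None.

    reference is a relative path such as "chapters/chapter-01.xml" and
    search_dirs is an ordered list such as ["src", "tmp"]. Earlier
    directories win when the file exists in more than one place.
    """
    for directory in search_dirs:
        candidate = os.path.join(directory, reference)
        if os.path.isfile(candidate):
            return candidate
    return None
```

With ["src", "tmp"] as the search list, a hand-written src/chapters/chapter-01.xml would take priority, and the generated tmp/chapters/chapter-01.xml would be picked up only when the source directory has no such file.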

Development differences

I'm not going to go into details; you can go to the GitHub site and look in horror there.

In C#, I have lots of little classes that do "one thing well" (aka, Single Responsibility Principle). I layer my classes pretty heavily to make it easier. So, I have one class that wraps around an XmlReader, a second to handle XInclude transparently, and a third that handles the file copying and attribute rewriting and nothing else. Each class is in its own file (my preferred method) and has lots of little functions that also do one thing each.

C# uses Unicode by default. This doesn't seem like a big thing, but I found that authors use some non-standard characters pretty heavily: "fancy quotes", em-dash (and en-dash incorrectly), accented characters. Having it as the default simplified a lot of logic while processing these files. In Python, it was a bit harder to read and write UTF-8 XML or even display it to the screen.

I have much better tools in C# than in Python. The biggest is ReSharper which I use heavily, but I also have their dotCover and dotPerformance packages. I don't get the free version because that requires me to release more often and I haven't released MfGames Writing (or anything else) in years. But, it is hard to explain how much more effective I am when I can type everything. Jump to class, jump to method, run unit test, you name it. I don't need the mouse.

I like NUnit a lot. Because I'm more comfortable with the organization of C#, I could add in the hooks and methods to create unit tests a lot more productively than in Python. For some reason, the UI for navigating failed tests wasn't as good on the Python side. So, when I had a bug, I would end up commenting out all the other unit tests until I fixed that one. With ReSharper and C#, I just hit Control-U, U from anywhere in the application to rerun the last test.

Drawbacks

There is a big one: Windows. I don't use Windows unless I have to. I'm willing to go without fancy games or programs if having them means I have to switch environments. It isn't a choice I inflict on others, but something that I feel the need to do. The only major exception is… programming C#.

This got a bit difficult when I finished my unit tests, then switched over to Linux to try building Fangs for Nothing. It failed, and it takes a lot of work to switch environments to fix a bug. Fortunately, I was able to create a unit test that failed in the same way and fixed it.

The current process is to: find a bug on Linux, hibernate Linux, reboot into Windows, fix it, shut down Windows, restore Linux, run it again. The length of this cycle really encourages me to find a unit test quickly to verify the bug.

Thanks to the power of Mono, I run my program directly from my Windows partition. So, I don't have to rebuild on Linux, I just run it from the mounted NTFS drive and "it just works".

Conclusion

Overall, I'm pretty happy with the results so far. Obviously, this is going to take some time to iron out, but I have eight books I need to convert for SDP, along with formatting for my own stuff. By the end of the year, I'll have a good idea of what works, what doesn't work, and what needs to change.

2012-11-09