unoconv, docbook2odf, and solving a problem

When I decided to write my own writing management system, I knew there were going to be things that weren't going to be as polished as just using a word processor (such as Microsoft Word). One of the recent things that came up was my inability to create Microsoft Word files for the writing group. Some of the people there can't use PDF files and a few others like to reformat my output into something that fits their own preferences*.

Starting three weeks ago, one of the ladies at the group needed my submission in full Microsoft Word format instead of the usual text file I submitted. The first two times, I manually copied it. She mentioned that the //italics// and "miw: foreign language comments" were a bit confusing.

The best programmer is a lazy one.

Yesterday, I decided that I don't like manually copying the file. Actually, I don't like manually doing anything besides writing programs to do things I had to do manually. A series of Google searches later, I found unoconv, which can convert documents into Word. That would make it easy to convert .odt to .doc files for her.
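For the curious, the unoconv step can be sketched as a small Python wrapper. This is a minimal sketch, not the code from my build system: the function names and the example path are made up for illustration, and I'm assuming unoconv's default behavior of writing the output next to the input with the new extension. The only unoconv option used is -f, which picks the output format.

```python
import subprocess
from pathlib import Path


def unoconv_command(odt_path, fmt="doc"):
    """Build the unoconv command line that converts odt_path to fmt.

    The function name and default are illustrative; -f selects the output format.
    """
    return ["unoconv", "-f", fmt, str(odt_path)]


def convert_to_word(odt_path):
    """Run unoconv and return the path where the .doc is expected to appear.

    Assumes unoconv is installed and writes its output beside the input file.
    """
    subprocess.run(unoconv_command(odt_path), check=True)
    return Path(odt_path).with_suffix(".doc")
```

Calling convert_to_word("build/chapters/chapter-12-plain.odt") would then run unoconv on that file, assuming the .odt already exists.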

Now, the hard part is trying to get one of my intermediate formats into OpenDocument Format (the OASIS standard, also known as ODF; the text files end in .odt). I use DocBook 5 as my intermediate format. I did a few more Google searches and found some promising things.

The first was the namespace-aware stylesheets from the DocBook project itself. There isn't one for ODF anywhere I can find. So, I looked into creating a new one, but I honestly couldn't figure out how to get it working, much less figure out how to convert DocBook properly over to that system.

I also found docbook2odf, which looked exactly like what I needed. But when I ran a test file through it, I got nothing. After a few hours of messing with both of them, I still had nothing.

You know, my brain is a funny thing. It has a secondary processor dedicated purely to obsessing over things so it can wake me up at five in the morning to tell me it figured it out. The docbook2odf stylesheets were written for DocBook 4, which doesn't have those yummy namespaces, so they never matched the namespaced elements in my DocBook 5 files. I just had to modify them to handle the namespace. A few hours (around a family picnic) later, I got it working with my build system.
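The namespace problem is easy to reproduce outside of XSLT. Here is a small Python sketch (not part of docbook2odf, and the sample document is made up) showing why a lookup written for DocBook 4 element names finds nothing in a DocBook 5 file: every DocBook 5 element lives in the http://docbook.org/ns/docbook namespace, so a bare "para" no longer matches.

```python
import xml.etree.ElementTree as ET

# The DocBook 5 namespace; DocBook 4 elements have no namespace at all.
DOCBOOK_NS = "http://docbook.org/ns/docbook"

# A tiny made-up DocBook 5 document for illustration.
SAMPLE = (
    '<article xmlns="http://docbook.org/ns/docbook" version="5.0">'
    "<para>Hello</para>"
    "</article>"
)

root = ET.fromstring(SAMPLE)

# DocBook 4 style lookup: no namespace, so nothing matches.
plain_matches = root.findall("para")

# DocBook 5 style lookup: bind a prefix to the namespace first.
namespaced_matches = root.findall("db:para", {"db": DOCBOOK_NS})

print(len(plain_matches), len(namespaced_matches))  # prints "0 1"
```

An XSLT template with match="para" fails in the same silent way, which is why I got an empty result instead of an error.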

To get it working, I pretty much had to fork docbook2odf because it appears to be a dead project. I threw it up on GitHub to keep track of the changes until I can figure out if this is the right thing to do. Regardless, I'm happy with how it turned out.

I also used a trick with styling that I learned with Balance. I formatted it to match the style I liked (Courier New, double spaced) and saved it. Then I extracted the styles.xml file from the .odt file and threw it into my style directory. Now, when I create the .odt file, I replace the styles.xml file inside it with the formatted one.
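The styles.xml swap works because an .odt file is just a zip archive. A minimal Python sketch of the idea follows; it is not my actual build step, and the function and file names are only illustrative. It copies every entry into a new archive and substitutes the saved styles.xml. Passing each original entry's info object through keeps the entry order and compression, so the "mimetype" entry stays first and uncompressed, which the format expects.

```python
import zipfile


def replace_styles(odt_in, styles_xml, odt_out):
    """Copy odt_in to odt_out, swapping in the contents of styles_xml."""
    with open(styles_xml, "rb") as handle:
        new_styles = handle.read()

    with zipfile.ZipFile(odt_in) as source, zipfile.ZipFile(odt_out, "w") as target:
        for item in source.infolist():
            if item.filename == "styles.xml":
                # Substitute the pre-formatted styles for the generated ones.
                target.writestr(
                    "styles.xml", new_styles, compress_type=zipfile.ZIP_DEFLATED
                )
            else:
                # Reusing the ZipInfo preserves the entry's name and compression.
                target.writestr(item, source.read(item.filename))
```

Something like replace_styles("chapter.odt", "styles/styles.xml", "chapter-styled.odt") would then produce the restyled document, with those paths standing in for whatever the build uses.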

And now, I have an automated system to convert my file into Microsoft Word documents:

make build/chapters/chapter-12-plain.doc

I love making things work. It makes me all warm and fuzzy inside. And I can also make the writing group happier by generating my submissions in the format they want.

* I fully admit that what I find useful is not what others would find useful. For example, my default PDF format numbers the paragraphs so someone can say "paragraph 12 sucks" and everyone can flip directly to that paragraph.