An obsession with data (a.k.a. "writers write")

Well, that took a bit longer than I expected, but I've managed to parse the Git and Subversion logs and turn them into a nice intermediate format (I said "normalized" too much last post). Then I wrote another tiny little program to tag all my stories.

All that work just to figure out the answer:

How many words have I written?

Now, this answer isn't exact or entirely accurate. It doesn't include the four complete rewrites of Flight of the Scions (a.k.a. Wind, Bear, and Moon). It also doesn't include the 100k words I pulled out of Flight for KK. Or the re-writes, struggles, and everything else. It also doesn't include the two novels or anything else I wrote in high school, including my two books of poetry.

What it does take is the "final" version of every story, chapter, and commission (I actually kept good records of those) I've written since 2001, which gives me an idea of how much I've written.

1,784,085 words from a total of 195 stories and 228 chapters in 7 novels.

That seems like a lot. I figured out that number with the following Perl program.




    use strict;
    use warnings;

    # Go through all the files in the directory from the first argument.
    my %months = ();
    my $total = 0;

    open FIND, "find '$ARGV[0]' -type f -name '*.txt' |"
        or die "Cannot open find ($!)";

    while (<FIND>)
    {
        chomp;
        my $file = $_;

        # Make sure the file has a date field.
        my $sep = $/;
        $/ = undef;
        open FILE, "<$file";
        $_ = <FILE>;
        close FILE;
        $/ = $sep;

        next unless ($_ =~ m@\* Date: (\d+)-(\d+)-\d+@);

        my $year = $1;
        my $month = $2;

        # Figure out how many words.
        my $word_output = `wc -w "$file"`;

        next unless $word_output =~ m@^\s*(\d+)\s+@s;

        my $words = $1;

        # Print the file.
        print STDERR "Processing $file ($year-$month) [$words]\n";

        my $key = "$year-$month";

        $months{$key} += $words;
        $total += $words;
    }

    close FIND;

    # Add in the zeros.
    foreach my $y (qw(2007 2008 2009 2010 2011))
    {
        foreach my $m (qw(01 02 03 04 05 06 07 08 09 10 11 12))
        {
            $months{"$y-$m"} += 0;
        }
    }

    # Write out the months and dates.
    foreach my $mkey (sort(keys(%months)))
    {
        my $words = $months{$mkey};

        print "$mkey\t$words\n";
    }

    print "\nTotal\t$total\n";

I took the output of that program and threw it into Google Docs so I could chart it over time.

Chart of the words written by month

I did fudge the epoch date for Subversion since I had 403,201 words when I converted over to Subversion. In the above chart, I broke it into four months of 100k and added it there. I also had two dates before then because I could figure out rough dates for those stories from the contracts I got when I sold them.

As you can tell from the chart, I've had a couple months of writing 100k+ words. Those were the good writing months. The highest was March 2007, when I wrote 158,497 words in a single month. I also noticed that July tends to be my major writing month, year after year.

It's kind of cool, if only to see where I had "bad" months (there were a number of zeros) and good months for writing. But, more importantly, the red line shows the total words over time. Writing isn't about belting out a 50k word novel in a single month or (roughly) three of them. It isn't about getting out a single piece and being done with it. For me, writing is about just continuing to do it. Writing whenever I can, whatever I can. Like compound interest, the individual stories and chapters pale under the slow accumulation of writing.

And one could hope that becoming a better writer is part of that running total of all words.
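The red line in the chart is nothing more than a running total of the monthly numbers. Here is a minimal sketch of that idea in Perl, assuming the tab-separated "month, words" lines that the program above prints. The `running_totals` name and the sample numbers are made up for illustration; this is not necessarily how the chart itself was built.

```perl
use strict;
use warnings;

# Turn "YYYY-MM<TAB>words" lines into rows that also carry a running total.
# Lines that do not match (blank lines, the final "Total" row) are skipped.
# The input is assumed to already be sorted by month.
sub running_totals
{
    my @lines = @_;
    my $sum = 0;
    my @rows = ();

    foreach my $line (@lines)
    {
        next unless $line =~ m/^(\d{4}-\d{2})\t(\d+)\s*$/;

        $sum += $2;
        push @rows, "$1\t$2\t$sum";
    }

    return @rows;
}

# Made-up sample input, only to show the shape of the data.
my @sample = ("2007-01\t100", "2007-02\t0", "2007-03\t250", "", "Total\t350");

print "$_\n" foreach running_totals(@sample);
```

A zero month leaves the third column flat and a big month makes it jump, which is exactly what the line in the chart shows.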

Now, what other conclusions can I take from that chart?


I'm actually serious about that. A million words doesn't make me an expert. I can't tell you if I did the mythical 10,000 hours of writing because at 60 words per minute (half my maximum), 1.78 million words is only about 500 hours, and I'm very sure I've written for more than 500 hours. Belting out words doesn't make me a great, or even a good, writer. It just means I've written since 2001 and I apparently enjoy the process enough to keep doing it.
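For the curious, the 500-hour figure is back-of-the-envelope arithmetic on the total from earlier. It assumes every counted word was typed at a steady 60 words per minute, with no pauses, rewrites, or thrown-away drafts:

```latex
\frac{1{,}784{,}085\ \text{words}}{60\ \text{words/min}} \approx 29{,}735\ \text{min},
\qquad
\frac{29{,}735\ \text{min}}{60\ \text{min/h}} \approx 496\ \text{h}
```

That is roughly a twentieth of 10,000 hours, which is why the word count alone says very little about time actually spent writing.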