Emacs

Author Intrusion 0.1.0

2015-07-29T05:00:00Z

Over the last few weeks, I made some minor improvements to Author Intrusion. Since I need to actually use it to write stories and novels, I figured I'd get to a stopping point and update the code.

Part of Speech

One of the biggest features I needed was part of speech tagging. This is in the node-author-intrusion-pos-tagger module. This uses the output from the splitter and adds more information about the part of speech, such as “present tense noun” or “adverb”. This lets me identify overuse of adverbs in a short distance (much like before).

Reworking Echo

I also significantly reworked the echo plugin to handle the POS tagging, checking for echo words against the stem (which can be used to treat “spit” and “spitting” as the same word).

This was a breaking change, sadly, but hopefully not too many people are using it. I added some unit tests in this (and a number of other plugins) to help explore the different functionality.

Documentation

I've updated the documentation and the script to check everything out.

Forums

I also created a category on my forum, if anyone wants to talk about it. It also includes a sub-category for recipes for those who want to talk about how to do some of the analysis.

Next Steps

The next biggest step is to get the echo plugin to handle ngrams. Right now, it works on a token-by-token basis, but I also want it to be able to identify a series of three or four word segments and treat them as higher priority duplicates. For example, to see if paragraph openings repeat themselves or sentences have the same beginning.
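As a rough illustration of the ngram idea, here is a minimal Python sketch. It is not code from the echo plugin: the function name repeated_ngrams and the assumption that the text is already split into a list of tokens are both made up for the example.

```python
from collections import defaultdict


def repeated_ngrams(tokens, n=3):
    """Map each n-token sequence that occurs more than once to its start positions.

    Hypothetical helper: `tokens` is assumed to be an already split list of words.
    """
    positions = defaultdict(list)
    for start in range(len(tokens) - n + 1):
        positions[tuple(tokens[start:start + n])].append(start)
    # Keep only the sequences that actually repeat.
    return {gram: starts for gram, starts in positions.items() if len(starts) > 1}
```

Running the same function with n set to 3 and then 4 would cover the "three or four word segments" case. Restricting the start positions to the first token of each sentence or paragraph would narrow it to repeated openings.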

Author Intrusion for Node.js

2015-07-12T05:00:00Z

The idea of Author Intrusion has been haunting me for years. I know that I have problems in my writing, little flaws, that make it hard for readers to get into my stories. One reason I need two sets of editors is simply to catch them. But I also know that it is possible to write a program that could detect many of those flaws; they are mechanical but complicated.

I've written Author Intrusion a number of times now, usually as a full-blown text editor with analysis plugins in the background. Some of them have gotten pretty far, but I always hit some road block that prevented me from going further. Not to mention, it takes time to write a text editor simply to get far enough to write what is really important, the analysis aspect.

After fumbling for a few years, I had almost given up on it. It was supposed to help me, but I just didn't have the skills to do it myself. It hurt, because I knew it would help me, but I couldn't manage it.

But, like writing, I couldn't stop. I kept thinking about how to make it work, to change how I viewed the program and write something that actually worked. A few ideas began to gel together over the last few months until they finally came into focus last week. I had to give up what I envisioned Author Intrusion doing and change it into something I could do.

Command Line Tools

One of the biggest things is getting rid of the text editor. Yeah, it's fun writing something on screen, but it is also distracting. If I focused on the analysis part as a separate tool, then I could avoid that. Fortunately, Atom had an infrastructure for doing that.

The bulk of the new system is author-intrusion-cli, a command-line tool that takes a Markdown file and produces a standard output from it. The format is pretty simple, an error message for Emacs or the linter JSON for Atom.

It isn't perfect, but it worked.

Analysis

There is only one real analyzer now: echo words. It scans the file and looks for the same word used in rapid succession. When that happens (sadly the first chapter of Sand and Blood has some examples), Atom will underline the problem text. When you type and fix it (and save), the underlines go away and new ones pop up.
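The echo check can be pictured with a short Python sketch. This is an illustration under assumptions, not the analyzer's actual code: the name find_echoes, the window of ten tokens, and the simple regular-expression tokenizer are all invented for the example.

```python
import re


def find_echoes(text, window=10):
    """Return (word, earlier_index, later_index) for words repeated within `window` tokens.

    Hypothetical helper: the tokenizer and the default window are assumptions.
    """
    # Runs of letters, optionally followed by an apostrophe and more letters.
    tokens = [t.lower() for t in re.findall(r"[^\W\d_]+(?:'[^\W\d_]+)?", text)]
    last_seen = {}  # word -> index of its most recent occurrence
    echoes = []
    for index, word in enumerate(tokens):
        previous = last_seen.get(word)
        if previous is not None and index - previous <= window:
            echoes.append((word, previous, index))
        last_seen[word] = index
    return echoes
```

A real check would likely skip very common words such as "the" and "and", otherwise nearly every sentence would be flagged.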

There will be more, the hot buttons of my writing, but I feel pretty good about the results. This is something I can extend and build on, hopefully without hitting the roadblocks that previously stopped me.

Correctness

I'm making a point of not making default settings for this. Author Intrusion isn't trying to be a “one true” grammar checker but to allow a writer who knows their flaws to identify them. I want it to let someone keep their voice and style while giving hints on improvement.

Packages

I spent a lot of time cleaning up the code and posting it on Github. I also wrote a script to install it on Linux; it's rough, but it seems to work. It also means, in the odd chance that someone is interested, I can actually accept some contributions. I don't expect it, but there was already someone expressing an interest.

I also documented a little bit of the process.

Future Plans

I can't worry about failing. This may work or it may not, but it feels pretty good to me. I liked how easily everything fell into place, no doubt because of the endless times I've already written it. The code base is pretty solid. Needs changes, but the foundation is pretty simple because I'm doing less so I can focus on the important parts.

I want this to succeed. I think it will help me be a better writer by identifying my problem areas, the places I know are wrong but I can't always see. This tool, both now with only one feature and later with the other things I want, should help me do that.

Emacs and Multiple Dictionaries

2015-04-01T05:00:00Z

For the last four years, I've been trying to write a program called Author Intrusion. There were a number of reasons for this, but one of the biggest was that I couldn't find any program that handled dictionaries (really word lists, but a lot of people use the wrong name).

This morning, when I woke up, I ended up doing a random search that took me through a long winding journey that finally gave me an interim solution that is pretty solid until I can get Author Intrusion finished (which may be another four… decades or so).

The problem

As with any long-term writing project, I've created a large number of characters, groups, and locations. Most of them are based on a conlang while others just sounded cool. However, when I'm spell-checking my chapters, I need to have those names in the dictionary otherwise they'll continually show up as a typo.

One common solution is to add those names to the program's dictionary. This works out pretty well, until the end of the project. Then, the hundreds of names are no longer relevant for the next series but still show up in suggestions for every project in the future.

My preferred novel-writing editor, Emacs, has the ability to have per file word lists. This is called “LocalWords”, but it means that I can identify a list of valid words without adding it to my permanent dictionary. Of course, this means I have to keep copying that per file list into each new chapter, which then gets the new words for the characters I've introduced in that chapter. And when I create the chapter after that, it keeps moving and growing.

Because I just finished the draft of Sand and Bone, I have built up a three book collection of proper names. This list is in the top of every file, which means I have to scroll down a little to even see the title of the chapter.

Rutejìmo Chimípu Pidòhu Shimusògo Tateshyúso Pabinkúe Jìmo Mípu Dòhu Pidòhu's Desòchu Sòchu Mapábyo Kechikìma Hyonèku Opōgyo Chimípu's Gemènyo Mènyo Pábyo Kìma Mapábyo's Zotetsūchi Rutejìmo's Hyonèku's Gemènyo's Ryayusúki Wamifuko Nèku Hána Zúchi Mépu Nenemépu Shimusògo's Desòchu's Myunédo Shimusogo Karawàbi Wàbi Tsubàyo Bàyo Tsubàyo's Tejíko Palasaid Markon Tejíko's Mifuníko Yutsupazéso Yutsupazéso's Karawàbi's Nibonyāchu Jyotekàbi Yunujyoraze Byomími nibonyāchu ranuchyahāhi Mípu's shimusogo dépa alchemical dépa's Mifukiga Chobāni Rabedájyo Badenfumi Shigáto Porlin Kamanen Kakasaba Mioshigàma Pabinkue Mikáryo Mikáryo's tazágu Palarin Mistan rikunámi Ryachuikùo Tateshyúso's Nedorómi Chidomifu Kapōra Káryo Chyábi Ganifúma Ralador Markin Kidorīsi Mifúno Mafimára pyābi Mifuno Faríhyo mizonekima chyòre Rolan Madranir Kiríshi Som figaki tòra chyóre's shikāfu Tachìra's Chobìre's Wh Tachìra Monafuma Gidon Kormar Nigímo wabōryo Faríhyo's avian's Ríhyo Gímo ryodifūne Tsudakìmo Myobùshi Funikogo Ganósho Myobùshi's Gidorámi Pyatose myofūne Pyatòse Gichyòbi Higoryo Ríshi Jacin Torabin Kishifín's Makohūni's Tsu Rojikinomi Fimúchi Rojikinòmi Rapinbun Finol Pokīmu Waryōni Nyochizoma clanless Chizoki Miyóna Kyōti Tijikóse Chyobizo Nichikōse Tifukòmi Talsir Shifáni Milifor Krum Opōgyo's banyosiōu kojinōmi kojinōmi's Nyobichóhi Mifúno's helmed Kitópi Piròma Tópi Bakóki Bakóki's Nifùni Byochína Chobìre Midoshina Kafūma Korechyoki Baroshìko Tedoku Nuchikomu Machikimu Garènu Piròma's Kitópi's Nana dépas Kosobyo Kosòbyo nocked Fidochìma Foteramàsu Foteramasu chima Tsupòbi Dimóryo Fùni petabiryōchi Chína Techyomása Mioráshi Kosòbyo's Kidóri Atefómu's Kidóri's ambushers Tikói Menodàka Tateshyuso Kos Ràchyo Záji Gichyòbi's

That's a lot of names, including a couple that were removed for pacing. Almost every single one of them isn't in the final chapter of Sand and Bone, but they were in one of the hundred or so chapters before it.

There is also no easy way of removing the Miwāfu names and passing them into the next story since those are pretty common across any story I have in the desert.

As far as I could tell, there were only two ways of handling all those names: put it in the permanent dictionary or shovel it along the chapters as I went.

Vim

About a year ago, I found out that Vim had a setting that allowed multiple dictionaries, but I didn't want to grok a new writing environment when I had (high) hopes for getting Author Intrusion done.

The idea

This morning, I found a random link that led to another. Eventually, I came across Wcheck. It looked like it had potential for resolving my dictionary problem, so I spent an hour or so trying it out.

In the end, I couldn't get it to work. But, the process of trying gave me a little epiphany on what could work. Instead of changing the library, I decided to write a wrapper around aspell that intercepted the word checks and substituted my own lookups instead.

The results fell into place pretty easily. With a local.words file in the same directory as the chapters, my newly created caspell program loads it into memory. When Emacs asks it to check a word, it checks to see if it knows about the word already and reports it as correct even if the base dictionary doesn't know about it.

Likewise, adding a word adds it to the local.words file, not the aspell personal dictionary.

But wait, there's more

The basic format of the file is pretty simple.

word    nibonyāchu
word    dépa

I originally went with “&” as the suggestion used in the pipe, but then I realized I could use readable words without too much of a problem. So, it became “word” and made things a lot easier to process.

Getting the basic lookup was a nice little rush, but then I realized that I could return suggestions. That led into writing code that gave suggestions for “incorrect” words that I want to expand into real ones.

suggest Shimu = Shimusogo, Shimusògo
suggest shimu = Shimusogo
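To make the lookup and suggestion flow concrete, here is a small Python sketch. It is a guess at the shape of the logic, not caspell's code: the names parse_entries and reply are hypothetical, base_check stands in for whatever asks the real dictionary, and the reply strings are modeled on the pipe-mode answers of ispell and aspell ("*" for a known word, "&" with suggestions, "#" with none), which is an assumption about what caspell hands back to Emacs.

```python
def parse_entries(lines):
    """Collect `word` and `suggest` entries from local.words-style lines."""
    words, suggestions = set(), {}
    for line in lines:
        parts = line.split(None, 1)  # keyword, then the rest of the line
        if len(parts) != 2:
            continue
        kind, rest = parts[0], parts[1].strip()
        if kind == "word":
            words.add(rest)
        elif kind == "suggest":
            wrong, _, right = rest.partition("=")
            suggestions[wrong.strip()] = [s.strip() for s in right.split(",")]
    return words, suggestions


def reply(word, words, suggestions, base_check, offset=0):
    """Answer one lookup in an ispell-style pipe format (assumed, see above)."""
    if word in suggestions:
        options = suggestions[word]
        return "& {} {} {}: {}".format(word, len(options), offset, ", ".join(options))
    if word in words or base_check(word):
        return "*"
    return "# {} {}".format(word, offset)
```

Checking the suggestion table before the word list means a shorthand like "Shimu" is always flagged, even if the base dictionary happens to accept it.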

There is a certain mindset when things are working. It is easy to move on to the next piece of code, though each result takes longer to develop. In this case, I decided to allow one file to include another. This pulls in the words and suggestions from other files but doesn't merge them together.

command include "../../sand-and-blood/chapters/local.words"

And then I had it. Dictionaries per file, per project, per world, and any other combination that I need. I'm planning on creating them over the next couple of files, but I think it will let me chain dictionaries so book two will include book one's words. And book three will add book two's, which also includes book one's. And then Raging Alone includes all three books.
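The chaining could work roughly like the Python sketch below. It is only an illustration: load_words is a hypothetical name, include paths are assumed to be relative to the file that contains them, and the guard against files that include each other is an addition the post does not mention.

```python
import os


def load_words(path, seen=None):
    """Return the `word` entries of `path` plus those of every file it includes."""
    seen = set() if seen is None else seen
    path = os.path.abspath(path)
    if path in seen:  # guard against include cycles (an added assumption)
        return set()
    seen.add(path)
    words = set()
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            parts = line.split(None, 1)
            if len(parts) != 2:
                continue
            kind, rest = parts[0], parts[1].strip()
            if kind == "word":
                words.add(rest)
            elif kind == "command" and rest.startswith("include"):
                target = rest[len("include"):].strip().strip('"')
                # Assumption: the target is relative to the including file.
                words |= load_words(os.path.join(os.path.dirname(path), target), seen)
    return words
```

The included words are only gathered in memory for lookups. Nothing is written back, so each file on disk keeps just its own entries, which fits the "doesn't merge them together" behavior.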

And then one more

There was one more thing I ended up doing before I stopped. I used Emacs's abbrev-mode to do auto-corrections while writing. That way, I can type “Rute” and have it expand into “Rutejìmo” complete with accents. Same with various greetings, names, and locations.

As you can guess, I added that feature into the file too.

replace GS = Great Shimusogo
replace GT = Great Tateshyuso

This feature isn't built in, so I wrote a special mode for the program that takes a local.words and creates an abbrev.el file for the mode.

$ ls
local.words
$ caspell --emacs -p .
$ ls
abbrev.el local.words

Full example

A larger example for the local.words for Raging Alone:

command include "../../sand-and-blood/chapters/local.words"

suggest Shimu = Shimusogo, Shimusògo
suggest shimu = Shimusogo

replace GS = Great Shimusogo
replace GT = Great Tateshyuso

word    Badenfumi
word    Basamiku

The entire thing is rewritten whenever I add a word to the dictionary. Each section (except for commands) is sorted so it always produces a consistent order. This makes source control easier to work with (always sort output for that reason, it saves a lot of time later).
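A minimal Python sketch of that rewrite step, under assumptions: the function name render is hypothetical, the section order simply follows the example above, and sorting is by plain character order, which may not match what caspell does.

```python
def render(includes, suggestions, replacements, words):
    """Serialize the sections in a stable order so diffs stay small."""
    sections = [
        # Commands keep the order they were given in.
        ['command include "{}"'.format(path) for path in includes],
        ["suggest {} = {}".format(key, ", ".join(values))
         for key, values in sorted(suggestions.items())],
        ["replace {} = {}".format(key, value)
         for key, value in sorted(replacements.items())],
        ["word    {}".format(word) for word in sorted(words)],
    ]
    # Blank line between sections; empty sections are skipped.
    return "\n\n".join("\n".join(s) for s in sections if s) + "\n"
```

Because the output depends only on the contents and not on the order words were added, adding one word shows up in source control as a one-line change.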

Tying it all together

Once all the files are created and populated, I had to tell Emacs about the new program and how to hook up the abbreviations. This is done in the .emacs file. I have a hook for text mode that automatically configures what I need.

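(defun my-text-hook ()
  ;; Effectively turn off hard wrapping.
  (setq fill-column 99999)

  ;; The per-directory settings need a file-backed buffer;
  ;; (buffer-file-name) is nil otherwise.
  (when (buffer-file-name)
    (let ((dir (file-name-directory (buffer-file-name))))
      ;; Load the abbreviations generated by caspell --emacs.
      (setq abbrev-file-name (concat dir "abbrev_defs.el"))
      (quietly-read-abbrev-file (concat dir "abbrev.el"))
      ;; caspell receives this directory via the -p parameter.
      (setq ispell-personal-dictionary dir)))
  (setq save-abbrevs nil)
  (abbrev-mode 1)

  ;; Spell-check through caspell instead of aspell.
  (setq ispell-program-name "caspell")

  (flyspell-mode 1)
  (visual-line-mode))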
(add-hook 'text-mode-hook 'my-text-hook)
(add-hook 'markdown-mode-hook 'my-text-hook)

The key parts are the “ispell” lines for hooking up caspell. The “personal dictionary” uses the name of the text file ((buffer-file-name)), figures out the directory, and then passes it into caspell via the -p parameter.

The other bit is the “abbrev” lines, which look for abbrev.el in the same directory as the text file and load it. It seems to work and I'm pretty happy with the results so far.

Github

Like almost everything else I write, I threw it up on Github along with a few other programs I've been using. I'll document them eventually but the caspell is pretty functional as-is.