Over the last few weeks, I made some minor improvements to Author Intrusion. Since I need to actually use it to write stories and novels, I figured I'd get to a stopping point and update the code.
Part of Speech
One of the biggest features I needed was part-of-speech tagging, which now lives in the node-author-intrusion-pos-tagger module. It takes the output from the splitter and annotates each token with its part of speech, such as "present tense verb" or "adverb". This lets me identify overuse of adverbs within a short distance (much like before).
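To make the adverb check concrete, here is a minimal sketch of the idea. It is not the module's real code: the `TaggedToken` shape, the `findCrowdedAdverbs` name, and the assumption that adverbs carry Penn Treebank-style tags beginning with "RB" are all made up for illustration.

```typescript
// Hypothetical token shape; the real tagger output may look different.
interface TaggedToken {
  text: string;
  pos: string; // assumed Penn Treebank-style tag, e.g. "RB" for an adverb
}

// Returns the index of every adverb that appears within `window` tokens
// of the previous adverb.
function findCrowdedAdverbs(tokens: TaggedToken[], window: number): number[] {
  const flagged: number[] = [];
  let lastAdverb = -1;

  tokens.forEach((token, index) => {
    // RB, RBR, and RBS are the adverb tags in this assumed tag set.
    if (!token.pos.startsWith("RB")) {
      return;
    }
    if (lastAdverb >= 0 && index - lastAdverb <= window) {
      flagged.push(index);
    }
    lastAdverb = index;
  });

  return flagged;
}
```

With a window of three tokens, "She quickly and quietly left" would flag "quietly" because it sits two tokens after "quickly".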
I also significantly reworked the echo plugin to handle the POS tagging, checking for echo words against the stem (which can be used to treat "spit" and "spitting" as the same word).
This was a breaking change, sadly, but hopefully not too many people are using it. I added some unit tests to this plugin (and a number of other plugins) to help explore the different functionality.
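As a rough illustration of checking echoes against the stem rather than the raw word, consider the sketch below. It is not the plugin's implementation: the `StemmedToken` and `Echo` shapes and the `findStemEchoes` name are hypothetical, and it assumes an earlier step has already attached a stem to each token.

```typescript
// Hypothetical token shape; assumes a stem was attached by an earlier step.
interface StemmedToken {
  text: string;
  stem: string;
}

// One detected echo: the shared stem and the two token positions involved.
interface Echo {
  stem: string;
  first: number;
  second: number;
}

// Reports each token whose stem was last seen within `range` tokens.
function findStemEchoes(tokens: StemmedToken[], range: number): Echo[] {
  const echoes: Echo[] = [];
  const lastSeen = new Map<string, number>();

  tokens.forEach((token, index) => {
    const previous = lastSeen.get(token.stem);
    if (previous !== undefined && index - previous <= range) {
      echoes.push({ stem: token.stem, first: previous, second: index });
    }
    // Always remember the most recent position of this stem.
    lastSeen.set(token.stem, index);
  });

  return echoes;
}
```

Because the comparison uses `stem`, "spit" and "spitting" count as the same word as long as both tokens carry the stem "spit".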
I've updated the documentation and the script to check everything out.
I also created a category on my forum, if anyone wants to talk about it. It also includes a sub-category for recipes for those who want to talk about how to do some of the analysis.
The next big step is to get the echo plugin to handle n-grams. Right now it works on a token-by-token basis, but I also want it to identify runs of three or four words and treat them as higher-priority duplicates, for example to see whether paragraphs start the same way or sentences share the same beginning.
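Since this part is not written yet, the following is only a sketch of one way repeated word runs could be found, not a description of how the echo plugin will do it. The `findRepeatedNgrams` name and the plain array of words as input are assumptions for illustration.

```typescript
// Collects every run of `n` consecutive words and returns the runs that
// occur more than once, mapped to the positions where each one starts.
function findRepeatedNgrams(words: string[], n: number): Map<string, number[]> {
  const positions = new Map<string, number[]>();

  for (let i = 0; i + n <= words.length; i++) {
    // Lowercase so "She opened" and "she opened" count as the same run.
    const key = words.slice(i, i + n).join(" ").toLowerCase();
    const starts = positions.get(key) || [];
    starts.push(i);
    positions.set(key, starts);
  }

  // Keep only the runs that appear at least twice.
  const repeated = new Map<string, number[]>();
  positions.forEach((starts, key) => {
    if (starts.length > 1) {
      repeated.set(key, starts);
    }
  });

  return repeated;
}
```

Running it with `n` set to 3 over "She opened the door . She opened the window ." reports "she opened the" starting at positions 0 and 5, which is the kind of repeated sentence opening described above.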