Even though it isn't quite Saturday, I finished all the issues for Author Intrusion v0.10.0, so I closed out the milestone and decided to write a celebratory post announcing it.

This still isn't even remotely stable enough to use, but I think I need a regular cadence to keep working on it instead of letting it atrophy for a few months and losing track of things when I come back. So, to keep it fresh in my mind, I'm going to try a two-week cadence where I do at least something on the project to keep it moving, updated, and working. Over time (not unlike writing a novel), this should produce something useful for other people.

v0.10.0

The v0.10.0 release is mainly about making the project easier to develop. Most of the changes aren't very sexy: allowing packages to be force-installed for debugging, adding scripts, and reducing noise.

Paragraph Splitting

One of the reasons this is complicated is that I have to break apart English (an unstructured way of communicating) into discrete components. Looking at the first three words of a paragraph requires the system to know what a paragraph is in the first place.

In v0.9.0, I wrote a quick and dirty paragraph splitter. It failed on some of my bigger projects, so I rewrote it to be faster and use less memory (a foreach loop instead of a regular expression).
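To illustrate the loop-based idea, here is a minimal sketch in Python rather than the project's C#; it is not Author Intrusion's actual splitter. It assumes paragraphs are separated by one or more blank lines, and the name split_paragraphs is hypothetical. It walks the text once, line by line, and records start and end offsets instead of building regex matches or copying each paragraph.

```python
def split_paragraphs(text: str) -> list[tuple[int, int]]:
    """Return (start, end) character offsets for each paragraph.

    Assumes (hypothetically) that paragraphs are separated by blank lines.
    Offsets let callers slice the original string only when needed.
    """
    spans = []
    start = None  # offset where the current paragraph began, if inside one
    end = 0       # offset just past the last non-blank text seen
    pos = 0       # offset of the current line within the text
    for line in text.splitlines(keepends=True):
        if line.strip():
            if start is None:
                start = pos
            end = pos + len(line.rstrip())  # exclude trailing whitespace/newline
        elif start is not None:
            spans.append((start, end))  # a blank line closes the paragraph
            start = None
        pos += len(line)
    if start is not None:
        spans.append((start, end))  # final paragraph with no trailing blank line
    return spans
```

Because it returns offsets, pulling the first three words of a paragraph only means slicing that one span.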

I still have to do the same for the tokens, and I realized that I also need a better way of handling large documents. C# doesn't like objects over 85 kB (anything 85,000 bytes or larger goes on the Large Object Heap). My largest story (a single-file document) is 43 kw (kilowords) and 238 kB. Also, C# strings are UTF-16, which means loading that entire thing into memory takes roughly twice the size on disk, close to 480 kB of RAM. That will be a bigger mess, but the limit is low enough that it needs to be dealt with sooner rather than later.
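One possible direction, sketched in Python only to show the arithmetic: estimate how many bytes a text needs as UTF-16 and cut it into pieces that each stay under the 85,000-byte threshold, preferring to break at line endings. This is not how Author Intrusion handles it (that is still the v0.11.0 problem), and the helper names and the 1,000-byte headroom for the string object's header are assumptions.

```python
# Hypothetical helpers, not Author Intrusion code.
LOH_THRESHOLD = 85_000  # .NET puts objects of 85,000 bytes or more on the Large Object Heap
HEADROOM = 1_000        # assumed allowance for the string object's header and padding


def utf16_size(text: str) -> int:
    """Bytes of character data the text needs as a UTF-16 string (header not included)."""
    return len(text.encode("utf-16-le"))


def chunk_for_utf16(text: str, limit: int = LOH_THRESHOLD - HEADROOM) -> list[str]:
    """Split text into pieces whose UTF-16 size stays below `limit` bytes.

    Each piece ends after the last newline that fits, so a line is only cut
    when no line break fits inside the chunk.
    """
    if limit <= 4:
        raise ValueError("limit must leave room for at least one character")
    chunks = []
    start = 0
    while start < len(text):
        size = 0
        end = start
        last_break = -1
        while end < len(text):
            # Characters outside the Basic Multilingual Plane need a surrogate pair.
            char_size = 4 if ord(text[end]) > 0xFFFF else 2
            if size + char_size >= limit:
                break
            size += char_size
            end += 1
            if text[end - 1] == "\n":
                last_break = end
        if end < len(text) and last_break > start:
            end = last_break  # back up so the chunk ends on a line break
        chunks.append(text[start:end])
        start = end
    return chunks
```

The trade-off is that anything analyzing the text then has to cope with paragraphs that straddle a chunk boundary.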

XSLT Functions

I added a length() function for the XSLT calls. That way, plugins like echo detection can ignore short words.

plugins:
  analysis:
  - compare: text()
    error: 5
    plugin: EchoDetection
    key: echo-1
    warning: 2
    within: 200
    select: //token[length() > 4]
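As an illustration only, here is one plausible reading of those settings in Python: tokens longer than four characters (what //token[length() > 4] selects) are compared by their exact text, as in compare: text(), and a word that repeats inside a sliding window of the last 200 selected tokens becomes a warning at 2 occurrences and an error at 5. The window unit, the meaning of the thresholds, and the function name are assumptions, not a description of how EchoDetection is actually implemented.

```python
from collections import Counter, deque


def detect_echoes(tokens, within=200, warning=2, error=5, longer_than=4):
    """Flag words repeated inside a sliding window (hypothetical semantics).

    Only tokens longer than `longer_than` characters are considered, mirroring
    the //token[length() > 4] selector. Tokens are compared by exact text.
    The window holds the last `within` selected tokens, including the current one.
    """
    window = deque()
    counts = Counter()
    results = []
    for index, token in enumerate(tokens):
        if len(token) <= longer_than:
            continue  # short words are ignored, which is what length() enables
        window.append(token)
        counts[token] += 1
        if len(window) > within:
            counts[window.popleft()] -= 1  # drop the token that slid out of range
        if counts[token] >= error:
            results.append((index, token, "error"))
        elif counts[token] >= warning:
            results.append((index, token, "warning"))
    return results
```

Without the length filter, words like "the" and "and" would flood the results, which is the noise the new function is meant to avoid.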

Logging

One of the biggest changes was reducing information overload by breaking the logging apart into separate categories. Like MPlayer, there is a lot going on, so I added a switch to set the log level for each category.

./pcli analyze --log NuGet:verbose --verbose

The --verbose flag turns on logging to the console, and --log NuGet:verbose raises the NuGet management category from its default level of warning to verbose to show the tedious details.
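A minimal sketch of how Category:level switches can become per-category thresholds, in Python with hypothetical names. It is not the pcli implementation; only the "verbose" and "warning" levels and the warning default come from this post, and the rest of the level list is assumed.

```python
# Hypothetical sketch, not Author Intrusion's code. Only "verbose" and
# "warning" come from the post; the other level names are assumed.
LEVELS = ["verbose", "debug", "information", "warning", "error", "fatal"]
DEFAULT_LEVEL = "warning"


def parse_log_switches(values):
    """Turn --log values such as 'NuGet:verbose' into a {category: level} map."""
    levels = {}
    for value in values:
        category, sep, level = value.partition(":")
        level = level.lower()
        if not sep or not category or level not in LEVELS:
            raise ValueError(f"expected Category:level, got {value!r}")
        levels[category] = level
    return levels


def should_log(levels, category, level):
    """True when `level` is at or above the threshold set for `category`."""
    threshold = levels.get(category, DEFAULT_LEVEL)
    return LEVELS.index(level) >= LEVELS.index(threshold)
```

Collecting repeated --log options (for example with argparse's action="append") and passing them to parse_log_switches gives NuGet a verbose threshold while every other category stays at warning.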

./pcli log-list

The log-list command shows the available categories. Plugins can add their own logging targets, which is why it's a project-based command, but it should still work without a project (we'll find out).
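The reason the list depends on the project is easier to see in a small sketch. Below, in Python with hypothetical names, the core registers its own category and each plugin the project loads registers more, so the complete list only exists once plugins are loaded; with no project, only the core categories show up.

```python
class LogCategoryRegistry:
    """Hypothetical registry: the core and each loaded plugin add categories."""

    def __init__(self):
        self._defaults = {}

    def register(self, category, default_level="warning"):
        if category in self._defaults:
            raise ValueError(f"category {category!r} already registered")
        self._defaults[category] = default_level

    def log_list(self):
        """Lines a log-list style command could print, sorted by name."""
        ordered = sorted(self._defaults.items(), key=lambda item: item[0].lower())
        return [f"{name} (default: {level})" for name, level in ordered]


def build_registry(plugins=()):
    """Register core categories, then let each (hypothetical) plugin add its own."""
    registry = LogCategoryRegistry()
    registry.register("NuGet")  # core category; warning is its default per the post
    for plugin in plugins:      # without a project, no plugins are loaded
        plugin(registry)
    return registry
```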

v0.11.0

The next sprint, starting next Sunday, will be focused on those memory management problems. I think it will take me a while to puzzle through them.

v0.12.0

The sprint after that is currently slated for server mode. This is going to be used for the Language Server Protocol, which will let me hook up to Atom and get real-time analysis, highlighting, and other fancy features.
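Server mode doesn't exist yet, so this is only an illustration of the wire format it would have to speak. In the Language Server Protocol's base protocol, each JSON-RPC message is preceded by a Content-Length header giving the body size in bytes. The Python sketch below frames a textDocument/publishDiagnostics notification, which is how a server pushes warnings (such as an echo) to an editor like Atom; the helper names, file URI, and message text are hypothetical.

```python
import json


def frame_lsp_message(payload: dict) -> bytes:
    """Encode a JSON-RPC message with the LSP base-protocol header.

    Content-Length counts bytes of the UTF-8 body, not characters.
    """
    body = json.dumps(payload, separators=(",", ":"), ensure_ascii=False).encode("utf-8")
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return header + body


def echo_diagnostic(uri, line, start, end, word):
    """A publishDiagnostics notification for one (hypothetical) echo warning."""
    return {
        "jsonrpc": "2.0",
        "method": "textDocument/publishDiagnostics",
        "params": {
            "uri": uri,
            "diagnostics": [{
                "range": {"start": {"line": line, "character": start},
                          "end": {"line": line, "character": end}},
                "severity": 2,  # 2 = Warning in the LSP DiagnosticSeverity enum
                "source": "EchoDetection",
                "message": f"'{word}' is repeated nearby",
            }],
        },
    }
```

One detail that suits a C# server: LSP positions count characters as UTF-16 code units by default, the same units .NET strings use.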

Development

Author Intrusion is currently being managed via its GitLab project. I'm not sure if it would be worthwhile for anyone to consider joining, but if you want to watch it, that would be the place.

If you have questions, please don't hesitate to poke me on any social network I'm on. I always love bouncing ideas around and talking about future plans. The more of that I do, the more useful I can make it for everyone, not just myself.

Metadata

Categories:

Programming

Tags:

Author Intrusion