Daniel LaFavers

Software Engineer

daniel@arrenkyle.com
(248) 231-4253

A Career in Digital Publishing and Large Text Processing


I have had the good fortune to spend my programming career in the publishing business, which has always been an inspiration for my own writing.

After receiving my BS in Computer Science from Purdue University at Indianapolis, I started in 1987 at Executive Telecom Systems, a digital publishing division of the Bureau of National Affairs that published several legal and human resources books.


On VAX/VMS systems, we wrote Fortran programs to create inverted index files on the digital ISAM file system and wrote search interfaces for delivery to VT100 terminals.


This is where I first learned to write search engines and to manage very large text files.


In 1993 I started work at ProQuest, which was called University Microfilms at the time. Here I had the great pleasure of working with one of the pioneers of the LexisNexis system, who taught me most of what I know about formal language processing. We wrote a tokenizer with NFA-to-DFA conversion and minimization. We also wrote a form of LALR parser, modified to recognize patterns in natural language, which led to our patent.
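
As a rough illustration of the NFA-to-DFA step mentioned above, here is a minimal Python sketch of the textbook subset construction. It is not the ProQuest tokenizer: the function names, the dictionary-based transition tables, and the tiny example automaton (which accepts the language a*b) are all assumptions made for illustration, and the minimization pass is left out.

```python
from collections import deque


def epsilon_closure(states, eps):
    """Return every NFA state reachable from `states` via epsilon moves only."""
    stack, seen = list(states), set(states)
    while stack:
        s = stack.pop()
        for t in eps.get(s, ()):
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return frozenset(seen)


def nfa_to_dfa(start, accepting, delta, eps):
    """Subset construction.

    delta: {(state, symbol): set of states}, eps: {state: set of states}.
    Returns (dfa_start, dfa_accepting, dfa_delta); DFA states are frozensets
    of NFA states.
    """
    alphabet = {sym for (_, sym) in delta}
    dfa_start = epsilon_closure({start}, eps)
    dfa_delta, dfa_accepting = {}, set()
    queue, seen = deque([dfa_start]), {dfa_start}
    while queue:
        current = queue.popleft()
        if current & accepting:
            dfa_accepting.add(current)
        for sym in alphabet:
            moved = set()
            for s in current:
                moved |= delta.get((s, sym), set())
            if not moved:
                continue  # no transition on this symbol: implicit dead state
            target = epsilon_closure(moved, eps)
            dfa_delta[(current, sym)] = target
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return dfa_start, dfa_accepting, dfa_delta


def accepts(dfa, text):
    """Run the DFA over `text`; a missing transition means rejection."""
    state, accepting, delta = dfa
    for ch in text:
        state = delta.get((state, ch))
        if state is None:
            return False
    return state in accepting


# Hypothetical example NFA for a*b: 0 --eps--> 1, 1 --a--> 1, 1 --b--> 2.
example_dfa = nfa_to_dfa(
    start=0,
    accepting={2},
    delta={(1, "a"): {1}, (1, "b"): {2}},
    eps={0: {1}},
)
```

A tokenizer would typically build one such DFA from the combined patterns for all token types and then minimize it, so that scanning costs one table lookup per input character.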


Here, while I was in the R&D group, I wrote another full-text search engine, along with systems to manage a massively scalable distributed search network.

In 2003, we moved to Texas to help take care of my wife's parents.


Surprisingly, there wasn't much information technology in Tyler, TX, but I was able to help out at a Computerland store that supplied computers and network support to the school districts in the area. I unpacked and set up several Dell computer systems, and even passed my first Microsoft Certification class. I also wrote a Visual Basic system for one of the store's clients, a physical therapy business, but soon it was time to return to the publishing world.


In 2005 we moved back to Michigan, and I led a team building a content storage, search, and delivery system for Cengage Learning. At Cengage, I created another new search engine for large XML data. My innovation here was to index element ranges, and to provide custom operators that relate text and other elements within any arbitrary element by name. This search engine can also load several index types, each with a custom storage profile that lets us optimize the data structures for that type of index. The most frequently used index files, for keywords, title, and author, can therefore be stored as memory-resident or memory-mapped files, while in the same search the full-text part of the query uses the larger XML element index.
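
To make the element-range idea concrete, here is a minimal Python sketch under stated assumptions. It is not the Cengage engine: the function names `build_indexes` and `within`, the in-memory dictionaries, and the sample document are hypothetical. Each element is recorded as a (start, end) range of token positions, and a "within" operator checks whether a term position falls inside a range of the named element.

```python
import re
import xml.etree.ElementTree as ET
from bisect import bisect_left


def build_indexes(xml_text):
    """Return (term_positions, element_ranges) for one small XML document."""
    term_positions = {}   # term -> ascending list of token positions
    element_ranges = {}   # element name -> list of (start, end), end exclusive
    pos = 0               # running token position across the whole document

    def add_text(text):
        nonlocal pos
        for tok in re.findall(r"\w+", (text or "").lower()):
            term_positions.setdefault(tok, []).append(pos)
            pos += 1

    def walk(elem):
        start = pos
        add_text(elem.text)
        for child in elem:
            walk(child)
            add_text(child.tail)
        element_ranges.setdefault(elem.tag, []).append((start, pos))

    walk(ET.fromstring(xml_text))
    return term_positions, element_ranges


def within(term, element, term_positions, element_ranges):
    """Ranges of `element` that contain at least one occurrence of `term`."""
    positions = term_positions.get(term.lower(), [])
    hits = []
    for start, end in element_ranges.get(element, []):
        # First term position at or after the element's start, if any.
        i = bisect_left(positions, start)
        if i < len(positions) and positions[i] < end:
            hits.append((start, end))
    return hits


# Hypothetical sample document.
doc = ("<article><title>Search engines</title>"
       "<body>Large text search</body></article>")
terms, ranges = build_indexes(doc)
title_hits = within("search", "title", terms, ranges)
body_hits = within("text", "body", terms, ranges)
```

Because the element name is just a key into the range index, the same operator works for any element in the data without a schema-specific index, which is the property the paragraph above describes.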


Our team also created a massive content delivery system to house millions of antique newspaper images. Over 60 terabytes of data were stored and delivered through our custom Apache module.


It was also here that we had to tackle the processing of extremely large XML files. We wrote a custom, compressed DOM handler for this, and what started as a simple tree-processing scripting language grew into a useful language that we ran inside our Apache module, not unlike the way PHP or mod_perl work.

This was around the time that I finished my novel. Amazon had just opened digital publishing to the world, so I read the ePub standard and wrote some code to convert my text files into a Kindle book.


In 2013, I had the opportunity to take this publishing knowledge and apply it to the automotive industry at Tweddle Group, which produces automotive user and service manuals.


Instead of writing another new language, we selected PHP as the primary processing glue, knowing that we could write C++ extensions as needed, just as we had with our own language. The centerpiece of the new system was a service container architecture that wrapped basic PHP classes in a common rights and permissions framework, along with a text-based API definition language that we developed around the time RAML was getting started.

This system is the backbone of the online services that deliver owner information to head units, mobile devices, and the web.