;;; -*- mode: fundamental; coding: iso-8859-1; indent-tabs-mode: t; -*-

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;; first shot at a finite-state language for preprocessing, normalization, and
;;; tokenization in LKB grammars.  requires LKB version after 1-feb-03.  note
;;; that the syntax is rigid: everything starting in column 2 (i.e. right after
;;; the rule type marker) is used as the match pattern until the first `\t'
;;; (tabulator sign); one or more tabulator signs are considered the separator
;;; between the matching pattern and the replacement, but other whitespace will
;;; be considered part of the patterns.  empty lines or lines with a semicolon
;;; in column 1 (i.e. in place of the rule type marker, this is not Lisp) will
;;; be ignored.
;;;
;;; rules are applied in order and, in the case of substitution rules, each
;;; sees the output of the previous one.  token-level augmentation rules (the
;;; `+' type, for now) are different in that they add an alternative for the
;;; token but the original form remains in the input buffer for subsequent rule
;;; applications (i.e. the alternative is _not_ visible to further rules).
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

;;;
;;; preprocessor rules versioning; auto-maintained upon CVS check-in.
;;;
@$Date$

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;; tokenization pattern: after normalization, the string will be broken up at
;;; each occurrence of this pattern; the pattern match itself is deleted.
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
:[ \t]+

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
;;; string rewrite rules: all matches, over the entire string, are replaced by
;;; the right-hand side; grouping (using `(' and `)' in the pattern) and group
;;; references (`\1' for the first group, et al.) carry over part of the match.
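;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

;;;
;;; pad the full string with leading and trailing whitespace; makes matches for
;;; word boundaries a little easier down the road.  the replacement is the
;;; group reference `\1' with a blank on either side (the trailing blank is
;;; easy to lose in an editor).
;;;
!^(.+)$	 \1 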