;;; -*- Mode: tdl; Coding: utf-8; -*-

;;;
;;; Copyright (c) 2009 -- 2011 Stephan Oepen (oe@ifi.uio.no);
;;; see `LICENSE' for conditions.
;;;

;;
;; now, with +CLASS information available, optionally make any token a proper
;; NE that is (a) capitalized and not initial, (b) spelled in mixed case
;; (|LinGO|), or (c) initial all-caps (a sub-set of capitalized).  note that
;; we want these rules to also fire on single-character tokens, as e.g. in
;; |the A Team| or |the I and J columns| (dan may tame the |I| proper name by
;; adding a non-nominative native lexical entry for it).
;;
;; _fix_me_
;; the ERG lexicon includes a few entries (e.g. titles like |Mr.| and |Jr.|)
;; with capitalized orthography.  currently capitalized NEs are about the only
;; class of generics that can survive alongside a native entry (in the lexical
;; filtering phase), hence it might make sense to prune unwanted tokens here,
;; even though that means knowledge about the ERG lexicon is applied at token
;; mapping time already?                                        (23-jan-09; oe)
;;
;; come to think of it, i suspect the |I| special case mentioned above would
;; fall into this class too?                                     (6-aug-11; oe)
;;
capitalized_name_tmr := add_ne_tmt &
[ +CONTEXT < [ +CLASS alphanumeric & [ +INITIAL -, +CASE capitalized ] ] >,
  +OUTPUT < [ +CLASS proper_ne ] > ].

mixed_name_tmr := add_ne_tmt &
[ +CONTEXT < [ +CLASS alphanumeric & [ +CASE mixed ] ] >,
  +OUTPUT < [ +CLASS proper_ne ] > ].

upper_name_tmr := add_ne_tmt &
[ +CONTEXT < [ +CLASS alphanumeric
               & [ +INITIAL +, +CASE capitalized+upper ] ] >,
  +OUTPUT < [ +CLASS proper_ne ] > ].