This program extracts tagged tokens from a tsdb profile (virtual profiles
included; item numbers must be unique). It works by tokenising the items from
the item file with whichever tokeniser option is given, extracting the leaves
with their lexical types (and optionally morphological rules) from the parse
tree, and trying to match the tokens to the leaves.

Install directions:

  autoreconf -i
  ./configure
  make

Usage: ./tagextract [options] grammar-file profile

Options:
  -h [ --help ]             This usage information.
  -t [ --tok ] arg (=none)  Tokeniser: none, repp, chasen, yy (default: none).
                            The tokeniser is applied to the item string in the
                            item file. (YY and ChaSen are unimplemented as yet.
                            Let Rebecca know if you want them.)
  -r [ --rpp ] arg          Tokeniser .rpp file.
  -c [ --call ] arg         rpp calls; multiple options are valid.
  -p [ --pos ]              Use TnT POS tags.
  -m [ --model ] arg        TnT model; defaults to the WSJ model in the LOGON
                            tree.
  -i [ --infl ]             Tags include morphological inflection rules.
  --format arg (=TNT)       Token format: TNT, CANDC, FSC (default: TNT).
  -n [ --num ]              Output item and parse number.
  -l [ --limit ] arg (=0)   Number of readings at which a context is ignored.
                            Set to nbest to negate the effect of using a model
                            during parsing.

e.g.:

  ./tagextract -t repp -c ascii -c xml -c latex -c wiki \
    -r $LOGONROOT/lingo/erg/rpp/tokenizer.rpp \
    $LOGONROOT/lingo/erg/english.tdl $LOGONROOT/lingo/erg/tsdb/gold/ws01
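
A further illustrative invocation (hypothetical, but reusing the grammar,
tokeniser file and profile from the example above), which additionally outputs
item and parse numbers (-n) and includes inflection rules in the tags (-i):

  ./tagextract -t repp -n -i -c ascii -c xml -c latex -c wiki \
    -r $LOGONROOT/lingo/erg/rpp/tokenizer.rpp \
    $LOGONROOT/lingo/erg/english.tdl $LOGONROOT/lingo/erg/tsdb/gold/ws01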
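
As a rough sketch of the default TNT token format, each output line holds one
token and its tag, tab-separated; with -i, the tags would also carry the
applied inflection rules. The tag names below are placeholders only, not real
ERG lexical types, and the exact output may differ:

  the       det_lextype
  dog       noun_lextype
  barked    verb_lextype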