SETTING UP SUPERTAGGING FOR DELPH-IN GRAMMARS

Preparing the grammar

1. Add STAG-related features to the token structure. (For the ERG, it was
   sufficient to add the included stag_utils/supertagger-types.tdl to the
   grammar directory and include it in the appropriate grammar files.)

2. Modify the token mapping rules as required to pass the STAG feature
   through.

3. Create the lexical type addendum files and add them to the grammar after
   the lextypes definitions:

     cat lexicon.tdl | ~/STAG/stag_utils/lexicon2supertags.pl > supertagger-letypes.tdl
     cat gle.tdl | ~/STAG/stag_utils/lexicon2supertags.pl > supertagger-gle.tdl

4. Recompile (flop) the grammar.

Parsing

0. Follow the instructions in README.tagextract to compile tagextract.

1. Extract tags for training the supertagger model:

     ~/STAG/tagextract/tagextract -t repp \
       -r $LOGONROOT/lingo/terg/rpp/tokenizer.rpp \
       -c ascii -c xml -c quotes -c wiki \
       $LOGONROOT/lingo/terg/english.tdl \
       $LOGONROOT/lingo/terg/tsdb/gold/wescience > ws.tags

2. Train a supertagger model with TnT:

     $LOGONROOT/coli/bin/linux.x86.32/tnt-para -o ws_model ws.tags

3. Create item files with FSC-formatted items that include the supertags,
   e.g.:

     zcat $LOGONROOT/lingo/terg/tsdb/gold/mrs/item.gz | cut -d@ -f1,7 \
       | ~/STAG/repp/repp -c wiki -c ascii -c xml -c quotes \
           --format FSC -t $LOGONROOT/lingo/terg/rpp/tokenizer.rpp \
       | ~/STAG/stag_utils/supertagfsc.pl ws_model > item.mrs.fsc

4. Parse with the LOGON machinery:
   * create skeleton(s) from these item files
   * set up a CPU with no preprocessor, no reader and no tagger, and pass
     -tok=fsc, instead of -yy, as an option to cheap -n (not -t)
   * parse as normal

   To parse directly with PET (cheap also needs the compiled grammar file
   as its final argument):

     cat item.mrs.fsc | cut -d@ -f7 | cheap -n -tok=fsc -packing -cm \
       -default-les=all -memlimit=1024 -timeout=60 \
       -tsdbdump=$TSDBHOME/supertagged $LOGONROOT/lingo/terg/english.grm
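The Parsing steps can be chained in a single script. What follows is a minimal sketch, not part of the STAG distribution. The following are all assumptions: the STAG, MODEL and DRY_RUN variables, the fallback values for LOGONROOT and TSDBHOME, the use of "-c quotes" in tagextract (matching the repp options in step 3), and the compiled grammar $LOGONROOT/lingo/terg/english.grm passed to cheap. The script covers only the direct PET route for step 4. By default it only prints the command lines so the paths can be checked before anything is run.

```shell
#!/usr/bin/env bash
# Hypothetical driver for the supertagging "Parsing" steps (sketch only).
# Assumptions: STAG tools under $STAG, a LOGON tree under $LOGONROOT, and a
# flop-compiled grammar at $TERG/english.grm.
# DRY_RUN=1 (default) prints the commands; DRY_RUN=0 executes them.
set -euo pipefail

STAG="${STAG:-${HOME:-.}/STAG}"
LOGONROOT="${LOGONROOT:-/opt/logon}"          # hypothetical fallback
TSDBHOME="${TSDBHOME:-${HOME:-.}/tsdb}"       # hypothetical fallback
TERG="$LOGONROOT/lingo/terg"
MODEL="${MODEL:-ws_model}"
DRY_RUN="${DRY_RUN:-1}"

# Print one command line, or run it through eval when DRY_RUN=0 so that the
# pipes and redirections in the string take effect. Paths are interpolated
# unquoted, so they must not contain spaces or shell metacharacters.
run() {
  if [ "$DRY_RUN" = 1 ]; then
    printf '%s\n' "$1"
  else
    eval "$1"
  fi
}

# Step 1: extract training tags from the WeScience gold data.
extract_tags() {
  run "$STAG/tagextract/tagextract -t repp -r $TERG/rpp/tokenizer.rpp -c ascii -c xml -c quotes -c wiki $TERG/english.tdl $TERG/tsdb/gold/wescience > ws.tags"
}

# Step 2: train the TnT model from the extracted tags.
train_model() {
  run "$LOGONROOT/coli/bin/linux.x86.32/tnt-para -o $MODEL ws.tags"
}

# Step 3: keep fields 1 and 7 of a gold item file (i-id and i-input),
# tokenize into FSC with repp, then add supertags with the trained model.
make_fsc_items() {
  local profile=$1 out=$2
  run "zcat $TERG/tsdb/gold/$profile/item.gz | cut -d@ -f1,7 | $STAG/repp/repp -c wiki -c ascii -c xml -c quotes --format FSC -t $TERG/rpp/tokenizer.rpp | $STAG/stag_utils/supertagfsc.pl $MODEL > $out"
}

# Step 4 (PET route): parse the FSC input and dump a profile.
parse_with_pet() {
  local in=$1
  run "cut -d@ -f7 < $in | cheap -n -tok=fsc -packing -cm -default-les=all -memlimit=1024 -timeout=60 -tsdbdump=$TSDBHOME/supertagged $TERG/english.grm"
}

main() {
  extract_tags
  train_model
  make_fsc_items mrs item.mrs.fsc
  parse_with_pet item.mrs.fsc
}

main
```

For example, "bash stag-pipeline.sh" (hypothetical file name) prints the four command lines, and "DRY_RUN=0 bash stag-pipeline.sh" runs them in order. The design uses eval so that the printed line is exactly what gets executed. The price is that paths containing spaces are not supported.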