SETTING UP SUPERTAGGING FOR DELPH-IN GRAMMARS

Preparing the grammar

1. Add STAG-related features to the token structure. (For the ERG, adding
   the included stag_utils/supertagger-types.tdl to the grammar directory
   and including it in the appropriate grammar files was sufficient.)

2. Modify the tokenmapping rules as required to pass the STAG feature
   through.

3. Modify POS-related tokenmapping rules to separate POS tags and
   supertags.

3a. Optionally, modify POS-related tokenmapping rules to separate off the
    lexical rules and put them in their own +SLR feature.

4. Create the lexical type addendum files and add them to the grammar
   after the lextypes definitions:

     cat lexicon.tdl | stag_utils/lexicon2supertags.pl > supertagger-letypes.tdl

   Then either annotate the gle types:

     cat gle.tdl | stag_utils/lexicon2supertags.pl > supertagger-gle.tdl

   or else create a mapping file to be used by tagextract:

   - cat lextypes.tdl | stag_utils/getmapping.pl > genle.map
   - fix the lines marked FIXME so that the format is:
       old_letype new_letype (optional lex rule)
     At present, only an le type, or an le type with a single lexical
     rule, can be the result of the mapping.

4a. Optionally, add restrictions to the lexical rule definitions:

     cat inflr.tdl | stag_utils/add_slrs.pl > inflr-slr.tdl
     cat lexrinst.tdl | stag_utils/add_slrs.pl > lexrinst-slr.tdl

    (If you create new files rather than overwriting, as above, make sure
    you edit the grammar file to load the new files instead of the
    originals.)

5. Recompile (flop) the grammar.

Parsing

0. Follow the instructions in README.tagextract to compile tagextract.

1. Extract tags for training the supertagger model:

     ~/STAG/tagextract/tagextract $LOGONROOT/lingo/terg/english.tdl \
       $LOGONROOT/lingo/terg/tsdb/gold/wescience > ws.tags

   Possibly useful options:
     -i  to get tags with lexical rules attached, rather than just the
         lex type
     -m  to have (gen) lex types mapped according to map_file

2. Train a supertagger.

   TNT:
     $LOGONROOT/coli/bin/linux.x86.32/tnt-para -o ws_model ws.tags

3. Parse

   - using PET internal tagging:
     * svn co https://pet.opendfki.de/repos/pet/branches/tagger
     * compile as normal
     * add stag_utils/mytagger.set to your grammar settings directory,
       making sure taggers is set to "tnt stnt"
     * set the LOGONCHEAP variable to the just-compiled cheap
     * set up a CPU with options:
         "-tsdb" "-packing" "-repp" "-tagger=mytagger" "-cm"
         "-default-les=all" "-memlimit=1024" "-timeout=60"
     * run $LOGONROOT/parse as normal, using this CPU

   - tagging externally:
     * create YY input where the supertags are included in the POS tag
       field, differentiating tags by enclosing them in tnt[] or stnt[]
       (or whatever else your tokenmapping rules expect)
     * parse the YY input with options:
         -packing -yy -cm -default-les=all -memlimit=1024 -timeout=60
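
For external tagging, a token line might be assembled along the following
lines. This is only a sketch: yy_token is a hypothetical helper (not part of
stag_utils), the token layout follows the usual YY tuple of id, start, end,
character span, path, form, ipos, lrule and tag/probability pairs, and
placing the tnt[] and stnt[] wrappers inside the quoted tag strings is an
assumption that has to match your own tokenmapping rules. The probabilities
of 1.0 and the example lex type are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: print one YY token whose POS field carries both a
# POS tag (wrapped in tnt[]) and a supertag (wrapped in stnt[]).
# Assumptions: wrappers sit inside the quoted tag string; both tags get a
# placeholder probability of 1.0; the form contains no double quotes.
yy_token() {
    # args: id start end cfrom cto form pos_tag supertag
    printf '(%s, %s, %s, <%s:%s>, 1, "%s", 0, "null", "tnt[%s]" 1.0 "stnt[%s]" 1.0)\n' \
        "$1" "$2" "$3" "$4" "$5" "$6" "$7" "$8"
}

# Example: first token of a sentence, characters 0 to 3.
yy_token 1 0 1 0 3 The DT d_-_the_le
```

One such tuple is needed per token, and a complete YY input for one
sentence places the tuples for all of its tokens together.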
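
The mapping file from step 4 of "Preparing the grammar" is edited by hand,
so it can be worth checking before running tagextract with -m. The sketch
below is a hypothetical checker (check_map is not part of stag_utils). It
assumes the map is whitespace-separated, with two fields (old le type, new
le type) or three (plus a single lexical rule), and it reports any line that
still contains FIXME or has a different number of fields.

```shell
#!/bin/sh
# Hypothetical checker for a getmapping.pl-style map file.
# Assumed format: old_letype new_letype [lex_rule], whitespace-separated.
check_map() {
    # Prints each offending line with its line number.
    # Exit status is non-zero if any offending line was found.
    awk '
        /FIXME/          { printf "%d: unresolved FIXME: %s\n", NR, $0; bad = 1; next }
        NF == 0          { next }    # ignore blank lines
        NF < 2 || NF > 3 { printf "%d: expected 2 or 3 fields: %s\n", NR, $0; bad = 1 }
        END              { exit (bad ? 1 : 0) }
    ' "$1"
}

# Usage (assuming genle.map exists in the current directory):
#   check_map genle.map && echo "genle.map has no remaining FIXME lines"
```

If your map uses a different separator or literally parenthesises the
lexical rule, the field-count test needs adjusting accordingly.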