SETTING UP SUPERTAGGING FOR DELPH-IN GRAMMARS

Preparing the grammar

1. Add STAG-related features to the token structure. (For the ERG, it was
	sufficient to add the included stag_utils/supertagger-types.tdl to the
	grammar directory and include it in the appropriate grammar files.)

2. Modify the tokenmapping rules as required to pass the STAG feature through.

3. Modify POS-related tokenmapping rules to separate POS tags and supertags.

3a. Optionally, modify POS-related tokenmapping rules to separate off the 
	lexical rules and put them in their own +SLR feature.

4. Create the lexical type addendum files and add them to the grammar after the
	lextypes definitions:

	cat lexicon.tdl | stag_utils/lexicon2supertags.pl > supertagger-letypes.tdl

	Optionally, annotate the gle types as well:
		cat gle.tdl | stag_utils/lexicon2supertags.pl > supertagger-gle.tdl

	Alternatively, create a mapping file to be used by tagextract:
		- cat lextypes.tdl | stag_utils/getmapping.pl > genle.map
		- fix the lines marked FIXME so that each line has the format:
		 old_letype	new_letype (optional lex rule)

		At present, the result of a mapping can only be an le type, or an le
		type with a single lexical rule.
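
Before handing the mapping file to tagextract, it may be worth checking that no FIXME lines remain and that every line has two or three fields. The following is only a sketch: check_map is a hypothetical helper, not part of stag_utils, and it assumes the fields are separated by whitespace.

```shell
# check_map MAP_FILE
# Hypothetical helper (not part of stag_utils). Assumes whitespace-separated
# fields: old_letype new_letype [lex_rule]
check_map() {
    map_file="$1"
    # Any FIXME left over from getmapping.pl still needs manual editing.
    if grep -n 'FIXME' "$map_file"; then
        echo "unresolved FIXME lines in $map_file" >&2
        return 1
    fi
    # Every non-empty line must have two or three fields.
    awk 'NF > 0 && (NF < 2 || NF > 3) {
             printf "line %d: expected 2 or 3 fields, got %d\n", NR, NF
             bad = 1
         }
         END { exit (bad ? 1 : 0) }' "$map_file"
}
```

Usage:
	check_map genle.map && echo "genle.map looks well-formed"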

4a. Optionally, add restrictions to the lexical rule definitions:

	cat inflr.tdl | stag_utils/add_slrs.pl > inflr-slr.tdl
	cat lexrinst.tdl | stag_utils/add_slrs.pl > lexrinst-slr.tdl

	(If you create new files rather than overwriting the originals, as above,
	make sure you edit the grammar file to load the new files instead.)

5. Recompile (flop) the grammar.


Parsing

0. Follow the instructions in README.tagextract to compile tagextract

1. Extract tags for training the supertagger model

	~/STAG/tagextract/tagextract $LOGONROOT/lingo/terg/english.tdl \
	$LOGONROOT/lingo/terg/tsdb/gold/wescience > ws.tags

	possibly useful options:
		-i to get tags with lexical rules attached, rather than just the lex type
		-m <map_file> to have (gen) lex types mapped according to map_file
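
A quick look at the extracted tags before training can catch an empty or malformed file. This is a minimal sketch, assuming ws.tags is in the one-token-per-line format that TnT trains on (token, whitespace, tag, with lines starting with %% treated as comments); tag_summary is a hypothetical helper, not part of tagextract.

```shell
# tag_summary TAGS_FILE
# Hypothetical helper. Assumes one "token tag" pair per line, with the
# tag in the second whitespace-separated column.
tag_summary() {
    awk '
        /^%%/ || NF < 2 { next }    # skip comments, blank and malformed lines
        {
            tokens++
            if (!($2 in seen)) { seen[$2] = 1; tags++ }
        }
        END { printf "%d tokens, %d distinct tags\n", tokens, tags }
    ' "$1"
}
```

Usage:
	tag_summary ws.tags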

2. Train a supertagger

	TNT: $LOGONROOT/coli/bin/linux.x86.32/tnt-para -o ws_model ws.tags

3. Parse

	- using PET internal tagging:
		* svn co https://pet.opendfki.de/repos/pet/branches/tagger
		* compile as normal
		* add stag_utils/mytagger.set to your grammar setting directory,
			making sure taggers is set to "tnt stnt"
		* set the LOGONCHEAP variable to the just-compiled cheap binary
		* set up a CPU with options:
			"-tsdb" "-packing" "-repp" "-tagger=mytagger" "-cm" 
			"-default-les=all" "-memlimit=1024" "-timeout=60"
		* run $LOGONROOT/parse as normal, using this CPU

	- using external tagging:
		* create YY input where the supertags are included in the POS tag field,
		  differentiating tags by enclosing them in tnt[] or stnt[] (or whatever 
		  else your tokenmapping rules expect)
		* parse YY input with options:
			-packing -yy -cm -default-les=all -memlimit=1024 -timeout=60
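
As an illustration of the external tagging input, the sketch below prints a single YY token whose POS field carries one POS tag and one supertag. It assumes the usual YY token layout (id, start, end, path, form, ipos, lrule, then tag/probability pairs) and the tnt[] and stnt[] wrappers mentioned above; yy_token is a hypothetical helper, the tag names in the usage line are placeholders, and the wrappers must match whatever your tokenmapping rules expect.

```shell
# yy_token ID START END FORM POS_TAG POS_PROB SUPERTAG STAG_PROB
# Hypothetical helper: prints one YY token with the POS tag wrapped in
# tnt[] and the supertag wrapped in stnt[]. FORM must not contain double
# quotes, since no escaping is done here.
yy_token() {
    printf '(%d, %d, %d, 1, "%s", 0, "null", "tnt[%s]" %s "stnt[%s]" %s)\n' \
        "$1" "$2" "$3" "$4" "$5" "$6" "$7" "$8"
}
```

Usage:
	yy_token 1 0 1 dog NN 1.0 n_-_c_le 0.9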