Gpred is for `grammar' predicates: these have to be enumerated in the SEM-I, but are much too frequent for the DTD.

I am not sure that Realpred should be empty - possibly the lemma should be an element rather than an attribute.

Note that the LKB code does not treat pred case as significant - the only place where case is respected is string-valued arguments (CARG and ??).

POS - now lowercase and obligatory (v|n|j|r|p|q|c|x|u|a|s)

  v - verb
  n - noun
  a - adjective or adverb (i.e. supertype of j and r)
  j - adjective
  r - adverb
  s - verbal noun (used in Japanese and Korean)
  p - preposition
  q - quantifier (needs to be distinguished for scoping code)
  x - other closed class
  u - unknown

The implicit hierarchy is

  n := u
  v := u
  a := u
  j := a
  r := a
  s := n, s := v
  p := u
  q := u
  c := u
  x := u

Labels are always implicitly of the same sort (i.e., label).

Var is the old arg: the list of attributes must be determined by the Matrix/SEM-I, so the current list is still preliminary.

Note that the reason for not making vids be IDs in the XML sense is two-fold:

  a) it's too much hassle to ensure uniqueness in a document
  b) this isn't supported in RELAX NG - the argument is that this is the sort of thing that's part of the semantics, not the syntax, which I tend to agree with - there are lots of things that have to be checked outside the DTD anyway

Rargnames should be enumerated in the SEM-I, but I don't propose to do this in the DTD.

cfrom and cto are on eps and rmrs - they might also end up on the other main elements (in-groups, hcons, rargs). Having them on preds seems too restrictive/cumbersome - they should be associated with minimal information units. cfrom and cto default to -1 if unknown.

In-groups are used for predicates which have the same handle in the MRS - they are grouped in one in-group but given unique handles for the RMRS. I [AAC] am still not entirely happy with in-groups.

The surface attribute on rmrs and ep is used to give the original surface string. It is optional.

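The implicit POS hierarchy given above can be checked mechanically. The following is only a minimal sketch in Python, not part of the DTD or of the LKB code: the names POS_PARENTS, pos_ancestors and pos_subsumes are hypothetical and introduced here purely for illustration. It encodes each `x := y' line as "y is an immediate supertype of x" and treats s as having two parents (n and v).

```python
# Immediate supertypes, copied from the implicit hierarchy in the notes.
# All names in this sketch are illustrative assumptions, not an existing API.
POS_PARENTS = {
    "u": [],
    "n": ["u"],
    "v": ["u"],
    "a": ["u"],
    "j": ["a"],
    "r": ["a"],
    "s": ["n", "v"],  # verbal noun inherits from both noun and verb
    "p": ["u"],
    "q": ["u"],
    "c": ["u"],
    "x": ["u"],
}


def pos_ancestors(pos):
    """Return the set containing pos itself and all of its supertypes."""
    if pos not in POS_PARENTS:
        raise ValueError("unknown POS value: %r" % (pos,))
    seen = set()
    stack = [pos]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(POS_PARENTS[current])
    return seen


def pos_subsumes(general, specific):
    """True if general is equal to, or a supertype of, specific."""
    if general not in POS_PARENTS:
        raise ValueError("unknown POS value: %r" % (general,))
    return general in pos_ancestors(specific)
```

With this encoding, pos_subsumes("a", "j") and pos_subsumes("n", "s") hold, while pos_subsumes("j", "r") does not, since j and r are only related through their common supertype a.
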
The base attribute is used for languages where the current parsing engines cannot represent the lemma properly. For example, in Jacy the lemma is in ASCII, but the base form should often be Chinese characters. This attribute is deprecated - as the lemma attribute in realpred is a string, in theory it can handle the real base form (we can use all of Unicode). When all the tools can handle it, we can remove base. For the moment, base should only be added if it is different from both lemma and string.

ident is an attribute on rmrs elements that identifies which utterance they belong to. The HoG currently uses a wrapper around the RMRS, with identifying information there instead. Hinoki uses the ident identifier but may switch to a wrapper, in which case ident may be removed. In any case it is optional.

04/09 changed the DTD to conform with 09/02 ERG SEM-I VPM, though note the use of plus and minus for + and -
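
The rule for when to emit base (only if it differs from both lemma and string) can be sketched as follows. This is an illustration under assumptions, not code from any of the tools mentioned: the function name base_to_emit is hypothetical, and `string' is assumed here to mean the surface string.

```python
def base_to_emit(lemma, surface, base):
    """Return the value to write for the base attribute, or None to omit it.

    Hypothetical helper: base is only kept when it carries information
    that neither the lemma nor the surface string already provides.
    """
    if base is None:
        return None
    if base == lemma or base == surface:
        return None
    return base
```

A caller would add the attribute only when the result is not None, so documents from tools that already put the real base form in lemma carry no redundant base.
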