Lab 4 Writeup

The facts: In Armenian, the sentence "I can eat glass" is:

  es     apaki         krnam   utel
  I-NOM  glass-NOM/ACC can-1SG eat-INF

"es" is the first person singular nominative pronoun.  "apaki" is the
nominative/accusative form of the noun "glass".  "krnam" is an
auxiliary verb meaning "to be able" that gets the inflection, and
"utel" is the infinitive "to eat".

The sentence "It doesn't hurt me" is:

  an     intsi  chhi       vnaser
  it-NOM me-DAT not-is-3SG hurt-NEGPART

"an" is the third person singular nominative pronoun.  "intsi" is the
first person singular dative pronoun, because the verb "vnasel" takes
a dative object (which is nice, because it lets us make sure that case
agreement is being passed up correctly).  "chhi" is the third person
singular of the negative form of the copula, and it takes "vnaser", a
special negative participle form (which is very close to the
infinitive: -l --> -r), as an argument.

Before I started work on the lab, I had to replumb some morphology
that I'd faked up in the previous labs.  As you may recall, most
Armenian verbs in the present and imperfect tenses take the prefix
particle "ke-/k-".  I had previously just put this particle directly in
the lexicon.  There are a few exceptional verbs that don't take
"ke-/k-", including the copula and "to be able" above, and I could
have just entered those in the lexicon without the particle.  However,
that wasn't going to be sufficient: I needed the infinitive and
negative participle forms of the verbs, and neither of those forms
takes "ke-/k-".

To address this, I first replaced all the verb lexical entries, which
had been in the "-l" infinitive form, with a bare stem.  Then I wrote
lexical rules to produce the infinitive and negative participle forms,
adding a feature FORM to VERB as described in the lab.  At first, the only
values of FORM were "inf" and "neg", but this turned out to be
insufficient, as we'll see below.
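
As a rough illustration of the stem-plus-rule setup (a Python sketch,
not the actual TDL lexical rules; the function names are made up for
this example, and the stems are simply the cited forms with the final
"-l" or "-r" removed):

```python
# Hypothetical sketch: build the two non-finite forms from a bare stem.
# Assumption: infinitive = stem + "l", negative participle = stem + "r",
# following the "-l --> -r" correspondence noted earlier.

def infinitive(stem: str) -> str:
    """String side of the infinitive rule: append -l to the stem."""
    return stem + "l"

def negative_participle(stem: str) -> str:
    """String side of the negative participle rule: append -r."""
    return stem + "r"

for stem in ("ute", "vnase"):
    print(stem, "->", infinitive(stem), "/", negative_participle(stem))
```

The point of storing the stem is that neither form has to be undone
before the other (or a finite form) can be built.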

Getting all the forms of inflection to play together was tricky.  I
went through many iterations before I settled on the final system.  I
overrode the definition of SIGN to add two new boolean
features: KE-MARKED (which records whether a verb has had the "ke-/k-"
prefix added yet) and PN-MARKED (which records whether a verb has been
inflected for person and number).  I did this so that I could force
the lexical rules to apply in precisely one order.  For the infinitive
and negative participle forms, only the lexeme-to-word rules that
build those forms (inf_verb-lex-rule and neg_verb-lex-rule) apply,
leaving the verb marked [FORM inf] or [FORM neg].  For finite verbs,
first the lexeme-to-lexeme ke-marking rule (ke_verb-lex-rule) applies,
marking the sign [KE-MARKED +]; then (and only then) the
lexeme-to-word person-and-number rules apply, marking the verb
[PN-MARKED +].  I also enhanced the various phrase structure rules so
that they require [KE-MARKED +, PN-MARKED +] as well as [INFLECTED +],
because otherwise the infinitive and negative participle forms could
be used as main verbs.
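
The intended rule ordering can be modelled as a small state machine.
This is only a Python sketch of the idea, not the grammar itself: the
class and function names are hypothetical, the "ke-/k-" allomorphy is
ignored, and the stem "khnana" and the 1SG suffix "-m" are assumptions
read off the forms "khnanal" and "krnam".

```python
from dataclasses import dataclass, replace
from typing import Optional

@dataclass(frozen=True)
class Verb:
    form: str                      # surface string built so far
    ke_marked: bool = False        # KE-MARKED
    pn_marked: bool = False        # PN-MARKED
    inflected: bool = False        # INFLECTED: set by lexeme-to-word rules
    vform: Optional[str] = None    # FORM ("inf" or "neg"), None if unset

def ke_rule(v: Verb) -> Optional[Verb]:
    """Lexeme-to-lexeme: add the prefix once, before any other rule."""
    if v.inflected or v.ke_marked:
        return None
    return replace(v, form="ke-" + v.form, ke_marked=True)

def pn_rule(v: Verb, suffix: str) -> Optional[Verb]:
    """Lexeme-to-word: person/number, only on a ke-marked lexeme."""
    if v.inflected or not v.ke_marked:
        return None
    return replace(v, form=v.form + suffix, pn_marked=True, inflected=True)

def inf_rule(v: Verb) -> Optional[Verb]:
    """Lexeme-to-word: infinitive, only on a bare (unprefixed) lexeme."""
    if v.inflected or v.ke_marked:
        return None
    return replace(v, form=v.form + "l", inflected=True, vform="inf")

def can_be_main_verb(v: Verb) -> bool:
    """The check the phrase structure rules impose on their head verb."""
    return v.inflected and v.ke_marked and v.pn_marked

stem = Verb("khnana")                      # stem assumed from "khnanal"
finite = pn_rule(ke_rule(stem), "m")       # 1SG suffix assumed from "krnam"
print(finite.form, can_be_main_verb(finite))
print(pn_rule(stem, "m"))                  # None: ke-marking must come first
print(can_be_main_verb(inf_rule(stem)))    # False: not ke- or pn-marked
```

Because every rule checks the flags set by the others, each stem has
exactly one derivation path per form, which is what the two extra
features are for.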

That brought me back to where I was at the end of the last lab
(although I had to tweak all the test sentences in test.all to replace
"ke VERB" with "ke-VERB", since it was now an affix instead of a
separate word).

[Aside: the work described up to here was actually interleaved with
the rest of the lab described below, but I didn't keep track of all
the twisty little passages, and separating them makes for a more
straightforward writeup.]

To implement the potential form, I created potential-aux-verb-lex,
which takes an infinitive verb as its first complement, and constrains
its subject and other complements to be the same as the subject and
complements of the verbal complement.  It is also [KE-MARKED +], since
the verb "krnal" does not take the "ke-" prefix.  While NP complements
come before the V in the unmarked sentence order, V complements come
*after* the verb.  To deal with this, I created a new phrase structure
rule, aux-head-comp-phrase, that is head-initial instead of
head-final.  I also constrained the old head-comp-phrase rule to only
work with [HEAD noun] phrases, and the new aux rule to only work with
[HEAD verb] phrases, to keep the grammar from accepting many odd word
orders.  The one that should work is SOAV (subject, object, aux, verb),
but without those constraints the grammar also accepted (and generated)
SAVO, SOVA, and several others.
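
To see how the two head-complement rules yield SOAV, here is a small
Python sketch.  It models only linear order, keyed on the HEAD value of
the complement; the function name is hypothetical, and attaching the
subject on the left is assumed here, not modelled as a rule.

```python
from typing import List

def combine(head: List[str], comp: List[str], comp_head: str) -> List[str]:
    """Order a head and one complement by the complement's HEAD value."""
    if comp_head == "noun":      # head-comp-phrase: head-final
        return comp + head
    if comp_head == "verb":      # aux-head-comp-phrase: head-initial
        return head + comp
    raise ValueError(f"no head-complement rule for HEAD {comp_head}")

aux_v = combine(["krnam"], ["utel"], "verb")     # aux + infinitive
vp = combine(aux_v, ["apaki"], "noun")           # object + [aux infinitive]
sentence = ["es"] + vp                           # subject first (assumed)
print(" ".join(sentence))                        # es apaki krnam utel
```

Since each rule only accepts one kind of complement, there is no way
to place the infinitive before the auxiliary or the object after it.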

The semantics of the potential fell out nicely from the example
provided in the lab.  Although the handles are in a different order,
they encode an equivalent semantic structure.

To implement the negative form, I added a new lexical item for the
negative copula (which will have to be broken down into the positive
copula and a negative inflection if we need the positive copula
later).  This has an irregular form in the third-person singular --
see irregs.tab.  This copula also behaves as an auxiliary verb, so I
was able to make another new verb lex-item called
negative-aux-verb-lex based on the potential verb one above.  The same
new phrase structure rule worked for this negative auxiliary.

However, the semantics of this new lex-item turned out wrong.  Rather
than being some form of adverb, it was a verb, so its PRED became
associated with an event instead of the handle of the hurt relation.
To fix this, I stopped deriving negative-aux-verb-lex from
basic-verb-lex, and instead derive it from basic-scopal-adjective-lex,
adding the additional CONT.HCONS constraint to make a qeq associating
the negative relation with the hurt relation.  It still has [HEAD
verb], though, so that it will be inflected properly.  Deriving a verb
from an adverb but still subjecting it to verbal inflection in this
way is either a crime against nature or an elegant solution.

At this point, I could parse all the correct sentences, but I had a
serious overgeneration problem.  If I parsed a sentence with an
auxiliary verb and then generated, I also got three spurious
versions of the sentence: with a finite verb as the verbal complement,
and with a finite or non-finite form with "ke-" added.  This was
because pn-marking and ke-marking did not add anything to the SYNSEM
of the verb, so ke-marked verbs were compatible with all FORM values,
and so would unify with lexical items that wanted either [FORM neg] or
[FORM inf].  The solution to this was to add one more value of FORM,
"fin" for finite, which is added by the ke-marking rule.

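The underlying problem is that an unset feature unifies with anything.
A minimal Python sketch of that behaviour for atomic values follows;
None stands for an underspecified FORM, and the helper names are
hypothetical and not part of the LKB.

```python
from typing import Optional

class UnificationFailure(Exception):
    """Raised when two specified atomic values clash."""

def unify(a: Optional[str], b: Optional[str]) -> Optional[str]:
    """Unify two atomic values; None is compatible with everything."""
    if a is None:
        return b
    if b is None:
        return a
    if a == b:
        return a
    raise UnificationFailure(f"{a} does not unify with {b}")

def compatible(a: Optional[str], b: Optional[str]) -> bool:
    """True if the two values unify."""
    try:
        unify(a, b)
        return True
    except UnificationFailure:
        return False

# Before the fix: a ke-marked verb leaves FORM unset, so an auxiliary
# asking for [FORM inf] or [FORM neg] wrongly accepts it.
print(compatible(None, "inf"), compatible(None, "neg"))

# After the fix: the ke-marking rule sets FORM to "fin", which clashes.
print(compatible("fin", "inf"), compatible("fin", "neg"))
```

Giving finite verbs a FORM value of their own turns the accidental
match into a unification failure.
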
I've made a new test.items that includes correct test sentences and
incorrect variations of them, including:

  mard e   krnay khnanal
  man  the can   to-sleep
  "the man can sleep"

  mard e   kov e   krnay utel
  man  the cow the can   to-eat
  "the man can eat the cow"

  es krnam phrrshtal
  I  can   to-sneeze
  "I can sneeze"

  es kov e   krnam utel
  I  cow the can   to-eat
  "I can eat the cow"

and, of course,

  es apaki krnam utel
  "I can eat glass"

  an intsi chhi vnaser
  "It doesn't hurt me"

These new sentences are included in the revised test.all as well.  All
the right ones parse, all the wrong ones don't parse, and the
glass/hurt sentences don't overgenerate.