;;; -*- Coding: utf-8 -*- (lkb::do-parse-tty "犬 が 暑 そう だ") * 2007-10-10 split lexicon into n,v,a,r,s,x,... propername ? * fix numbers bug * add 1つ etc * lose messages * fix ORTH/STEM and generation/MWE? FCB 2006-11-21 We don't get the right parses for "犬 が 猫 が 行く と 思う" "猫 が 行く と 犬 が 思う" These should be roughly the same. FCB 2006-06-06 Change compounds-rule from NP-NP to N-N. FCB 2006-05-31 Use irregulars.tab ;;; 五段・ワ行ウ音便 ;; c2stem past form (regular would be 乞い) ;乞う t-lexeme-c-stem-infl-rule 乞う ;こう t-lexeme-c-stem-infl-rule こう ;問う t-lexeme-c-stem-infl-rule 問う ;とう t-lexeme-c-stem-infl-rule とう ;五段・カ行促音便 --- 行く ;行っ t-lexeme-c-stem-infl-rule 行く Update script.chasen ------------------------------------------------------------------------ FCB 2005-05-25 ------------------------------------------------------------------------ (do-parse-tty "する 木") spurious ambiguity with suru-soc ------------------------------------------------------------------------ FCB 2005-04-25 ------------------------------------------------------------------------ [116156] `きょろきょろ 。' (2) result # 1: scoping: `in 5 qeq NIL, NIL is not a label'. scoping: `no valid scope(s)'. ------------------------------------------------------------------------ FCB 2005-04-25 ------------------------------------------------------------------------ (do-parse-tty "綺麗 そう だ") Spurious ambiguity -- unifies with na-end-rule and head-specifier ------------------------------------------------------------------------ FCB 2005-04-25 ------------------------------------------------------------------------ (do-parse-tty "「 検察官 」") scope problems with defintion quotation ------------------------------------------------------------------------ FRANCIS ------------------------------------------------------------------------ ERROR with TEKI cheap "cannot instantiate base of full form"? LKB seems to parse fine! ------------------------------------------------------------------------ FRANCIS ------------------------------------------------------------------------ ERROR with parse of "東京 へ の 道" (and in general all such postposition -no sentences) ------------------------------------------------------------------------ FRANCIS ------------------------------------------------------------------------ Lexical types fail to unify Spurious Ambiguity (do-parse-tty "心 が 満たさ れ ない") TAI-OBJ-CHANGE-RULE SF fixed by adding KEY - Now only get adversative passive! ------------------------------------------------------------------------ FRANCIS ------------------------------------------------------------------------ some miscellaneous words 3 error: no lexicon entries for "どうし". 2 error: no lexicon entries for "にあたる". 1 error: no lexicon entries for "路". 1 error: no lexicon entries for "分の". 1 error: no lexicon entries for "団". 1 error: no lexicon entries for "相". 1 error: no lexicon entries for "製", "製". 1 error: no lexicon entries for "どおり". 1 error: no lexicon entries for "じゅう". 1 error: no lexicon entries for "ざれ". 1 error: no lexicon entries for "げ". 1 error: no lexicon entries for "がま". 1 error: no lexicon entries for "かぎり". ------------------------------------------------------------------------ FRANCIS ------------------------------------------------------------------------ * "tari" (as in "V tari V tari suru") can modify a non-suru verb I think this could be restricted more. e.g. (do-parse-tty "叱っ たり 注意 し たり する 時 に 発する 語") ------------------------------------------------------------------------ FRANCIS ------------------------------------------------------------------------ And more of a wishlist/ do we really need to do this kind of thing - * Among the verbs I added were slightly archaic verbs like 「生ずる」 "shou-zuru". At the moment, we can't inflect these. It would be trivial to as (ze ji) etc to infl.tdl, but all these verbs also have a more modern counterpart e.g. 「生じる」 "shou-jiru" which gives the same inflectional forms and has the same meaning. It would be nifty to have them as a special class in some way. But i would need to discuss them more with a knowledgable Japanese linguist before doing anything. I will ask Akira Ohtanai next tim we have some time. * Also added were verbs like 「関する」"kansuru" which pretty much only turn up as pseudo postpositions 「に関する」"nikansuru". Matsumoto Yo has a nice paper about these somewhere (I read it in a bookshop and don't actully own a copy or remember the title). * Some of the verbs are those where Melanie and I disagree on the subcat (intransitive vs v2). For the moment I put them in as v2, but I realize that further discussion is needed. ------------------------------------------------------------------------ FRANCIS ------------------------------------------------------------------------ The suffix file contained a fair few problematic entries. Many of them we can now parse, but the semantics (and possibly syntax) is definitely not so correct. E.G. 団 "dan" which doesn't have the group meaning げ "ge" as in sumu-ge (coldish) which doesn't get the POS conversion etc. ------------------------------------------------------------------------ FRANCIS ------------------------------------------------------------------------ I have tried to comment all the entries I thought needed more work (with FIXME) and also those which I wasn't sure about (CHECKME). I expect Takaaki "NP" Tanaka will be looking at them soon (^_^). I haven't been able to check the grammar for generation as it ran out of memory while indexing (1GB memory). Our new machine have arrived, but I haven't had time to play with them yet. Ben Waldron at Cambridge is working on moving the lexicon to PostgreSQL, which should make some things easier. ------------------------------------------------------------------------ FRANCIS ------------------------------------------------------------------------ P.S. The suffix file also included a fair few verbs which operate as the second verb in V-V constructions, I didn't add them to the official grammar, but list them here in case anyone is interested (^_^): ; ; 切れる ; ; JACY 切れる: kireru-stem (intrans-v-stem-lex) ; ; Used 1 times ; kireru_2 := ordinary-suffix-n-lex & ; [SYNSEM.LOCAL.KEYS.KEY.PRED 'kireru_2, ; ORTH ]. ; くたびれる ; Used 1 times ; (TT 2003-08-07) ; FIXJACY: 複合動詞を生成する(lexical typeが必要) ; e.g. 「待ちくたびれる」 ; kutabireru_2 := ordinary-suffix-n-lex & ; [SYNSEM.LOCAL.KEYS.KEY.PRED 'kutabireru_2, ; ORTH ]. ; ; 果てる ; hateru_2 := ordinary-suffix-n-lex & ; [SYNSEM.LOCAL.KEYS.KEY.PRED 'hateru_2, ; ORTH ]. ; ; 惚れる ; horeru_2 := ordinary-suffix-n-lex & ; [SYNSEM.LOCAL.KEYS.KEY.PRED 'horeru_2, ; ORTH ]. ; ; 詰める ; ; JACY 詰める: tsumeru-stem (v1-v-stem-lex) ; ; Used 3 times ; tsumeru_3 := ordinary-suffix-n-lex & ; [SYNSEM.LOCAL.KEYS.KEY.PRED 'tsumeru_3, ; ORTH ]. ; ; 通す ; ; JACY 通す: toosu-stem (v1-c-stem-lex) tousu-stem (v1-c-stem-lex) ; ; Used 5 times ; toosu_1 := ordinary-suffix-n-lex & ; [SYNSEM.LOCAL.KEYS.KEY.PRED 'toosu_1, ; ORTH ]. ; ; 飛ばす ; ; JACY 飛ばす: tobasu-stem (v1-c-stem-lex) ; ; Used 2 times ; tobasu_1 := ordinary-suffix-n-lex & ; [SYNSEM.LOCAL.KEYS.KEY.PRED 'tobasu_1, ; ORTH ]. ; ; 直す ; ; JACY 直す: naosu-stem (v1-c-stem-lex) ; ; Used 4 times ; naosu_1 := ordinary-suffix-n-lex & ; [SYNSEM.LOCAL.KEYS.KEY.PRED 'naosu_1, ; ORTH ]. ; 成す ; Used 5 times ; (TT 2003-08-07) ; FIXJACY: 複合動詞を生成する(lexical typeが必要) ; e.g. 「つくり成す」 ; nasu_1_2 := ordinary-suffix-n-lex & ; [SYNSEM.LOCAL.KEYS.KEY.PRED 'nasu_1_2, ; ORTH ]. ; ; 習わす ; narawasu_1 := ordinary-suffix-n-lex & ; [SYNSEM.LOCAL.KEYS.KEY.PRED 'narawasu_1, ; ORTH ]. ; ; 抜く ; ; JACY 抜く: nuku-stem (v1-c-stem-lex) ; ; Used 3 times ; nuku_1 := ordinary-suffix-n-lex & ; [SYNSEM.LOCAL.KEYS.KEY.PRED 'nuku_1, ; ORTH ]. ; ; 果たす ; ; JACY 果たす: hatasu-stem (v1-c-stem-lex) ; ; Used 2 times ; hatasu_1 := ordinary-suffix-n-lex & ; [SYNSEM.LOCAL.KEYS.KEY.PRED 'hatasu_1, ; ORTH ]. ; ; 回す ; ; JACY 回す: mawasu-stem (v4-c-stem-lex) ; ; Used 2 times ; mawasu_1 := ordinary-suffix-n-lex & ; [SYNSEM.LOCAL.KEYS.KEY.PRED 'mawasu_1, ; ORTH ]. ; ; 渡す ; ; JACY 渡す: watasu-stem (v1-c-stem-lex) ; ; Used 9 times ; watasu_1 := ordinary-suffix-n-lex & ; [SYNSEM.LOCAL.KEYS.KEY.PRED 'watasu_1, ; ORTH ]. ; ; 得る ; ; JACY 得る: eru-stem (v1-v-stem-lex) ; ; Used 25 times ; eru_2_1 := ordinary-suffix-n-lex & ; [SYNSEM.LOCAL.KEYS.KEY.PRED 'eru_2_1, ; ORTH ]. ; ; 掛ける ; ; JACY 掛ける: kakeru-kanji_telephone-stem (v1-v-stem-lex) ; ; Used 18 times ; kakeru_2 := ordinary-suffix-n-lex & ; [SYNSEM.LOCAL.KEYS.KEY.PRED 'kakeru_2, ; ORTH ]. ; 兼ねる ; Used 1 times ; (TT 2003-08-07) ; FIXJACY: 複合動詞を生成する(lexical typeが必要) ; e.g. 「見兼ねる」 ; kaneru_2 := ordinary-suffix-n-lex & ; [SYNSEM.LOCAL.KEYS.KEY.PRED 'kaneru_2, ; ORTH ]. ; 建てる ; Used 9 times ; (TT 2003-08-07) ; ORTH 「建てる」->「立てる」 ; FIXJACY: 複合動詞を生成する(lexical typeが必要) ; e.g. 「並べ立てる」「述べ立てる」 ; tateru_2 := ordinary-suffix-n-lex & ; [SYNSEM.LOCAL.KEYS.KEY.PRED 'tateru_2, ; ORTH ]. ; ; 付ける ; ; JACY 付ける: tsukeru_attach-stem (v1-v-stem-lex) ; ; Used 82 times ; tsukeru_2 := ordinary-suffix-n-lex & ; [SYNSEM.LOCAL.KEYS.KEY.PRED 'tsukeru_2, ; ORTH ]. ; ; 続ける ; ; JACY 続ける: tsuzukeru-stem (v1-v-stem-lex) tsudukeru-stem (v1-v-stem-lex) ; ; Used 12 times ; tsudukeru_2 := ordinary-suffix-n-lex & ; [SYNSEM.LOCAL.KEYS.KEY.PRED 'tsudukeru_2, ; ORTH ]. ; ; 始める ; ; JACY 始める: hajimeru-stem (v1-v-stem-lex) ; ; Used 14 times ; hajimeru_2 := ordinary-suffix-n-lex & ; [SYNSEM.LOCAL.KEYS.KEY.PRED 'hajimeru_2, ; ORTH ]. ; ; 上げる ; ; JACY 上げる: ageru-stem (v1-v-stem-lex) ; ; Used 11 times ; ageru_2 := ordinary-suffix-n-lex & ; [SYNSEM.LOCAL.KEYS.KEY.PRED 'ageru_2, ; ORTH ]. ------------------------------------------------------------------ The latest grammar still has about 1% less cover than our version of Oct 14. It appears a couple of verbs we changed haven't been merged - notably different subcats of "omou" "naru" and I think "ikiru". On the other hand, we are producing far fewer bad MRSs due to various improvements Melanie added in, and we have lost some of the more annoying spurious ambiguity (and then added most of it back in with a new "ni" particle ...). ------------------------------------------------------------------ FRANCIS ------------------------------------------------------------------------ I was unable to generate with the latest grammar because the lexicon was so big I ran out of memory indexing ... ------------------------------------------------------------------ FRANCIS ------------------------------------------------------------------------ Still to do: - get the cover up to where it was (and beyond) - fix the MRSs there are still some with "string" appearing in them ... - update the trees to the new grammar - relearn the stochastic model - redo the knowledge acquisition Specific things people are working on/looking at that I know about - Chikara - V-V, relative clauses - Akira and Miyata - paraphrasing - Ohtani maybe also "koko-yori higashi" - Atsuko - verbal-nouns - Wai-lok, Kyonghee (and me) - numeral classifiers - Taka (and me) - NP analysis - Sanae (and me) - Verb classes - Eric (and me) - Improving the knowledge acquisition ----------------------------------------------------------------------- Kuribayashi ---------------------------------------------------------------------- 「ノーモア」(学研辞書では感動詞) 仮にordinary-prefixで登録しているが、検討の必要あり。 ex.「ノーモア 広島」 「ノーモア 戦争」 「ノーモア 産廃」 「ノーモア」を一語のみで使えるかどうかも検討の必要あり。 --------------------------------------------------------------------------- Kuribayashi --------------------------------------------------------------------------- 後に「の」をとって名詞を修飾するna-adj,adv ex.「まあまあの出来」(=「まあまあな出来」) 「自然の成り行き」(=「自然な成り行き」) 「逆の見方」(=「見方が逆である」) ex.「まさかの事態」 「さしあたっての予定」 「例えばの話」 ex.「当然の報い」 「かなりの時間」 ex.「普通の人」(=「普通な人」) 「正反対の立場」(=「正反対な立場」) 「ぎりぎりの交渉」(=「ぎりぎりな交渉」) <後に「の」をとれないna-adj> ex.「急の転勤」 「有名の小説」 「オープンのスペース」 ------------------------------------------------------------------------------- Kuribayashi ------------------------------------------------------------------------------- <トタル型活用の形容動詞> ex.「堂々」「あ然」 このようなトタル型の活用をする形容動詞も、na-adjで登録されてしまっている。 そのため、 ex.「堂々と入場する」 のような文は解析できなくなっている。 -------------------------------------------------------------------------------- Kuribayashi -------------------------------------------------------------------------------- <名詞+と+na-adj><名詞+と+na-adj+の+名詞> ex.「同一」「反対」「逆」「一緒」など ex.「正しい向きと(orとは)逆だ」 「彼と同一の意見」 「彼と(orとは)反対の意見」 これらは現在正しく解析できていない。 -------------------------------------------------------------------------------- Kuribayashi -------------------------------------------------------------------------------- ex.「移動」 この場合、 ex.「車で市内を移動する」 「箱を右に移動する」 という2つの文において、「移動」はvn-trans1で選択することになるが、 この2つの例を同一視してしまってよいか。 ---------------------------------------------------------------------------------- Kuribayashi --------------------------------------------------------------------------------- 「せっかく」「折角」(普通名詞・副詞) 学研辞書や広辞苑では普通名詞として以下の2つの用法をあげている。 ex.「せっかく(折角)の休日なのに風邪をひいてしまった」 「せっかく(折角)ですが、お断りさせていただきます」 ただし、名詞的用法はこの2つのパターンに限られること、複合名詞を 作らないことなど、普通名詞(ordinary-nohon-n-lex)での登録 に疑問がある。 --------------------------------------------------------------------------------- Kuribayashi --------------------------------------------------------------------------------- v1-kurusuru-stem(「発する」など) lexicaltypeがv1-kurusuru-stemのものを選択すると、その文章が疑問文 なのか聞いてくる(=UTTERANCE_RULE-DECL-FINITEかUTTERANCE_RULE- WH-WITHOUT-KAかの選択が出る)。 また、「どう する」でv1-kurusuru-stemを選ぶと、強制的に疑問文になる。 ただし、「する」「要する」「愛する」「達する」「屈する」は例外。 ======================================================================== ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ ======================================================================== Fixed * converted to utf-8 * fixed preposition order ------------------------------------------------------------------------ FRANCIS ------------------------------------------------------------------------ (do-parse-tty "日常 の 行い が 確か で ある 。") There is a difference in the parse output between pet and the lkb Only pet gives - utterance_rule-wh-without-ka ! As pet is less restrictive than the lkb, it causes immense trouble when updateing or annotating. lkb/pet did different things if a rule inhereited from two supertypes directly. We no longer do so!