Learning Semantic Representations

Learning Semantic Representations. 2014/06/05, PFI Seminar. Hiroshi Noji (SOKENDAI, 2nd-year PhD student / NII)

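What is meaning?
‣ Thinking about it seriously is a job for the philosophy of language?
  - For example, verificationism (Quine): the meaning of a sentence is nothing other than its verification conditions, that is, the set of possible experiences that would lead to showing the sentence to be true.
‣ I'd rather not think about what meaning is (I'd never get my PhD)
‣ In NLP, the aim is for computing meaning to let us solve problems such as:
  - Question answering (QA)
  - Recognizing textual entailment
  - Document classification (clustering), sentiment analysis, etc.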

Today's focus: question answering
‣ Goal: build a system that understands the meaning of a question and finds the answer in a database
[Figure (Liang et al.'11), "The Big Picture": "What is the most populous city in California?" → System (+ Database) → "Los Angeles". "Expensive: logical forms" [Zelle & Mooney, 1996; Zettlemoyer & Collins, 2005; Wong & Mooney, 2007; Kwiatkowski et al., 2010]:
  What is the most populous city in California? ⇒ argmax(λx.city(x) ∧ loc(x, CA), λx.population(x))
  How many states border Oregon? ⇒ count(λx.state(x) ∧ border(x, OR))]
"They allow us to temporarily sidestep intractable philosophical questions on how to represent meaning in general." (Liang et al.'13)

Today's scope
‣ Meaning representations as discussed in the QA literature
  - I probably won't say much about learning
  - Mainly an introduction to two meaning representations and the recent progress of each
[Figure, CCG: derivation of "flights from Boston": flights ⊢ N: λx.flight(x); from ⊢ (N\N)/NP: λy.λf.λx.f(x) ∧ from(x,y); Boston ⊢ NP: bos; (>) N\N: λf.λx.f(x) ∧ from(x,bos); (<) N: λx.flight(x) ∧ from(x,bos)]
[Figure, DCS: DCS tree for "most populous city in California" (nodes city, loc, CA, population, argmax)]

Problem setting: learning meaning representations
‣ Convert natural language into a meaning representation that a computer can understand
  - Logical forms (a kind of programming language), e.g., What is the most populous city in California? ⇒ argmax(λx.city(x) ∧ loc(x, CA), λx.population(x)); How many states border Oregon? ⇒ count(λx.state(x) ∧ border(x, OR))
  - Other representations, e.g., DCS (Dependency-Based Compositional Semantics) trees
‣ Sentence → meaning representation → answer
  - Sentence → meaning representation: ambiguous, hard!
  - Meaning representation → answer: deterministic, easy!
‣ Given a sentence, it suffices to obtain the correct meaning representation or the correct answer: supervised learning

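How supervision is given
‣ Sentence → meaning representation (ambiguous, hard!) → answer (deterministic, easy!)
‣ (Sentence, meaning representation) pairs: annotation is expensive, but learning is easier
  - How many states border Oregon? ⇒ count(λx.state(x) ∧ border(x, OR))
‣ (Sentence, answer) pairs: non-experts can annotate, but learning is harder
  - How many states border Oregon? ⇒ 4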

Two broad families of meaning representations
‣ CCG + logical forms: originally learned from (sentence, logical form) pairs
  - Zettlemoyer & Collins'05, '07; Kwiatkowski et al.'10, '11; ...; Kwiatkowski et al.'13
‣ DCS: learned from (sentence, answer) pairs
  - Liang et al.'11; Berant et al.'13; Berant and Liang'14
‣ Beyond QA: Artzi & Zettlemoyer'11, '13; Matuszek et al.'12
‣ (There is also a Tree Grammar family, omitted here.)

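Recap of the problem setting
‣ We want to build a classifier that converts natural-language sentences into logical forms
  - Similar to machine translation? (some methods do take that approach)
  - The difference is that logical forms have structure
  - We want to obtain the logical form by composing functions
  - We want a framework that computes the logical form following the structure of the sentence
  - CCG is the tool used for this
Example: How many states border Oregon? ⇒ count(λx.state(x) ∧ border(x, OR))
  How many ⊢ λf.λg.count(λx.f(x) ∧ g(x))
  states ⊢ λx.state(x)
  border Oregon ⊢ λx.border(x, OR)
  How many states ⊢ λg.count(λx.state(x) ∧ g(x))
  How many states border Oregon ⊢ count(λx.state(x) ∧ border(x, OR))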
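The composition above can be mimicked with ordinary closures. A minimal Python sketch, assuming a hypothetical toy database (`STATES`, `BORDERS`) that is not part of the slides; `count` simply counts the entities in the toy domain that satisfy a predicate.

```python
# Hypothetical toy database (illustration only; not from the slides).
STATES = {"oregon", "washington", "idaho", "nevada", "california"}
BORDERS = {
    ("oregon", "washington"), ("oregon", "idaho"),
    ("oregon", "nevada"), ("oregon", "california"),
}
DOMAIN = STATES  # the entities that count() ranges over


def count(pred):
    """count(λx.p(x)): number of entities in DOMAIN satisfying pred."""
    return sum(1 for x in DOMAIN if pred(x))


def border(x, y):
    """Symmetric border relation over the toy table."""
    return (x, y) in BORDERS or (y, x) in BORDERS


# Lexical meanings as lambda terms (Python closures).
how_many = lambda f: lambda g: count(lambda x: f(x) and g(x))  # λf.λg.count(λx.f(x) ∧ g(x))
states = lambda x: x in STATES                                  # λx.state(x)
border_oregon = lambda x: border(x, "oregon")                   # λx.border(x, OR)

# Compose following the sentence structure.
how_many_states = how_many(states)        # λg.count(λx.state(x) ∧ g(x))
answer = how_many_states(border_oregon)   # count(λx.state(x) ∧ border(x, OR))
print(answer)  # 4
```

Each intermediate value is itself a function, which is exactly why the partial phrase "How many states" can have a meaning of its own.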

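Combinatory Categorial Grammar
CCG = Combinatory rules + Categorial Grammar
‣ One of the grammar formalisms for describing sentence structure
  - Dependency Grammar: John loves Mary (loves →sbj John, loves →obj Mary)
  - Context-free grammar (CFG): John loves Mary, S → NP VP
  - Categorial Grammar: John ⊢ NP, loves ⊢ (S\NP)/NP, Mary ⊢ NP; loves Mary ⇒ S\NP; John [loves Mary] ⇒ S
‣ It looks similar to CFG, but...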

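Categorial Grammar (CG)
• Basic categories: S, NP, N
• Complex categories are built with "/" and "\":
  - X/Y combines with a Y on its right to yield X
  - X\Y combines with a Y on its left to yield X
• Examples:
  - S\NP (e.g., an intransitive verb)
  - (S\NP)/NP (e.g., a transitive verb)
John walked: NP S\NP ⇒ S
(From Yusuke Miyao (2012), "The relationship between syntactic parsing and linguistic theory in natural language processing" (自然言語処理における構文解析と言語理論の関係))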

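Combinatory rules
John loves Mary: NP (S\NP)/NP NP ⇒ S\NP ⇒ S
‣ There is only a small number of combinatory rules
  - forward application (>): X/Y  Y ⇒ X
  - backward application (<): Y  X\Y ⇒ X
‣ Any category can be substituted for X and Y; these few rules are all the grammar specifies
‣ CCG derivations are often written in proof form:
    John   loves        Mary
    NP     (S\NP)/NP    NP
           ------------------ >
                 S\NP
    ------------------------- <
                  S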
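As a minimal sketch of the two application rules, assume categories are encoded as nested `(result, slash, argument)` tuples; this encoding and the helper names are illustrative, not the API of any particular CCG toolkit.

```python
from typing import Optional, Tuple, Union

# A category is an atomic string ("S", "NP", "N") or a (result, slash, argument)
# triple, e.g. (S\NP)/NP == (("S", "\\", "NP"), "/", "NP").
Category = Union[str, Tuple["Category", str, "Category"]]


def forward_apply(left: Category, right: Category) -> Optional[Category]:
    """Forward application (>): X/Y  Y  =>  X."""
    if isinstance(left, tuple) and left[1] == "/" and left[2] == right:
        return left[0]
    return None


def backward_apply(left: Category, right: Category) -> Optional[Category]:
    """Backward application (<): Y  X\\Y  =>  X."""
    if isinstance(right, tuple) and right[1] == "\\" and right[2] == left:
        return right[0]
    return None


def show(c: Category) -> str:
    """Render a category in the usual slash notation."""
    if isinstance(c, str):
        return c
    res, slash, arg = c
    wrap = lambda x: show(x) if isinstance(x, str) else f"({show(x)})"
    return f"{wrap(res)}{slash}{wrap(arg)}"


NP = "NP"
IV = ("S", "\\", NP)   # S\NP
TV = (IV, "/", NP)     # (S\NP)/NP

# John loves Mary
vp = forward_apply(TV, NP)   # loves Mary        (>)  S\NP
s = backward_apply(NP, vp)   # John [loves Mary] (<)  S
print(show(TV), "|", show(vp), "|", s)  # (S\NP)/NP | S\NP | S
```

Because the rules only inspect the slash direction and the argument category, the same two functions work for any X and Y, which is the point the slide makes about the grammar consisting of just a few rules.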

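Computing meaning representations
‣ Advantage of using CCG: meaning can be computed along the tree structure
  - Each word is assigned, together with its category, a semantic representation in the form of a lambda expression
  - Each rule also specifies how the logical forms are composed
    forward application (>): X/Y: f  Y: g ⇒ X: f(g)
    backward application (<): Y: g  X\Y: f ⇒ X: f(g)
Lexicon:
  John ⊢ NP: john
  loves ⊢ (S\NP)/NP: λx.λy.love(y,x)
  Mary ⊢ NP: mary
Derivation:
  loves Mary (>) S\NP: λy.love(y,mary)
  John [loves Mary] (<) S: love(john,mary)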
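Extending the same idea, each rule can carry semantics along with the category. In this sketch the `Sign` pairs, the `LEXICON` dictionary, and the tuple encoding of `love(y, x)` are illustrative assumptions; lambda terms are Python callables, so applying a rule is just a function call.

```python
from typing import Any, Optional, Tuple

# A sign pairs a category with a semantic value.
# Categories: "NP", "S", or (result, slash, argument) tuples.
# Semantics: constants are strings, predications are tuples,
# lambda terms are Python callables.
Sign = Tuple[Any, Any]


def forward(left: Sign, right: Sign) -> Optional[Sign]:
    """(>)  X/Y: f   Y: g   =>   X: f(g)"""
    (cl, f), (cr, g) = left, right
    if isinstance(cl, tuple) and cl[1] == "/" and cl[2] == cr:
        return cl[0], f(g)
    return None


def backward(left: Sign, right: Sign) -> Optional[Sign]:
    """(<)  Y: g   X\\Y: f   =>   X: f(g)"""
    (cl, g), (cr, f) = left, right
    if isinstance(cr, tuple) and cr[1] == "\\" and cr[2] == cl:
        return cr[0], f(g)
    return None


IV = ("S", "\\", "NP")
LEXICON = {
    "John": ("NP", "john"),
    "Mary": ("NP", "mary"),
    # loves ⊢ (S\NP)/NP : λx.λy.love(y, x)
    "loves": ((IV, "/", "NP"), lambda x: lambda y: ("love", y, x)),
}

vp = forward(LEXICON["loves"], LEXICON["Mary"])  # S\NP : λy.love(y, mary)
s = backward(LEXICON["John"], vp)                # S : love(john, mary)
print(s)  # ('S', ('love', 'john', 'mary'))
```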

CCG-based: Overview (Zettlemoyer & Collins'05, '09; Kwiatkowski et al.'10, '11)
[Figure 1 (Zettlemoyer & Collins'05), "Examples of sentences with their logical forms":
  a) What states border Texas ⇒ λx.state(x) ∧ borders(x, texas)
  b) What is the largest state ⇒ argmax(λx.state(x), λx.size(x))
  c) What states border the state that borders the most states ⇒ λx.state(x) ∧ borders(x, argmax(λy.state(y), λy.count(λz.state(z) ∧ borders(y, z))))]
Paper excerpt: "Additional quantifiers: The expressions involve the additional quantifying terms count, arg max, arg min, and the definite operator ι. An example of a count expression is count(λx.state(x)), which returns the number of entities for which state(x) is true. arg max expressions are of the form arg max(λx.state(x), λx.size(x)). The first argument is a lambda expression denoting some set of entities; the second argument is a function of type ⟨e, r⟩."
‣ Training data: a set of (sentence, logical form) pairs → machine learning → test (evaluation): How many states border Oregon? ⇒ ???
‣ What we know:
  - The CCG combinatory rules, e.g., Y: g  X\Y: f ⇒ X: f(g); X/Y: f  Y/Z: g ⇒ X/Z: λx.f(g(x)); ...
  - Loose candidates for each word's categories
‣ The correct tree structure is not given
  - The model parameters are learned relying only on each sentence's logical form
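To see what the quantifiers compute, here is a sketch that evaluates examples b) and c) directly, assuming a small hypothetical database of five states with illustrative size values (not real data and not the GEO database).

```python
# Hypothetical toy geography database (illustration only, not GEO data).
STATES = {"texas", "oklahoma", "new_mexico", "louisiana", "arkansas"}
SIZE = {"texas": 696, "oklahoma": 181, "new_mexico": 315,
        "louisiana": 136, "arkansas": 138}  # illustrative values
BORDERS = {("texas", "oklahoma"), ("texas", "new_mexico"),
           ("texas", "louisiana"), ("texas", "arkansas"),
           ("oklahoma", "arkansas"), ("oklahoma", "new_mexico"),
           ("louisiana", "arkansas")}
ENTITIES = STATES


def state(x):
    return x in STATES


def borders(x, y):
    return (x, y) in BORDERS or (y, x) in BORDERS


def count(pred):
    """count(λx.p(x)): how many entities satisfy p."""
    return sum(1 for e in ENTITIES if pred(e))


def argmax(pred, key):
    """argmax(λx.p(x), λx.f(x)): the entity satisfying p with the largest f(x).

    Ties would be broken by iteration order; the toy data has none.
    """
    return max((e for e in ENTITIES if pred(e)), key=key)


# b) What is the largest state:  argmax(λx.state(x), λx.size(x))
largest = argmax(state, lambda x: SIZE[x])

# c) What states border the state that borders the most states:
#    λx.state(x) ∧ borders(x, argmax(λy.state(y), λy.count(λz.state(z) ∧ borders(y, z))))
hub = argmax(state, lambda y: count(lambda z: state(z) and borders(y, z)))
answer_c = {x for x in ENTITIES if state(x) and borders(x, hub)}
print(largest, hub, sorted(answer_c))
```

Note how the second argument of argmax in c) is itself built with count, mirroring the nested logical form.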

The correct tree structure is not given
‣ A kind of distant supervision
  - No need to annotate trees
  - Harder than ordinary parsing
  - Connection to grammar acquisition?
[Figure 2 (Zettlemoyer & Collins'05), "Two examples of CCG parses", parse (b) "What states border Texas":
  What ⊢ (S/(S\NP))/N: λf.λg.λx.f(x) ∧ g(x)
  states ⊢ N: λx.state(x)
  border ⊢ (S\NP)/NP: λx.λy.borders(y, x)
  Texas ⊢ NP: texas
  What states (>) S/(S\NP): λg.λx.state(x) ∧ g(x)
  border Texas (>) S\NP: λy.borders(y, texas)
  What states [border Texas] (>) S: λx.state(x) ∧ borders(x, texas)]
Paper excerpt: "(2) The functional application rules (with semantics): a. A/B: f  B: g ⇒ A: f(g)  b. B: g  A\B: f ⇒ A: f(g)"
‣ Objective / learning: Latent Variable Structured Perceptron
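The latent-variable structured perceptron can be sketched as follows. This is a simplified illustration, assuming the candidate derivations for a sentence are already enumerated (a real parser would produce them with a chart and beam) and using hypothetical feature names; any derivation whose logical form equals the gold one counts as correct, which is exactly the "tree is hidden" setting above.

```python
from collections import Counter
from typing import Dict, List, NamedTuple


class Derivation(NamedTuple):
    """A candidate parse: its sparse features and the logical form it yields."""
    features: Dict[str, float]
    logical_form: str


def score(w: Counter, d: Derivation) -> float:
    return sum(w[f] * v for f, v in d.features.items())


def perceptron_step(w: Counter, candidates: List[Derivation], gold_lf: str) -> bool:
    """One latent-variable structured perceptron update; returns True if w changed.

    The derivation is hidden: any candidate whose logical form equals gold_lf
    is treated as correct, and the best-scoring such candidate is the target.
    """
    predicted = max(candidates, key=lambda d: score(w, d))
    if predicted.logical_form == gold_lf:
        return False
    consistent = [d for d in candidates if d.logical_form == gold_lf]
    if not consistent:  # gold logical form unreachable: skip this example
        return False
    target = max(consistent, key=lambda d: score(w, d))
    for f, v in target.features.items():
        w[f] += v
    for f, v in predicted.features.items():
        w[f] -= v
    return True


# Hypothetical candidates for "What states border Texas"
gold = "λx.state(x) ∧ borders(x, texas)"
cands = [
    Derivation({"What:S/S": 1.0, "states:S\\NP": 1.0}, "λx.state(x)"),
    Derivation({"What:(S/(S\\NP))/N": 1.0, "states:N": 1.0}, gold),
]
w = Counter()
first = perceptron_step(w, cands, gold)   # tie -> max() picks the wrong parse -> update
second = perceptron_step(w, cands, gold)  # gold-consistent parse now scores higher
print(first, second, dict(w))
```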

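Connection to grammar acquisition (aside)
‣ Two goals:
  - Scientific: uncover how infants acquire language
  - Engineering: help parse languages that lack training data
  - However, infants draw on many signals other than language itself when acquiring grammar (for the scientific goal, the setting is not very realistic)
‣ Unsupervised parsing (Klein & Manning'04; Smith & Eisner'06; Headden III et al.'09; Mareček & Žabokrtský'11; ...)
  - The problem of estimating a model from completely raw sentences, e.g., "you have another cookie" → unsupervised learning → a tree over "you have another cookie"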

Connection to grammar acquisition (aside)
‣ The present problem setting
  - Infer the sentence structure (a latent variable) from (sentence, logical form) pairs
  - From the grammar-acquisition viewpoint, arguably more realistic than learning from raw sentences alone?
‣ A more realistic task: Kwiatkowski et al.'12
  - Learning when each sentence comes with several candidate meanings, and we don't know which one is correct
  Paper excerpt: "...of propositional uncertainty, from a set of contextually afforded meaning candidates, as here:
    Utterance: you have another cookie
    Candidate Meanings: { have(you, another(x, cookie(x))), eat(you, your(x, cake(x))), want(i, another(x, cookie(x))) }
  The task is then to learn, from a sequence of such (utterance, meaning-candidates) pairs, the correct …"

How do we learn?
‣ The model is a log-linear model over trees
  - It mainly learns which categories each word should be paired with
[Figure 2 (Zettlemoyer & Collins'05): the CCG parse of "What states border Texas" with the functional application rules with semantics. "Rule 2(a) now specifies how the semantics of the category A is compositionally built out of the semantics for A/B and B."]
‣ Plausible word-level categories are extracted from the logical form, e.g.:
  (S/(S\NP))/(S\NP): λg.λf.λx.g(x) ∧ f(x)
  S\NP: λx.state(x)
  S\NP: λx.borders(x,texas)
  S/(S\NP): λf.λx.state(x) ∧ f(x)
  S/S: λx.x
  ...
‣ Learned lexical entries with their weights, e.g.:
  What ⊢ S/S: λx.x (42)
  What ⊢ S\NP: λx.state(x) (−30)
  states ⊢ S\NP: λx.state(x) (63)
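A log-linear model over trees scores each derivation d as w·φ(d); because the training signal is a logical form, the probability of a logical form sums over all derivations that yield it. A minimal sketch with hypothetical lexical features (not the feature set of any of the cited systems):

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

# A candidate derivation: (sparse feature vector φ(d), logical form it yields).
Candidate = Tuple[Dict[str, float], str]


def logical_form_distribution(w: Dict[str, float],
                              candidates: List[Candidate]) -> Dict[str, float]:
    """p(z | x) = Σ_{d yielding z} exp(w·φ(d)) / Σ_{d'} exp(w·φ(d'))."""
    scores = [sum(w.get(f, 0.0) * v for f, v in feats.items())
              for feats, _ in candidates]
    m = max(scores)                               # subtract max for stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    dist: Dict[str, float] = defaultdict(float)
    for e, (_, lf) in zip(exps, candidates):
        dist[lf] += e / total                     # marginalize out the tree
    return dict(dist)


# Hypothetical candidates for "What states border Texas"
STATE = "λx.state(x)"
GOLD = "λx.state(x) ∧ borders(x, texas)"
cands = [
    ({"What:S/S": 1.0, "states:S\\NP": 1.0}, STATE),
    ({"What:S\\NP": 1.0}, STATE),
    ({"What:(S/(S\\NP))/N": 1.0, "states:N": 1.0}, GOLD),
]
print(logical_form_distribution({}, cands))  # uniform over the 3 trees
print(logical_form_distribution({"What:(S/(S\\NP))/N": 2.0}, cands))
```

With all-zero weights the two trees yielding λx.state(x) together get 2/3 of the mass, which illustrates why the tree (not just the logical form) is the unit being scored.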

Evolution of the methods
‣ Zettlemoyer & Collins'05
  - First to learn a CCG from (sentence, logical form) pairs
  - The categories of some function words are fixed in advance, e.g., every ⊢ (S/(S|NP))/N: λf.λg.∀x.f(x) → g(x)
‣ Kwiatkowski et al.'10
  - Learns the categories of all words (making it possible to learn languages other than English)
  - Runs IBM Model 1 first to obtain good initial values
‣ Kwiatkowski et al.'11
  - Factors the category parameters to reduce sparsity
[Figure (Artzi et al.'13), "Parameter Initialization": compute co-occurrence (IBM Model 1) between words and logical constants; initial score for new lexical entries: average over pairwise weights. Example: I want a flight to Boston ⊢ S: λx.flight(x) ∧ to(x, BOS)]
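The IBM Model 1 step can be sketched as plain EM that aligns sentence words to the logical constants of their logical forms. This is a textbook Model 1 (no NULL alignment) run on a hypothetical three-example corpus; how the resulting t(word | constant) values are turned into initial lexical scores (the slide says: an average over pairwise weights) is not shown.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Pair = Tuple[List[str], List[str]]  # (sentence words, logical constants)


def ibm_model1(pairs: List[Pair], iterations: int = 10) -> Dict[Tuple[str, str], float]:
    """EM estimate of t(word | constant), IBM Model 1 style (no NULL constant)."""
    vocab = {w for words, _ in pairs for w in words}
    uniform = 1.0 / len(vocab)
    t: Dict[Tuple[str, str], float] = defaultdict(lambda: uniform)
    for _ in range(iterations):
        count: Dict[Tuple[str, str], float] = defaultdict(float)
        total: Dict[str, float] = defaultdict(float)
        # E-step: each word distributes one unit of alignment mass over the
        # constants of its own logical form, in proportion to t(word | constant).
        for words, consts in pairs:
            for w in words:
                norm = sum(t[(w, c)] for c in consts)
                for c in consts:
                    frac = t[(w, c)] / norm
                    count[(w, c)] += frac
                    total[c] += frac
        # M-step: renormalize the expected counts per constant.
        t = defaultdict(float, {(w, c): n / total[c] for (w, c), n in count.items()})
    return dict(t)


# Hypothetical (words, constants) pairs in the spirit of GEO.
pairs = [
    (["states", "border", "texas"], ["state", "borders", "texas"]),
    (["states", "border", "ohio"], ["state", "borders", "ohio"]),
    (["capital", "of", "texas"], ["capital", "texas"]),
]
t = ibm_model1(pairs)
best = max({w for ws, _ in pairs for w in ws}, key=lambda w: t.get((w, "texas"), 0.0))
print(best)  # texas
```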

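Learning from (sentence, answer) pairs
[Figure (Liang et al.), "Graphical Model": x = "capital of California?" → (parameters θ) → z = DCS tree over capital and CA → (database w) → y = Sacramento.
  Semantic parsing: p(z | x, θ) (probabilistic)
  Interpretation: p(y | z, w) (deterministic)]
‣ Until now, the CCG derivation was modeled as the latent variable
‣ In DCS, the logical representation itself is treated as the latent variable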
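Since interpretation is deterministic, the probability of an answer marginalizes over the latent DCS trees that evaluate to it. A sketch of this formulation, assuming (as in Liang et al.) a log-linear parser with feature vector φ and omitting any regularization; the denotation of z on database w is written [[z]]_w:

```latex
\begin{aligned}
p(z \mid x; \theta) &= \frac{\exp\big(\phi(x, z)^{\top}\theta\big)}{\sum_{z'} \exp\big(\phi(x, z')^{\top}\theta\big)}
  && \text{(semantic parsing, probabilistic)} \\
p(y \mid z, w) &= \mathbb{1}\big[\, [\![ z ]\!]_w = y \,\big]
  && \text{(interpretation, deterministic)} \\
p(y \mid x; \theta, w) &= \sum_{z} p(z \mid x; \theta)\, p(y \mid z, w)
  = \sum_{z \,:\, [\![ z ]\!]_w = y} p(z \mid x; \theta) \\
\hat{\theta} &= \arg\max_{\theta} \sum_{i} \log p(y_i \mid x_i; \theta, w)
\end{aligned}
```

The indicator makes every tree that yields the wrong answer contribute nothing, so only answer-consistent trees are pushed up during training.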

DCS: Dependency-based Compositional Semantics
[Figure (Liang et al.), "Basic DCS Trees", example "city in California":
  DCS tree: city --1/1-- loc --2/1-- CA
  Constraints: c1 = ℓ1, ℓ2 = s1
  Database: city = {San Francisco, Chicago, Boston, ...}; loc = {(Mount Shasta, California), (San Francisco, California), (Boston, Massachusetts), ...}; CA = {California}]
‣ Each subtree denotes a set
  - e.g., the loc subtree denotes the elements of loc whose second column is California

Corresponding lambda calculus formula from Liang et al. (whose example tree also contains a "major" node): ∃c ∃m ∃ℓ ∃s. city(c) ∧ major(m) ∧ loc(ℓ) ∧ CA(s) ∧ c1 = m1 ∧ c1 = ℓ1 ∧ ℓ2 = s1
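The statement that each subtree denotes a set can be made concrete with a naive bottom-up evaluator. This sketch covers only basic DCS trees (joins, no aggregation or Mark-Execute), uses the table rows shown in the figure as a toy database, and the `Node` class is a hypothetical helper; Liang et al. evaluate trees with dynamic programming rather than this direct recursion.

```python
from typing import Dict, List, Sequence, Set, Tuple

Row = Tuple[str, ...]

# Toy database: the rows shown in the figure (the "..." rows are omitted).
DB: Dict[str, List[Row]] = {
    "city": [("San Francisco",), ("Chicago",), ("Boston",)],
    "loc": [("Mount Shasta", "California"), ("San Francisco", "California"),
            ("Boston", "Massachusetts")],
    "CA": [("California",)],
}


class Node:
    """Basic DCS tree node (hypothetical helper): a predicate plus child edges.

    An edge (j, k, child) is the join constraint parent[j] == child[k],
    with 1-indexed columns as in c1 = ℓ1 and ℓ2 = s1.
    """

    def __init__(self, pred: str, edges: Sequence[Tuple[int, int, "Node"]] = ()):
        self.pred = pred
        self.edges = list(edges)


def denotation(node: Node) -> Set[Row]:
    """Rows of node.pred consistent with all constraints in the subtree below it."""
    children = [(j, k, denotation(child)) for j, k, child in node.edges]
    return {row for row in DB[node.pred]
            if all(any(row[j - 1] == crow[k - 1] for crow in rows)
                   for j, k, rows in children)}


# "city in California":  city --1/1-- loc --2/1-- CA
loc_ca = Node("loc", [(2, 1, Node("CA"))])
tree = Node("city", [(1, 1, loc_ca)])
print(denotation(loc_ca))  # loc rows whose 2nd column is California
print(denotation(tree))    # {('San Francisco',)}
```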

Characteristics of DCS
‣ Logical forms have a large gap from natural language
‣ DCS closely resembles the dependency structure of the sentence
[Figure (Liang et al.): "Challenges. Computational: how to efficiently search exponential space?" What is the most populous city in California? ⇒ argmax(λx.city(x) ∧ loc(x, CA), λx.population(x)) ⇒ Los Angeles. The DCS tree for "most populous city in California" is shown next to the sentence's dependency tree.]

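In short...
‣ Deriving logical forms from (sentence, answer) pairs is quite hard
  - Because logical forms diverge from natural language, meaningful candidates cannot be found by search
‣ DCS is a meaning representation simple enough to be learned even from (sentence, answer) pairs, yet expressive enough
  - It reflects the tree structure of the sentence
  - Consequently, the meanings it can express are a subset of the lambda calculus
  - Still, perhaps sufficient for the meanings of naturally occurring sentences?
  - A device for keeping the sentence's tree structure and the meaning representation transparent to each other: Mark-Execute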

Mark-Execute
[Figure (Liang et al.), "Solution: Mark-Execute", Superlatives: in "most populous city in California", argmax is marked at its syntactic scope and executed at its semantic scope (marked x1 in the tree).
"Divergence between Syntactic and Semantic Scope": syntactically "most populous" modifies "city", but semantically argmax scopes over the whole phrase: argmax(λx.city(x) ∧ loc(x, CA), λx.population(x))]

Universal quantification and scope ambiguity
[Figure (Liang et al.), "Solution: Mark-Execute", Quantification: "Some river traverses every city." Both quantifiers are marked at their syntactic scope and executed at the semantic scope; the execution order selects the reading:
  narrow (x12, surface scope): ∃x.(river(x) ∧ ∀y.(city(y) → traverse(x, y)))
  wide (x21, inverse scope): ∀y.(city(y) → ∃x.(river(x) ∧ traverse(x, y)))]
‣ Apparently similar to the shift/reset operators on continuations
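The two readings really do differ in truth conditions. A minimal check in a hypothetical model with two rivers and two cities, where each city is traversed by some river but no single river traverses every city:

```python
# Hypothetical tiny model (illustration only).
RIVERS = {"r1", "r2"}
CITIES = {"c1", "c2"}
TRAVERSE = {("r1", "c1"), ("r2", "c2")}


def traverse(x, y):
    return (x, y) in TRAVERSE


# Surface scope: ∃x.(river(x) ∧ ∀y.(city(y) → traverse(x, y)))
surface = any(all(traverse(x, y) for y in CITIES) for x in RIVERS)

# Inverse scope: ∀y.(city(y) → ∃x.(river(x) ∧ traverse(x, y)))
inverse = all(any(traverse(x, y) for x in RIVERS) for y in CITIES)

print(surface, inverse)  # False True
```

Restricting the generators to `RIVERS` and `CITIES` plays the role of the river(x) ∧ ... and city(y) → ... restrictions; only the nesting order of `any` and `all` changes between the readings, which is what the execution order in Mark-Execute decides.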

How do we learn?
‣ Basically the same as in the CCG case
  - Nothing is assumed about the DCS structure
  - The sentence's dependency structure is not used
[Figure (Liang et al.), "Words to Predicates (Lexical Semantics)": What is the most populous city in CA ? ⇒ candidate predicates city, state, river, argmax, population, CA
  Lexical Triggers:
  1. String match: CA ⇒ CA
  2. Function words (20 words): most ⇒ argmax
  3. Nouns/adjectives: city ⇒ city, state, river, population]
‣ The correct predicates for function words and some other words are given by hand
[Figure (Liang et al.), "Basic DCS Trees": the DCS tree of "city in CA" with constraints c1 = ℓ1, ℓ2 = s1; a DCS tree encodes a constraint satisfaction problem. Computation: dynamic programming]

How do we learn?
‣ Basically the same as in the CCG case
  - Exhaustive search with dynamic programming is not possible
  - Extract the k-best trees with beam search and update with SGD
[Figure (Liang et al.), "Predicates to DCS Trees (Compositional Semantics)": C_{i,j} = set of DCS trees for span [i, j]; trees from C_{i,k} and C_{k,j} are combined to build C_{i,j}, e.g., for "most populous city in California"]
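The span-based construction with beam pruning can be sketched as a CKY-style chart in which each cell keeps only the top k items. The `combine` function and the scores below are hypothetical placeholders: in DCS, combination would enumerate the possible tree attachments (edge labels such as 1/1) and scores would come from the log-linear model. The sketch uses half-open spans [i, j).

```python
import heapq
from itertools import product
from typing import Callable, Dict, List, Tuple

Item = Tuple[float, str]  # (score, tree rendered as a string)


def beam_chart(leaves: List[List[Item]],
               combine: Callable[[Item, Item], List[Item]],
               k: int) -> Dict[Tuple[int, int], List[Item]]:
    """CKY-style chart; C[i, j] keeps only the k best items for words i..j-1."""
    n = len(leaves)
    C: Dict[Tuple[int, int], List[Item]] = {}
    for i, items in enumerate(leaves):
        C[i, i + 1] = heapq.nlargest(k, items)
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            cands: List[Item] = []
            for m in range(i + 1, j):  # split point
                for left, right in product(C[i, m], C[m, j]):
                    cands.extend(combine(left, right))
            C[i, j] = heapq.nlargest(k, cands)  # beam pruning
    return C


def combine(left: Item, right: Item) -> List[Item]:
    """Hypothetical combiner: one joined tree, scores simply added."""
    return [(left[0] + right[0], f"({left[1]} {right[1]})")]


# "city in California" with hypothetical predicate candidates and scores
leaves = [[(1.0, "city"), (0.1, "state")], [(0.5, "loc"), (0.0, "none")], [(2.0, "CA")]]
chart = beam_chart(leaves, combine, k=2)
print(chart[0, 3])
```

Because each cell is truncated to k items, the search is no longer exhaustive; this is the trade-off that makes beam search plus k-best updates necessary when exact dynamic programming is too expensive.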

Experiment: GEO data
‣ Contains somewhat complex expressions (conjunctions, superlatives, negation, etc.)
‣ Small vocabulary
  - Number of word types: 280
  - Number of predicates: 48
Examples:
  what states does the ohio river run through
    (lambda $0 e (and (state:t $0) (loc:t ohio_river:r $0)))
  what states surround kentucky
    (lambda $0 e (and (state:t $0) (next_to:t $0 kentucky:s)))
  what is the capital of states that have cities named durham
    (lambda $0 e (and (capital:t $0) (exists $1 (and (state:t $1) (exists $2 (and (city:t $2) (named:t $2 durham:n) (loc:t $2 $1))) (loc:t $0 $1)))))
  which is the highest peak not in alaska
    (argmax $0 (and (mountain:t $0) (not (loc:t $0 alaska:s))) (elevation:i $0))
‣ Training data: a set of 600 sentences, each paired with a logical form or an answer
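The GEO logical forms above are s-expressions whose constants carry a type suffix after a colon (e.g. state:t, kentucky:s) and whose variables are written $0, $1, and so on. A minimal parser sketch for this surface format; the helper names are hypothetical:

```python
from typing import List, Set, Union

SExpr = Union[str, List["SExpr"]]


def tokenize(text: str) -> List[str]:
    """Split an s-expression string into parentheses and atoms."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def parse(tokens: List[str]) -> SExpr:
    """Parse one s-expression from the front of tokens (consumes them)."""
    tok = tokens.pop(0)
    if tok == "(":
        expr: List[SExpr] = []
        while tokens[0] != ")":
            expr.append(parse(tokens))
        tokens.pop(0)  # drop the matching ")"
        return expr
    if tok == ")":
        raise ValueError("unexpected ')'")
    return tok


def constants(expr: SExpr) -> Set[str]:
    """Collect typed constants such as state:t or kentucky:s (atoms containing ':')."""
    if isinstance(expr, str):
        return {expr} if ":" in expr else set()
    return set().union(*(constants(e) for e in expr))


lf = "(lambda $0 e (and (state:t $0) (next_to:t $0 kentucky:s)))"
tree = parse(tokenize(lf))
print(tree)
print(sorted(constants(tree)))
```

Collecting the typed constants this way is one simple route to the "number of predicates" statistic quoted above, though how that figure was actually computed is not stated on the slide.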

Comparison (Liang et al., Experiment 2): Geo, 600 training examples, 280 test examples

  System   Description                                      Test accuracy
  zc05     CCG [Zettlemoyer & Collins, 2005]                79.3%
  zc07     relaxed CCG [Zettlemoyer & Collins, 2007]        86.1%
  kzgs10   CCG w/ unification [Kwiatkowski et al., 2010]    88.9%
  dcs      DCS (Liang et al.'s system)                      88.6%
  dcs+     DCS, "+" variant (Liang et al.'s system)         91.1%

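Summary so far
‣ Two approaches to supervised QA
  - CCG family: learn a CCG model from (sentence, logical form) pairs
  - DCS family: learn a DCS model from (sentence, answer) pairs
‣ DCS performs better, but lexical-level cues have to be supplied by hand
  - The CCG family can instead initialize its lexicon from the given logical forms, e.g., with IBM models
‣ Next directions
  - Extending to web-scale QA systems (Freebase)
  - The CCG family, too, can now learn without being given logical forms directly
  - Using DCS beyond QA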

We want web-scale QA (Berant et al.'13; Kwiatkowski et al.'13; Berant and Liang'14)
[Figure (Berant et al.'13):
  • Generate questions based on Freebase facts, e.g., "What was the cover price of the X-men Issue 1?"
  • WebQuestions [our work]: 5,810 examples, 4,525 w…
  • Generate questions from Google ⇒ less formu…, e.g., "What character did Natalie Portman play in Star Wars?", "What kind of money to take to Bahamas?", "What did Edward Jenner do for a living?"]
‣ Until now we handled relatively clean data (with a small vocabulary)
‣ Can such systems be scaled up using web databases?

Freebase knowledge graph (Berant et al.'13)
[Figure: excerpt of the Freebase graph around BarackObama, e.g., Type Person, Profession Politician, DateOfBirth 1961.08.04, PlaceOfBirth Honolulu (a City, ContainedBy Hawaii), a Marriage event (Spouse MichelleObama, StartDate 1992.10.03), and PlacesLived events (Location Chicago)]
‣ 41M entities (nodes), 19K properties (edge labels), 596M assertions (edges)
‣ Queries can be issued with SPARQL

What makes it hard?
‣ There are many predicates, and they mismatch natural-language wording
  - Unlike GEO, we cannot enumerate all predicates and learn them
‣ Which predicates to use depends on the domain
[Figure (Berant et al.'13): aligning "What languages do people in Brazil use" with Freebase predicates such as Type.HumanLanguage, Type.ProgrammingLanguage, Brazil, BrazilFootballTeam, Type.Country, Profession.Lawyer, PeopleBornHere, InventorOf, ...]
Excerpt (Kwiatkowski et al.'13): "In practice, the choice of ontology significantly impacts learning. For example, consider the following questions (Q) and candidate meaning representations (MR):
  Q1: What is the population of Seattle?
  Q2: How many people live in Seattle?
  MR1: λx.population(Seattle, x)
  MR2: count(λx.person(x) ∧ live(x, Seattle))
A semantic parser might aim to construct MR1 for …"
‣ Freebase accepts only the MR1 form

DCS-family approach (Berant et al.'13; Berant and Liang'14)
‣ Uses a restricted form of DCS (basic λ-DCS)
  - Mark-Execute and similar devices have quietly disappeared
  - Perhaps because Freebase queries can only ask for facts, so there is no need (and no way) to express quantification and the like?
  - Choosing the right predicates has become harder, but deriving the structure may have become easier?
[Figure 2 (Berant et al.'13): derivation of "Where was Obama born?": where ⇒ Type.Location (lexicon); Obama ⇒ BarackObama (lexicon); born ⇒ PeopleBornHere (lexicon); join ⇒ PeopleBornHere.BarackObama; intersection ⇒ Type.Location ⊓ PeopleBornHere.BarackObama. Caption: "An example of a derivation d of the utterance "Where was Obama born?" and its sub-derivations, each labeled with composition rule (in blue) and logical form (in red). The derivation d skips the words "was" and "?"."]
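The two λ-DCS operations in this derivation, join and intersection, can be sketched over a toy triple store. The triples are hypothetical stand-ins for Freebase facts (stored so that PeopleBornHere links a place to a person born there), and nothing here is a Freebase or SPARQL API.

```python
# Hypothetical toy knowledge graph as (subject, property, object) triples.
TRIPLES = {
    ("Honolulu", "PeopleBornHere", "BarackObama"),
    ("Honolulu", "Type", "Location"),
    ("Honolulu", "Type", "City"),
    ("Chicago", "Type", "Location"),
    ("BarackObama", "Type", "Person"),
}


def entity(e):
    """A bare entity denotes a singleton set."""
    return {e}


def join(prop, u):
    """λ-DCS join  prop.u = {x : ∃y. (x, prop, y) ∈ K and y ∈ u}."""
    return {s for (s, p, o) in TRIPLES if p == prop and o in u}


def intersect(u, v):
    """λ-DCS intersection  u ⊓ v."""
    return u & v


# "Where was Obama born?"  ->  Type.Location ⊓ PeopleBornHere.BarackObama
answer = intersect(join("Type", entity("Location")),
                   join("PeopleBornHere", entity("BarackObama")))
print(answer)  # {'Honolulu'}
```

Every logical form here denotes a set of entities, which is what lets the derivation compose by set operations alone, without the quantifier machinery of full DCS.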

Learning CCG from answers, too (Kwiatkowski et al.'13)
‣ First build a domain-independent logical form
  - The CCG lexicon is not learned (it is partly given by hand)
  [Figure, "Domain Independent Parsing": CCG parse of "How many people live in Seattle", yielding λx.eq(x, count(λy.∃ev.people(y) ∧ live(y, ev) ∧ in(ev, seattle))). "String labels signify source words, not semantic constants."]
‣ Two-step semantic parsing: domain-independent parse ⇒ ontology match
  - Structure match: λx.eq(x, count(λy.∃ev.people(y) ∧ live(y, ev) ∧ in(ev, seattle))) ⇒ λx.how many people live in(seattle, x)
  - Constant matches: λx.how many people live in(seattle, x) ⇒ λx.population(seattle, x)
‣ Everything, including the logical form, is learned as a latent variable

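Is the gap between the two narrowing?
‣ Conventional CCG family
  - Learns from (sentence, logical form) pairs
  - Needs no other tuning (independent of the language and of the logical formalism)
‣ Kwiatkowski et al.'13
  - The CCG derivation is guided to some extent by hand-supplied cues (similar to DCS)
  - The derived logical form is probabilistically rewritten to match Freebase's representation
  - A query is issued, and if its result matches the answer, the whole process leading to it is treated as correct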

DCS and CCG beyond QA
‣ CCG is starting to be used in a wide range of settings
  - Problems that learn a logical form (program) for a given input
  - Building dialogue systems from conversation logs (Artzi and Zettlemoyer'11)
  - Robot navigation (Artzi and Zettlemoyer'13)
[Figure (Artzi et al.'13), "Modeling Instructions": on a 5×5 grid, from the current position, "go to the chair" ⇒ λa.move(a) ∧ to(a, ιx.chair(x)); "Events can be modified by adverbials"]
‣ The goal is to obtain this mapping
  - but the logical form is not given directly
  - executing it reveals whether it was correct

DCS and CCG beyond QA
‣ Executing a DCS semantic representation depends on the existence of a database
  - Meanings are expressed through Cartesian products of sets over the database
‣ Tian, Miyao and Matsuzaki'14 (ACL)
  - Applied the DCS framework to recognizing textual entailment
  - Showed how DCS can serve as a semantic representation even without a database (abstract denotations)
‣ Since CCG has a longer history, is it easier to apply to new problems?
  - DCS is simpler and fits sentence structure more closely
  - Which representation is better for which problem, by how much, and why?
[Figures (Tian et al.'14): Figure 1: the DCS tree of "students read books" and databases of student, book (A Tale of Two Cities, Ulysses, ...), and read (SUBJ/OBJ pairs such as (Mark, New York Times), (Mary, A Tale of Two Cities), (John, Ulysses), ...). Figure 2: DCS trees of "Mary loves every dog" (Left-Up), "Tom has a dog" (Left-Down), and "Tom has an animal that Mary loves" (Right), labeled T (text) and H (hypothesis).]

References (1)
‣ Yoav Artzi and Luke S Zettlemoyer (2011). Bootstrapping semantic parsers from conversations. In EMNLP.
‣ Yoav Artzi and Luke S Zettlemoyer (2013). Weakly Supervised Learning of Semantic Parsers for Mapping Instructions to Actions. In TACL.
‣ Yoav Artzi, Nicholas FitzGerald, and Luke Zettlemoyer (2013). Semantic Parsing with Combinatory Categorial Grammars. In ACL tutorial.
‣ Jonathan Berant, Andrew Chou, Roy Frostig, and Percy Liang (2013). Semantic Parsing on Freebase from Question-Answer Pairs. In EMNLP.
‣ Jonathan Berant and Percy Liang (2014). Semantic parsing via paraphrasing. In ACL.
‣ Tom Kwiatkowski, Luke S Zettlemoyer, Sharon Goldwater, and Mark Steedman (2010). Inducing probabilistic CCG grammars from logical form with higher-order unification. In EMNLP.
‣ Tom Kwiatkowski, Luke S Zettlemoyer, Sharon Goldwater, and Mark Steedman (2011). Lexical generalization in CCG grammar induction for semantic parsing. In EMNLP.

References (2)
‣ Tom Kwiatkowski, Sharon Goldwater, Luke S Zettlemoyer, and Mark Steedman (2012). A probabilistic model of syntactic and semantic acquisition from child-directed utterances and their meanings. In EACL.
‣ Tom Kwiatkowski, E Choi, Y Artzi, and Luke S Zettlemoyer (2013). Scaling semantic parsers with on-the-fly ontology matching. In EMNLP.
‣ Percy Liang, Michael I Jordan, and Dan Klein (2011). Learning dependency-based compositional semantics. In ACL.
‣ Percy Liang, Michael I Jordan, and Dan Klein (2013). Learning dependency-based compositional semantics. In Computational Linguistics.
‣ Cynthia Matuszek, Nicholas FitzGerald, Luke S Zettlemoyer, Liefeng Bo, and Dieter Fox (2012). A Joint Model of Language and Perception for Grounded Attribute Learning. In ICML.
‣ Ran Tian, Yusuke Miyao, and Takuya Matsuzaki (2014). Logical Inference on Dependency-based Compositional Semantics. In ACL.

References (3)
‣ Luke S Zettlemoyer and Michael Collins (2005). Learning to Map Sentences to Logical Form: Structured Classification with Probabilistic Categorial Grammars. In UAI.
‣ Luke S Zettlemoyer and Michael Collins (2007). Online learning of relaxed CCG grammars for parsing to logical form. In EMNLP.

by nozyh on Jun 07, 2014