46.
Global Vectors (GloVe)
•skip-gram and CBOW process the corpus one sentence at a time, learning semantic representations as they go.
•Online learning
•Scales easily to large corpora (processing time grows linearly with the number of sentences, and the work can be parallelized by splitting the corpus).
•In contrast, distributional representations build a co-occurrence matrix from the entire corpus.
•Can we learn semantic representations using corpus-wide co-occurrence statistics?
•Pennington+ (EMNLP 2014), GloVe: Global Vectors for Word Representation
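As a sketch of the corpus-wide statistics GloVe starts from, the following builds a symmetric word-word co-occurrence matrix M with a fixed context window. The window size and the toy corpus are illustrative assumptions, not values from the slide:

```python
from collections import defaultdict

def cooccurrence_matrix(sentences, window=2):
    """Count corpus-wide co-occurrences M[w][c] within a +/- window context."""
    M = defaultdict(lambda: defaultdict(float))
    for sent in sentences:
        for i, w in enumerate(sent):
            lo, hi = max(0, i - window), min(len(sent), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    M[w][sent[j]] += 1.0
    return M

corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
M = cooccurrence_matrix(corpus)
```

Unlike skip-gram's sentence-by-sentence updates, these counts are accumulated once over the whole corpus before any vectors are trained.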
J = \sum_{i,j}^{|V|} f(M_{i,j}) \left( v(x_i)^\top v(x_j) + b_i + b_j - \log M_{i,j} \right)^2

Here f(M_{i,j}) is a frequency-based weight, the squared term is the prediction error, b_i and b_j are bias terms, and M_{i,j} is the co-occurrence frequency.
•Minimizes the squared error between the inner product and the log of the co-occurrence count.
•Reason: we want to characterize meaning via vector differences.
•Key point: log(a/b) = log(a) - log(b)
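A minimal sketch of evaluating this objective over the observed (nonzero) co-occurrences, following the slide's single set of vectors v(x). The tiny vectors, biases, and placeholder weight function below are illustrative assumptions:

```python
import math
import numpy as np

def glove_loss(V, b, M, f):
    """J = sum_{i,j} f(M_ij) * (v_i . v_j + b_i + b_j - log M_ij)^2,
    summed over observed (nonzero) co-occurrence cells only."""
    J = 0.0
    for (i, j), m in M.items():
        err = V[i] @ V[j] + b[i] + b[j] - math.log(m)
        J += f(m) * err ** 2
    return J

# Toy example: one co-occurrence cell, unit weight.
V = {0: np.array([1.0, 0.0]), 1: np.array([1.0, 0.0])}
b = {0: 0.0, 1: 0.0}
M = {(0, 1): 1.0}
loss = glove_loss(V, b, M, f=lambda m: 1.0)
```

In the toy case the inner product is 1, the biases are 0, and log 1 = 0, so the loss is 1.0. Summing only over nonzero cells matters in practice: most of the |V| x |V| matrix is zero, and log 0 is undefined.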
47.
Weighting function f
Figure 1: Weighting function f with α = 3/4.
The performance of the model depends weakly on the cutoff, which we fix to xmax = 100 for all our experiments. We found that α = 3/4 gives a modest improvement over a linear version with α = 1. Although we offer only empirical motivation for choosing the value 3/4, it is interesting that a similar fractional power scaling was found to give the …
Conditions on f:
•It should not over-emphasize low-frequency co-occurrences (a problem with PMI).
•It should grow roughly linearly with frequency.
•It should be roughly constant above some co-occurrence threshold.
3. f(x) should be relatively small for large values of x, so that frequent co-occurrences are not overweighted.
Of course a large number of functions satisfy these properties, but one class of functions that we found to work well can be parameterized as,

f(x) = \begin{cases} (x / x_{\max})^{\alpha} & \text{if } x < x_{\max} \\ 1 & \text{otherwise} \end{cases}
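The piecewise weighting translates directly to code; xmax = 100 and α = 3/4 are the values the excerpt fixes, used here as defaults:

```python
def f_weight(x, x_max=100.0, alpha=0.75):
    """GloVe weighting: (x / x_max)^alpha below the cutoff, 1.0 at or above it."""
    return (x / x_max) ** alpha if x < x_max else 1.0
```

This satisfies the three conditions above: it vanishes as x goes to 0 (down-weighting rare co-occurrences), grows sublinearly but monotonically below the cutoff, and saturates at a constant 1.0 for frequent co-occurrences.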