Completed • $15,000 • 1,604 teams

Click-Through Rate Prediction

Tue 18 Nov 2014

– Mon 9 Feb 2015 (54 days ago)

All Forums » Click-Through Rate Prediction

4 Idiots' Solution & LIBFFM

Start Watching View all posts

« Prev
Topic » Next
Topic

<1 23>

votes

Congratulations to Yu-Chin, Wei-Sheng, Yong and Michael!

There have been several questions about the relationship between FM and FFM. Here are my thoughts about the differences and similarities.

Notation

m categorical variables (="fields")
k is the factorization dimension of FM
k' is the factorization dimension of FFM

Models (slightly simplified)

FM is defined as

y(x) = sum_i sum_j>i 〈v_i,v_j〉 x_i x_j

FFM is defined as

y(x) = sum_i sum_j>i 〈v^J_i,v^I_j〉 x_i x_j

The difference between both models is that FFM assumes that the factors between interactions (e.g. v_i of (I,J) and v_i of (I,L)) are independent whereas FM uses a shared parameter space.

Number of parameters and costs

FFM has k' * (m-1) parameters per predictor variable.
FM has k parameters per predictor variable.
FFM has a runtime complexity of k' * m * (m-1) / 2 = O(k' * m^2) per training example
FM has a runtime complexity of k * m = O(k * m) per training example (because the nested sums can be decomposed due to parameter sharing).

That means from a cost point of view, an FFM with dimensionality k' should be compared to an FM with an m times larger dimension, i.e. k=k'*m. With this choice both FFM and FM have the same number of parameters (memory costs) and the same runtime complexity.

Expressiveness

FFM and FM have different assumptions on the interaction matrix V. But given a large enough k and k', both can represent any possible second order polynomial model.

The motivation of FM and FFM is to approximate the (unobserved) pairwise interaction matrix W of polynomial regression by a low rank solution V*V^t = W. FM and FFM have different assumptions how V looks like. FFM assumes that V has a block structure:

| v^2_1 v^1_2 0 0 0 | | v^3_1 0 v^1_3 0 0 | | v^4_1 0 0 v^1_4 0 | V(FFM) = | v^5_1 0 0 0 v^1_5 | | 0 v^3_2 v^2_3 0 0 | | 0 v^4_2 0 v^2_4 0 | | ... |

FM does not assume such a structure:

V(FM) = | v_1 v_2 v_3 v_4 v_5 |

(Note that v are not scalars but vectors of length k' (for FFM) or of length k (for FM). Also to shorten notation, one entry v in the matrices above represents all the v vectors of a "field"/ categorical variable.)

If the assumption of FFM holds, then FFM needs less parameters than FM to describe V because FM would need parameters to capture the 0s.

If the assumption of FM holds, then FM needs less parameters than FFM to describe V because FFM would need to repeat values of vectors as it requires separate parameters.

#41 | Posted 11 hours ago

Permalink

Steffen Rendle

Posts 21 | Votes 36
Joined 30 Nov '11 | Email User

0 votes	In case if somebody is interested I have posted the R slidify presentation about my solution: http://mech.math.msu.su/~efimov/2015_04_ICMS_Click/html/index.html#/slide-1
	#42 \| Posted 3 hours ago Permalink Dmitry Efimov Competition 5th \| Overall 11th Posts 101 \| Votes 146 Joined 12 Jan '12 \| Email User

<1 23>

Reply

Flagging is a way of notifying administrators that this message contents inappropriate or abusive content. Are you sure this forum post qualifies?

Start Watching View all posts « Back to forum

Completed • $15,000 • 1,604 teams

Click-Through Rate Prediction

Dashboard

Competition Forum

4 Idiots' Solution & LIBFFM

Reply