Paper notes: Translation-based Recommendation
Inspired by knowledge-graph embedding (e.g., TransE), which models a relation (edge) between two entities as a translation vector.
The basic idea is to model a user's sequential item-selection behavior as vector addition in the embedding space: the next item's embedding should lie near the previous item's embedding plus a user-specific translation vector. With i the previously chosen item, j the next item to predict, and u the user:

γj ≈ γi + Tu
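As a minimal sketch of this idea, the next item can be predicted by moving from the current item's embedding along the user's translation vector and picking the nearest candidate. All embedding values below are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

n_items, dim = 5, 3
gamma = rng.normal(size=(n_items, dim))   # item embeddings (hypothetical values)
T_u = rng.normal(size=dim)                # user u's translation vector

i = 2                                     # the item the user just chose
query = gamma[i] + T_u                    # predicted location of the next item

# Rank candidates by L2 distance to gamma_i + T_u (smaller = more likely next).
dists = np.linalg.norm(gamma - query, axis=1)
next_item = int(np.argmin(dists))
```

In a real system the distances would be combined with the item bias βj introduced below before ranking.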
To handle the user cold-start problem, the paper decomposes the translation vector as:

Tu = t + tu

Here t is the global translation vector shared by all users (capturing average behavior), and tu is a personalized offset (similar to a residual). When a new user enters the recommendation system, tu = 0, so Tu falls back to t.
Similarity between γi + Tu and γj is measured by a distance d(γi + Tu, γj). To ensure that items falling in the neighborhood of γi + Tu are also close to each other, the distance must satisfy the triangle inequality; the paper therefore uses L1 or L2 distance. Inner products of vectors cannot be used because they violate the triangle inequality.
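A small numeric check illustrates the point: L2 distance satisfies the triangle inequality, while a negated inner product treated as a "distance" does not (the vectors here are arbitrary examples):

```python
import numpy as np

x = np.array([1.0, 0.0])
y = np.array([0.8, 0.1])
z = np.array([0.9, -0.1])

def l2(a, b):
    return np.linalg.norm(a - b)

# L2 satisfies d(x, z) <= d(x, y) + d(y, z).
l2_holds = l2(x, z) <= l2(x, y) + l2(y, z)

# Negated inner product as a "distance" fails: with x = y = z = u,
# d(x, z) = -1 but d(x, y) + d(y, z) = -2, and -1 <= -2 is false.
def neg_dot(a, b):
    return -np.dot(a, b)

u = np.array([1.0, 0.0])
violated = neg_dot(u, u) > neg_dot(u, u) + neg_dot(u, u)
```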
The probability of the next item j, given user u and the current user-selected item i:

P(j | u, i) ∝ βj − d(γi + Tu, γj)
Here βj is an item-specific bias term, which helps cope with the power-law (long-tail) distribution of item popularity.
The other point worth noting is that the embedding vectors γi are constrained to a subspace Ψ of Φ, which helps relieve curse-of-dimensionality issues. (E.g., if Ψ is the unit L2-ball, then γi is projected via γi ← γi / max(1, ||γi||).)
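The projection onto the unit L2-ball is a one-liner; vectors already inside the ball are left unchanged, and vectors outside are scaled back to the boundary (the example vectors are arbitrary):

```python
import numpy as np

def project_to_unit_ball(v):
    """Project v onto the unit L2-ball: v / max(1, ||v||)."""
    return v / max(1.0, np.linalg.norm(v))

inside = np.array([0.3, 0.4])    # norm 0.5: left unchanged
outside = np.array([3.0, 4.0])   # norm 5.0: scaled down to norm 1
```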
Rank Optimization:

Θ̂ = argmax_Θ Σ ln σ(p̂(j | u, i) − p̂(j' | u, i)) − λ Ω(Θ)

The probability that user u ranks the chosen item j above a non-chosen item j' is estimated by the sigmoid σ of the score difference (sequential BPR). During training, samples (u, i, j) and negatives j' are drawn randomly (SGD), and Ω(Θ) is an L2 regularizer.
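A single SGD step of this pairwise objective can be sketched as follows. This is a minimal sketch under the squared-L2 score p̂(j | u, i) = βj − ||γi + Tu − γj||², with hand-derived gradients and made-up parameter values; only the positive-item update is shown (the negative item and Tu are updated analogously):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(1)
dim = 3

# Hypothetical parameters for one (u, i, j, j') training sample.
gamma_i, gamma_j, gamma_jn = rng.normal(size=(3, dim))
T_u = rng.normal(size=dim)
beta_j, beta_jn = 0.0, 0.0
lr, lam = 0.05, 0.01   # learning rate and L2-regularization weight

def score(gamma_item, beta_item):
    # p_hat(item | u, i) = beta - ||gamma_i + T_u - gamma_item||^2
    return beta_item - np.sum((gamma_i + T_u - gamma_item) ** 2)

margin = score(gamma_j, beta_j) - score(gamma_jn, beta_jn)
p = sigmoid(margin)     # estimated P(j > j' | u, i)
g = 1.0 - p             # gradient of ln sigmoid(margin) w.r.t. margin

# One SGD ascent step on the positive-item parameters.
beta_j += lr * (g - lam * beta_j)
gamma_j += lr * (g * 2.0 * (gamma_i + T_u - gamma_j) - lam * gamma_j)
```

After each update, the embeddings would also be projected back onto Ψ as described above.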