MACHINE LEARNING

1 Duality (40 points)

In class, we discussed the maximum entropy model. To learn the posterior probabilities $\Pr(y \mid x) = p(y \mid x)$ for $y = 1, \ldots, K$ given a set of training examples $(x_i, y_i)$, $i = 1, \ldots, n$, we can maximize the entropy of the posterior probabilities subject to a set of constraints, i.e.,

$$
\begin{aligned}
\max_{p} \quad & -\sum_{i=1}^{n} \sum_{y=1}^{K} p(y \mid x_i) \ln p(y \mid x_i) \\
\text{s.t.} \quad & \sum_{y=1}^{K} p(y \mid x_i) = 1, \quad i = 1, \ldots, n, \\
& \frac{1}{n} \sum_{i=1}^{n} \delta(y, y_i)\, f_j(x_i) = \frac{1}{n} \sum_{i=1}^{n} p(y \mid x_i)\, f_j(x_i), \quad j = 1, \ldots, d, \; y = 1, \ldots, K,
\end{aligned}
\tag{1}
$$

where $\delta(y, y_i) = 1$ if $y_i = y$ and $0$ otherwise, and $f_j(x_i)$ is a feature function. Let us consider $f_j(x_i) = [x_i]_j$, i.e., the $j$-th coordinate of $x_i$. Derive the dual of the above maximum entropy model. How is this dual problem related to logistic regression?
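As a hedged outline of how such a derivation could proceed (a sketch, not an official solution), one can attach multipliers to the constraints and eliminate $p$. The multipliers $\lambda_{y,j}$ and $\mu_i$, the weight vectors $w_y$, and the normalizer $Z_i$ are notation introduced here for illustration and do not appear in the problem statement; the nonnegativity constraint $p(y \mid x_i) \ge 0$ is assumed to be inactive at the optimum.

```latex
% Lagrangian: \lambda_{y,j} for the moment constraints, \mu_i for normalization
\mathcal{L}(p, \lambda, \mu)
  = -\sum_{i=1}^{n}\sum_{y=1}^{K} p(y \mid x_i)\ln p(y \mid x_i)
  + \sum_{y=1}^{K}\sum_{j=1}^{d} \lambda_{y,j}\,\frac{1}{n}\sum_{i=1}^{n}
      \bigl[p(y \mid x_i) - \delta(y, y_i)\bigr]\,[x_i]_j
  + \sum_{i=1}^{n} \mu_i \Bigl(\sum_{y=1}^{K} p(y \mid x_i) - 1\Bigr)

% Stationarity in p(y | x_i), writing w_y = \lambda_y / n:
-\ln p(y \mid x_i) - 1 + w_y^{\top} x_i + \mu_i = 0
\quad\Longrightarrow\quad
p(y \mid x_i) = \frac{\exp(w_y^{\top} x_i)}{Z_i},
\qquad Z_i = \sum_{y'=1}^{K} \exp(w_{y'}^{\top} x_i)

% Substituting back, the terms in p w_y^T x_i cancel and the dual problem is
\min_{w_1, \ldots, w_K} \;
  \sum_{i=1}^{n} \Bigl[\ln Z_i - w_{y_i}^{\top} x_i\Bigr]
  \;=\; \min_{w_1, \ldots, w_K} \; -\sum_{i=1}^{n} \ln p_w(y_i \mid x_i)
```

Under these assumptions, the dual objective is the negative log-likelihood of a multinomial (softmax) logistic regression model with one weight vector per class, so solving the dual amounts to unregularized maximum-likelihood training of that model. No bias term appears unless one coordinate of $x_i$ is constant.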