Homework Assignment #2
Through this homework assignment, you will review the concepts of derivatives and convexity and practice programming logistic regression for binary classification. For Problems 1-3, you may write the solutions by hand and then scan them into a PDF file using a digital scanner. Photographed pages are not accepted. Typing the solutions is optional. For Problem 4, you need to type the report and submit the source code as well. Except for the source code, please combine all your solutions into one PDF file. Submit your homework online via Blackboard, including both the Python code for Problem 4 and the single PDF file for all problems.
Problem 1. Logistic Regression
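A) Which of these is the "logistic loss" function (for the i-th sample)?
(a) $L^{(i)}(\hat{y}^{(i)}, y^{(i)}) = -\left(y^{(i)} \log \hat{y}^{(i)} + (1 - y^{(i)}) \log(1 - \hat{y}^{(i)})\right)$
(b) $L^{(i)}(\hat{y}^{(i)}, y^{(i)}) = |y^{(i)} - \hat{y}^{(i)}|$
(c) $L^{(i)}(\hat{y}^{(i)}, y^{(i)}) = |y^{(i)} - \hat{y}^{(i)}|^2$
(d) $L^{(i)}(\hat{y}^{(i)}, y^{(i)}) = \max(0, y^{(i)} - \hat{y}^{(i)})$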
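B) Derive $\frac{\partial L}{\partial w}$, where $L(\hat{y}^{(i)}, y^{(i)})$ is the logistic loss function, $\hat{y}^{(i)} = \sigma(w^T x^{(i)} + b)$, and $\sigma(\cdot)$ is the logistic function. Show your steps.
C) The loss function of logistic regression is $J(w, b; X, y) = \sum_{i} L(\hat{y}^{(i)}, y^{(i)})$. Derive $\frac{\partial J}{\partial w}$ and $\frac{\partial J}{\partial b}$. Show your steps.
Problem 2. Write the derivatives $f'(x)$ of the following functions of $x$.
A) f(x) = 1r.
B) f(x) = log(14- rrz)
C) $f(x) = x^T a$, where $x \in \mathbb{R}^3$ and $a = [2, 3, 1]^T$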
D) $f(x) = x^T A x$, where $x \in \mathbb{R}^2$ and $A = [2, 3; 1, 2] \in \mathbb{R}^{2 \times 2}$
E) $f(x) = (y - x^T a)^2$, where $x \in \mathbb{R}^3$, $y = 1$, and $a = [2, 3, 1]^T$
Problem 3. The softplus function is given by $f(z) = \log(1 + e^z)$.
A) Plot $f(z)$ as a function of $z$. If you plot by hand, mark a couple of critical points on the curve.
B) Show that $f(z)$ is convex in $z$ by showing that its second derivative is positive for all $z$.
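If you want to sanity-check a hand-derived result for Problems 1-3 before writing it up, a central finite difference is one simple option. The sketch below applies it to the softplus function; the helper names `softplus` and `second_derivative` and the step `h` are illustrative choices, not part of the assignment. A numerical check like this does not replace the required derivation.

```python
import math

def softplus(z):
    """Softplus log(1 + e^z), rearranged so exp() never overflows."""
    # Identity: log(1 + e^z) = max(z, 0) + log(1 + e^(-|z|))
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def second_derivative(f, z, h=1e-3):
    """Central finite-difference estimate of f''(z); h is an assumed step size."""
    return (f(z + h) - 2.0 * f(z) + f(z - h)) / (h * h)

if __name__ == "__main__":
    # Print a few points that could also be marked on a hand-drawn plot.
    for z in (-4.0, -1.0, 0.0, 1.0, 4.0):
        curvature = second_derivative(softplus, z)
        print(f"z={z:+.1f}  f(z)={softplus(z):.4f}  f''(z)~{curvature:.4f}")
```

The same `second_derivative` helper accepts any scalar function, so it can also be pointed at the scalar functions in Problem 2.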
Problem 4. Logistic Regression with Real Dataset
The Spambase Data Set contains email spam data for 4601 email messages. You can download the data from https://archive.ics.uci.edu/ml/datasets/spambase.
(a) Divide the data into a training set and a test set. The training set should contain the first 2/3 of the spam messages and the first 2/3 of the ham (i.e., non-spam) messages. The test set should contain the last 1/3 of the spam messages and the last 1/3 of the ham messages.
(b) Write a logistic regression program (function) using the gradient descent algorithm. Train the weights on the training set, and then test the result on the test set. Compute both the training and test errors, measured as the percentage of incorrect classifications. Experiment with the step size (learning rate).
(c) Next, normalize the features, so that each feature in the training data has mean 0 and variance 1. Then run logistic regression on the normalized data.
(d) In the homework report, include a summary of the results you obtained, the learning rates used, and the training and test errors.
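The split in (a) and the normalization in (c) could be sketched as follows. This is a minimal illustration under stated assumptions: each row is a list of feature values followed by a label in the last position (1 for spam, 0 for ham), the 2/3 cut is rounded down, and the test set is scaled with the training-set mean and standard deviation. The function names are hypothetical, and the code uses only the standard library for portability; a NumPy version would be shorter.

```python
import math

def split_by_class(rows, num=2, den=3):
    """Put the first num/den of each class (in file order) in train, the rest in test."""
    train, test = [], []
    for label in (1, 0):  # assumption: 1 = spam, 0 = ham, label stored last
        group = [r for r in rows if r[-1] == label]
        cut = (num * len(group)) // den  # assumption: round the cut down
        train.extend(group[:cut])
        test.extend(group[cut:])
    return train, test

def fit_standardizer(train_X):
    """Per-feature mean and standard deviation, computed on training data only."""
    n, d = len(train_X), len(train_X[0])
    means = [sum(x[j] for x in train_X) / n for j in range(d)]
    stds = []
    for j in range(d):
        var = sum((x[j] - means[j]) ** 2 for x in train_X) / n
        stds.append(math.sqrt(var) or 1.0)  # guard: constant feature keeps scale 1
    return means, stds

def standardize(X, means, stds):
    """Apply (value - mean) / std feature by feature."""
    return [[(v - m) / s for v, m, s in zip(x, means, stds)] for x in X]
```

A typical use would be to call `fit_standardizer` on the training features, then pass the returned `means` and `stds` to `standardize` for both the training and test features.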
Note: You need to write the code in Python. You will be graded on two artifacts: 1) the figures generated (which must be included in the submitted homework report), and 2) the source code submitted.
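For orientation on part (b), a batch gradient descent loop for logistic regression might look like the sketch below. It is one possible structure, not a reference solution. It assumes labels in {0, 1}, weights initialized to zero, a gradient averaged over the samples, and a fixed number of iterations. The names `train_logistic`, `error_rate`, `step_size`, and `n_iters` are hypothetical. The toy data in the demo stands in for the Spambase features.

```python
import math

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_proba(x, w, b):
    """Predicted probability of the positive class for one sample."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def train_logistic(X, y, step_size=0.1, n_iters=500):
    """Batch gradient descent on the averaged logistic loss; returns (w, b)."""
    m, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0  # assumption: zero initialization
    for _ in range(n_iters):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(xi, w, b) - yi  # prediction minus label
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - step_size * gj / m for wj, gj in zip(w, grad_w)]
        b -= step_size * grad_b / m
    return w, b

def error_rate(X, y, w, b):
    """Percentage of samples whose thresholded prediction differs from the label."""
    wrong = sum(int((predict_proba(xi, w, b) >= 0.5) != bool(yi))
                for xi, yi in zip(X, y))
    return 100.0 * wrong / len(X)

if __name__ == "__main__":
    # Toy, linearly separable data used only to exercise the functions.
    X = [[-2.0], [-1.0], [1.0], [2.0]]
    y = [0, 0, 1, 1]
    for step in (0.01, 0.1, 1.0):
        w, b = train_logistic(X, y, step_size=step)
        print(f"step_size={step}: training error = {error_rate(X, y, w, b):.1f}%")
```

Looping over several `step_size` values, as in the demo, is one way to organize the learning-rate experiment and to collect the error figures for the report.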