Polynomial regression

A common misconception is that linear regression can only be used to fit a linear relationship. In fact, we can fit more complicated functions of the explanatory variables by defining new features that are functions of the existing features. A common class of models is the polynomial, with a d-th degree polynomial being of the form

$$\hat{y}_d(x) = a_d x^d + a_{d-1} x^{d-1} + \cdots + a_1 x + b$$

with the d + 1 parameters $\beta = (a_d, \ldots, a_1, b)^T$. So d = 1 corresponds to a line, d = 2 to a quadratic, d = 3 to a cubic, and so forth.

In this problem, you will build a series of functions that fit polynomials of different degrees to a dataset. You will then use these functions to determine the best fit to the dataset by comparing the models of different degrees visually against a scatterplot of the data, and make a prediction for an unseen sample. More specifically:

1. Complete the functions in polyfit.py, which accepts as input a dataset to be fit and polynomial degrees to be tried, and outputs a list of fitted models. The specifications for the main, feature_matrix, and least_squares functions are contained as comments in the skeleton code. The key steps are parsing the input data, creating the feature matrix, and solving the least squares equations.

2. Use your completed polyfit.py to find fitted polynomial coefficients for d = 1, 2, 3, 4, 5 on the poly.txt dataset. Write out the resulting estimated functions $\hat{y}_d(x)$ for each d.

3. Use the scatter and plot functions in the matplotlib.pyplot module to visualize the dataset and these fitted models on a single graph (i.e., for each x, plot y, $\hat{y}_1(x), \ldots, \hat{y}_5(x)$). Be sure to vary colors and include a legend so that each curve can be distinguished. What degree polynomial does the relationship seem to follow? Explain.

4. If we measured a new datapoint x = 2, what would be the predicted value $\hat{y}$ of y (based on the polynomial identified as the best fit in Question 3)?

Note that in this problem, you are not permitted to use the sklearn library. You must use matrix operations in numpy to solve the least squares equations.
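As a rough illustration of the last two key steps in item 1, the sketch below builds a feature matrix with columns x^d, ..., x^0 and solves the normal equations (X^T X) β = X^T y using numpy matrix operations. The data is synthetic (generated from a known quadratic, not taken from poly.txt), and the helper names `build_features` and `solve_normal_equations` are hypothetical. It is one possible approach under these assumptions, not necessarily the intended solution to the skeleton.

```python
import numpy as np

def build_features(x, d):
    """Hypothetical helper: one row [x^d, x^(d-1), ..., x^0] per sample."""
    return [[xi ** p for p in range(d, -1, -1)] for xi in x]

def solve_normal_equations(X, y):
    """Hypothetical helper: solve (X^T X) beta = X^T y for beta."""
    X = np.array(X, dtype=float)
    y = np.array(y, dtype=float)
    return np.linalg.solve(X.T @ X, X.T @ y)

# Synthetic, noise-free samples from y = 2x^2 - 3x + 1 (assumed for illustration).
x_demo = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
y_demo = [2 * xi ** 2 - 3 * xi + 1 for xi in x_demo]

beta = solve_normal_equations(build_features(x_demo, 2), y_demo)
print(beta)  # close to [2, -3, 1], the generating coefficients
```

Solving the linear system with `np.linalg.solve` avoids forming an explicit matrix inverse. For high degrees, X^T X can become ill-conditioned, so results may lose accuracy.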
Once you have completed polyfit.py, if you run the test case provided, it should output:

[array([ -1.15834068, 22.60822925, 100.79905593]), array([-1.43365571e-02, 1.66770942e+00, -9.05694

import numpy as np

#Return fitted model parameters to the dataset at datapath for each choice in degrees.
#Input: datapath as a string specifying a .txt file, degrees as a list of positive integers.
#Output: paramFits, a list with the same length as degrees, where paramFits[i] is the list of
#coefficients when fitting a polynomial of d = degrees[i].
def main(datapath, degrees):
    paramFits = []

    #fill in
    #read the input file, assuming it has two columns, where each row is of the form [x y] as
    #in poly.txt.
    #iterate through each n in degrees, calling the feature_matrix and least_squares functions to solve
    #for the model parameters in each case. Append the result to paramFits each time.

    return paramFits


#Return the feature matrix for fitting a polynomial of degree d based on the explanatory variable
#samples in x.
#Input: x as a list of the independent variable samples, and d as an integer.
#Output: X, a list of features for each sample, where X[i][j] corresponds to the jth feature
#for the ith sample. Viewed as a matrix, X should have dimension #samples by d+1.
def feature_matrix(x, d):

    #fill in
    #There are several ways to write this function. The most efficient would be a nested list comprehension
    #which for each sample in x calculates x^d, x^(d-1), ..., x^0.

    return X


#Return the least squares solution based on the feature matrix X and corresponding target variable samples in y.
#Input: X as a list of features for each sample, and y as a list of target variable samples.
#Output: B, a list of the fitted model parameters based on the least squares solution.
def least_squares(X, y):
    X = np.array(X)
    y = np.array(y)

    #fill in
    #Use the matrix algebra functions in numpy to solve the least squares equations. This can be done in just one line.

    return B


if __name__ == '__main__':
    datapath = 'poly.txt'
    degrees = [2, 4]

    paramFits = main(datapath, degrees)
    print(paramFits)
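Reading a two-column file and evaluating a fitted polynomial at a new point could look roughly like the sketch below. It is an illustration only: the in-memory sample text stands in for a file laid out like poly.txt (its values are made up), the coefficients are chosen by hand rather than fitted, and `parse_xy` and `predict` are hypothetical helper names. Coefficients are assumed to be ordered from the highest power down to the constant term, as in the parameter vector in the problem statement.

```python
import io
import numpy as np

def parse_xy(source):
    """Hypothetical helper: read whitespace-separated [x y] rows into two lists."""
    data = np.loadtxt(source, ndmin=2)  # ndmin=2 keeps a single row two-dimensional
    return data[:, 0].tolist(), data[:, 1].tolist()

def predict(coeffs, x_new):
    """Hypothetical helper: evaluate a_d*x^d + ... + a_1*x + b at x_new."""
    d = len(coeffs) - 1
    return sum(c * x_new ** (d - j) for j, c in enumerate(coeffs))

# Made-up rows standing in for a file such as poly.txt; a path string also works.
sample = io.StringIO("0.0 1.0\n1.0 0.0\n2.0 3.0\n")
x, y = parse_xy(sample)
print(x, y)

# Hand-picked quadratic 2x^2 - 3x + 1 evaluated at the new point x = 2.
print(predict([2.0, -3.0, 1.0], 2.0))  # 3.0
```

Keeping the coefficient order consistent between the feature matrix columns and the prediction step matters, since reversing it would silently evaluate a different polynomial.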