LR with basis
- Lecture: S1-LRSelectExtend
- Version: current
- Please to Read: L5_Poly_Regression.ipynb
- Recorded Videos: M1 + M2
- Advanced to Read: S1-nonparametric
- Extra Notes to Read: NonLinearR + ELS Ch5
Note: the following markdown text was generated automatically from the corresponding PowerPoint lecture file. Errors and misformatting therefore do exist (a lot!).
- Notebook Resources: notebook/L5-Poly-Regression.ipynb
- Notebook Resources: notebook/L5-RBF-regressionLR.ipynb
Understanding Linear Regression with Basis Functions Expansion
Study Guide
This study guide is designed to review your understanding of Linear Regression with Basis Functions Expansion, as presented in the provided lecture material.
I. Core Concepts of Linear Regression
Definition of Regression
What is the primary characteristic of the y variable in a regression task?
Goal of Linear Regression with Basis Expansion
How does Linear Regression (LR) achieve the ability to model non-linear relationships using basis functions?
Key Components of a Machine Learning Task
Identify and briefly describe the four fundamental components mentioned for any machine learning task:
- Task: What needs to be predicted
- Representation: How data and functions are structured
- Score Function: Metric to evaluate model performance
- Search/Optimization: Algorithms to find optimal parameters
Parameters vs. Models
Differentiate between “models” and “parameters” in the context of machine learning.
Optimization Methods
List the three primary methods mentioned for optimizing the Sum of Squared Error:
- Normal Equation
- Gradient Descent (GD)
- Stochastic Gradient Descent (SGD)
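As a rough illustration of the second method, the sketch below runs batch gradient descent on the Sum of Squared Error J(θ) = ‖Φθ − y‖². The function name `gradient_descent_sse`, the learning rate, the step count, and the toy data are all assumptions made for this example, not values from the lecture.

```python
import numpy as np

def gradient_descent_sse(Phi, y, lr=0.01, n_steps=5000):
    """Minimize J(theta) = ||Phi @ theta - y||^2 by batch gradient descent."""
    theta = np.zeros(Phi.shape[1])
    for _ in range(n_steps):
        residual = Phi @ theta - y        # prediction error on every example
        grad = 2.0 * Phi.T @ residual     # gradient of the SSE w.r.t. theta
        theta -= lr * grad                # step against the gradient
    return theta

# Toy data (assumed): y = 1 + 2x exactly, with a bias column in Phi.
x = np.array([0.0, 1.0, 2.0, 3.0])
Phi = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x

theta_gd = gradient_descent_sse(Phi, y)   # approaches [1, 2]
```

SGD would follow the same update but estimate the gradient from one example (or a small batch) per step instead of the full training set.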
“KEY” Insight
Explain the fundamental implication of having predefined basis functions on the nature of the learning problem.
II. Basis Functions Expansion
Purpose of Basis Functions
Why are basis functions introduced in linear regression?
Types of Basis Functions
Name at least three different types of basis functions mentioned in the lecture:
- Polynomial basis functions
- Radial Basis Functions (RBFs)
- Trigonometric basis functions
Polynomial Regression
What is the specific form of the basis function φ(x) for polynomial regression up to degree two (d=2)?
For degree 2: φ(x) = [1, x, x²]ᵀ
How is the prediction ŷ formulated using polynomial basis functions?
ŷ = θᵀφ(x) = θ₀ + θ₁x + θ₂x²
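A minimal sketch of the degree-2 case, assuming a scalar input x and NumPy. The helper name `poly_features` and the weight values in `theta` are hypothetical and chosen only to make the arithmetic easy to check.

```python
import numpy as np

def poly_features(x, degree):
    """Map each scalar x to phi(x) = [1, x, x^2, ..., x^degree]."""
    x = np.asarray(x, dtype=float)
    return np.vander(x, N=degree + 1, increasing=True)

# Assumed weights theta = [theta0, theta1, theta2], for illustration only.
theta = np.array([1.0, -2.0, 0.5])

x = np.array([0.0, 1.0, 2.0])
Phi = poly_features(x, degree=2)   # rows are phi(x_i)^T = [1, x_i, x_i^2]
y_hat = Phi @ theta                # theta0 + theta1*x + theta2*x^2 for each x_i
```

The prediction is a curve in x, but it is still a plain dot product between θ and the expanded features.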
Radial Basis Functions (RBFs)
Define what an RBF is in terms of its dependency.
RBFs are functions whose output depends only on the distance from a central point.
Describe the characteristic behavior of a Gaussian RBF as distance from its center increases.
The output decreases exponentially as the distance from the center increases, creating a bell-shaped curve.
What are the “hyperparameters” that users need to define for RBF basis functions?
- Centers (μⱼ): The central points where RBFs are located
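- Widths (λⱼ): Parameters controlling the spread of the RBF; in the Gaussian form used here, a larger λⱼ makes the function decay faster and so gives a narrower bump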
Provide the general mathematical form of a Gaussian RBF, specifically φⱼ(x).
φⱼ(x) = exp(-λⱼ ‖x - μⱼ‖²)
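The Gaussian RBF expansion φⱼ(x) = exp(-λⱼ ‖x - μⱼ‖²) can be sketched as follows, assuming inputs stored as an (n, p) array, centers as an (m, p) array, and one λⱼ per center. The function name `gaussian_rbf_features` and the example centers and widths are illustrative assumptions.

```python
import numpy as np

def gaussian_rbf_features(X, centers, lam):
    """Return Phi with Phi[i, j] = exp(-lam[j] * ||X[i] - centers[j]||^2)."""
    X = np.asarray(X, dtype=float)              # shape (n, p)
    centers = np.asarray(centers, dtype=float)  # shape (m, p)
    diff = X[:, None, :] - centers[None, :, :]  # shape (n, m, p)
    sq_dist = np.sum(diff ** 2, axis=2)         # squared distances, shape (n, m)
    return np.exp(-np.asarray(lam, dtype=float) * sq_dist)

# Assumed hyperparameters: two centers on the real line, one width per center.
X = np.array([[0.0], [1.0], [2.0]])
centers = np.array([[0.0], [2.0]])
lam = np.array([1.0, 0.5])

Phi = gaussian_rbf_features(X, centers, lam)    # design matrix, shape (3, 2)
```

Each column peaks at 1 where the input equals that column's center and falls toward 0 as the input moves away.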
III. Parametric vs. Non-Parametric Models
Parametric Learning Algorithm
Define a parametric learning algorithm, using (unweighted) linear regression as an example. What is a key characteristic regarding the need for training data after fitting?
Parametric algorithms have a fixed number of parameters that are learned from data. Once trained, the training data can be discarded as predictions are made using only the learned parameters.
Non-Parametric Learning Algorithm
Define a non-parametric learning algorithm, providing examples like K-Nearest Neighbor (KNN) or Locally Weighted Linear Regression. How does the “amount of knowledge” required to represent the hypothesis differ from parametric models?
Non-parametric algorithms require the entire training dataset to be kept for making predictions. The amount of knowledge grows with the size of the training set.
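The contrast can be sketched with a toy example: the linear predictor below uses only the fitted θ, while the KNN predictor has to consult the stored training set on every call. The data, the function names, and the choice k=2 are assumptions for illustration.

```python
import numpy as np

# Toy training set (assumed for illustration).
x_train = np.array([0.0, 1.0, 2.0, 3.0])
y_train = np.array([1.0, 3.0, 5.0, 7.0])

# Parametric: fit theta once; after that, prediction needs only theta.
Phi = np.column_stack([np.ones_like(x_train), x_train])
theta, *_ = np.linalg.lstsq(Phi, y_train, rcond=None)

def predict_linear(x_new):
    return theta[0] + theta[1] * x_new          # no reference to the training set

# Non-parametric: KNN regression averages the targets of the k nearest
# training points, so x_train and y_train must be kept around.
def predict_knn(x_new, k=2):
    nearest = np.argsort(np.abs(x_train - x_new))[:k]
    return y_train[nearest].mean()

y_lin = predict_linear(1.5)
y_knn = predict_knn(1.5)
```

Storing `theta` costs the same no matter how many examples were used to fit it, whereas the KNN predictor's storage grows with the training set.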
IV. Mathematical Notations and Formulas
General Prediction Equation
Write down the general equation for ŷ (predicted output) using basis functions φ(x) and weights θ.
ŷ = θᵀφ(x)
Normal Equation
Provide the formula for calculating the optimal weight vector θ* using the Normal Equation, given φ(x) and y.
θ* = (ΦᵀΦ)⁻¹Φᵀy
where Φ is the design matrix with rows φ(xᵢ)ᵀ
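A minimal NumPy sketch of the Normal Equation, assuming ΦᵀΦ is invertible. The function name `fit_normal_equation` and the noiseless quadratic toy data are assumptions for illustration; the code solves the linear system (ΦᵀΦ)θ = Φᵀy rather than forming the inverse explicitly, which gives the same θ* when the inverse exists.

```python
import numpy as np

def fit_normal_equation(Phi, y):
    """Solve (Phi^T Phi) theta = Phi^T y for theta."""
    return np.linalg.solve(Phi.T @ Phi, Phi.T @ y)

# Toy data (assumed): noiseless quadratic y = 1 - 2x + 0.5x^2.
x = np.linspace(-2.0, 2.0, 9)
y = 1.0 - 2.0 * x + 0.5 * x ** 2

# Design matrix whose rows are phi(x_i)^T = [1, x_i, x_i^2].
Phi = np.column_stack([np.ones_like(x), x, x ** 2])

theta_star = fit_normal_equation(Phi, y)   # recovers [1, -2, 0.5] up to rounding
```

When ΦᵀΦ is singular or badly conditioned, a least-squares routine such as `np.linalg.lstsq` is one commonly used alternative.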
Quiz: Linear Regression with Basis Functions
Instructions: Answer each question in 2-3 sentences.
- How does Linear Regression with basis functions expansion allow for the modeling of non-linear relationships, despite being fundamentally a “linear” model?
- What is the significance of the “KEY” insight mentioned in the lecture regarding predefined basis functions?
- Describe the primary goal of the “Search/Optimization” component in a machine learning task, as it relates to the score function.
- If you are performing polynomial regression with a degree up to three, what would be the form of the basis function vector φ(x)?
- Explain why Radial Basis Functions (RBFs) are often described as “bell-shaped” or having an output that decreases with distance from a center.
- What role do the “centers” and “widths” play in defining Radial Basis Functions?
- Differentiate between the “Task” and “Representation” components of a machine learning problem.
- Briefly explain the main difference in how parametric and non-parametric learning algorithms utilize training data after the initial fitting phase.
- Which optimization methods, other than the Normal Equation, are mentioned for finding the optimal regression coefficients?
- In the context of the general prediction equation ŷ = θᵀφ(x), what do θ and φ(x) represent, respectively?
Quiz Answer Key
- Non-linear Modeling: Linear Regression with basis functions models non-linear relationships by transforming the input features x into a new (often higher-dimensional) feature space using non-linear basis functions φ(x). The relationship between the transformed features φ(x) and the output y remains linear, allowing linear regression techniques to be applied.
- KEY Insight: The “KEY” insight is that even when non-linear basis functions are used, the problem of learning the parameters from the data is still considered Linear Regression. This is because the model is linear with respect to the parameters θ, even if it’s non-linear with respect to the original input x.
- Search/Optimization Goal: The “Search/Optimization” component’s primary goal is to find the set of model parameters that minimize (or maximize, depending on the objective) the score function. With GD or SGD this is done by iteratively adjusting the parameters; the Normal Equation instead gives the minimizer of the Sum of Squared Error in closed form.
- Polynomial Basis Vector: For polynomial regression with a degree up to three, the basis function vector φ(x) would be [1, x, x², x³]ᵀ. This expands the single input feature x into a set of features including its powers.
- RBF Bell Shape: RBFs are “bell-shaped” because their output typically reaches a maximum at a central point and then decreases symmetrically as the input moves further away from that center. For Gaussian RBFs, the output decays exponentially in the squared distance from the center, creating a characteristic bell-like curve.
- RBF Centers and Widths: In RBFs, “centers” (μⱼ) define the point in the input space where the basis function’s influence is maximal, while “widths” (λⱼ) determine how rapidly the function’s output decreases as the input moves away from that center, thus controlling the spread or local influence of the RBF.
- Task vs Representation: “Task” defines what needs to be predicted (e.g., y is continuous for regression). “Representation” refers to how the input data x and its potential transformations f() are structured for the model, essentially defining the features used for prediction.
- Parametric vs Non-parametric Data Usage: Parametric algorithms, once trained, can discard the training data as predictions are made solely using the learned, fixed parameters. Non-parametric algorithms, in contrast, require the entire training dataset to be kept available for making future predictions, as their “knowledge” grows with the data size.
- Other Optimization Methods: Besides the Normal Equation, the lecture also mentions Gradient Descent (GD) and Stochastic Gradient Descent (SGD) as optimization methods for finding optimal regression coefficients by minimizing the Sum of Squared Error.
- Prediction Equation Components: In ŷ = θᵀφ(x), θ represents the vector of regression coefficients (weights) that the model learns from the data. φ(x) represents the basis function expansion of the input x, which transforms the original features into a potentially higher-dimensional space.