<aside> 💡
Multiple linear regression predicts a value using more than one input (feature). Each feature has its own weight; the model is a weighted sum of features plus a bias.
</aside>


$$ \begin{aligned} \hat{y} &= w_0 + w_1 x_1 + w_2 x_2 + \dots + w_p x_p \end{aligned} $$
$x_1,x_2,\dots,x_p$ are the features (e.g., square feet, number of bathrooms, number of bedrooms).
$w_0$ is the bias (intercept); $w_1,\dots,w_p$ are feature weights.
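As a minimal sketch of the formula above (the house features and trained weights here are made up for illustration), the prediction is just a dot product of weights and features plus the bias:

```python
import numpy as np

# Hypothetical house: [square feet, bathrooms, bedrooms]
x = np.array([1500.0, 2.0, 3.0])

# Made-up weights w1..w3 and bias w0
w = np.array([120.0, 9000.0, 5000.0])
w0 = 20000.0

# y_hat = w0 + w1*x1 + w2*x2 + w3*x3
y_hat = w0 + w @ x
print(y_hat)  # 233000.0
```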
We train the model by minimizing a loss (commonly mean squared error, MSE) and updating the weights iteratively, e.g., with gradient descent. Each weight takes a step against its own gradient, scaled by the learning rate $\alpha$:

$$ w_{j,\text{new}} = w_{j,\text{old}} - \alpha \,\frac{\partial\,\text{MSE}}{\partial w_j} $$
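The update rule can be sketched end to end on synthetic data (the true coefficients, learning rate, and iteration count below are arbitrary choices for the example):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data generated from y = 3 + 2*x1 - 1*x2 + noise
X = rng.normal(size=(200, 2))
y = 3 + 2 * X[:, 0] - 1 * X[:, 1] + 0.1 * rng.normal(size=200)

w = np.zeros(2)   # feature weights w1, w2
w0 = 0.0          # bias
alpha = 0.1       # learning rate

for _ in range(500):
    y_hat = w0 + X @ w
    err = y_hat - y
    # Gradients of MSE = mean(err^2):
    #   d(MSE)/dw  = (2/n) * X^T err,  d(MSE)/dw0 = (2/n) * sum(err)
    w -= alpha * 2 * X.T @ err / len(y)
    w0 -= alpha * 2 * err.mean()

print(w0, w)  # should approach 3, [2, -1]
```

Because the loss is convex, gradient descent with a small enough step size recovers the generating coefficients up to the noise level.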
More features can improve predictions but can also cause overfitting if the model is too complex.
Practical tips: scale numeric features, consider regularization (L1/L2) to reduce overfitting.