Implementing SGD (Stochastic Gradient Descent) for Linear Regression



Loss function
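A common choice for linear regression, assumed here, is the mean squared error over the n data points (x_i, y_i), with the prediction given by the line m x + c. Under that assumption, the loss E and its partial derivatives with respect to m and c (written D_m and D_c, names chosen for this sketch) would be:

```latex
E = \frac{1}{n} \sum_{i=1}^{n} \bigl( y_i - (m x_i + c) \bigr)^2

D_m = \frac{\partial E}{\partial m} = -\frac{2}{n} \sum_{i=1}^{n} x_i \bigl( y_i - (m x_i + c) \bigr)

D_c = \frac{\partial E}{\partial c} = -\frac{2}{n} \sum_{i=1}^{n} \bigl( y_i - (m x_i + c) \bigr)
```

In the stochastic variant, the sums are typically replaced by a single randomly chosen point (or a small batch) at each step, so each update is cheaper but noisier.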
  1. Start with m = 0 and c = 0. Let L be the learning rate, a tuning parameter that sets the step size taken at each iteration while moving toward a minimum of the loss function. A small value such as 0.0001 keeps the steps stable and accurate, at the cost of slower convergence.
  2. Calculate the partial derivatives of the loss function with respect to m and with respect to c, and plug in the current values of x, y, m and c to obtain the derivative values.
Derivative of the loss function with respect to m
Derivative of the loss function with respect to c
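  • Update the current values of m and c by subtracting L times the corresponding derivative: m = m − L × (derivative with respect to m) and c = c − L × (derivative with respect to c).
  • Repeat this process until the loss function is very small (ideally 0, which would mean a perfect fit). The values of m and c that we are left with are then the optimal values.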
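The steps above can be sketched in plain Python as follows. This is a minimal illustration, not a definitive implementation: it assumes a squared-error loss, updates on one randomly ordered sample at a time, and uses a hypothetical function name (`sgd_linear_regression`), a toy dataset, and a learning rate of 0.1 chosen only because the toy inputs already lie in [0, 1].

```python
import random


def sgd_linear_regression(xs, ys, learning_rate=0.1, epochs=500, seed=0):
    """Fit y ~ m*x + c with per-sample SGD on the squared error (sketch)."""
    rng = random.Random(seed)
    m, c = 0.0, 0.0                       # step 1: start from m = 0, c = 0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)              # visit the samples in random order
        for i in indices:
            error = ys[i] - (m * xs[i] + c)
            d_m = -2.0 * xs[i] * error    # derivative of error^2 w.r.t. m
            d_c = -2.0 * error            # derivative of error^2 w.r.t. c
            m -= learning_rate * d_m      # step against the gradient
            c -= learning_rate * d_c
    return m, c


# Toy data (assumed for illustration): noiseless points on y = 2x + 1.
xs = [0.0, 0.25, 0.5, 0.75, 1.0]
ys = [2.0 * x + 1.0 for x in xs]

m, c = sgd_linear_regression(xs, ys)
print("m =", m, "c =", c)
```

On noiseless data like this, m and c should approach the slope and intercept used to generate the points. On real, noisy data the estimates keep jittering around the minimum, which is one reason a small or decaying learning rate is commonly used.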


  1. Stochastic Gradient Descent is sensitive to feature scaling, so it is highly recommended to scale your data.
  2. For example, scale each attribute on the input vector X to [0,1] or [-1,+1], or standardize it to have mean 0 and variance 1.
  3. Note that the same scaling must be applied to the test vector to obtain meaningful results. This can be easily done using StandardScaler.
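To make the "same scaling" point concrete, here is a small pure-Python sketch of standardization. The helper names (`fit_standardizer`, `standardize`) and the sample values are hypothetical; the key idea is that the mean and standard deviation are learned from the training data only and then reused unchanged on the test data. With scikit-learn, the equivalent pattern is to call `fit` on the training set and `transform` on both sets with the same `StandardScaler` instance.

```python
import statistics


def fit_standardizer(column):
    """Learn the mean and population standard deviation from training values."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)
    # Guard against a constant column, which would cause division by zero.
    return mean, (std if std > 0 else 1.0)


def standardize(column, mean, std):
    """Apply a previously learned mean/std to any column of values."""
    return [(value - mean) / std for value in column]


# Illustrative values (assumed): one feature, split into train and test.
x_train = [10.0, 20.0, 30.0, 40.0]
x_test = [25.0, 50.0]

mean, std = fit_standardizer(x_train)          # fit on training data only
x_train_scaled = standardize(x_train, mean, std)
x_test_scaled = standardize(x_test, mean, std)  # reuse the training statistics

print(x_train_scaled)
print(x_test_scaled)
```

The scaled test values need not have mean 0 or variance 1 themselves; what matters is that they went through the same transformation the model saw during training.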
print("R^2 score for our model :", r2_score(Y_test, pred_test))
out[4]: R^2 score for our model : 0.7226745354368981
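For readers unfamiliar with the metric, the R^2 score compares the model's squared error with that of a baseline that always predicts the mean of the true values: 1 is a perfect fit and 0 is no better than the baseline. The sketch below computes it by hand for a single target; the function name `r2` and the sample numbers are made up for illustration and are unrelated to the score reported above.

```python
def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (sketch)."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))  # model error
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)             # baseline error
    return 1.0 - ss_res / ss_tot


# Illustrative values (assumed), not taken from the dataset in this article.
y_true = [3.0, 5.0, 7.0, 9.0]
y_pred = [2.5, 5.5, 7.0, 9.0]

score = r2(y_true, y_pred)
print("R^2:", score)
```

Read this way, a score of about 0.72 means the model removes roughly 72% of the squared error that the mean-only baseline would make on the test set.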

