Loss Function:

A loss function measures how far the model's predictions are from the actual values, and training uses that measurement to update the model's parameters. After each run over the data, the weights are adjusted with the aim of making the cumulative error computed by the loss function lower than in the previous run. In our case, mean_squared_error is used because we want a model whose predictions do not deviate far from the actual demand. A large deviation means that we either overstock or understock the product, and both outcomes are costly for the business. We prefer many slight inaccuracies over a few large misses, and mean squared error reflects that preference: because each error is squared before averaging, a large difference between the actual and predicted demand value is penalized far more heavily than a small one.
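
To make the penalty concrete, here is a minimal sketch in plain Python. The demand figures are invented for illustration and are not taken from our data, and the helper functions are written out by hand rather than imported from the training library. Both forecasts below miss by the same total amount (20 units), so mean absolute error cannot tell them apart, while mean squared error scores the single large miss four times worse.

```python
def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mean_absolute_error(actual, predicted):
    """Average of the absolute differences, shown only for comparison."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical demand values, chosen only to illustrate the effect.
actual_demand = [100, 100, 100, 100]
small_misses = [105, 95, 105, 95]      # off by 5 units on every item
one_large_miss = [100, 100, 100, 120]  # exact on three items, off by 20 on one

mse_small = mean_squared_error(actual_demand, small_misses)    # 25.0
mse_large = mean_squared_error(actual_demand, one_large_miss)  # 100.0
mae_small = mean_absolute_error(actual_demand, small_misses)   # 5.0
mae_large = mean_absolute_error(actual_demand, one_large_miss) # 5.0

print(f"MSE, four small misses: {mse_small}")
print(f"MSE, one large miss:    {mse_large}")
print(f"MAE, four small misses: {mae_small}")
print(f"MAE, one large miss:    {mae_large}")
```

Under mean squared error, the forecast with one 20-unit miss is treated as clearly worse than the forecast with four 5-unit misses, which matches the business preference for slight inaccuracies over a large overstock or understock.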