Why?

Having a large number of features generally helps improve model accuracy. For example, compare using a person's height alone to compute their weight with using both their height and their BMI. However, having too many features introduces another complexity, known as the “Curse of Dimensionality”. With more features, some data mining algorithms may fail to converge at all. In addition, as the number of dimensions increases, the model has more parameters to learn, which ultimately means we need more rows of data to fit it.
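One rough way to see why more dimensions call for more rows is to count how many regions of the feature space have to be covered. The sketch below assumes each feature is split into 10 bins and that we would like at least one row per combination of bins. Both the bin count and the function name `cells_to_cover` are illustrative assumptions, not part of any library.

```python
def cells_to_cover(num_features, bins_per_feature=10):
    """Number of distinct bin combinations for the given feature count.

    Assumption: every feature is discretised into the same number of bins.
    """
    return bins_per_feature ** num_features


# The count grows exponentially with the number of features.
for num_features in (1, 2, 3, 6):
    print(num_features, "features ->", cells_to_cover(num_features), "cells")
```

Each added feature multiplies the number of cells by the bin count, so a data set that covers one feature densely becomes sparse very quickly as features are added.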

Let's take a simple example of a regression model:

y = a*x1 + b, where x1 = BMI / height

and y = weight

In this example there is a single coefficient “a” (plus the intercept “b”) to estimate, so a small data set is enough to yield a model. But since “x1” is the BMI-to-height ratio, the same information could also have been represented as two individual features:

y = c*x2 + d*x3 + b, where x2 = BMI

and x3 = height

Now, to compute the same weight, there are two coefficients (c and d) to estimate instead of one (a), so we need at least one more data point than in the earlier case.
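The effect of the extra coefficient can be checked with a small sketch. Counting the intercept b as well, the two-feature model has three unknowns (c, d, b), so two rows cannot pin it down while a third row can. The observations below are made-up numbers chosen for easy arithmetic, not real BMI or height measurements, and the helper names `predict` and `fits` are hypothetical.

```python
def predict(coeffs, features):
    """Linear model; the last entry of coeffs is the intercept b."""
    *weights, intercept = coeffs
    return sum(w * x for w, x in zip(weights, features)) + intercept


def fits(coeffs, observations):
    """True if the model reproduces every (features, target) pair exactly."""
    return all(predict(coeffs, x) == y for x, y in observations)


# Hypothetical observations ((x2, x3), y), generated from c=1, d=2, b=3.
two_rows = [((1, 0), 4), ((0, 1), 5)]
three_rows = two_rows + [((1, 1), 6)]

candidate_a = (1, 2, 3)  # the coefficients that generated the data
candidate_b = (4, 5, 0)  # a different model that also matches two_rows

# With two rows, both candidates fit, so the model is not determined.
both_fit_two_rows = fits(candidate_a, two_rows) and fits(candidate_b, two_rows)

# The third row rules out candidate_b.
only_a_fits_three = fits(candidate_a, three_rows) and not fits(candidate_b, three_rows)

print(both_fit_two_rows, only_a_fits_three)
```

The same argument applied to y = a*x1 + b, which has two unknowns, shows that it needs one row fewer.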

Furthermore, the fact that two parameters are involved means that gradient descent has to update both of them, and the learning rate may have to be tuned to suit both, which increases the training time and the complexity of the model. In the worst case, with a large number of parameters, the model may find it extremely difficult or even impossible to learn all of them.
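To make the point about the learning rate concrete, here is a minimal sketch of plain batch gradient descent on the two-feature model y = c*x2 + d*x3 + b with a mean squared error loss. The four rows are made-up numbers generated from c=1, d=2, b=3, and the function names and the two learning rates are illustrative assumptions rather than recommended settings.

```python
def mse(params, rows, targets):
    """Mean squared error of y = c*x2 + d*x3 + b on the data."""
    c, d, b = params
    errors = [(c * x2 + d * x3 + b - y) ** 2 for (x2, x3), y in zip(rows, targets)]
    return sum(errors) / len(rows)


def gradient_descent(rows, targets, learning_rate, steps):
    """Batch gradient descent starting from c = d = b = 0."""
    c = d = b = 0.0
    n = len(rows)
    for _ in range(steps):
        grad_c = grad_d = grad_b = 0.0
        for (x2, x3), y in zip(rows, targets):
            err = c * x2 + d * x3 + b - y
            grad_c += 2 * err * x2 / n
            grad_d += 2 * err * x3 / n
            grad_b += 2 * err / n
        # Every parameter is updated with the same learning rate.
        c -= learning_rate * grad_c
        d -= learning_rate * grad_d
        b -= learning_rate * grad_b
    return c, d, b


# Hypothetical data generated from c=1, d=2, b=3.
rows = [(0, 0), (1, 0), (0, 1), (1, 1)]
targets = [3, 4, 5, 6]

initial_loss = mse((0.0, 0.0, 0.0), rows, targets)
small_step = gradient_descent(rows, targets, learning_rate=0.1, steps=2000)
large_step = gradient_descent(rows, targets, learning_rate=1.0, steps=50)

print("initial loss:", initial_loss)
print("lr=0.1 loss:", mse(small_step, rows, targets))
print("lr=1.0 loss:", mse(large_step, rows, targets))
```

On this data the smaller learning rate settles close to the generating coefficients, while the larger one overshoots and the loss grows instead of shrinking. A rate that suits one set of features may need retuning once more parameters are added.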