Question: Is deep learning different from logistic regression units?
Answer: In reality, a deep learning model can be thought of as nothing new, just an extension or better representation of the "ML model stacking" concept. It is fundamentally a stack of logistic regression units.
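To make the "stacked logistic regression units" idea concrete, here is a minimal NumPy sketch. It assumes sigmoid activations everywhere and random, untrained weights; the names `logistic_unit` and `stacked_forward` are made up for illustration and do not come from any library.

```python
import numpy as np

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def logistic_unit(x, w, b):
    """One logistic regression unit: weighted sum followed by a sigmoid."""
    return sigmoid(x @ w + b)

def stacked_forward(x, layers):
    """Feed x through a list of (W, b) pairs.

    Each column of W is one logistic unit, so a layer is several units
    side by side, and the network is layers stacked on top of each other.
    """
    h = x
    for W, b in layers:
        h = sigmoid(h @ W + b)
    return h

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))                  # 4 examples, 3 input features
layers = [
    (rng.normal(size=(3, 5)), np.zeros(5)),  # hidden layer: 5 logistic units
    (rng.normal(size=(5, 1)), np.zeros(1)),  # output layer: 1 logistic unit
]
y = stacked_forward(x, layers)               # one value in (0, 1) per example
```

With a single layer that has a single column, `stacked_forward` reduces to plain logistic regression, which is the point of the answer above.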

Question: Why do deep learning models perform so well?
Answer: Manually designed features are often incomplete or over-specified, and they take a long time to design and validate. With deep learning, the machine learns the features by itself, as long as we give it the basic building blocks of the data, whether that is an image, text, or anything else. This makes the approach much more flexible and often more accurate.

Question: When was the first breakthrough for deep learning models?
Answer: Deep learning first shone around 2010, in speech recognition work from Geoffrey Hinton's group at the University of Toronto. Before that, people were using Gaussian mixture models, hidden Markov models, and many other complex techniques, and they were honing their skills to improve accuracy by a few percentage points a year. With deep learning, they got a 33% improvement in speech recognition, which was huge.

Question: When was the next deep learning breakthrough?
Answer: The next breakthrough was in image recognition, where deep learning achieved a 37% error reduction in the ImageNet challenge, a benchmark on which performance had stalled for a long time.

Deep learning is often characterized by the use of many layers for feature extraction and transformation. The network takes the input, performs feature extraction as necessary, and yields the output.

Across problem domains ranging from time series analysis to image recognition to speech recognition, what usually varies is the composition of the layers.

In deep learning, unlike in shallow learning methods, the input is transformed by each layer of the network. Each layer learns a particular trait of the data, and these traits are ultimately combined across layers and used to make the final prediction.

Dropout in Deep Learning:

It is one of the most important regularization techniques. With it in place, the risk that our deep learning model overfits is much lower. During training, we compute all the outputs of a layer and then randomly set about half of them to zero.

This helps the learned model stay robust and avoid overfitting.

If dropout does not help, we should probably try a bigger network.
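
The dropout step described above can be sketched in a few lines of NumPy. This is an illustrative sketch, not how any particular framework implements it. The function name `dropout` and its parameters are hypothetical, and the rescaling by `1 / keep_prob` (often called inverted dropout) is an assumption added here so that the expected activation stays the same between training and prediction; the notes above only describe the zeroing step.

```python
import numpy as np

def dropout(activations, drop_prob=0.5, training=True, rng=None):
    """Randomly zero a fraction of a layer's outputs during training.

    drop_prob is the probability of zeroing each output (0.5 matches the
    "half of the outputs" described above). At prediction time the
    activations pass through unchanged.
    """
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    if not training or drop_prob == 0.0:
        return activations
    rng = np.random.default_rng() if rng is None else rng
    keep_prob = 1.0 - drop_prob
    # Boolean mask: True where the output survives, False where it is dropped.
    mask = rng.random(activations.shape) < keep_prob
    # Assumption: rescale survivors so the expected value is unchanged.
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
h = np.ones((2, 8))                            # pretend layer outputs
h_train = dropout(h, drop_prob=0.5, rng=rng)   # entries are either 0.0 or 2.0
h_eval = dropout(h, training=False)            # unchanged at prediction time
```

Because a different random mask is drawn on every training step, no single unit can be relied on too heavily, which is where the regularizing effect comes from.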