What ?

Traditionally, we refer to using either  max pooling, or 1 x 1 convnet or 3 x 3 or 5 x 5 convnet. All these gather different information. Instead of using these one at a time in a layer, what if we could use all of these in a single input layer, then we create more better models.

For example, in the image  below, we use the following on a single Input layer

  1. Average pooling
  2. 1  x 1 Convolution Network
  3. 1 x 1 Convolution Network followed  by 3 x 3 Convolution Network
  4. 1 x 1 Convolution Network followed  by 5 x 5 Convolution Network

And on the top we concatenate all these four. In inception, we can choose the parameters in such a way that despite having all these numerous ConvNets, that the total number of hyper-parameters are simple, thus reducing the chance of over-fitting, while performing significantly better than the conventional convolution networks.

Convolution Network - Inception Modules.JPG

Fig Inception Module REF: Udacity