Technical Terms

  1. Tensor : A tensor is an n-dimensional array. E.g. for MNIST images, a [200, 784] array of 200 images of 28*28 pixels, each flattened, is a tensor.

Fig. Input feature tensor for MNIST

2. Graph : A graph is a collection of interacting operations in TensorFlow.

Why a graph? : In Python, for efficient numerical computation we use libraries such as NumPy, which perform expensive operations such as matrix multiplication outside Python and so give us speed.

However, what should we do for large data-sets and deep learning models, which require extremely heavy numeric operations? Could we still use NumPy? Unfortunately NumPy is not enough, because the I/O overhead of transferring data to and from Python becomes much greater and slows everything down. This is especially bad when we want to run operations across GPUs or in a distributed manner.

Hence TensorFlow solves the problem with graphs, which let it push the expensive numeric operations outside Python in batches.
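As a minimal sketch (assuming TensorFlow 1.x), the snippet below first describes a small graph of operations and only executes it when a session runs it, so the actual matrix multiplication happens in the backend rather than in Python:

import tensorflow as tf

# Build the graph: nothing is computed yet, we only describe the operations.
a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
b = tf.constant([[5.0], [6.0]])
product = tf.matmul(a, b)  # expensive ops like matmul run outside Python

# Run the graph: the session executes it on the backend (CPU/GPU),
# so data does not bounce back and forth for every single operation.
with tf.Session() as sess:
    print(sess.run(product))  # [[17.], [39.]]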

Tensor Basics

  1. Symbolic Arrays / Placeholders : N-d arrays whose values are provided at runtime. For example, in the MNIST example, any number of MNIST images supplied by the user for processing.

This is defined as a placeholder:

x = tf.placeholder(tf.float32, [None, 784])

Here None means that the input can be any number of images, but that each image must be of shape 784, i.e. 28 * 28 linearly flattened.
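As a small illustrative sketch (the doubling op is just a hypothetical stand-in), the same placeholder accepts batches of any size because its first dimension is None:

import numpy as np
import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 784])
doubled = x * 2.0  # any op using x, purely for illustration

with tf.Session() as sess:
    small = np.zeros((5, 784), dtype=np.float32)    # batch of 5 images
    large = np.zeros((200, 784), dtype=np.float32)  # batch of 200 images
    # Both feeds are valid because the first dimension is None (any batch size).
    print(sess.run(doubled, feed_dict={x: small}).shape)   # (5, 784)
    print(sess.run(doubled, feed_dict={x: large}).shape)   # (200, 784)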

2. Variables : A variable is a modifiable tensor. It can be used and modified during computation, for example the weights and biases:

W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))

3. Cross-Entropy : Cross entropy measures how inefficient our prediction is versus the actual label.

H_{y'}(y) = -\sum_i y'_i \log(y_i)
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]))
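As a side note (a hedged alternative, not what the line above uses): computing softmax and log separately can be numerically unstable, so a common pattern is to keep the unnormalised scores in a tensor, here hypothetically named logits, and let TensorFlow combine the two steps:

logits = tf.matmul(x, W) + b   # unnormalised scores, before the softmax
cross_entropy = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=logits))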

Optimisers : We ask TensorFlow to minimise cross_entropy using the gradient descent algorithm with a learning rate of 0.5.

 train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)

Initialise variables : Before we can run it, we need to initialise the variables which is done as

 init = tf.global_variables_initializer()

Launch model  :  Now we can launch the model in a session and run it as

sess = tf.Session()
sess.run(init)

Training Iteration : Since we have such a large dataset, it is not feasible to run the whole dataset through training at once. Besides the computational cost, it may also lead to overfitting, so instead we run many iterations with smaller batches of data. Using small batches of random data is called stochastic training, here stochastic gradient descent. This approach is cheap, gives much the same benefit, and furthermore produces a more robust, less overfitted model.

Here we run the training step 1000 times, taking 100 random examples in each batch.
for i in range(1000):
  batch_xs, batch_ys = mnist.train.next_batch(100)
  sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
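Putting the pieces together, a minimal end-to-end sketch might look like the following (assuming TensorFlow 1.x and the tutorial's input_data helper for downloading MNIST):

from tensorflow.examples.tutorials.mnist import input_data
import tensorflow as tf

# Load MNIST with one-hot encoded labels (downloads the data on first run).
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

# Placeholders and variables as defined above.
x = tf.placeholder(tf.float32, [None, 784])
W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
y = tf.nn.softmax(tf.matmul(x, W) + b)
y_ = tf.placeholder(tf.float32, [None, 10])

cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), reduction_indices=[1]))
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)

sess = tf.Session()
sess.run(tf.global_variables_initializer())

# 1000 stochastic training steps with batches of 100 images each.
for i in range(1000):
  batch_xs, batch_ys = mnist.train.next_batch(100)
  sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})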

Explained in terms of the MNIST example

Step 1 : Plan the Tensor implementation

  1. Determine Input : Model the input as a tensor, as in the figure above.


  2. Determine Output : Determine the desired output, as shown below.

Fig. MNIST one-hot encoded output
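For instance (a small illustrative snippet, with hypothetical example labels), each digit label becomes a 10-element vector with a 1 at the label's index:

import tensorflow as tf

labels = tf.constant([3, 0, 7])          # raw digit labels
one_hot = tf.one_hot(labels, depth=10)   # one row of length 10 per label, 1 at the label index

with tf.Session() as sess:
    print(sess.run(one_hot))
    # e.g. the first row is [0. 0. 0. 1. 0. 0. 0. 0. 0. 0.] for the digit 3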

3. Training Model :

  • 1 regression output : y[1,1] = x[1,784] * W[784,1]
  • 10-class output : y[1,10] = x[1,784] * W[784,10] (same input data, one weight column per class); see the shape sketch after this list
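A quick shape check (a sketch, assuming TensorFlow 1.x; W1 and W10 are illustrative names) confirms the matrix dimensions above:

import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 784])   # flattened input images
W1 = tf.Variable(tf.zeros([784, 1]))          # single regression output
W10 = tf.Variable(tf.zeros([784, 10]))        # one weight column per class
y1 = tf.matmul(x, W1)    # shape [None, 1]
y10 = tf.matmul(x, W10)  # shape [None, 10]
print(y1.get_shape(), y10.get_shape())  # (?, 1) (?, 10)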

Fig. Softmax regression as a vector equation

2. Layers : Layers can be any network layers, including the softmax function. For example, in the case of digit recognition, if we want the probability of each digit from 0 to 9, we add the softmax function as the last layer.
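For the MNIST model above, this last layer can be written on top of the linear part (with x, W and b as defined earlier):

y = tf.nn.softmax(tf.matmul(x, W) + b)  # probabilities over the 10 digit classes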
