Tensorflow MNIST convolutional neural networks
Tensorflow MNIST Convolutional Network Tutorial
This project is another tutorial for teaching you artificial neural networks. I hope that my way of presenting the material will help you in the long learning process. All the examples are written in TensorFlow, and as the runtime environment I chose the online Python IDE PLON.io. PLON makes it much easier to share this tutorial with you and to run the computations online without any configuration.
The project presents four different neural nets for MNIST digit classification. The first two are fully connected neural networks and the last two are convolutional networks. Each network is built on top of the previous example, with gradually increasing difficulty, in order to learn more powerful models.
Tensorflow neural network examples
- simple single-layer neural network (one fully connected layer),
- 5-layer fully connected neural network (5 FC NN) in 3 variants,
- convolutional neural network: 3x convNet + 1FC + output, sigmoid activation function,
- convolutional neural network with dropout, ReLU and better weight initialization: 3x convNet + 1FC + output.
Single layer neural network
This is the simplest architecture that we will consider. This feedforward neural network will be our baseline model for the more powerful solutions that follow. We start with a simple model in order to lay the TensorFlow foundations:
- How to work with placeholders.
- What TensorFlow variables are.
- Understanding TensorFlow shapes and dimensions.
- Initializing a TF session.
- Running computations in a loop.
This is a simple one-layer feedforward network with one input layer and one output layer:
- Input layer: 28*28 = 784,
- Output: 10-dimensional vector (10 digits, one-hot encoding).
- input layer - X[batch, 784]
- fully connected - W[784, 10] + b
- one-hot encoded labels - Y[batch, 10]
- model: Y = softmax(X*W + b)
- matrix multiplication: X*W - [batch, 784] x [784, 10] -> [batch, 10]
Training consists of finding good values for the elements of W and b; this is handled automatically by the TensorFlow gradient descent optimizer.
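The forward pass and a single training step can be sketched in NumPy instead of TensorFlow, so that the gradient descent update is written out explicitly. This is an illustrative sketch, not the tutorial's code: the toy data, the batch of 64, the learning rate of 0.01 and the helper names `softmax`, `cross_entropy` and `gd_step` are all hypothetical. It relies on the standard result that for softmax followed by cross-entropy, the gradient with respect to the logits is (probs - labels) divided by the batch size.

```python
import numpy as np

def softmax(logits):
    # Subtracting the row-wise max avoids overflow and leaves the result unchanged.
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def cross_entropy(probs, labels):
    # Mean over the batch of -sum(y * log(p)); the epsilon guards against log(0).
    return -np.mean(np.sum(labels * np.log(probs + 1e-12), axis=1))

def gd_step(X, labels, W, b, learning_rate):
    """One plain gradient descent step on the softmax cross-entropy loss (hypothetical helper)."""
    probs = softmax(X @ W + b)                    # [batch, 784] x [784, 10] -> [batch, 10]
    # For softmax followed by cross-entropy: d(loss)/d(logits) = (probs - labels) / batch.
    grad_logits = (probs - labels) / X.shape[0]
    W = W - learning_rate * (X.T @ grad_logits)   # [784, batch] x [batch, 10] -> [784, 10]
    b = b - learning_rate * grad_logits.sum(axis=0)
    return W, b

# Toy stand-in for an MNIST batch: random pixels and random one-hot labels.
rng = np.random.default_rng(0)
X = rng.random((64, 784))
labels = np.eye(10)[rng.integers(0, 10, size=64)]
W, b = np.zeros((784, 10)), np.zeros(10)

loss_before = cross_entropy(softmax(X @ W + b), labels)
for _ in range(20):
    W, b = gd_step(X, labels, W, b, learning_rate=0.01)
loss_after = cross_entropy(softmax(X @ W + b), labels)
print(loss_before, loss_after)   # loss_after is lower than loss_before
```

With zero weights every class gets probability 0.1, so the starting loss is log(10). In the TensorFlow version the optimizer derives these gradients automatically; writing them by hand only shows what one update does.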
This simple model achieves an accuracy of 0.9237.
Five-layer fully connected neural network
This is an upgraded version of the previous model: between the input and the output we added hidden fully connected layers, so the network now has five fully connected layers in total (four hidden layers plus the output layer). Adding more layers makes the network more expressive but at the same time harder to train. Three new problems can emerge: vanishing gradients, overfitting and longer computation time. In our case, where the dataset is rather small, we did not see these problems at full scale.
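A rough way to see where vanishing gradients come from: the derivative of the sigmoid never exceeds 0.25, so each sigmoid layer multiplies the backpropagated gradient by at most 0.25 (before the weights are taken into account), while the ReLU derivative is exactly 1 for positive inputs. The toy sketch below, in plain Python, deliberately ignores the weight matrices; the helper names are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)          # largest at x = 0, where it equals 0.25

def relu_grad(x):
    return 1.0 if x > 0 else 0.0  # 1 for positive inputs, 0 otherwise

# Best-case factor contributed by the activation derivatives alone when a
# gradient flows back through `depth` layers (weights ignored in this toy bound).
for depth in (1, 3, 5):
    print(depth, sigmoid_grad(0.0) ** depth, relu_grad(1.0) ** depth)
# prints: 1 0.25 1.0 / 3 0.015625 1.0 / 5 0.0009765625 1.0
```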
Different training techniques were invented to deal with these problems. Switching from the sigmoid to the ReLU activation function mitigates vanishing gradients, choosing the Adam optimizer speeds up optimization and thereby shortens training time, and adding dropout helps with overfitting.
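For reference, this is the textbook Adam update rule (Kingma and Ba), sketched in NumPy on a toy quadratic. It is not TensorFlow's internal implementation, which differs in small details such as where epsilon enters. The keyword defaults are the values suggested in the Adam paper; the learning rate of 0.05, the toy objective and the helper name `adam_step` are assumptions for this demo.

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; returns the new parameter and moment estimates (hypothetical helper)."""
    m = beta1 * m + (1 - beta1) * grad          # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2     # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)                # bias correction for the zero initialization
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

# Toy problem: minimize f(w) = sum(w**2), whose gradient is 2*w.
w = np.array([3.0, -2.0])
m, v = np.zeros_like(w), np.zeros_like(w)
for t in range(1, 2001):                        # t starts at 1 for the bias correction
    w, m, v = adam_step(w, 2 * w, m, v, t, lr=0.05)
print(w)  # ends close to [0, 0]
```

Dividing by sqrt(v_hat) gives each parameter its own adaptive step size, which is one reason Adam often needs less learning-rate tuning than plain gradient descent.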
This model was implemented in three variants, where each successive variant builds on the previous one and adds some new features:
- Variant 1 is a simple fully connected network with the sigmoid activation function and the gradient descent optimizer,
- Variant 2 uses the more powerful ReLU activation function instead of sigmoid and the better Adam optimizer,
- Variant 3 adds dropout in order to prevent overfitting.
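Dropout as used in variant 3 can be sketched as inverted dropout: during training each unit is kept with probability keep_prob and the kept units are scaled by 1/keep_prob, so nothing has to change at test time. TensorFlow 1.x's `tf.nn.dropout(x, keep_prob)` documents the same scaling; the NumPy helper below and the keep_prob of 0.75 are illustrative assumptions, not the tutorial's settings.

```python
import numpy as np

def dropout(activations, keep_prob, rng, training=True):
    """Inverted dropout (hypothetical helper): zero units at random, rescale the survivors."""
    if not training:
        return activations                      # no dropout at test time
    mask = rng.random(activations.shape) < keep_prob
    # Dividing by keep_prob keeps the expected activation unchanged,
    # so the test-time network needs no rescaling.
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
h = np.ones((1000, 200))                        # e.g. a batch of Y1 activations
dropped = dropout(h, keep_prob=0.75, rng=rng)
print(dropped.mean())                           # close to 1.0
```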
All variants share the same network architecture: five layers with the sizes given below:
- input layer - X[batch, 784]
- 1 layer - W1[784, 200] + b1[200], Y1[batch, 200]
- 2 layer - W2[200, 100] + b2[100], Y2[batch, 100]
- 3 layer - W3[100, 60] + b3[60], Y3[batch, 60]
- 4 layer - W4[60, 30] + b4[30], Y4[batch, 30]
- 5 layer - W5[30, 10] + b5[10], one-hot encoded labels Y5[batch, 10]
- model: Y5 = softmax(Y4*W5 + b5)
- matrix multiplication: Y4*W5 - [batch, 30] x [30, 10] -> [batch, 10]
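The layer sizes above can be checked with a small NumPy forward pass that records the shape after each layer. Everything here is an illustrative assumption (random inputs, a batch of 100, weights drawn from a normal distribution with standard deviation 0.1, zero biases), and ReLU is used as in variant 2.

```python
import numpy as np

rng = np.random.default_rng(0)
sizes = [784, 200, 100, 60, 30, 10]   # layer widths from the list above
batch = 100                            # assumed batch size

# Assumed initialization: small random weights, zero biases.
weights = [rng.normal(0.0, 0.1, size=(n_in, n_out)) for n_in, n_out in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n_out) for n_out in sizes[1:]]

def relu(z):
    return np.maximum(z, 0.0)

def softmax(logits):
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def forward(X, activation=relu):
    """Hidden layers use `activation`; the output layer uses softmax. Returns probs and shapes."""
    shapes = []
    Y = X
    for W, b in zip(weights[:-1], biases[:-1]):
        Y = activation(Y @ W + b)
        shapes.append(Y.shape)
    probs = softmax(Y @ weights[-1] + biases[-1])
    shapes.append(probs.shape)
    return probs, shapes

probs, shapes = forward(rng.random((batch, 784)))
print(shapes)  # [(100, 200), (100, 100), (100, 60), (100, 30), (100, 10)]
```

Passing a sigmoid as `activation` gives the variant 1 forward pass with the same shapes.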
All results are for 5k iterations.
- five-layer fully connected: accuracy=0.9541
- five-layer fully connected with ReLU activation function and Adam optimizer: accuracy=0.9817
- five-layer fully connected with ReLU activation, Adam optimizer and dropout: accuracy=0.9761
As we can see, changing from the sigmoid to the ReLU activation and using the Adam optimizer increases accuracy by over 2.5 percentage points (0.9541 -> 0.9817), which is significant for such a small change. However, adding dropout slightly decreases the final test accuracy (0.9761 vs. 0.9817), but if we compare the test loss and accuracy graphs, we can notice that with dropout the test accuracy graph is much smoother.
Convolutional neural network
A 5-layer neural network with 3 convolutional layers. The input layer has 28*28 = 784 pixels and the output has 10 units (10 digits); output labels use one-hot encoding.
- input layer - X[batch, 784], reshaped to X[batch, 28, 28, 1] for the first convolution
- 1 conv. layer - W1[5, 5, 1, C1] + b1[C1], Y1[batch, 28, 28, C1]
- 2 conv. layer - W2[3, 3, C1, C2] + b2[C2]
- 2.1 max pooling, filter 2x2, stride 2 - downsamples the input by a factor of 2, 28x28 -> 14x14, Y2[batch, 14, 14, C2]
- 3 conv. layer - W3[3, 3, C2, C3] + b3[C3]
- 3.1 max pooling, filter 2x2, stride 2 - downsamples the input by a factor of 2, 14x14 -> 7x7, Y3[batch, 7, 7, C3]
- 4 fully connected layer - W4[7*7*C3, FC4] + b4[FC4], Y4[batch, FC4]
- 5 output layer - W5[FC4, 10] + b5[10], one-hot encoded labels Y5[batch, 10]
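The spatial sizes in this list can be traced with a few lines of plain Python. This sketch assumes stride-1 convolutions with 'SAME' padding (which is what Y1[batch, 28, 28, C1] implies) and 2x2 max pooling with stride 2 and 'VALID' padding; the channel counts C1=4, C2=8, C3=12 and FC4=200 and the helper names are hypothetical, chosen only for the example.

```python
def pooled(size, window=2, stride=2):
    """Spatial size after max pooling with 'VALID' padding."""
    return (size - window) // stride + 1

def conv_net_shapes(C1, C2, C3, FC4, side=28):
    """Per-example output shapes; stride-1 'SAME' convolutions keep the spatial size."""
    y1 = (side, side, C1)                 # conv 5x5x1xC1, no pooling
    side = pooled(side)                   # conv 3x3xC1xC2, then 2x2 pool: 28 -> 14
    y2 = (side, side, C2)
    side = pooled(side)                   # conv 3x3xC2xC3, then 2x2 pool: 14 -> 7
    y3 = (side, side, C3)
    flat = side * side * C3               # input width of the fully connected layer
    return {"Y1": y1, "Y2": y2, "Y3": y3, "W4": (flat, FC4), "Y4": (FC4,), "Y5": (10,)}

print(conv_net_shapes(C1=4, C2=8, C3=12, FC4=200))
# {'Y1': (28, 28, 4), 'Y2': (14, 14, 8), 'Y3': (7, 7, 12), 'W4': (588, 200), 'Y4': (200,), 'Y5': (10,)}
```

The flattened size 7*7*C3 is why W4 has 7*7*C3 rows: Y3 must be reshaped to [batch, 7*7*C3] before the fully connected layer.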
References and further reading
- CS231n Convolutional Neural Networks for Visual Recognition
- TensorFlow and deep learning without a PhD - a very good tutorial showing how to build a modern MNIST conv net. It was my inspiration for this tutorial :)
- What is the difference between a Fully-Connected and Convolutional Neural Network?
- TensorFlow Examples by aymericdamien - GitHub repository with very useful and not-so-obvious TensorFlow examples
- Awesome tensorflow - A curated list of dedicated resources
- Projects with #Tensorflow tag in plon.io
- Dive Into TensorFlow, Part I: Getting Started with TensorFlow
- Dive Into TensorFlow, Part II: Basic Concepts
- Dive Into TensorFlow, Part III: GTX 1080+Ubuntu16.04+CUDA8.0+cuDNN5.0+TensorFlow
- Dive Into TensorFlow, Part IV: Hello MNIST
MetaFlow blog tutorial
[2017-04-4 21:30] v0.2.2 : Add results for single layer neural network and deeper 5 layer NN. Update some information about TensorFlow examples.
[2017-03-29 14:34] v0.2.1 : Add simple convolutional neural network, change plot title generation
[2017-03-28 21:01] v0.1.2 : Update Readme, add 5 layer Neural Network example description
[2017-03-26 21:16] v0.1.1 : First version, single layer neural network and 5 layers deep neural network with dropout and relu