Hi, I'm ThadeusB.

I code stuff. I raise bees. I game.

Artificial Neural Networks - Multi-Layer Perceptron

This is my first real tutorial, so bear with me; I am sure it will undergo many revisions. Please feel free to ask any questions in the comments.

Note: Whether I finish this tutorial and provide source code depends on the response I get to the article. If you liked it and would like me to write more, please let me know!

What will be covered:

  • Overview - Of Neural Networks
  • Multi-Layer Perceptron - A kind of neural network
  • Feed Forward Algorithm - Calculating an answer
  • Back Propagation Algorithm - Supervised Training
  • Application - Neural Network written in C#.NET


Let's start from the beginning. Artificial intelligence is, simply put, the effort to make a computer seem more human. The field is vast, and every month there are innovations that bring computers one step closer to their makers. Video game opponents, optical character recognition, facial recognition, voice synthesis, data mining, robotic surgery, and much more are accomplished by using artificial intelligence.

One of the most popular ways to emulate human decision making is by simulating the human brain. What better way to make a computer more human than to design it like a human?

Our brain is composed of a network of cells called neurons, and these neurons are linked to each other by dendrites. Long story short, a neuron receives input from all the dendrites it is connected to and does some math. If the result is above a certain voltage level, the neuron fires and sends its value out to each neuron that is connected to it. By some act of God, that is how we have thoughts.

When we simulate this process in a computer, we call it an artificial neural network (ANN for short). There are several kinds of ANN; these are just some of them:

  • Feed-Forward
    • Perceptron
    • Multi-Layer Perceptron
  • Feedback
    • Hopfield Net
    • Self-Organizing Maps

The main difference between them is how the neurons are arranged and how each neuron interacts with the others. Since this article is about multi-layer perceptrons (ML-perceptrons), that is all I will be explaining.

There are two ways of teaching an ANN to do what you expect it to do, which is to make a reasonable decision:

  • Supervised Learning
  • Un-supervised Learning

With supervised training you give the network some input data, such as an image of a letter. It spits out an answer as to what it thinks it is (which will be wrong until it is trained). You tell it what the correct answer was, and it fixes itself so the next time it sees that same image, it will be closer to the correct answer.

In un-supervised learning the network makes its own assumptions about the input data, and over time may provide answers to a problem that had not even been thought of. A Self-Organizing Map is an example of an un-supervised network that organizes like data into groups.

Basics of a Multi-Layer Perceptron:

An ML-perceptron is an ANN composed of layers of neurons, in which each neuron in a layer is connected to every neuron in the previous layer.

*Note: Dendrites are also referred to as Weights*

In an ML-perceptron, there are at least three layers:

  • 1 Input
  • 1 - ∞ Hidden
  • 1 Output

In short:

The input layer receives data about something. Each input neuron represents a piece of data, such as one pixel of an image of a letter.

The hidden layers do some calculations.

The output layer provides an answer as a percentage. Each output neuron represents a different possibility.

In Long:

The input layer is the layer of neurons that will be receiving the data about something. For character recognition there would need to be one neuron for each pixel in the image of the character. Obviously there can be only one input layer. Input is usually normalized to a double between 0.0 and 1.0.
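As a minimal sketch of the normalization step, the Python snippet below scales raw pixel values into the 0.0 to 1.0 range. The 8-bit grayscale range (0 to 255), the function name `normalize_pixels`, and the tiny 2x2 image are assumptions made for illustration only.

```python
def normalize_pixels(pixels, max_value=255):
    """Scale raw pixel intensities into the 0.0-1.0 range."""
    return [p / max_value for p in pixels]

# A hypothetical 2x2 grayscale image, flattened row by row.
raw_pixels = [0, 51, 204, 255]

# One normalized value per input neuron.
inputs = normalize_pixels(raw_pixels)
print(inputs)
```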

The hidden layer is a layer used for calculations. This is what makes up the "brain" of an ML-perceptron. There can be any number of hidden layers, each with any number of neurons. One hidden layer is usually enough, though different configurations could be tested to get the best performance (learning rate vs. correctness).

The output layer holds the results of what the network has calculated. A character recognition network that only needs to see the letters of the English alphabet would need 26 output neurons (one for each letter). A neuron's output value is a normalized double between 0.0 and 1.0, where 0 is false and 1 is true. It can also be thought of as a percentage of correctness for what that neuron represents. In the OCR network, if the neuron representing the capital letter T output .85, that would mean the network thinks there is an 85% chance the input is a T, while the neuron representing the letter F might output .64, meaning the network thinks there is a 64% chance the input is an F.
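One common way to turn the output layer into a single answer is to pick the neuron with the highest value. The sketch below does that for the T and F example; the helper name `best_guess` and the two-letter label list are hypothetical, not part of any particular library.

```python
def best_guess(outputs, labels):
    """Return the label and value of the strongest output neuron."""
    index = max(range(len(outputs)), key=lambda i: outputs[i])
    return labels[index], outputs[index]

# Only two of the output neurons are shown, for brevity.
labels = ["F", "T"]
outputs = [0.64, 0.85]

letter, confidence = best_guess(outputs, labels)
print(letter, confidence)
```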

The Example:

The following example will go through a network that is trained to solve the XOR problem.

XOR Problem: The result is true when exactly one of the two inputs is true.

1 XOR 1 = 0
1 XOR 0 = 1
0 XOR 1 = 1
0 XOR 0 = 0
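
The truth table above can also be written down as training data. This is a small sketch in Python; the name `xor_samples` and the (inputs, answer) layout are assumptions chosen for illustration.

```python
# Each sample pairs the two inputs with the expected XOR answer.
# Python's ^ operator computes XOR on integers.
xor_samples = [((a, b), a ^ b) for a in (1, 0) for b in (1, 0)]

for (a, b), answer in xor_samples:
    print(a, "XOR", b, "=", answer)
```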

The neural network that can solve such a problem is shown below (figure: Multi-Layer-Perceptron-XOR).

Since the XOR problem has only two binary inputs, the network will have two neurons in the input layer. Since the XOR problem's output is just one answer, there will be only one output neuron.

How A Multi-Layer Perceptron Calculates An Answer:

The calculations happen inside each individual neuron. A neuron needs three pieces of information.

  • Output values of every neuron in the previous layer
  • Value of every dendrite (weight) connecting the current neuron to every corresponding previous neuron
  • Activation Function

Every dendrite (weight) in an ML-perceptron starts out as a randomized double value between 0.0 and 1.0.

The weights of the network are what make up the magic. These are the main determiners of what the final answer is. Later, when the network is to be trained or taught, these dendrites(weights) are what will be altered.

The input layer does not have any calculations performed on it, therefore the input to the input layer neurons is also their output.

Starting with the first hidden layer and using the input layer's outputs, for each neuron in the layer, sum each of the previous layer's output values multiplied by its corresponding weight. Pass the sum through an activation function. Continue to the next layer, and do this for the output layer as well. The values of the neurons in the output layer are the answer from the network. This can be represented mathematically as follows:

v = Sigmoid( Σ( pv * pw ) )


  • v = output value of the neuron
  • Σ = summation
  • pv = a previous neuron's output value
  • pw = the dendrite or weight value connecting to the previous neuron
  • Sigmoid = activation function
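
The formula above can be sketched in a few lines of Python. The function names and the example weights are made up for illustration (a real network would start from random weights), and the bias is left out here to match the formula as written.

```python
import math

def sigmoid(total):
    """Squash a weighted sum into the 0.0-1.0 range."""
    return 1.0 / (1.0 + math.exp(-total))

def neuron_output(prev_values, weights):
    """v = Sigmoid( sum( pv * pw ) ) for a single neuron."""
    total = sum(pv * pw for pv, pw in zip(prev_values, weights))
    return sigmoid(total)

def layer_output(prev_values, layer_weights):
    """Compute every neuron in a layer; one weight list per neuron."""
    return [neuron_output(prev_values, weights) for weights in layer_weights]

inputs = [1.0, 0.0]                        # the input layer passes values through
hidden_weights = [[0.5, 0.4], [0.9, 1.0]]  # hypothetical weights, 2 hidden neurons
output_weights = [[0.3, 0.8]]              # hypothetical weights, 1 output neuron

hidden = layer_output(inputs, hidden_weights)
answer = layer_output(hidden, output_weights)
print(hidden, answer)
```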

There is a problem with this structure, though. If all the inputs to the network are zero, every weighted sum is zero no matter what the weights are, so the network cannot learn what all-zero inputs mean while it is training. For this reason a pseudo input must be added to each layer, excluding the output layer. This bias acts as a neuron whose value is always 1 or -1; it has a dendrite (weight) attached to it and is included in the summation.
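
A short sketch of why the bias matters, again in Python. The parameter names `bias_weight` and `bias_value` and the example weights are assumptions for illustration. Without a bias, all-zero inputs always produce 0.5 regardless of the weights; with a bias, the weight attached to it can move the result.

```python
import math

def sigmoid(total):
    return 1.0 / (1.0 + math.exp(-total))

def neuron_output(prev_values, weights, bias_weight=None, bias_value=1.0):
    """Weighted sum plus an optional bias term, passed through the sigmoid."""
    total = sum(pv * pw for pv, pw in zip(prev_values, weights))
    if bias_weight is not None:
        # The bias behaves like an extra neuron whose value never changes.
        total += bias_value * bias_weight
    return sigmoid(total)

zeros = [0.0, 0.0]
without_bias = neuron_output(zeros, [0.3, 0.9])                 # weights have no effect
with_bias = neuron_output(zeros, [0.3, 0.9], bias_weight=-2.0)  # bias weight shifts the sum
print(without_bias, with_bias)
```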

The Activation:

The activation function is probably the most important calculation in a neural network. Simply put, this function decides whether the neuron should "fire" and send a signal to the receiving neurons. There are a few types of activation functions:

  • Threshold
  • Piecewise
  • Sigmoid
    • Logistic
    • Hyperbolic Tangent
    • Algebraic Sigmoid

Note: A sigmoid function is also referred to as a squashing function, since it normalizes its input into a fixed range.

Typically an ML-perceptron uses a sigmoid function as its activation.

The sigmoid function looks as follows:

threshold = 1

sum = Σ( pv * pw )

sigmoid = 1 / ( 1 + e ^ ( -( sum ) / threshold ) )


  • e = mathematical constant e
  • sum = weighted sum of the previous neurons' output values
  • threshold = activation value (dividing by it controls how steep the curve is)
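
Here is the same function as a small Python sketch, assuming the default threshold of 1 used above. The sample sums in the loop are arbitrary values picked to show the squashing behavior.

```python
import math

def sigmoid(total, threshold=1.0):
    """sigmoid = 1 / (1 + e ^ (-sum / threshold))"""
    return 1.0 / (1.0 + math.exp(-total / threshold))

# Large negative sums approach 0, large positive sums approach 1,
# and a sum of exactly 0 lands in the middle at 0.5.
for total in (-4.0, 0.0, 4.0):
    print(total, sigmoid(total))

# A larger threshold flattens the curve, so the same sum gives a weaker response.
print(sigmoid(4.0, threshold=2.0))
```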

How A Multi-Layer Perceptron Learns:

With randomized weights, it will be luck if the network spits out a correct answer. The answer will seemingly be randomized as well.

These weights are what will be altered to make the network spit out an answer closer to what is expected. However, to know how to change these weights, the current output of the neural network has to be known so that we can calculate the error rate. The error rate is how far off the neural network is from the correct answer.

The training process happens on a per-input basis. If the network is trained on only one kind of data, such as the letter "A", it will not know what to do when it comes across the letter "B". All it will know is how much "B" is like "A".

Training may have to be repeated hundreds or even thousands of times on each kind of input until the error rate of the network is within a satisfactory range, usually less than 1%.

The process is as follows:

  • Compute resulting output
  • Compute ERROR for neurons in output layer
  • Compute ERROR for all other neurons
  • Compute CHANGE for all weights

To calculate the error for the output layer:

Error is equal to the desired output minus the actual output.

β = d - o


  • β = Error
  • d = Desired output
  • o = Actual output
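
In Python, the output-layer error could be sketched as below. The function name `output_errors` and the sample values are illustrative assumptions.

```python
def output_errors(desired, actual):
    """Error for each output neuron: desired output minus actual output."""
    return [d - o for d, o in zip(desired, actual)]

# The network answered 0.75 where the correct answer was 1.0.
errors = output_errors([1.0], [0.75])
print(errors)
```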

To calculate the error for all other neurons:

Take the weights, outputs, and errors of each neuron in the right-side layer.

Sum the products of them all. Each term also includes the factor output * ( 1 - output ), which is the slope of the sigmoid at that neuron's output.

βj = Σ ( wk * ok ( 1 - ok ) βk )


  • βj = Error for current neuron
  • wk = Weight for neuron in right-side layer
  • ok = Output for neuron in right-side layer
  • βk = Error for neuron in right-side layer
  • j = Current Neuron
  • k = Neuron in right-side layer
  • Σ = summation
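
The hidden-neuron error can be sketched in Python as follows. The function and argument names are hypothetical, and the numbers are chosen only so the arithmetic is easy to follow by hand.

```python
def hidden_error(weights_to_right, right_outputs, right_errors):
    """Error for one neuron: sum of wk * ok * (1 - ok) * Bk over the right-side layer."""
    return sum(
        w * o * (1.0 - o) * b
        for w, o, b in zip(weights_to_right, right_outputs, right_errors)
    )

# One neuron on the right: weight 0.5, output 0.5, error 0.5.
# 0.5 * 0.5 * (1 - 0.5) * 0.5 = 0.0625
error_j = hidden_error([0.5], [0.5], [0.5])
print(error_j)
```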

To calculate the changes for the weights:

For a weight of a neuron:

  • Take the output of the neuron in the left-side layer that the weight is connected to.
  • Multiply it by the slope and the error of the current neuron.
  • Multiply in a learning rate (how fast the network will learn). Values of .20-.25 seem to work well.

Δwj = r * oi * oj ( 1 - oj ) βj

wj = wj + Δwj


  • Δwj = Change in weight for current neuron
  • wj = Weight of the current neuron
  • r = Learning rate of the network ( how fast the network learns ). .20-.25 are good.
  • oi = Output for neuron in the left-side layer
  • oj = Output for the current neuron (the right-side end of the weight)
  • βj = Error for current neuron
  • j = Current Neuron
  • i = Neuron in left-side layer
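
A minimal Python sketch of the weight update, using a learning rate of 0.25 from the range suggested above. The function name `weight_change` and the sample values are assumptions for illustration.

```python
def weight_change(rate, left_output, current_output, current_error):
    """Change in weight = r * oi * oj * (1 - oj) * Bj"""
    slope = current_output * (1.0 - current_output)
    return rate * left_output * slope * current_error

weight = 0.5
# r = 0.25, oi = 1.0, oj = 0.5, Bj = 0.5
# 0.25 * 1.0 * 0.5 * (1 - 0.5) * 0.5 = 0.03125
delta = weight_change(0.25, 1.0, 0.5, 0.5)
weight = weight + delta
print(delta, weight)
```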

Figure: Neural Network With Many Layers

Putting it all together:

wj = wj + Δwj

Δwj = r * oi * oj ( 1 - oj ) βj

βj = Σ ( wk * ok ( 1 - ok ) βk )

βz = d - o
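
The four formulas above can be combined into one small training loop. What follows is a sketch in Python under several assumptions that are not from the article: a 2-2-1 network for XOR, a bias value of 1 stored as the last weight of each neuron, a fixed random seed so runs are repeatable, and 5000 passes over the data. All function names are made up for illustration.

```python
import math
import random

def sigmoid(total):
    return 1.0 / (1.0 + math.exp(-total))

def weighted_sum(values, weights):
    return sum(v * w for v, w in zip(values, weights))

def make_network(n_in, n_hidden, n_out, rng):
    # One weight per incoming neuron plus one trailing weight for the bias.
    hidden_w = [[rng.uniform(0.0, 1.0) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    output_w = [[rng.uniform(0.0, 1.0) for _ in range(n_hidden + 1)] for _ in range(n_out)]
    return hidden_w, output_w

def feed_forward(network, inputs):
    hidden_w, output_w = network
    hidden = [sigmoid(weighted_sum(list(inputs) + [1.0], ws)) for ws in hidden_w]
    outputs = [sigmoid(weighted_sum(hidden + [1.0], ws)) for ws in output_w]
    return hidden, outputs

def train_once(network, inputs, desired, rate=0.25):
    hidden_w, output_w = network
    hidden, outputs = feed_forward(network, inputs)
    # Output layer error: Bz = d - o
    out_err = [d - o for d, o in zip(desired, outputs)]
    # Hidden error: Bj = sum( wk * ok * (1 - ok) * Bk ), computed before any weight changes
    hid_err = [
        sum(output_w[k][j] * outputs[k] * (1.0 - outputs[k]) * out_err[k]
            for k in range(len(outputs)))
        for j in range(len(hidden))
    ]
    # Weight change: r * oi * oj * (1 - oj) * Bj, added to each weight
    for k, ws in enumerate(output_w):
        slope = outputs[k] * (1.0 - outputs[k])
        for i, o_i in enumerate(hidden + [1.0]):
            ws[i] += rate * o_i * slope * out_err[k]
    for j, ws in enumerate(hidden_w):
        slope = hidden[j] * (1.0 - hidden[j])
        for i, o_i in enumerate(list(inputs) + [1.0]):
            ws[i] += rate * o_i * slope * hid_err[j]

def total_error(network, samples):
    # Sum of squared errors of the single output neuron over all samples.
    return sum((desired[0] - feed_forward(network, inputs)[1][0]) ** 2
               for inputs, desired in samples)

samples = [((1, 1), [0]), ((1, 0), [1]), ((0, 1), [1]), ((0, 0), [0])]
network = make_network(2, 2, 1, random.Random(1))
error_before = total_error(network, samples)
for _ in range(5000):
    for inputs, desired in samples:
        train_once(network, inputs, desired)
error_after = total_error(network, samples)
print("squared error before:", error_before, "after:", error_after)
```

With only two hidden neurons, training is not guaranteed to reach all four correct answers; a different seed, more passes, or more hidden neurons may be needed.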

How A Multi-Layer Perceptron Can Be Used:

A few uses for a neural network include optical character recognition, facial recognition, texture analysis, data validation, sales forecasting, etc.

For a more extensive list, visit Alyuda.com.

A tutorial on writing a neural network with C# will be coming soon!

The project will be a simple optical character recognition program.
