The Artificial Neural Network (ANN) is probably the first stop for anyone who enters the field of deep learning. Inspired by the structure of the natural neural network present in our body, an ANN mimics a similar structure and learning mechanism.
An ANN is essentially an algorithm to build an efficient predictive model; it is named as it is because the algorithm, and hence its implementation, resembles a biological neural network. The functionality of an ANN can be explained in five simple steps:
- Read the input data
- Produce the predictive model (A mathematical function)
- Measure the error in the predictive model
- Feed the necessary corrections back into the model repeatedly until a model with the least error is found
- Use this model for predicting the unknown
For a beginner in data science who has gone through the concepts of regression, classification, feature engineering etc. and is now entering the field of deep learning, it is very beneficial to relate the functionality of deep learning algorithms to those familiar concepts.
Before understanding the ANN, let us understand the perceptron, the basic building block of an ANN. 'Perceptron' is the name originally given to a binary classifier. We can, however, view the perceptron as a function which takes certain inputs and produces a linear equation, which is nothing but a straight line. This line can be used to separate easily separable data, as shown in the figure. Remember, though, that in real-world scenarios classes will not be so easily separable.
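To make this concrete, here is a minimal sketch of a perceptron as a binary classifier (the function and variable names here are illustrative, separate from the regression code we build later): it weights the inputs, adds a bias, and applies a step function, so the line w·x + b = 0 acts as the separating boundary.

```python
def perceptron(inputs, weights, bias):
    # weighted sum of the inputs plus the bias: w1*x1 + w2*x2 + ... + b
    weighted_sum = bias + sum(w * x for w, x in zip(weights, inputs))
    # step activation: which side of the line w.x + b = 0 does the point fall on?
    return 1 if weighted_sum >= 0 else 0

print(perceptron([2.0, 3.0], weights=[0.5, -0.25], bias=-0.1))  # prints 1
```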
The structure of a perceptron can be visualised as below:
A typical neural network with multiple perceptrons in it looks like below:
This means generating multiple linear equations at multiple points. These perceptrons can also be called neurons or nodes, which are the basic building blocks of the natural neural network within our body. In the above figure, the first vertical set of 3 neurons is the input layer. The next two vertical sets of neurons are part of the middle layers, usually referred to as hidden layers, and the last single neuron is the output layer. The neural network in the figure is a 3-layered network, because the input layer is generally not counted among the network's layers. Each neuron in the input layer represents an attribute (column) in the input data (i.e., x1, x2, x3 etc.).

What happens in the above network is that the input data is fed to a set of neurons, each of which produces an output. Each of these outputs is fed to other neurons, which in turn produce another output, which is again fed to the output layer. The error calculated at the output layer is sent back into the network to further refine the outputs of each neuron, which are again fed to the neuron in the output layer to produce a more refined output than before. As explained in the 5-step process above, this is repeated until we get an output with minimal error.
The process of producing outputs, calculating errors, and feeding corrections back to produce a better output is often confusing, especially for a beginner, to visualise and understand. Hence, an effort is made here to explain this process with just one neuron and one layer. Once this basic concept is understood, expanding it to a larger neural network is not difficult.
Everyone agrees that simple linear regression is the simplest thing in machine learning, or at least the first thing anyone learns in machine learning. So, we will try to understand this concept of deep learning with simple linear regression too, by solving a regression problem using an ANN.
Implementing ANN for Linear Regression
We have understood from the above that each neuron in the ANN, except in the input layer, produces an output. The output depends on the function we use, generally referred to as the 'activation function'. As ANNs are mainly used for classification purposes, the sigmoid or a similar function is generally used as the activation function. But, as we are now trying to solve a linear regression problem, our activation function here is nothing but a simple linear equation of the form –
y = w₀ + w₁x₁ + w₂x₂ + w₃x₃ + … + wₙxₙ

where x₁, x₂, …, xₙ are the independent attributes in the input data,
w₁, w₂, …, wₙ are the weights (coefficients) of the corresponding attributes, and
w₀ is the bias.
Because our output should be just a single straight line, we should configure our ANN with just 1 neuron. As the output of this 1 neuron is itself the line, this neuron is placed in the output layer. Hidden layers are required only when we try to classify objects using multiple lines (or curves), so we don't need any hidden layers here either.
Hence the ANN to solve a linear regression problem consists of an input layer with all the input attributes and an output layer with just 1 neuron as shown below:
Now we have finalised the structure of our ANN. Our next task is to actually write code to implement it. We will implement this simple ANN from scratch, as that helps in understanding a lot of the concepts underlying the readily available ANN libraries.
Recall the 5 steps mentioned at the beginning. As described there, the process involves feeding input to a neuron in the next layer to produce an output using an activation function; this is called 'feed forward'. After producing the output, the error (or loss) is calculated and a correction is sent back into the network; this is called 'back propagation'. We will also use some standard terminology for our ANN, such as 'network' and 'topology', which we will see in the code. With the terms and terminology we have learnt so far, let us implement the code –
1. Import the required libraries
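The full code is in the Colab notebook linked at the end and may import more; as a minimal sketch, the only library strictly needed here is matplotlib, for the error plot shown later:

```python
import matplotlib.pyplot as plt  # used at the end to plot the error at each step
```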
2. Initialise the weights and other variables
In our approach, we will provide input to the code as a list such as [2,3,1]. The total number of values in the list (its size) indicates the number of layers we want to configure, and each number in the list indicates the number of neurons in that layer. So, the list [2,3,1] indicates a network of 3 layers, in which the first layer consists of 2 neurons, the second layer of 3 neurons, and the output layer of 1 neuron. This structure can be called the 'network topology'. However, as we are solving a regression problem, we just need 1 neuron at the output layer as discussed above, so we simply pass the list [1].
In our approach to building a linear regression neural network, we will use Stochastic Gradient Descent (SGD) as the learning algorithm, because this is the algorithm mostly used even for classification problems with deep neural networks (i.e., multiple layers and multiple neurons). I will assume the reader is already aware of this algorithm and proceed with its implementation.
We will initialise all the weights to zeros. Let us create a class called 'Network' and initialise all the required variables in the constructor as below –
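A minimal sketch of the constructor, assuming a single input attribute (simple linear regression), so the weight list holds just the bias w0 and one coefficient w1; the exact variable names are illustrative:

```python
class Network:
    def __init__(self, topology):
        # topology such as [1] lists the number of neurons per layer;
        # for our regression network it is a single output neuron
        self.topology = topology
        self.weights = [0.0, 0.0]            # [w0 (bias), w1], initialised to zeros
        self.output = [0.0] * sum(topology)  # holds the output of each neuron
        self.errors = []                     # squared error at each step, for plotting
```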
The 'self.output' variable in the above code holds the output of each neuron; it is initialised as a list sized according to our input topology. The remaining variables are pretty self-explanatory.
3. Coding the 'fit' function
We know that the gradient descent algorithm requires a learning rate (eta) and a number of iterations (epochs) as inputs. We will pass these values in a list to the program, along with the training data. Let us build a 'fit' method to construct a predictive model from all the given inputs –
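A sketch of 'fit', assuming the learning rate and epoch count arrive together in a list as described above; the helpers it calls (feed_forward, back_propagate, squared_error) are implemented in the following steps:

```python
    # (inside the Network class)
    def fit(self, X, y, params):
        eta, epochs = params  # learning rate and number of iterations
        for _ in range(epochs):
            # SGD: feed one row at a time and update the weights after each row
            for xi, yi in zip(X, y):
                predicted = self.feed_forward(xi)
                self.back_propagate(xi, yi, predicted, eta)
                self.errors.append(self.squared_error(xi, yi))
```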
4. Produce the Output and Correct the Error
I have mentioned above what ‘feed forward’ and ‘back propagation’ are. Let us implement those methods –
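First, feed forward; under our single-attribute assumption, the one output neuron simply evaluates the linear equation:

```python
    # (inside the Network class)
    def feed_forward(self, xi):
        # activation of the single output neuron: y = w0 + w1 * x
        self.output[0] = self.weights[0] + self.weights[1] * xi
        return self.output[0]
```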
The above function just forms a simple linear equation of the y = mx + c kind and nothing more.
In the SGD algorithm, we continuously update the initialised weights in the negative direction of the slope to reach the minimum of the error function:
Error function: E(w) = Σ (w₀ + w₁xᵢ − yᵢ)², summed over all rows (xᵢ, yᵢ) of the training data.
Here, I have not applied ½ as a scaling factor to the equation; one may include it if desired. Also, in SGD only one row is passed to the above error function at a time to calculate the error. Hence, if we differentiate the equation for a single row (xᵢ, yᵢ) with respect to each of the weights w₀, w₁, etc., we get:
∂E/∂w₀ = 2(w₀ + w₁xᵢ − yᵢ)·1, and
∂E/∂w₁ = 2(w₀ + w₁xᵢ − yᵢ)·xᵢ
After calculating the slope w.r.t. each of the weights, we will be updating the weights with new values in the negative direction of the slope as below –
wₙ = wₙ − η·(∂E/∂wₙ), where η is the learning rate.
Let us implement all this logic in the back propagate function as below:
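A sketch of back propagation using the two gradients derived above (again assuming one input attribute):

```python
    # (inside the Network class)
    def back_propagate(self, xi, yi, predicted, eta):
        # slope of the squared error for this one row w.r.t. each weight
        error = predicted - yi
        grad_w0 = 2 * error        # dE/dw0
        grad_w1 = 2 * error * xi   # dE/dw1
        # update the weights in the negative direction of the slope
        self.weights[0] -= eta * grad_w0
        self.weights[1] -= eta * grad_w1
```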
In order to visualise the error at each step, let us quickly write functions to calculate the Mean Squared Error (for the full dataset) and the Squared Error (for each row), which will be called at each step of an epoch.
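A sketch of both error functions; squared_error handles one row, while mean_squared_error averages over the full dataset:

```python
    # (inside the Network class)
    def squared_error(self, xi, yi):
        # error contribution of a single row
        return (self.feed_forward(xi) - yi) ** 2

    def mean_squared_error(self, X, y):
        # average squared error over the full dataset
        total = sum(self.squared_error(xi, yi) for xi, yi in zip(X, y))
        return total / len(X)
```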
Having built the model in the above way, let us define a method which takes some input and predicts the output –
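A sketch of 'predict', which just applies the learnt linear equation to each unseen input:

```python
    # (inside the Network class)
    def predict(self, X):
        # evaluate y = w0 + w1 * x with the learnt weights for each input
        return [self.feed_forward(xi) for xi in X]
```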
That's it. We have built a simple neural network which fits a linear regression model and also predicts values for unknowns.
5. Executing the program
In order to pass inputs and test the results, we need to write a few lines of code as below –
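A sketch of the driver code, with a hypothetical 10-row dataset lying roughly on the line y = 2x + 1 (the actual dataset lives in the Colab notebook linked below):

```python
# hypothetical sample data; the real dataset is in the Colab notebook
X = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [3.1, 4.9, 7.2, 9.1, 10.8, 13.2, 15.0, 16.9, 19.1, 21.2]

net = Network([1])           # topology: a single output neuron, no hidden layers
net.fit(X, y, [0.001, 500])  # [learning rate eta, no. of epochs]

print("Learnt weights [w0, w1]:", net.weights)
print("Predictions for 11 and 12:", net.predict([11, 12]))
print("MSE on the training data:", net.mean_squared_error(X, y))

# visualise how the error falls at each step as the weights are updated
plt.plot(net.errors)
plt.xlabel("Update step")
plt.ylabel("Squared error")
plt.show()
```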
In the above code, a sample dataset of 10 rows is passed as input. The full code can be accessed and executed on Google Colab:
https://colab.research.google.com/drive/1f84s4nlKSas5LGpR8zdRxWOsKL5HIoyy
Sample outputs for the given inputs are shown below:
The plot below shows how the error reduces at each step as the weights are continuously updated and fed back into the system.
So, we have seen how, in a few lines of code, we can build a simple neural network. The same code can be extended to handle multiple layers and various activation functions, so that it works like a full-fledged ANN. I will implement that in my next article.