Verilog Code for MOD 5 Counter


As discussed in the previous post, I implemented the MOD4 and MOD 8 Counters. In this, I'll implement MOD 5 Counter. This counter will have 5 states starting from 000 to 100 and then again back to zero. However, according to the equation below,

                                                                                    N <= 2n 
you might find it vague. I mean 5 is not a power of 2. So how is it possible to design a counter which will count a non-power of 2? If you guessed by using external circuitry, then you are absolutely correct. Within this type of counters, we will have D Flip-Flops with clear flags. By intuition, we can say that just after 100, we will have to somehow clear the flip-flops' values, thus bringing the values back to zero. To calculate the minimum number of gates, we will have to use the same equation.  This gives the value of n as 3. Hence, we will have to have 3 D flip-flops to count 5 states in order to satisfy the requirements of MOD 5 counter. Here, we will be using AND gates to clear the flip-flops. OnFor UP counters, we use Q output from each flip-flop. If we assign each bit of 101 to variables A, B, and C then we have to choose only those variables which are HIGH i.e. 1. Here we will choose A and C as both of these variables are 1. Only these variables will act as an input to the AND gate. When the state reaches 5 i.e. 101, AND gate inputs will be 11 which will result in 1 and will be provided to each flip-flop. This will trigger the clear flag within each register, thus will reset each flip-flop to zero and counting of the states will start again.

Pretty Simple huh?
Let us apply the concept to reality.

Our MOD 5 counter will count 5 states i.e. from 0 to 4 and then will reset the flip-flops back to zero. One can make a careful observation that the AND gate that has been used must have some propagation delay. An ideal AND gate would hang the counter onto a single state forever. The proposed counter is an asynchronous counter as each flip-flop is not simultaneously triggered and the clock is depended from the previous flip-flop.

Here is the Verilog code for the implementation:



Here is the output



Carefully Observe that the counter partially goes into the 6th state but resets itself. This delay is because of the AND gate which takes time to compute the result. Here, have a look at the RTL schematic of the MOD 5 Counter.


In the Verilog code, I have introduced a delay of 1ns for the AND gate. Try to set it to 0 and give it a shot. You might think that you would get an ideal MOD 5 counter but it won't happen. The system will hang at 110.

Why does it happen?

Well, it happens because of edge issue. In the Verilog code, observe the always condition. It says @(posedge clk or posedge clear), which means from state 100 as soon the third flip-flop goes from 0 to 1 to make the state 101 at the positive edge of the clock, the AND gate with no delay will also turn on at the same instant. Thus counter stops as it cannot follow the two conditions at the same time. However, giving a delay will first activate the clock condition and then after the delay period arrives the clear edge which will reset the flip-flop again.

Problemo Solved!


Verilog Code for MOD Counters


Counters are used to count and move the state of a circuit from one state to another. Whenever they are given a clock signal, either the system moves one state ahead or behind. It is not necessary to jump only one state. We can jump by a number of steps but for that, we would require circuitry. Counters are examples of sequential circuits. It requires a clock signal to function and states are changed at either positive edge of the negative edge of the clock. It is important to decide on the type of counter. We can design either an asynchronous counter or a synchronous counter. In an asynchronous counter, only the first flip-flop is dependent on the clock. The output of the first-floor shop act as a clock to the second flip-flop. Similarly,  the output of the second flip-flipflop will act as the clock for the third flip-flop and it goes on. On the other side in a synchronous counter, each flip-flop receives the clock signal at the same time. So we can conclude that asynchronous counters are slower than the synchronous counter. Asynchronous counters required less circuit free then synchronous counter. For asynchronous counters, all the data will change one after another and will follow On the other hand, with a synchronous counter, data will change at the very moment, the clock is triggered either at the positive edge or the negative edge.
Now as the titles suggest, we have to design Mod N counter Where N equals the number of the FFs. For example, if we have a MOD 2 counter hence it will go from 00 to 11. Here N denotes the number of flip-flops required to design the counter and will provide 2 to the power N States. Thus, we can easily make counters of 2, 4, 8, 16 states and so on.

To design MOD counters, I'll be using D flip-flop as they are most sophisticated and easy to use. To know about D flip-flops, you can follow this link.

MOD 4 Down Counter

For MOD 4 counter, we will require 2 FFs. To count the no of FFs required, use this formula
                                                                             
                                                                                    N <= 2n 


N represents the number of states, n represents the number of  FFs required. MOD 4 counter will have 4 states from 00 to 11. It will have 2 flip-flops.

Verilog Code:




Output:


MOD 4 UP Counter

Verilog Code:
Output:

MOD 8 Down Counter

For MOD 8 counter, we will require 3 FFs. To count the no of FFs required, use this formula
                                                                             
                                                                                    N <= 2n 


N represents the number of states, n represents the number of  FFs required. MOD 8 counter will have 8 states from 000 to 111 and then will again return to zero. It will have 3 flip-flops.

Verilog Code:



Output:
                          


MOD 8 Up Counter

Verilog Code:


Output:



Cheers.

Setup Time and Hold Time


Setup time and Hold time are very important concepts when designing circuits. Compromising with these parameters would give an organized output. Whenever we see a circuit, we see it switching instantly ON and instantly OFF. It just appears like teleportation. Electronics from the low state were teleported to a higher state and vice versa. However, teleportation never occurs in electronics. No matter how fast switching circuits has been invented, it will always take some amount of time.
Switching a circuit from LOW to HIGH does involve time. The time involved is as small as nanoseconds or picoseconds.

Setup Time:
The minimum time required to keep the input data stable while the clock has started to change is called Setup Time. Any change or malfunction of data within this period leads to Setup Violations.

Hold Time:
The minimum time required to keep the input data stable after the clock transition has taken place is called Hold Time. Any change or malfunction of data within this period leads to Hold Violations.

Below is the image how the human eye perceives a change in the clock signal



For the humans, it's always clear whether it is a HIGH or a LOW signal. However, the transition takes time. The delay introduced for the electronics to travel from a HIGH state to LOW takes place as shown in the below diagram.




The fall delay is the time taken for the transition from HIGH state to a LOW signal of any state. So when we see a clock working, we actually have a clock like this below.



To model these delays in Verilog, we use #NUMBER where NUMBER is the minimum number of clocks, the delay has to be made.

For example, if we make a simple code for AND gate, then the ideal gate would be coded as
and(C, A, B);   This represents C = A & B
However, with a delay in the circuit, we can code it as
and #1 (C, A, B): This too represents C = A & B but the output would have a delay of 1 tick of the clock.

Now each diagram above is a digital representation of a signal. What happens in the analog form is the most interesting part of signals. The analog form of the signal is the reason for Setup time and Hold time. When a signal is flipped to its complementary state, it takes time to stabilize. Until that time, the electronics oscillates in an undamped manner. This is called the switch Bouncing Effect. The following diagram would explain clearly.


Thus, the output clock edge must be captured and stabilized before any data change occurs. Now in Verilog, we use @posedge or @negedge for most sequential circuits. This means that the data has to undergo certain operations and produce an output during that change of transition, neither before, nor after. So the data is "held tightly" while the clock is transiting from the LOW state as we cannot afford the data to change before edge just because of some quantum disturbances. Now, as soon as the clock reaches midway, the change in data takes place and is again "held together" until the clock stabilizes its HIGH state.


As shown in the above diagram, during stabilization of clock, there are several "minute" positive and negative edge caused by oscillations which might trigger a change in the data if not held properly. It is this, why we require Setup Time and Hold Time.

To practice this in Verilog, we will execute some basic modules.

Code:
module Delay(A, B, C, D, E);
 input A,B;
 output C, D, E;
 wire p, q;


or #50 an2(q, A, B);
or #52 an3(p, A, B);
nand #54 an4(E, A, B);
assign C = p;
assign D = q;


endmodule

TestBench:
module TEst1;

reg A,B;
wire C,D,E;

Delay DEF(A, B, C, D,  E);

initial begin
A = 1;
B = 0;  
end
endmodule

OUTPUT with the delay for all changes


Here is an easy example to explain Rise delay.


Code:
module Delay(A, B, C, D, E);
 input A,B;
 output C, D, E;
 wire p, q;


or #2 an2(C, A, B);
or an3(D, A, B);
endmodule

TestBench:
module TEst1;

reg A,B;
wire C,D,E;

Delay DEF(A, B, C, D,  E);

initial begin
A = 1;
B = 0;  
        fork
       #10 A = 0; #B = 0;
end
endmodule

The signal D is driven without delay, hence it started the very moment, we started our simulation. It also changed its output when the inputs were changed at 10ns.
The signal C is driven by a delay of 2ns, hence it started with a delay of 2ns and also ended with a delay of 2ns. #NUM defines the delay where NUM is the delay of every parameter.

Now let us introduce, Rise delay and  Fall delay. Its basic syntax is:
#(R_Delay, F_Delay)

An example explaining both of these delay with comparison of no delay signal

Code:
module Delay(A, B, C, D, E);
 input A,B;
 output C, D, E;
 wire p, q;


or an2(q, A, B);
or #(2,0) an3(p, A, B);
or #(0,2) an4(E, A, B);

assign C = p;
assign D = q;
endmodule

TestBench:
module TEst1;

reg A,B;
wire C,D,E;

Delay DEF(A, B, C, D, E);

initial begin
A = 1;
B = 0;  
fork
#10 A = 0;
#10 B = 0;
join
end
endmodule

Output:

Now as shown in the code, signal C has a Rise delay of 2ns but no fall delay, hence it started at 2ns and not at 0ns but ended exactly at 10ns as we changed the signal exactly at 10ns. Thus without Fall delay, we didn't see the delay in C when the inputs were changed. The signal D, as usual, is having no delay hence it started and stopped without any delay. Signal E has no Rise delay, hence it started without any delay but it does have Fall delay, hence it ended with a delay of 2ns.

In the case of Turn Off delay, it is the minimum amount of time required to jump from X, 0, 1 to a high impedance state Z. Z is called the high impedance state. It signifies a connection split, hence the output has a very high resistance and is floating. Similarly, x signal has an unknown signal state.

Now, every signal has 3 delays with 3 sub delay for each delay. Confused? Me too.
We have 
1.)  Rise Delay 2.) Fall Delay 3.) Turn Off Delay
 Within the Rise/ fall/ turn-off delay, we have 3 subcategories.
  • Minimum Delay: This signifies the minimum amount of time delay to rise/ fall/ turn-off.
  • Typical Delay: This signifies the typical amount of time delay to rise/ fall/ turn-off.
  • Maximum Delay: This signifies the maximum amount of time delay to rise/ fall/ turn-off.
So now, we have to declare delay as follows:
#(min:typ:max, min:typ:max, min:typ:max)

Here is an example signifying the above:
Code:
module Delay(A, B, C, D, E);
 input A,B;
 output C, D, E;
 wire p, q;


or an2(q, A, B);
or #(0:2:4,2:3:4,1:2:3) an3(p, A, B);
or #(2:6:7) an4(E, A, B);

assign C = p;
assign D = q;
endmodule

TestBench:
module TEst1;

reg A,B;
wire C,D,E;

Delay DEF(A, B, C, D, E);

initial begin
A = 1;
B = 0;  
fork
#10 A = 0;
#10 B = 0;
join
end
endmodule

Output:



Here signal C is having a typical Rise delay of 2ns and typical Fall delay of 3ns.

Verilog simulation in Xilinx


Hi 
In this post, I am going to show you how to simulate a Verilog code in Xilinx ISE. I have included every step with an image so that the user can easily understand every step clearly. 

Ok let's begin 
I am using Xilinx ISE version 14.2 
You can have your own version of Xilinx or ModelSim or MultiSim HDL Simulator but the methodologies for implementing a Verilog code within them might be different. 

For this post its version 14.2 


First of all click on the desktop icon ISE Design Suite 14.2 



A Window will open. After the window opens, which might take some time depending on your system configuration. Click on the tab "New Project" as shown below. 



Enter the name of the project in the first field. Be sure that the location where you are going to save your project is having permission to write. You cannot save at a location that has only read only permission. 
Click "Next" and select the "Device" field as Spartan 6 because that's what I am going to use within this post.


Now click "Finish".

Now on the top left corner, you'll see two views. One of them states "Implementation" which is for finalizing implementing RTL schemaric of code and to find its parameters like heat, speeds, timings etc. The other view states "Simulation". We will start with the implementation first. Click on the  "Implementation" radio button 

Under your project name you will see another folder structure named xclscf something like that. Right click and click on new source.


Select "Verilog Module" and give the module a name. Click next when done. You can leave the input and output ports blank or you can enter the ports if you are sure about your code. As I am never sure of my code, I will leave it blank. I'll insert the input and output ports within the code itself.


For this post, I am implementing my own SPI which is the serial peripheral interface in this simulation. To have a look on my SPI simulation and code, Click Here.

A window on the right side will pop out as shown below. It will have some predefined comments, describing the date, user and creator of project. It's just the time and owner stamp. It's upto you whether you want to keep it or not.


I removed the time and creator stamp and pasted my own Master code from my SPI post. I have also added Slave code using the same method as I showed above to add a new "Verilog Module". Now its time to explain in layman terms about module, testbench, and datapath. I am expecting all students to have some classes about Verilog coding and understanding of wires, outputs, and circuits.

Let us consider an IC which has components like register file, multiplexors, adder, subtractor etc as shown below. 

As per the above diagram we have a component A and component B. 
For each component, as shown above, for A and for B, will have to declare a module or specifically saying a Verilog module like below. For module A, we are having an input present at pin 2 and an output present at pin 7. Similarly, for module B we have an input from pin 3 and from A and a single output at pin 6. The Verilog code for the module or the component a using the description above will be as follows

module A(in_pin2, out_pin7, out_to_B);
input in_pin2,
output out_pin7,
output out_to_B;
initial begin
<Code>
end
endmodule

module B(in_pin3, in_from_A, out_pin6);
input in_pin3,
input in_from_A,
output out_pin6,
initial begin
<Code>
end
endmodule

This is how we are going to add Verilog code for master as well as the slave for our simulation. 

No coming to the next question. What is a datapath?
Basically a data path includes the connection between various components inside the Integrated circuit. For example, in the above diagram, you can easily see components A and B are linked with each other using the red coloured wire and the wire which is attached to pin 7. Now, remember, for data path you have to include every wire that is connected between various models which are not present at the input or the output along with wires that you want to see as your output from the IC. 

All the connection between various models which are not the input port or the output port have to be declared as wires. In the above diagram,  component A has two internal wires with  the component B, the first one is the red wire and the second wire is connected with pin 7. Now I want to see the input of component A and component B as well as the output of component A and component B. 

module Datapath(in2,in3, out7, out6);
input int2, in3;
output out6, out7;
wire red_wire;
A Component_A(in2, out7, red_wire);
B Component_B(in3, red_wire, out6);
endmodule;

The basic syntax to instantiate a componet is 

Component Name Variable_Name (PORTS IN ORDER)

Here component name is A. Its variable name is "Component_A" and ports which have to be in order as they were declared in A. Have a close look at the ports of componet A within the brackets. 

(in_pin2, out_pin7, out_to_B);

The first port is in_pin2 which is an input pin attached to datapath port in2. The second port is out_pin7 which is connected to the out7 pin of the datapath. The third port is "out_to_B" which is an internal wire in datapath connected to "red_wire" in the datapath. One simply cannot assign any datapath port while instantiating a component. It has to be in order. Input port of IC is pin 2 which is declared as "in2" in the datapath. We have to connect this to component A's input which is "in_pin2" which is the first port declared within the brackets. Thus, while instantiating component A, we have to assign correct datapath port to the correct position of A. We cannot instantiate like this below

A Component_A(out7, in2, red_wire);

The first port of A, which is reserved to be input from pin 2 has to be the input from pin 2 but we have assigned "out7" to the input of A which is completely wrong. Similarly, we have assigned input from pin2 (in2) to the output of component A. The red wire is, however, assigned correctly to the 3rd position which points to output to B of A.


Now we have instantiated component A and component B within the data path, saving the data path module will automatically assign or basically put the module A and module B within the data path module. This shows that the models are being instantiated under the datapath module as shown in the below image. Now refer to the IC diagram above. As you can see, both the components lies within the IC. The IC is our datapath here hence, component A and B must lie within it.


Now coming to the tesbench part. A testbench is an environment which is used to provide input to our system and get the output from this system. Just take connecting a CRO to a transistor, where we can see the output from a transistor. Similarly, we can see our simulation through the testbench. One should always remember that all the input ports in the testbench have to be of Register type and all the outputs of have to be of wire type unlike in module where input has to be of wire type and the output can be wire as well as register.




To create a testbench, click on the "New source" and this time instead of selecting Verilog module select Verilog text fixture. It will show you some options of the available modules from your project on which you have to start your testing. We have to test the entire IC not specifically to the component A or component B. Select the data path module which is to be tested via Verilog text fixture or test bench. Xilinx will automatically assign ports according to the input and output ports defined in the data path module. Do remember that the internal wires are not attached with the testbench. The internal wire is that wire which is neither connected to the output nor to the input.
Only the input and the output of the data path will be connected to the test bench. All the input and output of the data path are the same to which we have to provide input and from which we have to derive the output. 

To simulate, we have to select the view "Simulation' which was earlier set to "Implementation". The moment you will set it to "Simulation", in the below menu you will get options like this.


Double click on "Behavioral Check Syntax" to check the logic for errors. Wish for a green colored tickmark beside. If successful, double-click on "Simulate Behavioral Model" to simulate and wait for the results. 


Now to see a specific port which is NOT present in the testbench can be selected from the leftmost panel from "Test" and "glbl" menu. Select the port and press CTRL + W and then "Re-Launch".

Regards

Neural Network: Part 2

Neural Network: Logistic Regression

In this post, I'll deal with logic regression and its implementation of a single neuron network. I hope you have read my previous post on mattresses and its operation using NumPy package. You can click here to go to my previous post 

Logistic regression basically computes the probability of the output to be one. For example, if we train our Logistic model to recognize the image of a dog then for any new image the model basically try to calculate the probability of whether the new image is a dog or not. the higher the probability will imply that the given image is of a dog. 

A neuron is the primary and fundamental unit of computation for any neural network. A neuron will receive a vector that will include the input features. Each of the features will get multiplied with their corresponding weights and then a bias will be added to each of the features after which the weighted sum will be calculated. This will produce a linear output from the previous step. After processing the linear output, data will have to pass through the activation function. 

Activation Function

An activation function is a non-linear transformation which decides the probability of the linear input. Its output resides from 0 to 1 or from -1 to +1. This function helps in the mapping of values between a certain range which for us is between 0 and 1. For this post, I will use the Sigmoid function as the activation function. The Sigmoid function can be represented by the following equation
Sigmoid Function

The sigmoid function produces an S-shaped spanning across left and right plane. As per Wikipedia Sigmoid function is a bounded differentiable and a real function having a non-negative derivative at each. 
Sigmoid Graph
Source: Wikipedia

Weights and Biases

Weights and biases are responsible to adjust the input function which consists of features in order to bring the final output close to the actual output. Weights are multiplied with the features and then the bias component is added. The output has been passed to the activation function. Recently added from StackOverflow an interesting thing about the bias. As the name suggests bias means to favour something irrespective of whether that thing is right or wrong. Similarly, the bias favors the actual output by adding itself to the weighted sum.


The output z will now act as an input to our sigmoid function. One thing that users must always remember is that weight is a column vector whose size is equal to the number of features in the input and bias is a scalar number. The sigmoid function will now map the value of Z between 0 and 1. The output from this sigmoid function will indicate the probability of how close the input is to the actual output. 

Clearly, the above example was for a single input sample. What if we have multiple input samples? Well, the Logistic regression also works on multiple training samples. Suppose there are m training samples and each of the samples has n features then the input can be stated as M x N matrix. Let us call this matrix as X. Now we can easily perform dot product between X and the weight vector to get a linear output. The vector output will be then passed to the activation function which will result in an output vector A containing the probabilities of each training sample.


In the next post, I'll deal with the Loss function, Cost function and the implmentation of SNN (Single Neuron Network).

Neural Network: Part 1


Basics of Neurons


Hey everyone, long time no see. I have been working on some stuff so was busy and away from my blog. Well, as for my interest in AI, I have been working on neural networks and its implementation which I'll discuss in this post.

In this post, you will learn:
  • The basic concepts used in building a neural network
  • Implementing logistic regression
  • Single Neural Network
  • Forward propagation and Backward propagation
  • Building a small neural network using TensorFlow
  • Building a deep neural network using TensorFlow.
As we know, we humans learn from our experiences and through training. Our brain runs a very high speed and processes any activity quickly. Well, it actually depends on the type of activity, the person is working on. But as we all know, humans have a high grasping power and easily learn, understand, classify, remember and do various other things which machines take a hell lot of time to compute. All these computations are processed by our mighty brain. So to understand the technical neural network, we must understand the biological neural network.

Source: Wikipedia


The brain consists of a huge number of neurons. These neurons are the fundamental unit of the brain. They receive electric signals as input from the external world (Hello World!!) which are processed and sent to other neurons. A basic neuron consists of axion, dendrites, action potential, synapse etc.


Axons are the thin structure and are called as the transmitting part of the neurons. Dendrite is the receiving part of the neuron which receives its input from synapse summing total inputs. The action potential helps the neurons to communicate with each other. So basically, neurons receive electrical impulses from dendrites, which informs the neuron about the outcome of an incident. If the outcome is not in favor, these impulses are altered by chemical and electrical reactions and sent to other neurons by neurotransmission. This keeps on happening until the brain achieves perfection. 

In a similar way, we will work on an Artificial neuron.
Matrices and its manipulation is a basic requirement for this post. To simplify, we will work on the dot product, vectors, and broadcasting in python.

Dot Product
  If A is a C × D matrix and B is a D × E matrix, then the dot product of A and B is a C × E matrix. The dot product will reduce our computation time whenever we have quite a lot of equations. Note that the number of columns in the first matrix should always be equal to the number of rows in the second matrix. In python, we will use dot() function from the NumPy package.

A typical example describing the dot() function in the NumPy package

   import numpy as np 
   a = np.array([[7,6],[2,5]])
   b = np.array([[10,6,70],[18,9, 11]]) 
   c = np.dot(a,b)
   print("Shape of A ", a.shape)
   print("Shape of B ", b.shape)
   print("Shape of C ", c.shape)
   print("The dot product of A and B is : ", c)

And the result is 

Broadcasting
   Broadcasting in Python helps to perform element-wise calculation between a matrix and vector or scalar. Scalar is the just a single number whereas a vector is rank 1 array. The advantage of NumPy package is that it provides all these operations on matrices, vectors, reshaping, transform etc.A vector can be represented as numpy.ndarray(n-Dimensional Array) object using NumPy's array() function. A column vector is of shape (A × 1) and a row vector is of shape (1 × B)

To shape an array we use the syntax Array_name.reshape(row, column)
To find the rank of the matrix we will use the function numpy.linalg.matrix_rank()

These were the basics required to start a simple program or project on neural networks. In the next post, I'll deal with Logic Regression with a single neuron which requires the basics as I have described above. 

Update: Click here to jump to my next post





SARSA Learning with Python

Hola,
I worked on SARSA algorithm as well as on Q Learning algorithm and both of them had different Q matrix (Duh!) The methodology of both of the algorithms depicts how well one algorithm responds to future awards (which we can say OFF Policy for Q learning) while the other works of the current policy and takes an action before updating Q matrix (ON Policy).

The previous post example of the grid game showed different results when I implemented SARSA. It also involved some repetitive paths whereas Q didn't show any. A single step showed that SARSA followed the agent path and Q followed an optimal agent path.

To implement both ways I remember the way of pseudo code.

QL

initiate Q matrix.
Loop (Episodes):
   Choose an initial state (s)
   while (goal):
   Choose an action (a) with the maximum Q value
   Determine the next State (s')
   Find total reward -> Immediate Reward + Discounted Reward (Max(Q[s'][a]))
   Update Q matrix
   s <- s'
new episode

SARSA-L

initiate Q matrix
Loop (Episodes):
   choose an initial state (s)
   while (goal):
   Take an action (a) and get next state (s')
   Get a' from s'
   Total Reward -> Immediate reward + Gamma * next Q value - current Q value
   Update Q
   s <- s' a <- a'

Here are the outputs from Q-L and SARSA-L


The above is Q-L


This one is SARSA 

There is a difference between both Q Matrix. I worked on another example by using both Q learning and SARSA. It might appear similar to mouse cliff problem for some readers so bear with me.


The code for Naruto-Q-Learning is below






Here is Hinata trying to find her way to her goal by using SARSA




The code for Hinata SARSA Learning

I used epsilon-greedy method for action prediction. I generated a random floating number between 0 to 1 and set epsilon as 0.2. If the generated number is greater than 0.2 then I select maximum Q valued action (argmax). If the generated number is less than 0.2 then I select the action (permitted)  randomly. With each episode passing by, I decreased the value of epsilon (Epsilon Decay) This will ensure that as the agent learns its way it follows the path rather than continuing exploration. Exploration is maximum at the start of the simulation and gradually decreases as each episode are passed.


This is the decay of the epsilon.

The path followed in the above simulation is 0 - 4 - 8 - 9 - 10 - 11 - 7. Sometimes the agent also follows the same path as followed during Q learning. Well, I am continuing my exploration for the same and will post more details as I learn more about RL.

Till then, bye

SARSA Learning with Python

Hola,
I worked on SARSA algorithm as well as on Q Learning algorithm and both of them had different Q matrix (Duh!) The methodology of both of the algorithms depicts how well one algorithm responds to future awards (which we can say OFF Policy for Q learning) while the other works of the current policy and takes an action before updating Q matrix (ON Policy).

The previous post example of the grid game showed different results when I implemented SARSA. It also involved some repetitive paths whereas Q didn't show any. A single step showed that SARSA followed the agent path and Q followed an optimal agent path.

To implement both ways I remember the way of pseudo code.

QL

initiate Q matrix.
Loop (Episodes):
   Choose an initial state (s)
   while (goal):
   Choose an action (a) with the maximum Q value
   Determine the next State (s')
   Find total reward -> Immediate Reward + Discounted Reward (Max(Q[s'][a]))
   Update Q matrix
   s <- s'
new episode

SARSA-L

initiate Q matrix
Loop (Episodes):
   choose an initial state (s)
   while (goal):
   Take an action (a) and get next state (s')
   Get a' from s'
   Total Reward -> Immediate reward + Gamma * next Q value - current Q value
   Update Q
   s <- s' a <- a'

Here are the outputs from Q-L and SARSA-L


The above is Q-L


This one is SARSA 

There is a difference between both Q Matrix. I worked on another example by using both Q learning and SARSA. It might appear similar to mouse cliff problem for some readers so bear with me.


The code for Naruto-Q-Learning is below






Here is Hinata trying to find her way to her goal by using SARSA




The code for Hinata SARSA Learning

I used epsilon-greedy method for action prediction. I generated a random floating number between 0 to 1 and set epsilon as 0.2. If the generated number is greater than 0.2 then I select maximum Q valued action (argmax). If the generated number is less than 0.2 then I select the action (permitted)  randomly. With each episode passing by, I decreased the value of epsilon (Epsilon Decay) This will ensure that as the agent learns its way it follows the path rather than continuing exploration. Exploration is maximum at the start of the simulation and gradually decreases as each episode are passed.

This is the decay of the epsilon.

The path followed in the above simulation is 0 - 4 - 8 - 9 - 10 - 11 - 7. Sometimes the agent also follows the same path as followed during Q learning. Well, I am continuing my exploration for the same and will post more details as I learn more about RL.

Till then, bye