
Real-life examples of Markov decision processes


I've seen a lot of tutorial videos and they all look the same. Here is one example: https://www.youtube.com/watch?v=ip4iSMRW5X4

They explain states, actions, and probabilities, and the presenter explains it well enough, but I just can't get a handle on what it would be used for in real life. So far I haven't found any lists of applications. The most common example I see is chess.

Can you use it to predict things? If so, what kinds of things? Can it find patterns in an infinite amount of data? What can this algorithm do for me?

Bonus: It also feels like an MDP is about getting from one state to another, is that right?

Answer:


A Markov decision process does indeed deal with going from one state to another, and it is mainly used for planning and decision-making.

Theory

To quickly recap the theory, an MDP is:

MDP = ⟨S, A, T, R, γ⟩

where S is the (finite) set of states, A the set of actions, T the transition function with T(s, a, s′) = Pr(s′ | s, a), R the reward function, and γ the discount factor.

To be able to use it, you must have predefined the following:

  1. States: These can refer, for example, to cells in a robot's grid map, or to states such as door open and door closed.
  2. Actions: A fixed set of actions, e.g. for a robot: move north, south, east, etc., or open and close a door.
  3. Transition probabilities: The probability of going from one state to another with a certain action. For example, what is the probability of the door being open after the action open? In a perfect world that probability might be 1.0, but a robot might occasionally fail to handle the doorknob properly. Another example, for a moving robot: the action north would in most cases bring it to the grid cell directly to the north, but in some cases it could move too far and end up in the cell after that.
  4. Rewards: These serve as a planning aid. In the grid example, we might want to reach a specific cell, and the reward gets higher as we get closer. In the door example, an open door can yield a high reward. (A short code sketch of all four ingredients follows this list.)
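
To make these four ingredients concrete, here is a minimal Python sketch of the door example. All state names, success probabilities, and reward values below are made-up illustrations, not values from any particular source:

```python
# Toy door MDP: states, actions, transition probabilities, and rewards.
# (All concrete numbers here are illustrative assumptions.)
STATES = ["door_closed", "door_open"]
ACTIONS = ["open", "close"]

# T[s][a] maps to {s': Pr(s' | s, a)}.
# "open" succeeds only 80% of the time, modelling a robot that sometimes
# fails to handle the doorknob properly.
T = {
    "door_closed": {
        "open":  {"door_open": 0.8, "door_closed": 0.2},
        "close": {"door_closed": 1.0},
    },
    "door_open": {
        "open":  {"door_open": 1.0},
        "close": {"door_closed": 0.9, "door_open": 0.1},
    },
}

# R[s][a]: opening the door is rewarded, closing it again is penalised.
R = {
    "door_closed": {"open": 1.0, "close": 0.0},
    "door_open":   {"open": 0.0, "close": -1.0},
}

GAMMA = 0.9  # discount factor
```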

Once the MDP is defined, a policy can be learned by running value iteration or policy iteration, which computes the expected reward for each state. The policy then gives, for each state, the best action to take (according to the MDP model).
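
To give a rough idea of what value iteration looks like, here is a short sketch that reuses the toy door MDP defined above; the convergence threshold and the printed policy are illustrative choices, not canonical values:

```python
def value_iteration(states, actions, T, R, gamma, theta=1e-6):
    """Compute state values and a greedy policy for a small, table-based MDP."""
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            # Expected return of each action: immediate reward plus the
            # discounted value of the successor states.
            q = {a: R[s][a] + gamma * sum(p * V[s2] for s2, p in T[s][a].items())
                 for a in actions}
            best = max(q.values())
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < theta:  # stop once the values barely change
            break
    # The policy simply picks the highest-valued action in every state.
    policy = {s: max(actions,
                     key=lambda a: R[s][a] + gamma * sum(p * V[s2] for s2, p in T[s][a].items()))
              for s in states}
    return V, policy

V, policy = value_iteration(STATES, ACTIONS, T, R, GAMMA)
print(policy)  # e.g. {'door_closed': 'open', 'door_open': 'open'}
```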

In summary, an MDP is useful when you want to plan an efficient sequence of actions where your actions may not always be 100% effective.

Your questions

Can you use it to predict things?

I would call it planning, not predicting in the way regression does.

If so, what kinds of things?

See the application examples below.

Can it find patterns among infinite amounts of data?

No. An MDP works with a predefined, finite set of states |S| (plus actions, transition probabilities, and rewards); it is not an algorithm that mines patterns from arbitrary or infinite amounts of data.

What can this algorithm do for me?

See the application examples below.

Application examples for MDPs

There are also a few related models. An even more interesting one is the partially observable Markov decision process (POMDP), in which the state is not fully visible; instead, observations are used to get an idea of the current state. That is beyond the scope of this question, though.

Additional information

A stochastic process is Markovian (or has the Markov property) if the conditional probability distribution of future states depends only on the current state, and not on previous states (i.e., not on a list of previous states).
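
A tiny sketch of what that means in practice: sampling the next state needs only the current state as input, never the history. The weather states and probabilities below are arbitrary illustrations:

```python
import random

# Transition probabilities for a two-state Markov chain (made-up numbers).
P = {
    "sunny": {"sunny": 0.8, "rainy": 0.2},
    "rainy": {"sunny": 0.4, "rainy": 0.6},
}

def step(current_state):
    """Sample the next state; note it depends only on current_state."""
    next_states, probs = zip(*P[current_state].items())
    return random.choices(next_states, weights=probs)[0]

state = "sunny"
trajectory = [state]
for _ in range(10):
    state = step(state)   # no history is passed in anywhere
    trajectory.append(state)
print(trajectory)
```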



