Ahat [919]
3 years ago
15

Show how an MDP with a reward function R(s, a, s’) can be transformed into a different MDP with reward function R(s, a), such that optimal policies in the new MDP correspond exactly to optimal policies in the original MDP.
Engineering
1 answer:
sasho [114]3 years ago
7 0

Answer:

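$$U(s) = \max_a \Big[ R'(s, a) + \gamma^{1/2} \sum_{\mathit{pre}} T'(s, a, \mathit{pre}) \Big( \max_b \Big[ R'(\mathit{pre}, b) + \gamma^{1/2} \sum_{s'} T'(\mathit{pre}, b, s')\, U(s') \Big] \Big) \Big]$$

$$U(s) = \max_a \Big[ \sum_{s'} T(s, a, s') \big( R(s, a, s') + \gamma\, U(s') \big) \Big]$$

$$U(s) = R'(s) + \gamma^{1/2} \max_a \Big[ \sum_{\mathit{post}} T'(s, a, \mathit{post}) \Big( R'(\mathit{post}) + \gamma^{1/2} \max_b \Big[ \sum_{s'} T'(\mathit{post}, b, s')\, U(s') \Big] \Big) \Big]$$

$$U(s) = \max_a \Big[ R(s, a) + \gamma \sum_{s'} T(s, a, s')\, U(s') \Big]$$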

Explanation:

MDPs

MDPs can be formulated with a reward function R(s) that depends only on the state, R(s, a) that also depends on the action taken, or R(s, a, s’) that depends on the action taken and the outcome state.

The task is to show how an MDP with a reward function R(s, a, s’) can be transformed into a different MDP with reward function R(s, a), such that optimal policies in the new MDP correspond exactly to optimal policies in the original MDP.

One solution is to define a 'pre-state' pre(s, a, s’) for every s, a, s’ such that executing a in s leads not to s’ but to pre(s, a, s’). From the pre-state there is only one action b, which always leads to s’. Let the new MDP have transition T’, reward R’, and discount γ’.

$$T'(s, a, \mathit{pre}(s, a, s')) = T(s, a, s')$$

$$T'(\mathit{pre}(s, a, s'), b, s') = 1$$

$$R'(s, a) = 0$$

$$R'(\mathit{pre}(s, a, s'), b) = \gamma^{-1/2} R(s, a, s')$$

$$\gamma' = \gamma^{1/2}$$
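The construction above can be checked numerically. The following is a minimal sketch, assuming a small randomly generated MDP; the helper names (`random_mdp`, `to_pre_state_mdp`, `solve_original`, `solve_sa`) and the dictionary layout are illustrative choices, not part of the original answer. It builds the pre-state MDP and runs plain value iteration on both models.

```python
import random

def random_mdp(S, A, seed=0):
    """Hypothetical toy MDP: T[s][a][s2] are probabilities, R[s][a][s2] rewards."""
    rng = random.Random(seed)
    T = [[None] * A for _ in range(S)]
    R = [[None] * A for _ in range(S)]
    for s in range(S):
        for a in range(A):
            w = [rng.random() + 0.1 for _ in range(S)]
            T[s][a] = [x / sum(w) for x in w]
            R[s][a] = [rng.uniform(-1, 1) for _ in range(S)]
    return T, R

def solve_original(S, A, T, R, gamma, iters=1000):
    """Value iteration for U(s) = max_a sum_s' T(s,a,s') (R(s,a,s') + gamma U(s'))."""
    U = [0.0] * S
    for _ in range(iters):
        U = [max(sum(T[s][a][s2] * (R[s][a][s2] + gamma * U[s2]) for s2 in range(S))
                 for a in range(A)) for s in range(S)]
    return U

def to_pre_state_mdp(S, A, T, R, gamma):
    """Build T', R', gamma' with one pre-state per (s, a, s') triple."""
    g = gamma ** 0.5
    T2, R2 = {}, {}
    for s in range(S):
        for a in range(A):
            R2[(s, a)] = 0.0                       # R'(s, a) = 0
            T2[(s, a)] = {("pre", s, a, s2): T[s][a][s2] for s2 in range(S)}
            for s2 in range(S):
                pre = ("pre", s, a, s2)
                T2[(pre, "b")] = {s2: 1.0}         # single action b, always reaches s'
                R2[(pre, "b")] = R[s][a][s2] / g   # gamma^(-1/2) R(s, a, s')
    return T2, R2, g

def solve_sa(T2, R2, gamma2, iters=1000):
    """Value iteration for U(x) = max_a [R'(x,a) + gamma' sum_y T'(x,a,y) U(y)]."""
    actions = {}
    for (x, a) in T2:
        actions.setdefault(x, []).append(a)
    U = {x: 0.0 for x in actions}
    for _ in range(iters):
        U = {x: max(R2[(x, a)] + gamma2 * sum(p * U[y] for y, p in T2[(x, a)].items())
                    for a in acts) for x, acts in actions.items()}
    return U

S, A, gamma = 3, 2, 0.9
T, R = random_mdp(S, A)
U_orig = solve_original(S, A, T, R, gamma)
T2, R2, gamma2 = to_pre_state_mdp(S, A, T, R, gamma)
U_new = solve_sa(T2, R2, gamma2)

# Greedy policies on the original states of each model.
pi_orig = [max(range(A), key=lambda a: sum(T[s][a][s2] * (R[s][a][s2] + gamma * U_orig[s2])
                                           for s2 in range(S))) for s in range(S)]
pi_new = [max(range(A), key=lambda a: R2[(s, a)] + gamma2 *
              sum(p * U_new[y] for y, p in T2[(s, a)].items())) for s in range(S)]
max_gap = max(abs(U_orig[s] - U_new[s]) for s in range(S))
print(max_gap, pi_orig, pi_new)
```

On the original states, the two value functions should agree up to floating-point error and the greedy policies should coincide, which is what the transformation claims.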

Then, using pre as shorthand for pre(s, a, s’):

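$$U(s) = \max_a \Big[ R'(s, a) + \gamma^{1/2} \sum_{\mathit{pre}} T'(s, a, \mathit{pre}) \Big( \max_b \Big[ R'(\mathit{pre}, b) + \gamma^{1/2} \sum_{s'} T'(\mathit{pre}, b, s')\, U(s') \Big] \Big) \Big]$$

$$U(s) = \max_a \Big[ \sum_{s'} T(s, a, s') \big( R(s, a, s') + \gamma\, U(s') \big) \Big]$$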
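The step from the first equation to the second is a direct substitution. As a sketch: the sum over pre-states for a fixed (s, a) is a sum over s’, the inner max over b is trivial because b is the only action, and plugging in the definitions of T’, R’ and γ’ gives:

```latex
\begin{align*}
U(s) &= \max_a \Big[ 0 + \gamma^{1/2} \sum_{s'} T(s, a, s')
        \Big( \gamma^{-1/2} R(s, a, s') + \gamma^{1/2} \cdot 1 \cdot U(s') \Big) \Big] \\
     &= \max_a \Big[ \sum_{s'} T(s, a, s') \big( R(s, a, s') + \gamma\, U(s') \big) \Big]
\end{align*}
```

The two factors of γ^{1/2} multiply to γ on the U(s’) term and cancel on the reward term, so the original Bellman equation is recovered.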

Now do the same to convert MDPs with R(s, a) into MDPs with R(s).

Similar to the pre-state construction above, create a state post(s, a) for every s, a such that

$$T'(s, a, \mathit{post}(s, a)) = 1$$

$$T'(\mathit{post}(s, a), b, s') = T(s, a, s')$$

$$R'(s) = 0$$

$$R'(\mathit{post}(s, a)) = \gamma^{-1/2} R(s, a)$$

$$\gamma' = \gamma^{1/2}$$
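The post-state construction can be checked the same way. This is a minimal sketch under assumed names (`random_mdp_sa`, `to_post_state_mdp`, `solve_sa_reward`, `solve_state_reward` are hypothetical helpers, and the toy MDP is randomly generated); it treats post as a function of (s, a) only, as the prose definition states.

```python
import random

def random_mdp_sa(S, A, seed=1):
    """Hypothetical toy MDP: T[s][a][s2] are probabilities, R[s][a] rewards."""
    rng = random.Random(seed)
    T = [[None] * A for _ in range(S)]
    R = [[rng.uniform(-1, 1) for _ in range(A)] for _ in range(S)]
    for s in range(S):
        for a in range(A):
            w = [rng.random() + 0.1 for _ in range(S)]
            T[s][a] = [x / sum(w) for x in w]
    return T, R

def solve_sa_reward(S, A, T, R, gamma, iters=1000):
    """Value iteration for U(s) = max_a [R(s,a) + gamma sum_s' T(s,a,s') U(s')]."""
    U = [0.0] * S
    for _ in range(iters):
        U = [max(R[s][a] + gamma * sum(T[s][a][s2] * U[s2] for s2 in range(S))
                 for a in range(A)) for s in range(S)]
    return U

def to_post_state_mdp(S, A, T, R, gamma):
    """Build T', R', gamma' with one post-state per (s, a) pair."""
    g = gamma ** 0.5
    T2, R2 = {}, {}
    for s in range(S):
        R2[s] = 0.0                                  # R'(s) = 0
        for a in range(A):
            post = ("post", s, a)
            T2[(s, a)] = {post: 1.0}                 # a in s always reaches post(s, a)
            T2[(post, "b")] = {s2: T[s][a][s2] for s2 in range(S)}
            R2[post] = R[s][a] / g                   # gamma^(-1/2) R(s, a)
    return T2, R2, g

def solve_state_reward(T2, R2, gamma2, iters=1000):
    """Value iteration for U(x) = R'(x) + gamma' max_a sum_y T'(x,a,y) U(y)."""
    actions = {}
    for (x, a) in T2:
        actions.setdefault(x, []).append(a)
    U = {x: 0.0 for x in actions}
    for _ in range(iters):
        U = {x: R2[x] + gamma2 * max(sum(p * U[y] for y, p in T2[(x, a)].items())
                                     for a in acts) for x, acts in actions.items()}
    return U

S, A, gamma = 3, 2, 0.9
T, R = random_mdp_sa(S, A)
U_orig = solve_sa_reward(S, A, T, R, gamma)
T2, R2, gamma2 = to_post_state_mdp(S, A, T, R, gamma)
U_new = solve_state_reward(T2, R2, gamma2)

# Greedy policies on the original states of each model.
pi_orig = [max(range(A), key=lambda a: R[s][a] + gamma *
               sum(T[s][a][s2] * U_orig[s2] for s2 in range(S))) for s in range(S)]
pi_new = [max(range(A), key=lambda a: sum(p * U_new[y] for y, p in T2[(s, a)].items()))
          for s in range(S)]
max_gap = max(abs(U_orig[s] - U_new[s]) for s in range(S))
print(max_gap, pi_orig, pi_new)
```

Because each post-state carries the reward for its (s, a) pair, choosing the best action in s of the new MDP amounts to choosing the post-state with the highest value, which is the same comparison the original MDP makes.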

Then, using post as shorthand for post(s, a):

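$$U(s) = R'(s) + \gamma^{1/2} \max_a \Big[ \sum_{\mathit{post}} T'(s, a, \mathit{post}) \Big( R'(\mathit{post}) + \gamma^{1/2} \max_b \Big[ \sum_{s'} T'(\mathit{post}, b, s')\, U(s') \Big] \Big) \Big]$$

$$U(s) = \max_a \Big[ R(s, a) + \gamma \sum_{s'} T(s, a, s')\, U(s') \Big]$$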
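As with the pre-state case, the second equation follows from the first by substitution. A sketch of the intermediate step, using R’(s) = 0, the deterministic transition into the post-state, the single action b, and the definitions of R’ and γ’:

```latex
\begin{align*}
U(s) &= 0 + \gamma^{1/2} \max_a \Big[ 1 \cdot \Big( \gamma^{-1/2} R(s, a)
        + \gamma^{1/2} \sum_{s'} T(s, a, s')\, U(s') \Big) \Big] \\
     &= \max_a \Big[ R(s, a) + \gamma \sum_{s'} T(s, a, s')\, U(s') \Big]
\end{align*}
```

The outer γ^{1/2} is positive, so it can be moved inside the max without changing which action attains it.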
