Ahat [919]
3 years ago
15

Show how an MDP with a reward function R(s, a, s') can be transformed into a different MDP with reward function R(s, a), such that optimal policies in the new MDP correspond exactly to optimal policies in the original MDP.
Engineering
1 answer:
sasho [114]3 years ago
7 0

Answer:

U(s) = \max_a \left[ R'(s, a) + \gamma^{1/2} \sum_{pre} T'(s, a, pre) \max_b \left[ R'(pre, b) + \gamma^{1/2} \sum_{s'} T'(pre, b, s') U(s') \right] \right]

U(s) = \max_a \left[ \sum_{s'} T(s, a, s') \left( R(s, a, s') + \gamma U(s') \right) \right]

U(s) = R'(s) + \gamma^{1/2} \max_a \left[ \sum_{post} T'(s, a, post) \left( R'(post) + \gamma^{1/2} \max_b \left[ \sum_{s'} T'(post, b, s') U(s') \right] \right) \right]

U(s) = \max_a \left[ R(s, a) + \gamma \sum_{s'} T(s, a, s') U(s') \right]

Explanation:

MDPs

MDPs can be formulated with a reward function R(s), with R(s, a), which depends on the action taken, or with R(s, a, s'), which depends on the action taken and the outcome state.

The task is to show how an MDP with a reward function R(s, a, s') can be transformed into a different MDP with reward function R(s, a), such that optimal policies in the new MDP correspond exactly to optimal policies in the original MDP.

One solution is to define a 'pre-state' pre(s, a, s') for every s, a, s' such that executing a in s leads not to s' but to pre(s, a, s'). From the pre-state there is only one action b, which always leads to s'. Let the new MDP have transition T', reward R', and discount \gamma'.

T'(s, a, pre(s, a, s')) = T(s, a, s')

T'(pre(s, a, s'), b, s') = 1

R'(s, a) = 0

R'(pre(s, a, s'), b) = \gamma^{-1/2} R(s, a, s')

\gamma' = \gamma^{1/2}
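The pre-state construction can be checked numerically. The following is a minimal sketch, not part of the original answer: the function names (`solve_sas`, `add_pre_states`, `solve_sa`), the dictionary-based MDP layout, and the small random test MDP are all assumptions made for illustration. It builds T', R', and \gamma' as defined above and compares utilities and greedy policies on the original states.

```python
import itertools
import random

def solve_sas(states, actions, T, R, gamma, iters=500):
    """Value iteration for rewards R[s, a, s']; returns utilities and a greedy policy."""
    def q(U, s, a):
        return sum(T[s, a, s2] * (R[s, a, s2] + gamma * U[s2]) for s2 in states)
    U = {s: 0.0 for s in states}
    for _ in range(iters):
        U = {s: max(q(U, s, a) for a in actions) for s in states}
    policy = {s: max(actions, key=lambda a: q(U, s, a)) for s in states}
    return U, policy

def add_pre_states(states, actions, T, R, gamma):
    """Build T', R'(state, action) and gamma' for the pre-state MDP (hypothetical helper)."""
    T2, R2 = {}, {}
    for s, a in itertools.product(states, actions):
        # Executing a in s leads to pre(s, a, s') with probability T(s, a, s').
        T2[s, a] = {("pre", s, a, s2): T[s, a, s2] for s2 in states}
        R2[s, a] = 0.0
        for s2 in states:
            pre = ("pre", s, a, s2)
            # The single action b in a pre-state leads to s' with certainty.
            T2[pre, "b"] = {s2: 1.0}
            R2[pre, "b"] = gamma ** -0.5 * R[s, a, s2]
    return T2, R2, gamma ** 0.5

def solve_sa(T2, R2, gamma2, iters=1000):
    """Value iteration for rewards R2[x, a]; T2[x, a] maps next states to probabilities."""
    avail = {}
    for x, a in T2:
        avail.setdefault(x, []).append(a)
    def q(U, x, a):
        return R2[x, a] + gamma2 * sum(p * U[y] for y, p in T2[x, a].items())
    U = {x: 0.0 for x in avail}
    for _ in range(iters):
        U = {x: max(q(U, x, a) for a in avail[x]) for x in avail}
    policy = {x: max(avail[x], key=lambda a: q(U, x, a)) for x in avail}
    return U, policy

# Small random MDP used only as a test case (an assumption, not from the answer).
rng = random.Random(0)
states, actions, gamma = [0, 1, 2], ["left", "right"], 0.9
T, R = {}, {}
for s, a in itertools.product(states, actions):
    w = [rng.random() + 0.1 for _ in states]
    for s2 in states:
        T[s, a, s2] = w[s2] / sum(w)
        R[s, a, s2] = rng.uniform(-1.0, 1.0)

U, policy = solve_sas(states, actions, T, R, gamma)
T2, R2, gamma2 = add_pre_states(states, actions, T, R, gamma)
U2, policy2 = solve_sa(T2, R2, gamma2)

max_gap = max(abs(U[s] - U2[s]) for s in states)
same_policy = all(policy[s] == policy2[s] for s in states)
print(max_gap, same_policy)
```

The transformed solver runs twice as many sweeps because one step of the original MDP corresponds to two steps (state, then pre-state) in the new one.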

Then, using pre as shorthand for pre(s, a, s’):

U(s) = \max_a \left[ R'(s, a) + \gamma^{1/2} \sum_{pre} T'(s, a, pre) \max_b \left[ R'(pre, b) + \gamma^{1/2} \sum_{s'} T'(pre, b, s') U(s') \right] \right]

U(s) = \max_a \left[ \sum_{s'} T(s, a, s') \left( R(s, a, s') + \gamma U(s') \right) \right]
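The step between these two equations is a direct substitution. As a sketch using only the definitions of T', R', and \gamma' given earlier: the sum over pre-states becomes a sum over s', the max over b disappears because b is the only action in a pre-state, and the two factors of \gamma^{1/2} combine.

```latex
\begin{align*}
U(s) &= \max_a \Big[ R'(s,a) + \gamma^{1/2} \sum_{\mathrm{pre}} T'(s,a,\mathrm{pre})
        \max_b \Big[ R'(\mathrm{pre},b) + \gamma^{1/2} \sum_{s'} T'(\mathrm{pre},b,s')\, U(s') \Big] \Big] \\
     &= \max_a \Big[ 0 + \gamma^{1/2} \sum_{s'} T(s,a,s')
        \Big[ \gamma^{-1/2} R(s,a,s') + \gamma^{1/2}\, U(s') \Big] \Big] \\
     &= \max_a \sum_{s'} T(s,a,s') \Big[ R(s,a,s') + \gamma\, U(s') \Big]
\end{align*}
```

Because the last line is the Bellman equation of the original MDP, every original state has the same utility in both MDPs, and the action that attains the max in s is the same in both.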

Now do the same to convert MDPs with R(s, a) into MDPs with R(s).

Similar to the pre-state construction, create a state post(s, a) for every s, a such that

T'(s, a, post(s, a)) = 1

T'(post(s, a), b, s') = T(s, a, s')

R'(s) = 0

R'(post(s, a)) = \gamma^{-1/2} R(s, a)

\gamma' = \gamma^{1/2}
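The post-state construction can be checked the same way. This is again a minimal sketch under stated assumptions: `solve_sa_rewards`, `add_post_states`, and `solve_state_rewards` are hypothetical names, the MDP is stored in plain dictionaries, and the random MDP is only a test case. Post-states are keyed by (s, a), since one post-state per state-action pair is enough.

```python
import itertools
import random

def solve_sa_rewards(states, actions, T, R, gamma, iters=500):
    """Original MDP: U(s) = max_a [R[s, a] + gamma * sum_s' T[s, a, s'] * U(s')]."""
    U = {s: 0.0 for s in states}
    for _ in range(iters):
        U = {s: max(R[s, a] + gamma * sum(T[s, a, s2] * U[s2] for s2 in states)
                    for a in actions)
             for s in states}
    return U

def add_post_states(states, actions, T, R, gamma):
    """Build T', state-only rewards R'(x) and gamma' for the post-state MDP."""
    T2, R2 = {}, {}
    for s in states:
        R2[s] = 0.0
        for a in actions:
            post = ("post", s, a)
            # Executing a in s leads to post(s, a) with certainty.
            T2[s, a] = {post: 1.0}
            # The single action b in post(s, a) leads to s' with probability T(s, a, s').
            T2[post, "b"] = {s2: T[s, a, s2] for s2 in states}
            R2[post] = gamma ** -0.5 * R[s, a]
    return T2, R2, gamma ** 0.5

def solve_state_rewards(T2, R2, gamma2, iters=1000):
    """Transformed MDP: U(x) = R2[x] + gamma2 * max_a sum_y T2[x, a][y] * U(y)."""
    avail = {}
    for x, a in T2:
        avail.setdefault(x, []).append(a)
    U = {x: 0.0 for x in avail}
    for _ in range(iters):
        U = {x: R2[x] + gamma2 * max(sum(p * U[y] for y, p in T2[x, a].items())
                                     for a in avail[x])
             for x in avail}
    return U

# Small random MDP used only as a test case (an assumption, not from the answer).
rng = random.Random(1)
states, actions, gamma = [0, 1, 2], ["left", "right"], 0.9
T, R = {}, {}
for s, a in itertools.product(states, actions):
    w = [rng.random() + 0.1 for _ in states]
    R[s, a] = rng.uniform(-1.0, 1.0)
    for s2 in states:
        T[s, a, s2] = w[s2] / sum(w)

U = solve_sa_rewards(states, actions, T, R, gamma)
T2, R2, gamma2 = add_post_states(states, actions, T, R, gamma)
U2 = solve_state_rewards(T2, R2, gamma2)

max_gap = max(abs(U[s] - U2[s]) for s in states)
print(max_gap)
```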

Then, using post as shorthand for post(s, a):

U(s) = R'(s) + \gamma^{1/2} \max_a \left[ \sum_{post} T'(s, a, post) \left( R'(post) + \gamma^{1/2} \max_b \left[ \sum_{s'} T'(post, b, s') U(s') \right] \right) \right]

U(s) = \max_a \left[ R(s, a) + \gamma \sum_{s'} T(s, a, s') U(s') \right]
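As before, the second equation follows from the first by substitution. A sketch of the intermediate step, using only the post-state definitions given earlier: each action a leads to exactly one post-state, so the sum over post-states collapses to a single term, and b is the only action available there.

```latex
\begin{align*}
U(s) &= R'(s) + \gamma^{1/2} \max_a \Big[ \sum_{\mathrm{post}} T'(s,a,\mathrm{post})
        \Big( R'(\mathrm{post}) + \gamma^{1/2} \max_b \sum_{s'} T'(\mathrm{post},b,s')\, U(s') \Big) \Big] \\
     &= 0 + \gamma^{1/2} \max_a \Big[ \gamma^{-1/2} R(s,a)
        + \gamma^{1/2} \sum_{s'} T(s,a,s')\, U(s') \Big] \\
     &= \max_a \Big[ R(s,a) + \gamma \sum_{s'} T(s,a,s')\, U(s') \Big]
\end{align*}
```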

