Ahat [919]
3 years ago
15

Show how an MDP with a reward function R(s, a, s') can be transformed into a different MDP with reward function R(s, a), such that optimal policies in the new MDP correspond exactly to optimal policies in the original MDP.
Engineering
1 answer:
sasho [114]3 years ago
7 0

Answer:

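$$U(s) = \max_a \Big[ R'(s, a) + \gamma^{1/2} \sum_{pre} T'(s, a, pre) \Big( \max_b \Big[ R'(pre, b) + \gamma^{1/2} \sum_{s'} T'(pre, b, s') \, U(s') \Big] \Big) \Big]$$

$$U(s) = \max_a \Big[ \sum_{s'} T(s, a, s') \big( R(s, a, s') + \gamma U(s') \big) \Big]$$

$$U(s) = R'(s) + \gamma^{1/2} \max_a \Big[ \sum_{post} T'(s, a, post) \Big( R'(post) + \gamma^{1/2} \max_b \Big[ \sum_{s'} T'(post, b, s') \, U(s') \Big] \Big) \Big]$$

$$U(s) = \max_a \Big[ R(s, a) + \gamma \sum_{s'} T(s, a, s') \, U(s') \Big]$$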

Explanation:

MDPs

MDPs can be formulated with a reward function R(s) that depends only on the current state, R(s, a) that also depends on the action taken, or R(s, a, s') that depends on the action taken and the outcome state.

The task is to show how an MDP with a reward function R(s, a, s') can be transformed into a different MDP with reward function R(s, a), such that optimal policies in the new MDP correspond exactly to optimal policies in the original MDP.

One solution is to define a 'pre-state' pre(s, a, s') for every s, a, s' such that executing a in s leads not to s' but to pre(s, a, s'). From the pre-state there is only one action b, which always leads to s'. Let the new MDP have transition T', reward R', and discount γ'.

$$T'(s, a, pre(s, a, s')) = T(s, a, s')$$

$$T'(pre(s, a, s'), b, s') = 1$$

$$R'(s, a) = 0$$

$$R'(pre(s, a, s'), b) = \gamma^{-1/2} R(s, a, s')$$

$$\gamma' = \gamma^{1/2}$$

Then, using pre as shorthand for pre(s, a, s’):

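$$U(s) = \max_a \Big[ R'(s, a) + \gamma^{1/2} \sum_{pre} T'(s, a, pre) \Big( \max_b \Big[ R'(pre, b) + \gamma^{1/2} \sum_{s'} T'(pre, b, s') \, U(s') \Big] \Big) \Big]$$

Substituting the definitions of T', R', and γ' collapses the pre-state step, and the γ^{1/2} factors cancel, leaving the Bellman equation of the original MDP:

$$U(s) = \max_a \Big[ \sum_{s'} T(s, a, s') \big( R(s, a, s') + \gamma U(s') \big) \Big]$$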
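The pre-state construction can be checked numerically. The sketch below is only an illustration: the three-state random MDP, the function names (`solve_sas`, `solve_sa`, `add_pre_states`), and the iteration counts are assumptions made for the example, not part of the original answer. It builds T', R', and γ' as defined above and compares utilities and greedy policies on the original states.

```python
import random


def solve_sas(states, actions, T, R, gamma, iters=2000):
    """Value iteration on the original MDP with rewards R[s][a][s2]."""
    U = {s: 0.0 for s in states}
    for _ in range(iters):
        U = {s: max(sum(T[s][a][s2] * (R[s][a][s2] + gamma * U[s2])
                        for s2 in states)
                    for a in actions)
             for s in states}
    return U


def solve_sa(T, R, gamma, iters=4000):
    """Value iteration with rewards R[s][a]; T[s][a] maps next state -> probability."""
    U = {s: 0.0 for s in T}
    for _ in range(iters):
        U = {s: max(R[s][a] + gamma * sum(p * U[n] for n, p in T[s][a].items())
                    for a in T[s])
             for s in T}
    return U


def add_pre_states(states, actions, T, R, gamma):
    """Build T', R', gamma' following the pre-state definitions."""
    g = gamma ** 0.5                                # gamma' = gamma^(1/2)
    T2 = {s: {} for s in states}
    R2 = {s: {} for s in states}
    for s in states:
        for a in actions:
            R2[s][a] = 0.0                          # R'(s, a) = 0
            T2[s][a] = {}
            for s2 in states:
                pre = ("pre", s, a, s2)
                T2[s][a][pre] = T[s][a][s2]         # T'(s, a, pre) = T(s, a, s')
                T2[pre] = {"b": {s2: 1.0}}          # T'(pre, b, s') = 1
                R2[pre] = {"b": R[s][a][s2] / g}    # gamma^(-1/2) R(s, a, s')
    return T2, R2, g


# Hypothetical random MDP, used only to exercise the construction.
random.seed(0)
states, actions, gamma = [0, 1, 2], ["x", "y"], 0.9
T, R = {}, {}
for s in states:
    T[s], R[s] = {}, {}
    for a in actions:
        w = [random.random() for _ in states]
        T[s][a] = {s2: w[i] / sum(w) for i, s2 in enumerate(states)}
        R[s][a] = {s2: random.uniform(-1, 1) for s2 in states}

U = solve_sas(states, actions, T, R, gamma)
T2, R2, g = add_pre_states(states, actions, T, R, gamma)
U2 = solve_sa(T2, R2, g)

# Largest difference in utility over the original states.
max_gap = max(abs(U[s] - U2[s]) for s in states)


def greedy(s):
    """Greedy action in the original MDP."""
    return max(actions, key=lambda a: sum(
        T[s][a][s2] * (R[s][a][s2] + gamma * U[s2]) for s2 in states))


def greedy2(s):
    """Greedy action in the transformed MDP."""
    return max(actions, key=lambda a: R2[s][a] + g * sum(
        p * U2[n] for n, p in T2[s][a].items()))


same_policy = all(greedy(s) == greedy2(s) for s in states)
print(max_gap, same_policy)
```

The transformed MDP needs roughly twice as many sweeps to converge because every original step now takes two steps, each discounted by γ^{1/2}.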

Now do the same to convert MDPs with R(s, a) into MDPs with R(s).

Similar to the pre-state construction above, create a state post(s, a) for every s, a such that

$$T'(s, a, post(s, a)) = 1$$

$$T'(post(s, a), b, s') = T(s, a, s')$$

$$R'(s) = 0$$

$$R'(post(s, a)) = \gamma^{-1/2} R(s, a)$$

$$\gamma' = \gamma^{1/2}$$

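Then, using post as shorthand for post(s, a):

$$U(s) = R'(s) + \gamma^{1/2} \max_a \Big[ \sum_{post} T'(s, a, post) \Big( R'(post) + \gamma^{1/2} \max_b \Big[ \sum_{s'} T'(post, b, s') \, U(s') \Big] \Big) \Big]$$

Since R'(s) = 0, each action a leads to a single post-state with probability 1, and b is the only action available there, this simplifies to the Bellman equation of the original MDP:

$$U(s) = \max_a \Big[ R(s, a) + \gamma \sum_{s'} T(s, a, s') \, U(s') \Big]$$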
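The same kind of numerical check works for the post-state construction. This is a sketch under assumptions: the random MDP, the function names (`solve_sa`, `solve_s`, `add_post_states`), and the iteration counts are hypothetical, and post-states are indexed by (s, a) only, as in the prose description.

```python
import random


def solve_sa(states, actions, T, R, gamma, iters=2000):
    """Original MDP: U(s) = max_a [R(s,a) + gamma * sum_s' T(s,a,s') U(s')]."""
    U = {s: 0.0 for s in states}
    for _ in range(iters):
        U = {s: max(R[s][a] + gamma * sum(T[s][a][s2] * U[s2] for s2 in states)
                    for a in actions)
             for s in states}
    return U


def solve_s(T, R, gamma, iters=4000):
    """State-reward MDP: U(s) = R(s) + gamma * max_a sum_s' T(s,a,s') U(s')."""
    U = {s: 0.0 for s in T}
    for _ in range(iters):
        U = {s: R[s] + gamma * max(sum(p * U[n] for n, p in T[s][a].items())
                                   for a in T[s])
             for s in T}
    return U


def add_post_states(states, actions, T, R, gamma):
    """Build T', R', gamma' following the post-state definitions."""
    g = gamma ** 0.5                         # gamma' = gamma^(1/2)
    T2 = {s: {} for s in states}
    R2 = {s: 0.0 for s in states}            # R'(s) = 0
    for s in states:
        for a in actions:
            post = ("post", s, a)
            T2[s][a] = {post: 1.0}           # T'(s, a, post) = 1
            T2[post] = {"b": dict(T[s][a])}  # T'(post, b, s') = T(s, a, s')
            R2[post] = R[s][a] / g           # gamma^(-1/2) R(s, a)
    return T2, R2, g


# Hypothetical random MDP, used only to exercise the construction.
random.seed(1)
states, actions, gamma = [0, 1, 2], ["x", "y"], 0.9
T, R = {}, {}
for s in states:
    T[s], R[s] = {}, {}
    for a in actions:
        w = [random.random() for _ in states]
        T[s][a] = {s2: w[i] / sum(w) for i, s2 in enumerate(states)}
        R[s][a] = random.uniform(-1, 1)

U = solve_sa(states, actions, T, R, gamma)
T2, R2, g = add_post_states(states, actions, T, R, gamma)
U2 = solve_s(T2, R2, g)

# Largest difference in utility over the original states.
max_gap = max(abs(U[s] - U2[s]) for s in states)
print(max_gap)
```

Because an action taken in s leads deterministically to post(s, a), choosing the best action in the new MDP amounts to choosing the post-state with the highest utility, which is the quantity R(s, a) + γ Σ T(s, a, s') U(s') scaled by γ^{-1/2}.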


You might be interested in
The primary difference between LEED Certification for Buildings and LEED Professional Certifications is that professionals can c
TEA [102]

Answer:false

Explanation:

8 0
3 years ago
Yo can someone find me an e-boy?
Vera_Pavlovna [14]
I got a friend how old are you and are you ok dating a bi guy
4 0
4 years ago
A steam power plant operates on a simple ideal Rankine cycle between the pressure limits of 3000 kPa and 25 kPa. The temperature
mafiozo [28]

Answer:

a)31%

b)34MW

Explanation:

A Rankine cycle is a power generation cycle that uses water as the working fluid. Heat enters at the boiler, and the water undergoes a series of changes in state and energy until it generates power through the turbine.

This cycle is composed of four main components: the boiler, the pump, the turbine, and the condenser, as shown in the attached image.

To solve any problem involving the Rankine cycle, the enthalpies at all states must be found from the thermodynamic tables, taking into account the following:

• The pressures of states 1 and 4 are equal

• The pressures of states 2 and 3 are equal

• State 1 is superheated steam

• State 2 is a saturated liquid-vapor mixture

• State 3 is saturated liquid at the lowest pressure

• State 4 is taken as equal to state 3 because the work of the pump is negligible

Once all enthalpies are found, the following equations from the first law of thermodynamics are used:

Wout = m (h1-h2)

Qin = m (h1-h4)

Win = m (h4-h3)

Qout = m (h2-h3)

The efficiency is calculated as the power obtained divided by the heat that enters:

Efficiency = Wout / Qin

Efficiency = (h1-h2) / (h1-h4)

For this problem, we first find the enthalpies at all states:

h1 = 3231 kJ/kg

h2 = 2310 kJ/kg

h3 = h4 = 272 kJ/kg

a) Using the efficiency equation:

Efficiency = (h1-h2) / (h1-h4)

Efficiency = (3231-2310)/(3231-272) = 0.31 = 31%

b) Using the equation for Wout:

Wout = m (h1-h2)

Wout = 37 (3231-2310) = 34077 kW = 34.077 MW
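The arithmetic above can be reproduced with a few lines of Python. This is a minimal sketch: the variable names are illustrative, and the mass flow unit of kg/s is an assumption inferred from the kW result, since the question text is truncated.

```python
# Enthalpies quoted in the answer, in kJ/kg.
h1 = 3231.0        # state 1: superheated steam
h2 = 2310.0        # state 2: turbine exit
h3 = h4 = 272.0    # states 3 and 4: pump work neglected, so h4 = h3

m_dot = 37.0       # mass flow used in the answer; kg/s is an assumed unit

w_out_kw = m_dot * (h1 - h2)      # turbine power, kW
q_in_kw = m_dot * (h1 - h4)       # boiler heat input, kW
efficiency = w_out_kw / q_in_kw   # thermal efficiency, pump work neglected

print(f"efficiency = {efficiency:.1%}, power = {w_out_kw / 1000:.3f} MW")
```

The mass flow cancels in the efficiency ratio, so only the power figure depends on it.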

6 0
4 years ago
A laboratory in the Y building keep a vacuum pressure of 0.1 kPa abs. What is the net force acting on the door considering the a
seropon [69]

Answer:

net force acting on the door is about 101 kN

Explanation:

Given data:

P_{vacuum} = 0.1 kPa (absolute)

P_{atm} = 101.325 kPa

dimensions of door = 2 m \times 0.5 m

we know that

the net force is the pressure difference across the door multiplied by its area:

f_{net} = (P_{atm} - P_{vacuum}) \times area

f_{net} = (101.325 - 0.1) \times 10^3 \times 2 \times 0.5

f_{net} = 101.225 \times 10^3 N

f_{net} \approx 101 kN

Therefore the net force acting on the door is about 101 kN

7 0
4 years ago
Compare and contrast an electric vehicle with one powered by a fuel cell.
zmey [24]

Answer:

Both technologies offer a cleaner alternative to internal combustion engines, and both use electric motors powered by an electrochemical device: a battery in the electric vehicle and a fuel cell in the fuel cell vehicle.

3 0
3 years ago