1answer.
Ask question
Login Signup
Ask question
All categories
  • English
  • Mathematics
  • Social Studies
  • Business
  • History
  • Health
  • Geography
  • Biology
  • Physics
  • Chemistry
  • Computers and Technology
  • Arts
  • World Languages
  • Spanish
  • French
  • German
  • Advanced Placement (AP)
  • SAT
  • Medicine
  • Law
  • Engineering
Ahat [919]
3 years ago
15

Show how am MDP with a reward function R(s, a, s’) can be transformed into a different MDP with reward function R(s, a), such th

at optimal policies in the new MDP correspond exactly to optimal policies in the original MDP
Engineering
1 answer:
sasho [114]3 years ago
7 0

Answer:

U(s) = maxa[R0

(s, a) + γ

1

2

P

pre T

0

(s, a, pre)(maxb[R0

(pre, b) + γ

1

2

P

s

0 T

0

(pre, b, s0

) ∗ U(s

0

))]]

U(s) = maxa[

P

s

0 T(s, a, s0

)(R(s, a, s0

) + γU(s

0

)]

U(s) = R0

(s) + γ

1

2 maxa[

P

post T

0

(s, a, post)(R0

(post) + γ

1

2 maxb[

P

s

0 T

0

(post, b, s0

)U(s

0

))]]

U(s) = maxa[R(s, a) + γ

P

s

0 T(s, a, s0

)U(s

0

)]

Explanation:

MDPs

MDPs can formulated with a reward function R(s), R(s, a) that depends on the action taken or R(s, a, s’) that

depends on the action taken and outcome state.

To Show how am MDP with a reward function R(s, a, s’) can be transformed into a different MDP with reward

function R(s, a), such that optimal policies in the new MDP correspond exactly to optimal policies in the

original MDP.

One solution is to define a ’pre-state’ pre(s, a, s’) for every s, a, s’ such that executing a in s leads not to s’

but to pre(s, a, s’). From the pre-state there is only one action b that always leads to s’. Let the new MDP

have transition T’, reward R’, and discount γ

0

.

T

0

(s, a, pre(s, a, s0

)) = T(s, a, s0

)

T

0

(pre(s, a, s0

), b, s0

) = 1

R0

(s, a) = 0

R0

(pre(s, a, s0

), b) = γ

− 1

2 R(s, a, s0

)

γ

0 = γ

1

2

Then, using pre as shorthand for pre(s, a, s’):

U(s) = maxa[R0

(s, a) + γ

1

2

P

pre T

0

(s, a, pre)(maxb[R0

(pre, b) + γ

1

2

P

s

0 T

0

(pre, b, s0

) ∗ U(s

0

))]]

U(s) = maxa[

P

s

0 T(s, a, s0

)(R(s, a, s0

) + γU(s

0

)]

Now do the same to convert MDPs with R(s, a) into MDPs with R(s).

Similar to part (c), create a state post(s, a) for every s, a such that

T

0

(s, a, post(s, a, s0

)) = 1

T

0

(post(s, a, s0

), b, s0

) = T(s, a, s0

)

R0

(s) = 0

R0

(post(s, a, s0

)) = γ

− 1

2 R(s, a)

γ

0 = γ

1

2

Then, using post as shorthand for post(s, a, s’):

U(s) = R0

(s) + γ

1

2 maxa[

P

post T

0

(s, a, post)(R0

(post) + γ

1

2 maxb[

P

s

0 T

0

(post, b, s0

)U(s

0

))]]

U(s) = maxa[R(s, a) + γ

P

s

0 T(s, a, s0

)U(s

0

)]

3

You might be interested in
Why not just put all the set up steps within each step? it is because we want to keep our code __ ? (3 letters)
andrezito [222]

Why not simply include each step's setup instructions within it? is it due to our desire to maintain DRY (Don't Repeat Yourself) code?

<h3>What is the purpose of unit testing?</h3>

Program testing is known as "unit testing" involves testing individual software components. When developing an application, unit testing is done on the software product. An individual component could be a technique or a specific function.

Unit testing's primary goal is to separate written code for testing to see if it functions as intended. Unit testing is a crucial stage in the development process because, when done properly, it can aid in finding early code issues that could be more challenging to identify in subsequent testing phases.

The core of the testing process consists of unit testing and functional testing. The primary distinction between the two is that during the development cycle, the developer conducts unit testing. The tester does functional testing at the system testing level.

To learn more about unit testing refers to:

brainly.com/question/22900395

#SPJ4

8 0
1 year ago
2. One of the many methods used for drying air is to cool the air below the dew point so that condensation or freezing of the mo
REY [17]

Answer:

0.5°c

Explanation:

Humidity ratio by mass can be expressed as

the ratio between the actual mass of water vapor present in moist air - to the mass of the dry air

Humidity ratio is normally expressed in kilograms (or pounds) of water vapor per kilogram (or pound) of dry air.

Humidity ratio expressed by mass:

x = mw / ma                                  (1)

where

x = humidity ratio (kgwater/kgdry_air, lbwater/lbdry_air)

mw = mass of water vapor (kg, lb)

ma = mass of dry air (kg, lb)

It can be as:

x = 0.005 (100) / [(100 - 100)]

x = 0.005 x 100 / (100 - 100)

x = 0.005 x 100 / 0

x = 0.5°c

So the temperature to which atmospheric air must be cooled in order to have humidity ratio of 0.005 lb/lb is 0.5°c

6 0
3 years ago
Find: factor of safety (n)for point A and B by using both MSS and DE (you can neglect shear stress due to shear force and also n
gladu [14]

Answer:

Hello your question is incomplete attached below is the complete question

Answer : Factor of safety for point A :

i) using MSS

(Fos)MSS =  3.22

ii) using DE

(Fos)DE = 3.27

Factor of safety for point B

i) using MSS

(Fos)MSS =  3.04

ii) using DE

(Fos)DE = 3.604

Explanation:

Factor of safety for point A :

i) using MSS

(Fos)MSS =  3.22

ii) using DE

(Fos)DE = 3.27

Factor of safety for point B

i) using MSS

(Fos)MSS =  3.04

ii) using DE

(Fos)DE = 3.604

Attached below is the detailed solution

8 0
3 years ago
Air at atmospheric pressure and at 300K flows with a velocity of 1.5m/s over a flat plate. The transition from laminar to turbul
Savatey [412]

Answer:3.47 m

Explanation:

Given

Temperature(T)=300 K

velocity(v)=1.5 m/s

At 300 K

\mu =1.846 \times 10^{-5} Pa-s

\rho =1.77 kg/m^3

And reynold's number is given by

Re.=\frac{\rho v\time x}{\mu }

5\times 10^5=\frac{1.77\times 1.5\times x}{1.846\times 10^{-5}}

x=\frac{5\times 10^5\times 1.846\times 10^{-5}}{1.77\times 1.5}

x=3.47 m

5 0
3 years ago
Why is it important to cut all the way through an electrical wire on the first try?
lbvjy [14]

Answer:

I always thought it was so that the older wire could not have a problem and have another electrician must come back and fix it.

Explanation:

6 0
3 years ago
Other questions:
  • g The pump inlet is located 1 m above an arbitrary datum. The pressure and velocity at the inlet are 100 kPa and 2 m/s, respecti
    8·1 answer
  • The design for a new cementless hip implant is to be studied using an instrumented implant and a fixed simulated femur.
    11·1 answer
  • A bar of steel has the minimum properties Se = 40 kpsi, S = 60 kpsi, and S-80 kpsi. The bar is subjected to a steady torsional s
    6·1 answer
  • A soil element is subjected to a minor principle stress of 50 kPa on a plane rotated 20 ° counterclockwise from vertical. If the
    10·1 answer
  • I really need a good grade please help
    13·1 answer
  • The heat transfer surface area of a fin is equal to the sum of all surfaces of the fin exposed to the surrounding medium, includ
    6·1 answer
  • Shelly cashman word 2016 module 2 - project a pdf - does anyone have the fished paper
    11·1 answer
  • 6 A square silicon chip (k 150 W/m K) is of width w 5 mm on a side and of thickness t 1 mm. The chip is mounted in a substrate s
    9·1 answer
  • The diameter of a cylindrical water tank is Do and its height is H. The tank is filled with water, which is open to the atmosphe
    11·1 answer
  • Describe how to mix and apply body filler?
    13·1 answer
Add answer
Login
Not registered? Fast signup
Signup
Login Signup
Ask question!