Ahat [919]
3 years ago
15

Show how an MDP with a reward function R(s, a, s’) can be transformed into a different MDP with reward function R(s, a), such that optimal policies in the new MDP correspond exactly to optimal policies in the original MDP.
Engineering
1 answer:
sasho [114]3 years ago
7 0

Answer:

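$$U(s) = \max_a \Big[ R'(s,a) + \gamma^{1/2} \sum_{\mathit{pre}} T'(s,a,\mathit{pre}) \Big( \max_b \Big[ R'(\mathit{pre},b) + \gamma^{1/2} \sum_{s'} T'(\mathit{pre},b,s')\, U(s') \Big] \Big) \Big]$$

$$U(s) = \max_a \Big[ \sum_{s'} T(s,a,s') \big( R(s,a,s') + \gamma\, U(s') \big) \Big]$$

$$U(s) = R'(s) + \gamma^{1/2} \max_a \Big[ \sum_{\mathit{post}} T'(s,a,\mathit{post}) \Big( R'(\mathit{post}) + \gamma^{1/2} \max_b \Big[ \sum_{s'} T'(\mathit{post},b,s')\, U(s') \Big] \Big) \Big]$$

$$U(s) = \max_a \Big[ R(s,a) + \gamma \sum_{s'} T(s,a,s')\, U(s') \Big]$$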

Explanation:

MDPs

MDPs can be formulated with a reward function R(s) that depends only on the state, R(s, a) that also depends on the action taken, or R(s, a, s’) that depends on the action taken and the outcome state.

The task is to show how an MDP with a reward function R(s, a, s’) can be transformed into a different MDP with reward function R(s, a), such that optimal policies in the new MDP correspond exactly to optimal policies in the original MDP.

One solution is to define a ’pre-state’ pre(s, a, s’) for every s, a, s’ such that executing a in s leads not to s’ but to pre(s, a, s’). From the pre-state there is only one action b, which always leads to s’. Let the new MDP have transition T’, reward R’, and discount γ’.

$$T'(s, a, \mathit{pre}(s,a,s')) = T(s,a,s')$$

$$T'(\mathit{pre}(s,a,s'), b, s') = 1$$

$$R'(s,a) = 0$$

$$R'(\mathit{pre}(s,a,s'), b) = \gamma^{-1/2}\, R(s,a,s')$$

$$\gamma' = \gamma^{1/2}$$

Then, using pre as shorthand for pre(s, a, s’):

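$$U(s) = \max_a \Big[ R'(s,a) + \gamma^{1/2} \sum_{\mathit{pre}} T'(s,a,\mathit{pre}) \Big( \max_b \Big[ R'(\mathit{pre},b) + \gamma^{1/2} \sum_{s'} T'(\mathit{pre},b,s')\, U(s') \Big] \Big) \Big]$$

$$U(s) = \max_a \Big[ \sum_{s'} T(s,a,s') \big( R(s,a,s') + \gamma\, U(s') \big) \Big]$$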
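The pre-state construction can be checked numerically. The sketch below is only an illustration under stated assumptions: the three-state, two-action MDP is randomly generated, and the helper names (`solve_sas`, `solve_sa`, `add_pre_states`) and the tuple encoding of pre-states are hypothetical, not part of the original answer. It runs value iteration on both MDPs and compares the utilities and greedy actions on the original states.

```python
import math
import random

def solve_sas(S, A, T, R, gamma, iters=500):
    """Value iteration with rewards R[(s, a, s2)]; T[(s, a)] maps s2 -> prob."""
    U = {s: 0.0 for s in S}
    for _ in range(iters):
        U = {s: max(sum(p * (R[(s, a, s2)] + gamma * U[s2])
                        for s2, p in T[(s, a)].items()) for a in A)
             for s in S}
    return U

def solve_sa(S, acts, T, R, gamma, iters=1000):
    """Value iteration with rewards R[(s, a)]; acts[s] lists the actions in s."""
    U = {s: 0.0 for s in S}
    for _ in range(iters):
        U = {s: max(R[(s, a)] + gamma * sum(p * U[s2] for s2, p in T[(s, a)].items())
                    for a in acts[s])
             for s in S}
    return U

def add_pre_states(S, A, T, R, gamma):
    """Build the new MDP (S', acts', T', R', gamma') with one pre-state per (s, a, s2)."""
    g = math.sqrt(gamma)                      # gamma' = gamma^(1/2)
    S2, acts, T2, R2 = list(S), {s: list(A) for s in S}, {}, {}
    for s in S:
        for a in A:
            R2[(s, a)] = 0.0                  # R'(s, a) = 0
            T2[(s, a)] = {}
            for s2, p in T[(s, a)].items():
                pre = ("pre", s, a, s2)       # hypothetical encoding of pre(s, a, s')
                S2.append(pre)
                acts[pre] = ["b"]             # the single action b
                T2[(s, a)][pre] = p           # T'(s, a, pre) = T(s, a, s')
                T2[(pre, "b")] = {s2: 1.0}    # T'(pre, b, s') = 1
                R2[(pre, "b")] = R[(s, a, s2)] / g   # gamma^(-1/2) R(s, a, s')
    return S2, acts, T2, R2, g

# Small random MDP, used purely as test data.
random.seed(0)
S, A, gamma = [0, 1, 2], ["x", "y"], 0.9
T, R = {}, {}
for s in S:
    for a in A:
        w = [random.random() for _ in S]
        T[(s, a)] = {s2: w[i] / sum(w) for i, s2 in enumerate(S)}
        for s2 in S:
            R[(s, a, s2)] = random.uniform(-1, 1)

U = solve_sas(S, A, T, R, gamma)
S2, acts, T2, R2, g = add_pre_states(S, A, T, R, gamma)
U2 = solve_sa(S2, acts, T2, R2, g)

# Largest difference in utility over the original states.
max_gap = max(abs(U[s] - U2[s]) for s in S)

def greedy_sas(s):
    return max(A, key=lambda a: sum(p * (R[(s, a, s2)] + gamma * U[s2])
                                    for s2, p in T[(s, a)].items()))

def greedy_sa(s):
    return max(A, key=lambda a: R2[(s, a)] + g * sum(p * U2[t]
                                                     for t, p in T2[(s, a)].items()))

same_policy = all(greedy_sas(s) == greedy_sa(s) for s in S)
print(max_gap, same_policy)
```

The transformed MDP takes two steps for every step of the original one, which is why the sketch gives it twice as many iterations.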

Now do the same to convert MDPs with R(s, a) into MDPs with R(s).

Similar to the pre-state construction above, create a state post(s, a) for every s, a such that

$$T'(s, a, \mathit{post}(s,a)) = 1$$

$$T'(\mathit{post}(s,a), b, s') = T(s,a,s')$$

$$R'(s) = 0$$

$$R'(\mathit{post}(s,a)) = \gamma^{-1/2}\, R(s,a)$$

$$\gamma' = \gamma^{1/2}$$

Then, using post as shorthand for post(s, a):

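$$U(s) = R'(s) + \gamma^{1/2} \max_a \Big[ \sum_{\mathit{post}} T'(s,a,\mathit{post}) \Big( R'(\mathit{post}) + \gamma^{1/2} \max_b \Big[ \sum_{s'} T'(\mathit{post},b,s')\, U(s') \Big] \Big) \Big]$$

$$U(s) = \max_a \Big[ R(s,a) + \gamma \sum_{s'} T(s,a,s')\, U(s') \Big]$$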
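The post-state construction can be checked the same way. This is again a hedged sketch rather than a definitive implementation: the random MDP is test data, and the names `solve_sa`, `solve_s`, `add_post_states` and the tuple encoding of post-states are assumptions made for illustration. It compares value iteration on the R(s, a) MDP with value iteration on the transformed R(s) MDP.

```python
import math
import random

def solve_sa(S, A, T, R, gamma, iters=500):
    """Value iteration for U(s) = max_a [R(s,a) + gamma * sum_s' T(s,a,s') U(s')]."""
    U = {s: 0.0 for s in S}
    for _ in range(iters):
        U = {s: max(R[(s, a)] + gamma * sum(p * U[s2] for s2, p in T[(s, a)].items())
                    for a in A)
             for s in S}
    return U

def solve_s(S, acts, T, R, gamma, iters=1000):
    """Value iteration for U(s) = R(s) + gamma * max_a sum_s' T(s,a,s') U(s')."""
    U = {s: 0.0 for s in S}
    for _ in range(iters):
        U = {s: R[s] + gamma * max(sum(p * U[s2] for s2, p in T[(s, a)].items())
                                   for a in acts[s])
             for s in S}
    return U

def add_post_states(S, A, T, R, gamma):
    """Build the new MDP (S', acts', T', R', gamma') with one post-state per (s, a)."""
    g = math.sqrt(gamma)                       # gamma' = gamma^(1/2)
    S2, acts, T2 = list(S), {s: list(A) for s in S}, {}
    R2 = {s: 0.0 for s in S}                   # R'(s) = 0
    for s in S:
        for a in A:
            post = ("post", s, a)              # hypothetical encoding of post(s, a)
            S2.append(post)
            acts[post] = ["b"]                 # the single action b
            T2[(s, a)] = {post: 1.0}           # T'(s, a, post) = 1
            T2[(post, "b")] = dict(T[(s, a)])  # T'(post, b, s') = T(s, a, s')
            R2[post] = R[(s, a)] / g           # gamma^(-1/2) R(s, a)
    return S2, acts, T2, R2, g

# Small random MDP, used purely as test data.
random.seed(1)
S, A, gamma = [0, 1, 2], ["x", "y"], 0.9
T, R = {}, {}
for s in S:
    for a in A:
        w = [random.random() for _ in S]
        T[(s, a)] = {s2: w[i] / sum(w) for i, s2 in enumerate(S)}
        R[(s, a)] = random.uniform(-1, 1)

U = solve_sa(S, A, T, R, gamma)
S2, acts, T2, R2, g = add_post_states(S, A, T, R, gamma)
U2 = solve_s(S2, acts, T2, R2, g)

# Largest difference in utility over the original states.
max_gap = max(abs(U[s] - U2[s]) for s in S)
print(max_gap)
```

Here the reward moves onto the post-state itself, so the new MDP needs only one extra state per (s, a) pair instead of one per (s, a, s’) triple.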

