1answer.
Ahat [919]
3 years ago
15

Show how an MDP with a reward function R(s, a, s') can be transformed into a different MDP with reward function R(s, a), such that optimal policies in the new MDP correspond exactly to optimal policies in the original MDP.
Engineering
1 answer:
sasho [114]3 years ago
7 0

Answer:

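For the R(s, a, s') to R(s, a) transformation (T', R' and the pre-states are defined in the explanation below), the Bellman equation of the new MDP is

U(s) = \max_a \Big[ R'(s, a) + \gamma^{1/2} \sum_{pre} T'(s, a, pre) \Big( \max_b \Big[ R'(pre, b) + \gamma^{1/2} \sum_{s'} T'(pre, b, s') \, U(s') \Big] \Big) \Big]

which reduces to the Bellman equation of the original MDP:

U(s) = \max_a \Big[ \sum_{s'} T(s, a, s') \big( R(s, a, s') + \gamma U(s') \big) \Big]

For the R(s, a) to R(s) transformation (using post-states), the Bellman equation of the new MDP is

U(s) = R'(s) + \gamma^{1/2} \max_a \Big[ \sum_{post} T'(s, a, post) \Big( R'(post) + \gamma^{1/2} \max_b \Big[ \sum_{s'} T'(post, b, s') \, U(s') \Big] \Big) \Big]

which reduces to

U(s) = \max_a \Big[ R(s, a) + \gamma \sum_{s'} T(s, a, s') \, U(s') \Big]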

Explanation:

MDPs

MDPs can be formulated with a reward function R(s), with R(s, a), which depends on the action taken, or with R(s, a, s'), which depends on the action taken and the outcome state.

The task is to show how an MDP with reward function R(s, a, s') can be transformed into a different MDP with reward function R(s, a), such that optimal policies in the new MDP correspond exactly to optimal policies in the original MDP.

One solution is to define a 'pre-state' pre(s, a, s') for every s, a, s' such that executing a in s leads not to s' but to pre(s, a, s'). From the pre-state there is only one action b, which always leads to s'. Let the new MDP have transition model T', reward function R', and discount factor γ':

T'(s, a, pre(s, a, s')) = T(s, a, s')

T'(pre(s, a, s'), b, s') = 1

R'(s, a) = 0

R'(pre(s, a, s'), b) = \gamma^{-1/2} R(s, a, s')

\gamma' = \gamma^{1/2}

Then, using pre as shorthand for pre(s, a, s’):

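U(s) = \max_a \Big[ R'(s, a) + \gamma^{1/2} \sum_{pre} T'(s, a, pre) \Big( \max_b \Big[ R'(pre, b) + \gamma^{1/2} \sum_{s'} T'(pre, b, s') \, U(s') \Big] \Big) \Big]

Substituting the definitions of T', R' and γ' (the two factors of γ^{1/2} multiply to γ, and γ^{1/2} cancels the γ^{-1/2} in the reward) reduces this to the Bellman equation of the original MDP:

U(s) = \max_a \Big[ \sum_{s'} T(s, a, s') \big( R(s, a, s') + \gamma U(s') \big) \Big]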
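The pre-state construction can be checked numerically. The following is a minimal sketch, not part of the original answer: the two-state MDP, its numbers, and the helper names `solve` and `pre` are all made up for illustration. It builds the transformed MDP exactly as defined above and runs plain value iteration on both.

```python
import math

# Hypothetical toy MDP (states, actions, numbers are illustrative assumptions).
S = ["x", "y"]
A = ["stay", "go"]
GAMMA = 0.9
T = {  # T[(s, a)][s2] = probability of reaching s2
    ("x", "stay"): {"x": 0.9, "y": 0.1},
    ("x", "go"): {"x": 0.2, "y": 0.8},
    ("y", "stay"): {"y": 1.0},
    ("y", "go"): {"x": 0.7, "y": 0.3},
}
R = {  # R[(s, a, s2)] = outcome-dependent reward
    ("x", "stay", "x"): 0.0, ("x", "stay", "y"): 1.0,
    ("x", "go", "x"): -1.0, ("x", "go", "y"): 2.0,
    ("y", "stay", "y"): 0.5,
    ("y", "go", "x"): 3.0, ("y", "go", "y"): -0.5,
}


def solve(states, actions, trans, reward, gamma, sweeps=2000):
    """Plain value iteration; returns (utilities, greedy policy)."""
    U = {s: 0.0 for s in states}

    def q(s, a):
        return sum(p * (reward(s, a, s2) + gamma * U[s2])
                   for s2, p in trans[(s, a)].items())

    for _ in range(sweeps):
        U = {s: max(q(s, a) for a in actions(s)) for s in states}
    return U, {s: max(actions(s), key=lambda a: q(s, a)) for s in states}


def pre(s, a, s2):
    """Label for the pre-state inserted between (s, a) and s2."""
    return ("pre", s, a, s2)


g2 = math.sqrt(GAMMA)  # gamma' = gamma^(1/2)
S2, T2, R2 = list(S), {}, {}
for (s, a), succ in T.items():
    T2[(s, a)] = {pre(s, a, s2): p for s2, p in succ.items()}  # T'(s,a,pre) = T(s,a,s')
    R2[(s, a)] = 0.0                                           # R'(s,a) = 0
    for s2 in succ:
        S2.append(pre(s, a, s2))
        T2[(pre(s, a, s2), "b")] = {s2: 1.0}                   # T'(pre,b,s') = 1
        R2[(pre(s, a, s2), "b")] = R[(s, a, s2)] / g2          # gamma^(-1/2) R(s,a,s')

# Original MDP: reward depends on (s, a, s2).
U1, pi1 = solve(S, lambda s: A, T, lambda s, a, s2: R[(s, a, s2)], GAMMA)
# Transformed MDP: reward ignores s2, i.e. it has the form R'(s, a).
U2, pi2 = solve(S2, lambda s: ["b"] if isinstance(s, tuple) else A,
                T2, lambda s, a, s2: R2[(s, a)], g2)

for s in S:
    print(s, round(U1[s], 6), round(U2[s], 6), pi1[s], pi2[s])
```

On this toy example the utilities of the original states agree in both MDPs, and so do the greedy actions, which is what the derivation predicts.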

Now do the same to convert MDPs with R(s, a) into MDPs with R(s).

Similar to the pre-state construction, create a state post(s, a) for every s, a, reached with certainty by executing a in s, such that

T'(s, a, post(s, a)) = 1

T'(post(s, a), b, s') = T(s, a, s')

R'(s) = 0

R'(post(s, a)) = \gamma^{-1/2} R(s, a)

\gamma' = \gamma^{1/2}

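Then, using post as shorthand for post(s, a):

U(s) = R'(s) + \gamma^{1/2} \max_a \Big[ \sum_{post} T'(s, a, post) \Big( R'(post) + \gamma^{1/2} \max_b \Big[ \sum_{s'} T'(post, b, s') \, U(s') \Big] \Big) \Big]

Since R'(s) = 0 and each action a leads to exactly one post-state, substituting the definitions reduces this to the Bellman equation of the MDP with R(s, a):

U(s) = \max_a \Big[ R(s, a) + \gamma \sum_{s'} T(s, a, s') \, U(s') \Big]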
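The post-state construction can be checked the same way. This is again only a sketch under assumed inputs: the toy MDP, its rewards, and the names `solve` and `post` are hypothetical. The single solver handles both Bellman forms by taking a state reward and an action reward, one of which is zero in each MDP.

```python
import math

# Hypothetical toy MDP with action-dependent reward R(s, a).
S = ["x", "y"]
A = ["stay", "go"]
GAMMA = 0.9
T = {  # T[(s, a)][s2] = probability of reaching s2
    ("x", "stay"): {"x": 0.9, "y": 0.1},
    ("x", "go"): {"x": 0.2, "y": 0.8},
    ("y", "stay"): {"y": 1.0},
    ("y", "go"): {"x": 0.7, "y": 0.3},
}
R = {("x", "stay"): 0.0, ("x", "go"): 1.0, ("y", "stay"): 0.5, ("y", "go"): 2.0}


def solve(states, actions, trans, r_state, r_action, gamma, sweeps=2000):
    """Value iteration for
    U(s) = r_state(s) + max_a [ r_action(s, a) + gamma * sum_s2 T(s, a, s2) U(s2) ].
    Returns (utilities, greedy policy)."""
    U = {s: 0.0 for s in states}

    def q(s, a):
        return r_action(s, a) + gamma * sum(p * U[s2]
                                            for s2, p in trans[(s, a)].items())

    for _ in range(sweeps):
        U = {s: r_state(s) + max(q(s, a) for a in actions(s)) for s in states}
    return U, {s: max(actions(s), key=lambda a: q(s, a)) for s in states}


def post(s, a):
    """Label for the post-state reached with certainty after doing a in s."""
    return ("post", s, a)


g2 = math.sqrt(GAMMA)  # gamma' = gamma^(1/2)
S2, T2 = list(S), {}
R2 = {s: 0.0 for s in S}                      # R'(s) = 0
for (s, a), succ in T.items():
    S2.append(post(s, a))
    T2[(s, a)] = {post(s, a): 1.0}            # T'(s, a, post) = 1
    T2[(post(s, a), "b")] = dict(succ)        # T'(post, b, s') = T(s, a, s')
    R2[post(s, a)] = R[(s, a)] / g2           # R'(post) = gamma^(-1/2) R(s, a)

# Original MDP: reward R(s, a), no state reward.
U1, pi1 = solve(S, lambda s: A, T, lambda s: 0.0, lambda s, a: R[(s, a)], GAMMA)
# Transformed MDP: reward R'(s) only, no action reward.
U2, pi2 = solve(S2, lambda s: ["b"] if isinstance(s, tuple) else A, T2,
                lambda s: R2[s], lambda s, a: 0.0, g2)

for s in S:
    print(s, round(U1[s], 6), round(U2[s], 6), pi1[s], pi2[s])
```

As with the pre-state version, the original states end up with matching utilities and greedy actions in both MDPs on this example.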


You might be interested in
Steam at 4 MPa and 350°C is expanded in an adiabatic turbine to 125kPa. What is the isentropic efficiency (percent) of this turb
guajiro [1.7K]

Answer:

\eta_{turbine} = 0.603 = 60.3\%

Explanation:

First, we will find actual properties at given inlet and outlet states by the use of steam tables:

AT INLET:

At 4 MPa and 350°C, from the superheated table:

h₁ = 3093.3 kJ/kg

s₁ = 6.5843 kJ/kg·K

AT OUTLET:

At P₂ = 125 kPa, the steam leaves as saturated vapor:

h₂ = h_{g\ at\ 125kPa} = 2684.9 kJ/kg

Now, for the isentropic enthalpy, we have:

P₂ = 125 kPa and s₂ = s₁ = 6.5843 kJ/kg·K

Since s₂ is less than s_g and greater than s_f at 125 kPa, the steam is in a saturated mixture state. So:

x = \frac{s_2-s_f}{s_{fg}}

x = \frac{6.5843\ kJ/kg.K - 1.3741\ kJ/kg.K}{5.91\ kJ/kg.K}

x = 0.88

Now, we will find h_{2s}(enthalpy at the outlet for the isentropic process):

h_{2s} = h_{f\ at\ 125kPa}+xh_{fg\ at\ 125kPa}

h_{2s} = 444.36\ kJ/kg + (0.88)(2240.6\ kJ/kg)

h_{2s} = 2416.088\ kJ/kg

Now, the isentropic efficiency of the turbine can be given as follows:

\eta_{turbine} = \frac{h_1-h_2}{h_1-h_{2s}}

\eta_{turbine} = \frac{3093.3\ kJ/kg-2684.9\ kJ/kg}{3093.3\ kJ/kg-2416.088\ kJ/kg}

\eta_{turbine} = \frac{408.4\ kJ/kg}{677.212\ kJ/kg}

\eta_{turbine} = 0.603 = 60.3\%
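The arithmetic above can be replayed in a few lines. This is a sketch that only reuses the steam-table values quoted in this answer; the variable and function names are illustrative. It also shows how much the result moves when the quality x is not rounded to 0.88 before use.

```python
# Steam-table values quoted in the answer (kJ/kg and kJ/kg.K).
h1, s1 = 3093.3, 6.5843                 # inlet: 4 MPa, 350 C
hf, hfg, hg = 444.36, 2240.6, 2684.9    # enthalpies at 125 kPa
sf, sfg = 1.3741, 5.91                  # entropies at 125 kPa


def isentropic_efficiency(x):
    """Efficiency for a given isentropic exit quality x (actual exit: saturated vapor)."""
    h2s = hf + x * hfg                  # isentropic exit enthalpy
    return (h1 - hg) / (h1 - h2s)       # actual work / isentropic work


x_exact = (s1 - sf) / sfg               # quality from s2 = s1
eta_rounded = isentropic_efficiency(round(x_exact, 2))  # x = 0.88, as in the answer
eta_exact = isentropic_efficiency(x_exact)

print(f"x = {x_exact:.4f}")
print(f"eta with x rounded to 0.88: {eta_rounded:.3f}")
print(f"eta with unrounded x:       {eta_exact:.3f}")
```

With x rounded to 0.88 this reproduces the 60.3% above; carrying x unrounded gives roughly 60.6%, so the third digit depends on where the rounding happens.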

3 0
3 years ago
Technician A says that a voltage drop of 0.8 volts on the starter ground circuit is within specifications. Technician B says tha
Romashka-Z-Leto [24]

Answer:

Technician A is wrong

Technician B is right

Explanation:

A voltage drop of 0.8 volts on the starter ground circuit is not within specifications. The voltage drop should stay within the range of 0.2 V to 0.6 V and not exceed it.

A spun bearing can seize around the crankshaft journal and keep it from turning. When the ignition is switched on, the starter may then draw high current as it tries to overcome the seizure.

8 0
3 years ago
A rich industrialist was found murdered in his house. The police arrived at the scene at 11:00 PM. The temperature of the corpse
d1i1m1o1n [39]

Answer:

The victim was killed at around 6:40 PM

Explanation:

Newton's law of cooling states:

    T = T_m + (T_0-T_m)e^{kt}

where,

T_0 = initial temp

T_m = temp of room

T = temp after t hours

k = cooling rate constant (per hour; negative for a body that is cooling)

t = time (hours)

T_0 = 31     because the body was initially 31ºC when the police found it

T_m = 22   because that was the room temp

T = 30  because the body temp dropped to 30ºC after 1 hour

t = 1 because that's the time it took for the body temp to drop to 30ºC

k=???   we don't know k so we must solve for this

rearrange the equation to solve for k

T = T_m + (T_0-T_m)e^{kt}

T - T_m= (T_0-T_m)e^{kt}

\frac{T - T_m}{(T_0-T_m)}= e^{kt}

ln(\frac{T - T_m}{T_0-T_m})=kt

\frac{ln(\frac{T - T_m}{T_0-T_m})}{t}=k

plug in the numbers to solve for k

k = \frac{ln(\frac{T - T_m}{T_0-T_m})}{t}

k = \frac{ln(\frac{30 - 22}{31-22})}{1}

k=ln(\frac{8}{9})

Now that we know the value of k, we can find the moment the murder occurred. A crucial piece of information that the question leaves out is the temperature of a living human body, which is about 37ºC. We can use that as our initial temperature, assuming the body was at about 37ºC at the moment of death.

T_0 = 37     because the body was 37ºC right after being killed

T_m = 22   because that was the room temp

T = 31  because that was the body temp when the police found it

k=ln(\frac{8}{9})   we solved this earlier

t = ???   we don't know how long it took from the time of the murder to when the police found the body

Rearrange the equation to solve for t

T = T_m + (T_0-T_m)e^{kt}

T - T_m= (T_0-T_m)e^{kt}

\frac{T - T_m}{(T_0-T_m)}= e^{kt}

ln(\frac{T - T_m}{T_0-T_m})=kt

\frac{ln(\frac{T - T_m}{T_0-T_m})}{k}=t

plug in the values

t=\frac{ln(\frac{T - T_m}{T_0-T_m})}{k}

t=\frac{ln(\frac{31 - 22}{37-22})}{ln(8/9)}

t=\frac{ln(3/5)}{ln(8/9)}

t ≈ 4.337 hours from the time of death to when the police found the body.

The police found the body at 11:00 PM, so subtract 4.337 hours from that.

11 - 4.337 ≈ 6.66 hours, and 0.66 h × 60 ≈ 40 min, so the time of death is about 6:40 PM
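The two steps (fit k from the one-hour reading, then solve for t) can be replayed as below. This is a minimal sketch using only the numbers from this answer, including its assumption of a 37ºC body at the time of death; the variable names are illustrative.

```python
import math

T_room = 22.0      # room temperature (C)
T_found = 31.0     # body temperature when found at 11:00 PM
T_later = 30.0     # body temperature one hour later
T_alive = 37.0     # assumed body temperature at the time of death

# Step 1: rate constant from the one-hour drop, k = ln((T - T_m)/(T_0 - T_m)) / t.
k = math.log((T_later - T_room) / (T_found - T_room)) / 1.0

# Step 2: hours elapsed between death and discovery.
t = math.log((T_found - T_room) / (T_alive - T_room)) / k

# Convert "23:00 minus t hours" to a clock time, rounded to the nearest minute.
found_minutes = 23 * 60
hour, minute = divmod(round(found_minutes - t * 60), 60)

print(f"k = {k:.4f} per hour")
print(f"t = {t:.3f} hours")
print(f"estimated time of death: {hour:02d}:{minute:02d}")
```

The fractional part of 6.66 hours is about 40 minutes, which is why the estimate lands near 6:40 PM rather than 6:30 PM.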

7 0
3 years ago
Case Study # 1: Cadbury Crisis Management (Worm Controversy)
Rasek [7]
H is the answer
Step by step
3 0
2 years ago
A small metal particle passes downward through a fluid medium while being subjected to the attraction of a magnetic field such t
bekas [8.4K]

Answer:

a)Δs = 834 mm

b)V=1122 mm/s

a=450\ mm/s^2

Explanation:

Given that

s = 15t^3 - 3t\ mm

a)

When t= 2 s

s = 15t^3 - 3t\ mm

s = 15\times 2^3 - 3\times 2\ mm

s= 114 mm

At t= 4 s

s = 15t^3 - 3t\ mm

s = 15\times 4^3- 3\times 4\ mm

s= 948 mm

So the displacement from t = 2 s to t = 4 s is

Δs = 948 - 114 mm

Δs = 834 mm

b)

We know that velocity V

V=\dfrac{ds}{dt}

\dfrac{ds}{dt}=45t^2-3

At t=  5 s

V=45t^2-3

V=45\times 5^2-3

V=1122 mm/s

We know that acceleration a

a=\dfrac{d^2s}{dt^2}

\dfrac{d^2s}{dt^2}=90t

a= 90 t

a = 90\times 5

a=450\ mm/s^2
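The results can be double-checked with a short script. This is a sketch, with function names chosen for illustration: it evaluates the position function and its hand-derived derivatives, and compares the velocity against a central-difference estimate as a sanity check on the differentiation.

```python
def s(t):
    """Position in mm, as given: s = 15 t^3 - 3 t."""
    return 15 * t**3 - 3 * t


def v(t):
    """Velocity in mm/s, ds/dt = 45 t^2 - 3."""
    return 45 * t**2 - 3


def a(t):
    """Acceleration in mm/s^2, d2s/dt2 = 90 t."""
    return 90 * t


displacement = s(4) - s(2)          # part a)
velocity_5 = v(5)                   # part b)
acceleration_5 = a(5)

# Numerical check of the derivative at t = 5 s (central difference).
h = 1e-4
v_numeric = (s(5 + h) - s(5 - h)) / (2 * h)

print("displacement 2 s -> 4 s:", displacement, "mm")
print("velocity at 5 s:", velocity_5, "mm/s (numeric check:", round(v_numeric, 3), ")")
print("acceleration at 5 s:", acceleration_5, "mm/s^2")
```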

4 0
3 years ago