1answer.
Ahat [919]
3 years ago
15

Show how an MDP with a reward function R(s, a, s') can be transformed into a different MDP with reward function R(s, a), such that optimal policies in the new MDP correspond exactly to optimal policies in the original MDP.
Engineering
1 answer:
sasho [114]3 years ago
7 0

Answer:

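$$U(s) = \max_a \Big[ R'(s, a) + \gamma^{1/2} \sum_{\mathit{pre}} T'(s, a, \mathit{pre}) \Big( \max_b \Big[ R'(\mathit{pre}, b) + \gamma^{1/2} \sum_{s'} T'(\mathit{pre}, b, s')\, U(s') \Big] \Big) \Big]$$

$$U(s) = \max_a \Big[ \sum_{s'} T(s, a, s') \big( R(s, a, s') + \gamma U(s') \big) \Big]$$

$$U(s) = R'(s) + \gamma^{1/2} \max_a \Big[ \sum_{\mathit{post}} T'(s, a, \mathit{post}) \Big( R'(\mathit{post}) + \gamma^{1/2} \max_b \Big[ \sum_{s'} T'(\mathit{post}, b, s')\, U(s') \Big] \Big) \Big]$$

$$U(s) = \max_a \Big[ R(s, a) + \gamma \sum_{s'} T(s, a, s')\, U(s') \Big]$$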

Explanation:

MDPs

MDPs can be formulated with a reward function R(s), with a reward function R(s, a) that depends on the action taken, or with a reward function R(s, a, s') that depends on the action taken and the outcome state.

The task is to show how an MDP with a reward function R(s, a, s') can be transformed into a different MDP with reward function R(s, a), such that optimal policies in the new MDP correspond exactly to optimal policies in the original MDP.

One solution is to define a 'pre-state' pre(s, a, s') for every s, a, s' such that executing a in s leads not to s' but to pre(s, a, s'). From the pre-state there is only one action b, which always leads to s'. Let the new MDP have transition model T', reward function R', and discount γ', defined as follows:

$$T'(s, a, \mathit{pre}(s, a, s')) = T(s, a, s')$$

$$T'(\mathit{pre}(s, a, s'), b, s') = 1$$

$$R'(s, a) = 0$$

$$R'(\mathit{pre}(s, a, s'), b) = \gamma^{-1/2} R(s, a, s')$$

$$\gamma' = \gamma^{1/2}$$

Then, using pre as shorthand for pre(s, a, s’):

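$$U(s) = \max_a \Big[ R'(s, a) + \gamma^{1/2} \sum_{\mathit{pre}} T'(s, a, \mathit{pre}) \Big( \max_b \Big[ R'(\mathit{pre}, b) + \gamma^{1/2} \sum_{s'} T'(\mathit{pre}, b, s')\, U(s') \Big] \Big) \Big]$$

Since R'(s, a) = 0, each pre-state has the single successor s' with probability 1, and γ^{1/2} · γ^{-1/2} = 1, this reduces to the Bellman equation of the original MDP:

$$U(s) = \max_a \Big[ \sum_{s'} T(s, a, s') \big( R(s, a, s') + \gamma U(s') \big) \Big]$$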
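The pre-state construction can be checked numerically. The following is a minimal sketch, not part of the original answer: the two-state MDP (states `x`, `y`, actions `stay`, `go`), its numbers, and the helper names `solve_sas`, `solve_sa` and `add_pre_states` are all made up for illustration. It solves the original R(s, a, s') MDP and the transformed R'(s, a) MDP by value iteration and compares utilities and greedy policies on the original states.

```python
import math

def solve_sas(states, actions, T, R, gamma, iters=500):
    """Value iteration for rewards of the form R(s, a, s')."""
    U = {s: 0.0 for s in states}

    def q(s, a):
        return sum(T[s, a, s2] * (R[s, a, s2] + gamma * U[s2]) for s2 in states)

    for _ in range(iters):
        U = {s: max(q(s, a) for a in actions) for s in states}
    return U, {s: max(actions, key=lambda a: q(s, a)) for s in states}

def solve_sa(actions_of, trans, reward, gamma, iters=500):
    """Value iteration for rewards of the form R(s, a)."""
    U = {x: 0.0 for x in actions_of}

    def q(x, a):
        return reward[x, a] + gamma * sum(p * U[y] for y, p in trans[x, a].items())

    for _ in range(iters):
        U = {x: max(q(x, a) for a in actions_of[x]) for x in actions_of}
    return U, {x: max(actions_of[x], key=lambda a: q(x, a)) for x in actions_of}

def add_pre_states(states, actions, T, R, gamma):
    """Build T', R', gamma' as defined above, with one pre-state per (s, a, s')."""
    g = math.sqrt(gamma)                      # gamma' = gamma^(1/2)
    actions_of = {s: list(actions) for s in states}
    trans, reward = {}, {}
    for s in states:
        for a in actions:
            reward[s, a] = 0.0                # R'(s, a) = 0
            trans[s, a] = {}
            for s2 in states:
                pre = ("pre", s, a, s2)
                trans[s, a][pre] = T[s, a, s2]        # T'(s, a, pre) = T(s, a, s')
                actions_of[pre] = ["b"]               # single action b in a pre-state
                trans[pre, "b"] = {s2: 1.0}           # T'(pre, b, s') = 1
                reward[pre, "b"] = R[s, a, s2] / g    # gamma^(-1/2) R(s, a, s')
    return actions_of, trans, reward, g

# Hypothetical example MDP (illustrative numbers only).
states, actions, gamma = ["x", "y"], ["stay", "go"], 0.9
T = {("x", "stay", "x"): 0.9, ("x", "stay", "y"): 0.1,
     ("x", "go", "x"): 0.2, ("x", "go", "y"): 0.8,
     ("y", "stay", "x"): 0.1, ("y", "stay", "y"): 0.9,
     ("y", "go", "x"): 0.7, ("y", "go", "y"): 0.3}
R = {(s, a, s2): (1.0 if s2 == "y" else 0.0) - (0.1 if a == "go" else 0.0)
     for s in states for a in actions for s2 in states}

U_old, pi_old = solve_sas(states, actions, T, R, gamma)
U_new, pi_new = solve_sa(*add_pre_states(states, actions, T, R, gamma))
for s in states:
    print(s, U_old[s], U_new[s], pi_old[s], pi_new[s])
```

On the original states the two solutions should agree up to numerical error, which is what the derivation predicts. The pre-states carry only the forced action b, so they add nothing to the policy.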

Now do the same to convert MDPs with R(s, a) into MDPs with R(s).

Similar to the pre-state construction above, create a state post(s, a) for every s, a such that

$$T'(s, a, \mathit{post}(s, a)) = 1$$

$$T'(\mathit{post}(s, a), b, s') = T(s, a, s')$$

$$R'(s) = 0$$

$$R'(\mathit{post}(s, a)) = \gamma^{-1/2} R(s, a)$$

$$\gamma' = \gamma^{1/2}$$

Then, using post as shorthand for post(s, a):

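$$U(s) = R'(s) + \gamma^{1/2} \max_a \Big[ \sum_{\mathit{post}} T'(s, a, \mathit{post}) \Big( R'(\mathit{post}) + \gamma^{1/2} \max_b \Big[ \sum_{s'} T'(\mathit{post}, b, s')\, U(s') \Big] \Big) \Big]$$

Since R'(s) = 0, executing a in s reaches the single post-state with probability 1, and γ^{1/2} · γ^{-1/2} = 1, this reduces to the Bellman equation of the R(s, a) MDP:

$$U(s) = \max_a \Big[ R(s, a) + \gamma \sum_{s'} T(s, a, s')\, U(s') \Big]$$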
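The post-state construction can be checked the same way. This is again only a sketch under assumed inputs: the two-state MDP, its numbers, and the helper names `solve_sa`, `solve_s` and `add_post_states` are hypothetical. It compares value iteration on an R(s, a) MDP with value iteration on the transformed R'(s) MDP.

```python
import math

def solve_sa(states, actions, T, R, gamma, iters=500):
    """Value iteration for rewards of the form R(s, a)."""
    U = {s: 0.0 for s in states}
    for _ in range(iters):
        U = {s: max(R[s, a] + gamma * sum(T[s, a, s2] * U[s2] for s2 in states)
                    for a in actions)
             for s in states}
    return U

def solve_s(actions_of, trans, reward, gamma, iters=500):
    """Value iteration for rewards of the form R(s)."""
    U = {x: 0.0 for x in actions_of}
    for _ in range(iters):
        U = {x: reward[x] + gamma * max(sum(p * U[y] for y, p in trans[x, a].items())
                                        for a in actions_of[x])
             for x in actions_of}
    return U

def add_post_states(states, actions, T, R, gamma):
    """Build T', R', gamma' as defined above, with one post-state per (s, a)."""
    g = math.sqrt(gamma)                      # gamma' = gamma^(1/2)
    actions_of = {s: list(actions) for s in states}
    reward = {s: 0.0 for s in states}         # R'(s) = 0
    trans = {}
    for s in states:
        for a in actions:
            post = ("post", s, a)
            trans[s, a] = {post: 1.0}         # T'(s, a, post) = 1
            actions_of[post] = ["b"]          # single action b in a post-state
            trans[post, "b"] = {s2: T[s, a, s2] for s2 in states}
            reward[post] = R[s, a] / g        # gamma^(-1/2) R(s, a)
    return actions_of, trans, reward, g

# Hypothetical example MDP (illustrative numbers only).
states, actions, gamma = ["x", "y"], ["stay", "go"], 0.9
T = {("x", "stay", "x"): 0.9, ("x", "stay", "y"): 0.1,
     ("x", "go", "x"): 0.2, ("x", "go", "y"): 0.8,
     ("y", "stay", "x"): 0.1, ("y", "stay", "y"): 0.9,
     ("y", "go", "x"): 0.7, ("y", "go", "y"): 0.3}
R = {("x", "stay"): 0.0, ("x", "go"): -0.1, ("y", "stay"): 1.0, ("y", "go"): 0.5}

U_sa = solve_sa(states, actions, T, R, gamma)
U_s = solve_s(*add_post_states(states, actions, T, R, gamma))
for s in states:
    print(s, U_sa[s], U_s[s])
```

The utilities of the original states should match up to numerical error. The transformed MDP needs two steps for every step of the original, which is why the discount is the square root of γ.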


You might be interested in
Looking at the response vehicles (pictured above), explain two options you have in order to abide by the Move
SIZIF [17.4K]

Answer:

  • slow down
  • change lanes

Explanation:

The "Move Over law" varies by state, but generally requires you to vacate the lane adjacent to the stopped vehicles (the one you're currently traveling in), or to slow down. Some states have specific speed requirements; others require only a "safe and prudent" speed.

The sorts of parked vehicles that require you to "move over" also vary by state. It would be "safe and prudent" to move over for any vehicle parked on the shoulder, especially if there are people or animals around those vehicles.

8 0
2 years ago
A _____ satellite system employs many satellites that are spaced so that, from any point on the Earth at any time, at least one
Wittaler [7]

Answer:

d. low earth orbit (LEO)

Explanation:

Satellites of this type form a constellation deployed as a series of "necklaces", so that at any time at least one satellite is visible to a receiver antenna, compensating for the movement due to the Earth's rotation.

By contrast, a geostationary satellite is at an altitude that makes it appear as a fixed point over the Earth's equator, rotating synchronously with the Earth, so it is always visible from a given area.

3 0
2 years ago
Given a two-dimensional steady inviscid air flow field with no body forces described by the velocity field given below. Assuming
kolbaska11 [484]

Answer:

The pressure gradient in the x direction = -15.48 Pa/m

Explanation:

  • Partial differentiation was used to determine the expressions for u and v.
  • Each is partially differentiated with respect to x, and the appropriate substitution gives the value of the pressure gradient, as shown in the attached file.

4 0
3 years ago
Which of the following ranges depicts the 2% tolerance range to the full 9 digits provided?
Lyrx [107]

Answer:

The only one that meets the requirements is option C.

Explanation:

The tolerance of a quantity is the maximum variation allowed for that quantity.

To find it we need the nominal value of the quantity, which is best estimated by the average value. This can be given, or if it is not known it is calculated with the formula

         x_average = ∑ x_i / n

The tolerance (or error) is the ratio of the actual value to the mean value, expressed as a percentage deviation from 100:

         Δx₁ = x₁ / x_average

         tolerance = | 100 - 100 Δx₁ |

where the bars indicate absolute value.

Let's compute these values for each case.

a)

    x_average = (2.170000000 + 2.258571429) / 2

    x_average = 2.2142857145

fluctuation for x₁

        Δx₁ = 2.170000000 / 2.2142857145

        tolerance = 100 - 97.999999991

        tolerance = 2.000000009%

fluctuation for x₂

        Δx₂ = 2.258571429 / 2.2142857145

        Δx₂ ≈ 1.02

        tolerance = | 100 - 102.000000009 |

        tolerance = 2.000000009%

b)

    x_average = (2.2 + 2.29) / 2

    x_average = 2.245

fluctuation for x₁

         Δx₁ = 2.2 / 2.245

         Δx₁ = 0.97995546

         tolerance = 100 - 97.995546

         tolerance = 2.00445%

fluctuation for x₂

          Δx₂ = 2.29 / 2.245

          Δx₂ = 1.02004454

          tolerance = | 100 - 102.004454 |

          tolerance = 2.00445%

c)

   x_average = (2.211445 + 2.3) / 2

   x_average = 2.2557225

       Δx₁ = 2.211445 / 2.2557225 = 0.9803710

       tolerance = 100 - 98.0371

       tolerance = 1.96%

       Δx₂ = 2.3 / 2.2557225 = 1.019629

       tolerance = | 100 - 101.962896 |

       tolerance = 1.96%

d)

   x_average = (2.20144927 + 2.29130435) / 2

   x_average = 2.24637681

       Δx₁ = 2.20144927 / 2.24637681 = 0.9799999983

       tolerance = 100 - 97.99999983

       tolerance = 2.0000002%

       Δx₂ = 2.29130435 / 2.24637681 = 1.0200000017

       tolerance = | 100 - 102.00000017 |

       tolerance = 2.0000002%

e)

   x_average = (2 + 2.3) / 2

   x_average = 2.15

   Δx₁ = 2 / 2.15 = 0.93023

   tolerance = 100 - 93.023

   tolerance = 6.98%

   Δx₂ = 2.3 / 2.15 = 1.06977

   tolerance = | 100 - 106.977 |

   tolerance = 6.98%

Let's analyze these results. Option E is clearly outside the requested tolerance range. Options A, B and D may be within the desired tolerance range depending on the required precision, but each comes out slightly above 2%. At the high precision of this exercise, the only one that meets the requirement is option C.
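The per-option arithmetic above can be reproduced with a short script. This is a minimal sketch of the same formula (deviation of an endpoint from the midpoint of the range, as a percentage of the midpoint). The function name `tolerance_percent` is an illustrative choice, and the endpoint values are the ones quoted in the answer.

```python
def tolerance_percent(low, high):
    """Percentage deviation of an endpoint from the midpoint of [low, high]."""
    x_average = (low + high) / 2
    return abs(100 - 100 * low / x_average)

# Endpoints of options a) to e) as quoted in the answer above.
ranges = {
    "a": (2.170000000, 2.258571429),
    "b": (2.2, 2.29),
    "c": (2.211445, 2.3),
    "d": (2.20144927, 2.29130435),
    "e": (2.0, 2.3),
}

for label, (low, high) in ranges.items():
    print(f"{label}) tolerance = {tolerance_percent(low, high):.7f}%")

# Options whose tolerance does not exceed 2%.
within = [label for label, r in ranges.items() if tolerance_percent(*r) <= 2.0]
print("within 2%:", within)
```

Because the midpoint is the mean of the two endpoints, both endpoints deviate from it by the same percentage, so one call per option is enough.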

4 0
3 years ago
A well insulated rigid tank contains 4 kg of argon gas at 450 kPa and 30 C. A valve is opened, allowing the argon to escape unti
natima [27]

Answer:

Final mass of argon = 2.46 kg

Explanation:

Initial mass of argon gas (m₁) = 4 kg

P₁ = 450 kPa

T₁ = 30 °C = 303 K

P₂ = 200 kPa

k (specific heat ratio of argon) = 1.667

Assume a reversible adiabatic process.

Calculate the value of m₂

Applying the ideal gas equation (PV = mRT) to the initial and final states of the rigid tank (constant V):

P₁V / P₂V = m₁RT₁ / m₂RT₂

hence: m₂ = (P₂T₁ / P₁T₂) × m₁

                   = (200 × 303) / (450 × 219) × 4

                   = 2.46 kg

Note: the calculation for T₂ is attached below.
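The attachment with the T₂ calculation is not reproduced here. A sketch of how the value T₂ ≈ 219 K can be obtained, assuming the standard isentropic ideal-gas relation T₂ = T₁ (P₂/P₁)^((k−1)/k) for the gas remaining in the tank (an assumption consistent with the reversible adiabatic process stated above, not a copy of the attached working):

```python
# Given data from the answer above.
m1 = 4.0            # kg, initial mass of argon
P1, P2 = 450.0, 200.0   # kPa, initial and final pressure
T1 = 30 + 273       # K, rounded as in the answer
k = 1.667           # specific heat ratio of argon

# Assumed isentropic relation for the gas that stays in the tank.
T2 = T1 * (P2 / P1) ** ((k - 1) / k)

# Ideal gas law at constant volume: m2 = (P2 * T1) / (P1 * T2) * m1
m2 = (P2 * T1) / (P1 * T2) * m1

print(f"T2 = {T2:.1f} K")
print(f"m2 = {m2:.2f} kg")
```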

5 0
2 years ago