MAVERICK [17]
3 years ago

In a coin game, you repeatedly toss a biased coin (0.4 for head, 0.6 for tail). Each head represents 3 points and each tail represents 1 point. You can either Toss or Stop if the total number of points you have tossed is no more than 7. Otherwise, you must Stop. When you Stop, your utility is equal to your total points (up to 7), or 0 if you get a total of 8 points or higher. When you Toss, you receive no utility. There is no discounting (γ = 1).
(a) What are the states and the actions for this MDP? Which states are terminal?
(b) What is the transition function and the reward function for this MDP? Hint: The problem may be simpler to formulate using the general version of rewards: R(s, a, s')
(c) Run value iteration to find the optimal value function V* for the MDP. Show each Vk step (starting from V0(s) = 0 for all states s). For a reasonable MDP formulation, this should converge in fewer than 10 steps. If you find it too tedious to do by hand, you may write a program to do this for you; however, there may be some benefit in seeing the calculation unfolding in front of you.
(d) Using the V* you found, determine the optimal policy for this MDP.
Mathematics
1 answer:
Sladkaya [172] 3 years ago

Answer:

See answer in explanation

Step-by-step explanation:

States: the current point total, plus a terminal state, that is, 0, 1, 2, 3, 4, 5, 6, 7, DONE. DONE is the only terminal state.

Actions: Toss, Stop

2. What is the transition function and the reward function for this MDP?

Transition function:

T(Si, TOSS, Si+3) = 0.4 if i ≤ 4

T(Si, TOSS, DONE) = 0.4 if i ≥ 5

T(Si, TOSS, Si+1) = 0.6 if i < 7

T(Si, TOSS, DONE) = 0.6 if i = 7

T(Si, STOP, DONE) = 1

Reward function:

R(Si, TOSS, ANY) = 0

R(Si, STOP, DONE) = i

R(DONE, STOP, DONE) = 0
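
The transition and reward functions can also be written as a small program, which makes it easy to check that the probabilities for each state and action sum to 1. This is only a sketch, assuming the game as stated in the question (heads adds 3 with probability 0.4, tails adds 1 with probability 0.6, any total above 7 leads to DONE). The names `transitions` and `reward` are made up for illustration.

```python
# Sketch of T(s, a, s') and R(s, a, s') for the coin game in the question.
# All names here are illustrative assumptions, not part of the original answer.
P_HEAD, P_TAIL = 0.4, 0.6          # biased coin
HEAD_PTS, TAIL_PTS, LIMIT = 3, 1, 7
DONE = "DONE"                      # the single terminal state


def transitions(s, a):
    """Return {next_state: probability} for taking action a in state s."""
    if s == DONE or a == "STOP":
        return {DONE: 1.0}
    out = {}
    for pts, p in ((HEAD_PTS, P_HEAD), (TAIL_PTS, P_TAIL)):
        # A total above LIMIT ends the game in DONE.
        nxt = s + pts if s + pts <= LIMIT else DONE
        out[nxt] = out.get(nxt, 0.0) + p
    return out


def reward(s, a, s_next):
    """R(s, a, s'): stopping in state i pays i; everything else pays 0."""
    return s if (a == "STOP" and s != DONE) else 0


if __name__ == "__main__":
    for s in range(LIMIT + 1):
        print(s, transitions(s, "TOSS"))
```

Note that in state 7 both coin outcomes lead to DONE, so the two probabilities are added together.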

3. What is the optimal policy for this MDP? Please write down the steps to show how you get the optimal policy.

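Optimal policy: Toss for 0, 1, 2, 3, 4; Stop for others.

In state 4, tossing is worth 0.4 × 7 + 0.6 × 5 = 5.8, which beats stopping with 4. In state 5, tossing is worth only 0.6 × 6 = 3.6, which is less than stopping with 5.

You should include the steps of value iteration. For the game as stated (heads adds 3, tails adds 1, any total above 7 goes to DONE with no reward), value iteration converges at iteration 6. The results of each iteration are as follows:

V1: 0: 0; 1: 1; 2: 2; 3: 3; 4: 4; 5: 5; 6: 6; 7: 7 (all from Stop)

V2: 0: 1.8; 1: 2.8; 2: 3.8; 3: 4.8; 4: 5.8 from Toss; 5: 5; 6: 6; 7: 7 from Stop

V3: 0: 3.6; 1: 4.6; 2: 4.88; 3: 5.88; 4: 5.8 from Toss; 5: 5; 6: 6; 7: 7 from Stop

V4: 0: 5.112; 1: 5.248; 2: 5.528; 3: 5.88; 4: 5.8 from Toss; 5: 5; 6: 6; 7: 7 from Stop

V5: 0: 5.5008; 1: 5.6368; 2: 5.528; 3: 5.88; 4: 5.8 from Toss; 5: 5; 6: 6; 7: 7 from Stop

V6: 0: 5.73408; 1: 5.6368; 2: 5.528; 3: 5.88; 4: 5.8 from Toss; 5: 5; 6: 6; 7: 7 from Stop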
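
As the question suggests, the value iteration steps can be checked with a short program. Below is a minimal sketch in Python, assuming the game as stated in the question (heads adds 3 with probability 0.4, tails adds 1 with probability 0.6, any total above 7 goes to DONE with value 0, and γ = 1). The function names `toss_value`, `value_iteration`, and `greedy_policy` are made up for illustration.

```python
# Sketch of value iteration for the coin game; names are illustrative assumptions.
P_HEAD, P_TAIL = 0.4, 0.6
LIMIT = 7  # states are the totals 0..LIMIT; DONE is implicit and has value 0


def toss_value(V, s):
    """Expected value of Toss in state s (Toss reward is 0, no discounting)."""
    heads = V[s + 3] if s + 3 <= LIMIT else 0.0   # above LIMIT -> DONE, value 0
    tails = V[s + 1] if s + 1 <= LIMIT else 0.0
    return P_HEAD * heads + P_TAIL * tails


def value_iteration(max_iters=20, tol=1e-12):
    """Return the list [V0, V1, ...] until two successive iterates agree."""
    V = [0.0] * (LIMIT + 1)
    history = [V]
    for _ in range(max_iters):
        # Stop pays s and moves to DONE; Toss pays 0 and moves on.
        new_V = [max(float(s), toss_value(V, s)) for s in range(LIMIT + 1)]
        history.append(new_V)
        if max(abs(a - b) for a, b in zip(new_V, V)) < tol:
            break
        V = new_V
    return history


def greedy_policy(V):
    """Pick Toss wherever it is strictly better than stopping with s points."""
    return ["TOSS" if toss_value(V, s) > s else "STOP" for s in range(LIMIT + 1)]


if __name__ == "__main__":
    history = value_iteration()
    for k, V in enumerate(history):
        print(f"V{k}:", [round(v, 5) for v in V])
    print("policy:", greedy_policy(history[-1]))
```

The loop stops one iteration after the values stop changing, so the last two entries of the returned list are identical.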
