MAVERICK [17]
3 years ago
9

In a coin game, you repeatedly toss a biased coin (0.4 for head, 0.6 for tail). Each head represents 3 points and each tail represents 1 point. You can either Toss or Stop if the total number of points you have tossed is no more than 7. Otherwise, you must Stop. When you Stop, your utility is equal to your total points (up to 7), or 0 if you get a total of 8 points or higher. When you Toss, you receive no utility. There is no discounting (γ = 1).
(a) What are the states and the actions for this MDP? Which states are terminal?
(b) What is the transition function and the reward function for this MDP? Hint: The problem may be simpler to formulate using the general version of rewards: R(s, a, s')
(c) Run value iteration to find the optimal value function V* for the MDP. Show each Vk step (starting from V0(s) = 0 for all states s). For a reasonable MDP formulation, this should converge in fewer than 10 steps. If you find it too tedious to do by hand, you may write a program to do this for you; however, there may be some benefit in seeing the calculation unfolding in front of you.
(d) Using the V* you found, determine the optimal policy for this MDP.
Mathematics
1 answer:
Sladkaya [172]3 years ago
5 0

Answer:

See answer in explanation

Step-by-step explanation:

States: the current point total (the utility you would receive if you stop now) plus one terminal state, that is, 0, 1, 2, 3, 4, 5, 6, 7, DONE. DONE is the only terminal state.

Actions: Toss, Stop

2. What is the transition function and the reward function for this MDP?

Transition function:

T(Si, TOSS, Si+3) = 0.4 if i ≤ 4

T(Si, TOSS, DONE) = 0.4 if i ≥ 5

T(Si, TOSS, Si+1) = 0.6 if i < 7

T(Si, TOSS, DONE) = 0.6 if i = 7

T(Si , STOP, DONE) = 1

Reward function:

R(Si , TOSS, ANY ) = 0

R(Si , STOP, DONE) = i

R(DONE, STOP, DONE) = 0
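
As a cross-check, the sketch below builds a transition and reward function straight from the problem statement (heads adds 3 points with probability 0.4, tails adds 1 point with probability 0.6, and any total above 7 goes to DONE). The names `transitions`, `reward`, `DONE` and `MAX_POINTS` are illustrative assumptions, not part of the original answer.

```python
# Illustrative sketch only: names and structure are assumptions.
DONE = "DONE"
MAX_POINTS = 7
P_HEAD, P_TAIL = 0.4, 0.6
HEAD_POINTS, TAIL_POINTS = 3, 1


def transitions(state, action):
    """Return {next_state: probability} for a non-terminal state 0..7."""
    if action == "STOP":
        return {DONE: 1.0}
    outcomes = {}
    for prob, gain in ((P_HEAD, HEAD_POINTS), (P_TAIL, TAIL_POINTS)):
        total = state + gain
        # A total of 8 or more ends the game with nothing left to collect.
        nxt = total if total <= MAX_POINTS else DONE
        outcomes[nxt] = outcomes.get(nxt, 0.0) + prob
    return outcomes


def reward(state, action, next_state):
    """R(s, a, s'): only stopping pays, and it pays the current total."""
    return state if action == "STOP" else 0


# Print the TOSS distribution for every point total.
for s in range(MAX_POINTS + 1):
    print(s, transitions(s, "TOSS"))
```

Printing the TOSS distribution for each state makes it easy to see from which totals a head or a tail pushes the game past 7.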

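3. What is the optimal policy for this MDP? Please write down the steps to show how you get the optimal policy.

Optimal policy: TOSS for 0, 1, 2, 3, 4; STOP for 5, 6, 7.

You should include the steps of value iteration. Start from V0(s) = 0 and update with Vk+1(Si) = max(i, 0.4 Vk(Si+3) + 0.6 Vk(Si+1)), where any total above 7 counts as DONE with value 0. The values stop changing at iteration 6 (V7 = V6). The iterations are as follows:

V1: 0: 0; 1: 1; 2: 2; 3: 3; 4: 4; 5: 5; 6: 6; 7: 7 (all from Stop)

V2: 0: 1.8; 1: 2.8; 2: 3.8; 3: 4.8; 4: 5.8 (from Toss); 5: 5; 6: 6; 7: 7 (from Stop)

V3: 0: 3.6; 1: 4.6; 2: 4.88; 3: 5.88; 4: 5.8 (from Toss); 5: 5; 6: 6; 7: 7 (from Stop)

V4: 0: 5.112; 1: 5.248; 2: 5.528; 3: 5.88; 4: 5.8 (from Toss); 5: 5; 6: 6; 7: 7 (from Stop)

V5: 0: 5.5008; 1: 5.6368; 2: 5.528; 3: 5.88; 4: 5.8 (from Toss); 5: 5; 6: 6; 7: 7 (from Stop)

V6: 0: 5.73408 from Toss; 1: 5.6368 from Toss; 2: 5.528 from Toss; 3: 5.88 from Toss; 4: 5.8 from Toss; 5: 5 from Stop; 6: 6 from Stop; 7: 7 from Stop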
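
If the hand calculation gets tedious, a short program can run the same updates. The sketch below is a minimal value iteration for the game as stated in the question (states 0 to 7, heads +3 with probability 0.4, tails +1 with probability 0.6, busting above 7 worth 0, no discounting). The function names `toss_value`, `value_iteration` and `greedy_policy` are hypothetical helpers introduced here for illustration.

```python
# Minimal value-iteration sketch; all names are illustrative assumptions.
STATES = list(range(8))          # point totals 0..7
P_HEAD, P_TAIL = 0.4, 0.6


def toss_value(s, V):
    """Expected value of tossing from total s; busting (> 7) is worth 0."""
    head = V[s + 3] if s + 3 <= 7 else 0.0
    tail = V[s + 1] if s + 1 <= 7 else 0.0
    return P_HEAD * head + P_TAIL * tail


def value_iteration(tol=1e-12, max_iters=50):
    """Return the list [V0, V1, ...] up to the first unchanged iterate."""
    V = [0.0] * len(STATES)      # V0(s) = 0 for all s
    history = [V]
    for _ in range(max_iters):
        # Stopping pays the current total; tossing pays nothing now.
        new_V = [max(float(s), toss_value(s, V)) for s in STATES]
        history.append(new_V)
        if max(abs(a - b) for a, b in zip(new_V, V)) < tol:
            break
        V = new_V
    return history


def greedy_policy(V):
    """Toss wherever tossing is strictly better than stopping."""
    return ["TOSS" if toss_value(s, V) > s else "STOP" for s in STATES]


history = value_iteration()
for k, V in enumerate(history):
    print(f"V{k}:", [round(v, 5) for v in V])
print("policy:", greedy_policy(history[-1]))
```

Each printed row is one Vk, so the output can be compared line by line against a hand-worked table. The last row repeats the one before it, which is how the loop detects convergence.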
