1answer.
MAVERICK [17]
3 years ago

In a coin game, you repeatedly toss a biased coin (0.4 for heads, 0.6 for tails). Each head is worth 3 points and each tail is worth 1 point. You can either Toss or Stop if the total number of points you have tossed is no more than 7. Otherwise, you must Stop. When you Stop, your utility is equal to your total points (up to 7), or 0 if you get a total of 8 points or higher. When you Toss, you receive no utility. There is no discounting (γ = 1).
(a) What are the states and the actions for this MDP? Which states are terminal?
(b) What is the transition function and the reward function for this MDP? Hint: The problem may be simpler to formulate using the general version of rewards: R(s, a, s')
(c) Run value iteration to find the optimal value function V* for the MDP. Show each Vk step (starting from V0(s) = 0 for all states s). For a reasonable MDP formulation, this should converge in fewer than 10 steps. If you find it too tedious to do by hand, you may write a program to do this for you; however, there may be some benefit in seeing the calculation unfolding in front of you.
(d) Using the V* you found, determine the optimal policy for this MDP.
Mathematics
1 answer:
Sladkaya [172] 3 years ago

Answer:

See answer in explanation

Step-by-step explanation:

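1. What are the states and the actions for this MDP?

States: the current point total, plus a terminal state, that is, S0, S1, S2, S3, S4, S5, S6, S7, DONE. DONE is the only terminal state.

Actions: Toss, Stop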

2. What is the transition function and the reward function for this MDP?

Transition function:

T(Si, TOSS, Si+3) = 0.4 if i ≤ 4

T(Si, TOSS, Si+1) = 0.6 if i ≤ 6

T(Si, TOSS, DONE) = 0.4 if i = 5 or 6 (a head takes the total to 8 or more)

T(S7, TOSS, DONE) = 1 (both a head and a tail take the total to 8 or more)

T(Si, STOP, DONE) = 1

Reward function:

R(Si, TOSS, ANY) = 0

R(Si, STOP, DONE) = i

R(DONE, STOP, DONE) = 0
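
The rules in the question can also be written down as a small program, which makes it easy to check that the probabilities out of every state sum to 1. This is only a sketch under the question's stated rules (head = 3 points with probability 0.4, tail = 1 point with probability 0.6, any total above 7 goes to DONE). The names `transitions` and `reward` are made up for illustration.

```python
# Sketch of T and R for the coin game, assuming the rules stated in the question.
P_HEADS, P_TAILS = 0.4, 0.6      # biased coin
HEADS_PTS, TAILS_PTS = 3, 1      # points per outcome
MAX_POINTS = 7                   # totals above this are a bust
DONE = "DONE"                    # the single terminal state


def transitions(state, action):
    """Return {next_state: probability} for a non-terminal state 0..7."""
    if action == "STOP":
        return {DONE: 1.0}
    result = {}
    for prob, pts in ((P_HEADS, HEADS_PTS), (P_TAILS, TAILS_PTS)):
        total = state + pts
        nxt = total if total <= MAX_POINTS else DONE
        # Accumulate, because at state 7 both outcomes lead to DONE.
        result[nxt] = result.get(nxt, 0.0) + prob
    return result


def reward(state, action, next_state):
    """R(s, a, s'): stopping pays the current total, tossing pays nothing."""
    return state if action == "STOP" else 0


for s in range(MAX_POINTS + 1):
    print(s, transitions(s, "TOSS"))
```

Printing the table for every state is a quick way to spot an off-by-one in the bounds on i.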

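3. What is the optimal policy for this MDP? Please write down the steps to show how you get the optimal policy.

Optimal policy: Toss for 0, 1, 2, 3, 4; Stop for 5, 6, 7.

You should include the steps of value iteration. Each step uses Vk+1(i) = max(i, 0.4 Vk(i+3) + 0.6 Vk(i+1)), where any total above 7 counts as 0. Starting from V0(s) = 0, the values stop changing at iteration 6 (V7 = V6). The iterations for states 0 to 7 are as follows:

V1: 0, 1, 2, 3, 4, 5, 6, 7

V2: 1.8, 2.8, 3.8, 4.8, 5.8, 5, 6, 7

V3: 3.6, 4.6, 4.88, 5.88, 5.8, 5, 6, 7

V4: 5.112, 5.248, 5.528, 5.88, 5.8, 5, 6, 7

V5: 5.5008, 5.6368, 5.528, 5.88, 5.8, 5, 6, 7

V6: 0: 5.73408 from Toss; 1: 5.6368 from Toss; 2: 5.528 from Toss; 3: 5.88 from Toss; 4: 5.8 from Toss; 5: 5 from Stop; 6: 6 from Stop; 7: 7 from Stop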
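
If the hand calculation gets tedious, value iteration for this game takes only a few lines. The following is a minimal sketch, assuming the rules exactly as stated in the question (head = 3 points with probability 0.4, tail = 1 point with probability 0.6, totals above 7 are worth 0, γ = 1). The helper names `toss_value`, `value_iteration` and `greedy_policy` are hypothetical.

```python
# Value iteration sketch for the coin game, under the question's stated rules.
P_HEADS, P_TAILS = 0.4, 0.6
MAX_POINTS = 7


def toss_value(V, s):
    """Expected value of tossing at total s; busting (total > 7) is worth 0."""
    heads = V[s + 3] if s + 3 <= MAX_POINTS else 0.0
    tails = V[s + 1] if s + 1 <= MAX_POINTS else 0.0
    return P_HEADS * heads + P_TAILS * tails


def value_iteration(tol=1e-12, max_iters=100):
    """Return the list [V0, V1, ...] until successive iterates agree."""
    V = [0.0] * (MAX_POINTS + 1)
    history = [V]
    for _ in range(max_iters):
        # Bellman update: best of stopping (worth s) and tossing.
        new_V = [max(float(s), toss_value(V, s)) for s in range(MAX_POINTS + 1)]
        history.append(new_V)
        if max(abs(a - b) for a, b in zip(new_V, V)) < tol:
            break
        V = new_V
    return history


def greedy_policy(V):
    """Toss wherever tossing is strictly better than stopping."""
    return ["Toss" if toss_value(V, s) > s else "Stop"
            for s in range(MAX_POINTS + 1)]


history = value_iteration()
for k, V in enumerate(history):
    print(f"V{k}:", [round(v, 5) for v in V])
print(greedy_policy(history[-1]))
```

Under these assumptions, the run ends with Toss for totals 0 to 4 and Stop for totals 5 to 7. The printed Vk rows can be compared line by line with a hand calculation.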
