MAVERICK [17]
3 years ago
9

In a coin game, you repeatedly toss a biased coin (0.4 for head, 0.6 for tail). Each head represents 3 points and each tail represents 1 point. You can either Toss or Stop if the total number of points you have tossed is no more than 7. Otherwise, you must Stop. When you Stop, your utility is equal to your total points (up to 7), or 0 if you get a total of 8 points or higher. When you Toss, you receive no utility. There is no discounting (γ = 1).
(a) What are the states and the actions for this MDP? Which states are terminal?
(b) What is the transition function and the reward function for this MDP? Hint: The problem may be simpler to formulate using the general version of rewards: R(s, a, s')
(c) Run value iteration to find the optimal value function V* for the MDP. Show each Vk step (starting from V0(s) = 0 for all states s). For a reasonable MDP formulation, this should converge in fewer than 10 steps. If you find it too tedious to do by hand, you may write a program to do this for you; however, there may be some benefit in seeing the calculation unfolding in front of you.
(d) Using the V* you found, determine the optimal policy for this MDP.
Mathematics
1 answer:
Sladkaya [172]3 years ago
5 0

Answer:

See answer in explanation

Step-by-step explanation:

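1. What are the states and the actions for this MDP? Which states are terminal?

States: the current point total (the utility you would get by stopping now) plus a terminal state, that is, 0, 1, 2, 3, 4, 5, 6, 7, DONE. DONE is the only terminal state.

Actions: Toss, Stop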

2. What is the transition function and the reward function for this MDP?

Transition function:

T(Si , TOSS, Si+3) = 0.4 if i ≤ 4

T(Si , TOSS, DONE) = 0.4 if i ≥ 5 (a head would take the total to 8 or more)

T(Si , TOSS, Si+1) = 0.6 if i ≤ 6

T(Si , TOSS, DONE) = 0.6 if i = 7 (a tail would take the total to 8)

T(Si , STOP, DONE) = 1

Reward function:

R(Si , TOSS, ANY ) = 0

R(Si , STOP, DONE) = i

R(DONE, STOP, DONE) = 0
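
For anyone who would rather check the model by program, here is a minimal sketch of the transition and reward functions, written straight from the rules in the question (head = +3 with probability 0.4, tail = +1 with probability 0.6, any total above 7 ends the game with nothing). The names DONE, transitions and reward are my own choices for illustration, not part of the problem.

```python
# Sketch of the coin-game MDP; all names here are illustrative assumptions.
P_HEAD, P_TAIL = 0.4, 0.6   # coin bias given in the question
MAX_POINTS = 7              # highest total that still pays out
DONE = "DONE"               # terminal state


def transitions(state, action):
    """Return {next_state: probability} for a non-terminal state 0..7."""
    if action == "STOP":
        return {DONE: 1.0}
    result = {}
    for gain, prob in ((3, P_HEAD), (1, P_TAIL)):
        nxt = state + gain
        # A total of 8 or more ends the game with zero utility.
        if nxt > MAX_POINTS:
            nxt = DONE
        result[nxt] = result.get(nxt, 0.0) + prob
    return result


def reward(state, action, next_state):
    """R(s, a, s'): stopping pays the current total, tossing pays nothing."""
    return state if action == "STOP" else 0
```

For example, transitions(5, "TOSS") puts probability 0.4 on DONE and 0.6 on state 6, and from state 7 both outcomes of a toss lead to DONE.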

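3. What is the optimal policy for this MDP? Please write down the steps to show how you get the optimal policy.

Optimal policy: Toss for 0, 1, 2, 3, 4; Stop for 5, 6, 7.

You should include the steps of value iteration. Start from V0(s) = 0 and update with Vk+1(i) = max(i, 0.4 Vk(i+3) + 0.6 Vk(i+1)), where any total of 8 or more counts as 0. States 5, 6 and 7 keep their Stop values 5, 6 and 7 from V1 onward.

V1: every state i has value i, from Stop

V2: 0: 1.8; 1: 2.8; 2: 3.8; 3: 4.8; 4: 5.8, all from Toss

V3: 0: 3.6; 1: 4.6; 2: 4.88; 3: 5.88; 4: 5.8

V4: 0: 5.112; 1: 5.248; 2: 5.528; 3: 5.88; 4: 5.8

V5: 0: 5.5008; 1: 5.6368; 2: 5.528; 3: 5.88; 4: 5.8

V6: 0: 5.73408; 1: 5.6368; 2: 5.528; 3: 5.88; 4: 5.8

The value iteration converges at iteration 6 (V7 is identical to V6). Result:

V*: 0: 5.73408 from Toss; 1: 5.6368 from Toss; 2: 5.528 from Toss; 3: 5.88 from Toss; 4: 5.8 from Toss; 5: 5 from Stop; 6: 6 from Stop; 7: 7 from Stop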
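
The question allows writing a program for the value iteration, so here is a minimal sketch that prints every Vk and the greedy policy. It is built from the rules in the question (0.4 for +3, 0.6 for +1, totals above 7 are worth 0, no discounting). The function names, the tolerance and the iteration cap are my own assumptions.

```python
# Value iteration sketch for the coin game; names and tolerance are assumptions.
P_HEAD, P_TAIL, MAX_POINTS = 0.4, 0.6, 7


def toss_value(V, s):
    """Expected value of tossing in state s; totals above 7 contribute 0."""
    total = 0.0
    for gain, prob in ((3, P_HEAD), (1, P_TAIL)):
        nxt = s + gain
        if nxt <= MAX_POINTS:
            total += prob * V[nxt]
    return total


def value_iteration(tol=1e-12, max_iters=100):
    """Return the list [V0, V1, ...] up to the first sweep that changes nothing."""
    V = [0.0] * (MAX_POINTS + 1)          # V0(s) = 0 for every state
    history = [V]
    for _ in range(max_iters):
        # Stopping pays s immediately; tossing pays nothing now (gamma = 1).
        new_V = [max(float(s), toss_value(V, s)) for s in range(MAX_POINTS + 1)]
        history.append(new_V)
        if max(abs(a - b) for a, b in zip(new_V, V)) < tol:
            break
        V = new_V
    return history


def greedy_policy(V):
    """Toss wherever tossing is strictly better than stopping."""
    return ["TOSS" if toss_value(V, s) > s else "STOP"
            for s in range(MAX_POINTS + 1)]


history = value_iteration()
for k, V in enumerate(history):
    print(f"V{k}:", [round(v, 5) for v in V])
print("policy:", greedy_policy(history[-1]))
```

Each printed row lists the values for states 0 through 7 in order, and the last two rows are equal once the iteration has converged.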
