MAVERICK [17]
3 years ago

In a coin game, you repeatedly toss a biased coin (probability 0.4 of heads, 0.6 of tails). Each head is worth 3 points and each tail is worth 1 point. You can either Toss or Stop as long as your total is no more than 7 points; otherwise, you must Stop. When you Stop, your utility equals your total points (up to 7), or 0 if your total is 8 points or higher. When you Toss, you receive no utility. There is no discounting (γ = 1).
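To make the rules concrete, here is a minimal Monte Carlo sketch in Python (not part of the original thread). It plays one game under a caller-supplied rule for when to toss and averages the utility over many games. The names `play_one_game`, `average_utility`, and `should_toss` are hypothetical, and the sketch assumes that reaching 8 or more points ends the game at once with utility 0, which is what the forced Stop implies.

```python
import random
from typing import Callable

# Game rules from the problem statement (the names here are hypothetical).
P_HEAD = 0.4          # probability of heads; tails has probability 0.6
HEAD_POINTS = 3
TAIL_POINTS = 1
MAX_TOTAL = 7         # a total of 8 or more is a bust worth 0


def play_one_game(should_toss: Callable[[int], bool], rng: random.Random) -> int:
    """Play one game and return the utility received when it ends."""
    total = 0
    # Tossing is only allowed while the total is 7 or less.
    while total <= MAX_TOTAL and should_toss(total):
        total += HEAD_POINTS if rng.random() < P_HEAD else TAIL_POINTS
    # A voluntary Stop pays the total; the forced Stop after a bust pays 0.
    return total if total <= MAX_TOTAL else 0


def average_utility(should_toss: Callable[[int], bool],
                    n_games: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the expected utility of a tossing rule."""
    rng = random.Random(seed)
    return sum(play_one_game(should_toss, rng) for _ in range(n_games)) / n_games
```

For example, `average_utility(lambda total: total <= 4)` estimates the expected utility of tossing while the total is 4 or less, while `average_utility(lambda total: True)` is always 0, because a player who never stops eventually busts.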
(a) What are the states and the actions for this MDP? Which states are terminal?
(b) What are the transition function and the reward function for this MDP? Hint: The problem may be simpler to formulate using the general version of rewards: R(s, a, s').
(c) Run value iteration to find the optimal value function V* for the MDP. Show each V_k (starting from V_0(s) = 0 for all states s). For a reasonable MDP formulation, this should converge in fewer than 10 steps. If you find it too tedious to do by hand, you may write a program to do this for you; however, there may be some benefit in seeing the calculation unfolding in front of you.
(d) Using the V* you found, determine the optimal policy for this MDP.
Mathematics
1 answer:
Sladkaya [172]
3 years ago

Answer:

See answer in explanation

Step-by-step explanation:

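1. What are the states and the actions for this MDP? Which states are terminal?

States: the current point total (0, 1, 2, 3, 4, 5, 6, 7) plus a terminal state DONE. DONE is the only terminal state; the game reaches it when you Stop or when a toss pushes the total to 8 or more.

Actions: Toss, Stop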

2. What are the transition function and the reward function for this MDP?

Transition function (i is the current total):

T(Si, TOSS, Si+3) = 0.4 if i ≤ 4 (a head keeps the total at 7 or less)

T(Si, TOSS, DONE) = 0.4 if i = 5 or i = 6 (a head busts)

T(Si, TOSS, Si+1) = 0.6 if i < 7

T(S7, TOSS, DONE) = 1 (both a head and a tail bust)

T(Si, STOP, DONE) = 1

Reward function:

R(Si , TOSS, ANY ) = 0

R(Si , STOP, DONE) = i

R(DONE, STOP, DONE) = 0
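The transition and reward functions can also be written down directly as code, which makes it easy to check that each T(s, a, ·) sums to 1. This is a minimal sketch; the names `transitions`, `reward`, `OUTCOMES`, and `DONE` are hypothetical, and it encodes the game rules as stated: a head adds 3, a tail adds 1, and any total above 7 goes straight to DONE.

```python
# Hypothetical encoding of the coin-game MDP; "DONE" is the terminal state.
DONE = "DONE"
ACTIONS = ("TOSS", "STOP")
OUTCOMES = ((0.4, 3), (0.6, 1))   # (probability, points) for a head and a tail
MAX_TOTAL = 7                      # any total above 7 is a bust


def transitions(s, a):
    """Return T(s, a, .) as a dict {next_state: probability}."""
    if s == DONE:
        return {}                  # terminal state: no further transitions
    if a == "STOP":
        return {DONE: 1.0}
    dist = {}
    for prob, points in OUTCOMES:
        nxt = s + points
        dest = nxt if nxt <= MAX_TOTAL else DONE   # a bust goes straight to DONE
        dist[dest] = dist.get(dest, 0.0) + prob
    return dist


def reward(s, a, s_next):
    """R(s, a, s'): Stop pays the current total; Toss (even a bust) pays 0."""
    if s == DONE or a == "TOSS":
        return 0
    return s
```

Merging probabilities with `dist.get` matters in state 7, where both outcomes land in DONE and their probabilities add up to 1.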

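3. What is the optimal policy for this MDP? Please write down the steps to show how you get the optimal policy.

Optimal policy: Toss for 0, 1, 2, 3, 4; Stop for 5, 6, 7.

Each value-iteration step computes V_{k+1}(i) = max( i , 0.4·V_k(i+3) + 0.6·V_k(i+1) ), where the first term is Stop, the second is Toss, and any successor total above 7 (a bust) has value 0. Starting from V0(i) = 0, the values stop changing after iteration 6 (V7 = V6), so V* = V6:

V0: 0 for every state

V1: 0: 0; 1: 1; 2: 2; 3: 3; 4: 4; 5: 5; 6: 6; 7: 7 (Toss is still worth 0)

V2: 0: 1.8 from Toss; 1: 2.8 from Toss; 2: 3.8 from Toss; 3: 4.8 from Toss; 4: 5.8 from Toss; 5: 5 from Stop; 6: 6 from Stop; 7: 7 from Stop

V3: 0: 3.6; 1: 4.6; 2: 4.88; 3: 5.88; 4: 5.8 (all from Toss); 5: 5; 6: 6; 7: 7 (all from Stop)

V4: 0: 5.112; 1: 5.248; 2: 5.528; 3: 5.88; 4: 5.8 (all from Toss); 5: 5; 6: 6; 7: 7 (all from Stop)

V5: 0: 5.5008; 1: 5.6368; 2: 5.528; 3: 5.88; 4: 5.8 (all from Toss); 5: 5; 6: 6; 7: 7 (all from Stop)

V6 = V*: 0: 5.73408; 1: 5.6368; 2: 5.528; 3: 5.88; 4: 5.8 (all from Toss); 5: 5; 6: 6; 7: 7 (all from Stop)

Reading the policy off V*: from totals 0 to 4 a head adds at most 3 points, so a toss can never bust, and tossing beats stopping (for example, in state 4, Toss gives 0.4·7 + 0.6·5 = 5.8 > 4). In states 5 and 6 a head busts, so Toss is worth only 0.6·V*(i+1), which is 3.6 in state 5 and 4.2 in state 6, less than stopping. In state 7 every toss busts, so Stop is optimal there too.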
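Since the question allows writing a program for the value iteration, here is a minimal Python sketch under the same formulation: states 0 through 7 plus DONE, with V(DONE) = 0 and a bust worth 0. The names `q_values`, `value_iteration`, and `greedy_policy`, and the stopping tolerance `tol`, are assumptions for illustration. It prints every V_k and then the greedy policy read off the final estimate.

```python
# Value iteration for the coin game (a sketch; names are hypothetical).
P_HEAD, P_TAIL = 0.4, 0.6
HEAD_POINTS, TAIL_POINTS = 3, 1
MAX_TOTAL = 7                       # totals 0..7 are the non-terminal states
STATES = range(MAX_TOTAL + 1)       # V(DONE) = 0 is implicit


def q_values(v, s):
    """Return (Q(s, TOSS), Q(s, STOP)) under the value estimate v."""
    def value_after(points):
        nxt = s + points
        return v[nxt] if nxt <= MAX_TOTAL else 0.0   # a bust lands in DONE
    toss = P_HEAD * value_after(HEAD_POINTS) + P_TAIL * value_after(TAIL_POINTS)
    stop = float(s)                                  # Stop pays the current total
    return toss, stop


def value_iteration(tol=1e-12, max_iters=100):
    """Return [V_0, V_1, ...], stopping once no value changes by tol or more."""
    v = [0.0] * len(STATES)                          # V_0(s) = 0
    history = [v]
    for _ in range(max_iters):
        new_v = [max(q_values(v, s)) for s in STATES]
        history.append(new_v)
        if max(abs(x - y) for x, y in zip(new_v, v)) < tol:
            break
        v = new_v
    return history


def greedy_policy(v):
    """Pick the action with the larger Q-value (ties go to Stop)."""
    return ["TOSS" if q_values(v, s)[0] > q_values(v, s)[1] else "STOP"
            for s in STATES]


if __name__ == "__main__":
    history = value_iteration()
    for k, v in enumerate(history):
        print(f"V{k}: " + "; ".join(f"{s}: {x:.6g}" for s, x in enumerate(v)))
    print("policy:", greedy_policy(history[-1]))
```

The policy is extracted with a one-step lookahead on the converged values rather than recorded during the sweep, which keeps the update itself a plain `max` over the two actions.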
