1answer.
MAVERICK [17]
3 years ago
9

In a coin game, you repeatedly toss a biased coin (0.4 for head, 0.6 for tail). Each head represents 3 points and each tail represents 1 point. You can either Toss or Stop if the total number of points you have tossed is no more than 7. Otherwise, you must Stop. When you Stop, your utility is equal to your total points (up to 7), or 0 if you get a total of 8 points or higher. When you Toss, you receive no utility. There is no discounting (γ = 1).
(a) What are the states and the actions for this MDP? Which states are terminal?
(b) What is the transition function and the reward function for this MDP? Hint: The problem may be simpler to formulate using the general version of rewards: R(s, a, s')
(c) Run value iteration to find the optimal value function V* for the MDP. Show each V_k step (starting from V_0(s) = 0 for all states s). For a reasonable MDP formulation, this should converge in fewer than 10 steps. If you find it too tedious to do by hand, you may write a program to do this for you; however, there may be some benefit in seeing the calculation unfolding in front of you.
(d) Using the V* you found, determine the optimal policy for this MDP.
Mathematics
1 answer:
Sladkaya [172]3 years ago
5 0

Answer:

See answer in explanation

Step-by-step explanation:

States: the current point total (the utility you would receive by stopping now) plus a terminal state, that is, 0, 1, 2, 3, 4, 5, 6, 7, DONE. DONE is the only terminal state.

Actions: Toss, Stop

2. What is the transition function and the reward function for this MDP?

Transition function:

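T(Si, TOSS, Si+3) = 0.4 if i ≤ 4

T(Si, TOSS, DONE) = 0.4 if i ≥ 5 (a head would push the total to 8 or more)

T(Si, TOSS, Si+1) = 0.6 if i < 7

T(Si, TOSS, DONE) = 0.6 if i = 7 (a tail would push the total to 8)

For i = 7 both outcomes lead to DONE, so T(S7, TOSS, DONE) = 0.4 + 0.6 = 1.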

T(Si , STOP, DONE) = 1

Reward function:

R(Si , TOSS, ANY ) = 0

R(Si , STOP, DONE) = i

R(DONE, STOP, DONE) = 0
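
As a quick sanity check on the formulation, here is a minimal Python sketch of the transition and reward functions for the problem as stated (head = 3 points with probability 0.4, tail = 1 point with probability 0.6, cap of 7). The names `transitions`, `reward`, and `DONE` are only illustrative and are not part of the original answer; totals above 7 are assumed to collapse into DONE with zero reward.

```python
HEAD_P, TAIL_P = 0.4, 0.6      # coin bias from the problem statement
HEAD_PTS, TAIL_PTS = 3, 1      # points per head / tail
MAX_POINTS = 7                 # totals of 8 or more are worth nothing
DONE = "DONE"                  # illustrative label for the terminal state


def transitions(state, action):
    """Return {next_state: probability} for a non-terminal state 0..7."""
    if action == "STOP":
        return {DONE: 1.0}
    out = {}
    for pts, p in ((HEAD_PTS, HEAD_P), (TAIL_PTS, TAIL_P)):
        nxt = state + pts
        if nxt > MAX_POINTS:   # overshooting the cap ends the game
            nxt = DONE
        out[nxt] = out.get(nxt, 0.0) + p
    return out


def reward(state, action, next_state):
    """R(s, a, s'): stopping pays the current total, tossing pays nothing."""
    return state if action == "STOP" else 0


if __name__ == "__main__":
    for s in range(MAX_POINTS + 1):
        print(s, transitions(s, "TOSS"))
```

Printing the Toss row for every state makes it easy to confirm that each row sums to 1 and that heads only start leading to DONE once the total reaches 5.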

3. What is the optimal policy for this MDP? Please write down the steps to show how you get the optimal policy.

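Optimal policy: Toss for 0, 1, 2, 3, 4; Stop for 5, 6, 7.

The cut-off follows from comparing the two actions: at 4, tossing is worth 0.6 × 5 + 0.4 × 7 = 5.8, which beats stopping at 4; at 5, tossing is worth 0.6 × 6 + 0.4 × 0 = 3.6, which is less than stopping at 5.

You should include the steps of value iteration. Starting from V0(s) = 0, value iteration converges at iteration 6 (iteration 7 changes nothing). The result of iteration 6 is as follows:

V6: 0: 5.73408 from Toss; 1: 5.6368 from Toss; 2: 5.528 from Toss; 3: 5.88 from Toss; 4: 5.8 from Toss; 5: 5 from Stop; 6: 6 from Stop; 7: 7 from Stop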
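
If doing the iterations by hand gets tedious, the question allows a program. The following is a minimal Python sketch of value iteration for the game as stated in the question (head = 3 points with probability 0.4, tail = 1 point with probability 0.6, cap of 7, no discounting). The function names and the stopping tolerance are assumptions made for illustration, and any total above 7 is assumed to be worth 0.

```python
HEAD_P, TAIL_P = 0.4, 0.6   # coin bias from the problem statement
MAX_POINTS = 7              # totals of 8 or more are worth nothing


def value_of(V, total):
    """Value of landing on `total`; overshooting the cap is worth 0."""
    return V[total] if total <= MAX_POINTS else 0.0


def value_iteration(max_iters=20, tol=1e-12):
    """Return (history, policy): history[k] is V_k, policy is greedy for the last V."""
    V = [0.0] * (MAX_POINTS + 1)          # V_0(s) = 0 for all s
    history = [V]
    policy = ["STOP"] * (MAX_POINTS + 1)
    for _ in range(max_iters):
        new_V, policy = [], []
        for s in range(MAX_POINTS + 1):
            q_stop = float(s)             # R(s, STOP, DONE) = s
            q_toss = HEAD_P * value_of(V, s + 3) + TAIL_P * value_of(V, s + 1)
            if q_toss > q_stop:
                new_V.append(q_toss)
                policy.append("TOSS")
            else:
                new_V.append(q_stop)
                policy.append("STOP")
        history.append(new_V)
        if max(abs(a - b) for a, b in zip(new_V, V)) < tol:
            break                         # no state changed: converged
        V = new_V
    return history, policy


if __name__ == "__main__":
    history, policy = value_iteration()
    for k, V in enumerate(history):
        print(f"V{k}:", [round(v, 5) for v in V])
    print("policy:", dict(enumerate(policy)))
```

Each printed row is one V_k, so the output can be compared line by line against a hand calculation; the last two rows being identical is what signals convergence.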
