MAVERICK [17]
3 years ago

In a coin game, you repeatedly toss a biased coin (0.4 for heads, 0.6 for tails). Each head is worth 3 points and each tail is worth 1 point. You can either Toss or Stop if the total number of points you have tossed is no more than 7. Otherwise, you must Stop. When you Stop, your utility is equal to your total points (up to 7), or 0 if you get a total of 8 points or higher. When you Toss, you receive no utility. There is no discounting (γ = 1).
(a) What are the states and the actions for this MDP? Which states are terminal?
(b) What is the transition function and the reward function for this MDP? Hint: The problem may be simpler to formulate using the general version of rewards: R(s, a, s').
(c) Run value iteration to find the optimal value function V* for the MDP. Show each Vk step (starting from V0(s) = 0 for all states s). For a reasonable MDP formulation, this should converge in fewer than 10 steps. If you find it too tedious to do by hand, you may write a program to do this for you; however, there may be some benefit in seeing the calculation unfold in front of you.
(d) Using the V* you found, determine the optimal policy for this MDP.
Sladkaya [172] 3 years ago

Answer:

See answer in explanation

Step-by-step explanation:

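1. What are the states and the actions for this MDP? Which states are terminal?

States: the current point total plus a terminal state, that is, 0, 1, 2, 3, 4, 5, 6, 7, DONE. DONE is the only terminal state; it is reached either by stopping or by going bust (a total of 8 or more).

Actions: Toss, Stop (both are available in states 0 through 7; there are no actions in DONE).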

2. What is the transition function and the reward function for this MDP?

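Transition function (a head adds 3 points, a tail adds 1 point, and any total of 8 or more is a bust that goes to DONE):

T(Si, TOSS, Si+3) = 0.4 if i ≤ 4

T(Si, TOSS, Si+1) = 0.6 if i ≤ 6

T(Si, TOSS, DONE) = 0.4 if i = 5 or i = 6 (a head busts)

T(S7, TOSS, DONE) = 1 (both a head and a tail bust)

T(Si, STOP, DONE) = 1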

Reward function:

R(Si, TOSS, s') = 0 for every s' (including a bust)

R(Si, STOP, DONE) = i

R(DONE, STOP, DONE) = 0
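The transition and reward functions can be encoded directly. The Python sketch below follows the rules in the problem statement (heads +3, tails +1, any total of 8 or more busts into DONE with no reward); the names `transitions`, `reward`, `STATES`, and `ACTIONS` are illustrative choices, not part of the original answer.

```python
# Sketch of the coin-game MDP model.
# Assumption: states are totals 0..7 plus a terminal "DONE";
# every bust (total of 8 or more) is folded into DONE.

P_HEAD, P_TAIL = 0.4, 0.6   # coin bias from the problem statement
HEAD_PTS, TAIL_PTS = 3, 1   # points per head / tail
MAX_TOTAL = 7               # totals above this are a bust
DONE = "DONE"
STATES = list(range(MAX_TOTAL + 1)) + [DONE]
ACTIONS = ("Toss", "Stop")


def transitions(s, a):
    """Return a list of (next_state, probability) pairs, i.e. T(s, a, s')."""
    if s == DONE:
        return []  # terminal state: no actions available
    if a == "Stop":
        return [(DONE, 1.0)]
    outcomes = {}
    for pts, p in ((HEAD_PTS, P_HEAD), (TAIL_PTS, P_TAIL)):
        # Landing on 8 or more is a bust, which ends the game in DONE.
        nxt = s + pts if s + pts <= MAX_TOTAL else DONE
        outcomes[nxt] = outcomes.get(nxt, 0.0) + p
    return list(outcomes.items())


def reward(s, a, s_next):
    """R(s, a, s'): the current total when stopping, 0 for every toss (busts included)."""
    return s if a == "Stop" else 0


if __name__ == "__main__":
    for s in range(MAX_TOTAL + 1):
        print(s, transitions(s, "Toss"))
```

Merging outcomes in a dictionary lets state 7 report a single DONE entry with probability 1, since both a head and a tail bust there.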

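3. What is the optimal policy for this MDP? Please write down the steps to show how you get the optimal policy.

Optimal policy: Toss in states 0, 1, 2, 3, 4; Stop in states 5, 6, 7.

You should include the steps of value iteration. With no discounting, each iteration applies

Vk+1(s) = max{ s (Stop), 0.4·Vk(s+3) + 0.6·Vk(s+1) (Toss) },

where any total of 8 or more is a bust and counts as 0. Starting from V0(s) = 0, the iterates for states 0 through 7 are:

V0: 0, 0, 0, 0, 0, 0, 0, 0
V1: 0, 1, 2, 3, 4, 5, 6, 7
V2: 1.8, 2.8, 3.8, 4.8, 5.8, 5, 6, 7
V3: 3.6, 4.6, 4.88, 5.88, 5.8, 5, 6, 7
V4: 5.112, 5.248, 5.528, 5.88, 5.8, 5, 6, 7
V5: 5.5008, 5.6368, 5.528, 5.88, 5.8, 5, 6, 7
V6: 5.73408, 5.6368, 5.528, 5.88, 5.8, 5, 6, 7

V7 is identical to V6, so value iteration converges at iteration 6 and V* = V6:

0: 5.73408 from Toss; 1: 5.6368 from Toss; 2: 5.528 from Toss; 3: 5.88 from Toss; 4: 5.8 from Toss; 5: 5 from Stop; 6: 6 from Stop; 7: 7 from Stop

For example, in state 4 Toss is worth 0.4·7 + 0.6·5 = 5.8 > 4, so you keep tossing; in state 5 a head busts, so Toss is worth only 0.6·V*(6) = 3.6 < 5, so you stop.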
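Since part (c) allows writing a program, here is a minimal value-iteration sketch in Python under the same assumptions (states 0 through 7, V(DONE) = 0, γ = 1, a bust is worth 0). The helper names `q_values`, `value_iteration`, and `greedy_policy`, and the stopping tolerance `tol`, are illustrative assumptions rather than anything from the original answer.

```python
# Value-iteration sketch for the coin game.
# Assumptions: states 0..7, terminal DONE with value 0, no discounting.

P_HEAD, P_TAIL = 0.4, 0.6
HEAD_PTS, TAIL_PTS = 3, 1
MAX_TOTAL = 7
GAMMA = 1.0  # no discounting


def q_values(V, s):
    """Return (Q(s, Toss), Q(s, Stop)) given values V indexed by totals 0..7."""
    def value_of(total):
        # Any total of 8 or more is a bust: the game ends with utility 0.
        return V[total] if total <= MAX_TOTAL else 0.0
    toss = GAMMA * (P_HEAD * value_of(s + HEAD_PTS) + P_TAIL * value_of(s + TAIL_PTS))
    stop = float(s)  # Stop pays the current total and moves to DONE
    return toss, stop


def value_iteration(tol=1e-12, max_iters=100):
    """Apply V_{k+1}(s) = max_a Q_k(s, a) from V_0 = 0; return all iterates V_0, V_1, ..."""
    V = [0.0] * (MAX_TOTAL + 1)
    history = [V]
    for _ in range(max_iters):
        new_V = [max(q_values(V, s)) for s in range(MAX_TOTAL + 1)]
        history.append(new_V)
        if max(abs(a - b) for a, b in zip(new_V, V)) < tol:
            break  # two successive iterates agree: converged
        V = new_V
    return history


def greedy_policy(V):
    """Pick the action with the larger Q-value in each state (ties go to Stop)."""
    return ["Toss" if toss > stop else "Stop"
            for toss, stop in (q_values(V, s) for s in range(MAX_TOTAL + 1))]


if __name__ == "__main__":
    history = value_iteration()
    for k, V in enumerate(history):
        print(f"V{k}:", ", ".join(f"{v:g}" for v in V))
    print("policy:", greedy_policy(history[-1]))
```

Running it prints V0 through V7; V7 matches V6 exactly, so the loop stops there, and the greedy policy is Toss in states 0 to 4 and Stop in states 5 to 7. Stop in the last iterate is what you would take in state 5 and above because a head there busts.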
