1answer.
MAVERICK [17]
3 years ago
9

In a coin game, you repeatedly toss a biased coin (0.4 for heads, 0.6 for tails). Each head is worth 3 points and each tail is worth 1 point. You can either Toss or Stop if the total number of points you have tossed is no more than 7. Otherwise, you must Stop. When you Stop, your utility is equal to your total points (up to 7), or 0 if you get a total of 8 points or higher. When you Toss, you receive no utility. There is no discounting (γ = 1).
(a) What are the states and the actions for this MDP? Which states are terminal?
(b) What is the transition function and the reward function for this MDP? Hint: The problem may be simpler to formulate using the general version of rewards: R(s, a, s')
(c) Run value iteration to find the optimal value function V* for the MDP. Show each Vk step (starting from V0(s) = 0 for all states s). For a reasonable MDP formulation, this should converge in fewer than 10 steps. If you find it too tedious to do by hand, you may write a program to do this for you; however, there may be some benefit in seeing the calculation unfolding in front of you.
(d) Using the V* you found, determine the optimal policy for this MDP.
Mathematics
1 answer:
Sladkaya [172] 3 years ago
5 0

Answer:

See answer in explanation

Step-by-step explanation:

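1. What are the states and the actions for this MDP?

States: the current point total (the utility you would receive by stopping) plus one terminal state, that is, S0, S1, S2, S3, S4, S5, S6, S7, DONE. DONE is the only terminal state.

Actions: Toss, Stop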

2. What is the transition function and the reward function for this MDP?

Transition function (a toss that brings the total to 8 or more leads to DONE):

T(Si , TOSS, Si+3) = 0.4 if i ≤ 4

T(Si , TOSS, DONE) = 0.4 if i ≥ 5

T(Si , TOSS, Si+1) = 0.6 if i < 7

T(Si , TOSS, DONE) = 0.6 if i = 7

T(Si , STOP, DONE) = 1

Reward function:

R(Si , TOSS, ANY ) = 0

R(Si , STOP, DONE) = i

R(DONE, STOP, DONE) = 0
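
As a rough illustration, the transition and reward functions can be written as one small Python function, read directly from the problem statement (heads with probability 0.4 adds 3 points, tails with probability 0.6 adds 1 point, and any total of 8 or more collapses into DONE). The names `transitions`, `DONE`, and `MAX_POINTS` are made up for this sketch and are not part of the original answer.

```python
# Sketch of T(s, a, s') and R(s, a, s') for the coin game.
# All names here are illustrative assumptions, not from the original answer.
DONE = "DONE"
P_HEAD, P_TAIL = 0.4, 0.6
MAX_POINTS = 7


def transitions(state, action):
    """Return a list of (probability, next_state, reward) triples."""
    if state == DONE:
        # Terminal state: nothing more happens and no reward is earned.
        return [(1.0, DONE, 0)]
    if action == "STOP":
        # Stopping pays out the current total and ends the game.
        return [(1.0, DONE, state)]
    outcomes = []
    for prob, gain in ((P_HEAD, 3), (P_TAIL, 1)):
        total = state + gain
        # A total of 8 or more forces a stop with utility 0, modeled as DONE.
        next_state = total if total <= MAX_POINTS else DONE
        outcomes.append((prob, next_state, 0))
    return outcomes
```

For example, `transitions(5, "TOSS")` gives a 0.4 chance of DONE (5 + 3 = 8 is over the limit) and a 0.6 chance of moving to 6, both with reward 0.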

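3. What is the optimal policy for this MDP? Please write down the steps to show how you get the optimal policy.

Optimal policy: Toss for 0, 1, 2, 3, 4; Stop for 5, 6, 7.

You should include the steps of value iteration. Starting from V0(s) = 0 for all s, each step sets Vk+1(Si) = max(i, 0.4 Vk(Si+3) + 0.6 Vk(Si+1)), where any total of 8 or more counts as DONE with value 0. The values stop changing at iteration 6. Values for states 0 through 7:

V1: 0, 1, 2, 3, 4, 5, 6, 7

V2: 1.8, 2.8, 3.8, 4.8, 5.8, 5, 6, 7

V3: 3.6, 4.6, 4.88, 5.88, 5.8, 5, 6, 7

V4: 5.112, 5.248, 5.528, 5.88, 5.8, 5, 6, 7

V5: 5.5008, 5.6368, 5.528, 5.88, 5.8, 5, 6, 7

V6: 0: 5.73408 from Toss; 1: 5.6368 from Toss; 2: 5.528 from Toss; 3: 5.88 from Toss; 4: 5.8 from Toss; 5: 5 from Stop; 6: 6 from Stop; 7: 7 from Stop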
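
If doing the iterations by hand is too tedious, a short program can do them. The following is a minimal sketch of value iteration for the game as described in the question (heads 0.4 for 3 points, tails 0.6 for 1 point, totals of 8 or more worth 0, no discounting). The function names `value_of`, `backup`, `value_iteration`, and `greedy_policy` are hypothetical helpers chosen for this example.

```python
# Value iteration sketch for the coin game; helper names are illustrative.
P_HEAD, P_TAIL = 0.4, 0.6
MAX_POINTS = 7


def value_of(V, total):
    """Value of a point total; totals above 7 are the DONE state (value 0)."""
    return V[total] if total <= MAX_POINTS else 0.0


def backup(V, i):
    """Return (value of stopping, expected value of tossing) in state i."""
    stop = float(i)  # R(Si, STOP, DONE) = i, and DONE has value 0
    toss = P_HEAD * value_of(V, i + 3) + P_TAIL * value_of(V, i + 1)
    return stop, toss


def value_iteration(max_iters=20, tol=1e-12):
    """Iterate from V0 = 0 until values change by less than tol."""
    V = [0.0] * (MAX_POINTS + 1)
    history = [V[:]]  # history[k] holds Vk
    for _ in range(max_iters):
        new_V = [max(backup(V, i)) for i in range(MAX_POINTS + 1)]
        history.append(new_V)
        change = max(abs(a - b) for a, b in zip(V, new_V))
        V = new_V
        if change < tol:
            break
    return V, history


def greedy_policy(V):
    """Pick TOSS wherever tossing is strictly better than stopping."""
    policy = []
    for i in range(MAX_POINTS + 1):
        stop, toss = backup(V, i)
        policy.append("TOSS" if toss > stop else "STOP")
    return policy


if __name__ == "__main__":
    V, history = value_iteration()
    for k, Vk in enumerate(history):
        print(f"V{k}:", [round(v, 5) for v in Vk])
    print("policy:", greedy_policy(V))
```

Under these assumptions the values settle at roughly 5.734, 5.637, 5.528, 5.88, 5.8, 5, 6, 7 for states 0 through 7, and the greedy policy tosses in states 0 to 4 and stops in states 5 to 7.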

You might be interested in
9 out of the 25 items are made of glass. What percent of the cookware items are glass?
NeTakaya

\frac{9}{25}  \times 100 = 36\%
Here is your answer!
5 0
3 years ago
What is the area of this composite figure?
anygoal [31]

Answer:

Your answer is D, 100 feet squared, because:

1. Multiply 10 × 6 and 10 × 4.

2. Add 60 and 40, the results from the previous step.

3. You get 100, or 100 feet squared. Good luck!

3 0
3 years ago
Solve the given equation by using the quadratic formula: 2x^2-3x-4=0
kaheart [24]
Let's solve your equation step by step.

2x^2 − 3x − 4 = 0

Step 1: Use the quadratic formula with a = 2, b = −3, c = −4.

x = (−b ± √(b^2 − 4ac)) / (2a)

x = (−(−3) ± √((−3)^2 − 4(2)(−4))) / (2(2))

x = (3 ± √41) / 4

x = 3/4 + (1/4)√41 or x = 3/4 − (1/4)√41
7 0
4 years ago
( YOU’LL GET MARKED BRAINLIEST !! )
Mariulka [41]

Answer:

1/4

Step-by-step explanation:

The answer to this question is the amount of blue gummy bears over the total amount of gummy bears. This is 15/60 or 1/4.

4 0
3 years ago
I don’t know how to do this, what’s the area?
Anastaziya [24]

Answer:

Hello!

Step-by-step explanation:

To find the area of a triangle, multiply the base by the height, and then divide by 2. The division by 2 comes from the fact that a parallelogram can be divided into 2 triangles. For example, in the diagram to the left, the area of each triangle is equal to one-half the area of the parallelogram.

So, you have to multiply the base by the height and divide by 2.

Hope this helps.

6 0
3 years ago