hw06-Copy1
November 16, 2018

1 Homework 6: Probability and Hypothesis Testing

1.1 Due Sunday November 18th, 11:59pm

Directly sharing answers is not okay, but discussing problems with the course staff or with other students is encouraged.

You should start early so that you have time to get help if you’re stuck.

In [ ]: #: Don't change this cell; just run it.
        import numpy as np
        from datascience import *
        %matplotlib inline
        import matplotlib.pyplot as plt
        plt.style.use('fivethirtyeight')
        from client.api.notebook import Notebook
        ok = Notebook('hw06.ok')
        _ = ok.auth(inline=True)

Important: The ok tests don’t usually tell you that your answer is correct. More often, they help catch careless mistakes. It’s up to you to ensure that your answer is correct. If you’re not sure, ask someone (not for the answer, but for some guidance about your approach).

Once you’re finished, you must do two things:

1.1.1 a. Turn in to OK

Select "Save and Checkpoint" in the File menu and then execute the submit cell below. The result will contain a link that you can use to check that your assignment has been submitted successfully. If you submit more than once before the deadline, we will only grade your final submission.

In [ ]: #: turn in your notebook
        _ = ok.submit()

1.1.2 b. Turn in PDF to Gradescope

In the File menu, select Download As > PDF via LaTeX. Turn this PDF file in to the respective assignment at https://gradescope.com/. If you submit more than once before the deadline, we will only grade your final submission.

1.2 1. Numbers in a Slot Machine

You are in front of a slot machine with three slots. Each slot in the slot machine has 10 possible outcomes: the digits 0 through 9. When you press the "Spin" button on the slot machine, each of the three slots spins independently and stops at a number. Assume that each slot always picks its number uniformly at random.

Question 1.
Suppose you win the jackpot if you are lucky enough to encounter the following sequence of spins, in order:

Spin 1: You see a 777 in the slot machine.
Spin 2: You see a 999 in the slot machine.

What is the probability that you win the jackpot if you press the "Spin" button twice? Assign your answer to jackpot_chance.

In [ ]: jackpot_chance = ...
        jackpot_chance

In [ ]: #: grade 1.1
        _ = ok.grade('q1_1')

Question 2. What is the probability that you see a number greater than 700 when you press "Spin" once? Assign your answer to greater_than_700.

In [ ]: greater_than_700 = ...
        greater_than_700

In [ ]: #: grade 1.2
        _ = ok.grade('q1_2')

Question 3. Write a function called simulate_one_spin. It should take no arguments, and it should return a random number, with every number the slot machine can show being equally likely. Note that since it is a number, the leading zeros are ignored. For example, if the slot machine spits out 009, then the corresponding return value of your function should be 9.

In [ ]: # Place your answer here. It may contain several lines of code.

In [ ]: #: grade 1.3
        _ = ok.grade('q1_3')

Question 4. Call the function simulate_one_spin 100,000 times. What proportion of the time does the slot machine output 777? Assign your answer to proportion_777. Your solution may take more than one line.

In [ ]: proportion_777 = ...
        proportion_777

In [ ]: #: grade 1.4
        _ = ok.grade('q1_4')

Question 5. Compute the probability that at least one of the three slots in the slot machine shows a 7. You can write it as an expression which can be evaluated by Python. Assign your answer to at_least_one_7.

In [ ]: at_least_one_7 = ...
        at_least_one_7

In [ ]: #: grade 1.5
        _ = ok.grade('q1_5')

1.3 2. Apples and Oranges

Suppose you are given a huge farm that yields apples and oranges.

In [ ]: #: Don't change this cell, just run it
        apples = ['Apple' for _ in range(400)]
        oranges = ['Orange' for _ in range(600)]
        farm_table = Table().with_column('Fruit Type', apples + oranges)
        farm_table

Question 1.
Because you like apples more, you’re interested in the proportion of apples in the farm. Calculate the true proportion of apples in the farm. Store it in the variable apples_true_prop.

In [ ]: apples_true_prop = ...
        apples_true_prop

In [ ]: #: grade 2.1
        _ = ok.grade('q2_1')

Question 2. Which of the following would create a representative sample of fruits and why? Explain your answer.

1. farm_table.take(np.arange(200))
2. farm_table.sample(200)

Option 2 would create a representative sample of fruits because .sample chooses 200 fruits at random, so each fruit has an equal chance of being selected, whereas take(np.arange(200)) simply takes the first 200 rows and involves no randomness.

Question 3. Let’s say we have a fruit basket that can contain at most 200 fruits. We pick 200 fruits (without replacement) from the farm and place them in our fruit basket using the sampling method you chose in Question 2 above. Write a function called pick_200_fruits that simulates this. Specifically, the function should take no arguments and should return an array of 200 fruits selected as per your choice in Question 2.

In [ ]: # Place your answer here. It may contain several lines of code.

In [ ]: #: grade 2.3
        _ = ok.grade('q2_3')

Question 4. As we mentioned, we’re interested in knowing the true proportion of apples in the farm. But we can pick only 200 fruits at a time in our fruit basket. Hence, we repeat this experiment for 500 trials. For each trial, we calculate the proportion of apples in our basket. Simulate the experiment and store the array of proportions in the variable apples_empirical_props.

In [ ]: # Place your answer here. It may contain several lines of code.

In [ ]: #: grade 2.4
        _ = ok.grade('q2_4')

Question 5. Now, compute the average of apples_empirical_props. You claim that this average is a good estimate of the proportion of apples in the farm. Store your proportion in apples_claim_prop.

In [ ]: apples_claim_prop = ...
        apples_claim_prop

In [ ]: #: grade 2.5
        _ = ok.grade('q2_5')

Question 6.
How far away is your claim from the true proportion of apples? Compute the absolute difference between the two and store it in the variable error. Remember that you calculated the true proportion of apples in Question 1.

In [ ]: error = ...
        error

In [ ]: #: grade 2.6
        _ = ok.grade('q2_6')

1.4 3. Broken Phones

A phone manufacturing company claims that it produces phones that are 99% non-faulty. In other words, only 1% of the phones that they manufacture have some fault in them. They open a retail shop in the friendly neighbourhood of La Jolla. Because the phones are cheap and nice, 100 UCSD students have bought phones at this shop. However, it is soon discovered that four of the students had faulty phones. You’re angry and argue that the company’s claim is wrong. But the company is adamant that they are right. You decide to investigate.

Question 1. Assign null_probabilities to a two-item array such that the first element contains the chance of a phone being non-faulty and the second element contains the chance that the phone is faulty under the null hypothesis.

In [ ]: null_probabilities = ...
        null_probabilities

In [ ]: #: grade 3.1
        _ = ok.grade('q3_1')

Question 2. Using the function you wrote above, simulate the buying of 100 phones 5,000 times, using the proportions that you assigned to null_probabilities. Create an array simulations with the number of faulty phones in each simulation.

Note that the number of faulty phones in a simulation of sample size x is the proportion of faulty phones in the simulation multiplied by x.

In [ ]: # Place your answer here. It may contain several lines of code.

In [ ]: #: Consider the resulting histogram of the simulations array
        Table().with_column("Faulty Statistic", simulations).hist(bins=np.arange(8))

In [ ]: #: grade 3.2
        _ = ok.grade('q3_2')

Question 3. Using the results of your simulation, calculate an estimate of the p-value, i.e., the probability of observing four or more faulty phones under the null hypothesis.
Assign your answer to p_value_3_3.

In [ ]: p_value = ...
        p_value

In [ ]: #: grade 3.3
        _ = ok.grade('q3_3')

Question 4. Given the results of your above experiment, do you reject the null hypothesis? Explain why.

Write your answer here.

1.5 4. Bias Towards Customers

The insurance company LivLife10 classifies its customers into 3 categories: low-income, mid-income and high-income. The company claims that it treats all of its customers equally and makes no compromises on the quality of the products that it provides. You know that the company has 10,000 customers, 40% of which are low-income customers, 30% mid-income and 30% high-income customers. However, over the past year, 60% of the complaints that the company has received are from low-income customers, 30% from mid-income customers and 10% from high-income customers.

In [ ]: #: Don't change the below three lines
        type_of_customers = ["low-income", "mid-income", "high-income"]
        proportion_of_customers = np.array([0.4, 0.3, 0.3])
        proportion_of_complaints = np.array([0.6, 0.3, 0.1])
        insurance_customers = Table().with_columns(
            "Type of Customers", type_of_customers,
            "Proportion of Customers", proportion_of_customers,
            "Proportion of Complaints", proportion_of_complaints)
        insurance_customers

You have a suspicion that the insurance company is biased towards its high-income customers. That is, the insurance company is providing a better product to the high-income customers than to others. A better product is one that generates fewer complaints. You decide to test your idea. Your null hypothesis is:

Null hypothesis: The complaints are drawn from the population according to the proportion of customers which are low-, mid-, and high-income.

Question 1. What is the expected proportion of complaints that should be heard from the high-income customers under the null hypothesis? Assign your answer to complaints_proportion_null.

In [ ]: complaints_proportion_null = ...
        complaints_proportion_null

In [ ]: #: grade 4.1
        _ = ok.grade('q4_1')

Question 2.
You wish to check the bias in the insurance company towards different categories of customers. However, there are three categories of customers: high-, mid-, and low-income. Which among the following do you think is not a reasonable choice of test statistic for your hypothesis? You may include more than one answer. Append all your choices in a list called unreasonable_test_statistics. For example, if you think statistics 1, 2, and 3 are unreasonable, you should have unreasonable_test_statistics = [1,2,3].

1. Average of the absolute difference between proportion of customers and proportion of corresponding complaints
2. Sum of the absolute difference between proportion of customers and proportion of corresponding complaints
3. The total number of complaints that the company has received in the past year
4. The total variation distance between the probability distribution of customers and the distribution of complaints
5. The absolute difference between the sum of proportion of customers and the sum of proportion of corresponding complaints
6. Average of the sum of the proportion of customers and the proportion of corresponding complaints

In [ ]: unreasonable_test_statistics = ...
        unreasonable_test_statistics

In [ ]: #: grade 4.2
        _ = ok.grade('q4_2')

Question 3. Say you went ahead with the total variation distance as your test statistic. Write a function called total_variation_distance that takes in two probability distributions as arrays and calculates the total variation distance between them.

In [ ]: # Place your answer here. It may contain several lines of code.

In [ ]: #: Use the below code to test your function
        total_variation_distance(np.array([1,0,0]), np.array([0,0,1])) # Output should be 1.0

In [ ]: #: grade 4.3
        _ = ok.grade('q4_3')

Question 4. Write a simulation which computes the TVD statistic 5000 times on data generated under the null hypothesis. Save the simulated statistics in an array called empirical_tvds.

Hint: Use sample_proportions.

In [ ]: # Place your answer here.
        # It may contain several lines of code.

In [ ]: #: grade 4.4
        _ = ok.grade('q4_4')

Question 5. Calculate the total variation distance in the actual scenario, that is, the observed scenario. Save the result in observed_tvd.

In [ ]: observed_tvd = ...
        observed_tvd

Let us plot a histogram of empirical_tvds and compare that to our observed_tvd.

In [ ]: #: Visualize
        Table().with_column("Empirical TVDs", empirical_tvds).hist()
        plt.scatter(observed_tvd, 0, color='red', s=30)

In [ ]: #: grade 4.5
        _ = ok.grade('q4_5')

Question 6. Recall that the null hypothesis was that the complaints are drawn from the population according to the proportion of customers which were low-, mid-, and high-income. Looking at the histogram above, do you think it is likely that the null hypothesis is true? Write your answer in the variable insurance_claim_true. The value of the boolean variable should be True if you agree that the null hypothesis is true, and False otherwise.

In [ ]: insurance_claim_true = ...
        insurance_claim_true

In [ ]: #: grade 4.6
        _ = ok.grade('q4_6')

Question 7. Does rejecting the null hypothesis in this case prove (or otherwise highly suggest) that the company is biased in its treatment of customers? Why or why not?

Write your answer here.

1.6 5. Loaded Die

... And we are back to rolling dice! A loaded die is one that is unfair, i.e., does not have equal probability for each of the outcomes 1–6 (inclusive).

Question 1. Your friend Aby has a model that says that the die is loaded in such a way that the probability of "1" coming up is 0.5 and the other five values all share the same probability. Write down Aby’s model’s distribution as an array. It should contain 6 elements, each describing the probability of seeing the corresponding face of the die, and it should sum to 1.

In [ ]: aby_hypothesis_model_distribution = ...
        aby_hypothesis_model_distribution

In [ ]: #: grade 5.1
        _ = ok.grade('q5_1')

Question 2. Say we want to test Aby’s model. In particular, we wish to test if the probability of "1" coming up is 0.5.
We roll the die 10 times and get "1" a whopping 8 times. We claim that Aby’s model is wrong. In order to substantiate our claim, we run a simulation of the die-roll. Write a simulation and run it 5000 times, maintaining an array differences which keeps track of the absolute difference between the number of ’1’s that were seen and the expected number (5) in each simulation.

In [ ]: # Place your answer here. It may contain several lines of code.

In [ ]: #: Visualize with a histogram
        Table().with_column("Difference", differences).hist(bins=np.arange(8))

In [ ]: #: grade 5.2
        _ = ok.grade('q5_2')

Question 3. Recall that we saw the die come up "1" eight times. Set the variable null_hypothesis_boolean below to True if you think Aby’s model is plausible or False if it should be rejected.

In [ ]: null_hypothesis_boolean = ...
        null_hypothesis_boolean

In [ ]: #: grade 5.3
        _ = ok.grade('q5_3')

Question 4. Now, we check the p-value of our claim. That is, compute the proportion of times in our simulation that we saw a difference of 3 or more between the number of ’1’s and the expected number of ’1’s. Assign your result to p_value_5_4.

In [ ]: p_value_5_4 = ...
        p_value_5_4

In [ ]: #: grade 5.4
        _ = ok.grade('q5_4')

To submit:

1. Select Run All from the Cell menu to ensure that you have executed all cells, including the test cells.
2. Read through the notebook to make sure everything is fine.
3. Submit using the cell below.
4. Save PDF and submit to Gradescope.

In [ ]: #: For your convenience, you can run this cell to run all the tests at once!
        import os
        _ = [ok.grade(q[:-3]) for q in os.listdir('tests') if q.startswith('q')]

1.7 Before submitting, select "Kernel" -> "Restart & Run All" from the menu!

Then make sure that all of your cells ran without error.

In [ ]: #: submit your notebook
        _ = ok.submit()

1.8 Don’t forget to submit to both OK and Gradescope!
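
For reference, the simulate-then-compare pattern that Sections 3 to 5 rely on can be sketched on a separate toy problem. This is only an illustrative sketch, not the intended solution to any question: the coin-flip scenario, the function names empirical_p_value and simulate_heads_distance, the seed, and the observed value of 6 are all assumptions made up for this example, and it uses plain NumPy rather than the datascience library.

```python
import numpy as np


def empirical_p_value(observed_stat, simulated_stats):
    """Proportion of simulated statistics at least as large as the observed one."""
    simulated_stats = np.asarray(simulated_stats)
    return np.count_nonzero(simulated_stats >= observed_stat) / len(simulated_stats)


def simulate_heads_distance(num_flips, p_heads, num_trials, seed=0):
    """Simulate |number of heads - expected number of heads| under the null.

    Hypothetical helper: each trial flips a coin `num_flips` times with
    heads probability `p_heads` and records the absolute distance from
    the expected count.
    """
    rng = np.random.default_rng(seed)
    heads = rng.binomial(num_flips, p_heads, size=num_trials)
    return np.abs(heads - num_flips * p_heads)


# Toy null hypothesis: the coin is fair. Suppose we observed 16 heads in
# 20 flips, i.e. a distance of 6 from the expected 10 heads.
simulated = simulate_heads_distance(num_flips=20, p_heads=0.5, num_trials=5000)
p_value = empirical_p_value(observed_stat=6, simulated_stats=simulated)
print(p_value)
```

The test statistic here is an absolute distance, so large values count as evidence against the null and the p-value is the fraction of simulated values at or above the observed one. A small fraction suggests the observed data would be unusual if the null hypothesis were true.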