
DISCUSSION QUESTION

  1. Suppose we have two jars. The first jar contains 4 white and 6 black marbles while the second one contains 4 white and 3 black marbles.  One marble is drawn at random from one of the jars, and it is found to be black.  Find the probability that it was drawn from the first jar.

 Hint: Probability of selecting jar 1 and jar 2 is the same.
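The jar problem is a direct application of Bayes' rule. The sketch below works it with exact fractions, assuming (per the hint) that each jar is chosen with probability 1/2; the variable names are illustrative only.

```python
from fractions import Fraction

# Prior: each jar is equally likely to be chosen (per the hint).
prior = {"jar1": Fraction(1, 2), "jar2": Fraction(1, 2)}

# Likelihood of drawing a black marble from each jar.
p_black = {"jar1": Fraction(6, 10), "jar2": Fraction(3, 7)}

# Law of total probability, then Bayes' rule for jar 1.
total_black = sum(prior[j] * p_black[j] for j in prior)
posterior_jar1 = prior["jar1"] * p_black["jar1"] / total_black

print(posterior_jar1, float(posterior_jar1))
```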

 To answer problems 2 through 5, consider the random experiment of rolling two fair, distinct dice.  Let X be the maximum of the two numbers.  Recall, for example, that max(12, 15) = 15 and max(10, 10) = 10.

  2. Determine the probability mass function of X.
  3. Determine the cumulative distribution function of X.
  4. Find the expected value E(X).
  5. Find the variance Var(X).
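One way to check hand calculations for the maximum of two dice is to enumerate all 36 equally likely ordered outcomes. This is a minimal sketch with exact fractions; the names `pmf`, `cdf`, `mean`, and `variance` are illustrative choices.

```python
from fractions import Fraction
from itertools import product

# Tally the maximum over all 36 ordered outcomes of two fair dice.
pmf = {k: Fraction(0) for k in range(1, 7)}
for a, b in product(range(1, 7), repeat=2):
    pmf[max(a, b)] += Fraction(1, 36)

# The CDF is the running sum of the PMF.
cdf = {}
running = Fraction(0)
for k in sorted(pmf):
    running += pmf[k]
    cdf[k] = running

mean = sum(k * p for k, p in pmf.items())
variance = sum(k**2 * p for k, p in pmf.items()) - mean**2

for k in sorted(pmf):
    print(k, pmf[k], cdf[k])
print("mean:", mean, "variance:", variance)
```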

R Part

Let X be a discrete random variable with the following probability mass function:

  x          1     3     7     11    15    21    23    37
  P(X = x)   0.1   0.2   0.2   0.1   0.1   0.05  0.05  0.2

  1. Find the expected value E(X).
  2. Find the variance Var(X).
  3. Determine the cumulative distribution function (by hand).
  4. Graph the probability mass function.
  5. Graph the cumulative distribution function.
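The assignment asks for R, so the following Python sketch is only a cross-check of the arithmetic. It assumes the first row of numbers lists the values of the random variable and the second row lists their probabilities; plotting is left out.

```python
# Assumed reading of the two rows: values and their probabilities.
values = [1, 3, 7, 11, 15, 21, 23, 37]
probs = [0.1, 0.2, 0.2, 0.1, 0.1, 0.05, 0.05, 0.2]

# A valid PMF must sum to 1.
assert abs(sum(probs) - 1.0) < 1e-12

mean = sum(x * p for x, p in zip(values, probs))
variance = sum((x - mean) ** 2 * p for x, p in zip(values, probs))

# CDF at each support point: F(x) = P(X <= x).
cdf = []
running = 0.0
for x, p in zip(values, probs):
    running += p
    cdf.append((x, running))

print("E(X) =", mean)
print("Var(X) =", variance)
for x, f in cdf:
    print(f"F({x}) = {f:.2f}")
```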

 

 

 


DISCUSSION QUESTION

1. Explain in Example 5.8 (Chapter I, pages 45-46) of the Lecture Notes (a) how A was obtained and (b) what went wrong with the Poisson approximation. (5 pts)

Problem #2 is computational and must be solved in R (no other language or tool). Provide not only the answer but also the source code of the program. Copy and paste your program code from the R console into your paper, and embed screenshots of your screen as evidence that you ran the program yourself. Otherwise, the problem won't be graded. Your work must also be done independently, without any collaboration with others; joint or similar work won't be accepted.

Hint: See Example 5.3 on pages 39-41 and the source code in the notes.

2. A parallel reliability system consists of cables of wires supporting a bridge. Each cable must have at least 70 wires. If the reliability of a single wire is 0.15, what is the minimum number of wires, counting up from 70, that a cable must have in order to keep the reliability of the cable (and thus of the system) at or above 0.98? (10 pts)
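The search described in this problem can be sketched as a loop over candidate cable sizes. The sketch rests on an assumption the problem statement does not spell out: a cable counts as reliable when at least 70 of its n independent wires work, so the number of working wires is Binomial(n, 0.15). It is written in Python purely as an illustration of the logic; the assignment itself requires R, and the referenced lecture-notes example may define reliability differently.

```python
from math import comb

P_WIRE = 0.15      # reliability of a single wire (from the problem)
MIN_WORKING = 70   # assumed: the cable holds if at least 70 wires work
TARGET = 0.98      # required cable reliability

def cable_reliability(n, p=P_WIRE, k=MIN_WORKING):
    """P(at least k of n independent wires work), X ~ Binomial(n, p)."""
    below = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))
    return 1.0 - below

def minimal_wires(target=TARGET):
    """Smallest n >= MIN_WORKING whose reliability reaches the target."""
    n = MIN_WORKING
    while cable_reliability(n) < target:
        n += 1
    return n

n_min = minimal_wires()
print(n_min, cable_reliability(n_min))
```

The loop is valid because, under this assumption, adding a wire can only increase the chance of having at least 70 working wires, so the first n that reaches the target is the minimum.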

 


Stat 311 Quiz 2

This is an untimed quiz that must be completed and uploaded before the end of the window. This quiz is open Stat 311 notes, textbook, and homework/solutions only. All responses must be your own. If I suspect that you collaborated with other people or put down answers that match something you found on the internet, your quiz score will be zero and I will file a report with the Student Conduct office. By uploading your quiz to Gradescope, you are acknowledging that you adhered to the rules and academic conduct standards set by the University of Washington.

  • Unless specified otherwise, to receive credit you must show your work on all computations or give brief explanations, where appropriate.
      o You will be graded only on the work you show.
      o Partial credit may be given when you show the process used to solve a problem, even if your answer is incorrect.
      o You only need to summarize answers in sentences where specified.
  • Any probabilities or quantiles can be found using R. When you use R, be sure to include the code and output.
  • All final numerical answers should be accurate to two decimal places (nearest hundredth); however, probabilities should be reported to four decimal places, if applicable.
  • All intermediate calculations that are not probabilities should be accurate to at least three decimal places.
  • Do not forget units where applicable.
  • Read each problem carefully and follow directions. Make sure you answer the question that is asked.
  • When writing or typing out your solutions, please do NOT squish all your work together. Leave space between problems and the parts of each problem. Make your answers easy to read! If we cannot easily read your work, it will not be graded.

The quiz consists of 9 multi-part problems. Each part of every problem is equally weighted.

Problem 1 (1 point each): A junk box in your room contains 16 old batteries, 6 of which are dead. Define the random variable X = the number of working batteries in a sample of 4 batteries from the junk box.

  a) Name the distribution, with specific parameter values, which can be used to model X.
  b) Write out the correct formula, with specific numbers substituted, to find the probability that there is at most one working battery among the four sampled batteries. Do not solve.
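Sampling 4 batteries without replacement from 10 working and 6 dead ones is the classic hypergeometric setup. The following sketch shows the shape of the formula with exact fractions; the helper name `hypergeom_pmf` is illustrative.

```python
from fractions import Fraction
from math import comb

TOTAL, WORKING, DEAD, SAMPLE = 16, 10, 6, 4

def hypergeom_pmf(k):
    """P(X = k): k working and SAMPLE - k dead batteries in the sample."""
    return Fraction(comb(WORKING, k) * comb(DEAD, SAMPLE - k),
                    comb(TOTAL, SAMPLE))

# "At most one working battery" means X = 0 or X = 1.
p_at_most_one = hypergeom_pmf(0) + hypergeom_pmf(1)
print(p_at_most_one, float(p_at_most_one))
```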

Problem 2 (1 point each): The table below summarizes 272 films from 2011 that have been classified into a genre and have a rating. Define the events R = Rating PG-13 and G = Comedy. Use events R and G and the table below to answer parts (a)-(d). Show your work and summarize your answers in a sentence in the context of the problem.

                          Rating
Genre               PG    PG-13    R      Total
Action/Adventure    20    15       20     55
Comedy              20    22       31     73
Documentary         8     11       9      28
Drama               20    39       57     116
Total               68    87       117    272

  a) For a randomly selected film, what is P(R ∩ G)?
  b) For a randomly selected film, what is P(R | G)?
  c) For a randomly selected film, what is P(G | R)?
  d) Are the events R and G independent? Justify your answer using an equation with all the numbers filled in.
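The joint and conditional probabilities come straight from the table counts. A minimal sketch, using the PG-13 column total (87), the Comedy row total (73), and their intersection (22):

```python
from fractions import Fraction

TOTAL = 272
N_PG13 = 87     # event R: rating is PG-13
N_COMEDY = 73   # event G: genre is Comedy
N_BOTH = 22     # PG-13 comedies

p_r = Fraction(N_PG13, TOTAL)
p_g = Fraction(N_COMEDY, TOTAL)
p_r_and_g = Fraction(N_BOTH, TOTAL)

p_r_given_g = p_r_and_g / p_g   # restrict to comedies
p_g_given_r = p_r_and_g / p_r   # restrict to PG-13 films

# Independence would require P(R and G) = P(R) * P(G).
independent = p_r_and_g == p_r * p_g

print(p_r_and_g, p_r_given_g, p_g_given_r, independent)
```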

Problem 3 (1 point each): A machine operation produces bearings whose diameters are normally distributed with μ = 4.2 mm and σ = 0.08 mm. Use R to find probabilities and quantiles, as appropriate. For any problems where you use R, be sure to include the code and R output. Summarize each answer in a sentence in the context of the problem.

  a) If you randomly select one bearing from this population, what is the probability that the bearing has a diameter less than 4 mm?
  b) What bearing diameter separates the largest 6% of diameters from among all diameters from this population?
  c) A randomly selected bearing from this group has a diameter of 4.25 mm. What percentile does this correspond to?
  d) A random sample of 20 bearings is selected from this population. What is the probability distribution for the sample mean diameter? Be sure to name the distribution and the values of any parameters.
  e) What is the probability that the mean bearing diameter exceeds 4.26 mm?
  f) Find the probability that a random sample of 20 bearings has a mean diameter of exactly 4.2 mm.
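The quiz asks for R output, so the sketch below is only a Python cross-check of the normal calculations, using the standard library's `statistics.NormalDist`. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

MU, SIGMA, N = 4.2, 0.08, 20

# Single bearing diameter: Normal(4.2, 0.08).
diam = NormalDist(mu=MU, sigma=SIGMA)
p_less_4 = diam.cdf(4.0)            # P(X < 4)
cutoff_top6 = diam.inv_cdf(0.94)    # 94th percentile marks the top 6%
pct_425 = diam.cdf(4.25)            # percentile rank of 4.25 mm

# Sample mean of 20 bearings: Normal(4.2, 0.08 / sqrt(20)).
xbar = NormalDist(mu=MU, sigma=SIGMA / sqrt(N))
p_mean_gt = 1 - xbar.cdf(4.26)      # P(mean > 4.26)

# For a continuous distribution, P(mean == 4.2 exactly) is 0.
print(p_less_4, cutoff_top6, pct_425, xbar.stdev, p_mean_gt)
```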

Problem 4 (1 point each): Jerry and Jill are playing in a bowling tournament. Their scores vary as they play multiple games. Jerry's scores, X, follow a N(μ = 208, σ = 7) distribution. Jill's scores, Y, vary from game to game according to a N(μ = 200, σ = 9) distribution. Define a new random variable D = X - Y for a single game in the tournament.

  a) Assuming Jerry and Jill play independently, what is the distribution of D? Include the name of the distribution and the values of the parameters. Show your work.
  b) What is the probability that Jerry will score higher than Jill in the next game in the tournament? Summarize your answer in a sentence.
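For independent normal scores, the difference is again normal: the means subtract and the variances add. A short sketch of that calculation, in Python for illustration:

```python
from math import sqrt
from statistics import NormalDist

MU_X, SD_X = 208, 7   # Jerry
MU_Y, SD_Y = 200, 9   # Jill

# D = X - Y: means subtract, variances add under independence.
mu_d = MU_X - MU_Y
sd_d = sqrt(SD_X**2 + SD_Y**2)
d = NormalDist(mu=mu_d, sigma=sd_d)

# Jerry scores higher than Jill exactly when D > 0.
p_jerry_higher = 1 - d.cdf(0)
print(mu_d, sd_d, p_jerry_higher)
```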

Problem 5 (1 point each): A study concluded that among people infected with Cytomegalovirus (CMV), 98.1% of tests were correctly positive, while for people not infected with the virus, 97.6% of the tests were correctly negative. We also know that 20% of people carry the virus.

  a) What is the probability that a randomly selected person tests negative for CMV?
  b) What is the probability that a randomly selected person testing negative for CMV is truly CMV free?
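Both parts follow from the law of total probability and Bayes' rule. A minimal sketch, reading 98.1% as the sensitivity, 97.6% as the specificity, and 20% as the prevalence:

```python
SENSITIVITY = 0.981   # P(test positive | infected)
SPECIFICITY = 0.976   # P(test negative | not infected)
PREVALENCE = 0.20     # P(infected)

# A negative test is either a false negative or a true negative.
p_false_neg = PREVALENCE * (1 - SENSITIVITY)
p_true_neg = (1 - PREVALENCE) * SPECIFICITY
p_negative = p_false_neg + p_true_neg

# Bayes' rule: P(not infected | test negative).
p_free_given_negative = p_true_neg / p_negative
print(round(p_negative, 4), round(p_free_given_negative, 4))
```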

Problem 6 (1 point each): Define X to be the time to wait for placing an order at a drive-through window and assume X follows a continuous uniform distribution between 0 and 11 minutes. Include units as appropriate.

  a) What is the height of the probability density function?
  b) Find the mean wait time for placing an order at the drive-through window.
  c) Find the probability that the time to wait for placing an order is between 5 and 7 minutes.
  d) About 75% of the customers are expected to wait at most x minutes. Find x.
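Every quantity for a continuous uniform distribution on [a, b] reduces to simple arithmetic on the interval length. A sketch with exact fractions (units: minutes for times, 1/minutes for the density height):

```python
from fractions import Fraction

A, B = Fraction(0), Fraction(11)   # wait time is Uniform(0, 11) minutes

height = 1 / (B - A)                       # constant density on [A, B]
mean_wait = (A + B) / 2                    # midpoint of the interval
p_between_5_and_7 = (7 - 5) * height       # area of a rectangle
x_75 = A + Fraction(3, 4) * (B - A)        # 75th percentile

print(height, mean_wait, p_between_5_and_7, x_75)
```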

Problem 7 (1 point each): The number of typos per page in a certain printing of a novel has an average of 1.2 typos/page. Let X be the number of typos in a random selection of 7 pages and assume X follows a Poisson distribution.

  a) Specify the values of any parameters for the distribution of X.
  b) Write out the formula, with numbers substituted, to calculate the probability that you observe at least three typos in a random selection of 7 pages, but do not solve. (1 point)
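The Poisson rate scales with the number of pages, and "at least three" is most easily handled through the complement. The sketch below illustrates the structure of that formula; the helper name `poisson_pmf` is illustrative.

```python
from math import exp, factorial

RATE_PER_PAGE = 1.2
PAGES = 7
lam = RATE_PER_PAGE * PAGES   # Poisson mean for 7 pages

def poisson_pmf(k, rate):
    """P(X = k) for X ~ Poisson(rate)."""
    return exp(-rate) * rate**k / factorial(k)

# P(X >= 3) = 1 - [P(X = 0) + P(X = 1) + P(X = 2)]
p_at_least_3 = 1 - sum(poisson_pmf(k, lam) for k in range(3))
print(lam, p_at_least_3)
```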

Problem 8 (1 point each): A test consists of 32 multiple choice questions with five choices for each question. As an experiment, you GUESS on each answer without even reading the questions. Define X to be the number of questions you get correct based on guessing and assume X follows a binomial distribution. Use this information to answer parts (a) and (b). Include units where appropriate.

  a) Find $\mu_X$.
  b) Which of the options below is the exact probability that you guess correctly on at least 14 questions?
     i. $1 - \sum_{i=14}^{32} \binom{32}{i} (0.2)^{i} (0.8)^{32-i}$
     ii. $1 - \sum_{i=1}^{13} \binom{32}{i} (0.2)^{i} (0.8)^{32-i}$
     iii. $\sum_{i=14}^{32} \binom{32}{i} (0.2)^{14} (0.8)^{18}$
     iv. $\sum_{i=15}^{32} \binom{32}{i} (0.2)^{i} (0.8)^{32-i}$
     v. $1 - \sum_{i=0}^{13} \binom{32}{i} (0.2)^{i} (0.8)^{32-i}$
     vi. None of the above.
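A quick numerical check can show which kinds of expression agree for a binomial upper tail. This sketch computes the mean and compares the direct sum over 14 through 32 with the complement of the sum over 0 through 13; the helper name `binom_pmf` is illustrative.

```python
from math import comb

N, P = 32, 0.2   # 32 questions, 1-in-5 chance of a correct guess

def binom_pmf(i):
    """P(X = i) for X ~ Binomial(N, P)."""
    return comb(N, i) * P**i * (1 - P) ** (N - i)

mean = N * P

# P(X >= 14) computed two equivalent ways.
upper_tail = sum(binom_pmf(i) for i in range(14, N + 1))
complement = 1 - sum(binom_pmf(i) for i in range(0, 14))

print(mean, upper_tail, complement)
```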

Problem 9 (1 point each): A game is played in two steps. First you flip an unfair coin with P(H) = 0.6. If you get heads, you draw a marble from an urn that has 6 red marbles and 4 blue marbles. If you get tails, you flip the coin again.

  a) What is the sample space for this two-step game?
  b) Let A be the event the outcome has a tail. What is the probability of event A?
  c) Let B be the event the outcome has a red marble. What is the probability of event B?
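The two-step game can be laid out as a small probability tree. The sketch below assumes the second flip uses the same coin with P(H) = 0.6 and reads "has a tail" as "at least one flip lands tails"; the outcome labels are illustrative.

```python
from fractions import Fraction

P_H = Fraction(6, 10)
P_T = 1 - P_H
P_RED = Fraction(6, 10)    # 6 red out of 10 marbles
P_BLUE = Fraction(4, 10)   # 4 blue out of 10 marbles

# Each outcome is (step 1, step 2) with its probability.
outcomes = {
    ("H", "red"): P_H * P_RED,
    ("H", "blue"): P_H * P_BLUE,
    ("T", "H"): P_T * P_H,
    ("T", "T"): P_T * P_T,
}

# Event A: the outcome contains a tail. Event B: a red marble is drawn.
p_a = sum(p for o, p in outcomes.items() if "T" in o)
p_b = sum(p for o, p in outcomes.items() if "red" in o)

print(sorted(outcomes), p_a, p_b)
```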

 


Descriptive statistics

Using the data provided, calculate the following descriptive statistics. (You may use Excel to complete this work. Show all formulas and work in your answer.)

  1) Mean, median, and mode
  2) Range, variance, and standard deviation
  3) Standard scores for each individual (z and t scores)

Discuss the distribution and identify characteristics of the distribution (e.g., is it a normal or skewed distribution; be sure to discuss the reasons for your decision) in one paragraph. Write this paragraph using a scientific voice in the writing and use scientific notation.

Look for outliers and unusual scores in the data. Write a paragraph or two discussing your findings and interpreting the data. Be sure to compare z or t scores and note significance in your comparisons.

 


Question 1

You are to analyse the dataset Retail_ TMA.xlsx to answer the following questions. In the real world of data analysis, when you are not querying the database directly and data is provided to you, it is essential to evaluate the data and ask pertinent questions about the data quality. Doing so lets you confirm the validity of any insights drawn from the data.

The provided dataset contains transaction records from an online retailer. Each record represents a transaction involving a unique product (Product Code) and a specific customer (Customer ID) on a certain day (Receipt Date). The Quantity displays the total number of units sold, and the Price displays the Product Code unit price. There may be more than one Product Code in a Receipt. Location is a representation of the customer's country, and Description is a description of the product.

Download the dataset "Retail_ TMA.xlsx" from Canvas and answer the following questions. You may use any software tool (such as Excel, PowerBI, etc.) to produce the chart(s) and table(s).

(a) Identify one (1) business question statement that can be answered by analysing the dataset Retail_ TMA.xlsx. Your description should clearly explain what data fields are relevant and how the data fields can be used to answer the business question. (Up to 150 words for part (a))  (15 marks)

(b) Create a summary of the dataset Retail_ TMA.xlsx in tabular format. You should identify each data field’s type (nominal, ordinal, interval, or ratio), examine individual variables, and calculate summary measures when applicable. (Up to 200 words for part (b))  (15 marks)

(c) Prepare the dataset Retail_ TMA.xlsx in a form suitable for analysis by fixing the data issues and errors with explanation and justification of necessary data transformations. Illustrate any necessary data preparation with example screenshot(s). (Up to 200 words for part (c))  (28 marks)

(d) Employ two (2) graphical charts and one (1) pivot table to present the key features of the data variables or to explore the relationship among the variables in the dataset Restaurant_Grades.xlsx. You may use any software tool (such as Excel, PowerBI, etc.) to produce the proposed graphical charts and the pivot table. Provide a screenshot of each produced chart/pivot table. Use up to 250 words to explain how the charts and the pivot table are created and discuss why the charts and the pivot table are recommended.  (22 marks)

 


Statistics Discussion Topics

Below is a list of 8 discussion topics. For each topic write a response of at least 200 words to describe what you learned about the topic as a student.

  • Statistics, Frequency Tables, and Graphic Presentations
  • Numerical Summaries of Data
  • Linear Regressions and Correlation
  • Probabilities using addition and multiplication rules
  • The Normal Distribution
  • Sampling Methods and the Central Limit Theorem
  • Confidence Intervals and Hypothesis Testing
  • Inference in Linear Models, Analysis of Variance

 


ASSIGNMENT A

1. Suppose a point is to be selected at random from the unit square. Write the sample space associated with this random experiment.

2. Let , the set of odd natural numbers, and  the set of natural numbers divisible by 5.  What is the set ?

3. A computer password consists of two lowercase letters followed by a capital letter and four digits. Find the total number of possible passwords.

4. A picture can be added to a text in nine different locations on a page.  If four different pictures are to be placed in the text, how many different designs are there?

5. In how many ways can a sample containing non-defective and  defective parts be chosen from a group containing  non-defective and  defective parts?

6. How many different ways can we order the letters of the word BANANAS?
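Problems 3, 4, and 6 illustrate the multiplication rule, ordered arrangements, and arrangements with repeated letters. The sketch below shows one reading of each; for problem 4 it assumes each location holds at most one picture, which the problem does not state explicitly.

```python
from collections import Counter
from math import factorial, perm

# Problem 3: two lowercase letters, one capital letter, four digits.
passwords = 26**2 * 26 * 10**4

# Problem 4 (assumed: at most one picture per location):
# ordered choice of 4 of the 9 locations for 4 distinct pictures.
designs = perm(9, 4)

# Problem 6: arrangements of BANANAS, dividing out repeated letters.
word = "BANANAS"
arrangements = factorial(len(word))
for count in Counter(word).values():
    arrangements //= factorial(count)

print(passwords, designs, arrangements)
```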

 

R Problems: For full credit, all session commands and all related results must be copied and pasted from R.

1. Compute

2. If is the vector (12, -3, 0, 14, -7, 4) and is the vector (1, 0, 0.25, 7, 15, 0.6), find the vector .

3. Let be the vector (3, 7, 0, 3, 5),  be the vector (a, t, c, u, g), and  be the vector (-8, 0, -4, 0, -5).  Write the data frame whose columns are these vectors.

4. Let . Write this as a function in R and find

 

 

 


Action Items

Using the definitions found in Chapter 1 of Quantitative Analysis, the Internet, and your own personal experiences, make notes on and post one example of each of the following to the class Discussion Board topic “Deterministic and Probabilistic Models”.

  • A deterministic model;
  • A probabilistic model; and
  • A situation in which you could use post-optimality analysis (also known as sensitivity analysis).

 


Choosing the Correct Statistical Test Worksheet

Directions: Review your chapter readings and the following resources found in Topic 8 Resources: Choosing the Right Statistical Test: Types and Examples and An Introduction to Statistics: Choosing the Correct Statistical Test, then walk through the tree diagram and answer the following questions.

1. What type of hypothesis are you testing: a difference between groups or an association between variables?

2. How many groups are you analyzing (1, 2, 3+)?

3. What type of data are you analyzing (nominal, ordinal, interval/ratio)?

4. What type of design are you using (between-subjects experimental, within-subjects experimental, quasi-experimental, correlational)?

5. Based on your proposed design, what type of statistical test (analysis) do you think will be appropriate and why?

 

 

 


CASE STUDY

Not all visitors to a certain company’s website are customers or potential customers. In fact, the company’s executives estimate that about 8% of all visitors to the website are looking for other websites. Assume that this estimate is correct and that a random sample of 40 visitors to the website is taken.

Answer the following. (If necessary, consult a list of formulas.)

(a) Estimate the number of visitors in the sample who actually are looking for the company’s website by giving the mean of the relevant distribution (that is, the expectation of the relevant random variable). Do not round your response.

(b) Quantify the uncertainty of your estimate by giving the standard deviation of the distribution. Round your response to at least three decimal places.
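The case study fits a binomial model: each of the 40 sampled visitors independently is, or is not, looking for the company's website. Since 8% are looking for other websites, the success probability used below is 0.92. A minimal sketch of the mean and standard deviation:

```python
from math import sqrt

N = 40
P_OTHER = 0.08            # looking for some other website
P_COMPANY = 1 - P_OTHER   # actually looking for the company's website

# Binomial(n, p): mean n*p, standard deviation sqrt(n*p*(1-p)).
mean_company = N * P_COMPANY
sd_company = sqrt(N * P_COMPANY * (1 - P_COMPANY))

print(mean_company, round(sd_company, 3))
```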