In this post, we are going to see how variations in framing an Event can change the way we solve the problem. This post is a part of our continuing series on Probability.
Consider this problem[1]statement:
Suppose \(k\) identical boxes contain n balls numbered \(1\) through \(n\). One ball is drawn from each box. What is the probability that \(m\) is the largest number drawn?
This problem can be seen in any of the two similar ways of interpreting repeated trials:
- There is an experiment in which we draw a ball from a box containing balls numbered \(1\) through \(n\). Then, the experiment is repeated \(k\) times. Alternatively, a ball is drawn from the same box with replacement \(k\) times.
- There is just one experiment of drawing one ball each from \(k\) identical boxes.
In the first case, the sample space of each trial contains just the numbers on the balls in the box \(n \)as its elements. In the second case, the sample space consists of \(n^k\)elements.
So, let us see how the approach to the solution may vary in both the cases:
Let the numbers be \(\{1,2,3\ldots m-1,m, m+1,\ldots n-1, n\}\)
Repeat Experiment k times:
Since we are interested in finding out the probability that \(m\) is the largest number drawn, we can define it as a combination of mutually exclusive events.
We want to see in how many ways can the event be possible. One way of doing it might be:
(Getting m exactly once and all other k-1 observations are less than m) + (Getting m exactly twice and all other k-2 observations are less than m) + ... + (Getting m exactly k-1 times and 1 observation is less than m) + (Getting m exactly k times and no observation is less than m)
Each of the events is mutually exclusive of the other events.
Here, we can look into one particular event and hope to generalize it, resulting in a series.
For simplicity, consider one particular sequence of repeating the experiment k times and getting a ball with number 5 (i.e. m=5) exactly 2 times. The balls are numbered from 1 through 10 (i.e. n=10).
\(1,2,3,1,4,5,3,4,5,2\ldots 2,3,4\)
Here, one may reason that the probability of picking a 5 numbered ball in a single draw is \(\frac{1}{10}\)So, the entire sequence may be modelled as a sequence of tosses, where getting a 5 means success and not getting a 5 means failure. Note that Failure is a composite event of (Getting less than 5 or more than 5). Further, each iteration of the experiment is assumed to be independent of each other. Then, the probability of getting this sequence with exactly 2 successes and k-2 failures distributed as k-2 numbers less than 5 and 0 numbers greater than 5, is:
\((\frac{1}{10})^2(\frac{4}{10})^{k-2}(\frac{5}{10})^0\)
And this is just one sequence. There are \(k \choose 2\)ways of getting the same probability (from different combinations).
Then, total probability of getting exactly 2 times, a m=5 numbered ball as the highest numbered ball in k draws is:
\({k\choose 2} (\frac{1}{n})^2(\frac{m-1}{n})^{k-2}\)
Generalizing, the total probability of getting exactly i times, a 'm' numbered ball as the highest numbered ball in k draws is:
\({k\choose i} (\frac{1}{n})^i(\frac{m-1}{n})^{k-i}\)
Using this equation, the total probability that m is the highest number drawn is:
\(\sum_{i=1}^{k}{k\choose i} (\frac{1}{n})^i(\frac{m-1}{n})^{k-i}\)
Single experiment of k draws with replacement
Another approach can be based on the following reasoning grounded in the collective trail of k draws with replacement:
(Getting exactly 1 draw with number m) + (Getting k-1 draws with number m or less than m) + (Getting 0 draws with number more than m)
This is really short! Let us see what happened here:
In order to have m as the highest number, it must be drawn at least once. So, the (Getting exactly 1 draw with number m) ensures that. Beyond that, it may be drawn more than once, but might not be drawn at all. This is ensured by (Getting k-1 draws with number m or less). Finally, no number greater than m must be drawn.
This probability can then be computed as a multinomial probability. In how many ways can we distribute k balls among 3 partitions such that partition 1 gets exactly one ball, partition 2 gets k-1 and partition 3 gets 0 balls?
\(\frac{k!}{(k-1)!\cdot1!}(P(getting\text{ }m))(P(getting\text{ }m\text{ }or\text{ }less))^{k-1} = \frac{k!}{(k-1)!\cdot1!}(\frac{1}{n})(\frac{m}{n})^{k-1}\)
This method, however, is wrong! This multinomial solution overcounts. Here's an illustration:
Consider k=2, n=10, m=5. Then, we have had 2 draws. Two partitions are possible. So, either 'exactly 5' goes to the first partition, or to the second partition. Consequently, '5 or less' goes to the other partition. But, in the overall experiment, a combination of (5,5) will be repeated twice with this method, and we'll get an excess of 1/10.
Let's see if this is correct.
For 2 draws, the possible favourable outcomes are: {(both 5), (first 5 and second less than 5), (first less than 5 and second 5)}.
The probabilities with each event are \(\frac{1}{10}^2\), \((\frac{1}{10}\times\frac{4}{10})\), \((\frac{4}{10}\times \frac{1}{10})\) respectively. The sum comes out to be \(\frac{9}{100}\).
Using the multinomial formula, it comes out to be \(\frac{10}{100}\) which is \(\frac{1}{10}\) greater than what we computed before. Thus, this line of reasoning, though cogent, is wrong! The approach given before gives the correct solution.
Alternative Solution
There is yet another way of getting at the desired probability, much like how the CDF is defined:
Consider a discrete number line starting from 0 and extending to a large integer number n. On this line, a value m is defined. Rather than telling us the Probability that the number 'r' occurs with certain probability, its distribution is given:
\(P(x\leq r) = P_r\)
Then, if one is to compute the probability \(P(X=r)\), one can do so as \(P(x\leq r) - P(x \leq r-1)\).
On similar lines, one may find the probability P('m' is the largest number drawn in k draws) by evaluating the probability as P(m or less than m is drawn in k draws) - P(m-1 or less than m-1 is drawn in k draws).
So, what is the probability P(m or less than m is drawn k times)?
\((\frac{m}{n})^k\)
So, the probability final probability is:
\(P(m\text{ }is\text{ }the\text{ }highest\text{ }number\text{ }drawn) = (\frac{m}{n})^k - (\frac{m-1}{n})^k\)
Plug in the values from the example as k=2, n=10, m=5, we have the probability as \(\frac{9}{100}\).
Compare this method with the iterative method discussed above. It is much neater!
This can be simulated and interesting observations can be made from the simple experiment. Here's a code for Matlab/Octave:
%%Initialize variables: num_iter = 100000; %Number of iterations n = 10; %Number of balls numbered 1...n m = 5; %Number we want to compute probability for k = 2; %Number of boxes num_favourable = 0; for ii=1:num_iter outcomes = randi(n, [1,k]); %Outcomes from each box if(max(outcomes)==m) num_favourable = num_favourable+1; end end fprintf('\nProbability that m=%d number is the highest is %f\n', ... m, num_favourable/num_iter);
An interesting observation can be made with this code and also from the Alternative Formulation above: Increasing 'k' (number of boxes) reduces the chances of small numbers to win. That is, if k increases, there are more chances that every number appears in the boxes. Then, the higher numbers will have increased chances of winning. From the forumla above, increasing k will reduce smaller 'm's more than larger 'm's keeping 'n' same. Example: for k=2, n=10 and m=5, Probability of m being largest is \(\frac{900}{10000}\). For k=10, it comes to be \(\frac{87}{10000}\). The chances have reduced significantly. Similarly, for m=10, the respective probabilities are \(\frac{19}{100}\) and \(\frac{65}{100}\)
References:
[1] Athanasios Papoulis and S. Unnikrishna Pillai, "Probabilty, Random Variables and Stochastic Processes", Fourth Edition, Chapter 2, Problem 2.17