Section \(8.5.1\) of Rice discusses multinomial cell probabilities.
The data \[ X_1, X_2, \ldots, X_m\]
are counts in cells \(1, \ldots, m\) and follow a multinomial distribution
\[f(x_1, \ldots, x_m \mid p_1, \ldots, p_m ) = { n! \over \prod_{j=1}^m x_j ! } \prod_{j=1}^m p_j ^{x_j}\]
where
\((p_1, \ldots, p_m)\) is the vector of cell probabilities with \(\sum_{i=1}^m p_i=1.\)
\(n=\sum_{j=1}^m x_j\) is the total count.
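As a concrete check of the multinomial formula above, here is a minimal Python sketch that evaluates the probability function directly. The function name `multinomial_pmf` and the example counts and probabilities are illustrative assumptions, not taken from Rice.

```python
from math import factorial, prod

def multinomial_pmf(counts, probs):
    """Probability of the cell counts `counts` given cell probabilities `probs`."""
    n = sum(counts)
    # Multinomial coefficient n! / (x_1! * ... * x_m!), in exact integer arithmetic
    coef = factorial(n)
    for x in counts:
        coef //= factorial(x)
    # Product of p_j ** x_j over the m cells
    return coef * prod(p ** x for p, x in zip(probs, counts))

# Hypothetical example: n = 4 observations spread over m = 3 cells
example_pmf = multinomial_pmf([2, 1, 1], [0.5, 0.3, 0.2])
print(example_pmf)
```

Summing `multinomial_pmf` over every count vector with the same total \(n\) should give 1, which is a quick sanity check on the formula.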
These data arise from a random sample of single-count Multinomial random variables, which are a generalization of Bernoulli random variables (\(m\) distinct outcomes versus 2 distinct outcomes).
Suppose
\(W_1, W_2, \ldots, W_n\) are iid \(W \sim Multinomial(1, probs=(p_1, \ldots, p_m))\) random variables:
The sample space of each \(W_i\) is \({\cal W} = \{1, 2, \ldots, m\}\), a set of \(m\) distinct outcomes.
\(P( W_i=k) = p_k,\) \(k=1,2, \ldots, m.\)
Define
\(X_k = \sum_{i=1}^n 1( W_i =k)\), (sum of indicators of outcome \(k\)), \(k=1, \ldots, m\)
\(X=(X_1, \ldots, X_m)\), so that \(X \sim Multinomial(n, probs=(p_1, \ldots, p_m))\)
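The construction of the counts from single draws can be sketched in Python as follows. This is only an illustration: the function name `simulate_counts`, the seed, and the example probabilities are assumptions chosen for the sketch.

```python
import random

def simulate_counts(n, probs, seed=0):
    """Draw W_1, ..., W_n from {1, ..., m} and return the cell counts X_1, ..., X_m."""
    rng = random.Random(seed)  # fixed seed (an assumption) so the run is reproducible
    outcomes = list(range(1, len(probs) + 1))
    # Each W_i is one draw with P(W_i = k) = p_k
    w = rng.choices(outcomes, weights=probs, k=n)
    # X_k is the sum of the indicators 1(W_i = k)
    return [sum(1 for wi in w if wi == k) for k in outcomes]

counts = simulate_counts(1000, [0.5, 0.3, 0.2])
print(counts, sum(counts))
```

By construction the counts always add up to \(n\), which matches the constraint on the multinomial sample space.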
\[\begin{array}{lcl} l(p_1, \ldots, p_m) &=& \log [ f(x_1, \ldots, x_m \mid p_1, \ldots, p_m )] \\ &=& \log (n!) - \sum_{j=1}^m \log( x_j !) + \sum_{j=1}^m x_j \log(p_j) \\\end{array}\]
The MLE of \((p_1, \ldots, p_m)\) maximizes the log likelihood \(l(p_1, \ldots, p_m)\) (with \(x_1, \ldots, x_m\) fixed!)
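The maximum is achieved where the differential of the log likelihood is zero
Constraint: \(\sum_{j=1}^m p_j =1\)
Apply the method of Lagrange multipliers: maximize \(\sum_{j=1}^m x_j \log(p_j) + \lambda (1 - \sum_{j=1}^m p_j)\); setting the derivative with respect to \(p_j\) to zero gives \(x_j / p_j = \lambda\), and the constraint then forces \(\lambda = n.\)
Solution: \(\hat p_j = x_j /n\), \(j=1, \ldots, m.\)
Note: if any \(x_j =0,\) then \(\hat p_j=0\), obtained as a limiting case (the term \(x_j \log(p_j)\) drops out of the log likelihood).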
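The solution can be checked numerically. The sketch below computes the sample proportions and compares the \(p\)-dependent part of the log likelihood against a few other probability vectors. The counts and the alternative vectors are made-up illustrations, and the helper names are assumptions.

```python
from math import log

def multinomial_mle(counts):
    """MLE of the cell probabilities: the sample proportions x_j / n."""
    n = sum(counts)
    return [x / n for x in counts]

def log_lik_kernel(counts, probs):
    """Part of the log likelihood that depends on p: sum of x_j * log(p_j).

    Cells with x_j = 0 are skipped, since they contribute nothing.
    """
    return sum(x * log(p) for x, p in zip(counts, probs) if x > 0)

x = [3, 5, 2]  # hypothetical counts, n = 10
p_hat = multinomial_mle(x)

# A few other probability vectors, chosen arbitrarily for comparison
alternatives = [[1 / 3, 1 / 3, 1 / 3], [0.25, 0.5, 0.25], [0.3, 0.45, 0.25]]
for q in alternatives:
    print(q, log_lik_kernel(x, q) <= log_lik_kernel(x, p_hat))
```

A handful of comparisons is not a proof, but it is a cheap way to catch an algebra slip in the Lagrange multiplier argument.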
Example: Hardy-Weinberg equilibrium frequencies of the genotypes \(AA\), \(Aa\), and \(aa\)
Allele probabilities: \(P(a)=\theta\) and \(P(A)=1-\theta\)
Equilibrium probabilities of genotypes: \((1-\theta)^2\), \(2\theta(1-\theta)\), and \(\theta^2.\)
Multinomial Data: \((X_1, X_2, X_3)\) corresponding to counts of \(AA\), \(Aa\), and \(aa\) in a sample of size \(n.\)
See, e.g.
http://www.nature.com/scitable/definition/hardy-weinberg-equation-299
http://www.nature.com/scitable/definition/hardy-weinberg-equilibrium-122
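The equilibrium genotype probabilities are a one-parameter family of multinomial cell probabilities. A minimal Python sketch, with the function name `hardy_weinberg_probs` and the value \(\theta = 0.4\) chosen purely for illustration:

```python
def hardy_weinberg_probs(theta):
    """Genotype probabilities (AA, Aa, aa) when the allele a has probability theta."""
    return ((1 - theta) ** 2, 2 * theta * (1 - theta), theta ** 2)

# Illustrative value of theta (an assumption, not an estimate from data)
probs = hardy_weinberg_probs(0.4)
print(probs, sum(probs))
```

The three probabilities sum to 1 for any \(\theta\) in \([0, 1]\), because they expand \(((1-\theta) + \theta)^2\).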
Sample Data
\[\begin{array}{|l|ccc|c|}\hline \text{Genotype} & AA & Aa & aa & \text{Total} \\ \hline \text{Count} & X_1 & X_2 & X_3 & n\\ \text{Frequency} & 342 & 500 & 187 & 1029 \\ \hline \end{array}\]
A simple estimate uses only the \(aa\) cell, since \(p_3 = \theta^2\): \(\tilde \theta = (\hat p_3)^{1/2} = (X_3/n)^{1/2}\) \(=\sqrt{187/1029}=0.4263.\)
Alternatively, solve for the MLE of \(\theta\):
\[\begin{array}{rcl} l(\theta ) &=& \log( f(x_1, x_2, x_3 \mid p_1(\theta), p_2(\theta), p_3(\theta)))\\ &=& \log( { n! \over x_1! x_2! x_3!} p_1(\theta)^{x_1} p_2(\theta)^{x_2} p_3(\theta)^{x_3} )\\ &=&x_1 \log((1-\theta)^2) + x_2 \log( 2 \theta ( 1-\theta)) + x_3 \log ( \theta^2) + (\text{non-}\theta \; \text{terms})\\ &=& (2 x_1 + x_2) \log(1-\theta) + (2x_3 + x_2) \log(\theta) + (\text{non-}\theta \; \text{terms})\\ &&\\ l'(\theta) &=& -{2 x_1 + x_2 \over 1-\theta} + {2x_3 + x_2 \over \theta} = 0\\ &&\\ \Longrightarrow \;\; \hat \theta & =& {2 x_3 + x_2 \over 2 x_1 + 2x_2 + 2x_3} = {2 x_3 + x_2 \over 2n} = 0.4247\\ \end{array} \]
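Both estimates can be reproduced from the counts in the Sample Data table with a few lines of Python. The variable names are illustrative.

```python
from math import sqrt

# Counts of AA, Aa, aa from the Sample Data table
x1, x2, x3 = 342, 500, 187
n = x1 + x2 + x3

# Estimate based on the aa cell only: square root of the aa proportion
theta_tilde = sqrt(x3 / n)

# MLE under the Hardy-Weinberg model: proportion of a alleles among 2n alleles
theta_hat = (2 * x3 + x2) / (2 * n)

print(round(theta_tilde, 4), round(theta_hat, 4))  # 0.4263 0.4247
```

The MLE has a direct interpretation: each \(aa\) individual carries two \(a\) alleles, each \(Aa\) individual carries one, and the sample contains \(2n\) alleles in total.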
Which estimate is better?
Conduct a parametric bootstrap simulation to compare the sampling variability of the two estimates!
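One way such a simulation might look is sketched below: repeatedly draw genotype counts of size \(n\) from the Hardy-Weinberg model with \(\theta\) fixed at the MLE, recompute both estimates on each simulated sample, and compare their standard deviations. The number of replicates `n_boot`, the seed, and all function names are assumptions made for this sketch, not the notes' own implementation.

```python
import random
from math import sqrt
from statistics import stdev

def hw_probs(theta):
    """Hardy-Weinberg probabilities of AA, Aa, aa when P(a) = theta."""
    return [(1 - theta) ** 2, 2 * theta * (1 - theta), theta ** 2]

def draw_counts(rng, n, probs):
    """Simulate n genotypes (coded 0, 1, 2) and return the three cell counts."""
    draws = rng.choices([0, 1, 2], weights=probs, k=n)
    return [draws.count(k) for k in (0, 1, 2)]

def bootstrap_se(theta0, n, n_boot=1000, seed=1):
    """Bootstrap standard errors of (theta_tilde, theta_hat) under theta = theta0."""
    rng = random.Random(seed)
    probs = hw_probs(theta0)
    tilde, hat = [], []
    for _ in range(n_boot):
        x1, x2, x3 = draw_counts(rng, n, probs)
        tilde.append(sqrt(x3 / n))           # estimate from the aa cell only
        hat.append((2 * x3 + x2) / (2 * n))  # MLE under Hardy-Weinberg
    return stdev(tilde), stdev(hat)

# Plug in the MLE and sample size from the data above
se_tilde, se_hat = bootstrap_se(theta0=0.4247, n=1029)
print(se_tilde, se_hat)
```

The estimate with the smaller bootstrap standard error is the more precise one at this sample size. Since the MLE uses all three cells rather than the \(aa\) cell alone, one would expect it to come out ahead.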