## Moments: expectation, variance, standard deviation, and coefficient of variation

We made the analogy between a discrete random variable and the frequency histograms that one might prepare when dealing with data, and we will continue to do so. For concreteness, suppose that there are $n$ categories, that $z_k$ represents the size of plants in the $k$th category, and that $f_k$ represents the frequency of plants in that category (expressed as a fraction of the total, so that the $f_k$ sum to 1). The sample mean (or average size) is defined as $\bar{Z} = \sum_{k=1}^{n} f_k z_k$. The sample variance (of size), usually given the symbol $\sigma^2$, is the average of the squared dispersion $(z_k - \bar{Z})^2$, so that $\sigma^2 = \sum_{k=1}^{n} f_k (z_k - \bar{Z})^2$.
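As a minimal sketch of these two formulas, the following Python snippet computes the sample mean and sample variance from a frequency histogram. The sizes and frequencies are made-up illustrative numbers (not data from the text), and the frequencies are assumed to be fractions that sum to 1.

```python
# Hypothetical histogram: sizes z_k and relative frequencies f_k (illustrative only).
sizes = [10.0, 20.0, 30.0, 40.0]
freqs = [0.1, 0.4, 0.3, 0.2]


def sample_mean(z, f):
    """Frequency-weighted mean: sum over k of f_k * z_k."""
    return sum(fk * zk for fk, zk in zip(f, z))


def sample_variance(z, f):
    """Frequency-weighted average of squared deviations from the mean."""
    zbar = sample_mean(z, f)
    return sum(fk * (zk - zbar) ** 2 for fk, zk in zip(f, z))


mean = sample_mean(sizes, freqs)
var = sample_variance(sizes, freqs)
print(f"mean = {mean:.2f}, variance = {var:.2f}")
```

For these illustrative numbers the mean is 26 and the variance is 84, in squared size units.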

These data-based ideas have nearly exact analogues when we consider discrete random variables. We will use $E\{Z\}$ to denote the mean, also called the expectation, and $\mathrm{Var}\{Z\}$ to denote the variance, and we shift from $f_k$, representing frequencies of outcomes in the data, to $p_k$, representing probabilities of outcomes. We thus have the definitions

$$E\{Z\} = \sum_{k=1}^{n} p_k z_k, \qquad \mathrm{Var}\{Z\} = \sum_{k=1}^{n} p_k \left(z_k - E\{Z\}\right)^2$$

For a continuous random variable, we recognize that $f(z)\,dz$ plays the role of the frequency with which the random variable falls between $z$ and $z + dz$ and that integration plays the role of summation, so that we define (leaving out the bounds of integration)

$$E\{Z\} = \int z f(z)\,dz, \qquad \mathrm{Var}\{Z\} = \int \left(z - E\{Z\}\right)^2 f(z)\,dz \tag{3.16}$$

Here's a little trick that helps keep the calculus motor running smoothly. In the first expression of Eq. (3.16), we could also write $f(z)$ as $-\frac{d}{dz}\left[1 - F(z)\right]$, in which case the expectation becomes

$$E\{Z\} = -\int z \, \frac{d}{dz}\left[1 - F(z)\right] dz$$

We integrate this expression by parts, using $\int u\,dv = uv - \int v\,du$ with the obvious choice $u = z$ (so that $v = -[1 - F(z)]$). For a nonnegative random variable with a finite mean, the boundary term $-z[1 - F(z)]$ vanishes at both limits, and we find a new expression for the expectation: $E\{Z\} = \int (1 - F(z))\,dz$. This equation is handy because sometimes it is easier to integrate $1 - F(z)$ than $z f(z)$. (Try this with the exponential distribution from Eq. (3.13).)
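To see the identity at work numerically, here is a sketch that compares the two integrals for an exponential distribution. Eq. (3.13) is not reproduced here, so the code assumes the standard form $f(z) = \lambda e^{-\lambda z}$ for $z \ge 0$, which gives $1 - F(z) = e^{-\lambda z}$. The rate `lam`, the truncation point `upper`, and the helper `trapezoid` are illustrative choices, not part of the text.

```python
import math


def trapezoid(func, a, b, steps=100_000):
    """Composite trapezoid rule for the integral of func over [a, b]."""
    h = (b - a) / steps
    total = 0.5 * (func(a) + func(b))
    for i in range(1, steps):
        total += func(a + i * h)
    return total * h


lam = 2.0     # assumed rate parameter (illustrative)
upper = 50.0  # stands in for the infinite upper limit; the tail beyond it is negligible


def density(z):
    """Assumed exponential density f(z) = lam * exp(-lam * z)."""
    return lam * math.exp(-lam * z)


def survival(z):
    """1 - F(z) for the assumed exponential distribution."""
    return math.exp(-lam * z)


mean_from_density = trapezoid(lambda z: z * density(z), 0.0, upper)
mean_from_survival = trapezoid(survival, 0.0, upper)
print(mean_from_density, mean_from_survival, 1.0 / lam)
```

Both numerical integrals should agree closely with the analytical mean $1/\lambda$ of this assumed form; the second integrand is the simpler one to handle by hand.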

For a continuous random variable, the variance is $\mathrm{Var}\{Z\} = \int (z - E\{Z\})^2 f(z)\,dz$. Show that an equivalent definition of variance is $\mathrm{Var}\{Z\} = E\{Z^2\} - (E\{Z\})^2$, where we define $E\{Z^2\} = \int z^2 f(z)\,dz$.

In this exercise, we have defined the second moment $E\{Z^2\}$ of $Z$. This definition generalizes to any function $g(z)$ in the discrete and continuous cases according to

$$E\{g(Z)\} = \sum_{k=1}^{n} p_k \, g(z_k), \qquad E\{g(Z)\} = \int g(z) f(z)\,dz$$
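The discrete form of the expectation of a function, $E\{g(Z)\} = \sum_k p_k\, g(z_k)$, can be sketched as a small helper. The function name `expectation` and the fair six-sided die are illustrative assumptions, not from the text; exact fractions are used so that the comparison at the end is free of rounding error. The last line is a numerical check of the variance identity for this one example, not a proof.

```python
from fractions import Fraction


def expectation(g, values, probs):
    """E{g(Z)} for a discrete random variable: sum over k of p_k * g(z_k)."""
    return sum(p * g(z) for z, p in zip(values, probs))


# Illustrative example: a fair six-sided die.
values = [1, 2, 3, 4, 5, 6]
probs = [Fraction(1, 6)] * 6

mean = expectation(lambda z: z, values, probs)
second_moment = expectation(lambda z: z ** 2, values, probs)
variance = expectation(lambda z: (z - mean) ** 2, values, probs)

print(mean, second_moment, variance)
print(variance == second_moment - mean ** 2)
```

Passing a different `g` gives other moments without changing the helper, which mirrors how the general definition subsumes the mean, the second moment, and the variance.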

In biology, we usually deal with random variables that have units. For that reason, the mean and variance are not commensurate: the mean has the same units as the random variable, but the variance has the square of those units. Consequently, it is common to use the standard deviation defined by

$$\mathrm{SD}\{Z\} = \sqrt{\mathrm{Var}\{Z\}}$$

since the standard deviation will have the same units as the mean. Thus, a non-dimensional measure of variability is the ratio of the standard deviation to the mean, called the coefficient of variation

$$\mathrm{CV}\{Z\} = \frac{\mathrm{SD}\{Z\}}{E\{Z\}}$$

Three series of data are shown below:

Series B: 1401, 1388, 1368, 1379, 1382, 1383, 1395

Ask at least two of your friends to identify, by inspection, the most variable and least variable series. Also ask them why they gave the answer that they did. Now compute the mean, variance, and coefficient of variation of each series. How do the results of these calculations shed light on the responses?
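One way to carry out the calculation is sketched below for Series B. The helper name `summarize` is hypothetical, and the code assumes each observation carries frequency $1/n$, so the variance divides by $n$ rather than $n - 1$.

```python
import math


def summarize(series):
    """Return (mean, variance, coefficient of variation) of a list of numbers.

    Assumption: each observation has frequency 1/n, so the variance divides by n.
    """
    n = len(series)
    mean = sum(series) / n
    variance = sum((x - mean) ** 2 for x in series) / n
    cv = math.sqrt(variance) / mean
    return mean, variance, cv


series_b = [1401, 1388, 1368, 1379, 1382, 1383, 1395]
mean_b, var_b, cv_b = summarize(series_b)
print(f"Series B: mean = {mean_b:.2f}, variance = {var_b:.2f}, CV = {cv_b:.4f}")
```

The same function can be applied to each of the other series, so that the three coefficients of variation can be compared directly.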

We are now in a position to discuss and understand a variety of other probability distributions that are components of your toolkit.
