Moments of a Probability Distribution

We are now familiar with some of the properties of probability distributions. On this page we will introduce a set of numbers that describe various properties of such distributions. Some of these have already been encountered in our previous discussion, but now we will see that these fit into a pattern of quantities called moments of the distribution.



Moments


Let $ f(x)  $ be any function that is defined and non-negative on an interval $ [a,b] $ . We might refer to such a function as a distribution, whether or not we consider it to be a probability density. We then define the following moments of this function:


\[ 
{\rm zeroth~moment}~~M_0 =~\int_a^b ~f(x)~dx 
\] 
\[ 
{\rm first~moment}~~M_1 =~\int_a^b ~x~f(x)~dx 
\] 
\[ 
{\rm second~moment}~~M_2 =~\int_a^b ~x^2~f(x)~dx 
\] 
\[ 
\vdots 
\] 
\[ 
{\rm nth~moment}~~M_n =~\int_a^b ~x^n~f(x)~dx 
\]
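
The definitions above can be checked numerically. The following sketch approximates the moment integrals with a midpoint rule; the density $f(x) = 2x$ on $[0,1]$ is our own illustrative choice, not one taken from the text.

```python
def moment(f, n, a, b, num=200_000):
    """Approximate M_n = integral over [a, b] of x^n f(x) dx (midpoint rule)."""
    dx = (b - a) / num
    total = 0.0
    for i in range(num):
        x = a + (i + 0.5) * dx   # midpoint of the i-th subinterval
        total += x ** n * f(x) * dx
    return total

f = lambda x: 2 * x       # illustrative density on [0, 1]
M0 = moment(f, 0, 0, 1)   # area under f: close to 1
M1 = moment(f, 1, 0, 1)   # close to 2/3
M2 = moment(f, 2, 0, 1)   # close to 1/2
```

For this $f$, the exact values are $M_0 = 1$, $M_1 = 2/3$, and $M_2 = 1/2$, and the numerical estimates agree to several decimal places.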


Observe that the moment of any order is defined by integrating the distribution $ f(x)  $ against a suitable power of $x$ over the interval $[a,b]$. In practice, however, we will see that moments up to the second usually suffice to describe the common attributes of a distribution.



Moments of a Probability Density Distribution


In the particular case that the distribution is a probability density, we have already established the following:


\[ 
M_0 = 1 
\] 
\[ 
M_1 =~\int_a^b ~x~p(x)~dx = {\bar x} = \mu 
\]

This follows from two facts: probability distributions are normalized so that the area under the curve is always 1 (hence the zeroth moment is 1), and the mean of the distribution is defined by the integral that also happens to be the first moment. In the past we have used the symbol $ {\bar x}  $ to represent the mean or average value of $x$, but the symbol $ \mu  $ is often used for this quantity as well.
But what role does the second moment,

\[ 
M_2 =~\int_a^b ~x^2~p(x)~dx 
\]

play? We will shortly see that the second moment helps describe the way that the "mass" or probability density is distributed about its mean. For this purpose, we must introduce the notions of variance and standard deviation.



Variance and Standard Deviation

Two kids of roughly the same size can balance on a teeter-totter by sitting very close to the point at which the beam pivots, as shown in the diagram below.




They can also achieve a balance by sitting at the very ends of the beam, equally far away as shown in the next diagram.




In both cases, the center of mass of the distribution is at the same place: precisely at the pivot point. However, the mass is distributed very differently in these two cases. In the first case, the mass is clustered close to the center, whereas in the second, it is distributed further away. The line segment under the two diagrams represents how far away the masses are from the center of mass. In the first case, this distance is small. In the second case it is larger.

If we want to describe how mass is distributed, we need to talk about attributes of the mass distribution other than just where its center of mass is located. Similarly, if we want to explain how a probability density is distributed about its mean, we must consider moments higher than the first. This is precisely what we shall do below. We will use the idea of the variance to describe whether the distribution is clustered close to its mean or spread out far from it.

The variance is defined as the average value of the quantity $ ({\rm distance~from~mean})^2 $, taken over the whole distribution. (The reason for squaring is that we do not want deviations to the left and right of the mean to cancel out.)

The standard deviation is defined as $ \sqrt{{\rm variance}}  $ .

If we have a random variable that takes on only discrete values $ x_i $, with probabilities $ p_i $, and this discrete probability distribution has mean $ \mu $, we define the variance as the average given by

\[ 
V= \sum ~(x_i-\mu)^2 p_i 
\]
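
The discrete formula $V = \sum (x_i-\mu)^2 p_i$ can be tried out directly. The fair six-sided die below is our own example, not one from the text:

```python
# Variance of a discrete distribution via V = sum (x_i - mu)^2 p_i.
xs = [1, 2, 3, 4, 5, 6]
ps = [1 / 6] * 6                                      # probabilities sum to 1
mu = sum(x * p for x, p in zip(xs, ps))               # mean of a fair die: 3.5
V = sum((x - mu) ** 2 * p for x, p in zip(xs, ps))    # variance: 35/12
```

Note that, just as the text says, no division by the number of values is needed: the probabilities $p_i$ already sum to 1 and so act as weights in the average.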

Note that it is not necessary to divide by the number of values, because the sum of the discrete probabilities is 1, i.e. $ \sum p_i=1 $ . For a continuous probability density with mean $ \mu $ , we define similarly


\[ 
V= \int_a^b~(x-\mu)^2 ~p(x) ~dx 
\]

The standard deviation is then

\[ 
\sigma=\sqrt{V} 
\]

Let us see what this implies about the connection between the variance and the moments of the distribution. From the equation for variance we calculate that

\[ 
V= \int_a^b~(x-\mu)^2 ~p(x) ~dx ~=~\int_a^b~(x^2-2\mu x + \mu^2) ~p(x) ~dx 
\]

Thus

\[ 
V=~\int_a^b~x^2~p(x)dx   - ~\int_a^b~ 2\mu x~p(x) ~dx + ~\int_a^b~ 
\mu^2 ~p(x) ~dx 
\] 
\[ 
~~~=~\int_a^b~x^2~p(x)dx -2 \mu ~\int_a^b~ x~p(x) ~dx +\mu^2 ~\int_a^b ~p(x) ~dx 
\]

We recognize the integrals in the above expression, since they are simply moments of the probability distribution. Plugging in these facts, we arrive at


\[ 
V=~M_2 -2 \mu ~\mu  +\mu^2 =M_2- \mu^2 
\]

Thus the variance is clearly related to the second moment and to the mean of the distribution. Further, the standard deviation is then


\[ 
\sigma~=~\sqrt{M_2- \mu^2} 
\]
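
The identity $V = M_2 - \mu^2$ can be verified numerically by comparing the shortcut against the defining integral $V = \int_a^b (x-\mu)^2\,p(x)\,dx$. The density $p(x) = 2x$ on $[0,1]$ below is our own choice of example:

```python
def integrate(g, a, b, num=200_000):
    """Midpoint-rule approximation to the integral of g over [a, b]."""
    dx = (b - a) / num
    return sum(g(a + (i + 0.5) * dx) for i in range(num)) * dx

p = lambda x: 2 * x                                     # illustrative density
mu = integrate(lambda x: x * p(x), 0, 1)                # first moment (2/3)
M2 = integrate(lambda x: x ** 2 * p(x), 0, 1)           # second moment (1/2)
V_direct = integrate(lambda x: (x - mu) ** 2 * p(x), 0, 1)
V_shortcut = M2 - mu ** 2                               # the two agree
```

For this density the exact variance is $1/2 - (2/3)^2 = 1/18$, and both computations reproduce it.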


Example

Consider the continuous distribution in which the probability density is constant for values of $x$ in the interval $[a,b]$ and zero for values outside this interval. Such a distribution is called a uniform distribution. (It has the shape of a rectangular band of height $C$ and base $(b-a)$.) It is easy to see that the value of the constant $C$ should be $1/(b-a)$, so that the area under this rectangular band is 1, in keeping with the property of a probability distribution.
We compute that

\[ 
M_0 = \int_a^b p(x) dx = \frac{1}{b-a}\int_a^b 1 dx =1 
\]

(This was already known to us, since we have determined that the zeroth moment of any probability density is 1.) We also find that

\[ 
M_1 = \int_a^b ~x~p(x)~ dx = \frac{1}{b-a}\int_a^b~ x ~dx 
\] 
\[ 
~~~ = \frac{1}{b-a} \left.\frac{x^2}{2}\right|_a^b = \frac{b^2-a^2}{2(b-a)} 
\]

This last expression can be simplified by factoring, leading to


\[ 
\mu=M_1 = \frac{(b-a)(b+a)}{2(b-a)} = \frac{b+a}{2} 
\]

Thus we have found that the mean $ \mu $ is in the center of the interval [a,b], as expected. The median would be at the same place by a simple symmetry argument: half the area is to the left and half the area is to the right of this point.

To find the variance we might first calculate the second moment,

\[ 
M_2 =\int_a^b ~x^2~p(x)~ dx = \frac{1}{b-a}\int_a^b~ x^2 ~dx 
\]

It can be shown by simple integration that this yields the result

\[ 
M_2 = \frac{b^3-a^3}{3(b-a)} 
\]

We would then compute the variance

\[ 
V= M_2-\mu^2 = \frac{b^3-a^3}{3(b-a)}- \frac{(b+a)^2}{4} 
\]

After simplification, we get

\[ 
V= \frac{(b-a)^2}{12} 
\]

The standard deviation is then

\[ 
\sigma=  \frac{(b-a)}{2~ \sqrt{3}} 
\]
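
The formulas derived in this example are easy to check numerically. The interval $[a,b] = [2,5]$ below is an arbitrary choice for illustration:

```python
import math

a, b = 2.0, 5.0
mu = (a + b) / 2                         # mean, (b + a)/2 = 3.5
M2 = (b ** 3 - a ** 3) / (3 * (b - a))   # second moment of the uniform density
V = M2 - mu ** 2                         # variance
sigma = math.sqrt(V)                     # standard deviation
# V should equal (b - a)^2 / 12, and sigma should equal (b - a)/(2 sqrt(3))
```

For this interval, $V = 13 - 12.25 = 0.75 = 3^2/12$, confirming the simplified formula.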



For your consideration:




In the demonstration below, you are invited to experiment with a variety of mass distributions and observe the size of the standard deviation, shown as the line segment with arrows underneath the diagram. Notice what happens when you add masses or change their relative positions. You should be able to change the center of mass and the standard deviation together, or change each one independently.