Probability Densities

In the last section, we discussed centers of mass, which we interpreted as weighted averages of $x$-coordinates. In this section, we will again encounter a weighted average, but now in the context of probability.

The average of a quiz

First, we will consider a discrete version of our topic which is familiar to all students. Suppose that 10 students take a quiz which is marked out of a maximum of 5 marks. How can we compute the average score?

Student number:   1   2   3   4   5   6   7   8   9  10
Score:            2   5   4   2   1   0   2   4   5   4

To compute the average score $\bar{s}$, we can add all the scores together and divide by the total number of scores, which is 10. Once we write this out, however, it becomes clear that we can group equal scores together.

\[
\begin{aligned}
\bar{s} &= \frac{1}{10}(2+5+4+2+1+0+2+4+5+4) \\
&= \frac{1}{10}(0+1+2+2+2+4+4+4+5+5) \\
&= \frac{1}{10}(0\cdot 1 + 1\cdot 1 + 2\cdot 3 + 3\cdot 0 + 4\cdot 3 + 5\cdot 2) \\
&= \frac{1}{10}\sum_{s=0}^5 s~n(s)
\end{aligned}
\]

In this last line, $n(s)$ represents the number of students who earn the mark $s$. This gives us a slightly different way to compute averages: the average score is obtained by averaging the possible scores, each weighted by the number of students who earn that score. We can now write down some simple observations that will be useful later.


  1. $\sum_{s=0}^5 n(s) = 10$. This is because if we add up the number of students who earn each possible score, we are just counting the number of students in the class.

  2. In the same way, $\sum_{s=s_1}^{s_2} n(s)$ represents the number of students who score between $s_1$ and $s_2$, inclusive.

  3. \[  \bar{s} = \frac{\sum_{s=0}^5 s~n(s)} 
{\sum_{s=0}^5 n(s)}  \]
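The regrouping in the computation of $\bar{s}$ can be checked with a short Python sketch. It computes the average twice: once by summing the list of scores directly, and once by weighting each possible score $s$ by its count $n(s)$, which here is built with `collections.Counter`. The variable names are illustrative only.

```python
from collections import Counter
from fractions import Fraction

# Quiz scores for students 1 through 10, from the table above.
scores = [2, 5, 4, 2, 1, 0, 2, 4, 5, 4]

# Direct average: add every score and divide by the number of scores.
direct_average = Fraction(sum(scores), len(scores))

# n[s]: number of students who earn the mark s (a Counter returns 0 for s = 3).
n = Counter(scores)

# Weighted form: sum of s * n(s) over s = 0..5, divided by sum of n(s).
weighted_average = Fraction(sum(s * n[s] for s in range(6)),
                            sum(n[s] for s in range(6)))

print(direct_average, weighted_average)
```

Exact fractions are used so that the two results can be compared without rounding error; both equal $\frac{29}{10}$.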

In fact, it will be useful for us to consider a slightly different way of recording the information. Define $p(s) = \frac{n(s)}{10}$. This represents the ratio of the number of students who score $s$ to the total number of students. In other words, it gives us the fraction of students who score $s$. This new quantity satisfies properties similar to those above.

  1. $  \sum_{s=0}^5 p(s) = 1  $ .

  2. $\sum_{s=s_1}^{s_2} p(s)$ equals the fraction of students who score between $s_1$ and $s_2$, inclusive.

  3. $  \bar{s} = \sum_{s=0}^5 s~p(s)  $
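The three properties of $p(s)$ can be verified in the same way. This sketch uses exact fractions so that $\sum p(s) = 1$ holds exactly rather than up to rounding; the helper name `fraction_between` is hypothetical.

```python
from collections import Counter
from fractions import Fraction

scores = [2, 5, 4, 2, 1, 0, 2, 4, 5, 4]
n = Counter(scores)
total = len(scores)

# p(s) = n(s) / 10: the fraction of students who earn the mark s.
p = {s: Fraction(n[s], total) for s in range(6)}

def fraction_between(s1, s2):
    """Fraction of students scoring between s1 and s2, inclusive (property 2)."""
    return sum(p[s] for s in range(s1, s2 + 1))

print(sum(p.values()))           # property 1
print(fraction_between(4, 5))    # property 2, for the range 4 to 5
print(sum(s * p[s] for s in p))  # property 3: the average score
```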

We will call such a function a probability density describing the results of the quiz. The name comes from property 2 above, which tells us the probability that a typical student earns a mark in a particular range.

To describe what we have found in rough terms, we could say that to find the average value of a list of numbers, we compute

\[ \frac{\sum_{\mbox{possible values}} \mbox{value} \times \mbox{number of times this value occurs in the list}}{\mbox{total number of values in the list}} \]

The Continuous Case

In the example above, the possible scores on the quiz are all integers and so form a discrete set. We would like to apply these ideas to the case where the possible values vary in a continuous way. Before we jump into a lot of definitions, let's think about an example.

Suppose that we have a radioactive substance with a half-life of 1000 years. From our investigations last term, we know that the mass of this substance is described by an exponentially decaying function: that is, $M(t) = M_0 e^{-kt}$, where $M_0$ is the original mass and $k$ is the constant

\[ k = \frac{\ln 2}{1000}. \]

Our goal will be to determine the average length of time it takes a particle to decay. In fact, let's be a bit more specific just for the purposes of this example and ask the following:

Question: Of the mass which decays in the first 1000 years, what is the average time of decay?

We can answer this question by employing the same methods as for our quiz scores. In particular, we can write:

\[ \bar{t} = \frac{\sum_{\mbox{possible times}} \mbox{time} \times \mbox{mass that decays at that time}}{\mbox{total mass decaying in 1000 years}} \]

It will be helpful to introduce a function $F(t)$ that describes how much mass has decayed in the first $t$ years. We can find a formula for $F(t)$: the amount which has decayed in $t$ years equals the original amount minus the amount left after $t$ years. That is, $F(t) = M_0 - M_0e^{-kt} = M_0(1-e^{-kt})$.

This gives us our first piece of information: the total mass that decays in 1000 years is $F(1000) = M_0(1-e^{-1000k}) = \frac 12 M_0$ (this shouldn't be too surprising since the half-life is 1000 years).
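A quick numerical check of $F(1000) = \frac12 M_0$ can be written in Python. Taking $M_0 = 1$ is an assumption made for illustration only; with it, $F(t)$ reads as a fraction of the original mass.

```python
import math

HALF_LIFE = 1000.0            # years
k = math.log(2) / HALF_LIFE   # decay constant from the half-life
M0 = 1.0                      # illustrative original mass (assumption)

def F(t):
    """Mass decayed in the first t years: M0 - M0 * exp(-k t)."""
    return M0 * (1 - math.exp(-k * t))

print(F(1000))  # approximately 0.5, i.e. half of M0
print(F(2000))  # approximately 0.75 after two half-lives
```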

Now we will divide the interval from 0 to 1000 years into $n$ smaller pieces of equal width $\Delta t = \frac{1000}{n}$, with endpoints labeled $0 = t_0 < t_1 < \cdots < t_n = 1000$.

Next we ask: how much mass decays during the time interval $[t_{j-1},t_j]$? This is easily answered in terms of our function $F(t)$. The amount decaying in this interval is the amount which has decayed in the first $t_j$ years minus the amount which has decayed in the first $t_{j-1}$ years. In other words, the amount which decays in the time interval $[t_{j-1},t_j]$ is

\[ F(t_j) - F(t_{j-1}) \]

We now make an approximation that will help us. We know how much mass decays in this time interval, but that mass could be decaying at different times within the interval. We will approximate by assuming that all the mass which decays in this interval decays at time $t_j$. Of course, our ultimate aim is to shrink the widths of these time intervals, and the approximation improves as we do so.

We can now write the approximate average value as

\[ \bar{t} \approx \frac{\sum_{j=1}^n t_j ~(F(t_j) - F(t_{j-1}))}{\frac 12 M_0} \]
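This approximation can be computed directly. The Python sketch below, again with the illustrative choice $M_0 = 1$, splits $[0, 1000]$ into $n$ equal subintervals, assigns the mass $F(t_j) - F(t_{j-1})$ to the right endpoint $t_j$, and divides by $\frac12 M_0$. As $n$ grows, the values should approach $1000\left(\frac{1}{\ln 2} - 1\right) \approx 442.7$ years. The function name `approx_mean_decay_time` is hypothetical.

```python
import math

k = math.log(2) / 1000.0  # decay constant for a 1000-year half-life
M0 = 1.0                  # illustrative original mass (assumption)

def F(t):
    """Mass decayed in the first t years: M0 * (1 - exp(-k t))."""
    return M0 * (1 - math.exp(-k * t))

def approx_mean_decay_time(n):
    """Approximate t-bar using n equal subintervals of [0, 1000]."""
    dt = 1000.0 / n
    total = 0.0
    for j in range(1, n + 1):
        t_prev, t_j = (j - 1) * dt, j * dt
        # Treat all mass decaying in [t_prev, t_j] as decaying at time t_j.
        total += t_j * (F(t_j) - F(t_prev))
    return total / (0.5 * M0)  # divide by the total decayed mass, M0 / 2

for n in (10, 100, 1000, 10000):
    print(n, approx_mean_decay_time(n))
```

Because every piece of mass is pushed to the right end of its interval, each of these approximations overestimates $\bar{t}$, and the overestimate shrinks along with $\Delta t$.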

This looks like it could be describing an integral, but there is no factor of $\Delta t$. That's all right, though, because we can simply put one in by multiplying and dividing by $\Delta t$:

\[ \bar{t} \approx \frac{\sum_{j=1}^n t_j~\frac{F(t_j) - F(t_{j-1})}{\Delta t}\Delta t}{\frac 12 M_0} \]

Notice that as we shrink the width of the intervals, $  \Delta t \to 0  $ and so

\[ \frac{F(t_j) - F(t_{j-1})}{\Delta t} \to F^\prime(t_j) \]

For small $\Delta t$, then, we can write

\[ \bar{t} \approx \frac{\sum_{j=1}^n t_j F^\prime(t_j)\Delta t}{\frac 12 M_0} \]

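The numerator is a Riemann sum for $\int_0^{1000} tF^\prime(t)~dt$, so letting $n \to \infty$ (and hence $\Delta t \to 0$) gives

\[ \bar{t} = \frac{\int_0^{1000} t F^\prime(t)~dt}{\frac 12 M_0} \]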

This is a wonderfully useful relationship and we will come back to it shortly. But now, let's finish our task by evaluating the integral.

Since $  F(t) = M_0(1-e^{-kt})  $ , it follows that $  F^\prime(t) = kM_0e^{-kt}  $ . The integral we need to evaluate is then

\[ \int_0^{1000} kM_0te^{-kt}~dt = kM_0\int_0^{1000} te^{-kt}~dt \]

which can be evaluated by integration by parts. To do this, we will set $  u = t  $ and $  dv = e^{-kt}~dt  $ . Then we have $  du = dt  $ and $  v = -\frac 1k e^{-kt}  $ .

\[
\begin{aligned}
kM_0\int_0^{1000} te^{-kt}~dt &= -M_0 te^{-kt}\Big|_0^{1000} + M_0\int_0^{1000} e^{-kt}~dt \\
&= -1000M_0e^{-1000k} - \frac{M_0}{k}e^{-kt}\Big|_0^{1000} \\
&= -1000M_0e^{-1000k} + \frac{M_0}{k}(1-e^{-1000k}) \\
&= -500M_0 + \frac{500M_0}{\ln 2} \\
&= 500M_0\left(\frac{1}{\ln 2} - 1\right)
\end{aligned}
\]

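In this computation, we have used the facts that $k = \frac{\ln 2}{1000}$ and $e^{-1000k} = \frac 12$. Dividing by the total decayed mass $F(1000) = \frac 12 M_0$ then gives

\[ \bar{t} = 1000\left(\frac{1}{\ln 2} - 1\right) \approx 443 \mbox{ years} \]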

If we think about it, this result feels right. Remember that we are only considering the mass that decays in the first 1000 years. Also, more mass decays during the early years, when there is more of the substance, than during the later years, when there is less. So it is not surprising that the result is smaller than the halfway value of 500 years.
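The integration-by-parts result can be double-checked numerically. This Python sketch, with $M_0 = 1$ as an illustrative assumption, evaluates the antiderivative $-M_0 t e^{-kt} - \frac{M_0}{k} e^{-kt}$ obtained above, compares it with a composite Simpson's rule estimate of $\int_0^{1000} kM_0 t e^{-kt}\,dt$, and divides by $\frac12 M_0$. Simpson's rule is just a convenient numerical method chosen here; `simpson` is a hypothetical helper, not a library function.

```python
import math

k = math.log(2) / 1000.0
M0 = 1.0  # illustrative original mass (assumption)

def integrand(t):
    """t * F'(t) = k * M0 * t * exp(-k t)."""
    return k * M0 * t * math.exp(-k * t)

def antiderivative(t):
    """From integration by parts: -M0 t e^{-kt} - (M0/k) e^{-kt}."""
    return -M0 * t * math.exp(-k * t) - (M0 / k) * math.exp(-k * t)

def simpson(f, a, b, n=1000):
    """Composite Simpson's rule on [a, b] with an even number n of pieces."""
    h = (b - a) / n
    s = f(a) + f(b)
    s += 4 * sum(f(a + i * h) for i in range(1, n, 2))
    s += 2 * sum(f(a + i * h) for i in range(2, n, 2))
    return s * h / 3

by_parts = antiderivative(1000) - antiderivative(0)
numeric = simpson(integrand, 0, 1000)
t_bar = by_parts / (0.5 * M0)

print(by_parts, numeric)  # both approximately 221.35
print(t_bar)              # approximately 442.7 years
```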

Probability Densities

Now that we have finished our calculation, let's look at it a little more carefully. Remember that $F(t)$ represents the total mass that decays in the first $t$ years, so $F(1000)$ is the total mass that decays in 1000 years. We found that

\[ \bar{t} = \frac{\int_0^{1000} tF^\prime(t)~dt}{F(1000)} = \int_0^{1000} t \frac{F^\prime(t)}{F(1000)}~dt \]

We define a new function $p(t) = \frac{F^\prime(t)}{F(1000)} = 2ke^{-kt}$ for $0 \le t \le 1000$ and call it the probability density for the process. (Here we used $F^\prime(t) = kM_0e^{-kt}$ and $F(1000) = \frac 12 M_0$.) It is closely analogous to the function $p(s)$ we defined for the students' quiz scores above. In particular, it has the following properties.

Properties of the Probability Density Function:

  1. $  \int_{t_1}^{t_2} p(t)~dt  $ is the fraction of mass which decays in the time interval $  [t_1,t_2] $ .

    This is because

\[ \int_{t_1}^{t_2} p(t)~dt = \frac{1}{F(1000)}\int_{t_1}^{t_2} F^\prime(t)~dt = \frac{F(t_2) - F(t_1)}{F(1000)} \]

    This is the fraction of mass which decays during this time interval. In fact, this is why the function $p(t)$ is called the probability density: given a small interval of width $dt$ around time $t$, the chance that a typical particle decays during that time interval is approximately $p(t)~dt$.

  2. $  \int_0^{1000} p(t)~dt = 1  $

    We sometimes say that the probability density is normalized to have a total integral (or total probability) equal to 1. By the first property, this says that a particle which decays during the first 1000 years does so during the first 1000 years with probability 1; that is, the event is certain.

  3. $  \bar{t} = \int_0^{1000} tp(t)~dt  $
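The three properties of $p(t)$ can be checked numerically. This Python sketch uses a composite Simpson's rule helper (the name `simpson` is hypothetical, and the method is simply a convenient choice) together with the illustrative value $M_0 = 1$; property 1 is checked on the sample interval $[200, 600]$, which is an arbitrary choice.

```python
import math

k = math.log(2) / 1000.0
M0 = 1.0  # illustrative (assumption); p(t) itself does not depend on M0

def F(t):
    """Mass decayed in the first t years."""
    return M0 * (1 - math.exp(-k * t))

def p(t):
    """Probability density on [0, 1000]: F'(t) / F(1000) = 2k e^{-kt}."""
    return 2 * k * math.exp(-k * t)

def simpson(f, a, b, n=1000):
    """Composite Simpson's rule on [a, b] with an even number n of pieces."""
    h = (b - a) / n
    s = f(a) + f(b)
    s += 4 * sum(f(a + i * h) for i in range(1, n, 2))
    s += 2 * sum(f(a + i * h) for i in range(2, n, 2))
    return s * h / 3

# Property 1: the integral over [t1, t2] is the fraction decaying in that interval.
t1, t2 = 200.0, 600.0
print(simpson(p, t1, t2), (F(t2) - F(t1)) / F(1000))
# Property 2: the total probability on [0, 1000] is 1.
print(simpson(p, 0, 1000))
# Property 3: the mean decay time, approximately 442.7 years.
print(simpson(lambda t: t * p(t), 0, 1000))
```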

We call the function $D(t) = \int_0^t p(T)~dT$ the cumulative distribution. It measures what fraction of the mass decaying in the first 1000 years has decayed by time $t$; equivalently, $D(t) = \frac{F(t)}{F(1000)}$.
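As a small sketch, $D(t)$ can be computed either by numerically integrating $p$ from $0$ to $t$ or from the closed form $D(t) = 2(1 - e^{-kt})$, which follows by integrating $2ke^{-kT}$ directly. The function names `D_closed` and `D_numeric` are illustrative, and the trapezoid rule is just one convenient choice of numerical method.

```python
import math

k = math.log(2) / 1000.0

def p(t):
    """Probability density on [0, 1000]."""
    return 2 * k * math.exp(-k * t)

def D_closed(t):
    """Cumulative distribution: integral of p from 0 to t, equal to 2(1 - e^{-kt})."""
    return 2 * (1 - math.exp(-k * t))

def D_numeric(t, n=1000):
    """Trapezoid-rule estimate of the integral of p over [0, t]."""
    h = t / n
    return h * (0.5 * p(0) + sum(p(i * h) for i in range(1, n)) + 0.5 * p(t))

for t in (0, 250, 500, 750, 1000):
    print(t, D_closed(t), D_numeric(t))
```

Both columns increase from 0 at $t = 0$ to 1 at $t = 1000$, matching the normalization property of $p(t)$.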

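The demonstration below shows the relationship between the probability density function and the cumulative distribution: by the Fundamental Theorem of Calculus, $D^\prime(t) = p(t)$, so $D(t)$ is the area under the graph of $p$ over $[0,t]$.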

What can we learn from the probability density

Let's consider what kind of information is available from the probability density.


We have already used the probability density to find the average time of decay. In fact, we can do a bit more.

  1. Suppose we want to know the probability that a typical particle decays in the first 500 years. The information is available to us as

\[ \int_0^{500}p(t)~dt = 2k\int_0^{500} e^{-kt}~dt = -2e^{-kt}\Big|_0^{500} = 2(1-e^{-500k}) = 2\left(1-\frac{1}{\sqrt{2}}\right) \approx 0.586 \]

    In other words, there is a 58.6% chance that a typical particle (among those that decay in the first 1000 years) decays in the first 500 years. This makes sense since we know that the mass decays more rapidly at the beginning.

  2. The median time for decay is the time $t$ at which the cumulative distribution satisfies $D(t) = \frac 12$. This means that a typical particle is just as likely to decay before this time as after it. In other words, half the particles decay before this time and half decay after.

    We can find the median time by solving $D(t) = 2(1-e^{-kt}) = \frac 12$. This means that

\[
\begin{aligned}
& 1-e^{-kt} = \frac 14 \\
& e^{-kt} = \frac 34 \\
& t = -\frac{\ln \frac 34}{k} \approx 415 \mbox{ years}
\end{aligned}
\]

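    Below is a graphical interpretation of the median: the vertical line at the median divides the region under the graph of $p(t)$ on $[0,1000]$ into two shaded regions of equal area, each equal to $\frac 12$.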

    It is interesting to ask why the median differs from the average value. The median only records the point at which the two areas balance; it is insensitive to other details of the shape of the distribution. The average, for this distribution, is pulled above the median because some particles take a relatively long time to decay.
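Both calculations in this list can be reproduced numerically. The Python sketch below evaluates $D(500)$, computes the median from the closed form $t = -\ln(3/4)/k$, and, as an independent check, locates the same median by bisection on $D(t) = \frac12$. Bisection is our choice of root-finding method, and `median_by_bisection` is a hypothetical helper.

```python
import math

k = math.log(2) / 1000.0

def D(t):
    """Cumulative distribution on [0, 1000]: 2(1 - e^{-kt})."""
    return 2 * (1 - math.exp(-k * t))

def median_by_bisection(lo=0.0, hi=1000.0, tol=1e-9):
    """Find t with D(t) = 1/2; D is increasing, so bisection applies."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if D(mid) < 0.5:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

p_first_500 = D(500)                    # 2(1 - 1/sqrt(2)), about 0.586
median_closed = -math.log(3 / 4) / k    # about 415 years
median_numeric = median_by_bisection()
mean = 1000 * (1 / math.log(2) - 1)     # closed form for t-bar, about 442.7 years

print(round(p_first_500, 3), round(median_closed, 1),
      round(median_numeric, 1), round(mean, 1))
```

The last comparison confirms numerically that the mean lies above the median for this distribution.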