In the last section, we discussed centers of mass, which we interpreted as a weighted average of $x$-coordinates. In this section, we will also encounter a weighted average, but now within the context of probability.

The average of a quiz

First, we will consider a discrete version of our topic which is familiar to all students. Suppose that 10 students take a quiz which is marked out of a maximum of 5 marks. How can we compute the average score?

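| Student Number | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Score | 2 | 5 | 4 | 2 | 1 | 0 | 2 | 4 | 5 | 4 |

To compute the average score $\bar{s}$, we can add all the scores together and divide by the total number of scores, which is 10:

$$\bar{s} = \frac{2 + 5 + 4 + 2 + 1 + 0 + 2 + 4 + 5 + 4}{10} = \frac{29}{10} = 2.9.$$

Once we write this, however, it becomes clear that we can collect all the common scores together:

$$\bar{s} = \frac{0 \cdot 1 + 1 \cdot 1 + 2 \cdot 3 + 3 \cdot 0 + 4 \cdot 3 + 5 \cdot 2}{10} = \frac{\sum_{s=0}^{5} s \, n(s)}{10}.$$

In this last line, $n(s)$ represents the number of students who earn the mark $s$. This shows us a slightly different way to compute averages. In this case, the average score is given by averaging the possible scores weighted by the number of students who earn that score. We can now write down some simple observations which will be useful for later.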
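The two ways of computing the average can be compared directly. The following is a minimal Python sketch using the scores from the table; the variable names (`scores`, `n`, and so on) are illustrative choices, not part of the original notes.

```python
from collections import Counter

# Quiz scores for the 10 students, copied from the table above.
scores = [2, 5, 4, 2, 1, 0, 2, 4, 5, 4]

# Direct average: add every score and divide by the number of scores.
direct_average = sum(scores) / len(scores)

# Weighted average: n[s] counts the students who earned the mark s.
n = Counter(scores)
weighted_average = sum(s * n[s] for s in range(6)) / len(scores)

print(direct_average, weighted_average)  # 2.9 2.9
```

Both expressions sum to 29 before dividing by 10, so they agree exactly.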

Observations:

- $\sum_{s=0}^{5} n(s) = 10$. This is because if we add up the numbers $n(s)$ of students who earn each possible score $s$, we are just counting the number of students in the class.
- In the same way, $\sum_{s=a}^{b} n(s)$ represents the number of students who score between $a$ and $b$, inclusive.

In fact, it will be useful for us to consider a slightly different way of recording the information. Define $p(s) = \frac{n(s)}{10}$. This represents the ratio of the number of students who score $s$ to the total number of students. In other words, it gives us the fraction of students who score $s$. This new quantity satisfies properties similar to those above.

- $\sum_{s=0}^{5} p(s) = 1$.
- $\sum_{s=a}^{b} p(s)$ equals the fraction of students who score between $a$ and $b$, inclusive.

We will call such a function $p(s)$ a probability density which describes the results of the quiz. This name comes from the fact that property 2 above tells us how to measure the probability that a typical student earns a mark in a particular range.
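As a small check of these two properties, here is a sketch in Python that builds the fractions for the quiz data with exact rational arithmetic. The dictionary `p` and the helper `fraction_between` are hypothetical names introduced only for this illustration.

```python
from collections import Counter
from fractions import Fraction

scores = [2, 5, 4, 2, 1, 0, 2, 4, 5, 4]
counts = Counter(scores)
total = len(scores)

# p[s] is the fraction of students who earn the mark s (exact arithmetic).
p = {s: Fraction(counts[s], total) for s in range(6)}

def fraction_between(a, b):
    """Fraction of students scoring between a and b, inclusive."""
    return sum(p[s] for s in range(a, b + 1))

total_probability = sum(p.values())        # property 1: should equal 1
fraction_2_to_4 = fraction_between(2, 4)   # property 2: 6 of the 10 students
average = sum(s * p[s] for s in p)         # average as a sum of s * p(s)

print(total_probability, fraction_2_to_4, average)  # 1 3/5 29/10
```

Using `Fraction` rather than floating point keeps the sums exact, so the total comes out as exactly 1.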

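To describe what we have found in rough terms, we could say that to find the average value of a list of numbers, we compute

$$\text{average} = \frac{\sum (\text{value}) \times (\text{number of times that value occurs})}{\text{total number of values}}.$$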

The Continuous Case

In the example above, the possible scores on the quiz are all integers and so form a discrete set. We would like to apply these ideas to the case where the possible values vary in a continuous way. Before we jump into a lot of definitions, let's think about an example.

Suppose that we have a radioactive substance with a half-life of 1000 years. From our investigations last term, we know that the mass of this substance is described by an exponentially decaying function: that is, $M(t) = M_0 e^{-kt}$, where $M_0$ is the original mass and $k$ is the constant $k = \frac{\ln 2}{1000}$ (with $t$ measured in years).

Our goal will be to determine the average length of time it takes a particle to decay. In fact, let's be a bit more specific just for the purposes of this example and ask the following:

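Question: Of the mass which decays in the first 1000 years, what is the average time of decay?

We can answer this question by employing the same methods as for our quiz scores. In particular, we can write:

$$\text{average time of decay} = \frac{\sum (\text{time of decay}) \times (\text{mass decaying at that time})}{\text{total mass that decays}}.$$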

It will be helpful to introduce a function $D(t)$ which describes how much mass has decayed in the first $t$ years. We can write down a formula for it since $D(t)$, the amount which has decayed in $t$ years, equals the original amount $M_0$ minus the amount $M_0 e^{-kt}$ left after $t$ years. That is, $D(t) = M_0 - M_0 e^{-kt} = M_0 \left(1 - e^{-kt}\right)$.

This gives us our first piece of information: the total mass that decays in 1000 years is $D(1000) = M_0 \left(1 - \tfrac{1}{2}\right) = \tfrac{1}{2} M_0$ (this shouldn't be too surprising since the half-life is 1000 years).

Now we will divide up the interval from $0$ to $1000$ years into $n$ smaller pieces of width $\Delta t$, labeled by $0 = t_0 < t_1 < \dots < t_n = 1000$.

Now we will ask the question: how much mass decays during the time interval $[t_{i-1}, t_i]$? This can be easily answered in terms of our function $D(t)$, the mass which has decayed in the first $t$ years. In particular, the amount decaying in this time interval is the amount which has decayed in the first $t_i$ years minus the amount which has decayed in the first $t_{i-1}$ years. In other words, the amount which decays in the time interval $[t_{i-1}, t_i]$ is

$$D(t_i) - D(t_{i-1}).$$

We can make an approximation which will help us. In this time interval, we know how much mass decays; the problem is that it could be decaying at different times within the interval. We will make the approximation, however, that all the mass which decays in this interval decays at time $t_i$. Of course, our ultimate aim is to shrink the width of these time intervals, and the approximation improves when this is done.

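We can now write the approximate average value as

$$\bar{t} \approx \frac{1}{D(1000)} \sum_{i=1}^{n} t_i \left( D(t_i) - D(t_{i-1}) \right),$$

where $D(t)$ is the mass which has decayed in the first $t$ years and the $t_i$ are the endpoints of the small time intervals.

This looks like it could be describing an integral, but there is no factor of $\Delta t$. That's all right though because we can simply put it in by multiplying and dividing:

$$\bar{t} \approx \frac{1}{D(1000)} \sum_{i=1}^{n} t_i \, \frac{D(t_i) - D(t_{i-1})}{\Delta t} \, \Delta t.$$

Notice that as we shrink the width of the intervals, $\Delta t \to 0$ and so

$$\frac{D(t_i) - D(t_{i-1})}{\Delta t} \to D'(t_i).$$

We can then evaluate the limit of the sum as an integral,

$$\lim_{\Delta t \to 0} \sum_{i=1}^{n} t_i \, \frac{D(t_i) - D(t_{i-1})}{\Delta t} \, \Delta t = \int_0^{1000} t \, D'(t) \, dt,$$

which means that

$$\bar{t} = \frac{1}{D(1000)} \int_0^{1000} t \, D'(t) \, dt.$$

This is a wonderfully useful relationship and we will come back to it shortly. But now, let's finish our task by evaluating the integral.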
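The effect of shrinking the intervals can be seen numerically. The sketch below is only an illustration under stated assumptions: it takes the decayed mass to be $M_0 (1 - e^{-kt})$ with $k = \ln 2 / 1000$, uses equal-width intervals, and treats all mass decaying in an interval as decaying at its right endpoint. The function names `decayed` and `approximate_average` are hypothetical.

```python
import math

HALF_LIFE = 1000.0               # years, from the example
k = math.log(2) / HALF_LIFE      # decay constant
M0 = 1.0                         # original mass; its value cancels in the ratio

def decayed(t):
    """Mass that has decayed in the first t years: M0 * (1 - exp(-k t))."""
    return M0 * (1.0 - math.exp(-k * t))

def approximate_average(n):
    """Approximate average decay time using n equal intervals on [0, 1000].

    All mass decaying in an interval is treated as decaying at the
    right endpoint of that interval.
    """
    dt = HALF_LIFE / n
    total = 0.0
    for i in range(1, n + 1):
        t_prev, t_i = (i - 1) * dt, i * dt
        total += t_i * (decayed(t_i) - decayed(t_prev))
    return total / decayed(HALF_LIFE)

for n in (10, 100, 1000, 10000):
    print(n, approximate_average(n))
```

Because the right endpoint is the latest possible time in each interval, every approximation overestimates the true average, and the printed values decrease toward the value of the integral as `n` grows.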

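Since $D(t) = M_0 \left(1 - e^{-kt}\right)$, it follows that $D'(t) = k M_0 e^{-kt}$. With $D(1000) = \tfrac{1}{2} M_0$, the integral we need to evaluate is then

$$\bar{t} = \frac{1}{D(1000)} \int_0^{1000} t \, D'(t) \, dt = \frac{2}{M_0} \int_0^{1000} t \, k M_0 e^{-kt} \, dt = 2k \int_0^{1000} t e^{-kt} \, dt,$$

which can be evaluated by integration by parts. To do this, we will set $u = t$ and $dv = e^{-kt} \, dt$. Then we have $du = dt$ and $v = -\frac{1}{k} e^{-kt}$, so that

$$\int_0^{1000} t e^{-kt} \, dt = \left[ -\frac{t}{k} e^{-kt} \right]_0^{1000} + \frac{1}{k} \int_0^{1000} e^{-kt} \, dt = -\frac{1000}{k} e^{-1000k} - \frac{1}{k^2} \left( e^{-1000k} - 1 \right) = -\frac{500}{k} + \frac{1}{2k^2}.$$

In this computation, we have used the fact that $e^{-1000k} = \frac{1}{2}$. This leads to

$$\bar{t} = 2k \left( -\frac{500}{k} + \frac{1}{2k^2} \right) = \frac{1}{k} - 1000 = 1000 \left( \frac{1}{\ln 2} - 1 \right) \approx 443 \text{ years}.$$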
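One way to sanity-check an integration by parts is to differentiate the candidate antiderivative numerically and compare it with the integrand. The sketch below assumes the choice $u = t$, $dv = e^{-kt}\,dt$, which gives the antiderivative $-\frac{t}{k} e^{-kt} - \frac{1}{k^2} e^{-kt}$. The function names and the step size `h` are illustrative choices.

```python
import math

k = math.log(2) / 1000.0   # decay constant for a 1000-year half-life

def integrand(t):
    return t * math.exp(-k * t)

def antiderivative(t):
    # Candidate from integration by parts with u = t, dv = exp(-k t) dt.
    return -t * math.exp(-k * t) / k - math.exp(-k * t) / k**2

def central_difference(f, t, h=0.1):
    """Symmetric finite-difference estimate of f'(t)."""
    return (f(t + h) - f(t - h)) / (2 * h)

# The derivative of the antiderivative should match the integrand.
for t in (0.0, 250.0, 500.0, 1000.0):
    slope = central_difference(antiderivative, t)
    assert math.isclose(slope, integrand(t), rel_tol=1e-6, abs_tol=1e-4)

integral = antiderivative(1000.0) - antiderivative(0.0)
average = 2 * k * integral   # dividing by D(1000) = M0 / 2; M0 cancels

print(average)
```

The printed average should be close to $1000\left(\frac{1}{\ln 2} - 1\right)$, that is, a little under 443 years.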

If we think about it, this result feels right. Remember that we only considered the mass that decays in the first 1000 years. Also, more mass decays during the early years, when there is more of the substance, than in the later years, when there is less. So it is not surprising that the result should be smaller than the halfway value of 500 years.

Probability Densities

Now that we have finished our calculation, let's look at it a little more carefully. Remember that $D(t)$ represented the total mass that decays in the first $t$ years and so $D(1000)$ is the total mass that decays in 1000 years. Then we found that

$$\bar{t} = \int_0^{1000} t \, \frac{D'(t)}{D(1000)} \, dt.$$

We will define a new function $p(t) = \frac{D'(t)}{D(1000)}$ and call it the probability density for the process. In fact, it is very similar to the analogous quantity we defined for the students' scores on the quiz above. In particular, it has the following properties.

Properties of the Probability Density Function:

- $\int_a^b p(t) \, dt$ is the fraction of mass which decays in the time interval $[a, b]$.

This is because, with $p(t) = \frac{D'(t)}{D(1000)}$ and $D(t)$ the mass which has decayed in the first $t$ years,

$$\int_a^b p(t) \, dt = \frac{1}{D(1000)} \int_a^b D'(t) \, dt = \frac{D(b) - D(a)}{D(1000)}.$$

This is the fraction of mass which decays during this time interval. In fact, this is why the function is called the probability density: given a little interval of width $\Delta t$ around time $t$, the chance that a typical particle decays during that time interval is approximately $p(t) \, \Delta t$.

- $\int_0^{1000} p(t) \, dt = 1$.

We sometimes say that the probability density is normalized to have a total integral (or total probability) equal to $1$. Using our first observation, this is just saying that the probability is $1$ (which means that it must happen) that a particle which decays during the first 1000 years actually decays during the first 1000 years.

We call the function $P(t) = \int_0^t p(\tau) \, d\tau = \frac{D(t)}{D(1000)}$ the cumulative distribution. It measures what fraction of the mass decays during the first $t$ years.

The demonstration below shows the relationship between the probability density function and the cumulative distribution.
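As a rough numerical companion to these properties (separate from the interactive demonstration mentioned above), the following Python sketch checks them for the decay example. It assumes the explicit forms $p(t) = 2k e^{-kt}$ and $P(t) = 2\left(1 - e^{-kt}\right)$ on $[0, 1000]$, which follow from $D(t) = M_0 \left(1 - e^{-kt}\right)$. The function names and the interval $[200, 600]$ are illustrative.

```python
import math

k = math.log(2) / 1000.0      # decay constant for a 1000-year half-life

def density(t):
    """p(t) = D'(t) / D(1000) = 2 k exp(-k t) on [0, 1000]."""
    return 2.0 * k * math.exp(-k * t)

def cumulative(t):
    """P(t) = D(t) / D(1000) = 2 (1 - exp(-k t))."""
    return 2.0 * (1.0 - math.exp(-k * t))

def midpoint_integral(f, a, b, n=10000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

total_probability = midpoint_integral(density, 0.0, 1000.0)
fraction_200_to_600 = midpoint_integral(density, 200.0, 600.0)

print(total_probability)   # close to 1
print(fraction_200_to_600, cumulative(600.0) - cumulative(200.0))
```

The second line of output compares the integral of the density over an interval with the difference of the cumulative distribution at its endpoints; the two numbers should agree to many decimal places.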

What can we learn from the probability density

Let's consider what kind of information is available from the probability density.

Examples

We have already used the probability density to find the average time of decay. In fact, we can do a bit more.

- Suppose we want to know the probability that a typical particle decays in the first 500 years. The information is available to us as

$$P(500) = \int_0^{500} 2k e^{-kt} \, dt = 2 \left( 1 - e^{-500k} \right) = 2 \left( 1 - \frac{1}{\sqrt{2}} \right) \approx 0.586.$$

In other words, there is a 58.6% chance that a typical particle decays in the first 500 years. This makes sense since we know that the mass is decaying more rapidly at the beginning.

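- The median time for decay is the time $t_m$ for which the cumulative distribution satisfies $P(t_m) = \frac{1}{2}$. This means that a typical particle is just as likely to decay before this time as it is to decay after this time. In other words, half the particles decay before this time and half decay after.

We can find the median time by setting $2 \left( 1 - e^{-k t_m} \right) = \frac{1}{2}$. This means that

$$e^{-k t_m} = \frac{3}{4}, \qquad \text{so} \qquad t_m = \frac{1}{k} \ln \frac{4}{3} = 1000 \, \frac{\ln(4/3)}{\ln 2} \approx 415 \text{ years}.$$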
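Both examples can be reproduced numerically. The sketch below assumes the cumulative distribution $P(t) = 2\left(1 - e^{-kt}\right)$ with $k = \ln 2 / 1000$ and finds the median by bisection, which is one simple root-finding choice among several. The helper `solve_increasing` is a hypothetical name introduced for this illustration.

```python
import math

k = math.log(2) / 1000.0   # decay constant for a 1000-year half-life

def cumulative(t):
    """P(t) = 2 (1 - exp(-k t)): fraction decayed in the first t years."""
    return 2.0 * (1.0 - math.exp(-k * t))

def solve_increasing(f, target, lo, hi, tol=1e-9):
    """Bisection: find t in [lo, hi] with f(t) = target, for increasing f."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

prob_first_500 = cumulative(500.0)
median = solve_increasing(cumulative, 0.5, 0.0, 1000.0)
median_closed_form = math.log(4.0 / 3.0) / k

print(round(prob_first_500, 3))                         # 0.586
print(round(median, 1), round(median_closed_form, 1))   # 415.0 415.0
```

Bisection needs only that the cumulative distribution is increasing, so the same helper would work for a distribution whose median has no closed form.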

Below is a graphical interpretation of the median. The two shaded areas are equal at the median value.

It is interesting to ask why the median is different from the average value. Basically, the median does not detect information about the shape of the distribution other than when the two areas are balanced. The average, for this distribution, will be pulled up to a value higher than the median because there are some particles decaying for relatively long times.