## Harold Jeffreys on Probability

Between 1919 and 1923 Harold Jeffreys and Dorothy Wrinch wrote three papers on probability and scientific inference. These are:

- D Wrinch and H Jeffreys, On Some Aspects of the Theory of Probability, *Philosophical Magazine* 38 (1919), 715-731.
- D Wrinch and H Jeffreys, On Certain Fundamental Principles of Scientific Inquiry, *Philosophical Magazine* 42 (1921), 369-390.
- D Wrinch and H Jeffreys, On Certain Fundamental Principles of Scientific Inquiry, *Philosophical Magazine* 45 (1923), 368-374.

In their views of probability they were influenced by William Ernest Johnson and John Maynard Keynes. Dorothy Wrinch had, in fact, attended lectures by Johnson. Jeffreys used the ideas from these papers in his book *Scientific Inference*, published by Cambridge University Press in 1931. We give below an extract from Jeffreys' book on Probability.

## Probability

by

Harold Jeffreys

### 1. What is probability?

Suppose that a man wishes to catch a train announced to start at 1.00 p.m. When he is a quarter of a mile from the station he looks back and sees that a church clock some distance away indicates 12.55. Will he catch the train?

From previous experience he knows that a quarter of a mile in five minutes means comfortable walking without wasting time. The distance, with slight exertion, can be done in four minutes. Hence he may reasonably expect to catch the train, especially if he hurries slightly. But he has to get a ticket before he will be admitted to the platform. If he finds nobody waiting at the booking office this is a matter of ten seconds; but if there is a queue of ten people it will take two minutes, and he has no means of knowing which will occur in this case. Again, though the church clock is usually reliable, it has been known on a few occasions to be as much as three minutes slow. If that is so on this occasion, and the train is punctual, his chance of catching the train disappears. On the other hand, if the train is a few minutes late, as sometimes happens, he will catch it even if there is a queue and the clock is slow. Further, there is always the possibility of something quite unforeseen, such as an accident on the line. In that event the 11.14 train may arrive at 1.30 and his problem will be solved.

Now we notice that in this situation the man has some definite information, which is relevant to the proposition "he will catch the train". But numerous other possibilities, none of which he can foresee, are also intensely relevant. Therefore his available knowledge, though relevant to the proposition at issue, is not such as to make it possible to assert definitely that this proposition is true or false. Further, extra data will have a definite effect on his attitude to the proposition. If he meets an astronomer whose watch has just been compared with a wireless time signal, and who assures him that the church clock is accurate, he feels more confident. On the other hand, if a crowded omnibus passes him he expects his worst fears about the queue to be verified. Thus the attitude to the proposition under discussion does not amount to a definite assertion of its truth or falsehood; it is an impression capable of being modified at any time by the acquisition of new knowledge.

Probability expresses a relation between a proposition and a set of data. When the data imply that the proposition is true, the probability is said to amount to certainty; when they imply that it is false, the probability becomes impossibility. All intermediate degrees of probability can arise.

The relation of the laws of science to the data of observation is one of probability. The more facts are in agreement with the inferences from a law, the higher the probability of the law becomes; but a single fact not in agreement may reduce a law, previously practically certain, to the status of an impossible one. A specimen of a practically certain law is Ohm's law for solid conductors. Newton's inverse square law of gravitation first became probable when it was shown to give the correct ratio of gravity at the earth's surface to the acceleration of the moon in its orbit. Its probability increased as it was shown to fit the motions of the planets, satellites, and comets, and those of double stars, with an astonishing degree of accuracy. Leverrier's discovery of the excess motion of the perihelion of Mercury scarcely changed this situation, for the phenomenon was qualitatively explicable by the attraction of the visible matter within Mercury's orbit. Newton's law was first shown to be wrong, as a universal proposition, when it was found that such matter could not actually be present in sufficient quantity to account for the anomalous motion of Mercury.

The fundamental notion of probability is intelligible a priori to everybody, and is regularly used in everyday life. Whenever a man says "I think so" or "I think not" or "I am nearly sure of that" he is speaking in terms of this concept; but an addition has crept in. If three persons are presented with the same set of facts, one may assert that he is nearly certain of a result, another that he believes it probable, while the third will express no opinion at all. This might suggest that probability is a matter of differences between individuals. But an analogous situation arises with regard to purely logical inference. One person, reading the proof of Euclid's fifth proposition, is completely convinced; another is entirely unable to grasp it; while there is, at any rate, one case on record when a student said that the author had rendered the result highly probable. Nobody says on this account that logical demonstration is a matter for personal opinion. We say that the proposition is either proved or not proved, and that such differences of opinion are the result of not understanding the proof, either through inherent incapacity or through not having taken the necessary trouble. The logical demonstration is right or wrong as a matter of the logic itself, and is not a matter for personal judgment. We say the same about probability. On a given set of data p we say that a proposition q has in relation to these data one and only one probability. If any person assigns a different probability, he is simply wrong, and for the same reasons as we assign in the case of logical judgments. Personal differences in assigning probabilities in everyday life are not due to any ambiguity in the notion of probability itself, but to mental differences between individuals, to differences in the data available to them, and to differences in the amount of care taken to evaluate the probability.

### 2. Principles of probability

The mathematical discussion of probability depends on the principle that probabilities can be expressed by means of numbers. This depends in turn on two deeper postulates:

1. *The relations greater than and less than are transitive; that is, if one probability is greater than a second, and the second greater than a third, then the first probability is greater than the third.* If one probability is greater than a second, the second is said to be less than the first; and if neither of two probabilities is greater than the other we say that they are equal. This postulate ensures the existence of a definite order among probabilities, such that each probability follows all smaller ones and precedes all greater ones. If we have two sets of data p and p', and two propositions q and q', and we consider the probabilities of q given p, and of q' given p', then whatever p, p', q, q' may be, the probability of q given p is either greater than, equal to, or less than that of q' given p'.

2. *All propositions impossible on the data have the same probability, which is not greater than any other probability; and all propositions certain on the data have the same probability, which is not less than any other probability.*

Such an order once established, we can construct a correspondence between probabilities and real numbers, so that to every probability corresponds one and only one number, and so that of every pair of probabilities the less corresponds to the smaller number. When this is done the system of numbers can be used as a scale of reference for probabilities. But the choice is not yet unique. Obviously if x_1, x_2, ..., x_n are a set of positive numbers in increasing order of magnitude, then x_1^2, x_2^2, ..., x_n^2 are another set, exp(x_1), exp(x_2), ..., exp(x_n) a third, x_1/(1+x_1), x_2/(1+x_2), ..., x_n/(1+x_n) a fourth, and any number of such sets can be found, such that if probabilities correspond term by term with the numbers of one set in order of magnitude they will correspond equally well with those of any other set. We need a further rule before we can decide what number to attach to any given probability. Such a rule is a mere method of working, or convention; it expresses no new assumption. We decide that:

3. *If several propositions are mutually contradictory on the data, the number attached to the probability that some one of them is true shall be the sum of those attached to the probabilities that each separately is true.*

If we do this it follows at once that 0 is the number to be attached to a proposition impossible on the data. For consider any three mutually exclusive propositions p, q, r, and suppose we have the further datum that p is true. The number attached to a proposition impossible on the data being a, it follows that the numbers attached to q and r separately on the data are both a. Hence, by our rule, since q and r are mutually exclusive, the number attached to the proposition that one of them is true is 2a. But the proposition "q or r is true" is itself impossible on the data and therefore has the number a attached to it. Hence 2a = a, and therefore a = 0.

Again, let us consider any set of m equally probable and mutually contradictory propositions, and call the number attached to any one of them, on the same data, x. If we select any t of them, the number attached to the proposition that one of these t is true is tx, by our rule.

Now take t = m, and suppose that on our data there is just one true proposition among them, but that we have no means of knowing which it is. The number attached to the proposition that one of the m propositions is true is mx. But on our data this proposition is certain, and therefore mx is the number corresponding to certainty, which is a definite constant by Prop. 2. We therefore choose 1 as the constant to be attached to certainty. This is another convention. Thus mx = 1, and we derive the rule:

4. *If m propositions are equally probable on the data and mutually contradictory, and one of them is known to be true, each has the number 1/m associated with it. Further, the proposition that one out of any t of them is true has the number t/m associated with it.*

The conditions for the application of this method are practically realizable. Suppose that m balls, one of them with a characteristic mark on it, but indistinguishable by touch, were placed in a bag and shaken. t balls are then withdrawn. Then the proposition that any particular ball is the marked one is inconsistent with the proposition that any other is marked, and all such propositions are equally probable. We have therefore a set of equally probable and mutually exclusive propositions, m in number. Our rule therefore has a practical application. Also m may be any integer, and t may be any integer less than m or equal to it. Hence

5. *Any rational proper fraction, including 0 and 1, can be a probability number.*

We shall call the class of probabilities expressible by rational fractions R-probabilities.

It follows from this that any probability can be made to correspond to a real number, rational or irrational. For any given probability P either corresponds to a rational fraction or does not. In the former case the proposition is granted. In the latter case every R-probability is either greater or less than P. Hence P divides the R-probabilities into two classes R_1 and R_2, such that the probabilities in R_1 are all less than P and those in R_2 are all greater than P. Also, since the relation "greater than" among probabilities is transitive, every fraction corresponding to an R_2-probability is greater than every fraction corresponding to an R_1-probability. Hence P determines a cut in the series of rational fractions. But this is precisely the method of defining a real irrational number. When it is specified which rational fractions are on one side of the cut and which on the other side, there is one and only one real number that can occupy the cut. We then associate the probability P with this number. In this way we arrive at the result:

6. *Every probability can be associated with a real number, rational or irrational.*

We still have to prove that the results given by our rules are consistent; that is, if a probability P is greater than another probability Q, that the number associated with P by our rules is greater than that associated with Q. Suppose first that P and Q are both R-probabilities. Then we can find four integers t, m, r, s so that the number associated with P is t/m and that associated with Q is r/s. Now consider a class of ms mutually exclusive propositions containing one true one. We may divide them up into m sets of s each; one and only one of these sets contains the true proposition. The probability-number that one of t of these sets contains the true proposition is t/m. But this is also the probability-number that one of ts propositions selected from the original ms propositions shall be the true one, which by our rule is ts/ms and equal to t/m, as it should be. Thus t/m is the number associated with the proposition that one out of the ts alternatives is true; similarly r/s is associated with the proposition that one out of rm alternatives is true. If then P is greater than Q, the number of alternatives needed to give probability P must exceed that needed to give probability Q; therefore ts is greater than rm. But this is equivalent to saying that t/m is greater than r/s; and therefore the greater probability is associated with the greater number.

Consistency is therefore proved for R-probabilities. For others the result is easily generalized. For if two non-rational probabilities are associated with real numbers a and b, of which a is the greater, we can find a rational fraction t/m lying between them. Then the probability associated with a is greater than that associated with t/m, and that associated with t/m is greater than that associated with b. Hence, in virtue of the transitive property of the relation "more probable than", the probability associated with a is greater than that associated with b. In other words, the greater number corresponds to the greater probability.

We have seen how definite numbers can be associated with probabilities, so that the higher number always corresponds to the higher probability. In consequence of our fundamental assumption our rules always imply the existence of a definite probability-number. The rules, as we stated before, are conventions and not hypotheses; for if the probability-number assigned by our rules is x, any function of x that always increases with x would satisfy the fundamental assumption. But the choice that we have made seems to be far the most convenient. Henceforth we shall have no need to speak of probabilities apart from their associated numbers, and when we speak of the probability of a proposition on given data we shall mean the number associated with the probability by our rules.
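The marked-ball model and the remark that the probability scale is only a convention can be sketched numerically. This is a modern illustration, not part of Jeffreys' text; the values m = 10 and t = 4 are arbitrary choices made for the example.

```python
from fractions import Fraction
import math

# Marked-ball model (a sketch of Jeffreys' argument, not his own code):
# m balls, exactly one marked. The m propositions "ball i is the marked
# one" are mutually exclusive and, by symmetry, equally probable, so each
# gets the same number x, and certainty gets the number 1.
m = 10                      # arbitrary number of balls
x = Fraction(1, m)          # rule 4: each alternative gets 1/m

# Rule 3 (additivity): the number for "one of these t balls is marked"
# is the sum of the separate numbers, i.e. t * (1/m) = t/m.
t = 4                       # arbitrary t <= m
p_some_of_t = t * x
assert p_some_of_t == Fraction(t, m)

# Certainty: the disjunction over all m alternatives gets the number 1.
assert m * x == 1

# The scale is a convention: any strictly increasing function of the
# numbers preserves the order of probabilities, as with the sets
# x^2, exp(x) and x/(1+x) mentioned in the text.
values = [k / m for k in range(1, m + 1)]      # increasing numbers
for rescale in (lambda v: v * v, math.exp, lambda v: v / (1 + v)):
    rescaled = [rescale(v) for v in values]
    assert rescaled == sorted(rescaled)        # order is unchanged
```

The assertions merely restate the propositions for one concrete m and t; the point of the last loop is that each rescaled set would serve equally well as a scale, which is why a further convention is needed to fix t/m as *the* probability-number.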

JOC/EFR August 2007

The URL of this page is:

http://www-history.mcs.st-andrews.ac.uk/Extras/Jeffreys_Probability.html