# The Problem of Induction

*First published Wed Nov 15, 2006; substantive revision Tue Mar 2, 2010*

Until about the middle of the previous century induction was treated
as a quite specific method of inference: inference of a universal
affirmative proposition (All swans are white) from its instances
(*a* is a white swan, *b* is a white swan, etc.). The
method had also a probabilistic form, in which the conclusion stated a
probabilistic connection between the properties in question. It is no
longer possible to think of induction in such a restricted way; much
synthetic or contingent inference is now taken to be inductive; some
authorities go so far as to count all contingent inference as
inductive. One powerful force driving this lexical shift was certainly
the erosion of the intimate classical relation between logical truth
and logical form; propositions had classically been categorized as
universal or particular, negative or affirmative; and modern logic
renders those distinctions unimportant. (The paradox of the ravens
makes this evident.) The distinction between logic and mathematics
also waned in the twentieth century, and this, along with the simple
axiomatization of probability by Kolmogorov in 1933 (Kolmogorov, FTP),
blended probabilistic and inductive methods, blurring in the process
structural differences among inferences.

As induction expanded and became more amorphous, the problem of induction was transformed too. The classical problem, if apparently insoluble, was simply stated, but the contemporary problem of induction has no such crisp formulation. The approach taken here is to provide brief expositions of several distinctive accounts of induction. This survey is not comprehensive, and there are other ways to look at the problem, but the untutored reader may gain at least a map of the terrain.

- 1. What is the Problem?
- 2. Hume, induction and justification
- 3. Verification, Confirmation, and the Paradoxes of Induction
- 4. Induction, Causality, and Laws of Nature
- 5. Probability and Induction
- 6. Induction, Values, and Evaluation
- 7. Induction, deduction and rationality
- 8. Justification and Support of Induction
- Bibliography
- Other Internet Resources
- Related Entries

## 1. What is the Problem?

The Oxford English Dictionary defines “induction”, in the sense relevant here, as follows:

7. *Logic* a. The process of inferring a general law or principle from the observation of particular instances (opposed to DEDUCTION, q.v.).

That induction is opposed to deduction is not quite right, and the
rest of the definition is outdated and too narrow: much of what
contemporary epistemology, logic, and the philosophy of science count
as induction infers neither from observation nor from particulars and
does not lead to general laws or principles. This is not to denigrate
the leading authority on English vocabulary—until the middle of
the previous century induction was understood to be what we now know
as *enumerative induction* or *universal inference*;
inference from particular instances:

a_{1}, a_{2}, …, a_{n} are all *F*s that are also *G*,

to a general law or principle

All *F*s are *G*.

A weaker form of enumerative induction, singular predictive inference, leads not to a generalization but to a singular prediction:

1. a_{1}, a_{2}, …, a_{n} are all *F*s that are also *G*.
2. a_{n+1} is also *F*.

Therefore,

3. a_{n+1} is also *G*.

Singular predictive inference also has a more general probabilistic form:

1. The proportion *p* of observed *F*s have also been *G*s.
2. *a*, not yet observed, is an *F*.

Therefore,

3. It is probable that *a* is a *G*.

The problem of induction was, until recently, taken to be to justify these forms of inference; to show that the truth of the premises supported, if it did not entail, the truth of the conclusion. The evolution and generalization of this question—the traditional problem has become a special case—is discussed in some detail below. Section 3, in particular, points out some essential difficulties in the traditional view of enumerative induction.
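The three forms of inference described above can be put side by side in a short Python sketch. It is an illustration only: the function names are hypothetical, the evidence is assumed to be a list of booleans recording whether each observed *F* was also *G*, and reading the observed proportion *p* as the probability of the next case is an assumption of the sketch, which is exactly the step whose justification is at issue.

```python
from fractions import Fraction
from typing import Optional, Sequence


def universal_inference(observed_is_g: Sequence[bool]) -> Optional[bool]:
    """From 'a_1, ..., a_n are all Fs that are also G' project 'All Fs are G'.

    Returns True when the generalization is projected, False when a
    counterinstance has been observed, and None when there is no evidence.
    """
    if not observed_is_g:
        return None
    return all(observed_is_g)


def singular_prediction(observed_is_g: Sequence[bool]) -> Optional[bool]:
    """Predict that the next F, a_{n+1}, is G, without asserting a general law.

    The simple form is silent (None) unless every observed F was G.
    """
    if not observed_is_g or not all(observed_is_g):
        return None
    return True


def proportional_prediction(observed_is_g: Sequence[bool]) -> Optional[Fraction]:
    """Probabilistic form: report the observed proportion p of Fs that were G."""
    if not observed_is_g:
        return None
    return Fraction(sum(observed_is_g), len(observed_is_g))


# Hypothetical evidence: 99 observed swans were white, one was not.
swans = [True] * 99 + [False]
print(universal_inference(swans))      # the generalization is refuted
print(singular_prediction(swans))      # the simple predictive form is silent
print(proportional_prediction(swans))  # the observed proportion p
```

Nothing in the code shows that any of these outputs is true of unobserved cases; the functions merely restate the evidence in the shape of a conclusion.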

### 1.1 Mathematical induction

As concerns the parenthetical opposition between induction and deduction, the classical way to characterize valid deductive inference is as follows: a set of premises deductively entails a conclusion if no way of interpreting the non-logical signs, holding constant the meanings of the logical signs, can make the premises true and the conclusion false. For present purposes the logical signs always include the truth-functional connectives (and, not, etc.), the quantifiers (all, some), and the sign of identity (=). Enumerative induction and singular predictive inference are clearly not valid deductive methods when deduction is understood in this way. (A few revealing counterexamples are to be found in section 3.2 below.)
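The definition of validity just given suggests a simple way to exhibit the invalidity of enumerative induction: search for an interpretation that makes the premises true and the conclusion false. The following Python sketch does this by brute force over a small finite domain. The function name, the domain size, and the encoding of *F* and *G* as tuples of truth values are assumptions made for illustration. A search of this kind can refute validity by finding a countermodel, but it cannot establish validity, since that requires quantifying over all interpretations.

```python
from itertools import product


def find_countermodel(domain_size=3, n_names=2):
    """Search for an interpretation in which every named individual is both
    F and G (the premises) while 'All Fs are G' (the conclusion) is false."""
    domain = range(domain_size)
    # Every possible extension of a one-place predicate over the domain.
    extensions = list(product([False, True], repeat=domain_size))
    for F, G in product(extensions, repeat=2):
        # Every way of assigning denotations to the names a_1, ..., a_n.
        for names in product(domain, repeat=n_names):
            premises = all(F[d] and G[d] for d in names)
            conclusion = all(G[x] for x in domain if F[x])
            if premises and not conclusion:
                return {"F": F, "G": G, "names": names}
    return None


model = find_countermodel()
print(model is not None)  # a countermodel exists on a three-element domain
```

When the names exhaust the domain, as in `find_countermodel(domain_size=1, n_names=1)`, no countermodel exists: the premises then already cover every case, and the inference is no longer ampliative.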

Regarded in this way, mathematical induction is a deductive method,
and is in this opposed to induction in the sense at issue here.
*Mathematical induction* is the following inferential rule
(*F* is any numerical property):

Premises:

- 0 has the property *F*.
- For every number *n*, if *n* has the property *F* then *n*+1 has the property *F*.

Conclusion:

- Every number has the property *F*.

When the logical signs are expanded to include the basic vocabulary
of arithmetic (__ __ is a number, +, ×, ′,
0) mathematical induction is seen to be a deductively valid method:
any interpretation in which these signs have their standard
arithmetical meaning is one in which the truth of the premises assures
the truth of the conclusion. Mathematical induction, we might say, is
*deductively valid in arithmetic*, if not in pure logic.
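The rule can be stated and checked in a proof assistant. The following Lean 4 sketch is offered only as an illustration: the names `math_induction`, `base`, and `step` are hypothetical, and the proof relies on the induction principle built into Lean's natural numbers, which fits the observation that the rule is deductively valid in arithmetic rather than in pure logic.

```lean
-- Mathematical induction for an arbitrary numerical property F.
theorem math_induction (F : Nat → Prop)
    (base : F 0)                       -- 0 has the property F
    (step : ∀ n, F n → F (n + 1)) :    -- if n has F then n + 1 has F
    ∀ n, F n := by                     -- every number has the property F
  intro n
  induction n with
  | zero => exact base
  | succ k ih => exact step k ih
```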

Mathematical induction should thus be distinguished from induction in the sense of present concern. Mathematical induction will concern us no further beyond a brief terminological remark: the kinship with non-mathematical induction and its problems is fostered by the particular-to-general clause in the common definition. (See section 5.4 of the entry on Frege's logic, theorem, and foundations for arithmetic, for a more complete discussion and justification of mathematical induction.)

### 1.2 The contemporary notion of induction

A few simple counterexamples to the OED definition may suggest the increased breadth of the contemporary notion:

- There are (good) inductions with general premises and particular conclusions:

  All observed emeralds have been green.

  Therefore, the next emerald to be observed will be green.

- There are valid deductions with particular premises and general conclusions:

  New York is east of the Mississippi.

  Delaware is east of the Mississippi.

  Therefore, everything that is either New York or Delaware is east of the Mississippi.

Further, on at least one serious view, due in differing variations to
Mill and Carnap, induction has not to do with generality at all; its
primary form is the *singular predictive inference*—the
second form of enumerative induction mentioned above—which leads
from particular premises to a particular conclusion. The inference to
generality is a dispensable middle step.

Although inductive inference is not easily characterized, we do have a
clear mark of induction. Inductive inferences are contingent;
deductive inferences are necessary. (But see the entry Formal Learning
Theory where this distinction is elaborated.) Deductive inference can
never support contingent judgments such as meteorological forecasts,
nor can deduction alone explain the breakdown of one's car, discover
the genotype of a new virus, or reconstruct fourteenth century trade
routes. Inductive inference can do these things more or less
successfully because, in Peirce's phrase, inductions are
*ampliative*. Induction can amplify and generalize our
experience, broaden and deepen our empirical knowledge. Deduction on
the other hand is *explicative*. Deduction orders and
rearranges our knowledge without adding to its content.

Of course, the contingent power of induction brings with it the risk of error. Even the best inductive methods applied to all available evidence may get it wrong; good inductions may lead from true premises to false conclusions. (A competent but erroneous diagnosis of a rare disease, a sound but false forecast of summer sunshine in the desert.) An appreciation of this principle is a signal feature of the shift from the traditional to the contemporary problem of induction. (See sections 3.2 and 3.3 below.)

How to tell good inductions from bad inductions? That question is a simple formulation of the problem of induction. In its general form it clearly has no substantive answer, but its instances can yield modest and useful questions. Some of these questions, and proposed answers to them, are surveyed in what follows.

Some authorities (Carnap, in the opening paragraph of Carnap 1952, is an example) take inductive inference to include all non-deductive inference. That may be a bit too inclusive; perception and memory are clearly ampliative but their exercise seems not to be congruent with what we know of induction, and the present article is not concerned with them. (See the entries on epistemological problems of perception and epistemological problems of memory.)

Testimony is another matter. Although testimony is not a form of induction, induction would be all but paralyzed were it not nourished by testimony. Scientific inductions depend upon data transmitted and supported by testimony and even our everyday inductive inferences typically rest upon premises that come to us indirectly. (See the remarks on testimony in section 8.4.3, and the entry on epistemological problems of testimony.)

### 1.3 Can induction be justified?

There is a simple argument, due in its first form to Hume (Hume THN,
I.III.VI) that induction (not Hume's word) cannot be justified. The
argument is a dilemma: Since induction is a contingent
method—even good inductions may lead from truths to
falsehoods—there can be no deductive justification for
induction. Any inductive justification of induction would, on the
other hand, be circular. Hume himself takes the edge off this argument
later in the *Treatise*. “In every judgment,” he writes,
“…we ought always to correct the first judgment, deriv'd
from the nature of the object, by another judgment, deriv'd from the
nature of the understanding” (Hume THN, 181f.).

A more general question is this: Why trust induction more than other methods of fixing belief? Why not consult sacred writings, the pronouncements of authorities or “the wisdom of crowds” to explain the movements of the planets, the weather, automotive breakdowns or the evolution of species? We return to these and related questions in section 8.3.

## 2. Hume, induction and justification

The source for the problem of induction as we know it is Hume's brief
argument in Book I, Part III, section VI of
the *Treatise*, (Hume THN). The great historical importance of
this argument, not to speak of its intrinsic power, recommends that
reflection on the problem begin with a rehearsal of it. The brief
summary in sections 10 and 11 of the entry on
Hume
provides what is needed, and those who are not familiar with the
argument are well advised to read them in conjunction with the present
section. It will also be helpful in understanding the deceptively
simple argument to have some idea of Hume's project in
the *Treatise*. For this, section 4 of that entry is most
useful. Indeed, the first twelve sections of the article serve as a
brief and comprehensive introduction to Hume's theory of
knowledge. Reference to this article permits an abbreviated account
here of his classic argument.

First, two notes about vocabulary. The term ‘induction’
does not appear in Hume's argument, nor anywhere in
the *Treatise* or the first *Inquiry*, for that matter.
Hume's concern is with inferences concerning causal connections,
which, on his account, are the only connections “which can lead
us beyond the immediate impressions of our memory and senses”
(Hume THN, 89). But the difference between such inferences and what
we know today as induction is largely a matter of
terminology. Secondly, Hume divides all reasoning into demonstrative,
by which he means deductive, and probabilistic, by which he means the
generalization of causal reasoning. In what follows we paraphrase and
interpolate freely so as to ease the application of the argument in
contemporary contexts.

It should also be remarked that Hume's argument applies just to enumerative induction, and primarily to singular predictive inference, but, again, its generalization to other forms of inductive reasoning is straightforward.

The argument should be seen against the background of Hume's
project as he announces it in the introduction to
the *Treatise*: This project is the development of the
empirical science of human nature. The epistemological sector of this
science involves describing the operations of the mind, the
interactions of impressions and ideas and the function of the
liveliness that constitutes belief. But this cannot be a merely
descriptive endeavor; accurate description of these operations entails
also a considerable normative component, for, as Hume puts it, “[o]ur
reason [to be taken here quite generally, to include the imagination]
must be consider'd as a kind of cause, of which truth is the natural
effect; but such-a-one as by the irruption of other causes, and by the
inconstancy of our mental powers, may frequently be prevented” (Hume
THN, 180). The account must thus not merely describe what goes on in
the mind, it must also do this in such a way as to show that and how
these mental activities lead naturally, if with frequent exceptions,
to true belief. (See Loeb 2006 for further discussion of these questions.)

Now as concerns the argument, its conclusion is that in induction
(causal inference) experience does not produce the idea of an effect
from an impression of its cause by means of the understanding or
reason, but by the imagination, by “a certain association and
relation of perceptions.” The center of the argument is a
dilemma: If inductive conclusions were produced by the understanding,
inductive reasoning would be based upon the premise that nature is
uniform; “*that instances of which we have had no experience,
must resemble those of which we have had experience, and that the
course of nature continues always uniformly the same.*”
(Hume THN, 89) And were this premise to be established by reasoning,
that reasoning would be either deductive or probabilistic
(i.e. causal). The principle can't be proved deductively, for whatever
can be proved deductively is a necessary truth, and the principle is
not necessary; its antecedent is consistent with the denial of its
consequent. Nor can the principle be proved by causal reasoning, for
it is presupposed by all such reasoning and any such proof would be
a *petitio principii*.

The normative component of Hume's project is striking here: That the principle of uniformity of nature cannot be proved deductively or inductively shows that it is not the principle that drives our causal reasoning only if our causal reasoning is sound and leads to true conclusions as a “natural effect” of belief in true premises. This is what licenses the capsule description of the argument as showing that induction cannot be justified or licensed either deductively or inductively; not deductively because (non-trivial) inductions do not express logically necessary connections, not inductively because that would be circular. If, however, causal reasoning were fallacious, the principle of the uniformity of nature might well be among its principles.

The negative argument is an essential first step in Hume's general account of induction. It rules out accounts of induction that view it as the work of reason. Hume's positive account begins from a constructive dilemma: Inductive inference must be the work either of reason or of imagination. Since the negative argument shows that it cannot be a species of reasoning, it must be imaginative.

Hume's positive account of causal inference can be simply described: It amounts to embedding the singular form of enumerative induction in the nature of human, and at least some bestial, thought. The several definitions offered in (Hume EHU, 60) make this explicit:

[W]e may define a cause to be *an object, followed by another, and where all objects similar to the first are followed by objects similar to the second*. Or, in other words, *where, if the first object had not been, the second never had existed*.

Another definition defines a cause to be:

an object followed by another, and whose appearance always conveys the thought to that other.

If we have observed many *F*s to be followed by *G*s, and
no contrary instances, then observing a new *F* will lead us to
anticipate that it will also be a *G*. That is causal
inference.

It is clear, says Hume, that we do make inductive, or, in his terms,
causal, inferences; that having observed many *F*s to be
*G*s, observation of a new instance of an *F* leads us
to believe that the newly observed *F* is also a *G*. It
is equally clear that the epistemic force of this inference, what Hume
calls the *necessary connection* between the premises and the
conclusion, does not reside in the premises alone:

All observed *F*s have also been *G*s,

and

*a* is an *F*,

do not imply

*a* is a *G*.

It is false that “instances of which we have had no experience must resemble those of which we have had experience” (Hume EHU, 89).

Hume's view is that the experience of constant conjunction fosters a “habit of the mind” that leads us to anticipate the conclusion on the occasion of a new instance of the second premise. The force of induction, the force that drives the inference, is thus not an objective feature of the world, but a subjective power; the mind's capacity to form inductive habits. The objectivity of causality, the objective support of inductive inference, is thus an illusion, an instance of what Hume calls the mind's “great propensity to spread itself on external objects” (Hume THN, 167).

It is important to distinguish in Hume's account causal inference from causal belief: Causal inference does not require that the agent have the concept of cause; animals may make causal inferences (Hume THN, 176–179; Hume EHU, 104–108) which occur when past experience of constant conjunction leads to the anticipation of the subsequent conjunct upon experience of the precedent. Causal beliefs, on the other hand, beliefs of the form

*A* causes *B*,

may be formed when one reflects upon causal inferences as, presumably, animals cannot (Hume THN, 78).

Hume's account raises the problem of induction in an acute form: One would like to say that good and reliable inductions are those that follow the lines of causal necessity; that when

All observed *F*s have also been *G*s,

is the manifestation in experience of a causal connection between *F*
and *G*, then the inference

All observed *F*s have also been *G*s,

*a* is an *F*,

Therefore, *a*, not yet observed, is also a *G*,

is a good induction. But if causality is not an objective feature of the world this is not an option. The Humean problem of induction is then the problem of distinguishing good from bad inductive habits in the absence of any corresponding objective distinction.

Two sides or facets of the problem of induction should be
distinguished: The *epistemological* problem is to find a
method for distinguishing good or reliable inductive habits from bad
or unreliable habits. The second and deeper problem is
*metaphysical*. This is the problem of saying what the
difference is between reliable and unreliable inductions. This is the
problem that Whitehead called “the despair of philosophy”
(Whitehead 1948, 35). The distinction can be illustrated in the
parallel case of arithmetic. The by now classic incompleteness results
of the last century show that the epistemological problem for
first-order arithmetic is insoluble; that there can be no method, in a
quite clear sense of that term, for distinguishing the truths from the
falsehoods of first-order arithmetic. But the metaphysical problem for
arithmetic has a clear and correct solution: the truths of first-order
arithmetic are precisely the sentences that are true in all arithmetic
models. Our understanding of the distinction between arithmetic
truths and falsehoods is just as clear as our understanding of the
simple recursive definition of truth in arithmetic, though any method
for applying the distinction must remain forever out of our
reach.

Now as concerns inductive inference, it is hardly surprising to be told that the epistemological problem is insoluble; that there can be no formula or recipe, however complex, for ruling out unreliable inductions. But Hume's arguments, if they are correct, have apparently a much more radical consequence than this: They seem to show that the metaphysical problem for induction is insoluble; that there is no objective difference between reliable and unreliable inductions. This is counterintuitive. Good inductions are supported by causal connections and we think of causality as an objective matter: The laws of nature express objective causal connections. Ramsey writes in his Humean account of the matter:

Causal laws form the system with which the speaker meets the future; they are not, therefore, subjective in the sense that if you and I enunciate different ones we are each saying something about ourselves which pass by one another like “I went to Grantchester”, “I didn't” (Ramsey 1931, 241).

A satisfactory resolution of the problem of induction would account for this objectivity in the distinction between good and bad inductions.

It might seem that Hume's argument succeeds only because he has made the criteria for a solution to the problem too strict. Enumerative induction does not realistically lead from premises

All observed *F*s have also been *G*s

*a* is an *F*,

to the simple assertion

Therefore, *a*, not yet observed, is also a *G*.

Induction is contingent inference and as such can yield a conclusion only with a certain probability. The appropriate conclusion is

It is therefore probable that *a*, not yet observed, is also a *G*.

Hume's response to this (Hume THN, 89) is to insist that probabilistic connections, no less than simple causal connections, depend upon habits of the mind and are not to be found in our experience of the world. Weakening the inferential force between premises and conclusion may divide and complicate inductive habits; it does not eliminate them. The laws of probability alone have no more empirical content than does deductive logic. If I infer from observing clouds followed by rain that today's clouds will probably be followed by rain, this can only be in virtue of an imperfect habit of associating rain with clouds. This account is treated in more detail below.

Hume is also the progenitor of one sort of theory of inductive inference which, if it does not pretend to solve the metaphysical problem, does offer an at least partial account of reliability. We consider this tradition below in section 8.1.

### 2.1 Induction and its justification

Hume's argument is often credited with raising the problem of induction in its modern form. For Hume himself the conclusion of the argument is not so much a problem as a principle of his account of induction: Inductive inference is not and could not be reasoning, either deductive or probabilistic, from premises to conclusion, so we must look elsewhere to understand it. Hume's positive account, discussed in sections 5.3 and 8.3 below, does much to alleviate the epistemological problem—how to distinguish good inductions from bad ones—without treating the metaphysical problem. His account is based on the principle that inductive inference is the work of association which forms a “habit of the mind” to anticipate the consequence, or effect, upon witnessing the premise, or cause. He provides illuminating examples of such inferential habits in sections I.III.XI and I.III.XII of (Hume THN). The latter accounts for frequency-to-probability inferences in a comprehensive way. It shows that and how inductive inference is “a kind of cause, of which truth is the natural effect.”

Although Hume is the progenitor of modern work on induction, induction presents a problem, indeed a multitude of problems, quite in its own right. The by now traditional problem is the matter of justification: How is induction to be justified? There are in fact several questions here, corresponding to different modes of justification. One very simple mode is to take Hume's dilemma as a challenge: to justify (enumerative) induction one should show that it leads to true or probable conclusions from true premises. It is safe to say that in the absence of further assumptions this problem is and should be insoluble. The realization of this dead end and the proliferation of other forms of induction have led to more specialized projects involving various strengthened premises and assumptions. The several approaches treated below exemplify this.

Hume's dilemma also sponsors a much more sweeping challenge: Neither deduction nor induction can give reason to trust induction, so what reason is there to trust it at all? Why, in particular, trust induction rather than other methods of fixing belief? Why not consult sacred writings, the pronouncements of authorities or “the wisdom of crowds” to explain and predict the movements of the planets, the weather, automotive breakdowns or the evolution of species? We return to these and related questions in section 8.4.

## 3. Verification, Confirmation, and the Paradoxes of Induction

### 3.1 Verifiability and confirmation

The verifiability criterion of meaning was essential to logical positivism (see the section on verificationism in the entry on the Vienna Circle). In its first and simplest form the criterion said just that the meaning of a synthetic statement is the method of its empirical verification. (Analytic statements were held to be logically verifiable.) The point of the principle was to class metaphysical statements as meaningless, since such statements (Kant's claim that noumenal matters are beyond experience was a favored example) could obviously not be empirically verified. This initial formulation of the criterion was soon seen to be too strong; it counted as meaningless not only metaphysical statements but also statements that are clearly empirically meaningful, such as that all copper conducts electricity and, indeed, any universally quantified statement of infinite scope, as well as statements that were at the time beyond the reach of experience for technical, and not conceptual, reasons, such as that there are mountains on the back side of the moon. These difficulties led to modification of the criterion: The latter to allow empirical verification if not in fact then at least in principle, the former to soften verification to empirical confirmation. So, that all copper conducts electricity can be confirmed, if not verified, by its observed instances. Observation of successive instances of copper that conduct electricity in the absence of counterinstances supports or confirms that all copper conducts electricity, and the meaning of “all copper conducts electricity” could thus be understood as the experimental method of this confirmation.

Empirical confirmation is inductive, and empirical confirmation by instances is a sort of enumerative induction. The problem of induction thus gains weight, at least in the context of modern empiricism, for induction now founds empirical meaning: to show that a statement is empirically meaningful we describe a good induction which, were the premises true, would confirm it. “There are mountains on the other side of the moon” is meaningful (in 1945) because space flight is possible in principle and the inference from

Space travelers observed mountains on the other side of the moon,

to

There are mountains on the other side of the moon,

is a good induction. “Copper conducts electricity” is meaningful because the inference from

Many observed instances of copper conduct and none fail to conduct,

to

All copper conducts,

is a good induction.

### 3.2 Some inductive paradoxes

That enumerative induction is a much subtler and more complex process than one might think is made apparent by the paradoxes of induction. The paradox of the ravens is a good example: By enumerative induction:

*a* is a raven and is black,

confirms (to some small extent)

All ravens are black.

That is just a straightforward application of instance confirmation. But the same rule allows that

*a* is non-black and is a non-raven,

confirms (to some small extent)

All non-black things are non-ravens.

The latter is logically equivalent to “all ravens are black”, and hence “all ravens are black” is confirmed by the observation of a white shoe (a non-black, non-raven). But this is a bad induction, and this case of enumerative induction looks to be unsound.

The paradox resides in the conflict of this counterintuitive result with our strong intuitive attachment to enumerative induction, both in everyday life and in the methodology of science. This conflict looks to require that we must either reject enumerative induction or agree that the observation of a white shoe confirms “all ravens are black”.

The (by now classic) resolution of this dilemma is due to C.G. Hempel (Hempel 1945) who credits discussion with Nelson Goodman. Assume first that we ignore all the background knowledge we bring to the question, such as that there are very many things that are either ravens or are not black, and that we look strictly at the truth-conditions of the premise (this is a white shoe) and the supported hypothesis (all ravens are black). The hypothesis says (is equivalent to)

Everything is either a black raven or is not a raven.

This hypothesis divides the world into three exclusive and exhaustive classes of things: non-black ravens, black ravens, and things that are not ravens. Any member of the first class falsifies the hypothesis. Each member of the other two classes confirms it. A white shoe is a member of the third class and is thus a confirming instance.
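The three-way division can be made concrete in a few lines of Python. This is a minimal sketch under stated assumptions: the `Thing` type, its two boolean attributes, and the function names are hypothetical, and the verdicts follow the pure truth-conditional reading on which all background knowledge is ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thing:
    is_raven: bool
    is_black: bool


def classify(x: Thing) -> str:
    """Place an object in one of the three exclusive and exhaustive classes."""
    if x.is_raven and not x.is_black:
        return "non-black raven"
    if x.is_raven:
        return "black raven"
    return "non-raven"


def satisfies_hypothesis(x: Thing) -> bool:
    """Instance-wise reading of 'Everything is either a black raven
    or is not a raven'."""
    return (x.is_raven and x.is_black) or not x.is_raven


def verdict(x: Thing) -> str:
    """Only members of the first class falsify; all others confirm."""
    return "falsifies" if classify(x) == "non-black raven" else "confirms"


white_shoe = Thing(is_raven=False, is_black=False)
print(classify(white_shoe), verdict(white_shoe))
```

On this reading the color of a non-raven plays no role at all: a white shoe and a black shoe fall in the same class and receive the same verdict.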

If this seems implausible it is because we in fact do not, as assumed, ignore the background knowledge that we bring to the question. We know before considering the inference that there are some black ravens and that there are many more non-ravens, many of which are not black. Observing a white shoe thus tells us nothing about the colors of ravens that we don't already know, and since induction is ampliative, good inductions should increase our knowledge. If we did not know that many non-ravens are not black, the observation of a white shoe would increase our knowledge.

On the other hand, we don't know whether any of the unobserved ravens are not black, i.e., whether the first and falsifying class of things has any members. Observing a raven that is black tells us that this object at least is not a falsifying instance of the hypothesis, and this we did not know before the observation.

As Goodman puts it, the paradoxical inference depends upon “tacit and illicit evidence” not stated in its formulation:

Taken by itself, the statement that the given object is neither black nor a raven confirms the hypothesis that everything that is not a raven is not black as well as the hypothesis that everything that is not black is not a raven. We tend to ignore the former hypothesis because we know it to be false from abundant other evidence — from all the familiar things that are not ravens but are black. (Goodman 1955, 72)

The important lesson of the paradox of the ravens and its resolution is that inductive inference, because it is ampliative, is sensitive to background information and context. What looks to be a good induction when considered in isolation turns out not to be so when the context, including background knowledge, is taken into account. The inductive inference from

*a* is a white shoe,

to

All ravens are black,

is not so much unsound as it is uninteresting and uninformative.

More recent discussion of the paradox continues and improves on the
Hempel-Goodman account by making explicit, and thus licit, the
suppressed evidence. (See, for example, Maher 1999 for a proposal of
this sort in a Carnapian framework.) Further development, along
vaguely Bayesian lines, generalizes the earlier approach by defining
comparative (*A* confirms *H* better than
does *B*) and quantitative (*A* confirms *H* to
degree *p*) concepts of confirmation capable of differentiating
support for the two hypotheses in question. (Fitelson and Hawthorne
2010) is an encyclopedic account of these efforts and includes also a
comprehensive bibliography.

There are, however, other faulty inductions that seem not to be accounted for by reference to background information and context:

Albert is in this room and is safe from freezing,

confirms

Everyone in this room is safe from freezing,

but

Albert is in this room and is a third son,

does not confirm

Everyone in this room is a third son,

and no amount of background information seems to explain this difference. The distinction is usually marked by saying that “Everyone in this room is safe from freezing” is a lawlike generalization, while “Everyone in this room is a third son” is an accidental generalization. But this distinction amounts to no more than that the first is confirmed by its instances while the second is not, so it cannot very well be advanced as an account of that difference. The problem is raised in a pointed way by Nelson Goodman's famous grue paradox (Goodman 1955, 73–75). (See Norton 2006, Olson 2006, and the entry on formal learning theory for recent commentary on the paradox.)

Grue Paradox:

Suppose that at time *t* we have observed many emeralds to be green. We thus have evidence statements

Emerald *a* is green,

Emerald *b* is green,

etc.

and these statements support the generalization:

All emeralds are green.

But now define the predicate “grue” to apply to all things observed before *t* just in case they are green, and to other things just in case they are blue. Then we have also the evidence statements

Emerald *a* is grue,

Emerald *b* is grue,

etc.

and these evidence statements support the hypothesis

All emeralds are grue.

Hence the same observations support incompatible hypotheses about emeralds to be observed in the future; that they will be green and that they will be blue.
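
The definition of “grue” can be made concrete with a minimal sketch. It assumes, purely for illustration, that every emerald carries a fixed color and a time of first observation; the names `Emerald`, `is_green`, `is_grue` and the cutoff `T` are hypothetical and not part of Goodman's text.

```python
from dataclasses import dataclass

T = 100  # hypothetical value for the time t; any cutoff would do


@dataclass(frozen=True)
class Emerald:
    color: str        # "green" or "blue"; fixed, so no emerald changes color
    observed_at: int  # time at which the emerald is first observed


def is_green(e: Emerald) -> bool:
    return e.color == "green"


def is_grue(e: Emerald, t: int = T) -> bool:
    # Observed before t: grue just in case green.
    # Anything else: grue just in case blue.
    if e.observed_at < t:
        return e.color == "green"
    return e.color == "blue"


# Evidence gathered before t: ten green emeralds.
evidence = [Emerald("green", k) for k in range(10)]

all_green = all(is_green(e) for e in evidence)
all_grue = all(is_grue(e) for e in evidence)

# Two candidate emeralds first observed after t.
later_green = Emerald("green", T + 1)
later_blue = Emerald("blue", T + 1)

print(all_green, all_grue)                          # True True
print(is_green(later_green), is_grue(later_green))  # True False
print(is_green(later_blue), is_grue(later_blue))    # False True
```

Every item of evidence satisfies both predicates, yet the two generalizations disagree about any emerald first observed after *t*. Nothing in the sketch involves an emerald changing color.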

A few cautionary remarks about this frequently misunderstood paradox:

- No one thinks that the grue hypothesis is well supported. The paradox makes it clear that there is something wrong with instance confirmation and enumerative induction as initially characterized.
- Neither the grue evidence statements nor the grue hypothesis entails that any emeralds change color. This is a common confusion (see, for example, Armstrong 1983, 58; and Nix & Paris 2007, 36).
- The grue paradox cannot be resolved, as was the raven paradox, by looking to background knowledge (as would be the case if it entailed color changes). Of course we know that it is extremely unlikely that any emeralds are grue. That just restates the point of the paradox and does nothing to resolve it.
- That the definition of “grue” includes a time parameter is sometimes advanced as a criticism of the definition. But, as Goodman remarks, were we to take “grue” and its obverse “bleen” (“blue up to *t*, green thereafter”) instead of “green” and “blue” as primitive terms, definitions of the latter would include time parameters (“green” =_{def} “grue if observed before *t* and bleen if observed thereafter”). The question here is whether inductive inference should be relative to the language in which it is formulated. Deductive inference is relative in this way as is Carnapian inductive logic.

### 3.3 Confirmation and deductive logic

Induction helps us to localize our actual world among all the possible
worlds. This is not to say that induction applies only in the actual
world: The premises of a good induction confirm its conclusion whether
those premises are true or false in the actual world. This leads to a
few principles relating confirmation and deduction. If *A* and
*B* are true in the same possible worlds, then whatever
*A* confirms *B* also confirms, and whatever confirms
*B* also confirms *A*:

Equivalence principle:

If *A* confirms *B* then any logical equivalent of *A* confirms any logical equivalent of *B*.

(We appealed to this principle in stating the paradox of the ravens
above.) A second principle follows from the truth that if *B*
logically implies *C* then every subset of the *B*
worlds is also a subset of the *C* worlds:

Implicative principle:

If *A* confirms *B*, then *A* confirms every logical consequence of *B*.

But we do not have that whatever implies *A* confirms whatever
*A* confirms:

That a presidential candidate wins the state of New York confirms that he will win the election.

That a candidate wins New York and loses California and Texas does not confirm that he will win the election, though “wins New York and loses California and Texas” logically implies “wins New York”.

This marks an important contrast between confirmation and logical
implication, between induction and deduction. Logical implication is
transitive: whatever implies a proposition implies all of its logical
consequences, for implication corresponds to the transitive subset
relation among sets of worlds. But when *A* implies *B*
and *B* confirms *C*, the *B* worlds in which
*C* is true may (as in the example) exclude the *A*
worlds. Inductive reasoning is said to be *non-monotonic*, for
in contrast to deduction, the addition of premises may annul what was
a good induction (the inference from the premise *P* to the
conclusion *R* may be inductively strong while the inference
from the premises *P*, *Q* to the conclusion *R*
may not be). (See the entry on
non-monotonic logic,
and section 7.1 below for a striking example.) For this
reason induction and confirmation are subject to the *principle of
total evidence* which requires that all relevant evidence be taken
into account in every induction. No such requirement is called for in
deduction; adding premises to a valid deduction can never make it
invalid.

Yet another contrast between induction and deduction is revealed by the lottery paradox. (See section 3.3 of the entry on conditionals.) If there are many lottery tickets sold, just one of which will win, each induction from these premises to the conclusion that a given ticket will not win is a good one. But the conjunction of all those conclusions is inconsistent with the premises, for some ticket must win. Thus good inductions from the same set of premises may lead to conclusions that are conjunctively inconsistent. This paradox is at least softened by some theories of conditionals (e.g., Adams 1975).
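
The arithmetic behind the paradox can be sketched as follows. The sketch assumes a fair lottery (each of the *n* tickets equally likely to be the single winner) and reads “good induction” as “high probability”; both are illustrative assumptions, and the function name `lottery` is hypothetical.

```python
from fractions import Fraction


def lottery(n):
    """Fair lottery with n tickets and exactly one winner (assumed uniform).

    Returns (P(a given ticket does not win), P(no ticket wins)).
    """
    # Possible outcomes: w is the index of the winning ticket.
    outcomes = range(n)
    # A given ticket (say ticket 0) loses in n - 1 of the n outcomes.
    p_single_loses = Fraction(sum(1 for w in outcomes if w != 0), n)
    # The conjunction of all n conclusions holds in no outcome at all.
    p_all_lose = Fraction(
        sum(1 for w in outcomes if all(w != i for i in range(n))), n
    )
    return p_single_loses, p_all_lose


single, conjunction = lottery(1000)
print(single)       # 999/1000
print(conjunction)  # 0
```

Each individual conclusion is highly probable on the premises, while their conjunction has probability zero.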

## 4. Induction, Causality, and Laws of Nature

What we know as the problem of enumerative induction Hume took to be the problem of causal knowledge, of identifying genuine causal regularities. Hume held that all ampliative knowledge was causal and from this point of view, as remarked above, the problem of induction is narrower than the problem of causal knowledge so long as we admit that some ampliative knowledge is not inductive. On the other hand, we now think of causal connection as being a particular kind of contingent connection and of inductive reasoning as having a wider application, including such non-causal forms as inferring the distribution of a trait in a population from its distribution in a sample from that population.

### 4.1 Causal inductions

Causal inductions are a significant subclass of inductions. They form a problem, or a constellation of problems, of induction in their own right. One of the classic twentieth century accounts of the problem of induction, that of Nelson Goodman (Goodman 1955), focuses on enumerative inductions that support causal laws. Goodman argued that three forms of the problem of enumerative induction turn out to be equivalent. These were: (1) Supporting subjunctive and contrary to fact conditionals; (2) Establishing criteria for confirmation that would not stumble on the grue paradox; and (3) Distinguishing lawlike hypotheses from accidental generalizations. (A sentence is lawlike if it is like a law of nature with the possible exception of not being true.) Put briefly, a counterfactual is true if some scientific law permits inference of its consequent from its antecedent, and lawlike statements are confirmed by their instances. Thus

If Nanook of the north were in this room he would be safe from freezing,

is a true counterfactual because the law

If the temperature is well above freezing then the residents are safe from freezing,

(along with background information) licenses inference of the consequent

Nanook is safe from freezing,

from the antecedent

Nanook is in this room.

On the other hand, no such law supports a counterfactual like

If my only son were in this room he would be a third son.

Similarly, the lawlike statement

Everyone in this room is safe from freezing,

is confirmed by the instance

Nanook is in this room and is safe from freezing,

whereas

Everyone in this room is a third son,

even if true is not lawlike since instances do not confirm it.
Goodman's formulation of the problem of (enumerative) induction thus
focused on the distinction between lawlike and accidental
generalizations. Generalizations that are confirmed by their instances
Goodman called *projectible*. In these terms projectibility
ties together three different questions: lawlikeness, counterfactuals,
and confirmation. Goodman also proposed an account of the distinction
between projectible and unprojectible hypotheses. Very roughly put,
this is that projectible hypotheses are made up of predicates that
have a history of use in projections.

### 4.2 Karl Popper's views on induction

One of the most influential and controversial views on the problem of induction has been that of Karl Popper, announced and argued in (Popper LSD). Popper held that induction has no place in the logic of science. Science in his view is a deductive process in which scientists formulate hypotheses and theories that they test by deriving particular observable consequences. Theories are not confirmed or verified. They may be falsified and rejected or tentatively accepted if corroborated in the absence of falsification by the proper kinds of tests:

[A] theory of induction is superfluous. It has no function in a logic of science. The best we can say of a hypothesis is that up to now it has been able to show its worth, and that it has been more successful than other hypotheses although, in principle, it can never be justified, verified, or even shown to be probable. This appraisal of the hypothesis relies solely upon deductive consequences (predictions) which may be drawn from the hypothesis: There is no need even to mention “induction” (Popper LSD, 315).

Popper gave two formulations of the problem of induction; the first is
the establishment of the truth of a theory by empirical evidence; the
second, slightly weaker, is the justification of a preference for one
theory over another as better supported by empirical evidence. Both of
these he declared insoluble, on the grounds, roughly put, that
scientific theories have infinite scope and no finite evidence can
ever adjudicate among them (Popper LSD, 253–254,
Grattan-Guinness 2004). He did however hold that theories could be
falsified, and that falsifiability, or the liability of a theory to
counterexample, was a virtue. Falsifiability corresponds roughly
to the proportion of models in which a (consistent) theory is
false. Highly falsifiable theories thus make stronger assertions and
are in general more informative. Though theories cannot in Popper's
view be supported, they can be *corroborated*: a better
corroborated theory is one that has been subjected to more and more
rigorous tests without having been falsified. Falsifiable and
corroborated theories are thus to be preferred, though, as the
insolubility of the second problem of induction makes evident,
falsifiability and corroboration are not to be confused with support by evidence.

Popper's epistemology is almost exclusively the epistemology of scientific knowledge. This is not because he thinks that there is a sharp division between ordinary knowledge and scientific knowledge, but rather because he thinks that to study the growth of knowledge one must study scientific knowledge:

[M]ost problems connected with the growth of our knowledge must necessarily transcend any study which is confined to common-sense knowledge as opposed to scientific knowledge. For the most important way in which common-sense knowledge grows is, precisely, by turning into scientific knowledge (Popper LSD, 18).

## 5. Probability and Induction

So far only straightforward non-probabilistic forms of the problem of induction have been surveyed. The addition of probability to the question is not only a generalization; probabilistic induction is much deeper and more complex than induction without probability. The following subsections look at several different approaches: Rudolf Carnap's inductive logic, Hans Reichenbach's frequentist account, Bruno de Finetti's subjective Bayesianism, likelihood methods, and the Neyman-Pearson method of hypothesis testing.

### 5.1 Carnap's inductive logic

Carnap's classification of inductive inferences (Carnap LFP, ¶44) will be generally useful in discussing probabilistic induction. He lists five sorts:

- *Direct inference* typically infers the relative frequency of a trait in a sample from its relative frequency in the population from which the sample is drawn. The sample is said to be *unbiased* to the extent that these frequencies are the same. If the incidence of lung disease among all cigarette smokers in the U.S. is 0.15, then it is reasonable to predict that the incidence among smokers in California is close to that figure.
- *Predictive inference* is inference from one sample to another sample not overlapping the first. This, according to Carnap, is “the most important and fundamental kind of inductive inference” (Carnap LFP, 207). It includes the special case, known as *singular predictive inference*, in which the second sample consists of just one individual. Inferring the color of the next ball to be drawn from an urn on the basis of the frequency of balls of that color in previous draws with replacement illustrates a common sort of predictive inference.
- *Inference by analogy* is inference from the traits of one individual to those of another on the basis of traits that they share. Hume's famous arguments that beasts can reason, love, hate, and be proud or humble (Hume THN, I.III.16, II.I.12, II.II.12) are classic instances of analogy. Disagreements about racial profiling come down to disagreements about the force of certain analogies.
- *Inverse inference* infers something about a population on the basis of premises about a sample from that population. Again, that the sample be unbiased is critical. The use of polls to predict election results, of controlled experiments to predict the efficacy of therapies or medications, are common examples.
- *Universal inference* is inference from a sample to a hypothesis of universal form. Simple enumerative induction, mentioned in the introduction and in section 3, is the standard sort of universal inference. Karl Popper's objections to induction, mentioned in section 4, are for the most part directed against universal inference.

Popper and Carnap are less opposed than it might seem in this regard: Popper holds that universal inference is never justified. On Carnap's view it is inessential.

#### 5.1.1 Carnapian confirmation theory

**Note:** Readers are encouraged to read section 3.2 of
the entry
interpretations of probability
in conjunction with the remainder of this section. See also (Zabell
2007) for a thorough discussion of Carnapian induction.

Carnap initially held that the problem of confirmation was a logical problem; that assertions of degree of confirmation by evidence of a hypothesis should be analytic and depend only upon the logical relations of the hypothesis and evidence.

Carnapian induction concerns always the sentences of a language as
characterized in section 3.2 of
interpretations of probability.
The languages in question here are assumed to be interpreted,
i.e. the referents of the non-logical constants are fixed, and
identity is interpreted normally. A set of sentences of such a
language is *consistent* if it has a model in which all of its
members are true. A set is *maximal* if it has no consistent
proper superset in the language. (So every inconsistent set is
maximal.) The language in question is said to be finite if it includes
just finitely many maximal consistent subsets. Each maximal consistent
(m.c.) set says all that can be said about some possible situation
described in the language in question. The m.c. sets are thus a
precise way of understanding the notion of *case* that is
critical in the classical conception of probability
(interpretations of probability
section 3.1).

Much of the content of the theory can be illustrated, as is done in
interpretations of probability,
in the simple case of a finite language £ including just one
monadic predicate, *S* (signifying a successful outcome of a
repeated experiment such as draws from an urn), and just finitely many
individual constants, *a*_{1}, …,
*a*_{r}, signifying distinct trials or
draws.

There will in this case be 2^{r} conjunctions
*S*′(*a*_{1})
∧ …
∧
*S*′(*a*_{r}), where
*S*′(*a*_{i}) is either
*S*(*a*_{i}) (success on the
*i*th trial) or its negation
¬*S*(*a*_{i}). These are the *state
descriptions* of £ . Each maximal consistent set of £
will consist of the logical consequences of one of the state
descriptions, so there will be 2^{r} m.c. sets. Thus,
pursuing the affinity with the classical conception, the probability
of a sentence *e* is just the ratio

*m*^{†}(*e*) = *n* / 2^{r}

where *n* is the number of state descriptions that imply
*e*
(interpretations of probability,
section 3.2). Confirmation functions (c-functions) generalize logical implication. In the
finite case a sentence *e* logically implies a sentence
*h* if the collection of m.c. sets each of which includes
*e* is a subset of those that include *h*. The extent to
which *e* confirms *h* is just the ratio of the number
of m.c. sets including *h*
∧
*e* to the number of those including *e*. This is the
proportion of possible cases in which *e* is true in which
*h* is also true.
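
The two ratios just described can be computed by brute-force enumeration. The following is a minimal sketch, not Carnap's own formalism: it assumes that a sentence of the language can be represented as a Python predicate on state descriptions, and the function names `m_dagger` and `c_dagger` are illustrative labels for the measure above and the confirmation ratio derived from it.

```python
from fractions import Fraction
from itertools import product


def state_descriptions(r):
    """All 2**r assignments of success (True) or failure (False) to trials 1..r."""
    return list(product([True, False], repeat=r))


def m_dagger(e, r):
    """n / 2**r, where n counts the state descriptions in which e holds."""
    states = state_descriptions(r)
    n = sum(1 for s in states if e(s))
    return Fraction(n, len(states))


def c_dagger(h, e, r):
    """Share of the e-cases that are also h-cases: m(h and e) / m(e)."""
    return m_dagger(lambda s: h(s) and e(s), r) / m_dagger(e, r)


r = 3
S1 = lambda s: s[0]  # success on trial 1
S2 = lambda s: s[1]  # success on trial 2
S3 = lambda s: s[2]  # success on trial 3

print(m_dagger(S1, r))                              # 1/2
print(m_dagger(lambda s: S1(s) or S2(s), r))        # 3/4
print(c_dagger(S3, lambda s: S1(s) and S2(s), r))   # 1/2
```

Enumeration is exponential in *r*, so the sketch is only practical for very small languages; it is meant to show the counting, not to scale.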

In this simple example, state descriptions are said to be
*isomorphic* when they include the same number of successes. A
*structure description* is a maximal disjunction of isomorphic
state descriptions. In the present example, a structure description
says how many trials have successful outcomes without saying which
trials these are. (See
interpretations of probability,
section 3.2 for examples.)

Confirmation functions all satisfy two additional qualitative logical
constraints: They are *regular*, which, in the case of a finite
language means that they assign positive value to every state
description, and they are also *symmetrical*. A function on
£ is *symmetrical* if it is invariant under all
permutations of the individual constants of £. That is to say,
if the names of objects are switched around the values of *c*
and *m* are unaffected. State descriptions that are related in
this way are isomorphic. “(W)e require that
logic should not discriminate between the individuals but treat them
all on a par; although we know that individuals are not alike, they
ought to be given equal rights before the tribunal of logic”
(Carnap LFP, 485).

Although regularity and symmetry do not determine a unique
confirmation function, they nevertheless suffice to derive a number of
important results concerning inductive inferences. In particular, in
the simple case of a finite language with one predicate, *S*,
these constraints entail that state descriptions in the same structure
description (with the same relative frequency of success) must always
have the same *m* value. And if *d*_{k}
and *e*_{k} are sequences giving outcomes of
trials 1, . . . , *k* (*k* < *r*) with the
same number of *S*s,

*c*(*S*(*k* + 1), *d*_{k}) = *c*(*S*(*k* + 1), *e*_{k})

In the three-constant language of
interpretations of probability,
*c*^{†}
(*S*_{3} |
*S*′_{1} ∧
*S*′_{2}) = ½ for all values of
*S*′_{1} and *S*′_{2};
*c*^{†} is completely unaffected by the evidence:

*c*^{†}(*S*_{3} | *S*_{1} ∧ *S*_{2}) = ½

*c*^{†}(*S*_{3} | ¬*S*_{1} ∧ ¬*S*_{2}) = ½

*c*^{†}(*S*_{3} | *S*_{1} ∧ ¬*S*_{2}) = ½

This strong independence led Carnap to reject
*c*^{†} in favor of *c**. This is the
function that he endorsed in (Carnap LFP) and that is illustrated in
interpretations of probability.
*c** gives equal weight to each structure
description. Symmetry assures that the weight is equally apportioned
to state descriptions within a structure description. *c** thus
weighs uniform state descriptions, those in which one sort of outcome
predominates, more heavily than those in which outcomes are more
equally apportioned. This effect diminishes as the number of trials or
individual constants increases.
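
The contrast with *c*^{†} can be checked in the three-constant language by enumeration. This is a sketch under the same illustrative assumption as before (sentences as Python predicates on state descriptions); `m_star` and `c_star` are hypothetical names, and the weighting follows the description above: equal weight to each of the *r* + 1 structure descriptions, shared equally among the state descriptions within each.

```python
from fractions import Fraction
from itertools import product
from math import comb


def m_star(e, r):
    """Measure giving each structure description weight 1/(r + 1)."""
    total = Fraction(0)
    for s in product([True, False], repeat=r):
        if e(s):
            k = sum(s)  # number of successes in this state description
            # The structure description "k successes" contains comb(r, k)
            # state descriptions, which share its weight equally.
            total += Fraction(1, (r + 1) * comb(r, k))
    return total


def c_star(h, e, r):
    """Degree of confirmation of h by e: m*(h and e) / m*(e)."""
    return m_star(lambda s: h(s) and e(s), r) / m_star(e, r)


r = 3
S1 = lambda s: s[0]
S2 = lambda s: s[1]
S3 = lambda s: s[2]

print(c_star(S3, lambda s: S1(s) and S2(s), r))          # 3/4
print(c_star(S3, lambda s: not S1(s) and not S2(s), r))  # 1/4
print(c_star(S3, lambda s: S1(s) and not S2(s), r))      # 1/2
```

Unlike *c*^{†}, these values move with the evidence: two observed successes raise the value for a third, and two failures lower it.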

(Carnap 1952) generalized the approach of (Carnap LFP) to construct an
infinite system of inductive methods. This is the *λ
system*. The fundamental principle of the λ system is
that degree of confirmation should give some weight to the purely
empirical character of evidence and also some weight to the logical
structure of the language in question. (*c** does this.) The
λ system consists of c-functions that are mixtures of functions
that give total weight to these extremes. See the discussion in
(interpretations of probability,
section 3.2).

Two points, both mentioned in interpretations of probability (section 3.2), should be emphasized:

1. Carnapian confirmation is invariant for logical equivalence within the framing language. Logical equivalence may however outrun epistemic equivalence, particularly in complex languages. The tie of confirmation to knowledge is thus looser than one might hope.
2. Degree of confirmation is relative to a language. Thus the degree of confirmation of a hypothesis by evidence may differ when formulated in different languages.

(Carnap LFP, 569) also includes a first effort at characterizing
analogical inference. Analogies are in general stronger when the
objects in question share more properties. This rough statement
suffers from the lack of a method for counting properties; without
further precision about this, it looks as though any two objects must share
infinitely many properties. What is needed is some way to compare
properties in the right way. Carnap's proposal depends upon
characterizing the strongest consistent monadic properties expressible
in a language. Given a finite language £ including only distinct
and logically independent monadic predicates, each conjunctive
predicate including for each atomic predicate either it or its
negation is a *Q-predicate*. Q-predicates are the predicative
analogue of state descriptions. Any sentence formed by instantiating a
Q-predicate with an individual constant throughout is thus a
consistent and logically strongest description of that
individual. Every monadic property expressed in £ is equivalent
to a unique disjunction of distinct Q-predicates, and the *width* of a
property is just the number of Q-predicates in this disjunction. The
width of properties corresponds to their weakness in an intuitive
sense: The widest property is the tautological property; no object can
fail to have it. The narrowest (consistent) properties are those
expressed by the Q-predicates.

Let

ρ_{bc} be the conjunction of all the properties that *b* and *c* are known to share;

ρ_{b} be the conjunction of all the properties that *b* is known to have.

So ρ_{b} implies
ρ_{bc} and the analogical inference in question
is

*b* has ρ_{b}

*b* and *c* both have ρ_{bc}

*c* has ρ_{b}

Let *w*(ρ_{bc}) and
*w*(ρ_{b}) be the widths of
ρ_{bc} and
ρ_{b} respectively. (Since ρ_{b} implies ρ_{bc}, and so is the
narrower property, in the non-trivial
case *w*(ρ_{b}) <
*w*(ρ_{bc}).)

It follows from the above that

*c**(*c* has ρ_{b}, *b* has ρ_{b} and *b* and *c* both have ρ_{bc}) = [*w*(ρ_{b}) + 1] / [*w*(ρ_{bc}) + 1]

Now as the proportion of known properties of *b* shared by
*c* increases, this quantity also increases, which is as it
should be.
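
The value of *c** for an analogical inference can be computed directly from the definitions, so that it can be compared with the closed-form ratio of widths. The sketch below makes several simplifying assumptions that are not in the text: the language has just the two individual constants *b* and *c*; a property is represented by the set of indices of the Q-predicates in its disjunction (so its width is the size of that set); and the evidence is taken to be “*b* has ρ_{b} and *c* has ρ_{bc}”. The name `c_star_analogy` is hypothetical.

```python
from fractions import Fraction
from itertools import product


def c_star_analogy(kappa, rho_b, rho_bc):
    """c*(c has rho_b | b has rho_b and c has rho_bc), by enumeration.

    kappa  -- number of Q-predicates in the language
    rho_b  -- set of Q-predicate indices making up rho_b
    rho_bc -- set of Q-predicate indices making up rho_bc (a superset)
    """
    if not rho_b or not rho_b <= rho_bc:
        raise ValueError("rho_b must be non-empty and imply rho_bc")

    # With two individuals, a state description is a pair (q_b, q_c) and a
    # structure description is an unordered pair of Q-predicates.
    n_structures = kappa * (kappa + 1) // 2

    def weight(q_b, q_c):
        # {q, q} contains one state description; {q, q'} contains two.
        n_states = 1 if q_b == q_c else 2
        return Fraction(1, n_structures * n_states)

    def m_star(b_property, c_property):
        return sum(weight(q_b, q_c)
                   for q_b, q_c in product(b_property, c_property))

    return m_star(rho_b, rho_b) / m_star(rho_b, rho_bc)


kappa = 8             # three primitive predicates give 2**3 Q-predicates
rho_b = {0}           # width 1
rho_bc = {0, 1, 2, 3}  # width 4
value = c_star_analogy(kappa, rho_b, rho_bc)
print(len(rho_b), len(rho_bc), value)  # 1 4 2/5
```

With widths 1 and 4 the enumeration gives 2/5, that is, (1 + 1)/(4 + 1). Narrowing ρ_{bc} toward ρ_{b}, which corresponds to *b* and *c* sharing more of *b*'s known properties, raises the value toward 1.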

Although the theory does provide an account of analogical inference in simple cases, in more complicated cases, in which the analogy depends upon the similarity of different properties, it is, as it stands, insufficient. In later work Carnap and others developed an account of similarity to overcome this. See the critical remarks in (Achinstein 1963) and Carnap's response in the same issue.

### 5.2 Reichenbach's frequentism

#### 5.2.1 Reichenbach's theory of probability

Section 3.3 of interpretations of probability, as well as section 2.3 of the entry on Reichenbach should be read in conjunction with this section.

Carnap's logical probability generalized the metalinguistic relation
of logical implication to a numerical function, *c*(*h*,
*e*), that expresses the extent to which an evidence sentence
*e* confirms a hypothesis *h*. Reichenbach's probability
implication is also a generalization of a deductive concept, but the
concept generalized belongs first to an object language of events and
their properties. (Reichenbach's logical probability, which defines
probabilities of sentences, is briefly discussed below.) Russell and
Whitehead in (Whitehead 1957, vol I, 139) wrote

ρ_{x} ⊃_{x} φ_{x}

which they called “formal implication”, to abbreviate

(*x*)(ρ_{x} ⊃ φ_{x})

Reichenbach's generalization of this extends classical first-order logic to include probability implications. These are formulas (Reichenbach TOP, 45)

*x* ∈ *A* ⊃_{p} *x* ∈ *B*

where *p* is some quantity between zero and one inclusive.
Probability implications may be abbreviated

*A* ⊃_{p} *B*

In a more conventional notation this probability implication between properties or classes may be written

*P*(*B* | *A*) = *p*

(There are a number of differences from Reichenbach's notation in
the present exposition. Most notably he writes *P*(*A*,
*B*) rather than *P*(*B* | *A*). The
latter is written here to maintain consistency with the notations of
other sections.) Russell and Whitehead were following Peano (Peano SWP,
193) who, though he lacked fully developed quantifiers, had
nevertheless the notions of formal implication and bound and free
variables on which the *Principia* notation depends. In the
modern theory free variables are read as universally quantified with
widest scope, so the subscripted variable is redundant and the
notation has fallen into disuse. (See Vickers 1988 for a general
account of probability quantifiers including Reichenbachean
conditionals.)

Reichenbach's probability logic is a conservative extension of
classical first-order logic to include rules for probability
implications. The individual variables (*x*, *y*) are
taken to range over events (“The gun was fired”,
“The shot hit the target”) and, as the notation makes
evident, the variables *A* and *B* range over classes of
events (“the class of firings by an expert marksman”,
“the class of hits within a given range of the bullseye”)
(Reichenbach TOP, 47). The formal rules of probability logic assure
that probability implications conform to the laws of conditional
probability and allow inferences integrating probability implications
into deductive logic, including higher-order quantifiers over the
subscripted variables.

Reichenbach's rules of interpretation of probability implications
require, first, that the classes *A* and *B* be infinite
and in one-one correspondence so that their order is established. It
is also required that the limiting relative frequency

lim_{n→∞} *N*(*A*_{n} ∩ *B*_{n}) / *n*

where *A*_{n},
*B*_{n} are the first *n* members of
*A*, *B* respectively, and *N* gives the
cardinality of its argument, exists. When this limit does exist it
defines the probability of *B* given *A* (Reichenbach
1971, 68):

*P*(*B* | *A*) =_{def} lim_{n→∞} *N*(*A*_{n} ∩ *B*_{n}) / *n* when the limit exists.
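
The relative frequencies whose limit defines the probability can be computed for any finite initial segment. The sketch below assumes, for illustration only, that the ordered class *A* is indexed 1, 2, 3, … and that a function `in_B` reports whether the *i*th member also belongs to *B*; the example sequence `every_third` is hypothetical.

```python
from fractions import Fraction


def running_frequency(in_B, n):
    """Relative frequency N(A_n ∩ B_n) / n over the first n members of A.

    in_B(i) says whether the i-th member of A (i = 1, 2, ...) is also in B.
    """
    hits = sum(1 for i in range(1, n + 1) if in_B(i))
    return Fraction(hits, n)


# Hypothetical ordered sequence: every third event belongs to B.
every_third = lambda i: i % 3 == 0

for n in (10, 100, 1000, 10000):
    print(n, running_frequency(every_third, n))
# 10 3/10
# 100 33/100
# 1000 333/1000
# 10000 3333/10000
```

For this rule-governed sequence the frequencies approach 1/3. A finite run of observed frequencies can suggest such a limit but cannot by itself establish that one exists.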

The complete system also includes higher-order, or, as Reichenbach
calls them, *concatenated* probabilities. First-order probabilities
involve infinite sequences: the ordered sets referred to by the
predicates of probability implications. Second-order probabilities are
determined by lattices, or sequences of sequences. Here is a
simplified sketch of this (Reichenbach 1971, chapter 8; Reichenbach
1971, ¶41).

*b*_{11} *b*_{12} … *b*_{1j} … lim_{n→∞} [*N*(*B*_{1n} ∩ *C*) / *n*] = *p*_{1}

*b*_{21} *b*_{22} … *b*_{2j} … lim_{n→∞} [*N*(*B*_{2n} ∩ *C*) / *n*] = *p*_{2}

…

*b*_{i1} *b*_{i2} … *b*_{ij} … lim_{n→∞} [*N*(*B*_{in} ∩ *C*) / *n*] = *p*_{i}

All the *b*_{ij} are members of
*B*, some of which are also members of *C*. Each row
*i* gives a sequence of members of *B*:

{*b*_{i}} = {*b*_{i1}, *b*_{i2}, … }

Where *B*_{in} is the sequence

*B*_{in} = {*b*_{i1}, *b*_{i2}, …, *b*_{in}}

of the first *n* members of the sequence
{*b*_{i}}, we assume that the limit, as
*n* increases without bound, of the proportion of these that
are also members of *C*,

lim_{n→∞} [*N*(*B*_{in} ∩ *C*) / *n*]

exists for each row. Hence each row determines a probability,
*p*_{i} :

*P*_{i}(*C* | *B*) = lim_{n→∞} [*N*(*B*_{in} ∩ *C*) / *n*] = *p*_{i}

Now let {*a*_{i}} be a sequence of members
of the set *A* and consider the sequence of pairs

{<*a*_{1}, *p*_{1}>, <*a*_{2}, *p*_{2}>, …, <*a*_{i}, *p*_{i}>, … }

Let *p* be some quantity between zero and one inclusive. For
given *m* the proportion of *p*_{i} in
the first *m* members of this sequence that are equal to
*p* is

[*N*_{i ≤ m}(*p*_{i} = *p*) / *m*]

Suppose that the limit of this quantity as *m* increases
without bound exists and is equal to *q*:

lim_{m→∞} [*N*_{i ≤ m}(*p*_{i} = *p*) / *m*] = *q*

We may then identify *q* as the second order *probability
given A that the probability of C given B is p*:

*P*{[*P*(*C* | *B*) = *p*] | *A*} = *q*

The method permits higher order probabilities of any finite degree corresponding to matrices of higher dimensions. It is noteworthy that Reichenbach's theory thus includes a logic of expectations of probabilities and other random variables.
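
A finite analogue of the lattice construction may help fix ideas. The sketch below replaces each limit by a plain proportion over a finite row and a finite number of rows, which is a simplifying assumption and not Reichenbach's definition; the lattice of 0s and 1s (1 marking a member of *B* that is also in *C*) and the function names are hypothetical.

```python
from fractions import Fraction


def row_frequency(row):
    """Share of a finite row of B-members that are also in C (1 = in C)."""
    return Fraction(sum(row), len(row))


def second_order_frequency(lattice, p):
    """Share of rows whose frequency equals p: a finite analogue of q."""
    row_values = [row_frequency(row) for row in lattice]
    matches = sum(1 for value in row_values if value == p)
    return Fraction(matches, len(row_values))


# Hypothetical 4 x 4 lattice; row i plays the role of the sequence {b_i}.
lattice = [
    [1, 0, 1, 0],  # p_1 = 1/2
    [1, 1, 1, 1],  # p_2 = 1
    [0, 1, 0, 1],  # p_3 = 1/2
    [0, 0, 0, 0],  # p_4 = 0
]

q = second_order_frequency(lattice, Fraction(1, 2))
print(q)  # 1/2
```

Here two of the four rows have frequency ½, so the finite stand-in for the second-order probability that the first-order probability is ½ comes out as ½.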

Before turning to Reichenbach's account of induction, there are three questions about the interpretation of probability to consider. These are:

*1. The problem of extensionality.* The values of the
variables in Reichenbach's theory are events and ordered classes of
events. The theory is in these respects extensional; probabilities do
not depend on how the classes and events of their arguments are
described or intended:

If *A* = *A*′ and *B* = *B*′ then *P*(*x* ∈ *B* | *x* ∈ *A*) = *P*(*x* ∈ *B*′ | *x* ∈ *A*′)

If *x* = *x*′ and *y* = *y*′ then *P*(*x* ∈ *A* | *y* ∈ *B*) = *P*(*x*′ ∈ *A* | *y*′ ∈ *B*)

But probability attributions are intensional: they vary with differences in the ways classes and events are described. The class of examined green things is also the class of examined grue things, but the role of these predicates in probabilistic inference should be different. Less exotic examples are easy to come by. Here is an inference that depends upon extensionality:

The next toss = the next head ⇒

*P*(*x* is a head | *x* = the next toss) = *P*(*x* is a head | *x* = the next head) = 1

The next toss = the next tail ⇒

*P*(*x* is a head | *x* = the next toss) = *P*(*x* is a head | *x* = the next tail) = 0

Since (The next toss = the next head) or (The next toss = the next tail),

*P*(*x* is a head | *x* = the next toss) = 1 or *P*(*x* is a head | *x* = the next toss) = 0

To block this inference one would have to block replacing “the next toss” by “the next head” and “the next toss” by “the next tail” within the scope of the probability operator, but extensionality of that operator allows just these replacements. Reichenbach seems not to appreciate this difficulty.

*2. The problem of infinite sequences*. This is the problem
of the application of the definition of probability, which presumes
infinite sequences for which limits exist, to actual cases. In the
world of our experience sequences of events are finite. This looks to
entail that there can be no true statements of the form
*P*(*B* | *A*) = *p*.

The problem of infinite sequences is a consequence of a quite general problem about reference to infinite totalities; such totalities cannot be given in extension and require always some intensional way of being specified. This leaves the extensionality of probability untouched, however, since there is no privileged intension; the above argument continues to hold. Reichenbach distinguishes two ways in which classes can be specified: extensionally, by listing or pointing out their members, and intensionally, by giving a property of which the class is the extension. Classes specified intensionally may be infinite. Some classes may be necessarily finite; the class of organisms, for example, is limited in size by the quantity of matter in the universe; but in some of these cases the class may be theoretically, or in principle, infinite. Such a class may be treated as if it were infinite for the purposes of probabilistic inference. Although our experience is limited to finite subsets of these classes, we can still consider theoretically infinite extensions of them.

*3. The problem of single case probabilities.* Probabilities
are commonly attributed to single events without reference to
sequences or conditions: The probability of rain tomorrow; the
probability that Julius Caesar was in Britain in 55 BCE, seem not to
involve classes.

From a frequentist point of view, single case probabilities are of
two sorts. In the first sort the reference class is implicit. Thus,
when we speak of the probability of rain tomorrow, we take the
suppressed reference class to be days following periods that are
meteorologically similar to the present period. These are then treated
as standard frequentist probabilities. Single case probabilities of
this sort are hence ambiguous; for shifts in the reference class will
give different single case probabilities. This ambiguity, sometimes
referred to as the problem of the reference class, is ubiquitous;
different classes *A* will give different values for
*P*(*B* | *A*). This is not so much a shortcoming
as it is a fact of inductive life and probabilistic inductive
inference. Reichenbach's principle governing the matter is that one
should always use the smallest reference class for which reliable
statistics are known. This principle has the same force as the
Carnapian requirement of total evidence.

In other cases (the presence of Julius Caesar in Britain is an
example) there seems to be no such reference class. To handle such
cases Reichenbach introduces logical probabilities defined for
collections of propositions or sentences. The notion of truth-value is
generalized to allow a continuum of *weights*, from zero to one
inclusive. These weights conform to the laws of probability, and in
some cases may be calculated with respect to sequences of
propositions. The probability statement will then be of the form

P(x ∈ B | x ∈ A) = p

where *A* is a reference class of propositions (those asserted by
Caesar in *The Gallic Wars*, for example) and *B* is the
true subclass of these.

This account of single-case probabilities obviously depends essentially upon testimony, not to amplify and expand the reach of induction, but to make induction possible.

Reichenbach's account of single-case probabilities contrasts with subjectivistic and logical views, both of which allow the attribution of probabilities to arbitrary propositions or sentences without reference to classes. In the Carnapian case, given a c-function the probability of every sentence in the language is fixed. In subjectivistic theories the probability is restricted only by coherence and the probabilities of other sentences.

#### 5.2.2 Reichenbachian induction

On Reichenbach's view, the problem of induction is just the problem
of ascertaining probability on the basis of evidence (Reichenbach
TOP, 429). The conclusions of inductions are not asserted; they are
*posited*. *“A posit is a statement with which we deal
as true, though the truth value is unknown”* (Reichenbach
TOP, 373).

Reichenbach divides inductions into several sorts, not quite parallel to the Carnapian taxonomy given earlier. These are:

Induction by enumeration, in which an observed initial frequency is posited to hold for the limit of the sequence;

Explanatory inference, in which a theory or hypothesis is inferred from observations;

Cross induction, in which distinct but similar inductions are compared and, perhaps, corrected;

Concatenation or hierarchical assignment of probabilities.

These all resolve to the first—induction by
enumeration—in ways to be discussed below. The problem of
induction (by enumeration) is resolved by the *inductive rule*,
also known as the *straight rule*:

If the relative frequency of *B* in *A* = *N*(*A*_{n} ∩ *B*_{n}) / *n* is known for the first *n* members of the sequence *A* and nothing is known about this sequence beyond *n*, then we posit that the limit lim_{n→∞}[*N*(*A*_{n} ∩ *B*_{n}) / *n*] will be within a small increment δ of *N*(*A*_{n} ∩ *B*_{n}) / *n*.

(This corresponds to the Carnapian λ-function
*c*_{0} (λ(κ) = 0) which gives total
weight to the empirical factor and no weight to the logical
factor. See
interpretations of probability,
3.2.)
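The inductive rule can be sketched in a few lines of Python. This is only an illustration: the function name `straight_rule_posit` and the sample segment are hypothetical, and the increment δ is left implicit, since the rule says only that the limit is posited to lie near the observed frequency.

```python
from fractions import Fraction

def straight_rule_posit(observations):
    """Posit the relative frequency of B observed among the first n members of A.

    observations: list of 0/1 values, 1 marking a member of A that is also in B.
    """
    n = len(observations)
    if n == 0:
        raise ValueError("the rule needs at least one observed member")
    return Fraction(sum(observations), n)

# Hypothetical initial segment of the sequence A.
segment = [1, 0, 1, 1, 0, 1, 1, 1]

# The posit is revised each time the observed segment lengthens.
posits = [straight_rule_posit(segment[:n]) for n in range(1, len(segment) + 1)]
print(posits[-1])
```

Each entry of `posits` is a posit in Reichenbach's sense: it is dealt with as true for the time being and replaced when the next observation arrives.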

We saw above how concatenation works. It is a sort of induction by enumeration that amounts to reiterated applications of the inductive rule. Cross induction is a variety of concatenation. It amounts to evaluating an induction by enumeration by comparing it with similar past inductions of known character. Reichenbach cites the famous example of inferring that all swans are white from many instances. A cross induction will list other inductions on the invariability of color among animals and show them to be unreliable. This cross induction will reveal the unreliability of the inference even in the absence of counterinstances (black swans found in Australia). So concatenation, or hierarchical induction, and cross induction are instances of induction by enumeration.

Explanatory inference is not obviously a sort of induction by enumeration. Reichenbach's version (Reichenbach TOP, ¶85) is ingenious and too complex for summary here. It depends upon concatenation and the approximation of universal statements by conditional probabilities close to 1.

Reichenbach's justification of induction by enumeration is known as
a *pragmatic justification*. (See also Salmon 1967,
52–54.) It is first important to keep in mind that the
conclusion of inductive inference is not an assertion; it is a posit.
Reichenbach does not argue that induction is a sound method; his
account is rather what Salmon (Salmon 1963) and others have referred
to as *vindication*: that if any rule will lead to positing the
correct probability, the inductive rule will do this, and it is,
furthermore, the simplest rule that is successful in this sense.

What is now the standard difficulty with Reichenbach's rule of induction was noticed by Reichenbach himself and later strengthened by Wesley Salmon (Salmon 1963). It is that for any observed relative frequency in an initial segment of any finite length, and for any arbitrarily selected quantity between zero and one inclusive, there exists a rule that leads to that quantity as the limit on the basis of that observed frequency. Salmon goes on to announce additional conditions on adequate rules that uniquely determine the rule of induction. More recently Cory Juhl (Juhl, 1994) has examined the rule with respect to the speed with which it approaches a limit.

### 5.3 Subjectivism and Bayesian induction: de Finetti

Section 3 of the article Bayes' theorem should be read in conjunction with this section.

#### 5.3.1 Subjectivism

Bruno de Finetti (1906–1985) is the founder of modern subjectivism in probability and induction. He was a mathematician by training and inclination, and he typically writes in a sophisticated mathematical idiom that can discourage the mathematically naïve reader. In fact, the deep and general principles of de Finetti's theory, and in particular the structure of the powerful representation theorem, can be expressed in largely non-technical language with the aid of a few simple arithmetical principles. De Finetti himself insists that “questions of principle relating to the significance and value of probability [should] cease to be isolated in a particular branch of mathematics and take on the importance of fundamental epistemological problems,” (de Finetti FLL, 99) and he begins the first chapter of the monumental “Foresight” by inviting the reader to “consider the notion of probability as it is conceived by us in everyday life” (de Finetti FLL, 100).

Subjectivism in probability identifies probability with strength of belief. Hume was in this respect a subjectivist: He held that strength of belief in a proposition was the proportion of assertive force that the mind devoted to the proposition. He illustrates this with the famous example of a six-sided die (Hume THN, 127–130), four faces of which bear one mark and the other two faces of which bear another mark. If we see the die in the air, he says, we can't avoid anticipating that it will land with some face upwards, nor can we anticipate any one face landing up. In consequence the mind divides its force of anticipation equally among the faces and conflates the force directed to faces with the same mark. This is what constitutes a belief of strength 2/3 that the die will land with one mark up, and 1/3 that it will land with the other mark up.

There are three evident difficulties with this account. First is
the unsatisfactory identification of belief with mental force, whether
divided or not. It is, outside of simple cases like the symmetrical
die, not at all evident that strength of feeling is correlated with
strength of belief; some of our strongest beliefs are, as Ramsey says
(Ramsey 1931, 169), accompanied by little or no feeling. Second, even
if it is assumed that strength of feeling entails strength of belief, it
is a mystery why these strengths should be additive as Hume's example
requires. Finally, the principle according to which belief is
apportioned equally among exclusive and exhaustive alternatives is not
easy to justify. This is known as the *principle of
indifference*, and it leads to paradox if unrestricted. (See
interpretations of probability,
section 3.1.) The same situation may be partitioned into alternative
outcomes in different ways, leading to distinct partial beliefs. Thus
if a coin is to be tossed twice we may partition the outcomes as

2 Heads, 2 Tails, (Heads on 1 and Tails on 2), (Tails on 1 and Heads on 2)

which, applying the principle of indifference, yields *P*(2
Heads) = 1/4

or as

Zero Heads, One Head, Two Heads

which yields *P*(2 Heads) = 1/3.

Carnap's c-functions *c** and *c*^{†},
mentioned in section 5.1 above, provide a more substantial example:
*c*^{†} counts the state descriptions as
alternative outcomes and *c** counts the structure descriptions
as outcomes. They assign different probabilities. Indeed, the
continuum of inductive methods can be seen as a continuum of different
applications of the principle of indifference.
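The two-toss example can be checked mechanically. The sketch below assumes nothing beyond the text: it applies an equal share to each cell of whichever partition is supplied. The helper name `indifference` is hypothetical.

```python
from fractions import Fraction
from itertools import product

def indifference(partition):
    """Assign equal probability to every cell of the given partition."""
    share = Fraction(1, len(partition))
    return {cell: share for cell in partition}

# Partition 1: the four ordered outcomes of two tosses (HH, HT, TH, TT).
ordered = ["".join(seq) for seq in product("HT", repeat=2)]
p_ordered = indifference(ordered)["HH"]

# Partition 2: the number of heads (zero, one or two).
by_count = [0, 1, 2]
p_count = indifference(by_count)[2]

print(p_ordered, p_count)  # 1/4 1/3
```

The same event, two heads, receives two different values because the principle is sensitive to how the alternatives are individuated.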

These difficulties with Hume's mentalistic view of strength of belief
have led subjectivists to associate strength of belief not with
feelings but with actions, in accordance with the pragmatic principle
that the strength of a belief corresponds to the extent to which we
are prepared to act upon it. Bruno de Finetti announced that
“PROBABILITY DOES NOT EXIST!” in the beginning paragraphs
of his *Theory of Probability* (de Finetti TOP). By this he
meant to deny the existence of objective probability and to insist
that probability be understood as a set of constraints on partial
belief. In particular, strength of belief is taken to be expressed in
betting odds: If you will put up *p* dollars (where, for
example, *p* = 0.25) to receive one dollar if the event
*A* occurs and nothing (forfeiting the *p* dollars) if
*A* does not occur, then your strength of belief in *A*
is *p*. If £ is a language like that sketched above, the
sentences of which express events, then a *belief system* is
given by a function *b* that gives betting odds for every
sentence in £. Such a system is said to be *coherent* if
there is no set of bets in accordance with it on which the believer
must lose. It can be shown (this is the “Dutch Book
Theorem”) that all and only coherent belief systems satisfy the
laws of probability. (See
interpretations of probability,
section 3.5.2, and section 3 of the entry on
Bayesian epistemology
as well as the supplement to the latter on Dutch Book arguments
for comprehensive discussions.) The Dutch Book Theorem provides a
subjectivistic response to
the question of what probability has to do with partial belief; namely
that the laws of probability are minimal laws of calculative
rationality. If your partial beliefs don't conform to them then there
is a set of bets all of which you will accept and on which your gain
is negative in every possible world.
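A minimal sketch of a Dutch book, assuming a hypothetical agent whose betting quotients on *A* and on not-*A* are both 3/5, so that they sum to more than one. The names `net_gain`, `incoherent` and `coherent` are illustrative only.

```python
from fractions import Fraction

def net_gain(prices, world):
    """Net gain from buying, at the agent's own price, a bet that pays one
    dollar on each event; world maps each event to True or False."""
    return sum((1 if world[event] else 0) - price
               for event, price in prices.items())

# Exactly one of A and not-A holds in each possible world.
WORLDS = [{"A": True, "not A": False}, {"A": False, "not A": True}]

incoherent = {"A": Fraction(3, 5), "not A": Fraction(3, 5)}
coherent = {"A": Fraction(3, 5), "not A": Fraction(2, 5)}

incoherent_gains = [net_gain(incoherent, w) for w in WORLDS]
coherent_gains = [net_gain(coherent, w) for w in WORLDS]
print(incoherent_gains, coherent_gains)
```

The incoherent agent loses 1/5 of a dollar whichever way *A* turns out. The coherent quotients break even on this particular pair of bets; that alone does not establish coherence, which requires that no set of bets whatever guarantees a loss.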

As just cited the Dutch Book Theorem is unsatisfactory: It is
clear, at least since Jacob Bernoulli's *Ars Conjectandi* in
1713 that the odds at which a reasonable person will bet vary with the
size of the stake: A thaler is worth more to a pauper than to a rich
man, as Bernoulli put it. This means that in fact betting systems are
not determined by monetary odds. Subjectivists have in consequence
taken strength of belief to be given by betting odds when the stakes
are measured not in money but in utility. (See
interpretations of probability,
section 3.5.3.) Frank Ramsey was the first to do this in (Ramsey
1926, 156–198). Leonard J. Savage provided a more sophisticated
axiomatization of choice in the face of uncertainty (Savage
1954). These, and later, accounts, such as that of Richard Jeffrey
(Jeffrey LOD) still face critical difficulties, but the general
principle that associates coherent strength of belief with probability
remains a fundamental postulate of subjectivism.

#### 5.3.2 Bayesian induction

Of the five sorts of induction mentioned above (section 5.1), de
Finetti is concerned explicitly only with predictive inference, though
his account applies as well to direct and inverse inference. He
ignores analogy, and he holds that no particular premises can support
a general hypothesis. The central question of induction is, he says,
“if a prediction of frequency can be, in a certain sense,
confirmed or refuted by experience. … [O]ur explanation of
inductive reasoning is nothing else, at bottom than the knowledge of
… the probability of *E*_{n + 1}
evaluated when the result *A* of [trials]
*E*_{1}, …, *E*_{n} is
known” (de Finetti 1964, 119). That is to say that for de
Finetti, the singular predictive inference is the essential inductive
inference.

One conspicuous sort of inverse inference concerns relative
frequencies. Suppose, for example, from an urn containing balls each
of which is red or black, we are to draw (with replacement) three
balls. What should our beliefs be before drawing any balls? The
classical description of this situation is that the draws are
independent with unknown constant probability, *p*, of drawing
a red ball. (Such probabilities are known as *Bernoullian*
probabilities, recalling that Jacob Bernoulli based the law of large
numbers on them.) Since the draws are independent, the probability of
drawing a red on the second draw given a red on the first draw is

P(R_{2} | R_{1}) = P(R_{2}) = p

where *p* is an unknown probability. Notice that Bernoullian
probabilities are invariant for variations in the order of draws: If
*A*(*n*, *k*) and *B*(*n*,
*k*) are two sequences of length *n* each including just
*k* reds, then

b[A(n, k)] = b[B(n, k)] = p^{k}(1 − p)^{(n − k)}

De Finetti, and subjectivists in general, find this classical account unsatisfactory for several reasons. First, the reference to an unknown probability is, from a subjectivistic point of view, unintelligible. If probabilities are partial beliefs, then ignorance of the probability would be ignorance of my own beliefs. Secondly, it is a confusion to suppose that my beliefs change when a red ball is drawn. Induction from de Finetti's point of view is not a process for changing beliefs. Induction proceeds from reducing uncertainty in prior beliefs about certain processes.

[T]he probability of *E*_{n+1} evaluated when one comes to know the result *A* of [trials] *E*_{1}, …, *E*_{n} is not an element of an essentially novel nature (justifying the introduction of a new term, like “statistical” or “a posteriori” probability.) This probability is not independent of the “*a priori* probability” and does not replace it; it flows in fact from the same *a priori* judgment, by subtracting, so to speak, the components of doubt associated with the trials whose results have been obtained (de Finetti FLL, 119, 120).

In the important case of believing the probability of an event to be close to the observed relative frequency of events of the same sort, we learn that certain initial frequencies are ruled out. It is thus critical to understand the nature of initial uncertainty and initial dispositional beliefs, i.e., initial dispositions to wager.

De Finetti approaches the problem of inverse inference by
emphasizing a fundamental feature of our beliefs about random
processes like draws from an urn. This is that, as in the Bernoullian
case, our beliefs are invariant for sequences of the same length with
the same relative frequency of success. For each *n*
and *k* ≤ *n* our belief that there will
be *k* reds in *n* trials is the same regardless of the
order in which the reds and blacks occur. Probabilities (partial
beliefs) of this sort are
*exchangeable*.^{[1]}
If *b*(*n*, *k*) is our
prior belief that *n* trials will yield *k* reds in some
order or other then, since there are

$$\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$$

distinct sequences of length *n* with *k* reds, the
mean or average probability of *k* reds in *n* trials is
given by the prior belief divided by this quantity:

$$\frac{b(n,k)}{\binom{n}{k}}$$

and in the exchangeable case, in which sequences of the same length
and frequency of reds are equiprobable, this is the probability of
each sequence of this sort. Hence, where *b* gives prior belief
and *A*(*n*, *k*) is any given sequence including
*k* reds and *n*−*k* blacks;

$$b[A(n,k)] = \frac{b(n,k)}{\binom{n}{k}}$$

In an important class of subcases we might have specific knowledge about the constitution of the urn that can lead to further refinement of exchangeable beliefs. If, for example, we know that there are just three balls in the urn, each either red or black, then there are four exclusive hypotheses incorporating this information:

H_{0}: zero reds, three blacks

H_{1}: one red, two blacks

H_{2}: two reds, one black

H_{3}: three reds, zero blacks

Let the probabilities of these hypotheses be
*h*_{0}, *h*_{1},
*h*_{2}, *h*_{3}, respectively. Of
course in the present example

b(R_{j}|H_{0}) = 0

b(R_{j}|H_{3}) = 1

for each *j*. Now if *A*(*n*, *k*) is
any individual sequence of *k* reds and
*n*−*k* blacks, then, since the
*H*_{i} are exclusive and exhaustive
hypotheses,

b[A(n, k)] = ∑_{i} b[A(n, k) ∧ H_{i}] = ∑_{i} b[A(n, k) | H_{i}] h_{i}

In the present example each of the conditional probabilities
*b*[ · | *H*_{i}]
represents draws from an urn of known composition. These are just
Bernoullian probabilities with probability of success (red):

b(R_{j}|H_{0}) = 0

b(R_{j}|H_{1}) = 1/3

b(R_{j}|H_{2}) = 2/3

b(R_{j}|H_{3}) = 1

*b* (and this is true of exchangeable probabilities in general) is
thus *conditionally Bernoullian*. If we write

p_{i}(X) =b[X|H_{i}]

then for each sequence *A*(*n*, *k*) including
*k* reds in *n* draws,

p_{i}[A(n, k)] = p_{i}(R_{j})^{k} [1 − p_{i}(R_{j})]^{(n − k)}

we see that *b* is a mixture or weighted average of Bernoullian
probabilities where the weights, summing to one, are the
*h*_{i}.

b(X) = ∑_{i}p_{i}(X)h_{i}
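The mixture can be checked by brute force for the three-ball urn. The sketch below takes the weights *h*_{i} to be equal, which is an assumption made here only for illustration; any weights summing to one would serve. It enumerates every sequence of three draws and confirms that the mixture *b* is order-invariant and that each sequence with *k* reds has probability *b*(*n*, *k*) divided by the number of such sequences.

```python
from fractions import Fraction
from itertools import product
from math import comb

# Chance of red under H_0, ..., H_3 (i red balls out of three).
P_RED = [Fraction(i, 3) for i in range(4)]
# Illustrative assumption: equal weights h_i.
H = [Fraction(1, 4)] * 4

def bernoulli(p, seq):
    """Bernoullian probability of a 0/1 sequence (1 = red) with chance p of red."""
    k = sum(seq)
    return p**k * (1 - p)**(len(seq) - k)

def b(seq):
    """Mixture of the four Bernoullian probabilities with weights H."""
    return sum(h * bernoulli(p, seq) for h, p in zip(H, P_RED))

n = 3
for k in range(n + 1):
    seqs = [s for s in product((0, 1), repeat=n) if sum(s) == k]
    values = {b(s) for s in seqs}
    assert len(values) == 1                    # order of draws does not matter
    b_nk = sum(b(s) for s in seqs)             # b(n, k): k reds in some order
    assert values.pop() == b_nk / comb(n, k)   # probability of each sequence
```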

#### 5.3.3 The de Finetti Representation Theorem (finite case)

This is a special case of de Finetti's representation theorem. The general statement of the finite form of the theorem is:

If *b* is any exchangeable probability on finite sequences of a random phenomenon then *b* is a finite mixture of Bernoullian probabilities on those sequences.

It is easy to see that exchangeable probabilities are closed under
finite mixtures: Let *b* and *c* be exchangeable, let the weights
*m* and 1 − *m* be positive quantities summing to one, and
let

f = mb + (1 − m)c

be the mixture of *b* and *c* with weights *m*
and 1 − *m*. Then if *A* and
*B* are sequences of length *n* each of which includes just *k* reds:

mb(A) = mb(B), (1 − m)c(A) = (1 − m)c(B)

mb(A) + (1 − m)c(A) = mb(B) + (1 − m)c(B)

f(A) = f(B)

Hence since, as mentioned above, all Bernoullian probabilities are exchangeable, every finite mixture of Bernoullian probabilities is exchangeable.

To see how the representation theorem works in induction, let us
take the *H*_{i} to be equiprobable, so
*h*_{i} = 1/4 for each *i*. (We'll see
that this assumption diminishes in importance as we continue to draw
and replace balls.) Then for each *j*,

b(R_{j}) = (1/4)[(0) + (1/3) + (2/3) + 1] = 1/2

and

$$b(R_2 \mid R_1) = \frac{\tfrac{1}{4}\sum_i p_i(R_1 \wedge R_2)}{\tfrac{1}{4}\sum_i p_i(R_1)} = \frac{0 + \tfrac{1}{9} + \tfrac{4}{9} + 1}{0 + \tfrac{1}{3} + \tfrac{2}{3} + 1} = \frac{14/9}{2} = \frac{7}{9}$$

thus updating by taking account of the evidence
*R*_{1}. In this way exchangeable probabilities take
account of evidence, by, in de Finetti's phrase, “subtracting,
so to speak, the components of doubt associated with the trials whose
results have been obtained”.
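The two values just computed can be reproduced with exact fractions. This is a small check under the same assumption of equal weights *h*_{i} = 1/4; the variable names are illustrative.

```python
from fractions import Fraction

P_RED = [Fraction(i, 3) for i in range(4)]   # p_i(R) under H_0, ..., H_3
H = [Fraction(1, 4)] * 4                     # equiprobable hypotheses

# b(R_1): mixture of the chances of a single red.
b_R1 = sum(h * p for h, p in zip(H, P_RED))

# b(R_1 and R_2): draws are independent within each hypothesis.
b_R1_R2 = sum(h * p * p for h, p in zip(H, P_RED))

b_R2_given_R1 = b_R1_R2 / b_R1
print(b_R1, b_R2_given_R1)  # 1/2 7/9
```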

Notice that *R*_{1} and *R*_{2} are
not independent in *b*:

b(R_{2}) = 1/2 ≠ b(R_{2} | R_{1}) = 7/9

so *b* is not Bernoullian. Hence, though all mixtures of
Bernoullian probabilities are exchangeable, the converse does not
hold: *Bernoullian probabilities are not closed under
mixtures*, for *b* is the mixture of the Bernoullian
probabilities *p*_{i} but is not itself
Bernoullian. This reveals the power of the concept of exchangeability:
*The closure of Bernoullian probabilities under mixtures is just
the totality of exchangeable probabilities.*

We can also update beliefs about the hypotheses
*H*_{i}. By Bayes' law (See the article
Bayes' Theorem and section 5.4.1 on
likelihoods below) for each *j*:

$$b(H_j \mid R_1) = \frac{b(R_1 \mid H_j)\,h_j}{\sum_i b(R_1 \mid H_i)\,h_i}$$

so

$$b(H_0 \mid R_1) = 0$$

$$b(H_1 \mid R_1) = \frac{(1/3)(1/4)}{(1/3)(1/4) + (2/3)(1/4) + (1)(1/4)} = \frac{1/12}{(1/12) + (2/12) + (3/12)} = \frac{1/12}{1/2} = \frac{1}{6}$$

$$b(H_2 \mid R_1) = \frac{(2/3)(1/4)}{1/2} = \frac{2/12}{1/2} = \frac{1}{3}$$

$$b(H_3 \mid R_1) = \frac{(1)(1/4)}{1/2} = \frac{1}{2}$$

Thus the initial assumption of the flat or
“indifference” measure for the
*h*_{i} loses its influence as evidence
grows.
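The update of the hypotheses is a direct application of Bayes' law and can be written as a short function. The sketch assumes the three-ball urn with equal prior weights; `update` is a hypothetical helper name.

```python
from fractions import Fraction

def update(weights, likelihoods):
    """Bayes' law: reweight each hypothesis by the likelihood of the evidence."""
    joint = [h * l for h, l in zip(weights, likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]

P_RED = [Fraction(i, 3) for i in range(4)]   # b(R | H_i)
P_BLACK = [1 - p for p in P_RED]             # b(B | H_i)
prior = [Fraction(1, 4)] * 4

after_red = update(prior, P_RED)
after_red_black = update(after_red, P_BLACK)

print([str(w) for w in after_red])
print([str(w) for w in after_red_black])
```

Applying `update` once per draw gives the same result as conditioning on the whole sequence at once, because the draws are independent within each hypothesis.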

We can see de Finettian induction at work by representing the three-ball problem in a tetrahedron:

Each point in this solid represents an exchangeable measure on the
sequence of three draws. The vertices mark the pure Bernoullian
probabilities, in which full weight is given to one or another
hypothesis *H*_{i}. The indifference measure
that assigns equal probability 1/4 to each hypothesis is the center of
mass of the tetrahedron. As we draw successively (with replacement)
from the urn, updating as above, exchangeable beliefs, given by the
conditional probabilities

b[R_{(n + 1)}|A(n,k)]

move within the solid. Drawing a red on the first draw puts beliefs
before the second draw in the plane bounded by
*H*_{1}, *H*_{2}, and *H*_{3} at a point corresponding to the weights
*h*_{1} = 1/6, *h*_{2} = 1/3 and *h*_{3} = 1/2. If a black is drawn on the second draw then, conditioning on the evidence
(*R*_{1} ∧ *B*_{2}),

b(H_{0}|R_{1}∧B_{2}) = 0

By Bayes' theorem,

$$b(H_1 \mid R_1 \wedge B_2) = \frac{b(R_1 \wedge B_2 \mid H_1)\,h_1}{b(R_1 \wedge B_2 \mid H_1)\,h_1 + b(R_1 \wedge B_2 \mid H_2)\,h_2}$$

$$b(H_2 \mid R_1 \wedge B_2) = \frac{b(R_1 \wedge B_2 \mid H_2)\,h_2}{b(R_1 \wedge B_2 \mid H_1)\,h_1 + b(R_1 \wedge B_2 \mid H_2)\,h_2}$$

Now *b*( · | *H*_{1}) and
*b*( · | *H*_{2}) are Bernoullian,
so

b(R_{1}∧B_{2}|H_{1}) =b(B_{2}|H_{1})b(R_{1}|H_{1})

and

b(R_{1}∧B_{2}|H_{2}) =b(B_{2}|H_{2})b(R_{1}|H_{2})

Since *h*_{1} = *h*_{2}

$$b(H_1 \mid R_1 \wedge B_2) = \frac{(2/3)(1/3)}{(2/3)(1/3) + (1/3)(2/3)} = \frac{1}{2}$$

$$b(H_2 \mid R_1 \wedge B_2) = \frac{1}{2}$$

Beliefs are now at the midpoint of the line
connecting *H*_{1}
and *H*_{2}. Continued draws will move conditional
beliefs along this line. Suppose now that we continue to draw with
replacement, and that *A*(*n*,*k*), with
increasing *n*, is the sequence of draws. Maintaining
exchangeability and updating assures that as the number *n* of
draws increases without bound, conditional beliefs

b[R_{(n + 1)}|A(n,k)]

are practically certain to converge to one of the Bernoullian measures

b(R|H_{i})

The Bayesian method thus provides a solution to the problem of induction as de Finetti formulated it.
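The convergence can be illustrated numerically. The sketch below does not simulate random draws; it simply assumes sequences in which two thirds of the draws are red and computes the conditional belief in red on the next draw for increasing *n*. The second prior, `skewed`, is a hypothetical choice included to show that the starting weights matter less and less.

```python
from fractions import Fraction

P_RED = [Fraction(i, 3) for i in range(4)]   # chance of red under H_0, ..., H_3

def predictive(n, k, prior):
    """b[R_(n+1) | A(n, k)] for the three-ball urn with the given prior weights."""
    joint = [h * p**k * (1 - p)**(n - k) for h, p in zip(prior, P_RED)]
    total = sum(joint)
    return sum(j * p for j, p in zip(joint, P_RED)) / total

flat = [Fraction(1, 4)] * 4
skewed = [Fraction(1, 10), Fraction(6, 10), Fraction(2, 10), Fraction(1, 10)]

for n in (3, 30, 300):
    k = 2 * n // 3   # assumed observed frequency of reds: two thirds
    print(n, float(predictive(n, k, flat)), float(predictive(n, k, skewed)))
```

With both priors the conditional belief approaches 2/3, the chance of red under *H*_{2}. For actual random draws the text's qualification "practically certain" applies, since an unrepresentative run of draws is improbable but not impossible.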

#### 5.3.4 Exchangeability

We gave a definition of exchangeability: Every sequence of the same
length with the same frequency of reds has the same probability. In
fact, for given *k* and *n*, this probability is always
equal to the probability of *k* reds followed by
*n*−*k* blacks,

$$b(R_1, \ldots, R_k, B_{k+1}, \ldots, B_n) = \frac{b(n,k)}{\binom{n}{k}}$$

(where *b*(*n*, *k*) = the probability of
*k* reds in *n* trials, in some order or other) for, in the
exchangeable case, probability is invariant for permutations of
trials. There are alternative definitions: First, it follows from the
first definition that

b(R_{1}, …,R_{n}) =b(n,n)

and this condition is also sufficient for exchangeability.
Finally, if the concept of exchangeability is extended to random
variables we have that a sequence {*x*_{i}} of
random variables is exchangeable if for each *n* the mean
μ(*x*_{1}, …,
*x*_{n}) is the same for every
*x*_{1}, …,
*x*_{n}. See the
Supplement on Basic Probability.

The above urn example consists of an objective system—an urn
containing balls—that is known. Draws from such an urn are
random because the precise prediction of the outcomes is very
difficult, if not impossible, due to small perturbing causes (the
irregular agitation of the balls) not under our control. But in the
three-ball example, because there are just four possible contents,
described in the four hypotheses, the perturbations don't affect the
fact that there are just eight possible outcomes. As the number of
balls increases we add hypotheses, but the basic structure remains;
our beliefs continue to be exchangeable and the de Finetti
representation theorem assures that the probability of drawing
*k* reds in *n* trials is always expressed in a
formula

$$b(n,k) = \binom{n}{k} \sum_i h_i\, p_i(R)^{k}\,[1 - p_i(R)]^{\,n-k}$$

where the *h*_{i} give the probabilities of
the hypotheses *H*_{i}. In the simple urn
example, this representation has the very nice property that its
components match up with features of the objective urn system: Each
value of *p*_{i} corresponds to a constitution
of the urn in which the proportion of red balls is
*p*_{i}, and each
*h*_{i} is the probability of that
constitution as described in the hypothesis
*H*_{i}. Epistemically, the
*p*_{i} are, as we saw above, conditional
probabilities:

p_{i}(X) =b(X|H_{i})

that express belief in *X* given the hypothesis
*H*_{i} about the constitution.

The critical role of the objective situation in applications of exchangeability becomes clear when we reflect that, as Persi Diaconis puts it, to make use of exchangeability one must believe in it. We must believe in a foundation of stable causes (solidity, number, colors of the balls; gravity) as well as in a network of variable and accidental causes (agitation of the balls, variability in the way they are grasped). There are, in Hume's phrase, “a mixture of causes among the chances, and a conjunction of necessity in some particulars, with a total indifference in others” (Hume THN, 125f.). It is this entire objective system that supports exchangeability. The fundamental causes must be stable and constant from trial to trial. The variable and accidental causes should operate independently from trial to trial. To underscore this Diaconis gives the example of a basketball player practicing shooting baskets. Since his aim improves with continued practice, the frequency of success will increase and the trials will not be exchangeable; the fundamental causes are not stable. Indeed, de Finetti himself warns that “In general different probabilities will be assigned, depending on the order; whether it is supposed that one toss has an influence on the one which follows it immediately, or whether the exterior circumstances are supposed to vary” (de Finetti FLL, 121).

We count on the support of objective mechanisms even when we cannot
formulate even vague hypotheses about the stable causes that
constitute it. De Finetti gives the example of a bent coin, deformed
in such a way that before experimenting with it we have no idea of its
tendency to fall heads. In this case our prior beliefs are plausibly
represented by a “flat” distribution that gives equal
weight to each hypothesis, to each quantity in the [0, 1]
interval. The de Finetti theorem says that in this case the
probability of *k* heads in *n* tosses is

$$b(n,k) = \int_0^1 \binom{n}{k}\, p^{k} (1-p)^{\,n-k} f(p)\, dp$$

where *f*(*p*) gives the weights of the different
Bernoullian probabilities (hypotheses) *p*. We may remain
ignorant about the stable causes (the shape and distribution of the
mass of the coin, primarily) even after de Finetti's method applied to
continued experiments supports conditional beliefs about the strength
of the coin's tendency to fall heads. We may insist that each
Bernoullian probability, each value for *p*, corresponds to a
physical configuration of the coin, but, in sharp contrast to the urn
example, we can say little or nothing about the causes on which
exchangeability depends. We believe in exchangeability because we
believe that whatever those causes are they remain stable through the
trials while the variable causes (such as the force of the throw) do
not.
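For the flat weighting *f*(*p*) = 1 the integral has a closed form, and the bent-coin case can be worked exactly. The sketch below relies on the standard Beta integral, ∫ *p*^{k}(1 − *p*)^{(n − k)} d*p* = *k*!(*n* − *k*)!/(*n* + 1)! over [0, 1]; the function names are hypothetical.

```python
from fractions import Fraction
from math import comb, factorial

def sequence_prob(n, k):
    """Probability of one particular sequence with k heads in n tosses under
    the flat weight f(p) = 1, via the Beta integral k!(n-k)!/(n+1)!."""
    return Fraction(factorial(k) * factorial(n - k), factorial(n + 1))

def b_flat(n, k):
    """b(n, k): probability of k heads in n tosses, in some order or other."""
    return comb(n, k) * sequence_prob(n, k)

def next_head(n, k):
    """Conditional belief in heads on toss n + 1 after k heads in n tosses."""
    return sequence_prob(n + 1, k + 1) / sequence_prob(n, k)

print(b_flat(10, 3), next_head(10, 7))  # 1/11 2/3
```

Under the flat weighting every number of heads from 0 to *n* is equally probable beforehand, and the conditional belief in heads works out to (*k* + 1)/(*n* + 2), which tracks the observed frequency as *n* grows.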

#### 5.3.5 Meta-inductions

Suppose that you are drawing with replacement from an urn containing a thousand balls, each either red or black, and that you modify beliefs according to the de Finetti formula

b[R_{(n + 1)} | A(n, k)] = ∑_{i} h_{i} [b(A(n, k) | R_{j}) b(R_{j} | H_{i})]

where the *h*_{i} give the probabilities of
the updated 1001 hypotheses about the constitution of the
urn. Suppose, however, that unbeknownst to you each time a red ball is
drawn and replaced a black ball is withdrawn and replaced with a red
ball. (This is a variation of the Pólya urn in which each red ball
drawn is replaced and a second red ball added.)

Without going into the detailed calculation it is evident that your exchangeable beliefs are in this example not supported. To use exchangeability one must believe in it, and to use it correctly, one might add, that belief must be true; de Finettian induction requires a prior assumption of exchangeability.
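The failure of exchangeability can be seen without the full calculation by shrinking the example. The sketch below assumes a hypothetical urn of four balls, two red and two black, governed by the same covert rule: after each red draw a black ball is exchanged for a red one. It computes the true probability of a given sequence of draws.

```python
from fractions import Fraction

def sequence_probability(seq, reds, blacks):
    """True probability of a sequence of draws ('R' or 'B') when, after each
    red draw, one black ball is covertly exchanged for a red one."""
    prob = Fraction(1)
    for colour in seq:
        total = reds + blacks
        if colour == "R":
            prob *= Fraction(reds, total)
            if blacks > 0:   # the covert exchange
                reds, blacks = reds + 1, blacks - 1
        else:
            prob *= Fraction(blacks, total)
    return prob

p_red_then_black = sequence_probability("RB", 2, 2)
p_black_then_red = sequence_probability("BR", 2, 2)
print(p_red_then_black, p_black_then_red)
```

The two sequences contain the same number of reds but have different probabilities, so the process is not exchangeable, whatever the observer happens to believe about it.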

Obviously no sequence of reds and blacks could provide evidence for the hypothesis of exchangeability without calling it into question; exchangeability entails that any sequence in which the frequency of reds increases with time has the same probability as any of its permutations. The assumption is however contingent and ampliative and should be subject to inductive support. It is worth recalling Kant's thesis, that regularity of succession in time is the schema, the empirical manifestation, of causal connection. From this point of view, exchangeability is a precise contrary of causality, for its “schema”, its manifestation, is just the absence of regularity of succession, but with constant relative frequency of success. The hypothesis of exchangeability is just that the division of labor between the stable and the variable causes is properly enforced; that the weaker force of variable causes acting in the stable setting of fundamental causes varies order without varying frequency. In the case of gambling devices and similar mechanisms we can provide evidence that the fundamental and determining causes are stable: We can measure and weigh the balls, make sure that none are added or removed between trials, drop the dice in a glass of water, examine the mechanism of the roulette wheel. In less restricted cases—aircraft and automobile accidents, tables of mortality, consumer behavior—the evidence is much more obscure and precarious.

#### 5.3.6 Uncertain Evidence and Jeffrey's Probability Kinematics

Bayesian induction has traditionally taken inductive inference to consist in updating prior beliefs, beliefs before taking account of evidence, on the basis of that evidence. A simple rule of this sort is:

Updating:

If *E* is the evidence observed between *t*_{0} and *t*_{1}, then for each proposition *X*, *b*_{1}(*X*) = *b*_{0}(*X* | *E*)

De Finetti, as we've seen, resists distinguishing prior from posterior
beliefs. Updating for him amounts just to using *b*(*X*
| *E*) as belief in *X* after observation of *E* removes
“components of doubt” from *E*.

*Updating* in this form has the following difficulty: If *E* is
the observed evidence then *Updating* implies that
*b*_{1}(*E*) = *b*_{0}(*E*
| *E*) = 1. Evidence, that is to say, becomes certain. De
Finetti's account at least flirts with this difficulty: If doubt is
removed, then *E* becomes certain. But evidence is often if not
typically uncertain. Here is Jeffrey's example.

The agent inspects a piece of cloth by candlelight, and gets the impression that it is green, although he concedes that it might be blue or even (but very improbably) violet. If *G*, *B* and *V* are the propositions that the cloth is green, blue or violet respectively, then the outcome of the observation might be that, whereas originally his degrees of belief in *G*, *B* and *V* were .30, .30 and .40, his degrees of belief in those same propositions after his observation are .70, .25 and .05. (Jeffrey FLL, 165)

Here there seems to be no evidence *E* such
that *b*_{1}(*G*)
= *b*_{0}(*G* | *E*).

Jeffrey's resolution of this problem is to take account not only of
the support provided by the uncertain evidence *E* for a
hypothesis *H*, but also of the support for *H* provided by
¬*E*, the negation of *E*.

Jeffrey conditionalization:

*b*_{1}(*H*) = *b*_{0}(*H* | *E*)*b*_{1}(*E*) + *b*_{0}(*H* | ¬*E*)*b*_{1}(¬*E*)

Jeffrey conditionalization is a consequence of the principle that
conditional beliefs should not change from *t*_{0} to
*t*_{1}; in particular that

*b*_{1}(*H* | *E*) = *b*_{0}(*H* | *E*) and *b*_{1}(*H* | ¬*E*) = *b*_{0}(*H* | ¬*E*)

together with the truth

*b*_{1}(*H*) = *b*_{1}(*H* | *E*)*b*_{1}(*E*) + *b*_{1}(*H* | ¬*E*)*b*_{1}(¬*E*)

See (Jeffrey FLL, chapter 11) and Section 6.2 of Bayesian epistemology for more complete accounts.
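The rule can be tried out on Jeffrey's cloth example. The sketch below is only illustrative: it uses the standard generalization of the two-cell formula above from {*E*, ¬*E*} to a finite partition (here *G*, *B*, *V*), and the hypothesis *H* together with the conditional beliefs `b0_H_given` are hypothetical values chosen for the example, not taken from the passage quoted above.

```python
def jeffrey_update(cond_belief, new_weights):
    """Return b1(H) = sum over cells E_i of b0(H | E_i) * b1(E_i).

    cond_belief[cell] is the unchanged conditional belief b0(H | E_i);
    new_weights[cell] is the new belief b1(E_i) in that cell.
    """
    if abs(sum(new_weights.values()) - 1.0) > 1e-9:
        raise ValueError("beliefs over the partition must sum to 1")
    return sum(cond_belief[cell] * w for cell, w in new_weights.items())


# Hypothetical conditional beliefs in some hypothesis H given each colour.
b0_H_given = {"G": 0.4, "B": 0.4, "V": 0.8}

# Beliefs in G, B, V before and after the candlelight observation.
b0_colour = {"G": 0.30, "B": 0.30, "V": 0.40}
b1_colour = {"G": 0.70, "B": 0.25, "V": 0.05}

prior_H = jeffrey_update(b0_H_given, b0_colour)
posterior_H = jeffrey_update(b0_H_given, b1_colour)
print(round(prior_H, 2), round(posterior_H, 2))
```

With these assumed numbers the belief in *H* moves from 0.56 to 0.42. Setting the new belief in one cell to 1 recovers *Updating* as a special case.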

### 5.4 Testing statistical hypotheses

A *statistical hypothesis* states the distribution of some
random variable. (See the supplementary document
Basic Probability
for a brief description of random variables.) The support of
statistical hypotheses is thus an important sort of inductive
inference, a sort of inverse inference. In a wide class of cases the
problem of induction amounts to the problem of formulating good
conditions for accepting and rejecting statistical hypotheses. Two
specific approaches to this question are briefly surveyed here: the
method of likelihood ratios and that of Neyman-Pearson
statistics. Likelihood can be given short shrift since it is treated
in depth and detail in the article on
inductive logic.
General methodological questions about sampling and the separation of
effects are ignored here. What follows are brief descriptions of the
inferential structures.

Logical, frequentist, and subjectivistic views of induction presuppose specific accounts of probability. Accounts of hypothesis testing on the other hand do not typically include specific theories of probability. They presume objective probabilities but they depend only upon the commonly accepted laws of probability and upon classical principles relating probabilities and frequencies.

#### 5.4.1 Likelihood ratios and the law of likelihood

If *h* is a hypothesis and *e* an evidence statement
then the *likelihood of h relative to e* is just the probability
of *e* conditional upon *h*:

*L*(*h* | *e*) = *P*(*e* | *h*)

Likelihoods are in some cases objective. If the hypothesis implies
the evidence then it follows from the laws of probability that the
likelihood *L*(*h* | *e*) is one. Even when not
completely objective, likelihoods tend to be less relative than the
corresponding confirmation values: If we draw a red ball from an urn
of unknown constitution, we may have no very good idea of the extent
to which this evidence confirms the hypothesis that 2/3 of the balls
in the urn are red, but we don't doubt that the probability of drawing
a red ball given the hypothesis is 2/3. (See
inductive logic, section 3.1.)

Isolated likelihoods are not good indicators of inductive support;
*e* may be highly probable given *h* without confirming
*h*. (If *h* implies *e*, for example, then the
likelihood of *h* relative to *e* is 1, but
*P*(*h* | *e*) may be very small.) Likelihood is
however valuable as a method of comparing hypotheses: The
*likelihood ratio* of hypotheses *g* and *h*
relative to the same evidence *e* is the quotient

*L*(*g* | *e*) / *L*(*h* | *e*)

Likelihood ratios may have any value from zero to infinity
inclusive. The *law of likelihood* says roughly
that *if* *L*(*g* | *e*)
> *L*(*h* | *e*) *then* *e*
*supports* *g* *better than it
does* *h*. (See section 3.2 of the article on
inductive logic
for a more precise formulation.)

The very general intuition supporting the method of likelihood ratios is just inference to the best explanation; accept that hypothesis among alternatives that best accounts for the evidence. Likelihoods figure importantly in Bayesian inverse inference.
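A small numerical sketch may help fix ideas. It assumes an urn sampled with replacement, so that the likelihood of a hypothesis about the proportion of red balls is a binomial probability; the two hypotheses (2/3 red and 1/3 red) and the evidence (7 reds in 10 draws) are illustrative choices, not part of the text.

```python
from math import comb


def likelihood(p_red, reds, draws):
    """L(h | e) = P(e | h): chance of `reds` red balls in `draws` draws
    with replacement, if a fraction `p_red` of the balls is red."""
    return comb(draws, reds) * p_red**reds * (1 - p_red) ** (draws - reds)


def likelihood_ratio(p_g, p_h, reds, draws):
    """L(g | e) / L(h | e); infinite if only h gives e probability zero."""
    num = likelihood(p_g, reds, draws)
    den = likelihood(p_h, reds, draws)
    if den == 0:
        if num == 0:
            raise ValueError("both hypotheses give the evidence probability 0")
        return float("inf")
    return num / den


# g: 2/3 of the balls are red; h: 1/3 are red; e: 7 reds in 10 draws.
ratio = likelihood_ratio(2 / 3, 1 / 3, reds=7, draws=10)
print(ratio)
```

Here the binomial coefficients cancel and the ratio is 2^{7} / 2^{3} = 16, so by the law of likelihood this evidence supports *g* better than it does *h*.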

#### 5.4.2 Significance tests

Likelihood ratios are a way of comparing competing statistical hypotheses. A second way to do this consists of precisely defined statistical tests. One simple sort of test is common in testing medications: A large sample of people with a disease is treated with a medication. There are then two contradictory hypotheses to be evaluated in the light of the results:

*h*_{0}: The medication has no effect. (This is the *null hypothesis*.)

*h*_{1}: The medication has some curative effect. (This is the *alternative hypothesis*.)

Suppose that the known probability of a spontaneous cure, in an
untreated patient, is *p*_{c}, that the sample
of treated patients has *n* members, and that the number of
cures in the sample is
*k*_{e}. Suppose further that sampling has
been suitably randomized so that the sample of *n* members
(before treatment) has the structure of *n* draws without
replacement from a large population. If the diseased population is
very large in comparison with the size *n* of the sample, then
draws without replacement are approximated by draws with replacement
and the sample can be treated as a collection of independent and
equiprobable trials. In this case, if *C* is a group
of *n* untreated patients, for each *k* between zero
and *n* inclusive the probability of *k* cures
in *C* is given by the binomial formula:

$$P(k \text{ cures in } C) = b(n, k, p_c) = \binom{n}{k}\, p_c^{k} (1 - p_c)^{n-k}$$

If the null hypothesis, *h*_{0}, is true we should
expect the probability of *k* cures in the sample to be the
same:

$$P(k \text{ cures in the sample} \mid h_0) = P(k \text{ cures in } C) = b(n, k, p_c) = \binom{n}{k}\, p_c^{k} (1 - p_c)^{n-k}$$

Let *k*_{c} =
*p*_{c}*n*. This is the expected number
of spontaneous cures in *n* untreated patients. If
*h*_{0} is true and the medication has no effect,
*k*_{e} (the number of cures in the medicated
sample) should be close to *k*_{c} and the
difference

*k*_{e} − *k*_{c}

(known as the *observed distance*) should be small. As
*k* varies from zero to *n* the random variable

*k* − *k*_{c}

takes on values from −*k*_{c} to
*n* − *k*_{c} with
probabilities

*b*(*n*, 0, *p*_{c}), *b*(*n*, 1, *p*_{c}), …, *b*(*n*, *n*, *p*_{c})

This binomial distribution has its mean at *k* =
*k*_{c}, and this is also the point at which
*b*(*n*, *k*, *p*_{c})
reaches its maximum. A histogram would look something like this.

[Figure: Distribution of *k* − *k*_{c}]

Given *p*_{c} and *n*, this
distribution gives the probability that the observed distance has the
different possible sizes between its minimum,
−*k*_{c}, and its maximum at
*n* − *k*_{c}; probabilities of the
different values of *k* − *k*_{c}
are on the abscissa. *The significance level of the test
is the probability given h*_{0} *of a distance at least as large as the
observed distance*.

A high significance level means that the observed distance is
relatively small and that a difference of this size could easily be
due to chance, i.e. that the data are compatible with the probability
of a cure given medication being the same as the probability of a
spontaneous, unmedicated, cure. In
specifying the test an upper limit for the significance level is
set. If the significance level exceeds this limit, then the result of
the test is confirmation of the null hypothesis. Thus if a low limit
is set (limits on significance levels are typically 0.01 or 0.05,
depending upon cost of a mistake) it is easier to confirm the null
hypothesis and not to accept the alternative hypothesis. *Caeteris
paribus*, the lower the limit the more severe the test; the more
likely it is that *P*(cure | medication) is close to
*p*_{e} = *k*_{e} /
*n*.
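The significance level defined above can be computed directly from the binomial formula. The following is a minimal sketch under two assumptions not stated in the text: the test is read as one-sided (only distances in the direction of more cures count, since *h*_{1} asserts a curative effect), and the numbers used (*n* = 20, *p*_{c} = 0.5, *k*_{e} = 15) are hypothetical.

```python
from math import comb


def binom_pmf(n, k, p):
    """b(n, k, p): probability of exactly k cures among n patients."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def significance_level(n, k_e, p_c):
    """P(k - k_c >= k_e - k_c | h0), i.e. P(k >= k_e | h0)."""
    return sum(binom_pmf(n, k, p_c) for k in range(k_e, n + 1))


# Hypothetical trial: 20 treated patients, spontaneous cure rate 0.5,
# 15 cures observed (expected number k_c = 10 under h0).
level = significance_level(n=20, k_e=15, p_c=0.5)
limit = 0.05
print(level, level > limit)
```

In this example the level is 21700 / 2^{20}, about 0.021, which does not exceed a limit of 0.05, so on the rule just stated the null hypothesis is not confirmed.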

This is not the place for an extended methodological discussion,
but one simple principle, obvious upon brief reflection, should be
mentioned. This is that the size *n* of the sample must be
fixed in advance. Else a persistent researcher could, with arbitrarily
high probability, obtain any ratio *p*_{e} =
*k*_{e} / *n* and hence any observed
difference *k*_{e} −
*k*_{c} desired; for, in the case of Bernoulli
trials, for any frequency *p* the probability that at some
*n* the frequency of cures will be *p* is arbitrarily
close to one.

#### 5.4.3 Power, size, and the Neyman-Pearson lemma

If *h* is any statistical hypothesis a test of *h*
can go wrong in either of two ways: *h* may be rejected though
true—this is known as a *type I error*; or it may be
accepted though false—this is a *type II* error.

If *f* is a (one-dimensional) random variable that takes on
values in some interval of the real line with definite probabilities
and *h* is a statistical hypothesis that determines a
probability distribution over the values of *f*, then a
*pure* *statistical test* of *h* specifies an
experiment that will yield a value for *f* and specifies also a
region of values of *f*—*the rejection region* of
the test. If the result of the experiment is in the rejection region,
then the hypothesis is rejected. If the result is not in the rejection
region, the hypothesis is not rejected. A *mixed statistical
test* of a hypothesis *h* includes a pure test but in
addition divides the results not in the rejection region into two
sub-regions. If the result is in the first of these regions the
hypothesis is not rejected. If the result is in the second sub-region
a further random experiment, completely independent of the first
experiment, but with known prior probability of success, is
performed. This might be, for example, drawing a ball from an urn of
known constitution. If the outcome of the random experiment is
success, then the hypothesis is not rejected, otherwise it is
rejected. Hypotheses that are not rejected may not be accepted, but
may be tested further. This way of looking at testing is quite in the
spirit of Popper. Recall his remark that

The best we can say of a hypothesis is that up to now it has been able to show its worth, and that it has been more successful than other hypotheses although, in principle, it can never be justified, verified, or even shown to be probable. This appraisal of the hypothesis relies solely upon deductive consequences (predictions) which may be drawn from the hypothesis … (Popper LSD, 315)

A hypothesis that undergoes successive and varied statistical tests shows its worth in this way. Popper would not call this process “induction”, but statistical tests are now commonly taken to be a sort of induction.

Given a statistical test of a hypothesis *h* two critical
probabilities determine the merit of the test. The *size* of
the test is the probability of a type *I* error; the
probability that the hypothesis will be rejected though true; and the
*power* of the test is the chance of rejecting *h* if it
is false. A good test will have small size and large power.

size = Prob(reject *h* | *h* is true)

power = Prob(reject *h* | *h* is false)

The *Fundamental Lemma of Neyman-Pearson* asserts that
**for any statistical hypothesis and any given size, there is a unique
test of maximum power** (known as a *best test* of that
size). The best test may be a mixed test, and this is sometimes said
to be counterintuitive: A mixed test (tossing a coin, drawing a ball
from an urn) may, as Mayo puts it, “even be irrelevant to the
hypothesis of interest” (Mayo 1996, 390). Mixed tests bear an
uncomfortable resemblance to consulting tea leaves. Indeed, recent
exponents of the Neyman-Pearson approach favor versions of the theory
that do not depend on mixed tests (Mayo 1996, 390 n.).
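Size and power can be made concrete for a pure test in the binomial setting of section 5.4.2. The sketch below makes several assumptions for illustration only: the rejection region has the form *k* ≥ *c*, the hypothesis tested is that the cure probability is 0.5, and power is computed against a single hypothetical alternative (cure probability 0.8), since power is only defined once a specific alternative is fixed.

```python
from math import comb


def upper_tail(n, c, p):
    """P(k >= c) when k is binomial with parameters n and p."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(c, n + 1))


def smallest_cutoff(n, p_null, max_size):
    """Smallest c such that the rejection region {k >= c} has size <= max_size."""
    for c in range(n + 2):  # c = n + 1 is the empty region, of size 0
        if upper_tail(n, c, p_null) <= max_size:
            return c


n, p_null, p_alt = 20, 0.5, 0.8          # hypothetical values
cutoff = smallest_cutoff(n, p_null, max_size=0.05)
size = upper_tail(n, cutoff, p_null)     # chance of rejecting h though true
power = upper_tail(n, cutoff, p_alt)     # chance of rejecting h if p_alt holds
print(cutoff, size, power)
```

The size actually attained (about 0.021 at cutoff 15) falls short of the 0.05 allowed because *k* is discrete; closing that gap is what the auxiliary random experiment of a mixed test is for.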

### 5.5 Formal learning theory

Formal learning theory formulates the problem of induction in general terms as the question of how an agent should use empirical data to confirm and reject hypotheses about the world. In specific instances the theory sets goals of inquiry and compares methods for pursuing those goals. (See the early parts of the entry on the topic for an introduction to the approach. The comparison of the methods embodied in the hypotheses ‘All emeralds are green’ and ‘All emeralds are grue’ is a striking example that reveals the basic workings of the theory.)

Formal learning theory, like many other inductive methods, seeks deductive proof of the reliability of chosen inductive methods. (This effort is discussed in section 8.3 below.)

See (Suppes 1998) for a brief, critical and laudatory appraisal of the theory.

## 6. Induction, Values, and Evaluation

### 6.1 Pragmatism: induction as practical reason

In 1953 Richard Rudner published “The Scientist *qua*
Scientist Makes Value Judgments” in which he argued for the
thesis expressed in its title. Rudner's argument was simple and can be
sketched in the framework of the Neyman-Pearson model of hypothesis
testing: “[S]ince no hypothesis is ever completely verified, in
accepting a hypothesis the scientist must make the decision that the
evidence is *sufficiently* strong or that the probability is
*sufficiently* high to warrant the acceptance of the
hypothesis” (Rudner 1953, 2). Sufficiency in such a decision
will and should depend upon the importance of getting it right or
wrong. Tests of hypotheses about drug toxicity may and should have
smaller size and larger power than those about the quality of a
“lot of machine stamped belt buckles”. The argument is not
restricted to scientific inductions; it shows as well that our
everyday inferences depend inevitably upon value judgments; how much
evidence one collects depends upon the importance of the consequences
of the decision.

Isaac Levi in responding to Rudner's claim, and to later formulations
of it, distinguished cognitive values from other sorts of values;
moral, aesthetic, and so on (Levi 1986, 43–46). Of course the
scientist *qua* scientist, that is to say in his scientific
activity, makes judgments and commitments of cognitive value, but he
need not, and in many instances should not, allow other sorts of
values (fame, riches) to weigh upon his scientific inductions.

What is in question is the separation of practical reason from theoretical reason. Rudner denies the distinction; Levi does too, but distinguishes practical reason with cognitive ends from other sorts. Recent pragmatic accounts of inductive reasoning are even more radical. Following (Ramsey 1926) and (Savage 1954) they subsume inductive reasoning under practical reason; reason that aims at and ends in action. These and their successors, such as (Jeffrey LOD), define partial belief on the basis of preferences; preferences among possible worlds for Ramsey, among acts for Savage, and among propositions for Jeffrey. (See section 3.5 of interpretations of probability). Preferences are in each case highly structured. In all cases beliefs as such are theoretical entities, implicitly defined by more elaborate versions of the pragmatic principle that agents (or reasonable agents) act (or should act) in ways they believe will satisfy their desires: If we observe the actions and know the desires (preferences) we can then interpolate the beliefs. In any given case the actions and desires will fit distinct, even radically distinct, beliefs, but knowing more desires and observing more actions should, by clever design, let us narrow the candidates.

In all these theories the problem of induction is a problem of
decision, in which the question is which action to take, or which
wager to accept. The pragmatic principle is given a precise
formulation in the injunction to act so as to maximize expected
utility, to perform that action, *A*_{i} among
the possible alternatives, that maximizes

*U*(*A*_{i}) = ∑_{j} *P*(*S*_{j} | *A*_{i})*U*(*S*_{j}∧*A*_{i})

where the *S*_{j} are the possible
consequences of the acts *A*_{i}, and
*U* gives the utility of its argument.
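As an illustration of the injunction, here is a minimal sketch of expected-utility maximization over a finite set of acts and consequences. The umbrella decision, the probabilities and the utilities are all hypothetical; in this example the probabilities happen not to depend on the act, though the formula allows that they may.

```python
def expected_utility(act, outcomes, prob, util):
    """U(A) = sum over j of P(S_j | A) * U(S_j and A)."""
    return sum(prob[(s, act)] * util[(s, act)] for s in outcomes)


def best_act(acts, outcomes, prob, util):
    """The act with maximal expected utility (first one on ties)."""
    return max(acts, key=lambda a: expected_utility(a, outcomes, prob, util))


acts = ["take umbrella", "leave umbrella"]
outcomes = ["rain", "dry"]
# Hypothetical P(S_j | A_i): the weather does not depend on the act here.
prob = {(s, a): p for a in acts for s, p in (("rain", 0.3), ("dry", 0.7))}
# Hypothetical utilities U(S_j and A_i).
util = {
    ("rain", "take umbrella"): 5, ("dry", "take umbrella"): 7,
    ("rain", "leave umbrella"): 0, ("dry", "leave umbrella"): 10,
}

choice = best_act(acts, outcomes, prob, util)
print(choice, expected_utility(choice, outcomes, prob, util))
```

With these numbers taking the umbrella has expected utility 6.4 and leaving it 7.0, so the principle recommends leaving it.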

### 6.2 On the value of evidence

One significant advantage of this development is that the cost of
gathering more information, of adding to the evidence for an inductive
inference, can be factored into the decision. Put very roughly, the
leading idea is to look at gathering evidence as an action on its
own. Suppose that you are facing a decision among acts
*A*_{i}, and that you are concerned only about
the occurrence or non-occurrence of a consequence *S*. The
principle of utility maximization directs you to choose that act
*A*_{i} that maximizes

*U*(*A*_{i}) = ∑_{j} *P*(*S*_{j} | *A*_{i})*U*(*S*_{j}∧*A*_{i})

where the *S*_{j} are the possible
consequences of the acts *A*_{i} and *U* represents utility.

Suppose further that you have the possibility of investigating to see
if evidence *E*, for or against *S*, obtains. Assume
further that this investigation is cost-free. Then should you
investigate and find *E* to be true, utility maximization would
direct you to choose that act *A*_{i} that
maximizes utility when your beliefs are conditioned on *E*:

*U*_{E}(*A*_{i}) = *P*(*S* | *E*∧*A*_{i})*U*(*S*∧*E*∧*A*_{i}) + *P*(¬*S* | *E*∧*A*_{i})*U*(¬*S*∧*E*∧*A*_{i})

And if you investigate and find *E* to be false, the same
principle directs you to choose *A*_{i} to
maximize utility when your beliefs are conditioned on
¬*E*:

*U*_{¬}_{E}(*A*_{i})
= *P*(*S* | ¬*E*∧*A*_{i})*U*(*S*∧¬*E*∧*A*_{i}) + *P*(¬*S* |
¬*E*∧*A*_{i})*U*(¬*S*∧¬*E*∧*A*_{i})

Hence if your prior strength of belief in the evidence *E* is
*P*(*E*), you should choose
to maximize the weighted average

*P*(*E*)*U*_{E}(*A*_{i}) + *P*(¬*E*)*U*_{¬E}(*A*_{i})

and if the maximum of this weighted average exceeds the maximum
of *U*(*A*_{i}) then you should
investigate. About this several brief remarks:

- Notice that the utility of investigation depends upon your beliefs about your future beliefs and desires, namely that you believe now that following the investigation you will maximize utility and update your beliefs.
- Investigation in the actual world is normally not cost-free. It takes time, trouble, sometimes money, and is sometimes dangerous. A general theory of epistemic utility should consider these factors.
- I. J. Good (Good 1967) proved that in the cost-free case *U*(*A*_{i}) can never exceed *U*_{E}(*A*_{i}) and that when the utilities of outcomes are distinct the latter always exceeds the former (Skyrms 1990, chapter 4).
- The question of bad evidence is critical. The evidence gathered might take you further from the truth. (Think of drawing a succession of red balls from an urn containing predominantly blacks.)
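The comparison behind Good's result can be sketched numerically. The reading assumed here is the usual one, on which the act is chosen after the evidence is in, so that the value of a cost-free investigation is the weighted average of the best conditional utilities. It is further assumed, for simplicity, that neither *S* nor *E* depends on the act and that utilities depend only on *S* and the act. All numbers are hypothetical.

```python
def utility(p_s, act, util):
    """P(S) U(S and A) + P(not S) U(not S and A) for a single act A."""
    return p_s * util[(True, act)] + (1 - p_s) * util[(False, act)]


def value_without_investigation(p_s, acts, util):
    """Best expected utility when acting on the prior belief in S."""
    return max(utility(p_s, a, util) for a in acts)


def value_with_investigation(p_e, p_s_given_e, p_s_given_not_e, acts, util):
    """Weighted average of the best act given E and the best act given not E."""
    best_if_e = max(utility(p_s_given_e, a, util) for a in acts)
    best_if_not_e = max(utility(p_s_given_not_e, a, util) for a in acts)
    return p_e * best_if_e + (1 - p_e) * best_if_not_e


acts = ["take umbrella", "leave umbrella"]
util = {  # hypothetical utilities; True means S (rain) occurs
    (True, "take umbrella"): 5, (False, "take umbrella"): 7,
    (True, "leave umbrella"): 0, (False, "leave umbrella"): 10,
}
p_e, p_s_given_e, p_s_given_not_e = 0.4, 0.6, 0.1   # E: a forecast of rain
p_s = p_e * p_s_given_e + (1 - p_e) * p_s_given_not_e   # prior P(S) = 0.3

without = value_without_investigation(p_s, acts, util)
with_evidence = value_with_investigation(p_e, p_s_given_e, p_s_given_not_e, acts, util)
print(without, with_evidence)
```

Here acting on the prior is worth 7.0 while investigating first is worth 7.72. The gain arises because the best act differs according to whether *E* or ¬*E* is found.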

### 6.3 Predictions

#### 6.3.1 A thesis about induction and probability

This thesis has two interlocking parts: Part one is announced in the title of a paper that supports it: “Why Probability Does Not Capture the Logic of Scientific Justification” (Kelly and Glymour 2004). Part two concerns relations between computation and induction of a particular sort; the claim is that inductive inference of this sort is better understood as structured like computation than in terms of probabilities defined on Boolean (or sigma) algebras (Kelly and Schulte 1995). We look first at part two.

The sort of inference in question postulates a large finite or denumerable sequence of outcomes or trials coded as natural numbers. In the simplest case this will be a sequence of zeros and ones. (1 = green, 0 = not green, for example, where the sequence represents draws without replacement from a large collection of emeralds.) There may also be a special sign (!) to mark the end of the inquiry. The hypothesis tested by such a sequence may in the simplest case be obvious. (All emeralds are green, or some emeralds are not green, for example.) The parallels with computation are evident; a sequence of natural numbers might also be the output of a Turing machine.

Here is how Kelly and Schulte (Kelly and Schulte 1995) situate the discussion.

One intuitive distinction between algorithms and inductive methods is that the former confer certainty whereas the latter do not. This certainty derives from two factors (1) a logical guarantee that the algorithm will produce the right answer on each input in a specified class, and (2) the fact that the algorithm halts, thereby signalling to the user in an unambiguous way what its output is. Hume's problem might be expressed by saying that there is no procedure for inductive inference that has both properties. … The standard response to this difficulty is to exempt inductive inference from condition (1). (Kelly and Schulte 1995, 3)

Freed from the constraint of certainty — condition (1) — induction “looks very different from the theory of computability:” its main concern is to manage uncertainty, mostly by means of probability. This is in sharp contrast to the theory of computability, clearly and traditionally involved with the non-probabilistic methods of modern logic.

Kelly and Schulte propose a different resolution:

[R]elax condition (2) without relaxing condition (1), so that an inductive method is guaranteed to converge to the right answer, but need not inform the user when it has done so. (Kelly and Schulte 1995, 3)

(See the entry Formal Learning Theory where convergence is treated in some detail.)

When one compares induction and computation with respect to their
truth conditions the contrast between them is sharp and deep; the
conjectures of computation are necessarily true or necessarily false,
while inductive conjectures are contingently true or contingently
false. The Kelly–Schulte proposal, on the other hand, invites
the comparison of induction and computation in the experience of the
judging subject or phenomenologically. From this point of view they
have quite similar structures; computational uncertainty doesn't feel
that different from inductive uncertainty and the methodological
parallels between computation and induction are clear. This
conceptual shift, subsuming inductive inference under the regime of
calculation, invites and supports an analogous revision in vocabulary:
An inductive inference is *verifiable* if the hypothesis if
true will be revealed as such at some time. So, *some emeralds are
not green* is verifiable: if it is true then at some stage the
coding sequence will include a zero. An inference
is *refutable* if the hypothesis if false will be revealed as
such at some time. So, *all emeralds are green* is refutable.
An inference is *decidable* if both verifiable and refutable.
So that *the first emerald to be examined will be green* is
decidable. These are just the fundamental categories of computability
for arithmetical functions, now applied to inductive inferences as
well.

A second dimension that orders both realms — inductions and
computations — classifies methods (both inductive and
computational) in terms of the ease and rapidity with which they
converge to solutions. A system (the application of a method to a
coding sequence and a hypothesis) converges *with certainty* if
it announces by a sign when the hypothesis in question is
confirmed. The output !, mentioned above, might function in this way.
A sequence converges to an output *n* *in the limit* if
after some trial *k* every later trial yields *n*. And a
sequence converges *gradually* to *n* if after some trial
*k* every output is within a fixed and small rational distance
of *n*.
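These notions can be illustrated with a toy method for the refutable hypothesis *all emeralds are green*. The sketch is an illustration under simplifying assumptions, not a construction from Kelly and Schulte: the data are a finite prefix of the coding sequence (1 = green, 0 = not green), and the function names are hypothetical.

```python
def conjectures_all_green(data):
    """Conjecture after each trial about 'all emeralds are green'.

    The method says True until a 0 (a non-green emerald) appears and
    False from then on; it never announces that it has finished.
    """
    conjectures, refuted = [], False
    for x in data:
        refuted = refuted or x == 0
        conjectures.append(not refuted)
    return conjectures


def stabilization_point(outputs):
    """Index from which the outputs no longer change within this prefix."""
    if not outputs:
        raise ValueError("no outputs to inspect")
    k = len(outputs) - 1
    while k > 0 and outputs[k - 1] == outputs[-1]:
        k -= 1
    return k


outputs = conjectures_all_green([1, 1, 1, 0, 1, 1])
print(outputs, stabilization_point(outputs))
```

If the hypothesis is false the method converges to False at the first zero; if it is true the method outputs True throughout. A finite prefix of ones never certifies that no zero is coming, which is why such a method converges in the limit without signalling that it has done so.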

It is not the least advantage of this schema that it supports parallel orderings of inductive and computational methods in the dimensions of induction and computation. There are nine stages in the two dimensions of verifiability and convergence, ranging from decidable methods that converge with certainty to refutable methods that converge gradually. Each stage applies equally to empirical and to purely computational problems. Further, these stages correspond to the Kleene arithmetical hierarchy of functions (in the computational case) and to the Borel hierarchy of sets (in the empirical case).

As concerns the first part of the thesis — the inadequacy of
probabilistic accounts of scientific justification — there are
first some oft cited difficulties. One of these is the apparent
impossibility of conditionalizing on contingent propositions of
probability zero. The denumerable additivity of probability in the
continuous case raises this in a pointed and critical
way.^{[2]}
A second well known difficulty with Bayesian accounts is the
requirement of logical transparency or omniscience. These accounts
either identify probability with strength of partial belief or at
least require that partial belief conform to the laws of probability.
If belief is defined on the sentences of a structured language, as in
interpretations of probability and sections 5.1–5.4 above, then the
probability of every necessarily true sentence must be one. The
concept of necessity at work is typically left unspecified, but it is
difficult to avoid the consequence that if *A*
implies *B* then the probability of *A* cannot exceed
that of *B*. That is to say that strength of belief is
non-decreasing through logical entailment. And this, it is argued,
is unrealistic.

The major argument of (Kelly and Glymour 2004) in support of the
principle that probability cannot provide an adequate account of
scientific justification extends the latter difficulty. Any good
account of scientific reason, they say, must classify and account for
the complexity and difficulty of inductive inference. The above
ordering of methods by verifiability and convergence gives an at least
preliminary classification. Bayesian methods on the other hand are
incapable of accounting for complexity and the interplay of conjecture
and refutation; logical omniscience runs roughshod over the critical
distinctions. The probability *P*(*h* | *e*) of a given
hypothesis *h* conditional on changing evidence *e* may
fluctuate from close to zero to close to one as *e*
accumulates. This violates the central principle of convergence: A
conjecture if false will be rejected at some stage, and if true will
never be rejected.

The Kelly–Schulte claim cited above: “[R]elax condition (2) without relaxing condition (1), so that an inductive method is guaranteed to converge to the right answer, but need not inform the user when it has done so,” asserts one of the principal desiderata of formal learning theory. Indeed (Kelly and Schulte 1995), together with (Kelly and Glymour 2004) can well be read as motivating prolegomena to formal learning theory.

#### 6.3.2 Prediction Games

Yet another sort of meta-induction involves *prediction
games*. In the simplest case, as in formal learning theory, the
data consist in a sequence *x* = *x*(1), *x*(2),
… of zeros and ones, understood as coding the outcomes or
trials of a process. There are also given at each trial the
predictions of each of a group of experts,
*e*_{1}, … , *e*_{k} for
the next trial. *The meta-inductivist*,
*M*, after trial *n* knows the outcomes *x*(1),
… *x*(*n*) and knows also the past predictions
and the prediction for trial *n* + 1 of each expert. On the
basis of this information
*M* tries to predict *x*(*n* + 1), the outcome of
the next trial. *The problem of induction in this setting is to
find a good general method for such predictions.*

Initially the talk of experts is
just a picturesque way of referring to data streams or sequences,
though at a later stage the experts may be thought of as embodiments
of theories that issue predictions given inputs. The approach is
generalized (briefly discussed below) to treat sequences of values of
a real-valued variable. The exposition here follows the approach of
Gerhard Schurz in a recent article (Schurz
2008)^{[3]}
which makes use of results of (Cesa-Bianchi
and Lugosi 2006). Schurz's account also lends itself to computer
simulations of meta-inductive methods.

##### Learning from the experts

One method for amalgamating the views of the experts is to follow
the majority of them. A famous classical theorem — the
*Condorcet Jury Theorem* — supports this. The import of
the theorem can be expressed as follows. (See Black 1963 for a clear
and accessible proof.)

*Suppose that a group of people each expresses a yes-no opinion
about the same matter of fact, that they reach and express these
opinions independently, and that each has better than 0.5 probability
of being right. Then as the size of the group increases without bound
the probability that a majority will be right approaches one.*

(The condition can be weakened; probabilities need not uniformly exceed 0.5. The theorem also applies to quantitative estimates in which more than two values are in question.) To see why the theorem holds, consider a very simple special case in which everyone has exactly 2/3 probability of being right. Amalgamating the opinions then corresponds to drawing once from each urn in a collection in which each urn contains two red (true) balls and one black (false) ball. The weak law of large numbers entails that as the number of urns, and hence draws, increases without bound the probability that the relative frequency of reds (or true opinions) drawn differs from 2/3 by more than a fixed small quantity approaches zero. (See the supplementary document Basic probability.) This also underscores the importance of the diversity requirement; if the experts all reached the same conclusion on the basis of the same sources, however independently, the conclusion would be no better supported than that reached by any one of them. And, of course, the requirement that the probabilities, or a sufficient number of them, exceed 0.5 is critical: If these probabilities are all less than 0.5 the theorem implies that a majority will be wrong in the limit.
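The special case just described can be checked by direct calculation. The sketch below assumes an odd number of voters (so that there are no ties) who are independent and equally competent; the group sizes are arbitrary illustrative choices.

```python
from math import comb


def majority_correct(n, p):
    """Probability that more than half of n independent voters are right,
    each being right with probability p (n is assumed odd)."""
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n // 2 + 1, n + 1)
    )


sizes = (1, 3, 11, 101)
competent = [majority_correct(n, 2 / 3) for n in sizes]
incompetent = [majority_correct(n, 1 / 3) for n in sizes]
for n, good, bad in zip(sizes, competent, incompetent):
    print(n, round(good, 4), round(bad, 4))
```

With competence 2/3 the probability of a correct majority rises from 2/3 for one voter and 20/27 for three towards one as the group grows, and with competence 1/3 it falls towards zero.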

The Condorcet Jury Theorem, as interesting as it is, can apply in
only very few and special cases of prediction. What one wants are
methods for finding the best performing experts with no assumptions
about their competence or intentions; some experts may be malign
deceivers, some may be simply ignorant. Of course no meta-method can
create knowledge where none exists; an *optimal* meta-method is
one that finds the best performing expert.

A very simple method for exploiting the predictions of the experts is to ignore all subsequent predictions of any expert who predicts wrongly at any trial and to predict the outcome predicted by a majority of the so far infallible experts for the next trial. (If the infallible experts are evenly divided, just flip a coin.)

How good is this method? Suppose, to simplify even more, that we know
that one of the experts — we don't know which one or ones
— is infallible. This expert predicts correctly at every
trial. Now a simple argument (Cesa-Bianchi and Lugosi 2006, 4)
establishes an upper bound on the number of mistakes the majority
method can make as a function of the number *s* of experts: The
essential principle is just that each time the method makes a mistake,
at least half of the so far infallible experts are discarded, for at
least half of those experts (those whose prediction was mimicked by
the meta-method) made that same mistake. Let the total number of
experts be
*s* and the number of infallible experts remaining after
the *n*th mistake be *e*(*n*). So
*e*(0) = *s* and, since
some expert is always infallible, for each *n*

1 ≤ *e*(*n*) ≤ *e*(*n* − 1) / 2 ≤ … ≤ *e*(0) / 2^{n} = *s* / 2^{n}

1 ≤ *s* / 2^{n}

2^{n} ≤ *s*

*n* ≤ log_{2}(*s*)

Thus the total number of mistakes can never exceed
log_{2}(*s*). If there is just one
expert, assumed to be infallible, the method yields 0 =
log_{2}(1) mistakes. In general, if the total number
*s* of experts is 2^{k} for
some integral *k*, then the bound on the number of mistakes is
just *k*. If, for example, every possible sequence of length *r*
represents the predictions of some expert, the upper bound on the
number of mistakes in *r* trials is just the length of these
sequences, or the number of trials. This is hardly an impressive
result, nor is it intended as such. The point is just to illustrate
the general framework.
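The majority method and its bound can be tried out in a few lines. This is a sketch under stated assumptions rather than the presentation in (Cesa-Bianchi and Lugosi 2006): predictions and outcomes are zeros and ones, fifteen hypothetical experts guess at random, and one further expert is stipulated to be infallible.

```python
import random
from math import log2


def majority_of_infallible(expert_predictions, outcomes, rng=random):
    """Predict with the majority of the so far infallible experts.

    expert_predictions[i][t] is expert i's 0/1 prediction for trial t.
    Returns the number of mistakes made by the meta-method.
    """
    alive = set(range(len(expert_predictions)))
    mistakes = 0
    for t, outcome in enumerate(outcomes):
        ones = sum(expert_predictions[i][t] for i in alive)
        zeros = len(alive) - ones
        if ones > zeros:
            guess = 1
        elif zeros > ones:
            guess = 0
        else:
            guess = rng.choice((0, 1))  # evenly divided: flip a coin
        if guess != outcome:
            mistakes += 1
        # Discard every expert who predicted wrongly at this trial.
        alive = {i for i in alive if expert_predictions[i][t] == outcome}
    return mistakes


rng = random.Random(0)
outcomes = [rng.randint(0, 1) for _ in range(50)]
experts = [[rng.randint(0, 1) for _ in range(50)] for _ in range(15)]
experts.append(list(outcomes))  # the one infallible expert
mistakes = majority_of_infallible(experts, outcomes, rng)
print(mistakes, log2(len(experts)))
```

With sixteen experts the bound is log_{2}(16) = 4, so whatever the random guesses of the other experts, the method makes at most four mistakes in the fifty trials.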

One obvious shortcoming of the simple majority method is that it takes insufficient account of differences in the accuracy of the methods it exploits; the expert who has made one error in the first 100 trials has no more weight on the 101st trial than does the expert who made 100 mistakes. There is also the possibility of a counter-inductive grue-like situation in which the majority of heretofore infallible experts is always wrong, in which case the method leads to the maximum number of mistakes.

##### Dealing with deceivers

An obvious fix for the obvious shortcoming is to make use of the
*success scores* and *rates* of the experts. We define
the *success score* of the expert *e* at
trial *n* as the number of successful predictions by
*e* up to and including *n*,

$$\mathrm{SS}(e, n) = \text{number of correct predictions by } e \text{ at trials} \le n$$

And the *success rate* of *e* at *n* as
the ratio

$$\mathrm{sr}(e, n) = \frac{\mathrm{SS}(e, n)}{n}$$

One might then follow the leader — follow the prediction for
the next trial of the expert with the best success rate up to the
present trial. What success rate is guaranteed by following the leader?
Zero, for the leaders at a given trial may in every case predict
wrongly at the succeeding trial. Following the leader can be more
reliable if there is one leader to follow; one expert who maintains a
best success rate. Let us say that an expert
*b* is *best* if there is a trial
*t*_{b} after
which *b*'s success rate is never less than
that of any other expert. (*Never* less; best is forever.)
There need be no best expert; two or more experts might continually
switch the lead among them so that for every *n* every expert makes a
mistake at some trial after *n*. If however there is a best
expert *b* then the method of following the
leader will assure that after *t*_{b} *M*'s
truncated success rate

$$\frac{\text{number of successes after } t_{b}}{n - t_{b}}$$

will equal that of the best experts. But following the leader
entails nothing about *M*'s success rate *before*
*t*_{b}; this might be zero. It is zero, for example,
if every expert who becomes a leader
before *t*_{b} predicts wrongly on the next
trial. Hence if a best expert *b*
makes *p* successful predictions
between *t*_{b} and *n*, following the
leader will assure *M* a success rate overall
of *p*/*n* (since *M* may have made no correct
predictions up to *t*_{b}). Hence as *t*_{b} increases and *n* remains the same, the lower bound on the success rate guaranteed by following the leader diminishes. If *t*_{b} = *n* − 1, the rate is 1/*n*.

An expert who predicts wrongly when his success rate is highest is a
*systematic deceiver*. Systematic deceivers can be detected by
conditionalizing success rates: If an expert's success rate
overall is significantly higher than his success rate on trials for
which his rate is highest, then he deceives systematically.
*If systematic deceivers are ignored then the method of
following the leader can assure a success rate that approximates the
maximal success rate of non-deceivers.* (Schurz 2008,
Theorem 3). This does *not* show that following the leader
yields an optimal success rate; some systematic deceiver may have a
higher success rate than any non-deceiver. Following the leader thus
does not assure an optimal rate of success, a rate at least as good as
that of any expert.

In fact a simple example (Cesa-Bianchi and Lugosi 2006, 67) shows
that there can be no optimal method for the two-valued case: Let there
be two experts one of whom always predicts one and the other of whom
always predicts zero, and let the data stream include at each trial
just the opposite of *M*'s prediction. Then at least one of the
experts has a success rate of at least 0.5, and the success rate of *M*
is constantly zero.
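The example is easy to reproduce. The sketch below is illustrative only: it uses follow the leader as the method *M* purely for definiteness (any method that issues a definite zero-or-one prediction fares the same against this data stream), and the function names and the tie-breaking rule are assumptions, not part of the cited example.

```python
def follow_the_leader(scores):
    """Index of an expert with the best success score.

    Assumption: ties are resolved in favour of the lowest index.
    """
    return max(range(len(scores)), key=lambda i: (scores[i], -i))


def adversarial_run(trials=100):
    """Return (M's success rate, [success rates of the two constant experts])."""
    constant_predictions = [1, 0]  # expert 0 always predicts 1, expert 1 always 0
    scores = [0, 0]                # success scores SS(e, n)
    m_successes = 0
    for _ in range(trials):
        m_prediction = constant_predictions[follow_the_leader(scores)]
        outcome = 1 - m_prediction  # the data stream contradicts M's prediction
        m_successes += int(m_prediction == outcome)  # never adds anything
        for i, p in enumerate(constant_predictions):
            scores[i] += int(p == outcome)
    return m_successes / trials, [s / trials for s in scores]
```

Exactly one of the two constant experts is right at each trial, so their success scores sum to the number of trials and the better of them has a success rate of at least 0.5, while *M* never scores.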

##### The continuous case; weighted average prediction

Schurz went on to show that in the continuous case, in which
outcomes and predictions take real values in the closed [0, 1]
interval, a much stronger result is available. This depends upon the
notion of *attraction* of experts: An expert
*e* is *attractive* (to the
meta-inductivist *M*) at trial *n* if
*e*'s success rate at *n* exceeds that of
*M*, and the (strength of) attraction of
*e* at *n* is just the difference

$$\mathrm{At}(e, n) = \begin{cases} \mathrm{SS}(e, n) - \mathrm{SS}(M, n) & \text{if this difference is positive} \\ 0 & \text{otherwise} \end{cases}$$

The *relative attraction*, *ρ*, of *e* (to
*M*) at *n* is just the normalized ratio of
*e*'s attraction at *n* to the sum of the
attractions at *n* of all experts:

$$\rho(e, n) = \frac{\mathrm{At}(e, n)}{\sum_{i} \mathrm{At}(e_{i}, n)}$$

At each trial *n*, as *e*
varies, *ρ*(*e*, *n*) is a probability on the
collection of experts, corresponding for each
*e* to the extent of
*e*'s expertise at *n*. Experts whose
attraction is not positive, i.e., whose success rate at *n* does not
exceed *M*'s, have relative attraction zero at *n*.

Let P(*e*, *n* + 1) be the prediction at *n* of
the expert *e* for trial *n* + 1. At each *n*,
P(*e*, *n* + 1) is a finite random variable with
distribution *ρ*(*e*,
*n*). The *weighted prediction* of *e* at
*n* for *n* + 1 weights this random variable
at *n* by the probability
*ρ*(*e*, *n*).

$$\mathrm{WP}(e, n+1) = \rho(e, n)\, \mathrm{P}(e, n+1) = \left[\frac{\mathrm{At}(e, n)}{\sum_{i} \mathrm{At}(e_{i}, n)}\right] \mathrm{P}(e, n+1)$$

And the *average weighted prediction* of all positively
attractive experts at *n* is the weighted mean of the distribution of
P(*e*, *n* + 1)

$$\pi(n+1) = \sum_{j} \mathrm{WP}(e_{j}, n+1) = \frac{\sum_{j} \mathrm{At}(e_{j}, n)\, \mathrm{P}(e_{j}, n+1)}{\sum_{i} \mathrm{At}(e_{i}, n)}$$

(For *n* = 0 set π(*n* + 1) = 0.5.)

The *weighted average* method predicts *x*(*n* +
1) = π(*n* + 1).
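The definitions of attraction and weighted average prediction can be put together in a few lines. This is a sketch under stated assumptions, not Schurz's own formulation: the score of a real-valued prediction *p* against an outcome *x* is taken to be 1 − |*p* − *x*|, the fallback value 0.5 is used whenever no expert is attractive (the text fixes it only for *n* = 0), and all names are illustrative.

```python
def weighted_average_prediction(expert_preds, expert_scores, m_score, default=0.5):
    """pi(n+1): the attraction-weighted mean of the experts' predictions."""
    # At(e, n) = SS(e, n) - SS(M, n) if positive, else 0.
    attractions = [max(ss - m_score, 0.0) for ss in expert_scores]
    total = sum(attractions)
    if total == 0:
        # Assumption: fall back to 0.5 whenever no expert is attractive.
        return default
    return sum(a * p for a, p in zip(attractions, expert_preds)) / total


def run(expert_streams, outcomes):
    """Return (M's predictions, M's success rate, the experts' success rates)."""
    expert_scores = [0.0] * len(expert_streams)
    m_score = 0.0
    m_preds = []
    for t, x in enumerate(outcomes):
        preds = [stream[t] for stream in expert_streams]
        pi = weighted_average_prediction(preds, expert_scores, m_score)
        m_preds.append(pi)
        # Assumed scoring rule: 1 minus the absolute error, for M and experts alike.
        m_score += 1.0 - abs(pi - x)
        for i, p in enumerate(preds):
            expert_scores[i] += 1.0 - abs(p - x)
    n = len(outcomes)
    return m_preds, m_score / n, [s / n for s in expert_scores]
```

For instance, `run([[1.0] * 50, [0.0] * 50], [1.0] * 50)` pits an always-right expert against an always-wrong one; after the neutral first prediction, only the better expert is attractive and *M* adopts its predictions.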

Of course the data stream describes a series of contingent events.
The weighted prediction method cannot affect the data stream and thus
cannot assure that *M*'s estimates or predictions are not uniformly far
from the values of the outcomes *x*(*n*). What can be
shown, however (Schurz Theorem
4^{[4]}),
is that under plausible structural constraints, as the number *n* of
trials increases, π(*n* + 1) becomes increasingly close to the
predictions of the maximally correct expert or experts.
*The difference between M's success rate and the
maximum success rate of all experts approaches zero as the number n of
trials increases without bound.*

##### The binary case revisited

We saw above that there can be no optimal method, no method that
assures a maximal success rate, in the two-valued case. The success
rates of systematic deceivers may always be inaccessible. Of course the
special instance of the continuous case, in which elements of the data
stream and the predictions of experts are always either zero or one
while π — the meta-inductive prediction — is in the
closed [0, 1] interval, falls within the scope of the above result; the
meta-inductivist will in the limit approach the maximal success rate
(of zero-one expert predictions). There will necessarily be
non-integral quantities interspersed in this sequence, and it rings a
bit false to call these *predictions*; the meta-inductivist may
know that the data are all integers, and he would in this case be
announcing predictions that he knew *a priori* to be false. It
serves clarity and plausibility to call the weighted averages what they
are: *estimates*.

The resolution of the two-valued case can however be improved by
applying the method of weighted-average prediction to binary
data streams. The principle of the application is to use a cooperating
*team* of meta-inductivists. One then applies weighted average
prediction as above to find at each trial *n* the value
π(*n* + 1). In an extremely simple illustration, which may
nevertheless reveal the leading idea, we assume that this prediction
(or estimate) is a rational in the [0, 1] interval. Suppose now that
π(*n* + 1) = *p*/*q* and that there are
just *q* meta-inductivists all told. Then the method directs
that *p* meta-inductivists predict one and the
remaining *q* − *p* of them predict zero. The
(considerable) complications to accommodate irrational quantities and
different numbers of meta-inductivists accomplished, it can be shown
that *the mean success rate of the meta-inductive team
approaches the maximal expert success rate in the limit.*
(Schurz Theorem 5)
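The simple rational case can be written out directly. The sketch below covers only the illustration given above, in which π(*n* + 1) = *p*/*q* and the team has exactly *q* members (or a multiple of *q*); it does not attempt the complications for irrational values or other team sizes, and the function names are illustrative assumptions.

```python
from fractions import Fraction


def team_predictions(pi, team_size):
    """Binary predictions of a team whose share predicting 1 equals pi."""
    pi = Fraction(pi)
    ones = pi * team_size
    if ones.denominator != 1:
        # Only the simple case is handled: team_size is a multiple of q.
        raise ValueError("team_size must be a multiple of pi's denominator")
    ones = int(ones)
    return [1] * ones + [0] * (team_size - ones)


def mean_team_success(pi, team_size, outcome):
    """Share of team members whose binary prediction matches the outcome."""
    preds = team_predictions(pi, team_size)
    return Fraction(sum(p == outcome for p in preds), team_size)
```

With π = 3/5 and five meta-inductivists, three predict one and two predict zero, so the team's mean success is 3/5 if the outcome is one and 2/5 if it is zero; in both cases this equals 1 − |π − *x*| for the outcome *x*.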

It should be emphasized that it is perfectly compatible with weighted
average prediction that at every trial π(*n* + 1) is far
from
*x*(*n* + 1). What is assured is that no expert can be
much better, or in the limit any better, than this estimate.

Is there good reason to believe weighted average predictions? Not without some reason to credit the uniformity of the experts: that past success rates are good predictors of future success rates. Schurz briefly discusses some ways of supplementing the simple method by assumptions of reliability and the use of nomological predicates. These efforts aside, it is an advantage of meta-induction in general, and of its weighted-average form in particular, that it is free of synthetic principles and can thus contrast different object-inductive methods without substantive presupposition.

## 7. Induction, deduction and rationality

### 7.1 The project of D.C. Williams and D.C. Stove

A quite different view of the problem of induction asks not whether induction can be shown to lead to truth, but whether it is a *rational* or *reasonable* process. This is the approach of D.C. Williams and David Stove, and also of David Armstrong (Williams 1947, Stove 1986, Armstrong 1991). We look first at the Williams-Stove account.

Williams argued in (Williams 1947) that one form of inductive
inference is a reasonable method and that this proposition is, in
fact, a necessary truth; Stove repeated the argument with a few
corrections and reformulations four decades later. By claiming that
induction is ‘reasonable’ Williams intended not only that
it is *characterized by ordinary sagacity*. Indeed, he says
that an aptitude for induction is just what we mean by ‘ordinary
sagacity’. His claim is that induction is reasonable in the
stronger (and not quite standard) sense of being
“*logical* or *according to logic.*”
(Williams 1947, 23)

Williams and Stove intended their accounts to defend reason against Hume's argument (Hume THN I.III.VI), discussed in section 2 above, that the principle of the uniformity of nature can be supported neither by deductive (“demonstrative”) arguments nor by contingent probabilistic methods, and hence that inductive inference is the work not of reason but of the imagination. Hume, according to Williams, held that:

although our nervous tissue is so composed that when we have encountered a succession of *M*s which are *P* we naturally expect the rest of the *M*s to be *P*, and although this expectation has been borne out by the event in the past, the series of observations never provided a jot of logical reason for the expectation, and the fact that the inductive habit succeeded in the past is itself only a gigantic coincidence, giving no reason for supposing it will succeed in the future. (Williams 1947, 15)

Williams and Stove, for their part, maintain that, though there may be no demonstrative proof of the principle of the uniformity of nature, there are good demonstrative or deductive proofs that certain inductive methods yield their conclusions with high probability.

We first give an expository reconstruction of the Williams-Stove argument and then, guided by the analyses of Patrick Maher (Maher 1996) and Scott Campbell (Campbell 2001), remark on some of its complications and difficulties.

The specific form of inductive inference favored by Williams and Stove
is what Carnap called *inverse inference*; inference to a
character of a population on the basis of premises about a sample from
that
population.^{[5]}
Williams and Stove focus on inverse inferences about relative frequency, in
particular on inferences of the form:

- (i) The relative frequency of the trait *R* in the sufficiently large sample *S* from the finite population *X* is *r*: *f*(*R* | *S*) = *r*

therefore

- (ii) The relative frequency of *R* in *X* is close to *r*: *f*(*R* | *X*) ≈ *r*

(Williams 1947, 12; Stove 1986, 71–75) (This includes of
course the special case in which *r* = 1.)

Williams and Stove both set out to show that it is necessarily true that the inference from (i) to (ii) has high probability:

Given a fair sized sample, then, from any [finite] population, with no further material information, we know logically that it very probably is one of those which [approximately] match the population, and hence that very probably the population has a composition similar to that which we discern in the sample. This is the logical justification of induction. (Williams 1947, 97)

Both Williams and Stove (Williams 1947, 162; Stove 1986, 77, 131–144) recognize that induction may depend upon context and also upon the nature of the traits and properties to which it is applied. Neither pretends to resolve the inductive paradoxes, and Stove, at least, does not propose to justify all inductions: “That all inductive inferences are justified is false in any case” (Stove 1986, 77).

Williams' initial argument was simple and persuasive. It turns out, however, to have difficulties. In response to one of these difficulties Stove weakened the thesis considerably, but this response may not be sufficient. There is the further problem that the sense of necessity at issue is not made precise and becomes increasingly stressed as the sometimes contentious dialectic plays out.

There are two lemmata or principles on which the Williams-Stove argument depends:

Lemma 1. If *X* is a large finite population in which the relative frequency of a character *R* is *r*, it is necessarily true that the relative frequency of *R* in most large samples from that population will be close to *r*.

Lemma 2. (The proportional syllogism.) When probability is symmetrical,^{[6]} the probability that an individual in a finite population has a trait *R* is equal to the relative frequency of that trait in the population.

For remarks on the proofs, see the Supplement on the Two Lemmata.
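Lemma 1 is a counting claim, and for a given population it can be checked by enumerating the possible compositions of a sample. The following is a rough sketch: the `tolerance` parameter, which stands in for “close to”, and the function name are assumptions introduced here for illustration.

```python
from math import comb


def share_of_resembling_samples(pop_size, pop_with_trait, sample_size, tolerance):
    """Fraction of all k-samples whose r.f. of R is within `tolerance`
    of the r.f. of R in the population."""
    r = pop_with_trait / pop_size
    total = comb(pop_size, sample_size)  # number of k-samples in the hyperpopulation
    resembling = 0
    for j in range(sample_size + 1):     # j = number of R's in the sample
        if abs(j / sample_size - r) <= tolerance:
            # Ways to pick j individuals with R and sample_size - j without R.
            resembling += comb(pop_with_trait, j) * comb(
                pop_size - pop_with_trait, sample_size - j
            )
    return resembling / total
```

For a population of 1000 in which half the individuals have *R*, calling the function with samples of size 100 and then of size 400 (tolerance 0.1 in both cases) shows the share of resembling samples growing with sample size, which is the content of the lemma.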

Williams' simple argument begins with an induction in a
‘hyperpopulation’ (Williams 1947, 94-96) of all samples of
given size *k* (‘*k*-samples’) drawn from a
large finite population *X* of individuals. The
‘individuals’ of the hyperpopulation
are *k*-samples of individuals from the
population *X*.

Now let Prob be a symmetrical probability. For given
population *X*, trait *R* and *k*-sample
*S*_{0} from *X* in which the r.f. of *R*
is *r*, the content of (i) above can be expressed in two
premises:

Premise A. *S*_{0} is a *k*-sample from *X*.

Premise B. The r.f. of *R* in *S*_{0} is *r*, i.e., *f*(*R* | *S*_{0}) = *r*.

Williams argued as follows. It follows from Lemma 1 that

(1) The r.f. of *k*-samples (in the hyperpopulation) that resemble *X* is high.

It follows from (1) and Lemma 2 that

(2) Prob(*S*_{0} resembles *X*) is high.

It follows from Premise B that

(3) Prob[*f*(*R* | *X*) ≈ *r* | *S*_{0} resembles *X*] is high.

It follows from (2) and (3) that

(4) Prob[*f*(*R* | *X*) ≈ *r*] is high.

Hence, goes the argument, (i) above implies (ii).

We might like to reason in this way, and Williams did reason in this
way, but as Stove pointed out in (Stove 1986, 65) the argument is not
sound; it ignores the requirement of total evidence: Inductive
inference in general and inductive conditional probabilities in
particular are not monotonic; adding premises may change a good
induction to a bad one and adding conditions may change, and sometimes
reduce, the value of a conditional probability. Here (3) depends on
Premise B but suppresses mention of it, thus failing to respect the
requirement to take explicit account of all relevant and available
evidence. Williams neglected the critical distinction between the
probability of *f*(*R* | *X*) = *r*
conditioned on resemblance:

Prob(*f*(*R* | *X*) = *r* | *S*_{0} resembles *X*)

and the probability when Premise B, the r.f. of *R*
in *S*_{0}, is added to the condition:

Prob[*f*(*R* | *X*) = *r* | *S*_{0} resembles *X* ∧ *f*(*R* | *S*_{0}) = *r*]

When however the conditions of (3) are expanded to take account of Premise B,

(3*) It is necessarily true that Prob[*f*(*R* | *X*) = *r* | *S*_{0} resembles *X* ∧ *f*(*R* | *S*_{0}) = *r*] ≈ 1

the result does not follow from the premises; (3*) is true for some values of *r* and not for others.

As Maher describes this effect (and as Williams himself had pointed
out (Williams 1947, 89)), “Sample proportions near 0 or 1 increase the
probability that the population is nearly homogeneous
which, *ceteris paribus*, increases the probability that the
sample matches the population; conversely, sample proportions around
1/2 will, *ceteris paribus*, decrease the probability of
matching” (Maher 1996, 426). Thus the addition of Premise B to
the condition of (3) might decrease the probability that
*S*_{0} resembles the population *X*.
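The effect Maher describes can be made concrete by computing the probability of matching conditional on the sample proportion. The sketch below is only an illustration under an assumption that is not part of the Williams-Stove argument: it gives every possible composition of the population the same prior probability. The `tolerance` parameter and the names are likewise assumptions.

```python
from math import comb


def prob_match_given_sample(pop_size, sample_size, sample_with_trait, tolerance):
    """Prob(|f(R|X) - f(R|S0)| <= tolerance, given f(R|S0)).

    Assumption: a uniform prior over the number of R's in the population.
    """
    r = sample_with_trait / sample_size
    weight_all = 0
    weight_match = 0
    for big_k in range(pop_size + 1):  # big_k = number of R's in the population
        # Number of k-samples with exactly `sample_with_trait` R's, given big_k.
        # Under the uniform prior the posterior of big_k is proportional to this.
        likelihood = comb(big_k, sample_with_trait) * comb(
            pop_size - big_k, sample_size - sample_with_trait
        )
        weight_all += likelihood
        if abs(big_k / pop_size - r) <= tolerance:
            weight_match += likelihood
    return weight_match / weight_all
```

Comparing `prob_match_given_sample(1000, 50, 49, 0.05)` with `prob_match_given_sample(1000, 50, 25, 0.05)` shows the asymmetry: on this prior a sample proportion of 0.98 makes matching more probable than a sample proportion of 0.5 does.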

Stove's response to this difficulty was to point out, first, that
neither he nor Williams ever claimed that *every* inductive
inference, nor even that every instance of the (i) to (ii) inference,
was necessarily highly probable. Stove, at least, agrees that there
are instances of *r*, *X*, *S* and *R* for
which it is not necessary that *f*(*R* | *X*)
≈ *r* follows from Premises A and B with high
probability. According to Stove however, all that was needed to
establish Williams' thesis was to give one case (of values
for *r*, *X*, *S* and *R*) for which the
inference holds necessarily: This would show that at least one
inductive inference was necessarily rational. And this, he argued,
will follow if we specify values of these parameters for which Premise
B is not negatively relevant to (3). I.e., such that

Prob[*f*(*R* | *X*) = *r* | *S*_{0} resembles *X* ∧ *f*(*R* | *S*_{0}) = *r*] ≥ Prob[*f*(*R* | *X*) = *r* | *S*_{0} resembles *X*]

is a necessary truth.

Stove provides a specific instance of Williams' argument in which:

*X* is the population of ravens and *k* = 3020.^{[7]}

Premise A* is: *S*_{0} is a 3020-sample of ravens.

Premise B* is: *f*(Black | *S*_{0}) = 0.95.

The critical part of the argument is the proof that

C*. Prob[*f*(Black | Ravens) ≈ 0.95 | *S*_{0} resembles Ravens ∧ *f*(Black | *S*_{0}) = 0.95] ≥ Prob[*f*(Black | Ravens) ≈ 0.95 | *S*_{0} resembles Ravens]

which Stove accomplishes in detail and concludes that:

It follows necessarily from Premises A* and B* that Prob[f(Black | Ravens)] ≈ 0.95.

It must be remarked that this argument yields an attenuated form of Williams' original claim, quoted above, that sample to population inductive inferences are in general assured: “Given a fair sized sample, then, from any [finite] population, with no further material information, we know logically that it very probably is one of those which [approximately] match the population, and hence that very probably the population has a composition similar to that which we discern in the sample” (Williams 1947, 97). What we have in its place is that we know logically not that all such inferences are necessarily sound with high probability, but, at best, only that one carefully selected inference is, and then only when the probability in question is symmetrical.

Maher went on to argue that Stove's proof was in fact insufficient and that the crucial claim:

Prob[*f*(Black | Ravens) ≈ 0.95 | *S*_{0} resembles Ravens ∧ *f*(Black | *S*_{0}) = 0.95] ≥ Prob[*f*(Black | Ravens) ≈ 0.95 | *S*_{0} resembles Ravens]

does not follow from premises A* and B*. He claimed that

[I]f some population proportions are more probable than others a priori, then sample proportions close to those more probable proportions will increase the probability of matching [i.e. of resemblance] while sample proportions far from the more probable proportions will decrease the probability of matching. (Maher 1996, 426)

The anticipated response to this from the defenders of Williams and Stove is that a priori probabilities are fixed, as Williams insisted, by the principle of indifference cited above (“cases are equiprobable unless they are known not to be equiprobable”) (Williams 1947, 72) and the classical definition of probability as the ratio of favorable to possible cases. But, as Maher pointed out, the principle of indifference presupposes that the exclusive and exhaustive possible cases are specified and fixed. A collection of propositions can be partitioned in alternative ways yielding different possible cases and in consequence different probabilities. To underscore this relativity Maher gave a plausible partition that yields prior probabilities in Stove's example, such that (assuming Premises A* and B*):

Prob(*S*_{0} resembles Ravens) ≈ 1

Prob[*f*(Black | Ravens) ≈ 0.95] ≈ 0

Prob[*S*_{0} resembles Ravens | *f*(Black | *S*_{0}) = 0.95] ≈ 0

Hence

Prob[*S*_{0} resembles Ravens | *f*(Black | *S*_{0}) = 0.95] < Prob[*S*_{0} resembles Ravens]

thus falsifying C* and the conclusion of Stove's argument.

Scott Campbell, in (Campbell 2001) responded to Maher's second
argument with two principal criticisms: Both claim that the low prior
probability of the hypothesis ‘*f*(Black| *X*)
≈ 0.95’ consequent upon applying the principle of
indifference to Maher's partition forces an artificially low posterior
probability even when the evidence supports a higher posterior.

Williams' original argument when expressed in general terms is
simple and seductive: It is a combinatorial fact that the relative
frequency of a trait in a large population is matched by its relative
frequency in most large samples from that population. The proportional
syllogism is a truth of probability theory: in the symmetrical case,
relative frequency equals probability. From these it looks to follow
that it is a necessary truth that it is highly probable that the
frequency of a trait in a given sample from an inclusive population is
close to its frequency in the population. We have seen that in these
terms the consequence does not follow: “[S]ample proportions near 0 or
1 increase the probability that the population is nearly homogeneous
which, *ceteris paribus*, increases the probability that the
sample matches the population; conversely, sample proportions around 1/2
will, *ceteris paribus*, decrease the probability of matching,”
as Maher expresses it. (Maher 1996, 426) Stove proposed a weakened
thesis that for certain select samples and populations this effect is
minimized, and that in these cases the conclusion does follow, thus
partially justifying Williams' claim that certain inductive inferences
necessarily yield their conclusions with high probability.

Maher argued that when prior probabilities are properly taken into
account, C* in Stove's argument is seen to be false, and Campbell criticized
this argument as depending upon faulty assignments of prior
probabilities. Independently of the outcome of this particular
disagreement, it is plausible that there are at least some examples of
inductions, of instances of *r*, *X*, *S*
and *R*, for which the Williams-Stove thesis is true; for which,
that is to say, it is necessary that *f*(*R* | *X*)
≈ *r* follows from Premises A and B with high
probability. But the Williams-Stove thesis emerges from this dialectic
considerably diluted: It began life as the strong and simple modal
assertion that it is a necessary truth that inductions of a quite
common sort yield their conclusions with high probability. That thesis
is seen to be false. What remains are, at best, certain specific
instances of it.

### 7.2 David Armstrong on states of affairs, laws and induction.

D.M. Armstrong, like Williams and Stove, is a rationalist about induction. There is however a significant difference of emphasis and structure that marks Armstrong's approach off from that of Williams and Stove: The problem of induction was for the latter couple the topic and focus of their work on the question. Armstrong's major project on the other hand has for some three decades been the formulation and development of a theory of universals. (See the entry on properties where Armstrong's theory is discussed.) The problem of induction is treated in a brief paper (Armstrong 1991) and an eight-page section in (Armstrong 1983), which work is itself an application of the theory of universals. Armstrong's account of the problem of induction thus gains depth and richness, first in the light of his thesis that laws of nature are connections of universals, announced and defended in (Armstrong 1983) and secondly because it is a natural application of the elaborate theory of universals and states of affairs in which this thesis is developed. This theory yields a few essential metaphysical principles that underlie much of Armstrong's philosophy of science, including his views on induction, and that are usefully kept in mind:

Naturalism and physicalism:

Everything that exists is a physical entity in space / time.

Factualism:

Everything that exists is either (i) a state of affairs or (ii) a constituent of a state of affairs. These constituents include properties (including relations) and particulars.

Properties are of two sorts:

There are universals and ordinary, or second-class, properties. The difference between them is that second-class properties belong to particulars contingently, while this relation is always necessary in the case of universals.

About one-third of (Armstrong 1983) is devoted to stating and
supporting three criticisms of what Armstrong calls the *regularity
theory* of law. Put very generally, the various forms of the
regularity theory all count laws, if they count them at all, as
contingent generalizations or mere descriptions of the events to which
they apply: “All there is in the world is a vast mosaic of local
matters of fact, just one little thing and then another” as
David Lewis put this view in (Lewis 1986, ix). One sort of regularity
theory holds that laws of nature supervene on Lewis's vast
mosaic. Armstrong argues against all forms of the regularity
theory. Laws, on his view, are necessary connections of universals
that neither depend nor supervene on the course of worldly events but
determine, restrict, and govern those events. The law statement, a
linguistic assertion, must in his view be distinguished from the law
itself. The law itself is not linguistic, it is a state of affairs;
“that state of affairs in the world which makes the law
statement true” (Armstrong 1991, 505). A law of nature is
represented as ‘*N*(*F*, *G*)’
where *F* and *G* are universals and *N*
indicates necessitation: Necessitation is inexplicable, it is “a
primitive, which we are forced to postulate” (Armstrong 1983,
92). That each *F* is a *G*, however, “does not
entail that *F*-ness [the universal *F*] has *N*
to *G*-ness” (Armstrong 1983, 85). That is to say that
the extensional inclusion ‘all *F*s are *G*s’
may be an accidental generalization and does not imply a
lawlike connection between *F*s and *G*s. In a
“first formulation” of the theory of laws of nature
(Armstrong 1983, 85), if
*N*(*F*, *G*) is a law, “it entails the
corresponding Humean or cosmic uniformity:
(*x*)(*F**x* ⊃ *G**x*)”.
In later reconsideration, (Armstrong 1983, 149) however, this claim is
withdrawn: *N*(*F*, *G*) does not entail that
all *F*s are *G*s, for some *F*s may be
“interfered with,” preventing the law's power from its
work.

Armstrong's rationalism does not lead him, as it did Williams and
Stove, to see the resolution of the problem of induction as a matter
of demonstrating that induction is necessarily a rational procedure:
“[O]rdinary inductive inference, ordinary inference from the
observed to the unobserved, is, although *invalid*,
nevertheless a rational form of inference. I add that not merely is it
the case that induction is rational, but it is a necessary truth that
it is so” (Armstrong 1983, 52). Armstrong does not argue for
this principle; it is a premise of an argument to the conclusion that
regularity views imply the inevitability of inductive skepticism; the
view, attributed to Hume, that inferences from the observed to the
unobserved are not rational (Armstrong 1983, 52). Armstrong seems to
understand ‘rational’ not in Williams' stronger sense of
entailing deductive proofs, but in the more standard sense of (as the
OED defines it) “Exercising (or able to exercise) one's reason
in a proper manner; having sound judgement; sensible, sane.”
(Williams' “ordinary sagacity,” near enough.)

The problem of induction for Armstrong is to explain why the
rationality of induction is a necessary truth. (Armstrong 1983, 52)
Or, in a later formulation, to lay out “a structure of reasoning
which will more fully reconcile us (the philosophers) to the
rationality of induction” (Armstrong 1991, 505). His resolution
of this problem has two “pillars” or fundamental
principles. One of these is that laws of nature are objective natural
necessities and, in particular, that they are necessary connections of
universals. The second principle is that induction is a species of
inference to the best explanation (IBE). “[T]he core idea is
very simple: observed regularities are best explained by hypotheses of
strong laws of nature [i.e., objective natural necessities],
hypotheses which in turn entail conclusions about the
unobserved” (Armstrong 1991, 503). IBE, as its name suggests, is
an informal and non-metric form of likelihood methods. Gilbert Harman
coined the term in (Harman 1965). See also Harman (1968). Harman
argued that enumerative induction was best viewed as a form of IBE:
The *explanandum* is a collection of statements asserting that
a number of *F*s are *G*s and the absence of contrary
instances, and the
*explanans*, the best explanation, is the universal
generalization, *all F*s are *G*s. IBE is clearly more
general than simple enumerative induction, can compare and evaluate
competing inductions, and can fill in supportive hypotheses not
themselves instances of enumerative induction. (Armstrong's affinity
for IBE should not lead one to think that he shares other parts of
Harman's views on induction.)

An instantiation of a law is of the form

*N*(*F*, *G*)(*a*'s being *F*, *a*'s being *G*)

where *a* is an individual. Such instantiations are states of
affairs in their own right.

As concerns the problem of induction, the need to explain why
inductive inferences are necessarily rational, one part of Armstrong's
resolution of the problem can be seen as a response to the challenge
put sharply by Goodman: Which universal generalizations are supported
by their instances? Armstrong holds that necessary connections of
universals, like *N*(*F*, *G*), are lawlike,
supported by their instances, and, if true, laws of nature. It remains
to show how and why we come to believe these laws. Armstrong's
proposal is that having observed many *F*s that are *G*,
and no contrary instances, IBE should lead us to accept the
law *N*(*F*, *G*). “[T]he argument goes
from the observed constant conjunction of characteristics to the
existence of a strong law, and thence to a testable prediction that
the conjunction will extend to all cases” (Armstrong 1991,
507).

### 7.3 Probabilistic laws of nature

Armstrong's theory is more ambitious than the Williams-Stove account of induction in also including an effort to account for probabilistic laws of nature. These are of the form

(i) (*Pr*:*P*)(*F*, *G*)

and their instances are of the form

(ii) (*Pr*:*P*)(*F*, *G*)(*a*'s being *F*, *a*'s being *G*)

where (i) “gives the objective probability of an *F*
being *G*, a probability holding in virtue of the
universals *F* and *G*”. Probabilistic laws, says
Armstrong, are probabilities of necessitation, not necessitations of
probabilities. (Armstrong 1983, 128).

Several difficulties come up immediately: If probabilistic laws are to conform to the laws of probability, (ii) should imply

(*Pr*:1 − *P*)(*F*, *G*)(*a*'s being *F*, *a*'s not being *G*)

or

(*Pr*:1 − *P*)(*F*, not-*G*)(*a*'s being *F*, *a*'s being not-*G*)

But this would contradict Armstrong's prohibition of negative states
of affairs. (“Absences and lacks are ontologically
suspect” (Armstrong 1983, 129).) A further problem is that
non-probabilistic laws, of the form *N*(*F*, *G*)
are necessitation relations (themselves states of affairs) holding
between states of affairs, *a*'s being *F*
and *a*'s being *G*. Since (i) is probabilistic, it
appears that it may hold for a given *a* that is *F* even
when *a* is not *G*. The relation (ii) would then be
incomplete, lacking its second term. Armstrong's response to this
problem is *The Principle of Instantiation*: the requirement
that laws (*Pr*:*P*)(*F*, *G*) be
instantiated only by individuals that are both *F*
and *G*.

But as Bas van Fraassen pointed out with detailed examples in “Armstrong on Laws and Probabilities” (1987), since this requirement prohibits additivity, it blocks application of the laws of probability. This led Armstrong to require that probabilistic laws be instantiated only when their range is infinite. In the finite case they are to be considered counterfactuals.

*The Principle of Instantiation* has also the consequence that
the ties of probability to frequency are broken. The laws of large
numbers and the classical limit theorems are hence apparently
inapplicable to Armstrong's probabilistic laws, and one is left to
wonder why these laws should be called probabilistic. (See Slowik 2005
for a defense of Armstrong's theory against van Fraassen's
criticisms.)

## 8. Justification and Support of Induction

Hume's argument, famous in its generalized version, was for him a lemma in his positive account of induction. That account makes of induction a habit of the imagining mind: Previous impressions of a cause followed by impressions of its effect form a habit which calls up the idea of the effect upon a new impression of the cause. Hume even gives the details of probabilistic reasoning founded on this same simple model.

We remarked that Hume himself qualifies the bare statement of this
theory. Wise men, he says, review their inferences and reflect upon
their reliability. This review may lead one to correct reasoning in
view of past errors: Noting that I've persistently misestimated the
chances of rain, I may revise my forecast for tomorrow. The process
is properly speaking not circular but regressive or hierarchical; a
meteorological induction is reviewed by an induction not about
meteorology but about inductions. Notice also that the revision of a
forecast of rain may strengthen or reduce belief in rain, but may
also, to put it in modern terms, increase dispersion: What was a
pointed forecast of 2/3 becomes a less precise belief interval, from
about (say) 1/2 to 3/4. This uncertainty will propagate up the
hierarchy of inductions: Reflection leads me to be less certain about
my reasoning about weather forecasts. Continuing the process must, in
Hume's elegant phrase, “weaken still further our first evidence, and
must itself be weaken'd by a fourth doubt of the same kind, and so
on *in infinitum*.” How is it then that our cognitive faculties
are not totally paralyzed? How do we “retain a degree of belief, which
is sufficient for our purpose, either in philosophy or in common
life” (Hume THN, 182, 185)? How do we ever arrive at beliefs about
the weather, not to speak of the laws of physics?

### 8.1 General rules and higher-order inductions

Hume's resolution of this puzzle is in terms of *general
rules*, rules for judging (Hume THN, 150). These are of two
sorts. Rules of the first sort lead to singular predictive inferences
when triggered by the experience of successive instances. These when
unchecked may tempt us to wider and more varied predictions than the
evidence supports (to grue-type inferences, for example). Rules of the
second sort are corrective; these lead us to correct and limit the
application of rules of the first sort on the basis of evidence of
their unreliability. It is only by following general rules, says Hume,
that we can correct their errors. (See Bates 2005 for a discussion of
this process.)

Recall that Reichenbach gave an account of higher order or, as he
called them, *concatenated*, probabilities in terms of arrays or
matrices. The second-order probability

P{[P(C | B) = p] | A} = q

is defined as the limit of a sequence of first order probabilities.
This gives a way in a Reichenbachean framework of inductively
evaluating inductions in a given class or sort. Reichenbach refers to
this as the *self-corrective method*, and he cites Peirce,
“who mentioned ‘the constant tendency of induction to
correct itself,’” as a predecessor (Reichenbach TOP,
446n, Peirce 1935, Volume II, 456). Peirce consistently thinks this way:
“Given a certain state of things, required to know what
proportion of all synthetic inferences relating to it will be true
within a given degree of approximation” (Peirce 1935, 184).
Ramsey cites Mill approvingly for “his way of treating the
subject as a body of inductions about inductions” (Ramsey 1931,
198). See, e.g. (Mill 2002, 209). “This is a kind of
pragmatism:” Ramsey writes, “we judge mental habits by
whether they work, i.e., whether the opinions they lead to are for the
most part true” (Ramsey 1931, 197–198). Hume went so far
as to give a set of eight “Rules by which to judge of causes and
effects” (Hume THN, I.III.15), obvious predecessors of Mill's
canons.

### 8.2 Assessing the reliability of inductive inferences: calibration

These considerations suggest deemphasizing the question of justification—show that inductive arguments lead from truths to truths—in favor of exploring methods to assess the reliability of specific inferences. How is this to be done? If after observing repeated trials of a phenomenon we predict success of the next trial with a probability of 2/3, how is this prediction to be counted as right or wrong? The trial will either be a success or not; it can't be two-thirds successful. The approach favored by the thinkers mentioned above is to evaluate not individual inferences or beliefs, but habits of forming such beliefs or making such inferences.

One method for checking on probabilistic inferences can be illustrated
in probabilistic weather predictions. Consider a weather forecaster
who issues daily probabilistic forecasts for the following day. For simplicity of
illustration suppose that only predictions of rain are in question,
and that there are just a few distinct probabilities (e.g., 0, 1/10,
…, 9/10, 1). We say that the forecaster is *perfectly
calibrated* if for each probability *p*, the relative
frequency of rainy days following a forecast of rain with probability
*p* is just *p*, and that calibration is better as these
relative frequencies approach the corresponding probabilities. Without
going into the details of the calculation, the rationale for
calibration is clear: For each probability *p* we treat the
days following a forecast of probability *p* as so many
Bernoulli trials with probability *p* of success. The
difference between the binomial quotient and *p* then measures
the goodness of calibration; the smaller the difference the better the
calibration.
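The calibration check just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `calibration_table` and `calibration_error` are hypothetical, the forecast record is invented, and summarizing the differences by their day-weighted mean absolute value is one convenient choice among several, not a measure taken from the sources cited here.

```python
from collections import defaultdict

def calibration_table(forecasts, outcomes):
    """Map each forecast probability p to (number of days, relative frequency of rain)."""
    groups = defaultdict(list)
    for p, rained in zip(forecasts, outcomes):
        groups[p].append(1 if rained else 0)
    return {p: (len(days), sum(days) / len(days)) for p, days in sorted(groups.items())}

def calibration_error(forecasts, outcomes):
    """Day-weighted mean of |relative frequency - p|; 0 means perfect calibration.

    The choice of this summary is an assumption made for illustration.
    """
    table = calibration_table(forecasts, outcomes)
    n = len(forecasts)
    return sum(count * abs(freq - p) for p, (count, freq) in table.items()) / n

# Hypothetical record: ten days forecast at 0.2 (rain on 2 of them),
# then ten days forecast at 0.7 (rain on 6 of them).
forecasts = [0.2] * 10 + [0.7] * 10
outcomes = [1, 1] + [0] * 8 + [1] * 6 + [0] * 4

print(calibration_table(forecasts, outcomes))
print(calibration_error(forecasts, outcomes))
```

In this invented record the forecaster is perfectly calibrated at 0.2 and slightly overconfident at 0.7, where rain followed on only six days in ten.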

This account of calibration has an obvious flaw: A forecaster who
knows that the relative frequency of rainy days overall is *p*
can issue a forecast of rain with probability *p* every day. He
will then be perfectly calibrated with very little effort, though his
forecasts are not very informative. The standard way to improve this
method of calibration was designed by Glenn Brier in (Brier 1950). In
addition to calibrating probabilities with relative frequencies it
weights favorably forecast probabilities that are closer to zero and
one. The method can be illustrated in the case of forecasts with two
possible outcomes, rain or not. If there are *n* forecasts, let
*p*_{i} be the forecast probability of rain on
trial *i*, *q*_{i} = (1 −
*p*_{i}), 1 ≤ *i* ≤ *n*,
and let *E*_{i} be a random variable which is
one if outcome *i* is rain and zero otherwise. Then the
*Brier Score* for the *n* forecasts is

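B = (1/n) ∑_{i} (p_{i} − E_{i})^{2}

(The corresponding term for no rain, (*q*_{i} − (1 − *E*_{i}))^{2}, is equal to (*p*_{i} − *E*_{i})^{2}, so the score can be stated in terms of the rain forecasts alone.)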

Low Brier scores indicate good forecasting: The minimum is reached
when the forecasts are all either zero or one and all correct; then
*B* = 0. The maximum is reached when the forecasts are all either zero
or one and all in error; then *B* = 1. More recently the method
has been ramified and applied to subjective probabilities in general.
See (van Fraassen 1983).
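A small Python sketch may make the contrast between calibration and informativeness concrete. It assumes the common single-term form of the Brier score, the mean of (*p*_{i} − *E*_{i})^{2}, which has the minimum of 0 and the maximum of 1 described above. The function name `brier_score` and the ten-day record are hypothetical.

```python
def brier_score(forecasts, outcomes):
    """Mean squared difference between forecast probability and outcome (1 = rain, 0 = no rain)."""
    if len(forecasts) != len(outcomes) or not forecasts:
        raise ValueError("need equally many forecasts and outcomes, at least one")
    return sum((p - e) ** 2 for p, e in zip(forecasts, outcomes)) / len(forecasts)

# Hypothetical ten-day record with rain on three days (base rate 0.3).
outcomes = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]

base_rate = [0.3] * 10                          # calibrated but uninformative
sharp = [0.9 if e else 0.1 for e in outcomes]   # near zero or one, and on the right side
all_correct = [float(e) for e in outcomes]      # forecasts of 0 or 1, all right
all_wrong = [1.0 - e for e in outcomes]         # forecasts of 0 or 1, all wrong

for name, f in [("base rate", base_rate), ("sharp", sharp),
                ("all correct", all_correct), ("all wrong", all_wrong)]:
    print(name, brier_score(f, outcomes))
```

The forecaster who always announces the base rate is perfectly calibrated on this record yet scores about 0.21, while the forecaster whose probabilities lie near zero and one, on the correct side, scores about 0.01. The score thus rewards just the informativeness that calibration alone ignores.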

### 8.3 Induction and deduction

If the inductive support of induction need not be simply circular, the deductive support of induction is also seen upon closer examination not to be as easily dismissed as the Humean dilemma might make it seem. The laws of large numbers are the foundations of inductive inference relating frequencies and probabilities. These laws are mathematical consequences of the laws of probability and hence necessary truths. Of course the application of these laws in any given empirical situation will require contingent assumptions, but the inductive part of the reasoning certainly depends upon the deductively established laws.
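The deductive character of these laws can be seen in a simple case. For *n* independent trials with a fixed probability *p* of success, the weak law of large numbers says that the probability of the relative frequency differing from *p* by at least ε tends to zero as *n* grows; Chebyshev's inequality bounds that probability by *p*(1 − *p*)/(*n*ε^{2}). The following Python sketch computes the exact binomial probability alongside the bound. The function names are hypothetical, and the values *p* = 1/2 and ε = 1/10 are chosen only for illustration.

```python
from fractions import Fraction
from math import comb

def tail_probability(n, p, eps):
    """Exact P(|k/n - p| >= eps) for k ~ Binomial(n, p); p and eps are Fractions."""
    return sum(
        (comb(n, k) * p**k * (1 - p) ** (n - k)
         for k in range(n + 1)
         if abs(Fraction(k, n) - p) >= eps),
        Fraction(0),
    )

def chebyshev_bound(n, p, eps):
    """Chebyshev upper bound p(1 - p) / (n * eps^2), capped at 1."""
    return min(Fraction(1), p * (1 - p) / (n * eps**2))

p, eps = Fraction(1, 2), Fraction(1, 10)
for n in (10, 100, 1000):
    # Exact rational arithmetic avoids rounding at the boundary |k/n - p| = eps.
    print(n, float(tail_probability(n, p, eps)), float(chebyshev_bound(n, p, eps)))
```

Nothing contingent enters the calculation: both columns follow from the laws of probability alone. The contingent assumptions, here independence and a constant *p*, enter only when the result is applied to an actual sequence of trials.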

#### 8.3.1 The Humean dilemma revisited

As concerns deductive justifications, proofs of the reliability of inductions, there are a number of these: The Williams-Stove approach should, its flaws repaired, prove deductively that certain sample-to-population inferences are reliable with high probability; the Neyman-Pearson lemma establishes this for the inductive comparison of conflicting statistical hypotheses; and formal learning theory provides the means to prove deductively the superiority of specific research methods. De Finetti's representation theorem is a deductively established truth that justifies probabilistic predictions on the basis of frequency evidence, Reichenbach's theory provides deductive assurance of the reliability of certain inductive rules, and we have seen a simple case of Carnap's proof of the proportional syllogism.

These results raise an obvious question: Where does Hume's simple dilemma argument go wrong?

The answer has two cooperating parts: There is first the insufficiency of the logic that Hume had at hand. This was based on an algebra of ideas, structured by relations of overlap, exclusion and inclusion. Logical entailment was just the inclusion of the conclusion idea in that of the premise, in which case the premise was unthinkable in the absence of the conclusion. “By knowledge,” writes Hume, “I mean the assurance arising from the comparison of ideas” (Hume THN, 124). This foundation supports only a very weak logic. Secondly, probabilistic inference was for Hume a function of the imagination and, as such, was always “attended by uncertainty” (Hume THN, 124). That probability is determined by a simple set of laws and may easily be considered an extension of deductive logic, either of the object language, as by Reichenbach, or of the metalanguage, as by Carnap, is well beyond the reach of this scheme. Thus the sort of reasoning practiced by modern probabilists awaited the birth of modern logic and the axiomatization of probability.

#### 8.3.2 The metaphysical and epistemological problems revisited

If the problem of induction is to distinguish good from bad inductions, then its metaphysical form—say in what the difference consists—seems insoluble. Nor does there seem to be any general solution to the epistemological form of the problem—find a method for distinguishing good or reliable inductive habits from bad or unreliable habits. But modest efforts to solve special cases of the epistemological problem, some of which are discussed above, have in many cases enjoyed dramatic success.

### 8.4 Why trust induction? The question revisited

We can now return to the general question posed in section 1: Why trust induction more than other methods? Why not consult sacred writings, or “the wisdom of crowds” to explain and predict the movements of the planets, the weather, automotive breakdowns or the evolution of species?

#### 8.4.1 The wisdom of crowds

The wisdom of crowds can appear to be an alternative to induction. In the book of this title (Surowiecki 2004), James Surowiecki argued, with many interesting examples, that groups often make better decisions than even informed individuals. It is important to emphasize that the model requires independence of the individual decisions and also a sort of diversity to assure that different sources of information are at work, so it is to be sharply distinguished from judging the mass opinion of a group that shares information and reaches a consensus in discussion. The obvious method suggested by Surowiecki's thesis is to consult polls or prediction markets rather than to experiment or sample on one's own. (See, for example, the link to prediction markets in the Other Internet Resources section of this entry.)

A precise justification for trusting the wisdom of crowds is provided
by the *Condorcet Jury Theorem*. (See section 5.6.2 above.) As
the theorem makes evident, the wisdom of crowds is not to be contrasted
with inductive reasoning; indeed it depends upon the inductive
principle expressed in the Condorcet theorem to amalgamate correctly
the individual testimonies as well as upon the diversity of individual
reasonings. What is valuable in the method is the diversity of ways of
forming beliefs. This amounts to a form of the requirement of total
evidence, briefly discussed in section 3.3 above.
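The amalgamating effect can be illustrated numerically. The sketch below assumes the standard textbook form of the jury theorem: an odd number *n* of voters decide independently, each is right with the same probability *p*, and the group adopts the majority verdict. The function name `majority_correct` and the value *p* = 0.6 are hypothetical, and the formulation in section 5.6.2 may differ in detail.

```python
from math import comb

def majority_correct(n, p):
    """Probability that more than half of n independent voters are right,
    when each is right with probability p (n odd, so there are no ties)."""
    if n % 2 == 0:
        raise ValueError("use an odd number of voters to avoid ties")
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k)
               for k in range(n // 2 + 1, n + 1))

for n in (1, 3, 11, 101):
    print(n, majority_correct(n, 0.6))
```

With *p* above one half the majority's reliability rises toward one as the group grows; with *p* below one half it falls toward zero. The calculation also shows how much depends on the independence assumption, since the binomial formula is unavailable once the voters share information.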

The wisdom of crowds can be seen as a primitive prolegomenon to social epistemology. Social epistemology studies methods, stronger and more sophisticated than those supported by the Condorcet theorem, by which groups may conduct inquiry, including scientific inquiry. It does not pretend to replace induction, but to extend and enrich it. (See Goldman 1999 and the entry on social epistemology.)

As with Reichenbach's account of single-case probabilities, the wisdom of crowds depends essentially upon testimony.

#### 8.4.2 Creationism and Intelligent Design

The wisdom of crowds thus depends upon good inductive reasoning. The use of sacred writings or other authorities to support judgments about worldly matters is, however, another matter. Christian creationism, a collection of views according to which the biblical myth of creation, primarily as found in the early chapters of the book of Genesis, explains, either in literal detail or in metaphorical language, the origins of life and the universe, is perhaps the most popular alternative to accepted physical theory and the Darwinian account of life forms in terms of natural selection. (See Ruse 2005 and the entry on creationism.) Christian creationism, nurtured and propagated for the most part in the United States, contradicts inductively supported scientific theories, and depends not at all upon any recognizable inductive argument. Many of us find it difficult to take the view seriously, but, according to a 2005 poll by CBS News, 51% of Americans hold that God created humans in their present form, 30% hold that humans evolved but God guided the process, and 15% believe that humans evolved and God did not guide the process (Alfano 2005).

The apparent absurdity of Creationism has led some opponents of
evolutionism and the doctrine of natural selection to eschew biblical
forms of the view and to formulate a weaker thesis, known as *the
theory of intelligent design* (Behe 1996, Dembski
1998). Intelligent design cites largely unquestioned evidence of two
sorts: The delicate balance (that even a minute change in any of many
physical constants would tip the physical universe into disequilibrium
and chaotic collapse) and the complexity of life (that life forms on
earth are very complex). The primary thesis of intelligent design is
that the hypothesis of a designing intelligence explains these
phenomena better than do current physical theories and Darwinian
natural selection.

Intelligent design is thus not opposed to induction. Indeed its central argument is frankly inductive, a claim about likelihoods:

P(balance and complexity | intelligent design) > P(balance and complexity | current physics and biology)

Creationism and its offspring Intelligent Design have also an Islamic form, expressed in (Yahya 2007), which is also available online in eleven languages. The doctrines are gaining wide currency among Muslims in Europe who demand that they be taught in the schools.

There are a number of difficulties with the theory of intelligent design; these are explained in detail by Elliott Sober in (Sober 2002). (This article also includes an excellent primer on the sorts of probabilistic inference involved in the likelihood claim. See also the article on creationism.) Briefly put, there are problems of two sorts, both clearly put in Sober's article: First, intelligent design theorists “don't take even the first steps towards formulating an alternative theory of their own that confers probabilities on what we observe” as the likelihood principle would require (75). Second, the intelligent design argument depends upon a probabilistic fallacy. The biological argument, to restrict consideration to that, infers from

1. Prob(organisms are very complex | evolutionary theory) = low.

2. Organisms are very complex.

to

3. Prob(evolutionary theory) = low.

To see the fallacy, compare this with

1. Prob(double zero | the roulette wheel is fair) = low.

2. Double zero occurred.

Therefore,

3. Prob(the wheel is fair) = low.
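The gap in the roulette inference can be made explicit with Bayes' theorem, which requires a prior probability and the likelihood of the outcome under a rival hypothesis as well. The following Python sketch uses invented numbers: a prior of 99/100 that the wheel is fair and a chance of 1/2 of double zero on a rigged wheel are assumptions made purely for illustration, and `posterior` is a hypothetical helper. Only the figure 1/38 reflects the standard American wheel with 38 pockets.

```python
from fractions import Fraction

def posterior(prior_h, likelihood_h, likelihood_alt):
    """Bayes' theorem for a hypothesis H against a single alternative:
    P(H | E) = P(E | H) P(H) / [P(E | H) P(H) + P(E | not-H) P(not-H)]."""
    joint_h = likelihood_h * prior_h
    joint_alt = likelihood_alt * (1 - prior_h)
    return joint_h / (joint_h + joint_alt)

p_fair = posterior(
    prior_h=Fraction(99, 100),      # assumed prior that the wheel is fair
    likelihood_h=Fraction(1, 38),   # double zero on a fair 38-pocket wheel
    likelihood_alt=Fraction(1, 2),  # assumed chance of double zero on a rigged wheel
)
print(p_fair, float(p_fair))
```

On these assumptions the probability that the wheel is fair, given double zero, is 99/118, roughly 0.84, even though the outcome was improbable on the hypothesis of fairness. A low likelihood lowers the probability of a hypothesis only relative to what its rivals predict and to how probable they were beforehand.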

What is to be emphasized here, however, is not the fallaciousness of the arguments adduced in favor of intelligent design. It is that intelligent design, far from presenting an alternative to induction, presumes certain important inductive principles.

#### 8.4.3 Induction and testimony

Belief based on testimony, from the viewpoint of the present article, is not a form of induction. A testimonial inference has typically the form:

An agent *A* asserts that *X*.

*A* is reliable.

Therefore, *X*.

Or, in a more general probabilistic form:

1. An agent *A* asserts that *X*.

2. For any proposition *X*, Pr(*X* | *A* asserts that *X*) = *p*.

Therefore,

3. Pr(*X*) = *p*.

In an alternative form the asserted content is quoted directly.

What is characteristic and critical in inference based on testimony is
the inference from a premise in which the conclusion is expressed
indirectly, in the context of the agent's assertion (*A*
asserts that *X*), to a conclusion in which that content occurs
directly, not mediated by language or mind (*X*). It is also
important that testimony is always the testimony of some agent or
agents. And testimonial inference is not causal; testimony is neither
cause nor effect of what is testified to. This is not to say that
testimonial inference is less reliable than induction; only that it is
different. (See Goldman 1999, chapter 4 for a thorough treatment of
the reliability of testimony.)

Although testimonial inference may not be inductive, induction would be all but paralyzed were it not nourished by the testimony of authorities, witnesses, and sources. We hold that causal links between tobacco and cancer are well established by good inductive inferences, but the manifold data come to us through the testimony of epidemiological reports and, of course, texts that report the establishment of biological laws. Kepler's use of Tycho's planetary observations is a famous instance of induction based on testimony. Reichenbach's frequentist account of single-case probabilities as well as the wisdom of crowds require testimonial inference as input for their amalgamating inductions. And actuaries, those virtuosi of inductivism, depend entirely upon reports of data as the basis of their conclusions. Of course inductive inferences from testified or reported data are no more reliable than the data.

### 8.5 Learning to love induction

There are really two questions here: Why trust specific inductive inferences? and Why trust induction as a general method? The response to the first question is: Trust specific inductions only to the extent that they are inductively supported or calibrated by higher-order inductions. It is a great virtue of Ramsey's counsel to treat “the subject as a body of inductions about inductions” that it opens the way to this. As concerns trust in induction as a general method of forming and connecting beliefs, induction is not all that easy to avoid; the wisdom of crowds and Intelligent Design seem superficially to be alternatives to induction, but both turn out upon closer examination to be inductive. Induction is, after all, founded on the expectation that characteristics of our experience will persist in experience to come, and that is a basic trait of human nature. “Nature”, writes Hume, “by an absolute and uncontroulable necessity has determin'd us to judge as well as to breathe and feel” (Hume THN, 183). “We are all convinced by inductive arguments”, says Ramsey, “and our conviction is reasonable because the world is so constituted that inductive arguments lead on the whole to true opinions. We are not, therefore, able to help trusting induction, nor, if we could help it do we see any reason why we should” (Ramsey 1931, 197). We can, however, trust selectively and reflectively; we can winnow out the ephemera of experience to find what is fundamental and enduring.

The great advantage of induction is not that it can be justified or validated, as can deduction, but that it can, with care and some luck, correct itself, as other methods do not.

### 8.6 Naturalized and evolutionary epistemology

“Our reason”, writes Hume, “must be consider'd as a kind of cause, of which truth is the natural effect; but such-a-one as by the irruption of other causes, and by the inconstancy of our mental powers, may frequently be prevented” (Hume THN, 180).

Perhaps the most robust contemporary approaches to the question of inductive soundness are naturalized epistemology and its variety evolutionary epistemology. These look at inductive reasoning as a natural process, the product, from the point of view of the latter, of evolutionary forces. An important division within naturalized epistemology exists between those who hold that there is little or no role in the study of induction for normative principles; that a distinction between correct and incorrect inductive methods has no more relevance than an analogous distinction between correct and incorrect species of mushroom; and those for whom epistemology should not only describe and categorize inductive methods but also must evaluate them with respect to their success or correctness.

The encyclopedia entries on these topics provide a comprehensive introduction to them.

## Bibliography

- Achinstein, Peter, 1963. “Variety and Analogy in Confirmation Theory,” *Philosophy of Science*, 30: 207–221.
- Adams, Ernest, 1965. “A Logic of Conditionals,” *Inquiry*, 8: 166–97.
- –––, 1975. *The Logic of Conditionals*, Dordrecht: Reidel.
- Alfano, Sean, 2005. “Poll: Majority Reject Evolution,” *CBS News*. Oct. 23, 2005. [Available online]
- Ambrose, Alice, 1947. “The Problem of Justifying Inductive Inference,” *Journal of Philosophy*, 44: 253–271.
- Armstrong, D.M., 1978. *Universals and Scientific Realism: Vol. 1. Nominalism vs. Realism*, Cambridge: Cambridge University Press.
- –––, 1978. *Universals and Scientific Realism: Vol. 2. A Theory of Universals*, Cambridge: Cambridge University Press.
- –––, 1983. *What is a Law of Nature?*, Cambridge: Cambridge University Press.
- –––, 1989. *A Combinatorial Theory of Possibility*, Cambridge: Cambridge University Press.
- –––, 1991. “What Makes Induction Rational?,” *Dialogue*, 30: 503–11.
- –––, 1997. *A World of States of Affairs*, Cambridge: Cambridge University Press.
- –––, 1988. “Reply to van Fraassen,” *Australasian Journal of Philosophy*, 66/2: 224–229.
- Ayer, A.J., 1972. *Probability and Evidence*, New York: Columbia University Press.
- Bates, Jared, 2005. “The old problem of induction and the new reflective equilibrium,” *Dialectica*, 59/3: 347–356.
- Bernoulli, Jacques, 1713. *Ars Conjectandi*, Basel: Impensis Thurnisiorum. English translation, *The Art of Conjecturing*, Edith D. Sylla (trans.), Baltimore: Johns Hopkins Press, 2006.
- Black, Duncan, 1963. *The Theory of Committees and Elections*, Cambridge: Cambridge University Press.
- Brown, M.B., 1987. Review of (Stove 1986), *History and Philosophy of Logic*, 8: 116–120.
- Campbell, Scott and James Franklin, 2004. “Randomness and the Justification of Induction,” *Synthese*, 138/1: 79–99.
- Campbell, Scott, 2001. “Fixing a Hole in the Ground of Induction,” *Australasian Journal of Philosophy*, 79/4: 553–563.
- Carnap, Rudolf, 1952. *The Continuum of Inductive Methods*, Chicago: The University of Chicago Press.
- –––, [LFP]. *Logical Foundations of Probability*, second edition, Chicago: The University of Chicago Press, 1962. (First published 1950.)
- Cesa-Bianchi, Nicolo and Gabor Lugosi, 2006. *Prediction, Learning and Games*, Cambridge: Cambridge University Press.
- Daly, Chris, 1998. “Review of *A World of States of Affairs*, by D. M. Armstrong,” *Australasian Journal of Philosophy*, 76/4: 640–642.
- Dretske, F., 1977. “Laws of Nature,” *Philosophy of Science*, 44: 248–268.
- Finetti, Bruno de, 1937. “La prévision: ses lois logiques, ses sources subjectives,” *Annales de l'Institut Henri Poincaré*, 7: 1–68.
- –––, [FLL] 1964. “Foresight: Its Logical Laws, Its Subjective Sources.” A translation by Henry Kyburg of (Finetti 1937), in *Studies in Subjective Probability*, Henry Kyburg and Howard Smokler (eds.), New York: John Wiley and Sons, 1964.
- –––, [TOP]. *Theory of Probability* in two volumes, New York: John Wiley and Sons, 1974. A translation by Antonio Machi and Adrian Smith of *Teoria delle Probabilita*, 1970, Einaudi.
- Fitelson, B. and J. Hawthorne, 2010. “How Bayesian Confirmation Theory Handles the Paradox of the Ravens,” in E. Eells and J. Fetzer (eds.), *The Place of Probability in Science*, Chicago: Open Court.
- Franklin, James, 2001. “Resurrecting Logical Probability,” *Erkenntnis*, 55: 277–305.
- Friedman, Michael and Richard Creath (eds.), 2007. *The Cambridge Companion to Carnap*, Cambridge: Cambridge University Press.
- George, A., 2007. “A Proof of Induction?,” *Philosophers' Imprint*, 7/2 (March), URL = <http://quod.lib.umich.edu/cgi/p/pod/dod-idx?c=phimp;idno=3521354.0007.002>.
- Giaquinto, M., 1987. “Review of *The Rationality of Induction* by D.C. Stove,” *Philosophy of Science*, 54/4: 612–615.
- Goldman, Alvin I., 1999. *Knowledge in a Social World*, Oxford: Oxford University Press.
- Goodman, Nelson, 1955. *Fact, Fiction, & Forecast*, Cambridge, MA: Harvard University Press.
- Gower, Barry, 1990. “Stove on inductive scepticism,” *Australasian Journal of Philosophy*, 68/1: 109–112.
- Hajek, Alan, 2003. “What Conditional Probability Could not Be,” *Synthese*, 137/3: 273–323.
- Harman, Gilbert, 1965. “The Inference to the Best Explanation,” *The Philosophical Review*, 74/1: 88–95.
- –––, 1968. “Enumerative Induction as Inference to the Best Explanation,” *The Journal of Philosophy*, 65/18: 529–533.
- Helman, David H. (ed.), 1988. *Analogical Reasoning: Perspectives of Artificial Intelligence, Cognitive Science, and Philosophy*, Dordrecht: Kluwer.
- Hochberg, Herbert, 1999. “D.M. Armstrong, *A World of States of Affairs*,” *Noûs*, 33/3: 473–495.
- Hume, David, [THN] 1888. *Hume's Treatise of Human Nature*, edited by L. A. Selby Bigge, Oxford, Clarendon Press. Originally published 1739–40.
- –––, [EHU] 1975. *Enquiries concerning Human Understanding and concerning the Principles of Morals*, reprinted from the posthumous edition of 1777 and edited with introduction, comparative table of contents, and analytical index by L. A. Selby Bigge, MA. Third edition with text revised and notes by P. H. Nidditch. Oxford, Clarendon Press.
- Indurkhya, Bipin, 1990. “Some Remarks on the Rationality of Induction,” *Synthese*, 85/1: 95–114.
- Irzik, Gurol, 1991. “Armstrong's Account of Probabilistic Laws,” *Analysis*, 51/4: 214–217.
- Jeffrey, Richard, [LOD] 1983. *The Logic of Decision*, second edition. Chicago: The University of Chicago Press. Originally published 1965.
- Johnson, W. E., 1921–1924. *Logic: in three volumes*, Cambridge: Cambridge University Press. Reprinted unchanged by Dover Publications in 1964.
- Kelly, Kevin, and Oliver Schulte, 1997. “Church's Thesis and Hume's Problem,” *Logic and Scientific Methods*, M. L. Della Chiara *et al*. (eds.), Dordrecht: Kluwer, pp. 383–398.
- ––– and Clark Glymour, 2004. “Why Probability Does not Capture the Logic of Scientific Justification,” *Contemporary Debates in the Philosophy of Science*, Christopher Hitchcock (ed.), Oxford: Blackwell.
- Kolmogorov, A. N., [FTP]. *Foundations of the Theory of Probability*, 1956. A translation and revision by Nathan Morrison of *Grundbegriffe der Wahrscheinlichkeitsrechnung*, *Ergebnisse Der Mathematik*, 1933. Cambridge: Cambridge University Press. Reprinted unchanged by Dover Publications in 1964.
- Kyburg, Henry, 1956. “The Justification of Induction,” *Journal of Philosophy*, 53/12: 394–400.
- –––, 1974. *Logical Foundations of Statistical Inference*, Dordrecht: D. Reidel.
- Loeb, Louis E., 2006. “Psychology, epistemology, and skepticism in Hume's argument about induction,” *Synthese*, 152: 321–338.
- Maher, Patrick, 1996. “The Hole in the Ground of Induction,” *Australasian Journal of Philosophy*, 74/3: 423–432.
- –––, 1999. “Inductive Logic and the Ravens Paradox,” *Philosophy of Science*, 66/1: 50–70.
- –––, 2006. “The Concept of Inductive Probability,” *Erkenntnis*, 65: 185–206.
- Mayberry, Thomas C., 1968. “Donald Williams on Induction,” *Journal of Thought*, 3: 204–211.
- Mayo, Deborah G., 1996. *Error and the Growth of Experimental Knowledge*, Chicago: The University of Chicago Press.
- Miller, Dickinson S., 1947. “Professor Donald Williams versus Hume,” *The Journal of Philosophy*, 44/25: 673–684.
- Nagel, Ernest, 1947. “Review of *The Ground of Induction* by Donald Williams,” *The Journal of Philosophy*, 44/25: 685–693.
- Nix, C.J. and B. Paris, 2007. “A Note on Binary Inductive Logic,” *Journal of Philosophical Logic*, 36/6: 735–771.
- Okasha, Samir, 2005. “Does Hume's argument against induction rest on a quantifier-shift fallacy?” *Proceedings of the Aristotelian Society*, 105: 237–255.
- Oliver, Alex, 1998. “Review of *A World of States of Affairs* by D.M. Armstrong,” *The Journal of Philosophy*, 95/10: 535–540.
- Peano, Giuseppe, [SWP] 1973. *Selected Works of Giuseppe Peano*, translated and edited by Hubert C. Kennedy, Toronto: University of Toronto Press.
- Popper, Karl R., [LSD] 1959. *The Logic of Scientific Discovery*, New York: Basic Books. A translation by the author with the assistance of Julius Freed and Jan Freed of *Logik der Forschung*, Vienna: J. Springer, 1935.
- Ramsey, Frank Plumpton, [TAP]. “Truth and Probability,” in Ramsey [FOM].
- –––, [FOM]. *The Foundations of Mathematics and Other Logical Essays*, R.B. Braithwaite (ed.), London: Routledge and Kegan Paul, 1931.
- Reichenbach, Hans, [TOP]. *The Theory of Probability*, Berkeley: University of California Press, 1971. A translation by Ernest R. Hutton and Maria Reichenbach of *Wahrscheinlichkeitslehre. Eine Untersuchung über die logischen und mathematischen Grundlagen der Wahrscheinlichkeitsrechnung*, Leiden, 1935. Revised by the author.
- –––, 1938. *Experience and Prediction*. Chicago: University of Chicago Press, Phoenix edition 1968.
- Rowan, Michael, 1993. “Stove on the Rationality of Induction and the Uniformity Thesis,” *The British Journal for the Philosophy of Science*, 44/3: 561–566.
- Savage, Leonard J., 1954. *The Foundations of Statistics*. New York: John Wiley & Sons.
- Schulte, Oliver, 1999. “Means–Ends Epistemology,” *British Journal for the Philosophy of Science*, 50/1: 1–31.
- Schurz, Gerhard, 2008. “The Meta-inductivist's Winning Strategy in the Prediction Game. A New Approach to Hume's Problem,” *Philosophy of Science*, 75: 278–305.
- Slowik, Edward, 2005. “Natural laws, universals and the induction problem,” *Philosophia*, 32/1–4: 241–251.
- Sober, Elliott, 2002. “Intelligent Design and Probability Reasoning,” *International Journal for the Philosophy of Religion*, 52: 65–80.
- Spohn, Wolfgang, 2005. “Enumerative induction and lawlikeness,” *Philosophy of Science*, 72/1: 164–187.
- Stove, D.C., 1986. *The Rationality of Induction*, Oxford and New York: Oxford University Press. Reprinted 2001.
- Suppes, Patrick, 1998. “Review of Kevin Kelly, *The Logic of Reliable Inquiry*,” *British Journal for the Philosophy of Science*, 49: 351–354.
- Surowiecki, James, 2004. *The wisdom of crowds: why the many are smarter than the few and how collective wisdom shapes business, economies, societies, and nations*, New York: Doubleday.
- Tooley, M., 1987. *Causation*, Oxford: Clarendon Press.
- –––, 1977. “The Nature of Laws,” *Canadian Journal of Philosophy*, 7: 667–698.
- van Fraassen, Bas, 1987. “Armstrong on Laws and Probabilities,” *Australasian Journal of Philosophy*, 65/3: 243–260.
- Vickers, John M., 1988. *Chance and Structure: An Essay on the Logical Foundations of Probability*, Oxford: Clarendon Press.
- White, F.C., 1988. “Armstrong, Rationality and Induction,” *Australasian Journal of Philosophy*, 66/4: 533–537.
- Williams, Donald, 1947. *The Ground of Induction*, Cambridge, MA: Harvard University Press. Reissued, New York: Russell and Russell Inc., 1963.
- –––, 1946. “The Problem of Probability,” *Philosophy and Phenomenological Research*, 6/4: 619–622.
- –––, 1945a. “On the Derivation of Probabilities from Frequencies,” *Philosophy and Phenomenological Research*, 5/4: 449–484.
- –––, 1945b. “The Challenging Situation in the Philosophy of Probability,” *Philosophy and Phenomenological Research*, 6/1: 67–86.
- Yahya, Harun, 2007. *Atlas of Creation*, Hackensack, NJ: Global Publishing Company.
- Zabell, S.I., 2005. *Symmetry and Its Discontents*, Cambridge: Cambridge University Press.
- –––, 2007. “Carnap on probability and induction,” in Friedman and Creath (eds.) 2007.

## Other Internet Resources

- Teaching Theory of Knowledge: Probability and Induction, organization of topics and bibliography by Brad Armendt (Arizona State University) and Martin Curd (Purdue).
- Forecasting Principles, a brief survey of prediction markets.

## Related Entries

actualism | Bayes' Theorem | Carnap, Rudolf | conditionals | confirmation | epistemology: Bayesian | epistemology: evolutionary | epistemology: naturalized | epistemology: social | fictionalism: modal | Frege, Gottlob: logic, theorem, and foundations for arithmetic | Goodman, Nelson | Hempel, Carl | Hume, David | induction: new problem of | logic: inductive | logic: non-monotonic | memory | Mill, John Stuart | perception: epistemological problems of | Popper, Karl | probability, interpretations of | Ramsey, Frank | Reichenbach, Hans | testimony: epistemological problems of | Vienna Circle

### Acknowledgments

Thanks to Kevin Kelly, Gerhard Schurz, and Patrick Maher for helpful comments on sections 6.3.1, 6.3.2, and 7.1, respectively.