Thursday, 30 May 2013

Accuracy-based argument for Conditionalization

In two previous posts (here and here), I described Jim Joyce's accuracy-based argument in favour of Probabilism, the norm that says that an agent's credence function ought to be a probability function.  As noted there, the argument has the following structure:
  • First, Joyce claims that the cognitive value of a credence function at a possible world is given by its accuracy at that world, where this is measured by a particular mathematical function, such as the Brier score.
  • Second, he states a norm of decision theory:  in this case it is Dominance, which says (roughly) that an option is irrational if there is another that is guaranteed to be better than it.  
  • Finally, he proves a mathematical theorem, which shows that an epistemic norm -- in this case, Probabilism -- follows from his account of cognitive value together with the decision-theoretic norm. 
In this post, I will describe an accuracy-based argument in favour of Conditionalization, the norm that says that an agent who learns a proposition with certainty ought to update her credences by conditionalizing on that proposition.  This argument shares Joyce's first premise concerning cognitive value.  But it adopts a different decision-theoretic norm:  Joyce's argument employs Dominance; this argument employs Maximize Subjective Expected Utility.  The argument is originally sketched in (Oddie, 1997); it is presented more fully in (Greaves and Wallace, 2006); and the mathematical theorem on which it turns is due originally (as far as I know) to (Brown, 1976).

Throughout this post, we assume that the credence functions we are discussing are probability functions.

What is Conditionalization?


Our first job is to say precisely what Conditionalization is.  For the purpose of this post it is as follows:

Definition 1  An updating rule $\mathbf{R}$ is a function that takes a credence function $c$, a partition $\mathcal{E}$, and an element $E$ of $\mathcal{E}$ to a credence function $c_{\mathbf{R}(c, \mathcal{E}, E)}$.

Intuitively, of course, $c_{\mathbf{R}(c, \mathcal{E}, E)}$ is the credence function that the updating rule demands of an agent with initial credence function $c$, who knows her evidence will come from $\mathcal{E}$, and who then receives evidence $E$.

Definition 2  Let $\mathbf{Cond}$ be the following updating rule:
\[
c_{\mathbf{Cond}(c, \mathcal{E}, E)}(X) = c(X | E)
\]
 
Conditionalization  An agent ought to plan to update in accordance with the updating rule $\mathbf{Cond}$.
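To make the definitions concrete, here is a minimal sketch in Python (my own illustration, not from any of the papers cited): a credence function over finitely many worlds is a dictionary, a proposition is a set of worlds, and $\mathbf{Cond}$ simply renormalizes within the evidence.

```python
# A credence function over finitely many worlds, represented as a dict
# from worlds to credences; a proposition is a set of worlds.

def conditionalize(c, E):
    """The rule Cond: c(. | E).  Shift all credence onto E and renormalize."""
    cE = sum(p for w, p in c.items() if w in E)   # prior credence in E
    if cE == 0:
        raise ValueError("cannot conditionalize on a proposition with credence 0")
    return {w: (p / cE if w in E else 0.0) for w, p in c.items()}

c = {'w1': 0.5, 'w2': 0.3, 'w3': 0.2}
c_new = conditionalize(c, {'w1', 'w2'})   # learn E = {w1, w2} with certainty
# w1 now gets credence 0.5/0.8, w2 gets 0.3/0.8, and w3 gets credence 0
```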

Brown's pragmatic argument for Conditionalization


Before we turn to the accuracy-based argument for Conditionalization, we present Peter M. Brown's pragmatic argument for that norm.  The accuracy-based argument essentially adapts this argument.

Brown's argument is based on the following result:  Suppose the present time is $t$.  Our agent knows that, by the later time $t'$, she will learn with certainty a proposition from the partition $\mathcal{E}$.  At time $t$, she must choose how she will update her credences in the light of whatever new evidence she obtains.  That is, at $t$ she must choose the updating rule she will use to set her credences at $t'$.  Suppose that one of the available updating rules is $\mathbf{Cond}$.  And suppose further that she knows that, at time $t'$, she will have to choose an action from a set of alternative actions; and she knows she will choose an action that has maximal subjective expected utility when that is calculated relative to her credences at $t'$.  Then, relative to her current credences at $t$, she will always expect herself to do at least as well if she conditionalizes as if she uses any of the alternative updating rules; and there will be sets of actions from amongst which she has to choose such that she will expect herself to do better in this choice if she conditionalizes than if she updates using an alternative rule.

More formally:
  • $c$ is the agent's credence function at $t$.
  • $\mathcal{W}$ is the set of possible worlds or states of the world.
  • $\mathcal{A}$ is the set of alternative actions from which the agent must choose at $t'$.
  • $U(a, w)$ is the utility of $a$ at $w$, where $a$ is an action in $\mathcal{A}$ and $w$ is a world in $\mathcal{W}$.
  • $E_w$ is the element of $\mathcal{E}$ that is true at world $w$.
Theorem 1 (Brown)  Suppose $\mathbf{R}$ is an updating rule.  For each $w$ in $\mathcal{W}$:
  • Let $a^*_w$ be an action with maximal subjective expected utility relative to $c_{\mathbf{Cond}(c, \mathcal{E}, E_w)}$.
  • Let $a_w$ be an action with maximal subjective expected utility relative to $c_{\mathbf{R}(c, \mathcal{E}, E_w)}$.
Then:
\[
\sum_{w \in \mathcal{W}} c(w)U(a^*_w, w) \geq \sum_{w \in \mathcal{W}} c(w) U(a_w, w)
\]
Moreover, suppose that, for some $w$ in $\mathcal{W}$ with $c(w) > 0$, $a^*_w$ is the only action with maximal subjective expected utility relative to $c_{\mathbf{Cond}(c, \mathcal{E}, E_w)}$ and $a_w \neq a^*_w$.  Then:
\[
\sum_{w \in \mathcal{W}} c(w)U(a^*_w, w) > \sum_{w \in \mathcal{W}} c(w) U(a_w, w)
\]
Proof. Fix $E$ in $\mathcal{E}$.  For each $w \in E$ we have $E_w = E$, so $a^*_w$ is one and the same action, namely an expected utility maximizer relative to $c_{\mathbf{Cond}(c, \mathcal{E}, E)}$.  By the definition of $a^*_w$, then, we have
\[
\sum_{w \in \mathcal{W}} c_{\mathbf{Cond}(c, \mathcal{E}, E)}(w) U(a^*_w, w) \geq \sum_{w \in \mathcal{W}} c_{\mathbf{Cond}(c, \mathcal{E}, E)}(w) U(a, w)
\]
for all $a$ in $\mathcal{A}$.  Thus, by the definition of $c_{\mathbf{Cond}(c, \mathcal{E}, E)}$, we have
\[
\sum_{w \in \mathcal{W}} c(w|E) U(a^*_w, w) \geq \sum_{w \in \mathcal{W}} c(w|E) U(a, w)
\]
So, writing $w \in E$ to mean that $E$ is true at $w$, we have
\[
\sum_{w \in E} \frac{c(w)}{c(E)} U(a^*_w, w) \geq \sum_{w \in E} \frac{c(w)}{c(E)} U(a, w)
\]
Multiplying both sides by $c(E)$, we get
\[
\sum_{w \in E} c(w) U(a^*_w, w) \geq \sum_{w \in E} c(w) U(a, w)
\]
for all $a$ in $\mathcal{A}$.  In particular:
\[
\sum_{w \in E} c(w) U(a^*_w, w) \geq \sum_{w \in E} c(w) U(a_w, w)
\]
Thus, summing over all $E$s in $\mathcal{E}$, we get
\[
\sum_{w \in \mathcal{W}} c(w) U(a^*_w, w) \geq \sum_{w \in \mathcal{W}} c(w) U(a_w, w)
\]
The inequality becomes strict if, for some $w$ with $c(w) > 0$, $a^*_w$ is the unique expected utility maximizer relative to $c_{\mathbf{Cond}(c, \mathcal{E}, E_w)}$ and $a_w \neq a^*_w$.
$\Box$
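Brown's theorem can be checked on a toy decision problem (a sketch with made-up worlds, credences, and utilities; the rival rule, which simply ignores the evidence, is my own illustrative choice):

```python
worlds = ['w1', 'w2', 'w3']
c = {'w1': 0.2, 'w2': 0.3, 'w3': 0.5}          # prior credences at t
partition = [{'w1', 'w2'}, {'w3'}]             # the evidence partition E
actions = ['a', 'b']
U = {('a', 'w1'): 2, ('a', 'w2'): 0, ('a', 'w3'): 0,   # utilities U(action, world)
     ('b', 'w1'): 0, ('b', 'w2'): 1, ('b', 'w3'): 1}

def conditionalize(c, E):
    cE = sum(p for w, p in c.items() if w in E)
    return {w: (p / cE if w in E else 0.0) for w, p in c.items()}

def stubborn(c, E):
    return dict(c)        # a rival updating rule: ignore the evidence

def best_action(cred):    # choose by maximizing subjective expected utility
    return max(actions, key=lambda a: sum(cred[w] * U[(a, w)] for w in worlds))

def prior_expected_utility(rule):
    """Expected utility, by the lights of c at t, of choosing at t' by the
    credences the rule recommends on learning each cell of the partition."""
    return sum(c[w] * U[(best_action(rule(c, E)), w)]
               for E in partition for w in E)

# Conditionalizing never does worse in prior expectation, and here does better
assert prior_expected_utility(conditionalize) > prior_expected_utility(stubborn)
```

Here the stubborn agent picks the act that looks best by the prior in every cell of the partition, while the conditionalizer exploits what she learns; the prior expects the latter to do strictly better.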

Oddie and Greaves & Wallace's accuracy-based argument for Conditionalization


One way to understand Brown's argument for Conditionalization is this:  An agent is faced with a choice of updating rules; Conditionalization always maximizes subjective expected utility; sometimes, it is the only updating rule that does.  Of course, this relies on an account of the utility of an updating rule at a world.  But our utility function is not defined on updating rules; it is only defined on actions $a$ in $\mathcal{A}$.  However, Brown gives us an account of the utility of an updating rule in terms of the utility of actions.  For instance, for Brown, the utility of $\mathbf{Cond}$ at world $w$ is $U(a^*_w, w)$.  And the utility of $\mathbf{R}$ at $w$ is $U(a_w, w)$.  In general, the utility of an updating rule at a world is given by the utility of the action that an agent would choose based on the credence function demanded by that updating rule at that world (if she were to choose by maximizing subjective expected utility).

This gives us a hint how to give an accuracy-based argument for Conditionalization.  We want to argue that Conditionalization is the updating rule that maximizes subjective expected accuracy.  So we need to say what the accuracy of an updating rule is at a given world.  Following Brown's lead, we define the accuracy of an updating rule at a world to be the accuracy of the credence function that it demands an agent has at that world.  Thus, if the inaccuracy of a credence function $c$ at a world $w$ is given by $I(c, w)$, then the inaccuracy at $w$ of an updating rule $\mathbf{R}$ for an agent with credence function $c$ and partition $\mathcal{E}$ is given by $I(c_{\mathbf{R}(c, \mathcal{E}, E_w)}, w)$.  So, since the accuracy of $c$ at $w$ is $-I(c, w)$, the accuracy of $\mathbf{R}$ at $w$ is $-I(c_{\mathbf{R}(c, \mathcal{E}, E_w)}, w)$.

It might seem that we can now simply take Brown's argument and substitute our measure of accuracy (namely, $-I$) in place of the utility function (namely, $U$) and we will get an accuracy-based argument for Conditionalization.  But that isn't quite true.  Our proof of Brown's theorem began by using the fact that $a^*_w$ has maximal subjective expected utility relative to $c_{\mathbf{Cond}(c, \mathcal{E}, E)}$.  But, for all that has been said so far, we have no reason for thinking that $c_{\mathbf{Cond}(c, \mathcal{E}, E)}$ has maximal subjective expected accuracy relative to itself.  If we make that assumption, we can simply use Brown's proof, and we get the result we want, namely, that $\mathbf{Cond}$ maximizes subjective expected accuracy.  But is it a reasonable assumption?

Here is an argument in its favour:  For every probabilistic credence function, there is an evidential situation to which that credence function is the unique rational response, namely, the situation in which one learns that that probability function gives the objective chances (see (Joyce, 2009) for the original argument; see (Hájek, 2009) for an objection; see (Pettigrew, ms, section 5.2.1) for a response to Hájek on Joyce's behalf).  Thus, each credence function ought to assign maximal expected accuracy to itself and only itself.  If it does not, then for an agent who has that credence function, it will be permissible for her to adopt another credence function that also has maximal expected accuracy relative to her original credence function.  But, by hypothesis, this is not permissible for an agent in the evidential situation posited.  Thus, for each probabilistic credence function $c$, we ought to have:
\[
\sum_{w \in \mathcal{W}} c(w) I(c, w) < \sum_{w \in \mathcal{W}} c(w) I(c', w)
\]
for $c' \neq c$.  If this is true, we say that $I$ renders probabilistic credence functions immodest.  We note that the Brier score renders probabilistic credence functions immodest, as does any of the inaccuracy measures based on Bregman divergences mentioned in my previous post.  Then we have the following theorem:

Theorem 2 (Greaves and Wallace)  Suppose $\mathbf{R}$ is an updating rule that disagrees with $\mathbf{Cond}$ on some $E$ in $\mathcal{E}$ with $c(E) > 0$.  And suppose $I$ is an inaccuracy measure that renders probabilistic credence functions immodest.  Then
\[
\sum_{w \in \mathcal{W}} c(w)I(c_{\mathbf{Cond}(c, \mathcal{E}, E_w)}, w) < \sum_{w \in \mathcal{W}} c(w) I(c_{\mathbf{R}(c, \mathcal{E}, E_w)}, w)
\]
Proof. Follows the proof of Brown's theorem exactly.
$\Box$
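As with Brown's theorem, Theorem 2 can be spot-checked numerically (a sketch using the Brier score as $I$ and, as the rival rule $\mathbf{R}$, one that ignores the evidence; all numbers are made up):

```python
worlds = ['w1', 'w2', 'w3']
c = {'w1': 0.2, 'w2': 0.3, 'w3': 0.5}
partition = [{'w1', 'w2'}, {'w3'}]

def conditionalize(c, E):
    cE = sum(p for w, p in c.items() if w in E)
    return {w: (p / cE if w in E else 0.0) for w, p in c.items()}

def stubborn(c, E):
    return dict(c)        # rival rule: keep the prior, whatever is learned

def brier(cred, w):       # Brier inaccuracy of credences cred at world w
    return sum((cred[x] - (1.0 if x == w else 0.0)) ** 2 for x in worlds)

def expected_inaccuracy(rule):
    """Prior expected inaccuracy of the credences the rule recommends."""
    return sum(c[w] * brier(rule(c, E), w) for E in partition for w in E)

# Conditionalization has strictly lower expected inaccuracy than the rival rule
assert expected_inaccuracy(conditionalize) < expected_inaccuracy(stubborn)
```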

This is the accuracy-based argument for Conditionalization.  It has the following form:
  1. The cognitive value of a credence function is given by its accuracy.  All accuracy measures ought to render all probabilistic credence functions immodest.
  2. Maximize Subjective Expected Utility
  3. Theorem 2
  4. Therefore, Conditionalization.
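The immodesty condition in premise 1 can itself be spot-checked for the Brier score (a quick numerical sketch with a made-up credence function over three worlds):

```python
import random

worlds = range(3)

def brier(cred, w):
    """Brier inaccuracy at w: squared distance from the omniscient credences."""
    return sum((cred[x] - (1.0 if x == w else 0.0)) ** 2 for x in worlds)

def expected_inaccuracy(c, cred):
    return sum(c[w] * brier(cred, w) for w in worlds)

c = [0.2, 0.3, 0.5]
random.seed(0)
for _ in range(1000):
    r = [random.random() for _ in worlds]
    rival = [x / sum(r) for x in r]           # a rival probability function
    if max(abs(a - b) for a, b in zip(c, rival)) > 1e-6:
        # c expects itself to be strictly less inaccurate than any rival
        assert expected_inaccuracy(c, c) < expected_inaccuracy(c, rival)
```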
In this post, as in previous posts, I have assumed that $\mathcal{F}$ is finite.  To see how the argument works when $\mathcal{F}$ is infinite, see (Easwaran, 2013).

References


  • Brown, Peter M. (1976) 'Conditionalization and Expected Utility' Philosophy of Science 43(3): 415-419.
  • Easwaran, Kenny (2013) 'Expected Accuracy Supports Conditionalization--and Conglomerability and Reflection' Philosophy of Science 80(1): 119-142. 
  • Greaves, Hilary and David Wallace (2006) 'Justifying Conditionalization: Conditionalization Maximizes Expected Epistemic Utility' Mind 115(459): 607-632.
  • Hájek, Alan (2009) 'Arguments for--or against--Probabilism?' in Huber, F. & C. Schmidt-Petri (eds.) Degrees of Belief, Synthese Library 342: 229-251.
  • Joyce, James M. (2009) 'Accuracy and Coherence: Prospects for an Alethic Epistemology of Partial Belief' in Huber, F. & C. Schmidt-Petri (eds.) Degrees of Belief, Synthese Library 342: 263-97.
  • Oddie, Graham (1997) 'Conditionalization, Cogency, and Cognitive Value' British Journal for the Philosophy of Science 48: 533-541.
  • Pettigrew, Richard (ms) 'Accuracy, Risk, and the Principle of Indifference'

"Arithmetic, Structures and the Rise of Modern Logic": A Colloquium in Honour of Dan Isaacson

This is an announcement for an upcoming meeting, at Oxford, in honour of my colleague, Dan Isaacson.
"Arithmetic, Structures and the Rise of Modern Logic"
A Colloquium in Honour of Dan Isaacson on the occasion of his retirement as University Lecturer in the Philosophy of Mathematics.
15 June 2013, Lecture Room, Philosophy Faculty, (Radcliffe Humanities Building)
In addition to a valedictory lecture by Dan ("Philosophy of mathematics in Oxford"), the invited speakers are Richard Pettigrew, Alex Paseau, Alex Wilkie, Marcus Giaquinto and Paolo Mancosu.

For further details, see here.

Monday, 27 May 2013

Extended Deadline: LORI-4, October 9 - 12, 2013, Zhejiang University Hangzhou, P.R. China

*THE FOURTH INTERNATIONAL WORKSHOP ON LOGIC, RATIONALITY AND INTERACTION*

October 9 - 12, 2013 Center for the Study of Language and Cognition, Zhejiang University
 Hangzhou, P.R. China

*Submission deadline extended to June 7th, 2013*

Scope and mission: the LORI workshop series aims at bringing together researchers working on a wide variety of logic-related fields concerned with the understanding of rationality and interaction. These include Game Theory and Decision Theory, Philosophy and Epistemology, Linguistics, Computer Science and Artificial Intelligence. The series aims at fostering a view of Logic as an interdisciplinary endeavor, and supports the creation of a Chinese community of interdisciplinary researchers.

We invite submissions of contributed papers bearing on any of the broad themes of the LORI workshop series. More specific topics of interest for this edition include but are not limited to:
• argumentation and its role in interaction
• norms, normative multiagent systems and social software
• semantic models for knowledge, for belief, and for uncertainty
• dynamic logics of knowledge, information flow, and action
• logical analysis of the structure of games
• belief revision, belief merging
• logics of preference and preference representation
• logics of intentions, plans, and goals
• logics of probability and uncertainty
• logical approaches to decision making and planning
• logic and social choice theory

*Important dates*:
• Paper submission (Extended!): June 7th, 2013
• Notification of acceptance: July 1, 2013
• Camera ready version: July 16, 2013
• Conference dates: October 9 - 12, 2013

Submissions: papers should be approximately 12 pages long, and should be submitted via Easychair (https://www.easychair.org/account/signin.cgi?conf=lori4). Authors are encouraged to use the LNCS style to format their papers.

Publication: The proceedings of LORI-4 will be published in the Springer LNCS/Folli series.

Invited speakers:
• Giuseppe Dari-Mattiacci (University of Amsterdam)
• Valentin Goranko (Technical University of Denmark)
• Hannes Leitgeb (Ludwig Maximilians University)
• Beishui Liao (Zhejiang University)
• Christian List (London School of Economics)
• Sonja Smets (University of Amsterdam)
• Dongmo Zhang (University of Western Sydney)

Chairs:
• Program chairs: Davide Grossi (University of Liverpool) and Olivier Roy (Ludwig Maximilians University)

• Organizing chair: Huaxin Huang (Zhejiang University)

Program committee:
• Thomas Ågotnes (University of Bergen)
• Natasha Alechina (University of Nottingham)
• Albert Anglberger (Ludwig Maximilians University)
• Alexandru Baltag (University of Amsterdam)
• Hans van Ditmarsch (LORIA)
• Jan Van Eijck (University of Amsterdam and CWI)
• Ulle Endriss (University of Amsterdam)
• Nina Gierasimczuk (University of Amsterdam)
• Jiahong Guo (Beijing Normal University)
• Wesley Holliday (University of California, Berkeley)
• Tomohiro Hoshi (Stanford University)
• Fangzhen Lin (Hong Kong University of Science and Technology)
• Fenrong Liu (Tsinghua University)
• Yongmei Liu (Sun Yat-Sen University)
• Guo Meiyun (Southwest University)
• Eric Pacuit (University of Maryland)
• Henry Prakken (Utrecht University)
• Ramaswamy Ramanujam (Institute of Mathematical Sciences)
• Antonino Rotolo (University of Bologna)
• Jeremy Seligman (University of Auckland)
• Kaile Su (Griffith University)
• Wenfang Wang (National Yang Ming University)
• Yanjing Wang (Peking University)
• Minghui Xiong (Sun Yat-Sen University)
• Tomoyuki Yamada (Hokkaido University)

Organizing committee (Zhejiang University):
• Longbiao Hu
• Li Jin
• Beishui Liao
• Cihua Xu

Friday, 24 May 2013

Quine Transform of a Language

This is a first application of this notion (and in fact where the name comes from!).

Let $\mathbf{L} = (\mathcal{L}, \mathcal{A})$ be an interpreted language, such as  might be spoken/cognized by some agent $s$. Here $\mathcal{L}$ is the underlying (uninterpreted) syntax, and $\mathcal{A}$ is an extensional interpretation for $\mathcal{L}$-strings. So, $\mathcal{A}$ specifies, in the usual way, extensional meanings for $\mathbf{L}$'s syntactic components: connectives, quantifiers, names, predicates, etc. For example, if $t$ is a closed term, then its denotation in $\mathbf{L}$ is $t^{\mathcal{A}}$. If $\phi$ is a sentence, then
$\phi$ is true in $\mathbf{L}$ iff $\mathcal{A} \models \phi$.
Let $\pi : A \to A$ be any permutation of $A$. Let $\mathcal{A}^{\pi}$ be the Quine transform of $\mathcal{A}$ under $\pi$.

Definition [Quine Transform of a Language]
The Quine transform of the language $\mathbf{L}$, written $\mathbf{L}^{\pi}$, is defined to be $(\mathcal{L}, \mathcal{A}^{\pi})$.

The reason for being interested in this notion is that Quine argued (as I formulate it) that there cannot be a physical "fact of the matter" (by which Quine intends to include all "use-facts" or U-facts) discriminating between:
  • agent $s$ cognizes/speaks $\mathbf{L}$.
  • agent $s$ cognizes/speaks $\mathbf{L}^{\pi}$.
That is, according to Quine, it is indeterminate which language the agent $s$ speaks/cognizes. Quine's reasoning for this is a matter of dispute, of course. But note that the interpretations $\mathcal{A}$ and $\mathcal{A}^{\pi}$ are not merely equivalent, in the technical sense, in making the same sentences $\phi$ true; they are isomorphic.

Quine Transform of a Model

Suppose that $\mathcal{A} = (A, \vec{R}, \vec{f})$ is a model, and let $\pi : A \rightarrow A$ be a bijection (permutation of $A$ to itself). Next, define the following notion:

Definition [Quine Transform]
The Quine transform of $\mathcal{A}$ under $\pi$, written $\mathcal{A}^{\pi}$, is given by:
$\mathcal{A}^{\pi}: = (A, \pi[\vec{R}], \pi[\vec{f}])$. 
For example, suppose the model $\mathcal{A} = (A,R)$ is specified as follows:
$A= \{0,1, 2\}$
$R= \{(0,1), (0,2), (1,2) \}$. 
Let $\pi: A \to A$ be the transposition that swaps $0$ and $1$. Then,
$\pi[R] = \{(1,0), (1,2),(0,2)\}$
Consequently, $R$ and $\pi[R]$ are extensionally distinct. However, $\mathcal{A}$ and $\mathcal{A}^{\pi}$ are isomorphic under $\pi$. More generally, one can see that:

Lemma ["Quine Transform Lemma"]
Let $\pi : A \to A$ be any bijection. Then: $\mathcal{A}^{\pi} \cong \mathcal{A}$.
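The worked example, and the Lemma behind it, can be checked mechanically (a small sketch in Python; the model is the three-element one above):

```python
A = {0, 1, 2}
R = {(0, 1), (0, 2), (1, 2)}

def transform(R, pi):
    """Quine transform of a binary relation under the permutation pi."""
    return {(pi[x], pi[y]) for (x, y) in R}

pi = {0: 1, 1: 0, 2: 2}     # the transposition swapping 0 and 1
R_pi = transform(R, pi)

# R and pi[R] are extensionally distinct...
assert R_pi == {(1, 0), (1, 2), (0, 2)} and R_pi != R
# ...but pi itself is an isomorphism from (A, R) to (A, pi[R]):
# (x, y) is in R iff (pi[x], pi[y]) is in pi[R]
assert all(((x, y) in R) == ((pi[x], pi[y]) in R_pi) for x in A for y in A)
```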

This is all quite simple discrete mathematics. But it has interesting applications.

Nominalism vs. Syntax

It is difficult to maintain, consistently, the following two claims:
(i) There are no abstracta.
(ii) There are syntactical entities (and they behave as our standard accounts say they do).
Consider, for example, how one defines a language $L$. Beginning with two building blocks, $A$ and $B$, we say that $\{A,B\}$ is the alphabet. It's usually implicit, but sometimes needs to be stated, that $A \neq B$. (One has to state this in a formalized theory of syntax.)

For many of the usual purposes of syntactical theory, it does not matter what these building blocks $A$ and $B$ are. They could be two eggs. They could be the numbers 7 and $\aleph_{57}$. They could be the letter types "a" and "$\aleph$". Or they could be two of my guitars. Or they could be two tokens of the letters "a" and "$\aleph$". It does not matter. And the fact that it doesn't matter plays an important role in Gödel's incompleteness results, where the leading ideas involve the structural interplay between the properties of numbers, sequences, syntactical entities and finitary computations (plus, times, exponentiation, and so on).

Let our alphabet $\Sigma = \{A,B\}$. Next we consider the set $\Sigma^{\ast}$ of finite sequences drawn from $\Sigma$. Finite sequences drawn from an alphabet are usually called:
  • strings
  • words
  • expressions
These are the syntactical entities that one is discussing, quantifying over, referring to, etc. The crucial point is that these are sequences from the alphabet. In particular, $\Sigma^{\ast}$ is closed under sequence concatenation. So,
if $\alpha, \beta \in \Sigma^{\ast}$, then $\alpha ^{\frown} \beta \in \Sigma^{\ast}$.
And:
$|\Sigma^{\ast}| = \aleph_0$.
This means that there are $\aleph_0$-many syntactical entities. The terms $a_0, a_1, \dots, a_n$ occurring in a sequence $\alpha = (a_0, a_1, \dots, a_n)$ may well be concreta. But the sequence $\alpha$ itself is a (possibly mixed) abstractum. More exactly, a sequence is usually understood as a function:
$\alpha : \{0,1,\dots,n\} \to \Sigma$.
This is not mandated. What is mandated is that sequences are individuated in a certain way:
$\alpha = \beta$ if and only if $\alpha$ and $\beta$ have the same terms, in the same order.
So, e.g., if $(a_0, a_1, \dots, a_n) = (b_0, b_1, \dots, b_k)$, then $n = k$, and $a_0 = b_0$, $a_1 = b_1$, and so on.
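Programming languages model this individuation condition directly: tuples are identical just in case they have the same terms in the same order (a trivial illustration, with arbitrary tokens as terms):

```python
alpha = ('A', 'B', 'A')
beta = ('A', 'B', 'A')
gamma = ('B', 'A', 'A')      # same terms, different order

assert alpha == beta          # same terms in the same order: identical
assert alpha != gamma         # order matters
# Concatenation of sequences yields a sequence, as with Sigma*
assert alpha + ('B',) == ('A', 'B', 'A', 'B')
```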

Normally, one goes on to define certain special subsets $X, Y, \dots$ of $\Sigma^{\ast}$. Perhaps these are the formulas, or terms, and whatnot. Usually, the definitions satisfy certain computational constraints: e.g., perhaps an inductive definition. So, $X$ might be, e.g., a recursive set or a recursively enumerable set. But for this discussion, these subsets don't matter. They're subsets; what matters here is the enclosing set $\Sigma^{\ast}$ of all strings from the alphabet.

Return to (i) and (ii). Suppose (i) is true. So, there are no abstracta. Hence, there are, a fortiori, no mixed abstracta; and therefore, there are no sequences; and, therefore, there are no strings; and therefore no syntactical entities, except a very, very small number of tokens, which are not closed under concatenation. Hence, (ii) is false.

One might suggest that these claims (i) and (ii) are "really" consistent under some reinterpretation $I$. But what exactly is this $I$? How is $I$ defined? Is it a secret?

I think that the optimal nominalistic responses to the inconsistency of (i) and (ii) are:
  • either to accept the inconsistency and thus simply accept that (ii) (i.e., syntax) is false (see Quine & Goodman 1947, "Steps Toward a Constructive Nominalism"), 
  • or to reinterpret (ii) to make it "true under a reinterpretation", so that "syntactical entity" refers perhaps to possibilia (i.e., possible concrete tokens: see Burgess & Rosen 1997, A Subject with No Object, for some discussion of this) or perhaps to some kind of physical entity (such as perhaps spacetime regions), assuming there are sufficiently many.
I'm not optimistic about either kind of approach. The first approach is an error theory, and is too damaging to science. An error theory for morality is one thing; an error theory for science is another! The second, "hermeneutic", approach invokes possibilia and this raises similar sceptical and metaphysical worries as abstracta do. (See the final chapter of Shapiro 1997, Philosophy of Mathematics: Structure and Ontology.) It also raises the question of what grounds one might give for the reinterpretation. A classic discussion of some of these topics is Burgess 1983, "Why I am not a Nominalist".

So far as I can tell, the more recent "weaseling" approach to nominalism---which I think is extremely interesting---proposed by Melia 2000 ("Weaseling Away the Indispensability Argument", Mind) and endorsed and developed recently by Yablo 2012 ("Explanation, Extrapolation and Existence", Mind) doesn't seem to apply in the syntactic case. But I'm not sure.

Joyce's argument for Probabilism: the mathematics

Last time, I posted about Jim Joyce's argument for Probabilism.  At its heart lies a mathematical theorem.  In this post, I state this mathematical theorem and prove it; then I generalize it and prove the generalization. 

The framework


Recall from last time:
  • $\mathcal{F}$ is a finite set of propositions.  (It is the set of propositions about which our agent has an opinion.)
  • A credence function is a function $c : \mathcal{F} \rightarrow [0, 1]$.
Throughout the present post, we will adopt a particular representation of credence functions:  We have assumed that $\mathcal{F}$ is finite.  Thus, suppose $\mathcal{F} = \{X_1, \ldots, X_n\}$.  Then we can represent any credence function on $\mathcal{F}$ by a vector in the $n$-dimensional vector space $[0, 1]^n$.  That is, we represent $c : \mathcal{F} \rightarrow [0, 1]$ by the vector
\[
\langle c(X_1), \ldots, c(X_n) \rangle
\]
This is essentially the representation we used last time when we plotted credence functions as points on the Euclidean plane.  To avoid prolixity, I'll use $c$ to refer to the credence function and to the vector that represents it.

Under this representation, we have the following:
  • Let $\mathcal{B}$ be the set of credence functions.  That is, $\mathcal{B} := [0, 1]^n$. 
  • Let $\mathcal{P}$ be the set of probability functions.  So $\mathcal{P} \subseteq \mathcal{B} = [0, 1]^n$.
  • The function\[ Q(c, c') := \sum_{X \in \mathcal{F}} (c(X) - c'(X))^2\] genuinely measures the squared Euclidean distance between the vectors that represent $c$ and $c'$.
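In code, the vector representation and the function $Q$ are immediate (a sketch with $n = 2$ and made-up credences):

```python
def Q(c, cprime):
    """Squared Euclidean distance between two credence vectors."""
    return sum((x - y) ** 2 for x, y in zip(c, cprime))

# F = {X1, X2}, with credence functions represented as vectors in [0,1]^2
c = (0.6, 0.7)
v = (1.0, 0.0)    # credences of an omniscient agent: X1 true, X2 false

# Q(c, v) = (0.6 - 1)^2 + (0.7 - 0)^2 = 0.16 + 0.49 = 0.65
assert abs(Q(c, v) - 0.65) < 1e-9
```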

Recall:
  • $\mathcal{W}$ is the set of classically consistent assignments of truth-values to the propositions in $\mathcal{F}$.  (Thus, we can think of $\mathcal{W}$ as the set of possible worlds grained as finely as the propositions in $\mathcal{F}$ will allow.)
  • Given $w$ in $\mathcal{W}$, define $v_w : \mathcal{F} \rightarrow [0, 1]$ as follows:\[ v_w(X) = \left \{ \begin{array}{ll} 0 & \mbox{if $X$ is false at $w$} \\ 1 & \mbox{if $X$ is true at $w$} \end{array} \right. \]
  • Let $\mathcal{V} := \{v_w : w \mbox{ in } \mathcal{W}\}$.  Thus, under the vector representation, $\mathcal{V} \subseteq [0, 1]^n$.

The core theorem 


With that in hand, we can state Theorem 1, the mathematical core of Joyce's argument for Probabilism.  (In fact, the version stated here is slightly stronger than Joyce's: clause (2) strengthens his corresponding clause.)

Theorem 1
  1. If $c \not \in \mathcal{P}$, then there is $c^* \in \mathcal{P}$ such that $Q(v, c^*) < Q(v, c)$ for all $v \in \mathcal{V}$.
  2. If $c \in \mathcal{P}$, then there is no $c^* \in \mathcal{B}$ with $c^* \neq c$ such that $Q(v, c^*) \leq Q(v, c)$ for all $v \in \mathcal{V}$.
In the first half of this post, we prove Theorem 1.  In the second half, we generalize it by showing that the same result holds for a large class of alternative functions other than $Q$.

The first step in the proof of Theorem 1 is the following Lemma, which is due to (de Finetti, 1974).  It gives an extremely useful characterization of the credence functions that satisfy Probabilism:  they are precisely those that are convex combinations of the omniscient credence functions in $\mathcal{V}$.

Definition 1  Let $\mathcal{V}^+$ be the convex hull of $\mathcal{V}$.  That is, $\mathcal{V}^+$ is the smallest convex set that contains $\mathcal{V}$.

Lemma 1  $\mathcal{P} = \mathcal{V}^+$.

Proof:  Suppose, to begin with, that $\mathcal{F}$ is an algebra.
  1. First, we prove $\mathcal{V}^+ \subseteq \mathcal{P}$.  To do this, we note two things:  first, each $v \in \mathcal{V}$ is a probability function, so $\mathcal{V} \subseteq \mathcal{P}$; second, $\mathcal{P}$ is convex.
  2. Second, we prove $\mathcal{P} \subseteq \mathcal{V}^+$.  Since $\mathcal{F}$ is a finite algebra, it has atoms.  Let $\mathcal{A} \subseteq \mathcal{F}$ be the set of atoms of $\mathcal{F}$.  Then $\mathcal{A}$ and $\mathcal{V}$ stand in one-one correspondence:  if $v \in \mathcal{V}$, there is exactly one $A_v \in \mathcal{A}$ such that $v(A_v) = 1$ and $v(A_{v'}) = 0$ for $v' \neq v$.  Now suppose $c \in \mathcal{P}$.  And suppose $X \in \mathcal{F}$.  Then $X$ is equivalent to the disjunction of the atoms $A_v$ such that $v(X) = 1$.  Thus\[c(X) = c\left (\bigvee_{v : v(X) = 1} A_v \right ) = \sum_{v : v(X) = 1} c(A_v) = \sum_{v} c(A_v)v(X)\]  So\[c = \sum_{v \in \mathcal{V}} c(A_v) v\]  Thus, $c$ is a convex combination of elements of $\mathcal{V}$.  That is, $c \in \mathcal{V}^+$.
If $\mathcal{F}$ is not an algebra, find the smallest algebra $\mathcal{F}^*$ that contains $\mathcal{F}$, extend the credence functions to $\mathcal{F}^*$, run the preceding arguments, and then restrict to $\mathcal{F}$.
$\Box$
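Lemma 1's key computation, $c = \sum_v c(A_v)\, v$, can be verified numerically (a sketch in which the atoms are three worlds, $\mathcal{F}$ is the full algebra of subsets, and the atom credences are made up):

```python
from itertools import chain, combinations

worlds = [0, 1, 2]
atom_credence = {0: 0.2, 1: 0.3, 2: 0.5}     # credences in the atoms

def propositions(ws):
    """All subsets of the set of worlds: the full algebra F."""
    return chain.from_iterable(combinations(ws, r) for r in range(len(ws) + 1))

def v(w, X):
    """Omniscient credence function at w: 1 if X is true at w, else 0."""
    return 1.0 if w in X else 0.0

def c(X):
    """Probability of X: the sum of the credences in its atoms."""
    return sum(atom_credence[w] for w in X)

# c is the convex combination of the omniscient functions v_w,
# weighted by the credences c(A_w) in the atoms
for X in propositions(worlds):
    mixture = sum(atom_credence[w] * v(w, X) for w in worlds)
    assert abs(c(X) - mixture) < 1e-12
```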

This characterization of $\mathcal{P}$, the set of probability functions on $\mathcal{F}$, makes our lives a lot easier, since there is a vast array of results about the behaviour of convex sets.  Here are the two that are important for our purposes:

Lemma 2  Suppose $\mathcal{X} \subseteq \mathbb{R}^n$ is closed and convex.  Then if $x \not \in \mathcal{X}$, there is $x^* \in \mathcal{X}$ such that $Q(y, x^*) < Q(y, x)$ for all $y \in \mathcal{X}$. 

Lemma 3   Suppose $\mathcal{X} \subseteq \mathbb{R}^n$.  Then, if $x, y \in \mathcal{X}^+$ and $x \neq y$, there is $z \in \mathcal{X}$ such that $Q(z, x) < Q(z, y)$.

We can now see how Lemmas 1, 2, and 3 combine to give Theorem 1:  By Lemma 1, $\mathcal{P} = \mathcal{V}^+$.  But, by Lemma 2, if there is a point $c$ outside $\mathcal{V}^+$, there is a point $c^*$ inside $\mathcal{V}^+$ such that $c^*$ is closer to all points in $\mathcal{V}^+$ than $c$ is; thus, in particular, $c^*$ is closer to all $v$ in $\mathcal{V}$ than $c$ is.  This gives Theorem 1(1).  By Lemma 3, if $c, c'$ are in $\mathcal{V}^+$, and $c \neq c'$, then there is $v$ in $\mathcal{V}$ such that $c$ is closer to $v$ than $c'$ is.  Thus, $c$ is not even weakly dominated.  This gives Theorem 1(2).  And we're done!
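Here is the argument in miniature for $\mathcal{F} = \{X_1, X_2\}$ with $X_2 = \neg X_1$ (a sketch; the incoherent credences are made up). The probability functions are exactly the vectors with $c(X_1) + c(X_2) = 1$, and projecting an incoherent $c$ onto that line yields a $c^*$ that is closer to both omniscient credence functions:

```python
def Q(c, cprime):
    return sum((x - y) ** 2 for x, y in zip(c, cprime))

c = (0.6, 0.7)                    # incoherent: credences sum to 1.3, not 1
excess = (c[0] + c[1] - 1) / 2
c_star = (c[0] - excess, c[1] - excess)   # nearest point on the line x + y = 1

# V: the two consistent truth-value assignments to {X1, not-X1}
for v in [(1.0, 0.0), (0.0, 1.0)]:
    assert Q(v, c_star) < Q(v, c)         # c* accuracy-dominates c
```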

The core theorem generalized


So far, we've been assuming that distance between credence functions is measured by Squared Euclidean Distance.  Does Theorem 1 depend on that?  That is, is there a broader class of alternative measures of the distance between credence functions such that Theorem 1 holds for every distance measure in that class?  The first thing to say is that Squared Euclidean Distance is not itself a distance measure, strictly speaking:  that is, it isn't a metric, since it doesn't satisfy the triangle inequality.  It is rather what statisticians call a divergence.  In this section, we show that Theorem 1 holds for any Bregman divergence.  Squared Euclidean Distance is a Bregman divergence, but so are the Kullback-Leibler divergence, the squared Mahalanobis distance, and the Itakura-Saito distance.  And, as we will see, if an inaccuracy measure is generated by a proper scoring rule, there is a Bregman divergence that differs from that inaccuracy measure by a constant.

Definition 2  Suppose $\mathcal{X} \subseteq \mathbb{R}^n$ is convex.  Suppose $F : \mathcal{X} \rightarrow \mathbb{R}$ is strictly convex and $\nabla F$ is defined on $\mathrm{int}(\mathcal{X})$ and extends to a bounded, continuous function on $\mathcal{X}$.  Then the Bregman divergence generated by $F$ is
\[
d_F(y, x) := F(y) - F(x) - \langle \nabla F(x), y - x \rangle
\]
Essentially, $d_F(y, x)$ is the difference between $F(y)$ and the first-order Taylor expansion of $F$ around $x$, evaluated at $y$.

Given the strict convexity of $F$, it follows that $d_F(y, x) \geq 0$, with equality iff $x = y$.
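Definition 2 is easy to instantiate (a sketch; writing the divergence as $d_F(y, x) = F(y) - F(x) - \langle \nabla F(x), y - x \rangle$, the gap between $F(y)$ and the tangent approximation at $x$). Taking $F$ to be the squared norm recovers Squared Euclidean Distance; taking $F$ to be negative entropy yields the Kullback-Leibler divergence on the probability simplex:

```python
import math

def bregman(F, gradF, y, x):
    """d_F(y, x) = F(y) - F(x) - <gradF(x), y - x>."""
    return F(y) - F(x) - sum(g * (yi - xi) for g, yi, xi in zip(gradF(x), y, x))

# F(x) = ||x||^2 generates Squared Euclidean Distance
sq = lambda x: sum(t * t for t in x)
grad_sq = lambda x: [2 * t for t in x]

# F(x) = sum_i x_i log x_i (negative entropy) generates KL divergence
negent = lambda x: sum(t * math.log(t) for t in x)
grad_negent = lambda x: [math.log(t) + 1 for t in x]

y, x = (0.2, 0.8), (0.5, 0.5)
assert abs(bregman(sq, grad_sq, y, x)
           - sum((a - b) ** 2 for a, b in zip(y, x))) < 1e-12
assert abs(bregman(negent, grad_negent, y, x)
           - sum(a * math.log(a / b) for a, b in zip(y, x))) < 1e-12
# Bregman divergences are non-negative, and zero only when the points coincide
assert bregman(sq, grad_sq, y, x) > 0 and bregman(sq, grad_sq, x, x) == 0
```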

What follows are some of the crucial theorems concerning Bregman divergences.  Each shows that Bregman divergences share many important geometrical features with Squared Euclidean Distance.  We will be appealing to these various properties over the coming weeks.

First:  Lemmas 2 and 3 hold for any Bregman divergence $d_F$:

Lemma 4  Suppose $\mathcal{X} \subseteq \mathbb{R}^n$ is closed and convex.  Then if $x \not \in \mathcal{X}$, there is $x^* \in \mathcal{X}$ such that $d_F(y, x^*) < d_F(y, x)$ for all $y \in \mathcal{X}$.

Proof:  This is Proposition 3 in (Predd, et al., 2009).

Lemma 5  Suppose $\mathcal{X} \subseteq \mathbb{R}^n$.  Then, if $x, y \in \mathcal{X}^+$ and $x \neq y$, there is $z \in \mathcal{X}$ such that $d_F(z, x) < d_F(z, y)$.

Proof: This is proved as part (a) of Theorem 1 in (Predd, et al., 2009).

Together, these are enough to show that Theorem 1 holds for any Bregman divergence.  Thus, if we can establish that distance between credence functions ought to be measured by a Bregman divergence, we can run Joyce's argument.

Second:  Suppose the inaccuracy of a credence function $c$ at world $w$ is given by $d_F(v_w, c)$ for some Bregman divergence $d_F$.  Then, if credence function $c_1$ is further from $c$ than $c_2$ is, $c$ will expect $c_1$ to have greater inaccuracy than it expects $c_2$ to have.

Lemma 6  Recall that, if $v$ is in $\mathcal{V}$, then $A_v$ is the unique atom of $\mathcal{F}$ such that $v(A_v) = 1$.  Then, if $c \in \mathcal{P}$ and $d_F(c, c_1) < d_F(c, c_2)$, then
\[
\sum_{v \in \mathcal{V}} c(A_v)d_F(v, c_1) < \sum_{v \in \mathcal{V}} c(A_v)d_F(v, c_2)
\]

Proof:  This is proved on page 32 of (Pettigrew, 2013).

One consequence of this is that each probabilistic credence function expects itself to be most accurate.  The results that follow, concerning proper scoring rules, give a partial converse to this.
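This consequence can be checked numerically for the Brier score (the credence functions below are illustrative; the space has two worlds, one making a given atom true and one making it false):

```python
# Omniscient credence functions over the atoms (A, not-A) at the two worlds
v1, v2 = (1.0, 0.0), (0.0, 1.0)

def brier(c, v):
    # Squared Euclidean Distance from c to the omniscient function v
    return sum((ci - vi) ** 2 for ci, vi in zip(c, v))

def expected_brier(c, c_prime):
    # c's expectation of c_prime's inaccuracy: the weights are c's
    # credences in the atoms picking out each world
    return c[0] * brier(c_prime, v1) + c[1] * brier(c_prime, v2)

c  = (0.6, 0.4)
c1 = (0.7, 0.3)   # closer to c ...
c2 = (0.9, 0.1)   # ... than c2 is

e_self = expected_brier(c, c)    # 0.48
e_c1   = expected_brier(c, c1)   # 0.50
e_c2   = expected_brier(c, c2)   # 0.66
```

So $c$ expects itself to be most accurate, and expects the nearer function $c_1$ to be more accurate than the further function $c_2$, just as Lemma 6 predicts.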

Notice that $B(c, w)$ is obtained by taking the squared difference between each credence $c(X)$ and the corresponding ideal credence $v_w(X)$ and summing the results.  In this sort of situation, we say that $B$ is generated by a scoring rule.

Definition 3  A scoring rule is a function $s : \{0, 1\} \times [0, 1] \rightarrow [0, \infty]$.

The idea is that $s(0, x)$ measures the inaccuracy of having credence $x$ in a false proposition, whereas $s(1, x)$ measures the inaccuracy of having credence $x$ in a true proposition.

Definition 4   Given a scoring rule $s$, we say that $I_s$ is the inaccuracy measure generated by $s$, where
\[
I_s(c, w) = \sum_{X \in \mathcal{F}} s(v_w(X), c(X))
\]

We say that a scoring rule is proper if each credence expects itself to be at least as accurate as it expects any other credence to be.  More precisely:

Definition 5  We say that a scoring rule $s$ is proper if, for each $p \in [0, 1]$, the expected score
\[
ps(1, x) + (1-p)s(0, x)
\]
is minimized, as a function of $x$, at $x = p$.
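For instance, the quadratic scoring rule $s(1, x) = (1-x)^2$, $s(0, x) = x^2$ that generates the Brier score is proper, as a quick grid search sketches (the grid and the value of $p$ are illustrative):

```python
def s(truth, x):
    # quadratic (Brier) scoring rule
    return (truth - x) ** 2

def expected_score(p, x):
    # expected inaccuracy of credence x, by the lights of credence p
    return p * s(1, x) + (1 - p) * s(0, x)

p = 0.3
grid = [i / 1000 for i in range(1001)]
best = min(grid, key=lambda x: expected_score(p, x))
# best is p itself: the expected score is minimized at x = p
```

Expanding, $p(1-x)^2 + (1-p)x^2 = x^2 - 2px + p$, which is uniquely minimized at $x = p$, confirming what the search finds.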

Then we have the following theorem, which connects the inaccuracy measures generated by proper scoring rules and Bregman divergences.

Theorem 2  Suppose $s$ is a continuous proper scoring rule.  Then there is a Bregman divergence $d_F$ such that
\[
I_s(c, w) = d_F(v_w, c) + \sum_{X \in \mathcal{F}} s(v_w(X), v_w(X))
\]

Proof: This is proved as Equation (8) in (Predd, et al., 2009).

References


  • de Finetti, B. (1974) A Theory of Probability (vol 1) (New York: Wiley)
  • Pettigrew, R. (2013) 'A New Epistemic Utility Argument for the Principal Principle' Episteme 10(1): 19-35
  • Predd, et al. (2009) 'Probabilistic Coherence and Proper Scoring Rules' IEEE Transactions on Information Theory 55(10): 4786-4792.

Wednesday, 22 May 2013

Cognitive Reductionism about Proofs

This is a quick comment on the issue about Mochizuki's claimed proof of the abc conjecture that Catarina wrote about a couple of days ago. (I don't know much about this number theory stuff.)

Are proofs cognitive entities? Is every proof cognized? Known? Knowable?
Cognitive Reductionism about Proofs
Every proof P of a mathematical claim is cognizable by some one (or more) agent.
This claim is analogous to certain verificationist claims more generally (e.g., every truth is knowable). I believe that this claim is mistaken, or, at least, not justified. For all I know, Mochizuki has found a proof, but unfortunately it is simply not yet cognized by anyone else. This is a bit annoying, of course. So far as I, or anyone else, can tell, he has not done anything mathematically wrong. If something is "wrong" here, it belongs to social epistemology.

Maybe related to this is the following result, a consequence of Church's Theorem (on the undecidability of predicate logic), which is an indication of how complicated proofs can be even of theorems of predicate logic:
Let $L$ be a first-order language with a binary predicate $R$. Let $|\phi|$ be the number of symbols in $\phi$. There is no recursive function $f : \mathbb{N} \to \mathbb{N}$ such that, for all $\phi \in L$ with $|\phi| = n$: if $\phi$ has a proof in predicate logic, then $\phi$ has a proof $P^{\ast}$ such that $|P^{\ast}| \leq f(n)$.
[Proof: Suppose that there is such a function $f$. Let $M_f$ be a TM that computes $f$. Now suppose we are given a formula $\phi \in L$. We have the query:
$\vdash \phi$?
Compute $n = |\phi|$ and compute $f(n)$ using $M_f$. Predicate logic proofs can be recursively enumerated in order of increasing size. Run through the proofs in such an enumeration until every proof of size at most $f(n)$ has been examined. If a proof $P$ of $\phi$ with $|P| \leq f(n)$ is reached, then we have that $\vdash \phi$. If no such proof is reached, then we can conclude, by the defining property of $f$, that $\nvdash \phi$. This is a decision procedure for logical validity in $L$, contradicting Church's Theorem.]

This means that proofs of valid theorems of logic can get larger and larger with no recursively specifiable bound. Consequently, there is no reason whatsoever to suppose that every proof can be cognized, or recognized, by some finite agent.

George Boolos (1987, "A Curious Inference") has given a very nice example of a first-order theorem $\phi_{Bool}$ of logic whose shortest proof in first-order predicate logic is astronomically vast. The underlying idea is that the formula "encodes" the Ackermann function $A$, and a predicate logic proof requires a step-by-step computation of length about equal to the value of this function ($A(4,4)$, if I recall right). He notes, however, that $\phi_{Bool}$ has a short proof in second-order predicate logic. (This is an example of "speed-up".) I wrote a short paper about this issue in Analysis several years ago (2005, "Some More Curious Inferences").
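To give a feel for the growth involved, here is a sketch of the Ackermann-Péter variant of the function (conventions for $A$ differ across presentations, so the exact arguments in Boolos's example may not match this one):

```python
def ackermann(m, n):
    # Ackermann-Peter function: grows faster than any primitive
    # recursive function
    if m == 0:
        return n + 1
    if n == 0:
        return ackermann(m - 1, 1)
    return ackermann(m - 1, ackermann(m, n - 1))

# ackermann(1, n) = n + 2, ackermann(2, n) = 2n + 3,
# ackermann(3, n) = 2^(n+3) - 3, e.g. ackermann(3, 5) = 253,
# and ackermann(4, n) is already a tower of exponentials
```

Even `ackermann(4, 2)` has tens of thousands of digits, which is why a first-order proof that must compute such a value step by step is astronomically long.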

Cognitive Reductionism About Language

Cognitive Reductionism about languages is the following (empirical) claim:
Every language L is spoken/cognized by some one or more speakers.
That is, it is the claim that languages can be reduced to cognitive states of some one or more speakers. However, I think that cognitive reductionism is deeply mistaken. There are languages which are not spoken, or cognized.

So, on my view, statements of the form:
Agent A cognizes language L
Agents A and B cognize the same ("shared") language L.
are contingent empirical claims. The agent A might not have cognized L. Whether agents A and B cognize a "shared" language is an empirical question.

It seems clear that, as a matter of empirical observation, agents never cognize the same language (though this is contingent, of course). There are lexical, phonological, semantic, pragmatic, etc., differences. And this phenomenon---heterogeneity in speech communities---requires explanation.

Tuesday, 21 May 2013

Cognizing a Language

I see metasemantics as having two major components (cf. David Lewis 1970, "General Semantics"). One component studies languages: what their properties are, how they're individuated, etc. The other component studies how languages are "cognized".

On the first issue, for the metasemantics I prefer, languages are finely-individuated mixed mathematicalia, whose intrinsic syntactic, phonological, semantic, pragmatic, orthographic properties are essential. The corresponding individuation condition is:
$L_1 = L_2$ if and only if they have the same syntax, semantics, etc, etc.
(If this seems somehow too obvious to need saying, or perhaps silly, then what do you suggest? The main alternative, at least the main one I can think of, would introduce a speech community into the very individuation of languages. But I think this is wrong.)

Languages then do not undergo change, either temporal or modal. Rather, we have various sequences of distinct languages. This theory of language individuation is, more or less, Lewis's, as sketched near the start of "Languages and Language" (1975). It seems to have been endorsed by Scott Soames and Saul Kripke too.

[The issue is quite complicated because languages are usually mixed mathematicalia, grounding out somehow in concrete/physical "tokens" (e.g., tokens of certain phonemes or tokens of the letter "A", etc.); and, consequently, one needs some sort of account of the individuation criteria for mixed mathematicalia: e.g., the set of US Presidents, or the magnetic field $\mathbf{B}$, which is a function from spacetime to a certain linear space isomorphic to $\mathbb{R}^3$.]

Once we have got some sort of account of what languages are, and how they're individuated, we next need to provide some sort of account of how they are spoken, or as I like to say, "cognized" by agents.

I believe the most basic notions required in a workable account of this are roughly of this kind:
Agent A assigns meaning M to string $\sigma$
I call these "cognizing" relations. (One can of course add bells and whistles: various parameters, indices, contexts, and time and world parameters. But I want to keep it simple.) So, I cognize my idiolect $L_{JK}$ by my mind assigning meanings to various strings (my mind also assigns some kind of syntactic structure, and pragmatic meaning functions, but I am ignoring that for the moment).

My mind's assigning meanings is not, I think, for the most part "conscious"; and is not, in many cases, something I can articulate. Somehow,
  • I acquire meanings,
  • copy/borrow meanings, and
  • bestow meanings.
Admittedly, I don't have a good theory of this. For example, I use the string "Kripke" to mean Saul Kripke. But I do not have a good theory how I acquired this meaning assignment (it must have involved meaning copying/borrowing, as Kripke himself has argued at length). I use the string "finite ordinal number" to refer to elements of $\omega$, and again I simply don't know how I acquired this meaning assignment, except to say, flat-footedly, that I learnt set theory ...

Even so, I'm pretty sure that these features of language cognition---the basic meaning assignment relations---are what need to be clarified for the (more difficult) "cognizing" side of metasemantics.

[Though this is not forced, I'm somewhat sceptical about the notion of "shared" languages. The languages that agents or individuals speak/cognize, are, first and foremost, idiolects. But this is a very big question, involving complicated questions about normativity, "meaning gaps" and some of the topics that arise in debates about semantic internalism and externalism. Admittedly, there's a huge overlap amongst idiolects in language communities. But there is also heterogeneity as well. And this must be accounted for.]

Sunday, 19 May 2013

Joyce's argument for Probabilism

In January, the Department of Philosophy at the University of Bristol launched an ERC-funded four-year research project on Epistemic Utility Theory: Foundations and Applications.  The main researchers will be: Richard Pettigrew, Jason Konek, Ben Levinstein, Pavel Janda (PhD student), and Chris Burr (PhD student).  The website is here.

I thought it would be good to write a few blog posts explaining what I take epistemic utility theory to be, and describing the work that has been done in the area so far.  So, over the next few weeks, that's exactly what I'll do here at M-Phi.  I'll try for one post per week.

The guiding idea behind epistemic utility theory is this:  Over the last decade or so, epistemologists have been increasingly interested in epistemic value.  That is, they have been interested in identifying the features of a doxastic or credal state that make it good qua cognitive state (rather than good qua guide to action).  For instance, we might say that having true beliefs is more valuable than having false beliefs, or that having higher credences in true propositions is better; or we might say that a belief or a credence has greater value the greater its evidential support.  Epistemic utility theory begins by asking a further question:  How can we quantify and measure epistemic value?  Having answered that question it asks another:  What epistemic norms can be justified by appealing to this measure of epistemic value?

Joyce's framework


The original argument in this area is due to Jim Joyce in his paper 'A Non-Pragmatic Vindication of Probabilism' (1998) Philosophy of Science 65(4):575-603.  In this blog post, I'll describe the framework in which Joyce's argument takes place; I'll state the norm he wishes to justify; and I'll present his argument for it in the way I find most plausible.

Represent an agent's cognitive state at a given time by her credence function at that time:  this is the function c that takes each proposition about which she has an opinion and returns the real number that measures her credence in that proposition.  By convention, we represent minimal credence by 0 and maximal credence by 1.  Thus, $c$ is defined on the set $\mathcal{F}$ of propositions about which the agent has an opinion; and it takes values in $[0, 1]$.  If $X$ is in $\mathcal{F}$, then $c(X)$ is our agent's degree of belief or credence in $X$.  Throughout, we assume that $\mathcal{F}$ is finite.  With this framework in hand, we can state the norm of Probabilism:

Probabilism At any time in an agent's credal life, it ought to be the case that her credence function $c$ at that time is a probability function over $\mathcal{F}$ (or, if $\mathcal{F}$ is not an algebra, $c$ can be extended to a probability function over the smallest algebra that contains $\mathcal{F}$).

Joyce's argument


How do we establish this norm?  Jim Joyce offers the following argument:  It is often said that the aim of full belief is truth.  One way to make this precise is to say that the ideal doxastic state is that in which one believes every true proposition about which one has an opinion, and one disbelieves every false proposition about which one has an opinion.  That is, the ideal doxastic state is the omniscient doxastic state (relative to the set of propositions about which one has an opinion).  We might then measure how good an agent's doxastic state is by its proximity to this omniscient state.

Joyce's argument, as I will present it, is based on an analogous claim about credences.  We say that the ideal credal state is that in which our agent assigns credence 1 to each true proposition in $\mathcal{F}$ and credence 0 to each false proposition in $\mathcal{F}$. By analogy with the doxastic case, we might call this the omniscient credal state (relative to the set of propositions about which she has an opinion). Let $\mathcal{W}$ be the set of possible worlds relative to $\mathcal{F}$:  that is, the set of consistent assignments of truth values to the propositions in $\mathcal{F}$.  Now, let $w$ be a world in $\mathcal{W}$.  Then let $v_w$ be the omniscient credal state at $w$: that is, $v_w(X) = 0$ if $X$ is false at $w$; $v_w(X) = 1$ if $X$ is true at $w$.

We then measure how good an agent's credal state is by its proximity to the omniscient state.  Following Joyce, we call this the accuracy of the credal state.  To do this, we need a measure of distance between credence functions.  Many different measures will do the job, but here I will focus on the most popular, namely, Squared Euclidean Distance.  Suppose $c$ and $c'$ are two credence functions.  Then define the Squared Euclidean Distance between them as follows:
\[
Q(c, c') := \sum_{X \in \mathcal{F}} (c(X) - c'(X))^2
\]
Thus, given a possible world $w$ in $\mathcal{W}$, the cognitive badness or disvalue of the credence function $c$ at $w$ is given by its inaccuracy; that is, the distance between $c$ and $v_w$, namely, $Q(c, v_w)$.  We call this the Brier score of $c$ at $w$, and we write it $B(c, w)$.  So the cognitive value of $c$ at $w$ is the negative of the Brier score of $c$ at $w$; that is, it is $-B(c, w)$.  Thus, $B$ is a measure of inaccuracy; $-B$ is a measure of accuracy.
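A quick sketch makes the computation concrete (the propositions and numbers here are illustrative):

```python
def brier(c, v_w):
    # Squared Euclidean Distance between credence function c and the
    # omniscient credence function v_w, both given as dicts on F
    return sum((c[X] - v_w[X]) ** 2 for X in c)

# An agent with opinions about two propositions
c = {"A": 0.6, "B": 0.7}

# Omniscient credence function at a world where A is false and B is true
v_w = {"A": 0.0, "B": 1.0}

score = brier(c, v_w)  # 0.6^2 + 0.3^2 = 0.45
```

The lower the score, the closer the agent's credences are to the omniscient state at that world; so $-\texttt{score}$ is her cognitive value there.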

With this measure of cognitive value in hand, Joyce argues for Probabilism by appealing to a standard norm of traditional decision theory:

Dominance Suppose $\mathcal{O}$ is a set of options, $\mathcal{W}$ is a set of possible worlds, and $U$ is a measure of the value of the options in $\mathcal{O}$ at the worlds in $\mathcal{W}$.  Suppose $o, o'$ in $\mathcal{O}$.  Then we say that
  • $o$ strongly $U$-dominates $o'$ if $U(o', w) < U(o, w)$ for all worlds $w$ in $\mathcal{W}$
  • $o$ weakly $U$-dominates $o'$ if $U(o', w) \leq U(o, w)$ for all worlds $w$ in $\mathcal{W}$ and $U(o', w) < U(o, w)$ for at least one world $w$ in $\mathcal{W}$.
Now suppose $o, o'$ in $\mathcal{O}$ and
  1. $o$ strongly $U$-dominates $o'$;
  2. There is no $o''$ in $\mathcal{O}$ that weakly $U$-dominates $o$.
Then $o'$ is irrational.

Of course, in standard decision theory, the options are practical actions between which we wish to choose.  For instance, they might be the various environmental policies that a government could pursue; or they might be the medical treatments that a doctor may recommend.  But there is no reason why Dominance or any other decision-theoretic norm can only determine the irrationality of such options.  They can equally be used to establish the irrationality of accepting a particular scientific theory or, as we will see, the irrationality of particular credal states.  When they are put to use in the latter way, the options are the possible credal states an agent might adopt; the worlds are, as above, the consistent assignments of truth values to the propositions in $\mathcal{F}$; and the measure of value is $-B$, the negative of the Brier score.  Granted that, which credal states does Dominance rule out?  As the following theorem shows, it is precisely those that violate Probabilism.

Theorem 1
  1. If $c$ is not a probability function, then there is a credence function $c^*$ that strongly Brier dominates $c$.
  2. If $c$ is a probability function, then there is no credence function $c^*$ that weakly Brier dominates $c$.
This, then, is Joyce's argument for Probabilism:
  1. The cognitive value of a credence function is given by its proximity to the ideal credence function:  the ideal credence function at world $w$ is $v_w$; and distance is measured by the Squared Euclidean Distance.  Thus, the cognitive value of a credence function at a world is given by the negative of its Brier score at that world.  (In fact, as we will see next week, Joyce weakens this premise and thus strengthens the argument.)
  2. Dominance
  3. Theorem 1
  4. Therefore, Probabilism
Thus, according to Joyce, what is wrong with an agent who violates Probabilism is that there is a credence function that is more accurate than hers regardless of how the world turns out.

Joyce's argument in action


Let's finish off by seeing the argument in action.  Suppose our agent has an opinion about only two propositions $A$ and $B$.  And suppose that $A$ entails $B$.  For such an agent, the only demand that Probabilism makes is

No Drop If $A$ entails $B$, an agent ought to have a credence function $c$ such that $c(A) \leq c(B)$.

Now, if $\mathcal{F} = \{A, B\}$, then $\mathcal{W} = \{w_1, w_2, w_3\}$, where $A$ and $B$ are both true at $w_1$, $A$ is false and $B$ is true at $w_2$, and $A$ and $B$ are both false at $w_3$.  Also, we can represent a credence function over these propositions as a point in the Euclidean plane.  So we can plot our agent's credence function $c$, along with the omniscient credence functions at the three possible worlds, as on the diagram below.  On this diagram, the blue shaded area includes all and only the credence functions that satisfy Probabilism.  As we can see, if a credence function lies outside that area, there is a credence function inside it that is closer to each omniscient credence function; but this never happens if the credence function is inside the area to begin with.  This is the content of Theorem 1 in this situation.
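We can also verify an instance of the theorem numerically. The credence functions below are illustrative: $c$ violates No Drop, and $c^*$, its projection onto the region where $c(A) \leq c(B)$, strongly Brier dominates it:

```python
# Omniscient credence functions (v(A), v(B)) at the three worlds:
# w1: A and B true; w2: A false, B true; w3: A and B false
worlds = [(1.0, 1.0), (0.0, 1.0), (0.0, 0.0)]

def brier(c, v):
    return sum((ci - vi) ** 2 for ci, vi in zip(c, v))

c = (0.8, 0.3)          # violates No Drop: c(A) > c(B)
c_star = (0.55, 0.55)   # projection of c onto the line c(A) = c(B)

# c_star is strictly closer to every omniscient credence function
dominates = all(brier(c_star, v) < brier(c, v) for v in worlds)
```

Here `dominates` comes out `True`: whatever the world, the probabilistic $c^*$ is more accurate than the non-probabilistic $c$, which is exactly what Theorem 1(1) says must happen.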

Der logische Aufbau der Welt

The title of Rudolf Carnap's 1928 book, Der logische Aufbau der Welt, is normally translated as "The Logical Structure of the World", although apparently a more accurate rendition would be "The Logical Construction of the World".

In working on how to make sense of the claim/significance of Leibniz Equivalence from spacetime theories (roughly: isomorphic spacetime models represent the same possible worlds), I've been trying to work out a version of a propositional view of possible worlds. This has some similarities with Carnap's theory of "state descriptions" (and with Wittgenstein's "picture theory", which influenced Carnap).

The propositional diagram account of possible worlds can be put like this:
$w$ is a possible world if and only if
$w = \hat{\Phi}_{\mathcal{A}}[\vec{R}]$,
where $\mathcal{A}$ is a model, and $\vec{R}$ is a sequence of relations-in-intension.
Here, given a model $\mathcal{A}$ (say, $(A, \vec{S})$ with domain $A$), $\Phi_{\mathcal{A}}(\vec{X})$ is a formula of pure second-order logic (perhaps infinitary: it has cardinality $\mathrm{max}(\omega, |A|^{+})$). $\Phi_{\mathcal{A}}(\vec{X})$ defines the isomorphism type of $\mathcal{A}$. The variables $\vec{X}$ are free second-order variables. I call $\Phi_{\mathcal{A}}(\vec{X})$ the diagram formula for the model $\mathcal{A}$. It corresponds very closely to what model theorists call the elementary diagram of a model. And then $\hat{\Phi}_{\mathcal{A}}$ is the corresponding (second-order) propositional function, and $\hat{\Phi}_{\mathcal{A}}[\vec{R}]$ is then the result of "saturating" $\hat{\Phi}_{\mathcal{A}}$ with the relations $\vec{R}$.

On this "Propositional Diagram" conception of possible worlds, a world $w$ has the form
$w = \hat{\Phi}_{\mathcal{A}}[\vec{R}]$
One might now think of $\hat{\Phi}_{\mathcal{A}}$ as the abstract structure of the world $w$, and think of the sequence $\vec{R}$ as expressing the intensional content of $w$.

This is a kind of form/content distinction. Another way of putting this is to try and define the "representation" relation that holds between models and worlds. Let $\mathcal{A}$ be a model. Let $w$ be a world. Let $\vec{R}$ be a sequence of relations-in-intension with signature matching $\mathcal{A}$. Then:
$\mathcal{A}$ represents $w$ relative to $\vec{R}$ iff $w = \hat{\Phi}_{\mathcal{A}}[\vec{R}]$.

Tuesday, 14 May 2013

What's wrong with Mochizuki's 'proof' of the ABC conjecture?

(Cross-posted at NewAPPS)

A few days ago Eric had a post about an insightful text that has been making the rounds on the internet, which narrates the story of a mathematical ‘proof’ that is for now sitting somewhere in a limbo between the world of proofs and the world of non-proofs. The ‘proof’ in question purports to establish the famous ABC conjecture, one of the (thus far) main open questions in number theory. (Luckily, a while back Dennis posted an extremely helpful and precise exposition of the ABC conjecture, so I need not rehearse the details here.) It was proposed by the Japanese mathematician Shinichi Mochizuki, who is widely regarded as an extremely talented mathematician. This is important, as crackpot ‘proofs’ are proposed on a daily basis, but in many cases nobody bothers to check them; a modicum of credibility is required to get your peers to spend time checking your purported proof. (Whether this is fair or not is beside the point; it is a sociological fact about the practice of mathematics.) Now, Mochizuki most certainly does not lack credibility, but his ‘proof’ was made public quite a few months ago, and yet so far there is no verdict as to whether it is indeed a proof of the ABC conjecture or not. How could this be?

As it turns out, Mochizuki has been working pretty much on his own for the last 10 years, developing new concepts and techniques by mixing-and-matching elements from different areas of mathematics. The result is that he created his own private mathematical world, so to speak, which no one else seems able (or willing) to venture into for now. So effectively, as it stands his ‘proof’ is not communicable, and thus cannot be surveyed by his peers.

Let us assume for a moment that the ‘proof’ is indeed correct in that every inferential step in the lengthy exposition is indeed necessarily truth-preserving, i.e. no counterexample can be found for any of the steps. In a quasi-metaphysical sense, the ‘proof’ is indeed a proof, which is a success term (a faulty proof is not a proof at all). However, in the sense that in fact matters for mathematicians, Mochizuki’s ‘proof’ is not (yet) a proof because it has not been able to convince the mathematical community of its correctness; for now, it remains impenetrable. To top it off, Mochizuki is a reclusive man who so far has made no efforts to reach out to his peers and explain the basic outline of the argument.

What does this all mean, from a philosophical point of view? Now, as some readers may recall, I am currently working on a dialogical conception of deductive proofs (see here and here). I submit that the dialogical perspective offers a fruitful vantage point from which to understand what is going on with the ‘Mochizuki affair’, as I will argue in the remainder of the post. (There are also interesting connections with the debate on computer-assisted proofs and the issue of surveyability, and also with Kenny Easwaran’s notion of the ‘transferability’ of mathematical proofs, but for reasons of space I will leave them aside.)

Let me review some of the details of this dialogical conception of proofs. On this conception, a proof is understood as a semi-adversarial dialogue between two fictitious characters, proponent and opponent. The dialogue starts when both participants agree to grant certain statements, the premises; proponent then puts forward further statements, which she claims follow necessarily from what opponent has granted so far. Opponent’s job is to make sure that each inferential step indeed follows of necessity, and if it does not, to offer a counterexample to that particular step. The basic idea is that the concept of necessary truth-preservation is best understood in terms of the adversarial component of such dialogues: it is strategically in proponent’s interest to put forward only inferential steps that are indefeasible, i.e. which cannot be defeated by a countermove even from an ideal, omniscient opponent. In this way, a valid deductive proof corresponds to a winning strategy for proponent.

Now, when I started working on these ideas, my main focus was on the adversarial component of the game, and on how opponent would be compelled to grant proponent’s statements by the force of necessary truth-preservation. But as I started to present this material to numerous audiences, it became increasingly clear to me that adversariality was not the whole story. For starters, from a purely strategic, adversarial point of view, the best strategy for proponent would be to go directly from premises to the final conclusion of the proof; opponent would not be able to offer a counterexample and thus would be defeated. In other words, proponent has much to gain from large, obscure (but truth-preserving) inferential leaps. But this is simply not how mathematical proofs work; besides the requirement of necessary truth-preservation, proponent is also expected to put forward individually perspicuous inferential steps. Opponent would not only not be able to offer counterexamples, but he would also become persuaded of the cogency of the proof; the proof would thus have fulfilled an explanatory function. Opponent would thus be able to see not only that the conclusion follows from the premises, but also why the conclusion follows from the premises. To capture this general idea, in addition to the move of offering a counterexample, opponent also has available to him an inquisitive move: ‘why does this follow?’ It is a request for proponent to be more perspicuous in her argumentation.

This is why I now qualify the dialogue between proponent and opponent as semi-adversarial: besides adversariality, there is also a strong component of cooperation between proponent and opponent. They must of course agree on the premises and on the basic rules of the game, but more importantly, proponent’s goal is not only to force opponent to grant the conclusion by whatever means, but also to show to opponent why the conclusion follows from the premises. Thus understood, a proof has a crucial didactic component.

One way to conceptualize this interplay between adversariality and cooperation from a historical point of view is to view the emergence of the deductive method with Aristotle in the two Analytics as a somewhat strange marriage of the adversarial model of dialogical interaction of the Sophists – dialectic – with the didactic, Socratic method of helping interlocutors to find the truth by themselves by means of questions (as illustrated in Plato’s dialogues). This historical hypothesis requires further scrutiny, and is currently one of the topics of investigation of my Roots of Deduction project, in cooperation with the other members of the project.

Going back to Mochizuki, it is now easy to see why he is not being a good player in the game of deduction. He is not fulfilling his task as proponent to make his proof accessible and compelling to the numerous ‘opponents’ of the mathematical community; in other words, he is failing miserably on the cooperative dimension. As a result, no one is able or willing to play the game of deduction against and with him, i.e. to be his opponent. Now, a crucial feature of a mathematical proof is that it takes (at least) two to tango: a proponent must find an opponent willing to survey the purported proof so that it counts as a proof. (Naturally, this is not an infallible process: there are many cases in the history of mathematics of purported ‘proofs’ which had been surveyed and approved by members of the community, but which were later found to contain mistakes.)

Mochizuki’s tango is for now impossible to dance to/with, and as long as no one is willing to be his opponent, his ‘proof’ is properly speaking not a proof. It is to be hoped that this situation will change at some point, given the importance of the ABC conjecture for number theory. However, this will only happen if Mochizuki becomes a more cooperative proponent, or else if enough opponents are found who are willing and able to engage in this dialogue with him.

Sunday, 12 May 2013

Science Versus Nominalism

The Indispensability Argument, developed by W.V. Quine and Hilary Putnam, and famously rebutted by Hartry Field, is fairly simple:
(1) Nominalism states that there aren't strings, formulas, numbers, sets, sequences, functions, groups, etc.
(2) Science (e.g., physics, linguistics) states that there are strings, formulas, numbers, sets, sequences, functions, groups, etc.
Therefore, nominalism is inconsistent with science.
This is not some containable inconsistency, say to do with idealization, or frictionless planes, etc. Nominalism states that there are no physical quantities. Nominalism states that Peano arithmetic doesn't exist. Nominalism states that an SU(3) gauge theory like QCD is false because there are no groups. Etc. But science states that there are quantities; that Peano arithmetic does exist, and is not finitely axiomatizable; that the gluons are associated with an SU(3) gauge symmetry; etc.

The premises (1) and (2) are justified as follows. The first (1) is how nominalists describe their view. And (2) is justified by looking at a physics textbook.

Friday, 10 May 2013

Leibniz Equivalence (slides)

Here are some slides for a talk on "Leibniz Equivalence" which includes some topics I've written some previous M-Phi posts about (Leibniz abstraction; the notion of abstract structure; possible worlds; the abstract/concrete distinction as modal).

The main things here are the accounts of:
(i) abstract structure: given a model $\mathcal{A}$, its abstract structure is a certain kind of second-order propositional function, $\hat{\Phi}_{\mathcal{A}}$;
(ii) possible worlds: entities $w$ such that
$w = \hat{\Phi}_{\mathcal{A}}[\vec{R}]$
where $\vec{R}$ is a sequence of relations.