Nonboolean

On standard analyses, indicative conditionals (ICs) behave in a Boolean fashion when interacting with and and or. We test this prediction by investigating probability judgments about sentences of the form ⌜a → b { and / or } c → d⌝. Our findings are incompatible with a Boolean picture. This is challenging for standard analyses of ICs, as well as for several nonclassical analyses. Some trivalent theories, conversely, may account for the data.

(1) If Alia flipped the coin, the coin landed heads.
(2) ⟦(1)⟧^w = true iff, for every world w′ that is compatible with the speaker's evidence in w and such that Alia flipped the coin in w′, the coin landed heads in w′.
These analyses, paired with standard analyses of and and or, predict that ICs interact with the latter in a Boolean way. The truth values of conjunctions and disjunctions involving ICs, such as (3), are determined via the truth tables for the connectives '∧' and '∨'.
(3) If Alia flipped the coin, the coin landed heads, { and/or }, if Billy tossed the die, the die landed even.
Truth-conditional analyses of ICs are controversial, and a number of alternative analyses have been proposed. 1 We don't have space to survey these accounts, but it is worth pointing out that, despite their nonclassical bent, most of them are still Boolean.
This paper aims to test experimentally the prediction that ICs are Boolean. To do this, we investigate probability judgments about sentences of the form ⌜a → b { and/or } c → d⌝. As we will show, our findings suggest that, in some cases, the interaction between ICs and connectives is nonboolean. While there is much room for further investigation, such results challenge both standard truth-conditional theories and their non-truthconditional cousins that still uphold that ICs are Boolean. In the final section, we briefly suggest that trivalent theories of ICs are well-placed to predict the data.
We proceed as follows: §2 introduces relevant theoretical background, §3 describes our experiments, and §4 sketches a trivalent analysis of conditionals that is able to capture the data.
2. Background: Connectives and probability. On classical theories, the meaning of the sentential connectives and and or in natural language is captured by the truth tables of the corresponding Boolean connectives '∧' and '∨' in first-order logic (see Table 1).
Table 1: Classical truth tables for '∧' and '∨'.
Boolean interpretations of and and or entail constraints about probability. Consider and. Since A ∧ B is true when and only when both conjuncts are true, the probability of a conjunction is a lower bound on the probability of each conjunct: the probability of A is at least as high as the probability of A ∧ B, and possibly higher. Moreover, if A does not entail B, the probability of A is strictly greater than the probability of A ∧ B, since there are some A-possibilities that are not B-possibilities, and hence not A ∧ B-possibilities (see Figure 1, left). Analogous facts hold for or, mutatis mutandis.
Figure 1: Diagrams illustrating the validity of and-drop and or-drop.
In sum, the following principles hold for all natural language sentences that express propositions, on the assumption that connectives are Boolean: 2 For an illustration, suppose that a fair six-sided die is tossed, and consider (4) and (5).
(4) If the die landed odd, it landed on 1.
(5) If the die landed odd, it landed on 1, and, if it landed even, it landed on 2.
Given and-drop, and given that (very plausibly) the two ICs in (5) do not entail each other, classical analyses predict the probability of (4) should be higher than that of (5).
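The two principles invoked here can be written out explicitly. The formulation below is a reconstruction from the Boolean constraints described in the text, not a quotation; in particular, the strict clause for or-drop is our reading of "analogous facts hold for or, mutatis mutandis".

```latex
% and-drop and or-drop for a probability function Pr over propositions A and B.
% Assumption: the strict inequalities carry the side conditions given in the prose.
\begin{align*}
\text{and-drop:}\quad & Pr(A \wedge B) \le Pr(A),
  \quad\text{and } Pr(A \wedge B) < Pr(A) \text{ if } A \text{ does not entail } B \\
\text{or-drop:}\quad  & Pr(A \vee B) \ge Pr(A),
  \quad\text{and } Pr(A \vee B) > Pr(A) \text{ if } B \text{ does not entail } A
\end{align*}
```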
The intuition driving the current paper is that ICs trigger failures of and-drop and or-drop. Sentences (4) and (5) themselves provide an illustration. Consider (4) first. Since there are three equiprobable cases in which the die lands on an odd number, the intuitive probability of (4) is 1/3. Via and-drop, we would expect the probability of (5) to be lower. Yet, intuitively, the probability of (5) is also 1/3. To see this, notice that (5) is intuitively equivalent to (6).
Since (6) is true in two of six equiprobable cases, its probability is also 1/3. But then, given the equivalence between (5) and (6), the probability of (5) should also be 1/3.
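The arithmetic behind these two intuitive judgments can be checked with a small sketch. It assumes a uniform distribution over the six die outcomes and treats the probability of (4) as a conditional probability, which is exactly the assumption the text flags as controversial. The function names are illustrative, and taking the "two of six cases" to be the outcomes 1 and 2 is our assumption.

```python
from fractions import Fraction

# Sample space for one toss of a fair six-sided die; each outcome has probability 1/6.
OUTCOMES = range(1, 7)

def prob(event):
    """Probability of an event (a predicate on outcomes) under the uniform measure."""
    return Fraction(sum(1 for o in OUTCOMES if event(o)), 6)

def cond_prob(consequent, antecedent):
    """Conditional probability P(consequent | antecedent); assumes P(antecedent) > 0."""
    return prob(lambda o: antecedent(o) and consequent(o)) / prob(antecedent)

def is_odd(outcome):
    return outcome % 2 == 1

# Intuitive probability of (4): P(die lands on 1 | die lands odd).
p4 = cond_prob(lambda o: o == 1, is_odd)

# An event true in two of the six equiprobable cases.
# Assumption: the two cases are the outcomes 1 and 2.
p_two_of_six = prob(lambda o: o in (1, 2))

print(p4, p_two_of_six)  # prints: 1/3 1/3
```

Both values come out at 1/3, which is the pattern that and-drop would rule out if (5) is given the same probability as the two-in-six event.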
So there is at least a first-pass intuitive case for the invalidity of and-drop. But we don't think that the empirical situation can be settled by these simple judgments. First, and-drop is a basic principle, and it takes very solid data to reject it. Second, the case that we just discussed appears to rely on controversial assumptions about the probabilities of ICs. In particular, we assumed that the probability of (4) equals the relevant conditional probability, in conformity with what is known as Stalnaker's Thesis.
Stalnaker's Thesis. For any A and B such that Pr(A) > 0: Pr(A → B) = Pr(B | A)
Stalnaker's Thesis is highly intuitive, as is the probability judgment about (4). But Stalnaker's Thesis is also notoriously problematic, giving rise to so-called triviality results in combination with fairly minimal assumptions. 3 We intend to sidestep any assumptions about Stalnaker's Thesis and probabilities of conditionals. We will show that, even without these assumptions, we can find evidence for the invalidity of and-drop and or-drop.

3. Experiments.
We set out to test the widely-held view that conjunctions and disjunctions of ICs are Boolean. If this view is correct, conjunctions of ICs should be estimated at a lower probability than either of their conjuncts, and disjunctions of ICs should be estimated at a higher probability than either of their conjuncts.
We chose our experimental methodology in response to two major desiderata. (i) We wanted it to be as simple as possible, given the complexity of the sentences to be tested. (ii) We wanted to avoid making assumptions about the probabilities of the relevant uncoordinated conditionals, so that we didn't need to rely on Stalnaker's Thesis. In our task, participants were presented with simple shapes whose color changes to one of two colors in dynamically-unfolding scenes. We then let them generate their own likelihood estimations for the conditionals, based on the frequencies of the observed events. So our experiment makes no normative assumptions about what the 'correct' likelihoods of certain statements are supposed to be.
3.1. Experiment 1. We ran a likelihood estimation task featuring different frequencies of simple event types and asked whether participants' responses to simple and complex ICs about those event types showed Boolean behavior.
3.1.1. Participants. We recruited 200 participants on Amazon's Mechanical Turk (MTurk) platform in accordance with a protocol approved by USC's Institutional Review Board. Participant selection was restricted to individuals with IP addresses located in the United States, whose HIT approval rate was greater than or equal to 99%, and whose number of approved HITs was greater than or equal to 1000. We did not require Master status. Our experiment design included an attention check that allowed us to filter participants according to whether they passed or failed this check. Based on this, we excluded 47 participants (23.5%) prior to data analysis, for a sample of 153 participants in Experiment 1.

3.1.2. Materials.
We designed a set of 8 base animations involving two shapes, a square and a circle, turning one of four colors. The square always turned either red or yellow, and the circle always turned either green or blue. The depicted events involved one or two shapes "traveling" in a "car" into a tunnel; once the car entered the tunnel, the shape(s) changed to one of the two colors (see the sample in Figure 2, left). These base animations could felicitously be described as, e.g., The square turned red. We also designed a "mystery" animation in which the identity of the shape(s) was hidden (see Figure 2, right).
3.1.3. Design. Our task tested coordinated and uncoordinated conditional sentences in three phases. In a first Experience phase, participants were exposed to a series of 24 events in random order, and asked for two binary judgments about sentences describing what happened in this phase (attention check; both intended to be judged true), (7)-(8). 4 In the second Uncertainty phase, they were exposed to a "mystery" event and asked to judge the likelihood that a certain (uncoordinated) conditional sentence was true. Finally, in the Test phase, they were exposed to further "mystery" events and asked the likelihood of conjoined or disjoined conditionals. Our design's three factors (connective type, compatibility, and frequency) were manipulated across these phases.
(7) If the square enters the tunnel, it always turns red or yellow.
(8) If the circle enters the tunnel, it always turns green or blue.
The first factor, connective type, was tested within subjects and concerned whether the target sentences involved a connective, and if so, which. The no connective condition was tested in the Uncertainty phase: participants were exposed to 4 "mystery" animations, and asked to judge how likely each type of event was (square-red, square-yellow, circle-green, circle-blue) using the sentences schematized in (9)-(10). The and and or conditions were tested in the Test phase: participants were exposed to a second round of 4 mystery animations, and evaluated the coordinated ICs schematized in (11)-(12).
(9) If the car was carrying the square, the square turned { red, yellow }.
(10) If the car was carrying the circle, the circle turned { green, blue }.
(11) If the car was carrying the square, the square turned red { and / or } if the car was carrying the circle, the circle turned green.
(12) If the car was carrying the square, the square turned yellow { and / or } if the car was carrying the circle, the circle turned blue.
The second factor, frequency (50/50, 75/25), was tested between subjects and concerned the base rates, in the Experience phase, of the types of events our sentences are about. In the 50/50 condition, the square turned red and yellow equally often, and the circle turned green and blue equally often. In the 75/25 condition, the square turned red 75% of the time and the circle turned green 75% of the time. In the 50/50 condition, then, there was no distinction in frequency for (11) and (12). In the 75/25 condition, the targets in (11) counted as high frequency and those in (12) counted as low.
The third factor, compatibility (compatible, incompatible), was also tested between subjects, and concerned whether the Experience phase included only animations with a single entity traveling in the car, or additionally animations with two entities traveling together. (The color changes of the two entities, and their relative frequencies, were independent of whether the entities traveled together or alone.) If the participants' experiences showed that the car could carry two shapes, then the antecedents of the conditionals in our coordinated sentences (11)-(12) were compatible, i.e. they could both be true. If their experience showed that the car always carried only one shape, the antecedents were incompatible.
3.1.4. Procedure. After accepting the HIT on MTurk, participants were instructed to click a link that would take them to the experiment hosted on Google's Firebase platform. There, participants were welcomed to the study and presented with instruction screens that gave an overview of the task and of what they were being asked to do. The Experience phase was described as follows: "Your task is to keep track of the frequency of the color changes for each shape. At the end of the series, you'll be asked to judge simple statements about what you saw." The message to keep track of the color changes and their frequency was reinforced on a second screen introducing this phase. Following the randomly-ordered presentation of the 24 events, participants were presented with the attention check, evaluating (7)-(8) in random order in response to the question, "Does the statement accurately describe what you know about the { square, circle }?" Immediately following this, the Uncertainty phase began, with participants first reminded that they would be seeing "mystery" animations, and that they would be asked to judge the likelihood of a statement about each one, "given what [they] learned in the first part of the study." The statements were as in (9)-(10). There were 4 such trials.
There were no additional instructions leading into the Test phase; participants simply saw 4 more mystery animations and, after each, were asked about the coordinated conditionals schematized in (11)-(12). For all likelihood estimations in the Uncertainty and Test phases, participants were asked, "Given the animation you just saw, how likely is it that the complex statement below is true?", and had to click and drag a slider ranging from 0 ("completely unlikely") to 100 ("completely likely") to record their response.
3.1.5. Results. We report the results of a 3x2x2 ANOVA with a within-subject error term for connective type.
We found that our participants overestimated input frequencies in the 50/50 condition ('balanced' inputs occurred 50% of the time, mean estimate 68%) and in the lower frequency events of the 75/25 condition ('lower' input 25%, estimate 46%; cp. 'higher' input 75%, estimate 75%). Importantly for us, however, the ordering between estimates was accurate, and the 50/50 and 75/25 conditions were significantly different (F = 8.15, p < .005). Probing this result further, we conducted pairwise t-tests with Bonferroni adjustment on judgments between the 25%, 50%, and 75% inputs, and all were significantly different (ps < .001). Crucially, however, participants' likelihood estimates were not impacted by the factors connective type or compatibility (both ps > .53).
The lack of effect of connective type (and compatibility) shows that uncoordinated ICs were assigned, on average, the same probability as conjunctions and disjunctions of ICs. Yet we cannot simply attribute these results to task difficulty or the like, given the evidence that subjects tracked input frequencies broadly accurately. See Figure 3.
3.1.6. Discussion. The results of Experiment 1 militate against the validity of and-drop and or-drop. These principles lead to the prediction that connective type would make a difference in likelihood estimations such that: the likelihoods assigned to uncoordinated ICs should be higher than the likelihoods assigned to conjunctions of ICs, and lower than the likelihoods assigned to disjunctions of ICs. This is not what we found.
If these results accurately reflect speakers' knowledge of ICs, then it turns out that ICs aren't necessarily Boolean. We sketch in §4 how this may be accommodated.
A potential worry with this experiment is that likelihood judgments can be unreliable in the best of cases, let alone in the case of complex coordinated ICs. For example, some well-known results in the psychology of reasoning show that subjects can fall into 'cognitive illusions', which cause them to evaluate some conjunctions as more likely than conjuncts. 5 One might worry, then, that the patterns we observed are due merely to distortions in judgments about likelihood, as opposed to issues in the semantics of ICs.

3.2. Experiment 2.
This experiment tests whether our likelihood estimation task would generate Boolean behavior under different linguistic circumstances. We presented participants with the same task as in Experiment 1, with targeted modifications to support the evaluation of minimally-different, but non-conditional sentences.
3.2.1. Participants. We recruited 100 participants on MTurk with the same filter parameters as for Experiment 1. Our task incorporated the same binary attention check as did Experiment 1, and based on the results of that check we excluded 17 people (17%), with the result that we report the data from 83 participants.

3.2.2. Materials.
We used the same set of 8 base animations as Experiment 1, but our "mystery" animations were different. Instead of obscuring the identity of the shape(s) in the car, those shapes were visible, and only their color change in the tunnel was obscured.
3.2.3. Design. We modified the design of Experiment 1 to accommodate testing nonconditional conjoined and disjoined sentences. Here, the Experience phase manipulated frequency as in Experiment 1, and at the end of that phase, the attention check sentences (our basis for any exclusions from the participant pool) were as in (13)-(14). In Experiment 2, though, we only used the distribution of animations corresponding to the compatible level of compatibility (i.e., a mixture of single- and double-entity animations). Without the use of conditional sentences in this experiment, there was no manipulation that tested the joint (un)satisfaction of their antecedents.
(13) The circle always turns blue or green.
(14) The square always turns red or yellow.
In the Uncertainty phase, participants were presented with the modified mystery animations in which the identity of a single shape in the car was shown, but its color change was hidden. At the end of each mystery animation, they were asked to evaluate one of the 4 sentences schematized in (15)-(16). This corresponded to a test of the 'no connective' level of connective type in this experiment. The label of the shape always matched the identity of the shape in the animation.
Finally, in the Test phase, participants were presented with the modified mystery animations that showed the two shapes present in the car, but their color changes were hidden. Following each of these mystery animations, they were asked about one of the 4 sentences schematized in (17), in random order.
(17) The square turned { red/yellow } and the circle turned { green/blue }.
3.2.5. Results. We report the results of a 3x2 ANOVA with a within-subject error term for connective type, as in Experiment 1. We did not observe a simple main effect of frequency in this experiment (p > .95), as the overall averages for the 50/50 and 75/25 conditions were quite close (50/50 64.7%, 75/25 64.9%; see Figure 4). Probing this further, we conducted pairwise t-tests with Bonferroni correction between the levels of input frequency, and found that each was significantly different from the others (all ps < .001), and, while there were distortions in the estimates, they were nonetheless appropriately ordered (input 25%, estimate 48.9%; input 50%, estimate 64.7%; input 75%, estimate 81.8%), as we previously observed for ICs.
Here, however, we found a main effect of connective type (F = 11.7, p < .001). Pairwise t-tests with Bonferroni correction between the levels of connective type revealed this result to reflect estimates for and differing significantly from or and no connective (and 58.4%, or 68.6%, none 66.1%; both ps < .007), while or and no connective didn't differ (p = .34). This is suggestive of at least partially Boolean behavior. We also found an interaction effect between connective type and frequency, F = 3.3, p = .04. Unpacking this interaction, we found that the Boolean pattern was stable in the 50/50 condition and minimized in the 75/25 condition. Pairwise t-tests with Bonferroni correction inside the subsets of the data corresponding to the 50/50 and 75/25 conditions revealed the following results. In the 50/50 condition, and was significantly different from or and no connective (both ps < .004), but or didn't differ from no connective (p = .15). In the 75/25 condition, the connectives didn't differ significantly from one another (all ps > .26). This shows expected Boolean behavior at least in the 50/50 condition.
3.2.6. Discussion. In Experiment 2, we observed the expected Boolean behavior for non-ICs in the 50/50 condition. This alleviates, to some extent, concerns that our paradigm wouldn't be sensitive enough to detect such behavior. Of course, we recognize that these results don't fully support the idea that, when conditionals are not involved, subjects give Boolean judgments about our scenarios. More probing is needed, as we emphasize below.
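For orientation, the direction of the Boolean prediction for non-conditional conjunctions and disjunctions can be sketched numerically. The sketch assumes, purely for illustration, that the two color changes are probabilistically independent; the text does not state the joint distribution, and the function name and condition labels are ours. The numbers are normative benchmarks computed from the input frequencies, not participant data.

```python
def boolean_benchmark(p_a, p_b):
    """Return (P(A and B), P(A or B)) for two events.

    Assumption (not from the source): A and B are independent.
    """
    p_and = p_a * p_b
    p_or = p_a + p_b - p_and
    return p_and, p_or

# Input frequencies of the two described events, as proportions.
# Labels are illustrative; "high" and "low" pair the two frequent or two infrequent colors.
conditions = {
    "50/50": (0.50, 0.50),
    "75/25 high": (0.75, 0.75),
    "75/25 low": (0.25, 0.25),
}

for name, (p_a, p_b) in conditions.items():
    p_and, p_or = boolean_benchmark(p_a, p_b)
    # Boolean ordering: P(A and B) <= P(A), P(B) <= P(A or B).
    assert p_and <= min(p_a, p_b) <= max(p_a, p_b) <= p_or
    print(f"{name}: and={p_and:.4f}, single={p_a:.2f}, or={p_or:.4f}")
```

Whatever the exact joint distribution, a Boolean picture requires the conjunction to sit at or below each conjunct and the disjunction at or above each disjunct; independence only fixes the size of the gaps.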
4. General discussion. Our experimental results provide at least some evidence that and-drop and or-drop fail. We have pointed out that this finding is challenging for most theories of ICs. But what theories can potentially accommodate it?
As it turns out, some versions of so-called trivalent semantics of ICs predict failures of and-drop and or-drop. The central idea of trivalent theories is simple, and goes back to De Finetti (1936): ICs have a truth value just in case their antecedent is true, and are undefined otherwise. This idea has been developed in a number of ways. 6 Here we present a version of the trivalent theory that is inspired by Bradley 2002. The key idea behind the semantics is that every clause has definedness conditions and truth conditions. We use 'Def(A)' to denote the former and 'True(A)' to denote the latter. The semantic clauses for connectives are the following:
true at w iff: if w ∈ Def(A) and w ∈ True(A) or w ∈ Def(B) and w ∈ True(B)
Let us emphasize one point. The definedness condition for conjunction, which is borrowed from Bradley 2002, is unusually weak: for a conjunction to be defined, all that is needed is that at least one of the conjuncts is defined. This is nonstandard, even for trivalent frameworks (see e.g. Lassiter 2020 for a different definedness condition for conjunction). But it is crucial for predicting the failure of and-drop.
Since our language involves truth-value gaps, we have to adopt a non-bivalent notion of probability $P_{\mathrm{Triv}}$. To define the latter, we follow Cantwell 2006 (see also Lassiter 2020). The basic idea is that the trivalent probability $P_{\mathrm{Triv}}$ of a sentence A can be defined from standard probabilities, in the following way: $P_{\mathrm{Triv}}(A)$ equals the probability that A is true, divided by the probability that A is defined. 7
$$P_{\mathrm{Triv}}(A) = \frac{Pr(\mathrm{True}(A))}{Pr(\mathrm{Def}(A))}, \quad \text{if } Pr(\mathrm{Def}(A)) > 0$$
We can easily show how, given this semantics and this way of defining $P_{\mathrm{Triv}}$, we can invalidate and-drop. Consider the epistemic state of a subject who has just observed a 'mystery' animation in our Experiment 1.
Suppose that the epistemic state of the subject includes four worlds. In $w_1$ and $w_2$ the car is carrying the circle, which turns green in $w_1$ and blue in $w_2$. In $w_3$ and $w_4$ the car is carrying the square, which turns yellow in $w_3$ and red in $w_4$. Suppose moreover that these worlds are assigned equal probability by the subject, i.e. that we have $Pr(w_1) = Pr(w_2) = Pr(w_3) = Pr(w_4) = 1/4$. Now consider the two ICs:
(18) If the car was carrying the circle, the circle turned blue.
(19) If the car was carrying the square, the square turned red.
(20), the conjunction of (18) and (19), is defined at all worlds (since the left conjunct is defined at $w_1$ and $w_2$, the right conjunct is defined at $w_3$ and $w_4$, and conjunctions are defined at a world w iff at least one of the conjuncts is defined at w), and true at $w_2$ and $w_4$. So we have:
$$P_{\mathrm{Triv}}(20) = \frac{Pr(\mathrm{True}(20))}{Pr(\mathrm{Def}(20))} = \frac{Pr(\{w_2, w_4\})}{Pr(\{w_1, w_2, w_3, w_4\})} = \frac{1/2}{1} = 1/2$$
So we have that $P_{\mathrm{Triv}}(18) = P_{\mathrm{Triv}}(19) = P_{\mathrm{Triv}}(20) = 1/2$, in violation of and-drop. The same model can work as a counterexample for or-drop. Let us emphasize the intuitive reason why and-drop fails. We are using a notion of probability, $P_{\mathrm{Triv}}$, which is defined relative to a domain of worlds. For some sentences A and B, it can be that the domain over which A ∧ B is defined is larger than the domains over which A and B are defined. In that case, it might be that the probability of a conjunction is greater than the probability of a conjunct.
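The four-world model can be checked mechanically. The sketch below is a minimal illustration, not the authors' implementation. Each sentence is represented as a pair of sets (worlds where it is defined, worlds where it is true). The conjunction clause encodes our reading of the weak definedness condition described above: defined where at least one conjunct is defined, and true there unless some defined conjunct is false. The disjunction clause assumes a parallel definedness condition. All function names are hypothetical.

```python
from fractions import Fraction

# Four equiprobable worlds: (shape carried by the car, color it turned).
WORLDS = {
    "w1": ("circle", "green"),
    "w2": ("circle", "blue"),
    "w3": ("square", "yellow"),
    "w4": ("square", "red"),
}
PR = {w: Fraction(1, 4) for w in WORLDS}

def conditional(shape, color):
    """Trivalent IC 'if the car carried <shape>, it turned <color>'.

    Defined where the antecedent is true; true where the consequent also holds.
    """
    defined = {w for w, (s, _) in WORLDS.items() if s == shape}
    true = {w for w in defined if WORLDS[w][1] == color}
    return defined, true

def conj(a, b):
    """Assumed conjunction clause: defined iff at least one conjunct is defined;
    true iff defined and no defined conjunct is false."""
    (def_a, true_a), (def_b, true_b) = a, b
    defined = def_a | def_b
    false_somewhere = (def_a - true_a) | (def_b - true_b)
    return defined, defined - false_somewhere

def disj(a, b):
    """Assumed disjunction clause: defined iff at least one disjunct is defined;
    true iff some disjunct is defined and true."""
    (def_a, true_a), (def_b, true_b) = a, b
    return def_a | def_b, true_a | true_b

def p_triv(sentence):
    """Trivalent probability: Pr(True(A)) / Pr(Def(A)), for Pr(Def(A)) > 0."""
    defined, true = sentence
    p_defined = sum(PR[w] for w in defined)
    if p_defined == 0:
        raise ValueError("trivalent probability is undefined when Pr(Def(A)) = 0")
    return sum(PR[w] for w in true) / p_defined

s18 = conditional("circle", "blue")   # (18)
s19 = conditional("square", "red")    # (19)
s20 = conj(s18, s19)                  # conjunction of (18) and (19)

print(p_triv(s18), p_triv(s19), p_triv(s20), p_triv(disj(s18, s19)))
# prints: 1/2 1/2 1/2 1/2
```

Under these assumed clauses, the conjunction and the disjunction receive the same trivalent probability as each conditional on its own, because coordination widens the domain of definedness from two worlds to four.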
Let us end with a note on future directions of research. It is crucial for our thesis that subjects' likelihood judgments are Boolean when conditionals are not involved. Experiment 2 was designed to probe this. While it provides some evidence in favor of the Boolean hypothesis, the overall results are mixed. In ongoing work, we are developing new versions of our experiments that aim at establishing the point more clearly.