Five degrees of (non)sense: Investigating the connection between bullshit receptivity and susceptibility to semantic illusions

Abstract. Individual differences in people’s tendency to see bullshit statements such as Perceptual reality transcends subtle truth as meaningful and possibly profound have become an active topic of research in judgment and decision making in recent years. However, (psycho)linguistics has so far paid little attention to the topic, despite its obvious appeal for language processing research. I present an experiment that investigated possible shared traits contributing to individual bullshit receptivity and susceptibility to semantic illusions, which occur when compositionally incongruous sentences receive plausible but unlicensed interpretations (e.g., More people have been to Russia than I have). The results show relatively little indication of an individual-level tendency to fall both for bullshit and for linguistic illusions. Implications for future psycholinguistic research into bullshit processing are discussed.

For the present investigation, I will focus on two additional traits: Interpretive charity, that is, a person's tendency to assume that statements are meaningful and true by default (e.g., Sperber 2010), and illusory pattern perception or apophenia, that is, a person's tendency to see patterns where none exist (DeYoung et al. 2012). An indiscriminate bias to judge sentences as meaningful and profound has been argued to contribute to pseudo-profound bullshit receptivity in particular (Pennycook et al. 2015), though the ability to discriminate between actual profundity and bullshit appears to be a stronger driver of individual differences (Bainbridge et al. 2019).
Apophenia or illusory pattern perception as a driver of bullshit receptivity has been investigated by Walker et al. (2019) and by Bainbridge et al. (2019). Walker et al. (2019) presented participants with random sequences of 10 coin tosses (e.g., HTHHTTTTHH) and asked them to indicate on a 1-7 scale how strongly they felt that the results were random or pre-determined. In another experiment, participants were shown images of objects embedded in visual noise, as well as images containing only visual noise, and were asked whether the image contained an object. Profundity ratings for pseudo-profound bullshit were positively correlated both with the coin-toss determinism measure and with the tendency to perceive objects in pure noise. Bainbridge et al. (2019) used a more indirect measure of apophenia, by asking participants about various paranormal and otherwise unusual beliefs ("It is possible to move material objects with only one's thoughts"). Paranormal beliefs correlate with illusory pattern perception (van Prooijen et al. 2018). Bainbridge et al. (2019) found a positive correlation between apophenia and pseudo-profound bullshit receptivity, in line with Walker et al.'s findings.

To my knowledge, despite interesting implications for semantic and pragmatic processing, bullshit has not received much attention in psycholinguistics. The fact that some people find the sentences in (1) meaningful suggests a role for processes that create meaning independently of the compositional makeup of the utterance and the lexical meanings of the words, to the extent that the average reader even has access to the lexical meaning of a word like ionization.
However, there is a related class of phenomena that has been studied in psycholinguistics: semantic illusions. A semantic illusion occurs when a semantically incongruous and arguably meaningless sentence appears well-formed and sensible. The sentences in (2) are examples of such illusions.
(2) a. More people have been to Russia than I have.
b. No head injury is too trivial to be ignored.
Sentence (2-a) is an instance of the so-called comparative illusion (CI). The sentence is compositionally incongruous because an amount of people (more people than X) is compared with an event (I have [been to Russia]), which should not yield a sensible meaning. Yet, the anomaly of CI sentences is often not consciously detected, and the sentences are rated as relatively acceptable (e.g., O'Connor 2015, Wellwood et al. 2018). The illusion is not random, however: CI sentences are rated more highly when they contain repeatable rather than unrepeatable events (?More girls graduated high school than the boy did), suggesting a preference for an illicit event comparison reading (Wellwood et al. 2018, De Dios-Flores 2016), but also when they contain semantically plural object NPs (More cats have mouse toys than the dog does; O'Connor 2015), suggesting the availability of an equally illicit cardinality comparison (more mouse toys). In both cases, the ungrammatical sentence is apparently coerced into an interpretable form. Sentence (2-b) is a so-called depth charge (DC) sentence (e.g., Sanford & Sturt 2002). Like CI sentences, DC sentences are compositionally incongruous: The degree phrase too trivial to be ignored presupposes that more trivial things are less likely to be ignored, which runs contrary to world knowledge (compare too trivial to be treated). Furthermore, (2-b) is usually interpreted to mean Don't ignore head injuries, but the compositional meaning is Ignore all head injuries (compare No landmine is too small to be banned; Wason & Reich 1979). Different theories have attributed the DC illusion to superficial processing (Wason & Reich 1979, Paape et al. 2020), to unconscious repair of an assumed speech error (Zhang et al. 2022), or to the availability of a stored grammatical construction with an idiosyncratic meaning (Fortuin 2014, Cook & Stevenson 2010).
As for CI sentences, mechanisms beyond standard compositional semantics must be recruited in order to make sense of the construction.
Both CI sentences and DC sentences show considerable variability in judgments between speakers (Wellwood et al. 2018, Leivada 2020, Paape et al. 2020). Some speakers tend to almost always find illusion sentences acceptable while others categorically reject them. However, to my knowledge, a possible shared susceptibility to different semantic illusions within the same speaker has never been investigated. Nevertheless, it is plausible that such general differences in susceptibility exist. Both for CI and for DC sentences, it has been suggested that there is a threshold of processing complexity that differs between speakers, and that processing is aborted once this threshold is reached and the meaning representation is deemed "good enough" (Paape et al. 2020, Paape 2021, Leivada 2020). Low threshold settings can result in failures to notice linguistic anomalies and cause acceptability illusions for malformed sentences (e.g., Christianson 2016). This perspective is in line with the proposal that readers have individual standards of coherence that affect their depth of processing during reading (van den Broek et al. 2001, 2011), with lower standards of coherence meaning less processing. This brings us back to bullshit: It would appear that in order to accept a bullshit statement such as Hidden meaning transforms unparalleled abstract beauty as meaningful, one would need to set a relatively low standard of coherence, because the more one thinks about the semantic content of the sentence, the less meaningful it becomes. Anecdotally, the same is often true for CI and DC illusion sentences. At the same time, however, both bullshit sentences and illusion sentences require enrichment of the linguistic input to be perceived as meaningful: Readers presumably want the sentences to make sense, and thus project plausible meanings into them.
Superficial processing on the one hand does not necessarily contradict semantic enrichment on the other: Readers may simply shift their focus away from the actual structure of the input and onto other, stimulus-independent sources of meaning, such as their pre-existing knowledge or opinions (Paape 2021). Highly apophenic individuals with a "greater tendency to go beyond the available data" (p. 111) and to "creat[e] meaning where no meaning exists" (ibid., p. 117) may be especially prone to this form of enrichment. The prediction is thus that the perceived meaningfulness of bullshit sentences, CI sentences and DC sentences should covary within speakers, and that highly apophenic readers should be especially likely to see meaning in all three sentence types.
But how can individual differences in apophenia be distinguished from differences in interpretive charity, or are they ultimately the same thing? There are reasons to assume that they are not. Illusory pattern perception, which is at the core of apophenia, is triggered by cues that at least resemble a pattern. Thus, the more a given sentence resembles a well-formed utterance, the more apophenic interpretation should occur. Crucially, bullshit sentences and illusion sentences are not nonsense in the sense that they are random jumbles of words. On some level, these sentences look and feel "good enough" to pass inspection. Interpretive charity, on the other hand, could plausibly make anything appear meaningful: a highly charitable reader may force even "word salad" to have meaning (Fowler 1969), just like any artwork can be imbued with meaning by a charitable viewer.
In what follows, I present a web-based experiment that investigated possible correlations between bullshit receptivity, susceptibility to semantic illusions, and apophenia. Interpretive charity was controlled for by also including sensible sentences and nonsense sentences in the experiment. The experiment used German sentences, and was run with a sample of German native speakers.
2. Experimental study. The experiment deviated from previous studies on bullshit by having participants judge the meaningfulness of the sentences instead of their profundity (e.g., Pennycook et al. 2015) or truthfulness (e.g., Evans et al. 2020). This more basic level of judgment was chosen to allow comparison with the illusion sentences, which are not intended to be profound or necessarily true. Three types of bullshit were included: pseudo-profound bullshit, scientific bullshit, and International Art English. Pseudo-profound bullshit receptivity has been found to correlate both with scientific bullshit receptivity (Evans et al. 2020) and with receptivity towards International Art English (Turpin et al. 2019), suggesting shared underlying traits.
In order to somewhat offset participants' interpretive charity and to naturalize the judgment of unusual sentences, participants were told to imagine that the sentences had been generated by an AI system, and that they were helping to improve the system by distinguishing between "good" and "bad" sentences. The pattern recognition task intended to measure apophenia was also embedded in the AI scenario (see below).

(3) a. Bullshit
       (pseudo-profound) The invisible is beyond new timelessness.
       (scientific) Energy can deteriorate based on closed-circuit alliterations of an afocal system.
       (International Art English) The banality of some gestures is disconcerting, and in their strangeness, they convey a future created in the past.
    b. Comparative illusion (CI) More spectators in the theater are proud Americans than the actor is.
    c. Depth charge illusion (DC) In the end, you realize that no plan is too unrealistic to be scrapped.
    d. Sensible (mundane) Your teacher can open the door, but you must enter by yourself.
    e. Nonsense One can say that flowers with a lot of old nettles do not limp without great experience value.
Each participant rated 24 bullshit sentences in total (8 of each type), as well as 16 CI sentences and 12 DC sentences. These were randomly intermixed with 24 sensible sentences (12 profound, 12 mundane) and 20 nonsense sentences. The nonsense sentences were generated in a stream-of-consciousness manner by the author and validated as being nonsense by three German speakers. The 20 stimuli for the pattern recognition task were scatterplots created in R (R Core Team 2022). For each plot, 20 random floating-point numbers between 0 and 30 were generated for both the X and the Y coordinate using R's runif() function. Examples are shown in Figure 1.
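The scatterplot generation procedure can be sketched in Python (an illustrative re-implementation of the R procedure described above, not the original code; the function name is my own):

```python
import random

def make_scatterplot_data(n_points=20, lo=0.0, hi=30.0, seed=None):
    """Draw uniform X/Y coordinates for one random scatterplot,
    mirroring the described use of R's runif(20, 0, 30)."""
    rng = random.Random(seed)
    xs = [rng.uniform(lo, hi) for _ in range(n_points)]
    ys = [rng.uniform(lo, hi) for _ in range(n_points)]
    return xs, ys

# The full stimulus set: 20 plots of 20 points each
stimuli = [make_scatterplot_data(seed=i) for i in range(20)]
```

Because the coordinates are drawn independently from a uniform distribution, any structure a participant reports seeing in such a plot is by construction illusory.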
Participants with high apophenia were expected to "detect" more patterns in the scatterplots, and possibly show longer response times, because they may spend more time trying to find patterns.
2.3. PROCEDURE. Participants provided informed consent prior to experimentation. The AI scenario was first introduced. Participants were told that they should rate the meaningfulness of each sentence in comparison to a completely meaningless baseline sentence (Bats don't go defiantly into the computer next to love without pumping), which was presented in each trial. The rating scale for the target sentences ranged from 1 ("equally bad") to 7 ("much better"). Participants could freely reread the sentences as many times as they wished, but were instructed not to "overanalyze" the sentences and to rely more on their linguistic intuition. Reading times for each sentence were recorded.

Pseudo-profound bullshit sentences, scientific bullshit sentences and International Art English sentences were translated and adapted from Pennycook et al. (2015), Evans et al. (2020), and Turpin et al. (2019), respectively. Comparative illusion sentences were translated and adapted from O'Connor (2015). Depth charge sentences were adapted from Paape et al. (2020). Sensible sentences were mostly translated and adapted from Pennycook et al. (2015), but also featured some new additions. A total of 24 DC sentences were created, but only 12 were presented to each participant, to limit the duration of the experimental session and to prevent carryover effects.

After all sentences had been rated, the pattern recognition task was administered. Participants were presented with the scatterplots in random order and were told that these represented the "neuronal activations" of the AI model, which are sometimes random and sometimes structured. They were instructed to give a binary judgment of whether they intuitively felt that each pattern was structured as opposed to random. Participants were also told that all patterns might be random, or all patterns might be structured.
2.4. DATA ANALYSIS. The statistical analysis was carried out using the Stan language for Bayesian inference (Stan Development Team 2022). Because the rating data are ordinal, they were analyzed with a cumulative logit model. This kind of model assumes a continuous latent variable underlying the Likert scale ratings, together with a set of thresholds or "cutpoints" that group the latent values into the rating "bins" enforced by the discrete scale (Liddell & Kruschke 2018). Because the individual differences on the latent scale are of interest, and because participants may differ in their use of the discrete scale, recovering the underlying continuous values by modeling participants' individual cutpoints is of major importance (see below for the implementation).
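The mapping from the continuous latent scale to discrete Likert ratings can be illustrated with a minimal Python sketch (this is not the paper's Stan implementation; the cutpoint values and function names are my own illustrative choices):

```python
import math

def inv_logit(x):
    """Standard logistic function."""
    return 1.0 / (1.0 + math.exp(-x))

def rating_probs(eta, cutpoints):
    """Probability of each Likert rating under a cumulative logit model:
    P(rating <= k) = inv_logit(c_k - eta); the probability of rating k
    is the difference between adjacent cumulative probabilities."""
    cum = [inv_logit(c - eta) for c in cutpoints] + [1.0]
    probs = []
    prev = 0.0
    for p in cum:
        probs.append(p - prev)
        prev = p
    return probs

# Six cutpoints carve the latent scale into seven rating "bins"
cuts = [-2.5, -1.5, -0.5, 0.5, 1.5, 2.5]
probs = rating_probs(0.0, cuts)
```

A higher latent value eta shifts probability mass towards higher ratings; shifting the cutpoints themselves per participant captures idiosyncratic use of the discrete scale, as described above.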
Reading times were analyzed assuming a lognormal likelihood. For both dependent variables, a hierarchical model with correlated "random" (or varying) intercepts and slopes by participants and by items was fitted (e.g., Pinheiro & Bates 2000, Gelman & Hill 2007). This means that an individual adjustment to the population-level estimate was estimated for each subject and for each item in order to account for the non-independence of measurements and to capture inter-individual differences (Barr et al. 2013). In the present study, the correlations between these adjustments, which are also estimated from the data, are of major interest: The question is whether someone who gives higher-than-average ratings to bullshit sentences will, for instance, also give higher-than-average ratings to comparative illusion sentences.
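The logic of these random-effects correlations can be illustrated with simulated data (a Python sketch under assumed values, not part of the actual analysis; all names are my own):

```python
import math
import random

def correlated_adjustments(n, rho, sd=1.0, seed=1):
    """Draw n pairs of by-participant adjustments, e.g. (u_bullshit, u_CI),
    with a true correlation of rho, via a Cholesky-style construction."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        u1 = sd * z1
        u2 = sd * (rho * z1 + math.sqrt(1 - rho**2) * z2)
        pairs.append((u1, u2))
    return pairs

def pearson(pairs):
    """Sample Pearson correlation of a list of (x, y) pairs."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sx = math.sqrt(sum((x - mx) ** 2 for x, _ in pairs))
    sy = math.sqrt(sum((y - my) ** 2 for _, y in pairs))
    return sxy / (sx * sy)
```

If bullshit receptivity and illusion susceptibility share an underlying trait, the fitted model should recover a positive correlation of this kind between the corresponding by-participant adjustments.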
In Stan, it is possible to set up a model that estimates a separate average (intercept) value for each sentence type, along with participant- and item-specific adjustments, and, crucially, correlations between the adjustments across sentence types. A shortened and simplified notation of the model is shown in equation 1. The model code and data are available at https://osf.io/54wh7. For each trial n, the rating comes from an ordered logistic distribution with a vector C of six cutpoints. The values γ1...6 demarcate the population-level boundaries between rating "bins" in logit space, which are adjusted for each participant in order to account for differences in the use of the Likert scale. A population-level intercept α on the latent scale is estimated for each sentence type and adjusted by participant (ui) and by item (wj). The adjustments follow a normal distribution with to-be-estimated standard deviations σu and σw. The correlations between the adjustments u1...m and w can be recovered from the estimates of the variance-covariance matrix of the random effects. Regularizing priors were used for all parameters, including LKJ priors (Lewandowski et al. 2009) with η set to 2 for the correlations.
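Based on the prose description, the model in equation 1 plausibly has the following form (a reconstruction from the surrounding text, which may differ from the paper's exact notation; t[n], i[n], and j[n] denote the sentence type, participant, and item of trial n):

```latex
\begin{aligned}
\text{rating}_n &\sim \mathrm{OrderedLogistic}\bigl(\alpha_{t[n]} + u_{i[n],\,t[n]} + w_{j[n]},\; C_{i[n]}\bigr)\\
C_{i} &= \gamma_{1\ldots 6} + \text{participant-specific cutpoint adjustments}\\
u_{i} &\sim \mathcal{N}(0,\,\Sigma_u), \qquad w_{j} \sim \mathcal{N}(0,\,\sigma_w^2)
\end{aligned}
```

The off-diagonal elements of Σu encode the critical correlations between by-participant adjustments across sentence types.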
Participants' reading times as well as their reaction times and judgments for the scatterplots were also analyzed within the same model. This approach allows for the direct estimation of the critical random-effects correlations across measures, without any pre-aggregation of the data, and thus without any loss of information. Furthermore, the addition of item-specific adjustments allows for better estimation of the subject-level individual differences: If a given participant sees a pattern in a scatterplot that has an above-average "pattern-likeness" across all participants, this is much less informative than if the participant sees a pattern in a scatterplot with a below-average "pattern-likeness".
For the reading times, a slope for the number of characters in the sentence was added to the model, along with adjustments by subject. The slope adjustment for each participant was intended as a measure of their tendency towards superficial or "good enough" processing. It has been found that word length effects are reduced during "mindless" reading (Schad et al. 2012, Reichle et al. 2010, Franklin et al. 2011), so it is plausible that the less attention a participant is paying to the sentences, the less the overall sentence length should affect their reading times.

2.5. RESULTS. Table 1 shows mean ratings and standard deviations of the by-subject means by sentence type. Unsurprisingly, sensible sentences were rated the most meaningful in comparison to the baseline sentence, while nonsense sentences were rated the least meaningful. The three other sentence types fall in between, with somewhat higher ratings for DC sentences than for bullshit and CI sentences. CI sentences showed the greatest rating variability between subjects, followed by depth charge sentences. Sensible sentences showed the lowest variability.

Figure 2 shows the correlation estimates for the participant-level random effects extracted from the Stan model, along with the associated 95% highest density intervals (HDIs) of the posterior distributions (e.g., Kruschke 2014), computed using the bayestestR package (Makowski et al. 2019). The main prediction was a positive correlation between apophenia, as measured by increased pattern spotting and longer reaction times for the scatterplots, and the perceived meaningfulness of bullshit sentences and illusion sentences. The data do not show any strong indication of the predicted relationship. Only the perceived meaningfulness of comparative illusion sentences shows some indication of a positive correlation with pattern spotting at the participant level (95% HDI: [−0.03, 0.43]).
Furthermore, the HDIs of the pairwise correlations between by-participant adjustments for bullshit meaningfulness, CI meaningfulness and DC meaningfulness are centered around values close to zero, indicating no evidence for the predicted positive correlation due to individual differences in interpretive charity.
There is a negative correlation at the participant level between the perceived meaningfulness of sensible sentences and that of nonsense sentences (HDI: [−0.67, −0.19]). Perceived meaningfulness of nonsense is negatively correlated with the effect of sentence length on reading times (HDI: [−0.76, −0.36]), while perceived meaningfulness of sensible sentences is positively correlated with the sentence length effect (HDI: [0.27, 0.70]). This suggests that more superficial readers, who were less affected by sentence length, perceived nonsense as more meaningful and sensible sentences as less meaningful than average. Superficial or inattentive readers thus seem less able or willing to distinguish between the sentence types, compared to more attentive readers.
Finally, participants who spent more time looking at the scatterplots showed smaller effects of sentence length on reading time (HDI: [−0.52, −0.08]), as well as higher perceived meaningfulness of nonsense sentences (HDI: [−0.03, 0.41]). These correlations can be seen as tentative evidence for a connection between apophenia and superficial language processing. However, there is no indication in the data that individuals who spent more time looking for patterns in the scatterplots ended up "finding" more of them.
3. Discussion. The goal of this study was to investigate individual differences in the processing of bullshit statements (Hidden meaning transforms unparalleled abstract beauty) and two types of semantic illusion, namely the comparative illusion (CI; More people have been to Russia than I have) and the depth charge illusion (DC; No head injury is too trivial to be ignored). The underlying intuition was that readers who find meaning in bullshit statements may also find meaning in semantic illusion sentences. Based on the existing bullshit literature, two traits were hypothesized to contribute to both bullshit receptivity and illusion receptivity: The first is a person's general bias towards finding statements meaningful or even profound, irrespective of content, that is, their tendency towards interpretive charity (e.g., Pennycook et al. 2015). The second is a person's tendency to actively create meaning by seeking patterns for which there is no objective evidence in the data, that is, their tendency towards apophenia.
Statistically, shared individual differences in interpretive charity, apophenia, and the perceived meaningfulness of bullshit and semantic illusions were explored by looking at the correlations of subject-level random effects in a hierarchical model. However, there was little evidence in the data that would support the hypothesized connections. There was no indication that people who found bullshit sentences more meaningful also found illusion sentences more meaningful, or that finding CI sentences meaningful correlated with finding DC sentences meaningful. Regarding effects of apophenia, only the perceived meaningfulness of CI sentences, but not that of bullshit sentences or DC sentences, showed a positive correlation with finding patterns in random scatterplots, casting doubt on a shared underlying mechanism of "meaning creation".
The absence of evidence for a connection between apophenia and bullshit endorsement is unexpected given earlier findings by Walker et al. (2019). However, the mismatch may be due to a variety of factors: the smaller participant sample, the use of a mixture of bullshit "genres" in the current study, and/or differences between the pattern-spotting tasks. For instance, unlike the one used by Walker et al. (2019), the pattern-spotting task used in the current study did not have a baseline in which real patterns needed to be identified. Furthermore, the current experiment used judgments of meaningfulness whereas that of Walker et al. used judgments of profundity. Finally, participants in the current study were explicitly told to rely on their intuition, and some participants may have followed this instruction more strictly than others. Despite the differences between studies, there was an isolated positive correlation of apophenia with the perceived meaningfulness of CI sentences in the current study, which should be further investigated in future work.
Interestingly, participants' reactions to the two additional sentence types tested, namely sensible sentences (Your teacher can open the door, but you must enter by yourself) and nonsense sentences (Instead of big fish, you can also rarely flip seven potatoes), did show some clear correlations: People who tended to see more meaning in nonsense tended to see less meaning in sensible sentences, and vice versa. Furthermore, this tendency was connected to participants' depth of processing: The reading times of participants who distinguished less between nonsense and sensible sentences were less affected by sentence length, suggesting that they were paying less attention. In light of these findings, it is not clear why bullshit sentences and illusion sentences should be unaffected by the depth-of-processing effect, especially given that semantic illusions have been hypothesized to involve superficial processing (Wason & Reich 1979, Paape et al. 2020, Paape 2021, Leivada 2020). However, there was also no evidence in the data that finding meaning in bullshit and illusion sentences correlates with deeper processing, which might be the case if the sentences can be coerced or "repaired" into meaningfulness via additional reasoning steps and/or linguistic operations (Dalton 2016, O'Connor 2015, Wellwood et al. 2018, Zhang et al. 2022). It is possible that depth-of-processing effects are simply more subtle for "semi-meaningful" sentences than for clearly sensible or clearly nonsensical sentences. Thus, further research with larger participant samples, and ideally with more measurements per participant, is clearly needed. Overall, despite the lack of conclusive results, the present work has highlighted the untapped potential of psycholinguistic investigations into bullshit processing.
Even though it has proven challenging to define bullshit in terms of linguistic features (e.g., Cohen 2002), there are some notable tendencies, such as the heavy use of nouns and of abstract rather than concrete words (Buekens & Boudry 2015), as well as the heavy use of "genre"-specific jargon (Spicer 2020). Each of these features may make unique contributions to bullshit processing, as well as to individual differences in bullshit receptivity. Applying the entirety of the empirical (psycho)linguistic toolbox to bullshit sentences and investigating connections with other (psycho)linguistic phenomena will undoubtedly lead to many valuable insights in this domain.