When Transformer models are more compositional than humans: The case of the depth charge illusion

State-of-the-art Transformer-based language models like GPT-3 are very good at generating syntactically well-formed and semantically plausible text. However, it is unclear to what extent these models encode the compositional rules of human language and to what extent their impressive performance is due to the use of relatively shallow heuristics, which have also been argued to be a factor in human language processing. One example is the so-called depth charge illusion, which occurs when a semantically complex, incongruous sentence like No head injury is too trivial to be ignored is assigned a plausible but not compositionally licensed meaning (Don't ignore head injuries, even if they appear to be trivial). I present an experiment that investigated how depth charge sentences are processed by Transformer models, which are free of many human performance bottlenecks. The results are mixed: Transformers do show evidence of non-compositionality in depth charge contexts, but also appear to be more compositional than humans in some respects.

1. Introduction. Transformer-based language models like GPT-3 (Brown et al. 2020) show impressive capabilities in terms of being able to produce naturalistic, usually grammatically correct, and often coherent text (e.g., Dale 2021). Rather than using recurrent and convolutional neural networks like previous state-of-the-art language models, the Transformer architecture is based on a so-called self-attention mechanism: during training on large amounts of human-written text, the model learns which parts of the preceding context are important for predicting the next word (Vaswani et al. 2017). Using only this information, GPT-3 can "write" passable philosophical essays (Elkins and Chun 2020), and even scientific papers about itself (Generative Pretrained Transformer et al. 2022). The outputs are often so convincing that humans cannot distinguish between machine-generated and human-written text (Clark et al. 2021, Uchendu et al. 2021).
At the same time, however, it is relatively easy to unmask Transformer models if one knows what to look for. For instance, due to their architecture and training regime, Transformers often fail at simple arithmetic (Floridi and Chiriatti 2020, Patel, Bhattamishra and Goyal 2021), arrive at bizarre deductions in scenarios that require real-world knowledge, and sometimes output obvious non sequiturs with sudden and extreme topic shifts that would be absurd coming from a human writer or speaker (Marcus and Davis 2020). Furthermore, given that any "knowledge" about the world that may be encoded in the model is not grounded in experience or reasoning but is filtered through language and its statistical properties (e.g., Alberts 2022), such as the frequent co-occurrence of certain terms, Transformers often resort to heuristics: they produce associatively plausible rather than factually correct answers to information questions (Sobieszek and Price 2022), and to some extent rely on simple lexical overlap between a premise and a hypothesis to predict entailment or non-entailment (McCoy, Pavlick and Linzen 2019).
That the "knowledge" encoded by Transformers is often heuristic in nature is also highlighted by so-called mispriming effects: For instance, BERT (Devlin et al. 2018), a close cousin of GPT-3, when asked to complete the input Talk? Birds can . . ., will produce talk as the most likely continuation, but will produce fly as the most likely continuation when the prompt is Birds cannot . . . (Kassner and Schütze 2019). Similarly, GPT-3 will assume that a mixture of cranberry juice and grape juice is poisonous if the linguistic context suggests that dangerous substances are being mixed (Marcus and Davis 2020). Even though several of these limitations can be overcome by targeted training and/or the addition of symbolic knowledge (see Helwe et al. 2021 for a review), it is nevertheless striking that untargeted training on very large language corpora often results in reliance on relatively shallow processing strategies.
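As an illustration of how such cloze-style probes work, the following minimal sketch queries a publicly available BERT checkpoint through the HuggingFace fill-mask pipeline. The prompts follow the examples attributed to Kassner and Schütze (2019) above, but whether a given checkpoint reproduces the exact rankings they report is not guaranteed and may vary by model version:

```python
from transformers import pipeline

# Cloze-style probe of the kind used in mispriming/negation studies.
# bert-base-uncased is a stand-in; rankings may differ across checkpoints.
fill = pipeline("fill-mask", model="bert-base-uncased")

# Mispriming: the prepended distractor "Talk?" may pull the prediction.
for pred in fill("Talk? Birds can [MASK].", top_k=3):
    print(pred["token_str"], round(pred["score"], 3))

# Negation: does the model respect "cannot"?
for pred in fill("Birds cannot [MASK].", top_k=3):
    print(pred["token_str"], round(pred["score"], 3))
```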
The implicit or explicit gold standard against which language models are usually compared and evaluated in terms of their "shallowness" is human performance. However, there are many tasks related to language and reasoning on which humans do not perform well, which casts doubt on this rationale (Linzen and Baroni 2021). Many failures of human reasoning can also be seen as being the result of mispriming effects: For instance, the Cognitive Reflection Test (Frederick 2005) and its extension (Thomson and Oppenheimer 2016) contain questions such as How many cubic feet of dirt are there in a hole that is 3' deep x 3' wide x 3' long?, to which 84% of participants answer "27" despite the correct answer being "none". Relatedly, when asked How many animals of each kind did Moses take on the ark?, 81% of participants answer "two" even after having been instructed to look out for possible errors in the question, and even though they know that the biblical story is about Noah (Erickson and Mattson 1981).
Human blindness to incongruous information in otherwise highly congruent contexts and the tendency to fall for verbal misdirection generalize to examples from different thematic domains (e.g., Barton and Sanford 1993, Cook et al. 2018), and are not reducible to the default assumption that interlocutors always produce sensible statements and requests (Reder and Kusbit 1991). In light of results such as these, it has been proposed that human language processing is partly heuristic and often just "good enough" (e.g., Ferreira and Patson 2007, Christianson 2016): Instead of constructing detailed syntactic and semantic structures based on compositional rules, people may sometimes use high-level language statistics and world knowledge to derive a "quick and dirty" approximation of meaning. As a case in point, the meaning of implausible passive sentences such as The dog was bitten by the man is often converted into that of an active sentence with reversed roles (The dog bit the man), presumably because this meaning is a priori more plausible, and because the agent of an event is usually mentioned first in English (Ferreira 2003, Christianson, Luke and Ferreira 2010).
1.1. THE DEPTH CHARGE ILLUSION. The difference between heuristic language processing and "reasoning" in Transformers and in humans is that the human version can usually be neutralized by explicitly pointing out the problematic element(s) in a given sentence (e.g., Erickson and Mattson 1981, Barton and Sanford 1993), or by explaining the invalidity of a given inference and explaining the correct solution (e.g., van Benthem 2008, Claidière, Trouche and Mercier 2017, Calvillo, Bratton, Velazquez, Smelter and Crum 2022). However, the incorrect "solutions" to some reasoning problems, like the Monty Hall problem (e.g., vos Savant 1997, Rosenthal 2008) and the Wason selection task (Wason 1968), famously tend to resist being explained away in this manner.
In the linguistic domain, an example that also has the property of persistence in the face of corrective explanation is the so-called depth charge illusion, which was first discussed by Wason and Reich (1979). The depth charge illusion occurs in (1), which is often interpreted to mean Don't ignore head injuries, even if they appear to be trivial.
(1) No head injury is too trivial to be ignored.
Anecdotally, a small subset of people immediately recognizes that this sentence is incorrect, a second subset is open to considering the possibility that it could be incorrect but cannot see why, and a third subset will stubbornly insist that it is correct, even after being confronted with the following argumentation:

1. The phrase too trivial to be ignored is semantically incongruous, because it presupposes that something can be so trivial that it should not be ignored (compare X is too young to die, which translates to X is so young that they should not die).
2. The incongruity is not removed by the initial negation: Asserting that no head injury has the incongruous property of being too trivial to be ignored does not make the property itself any less incongruous.
3. The initial negation does cause the overall statement to be affirmative, contrary to the plausible misinterpretation (Don't ignore head injuries): In abstract terms, if no X is too Y to be Z'ed, this means that no X crosses the threshold beyond which it should not be Z'ed, meaning that all X should be Z'ed (that is, ignored; see the sketch below).
4. Compositionally, the sentence thus means Ignore all head injuries, even if they appear to be trivial.
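Step 3 can be made explicit with a minimal threshold semantics for degree constructions (a sketch for illustration; the threshold notation $\theta_Z$ is mine and not taken from the cited literature):

$$\textit{X is too Y to be Z'ed} \;\approx\; \mathrm{deg}_Y(X) > \theta_Z,$$

where $\theta_Z$ is the degree of Y-ness beyond which X should not be Z'ed. The negative quantifier then denies that any X exceeds the threshold:

$$\textit{No X is too Y to be Z'ed} \;\approx\; \neg\exists x\,[\mathrm{deg}_Y(x) > \theta_Z] \;\equiv\; \forall x\,[\mathrm{deg}_Y(x) \leq \theta_Z],$$

so that no X is exempt from being Z'ed: compositionally, all X should be Z'ed, as stated in step 4.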
The sentence can be made compositionally sensible by changing it to No head injury is too trivial to be noticed/treated or to No head injury is trivial enough to be ignored, but speakers will often find these variants difficult to process or even reject them as being malformed.
Despite broad agreement in the literature that the "don't ignore" interpretation of (1) is not compositional, not all scholars agree that the depth charge illusion is due to a processing error, partly because it is so persistent. Explanations generally fall into three categories: the classic shallow processing account (Wason and Reich 1979, Paape, Vasishth and von der Malsburg 2020), an account based on the alleged idiomaticity of the construction No X is too Y to Z (Cook and Stevenson 2010, Fortuin 2014), and an account based on unconscious correction of an assumed speech error (Zhang, Ryskin and Gibson 2022).
The shallow processing account claims that due to the syntactic and semantic complexity of (1), working memory becomes overloaded at some point and compositional processing breaks down or is suspended, presumably when the implicit negation contained in the word too is combined with the initial negation (Paape et al. 2020). Readers then use their world knowledge in combination with superficial language heuristics (duplex negatio affirmat; No head injury is too trivial . . . → All head injuries are too dangerous . . .) to derive a plausible meaning. By contrast, the idiomatic or construction-based account assumes no breakdown. Instead, its proponents claim that the No X is too Y to Z construction is a stored grammatical unit that can, by virtue of its idiomaticity, violate the compositionality principle and be "legally" interpreted to mean No X should be Z'ed. Finally, the error-correction account claims that readers combine prior expectations about plausible utterances with expectations about plausible speech errors to reconstruct the presumably intended sentence (No head injury is so trivial as to be ignored) and derive its meaning. The proposal that the human sentence processor uses prior expectations about sentence meanings and speech errors to "repair" potentially degraded input has also been applied to other cases of linguistic illusions (e.g., Gibson, Bergen and Piantadosi 2013, Frazier and Clifton 2015).
All proposed accounts have empirical weaknesses: Shallow processing cannot explain why readers usually don't consciously notice the complexity overload that causes compositional processing to break down, or why the illusion often cannot be explained away: If complexity is the problem, taking one's time and working out the compositional meaning step by step should always lead to success. Conversely, the construction-based account cannot explain why the illusion can sometimes be explained away, why some people appear to be immune to it, and why it generalizes to distinct but compositionally similar constructions (e.g., too . . . as that in German; Paape et al. 2020). Finally, the Bayesian error-correction account cannot explain why readers usually cannot consciously access and report the assumed error correction ("I believe the speaker/writer made a mistake here") even after multiple passes over the sentence,¹ and why putting the incongruity in focus by changing the word order weakens the illusion (Too trivial to be ignored is surely no head injury in German; Paape 2021).
Investigating how Transformers handle depth charge sentences may provide a way out of the empirical conundrum. Unlike humans, Transformers don't have limited working memory capacity, so they don't experience complexity overload. Humans suffer from the "now or never" bottleneck (Christiansen and Chater 2016), that is, they must quickly and incrementally integrate incoming information before it is forgotten. Transformers, by contrast, don't process sentences incrementally but holistically, that is, they always have full access to all words in the sentence unless this access is deliberately limited (Kahardipraja, Madureira and Schlangen 2021). On the other hand, Transformers are prone to learning heuristics rather than compositional rules (e.g., McCoy et al. 2019), and are known to struggle with negation (Kassner and Schütze 2019, Hossain, Kovatchev, Dutta, Kao, Wei and Blanco 2020, Hosseini, Reddy, Bahdanau, Hjelm, Sordoni and Courville 2021), so that they might to some extent mimic human "good enough" processing of depth charge sentences.
By contrast, under the construction-based account, the Transformer would need to learn from the training data that the No X is too Y to Z construction can not only be used compositionally (No head injury is too trivial to be treated) but also non-compositionally, that is, idiomatically. However, the non-compositional variant is relatively rare: Cook and Stevenson (2010) report 170 instances of the construction in a written corpus of 1.1 billion words, of which 80% were compositional (e.g., no risk [is] too small to eliminate). Fortuin (2014) reports only 13 instances of the "negative" (idiomatic) construction in a corpus of similar size. Transformers tend to overgeneralize when the number of exceptions to a compositional rule is small, showing a) that compositionality is learned to some degree and b) that a certain amount of counterevidence is needed to "memorize" exceptions (Hupkes et al. 2020).² Regarding the Bayesian error-correction account of Zhang et al. (2022), it is not entirely clear how it could be applied to Transformers: Transformers don't have "common sense" learned from the real world against which they can evaluate sentence meanings, nor do they have a notion of speech errors, much less a subconscious error correction mechanism. To a standard Transformer, there is no "noise": every sentence seen during training is data, and the model can only adapt by allowing for more variability in its parameters (Michel and Neubig 2018, Passban, Saladi and Liu 2020). There may be some notion of a "plausible utterance" encoded in the model, in the sense that words with certain meanings tend to co-occur, possibly even in syntactically similar environments, but it is unlikely that the model also encodes the assumption that unintended errors can lead to malformed sentences, and is able to reconstruct the original meaning.

² If the non-compositional usage is frequent enough, a Transformer would presumably treat it as evidence of a grammaticalized rule exception. In addition, there are several academic papers on the depth charge illusion (see Paape 2021 for a review), as well as discussions on several public web forums (e.g., https://english.stackexchange.com/questions/91612/no-head-injury-is-too-trivial-to-ignore), which may become part of the training data of Transformer models and provide "evidence" of the construction being used.
In what follows, I present an experiment in which different Transformer models were tested on the No X is too Y to Z construction and a variety of control constructions. The experiment is exploratory in nature: The aim was not to conclusively answer the question of how the depth charge illusion arises in humans, but to see whether a system trained on many terabytes of text, but without incremental processing, memory bottlenecks, or error correction mechanisms, would show the illusion or not.
2. Experimental study. The purpose of the study was to assess the probability that different Transformer models assign to the word ignored as the next word after seeing the preamble No head injury is too trivial to be . . . . The log probability of ignored is treated as the dependent variable, and higher probabilities are taken to indicate a stronger depth charge illusion. This is a simplification, as the models may also assign high probabilities to continuations such as overlooked or forgotten about, which are semantically similar to ignored. To solve this problem, one could ask human coders to classify the continuations into "ignore-like" and "treat-like" categories, and then sum the relevant probabilities. However, there is some uncertainty as to which coding scheme should be used, as the relevant dimension of semantic similarity is not easy to capture (Paape et al. 2020, O'Connor 2015). I thus restrict my analysis to the single token ignored here. In sections 2.4 and 2.5, the overall distribution of continuations is discussed in more detail.
2.1. MATERIALS. The experiment had 9 conditions overall, as shown in (2). Here, compositional is used as shorthand for "has a sensible meaning under a compositional analysis", whereas not compositional is used as shorthand for "does not have a sensible meaning under a compositional analysis". Conditions (2-a), (2-b) and (2-c) are the depth charge conditions, while the rest are control conditions designed to test whether the models have encoded knowledge about negation, scales, and the degree particles too and so. The control conditions are syntactically and/or semantically less complex than the depth charge conditions, and a Transformer that has encoded the relevant knowledge should consistently assign higher probability to ignored in the compositional conditions compared to the non-compositional conditions.

(2) a. No head injury is trivial enough to be → ignored (compositional)
    b. No head injury is too trivial to be → ignored (not compositional)
    c. Some head injuries are too trivial to be → ignored (not compositional)
    d. No head injury is so trivial as to be → ignored (compositional)
    e. No head injury is so trivial as to not be → ignored (not compositional)
    f. Head injuries that are too trivial will be → ignored (compositional)
    g. Head injuries that are not too trivial will be → ignored (not compositional)
    h. Head injuries that are trivial are more likely to be → ignored (compositional)
    i. Head injuries that are trivial are less likely to be → ignored (not compositional)

Replacing too in the depth charge sentence (2-b) with enough in (2-a) yields a compositionally sensible meaning when the sentence is completed with the verb ignored. Humans do indeed produce compositional, "ignore-like" completions for this construction in the majority of trials (O'Connor 2015). The third depth charge condition (2-c) with some instead of no also leads to more compositional completions in humans, but in this case the completions are "treat-like" rather than "ignore-like" (Paape et al. 2020). Humans also assign lower sensibleness ratings to some-sentences ending with ignored, suggesting that the initial negation is crucially involved in "masking" the incongruity of the degree phrase in (2-b) and creating the depth charge illusion (Paape et al. 2020, Paape 2021).
To see if differences between conditions generalize across different sentence contexts, the 32 German depth charge items used by Paape et al. (2020) were translated into English and adapted to fit the design shown in (2). Only the versions with negative adjectives were used (e.g., No plan is too unrealistic to be → scrapped, No physical theory is too implausible to be → dismissed). Across all sentences, the dependent variable was the log probability of the verb used in the Paape et al. rating experiments.
2.2. TESTED MODELS. Four models were tested. The first two were models of different sizes from the GPT-3 family: Ada, the least powerful version of GPT-3, which "can perform tasks like parsing text, address correction and certain kinds of classification tasks that don't require too much nuance", and Davinci, the most powerful version, which "shines [...] in understanding the intent of text" and "is quite good at solving many kinds of logic problems" (https://beta.openai.com/docs/models/gpt-3). The third model under consideration was Jurassic-1-Jumbo, released by AI21 Labs, which is similar in size to Davinci but outperforms it in terms of predictive accuracy on many corpora (Lieber et al. 2021). The fourth model was RoBERTa, a retrained version of BERT with improved performance (Liu et al. 2019; https://huggingface.co/roberta-large). RoBERTa's training regime is somewhat different from that of GPT-3 and Jurassic-1: Like BERT, RoBERTa is bidirectional, that is, it not only considers the context to the left of a word but also the context to the right. RoBERTa has 355 million parameters, and is thus similar in size to GPT-3 Ada.³ GPT-3 Davinci and Jurassic-1-Jumbo are much larger, with about 175 billion parameters each.
The GPT-3 models were queried via the OpenAI API, Jurassic-1-Jumbo was queried via the AI21 Labs API, and RoBERTa was queried via the HuggingFace API (Wolf et al. 2020). For multi-token completions (e.g., . . . to be ruled out), the log probabilities of the generated tokens were summed. The pretrained models were used as-is; no fine-tuning of any kind was carried out.
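To illustrate the scoring procedure, the following minimal sketch computes the summed log probability of a completion given a preamble. It uses the small, freely available gpt2 checkpoint via HuggingFace transformers purely as a stand-in, since the GPT-3 and Jurassic-1 log probabilities in the study were obtained through their respective commercial APIs:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# gpt2 is a stand-in for the API-based models used in the study.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def completion_logprob(prompt: str, completion: str) -> float:
    """Sum the log probabilities of the completion's tokens given the prompt.

    Assumes the completion starts with a space, so that BPE tokenization
    splits cleanly at the prompt/completion boundary.
    """
    prompt_len = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]
    full_ids = tokenizer(prompt + completion, return_tensors="pt").input_ids
    with torch.no_grad():
        log_probs = torch.log_softmax(model(full_ids).logits, dim=-1)
    total = 0.0
    # The token at position i is predicted from the logits at position i - 1.
    for i in range(prompt_len, full_ids.shape[1]):
        total += log_probs[0, i - 1, full_ids[0, i]].item()
    return total

# Single-token case: the dependent variable used in the analysis.
print(completion_logprob("No head injury is too trivial to be", " ignored"))
# Multi-token case: token log probabilities are summed, as described above.
print(completion_logprob("No potential habitat is too small to be", " ruled out"))
```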

2.3. RESULTS AND BAYES FACTOR ANALYSIS.
Figure 1 shows the results by model and condition. In order to gauge the amount of statistical evidence in the data for differences between conditions and between models, the returned log probabilities were analyzed using a linear mixed-effects model (LMM) in Stan (Stan Development Team 2022) via the brms package (Bürkner 2017) in R (R Core Team 2022). The LMM assumed a Gaussian likelihood, and contained a fixed effect of model, which was treatment-coded with GPT-3 Ada as the baseline, as well as random intercepts by sentence and random slopes for model by sentence. For the conditions, sum contrasts were defined between the paired conditions in (2): too / enough (2-b vs. 2-a), some / no (2-c vs. 2-b), as to not / as to (2-e vs. 2-d), are not too / are too (2-g vs. 2-f), and less likely / more likely (2-i vs. 2-h). Normal(0,1) priors were used for all contrasts, and Bayes factors were computed using the bayestestR package (Makowski, Ben-Shachar and Lüdecke 2019). The model code and the data are available at https://osf.io/yw8vk. Bayes factors are interpreted according to the scheme given by Andraszewicz et al. (2015). The results of the analysis are as follows (BF is BF₁₀):⁴

• For GPT-3 Ada as the baseline, strong evidence for positive differences for all contrasts (all BFs > 15), except for the too / enough contrast (BF = 1.6).
• too / enough contrast: Moderate evidence for a larger difference in Jurassic-1-Jumbo than in GPT-3 Ada (BF = 5.3).
• some / no contrast: No evidence that other models perform differently from GPT-3 Ada (all BFs < 1.1).
• as to not / as to contrast: Moderate evidence for a larger difference in RoBERTa than in GPT-3 Ada (BF = 6.7).
• are not too / are too contrast: Strong evidence for a larger difference in GPT-3 Davinci than in GPT-3 Ada (BF = 27), moderate evidence for a larger difference in Jurassic-1-Jumbo than in GPT-3 Ada (BF = 3.2).
• less likely / more likely contrast: No evidence that other models perform differently from GPT-3 Ada (all BFs < 1.7).

2.4. EXAMPLE COMPLETIONS BY MODEL.
Example sentence completions produced by the different models are shown below, along with indicators showing whether the completion is compositionally sensible or not. The prompt is underlined in each example.
(3) GPT-3 Ada
    a. No head injury is too trivial to be counted as a crime. (compositional)
    b. Some head injuries are too trivial to be taken lightly. (not compositional)
    c. Head injuries that are trivial are more likely to be fatal. (not compositional)
    d. Head injuries that are trivial are less likely to be fatal. (compositional)

(4) GPT-3 Davinci
    a. No head injury is too trivial to be ignored. Any recent head injury, no matter how minor, should be included in the patient's history. (not compositional)
    b. Some head injuries are too trivial to be treated, Dr. Benson acknowledged. (compositional)

(5) Jurassic-1-Jumbo
    a. No head injury is too trivial to be noticed by a parent. (compositional)
    b. No head injury is too trivial to be ignored. All head injuries need to be taken seriously. (not compositional)

(6) RoBERTa
    a. Head injuries that are too trivial will be punished. (??)
    b. Some head injuries are too trivial to be ignored. (not compositional)

In (6-b), RoBERTa completes the some-variant with ignored as well, which happens very rarely in humans (Paape et al. 2020). Completion (6-a) by RoBERTa is completely unexpected, unless it is taken to mean that whoever caused the head injury is going to be punished, in which case it would be a non-compositional completion.
2.5. FORWARD AND BACKWARD MASK-FILLING WITH ROBERTA. As mentioned above, RoBERTa is bidirectional, and can thus produce completions based on rightward as well as on leftward context, that is, RoBERTa is able to "retrodict" words based on future input. This is achieved by putting a [MASK] token in place of the to-be-inserted word. The masking feature can be used to take a closer look at compositionality in RoBERTa: Given an input string like No head injury is too [MASK] to be ignored, does RoBERTa produce an adjective that results in a compositionally well-formed degree phrase? It is also worthwhile to look at the distribution of completions: Even if the most likely completion is compositional, there may be an alternative, non-compositional completion with a similarly high probability, or the other way around. RoBERTa's top 4 verb and adjective completions with their associated probabilities for a set of example sentences are shown in (7) below.
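As a concrete illustration, the following minimal sketch runs both the forward (verb) and backward (adjective) mask-filling probes through the HuggingFace fill-mask pipeline with the public roberta-large checkpoint. Note that RoBERTa's mask token is written <mask> in the HuggingFace implementation, corresponding to the [MASK] notation used above; the exact completions and probabilities will depend on the checkpoint version:

```python
from transformers import pipeline

# Public checkpoint corresponding to the model tested in the study.
fill = pipeline("fill-mask", model="roberta-large")

# Forward direction: predict the verb from the left context.
for pred in fill("No head injury is too trivial to be <mask>.", top_k=4):
    print(f"{pred['token_str']}: {pred['score']:.3f}")

# "Retrodictive" direction: predict the adjective from the right context.
for pred in fill("No head injury is too <mask> to be ignored.", top_k=4):
    print(f"{pred['token_str']}: {pred['score']:.3f}")
```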
3. Discussion. The aim of this paper was to investigate whether Transformer-based language models show the depth charge illusion, in which a compositionally incongruous sentence (No head injury is too trivial to be ignored) is given an unlicensed but plausible interpretation (Don't ignore head injuries, even if they appear to be trivial). Transformers are an interesting test case, given the range of proposed explanations for the illusion in human readers: Theoretical proposals range from processing breakdown and recovery through world knowledge and superficial language heuristics (Wason and Reich 1979, Paape et al. 2020), to the existence of an idiomatic No X is too Y to Z construction (Cook and Stevenson 2010, Fortuin 2014), to Bayesian speech error correction (Zhang et al. 2022). Transformers don't experience processing breakdown, may not have seen enough instances of the hypothesized No X is too Y to Z construction to encode it, and have no means of distinguishing between "normal" training data and speech errors.

The experimental results yielded some evidence that the depth charge illusion is present in Transformers of different types and sizes: Across 32 test sentences, all considered models (GPT-3 Ada, GPT-3 Davinci, Jurassic-1-Jumbo, and RoBERTa) assigned higher probabilities to completions like ignored when the sentence began with a negation (No head injury . . .) compared to when it did not (Some head injuries . . .), even though the completion results in an internally incongruous degree phrase (too trivial to be ignored) in both cases. Furthermore, apart from Jurassic-1-Jumbo, none of the models appeared to distinguish between too and enough in negated contexts, even though the two degree particles have opposite meanings. At the same time, however, the Transformer models showed evidence of compositional processing in control contexts such as Head injuries that are trivial are less likely to be . . . [*ignored], suggesting that the required syntactic and semantic rules have been encoded.
Taken at face value, these results suggest largely parallel effects between human readers and Transformers with regard to the depth charge illusion, despite the presumably very different underlying processing mechanisms. However, a closer look at the Transformers' sentence completions revealed that they produce a variety of un-humanlike continuations for the control conditions, suggesting that their grammatical "knowledge" may not be as deep as the high-level results suggest (Bender et al. 2021). On the other hand, the mask-filling patterns of RoBERTa suggested that RoBERTa is often more compositional than human readers: For a variety of test sentences, including the most famous example No head injury is too trivial to be . . ., RoBERTa showed a preference for compositional completions like addressed, unlike human participants (Paape et al. 2020, O'Connor 2015). At the same time, non-compositional completions did also appear in the list of most likely tokens, and even dominated for some sentences, especially in "retrodictive" contexts, that is, when RoBERTa had to fill in the adjective based on the verb (e.g., No potential habitat is too [small] to be ruled out).
What are the implications of these findings for the empirical deadlock between the competing psycholinguistic accounts of the depth charge illusion? Proponents of the construction-based view could argue that the Transformer models have picked up on the No X is too Y to Z construction to some extent, but haven't fully mastered it yet, presumably because they haven't encountered enough instances of it in the input data, and because the construction is arguably ambiguous between a compositional and a non-compositional version (Cook and Stevenson 2010, Fortuin 2014). A preference for the compositional reading wouldn't be surprising under this view, given that Transformers are known to struggle with infrequent and/or idiomatic constructions (Hupkes et al. 2020, Dankers et al. 2022). This suggests some promising avenues for future research: Providing disambiguating context should allow the Transformer to identify the intended reading, as it arguably does for humans (Fortuin 2014), and additional training on the construction should increase accuracy, in the sense that contextual cues should be more reliably identified.
Meanwhile, proponents of the superficial processing account could argue that even though Transformers don't experience processing breakdown, they could nevertheless be using heuristics to process depth charge sentences. This is a plausible assumption, given that Transformers are known to learn heuristics in other settings, including negation processing (McCoy, Pavlick and Linzen 2019, Helwe, Clavel and Suchanek 2021). That the models haven't achieved human-like syntactic and semantic competence in terms of processing scales (more trivial → higher probability of ignoring), degrees (too trivial) and negation is clear from the many examples in which they produced compositionally incongruous completions in the control conditions.
At the same time, however, the models do not appear to use the simplest possible heuristic for dealing with depth charge sentences: to ignore the beginning of the sentence and only locally evaluate the degree phrase too trivial to be ignored, which is always incongruous, irrespective of whether the sentence begins with no or with some. This strategy is unlikely to be used by human readers, who are limited by their incremental left-to-right processing and the "now or never" bottleneck (Christiansen and Chater 2016), which may lead them to incorrectly combine no and too before they even reach the verb (Paape et al. 2020). Transformers, on the other hand, do not have this limitation, and yet they have apparently learned to pay attention to the initial no in depth charge contexts.
Where does this leave the superficial processing account? The following scenario is possible: Even larger Transformer models with even more (or better) training may eventually acquire human-like syntactic and semantic competence, but may not exhibit the depth charge illusion, because their output is not limited by performance factors.⁵ Resistance to the illusion may also gradually increase with scale and training, as the amount of abstract compositional knowledge and, presumably, transfer ability in the system increases. Larger models typically do perform better on language tasks, though the current data do not show evidence of scale effects: The strength of the depth charge illusion was similar across models of very different sizes, though Jurassic-1-Jumbo showed some evidence of distinguishing more between too and enough than the other models.
Proponents of the Bayesian error-correction account could argue that despite the absence of "reasoning" in Transformers, the models may have learned to correct for speech errors to some extent. Transformer-based language models often acquire unexpected capabilities that they were not explicitly trained for (e.g., Radford et al. 2019, Brown et al. 2020), and deep learning has been touted as a potentially powerful approach to grammar correction (Dale and Viethen 2021), so error repair may be a latent capability of such models. However, reasoning about what the originally intended form or message of a given linguistic utterance was (No head injury is so trivial as to be ignored; Zhang et al. 2022) and how it might have been transformed into its observed form (No head injury is too trivial to be ignored) is a very complex task. This type of reasoning would presumably require some approximation of a theory of mind, as well as an approximate model of human speech production, relevant topic knowledge (Should all head injuries be treated?), and pragmatics. Capabilities that resemble common-sense reasoning may be present in current Transformers to some extent, but, as Kejriwal et al. (2022) have recently argued, a more diverse set of empirical tools is needed to find out how much "common sense" is really encoded in the models.
A deeper understanding of how Transformers process the depth charge construction will, in all likelihood, benefit future research into why most humans struggle with the construction, and why there are such large differences between people and between specific sentences. At the sentence level, the interpretation of depth charge stimuli partly depends on the strength of world knowledge associated with the sentence (Paape et al. 2020, Zhang et al. 2022), as well as its sentiment polarity, semantic cohesion, and word order (Paape 2021). Investigating whether Transformers are also sensitive to these factors would yield further insights into how similar the mechanisms behind the illusion are in Transformers and humans. At the participant level, working memory capacity has been investigated as a potential source of variability, yielding a null result (Paape et al. 2020). A more promising factor may be language experience. An interesting approach would be to try to fine-tune Transformers in such a way that they become immune to the depth charge illusion, and to then see if a similar approach can be used to immunize human participants. "Natural" immunity appears to be rare in humans but may also depend on language experience. If immunization is possible, this would support the original intuition of Wason and Reich (1979) that the depth charge illusion is a processing error, and weaken theories that see the illusion as a result of grammaticalization or pragmatic reasoning.

Figure 1: Log probability of the critical verb (e.g., ignored) by construction and model. Error bars show 95% confidence intervals across all 32 items.