Mention versus Action: Guiding safety policy for content related to harmful topics

Hadas Kotek; Leon Gatys; Margit Bowler; Yu'an Yang; Shruti Palaskar; Ciro Sannino; Gunnar Lund; Joseph Cheng; Robert Daland; Charlie Maalouf; Jeffrey Bigham

doi:10.3765/plsa.v11i1.6080

Authors

Hadas Kotek Apple Inc.
Leon Gatys Apple Inc.
Margit Bowler Apple Inc.
Yu'an Yang Apple Inc.
Shruti Palaskar Apple Inc.
Ciro Sannino Apple Inc.
Gunnar Lund Apple Inc.
Joseph Cheng Apple Inc.
Robert Daland Apple Inc.
Charlie Maalouf Apple Inc.
Jeffrey Bigham Apple Inc.

DOI:

https://doi.org/10.3765/plsa.v11i1.6080

Keywords:

use/mention, responsible AI, harmfulness

Abstract

As generative AI systems are integrated into high-stakes domains, designing safety policies that accurately distinguish harmful from harm-free content has become a central challenge. A natural starting point is the use/mention distinction from linguistics and philosophy of language: content that merely mentions a harmful topic is typically less harmful than content that uses it to express harm. However, we argue that this binary is insufficient as a basis for responsible AI policy. We propose a mention vs. action framework that extends the use/mention distinction along two additional dimensions: the level of gratuitous detail and the discourse-level contribution of the content. Together, these dimensions ground safety assessments in narrowly scoped, operationalizable criteria rather than opaque intent judgments. We demonstrate the framework through case studies in instruction-following, jailbreak attempts, and image content moderation, showing its applicability across modalities and its practical value for safety policy development.

Published

2026-06-19

Issue

Vol. 11 No. 1 (2026): Proceedings of the Linguistic Society of America

Section

Articles

License

This work is licensed under a Creative Commons Attribution 4.0 International License.

Published by the LSA with permission of the author(s) under a CC BY 4.0 license.

How to Cite

Kotek, Hadas, Leon Gatys, Margit Bowler, Yu'an Yang, Shruti Palaskar, Ciro Sannino, Gunnar Lund, et al. 2026. “Mention Versus Action: Guiding Safety Policy for Content Related to Harmful Topics”. Proceedings of the Linguistic Society of America 11 (1): 6080. https://doi.org/10.3765/plsa.v11i1.6080.
