The Linguistic Status of Predictions and Feature Ranks from SVM Text Classifiers
DOI: https://doi.org/10.3765/exabs.v0i0.2988
Abstract
Text classification systems can predict certain characteristics of a text's author (e.g., gender and age) using only linguistic properties. This paper asks why such predictions are possible and how they can be interpreted. Three factors bear on the answer: (1) the nature of the features used by the system; (2) the robustness of the predictions across time and genres; and (3) the amount of data required for training and testing. Some classification predictions (e.g., gender) are based on non-content linguistic material that generalizes across time and genre. These classifications are characterized by stable performance and feature ranks, and they permit linguistic interpretation.
License
Published by the LSA with permission of the author(s) under a CC BY 4.0 license.