The Linguistic Status of Predictions and Feature Ranks from SVM Text Classifiers

Authors

  • Jonathan Dunn, Illinois Institute of Technology

DOI:

https://doi.org/10.3765/exabs.v0i0.2988

Abstract

Text classification systems are capable of predicting certain characteristics of a text's author (e.g., gender and age) using only linguistic properties. This paper asks why such predictions are possible and how they can be interpreted. Three factors bear on the answer: (1) the nature of the features used by the system; (2) the robustness of the predictions across time and genres; and (3) the amount of data required for training and testing. Some classification predictions (e.g., gender) are based on non-content linguistic material that generalizes across time and genre. These classifications are characterized by stable performance and feature ranks, and they permit linguistic interpretation.

Published

2015-04-13