The Linguistic Status of Predictions and Feature Ranks from SVM Text Classifiers

Jonathan Dunn

doi:10.3765/exabs.v0i0.2988

Authors

Jonathan Dunn, Illinois Institute of Technology

Abstract

Text classification systems can predict certain characteristics of a text's author (e.g., gender and age) using only linguistic properties of the text. This paper asks why such predictions are possible and how they can be interpreted. Three factors bear on the answer: (1) the nature of the features used by the system; (2) the robustness of the predictions across time and genres; and (3) the amount of data required for training and testing. Some classification predictions (e.g., gender) are based on non-content linguistic material that generalizes across time and genre. These classifications are characterized by stable performance and stable feature ranks, and they permit linguistic interpretation.
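
One way to make the notion of stable feature ranks concrete is to compare the rank order of feature weights from two classifiers trained on different periods or genres. The sketch below is illustrative only and is not taken from the paper: the function names `rank_features` and `spearman_rho`, the choice of Spearman's rank correlation as the stability measure, and the example weights are all assumptions. It presumes that each linear SVM has already been trained and that its per-feature weights are available as a dictionary.

```python
from typing import Dict


def rank_features(weights: Dict[str, float]) -> Dict[str, int]:
    """Rank features by absolute weight (1 = most influential).

    Ties are broken alphabetically so that every rank is distinct.
    """
    ordered = sorted(weights, key=lambda f: (-abs(weights[f]), f))
    return {feature: rank for rank, feature in enumerate(ordered, start=1)}


def spearman_rho(ranks_a: Dict[str, int], ranks_b: Dict[str, int]) -> float:
    """Spearman's rank correlation between two rankings of the same features.

    Uses the closed form 1 - 6 * sum(d^2) / (n * (n^2 - 1)), which is
    valid here because rank_features never produces tied ranks.
    """
    if set(ranks_a) != set(ranks_b):
        raise ValueError("both rankings must cover the same features")
    n = len(ranks_a)
    if n < 2:
        raise ValueError("need at least two features to compare rankings")
    d_squared = sum((ranks_a[f] - ranks_b[f]) ** 2 for f in ranks_a)
    return 1.0 - (6.0 * d_squared) / (n * (n ** 2 - 1))


# Hypothetical weights for a few function words, as might be read from two
# linear SVMs trained on different time periods (values are invented).
weights_period_1 = {"the": 0.9, "of": -0.7, "and": 0.5, "i": -0.3}
weights_period_2 = {"the": 0.8, "of": -0.6, "and": 0.4, "i": -0.2}

stability = spearman_rho(
    rank_features(weights_period_1),
    rank_features(weights_period_2),
)
print(f"rank stability (Spearman's rho): {stability:.2f}")
```

A value near 1 would indicate that the same features carry the classification in both conditions, which is the kind of stability the abstract associates with linguistically interpretable predictions. A value near 0 or below would suggest that the classifier relies on different material in each condition.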
