Treebank Annotation in the Light of the Meaning-Text Theory

Simon Mille, Leo Wanner, Alicia Burga


A treebank may contain the annotation of different phenomena such as word order, morphological features, syntactic and semantic relations, etc., which are rather different in their nature. Quite often, the annotation of these phenomena is combined in a single structure, which leads to low-quality training results and is verifiably deficient from a theoretical (linguistic) perspective. We argue that the annotation of corpora requires a well-defined linguistic model which supports multi-level annotation, with one type of phenomenon per level. Our experience with dependency treebanks created or adjusted for surface-oriented natural language generation and based on the Meaning-Text Theory, a multi-level linguistic model, supports this argumentation.


treebank; meaning-text theory; annotation

