A Hybrid Approach to Error Detection in a Treebank and its Impact on Manual Validation Time

Rahul Agarwal, Bharat Ram Ambati, Dipti Misra Sharma


Treebanks are a linguistic resource: large databases in which the morphological, syntactic and lexical information of each sentence is explicitly marked. The critical role of treebanks in many NLP activities, in both research and applications, is well known, and it follows that treebanks need to be as error-free as possible. However, manual validation of a treebank is very costly in both time and money. This paper describes an approach to automatically detect errors in a treebank after complete manual annotation. In addition to improving an earlier error detection tool (Ambati et al., 2011) for a Hindi treebank, we present a user study showing that our system reduces the validation time significantly.
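The abstract does not spell out the detection mechanism. As a rough illustration of what a "hybrid" detector can look like, the sketch below combines a hand-written rule check with a frequency-based check over dependency arcs and returns the union as candidates for a human validator. It is not the authors' method. The `Arc` type, the rule that `k1`/`k2` arcs must attach to a main verb (`VM`), the tag names, and the `min_count` threshold are all hypothetical choices made for this example.

```python
from collections import Counter
from typing import NamedTuple


class Arc(NamedTuple):
    """One dependency arc: dependent POS, head POS, dependency label."""
    dep_pos: str
    head_pos: str
    label: str


# Hypothetical rule (illustrative only): karaka labels such as 'k1' and
# 'k2' are expected to attach to a main verb, tagged 'VM' here.
VERB_ONLY_LABELS = {"k1", "k2"}


def rule_based_flags(arcs):
    """Indices of arcs that violate the hand-written constraint."""
    return {i for i, a in enumerate(arcs)
            if a.label in VERB_ONLY_LABELS and a.head_pos != "VM"}


def frequency_flags(arcs, min_count=2):
    """Indices of arcs whose (dep_pos, head_pos, label) pattern is rare."""
    counts = Counter(arcs)  # NamedTuples hash by value, so equal triples pool
    return {i for i, a in enumerate(arcs) if counts[a] < min_count}


def hybrid_flags(arcs, min_count=2):
    """Union of both detectors, sorted: candidates for manual validation."""
    return sorted(rule_based_flags(arcs) | frequency_flags(arcs, min_count))


# Toy data (invented for illustration).
sample = [
    Arc("NN", "VM", "k1"),          # 0: frequent and rule-compliant
    Arc("NN", "VM", "k1"),          # 1: frequent and rule-compliant
    Arc("NN", "PSP", "k1"),         # 2: repeated, but breaks the rule
    Arc("NN", "PSP", "k1"),         # 3: repeated, but breaks the rule
    Arc("JJ", "NN", "nmod__adj"),   # 4: rule-compliant, but rare
]

if __name__ == "__main__":
    print(hybrid_flags(sample))
```

On the toy data the rule check flags arcs 2 and 3, which the frequency check misses because the same error occurs twice, while the frequency check flags arc 4, which no rule covers. This complementarity is one plausible motivation for combining the two kinds of detector.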


Keywords: treebank; error detection; Hindi
