Polish Dependency Bank


Abstract


This paper presents the first Polish dependency bank derived from constituent trees. The conversion is fully automatic and unambiguous. The converter takes manually disambiguated constituent trees encoded in XML as input and produces dependency structures encoded in the column-based CoNLL format. The conversion is relatively straightforward, since in most cases constituents have their syntactic centers marked. However, some reorganization is necessary to make the dependency structures conform to the annotation principles. The main part of the paper is devoted to the characteristics of Polish dependency types. The Polish dependency bank can be used for training or evaluating Polish parsers.
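As an illustration of the general idea only, head-driven conversion of a constituent tree into a dependency structure can be sketched as follows. This is a minimal sketch under stated assumptions, not the converter described in the paper: the XML schema (`node`, `tok`, the `head="true"` attribute), the sample sentence, the function names, and the placeholder labels `root` and `dep` are all hypothetical, and the output uses a simplified four-column layout rather than the full CoNLL format. The reorganizing steps and the Polish dependency types discussed in the paper are not modeled.

```python
import xml.etree.ElementTree as ET

# Hypothetical toy schema (NOT the treebank's actual XML format):
# <node> is a constituent, <tok form="..."> is a word, and
# head="true" marks the syntactic center among its sisters.
SAMPLE = """
<node cat="S">
  <node cat="NP">
    <tok form="Maria"/>
  </node>
  <node cat="VP" head="true">
    <tok form="czyta" head="true"/>
    <node cat="NP">
      <tok form="książkę"/>
    </node>
  </node>
</node>
"""


def convert(root):
    """Return (id, form, head_id) triples; head_id 0 marks the root token."""
    tokens = []  # word forms in surface order
    heads = {}   # dependent token id -> governing token id

    def lexical_head(elem):
        # A token heads itself; register it and return its 1-based id.
        if elem.tag == "tok":
            tokens.append(elem.get("form"))
            return len(tokens)
        # Resolve the lexical head of every child subtree first.
        child_heads = [(child, lexical_head(child)) for child in elem]
        # Use the head-marked child; fall back to the first child
        # (an assumption of this sketch) when no center is marked.
        marked = [h for child, h in child_heads if child.get("head") == "true"]
        head_id = marked[0] if marked else child_heads[0][1]
        # Every non-head sister's lexical head depends on the head token.
        for _, h in child_heads:
            if h != head_id:
                heads[h] = head_id
        return head_id

    heads[lexical_head(root)] = 0
    return [(i, form, heads[i]) for i, form in enumerate(tokens, start=1)]


def to_conll(rows):
    # Simplified columns: ID, FORM, HEAD, DEPREL (placeholder labels only).
    return "\n".join(
        f"{i}\t{form}\t{head}\t{'root' if head == 0 else 'dep'}"
        for i, form, head in rows
    )


rows = convert(ET.fromstring(SAMPLE))
print(to_conll(rows))
```

In this toy sentence the verb is the marked center of the clause, so both nominal tokens attach to it and the verb itself attaches to the artificial root.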

Keywords


treebank; Polish; dependency trees; constituent trees