Ruprecht-Karls-Universität Heidelberg

DeWSD - Resources for German WSD

From this page you can download the resources for German Word Sense Disambiguation created using the methods presented in:

  • Broscheit, S., Frank, A., Jehle, D., Ponzetto, S. P., Rehl, D., Summa, A., Suttner, K., and Vola, S. (2010):
    Rapid bootstrapping of Word Sense Disambiguation resources for German. Proceedings of the 10. Konferenz zur Verarbeitung Natürlicher Sprache (KONVENS), Saarbrücken, Germany, 6-8 September 2010.

Overview

  • The DeWAC corpus, automatically annotated with GermaNet senses using UKB.
  • The gold standard of sentences annotated with GermaNet senses that we used in our KONVENS paper.

Download

DeWAC automatically annotated with GermaNet senses

This is a version of the DeWAC corpus annotated with GermaNet senses using UKB, a state-of-the-art system for knowledge-based WSD.

You can download the corpus here.

Sense annotated gold standard

This dataset contains sentences labeled by human annotators with GermaNet senses. It was built by first selecting the 40 target words (keys) of the English SensEval-2 test set and translating them into German. The dataset reflects the distribution of parts of speech (PoS) in GermaNet (it contains 18 nouns, 16 verbs and 6 adjectives) and covers ambiguity rates ranging from 2 to 25 senses across all PoS. For each target word, we extracted 20 sentences if the word has up to 4 senses, plus 5 additional sentences for each sense beyond the fourth. Finally, these sentences were manually annotated with the contextually appropriate GermaNet senses.
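
The sentence sampling rule described above can be sketched as a small function. This is a minimal sketch, assuming the reading that a word with up to 4 senses receives a flat 20 sentences and that every sense beyond the fourth adds 5 more; the function name sentences_for_target is hypothetical and is not part of the released resources.

```python
def sentences_for_target(num_senses: int) -> int:
    """Return how many sentences to extract for a target word.

    Hypothetical helper, assuming: 20 sentences for words with up to
    4 senses, plus 5 sentences for each sense beyond the fourth.
    """
    base, threshold, extra_per_sense = 20, 4, 5
    if num_senses < 1:
        raise ValueError("a target word must have at least one sense")
    # Only senses above the threshold contribute extra sentences.
    return base + extra_per_sense * max(0, num_senses - threshold)


if __name__ == "__main__":
    # The gold standard spans ambiguity rates from 2 to 25 senses.
    for senses in (2, 4, 5, 10, 25):
        print(senses, sentences_for_target(senses))
```

Under this reading, a target word at the top of the reported ambiguity range (25 senses) would receive 20 + 5 × 21 = 125 sentences.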

You can download the gold standard here.

Contact

Please feel free to send your technical questions, requests and bug reports to dewsd@cl.uni-heidelberg.de