This archive contains 698 instances of implicit arguments and discourse antecedents that were automatically extracted from comparable texts in the Gigaword corpus (Roth and Frank, 2012). Each instance is provided as a stand-off annotation in CSV format and takes the following form:

  [document_id],[predicate_sentence_id],[predicate_token_id],[predicate_word],[argument_label],[argument_sentence_id],[argument_token_id(s)],[argument_headword]
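
The record layout above can be read with a few lines of Python. The following is only a minimal sketch, not part of the original distribution: the names ImplicitArgument and read_annotations are hypothetical, the sample line is invented for illustration and is not taken from the archive, and the code assumes that every record has exactly eight comma-separated fields. Because the separator used for multiple argument token ids is not specified here, that field is kept as raw text.

```python
import csv
from typing import Iterable, Iterator, NamedTuple


class ImplicitArgument(NamedTuple):
    """One stand-off annotation; field names follow the layout above."""
    document_id: str
    predicate_sentence_id: str
    predicate_token_id: str
    predicate_word: str
    argument_label: str
    argument_sentence_id: str
    argument_token_ids: str  # raw text: separator for multiple ids is unspecified
    argument_headword: str


def read_annotations(lines: Iterable[str]) -> Iterator[ImplicitArgument]:
    """Yield one ImplicitArgument per non-empty CSV line.

    Assumes exactly eight fields per record; anything else raises ValueError
    so that unexpected layouts are noticed instead of silently misread.
    """
    for line_number, row in enumerate(csv.reader(lines), start=1):
        if not row:
            continue  # skip blank lines
        if len(row) != 8:
            raise ValueError(
                f"line {line_number}: expected 8 fields, found {len(row)}"
            )
        yield ImplicitArgument(*(field.strip() for field in row))


# Invented sample record for illustration only (not from the archive).
sample = "DOC_0001,3,7,sold,A1,2,4,car\n"
records = list(read_annotations(sample.splitlines()))
print(records[0].predicate_word, records[0].argument_label)
```

To read a real file, pass an open file handle instead of the sample lines, for example read_annotations(open(path, newline="")), where path is the location of the CSV file on your system.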

If you want to use the data for your own work, please cite Roth and Frank (2013). If you have questions, please contact the first author at mroth@cl.uni-heidelberg.de.


--
Michael Roth and Anette Frank (2012). Aligning predicate argument structures in monolingual comparable texts: A new corpus for a new task. Proceedings of the First Joint Conference on Lexical and Computational Semantics (*SEM), Montreal, Canada.

Michael Roth and Anette Frank (2013). Automatically identifying implicit arguments to improve argument linking and coherence modeling. Proceedings of the Second Joint Conference on Lexical and Computational Semantics (*SEM), Atlanta, Georgia, USA.

