EPE 2017: Extrinsic Parser Evaluation Shared Task at DepLing & IWPT 2017

Conversion and Preprocessing Tools

Version 1.3; June 15, 2017


Overview
========

This archive contains a tool to convert a range of parser output formats to
the EPE 2017 interchange format. It also contains a baseline stack of
preprocessing tools for sentence splitting, PTB-style tokenization, PoS
tagging, and lemmatization.

These tools are distributed in pre-compiled binary form for a 64-bit Linux
(x86) environment. However, part of the preprocessing stack depends on legacy
32-bit binaries. The most basic system libraries must therefore also be
installed in their 32-bit versions; see ‘32-Bit Compatibility’ below for
instructions.

For general background on the task set-up, please see:

  http://epe.nlpl.eu


Format Conversion
=================

The following parser output formats can be converted to EPE:

  + CoNLL-X (‘.conllx’ or ‘.conll’)
  + CoNLL-09 (‘.conll09’ or ‘.conll’)
  + CoNLL-U (‘.conllu’)
  + *SEM 2012 (‘.*sem’)
  + SDP 2014 (‘.sdp’)
  + SDP 2015 (‘.sdp’)
  + UPF (‘.dsynt’ and ‘.bpa’)

Sample invocation, which infers the input and output formats from the file
name suffixes:

  ./logon/bin/epe --convert \
    --raw negation/development/raw.txt \
    negation/development/udpipe.conllu /tmp/sample.epe


Text Preprocessing
==================

Sample invocation, processing a ‘raw’ text file:

  ./logon/bin/epe --prepare negation/development/raw.txt /tmp/sample.tt


Annotation Projection
=====================

For use with the Sherlock negation system, EPE graphs need to be augmented
with gold-standard negation annotations (from the 2012 *SEM Shared Task), for
example:

  ./logon/bin/epe --project \
    --gold negation/development/gold.\*sem+ \
    trial/prague/170410/00/negation/training/raw.epe \
    trial/prague/170410/00/negation/training/raw.epe+


32-Bit Compatibility
====================

For Fedora distributions:

  yum -y install glibc.i686 libstdc++.i686


Communication
=============

While you are looking at this archive, please self-subscribe to the mailing
list for the shared task:

  http://lists.nlpl.eu/mailman/listinfo/epe-users


Known Errors
============

None, for the time being.


Release History
===============

[Version 1.3; June 15, 2017]
+ Corrected reading and alignment of ASCII-encoded em-dashes (‘---’ for ‘—’).

[Version 1.2; June 4, 2017]
+ Support for conversion of UPF-specific formats (Deep Syntactic Structures);
  corrections (code simplification) in character offsets from preprocessing;
  increased robustness to ‘unanchored’ graph nodes in annotation projection.

[Version 1.1; May 16, 2017]
+ Inclusion of ‘projection’ for negation annotations; more conversion fixes.

[Version 1.0; April 18, 2017]
+ Alignment improvements in conversion from inputs without character ranges.

[Version 0.9; April 9, 2017]
+ Trial release of the converter and preprocessing tools (from the LOGON tree).


Contact
=======

For questions or comments, please do not hesitate to email the task
organizers at ‘epe-organizers@nlpl.eu’.

  Jari Björne
  Filip Ginter
  Richard Johansson
  Emanuele Lapponi
  Joakim Nivre
  Stephan Oepen (chair)
  Anders Søgaard
  Erik Velldal
  Lilja Øvrelid