CoNLL Shared Task on Universal Dependency Parsing

The 2018 edition of the Extrinsic Parser Evaluation Initiative (EPE 2018) runs in collaboration with the 2018 Shared Task on Multilingual Parsing from Raw Text to Universal Dependencies at the 2018 Conference on Computational Natural Language Learning (CoNLL 2018).  With minimal additional overhead, participants in the UD Parsing Shared Task can have their parsers evaluated against the three EPE downstream systems: biological event extraction, fine-grained opinion analysis, and negation resolution.  Additional background is available through the proceedings volume from the EPE 2017 campaign.  Seeing as the focus of the UD Parsing Shared Task is on different evaluation perspectives in 2018, the connection to the EPE framework will provide new opportunities for correlating intrinsic metrics with downstream effects on three relevant applications.  However, to date the EPE infrastructure is regrettably only applicable to parsing systems for English.

EPE 2018 Results

Initial results were shared with task participants in late July 2018, but debugging of a couple of runs and improved hyper-parameter tuning for the negation resolution system continued well into August.  As of August 15, 2018, final and official end-to-end results for all submissions are available to the public, either in the form of an on-line spreadsheet or (mostly for archival purposes) for download.  Note that some dynamic computations, e.g. averages and ranks over selected sub-sets of submissions, only work in the on-line version.  An overview paper from EPE 2018 is forthcoming as part of the proceedings for the ‘core’ UD Parsing Shared Task, seeking to establish meaningful connections (or surprising disconnects) between intrinsic and extrinsic evaluation results.  At the same time, all submitted parser outputs are available for download.  To refer to either the EPE 2017 or 2018 campaign, please use the bibliographic references provided by the task co-organizers.

Participation and Interfaces

Participation in the EPE 2018 campaign will be open to all participants in the UD Parsing Shared Task.  It will require parsing an additional 1.1 million tokens of running English text (from various domains) into basic UD trees.  The EPE parser inputs are provided as a separate data package on the TIRA platform, using the same general interfaces as the core parsing task (e.g. via the metadata.json format; see below).  Parser outputs will be collected in CoNLL-U format and will later be automatically transferred from the TIRA platform to the cluster that runs the EPE infrastructure.  For each individual parser output, the three EPE downstream systems will be retrained and optimized, in order to avoid downstream bias towards one specific parsing system.

As the downstream systems interface to syntactic structure by way of character offsets (i.e. substring indices), CoNLL-U parser outputs will be automatically converted to the more general EPE interchange format for dependency graphs.  Therefore, multi-word tokens (indicated by range indices in CoNLL-U) and empty nodes (indicated by non-integer token identifiers) are disallowed in parser outputs submitted for EPE downstream evaluation.  When parsing into English basic UD trees, neither of these is expected.  A basic validation script is available on TIRA, which we ask all participants to invoke (through the web interface) upon completion of each parsing job.  Please see the instructions for the core UD parsing task for background.
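The validation script on TIRA is the authoritative check, but a rough local pre-check is easy to script.  The following minimal sketch (in Python; not the actual TIRA validator) merely flags token identifiers that contain a range or a decimal point:

#!/usr/bin/env python3
# Minimal sketch (not the official TIRA validator): flag CoNLL-U lines whose
# token identifiers indicate multi-word tokens (e.g. "3-4") or empty nodes
# (e.g. "5.1"), both of which are disallowed in EPE submissions.

import sys

def check_conllu(path):
    problems = []
    with open(path, encoding="utf-8") as handle:
        for number, line in enumerate(handle, start=1):
            line = line.rstrip("\n")
            if not line or line.startswith("#"):
                continue  # skip sentence breaks and comment lines
            identifier = line.split("\t", 1)[0]
            if "-" in identifier:
                problems.append((number, "multi-word token range: " + identifier))
            elif "." in identifier:
                problems.append((number, "empty node: " + identifier))
    return problems

if __name__ == "__main__":
    for number, message in check_conllu(sys.argv[1]):
        print("line {}: {}".format(number, message))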

So as not to interfere with the potentially busy last few days of the UD Parsing Shared Task, the 2018 EPE initiative closes almost one week after the official end of the (intrinsic) evaluation phase in the UD Parsing Shared Task.  Extrinsic, end-to-end results will be published in a format similar to the 2017 scores; these will be available in time for inclusion in the final, camera-ready versions of system descriptions.  The EPE 2018 organizers will contribute a short summary of downstream results (and correlations to intrinsic evaluation results) to the proceedings volume of the UD Parsing Shared Task.

Data Sets on TIRA

Similar to the multi-lingual UD test data, the EPE parser inputs (for English) are available in two formats: as ‘raw’, running text and in pre-segmented form, with sentence and token boundaries predicted by the CoNLL 2018 baseline parser.  The full EPE data set is called conll18-ud-epe-test on TIRA.  Its metadata.json file contains three entries, one for each of the three EPE downstream applications:

[
 {"lcode":"en", "tcode":"any",
  "rawfile":"events.txt", "psegmorfile":"events-udpipe.conllu", "outfile":"events.conllu"},
 {"lcode":"en", "tcode":"any",
  "rawfile":"negation.txt", "psegmorfile":"negation-udpipe.conllu", "outfile":"negation.conllu"},
 {"lcode":"en", "tcode":"any",
  "rawfile":"opinion.txt", "psegmorfile":"opinion-udpipe.conllu", "outfile":"opinion.conllu"}
]
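
For illustration only, a submission run might loop over these entries roughly as sketched below; the parser invocation ("my_parser" with --input and --output options) is a hypothetical placeholder and should be replaced by your own system's interface:

#!/usr/bin/env python3
# Illustrative sketch only: loop over the metadata.json entries and produce
# one CoNLL-U output file per downstream application.  "my_parser" is a
# hypothetical placeholder, not part of the EPE or TIRA infrastructure.

import json
import subprocess

with open("metadata.json", encoding="utf-8") as handle:
    entries = json.load(handle)

for entry in entries:
    # Either the raw text ("rawfile") or the pre-segmented baseline file
    # ("psegmorfile") can serve as input; here we parse the latter.
    subprocess.run(["my_parser",
                    "--input", entry["psegmorfile"],
                    "--output", entry["outfile"]],
                   check=True)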

No gold-standard syntactic annotations are available for these texts, hence no intrinsic evaluation scores will be computed for these runs.  The ‘wildcard’ treebank code indicates that participants can choose freely which English parsing model to use; however, only the training data (and other materials) provided for the core UD parsing task can be used.  As in the core UD parsing task, participants are welcome to configure multiple systems on TIRA (called ‘software’ in the web interface), but in order to correlate end-to-end extrinsic results with the official intrinsic measures, it is expected that these configurations (and their names) correspond to the final submissions for the core task.  However, a team might decide to pick a different configuration as their primary system for the EPE sub-task, seeing as the EPE texts are English-only and represent a broad variation of domains and genres.  Please email the EPE organizers (see below) before the publication of end-to-end EPE results if you want a different primary system in the EPE context than in the core UD parsing task.

To facilitate debugging of TIRA runs against the EPE texts (which might in principle exhibit greater variation in typographic and layout conventions than the English UD treebanks), a ‘trial’ version, comprising a little less than half the full EPE data, is available as a public data set on TIRA, called conll18-epe-trial.  When running against the trial version, participants will be able to view system and evaluator outputs; this sub-set of the EPE data is also available for public download.

Tentative Schedule

June 15, 2018: Availability of EPE Parser Inputs on TIRA
July 13, 2018: EPE Parser Outputs Due on TIRA
July 27, 2018: End-to-End Evaluation Results Available

EPE 2018 Co-Organizers

The organizers of the EPE 2018 initiative can be reached at the following address:

   epe-organizers@nlpl.eu