% Hey, emacs(1), this file is -*- coding: utf-8; -*- got it?

the files in this directory provide the input to the final black-box
evaluation of the LOGON demonstrator.  from the LOGON target corpus JHPSTG,
there are the following sub-sets that are used for quality assessments:

  jhk, psk, tgk --- known-vocabulary held-out data (word lists were available)
  jhu, psu, tgu --- unknown-vocabulary held-out data (completely unseen)
  jhe, pse, tge --- proportional sub-sets from the development corpus

for each segment, there are the following files:

  .fan    --- fan-out log from the final Fjell integration (15-jan-07);
  .all    --- post-processed fan-out output (from `summarize --output all');
  .oracle --- hand-annotated `best' outputs (thanks to victoria rosén);
  .smt    --- alternate translations from (baseline) SMT system.

finally, there is a script to combine all LOGON outputs into a single file:

  ./align jhk.all jhk.oracle

produces output like the following:

  |< |Pakk sekken, reis til fjells og begynn å gå.|
  |@ |Fill your backpack, head for the mountains, and start walking.|
  |@ |Pack your rucksack, travel to the mountains, and start walking.|
  |@ |Pack your rucksack, travel to the mountains and start hiking.|
  |= |Wrap the back pack. travel in the mountains, and begin to go.|
  |> |Wrap the back pack. leave in the mountains, and begin to go.|
  |? |Wrap the back pack. leave to the mountains, and start to go.|

the |< (source sentence) and |@ (reference translations) lines have the
exact same interpretation as in the original fan-out log file.  the |=
output is the one from the oracle annotations; the |> output is the one
ranked highest by the LOGON demonstrator, i.e. the one that was output at
the top; finally, the |? line shows the output with the highest BLEU score
(in case there are ties, the first such output is used, according to the
LOGON ranking).

finally, files ending in `.logon' contain the outputs from:

  for i in jhe jhk jhu pse psk psu tge tgk tgu; do
    ./align ${i}.all ${i}.oracle > ${i}.logon;
  done

all files are encoded in UTF-8, with Un*x-style newlines.
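
for readers who want to post-process the `.logon' files, the following is a minimal sketch (not part of the LOGON distribution) of how the marker lines described above could be grouped per source sentence.  it assumes every relevant line has the shape |<marker> |<text>|, as in the sample output, and that each item starts with a |< line; the names `parse_items' and `read_logon' and the dictionary keys are hypothetical, chosen only for illustration.

```python
import re

# Assumed line shape, inferred from the sample output in this README:
#   |<marker> |<text>|   where marker is one of  <  @  =  >  ?
LINE = re.compile(r"^\|(?P<marker>[<@=>?])\s*\|(?P<text>.*)\|\s*$")

# Hypothetical key names for the three single-valued marker types.
SINGLE = {"=": "oracle", ">": "top_ranked", "?": "best_bleu"}


def parse_items(lines):
    """Group marker lines into one dictionary per source sentence."""
    items = []
    current = None
    for raw in lines:
        match = LINE.match(raw.strip())
        if match is None:
            continue  # skip blank or unrecognised lines
        marker = match.group("marker")
        text = match.group("text")
        if marker == "<":
            # A source sentence opens a new item.
            current = {"source": text, "references": [],
                       "oracle": None, "top_ranked": None, "best_bleu": None}
            items.append(current)
        elif current is None:
            continue  # output line before any source sentence: ignore
        elif marker == "@":
            current["references"].append(text)
        else:
            current[SINGLE[marker]] = text
    return items


def read_logon(path):
    """Read one `.logon' file (UTF-8, as stated in this README)."""
    with open(path, encoding="utf-8") as stream:
        return parse_items(stream)
```

calling, say, read_logon("jhk.logon") would then return a list of dictionaries, from which one could count how often the top-ranked output coincides with the oracle choice.  if the actual files deviate from the assumed line shape, the regular expression would need adjusting.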