The GENIA data used for the segmentation experiments reported in Read et al.
(2012) was created using the file GENIAcorpus3.02.pos.xml, available from the
GENIA Project.

We extracted the sentence elements from within the abstract elements (ignoring
titles, since they were generally just one-sentence paragraphs), and stripped
all the sentence and word tags. In the unsegmented.txt file given to the
segmenting tools, each abstract is presented as a single paragraph, with
paragraph breaks (blank lines) between abstracts.
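
The extraction described above can be sketched roughly as follows. This is an
illustration only, not the script that produced unsegmented.txt. It assumes the
elements are literally named "abstract" and "sentence", that joining the
sentences of an abstract with a single space is acceptable, and that runs of
whitespace can be collapsed; the function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def abstracts_to_paragraphs(xml_text):
    """Return one plain-text paragraph per abstract element.

    Assumption: elements are named "abstract" and "sentence". Titles are
    skipped simply because only abstract elements are visited.
    """
    root = ET.fromstring(xml_text)
    paragraphs = []
    for abstract in root.iter("abstract"):
        sentences = []
        for sentence in abstract.iter("sentence"):
            # itertext() yields the text inside nested (word) tags, which
            # drops the markup; split/join collapses runs of whitespace
            # (an assumption, not necessarily what the original script did).
            text = " ".join("".join(sentence.itertext()).split())
            if text:
                sentences.append(text)
        if sentences:
            # Assumption: sentences are joined with a single space.
            paragraphs.append(" ".join(sentences))
    return paragraphs


def to_unsegmented(paragraphs):
    """Join paragraphs with a blank line between abstracts."""
    return "\n\n".join(paragraphs) + "\n"
```

For the real corpus, the contents of GENIAcorpus3.02.pos.xml would be read into
a string and passed to abstracts_to_paragraphs, and the result of
to_unsegmented written to unsegmented.txt.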

Corpus annotations (c) GENIA Project, used under the CC BY license.