The segmentation experiments reported in Read et al. (2012) used four sections of
the recreated WSJ text in the ‘cooked’ directory. (See the README in that
directory for details on how the data was produced.)

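To create the unsegmented.txt file used in the segmentation experiments, strip
the leading "[N] |" prefix from each line, collapse runs of spaces, join
consecutive lines with a single space, and turn each run of blank lines into a
paragraph break: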

cat cooked/wsj0{3,4,5,6}.txt|perl -pe 's/^\[\d+\] \|//;'|\
	perl -pe 's/ +/ /g;'|perl -pe 's/\n/ /;'|\
	perl -pe 's/  +/\n\n/g' > unsegmented.txt

To create the segmented.txt file used for evaluation, strip the same prefix and
drop the blank lines:

cat cooked/wsj0{3,4,5,6}.txt|perl -pe 's/^\[\d+\] \|//;'|\
	grep -v "^$" > segmented.txt