# This version is forked from the original PDFExtract by Øyvind Berg
# (see http://elacin.github.com/PDFExtract/). It is distributed under
# the same Apache License version 2.0.

# distribute patched pdfbox or just patches?
# The version of pdfbox distributed here is r1157684, with certain
# patches applied (see PDFExtract/parent/patch)

# NOTE: These installation steps have been tested on a standard
# xubuntu distribution, with Subversion and Maven also installed.

# How to build PDFExtract from source:

# install TEI P5 model
cd TEI-P5-Java-model
mvn install
cd ..

# install and patch PDFBox
svn co -r 1157684 http://svn.apache.org/repos/asf/pdfbox/trunk/ pdfbox
cd pdfbox
patch -p0 < ../PDFExtract/parent/patch/pdfbox_poms.patch
patch -p0 < ../PDFExtract/parent/patch/pdfbox-font-bounding-boxes.patch
patch -p0 < ../PDFExtract/parent/patch/pdfbox-drawer-visibility.patch
mvn -DskipTests=true install
cd ..

# package PDFExtract
cd PDFExtract/parent
mvn -DskipTests=true package

# the binary distribution will have been created in
# PDFExtract/pdfextract-cli/target/pdfextract-cli-${VERSION}-bin.tar.bz2
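
# Optional pre-flight check (a sketch, not part of the original steps):
# the build steps above call svn, patch and mvn, so it may be worth
# confirming that all three are on PATH before starting. The function
# name check_tools is hypothetical and used here only for illustration.

```shell
# Hypothetical helper (not part of PDFExtract): report every listed
# command-line tool that cannot be found on PATH.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    # 0 if every tool was found, 1 otherwise
    return "$missing"
}

# The build steps use Subversion, patch and Maven.
if check_tools svn patch mvn; then
    echo "all build tools found"
else
    echo "install the missing tools before building" >&2
fi
```

# If anything is reported missing, install it with the distribution's
# package manager first; the build steps themselves are unchanged.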