Wednesday, May 04, 2011

Alfresco as an Image Archive Server (TIFF/fax/scan images)

I am currently evaluating Alfresco CE 3.4.d for use as an Image Archive/Record Content Management Server. The requirement is to store multi-page TIFF images, each with 2-6 custom attributes that must be searchable so the associated images can be retrieved.

The most common use case that doesn't involve company-specific attributes is storing incoming fax images, where you want to store attributes such as the number dialed (for an enterprise with DID or a similar fax setup), the date the fax came in, and the number it came from (if available). Instead of the number dialed, you could store 'Department'.

Anyway, this post isn't about the custom attributes piece; it is about the image piece.

Req 1: store and view multipage TIFF images (preferably without requiring a TIFF plugin that will likely change on Office upgrades).
Alfresco by default does not handle multipage TIFF. In fact, in 3.4.d the supplied ImageMagick doesn't even support TIFF (run 'convert -list configure' from /alfresco/common/bin and look at the DELEGATES line; TIF should be listed and it isn't). 3.4.e DOES support TIF, but only on Windows and 64-bit Linux, and only the *first* page of the TIF.
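
The DELEGATES check described above can be scripted. This is a minimal sketch, assuming 'convert' is the ImageMagick binary shipped with the installer; the function name has_tiff_delegate is made up for illustration.

```shell
# Hypothetical helper: reads `convert -list configure` output on stdin and
# succeeds only if the DELEGATES line mentions tif/tiff.
has_tiff_delegate() {
    grep -i 'DELEGATES' | grep -qi 'tif'
}

# Usage (assumes the installer layout mentioned above):
#   /alfresco/common/bin/convert -list configure | has_tiff_delegate \
#       && echo "TIFF supported" || echo "TIFF NOT supported"
```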

Luckily, a wonderful member of the open source Alfresco community already had a solution: http://fabiostrozzi.eu/2010/10/27/improving-tiff-preview-in-alfresco-share/

With additional modifications to remove ImageMagick, OpenOffice, and other ancillary services that are not needed for something that is solely a TIFF-based image server, the result is a rather slim setup that works well with the default 'SHARE' interface. I do have 3.4.d working with this solution, and will be doing a more enterprise-oriented Tomcat deploy as opposed to the installer approach. I feel quite confident in how the Alfresco team architected the product to support each company's unique needs.

Current problem: the Flash previewer is good, but with multi-page TIFFs the slow step is not the tiff2pdf conversion; it is pdf2swf, which takes 1/4 to 1/2 a second per page.
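
To put that per-page cost in perspective, here is a back-of-the-envelope estimate for the 99-page test TIFF that appears in the research notes. It is only a sketch of the arithmetic; the function name estimate_pdf2swf_seconds is made up for illustration.

```shell
# Hypothetical helper: estimate pdf2swf time for a given page count,
# using the observed 0.25 to 0.5 seconds per page.
estimate_pdf2swf_seconds() {
    LC_ALL=C awk -v pages="$1" \
        'BEGIN { printf "%.2f %.2f\n", pages * 0.25, pages * 0.5 }'
}

estimate_pdf2swf_seconds 99   # prints the low and high estimate in seconds
```

For 99 pages that works out to roughly 25 to 50 seconds spent in pdf2swf alone.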




Research notes on TIFF to PDF conversion, for those interested:



ImageMagick 6.5.4 seems to work for TIFF to PDF conversion, but its memory requirements are huge and escalate as the TIFFs grow:

Memory requirements of 600MB-3GB of system RAM (non-JVM heap) per image conversion (but fast, 1-4 seconds).
The 3GB figure comes from a 7MB test file that seems to have some bad TIF encoding. However,
3GB is only the point where it spilled into swap space; the real requirement may be more.

Instead, try a newer version, built from the source RPM:

wget ftp://ftp.imagemagick.org/pub/ImageMagick/linux/SRPMS/ImageMagick-6.6.9-7.src.rpm


sudo yum groupinstall "Development Tools"
sudo yum install rpmdevtools libtool-ltdl-devel

sudo yum install djvulibre-devel tcl-devel freetype-devel ghostscript-devel libwmf-devel jasper-devel lcms-devel bzip2-devel librsvg2 librsvg2-devel liblqr-1 liblqr-1-devel libtool-ltdl-devel autotrace-devel

rpmbuild --nodeps --rebuild   ImageMagick-6.6.9-7.src.rpm

cd /home/dhartford/build/RPMS/i686
sudo rpm -ihv --force --nodeps ImageMagick-6.6.9-7.i686.rpm

In the end, same memory requirements (600MB-3GB).


Alternatives reviewed:
An intermediate format has been suggested, such as TIFF to GIF, then GIF to PDF:
${img.exe} ${source} gif:- | convert gif:- ${target}
Slightly better, but the 3GB RAM edge case still occurs. It also increases disk space usage because of the additional intermediate format.

Switches to work around potential problem areas do not seem to matter:
${img.exe} -monochrome -compress Fax ${source} ${target}
No difference.

TIFF to PNG, may get more performance from GraphicsMagick:
http://superuser.com/questions/233441/use-imagemagick-to-convert-tiff-to-pngs-how-to-improve-the-speed
--not tested

libtiff has a direct **tiff2pdf** that simply 'wraps' the image with PDF headers without
doing dpi/sizing/re-rendering like the ImageMagick/GraphicsMagick approach (which,
under the covers, uses libtiff to read the TIFF, sends the resulting image
through image processing for dpi/resolution modifications, and then
generates the resulting PDF). In other words, ImageMagick and
GraphicsMagick rely on libtiff for TIFF decoding anyway.

BEST OPTION from testing: the tiff2pdf modification. Testing shows roughly:
Memory requirements of 10MB-80MB of system RAM (non-JVM heap) per image conversion, and fast at ~1 second.
--Some issues with bad TIF encoding: tiff2pdf writes warnings to stdout/stderr and returns a non-zero exit status, which prevents the Alfresco transformer from completing.
I am asking the mailing list whether there is a quiet/silent mode that makes a best attempt at the conversion without
causing the exit status.
There is no 3GB RAM issue (instead, 80MB over ~10 seconds for the 7MB, 99-page TIFF).
*NOTE: The 7MB example came back as 99 pages in the SWF previewer. Separate system TIFF and PDF viewers also show 99 pages, so the result is consistent.
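
Until a quiet mode turns up, one possible workaround for the exit status problem is a small wrapper that the transformer calls in place of tiff2pdf. This is an untested sketch, not something verified in this evaluation: the function name tiff2pdf_lenient and the TIFF2PDF_BIN override are made up for illustration, and a PDF produced from a badly encoded TIFF may still be incomplete.

```shell
# Hypothetical wrapper: best-attempt TIFF to PDF conversion.
# TIFF2PDF_BIN is an assumed override for the converter (default: tiff2pdf).
tiff2pdf_lenient() {
    src=$1
    dst=$2
    # Remove any stale output so the size check below reflects this run only.
    rm -f -- "$dst"
    # Discard the warnings that bad TIF encoding sends to stdout/stderr.
    "${TIFF2PDF_BIN:-tiff2pdf}" -o "$dst" "$src" >/dev/null 2>&1
    status=$?
    # Report success whenever a non-empty PDF was written,
    # even if the converter returned a non-zero exit status.
    if [ -s "$dst" ]; then
        return 0
    fi
    # No usable output: always report a failure.
    [ "$status" -ne 0 ] || status=1
    return "$status"
}

# Usage:
#   tiff2pdf_lenient scan.tif scan.pdf
```

The trade-off is that genuine failures which still leave a partial PDF behind are also reported as success, so the page count comparison from the NOTE above remains a useful sanity check.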




Research notes on the PDF viewer(s) when used with the TIFF to PDF conversion:
http://wiki.alfresco.com/wiki/Installing_Alfresco_components#Linux_and_Unix_Installation

swftools version 0.8.1 does not paginate tiff2pdf conversions, causing a repeating cycle in the Flash previewer.



NOTE: alternate viewer: http://swfviewer.blogspot.com/



REVIEWED: http://packages.sw.be/swftools/ only has RPMs up to 0.8.1, and there have been several releases since then.

TODO: try the 64-bit CentOS binary: http://wiki.alfresco.com/w/images/1/1d/Swftools-centos54-x86_64.tar.gz


mkdir /opt/swftools
cd /opt/swftools
wget http://www.swftools.org/swftools-0.9.1.tar.gz

tar -xzvf swftools-0.9.1.tar.gz

yum install zlib-devel libjpeg-devel giflib-devel freetype-devel gcc gcc-c++ make

cd swftools-0.9.1
./configure --disable-lame  --prefix=/opt/swftools/swftools-0.9.1-bin/
make
make install


Disk space footprint for /opt/swftools, including source code, configure and make output, and binaries:
46MB