Wednesday, May 04, 2011

Alfresco as an Image Archive Server (TIFF/fax/scan images)

I'm currently evaluating Alfresco CE 3.4.d for use as an Image Archive/Record Content Management Server. The requirement is to store multi-page TIFF images, each with 2-6 custom attributes that must be searchable in order to retrieve the associated images.

The most common use case that doesn't involve company-specific attributes is storing incoming fax images, where you want to store attributes such as the number dialed (in an enterprise with DID or a similar fax setup), the date the fax came in, and the number it came from (if available). Instead of the number dialed, you could store 'Department'.

Anyway, this post isn't about the custom attributes piece; it's about the image piece.

Req 1: store and view multi-page TIFF images (preferably without requiring a TIFF plugin that will likely change on Office upgrades).
Alfresco by default does not handle multi-page TIFF. In fact, the ImageMagick supplied with 3.4.d doesn't even support TIFF (run 'convert -list configure' from /alfresco/common/bin and look at the DELEGATES line: TIFF should be listed, and it isn't). 3.4.e DOES support TIFF, but only on Windows and 64-bit Linux, and only the *first* page of the TIFF.
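
The DELEGATES check described above can be scripted. This is only a minimal sketch: the function name has_tiff_delegate is made up for illustration, and it assumes the output of 'convert -list configure' contains a line starting with DELEGATES, as the ImageMagick 6 builds discussed here do.

```shell
# has_tiff_delegate: reads `convert -list configure` output on stdin and
# succeeds only if the DELEGATES line lists TIFF support.
# (Hypothetical helper name, chosen for this example.)
has_tiff_delegate() {
  grep -i '^[[:space:]]*DELEGATES' | grep -qi 'tif'
}
```

For example, `/alfresco/common/bin/convert -list configure | has_tiff_delegate && echo "TIFF supported"` (adjust the path to your install). It's worth running this against the exact binary Alfresco calls, since that may not be the same 'convert' your shell finds first.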

Luckily, a wonderful member of the Alfresco open source community already had a solution: http://fabiostrozzi.eu/2010/10/27/improving-tiff-preview-in-alfresco-share/

With additional modifications to remove ImageMagick, OpenOffice, and other ancillary services that aren't needed for something that is solely a TIFF-based image server, the result is a rather slim setup that works well with the default Share interface. I have 3.4.d working with this solution and will be doing a more enterprise-oriented Tomcat deploy as opposed to the installer approach. I feel quite confident in how the Alfresco team architected the product to support each company's unique needs.

Current problem: the Flash previewer is good, but the bottleneck with multi-page TIFF isn't the tiff2pdf conversion, which isn't that bad. It's the pdf2swf step, which takes 1/4 to 1/2 a second per page (so a 99-page fax would need roughly 25 to 50 seconds).




Research notes on TIFF-to-PDF conversion, for those interested:



ImageMagick 6.5.4 seems to work for TIFF-to-PDF, but has huge, escalating memory requirements as the TIFFs grow:

Memory requirements of 600MB-3GB of system RAM (non-JVM heap) per image conversion (but fast, 1-4 seconds).
The 3GB figure comes from a 7MB test file that seems to have some bad TIFF encoding. However,
3GB is only the point where it spilled into swap space, so the real requirement may be higher.

Instead, I tried a newer version:

wget ftp://ftp.imagemagick.org/pub/ImageMagick/linux/SRPMS/ImageMagick-6.6.9-7.src.rpm


sudo yum groupinstall "Development Tools"
sudo yum install rpmdevtools libtool-ltdl-devel

sudo yum install djvulibre-devel tcl-devel freetype-devel ghostscript-devel libwmf-devel jasper-devel lcms-devel bzip2-devel librsvg2 librsvg2-devel liblqr-1 liblqr-1-devel autotrace-devel

rpmbuild --nodeps --rebuild   ImageMagick-6.6.9-7.src.rpm

cd /home/dhartford/build/RPMS/i686
sudo rpm -ihv --force --nodeps ImageMagick-6.6.9-7.i686.rpm

In the end, same memory requirements (600MB-3GB).


Alternatives reviewed:
An intermediate format has been suggested, such as TIFF to GIF, then GIF to PDF:
${img.exe} ${source} gif:- | convert gif:- ${target}
Slightly better, but the 3GB RAM edge case still occurs. It also increases disk space usage because of the additional format.

Switches to work around potential problem areas do not seem to matter:
${img.exe} -monochrome -compress Fax ${source} ${target}
No difference.

TIFF to PNG, may get more performance from GraphicsMagick:
http://superuser.com/questions/233441/use-imagemagick-to-convert-tiff-to-pngs-how-to-improve-the-speed
--not tested

libtiff has a direct **tiff2pdf** that simply 'wraps' the image with PDF headers without
doing the dpi/sizing/re-rendering of the ImageMagick/GraphicsMagick approach. Under
the covers, both of those use libtiff to decode the TIFF anyway, then send the resulting
image through image processing for dpi/resolution modifications, and then send it
to Ghostscript to generate the resulting PDF.

BEST OPTION from testing: the tiff2pdf modification. Results so far:
Memory requirements of 10MB-80MB of system RAM (non-JVM heap) per image conversion, and fast (~1 second).
--Some issues with bad TIFF encoding: tiff2pdf writes warnings to stdout/stderr and returns an exit status that prevents the Alfresco transformer from completing.
I'm asking the mailing list whether there is a quiet/silent mode that makes a best-attempt conversion without
causing the exit status.
There is no 3GB RAM issue (instead, 80MB over ~10 seconds for the 7MB, 99-page TIFF).
*NOTE: The 7MB example came back as 99 pages in the SWF previewer. Separate system TIFF and PDF viewers also show 99 pages, so the result is consistent.
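
Until a quiet mode turns up, one possible workaround for the exit status problem is a small wrapper that judges success by whether a PDF was actually written. This is only a sketch under assumptions, not something I have confirmed with the libtiff folks: the function name tiff2pdf_lenient and the TIFF2PDF_BIN override variable are hypothetical, and the only tiff2pdf option used is -o for the output file.

```shell
# tiff2pdf_lenient SOURCE TARGET
# Hypothetical wrapper: runs tiff2pdf, hides its warnings, and reports
# success whenever a non-empty PDF was produced, whatever the exit status.
# TIFF2PDF_BIN is an assumed override variable, not part of libtiff.
tiff2pdf_lenient() {
  src=$1
  dst=$2
  rm -f "$dst"   # avoid mistaking a stale file for a fresh result
  "${TIFF2PDF_BIN:-tiff2pdf}" -o "$dst" "$src" >/dev/null 2>&1 || true
  [ -s "$dst" ]  # succeed only if the PDF exists and is not empty
}
```

Usage would be something like `tiff2pdf_lenient scan.tif scan.pdf`, called from whatever script the transformer runs in place of tiff2pdf. The trade-off is that a truncated PDF from a badly damaged TIFF also counts as success, so comparing page counts afterwards (as in the 99-page check above) is still worthwhile.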




Research notes on the PDF viewer(s) when used with TIFF-to-PDF conversion:
http://wiki.alfresco.com/wiki/Installing_Alfresco_components#Linux_and_Unix_Installation

swftools version 0.8.1 does not paginate tiff2pdf conversions, causing a repeating cycle in the Flash previewer.



NOTE: alternate viewer: http://swfviewer.blogspot.com/



REVIEWED: http://packages.sw.be/swftools/, only has rpms up to 0.8.1, and there have been several releases since then.

TODO: 64-bit centos binary: http://wiki.alfresco.com/w/images/1/1d/Swftools-centos54-x86_64.tar.gz


mkdir /opt/swftools
cd /opt/swftools
wget http://www.swftools.org/swftools-0.9.1.tar.gz

tar -xzvf swftools-0.9.1.tar.gz

yum install zlib-devel libjpeg-devel giflib-devel freetype-devel gcc gcc-c++ make

cd swftools-0.9.1
./configure --disable-lame  --prefix=/opt/swftools/swftools-0.9.1-bin/
make
make install


Diskspace footprint for /opt/swftools including source code, configure, make, and binary:
46MB


10 comments:

babagannoush said...

Hi Darren -

I am unable to get tiff (preview or thumbnails) to work with Alfresco 3.4d. (using the default tiff to png conversion)

The log file shows 'Content conversion failed' & 'No decode delegate for this image format'

I am able to run the same command from the prompt and the tiff to png conversion works.

I am thinking that the tiff > pdf> swf solution provided by Fabio should solve both thumbnail and web-preview issues.

I have modified things as per Fabio's post, but Share still seems to be performing its default tiff > png conversion.

Is there some other tweak I need to get share to do tiff>pdf>swf?

Thanks.

dhartford said...

Hi Babagannoush,
It sounds like the environment variables (Linux: export LD_LIBRARY_*) are not set up or not available to Alfresco itself.

One other thing to try is, from the commandline, to try:

"convert -list configure"

There should be a 'DELEGATES' line. If TIFF isn't there (and, if I recall the 3.4.d install I tested correctly, the default setup did *not* include TIFF), then the problem is that the ImageMagick used by Alfresco (which might be different from your command-line version) was built without TIFF support.

babagannoush said...

Thanks Darren! You are absolutely correct, as mentioned in your post as well, TIFF is not in the DELEGATES line of the Alfresco supplied Imagemagick.

I was trying "convert -list configure" as root, which has TIFF in DELEGATES.

I fixed that by pointing 'img.exe' to '/usr/bin/convert'
my thumbnails work now, but the preview still fails
the log file shows that the pdf file size is 0 (after TIFF >PDF conversion)

It tries to do PDF>SWF with the PDF file but that obviously doesn't work.

Any idea why my PDF file size is 0? Could it be due to a syntax error in the command (due to incorrect paths)?

Thanks.

dhartford said...

Glad I could help Babagannoush!

I haven't run into the 0-size PDF (other than one time with a butchered TIFF), but you might be able to fix or sidestep whatever that problem is with the more efficient approach I found (if on Linux): install 'libtiff' if it isn't installed already and modify the setup as listed here:

http://fabiostrozzi.eu/2010/10/27/improving-tiff-preview-in-alfresco-share/comment-page-1/#comment-2027

babagannoush said...

Thank you so much Darren! Everything works perfectly now!

I didn't understand what you meant regarding relocating the files to different directory. I kept them as per Fabio's post and it worked.

Thank you very much!

partymix said...

I am getting the below error inside of where the previewer goes:

08:26:53,615 http-80-10 ERROR [extensions.webscripts.AbstractRuntime] Exception from executeScript - redirecting to status template error: 03270010 Failed to process template org/alfresco/components/preview/web-preview.get.html.ftl
org.springframework.extensions.webscripts.WebScriptException: 03270010 Failed to process template org/alfresco/components/preview/web-preview.get.html.ftl


Do you have any thoughts on this? I get this when I change web-preview-min.js to add the conditional statement he suggests: if (this.options.mimeType.match(/^image\/tiff$/)) u = o; else

And then also adding the two .xml files and the .properties file in the location he suggests. (I also tried all three in the location that you suggested, with the names you suggested.)

I seem to be having issues :( Any thoughts? I am using Alfresco 3.4d and ImageMagick 6.6.3-10

partymix said...

Never mind, I just replaced the whole viewer with one that doesn't convert the image and can read multi-page TIFF.
Thanks

Jay said...

Hi Darren

How did you manage to index the files manually? I'm looking at a situation where we have faxes and scans coming in, which then need to be optionally cleaned (by a group of operators) and indexed with the type of document, who will handle it, ...

Do you use Share or something else like Liferay or Drupal?

Thanks, J.

dhartford said...

Hi Jay,
For the fax scenario, it was custom code that read straight from our fax software (RightFax) and inserted via CMIS. The only indexes were the ones that came from RightFax itself.

Other scenarios involve a regular data capture suite and then exporting to Alfresco (again, custom code through CMIS). I've done this with EMC/Captiva FormWare, and it can likely be done with Ephesoft or other capture solutions as well...but capture solutions are their own beast to deal with :-).



It isn't very portable unfortunately.