Conversion to text for fulltext search ======================================= text/plain text/csv application/csv cat '%s' application/pdf pdftotext -nopgbrk %s - | sed -e 's/ [a-zA-Z0-9.]\{1\} / /g' -e 's/[0-9.]//g' mutool draw -F txt -q -N -o - %s application/vnd.openxmlformats-officedocument.wordprocessingml.document docx2txt '%s' - application/msword catdoc %s application/vnd.openxmlformats-officedocument.spreadsheetml.sheet xlsx2csv -d tab %s application/vnd.ms-excel xls2csv -d tab %s text/html html2text %s Many office formats unoconv -d document -f txt --stdout '%s' Apache Tika is another option for creating plain text from various document types. Just use curl to send the document to your tika server and get the plain text in return. curl -s -T '%s' http://localhost:9998/tika --header 'Accept: text/plain' Conversion to pdf for pdf preview ================================== text/plain text/csv application/csv application/vnd.oasis.opendocument.text application/msword application/vnd.wordperfect text/rtf unoconv -d document -f pdf --stdout -v '%f' > '%o' image/png image/jpg image/jpeg convert -density 300 '%f' 'pdf:%o' application/vnd.ms-powerpoint application/vnd.openxmlformats-officedocument.presentationml.presentation application/vnd.oasis.opendocument.presentation unoconv -d presentation -f pdf --stdout -v '%f' > '%o' application/vnd.ms-excel application/vnd.openxmlformats-officedocument.spreadsheetml.sheet application/vnd.oasis.opendocument.spreadsheet unoconv -d spreadsheet -f pdf --stdout -v '%f' > '%o' message/rfc822 java -jar emailconverter-2.5.3-all.jar '%f' -o '%o' The emailconverter can be obtained from https://github.com/nickrussler/email-to-pdf-converter It requires wkhtmltopdf which is part of debian. Conversion to png for preview images ===================================== If you have problems running convert on PDF documents then read this page https://askubuntu.com/questions/1081895/trouble-with-batch-conversion-of-png-to-pdf-using-convert It basically instructs you to comment out the line in /etc/ImageMagick-6/policy.xml convert determines the format of the converted image from the extension of the output filename. SeedDMS usually sets a propper extension when running the command, but nevertheless it is good practice to explicitly set the output format by prefixing the output filename with 'png:'. This is of course always needed if the output goes to stdout. image/jpg image/jpeg image/png convert -resize %wx '%f' 'png:%o' text/plain convert -density 100 -resize %wx 'text:%f[0]' 'png:%o' application/pdf gs -dBATCH -dNOPAUSE -sDEVICE=png16m -dPDFFitPage -r72x72 -sOutputFile=- -dFirstPage=1 -dLastPage=1 -q '%f' | convert -resize %wx png:- '%o' convert -density 100 -resize %wx '%f[0]' 'png:%o' mutool draw -F png -w %w -q -N -o %o %f 1 text/plain a2ps -1 -a1 -R -B -o - '%f' | gs -dBATCH -dNOPAUSE -sDEVICE=png16m -dFirstPage=1 -dLastPage=1 -dPDFFitPage -r72x72 -sOutputFile=- -q - | convert -resize %wx png:- 'png:%o' application/msword application/vnd.oasis.opendocument.spreadsheet application/vnd.oasis.opendocument.text application/vnd.openxmlformats-officedocument.spreadsheetml.sheet application/vnd.ms-excel application/vnd.openxmlformats-officedocument.wordprocessingml.document text/rtf application/vnd.ms-powerpoint text/csv application/csv application/vnd.wordperfect unoconv -d document -e PageRange=1 -f pdf --stdout -v '%f' | gs -dBATCH -dNOPAUSE -sDEVICE=pngalpha -dPDFFitPage -r72x72 -sOutputFile=- -dFirstPage=1 -dLastPage=1 -q - | convert -resize %wx png:- 'png:%o'