Conversion to text for fulltext search
=======================================

text/plain
text/csv
application/csv
  cat '%s'

application/pdf
  pdftotext -nopgbrk %s - | sed -e 's/ [a-zA-Z0-9.]\{1\} / /g' -e 's/[0-9.]//g'

  If pdftotext takes too long on large documents, you may want to pass the
  parameter -l to specify the last page to be converted.

  mutool draw -F txt -q -N -o - %s
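As a sketch of the page-limit note in the application/pdf entry above: pdftotext accepts -l followed by the number of the last page to convert. The limit of 50 and the names PDF_TO_TEXT_CMD and filter_text are illustrative assumptions, not SeedDMS settings. The sed stage can also be tried on its own to see what it removes: single characters standing between two spaces, then all digits and dots.

```shell
# Assumed variant of the pdftotext entry: stop after page 50 (arbitrary example value).
# The string is what would be entered as the converter command; %s is left for SeedDMS.
PDF_TO_TEXT_CMD="pdftotext -nopgbrk -l 50 %s - | sed -e 's/ [a-zA-Z0-9.]\{1\} / /g' -e 's/[0-9.]//g'"

# The sed stage of that pipeline on its own (hypothetical helper name).
filter_text() {
  sed -e 's/ [a-zA-Z0-9.]\{1\} / /g' -e 's/[0-9.]//g'
}

# Try the filter on a sample line without needing a PDF.
printf 'see fig 7 on page 12.\n' | filter_text
```

The sample line comes back without the lone "7", the page number and the trailing dot, which shows that the filter also discards numbers you might want to search for.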

application/vnd.openxmlformats-officedocument.wordprocessingml.document
  docx2txt '%s' -

application/msword
  catdoc %s

application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
  xlsx2csv -d tab %s

application/vnd.ms-excel
  xls2csv -d tab %s

text/html
  html2text %s

Many office formats
  unoconv -d document -f txt --stdout '%s'

Apache Tika is another option for creating plain text from various document
types. Just use curl to send the document to your Tika server and get the
plain text in return.

  curl -s -T '%s' http://localhost:9998/tika --header 'Accept: text/plain'

Conversion to pdf for pdf preview
==================================

text/plain
text/csv
application/csv
application/vnd.oasis.opendocument.text
application/msword
application/vnd.wordperfect
text/rtf
  unoconv -d document -f pdf --stdout -v '%f' > '%o'

image/png
image/jpg
image/jpeg
  convert -density 300 '%f' 'pdf:%o'

application/vnd.ms-powerpoint
application/vnd.openxmlformats-officedocument.presentationml.presentation
application/vnd.oasis.opendocument.presentation
  unoconv -d presentation -f pdf --stdout -v '%f' > '%o'

application/vnd.ms-excel
application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
application/vnd.oasis.opendocument.spreadsheet
  unoconv -d spreadsheet -f pdf --stdout -v '%f' > '%o'

message/rfc822
  java -jar emailconverter-2.5.3-all.jar '%f' -o '%o'

  The emailconverter can be obtained from https://github.com/nickrussler/email-to-pdf-converter
  It requires wkhtmltopdf, which is available as a Debian package.

Conversion to png for preview images
=====================================

If you have problems running convert on PDF documents, read this page:
https://askubuntu.com/questions/1081895/trouble-with-batch-conversion-of-png-to-pdf-using-convert
It basically instructs you to comment out the line

<policy domain="coder" rights="none" pattern="PDF" />

in /etc/ImageMagick-6/policy.xml
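
A minimal sketch of that change as a shell function, assuming the policy line appears exactly as quoted above. The function name comment_out_pdf_policy is hypothetical. It takes the file path as an argument so that it can be tried on a copy before touching the real policy file.

```shell
# Hypothetical helper: comment out the PDF coder policy line in a policy file.
comment_out_pdf_policy() {
  policy_file="$1"
  tmp_file="$(mktemp)" || return 1
  # Wrap the exact line quoted above in an XML comment, keeping its indentation.
  sed 's|^\([[:space:]]*\)\(<policy domain="coder" rights="none" pattern="PDF" />\)|\1<!-- \2 -->|' \
    "$policy_file" > "$tmp_file" &&
    cat "$tmp_file" > "$policy_file"   # overwrite in place, keeping owner and mode
  status=$?
  rm -f "$tmp_file"
  return "$status"
}
```

One way to use it is to copy /etc/ImageMagick-6/policy.xml to a scratch location, run the function on the copy, review the result, and only then install it as root. Running it a second time leaves the file unchanged, because the line no longer starts with the policy tag.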

convert determines the format of the converted image from the extension of
the output filename. SeedDMS usually sets a proper extension when running
the command, but it is nevertheless good practice to explicitly set the output
format by prefixing the output filename with 'png:'. This is always needed
if the output goes to stdout.

image/jpg
image/jpeg
image/png
  convert -resize %wx '%f' 'png:%o'
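
To see what such a command looks like once the placeholders are filled in, here is a small sketch. It assumes that %f stands for the input file, %o for the output file and %w for the preview width, which is an interpretation of how they are used in this file and should be checked against the SeedDMS settings help. The helper expand_template and the sample paths are hypothetical and not part of SeedDMS.

```shell
# Hypothetical helper, not part of SeedDMS: fill %f, %o and %w in a template.
# Assumes the values contain no '|', '&' or backslash, which sed would interpret.
expand_template() {
  template="$1"; infile="$2"; outfile="$3"; width="$4"
  printf '%s\n' "$template" |
    sed -e "s|%f|$infile|g" -e "s|%o|$outfile|g" -e "s|%w|$width|g"
}

# Prints the command line that would result for a 200 pixel wide preview.
expand_template "convert -resize %wx '%f' 'png:%o'" /tmp/in.jpg /tmp/out.png 200
```

Printing the expanded command and running it by hand is a quick way to debug a converter entry before saving it in the settings.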

text/plain
  convert -density 100 -resize %wx 'text:%f[0]' 'png:%o'

application/pdf
  gs -dBATCH -dNOPAUSE -sDEVICE=png16m -dPDFFitPage -r72x72 -sOutputFile=- -dFirstPage=1 -dLastPage=1 -q '%f' | convert -resize %wx png:- 'png:%o'

  convert -density 100 -resize %wx '%f[0]' 'png:%o'

  mutool draw -F png -w %w -q -N -o %o %f 1

application/postscript
  convert -density 100 -resize %wx '%f[0]' 'png:%o'

text/plain
  a2ps -1 -a1 -R -B -o - '%f' | gs -dBATCH -dNOPAUSE -sDEVICE=png16m -dFirstPage=1 -dLastPage=1 -dPDFFitPage -r72x72 -sOutputFile=- -q - | convert -resize %wx png:- 'png:%o'

  On Linux systems you will have to set the desired value in /etc/papersize for a2ps,
  e.g. a4 or letter.
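
A minimal sketch of setting that value, assuming /etc/papersize holds a single line with the paper name. The function name set_papersize is hypothetical. The optional second argument exists only so the function can be tried on a scratch file, since writing to /etc/papersize needs root.

```shell
# Hypothetical helper: write a paper size name to a papersize file.
# The second argument defaults to /etc/papersize, which needs root to write.
set_papersize() {
  size="$1"
  target="${2:-/etc/papersize}"
  printf '%s\n' "$size" > "$target"
}
```

Called as root with a single argument, for example set_papersize a4, it replaces the content of /etc/papersize with that one line.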

application/msword
application/vnd.oasis.opendocument.spreadsheet
application/vnd.oasis.opendocument.text
application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
application/vnd.ms-excel
application/vnd.openxmlformats-officedocument.wordprocessingml.document
text/rtf
application/vnd.ms-powerpoint
text/csv
application/csv
application/vnd.wordperfect
  unoconv -d document -e PageRange=1 -f pdf --stdout -v '%f' | gs -dBATCH -dNOPAUSE -sDEVICE=pngalpha -dPDFFitPage -r72x72 -sOutputFile=- -dFirstPage=1 -dLastPage=1 -q - | convert -resize %wx png:- 'png:%o'

video/webm
video/mp4
  This takes frame 12 of a video (the index in '%f[12]' counts from 0) and
  converts it into a png. It requires ffmpeg to be installed.

  convert -resize %wx '%f[12]' 'png:%o'