Conversion to text for fulltext search
=======================================

text/plain
text/csv
application/csv
	cat '%s'

application/pdf
	pdftotext -nopgbrk %s - | sed -e 's/ [a-zA-Z0-9.]\{1\} / /g' -e 's/[0-9.]//g'

	If pdftotext takes too long on large documents, you may want to pass the
	parameter -l, which sets the last page to be converted (e.g. -l 20).

	mutool draw -F txt -q -N -o - %s

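The sed stage of the pdftotext command above post-processes the extracted text: the first expression replaces a single letter, digit or dot enclosed in spaces with one space, and the second deletes every digit and every dot. To see its effect on sample text, the same two expressions can be wrapped in a small shell function. The function name clean_pdf_text is made up for this sketch and is not part of SeedDMS:

```shell
#!/bin/sh
# Hypothetical wrapper around the sed filter used in the pdftotext converter.
clean_pdf_text() {
    # 1st expression: " X " (one letter, digit or dot between spaces) -> " "
    # 2nd expression: delete every digit and every dot
    sed -e 's/ [a-zA-Z0-9.]\{1\} / /g' -e 's/[0-9.]//g'
}

printf 'Invoice 2023 a total 12.50 EUR\n' | clean_pdf_text
# prints: Invoice  total  EUR
```

Because each match also consumes its trailing space, a run of single characters is only partly removed: "x a b c y" becomes "x b y". Numbers vanish completely, so they cannot be found by the fulltext search when this converter is used.
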
application/vnd.openxmlformats-officedocument.wordprocessingml.document
	docx2txt '%s' -

application/msword
	catdoc %s

application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
	xlsx2csv -d tab %s

application/vnd.ms-excel
	xls2csv -d tab %s

text/html
	html2text %s

Many office formats
	unoconv -d document -f txt --stdout '%s'

	Apache Tika is another option for creating plain text from various document
	types. Use curl to send the document to your Tika server, which returns the
	plain text. Adjust the URL if the server does not run on localhost:9998.

	curl -s -T '%s' http://localhost:9998/tika --header 'Accept: text/plain'

Conversion to pdf for pdf preview
==================================

text/plain
text/csv
application/csv
application/vnd.oasis.opendocument.text
application/msword
application/vnd.wordperfect
text/rtf
	unoconv -d document -f pdf --stdout -v '%f' > '%o'

image/png
image/jpg
image/jpeg
	convert -density 300 '%f' 'pdf:%o'

application/vnd.ms-powerpoint
application/vnd.openxmlformats-officedocument.presentationml.presentation
application/vnd.oasis.opendocument.presentation
	unoconv -d presentation -f pdf --stdout -v '%f' > '%o'

application/vnd.ms-excel
application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
application/vnd.oasis.opendocument.spreadsheet
	unoconv -d spreadsheet -f pdf --stdout -v '%f' > '%o'

message/rfc822
	java -jar emailconverter-2.5.3-all.jar '%f' -o '%o'

	The emailconverter can be obtained from
	https://github.com/nickrussler/email-to-pdf-converter
	It requires wkhtmltopdf, which is part of Debian.

Conversion to png for preview images
=====================================

If you have problems running convert on PDF documents, read this page:
https://askubuntu.com/questions/1081895/trouble-with-batch-conversion-of-png-to-pdf-using-convert
It tells you to comment out the line

	<policy domain="coder" rights="none" pattern="PDF" />

in /etc/ImageMagick-6/policy.xml. This policy forbids ImageMagick from
reading or writing PDF files.

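The change can also be scripted. The following is a minimal sketch that assumes the policy line appears exactly as quoted above (only the leading spaces may differ); the function name allow_pdf_in_imagemagick is hypothetical. sed -i.bak edits the file in place and keeps the original as policy.xml.bak:

```shell
#!/bin/sh
# Hypothetical helper: comment out the ImageMagick policy that blocks PDF.
# Usage (as root): allow_pdf_in_imagemagick /etc/ImageMagick-6/policy.xml
allow_pdf_in_imagemagick() {
    policy_file=$1
    # Keep the leading indentation (\1) and wrap the element (\2) in <!-- -->.
    # -i.bak edits in place and saves a backup copy with the suffix .bak.
    sed -i.bak 's|^\( *\)\(<policy domain="coder" rights="none" pattern="PDF" />\)|\1<!-- \2 -->|' "$policy_file"
}
```

Running it a second time changes nothing, because the commented line no longer starts with <policy.
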
convert determines the format of the converted image from the extension of
the output filename. SeedDMS usually sets a proper extension when running
the command, but it is still good practice to set the output format
explicitly by prefixing the output filename with 'png:'. The prefix is
required whenever the output goes to stdout (png:-), because there is no
file extension to go by.

image/jpg
image/jpeg
image/png
	convert -resize %wx '%f' 'png:%o'

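The commands in this file contain placeholders which SeedDMS fills in before running them. Judging from the commands themselves, %s and %f stand for the input file, %o for the output file and %w for the requested preview width, so -resize %wx scales the image to that width and keeps the aspect ratio. To try a command by hand, the placeholders have to be replaced manually. A minimal bash sketch of that substitution follows; the function expand_converter_cmd is hypothetical and not part of SeedDMS:

```shell
#!/usr/bin/env bash
# Hypothetical helper: fill in the converter placeholders so that a command
# can be tested in a terminal. Assumed meaning of the placeholders:
#   %s, %f  input file    %o  output file    %w  width in pixels
expand_converter_cmd() {
    local cmd=$1 input=$2 output=$3 width=$4
    # ${var//\%x/...} replaces every occurrence of the literal text "%x".
    # The replacement is quoted so characters such as & are taken literally.
    cmd=${cmd//\%s/"$input"}
    cmd=${cmd//\%f/"$input"}
    cmd=${cmd//\%o/"$output"}
    cmd=${cmd//\%w/"$width"}
    printf '%s\n' "$cmd"
}

expand_converter_cmd "convert -resize %wx '%f' 'png:%o'" photo.jpg preview.png 200
# prints: convert -resize 200x 'photo.jpg' 'png:preview.png'
```

The helper only prints the command, so it can be checked before pasting it into a terminal. File names containing single quotes would break the quoting used in the commands, so test with simple names.
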
text/plain
	convert -density 100 -resize %wx 'text:%f[0]' 'png:%o'

application/pdf
	gs -dBATCH -dNOPAUSE -sDEVICE=png16m -dPDFFitPage -r72x72 -sOutputFile=- -dFirstPage=1 -dLastPage=1 -q '%f' | convert -resize %wx png:- 'png:%o'

	convert -density 100 -resize %wx '%f[0]' 'png:%o'

	mutool draw -F png -w %w -q -N -o %o %f 1

application/postscript
	convert -density 100 -resize %wx '%f[0]' 'png:%o'

text/plain
	a2ps -1 -a1 -R -B -o - '%f' | gs -dBATCH -dNOPAUSE -sDEVICE=png16m -dFirstPage=1 -dLastPage=1 -dPDFFitPage -r72x72 -sOutputFile=- -q - | convert -resize %wx png:- 'png:%o'

	On Linux systems you have to set the desired paper size for a2ps in
	/etc/papersize, e.g. a4 or letter.

application/msword
application/vnd.oasis.opendocument.spreadsheet
application/vnd.oasis.opendocument.text
application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
application/vnd.ms-excel
application/vnd.openxmlformats-officedocument.wordprocessingml.document
text/rtf
application/vnd.ms-powerpoint
text/csv
application/csv
application/vnd.wordperfect
	unoconv -d document -e PageRange=1 -f pdf --stdout -v '%f' | gs -dBATCH -dNOPAUSE -sDEVICE=pngalpha -dPDFFitPage -r72x72 -sOutputFile=- -dFirstPage=1 -dLastPage=1 -q - | convert -resize %wx png:- 'png:%o'

video/webm
video/mp4
	This takes frame 12 of the video (the 13th frame, since ImageMagick counts
	frames from 0) and converts it into a PNG. It requires ffmpeg to be
	installed.

	convert -resize %wx '%f[12]' 'png:%o'