Conversion to text for fulltext search
=======================================

text/plain
text/csv
application/csv
    cat '%s'

application/pdf
    pdftotext -nopgbrk %s - | sed -e 's/ [a-zA-Z0-9.]\{1\} / /g' -e 's/[0-9.]//g'

application/vnd.openxmlformats-officedocument.wordprocessingml.document
    docx2txt '%s' -

application/msword
    catdoc %s

application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
    xlsx2csv %s

application/vnd.ms-excel
    xls2csv %s

text/html
    html2text %s

Many office formats
    unoconv -d document -f txt --stdout '%s'

Apache Tika is another option for creating plain text from various document
types. Use curl to send the document to your Tika server and get the plain
text in return.

    curl -s -T '%s' http://localhost:9998/tika --header 'Accept: text/plain'
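
Any of the commands above can be tried by hand before it is entered into the
configuration. The following is a minimal sketch, assuming that each '%s' is
replaced with the path of the document file. The function name
run_text_converter is hypothetical and not part of SeedDMS.

```shell
#!/bin/sh
# Hypothetical helper, not part of SeedDMS.
# Assumption: '%s' in a converter command stands for the document path.

# run_text_converter TEMPLATE FILE
# Replaces every '%s' in TEMPLATE with FILE and runs the result with sh.
# Note: a FILE containing a single quote would break templates that
# wrap '%s' in single quotes.
run_text_converter() {
    rest=$1
    cmd=
    while :; do
        case $rest in
            *%s*)
                # Append the text before the first '%s', then the file path.
                cmd=$cmd${rest%%%s*}$2
                # Continue with the text after that '%s'.
                rest=${rest#*%s}
                ;;
            *)
                # No placeholder left: append the remainder and stop.
                cmd=$cmd$rest
                break
                ;;
        esac
    done
    sh -c "$cmd"
}

# Usage (paths are examples only):
#   run_text_converter "cat '%s'" /tmp/sample.txt
#   run_text_converter "pdftotext -nopgbrk %s -" /tmp/sample.pdf
```

If the helper prints the expected plain text on stdout, the command is
probably suitable as a fulltext converter for that MIME type.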

Conversion to pdf for pdf preview
==================================

text/plain
text/csv
application/csv
application/vnd.oasis.opendocument.text
application/msword
application/vnd.wordperfect
text/rtf
    unoconv -d document -f pdf --stdout -v '%f' > '%o'

image/png
image/jpg
image/jpeg
    convert -density 300 '%f' 'pdf:%o'

application/vnd.ms-powerpoint
application/vnd.openxmlformats-officedocument.presentationml.presentation
application/vnd.oasis.opendocument.presentation
    unoconv -d presentation -f pdf --stdout -v '%f' > '%o'

application/vnd.ms-excel
application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
application/vnd.oasis.opendocument.spreadsheet
    unoconv -d spreadsheet -f pdf --stdout -v '%f' > '%o'

Conversion to png for preview images
=====================================

If you have problems running convert on PDF documents, read this page:

    https://askubuntu.com/questions/1081895/trouble-with-batch-conversion-of-png-to-pdf-using-convert

It basically instructs you to comment out the line

    <policy domain="coder" rights="none" pattern="PDF" />

in /etc/ImageMagick-6/policy.xml.
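
Whether the policy line is still active can be checked from the shell. This is
a minimal sketch of a line-based check. The function name pdf_policy_blocked
is hypothetical, and the attribute order is assumed to match the line quoted
above.

```shell
#!/bin/sh
# Hypothetical helper, not part of SeedDMS or ImageMagick.

# pdf_policy_blocked POLICY_FILE
# Succeeds (exit status 0) if POLICY_FILE has a line that starts with an
# active <policy ...> element denying the PDF coder. This is a heuristic:
# a line commented out as <!-- <policy ... /> --> starts with '<!--' and
# does not match, but an element inside a multi-line comment is still
# reported as active.
pdf_policy_blocked() {
    grep -q -E '^[[:space:]]*<policy[[:space:]].*rights="none".*pattern="PDF"' "$1"
}

# Usage:
#   if pdf_policy_blocked /etc/ImageMagick-6/policy.xml; then
#       echo "convert is not allowed to process PDF files"
#   fi
```

The check only reads the file, so it can be run without root privileges as
long as the policy file is readable.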

convert determines the format of the converted image from the extension of
the output filename. SeedDMS usually sets a proper extension when running
the command, but it is nevertheless good practice to set the output format
explicitly by prefixing the output filename with 'png:'. The prefix is always
needed if the output goes to stdout.

image/jpg
image/jpeg
image/png
    convert -resize %wx '%f' 'png:%o'

application/pdf
    gs -dBATCH -dNOPAUSE -sDEVICE=png16m -dPDFFitPage -r72x72 -sOutputFile=- -dFirstPage=1 -dLastPage=1 -q '%f' | convert -resize %wx png:- '%o'

text/plain
    a2ps -1 -a1 -R -B -o - '%f' | gs -dBATCH -dNOPAUSE -sDEVICE=png16m -dFirstPage=1 -dLastPage=1 -dPDFFitPage -r72x72 -sOutputFile=- -q - | convert -resize %wx png:- 'png:%o'

application/msword
application/vnd.oasis.opendocument.spreadsheet
application/vnd.oasis.opendocument.text
application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
application/vnd.ms-excel
application/vnd.openxmlformats-officedocument.wordprocessingml.document
text/rtf
application/vnd.ms-powerpoint
text/csv
application/csv
application/vnd.wordperfect
    unoconv -d document -e PageRange=1 -f pdf --stdout -v '%f' | gs -dBATCH -dNOPAUSE -sDEVICE=pngalpha -dPDFFitPage -r72x72 -sOutputFile=- -dFirstPage=1 -dLastPage=1 -q - | convert -resize %wx png:- 'png:%o'
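
When debugging a preview command, it can help to see the command line after
the placeholders are filled in. The following is a minimal sketch, assuming
that %f is the input file, %o the output file and %w the preview width in
pixels. This README does not spell these meanings out, and the function names
replace_all and expand_preview_command are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers, not part of SeedDMS.
# Assumptions: %f = input file, %o = output file, %w = preview width.

# replace_all STRING SEARCH REPLACEMENT
# Prints STRING with every literal SEARCH replaced by REPLACEMENT.
# SEARCH must not be empty.
replace_all() {
    ra_rest=$1
    ra_out=
    while :; do
        case $ra_rest in
            *"$2"*)
                # Keep the text before the match, then add the replacement.
                ra_out=$ra_out${ra_rest%%"$2"*}$3
                # Continue with the text after the match.
                ra_rest=${ra_rest#*"$2"}
                ;;
            *)
                printf '%s\n' "$ra_out$ra_rest"
                return
                ;;
        esac
    done
}

# expand_preview_command TEMPLATE WIDTH INFILE OUTFILE
# Prints the expanded command line without running it.
# Note: an INFILE that itself contains '%o' would also be rewritten.
expand_preview_command() {
    epc=$(replace_all "$1" '%w' "$2")
    epc=$(replace_all "$epc" '%f' "$3")
    replace_all "$epc" '%o' "$4"
}

# Usage (paths are examples only):
#   expand_preview_command "convert -resize %wx '%f' 'png:%o'" 100 /tmp/in.jpg /tmp/out.png
#   prints: convert -resize 100x '/tmp/in.jpg' 'png:/tmp/out.png'
```

The printed line can then be pasted into a shell to see the error output of
convert, gs or unoconv directly.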