fix formatting

Uwe Steinmann 2025-10-23 13:50:18 +02:00
parent 00e6a22dbd
commit ed36b88a09


@@ -29,46 +29,46 @@ php-fpm's configuration. On Debian this is done in the file
### text/plain, text/csv, application/csv
`cat '%s'`
### application/pdf
`pdftotext -q -nopgbrk %s - | sed -e 's/ [a-zA-Z0-9.]\{1\} / /g' -e 's/[0-9.]//g'`
If pdftotext takes too long on large documents, you may want to pass the parameter
`-l` to specify the last page to be converted. `-q` suppresses errors and warnings
sent to stderr.
`mutool draw -F txt -q -N -o - %s`
### application/vnd.openxmlformats-officedocument.wordprocessingml.document
`docx2txt '%s' -`
### application/msword
`catdoc %s`
### application/vnd.oasis.opendocument.text
`odt2txt %s`
### application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
`xlsx2csv -d tab %s`
### application/vnd.ms-excel
`xls2csv -d tab %s`
### text/html
`html2text %s`
Many office formats can be converted with `unoconv`, though in the past it
has sometimes crashed or taken a long time.
`unoconv -d document -f txt --stdout '%s'`
Apache Tika is another option for creating plain text from various document
types. Just use `curl` to send the document to your tika server and get the
@@ -81,49 +81,49 @@ image.
## Conversion to pdf for pdf preview
-* text/plain, text/csv, application/csv, application/vnd.oasis.opendocument.text application/msword, application/vnd.wordperfect, text/rtf
+### text/plain, text/csv, application/csv, application/vnd.oasis.opendocument.text application/msword, application/vnd.wordperfect, text/rtf
`unoconv -d document -f pdf --stdout -v '%f' > '%o'`
-* image/png, image/jpg, image/jpeg
+### image/png, image/jpg, image/jpeg
`convert -density 300 '%f' 'pdf:%o'`
Actually `convert` can be used for many other image formats.
-* image/svg+xml
+### image/svg+xml
`cairosvg -f pdf -o '%o' '%f'`
-* application/vnd.ms-powerpoint, application/vnd.openxmlformats-officedocument.presentationml.presentation, application/vnd.oasis.opendocument.presentation
+### application/vnd.ms-powerpoint, application/vnd.openxmlformats-officedocument.presentationml.presentation, application/vnd.oasis.opendocument.presentation
`unoconv -d presentation -f pdf --stdout -v '%f' > '%o'`
-* application/vnd.ms-excel, application/vnd.openxmlformats-officedocument.spreadsheetml.sheet, application/vnd.oasis.opendocument.spreadsheet
+### application/vnd.ms-excel, application/vnd.openxmlformats-officedocument.spreadsheetml.sheet, application/vnd.oasis.opendocument.spreadsheet
`unoconv -d spreadsheet -f pdf --stdout -v '%f' > '%o'`
-* message/rfc822
+### message/rfc822
`java -jar emailconverter-2.5.3-all.jar '%f' -o '%o'`
The emailconverter can be obtained from https://github.com/nickrussler/email-to-pdf-converter
It requires `wkhtmltopdf`, which is part of Debian.
-* text/plain
+### text/plain
`iconv -c -f utf-8 -t latin1 '%f' | a2ps -1 -q -a1 -R -B -o - - | ps2pdf - -`
The parameter `-q` is important because a2ps sends some statistical
data to stderr, which makes SeedDMS believe the command has failed.
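The failure rule described above can be modelled with a small Python sketch. It assumes a command counts as failed when it exits with a non-zero status or writes anything to stderr. The text only states that stderr output makes SeedDMS believe the command has failed, so the exit-code check, the timeout, and the name `run_converter` are illustrative assumptions, not SeedDMS's actual code.

```python
import subprocess


def run_converter(command: str, timeout_s: float = 30.0) -> bytes:
    """Run a shell command and return its stdout.

    Assumption for this sketch: a non-zero exit status or any output
    on stderr is treated as a failed conversion.
    """
    proc = subprocess.run(
        command,
        shell=True,            # the templates are shell pipelines
        capture_output=True,   # collect stdout and stderr separately
        timeout=timeout_s,     # raises subprocess.TimeoutExpired if exceeded
    )
    if proc.returncode != 0 or proc.stderr:
        raise RuntimeError(
            f"converter failed (exit {proc.returncode}): {proc.stderr!r}"
        )
    return proc.stdout
```

Under such a rule, `a2ps` without `-q` would be rejected even though it produced valid output, which is why the quiet flag matters.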
-* application/x-xopp
+### application/x-xopp
`xournalpp -p "%o" "%f"`
Converting from application/x-xopp to pdf only works if the xopp file
does not use a pdf document as a background, because this pdf is not
stored in the xopp file.
## Conversion to png for preview images
@@ -143,64 +143,64 @@ needed if the output goes to stdout.
### image/jpg, image/jpeg, image/png
`convert -resize %wx '%f' 'png:%o'`
-* image/svg+xml
+### image/svg+xml
`cairosvg -f png --output-width %w -o '%o' '%f'`
-* text/plain
+### text/plain
`convert -density 100 -resize %wx 'text:%f[0]' 'png:%o'`
-* application/pdf
+### application/pdf
`gs -dBATCH -dNOPAUSE -sDEVICE=png16m -dPDFFitPage -r72x72 -sOutputFile=- -dFirstPage=1 -dLastPage=1 -q '%f' | convert -resize %wx png:- '%o'`
`convert -density 100 -resize %wx '%f[0]' 'png:%o'`
`mutool draw -F png -w %w -q -N -o '%o' '%f' 1`
`pdftocairo '%f' -png -singlefile -scale-to-x %w -scale-to-y -1 - > '%o'`
`pdftocairo` needs to output to stdout because any output file name passed
to pdftocairo will be suffixed with `.png`.
-* application/postscript
+### application/postscript
`convert -density 100 -resize %wx '%f[0]' 'png:%o'`
-* text/plain
+### text/plain
-iconv -c -f utf-8 -t latin1 '%f' | a2ps -1 -q -a1 -R -B -o - - | gs -dBATCH -dNOPAUSE -sDEVICE=png16m -dFirstPage=1 -dLastPage=1 -dPDFFitPage -r72x72 -sOutputFile=- -q - | convert -resize %wx png:- 'png:%o'
+`iconv -c -f utf-8 -t latin1 '%f' | a2ps -1 -q -a1 -R -B -o - - | gs -dBATCH -dNOPAUSE -sDEVICE=png16m -dFirstPage=1 -dLastPage=1 -dPDFFitPage -r72x72 -sOutputFile=- -q - | convert -resize %wx png:- 'png:%o'`
On Linux systems you will have to set the desired paper size for a2ps in
/etc/papersize, e.g. a4 or letter. Unfortunately, a2ps cannot process UTF-8
encoded files, which is why the input needs to be recoded with iconv or recode.
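To make the recoding step concrete, here is a rough Python equivalent of `iconv -c -f utf-8 -t latin1`. The function name `recode_for_a2ps` is hypothetical, and `errors="ignore"` only approximates the `-c` flag (drop whatever cannot be converted); it is a sketch for illustration, not something SeedDMS ships.

```python
def recode_for_a2ps(data: bytes) -> bytes:
    """Convert UTF-8 bytes to Latin-1, dropping unconvertible characters."""
    # Decode leniently so malformed UTF-8 sequences are skipped.
    text = data.decode("utf-8", errors="ignore")
    # Characters without a Latin-1 code point (e.g. the euro sign) are dropped.
    return text.encode("latin-1", errors="ignore")
```

Note that dropped characters are lost silently, so the resulting preview may differ slightly from the original text.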
-* application/msword, application/vnd.oasis.opendocument.spreadsheet, application/vnd.oasis.opendocument.text, application/vnd.openxmlformats-officedocument.spreadsheetml.sheet, application/vnd.ms-excel, application/vnd.openxmlformats-officedocument.wordprocessingml.document, text/rtf, application/vnd.ms-powerpoint, text/csv, application/csv, application/vnd.wordperfect,
+### application/msword, application/vnd.oasis.opendocument.spreadsheet, application/vnd.oasis.opendocument.text, application/vnd.openxmlformats-officedocument.spreadsheetml.sheet, application/vnd.ms-excel, application/vnd.openxmlformats-officedocument.wordprocessingml.document, text/rtf, application/vnd.ms-powerpoint, text/csv, application/csv, application/vnd.wordperfect,
`unoconv -d document -e PageRange=1 -f pdf --stdout -v '%f' | gs -dBATCH -dNOPAUSE -sDEVICE=pngalpha -dPDFFitPage -r72x72 -sOutputFile=- -dFirstPage=1 -dLastPage=1 -q - | convert -resize %wx png:- 'png:%o'`
-* video/webm, video/mp4
+### video/webm, video/mp4
This takes the 12th frame of a video and converts it into a png. It requires
ffmpeg to be installed.
`convert -resize %wx "%f[12]" "png:%o"`
You may as well use ffmpeg directly:
`ffmpeg -i "%f" -ss 00:00:02 -frames:v 1 -loglevel quiet -vf scale=%w:-1 -f apng "%o"`
-* audio/mpeg
+### audio/mpeg
`sox "%f" -n spectrogram -x 600 -Y 550 -r -l -o - | convert -resize %wx png:- "png:%o"`
-* application/x-xopp
+### application/x-xopp
`xournalpp -i "%o" --export-png-width=%w "%f"`
Converting from application/x-xopp to png only works if the xopp file
does not use a pdf document as a background, because this pdf is not
stored in the xopp file.
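The preview commands above are templates: judging by how they are used, `%f` stands for the input file, `%o` for the output file and `%w` for the preview width. The following Python sketch shows one way such a template could be filled in. The function `fill_template` and its single-pass regex approach are assumptions made for illustration; how SeedDMS performs the substitution internally is not shown here.

```python
import re

# Matches the placeholders used in the preview templates: %f, %o and %w.
PLACEHOLDER = re.compile(r"%([fow])")


def fill_template(template: str, values: dict[str, str]) -> str:
    """Replace %f, %o and %w in a single pass.

    Placeholders without a matching key are left untouched. A single
    pass avoids re-substituting text that came from an inserted value.
    This sketch does no shell escaping, so values are assumed trusted.
    """
    return PLACEHOLDER.sub(
        lambda m: values.get(m.group(1), m.group(0)), template
    )
```

For example, `fill_template("convert -resize %wx '%f' 'png:%o'", {"w": "200", "f": "/tmp/in.jpg", "o": "/tmp/out.png"})` yields `convert -resize 200x '/tmp/in.jpg' 'png:/tmp/out.png'`. Because the templates already quote `%f` and `%o`, file names containing quote characters would need extra care.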