Using LibreOffice to Convert Document Formats on CentOS

  • 2021-07-01 08:37:34
  • OfStack

A project required some preprocessing of uploaded documents: if a user uploads a document in doc format, it has to be converted to docx or pdf so that later steps can extract the document content.

First I tried the phpoffice/phpword package and found that its handling of doc files is not ideal. The package is better suited to generating documents from content than to converting existing documents, so it did not fit my needs.

Then I found LibreOffice, an open source tool. After trying it, the results were very good, so I am sharing it here.

The server runs CentOS 7. Installing LibreOffice directly with yum requires roughly 600 MB or more of disk space:


# Optionally remove it first, in case an earlier version is already installed
yum remove libreoffice-*
yum install libreoffice

After the installation completes, confirm the version. Although the official release has reached 6.1, yum still provides a 5.3.6 package, but it works fine. I still recommend installing with the package manager of your own Linux distribution, which saves a lot of trouble.


[root@localhost /]# soffice --version
LibreOffice 5.3.6.1 30(Build:1)
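
If a deployment script needs the version number on its own, it can be cut out of that line. This is only a sketch: the function name soffice_version is made up for illustration, and it assumes the output starts with "LibreOffice" followed by a dotted number, as in the output above.

```shell
# soffice_version: print only the dotted version number reported by soffice.
# Hypothetical helper; assumes output shaped like "LibreOffice 5.3.6.1 ...".
soffice_version() {
    soffice --version | sed -n 's/^LibreOffice \([0-9][0-9.]*\).*/\1/p'
}
```

Calling soffice_version on the server above would print the number taken from the first matching line.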

If you are not sure how to use it, run soffice --help to see the help text. There are many parameters and use cases, but converting between formats is very simple:


soffice --headless --convert-to docx /opt/upload/source/123.doc --outdir /opt/upload/source

The above command converts the file /opt/upload/source/123.doc to docx format and writes the result to the /opt/upload/source folder.
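
The same command can be repeated for every doc file in an upload folder, and the target format can be pdf instead of docx. The following is a minimal sketch rather than a finished tool: the function name convert_dir is hypothetical, it only matches the lowercase .doc extension, and it writes the results next to the source files.

```shell
# convert_dir FORMAT DIR
# Hypothetical helper: convert every *.doc file in DIR to FORMAT (docx or pdf),
# using the same soffice options as the single-file command.
convert_dir() {
    local format="$1" dir="$2" src
    for src in "$dir"/*.doc; do
        # When nothing matches, the pattern stays literal; skip it.
        [ -e "$src" ] || continue
        soffice --headless --convert-to "$format" "$src" --outdir "$dir"
    done
}
```

For example, convert_dir pdf /opt/upload/source would run one conversion per doc file in that folder.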

By default:

  • The output file is saved under the source file name with the new extension.
  • An existing file with the same name in outdir is overwritten.
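
If overwriting is not wanted, the target path can be worked out from the naming rule and checked before converting. A rough sketch under that assumption; the names target_path and convert_no_clobber are made up for illustration.

```shell
# target_path FORMAT SOURCE OUTDIR
# Path soffice is expected to write: source file name + new extension.
target_path() {
    local base
    base="$(basename "$2")"
    printf '%s/%s.%s\n' "$3" "${base%.*}" "$1"
}

# convert_no_clobber FORMAT SOURCE OUTDIR
# Hypothetical wrapper: refuse to convert when the target already exists.
convert_no_clobber() {
    local target
    target="$(target_path "$1" "$2" "$3")"
    if [ -e "$target" ]; then
        echo "skip: $target already exists" >&2
        return 1
    fi
    soffice --headless --convert-to "$1" "$2" --outdir "$3"
}
```

With this wrapper, a second run of convert_no_clobber docx /opt/upload/source/123.doc /opt/upload/source would be skipped instead of replacing 123.docx.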

A successful conversion prints output like this:


convert /opt/upload/source/123.doc -> /opt/upload/source/123.docx using filter : MS Word 2007 XML
Overwriting: /opt/upload/source/123.docx
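
A calling script can pick the resulting path out of this output. The sketch below assumes the "convert ... -> ... using filter : ..." line has exactly the shape shown above, which may differ between LibreOffice versions; the function name converted_path is hypothetical.

```shell
# converted_path: read soffice output on stdin and print the destination path
# from the "convert SRC -> DEST using filter : NAME" line, if there is one.
converted_path() {
    sed -n 's/^convert .* -> \(.*\) using filter : .*$/\1/p'
}
```

It would be used as a filter, for example: soffice --headless --convert-to docx /opt/upload/source/123.doc --outdir /opt/upload/source | converted_path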

LibreOffice automatically selects the format filter according to the file format. For the list of supported formats, refer to the official website.
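
When the automatic choice is not the one you want, --convert-to also accepts a filter name after the extension, separated by a colon. A small sketch, using the filter name printed in the output above; the function name convert_with_filter is made up for illustration.

```shell
# convert_with_filter EXT FILTER SOURCE OUTDIR
# Hypothetical helper: name the export filter explicitly instead of
# letting soffice pick one from the extension.
convert_with_filter() {
    local ext="$1" filter="$2" src="$3" outdir="$4"
    # Quoting keeps a filter name with spaces inside a single argument.
    soffice --headless --convert-to "$ext:$filter" "$src" --outdir "$outdir"
}
```

For example: convert_with_filter docx "MS Word 2007 XML" /opt/upload/source/123.doc /opt/upload/source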
