Conversion plugin phpdocx

Preparing documents for conversion

Preparing documents for conversion with the LibreOffice method

For the best results when transforming a document with the LibreOffice method, follow these good practices:

  • Use the latest available release of LibreOffice.
  • Set table sizes manually when using the addTable or embedHTML methods: instead of leaving automatic sizes that depend on the content, specify explicit values for each table and cell.
  • To generate active links with addLink, embedHTML or replaceVariableByHTML when transforming DOCX to PDF, the link must use a custom character style (rStyle) that exists in the DOCX. phpdocx applies a default rStyle (DefaultParagraphFontPHPDOCX) to links; addLink, embedHTML and replaceVariableByHTML can replace it with a custom style that already exists in the DOCX or with one created dynamically with createCharacterStyle.
  • Headers and footers convert best when their content is placed in a table, which keeps each header and footer element in its correct position.
  • To hide table borders, remove them cell by cell. If you generate the DOCX from HTML with the embedHTML method, hide the border of each <td>.
  • Choose fonts that are installed on the operating system where the conversion plugin runs. Linux, Windows and macOS make it easy to add new fonts. Fonts can also be embedded in the DOCX (see Working with fonts).
  • LibreOffice's default sizes and spacing differ slightly from MS Word's. Check these values by opening the document in LibreOffice and adjust them as needed.
  • To get a conversion as close as possible to the original content, you can create the DOCX template with LibreOffice. LibreOffice 4.3 and later can save documents in DOCX format (File > Save As > Microsoft Word 2007-2013 XML (.docx)), and these files are fully compatible with MS Word.
  • Avoid absolute positioning as much as possible; place contents with elements such as tables instead.
  • LibreOffice applies the styles named 'footnote reference' and 'endnote reference' to footnote and endnote references. If needed, add or import these styles with the available phpdocx style methods.
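The table advice in this list can be illustrated at the OOXML level. The following is a minimal sketch in Python (standard library only, not phpdocx code) that builds a w:tbl fragment with an explicit total width (w:tblW with w:type="dxa"), a fixed layout (w:tblLayout), a width for every cell (w:tcW) and, optionally, each cell's borders removed individually (w:tcBorders with w:val="nil"). The function fixed_width_table and the sample widths are hypothetical; widths are in dxa (twentieths of a point). It only shows the kind of markup that explicit sizes and per-cell border removal produce, not how phpdocx builds tables internally.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W_NS)  # serialize with the usual w: prefix


def w(local_name):
    """Return the ElementTree name of a WordprocessingML element/attribute."""
    return f"{{{W_NS}}}{local_name}"


def fixed_width_table(rows, col_widths_dxa, hide_borders=True):
    """Build a w:tbl whose table and cell widths are explicit (in dxa) and,
    optionally, whose borders are removed one cell at a time."""
    tbl = ET.Element(w("tbl"))
    tbl_pr = ET.SubElement(tbl, w("tblPr"))
    # Explicit total width instead of an automatic, content-based one.
    ET.SubElement(tbl_pr, w("tblW"),
                  {w("w"): str(sum(col_widths_dxa)), w("type"): "dxa"})
    # Fixed layout: columns keep their widths regardless of content.
    ET.SubElement(tbl_pr, w("tblLayout"), {w("type"): "fixed"})
    grid = ET.SubElement(tbl, w("tblGrid"))
    for width in col_widths_dxa:
        ET.SubElement(grid, w("gridCol"), {w("w"): str(width)})
    for row in rows:
        tr = ET.SubElement(tbl, w("tr"))
        for text, width in zip(row, col_widths_dxa):
            tc = ET.SubElement(tr, w("tc"))
            tc_pr = ET.SubElement(tc, w("tcPr"))
            # Explicit width for this cell.
            ET.SubElement(tc_pr, w("tcW"), {w("w"): str(width), w("type"): "dxa"})
            if hide_borders:
                # Remove every border of this cell explicitly.
                borders = ET.SubElement(tc_pr, w("tcBorders"))
                for side in ("top", "left", "bottom", "right"):
                    ET.SubElement(borders, w(side), {w("val"): "nil"})
            # Every cell needs at least one paragraph.
            run = ET.SubElement(ET.SubElement(tc, w("p")), w("r"))
            ET.SubElement(run, w("t")).text = text
    return tbl


table = fixed_width_table([["Product", "Price"], ["Widget", "9.99"]], [4000, 2000])
print(ET.tostring(table, encoding="unicode"))
```

When the table is generated with embedHTML instead, the equivalent is giving the table and every cell explicit widths and a border-free style on each <td>.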
Supported OOXML tags and attributes when using the native method

phpdocx parses content, styles, properties and other OOXML elements.

The currently parsed contents and styles include:

  • document (w:body)

    • border (w:pgBorders) => w:top, w:bottom, w:left, w:right: w:color, w:sz, w:val (nil, none, dashed, dotted), w:space
  • sections (w:sectPr)

    • size (w:pgSz) => w:w (width), w:h (height)
    • margin (w:pgMar) => w:top (margin-top), w:bottom (margin-bottom), w:left (margin-left), w:right (margin-right)
  • title and properties (cp:coreProperties)

    • title (dc:title) => title
    • author (dc:creator) => creator, author
    • subject (dc:subject) => subject
    • keywords (cp:keywords) => keywords
  • text strings (w:t) and text styles (w:rPr)

    • text (w:t)
    • bold (w:b)
    • color (w:color)
    • font family (w:rFonts, w:asciiTheme [majorHAnsi, minorHAnsi])
    • font size (w:sz)
    • highlight (w:highlight)
    • italic (w:i)
    • line through (w:strike)
    • text decoration (w:u) => none or underline
    • vanish (w:vanish)
    • vertical align (w:vertAlign)
  • paragraphs (w:pPr)

    • background color (w:shd)
    • bold (w:b)
    • color (w:color)
    • font family (w:rFonts, w:asciiTheme [majorHAnsi, minorHAnsi])
    • font size (w:sz)
    • italic (w:i)
    • line height (w:spacing)
    • line through (w:strike)
    • page break (w:pageBreakBefore)
    • text align (w:jc) => left, justify, center, right
    • text decoration (w:u) => none or underline
    • text indent (w:firstLine)
  • images (w:drawing): png, jpg and other formats supported by web browsers

    • align (wp:positionH, wp:align) => right, left, center
    • border (a:ln, a:noFill)
    • height (wp:extent) => cy
    • link (a:hlinkClick) => r:id
    • width (wp:extent) => cx
  • lists (w:numPr)

    • type (w:numId) => w:val and w:ilvl (list-style-type: disc, decimal, lower-alpha, lower-roman, upper-alpha, upper-roman)
    • see the paragraph elements for other styles
    • some styles, such as color or font size, can be inherited by the li content from the li symbol; in this case, the content must have its own style
  • links

    • link (w:instrText) => HYPERLINK
  • form elements

    • checkbox (w:instrText)
    • input (w:instrText)
    • select (w:instrText)
  • styles (see the elements on this page for supported styles)

    • character/run (w:rPr)
    • paragraph (w:pPr)
    • list (w:pPr, w:numId, w:ilvl)
    • table (w:style)
    • styles file (w:styles) => character/run (w:rStyle), paragraph and list (w:pStyle), table
    • numbering file => list (w:abstractNum)
  • tables (w:tbl)

    • border (w:tblBorders) => w:top, w:right, w:bottom, w:left (width, style [solid, none], color)
    • layout (w:tblLayout) => w:type fixed
    • width (w:tblW) => w:type (pct, dxa), w:w
    • rowspan (w:vMerge) => w:val restart, continue (rowspan)
    • cell background color (w:shd) => w:fill
    • cell border (w:tcPr) => w:top, w:right, w:bottom, w:left (width, style [solid, none], color)
    • cell padding (w:tblCellMar) => w:top, w:right, w:bottom, w:left
    • colspan (w:gridSpan) => w:val (colspan)
    • all contents are left-aligned
  • other elements

    • break (w:br) => line and page
    • simple fields (w:fldSimple) => AUTHOR, COMMENTS, LASTSAVEDBY, TITLE
    • textbox => style (min-height, float, width), fillcolor (background-color), margin-top (margin-top), strokecolor (border-color, border-style), strokeweight (border-width)
    • tracked contents (w:ins, w:del)
  • headers and footers

    • The default header and footer type is applied to all pages. To use distinct headers or footers, the recommended approach is to generate a PDF for the pages of each header/footer and merge the outputs with mergePdf.
    • For the best output, headers and footers must use tables to position contents.
    • PAGE and NUMPAGES fields are supported and replaced with their numeric values.
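Several attributes in the list above use OOXML units rather than CSS ones: w:pgSz and w:pgMar values (and dxa widths such as w:tblW) are twentieths of a point, w:sz font sizes are half-points, and wp:extent cx/cy values are EMUs (914,400 per inch). As a hedged sketch, assuming Python and only its standard library, the hypothetical helpers below convert these values, for example to check the page geometry a w:sectPr describes. The 96 dpi used for pixels is an assumption, not a phpdocx setting.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

TWIPS_PER_INCH = 1440   # 20 twips per point x 72 points per inch
EMU_PER_INCH = 914400   # English Metric Units used by DrawingML extents
MM_PER_INCH = 25.4


def twips_to_mm(twips):
    """w:pgSz, w:pgMar and dxa widths are in twentieths of a point."""
    return twips * MM_PER_INCH / TWIPS_PER_INCH


def half_points_to_pt(sz):
    """w:sz stores font sizes in half-points, so w:sz="24" is 12 pt."""
    return sz / 2


def emu_to_px(emu, dpi=96):
    """Convert a wp:extent cx/cy value to pixels at an assumed resolution."""
    return emu * dpi / EMU_PER_INCH


def page_geometry_mm(sect_pr_xml):
    """Read w:pgSz and w:pgMar from a w:sectPr fragment, in millimetres."""
    sect_pr = ET.fromstring(sect_pr_xml)

    def twips(element, name):
        return int(element.get(f"{{{W_NS}}}{name}"))

    pg_sz = sect_pr.find(f"{{{W_NS}}}pgSz")
    pg_mar = sect_pr.find(f"{{{W_NS}}}pgMar")
    geometry = {"width": twips(pg_sz, "w"), "height": twips(pg_sz, "h")}
    for side in ("top", "bottom", "left", "right"):
        geometry[f"margin-{side}"] = twips(pg_mar, side)
    return {key: round(twips_to_mm(value), 1) for key, value in geometry.items()}


# A4 portrait page with 1-inch margins.
SAMPLE_SECT_PR = (
    f'<w:sectPr xmlns:w="{W_NS}">'
    '<w:pgSz w:w="11906" w:h="16838"/>'
    '<w:pgMar w:top="1440" w:right="1440" w:bottom="1440" w:left="1440"/>'
    "</w:sectPr>"
)

print(page_geometry_mm(SAMPLE_SECT_PR))
# {'width': 210.0, 'height': 297.0, 'margin-top': 25.4, 'margin-bottom': 25.4, 'margin-left': 25.4, 'margin-right': 25.4}
print(half_points_to_pt(24), emu_to_px(952500))  # 12.0 100.0
```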

    WARNING:

  • An element that is not parsed does not disappear from the output: only its own OOXML properties are not taken directly into account. Its children and text content are still parsed and rendered into the PDF output with their corresponding styles.
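To see which elements of a given DOCX fall outside the list above, a rough check is to count the w: elements in word/document.xml that are not named on this page. The sketch below is an illustration built on assumptions, not a phpdocx tool: SUPPORTED is an approximate, hand-made set taken from the list above plus the structural containers it implies (w:document, w:p, w:r, w:tr, w:tc, w:tblPr), and only the w: namespace is checked, since images and textboxes use other namespaces. As the warning explains, a reported element is not lost; only its own properties are ignored.

```python
import sys
import zipfile
import xml.etree.ElementTree as ET
from collections import Counter

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Approximate set of w: elements named on this page, plus the structural
# containers they live in. An assumption for illustration only.
SUPPORTED = {
    "document", "body", "sectPr", "pgSz", "pgMar", "pgBorders",
    "top", "bottom", "left", "right",
    "p", "pPr", "r", "rPr", "t", "b", "i", "color", "rFonts", "sz",
    "highlight", "strike", "u", "vanish", "vertAlign", "shd", "spacing",
    "pageBreakBefore", "jc", "ind", "numPr", "numId", "ilvl",
    "rStyle", "pStyle", "instrText", "fldSimple", "br", "ins", "del",
    "drawing", "tbl", "tblPr", "tblStyle", "tblBorders", "tblLayout", "tblW",
    "tblCellMar", "tr", "tc", "tcPr", "vMerge", "gridSpan",
}


def unparsed_in_xml(xml_data):
    """Count w: elements whose own properties would likely be ignored."""
    counts = Counter()
    for element in ET.fromstring(xml_data).iter():
        # ElementTree tags look like "{namespace}localName".
        namespace, _, local_name = element.tag.partition("}")
        if namespace == "{" + W_NS and local_name not in SUPPORTED:
            counts[local_name] += 1
    return counts


def unparsed_elements(docx_path):
    """Run the check on the main document part of a DOCX file."""
    with zipfile.ZipFile(docx_path) as docx:
        return unparsed_in_xml(docx.read("word/document.xml"))


if __name__ == "__main__" and len(sys.argv) > 1:
    for name, count in unparsed_elements(sys.argv[1]).most_common():
        print(f"w:{name}: {count}")
```

Run it as python check_docx.py template.docx; both the script name and the file name are placeholders.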

phpdocx requires DOMPDF to transform DOCX to PDF using the native conversion plugin:
