Tips to convert HTML to Word
The embedHTML method and its template counterpart, replaceVariableByHTML, convert HTML and CSS to Word while preserving contents and styles as faithfully as possible. To get the closest match to the original HTML and avoid errors, follow the good practices below.
Supported tags and styles
phpdocx supports nearly all the HTML tags and CSS styles that have an equivalent in MS Word.
The complete list of supported tags and styles is available on the phpdocx website.
If you use HTML5 tags (such as 'section' or 'main') with an old version of Tidy, upgrade to the latest Tidy release. Otherwise, some styles may not be applied to these tags.
Besides these HTML tags and CSS styles, you can also assign existing Word styles to classes, ids or specific tags when importing HTML, using the 'wordStyles' option.
The HTML Extended feature has been available since phpdocx 9. It lets you use custom HTML tags to invoke library methods, so you can add contents that standard HTML cannot express. With this feature you can use HTML to insert headers, footers, comments, TOCs, page numbers, WordFragments and many other contents.
Tidy, incorrect tagging, accents and other non-ASCII characters
For a proper HTML import, tags and styles must be correctly opened and closed. In other words, the structure of the code must be valid. phpdocx uses the PHP Tidy extension (http://php.net/manual/en/book.tidy.php) to correct the HTML and generate valid markup. This extension can be installed on any operating system that runs PHP.
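phpdocx runs this Tidy repair step for you. If you want to inspect what Tidy does to a fragment before importing it, you can call PHP's own tidy_repair_string. The following is a minimal sketch. The function name repairHtml is hypothetical, and the Tidy configuration shown is an assumption for illustration, not the configuration phpdocx uses internally.

```php
<?php
// Hypothetical helper: repairs malformed HTML with the Tidy extension.
// Assumption: if Tidy is not installed, the input is returned unchanged.
function repairHtml(string $html): string
{
    if (!extension_loaded('tidy')) {
        return $html;
    }
    // Illustrative Tidy options, not phpdocx's internal configuration.
    $config = [
        'output-xhtml'   => true, // emit well-formed, explicitly closed tags
        'show-body-only' => true, // return only the fragment, not a full document
        'wrap'           => 0,    // do not insert line breaks into long lines
    ];
    $repaired = tidy_repair_string($html, $config, 'utf8');
    return $repaired === false ? $html : $repaired;
}

// The <b> tag is never closed; Tidy adds the missing </b> before </p>.
echo repairHtml('<p>Unclosed <b>bold text</p>');
```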
To import HTML with accents, we also recommend installing the PHP mbstring extension so the character encoding can be detected automatically.
Without the Tidy extension, errors may occur. For example, CSS styles may appear as text in the document, the HTML may be imported incorrectly, or accents and other non-ASCII characters may not be displayed.
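If your HTML comes from a legacy source, you can normalize it to UTF-8 before importing it. The following is a minimal sketch using mbstring's mb_check_encoding and mb_convert_encoding. The function name toUtf8 and the Windows-1252 fallback are assumptions. The sketch uses a fixed fallback because mb_detect_encoding cannot reliably tell single-byte encodings such as ISO-8859-1 and Windows-1252 apart.

```php
<?php
// Hypothetical helper: returns the HTML as UTF-8.
// Assumption: input that is not valid UTF-8 is in the single-byte $fallback encoding.
function toUtf8(string $html, string $fallback = 'Windows-1252'): string
{
    if (mb_check_encoding($html, 'UTF-8')) {
        return $html; // already valid UTF-8 (plain ASCII also passes)
    }
    return mb_convert_encoding($html, 'UTF-8', $fallback);
}

// Byte \xE9 is "é" in Windows-1252 and ISO-8859-1, but invalid on its own in UTF-8.
echo toUtf8("<p>Caf\xE9</p>");
```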
Divide and Optimize
Although HTML and CSS import is highly optimized, transforming thousands of lines with varied tags and styles can affect performance.
For the best possible performance, divide the HTML you are importing. For example, instead of adding a 10,000-line HTML file with a single embedHTML call, split it into five HTML files and call embedHTML once for each.
This simple step can noticeably reduce CPU and memory consumption.
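Splitting raw HTML by line count can cut tags in half, so it is safer to split on top-level elements. The following is a minimal sketch using PHP's DOMDocument. It groups the top-level children of a UTF-8 fragment into roughly equal chunks, and each chunk can then be passed to its own embedHTML call. The function name splitHtml is hypothetical, and the sketch assumes the top-level elements can be imported independently. Because each embedHTML call is presumably parsed on its own, CSS declared in a `<style>` block in one chunk may not apply to the others, so inline styles are the safer choice.

```php
<?php
// Hypothetical helper: splits an HTML fragment into about $parts chunks of
// top-level elements, without breaking any element apart.
function splitHtml(string $html, int $parts): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true); // collect parser warnings instead of printing them
    // The meta tag makes libxml read the input as UTF-8 instead of ISO-8859-1.
    $dom->loadHTML(
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8"><body>'
        . $html . '</body>'
    );
    libxml_clear_errors();

    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return [];
    }

    // Collect top-level nodes, skipping whitespace-only text between elements.
    $nodes = [];
    foreach ($body->childNodes as $node) {
        if ($node instanceof DOMText && trim($node->textContent) === '') {
            continue;
        }
        $nodes[] = $node;
    }
    if ($nodes === []) {
        return [];
    }

    $size = (int) ceil(count($nodes) / max(1, $parts));
    $chunks = [];
    foreach (array_chunk($nodes, $size) as $group) {
        $fragment = '';
        foreach ($group as $node) {
            $fragment .= $dom->saveHTML($node); // serialize the node and its descendants
        }
        $chunks[] = $fragment;
    }
    return $chunks;
}

// Usage with phpdocx (assumes $docx is an existing CreateDocx instance):
// foreach (splitHtml($largeHtml, 5) as $chunk) {
//     $docx->embedHTML($chunk);
// }
```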
phpdocx 9 performance improvements
phpdocx 9 introduced several changes in the HTML to Word classes that, on average, reduce RAM usage by 60% and increase speed by 15% compared with previous versions.
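These figures are averages, so results on your own documents may differ. For example, to compare one large embedHTML call against several smaller ones, you could wrap each run in a small timing helper. The following is a minimal sketch that uses only core PHP functions. The function name measure is hypothetical, and the phpdocx calls in the usage comment are illustrative.

```php
<?php
// Hypothetical helper: measures wall-clock time and peak memory of a task.
function measure(callable $task): array
{
    if (function_exists('memory_reset_peak_usage')) {
        memory_reset_peak_usage(); // PHP 8.2+: restart peak tracking from current usage
    }
    $start = hrtime(true); // monotonic clock, in nanoseconds
    $task();
    $seconds = (hrtime(true) - $start) / 1e9;
    return [
        'seconds'   => $seconds,
        // Process-wide peak; before PHP 8.2 it also includes usage prior to the task.
        'peakBytes' => memory_get_peak_usage(),
    ];
}

// Usage (assumes $docx is a CreateDocx instance and splitHtml is defined):
// $single = measure(fn () => $docx->embedHTML($largeHtml));
// $split  = measure(function () use ($docx, $largeHtml) {
//     foreach (splitHtml($largeHtml, 5) as $chunk) {
//         $docx->embedHTML($chunk);
//     }
// });
```

Run each variant in a separate PHP process so that the peak memory of one run does not hide the other's on versions before 8.2.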