How to use div tags properly when importing html?

Posted by lynk  · 14-01-2021 - 06:13

Hi,
Recently, we integrated the Advanced version of phpdocx into Laravel to create DOCX files from HTML.
The HTML content contains common tags with inline styles.
Example:
<div style="padding: 30px 40px; background-color: yellow">Test docx</div>

It looks like this: https://share.getcloudapp.com/04uNj85Y
And the DOCX generated from it is: https://share.getcloudapp.com/o0uqdLZe
It seems the padding in the example is not recognized.

Actually, this is a pretty simple example. Our HTML content may include several kinds of tags, and more inline-styled content generated by TinyMCE.
Could you help me solve this problem with phpdocx (embedHTML)?

Thank you
 

Posted by admin  · 14-01-2021 - 07:01

Hello,

We recommend checking the available documentation on transforming HTML to Word:

https://www.phpdocx.com/documentation/introduction/html-to-word-PHP

https://www.phpdocx.com/documentation/introduction/html-extended-to-word-PHP (HTML Extended is only available in Premium licenses)

https://www.phpdocx.com/documentation/htmlapi-documentation

You are adding a div tag and, by default, phpdocx handles div tags as containers (quoting the documentation pages above):

div: Although this tag is probably the most frequent in modern HTML code, it does not have a direct translation into Word. phpdocx offers different parsing options:

· Use it only for CSS inheritance and parse its child elements accordingly.

· Parse it as a "p" element with the option "parseDivs" set to "paragraph" (this may be a useful option when using HTML code coming from a WYSIWYG editor).

· Parse it as a table with the option "parseDivs" set to "table". This may be the most accurate option if you want to preserve all available formatting, but it may produce complicated Word documents that are difficult to edit manually later (if that matters to you).

So if you want to use div tags as paragraph tags, you need to use the parseDivs option (or change them to p tags):

// Option 1: keep the div tag and parse it as a paragraph
$html = '<div style="padding: 30px 40px; background-color: yellow">Test docx</div>';
$docx->embedHTML($html, array('parseDivs' => 'paragraph'));

// Option 2: use a p tag instead of the div tag
$html = '<p style="padding: 30px 40px; background-color: yellow">Test docx</p>';
$docx->embedHTML($html);
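
When the HTML comes from an editor and cannot be changed by hand, the "change them to p tags" route could be done in PHP before calling embedHTML. This is only a minimal sketch: divsToParagraphs is a hypothetical helper (not part of phpdocx) that relies on PHP's DOM extension, and it assumes the div tags only wrap inline content, since a p tag cannot contain other block tags.

```php
<?php
// Hypothetical helper, not part of phpdocx: renames every <div> to <p>,
// keeping its attributes (inline styles included) and its child nodes.
function divsToParagraphs(string $html): string
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // silence warnings on loose HTML
    // The XML prolog is a common workaround to make loadHTML read UTF-8.
    $dom->loadHTML('<?xml encoding="UTF-8">' . $html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Copy the live node list first, because the loop modifies the tree.
    $divs = iterator_to_array($dom->getElementsByTagName('div'));
    foreach ($divs as $div) {
        $p = $dom->createElement('p');
        foreach ($div->attributes as $attr) {
            $p->setAttribute($attr->nodeName, $attr->nodeValue);
        }
        while ($div->firstChild !== null) {
            $p->appendChild($div->firstChild); // moves the child into the <p>
        }
        $div->parentNode->replaceChild($p, $div);
    }

    // loadHTML wraps the fragment in <html><body>; return only the body content.
    $body = $dom->getElementsByTagName('body')->item(0);
    $result = '';
    if ($body !== null) {
        foreach ($body->childNodes as $child) {
            $result .= $dom->saveHTML($child);
        }
    }
    return $result;
}

$html = '<div style="padding: 30px 40px; background-color: yellow">Test docx</div>';
$converted = divsToParagraphs($html);
// $docx->embedHTML($converted); // $docx as in the examples in this thread
```

For nested or more complex markup, the parseDivs option is probably the safer choice, because it leaves the handling of div tags to phpdocx itself.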

Also, please note that not all CSS styles have a direct translation to MS Word. For example, MS Word doesn't allow setting padding on run-of-text contents (such as inline text), only on block elements such as paragraphs or tables. To add space to run-of-text tags, you'd need to add blank spaces as MS Word does (or tabs, but these are only supported in Premium licenses when transforming HTML).

Regards.

Posted by lynk  · 14-01-2021 - 14:30

Hi, thanks for the quick support.
Could you please check this screenshot?
https://share.getcloudapp.com/mXu5gw1m

As you can see in the screenshot, our goal is to get a Word DOCX from an HTML document generated by the TinyMCE editor, so it must match the original.
I do understand there are many differences between MS Word and HTML tags/styles.
But I don't think it's possible for me to define those conversion rules myself, because our HTML document is generated dynamically by the TinyMCE editor and I am not familiar with the MS Word document structure.
So we are going to upgrade our plan to Premium now. But I'd like you to confirm whether we can reach our goal with the Premium license.
Looking forward to your help again.

Thank you

Posted by admin  · 14-01-2021 - 15:30

Hello,

Getting the same DOCX output as the linked image depends on the HTML tags and CSS styles it uses. phpdocx supports many HTML tags and CSS styles, and Premium licenses extend them with custom HTML tags and CSS styles (mainly to support features that are not standard HTML and CSS, for example adding a TOC using a tag or setting MS Word styles using data attributes). But to transform HTML and CSS correctly, the contents and styles need to be supported and have an equivalent in MS Word.

You can get the output you need, but only if your WYSIWYG editor uses the supported HTML and CSS. All tags, attributes and styles supported by the HTML to DOCX feature are detailed in the HTML API documentation: https://www.phpdocx.com/documentation/htmlapi-documentation. For example:

· p tags (https://www.phpdocx.com/htmlapi-documentation/html-standard/insert-paragraph-text-Word-document-with-HTML): as you can check on that page, some styles may not be supported. For example, setting a width on a paragraph is not supported because MS Word doesn't support it.

· span tags (https://www.phpdocx.com/htmlapi-documentation/html-standard/insert-span-text-Word-document-with-HTML): for example, text-align isn't supported because MS Word doesn't allow setting it on run-of-text contents; text-align needs to be applied to block tags such as p, div or table.

We never ask users to get familiar with the internal DOCX structure, but HTML to DOCX is limited by the contents and styles that MS Word supports and by how it handles them.

Regards.

Posted by lynk  · 14-01-2021 - 18:04

Makes sense. It helps us. Thank you