Not all images are in created document
Topic closed:
Please note this is an old forum thread. Information in this post may be out-of-date and/or erroneous.
Every phpdocx version includes new features and improvements. Previously unsupported features may have been added to newer releases, or past issues may have been corrected.
We encourage you to download the current phpdocx version and check the Documentation available.

Posted by netengine_at  · 24-11-2021 - 06:51

Hi,

I have the following problem: I use embedHTML to add an HTML template to a Word document.
This works most of the time, but with certain templates the images are missing. I've checked the HTML code for errors (missing closing tags, etc.) and whether the images missing from the DOCX are accessible, and both look fine. How can I find out what is causing the problem? I'm at a loss because it seems so random whether the images are there or missing... For example, I have a 500-page document in which about one tenth of the images are missing.

I'm using version 12 of phpdocx:
$docx->embedHTML($myHTMLcode, $embedHTMLoptions);

Posted by admin  · 24-11-2021 - 07:22

Hello,

The embedHTML and replaceVariableByHTML methods download the images to be added using PHP's file_get_contents function (the downloadImages option is true by default; if it is set to false, the images are not downloaded but added as external images). If the error is not always the same (i.e., the images that aren't added differ from one execution of the script to the next), maybe an image download or the script itself is timing out? Or is the external server restricting access to the files?
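To test the timeout/access theory outside phpdocx, one option is to request a missing image URL directly with file_get_contents from the same server. This is a minimal sketch, not phpdocx code: the helper names `checkImageUrl` and `lastStatusCode` and the 10-second default timeout are hypothetical choices. It assumes `allow_url_fopen` is enabled (as the file_get_contents download already requires) and uses the standard HTTP stream context options `timeout` and `ignore_errors` so that the status code and any PHP warning are reported instead of silently discarded.

```php
<?php
// Hypothetical diagnostic helpers (not part of phpdocx).

// Return the code of the last HTTP status line in a response-header array
// (the final response after any redirects), or null if there is none.
function lastStatusCode(array $headers): ?int
{
    $code = null;
    foreach ($headers as $line) {
        if (preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            $code = (int) $m[1];
        }
    }
    return $code;
}

// Read one image with file_get_contents, with an explicit timeout, and
// report the HTTP status, size and PHP warning (if any).
function checkImageUrl(string $url, float $timeout = 10.0): array
{
    $context = stream_context_create([
        'http' => [
            'timeout'       => $timeout, // seconds to wait for the remote server
            'ignore_errors' => true,     // keep headers/body on 4xx and 5xx replies
        ],
    ]);

    error_clear_last();
    $body = @file_get_contents($url, false, $context);
    // Filled in by the HTTP stream wrapper in this scope; absent for local
    // files or when no connection could be made at all.
    $status = lastStatusCode($http_response_header ?? []);
    $last = error_get_last();

    return [
        'ok'     => $body !== false && ($status === null || ($status >= 200 && $status < 300)),
        'status' => $status,
        'bytes'  => $body === false ? 0 : strlen($body),
        'error'  => $body === false ? ($last['message'] ?? 'unknown error') : null,
    ];
}
```

Running `var_dump(checkImageUrl($url));` for one of the missing image URLs, from the same server and PHP configuration that generates the DOCX, should show either a non-2xx status (for example 403 or 404) or the warning text of a timeout or connection failure.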

On https://www.phpdocx.com/documentation/cookbook/insert-images-html you can find some documentation about adding images from HTML.

We recommend checking the logs on your server and on the remote server to see whether the images are being read correctly.

If you send the simplest possible script, with an HTML sample of what you are transforming, to contact[at]phpdocx.com, we'll do some tests. If you also send a sample DOCX output with missing images (please as small as possible), we'll check it too.

Regards.

Posted by netengine_at  · 24-11-2021 - 08:25

Thank you for the quick response, I'll first have a look at the logs and if I can't find anything I'll get back to you.

Posted by netengine_at  · 26-11-2021 - 07:43

Hello,

So I've been trying to find out what is going wrong. As I mentioned before, sometimes the images are in the document and sometimes they are missing. I've added links to two log files in which I saved the variables when the document is created: one where the images are present (debug2.log) and one where they are missing (debug.log). Maybe you could have a look at the files and give me some feedback.

At about line 1138 there is a difference in the Word code...


https://www.service-host.at/kunden/debug.log
https://www.service-host.at/kunden/debug2.log

 

Posted by admin  · 26-11-2021 - 08:54

Hello,

The only difference between the logs is that some images can't be downloaded, so the Word content for those images is not added to the document (the line 1138 you point out, and others).

The embedHTML and replaceVariableByHTML methods work the same way as a web browser: if an image can't be read/downloaded, it's not added (PHP's file_get_contents is used to download images).

If we compare both logs, we can see that the following images are added correctly in both cases:

https://www.energiebericht.net/images/icons-doc/warmth.png
https://www.energiebericht.net/images/icons-doc/electricity.png
https://www.energiebericht.net/images/icons-doc/water.png

but the missing images are:

https://www.energiebericht.net/cache/408/2020/pages/cobjects/8002/warmthConsumption.png
https://www.energiebericht.net/cache/408/2020/pages/cobjects/8002/electricityConsumption.png
https://www.energiebericht.net/cache/408/2020/pages/cobjects/8002/waterConsumption.png
https://www.energiebericht.net/cache/408/2020/pages/cobjects/8002/energy_indicator_27.png
https://www.energiebericht.net/cache/408/2020/pages/cobjects/8002/energy_indicator_1.png
https://www.energiebericht.net/cache/408/2020/pages/cobjects/8002/energy_indicator_2.png
https://www.energiebericht.net/cache/408/2020/pages/cobjects/8002/energy_indicator_3.png

These are the images from https://www.energiebericht.net/cache/ . Maybe the images in this remote folder are auto-generated (it seems to be a cache folder) and not available in some cases (for example, while they are still being generated)? The problem is that PHP can't read these images for some reason we don't know.

We have tested the HTML from your log using embedHTML, and in all cases the images are added perfectly (all images can be read). The best approach would be to check the web server logs on the remote server (https://www.energiebericht.net/cache/ , where the images should exist); those logs should show why the images can't be read/downloaded (404 Not Found, 403 Forbidden, or another error) when you run the script.
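To find out which images fail in a given run, without comparing two logs by hand, one approach is to extract every image URL from the HTML passed to embedHTML and try to read each one with file_get_contents before generating the document. This is a minimal sketch under stated assumptions: `imageSources` and `unreadableImages` are hypothetical helper names (not phpdocx functions), the DOM extension is assumed to be available, and the src values are tested exactly as written in the HTML (phpdocx's own `parseURL` normalization is not reproduced).

```php
<?php
// Hypothetical diagnostic helpers (not part of phpdocx).

// Collect the unique src values of all <img> tags in an HTML string.
function imageSources(string $html): array
{
    if (trim($html) === '') {
        return []; // DOMDocument::loadHTML rejects empty input
    }
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // don't print markup warnings
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $sources = [];
    foreach ($dom->getElementsByTagName('img') as $img) {
        $src = trim($img->getAttribute('src'));
        if ($src !== '') {
            $sources[] = $src;
        }
    }
    return array_values(array_unique($sources));
}

// Try to read every image and return [src => PHP warning message]
// for each one that file_get_contents could not read.
function unreadableImages(string $html): array
{
    $failures = [];
    foreach (imageSources($html) as $src) {
        error_clear_last();
        if (@file_get_contents($src) === false) {
            $last = error_get_last();
            $failures[$src] = $last['message'] ?? 'unknown error';
        }
    }
    return $failures;
}
```

Calling `print_r(unreadableImages($myHTMLcode));` just before `$docx->embedHTML($myHTMLcode, $embedHTMLoptions);` would list the URLs that fail in that particular run, together with the reason PHP gives, which can then be matched against the remote server's access and error logs.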

As the embedHTML and replaceVariableByHTML methods suppress file_get_contents warnings when downloading an image (to avoid false warnings and to behave like web browsers), you can edit HTML2WordML.php to debug it in depth. In this file you can find the following line (around line 1400):

$photo = @file_get_contents($this->parseURL($nodo['attributes']['src']));

if you remove @:

$photo = file_get_contents($this->parseURL($nodo['attributes']['src']));

and run the script in PHP CLI mode; if an image download fails, PHP should print a warning explaining why.

The remote web server logs should explain why the images can't be read (404, 403, 500...).

Regards.

Posted by netengine_at  · 26-11-2021 - 09:05

Hi,

Thank you very much for the detailed answer! Strangely, there is no error in the PHP log, and even if I don't download the images and only link them, they are still missing from the document. I will add a "try catch" around $photo = file_get_contents(...) and see if I get a message from the server.
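One caveat about the try/catch idea: file_get_contents does not throw an exception when a download fails. It returns `false` and raises a PHP warning (which the `@` operator hides), so a plain `try`/`catch` around it catches nothing unless warnings are first converted into exceptions. A minimal sketch using `set_error_handler` and `ErrorException`; `fetchOrThrow` is a hypothetical helper name, not part of phpdocx:

```php
<?php
// Hypothetical wrapper: run file_get_contents with PHP warnings converted
// into ErrorException, so a surrounding try/catch receives the failure.
function fetchOrThrow(string $url): string
{
    set_error_handler(function (int $severity, string $message, string $file, int $line): bool {
        throw new ErrorException($message, 0, $severity, $file, $line);
    });
    try {
        $data = file_get_contents($url);
    } finally {
        restore_error_handler(); // always put the previous handler back
    }
    if ($data === false) {
        // Failure without a warning (unusual, but keep the contract explicit)
        throw new RuntimeException("Could not read $url");
    }
    return $data;
}
```

Wrapping a call for one of the missing image URLs in `try { fetchOrThrow($url); } catch (ErrorException $e) { echo $e->getMessage(); }` should print the same warning text that removing `@` in HTML2WordML.php would show, without editing the library.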

 

Best wishes!