

Replies: 11   Views: 120
Transforming DOCX to PDF using MS Word with PHP COM: closing is slow

Posted by Bertil  · 07-05-2024 - 16:57

Hi, I am struggling with a slow conversion that looks like a timeout being hit somewhere.

I want to convert a DOCX to PDF using MS Word, so I am using the transformDocument method. It works fine, but it takes more than 30 seconds. I added logs to see what was taking time in the library, specifically in TransformDocAdvMSWord.php, which can be summarised like this:

// start a Word instance
$MSWordInstance = new COM("word.application");

// open the source document
$MSWordInstance->Documents->Open($source);

// save the target document
$MSWordInstance->ActiveDocument->SaveAs($target, $code[$filesExtensions['targetExtension']]);

// close Word
$MSWordInstance->Quit();

// free memory
$MSWordInstance = null;

The culprit is the free memory section: that line alone takes 30 seconds.

$MSWordInstance = null;
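The per-step logging described above can be sketched with a small timing helper. timeStep() is a hypothetical name, not part of phpdocx; it assumes PHP 7.3+ for the monotonic hrtime() clock and writes each measurement to the PHP error log with error_log().

```php
<?php
/**
 * Hypothetical helper (not a phpdocx API): runs one step of the conversion,
 * logs how long it took, and returns the step's result plus the elapsed seconds.
 */
function timeStep(string $label, callable $step): array
{
    $start = hrtime(true);                    // monotonic clock, in nanoseconds
    $result = $step();                        // run the wrapped step
    $seconds = (hrtime(true) - $start) / 1e9; // convert nanoseconds to seconds
    error_log(sprintf('%s: %.3f s', $label, $seconds));
    return [$result, $seconds];
}
```

For example, timeStep('free memory', function () use (&$MSWordInstance) { $MSWordInstance = null; }) would log the time spent releasing the COM instance, since the closure sets the outer variable to null by reference.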

Meanwhile, the PDF is created almost instantly, but I get no response until those 30 seconds have passed.

I don't know if it's related, but I have "COM Surrogate" processes that stay idle even after the conversion, whereas the "word" processes are closed automatically after the conversion.

That fixed 30 seconds makes me think of a timeout, as if Apache couldn't terminate the COM object (I'm using Apache + PHP/CGI on Windows Server).

Do you have any idea? Thanks in advance!

Posted by admin  · 07-05-2024 - 17:44

Hello,

Please test the same conversion standalone using PHP CLI, so you can check whether the issue comes from the Apache configuration or from some external setting.

Also, please check whether performance changes when calling Release() to close the PHP COM instance. In TransformDocAdvMSWord.php, instead of:

$MSWordInstance->Quit();

$MSWordInstance = null;

Please try using:

$MSWordInstance->Quit();
$MSWordInstance->Release();

$MSWordInstance = null;

Calling Release() is not needed to close the COM instance, but it may improve performance.

If you send the DOCX you are transforming to contact[at]phpdocx.com, we can test it on our test server to check the performance.

Regards.

Posted by Bertil  · 10-05-2024 - 09:47

Thanks for your reply. I tried the same conversion using PHP CLI and, sadly, I have the same issue.

But when I call Release() after Quit(), I get this error: COMException (0x800706BA): The RPC server is unavailable. That seems understandable since I asked Word to quit, but if I call Release() before Quit(), I get: COMException (0x80020003): Member not found.

If I comment out the free memory section, I still have a 30-second wait, I suppose because PHP releases the COM object anyway at the end of the script.

Posted by admin  · 10-05-2024 - 10:11

Hello,

Please note that PHP should log or display a warning or error when it exceeds a limit (memory, time...). Do you get any information in the server logs or stdout when running the conversion?

As the same issue happens in PHP CLI mode, some external setting or program must be causing it (an antivirus or other external program? a missing permission (https://www.php.net/manual/es/ref.com.php#120122)?). Does the same problem happen with another document, for example a very simple DOCX that contains only one text paragraph?

What happens if the code ends the PHP execution before releasing the PHP COM instance?

exit;
$MSWordInstance->Quit();

$MSWordInstance = null;

What Windows, MS Word and PHP versions are you using? If you send a sample of the DOCX you are transforming to contact[at]phpdocx.com, we'll test it on our Windows test server.

Regards.

Posted by Bertil  · 10-05-2024 - 11:09

I'm logging all errors, warnings and notices, and none of them is related to this process; that would have been too easy.
I had already done everything described at the link you provided; that's how I got the conversion working.
The issue can be reproduced with any document, even a blank DOCX.
 
Something interesting, though: if I exit before Quit(), the response is very fast and the conversion works fine, but the word.application process is still running in the task manager.
Without the exit, I can see the word.application process close after the 30 seconds.
If I set a 10-second time limit, the script still takes 30 seconds and then fails with an error 500.

Using Windows Server 2022, Office LTSC Professional Plus 2021, PHP 8.0.28 (nts-Win32-vs16-x64) and Phpdocx 14.5 premium.

I'm starting to think about doing it asynchronously...

Posted by admin  · 10-05-2024 - 11:20

Hello,

We have run some tests with PHP 8.0, Windows Server 2022 and MS Word 2019, and the conversion works correctly in all cases (with no extra time to release the PHP COM instance).

As an alternative approach (and a good test), your code could kill the WINWORD process from PHP, as detailed in the following reply:

https://www.phpdocx.com/en/forum/default/topic/2438

Instead of calling Quit() from PHP COM, please try killing the WINWORD process to check whether the script then ends without the extra time. It's a very weird issue: we are running every test we can on our test servers, but we are unable to reproduce it.

Regards.

Posted by Bertil  · 10-05-2024 - 11:45

It works like a charm, but I don't see how to implement it: I don't know the PID of the Word instance that PHP created, and I can't kill all the instances because several processes can be launched in parallel.

To add to the mystery, I have a similar environment with standalone Word LTSC (not the Office suite), and "it works on my computer". I will try installing the same version.

Posted by admin  · 10-05-2024 - 12:03

Hello,

If you need to run parallel conversions on the same OS, killing the MS Word process is not a proper solution, because Windows may use a single PID for MS Word (all DOCX documents opened with MS Word may share the same PID rather than getting one PID per document).

In this case, trying another MS Word version is a good test (perhaps one with a valid license that doesn't require an online check).

Regards.

Posted by Bertil  · 10-05-2024 - 15:00

I don't agree with the single-PID explanation: if I run multiple conversions simultaneously, I can see Windows create a PID for each conversion. Moreover, if I don't call Quit(), I can see the Word instances piling up.

I managed to install the same version of Word on both servers, but the issue remains. The worst part is that the environment with the extra-time issue is the one with the most resources, and its CPU and memory usage are very low.

I will get back to this issue on Tuesday. Have a nice weekend!

Posted by admin  · 10-05-2024 - 15:28

Hello,

If you open MS Word manually and then open two or more DOCX documents, you can see in the task manager that only one PID (named WINWORD.EXE) is assigned to all the documents. You can get the same information by running:

tasklist /svc | findstr "WINWORD"

Please note we are not saying that PHP COM can't generate multiple PIDs (one for each instance), but we think that running parallel conversions on the same OS and killing the WINWORD process (with taskkill) is not a proper solution, because it may close more than one document at the same time. We don't see any direct way to get the PID of the program running in a PHP COM instance.
If you run:

$MSWordInstance = new COM("word.application") or PhpdocxLogger::logger('Check that PHP COM is enabled and a working copy of Word is installed.', 'fatal');
$tasklist = shell_exec('tasklist /svc /FO CSV | findstr "WINWORD"');

you get a CSV listing of the WINWORD processes, including each PID. The last entry will be the most recently created process, unless another process created one at exactly the same moment. This information can be used to kill the related PID:

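$rows = preg_split('/\R/', trim((string) $tasklist)); // one CSV row per WINWORD process
$lastRow = str_getcsv(end($rows), ',', '"', ''); // e.g. "WINWORD.EXE","1234","N/A"
$pid_value = (int) $lastRow[1]; // the second column holds the PID
shell_exec('taskkill /PID ' . $pid_value);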
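A slightly more defensive variant, sketched under assumptions: instead of taking the last row, it snapshots the WINWORD PIDs before and after creating the COM instance and only targets a PID that appeared in between. parseWinwordPids() and newPids() are hypothetical helper names, and the sketch assumes each row of `tasklist /svc /FO CSV | findstr "WINWORD"` has the PID in its second column. It narrows the race with parallel conversions but does not remove it: two instances started at the same moment would both show up as new.

```php
<?php
/**
 * Hypothetical helper: extracts the PIDs from the CSV output of
 * `tasklist /svc /FO CSV | findstr "WINWORD"`.
 * Each row is expected to look like: "WINWORD.EXE","1234","N/A"
 */
function parseWinwordPids(string $tasklistCsv): array
{
    $pids = [];
    foreach (preg_split('/\R/', trim($tasklistCsv)) as $line) {
        if ($line === '') {
            continue; // no WINWORD rows at all
        }
        // Empty escape character: plain RFC 4180 quoting.
        $fields = str_getcsv($line, ',', '"', '');
        // Column 0 is the image name, column 1 the PID.
        if (isset($fields[1]) && preg_match('/^\d+$/', $fields[1])) {
            $pids[] = (int) $fields[1];
        }
    }
    return $pids;
}

/**
 * Hypothetical helper: returns the PIDs present in $after but not in $before.
 */
function newPids(array $before, array $after): array
{
    return array_values(array_diff($after, $before));
}
```

In use, parseWinwordPids((string) shell_exec(...)) would be called once before and once after new COM("word.application"), and taskkill /PID would only be run when newPids() returns exactly one PID.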

Of course, the best approach is to find out why that server is not working correctly and adds extra time when the PHP COM instance is closed. Some external setting, program or permission on the server (we don't know which) may be blocking the normal PHP COM workflow. You say that one of your servers doesn't have the same issue, so there must be some configuration, program or setting difference between the server that works as expected and the server that adds extra time when closing the PHP COM instance.

Regards.

Posted by Bertil  · 16-05-2024 - 08:34

Hi there, good news: problem solved. Even better, the issue was not related to phpdocx but to the firewall, which blocked access to DigiCert. I don't know why it never happened in the application itself, only with phpdocx.

Thanks a lot for the time spent helping me :)

Regards

Posted by admin  · 16-05-2024 - 08:48

Hello,

Thanks for your feedback. It was a weird issue; we were sure it wasn't related to phpdocx but to some external setting. In this case, it seems that MS Word was trying to get some resource from DigiCert, and the timeout was caused by the firewall blocking that access.

Regards.