How to extract XML from .doc or .docx?

Posted by kupa369  · 13-03-2024 - 07:32

 

I need a way to convert from doc and docx to xml. Can I use phpdocx to convert doc or docx to xml without any loss?

Posted by admin  · 13-03-2024 - 09:35

Hello,

Please note that a DOCX document contains XML files plus other optional files such as images, XLSX files, and binary files. A DOCX is a ZIP file, so if you extract it you can view the included files. A DOCX is not a single XML file but a set of XML contents that follow the OOXML standard (https://en.wikipedia.org/wiki/Office_Open_XML).
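
To illustrate the ZIP structure described above, here is a minimal sketch that uses only the Python standard library, not phpdocx. The function names (list_xml_parts, read_part, make_demo_package) are made up for this example, and the demo package is a tiny stand-in rather than a valid Word document.

```python
import io
import zipfile


def list_xml_parts(docx_source):
    """Return the sorted names of the XML parts inside a DOCX package."""
    with zipfile.ZipFile(docx_source) as package:
        return sorted(
            name for name in package.namelist()
            # .rels relationship files are XML as well
            if name.endswith((".xml", ".rels"))
        )


def read_part(docx_source, part_name="word/document.xml"):
    """Return the raw bytes of one part; the main body is word/document.xml."""
    with zipfile.ZipFile(docx_source) as package:
        return package.read(part_name)


def make_demo_package():
    """Build an in-memory ZIP that mimics the layout of a DOCX (demo only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("[Content_Types].xml", "<Types/>")
        package.writestr("word/document.xml", "<w:document/>")
        package.writestr("word/media/image1.png", b"\x89PNG")
    buffer.seek(0)
    return buffer
```

With a real file you would pass its path instead, for example list_xml_parts("report.docx"). Images and other binary parts stay in the archive and are not part of any single XML file, which is why extracting one XML part alone does not give you the whole document.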

phpdocx includes methods to extract information and XML contents from DOCX documents, for example getDocxPathQueryInfo, getWordStyles, indexer... You can also transform DOCX to HTML and other document formats.

Regards.

Posted by kupa369  · 14-03-2024 - 00:51

 

I asked the wrong question. I am dealing with Word documents from the 2003 version (doc). I want to convert doc documents to XML without any loss. Is there no way to directly map doc to XML instead of converting doc to docx and then extracting the XML?

Posted by admin  · 14-03-2024 - 08:38

Hello,

phpdocx doesn't include a direct conversion from DOC to XML. You need to transform DOC to DOCX using transformDocument to be able to use phpdocx methods to get XML contents and information.
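
For readers who want to see what the DOC to DOCX step looks like outside phpdocx, here is a rough sketch that shells out to the LibreOffice command line. This is a swapped-in technique, not the phpdocx transformDocument method: it assumes LibreOffice is installed and that its soffice binary is on the PATH, and the function names are made up for this example. How faithful the result is depends on the converter, so "without any loss" is not guaranteed.

```python
import subprocess
from pathlib import Path


def build_convert_command(doc_path, out_dir, soffice="soffice"):
    """Build the LibreOffice headless command that converts a file to DOCX."""
    return [
        soffice,
        "--headless",            # run without a GUI
        "--convert-to", "docx",  # target format
        "--outdir", str(out_dir),
        str(doc_path),
    ]


def convert_doc_to_docx(doc_path, out_dir, soffice="soffice"):
    """Run the conversion and return the path where the DOCX is expected.

    Assumption: LibreOffice writes <original stem>.docx into out_dir.
    Raises subprocess.CalledProcessError if soffice exits with an error.
    """
    subprocess.run(build_convert_command(doc_path, out_dir, soffice), check=True)
    return Path(out_dir) / (Path(doc_path).stem + ".docx")
```

Once the DOCX exists, its XML contents can be read like those of any other DOCX.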

Regards.