How to extract XML from .doc or .docx?

Posted by admin  · 13-03-2024 - 09:35

Hello,

Please note that a DOCX document contains XML files and, optionally, other files such as images, embedded XLSX workbooks and other binary files. A DOCX is a ZIP archive, so if you extract it you can view all the included files. It is not a single XML file but a package of many XML parts that follow the OOXML standard (https://en.wikipedia.org/wiki/Office_Open_XML).
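Because a DOCX is a ZIP archive, its XML parts can also be inspected without any Office-specific library. Below is a minimal Python sketch using only the standard `zipfile` module (this is not phpdocx); the function names are hypothetical, and `word/document.xml` is assumed to be the main body part, as it is in ordinary Word files. Legacy .doc files are a binary format rather than ZIP, so `zipfile` will reject them with `BadZipFile`.

```python
import zipfile

# A DOCX is an ordinary ZIP archive, so zipfile can open it directly.
# `path` may be a filename or a binary file-like object.


def list_docx_parts(path):
    """Return the names of all parts (files) stored in the DOCX package."""
    with zipfile.ZipFile(path) as package:
        return package.namelist()


def read_docx_part(path, part_name="word/document.xml"):
    """Return the raw XML of a single part.

    Assumes the part is UTF-8 encoded, which is what Word normally writes.
    """
    with zipfile.ZipFile(path) as package:
        return package.read(part_name).decode("utf-8")


def extract_xml_parts(path, target_dir):
    """Extract only the XML parts (.xml and .rels) into target_dir.

    Images and other binary parts are skipped. Returns the extracted names.
    """
    with zipfile.ZipFile(path) as package:
        xml_parts = [
            name for name in package.namelist()
            if name.endswith((".xml", ".rels"))
        ]
        package.extractall(target_dir, members=xml_parts)
    return xml_parts
```

For example, `list_docx_parts("example.docx")` (a hypothetical file) typically lists entries such as `[Content_Types].xml`, `_rels/.rels`, `word/document.xml` and `word/styles.xml`, plus anything under `word/media/` if the document contains images.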

phpdocx includes methods to extract information and XML content from DOCX documents, for example getDocxPathQueryInfo, getWordStyles and the indexer, among others. You can also transform DOCX to HTML and other document formats.
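To show the kind of XML a method such as getWordStyles works with, here is a rough standard-library Python sketch that reads style definitions straight from the `word/styles.xml` part. It is not phpdocx's implementation; the function and helper names are hypothetical, it assumes the main WordprocessingML namespace used by OOXML, and it returns an empty list when the package has no styles part.

```python
import zipfile
import xml.etree.ElementTree as ET

# Main WordprocessingML namespace used by OOXML (DOCX) parts.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}


def w_attr(element, name):
    """Read a w:-prefixed attribute (ElementTree stores it as {namespace}name)."""
    return element.get(f"{{{W_NS}}}{name}")


def list_word_styles(path):
    """Return (styleId, type, display name) tuples from word/styles.xml.

    Returns an empty list if the package has no styles part.
    """
    with zipfile.ZipFile(path) as package:
        if "word/styles.xml" not in package.namelist():
            return []
        root = ET.fromstring(package.read("word/styles.xml"))
    styles = []
    # Each <w:style> is a direct child of the root <w:styles> element.
    for style in root.findall("w:style", NS):
        name_el = style.find("w:name", NS)
        styles.append((
            w_attr(style, "styleId"),
            w_attr(style, "type"),
            w_attr(name_el, "val") if name_el is not None else None,
        ))
    return styles
```

The `type` value distinguishes paragraph, character, table and numbering styles, so the result can be filtered further if only one kind is needed.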

Regards.