Hello,
The current stable version of phpdocx doesn't include a direct method to extract URL links with their related text contents.
Please note that MS Word and other DOCX editors can generate links using two tags: hyperlinks (w:hyperlink) and fields (w:instrText).
If your DOCX only contains w:hyperlinks, you can extract the needed information using a custom code:
$docx = new CreateDocxFromTemplate('document.docx');
// get hyperlinks
$referenceNode = array(
'customQuery' => '//w:hyperlink',
);
$hyperlinksInfo = $docx->getDocxPathQueryInfo($referenceNode);
// get rels content
$xmlRelsContent = $docx->getWordFiles('word/_rels/document.xml.rels');
$xmlUtilities = new XmlUtilities();
$contentRelsDOM = $xmlUtilities->generateDomDocument($xmlRelsContent);
$contentRelsXpath = new DOMXPath($contentRelsDOM);
$contentRelsXpath->registerNamespace('rel', 'http://schemas.openxmlformats.org/package/2006/relationships');
// get hyperlinks information
$hyperlinks = array();
foreach ($hyperlinksInfo['elements'] as $hyperlinkInfo) {
$hyperlinkEntries = $contentRelsXpath->query('//rel:Relationship[@Id="'.$hyperlinkInfo->getAttribute('r:id').'"]');
if ($hyperlinkEntries->length > 0) {
$hyperlinks[] = array(
'textContent' => $hyperlinkInfo->textContent,
'target' => $hyperlinkEntries->item(0)->getAttribute('Target'),
);
}
}
var_dump($hyperlinks);
We have opened a task to the dev team, and they have added support in the testing branch to extract this information (from hyperlinks and fields) using Indexer:
- linksContents option to get URL and text content from hyperlinks.
Your phpdocx 15.5 license doesn't include LUS (https://www.phpdocx.com/support). If you upgrade to phpdocx 16 and include LUS you can access these changes from the testing branch (https://www.phpdocx.com/support).
If you upgrade your license, please send to contact[at]phpdocx.com if you are using the classic or namespaces package. We'll send you the updated Indexer class with a custom sample.
Regards.