I need to extract hyperlinks and their text from a .docx

Posted by admin  · 14-07-2025 - 07:04

Hello,

The current stable version of phpdocx doesn't include a direct method to extract hyperlink URLs together with their text contents.
Please note that MS Word and other DOCX editors can store links in two ways: as hyperlink elements (w:hyperlink) and as field instructions (w:instrText).

If your DOCX only contains w:hyperlink elements, you can extract the needed information with custom code:

$docx = new CreateDocxFromTemplate('document.docx');

// get all w:hyperlink elements from the document
$referenceNode = array(
    'customQuery' => '//w:hyperlink',
);
$hyperlinksInfo = $docx->getDocxPathQueryInfo($referenceNode);

// load the relationships file, which maps each hyperlink r:id to its target URL
$xmlRelsContent = $docx->getWordFiles('word/_rels/document.xml.rels');
$xmlUtilities = new XmlUtilities();
$contentRelsDOM = $xmlUtilities->generateDomDocument($xmlRelsContent);
$contentRelsXpath = new DOMXPath($contentRelsDOM);
$contentRelsXpath->registerNamespace('rel', 'http://schemas.openxmlformats.org/package/2006/relationships');

// resolve each hyperlink r:id to its target; hyperlinks without a matching relationship are skipped
$hyperlinks = array();
foreach ($hyperlinksInfo['elements'] as $hyperlinkInfo) {
    $hyperlinkEntries = $contentRelsXpath->query('//rel:Relationship[@Id="'.$hyperlinkInfo->getAttribute('r:id').'"]');
    if ($hyperlinkEntries->length > 0) {
        $hyperlinks[] = array(
            'textContent' => $hyperlinkInfo->textContent,
            'target' => $hyperlinkEntries->item(0)->getAttribute('Target'),
        );
    }
}

var_dump($hyperlinks);
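
For links stored as fields, the URL has to be read from the field code instead of the relationships file. The following is only a minimal sketch using PHP's DOM extension, not a phpdocx method: extractFieldHyperlinks is a hypothetical helper name, the sample XML and URL are made up for illustration, and it assumes simple, non-nested fields built with w:fldChar and w:instrText (it ignores w:fldSimple and HYPERLINK fields that only use the \l bookmark switch).

```php
<?php
// Hypothetical helper (not part of phpdocx): collects HYPERLINK fields
// from the raw XML of word/document.xml.
function extractFieldHyperlinks(string $documentXml): array
{
    $wNs = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main';

    $dom = new DOMDocument();
    $dom->loadXML($documentXml);
    $xpath = new DOMXPath($dom);
    $xpath->registerNamespace('w', $wNs);

    $links = array();
    $state = null;      // null: outside a field, 'instr': field code, 'result': visible text
    $instruction = '';
    $text = '';

    // a single location path, so nodes arrive in document order
    $nodes = $xpath->query('//w:*[self::w:fldChar or self::w:instrText or self::w:t]');
    foreach ($nodes as $node) {
        if ($node->localName === 'fldChar') {
            $type = $node->getAttributeNS($wNs, 'fldCharType');
            if ($type === 'begin') {
                // a new field starts: reset the collected code and text
                $state = 'instr';
                $instruction = '';
                $text = '';
            } elseif ($type === 'separate' && $state === 'instr') {
                // the field code is complete; the visible text follows
                $state = 'result';
            } elseif ($type === 'end' && $state !== null) {
                // keep the field only if its code has the form HYPERLINK "url"
                if (preg_match('/HYPERLINK\s+"([^"]+)"/', $instruction, $match)) {
                    $links[] = array(
                        'textContent' => $text,
                        'target' => $match[1],
                    );
                }
                $state = null;
            }
        } elseif ($node->localName === 'instrText' && $state === 'instr') {
            $instruction .= $node->textContent;
        } elseif ($node->localName === 't' && $state === 'result') {
            $text .= $node->textContent;
        }
    }

    return $links;
}

// illustrative sample: one paragraph with a field-based link
$sampleXml = <<<'XML'
<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:body>
    <w:p>
      <w:r><w:t>Visit </w:t></w:r>
      <w:r><w:fldChar w:fldCharType="begin"/></w:r>
      <w:r><w:instrText xml:space="preserve"> HYPERLINK "https://www.example.com/" </w:instrText></w:r>
      <w:r><w:fldChar w:fldCharType="separate"/></w:r>
      <w:r><w:t>the example site</w:t></w:r>
      <w:r><w:fldChar w:fldCharType="end"/></w:r>
    </w:p>
  </w:body>
</w:document>
XML;

$links = extractFieldHyperlinks($sampleXml);
var_dump($links);
```

To try it on a real file, the XML of word/document.xml can be read with PHP's ZipArchive::getFromName() and passed to the function.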

We opened a task for the dev team, and they have added support in the testing branch for extracting this information (from both hyperlinks and fields) using Indexer:

  • linksContents option to get the URL and text content from hyperlinks.

Your phpdocx 15.5 license doesn't include LUS (https://www.phpdocx.com/support). If you upgrade to phpdocx 16 and include LUS, you can access these changes from the testing branch.
If you upgrade your license, please write to contact[at]phpdocx.com and let us know whether you are using the classic or namespaces package. We'll send you the updated Indexer class with a custom sample.

Regards.