I need to extract hyperlinks and their text from a .docx

Posted by jawaidbazyar  · 13-07-2025 - 20:51

Hello,

I have a large number of existing documents that contain hyperlinks. From each document, I want to extract:

the hyperlink URL

the hyperlink text

For instance, here is an XML snippet:

      <w:hyperlink r:id="rId3">
        <w:r>
          <w:rPr>
            <w:rStyle w:val="InternetLink"/>
            <w:b/>
            <w:lang w:val="en-US" w:eastAsia="en-US"/>
          </w:rPr>
          <w:t>Boeing Special Attention Requirements Bulletin 737-71-1911 RB, Revision 1</w:t>
        </w:r>
      </w:hyperlink>


I know that rId3 is a reference to an entry in another file, which contains the hyperlink URL itself.
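For reference, the rId lookup described above can be sketched in plain PHP, outside phpdocx. This is a minimal sketch under several assumptions: it uses PHP's zip and dom extensions rather than any phpdocx API, `extractHyperlinks()` is a hypothetical helper name, the file uses the Transitional OOXML namespaces, and only the main document part (`word/document.xml` with `word/_rels/document.xml.rels`) is read, so links in headers, footers, or footnotes are not covered.

```php
<?php
// Sketch only: plain PHP (ZipArchive + DOM), not a phpdocx API.
// extractHyperlinks() is a hypothetical helper name.
function extractHyperlinks(string $docxPath): array
{
    // Namespaces used by Transitional OOXML documents (assumed here).
    $wNs   = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main';
    $rNs   = 'http://schemas.openxmlformats.org/officeDocument/2006/relationships';
    $relNs = 'http://schemas.openxmlformats.org/package/2006/relationships';

    $zip = new ZipArchive();
    if ($zip->open($docxPath) !== true) {
        throw new RuntimeException("Cannot open $docxPath");
    }
    $documentXml = $zip->getFromName('word/document.xml');
    $relsXml     = $zip->getFromName('word/_rels/document.xml.rels');
    $zip->close();
    if ($documentXml === false || $relsXml === false) {
        throw new RuntimeException('Missing document.xml or its relationships part');
    }

    // Map relationship ids (e.g. "rId3") to their targets (the URLs).
    $rels = new DOMDocument();
    $rels->loadXML($relsXml);
    $targets = [];
    foreach ($rels->getElementsByTagNameNS($relNs, 'Relationship') as $rel) {
        $targets[$rel->getAttribute('Id')] = $rel->getAttribute('Target');
    }

    // Walk every <w:hyperlink>, join the text of its runs, resolve its r:id.
    $doc = new DOMDocument();
    $doc->loadXML($documentXml);
    $links = [];
    foreach ($doc->getElementsByTagNameNS($wNs, 'hyperlink') as $hyperlink) {
        $text = '';
        foreach ($hyperlink->getElementsByTagNameNS($wNs, 't') as $t) {
            $text .= $t->textContent;
        }
        $id = $hyperlink->getAttributeNS($rNs, 'id');
        // Internal links (w:anchor) carry no r:id, so their url stays null.
        $links[] = ['text' => $text, 'url' => $targets[$id] ?? null];
    }
    return $links;
}
```

Calling `extractHyperlinks($fname)` returns one array per hyperlink with a `text` key and a `url` key, so each link's text and target arrive together. A link whose text is split across several runs is joined into a single string.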

I was hoping that there is a single query I can perform against the document to fetch both the text (in this example, "Boeing Special Attention...") and the hyperlink URL.

Right now the closest I have come is using two different APIs:

This gets me the hyperlink URL:

    // Load the existing document
    $indexer = new Indexer($fname);
    $output = $indexer->getOutput();

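And this gets me the hyperlink text:

    // Match every link element in the document
    // ($docx is the document object, created elsewhere and not shown in this post)
    $referenceNode = array(
        'type' => 'link'
    );

    // Extract hyperlinks
    $hyperlinks = $docx->getDocxPathQueryInfo($referenceNode);
    foreach ($hyperlinks['elements'] as $element) {
        var_dump($element);
    }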

 

Is there a single API call that returns both the URL and the text together?

Thank you.