I need to extract hyperlinks and their text from a .docx

Posted by jawaidbazyar  · 13-07-2025 - 20:51

Hello,

I have a large number of existing documents that contain hyperlinks. For each hyperlink, I want to extract:

the hyperlink URL

the hyperlink text

For instance, here is an XML snippet:

      <w:hyperlink r:id="rId3">
        <w:r>
          <w:rPr>
            <w:rStyle w:val="InternetLink"/>
            <w:b/>
            <w:lang w:val="en-US" w:eastAsia="en-US"/>
          </w:rPr>
          <w:t>Boeing Special Attention Requirements Bulletin 737-71-1911 RB, Revision 1</w:t>
        </w:r>
      </w:hyperlink>


I know that rId3 is a reference to an entry in another file (the document's relationships part), and that entry contains the hyperlink URL itself.
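For comparison, here is a minimal sketch of resolving that reference by hand with PHP's bundled DOM and zip extensions instead of phpdocx. It assumes the standard package layout, where the main part is word/document.xml and its relationships (the "another file" that holds rId3's URL) are in word/_rels/document.xml.rels. The function names extractHyperlinks and extractHyperlinksFromDocx are hypothetical. The sketch walks every w:hyperlink once, joins the text of its w:t runs, and looks up its r:id in a map built from the Relationship entries, so each text comes out already paired with its URL.

```php
<?php
// Sketch only: pairs each <w:hyperlink> with its URL by resolving r:id through
// the relationships part. Uses PHP's DOM and zip extensions, not phpdocx.

const W_NS   = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main';
const R_NS   = 'http://schemas.openxmlformats.org/officeDocument/2006/relationships';
const REL_NS = 'http://schemas.openxmlformats.org/package/2006/relationships';

/**
 * Returns a list of ['text' => ..., 'url' => ...] pairs.
 * $documentXml: contents of word/document.xml
 * $relsXml:     contents of word/_rels/document.xml.rels
 */
function extractHyperlinks(string $documentXml, string $relsXml): array
{
    // Map relationship Id => Target, e.g. "rId3" => "https://...".
    $rels = new DOMDocument();
    $rels->loadXML($relsXml);
    $targets = [];
    foreach ($rels->getElementsByTagNameNS(REL_NS, 'Relationship') as $rel) {
        $targets[$rel->getAttribute('Id')] = $rel->getAttribute('Target');
    }

    $doc = new DOMDocument();
    $doc->loadXML($documentXml);
    $xpath = new DOMXPath($doc);
    $xpath->registerNamespace('w', W_NS);
    $xpath->registerNamespace('r', R_NS);

    $links = [];
    foreach ($xpath->query('//w:hyperlink') as $link) {
        // A link's text may be split across several runs; join them all.
        $text = '';
        foreach ($xpath->query('.//w:t', $link) as $t) {
            $text .= $t->textContent;
        }
        // External links carry r:id; internal links use w:anchor (a bookmark name).
        $rid = $link->getAttributeNS(R_NS, 'id');
        if ($rid !== '') {
            $url = $targets[$rid] ?? null;
        } else {
            $anchor = $link->getAttributeNS(W_NS, 'anchor');
            $url = $anchor !== '' ? '#' . $anchor : null;
        }
        $links[] = ['text' => $text, 'url' => $url];
    }
    return $links;
}

/** Convenience wrapper: read both parts directly from a .docx (a zip archive). */
function extractHyperlinksFromDocx(string $path): array
{
    $zip = new ZipArchive();
    if ($zip->open($path) !== true) {
        throw new RuntimeException("Cannot open $path");
    }
    // Assumes the standard part names used by most producers.
    $documentXml = $zip->getFromName('word/document.xml');
    $relsXml = $zip->getFromName('word/_rels/document.xml.rels');
    $zip->close();
    if ($documentXml === false || $relsXml === false) {
        throw new RuntimeException("Missing document parts in $path");
    }
    return extractHyperlinks($documentXml, $relsXml);
}
```

Usage would look like `foreach (extractHyperlinksFromDocx($fname) as $link) { echo $link['text'], ' => ', $link['url'], "\n"; }`. Hyperlinks in headers, footers, or footnotes live in separate parts with their own .rels files, so this sketch only covers the main document body.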

I was hoping there was a single query I could run against the document to fetch both the text (in this example, "Boeing Special Attention...") and the hyperlink URL.

Right now the closest I have come is using two different APIs:

This gets me the hyperlink URL:

// Load the existing document
$indexer = new Indexer($fname);
$output = $indexer->getOutput();

And this gets me the hyperlink text:

// $docx is assumed to be the phpdocx document object loaded from $fname
$referenceNode = array(
    'type' => 'link'
);

// Extract hyperlinks
$hyperlinks = $docx->getDocxPathQueryInfo($referenceNode);
foreach ($hyperlinks['elements'] as $element) {
    var_dump($element);
}

Is there a single API call that returns both the URL and the text together?

Thank you.