How to get an image's alt text

Posted by bkidd  · 02-06-2020 - 20:23

I need to read a .docx file and get the alt text of its images. I will eventually replace the image source, but first I need to identify the images that have placeholder alt text.

I have a premium namespaced license. I've tried the Indexer, which returns image data, but it doesn't include the alt text.

Any suggestions are appreciated.

Posted by admin  · 03-06-2020 - 06:43

Hello,

The Indexer doesn't extract alt text. You can use getDocxPathQueryInfo with a custom XPath query to get that information:

$referenceNode = array(
    'customQuery' => '//w:drawing//wp:docPr',
);

$contents = $docx->getDocxPathQueryInfo($referenceNode);

foreach ($contents['elements'] as $content) {
    echo $content->getAttribute('descr');
}
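The reply above relies on phpdocx. To see what the query itself matches, here is a minimal sketch that uses only PHP's built-in DOM extension (DOMDocument and DOMXPath) on a hand-written document.xml fragment. The sample XML and the function name listAltTexts are hypothetical; the w and wp namespace URIs are the standard WordprocessingML ones, and they must be registered before those prefixes can be used in a query.

```php
<?php
// Minimal sketch: list the alt text (descr) of every picture in a
// WordprocessingML document.xml string, using only PHP's DOM extension.
// listAltTexts() and the sample XML below are hypothetical illustrations.

const NS_W  = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main';
const NS_WP = 'http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing';

function listAltTexts(string $documentXml): array
{
    $dom = new DOMDocument();
    $dom->loadXML($documentXml);

    $xpath = new DOMXPath($dom);
    // Prefixes used in the query must be registered explicitly.
    $xpath->registerNamespace('w', NS_W);
    $xpath->registerNamespace('wp', NS_WP);

    $altTexts = [];
    // Same query as in the reply: every wp:docPr inside a w:drawing.
    foreach ($xpath->query('//w:drawing//wp:docPr') as $docPr) {
        // getAttribute() returns '' when the attribute is missing.
        $altTexts[] = $docPr->getAttribute('descr');
    }
    return $altTexts;
}

// Single quotes keep '$PHOTO$' literal (no variable interpolation).
$sample = '<w:document xmlns:w="' . NS_W . '" xmlns:wp="' . NS_WP . '">'
    . '<w:body><w:p><w:r><w:drawing><wp:inline>'
    . '<wp:docPr id="1" name="Picture 1" descr="$PHOTO$"/>'
    . '</wp:inline></w:drawing></w:r></w:p></w:body></w:document>';

print_r(listAltTexts($sample));
```

In a real .docx the XML would come from word/document.xml inside the ZIP archive (for example via ZipArchive::getFromName), which this sketch leaves out.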

Regards.

Posted by bkidd  · 03-06-2020 - 11:38

It works perfectly.  Thank you!

Posted by bkidd  · 04-06-2020 - 14:44

A customer uploaded a template in which the picture's alt text is stored in the title attribute rather than in the descr attribute:

<wp:docPr id="1" name="Picture 1" title="$PHOTO$"/>

I opened the document, cleared the alt text, and re-entered the same value. After saving, the alt text was in the descr attribute, as I expected:

<wp:docPr id="1" name="Picture 1" descr="$PHOTO$"/>

I assume this depends on the user's word processor or its version, and I have asked for more information.

I can also run the query against the title attribute, but I assume that, to replace the image source later with replacePlaceholderImage, the placeholder needs to be in the descr attribute so that the method can find the correct picture to update.

If so, are there any class methods available to add a descr attribute to the element? I'm hoping I can do this while looping through the results of the query.

UPDATE: The user states: "I am using Office 2016 on Windows 10 Home. My desktop is a Dell."

Posted by admin  · 04-06-2020 - 14:59

Hello,

The title attribute is a caption value, not the alternative text:

http://officeopenxml.com/drwPic-nvPicPr.php

The id is an integer and specifies a unique identifier. The name is a string and typically stores the original file name. The title is a string that specifies the caption. The descr is the alternative text, used by assistive technologies that do not display the picture.

We think the user hasn't added the alt text in the correct field.

If you need to customize Word contents or attributes, you can use customizeWordContents (https://www.phpdocx.com/api-documentation/docxcustomizer/customize-docx-Word-documents-PHP). As descr isn't one of the supported attributes, you need to use the customAttributes option with the same XPath query as the one used to get the images.
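If a template has to be fixed before it reaches replacePlaceholderImage, the change asked about earlier in the thread (adding a descr attribute while looping over the query results) can also be sketched with PHP's DOM extension alone. This is not the customizeWordContents API: copyTitleToDescr is a hypothetical helper that works on a raw document.xml string and, as an assumption, only fills descr when it is missing or empty.

```php
<?php
// Sketch: copy a placeholder stored in wp:docPr/@title into @descr when descr
// is missing or empty, using PHP's DOM extension on a raw document.xml string.
// copyTitleToDescr() is a hypothetical helper, not part of phpdocx.

function copyTitleToDescr(string $documentXml): string
{
    $dom = new DOMDocument();
    $dom->loadXML($documentXml);
    $xpath = new DOMXPath($dom);
    $xpath->registerNamespace('w', 'http://schemas.openxmlformats.org/wordprocessingml/2006/main');
    $xpath->registerNamespace('wp', 'http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing');

    foreach ($xpath->query('//w:drawing//wp:docPr') as $docPr) {
        $title = $docPr->getAttribute('title');
        // Keep any real alt text: only fill descr when it is empty.
        if ($title !== '' && $docPr->getAttribute('descr') === '') {
            $docPr->setAttribute('descr', $title);
        }
    }
    return $dom->saveXML();
}

$xml = '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"'
    . ' xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing">'
    . '<w:body><w:p><w:r><w:drawing><wp:inline>'
    . '<wp:docPr id="1" name="Picture 1" title="$PHOTO$"/>'
    . '</wp:inline></w:drawing></w:r></w:p></w:body></w:document>';

echo copyTitleToDescr($xml);
```

The updated string would then have to be written back into word/document.xml before the template is loaded again.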

The easiest solution for your case may be to allow replacePlaceholderImage to match either attribute. If you edit CreateDocxFromTemplate.php and go to the Image4Image method, you can find this line:

if ($domImages->item($i)->getAttribute('descr') == self::$_templateSymbol . $variable . self::$_templateSymbol && $imageCounter == 0) {

that can be replaced by:

if (($domImages->item($i)->getAttribute('descr') == self::$_templateSymbol . $variable . self::$_templateSymbol || $domImages->item($i)->getAttribute('title') == self::$_templateSymbol . $variable . self::$_templateSymbol) && $imageCounter == 0) {
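The patched condition above accepts the placeholder in either descr or title. The same matching rule can be sketched outside phpdocx with PHP's DOM extension; findPlaceholderImages is a hypothetical name, and the default '$' delimiter only mirrors how self::$_templateSymbol is used in this thread.

```php
<?php
// Sketch of the "descr OR title" matching rule from the patched Image4Image
// condition, applied to a raw document.xml string with PHP's DOM extension.
// findPlaceholderImages() is hypothetical; '$' stands in for the template symbol.

function findPlaceholderImages(string $documentXml, string $variable, string $symbol = '$'): array
{
    $dom = new DOMDocument();
    $dom->loadXML($documentXml);
    $xpath = new DOMXPath($dom);
    $xpath->registerNamespace('w', 'http://schemas.openxmlformats.org/wordprocessingml/2006/main');
    $xpath->registerNamespace('wp', 'http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing');

    $placeholder = $symbol . $variable . $symbol;
    $matches = [];
    foreach ($xpath->query('//w:drawing//wp:docPr') as $docPr) {
        // Accept the placeholder in either the alt text (descr) or the title.
        if ($docPr->getAttribute('descr') === $placeholder
            || $docPr->getAttribute('title') === $placeholder) {
            $matches[] = $docPr->getAttribute('id');
        }
    }
    return $matches;
}

$sample = '<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"'
    . ' xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing"><w:body>'
    . '<w:p><w:r><w:drawing><wp:inline><wp:docPr id="1" name="Picture 1" descr="$PHOTO$"/></wp:inline></w:drawing></w:r></w:p>'
    . '<w:p><w:r><w:drawing><wp:inline><wp:docPr id="2" name="Picture 2" title="$PHOTO$"/></wp:inline></w:drawing></w:r></w:p>'
    . '<w:p><w:r><w:drawing><wp:inline><wp:docPr id="3" name="Picture 3" descr="Company logo"/></wp:inline></w:drawing></w:r></w:p>'
    . '</w:body></w:document>';

print_r(findPlaceholderImages($sample, 'PHOTO'));
```

Unlike Image4Image, this sketch ignores $imageCounter and returns the id of every matching picture, so both the descr-based and the title-based placeholders are found.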

We have forwarded this topic to the dev team so they can check it and include the same change in the stable release.

Regards.

Posted by admin  · 04-06-2020 - 15:18

Hello again,

We have tested it with MS Word 2016. As in other MS Word versions, there are two fields when adding alt text (Title and Description). This is the XML output of the wp:docPr tag:

<wp:docPr descr="$VAR_DESCR_1$" id="826989436" name="" title="$VAR_TITLE_1$"/>

We think the user has added it to the Title field, not the Description one.

Regards.

Posted by bkidd  · 04-06-2020 - 15:33

I thought so too, but that's not the case. I don't want to post the file on the forum; I will send it to you at contact@phpdocx.com.

Based on your response, it seems I should treat this as an anomaly. Unfortunately, the user doesn't know or care where it's stored; he sees the value as the alt text.

Posted by admin  · 06-06-2020 - 12:09

Hello,

The next release of phpdocx will include the same changes to handle these cases.

Regards.