Placeholders are split across xml nodes
Topic closed:
Please note this is an old forum thread. Information in this post may be out of date and/or erroneous.
Every phpdocx version includes new features and improvements. Previously unsupported features may have been added to newer releases, or past issues may have been corrected.
We encourage you to download the current phpdocx version and check the Documentation available.

Posted by bkdl  · 31-08-2021 - 21:17

Hello,

Our Word templates are user-generated, which means the formatting can sometimes be strange and unpredictable. We have a case where simply toggling bold on part of a placeholder splits the placeholder in the XML.

processTemplate doesn't seem to be able to repair these placeholders. In the xml below I selected a few characters from the placeholder text and toggled Bold on and off. You can see that this split the placeholder into a few runs (w:r).

What should we do in these scenarios? Are there any settings we can use to get phpdocx to recognize these as a placeholder and repair it? I suspect the issue is that the run properties (w:rPr) are not the same across each run, but if their styles are evaluated they would be a match.

 


<w:document mc:Ignorable="w14 wp14">
   <w:body>
      <w:p>
         <w:pPr>
            <w:pStyle w:val="Normal"/>
            <w:bidi w:val="0"/>
            <w:jc w:val="left"/>
            <w:rPr/>
         </w:pPr>
         <w:r>
            <w:rPr>
               <w:b w:val="false"/>
               <w:bCs w:val="false"/>
            </w:rPr>
            <w:t>{{USER</w:t>
         </w:r>
         <w:r>
            <w:rPr/>
            <w:t>_N</w:t>
         </w:r>
         <w:r>
            <w:rPr>
               <w:b w:val="false"/>
               <w:bCs w:val="false"/>
            </w:rPr>
            <w:t>A</w:t>
         </w:r>
         <w:r>
            <w:rPr/>
            <w:t>ME}}</w:t>
         </w:r>
      </w:p>
      <w:p>
         <w:pPr>
            <w:pStyle w:val="Normal"/>
            <w:bidi w:val="0"/>
            <w:jc w:val="left"/>
            <w:rPr/>
         </w:pPr>
         <w:r>
            <w:rPr/>
         </w:r>
      </w:p>
      <w:sectPr>
         <w:type w:val="nextPage"/>
         <w:pgSz w:w="12240" w:h="15840"/>
         <w:pgMar w:left="1134" w:right="1134" w:header="0" w:top="1134" w:footer="0" w:bottom="1134" w:gutter="0"/>
         <w:pgNumType w:fmt="decimal"/>
         <w:formProt w:val="false"/>
         <w:textDirection w:val="lrTb"/>
         <w:docGrid w:type="default" w:linePitch="100" w:charSpace="0"/>
      </w:sectPr>
   </w:body>
</w:document>
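A minimal sketch of why the placeholder looks intact in Word but not in the XML, using only PHP's built-in preg functions (no phpdocx calls). The four runs are copied from the paragraph above and collapsed onto one line; the variable names are illustrative.

```php
<?php
// The four runs from the first paragraph of the template, on one line.
$xml = '<w:r><w:rPr><w:b w:val="false"/><w:bCs w:val="false"/></w:rPr><w:t>{{USER</w:t></w:r>'
     . '<w:r><w:rPr/><w:t>_N</w:t></w:r>'
     . '<w:r><w:rPr><w:b w:val="false"/><w:bCs w:val="false"/></w:rPr><w:t>A</w:t></w:r>'
     . '<w:r><w:rPr/><w:t>ME}}</w:t></w:r>';

// Collect the contents of every <w:t> text node, in document order.
preg_match_all('/<w:t(?:\s[^>]*)?>([^<]*)<\/w:t>/', $xml, $matches);

// The visible text is the concatenation of those fragments.
$visibleText = implode('', $matches[1]);

echo $visibleText, PHP_EOL;               // {{USER_NAME}}
var_dump(strpos($xml, '{{USER_NAME}}'));  // bool(false): never contiguous in the raw XML
```

The text Word displays is the concatenation of the w:t fragments, while the raw XML has run markup between them, so a plain string search for the placeholder finds nothing.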

 

Posted by admin  · 01-09-2021 - 07:07

Hello,

By default, phpdocx allows using a single symbol ($, #, |...) or ${ } to wrap placeholders.

Using delimiters other than a single symbol or ${ }, such as #{ }, {{ }}, [], [[]]..., requires customizing the static variable CreateDocxFromTemplate::$regExprVariableSymbols. You can read this same information on the documentation page of the setTemplateSymbol method (https://www.phpdocx.com/api-documentation/templates/set-Word-template-placeholder-variable-symbol):

Different at the beginning and the end: ${VAR}. phpdocx requires using ${ } to wrap placeholders that don't use the same symbol at the beginning and the end; to use other symbols, the public static variable CreateDocxFromTemplate::$regExprVariableSymbols must be customized.

CreateDocxFromTemplate::$regExprVariableSymbols is a public static variable holding a regular expression that must match the symbols you want to use to wrap placeholders (as this is a regular expression, please note you need to escape characters that have a special meaning in regular expressions, such as [ ]).

We recommend using the default symbols to wrap placeholders:

  • A single symbol (1 byte character): $VAR$, #VAR#, |VAR|...
  • Or ${ }: ${VAR}

As explained previously, using symbols other than a single symbol or the default ${ } requires customizing CreateDocxFromTemplate::$regExprVariableSymbols, for example, the following code:

CreateDocxFromTemplate::$regExprVariableSymbols = '\[.*\]';
$docx->setTemplateSymbol('[', ']');

print_r($docx->getTemplateVariables());

sets [ ] as the symbols that wrap placeholders. Please note that this custom code may need some minor adjustments to get the exact regular expression for your templates, for example to exclude [ when it also appears as regular content in the document:

CreateDocxFromTemplate::$regExprVariableSymbols = '\[[^\[]*\]';
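A small sketch of the difference between the two patterns, run with plain preg_match rather than through phpdocx. The sample text is hypothetical, and the "/" delimiters are added only so PHP can execute the patterns; how phpdocx wraps the pattern internally is not shown in this thread.

```php
<?php
// Hypothetical document text: a stray "[" appears before the real placeholder.
$text = 'Price [ in EUR: [PRICE]';

// First pattern: anything between "[" and "]".
preg_match('/\[.*\]/', $text, $loose);

// Adjusted pattern: no "[" allowed between the brackets.
preg_match('/\[[^\[]*\]/', $text, $strict);

echo $loose[0], PHP_EOL;   // [ in EUR: [PRICE]
echo $strict[0], PHP_EOL;  // [PRICE]
```

The first pattern starts at the stray bracket and swallows it into the match, while the second one can only start at the bracket that opens the placeholder.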

This custom regular expression is required for processTemplate (a method phpdocx calls automatically) to repair placeholders split across XML nodes when you are not using the default settings (a single symbol or ${ } to wrap placeholders).

Regards.

Posted by bkdl  · 01-09-2021 - 14:29

Thanks for the quick reply.

I should have mentioned that we have customized those fields already according to the docs. All variable replacement works, except when it is split across nodes as shown previously. Is there anything I'm missing with this regex? (We're using `#` as our block identifier.)

CreateDocxFromTemplate::$regExprVariableSymbols = '\{\{(?:#){0,1}(?:[A-Z0-9\s\-_])+\}\}';

Is it perhaps that we're allowing a space character in our variables?

Posted by bkdl  · 01-09-2021 - 15:29

I changed our regex to the following and it's working well now...

'\{\{[^}]+\}\}'

That's still too permissive for us, but you put me on the right track to find what my previous regex was missing. Maybe newlines? idk, I'll keep exploring.

 

Thanks!

Posted by admin  · 01-09-2021 - 15:35

Hello,

We think the problem is that the regex needs to allow every character that can appear between the delimiters in the XML (<, >, :...), not only the letters, numbers, _ ... set in [A-Z0-9\s\-_]

If we try a more generic regex using your regex as base:

\{\{(?:#){0,1}(?:.)+\}\}

The placeholder in the XML is returned correctly.
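The comparison can be reproduced outside phpdocx with plain preg_match. This is only a sketch: the XML is the split placeholder from the first post collapsed onto one line, and the "/" delimiters are an assumption added so PHP can run the patterns.

```php
<?php
// Split placeholder from the first post, as one line of raw XML.
$xml = '<w:r><w:rPr><w:b w:val="false"/><w:bCs w:val="false"/></w:rPr><w:t>{{USER</w:t></w:r>'
     . '<w:r><w:rPr/><w:t>_N</w:t></w:r>'
     . '<w:r><w:rPr><w:b w:val="false"/><w:bCs w:val="false"/></w:rPr><w:t>A</w:t></w:r>'
     . '<w:r><w:rPr/><w:t>ME}}</w:t></w:r>';

$strictPattern  = '/\{\{(?:#){0,1}(?:[A-Z0-9\s\-_])+\}\}/';  // original custom regex
$genericPattern = '/\{\{(?:#){0,1}(?:.)+\}\}/';              // generic variant

$strictHits  = preg_match($strictPattern, $xml);
$genericHits = preg_match($genericPattern, $xml, $m);

var_dump($strictHits);   // int(0): "<" right after {{USER is outside the character class
var_dump($genericHits);  // int(1)

// Removing the tags from the generic match leaves the intended placeholder.
$placeholder = preg_replace('/<[^>]+>/', '', $m[0]);
echo $placeholder, PHP_EOL;  // {{USER_NAME}}
```

Note that "." does not match a newline by default in PCRE, so on pretty-printed XML the generic pattern would behave differently unless the pattern is applied with a flag that changes this; the sketch sidesteps that by using a single line.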

Also note that this regular expression (and the new one you have posted) only matches when {{ and }} each stay together, but MS Word may split the braces themselves across runs, for example:

<w:r>
   <w:rPr>
      <w:b w:val="false"/>
      <w:bCs w:val="false"/>
   </w:rPr>
   <w:t>{</w:t>
</w:r>
<w:r>
   <w:rPr>
      <w:b w:val="false"/>
      <w:bCs w:val="false"/>
   </w:rPr>
   <w:t>{USER</w:t>
</w:r>
<w:r>
   <w:rPr/>
   <w:t>_N</w:t>
</w:r>
<w:r>
   <w:rPr>
      <w:b w:val="false"/>
      <w:bCs w:val="false"/>
   </w:rPr>
   <w:t>A</w:t>
</w:r>
<w:r>
   <w:rPr/>
   <w:t>ME}</w:t>
</w:r>
<w:r>
   <w:rPr/>
   <w:t>}</w:t>
</w:r>

A more generic regex would be:

\{(?:.)*\{(?:#){0,1}(?:.)+\}(?:.)*\}
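A sketch of how the two patterns behave on the split-brace case, again with plain preg_match and assumed "/" delimiters. The run properties are omitted from the XML for brevity; the text nodes are the same as in the example above.

```php
<?php
// Split-brace runs from the example above, with run properties omitted.
$xml = '<w:r><w:t>{</w:t></w:r>'
     . '<w:r><w:t>{USER</w:t></w:r>'
     . '<w:r><w:rPr/><w:t>_N</w:t></w:r>'
     . '<w:r><w:t>A</w:t></w:r>'
     . '<w:r><w:rPr/><w:t>ME}</w:t></w:r>'
     . '<w:r><w:rPr/><w:t>}</w:t></w:r>';

$adjacentBraces = '/\{\{[^}]+\}\}/';                         // needs "{{" and "}}" intact
$splitBraces    = '/\{(?:.)*\{(?:#){0,1}(?:.)+\}(?:.)*\}/';  // allows markup between braces

$adjacentHits = preg_match($adjacentBraces, $xml);
$splitHits    = preg_match($splitBraces, $xml, $m);

var_dump($adjacentHits);  // int(0): the two opening braces are never adjacent
var_dump($splitHits);     // int(1)

// Stripping the tags from the match recovers the placeholder text.
$placeholder = preg_replace('/<[^>]+>/', '', $m[0]);
echo $placeholder, PHP_EOL;  // {{USER_NAME}}
```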

The easiest approach is using a single symbol ($, |, @...) or the default ${ } to wrap placeholders; in these cases you don't need to change the default regex. But if you want to customize it using {{ }} or others such as {[ ]}, $[ ]$, then a generic regex such as the previous one is needed.

We are writing a new cookbook that details everything about using custom multi-character symbols to wrap placeholders and how to generate correct regexes for these specific cases. It will be available at https://www.phpdocx.com/documentation/cookbook/.

Regards.

Posted by bkdl  · 01-09-2021 - 16:08

Ohhhhhhhhhh. This regex operates on the entire XML as a string for each portion of the document. So in the case I listed we're not matching because there are colons, slashes, and other characters we were excluding.

Considering this behavior I don't think we have much of an option to be specific in our variable name requirements within the delimiters, but that's probably ok.
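One possible way to keep a strict naming rule anyway is to check it in your own code, separately from the regex handed to phpdocx: match permissively on the raw XML, strip the tags, then validate the remaining text. The sketch below is an assumption about how such a pre-check could look; findValidPlaceholders is a hypothetical helper, not a phpdocx function, and the sample XML is made up.

```php
<?php
/**
 * Hypothetical helper (not part of phpdocx): returns the placeholders found
 * in raw XML whose visible text also satisfies a strict naming rule.
 */
function findValidPlaceholders(string $xml): array
{
    $valid = [];

    // Step 1: permissive match that tolerates tags between the delimiters.
    preg_match_all('/\{\{[^}]+\}\}/', $xml, $matches);

    foreach ($matches[0] as $raw) {
        // Step 2: drop the XML tags, then apply the strict rule to the text.
        $text = preg_replace('/<[^>]+>/', '', $raw);
        if (preg_match('/^\{\{#?[A-Z0-9\s\-_]+\}\}$/', $text)) {
            $valid[] = $text;
        }
    }

    return $valid;
}

// Made-up sample: one split but valid placeholder, one invalid name.
$xml = '<w:r><w:t>{{USER</w:t></w:r><w:r><w:rPr/><w:t>_N</w:t></w:r>'
     . '<w:r><w:t>AME}} and {{bad name!}}</w:t></w:r>';

$result = findValidPlaceholders($xml);
print_r($result);
```

With this sample only {{USER_NAME}} passes the check; {{bad name!}} is matched by the permissive step but rejected by the strict one.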

Thanks again!