Filtering options for embedding HTML into Word
This information is outdated. Since PHPDocX 3.0 one can also use XPath expressions to filter the embedded HTML. Please refer to the HTML to Word documentation for up-to-date information.
We have received a few questions about how to use the filter option available in the embedHTML and replaceTemplateVariableByHTML methods, so now seems a good time to expand on the explanations in the tutorial and API documentation with some simple examples.
We will use this simple HTML page as the HTML source for our examples.
We are going to cover the most important options:
- Select content by id
- Select content by CSS class
- Select content by HTML tag
- A combination of all of them
Select HTML content by id
This example is already covered in the tutorial but we include it here for completeness.
If we want to extract the content of the HTML element with id="lateral", we just have to include it in the filter option: 'filter' => array('#lateral'). The PHPDocX code then reads:
$docx->embedHTML('http://www.2mdc.com/PHPDOCX/example.html', array('isFile' => true, 'parseDivsAsPs' => false, 'filter' => array('#lateral'), 'downloadImages' => true));
The resulting document reads like this.
One may also choose more than one id at a time:
$docx->embedHTML('http://www.2mdc.com/PHPDOCX/example.html', array('isFile' => true, 'parseDivsAsPs' => false, 'filter' => array('#lateral','#capa_bg_bottom'), 'downloadImages' => true));
As you can see, in this case both ids are selected, although some of the formatting is lost. If you want to preserve the formatting, you should use the replaceTemplateVariableByHTML method (we will not elaborate on that here because it is beyond the scope of this blog entry).
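Since every call in this post passes the same options and only the 'filter' entry changes, it can be handy to build the options array in one place. The following is only a sketch: htmlFilterOptions is a hypothetical helper of our own, not part of the PHPDocX API, and the option values are simply the ones used in the examples above.

```php
<?php
/**
 * Hypothetical helper (NOT part of PHPDocX): build the options array used
 * throughout this post, varying only the 'filter' entry.
 *
 * @param array $selectors Selectors such as '#lateral', '.rosa' or 'p'.
 * @return array Options suitable for passing as the second argument of embedHTML.
 */
function htmlFilterOptions(array $selectors): array
{
    return array(
        'isFile' => true,
        'parseDivsAsPs' => false,
        'filter' => array_values($selectors),
        'downloadImages' => true,
    );
}

$options = htmlFilterOptions(array('#lateral', '#capa_bg_bottom'));

// With a PHPDocX instance in $docx, the call from the example above becomes:
// $docx->embedHTML('http://www.2mdc.com/PHPDOCX/example.html', $options);
```

The embedHTML call is left commented out so that the snippet runs without the library installed.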
Select HTML content by CSS class
To illustrate this case, we are going to extract all HTML content with the classes “rosa” and “naranja”. This time we need the following code:
$docx->embedHTML('http://www.2mdc.com/PHPDOCX/example.html', array('isFile' => true, 'parseDivsAsPs' => false, 'filter' => array('.rosa','.naranja'), 'downloadImages' => true));
The resulting Word document reads like this.
Select HTML content by HTML tag
What should we do if we just want to extract the content within <p> tags?
Pretty simple:
$docx->embedHTML('http://www.2mdc.com/PHPDOCX/example.html', array('isFile' => true, 'parseDivsAsPs' => false, 'filter' => array('p'), 'downloadImages' => true));
Isn't it? (download document).
Mixed selection
If we now want to extract the content within <h2> tags together with the element with id "entrada", we just need this piece of code:
$docx->embedHTML('http://www.2mdc.com/PHPDOCX/example.html', array('isFile' => true, 'parseDivsAsPs' => false, 'filter' => array('#entrada','h2'), 'downloadImages' => true));
The result is this Word document.
We hope that, at this point, generalizing the procedure to more sophisticated cases will be pretty straightforward (but have a look at: http://abstrusegoose.com/474).
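To make the three selector forms (id, class and tag) more concrete, here is a minimal sketch that uses only PHP's standard DOM extension. It is not how PHPDocX implements the filter internally, and we make no claim about that; selectorToXPath and filterHtml are hypothetical names, and the sample markup is invented for illustration. The sketch translates each simple selector into an XPath expression and collects the text of the matching nodes.

```php
<?php
// Illustrative only: this is NOT PHPDocX's internal implementation.

/**
 * Translate a simple selector ('#id', '.class' or 'tag') into an XPath query.
 */
function selectorToXPath(string $selector): string
{
    if ($selector[0] === '#') {
        return sprintf('//*[@id="%s"]', substr($selector, 1));
    }
    if ($selector[0] === '.') {
        // Match the class as a whole word inside the class attribute.
        return sprintf(
            '//*[contains(concat(" ", normalize-space(@class), " "), " %s ")]',
            substr($selector, 1)
        );
    }
    return '//' . $selector;
}

/**
 * Return the trimmed text content of every node matched by any selector.
 */
function filterHtml(string $html, array $selectors): array
{
    // Collect parser warnings internally instead of printing them.
    libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    // The XPath union operator "|" combines the individual queries.
    $query = implode(' | ', array_map('selectorToXPath', $selectors));

    $texts = array();
    foreach ($xpath->query($query) as $node) {
        $texts[] = trim($node->textContent);
    }
    return $texts;
}

// Invented sample markup, loosely mirroring the mixed selection above.
$html = '<div id="entrada"><p>Intro</p></div><h2>Title</h2><p class="rosa">Pink</p>';
$result = filterHtml($html, array('#entrada', 'h2'));
// $result is array('Intro', 'Title'); the "rosa" paragraph is not selected.
```

Note that a selector matching an element nested inside another selected element would return its text twice in this sketch, which is one reason to keep filters as specific as possible.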