

Convert HTML to Word with PHP

One of the features most requested by PHPDocX users is the ability to generate Word documents from HTML.

Since the release of PHPDocX 2.5, two new methods are available: embedHTML() and replaceTemplateVariableByHTML(). Both convert HTML into Word with a high degree of customization.

The configuration options for both methods, embedHTML() and replaceTemplateVariableByHTML(), include:

  • The possibility of extracting the HTML from:
    • An external URL.
    • An internal HTML file.
    • A string of HTML code.
  • Selecting specific containers within the whole HTML code for the HTML-to-Word conversion.
  • Embedding, or not, the images included in the HTML code.
  • Embedding the HTML into a template via replaceTemplateVariableByHTML().
  • Using different styles:
    • Those included in the CSS stylesheet used by the HTML or written inline in the HTML code itself.
    • The Word styles included in the template used by PHPDocX.
    • Or a combination of both.

Moreover, this conversion is achieved by translating the HTML code directly into WordprocessingML (the native Word format), so the result is fully compatible with OpenOffice (and all its derivatives), with the Microsoft Compatibility Pack for Word 2003 and, most importantly, with the conversion to PDF, DOC, ODT and RTF included in the library.
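The three possible sources listed above (an external URL, an internal file or a string) all come down to obtaining a string of HTML. The following is a minimal sketch of that step, a hypothetical helper readHtmlSource() that is not part of PHPDocX; it only mimics the idea behind the isFile option used later in this post and assumes that reading URLs is allowed by the PHP configuration (allow_url_fopen).

```php
<?php
// Hypothetical helper (not part of PHPDocX): turns any of the three
// possible sources into a string of HTML, mimicking the 'isFile' option.
function readHtmlSource(string $source, bool $isFile): string
{
    if (!$isFile) {
        // The argument already is the HTML code.
        return $source;
    }
    // file_get_contents() reads local paths and, when allow_url_fopen
    // is enabled, http:// URLs as well.
    $html = file_get_contents($source);
    if ($html === false) {
        throw new RuntimeException("Could not read HTML from $source");
    }
    return $html;
}
```

The same function therefore serves a local file such as example.html or a URL such as the example page used further below.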

At this point you would probably like a clearer idea of how the HTML-to-Word converter works, along with some real examples of use and their results.

Let’s start with a very simple example and build upon it.

First, let us insert a simple HTML string into a Word document:


require_once('path_to_phpdocx/classes/CreateDocx.inc');
$docx = new CreateDocx();
$html =  '<p>A <span style="font-weight: bold" >very simple</span> HTML example.</p>';
$docx->embedHTML($html);
$docx->createDocx('embedHTML_1');

As you can see from the result, the HTML string has been included in the Word document:

[Figure: simple Word document from HTML]

Let us now go one step further and include some HTML code that makes use of native Word styles:


require_once('path_to_phpdocx/classes/CreateDocx.inc');
$docx = new CreateDocx();

$myHTML = '<br /><p style="font-family: Calibri; font-size: 11pt">We include a table with rowspans and colspans using the embedHTML method.</p>
<table style="font-family: Calibri; font-size: 11pt">
<tr>
<td>header 1</td>
<td>header 2</td>
<td>header 3</td>
<td>header 4</td>
</tr>
<tr>
<td rowspan="2" colspan="2">cell_1_1</td>
<td>cell_1_3</td>
<td>cell_1_4</td>
</tr>
<tr>
<td>cell_2_3</td>
<td>cell_2_4</td>
</tr>
<tr>
<td>cell_3_1</td>
<td>cell_3_2</td>
<td>cell_3_3</td>
<td>cell_3_4</td>
</tr>
</table>';
$docx->embedHTML($myHTML, array('tableStyle' => 'MediumGrid3-accent5PHPDOCX'));
$docx->createDocx('embedHTML_2');

This produces the following result:

[Figure: HTML to Word with Word styles]
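In real applications the table is usually built from data rather than written by hand. A minimal sketch, assuming the data arrives as plain PHP arrays: the hypothetical helper htmlTable() below (not part of PHPDocX) escapes every cell with htmlspecialchars() so that characters such as < or & cannot break the markup handed to embedHTML(). It does not handle rowspan or colspan.

```php
<?php
// Hypothetical helper (not part of PHPDocX): builds an HTML table string
// from a header row and data rows, escaping every cell.
function htmlTable(array $header, array $rows, string $style = 'font-family: Calibri; font-size: 11pt'): string
{
    $esc = function ($value): string {
        return htmlspecialchars((string) $value, ENT_QUOTES, 'UTF-8');
    };
    $html = '<table style="' . $esc($style) . '"><tr>';
    foreach ($header as $cell) {
        $html .= '<td>' . $esc($cell) . '</td>';
    }
    $html .= '</tr>';
    foreach ($rows as $row) {
        $html .= '<tr>';
        foreach ($row as $cell) {
            $html .= '<td>' . $esc($cell) . '</td>';
        }
        $html .= '</tr>';
    }
    return $html . '</table>';
}

// Usage with the example above (requires PHPDocX):
// $docx->embedHTML(htmlTable(array('header 1', 'header 2'), array(array('a', 'b'))),
//                  array('tableStyle' => 'MediumGrid3-accent5PHPDOCX'));
```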

You can also extract the HTML content from an existing file, but rather than illustrate that functionality with the embedHTML() method, we will use its counterpart, replaceTemplateVariableByHTML(), which is the one to use when Word templates come into play:


require_once 'classes/CreateDocx.inc';

$docx = new CreateDocx();
$docx->addTemplate('testHTML2mdc.docx');

$docx->replaceTemplateVariableByHTML('ADDRESS', 'inline', '<p style="font-family: verdana; font-size: 11px">C/ Matías Turrión 24, Madrid 28043 <b>Spain</b></p>', array('isFile' => false, 'parseDivsAsPs' => true, 'downloadImages' => false));
$docx->replaceTemplateVariableByHTML('CHUNK_1', 'block', 'http://www.2mdc.com/PHPDOCX/example.html', array('isFile' => true, 'parseDivsAsPs' => true,  'filter' => 'capa_bg_bottom', 'downloadImages' => true));
$docx->replaceTemplateVariableByHTML('CHUNK_2', 'block', 'http://www.2mdc.com/PHPDOCX/example.html', array('isFile' => true, 'parseDivsAsPs' => false,  'filter' => 'lateral', 'downloadImages' => true));

$docx->createDocx('webpage');

So, from a very simple template:

[Figure: HTML to Word with Word styles]

and a standard web page, one can get this Word document:

[Figure: Word from HTML file]

You may notice that, to reproduce the floating div on the right of the web page, we inserted a table into the template and extracted the corresponding HTML “columns” separately, selecting each one by its id through the filter option of replaceTemplateVariableByHTML(), because this is the option that gives the closest fit to the original layout.

We would also like to comment on the different values accepted by the “filter” option:

  • “filter” can be an array or a single string, depending on how many different types of elements we wish to include in the final Word document.
  • In either case, each value is a string with or without certain prefixes/suffixes:
    • “#” selects an HTML element with a certain id; for example, #main_content will only extract the HTML within the div (or any other HTML element) tagged as <div id="main_content">.
    • “.” selects HTML elements with a certain class; for example, .news will only extract the HTML within the divs (or any other HTML elements) tagged as <div class="news">.
    • “< >” selects HTML elements with a certain tag; for example, <p> will only extract the HTML within paragraphs.
    • If the prefix/suffix is omitted, PHPDocX will extract all content with that id, class or tag; for example, p will extract all the HTML whose id, class or tag is named p.
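To make these rules concrete, here is a sketch that reproduces them with PHP's DOMDocument and DOMXPath. It only illustrates the selection logic described above and is not PHPDocX's actual implementation; the helpers filterToXPath() and applyFilter() are hypothetical, and filter values are assumed to be plain names without quote characters.

```php
<?php
// Hypothetical illustration of the "filter" rules; NOT PHPDocX's own code.
// Assumes filter values are plain names without quote characters.
function filterToXPath(string $filter): string
{
    if ($filter === '') {
        throw new InvalidArgumentException('Empty filter');
    }
    // Matches a class name inside a space-separated class attribute.
    $hasClass = function (string $name): string {
        return "contains(concat(' ', normalize-space(@class), ' '), ' $name ')";
    };
    switch ($filter[0]) {
        case '#': // "#main_content" -> the element with that id
            return "//*[@id='" . substr($filter, 1) . "']";
        case '.': // ".news" -> elements with that class
            return '//*[' . $hasClass(substr($filter, 1)) . ']';
        case '<': // "<p>" -> elements with that tag
            return '//' . trim($filter, '<>');
        default:  // "p" -> any id, class or tag named p
            return "//*[@id='$filter' or " . $hasClass($filter) . " or local-name()='$filter']";
    }
}

// Returns the outer HTML of every element selected by one filter or an array of filters.
function applyFilter(string $html, $filter): array
{
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // tolerate real-world, non-valid HTML
    $dom->loadHTML($html);                        // non-ASCII text may need a charset <meta>
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    $xpath = new DOMXPath($dom);
    $fragments = array();
    foreach ((array) $filter as $f) {
        foreach ($xpath->query(filterToXPath($f)) as $node) {
            $fragments[] = $dom->saveHTML($node);
        }
    }
    return $fragments;
}
```

Passing an array such as array('#main_content', '.news') simply concatenates the fragments selected by each value, in the order the filters are given.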

The possibilities are, in principle, practically unlimited, and we do not want to bore you here with excessive detail. If you want to check all the available options, you may have a look at:

PHPdocx 2.3 and HTML insertion: compatibility issues with generated documents

To render HTML in a docx document generated with PHPdocx, we use a documented element called <altChunk>, which allows external content (RTF, HTML and other formats) to be embedded in a Word document. This element lets Word’s own rendering engine read this kind of content and display it properly to the user.
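For reference, an altChunk import in the Office Open XML package roughly consists of an element in word/document.xml that points, through a relationship of type aFChunk, to a separate part holding the HTML. The PHP sketch below only builds these three XML snippets as strings to show how they connect; the helper altChunkParts(), the relationship id and the file name are hypothetical, and PHPdocx may generate them differently.

```php
<?php
// Hypothetical helper (not part of PHPdocx): the three XML snippets that
// connect an HTML part to the document body through altChunk.
function altChunkParts(string $relId, string $target): array
{
    $id = htmlspecialchars($relId, ENT_XML1 | ENT_QUOTES);
    $file = htmlspecialchars($target, ENT_XML1 | ENT_QUOTES);
    return array(
        // In word/document.xml, inside <w:body>, where the HTML should appear.
        'document' => '<w:altChunk r:id="' . $id . '"/>',
        // In word/_rels/document.xml.rels: maps the id to the HTML part.
        'relationship' => '<Relationship Id="' . $id . '"'
            . ' Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/aFChunk"'
            . ' Target="' . $file . '"/>',
        // In [Content_Types].xml: tells Word the part contains HTML.
        'contentType' => '<Override PartName="/word/' . $file . '" ContentType="text/html"/>',
    );
}
```

Only a consumer that understands altChunk will open the HTML part and render it, which is the root of the compatibility problem described next.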

The problem is that not every version of Word that supports docx documents also supports <altChunk>, and this is not a PHPdocx issue: it is a decision made by Microsoft.

At the moment, OpenOffice, Office 2003 for Windows with the Compatibility Pack and Office 2008 for Mac do not support <altChunk>, so they cannot properly render any document created with a tool that uses <altChunk> (and they show an error message when the document is opened).

Office 2007 for Windows and later, and Office 2011 for Mac, do support <altChunk>.

However, PHPdocx 2.4 will include a solution for adding at least some basic HTML to docx documents that will be opened in a non-compatible Word version: a new method called AddbasicHTML.

AddbasicHTML will be available in PHPdocx Pro 2.4 in early September. Users of PHPdocx Pro 2.3 can ask for a Golden Master version if this issue is affecting their deployment of PHPdocx or the documents it generates.

Creating a Report using PHPdocx

This is an advanced example of how to use PHPdocx to create a complete report.

The code is commented to make it easy to understand how it works:


require_once '../../classes/CreateDocx.inc';

$docx = new CreateDocx();

// browser stats
$statsFeb2009Feb2010 = '
<STATS>
    <BROWSER>
        <NAME>Internet Explorer</NAME>
        <VALUE>58</VALUE>
    </BROWSER>
    <BROWSER>
        <NAME>Firefox</NAME>
        <VALUE>31</VALUE>
    </BROWSER>
    <BROWSER>
        <NAME>Chrome</NAME>
        <VALUE>4</VALUE>
    </BROWSER>
    <BROWSER>
        <NAME>Safari</NAME>
        <VALUE>3</VALUE>
    </BROWSER>
    <BROWSER>
        <NAME>Opera</NAME>
        <VALUE>2</VALUE>
    </BROWSER>
    <BROWSER>
        <NAME>Other</NAME>
        <VALUE>2</VALUE>
    </BROWSER>
</STATS>
';

$statsFeb2010Feb2011 = '
<STATS>
    <BROWSER>
        <NAME>Internet Explorer</NAME>
        <VALUE>50</VALUE>
    </BROWSER>
    <BROWSER>
        <NAME>Firefox</NAME>
        <VALUE>31</VALUE>
    </BROWSER>
    <BROWSER>
        <NAME>Chrome</NAME>
        <VALUE>11</VALUE>
    </BROWSER>
    <BROWSER>
        <NAME>Safari</NAME>
        <VALUE>4</VALUE>
    </BROWSER>
    <BROWSER>
        <NAME>Opera</NAME>
        <VALUE>2</VALUE>
    </BROWSER>
    <BROWSER>
        <NAME>Other</NAME>
        <VALUE>2</VALUE>
    </BROWSER>
</STATS>
';

// add text and date in header
$header = 'Browsers stats';

$paramsHeader = array(
    'jc' => 'right',
    'textWrap' => 5,
);

$date = getdate();

$docx->addHeader($header . ' ' . $date['mon'] . '/' . $date['mday'] . '/'
    . $date['year'], $paramsHeader);

// add footer with pager
$footer = 'DOCX generated using PHPDOCX PRO.';

$paramsFooter = array(
    'pager' => 'true',
    'pagerAlignment' => 'center',
);

$docx->addFooter($footer, $paramsFooter);

// add chart title
$title = 'Browsers stats chart';

$paramsTitle = array(
    'val' => 1,
    'b' => 'single',
    'sz' => 22
);

$docx->addTitle($title, $paramsTitle);

// add line break
$docx->addBreak('line');
$docx->addBreak('line');
$docx->addBreak('line');

// read XML Feb2009Feb2010
$xmlFeb2009Feb2010 = new DOMDocument();
$xmlFeb2009Feb2010->loadXML($statsFeb2009Feb2010);

// read XML Feb2010Feb2011
$xmlFeb2010Feb2011 = new DOMDocument();
$xmlFeb2010Feb2011->loadXML($statsFeb2010Feb2011);

// read stats and create charts
$legendsAndValues = array();

for ($i = 0; $i < $xmlFeb2009Feb2010->getElementsByTagName("NAME")->length; $i++) {
    $legendsAndValues[$xmlFeb2009Feb2010->getElementsByTagName("NAME")->item($i)->nodeValue] = array(
        $xmlFeb2009Feb2010->getElementsByTagName("VALUE")->item($i)->nodeValue
    );
}

$chart = array(
    'data' => $legendsAndValues,
    'type' => 'pie3DChart',
    'title' => 'Feb 2009 - Feb 2010',
    'cornerX' => 20, 'cornerY' => 20, 'cornerP' => 30,
    'color' => 2,
    'textWrap' => 0,
    'sizeX' => 14, 'sizeY' => 8,
    'jc' => 'center',
    'showPercent' => 1,
);

$docx->addGraphic($chart);

// add line break
$docx->addBreak('line');
$docx->addBreak('line');

$legendsAndValues = array();

for($i = 0; $i < $xmlFeb2010Feb2011->getElementsByTagName("NAME")->length; $i++) {
    $legendsAndValues[$xmlFeb2010Feb2011->getElementsByTagName("NAME")->item($i)->nodeValue] = array(
        $xmlFeb2010Feb2011->getElementsByTagName("VALUE")->item($i)->nodeValue
    );
}

$chart['data'] = $legendsAndValues;
$chart['title'] = 'Feb 2010 - Feb 2011';

$docx->addGraphic($chart);

// add page break
$docx->addBreak('page');

// add table title
$title = 'Browsers stats table';

$paramsTitle = array(
    'val' => 1,
    'b' => 'single',
    'sz' => 22
);

$docx->addTitle($title, $paramsTitle);

// add line break
$docx->addBreak('line');
$docx->addBreak('line');
$docx->addBreak('line');

// read stats and create table
$table = array();

$table[] = array(
    '',
    'Feb 2009 - Feb 2010',
    'Feb 2010 - Feb 2011',
);

for($i = 0; $i < $xmlFeb2009Feb2010->getElementsByTagName("NAME")->length; $i++) {
    $paramsHeaderTextTable[0] = array(
        'text' => $xmlFeb2009Feb2010->getElementsByTagName("NAME")->item($i)->nodeValue,
        'b' => 'single',
        'sz' => 14
    );

    $table[] = array(
        $docx->addElement('addText', $paramsHeaderTextTable),
        $xmlFeb2009Feb2010->getElementsByTagName("VALUE")->item($i)->nodeValue,
        $xmlFeb2010Feb2011->getElementsByTagName("VALUE")->item($i)->nodeValue
    );
}

$paramsTable = array(
    'border' => 'single',
    'border_sz' => 2,
    'jc' => 'center',
    'size_col' => 2800
);

$docx->addTable($table, $paramsTable);

// add page break
$docx->addBreak('page');

// add text
$textInfo = 'Stats are based on aggregate data collected by StatCounter on a'
. ' sample exceeding 15 billion pageviews per month collected from across'
. ' the StatCounter network of more than 3 million websites. Stats are'
. ' updated and made available every 4 hours, however are subject to '
. 'quality assurance testing and revision for 7 days from publication.';

$paramsTextInfo = array(
    'val' => 1,
    'i' => 'single',
    'sz' => 8
);

$docx->addText($textInfo, $paramsTextInfo);

// add link
$docx->addLink('Source: StatCounter', 'http://gs.statcounter.com');

// generate DOCX file
$docx->createDocx('example_report');
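The loops in the report call getElementsByTagName() on every iteration and rely on the NAME and VALUE lists staying aligned. A possible alternative, sketched below under the assumption that the stats always follow the <STATS>/<BROWSER> layout used above, walks each BROWSER element once and returns the same name => array(value) shape that the chart’s 'data' key receives. The helper statsToChartData() is hypothetical, and unlike the original it casts values to integers.

```php
<?php
// Hypothetical helper (not part of PHPdocx): parses a <STATS> document like
// the ones in the report into the name => array(value) shape used for 'data'.
function statsToChartData(string $xml): array
{
    $dom = new DOMDocument();
    if (!$dom->loadXML($xml)) {
        throw new InvalidArgumentException('Invalid stats XML');
    }
    $data = array();
    foreach ($dom->getElementsByTagName('BROWSER') as $browser) {
        // Each BROWSER is assumed to hold exactly one NAME and one VALUE child.
        $name = $browser->getElementsByTagName('NAME')->item(0)->nodeValue;
        $value = $browser->getElementsByTagName('VALUE')->item(0)->nodeValue;
        $data[$name] = array((int) $value);
    }
    return $data;
}

// Usage in the report: $chart['data'] = statsToChartData($statsFeb2010Feb2011);
```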

Add a footnote to a chart in a docx document using PHPdocx

Adding a chart (pie, bar) to a document with PHPdocx is easy. You can find some examples here, such as “Charts and templates mix good in PHPdocx”, “Creating a 3D pie chart with PHPdocx to include in a docx document” or “Creating a bar chart in a docx document with PHPdocx”.

But what if you need to add some footnotes to a chart? Here is the code to do it:


require_once '../../classes/CreateDocx.inc';

$docx = new CreateDocx();

$text = 'Lorem ipsum dolor sit amet, consectetur adipisicing elit, ' .
    'sed do eiusmod tempor incididunt ut labore et dolore magna aliqua:';

$paramsText = array(
    'font' => 'Arial'
);

$docx->addText($text, $paramsText);

$legends = array(
    'legend1' => array(24),
    'legend2' => array(45),
    'legend3' => array(31)
);
$args = array(
    'data' => $legends,
    'type' => 'pie3DChart',
    'title' => 'Title first chart',
    'cornerX' => 20, 'cornerY' => 20, 'cornerP' => 30,
    'color' => 2,
    'textWrap' => 0,
    'sizeX' => 10, 'sizeY' => 10,
    'jc' => 'left',
    'showPercent' => 1,
    'font' => 'Times New Roman',
    'border' => 1
);
$docx->addGraphic($args);

$docx->addFootnote(
    array(
        'textDocument' => 'Lorem ipsum dolor sit amet',
        'textEndNote' => 'Curabitur id dui purus, sit amet blandit lacus. ' .
    					 'Vivamus mollis magna et risus molestie blandit. ' .
    					 'Phasellus vel tortor quis metus consectetur.'
    )
);

$docx->addBreak('line');

$text = 'Cras eget porttitor sapien. Aenean tristique, nibh quis egestas ' .
		'varius, erat neque sodales neque, quis bibendum sem lorem accumsan ' .
		'mauris. Aliquam justo justo, vulputate sed condimentum non, pharetra:';

$paramsText = array(
    'font' => 'Arial'
);

$docx->addText($text, $paramsText);

$paramsImg = array(
    'name' => '../files/img/image.png',
	'scaling' => 75,
    'textWrap' => 0,
    'border' => 1,
);

$docx->addImage($paramsImg);

$docx->addFootnote(
    array(
        'textDocument' => 'Aenean non gravida sapien',
        'textEndNote' => 'Nunc pretium bibendum dui id laoreet. Nunc ' .
    					 'venenatis. Duis quis lorem vel dui tincidunt ' .
    					 'pellentesque quis sed diam.'
    )
);

$docx->createDocx('example_chart_footnotes');

Multiple custom documents with PHPdocx

Do you want to generate multiple documents, each customized with a user’s name, from a single script?

This example shows how to create multiple documents customized for each user with a single script:


require_once '../../classes/CreateDocx.inc';

$users = array(
    0 => array(
        'name'   => 'Don Mattingly',
        'value1' => '0.2',
        'value2' => '0.4',
        'value3' => '0.6',
    ),
    1 => array(
        'name'   => 'Brian Sipe',
        'value1' => '0.3',
        'value2' => '0.3',
        'value3' => '0.4',
    ),
    2 => array(
        'name'   => 'Julius Erving',
        'value1' => '0.1',
        'value2' => '0.2',
        'value3' => '0.7',
    ),
);

foreach ($users as $user) {

	$docx = new CreateDocx();

	$paramsTitle = array(
	    'val' => 1,
	    'u' => 'single',
	);

	$docx->addTitle($user['name'] . '\'s Document', $paramsTitle);

	$docx->addBreak('line');

	$text = array();

	$text[] =
	    array(
	        'text' => 'Hi, ',
	);

	$text[] =
	    array(
	        'text' => $user['name'],
	        'b' => 'single',
	);

	$text[] =
	    array(
	        'text' => ' lorem ipsum dolor sit amet, consectetur ' .
				 'adipiscing elit. Pellentesque egestas gravida tincidunt. ' .
				 'Nunc ante enim, auctor at elementum porttitor, pharetra a' .
				 ' erat. Vivamus semper orci nec neque faucibus a varius ' .
				 'libero ultrices. Mauris viverra, nisl sed ullamcorper.',
	);

	$docx->addText($text);

	$docx->addBreak('line');

	$docx->addText('Lorem ipsum dolor sit amet, consectetur: ');

	$paramsList = array(
    	'val' => 1,
		'bullets' => array(3, 1, 2)
	);

	$valuesList = array(
	    'Donec tellus justo',
		    array(
		        'faucibus nec commodo quis',
		        'dignissim ut ipsum',
		        'Aenean hendrerit interdum',
				    array(
				        'Morbi malesuada luctus libero',
				        'sodales est placerat eget',
				        'Aenean eget nulla vel'
				    ),
		    ),
	    'enim viverra iaculis',
	    'aliquet aliquam nisl',
	);
	$docx->addList($valuesList, $paramsList);

	$docx->addBreak('page');

	$legends = array(
	    '0' => array('sequence 1', 'sequence 2', 'sequence 3'),
	    'legend1' => array($user['value1']),
	    'legend2' => array($user['value2']),
	    'legend3' => array($user['value3'])
	);
	$args = array(
	    'data' => $legends,
	    'type' => 'pie3DChart',
	    'title' => $user['name'] . '\'s chart',
	    'cornerX' => 20, 'cornerY' => 20, 'cornerP' => 30,
	    'color' => 2,
	    'textWrap' => 0,
	    'sizeX' => 10, 'sizeY' => 10,
	    'jc' => 'left',
	    'showPercent' => 1,
	    'font' => 'Times New Roman'
	);
	$docx->addGraphic($args);

	$docx->createDocx('example_multidocument_' . $user['name']);

}
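Note that the generated file names contain each user’s name verbatim, spaces included (for example example_multidocument_Don Mattingly). If names may contain characters that are awkward in file names, you could normalize them first. A minimal sketch, using a hypothetical helper fileSlug() that is not part of PHPdocx and keeps only ASCII letters and digits:

```php
<?php
// Hypothetical helper (not part of PHPdocx): turns a user's name into a
// file-name-safe slug, e.g. "Don Mattingly" -> "don_mattingly".
function fileSlug(string $name): string
{
    // Replace every run of characters other than ASCII letters and digits with "_".
    $slug = preg_replace('/[^A-Za-z0-9]+/', '_', $name);
    // Drop leading/trailing underscores and lowercase the result.
    $slug = strtolower(trim($slug, '_'));
    // Fall back to a fixed name if nothing usable is left.
    return $slug !== '' ? $slug : 'document';
}

// Usage inside the loop:
// $docx->createDocx('example_multidocument_' . fileSlug($user['name']));
```

Because non-ASCII letters are replaced as well, two different names could map to the same slug; appending the array index is one way to keep the files distinct.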