Content analysis of several document formats with phpdocx Indexer
Virtual library content analysis
A virtual library holds documents in several formats (TXT, DOC, RTF, ODT and DOCX) and wants to analyze their contents. To do so, it uses the phpdocx conversion plugin to convert the documents to DOCX and then analyzes the content with Indexer.
Each document is converted to DOCX with the transformDocument method of the conversion plugin, one call per source file.
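The conversion step could look roughly like the sketch below. It assumes the namespaced phpdocx package is already loaded, so that the class Phpdocx\Create\CreateDocx is available, and that the conversion plugin and its external converter are configured. The file names, the output folder and the helper docxTargetFor are hypothetical and only serve the illustration.

```php
<?php
// Hypothetical source files, one per format that needs converting.
$sources = ['book.txt', 'book.doc', 'book.rtf', 'book.odt'];

/**
 * Build the DOCX target path for a source document (hypothetical helper).
 * The source extension is kept in the name so that book.txt and book.doc
 * do not overwrite each other after conversion.
 */
function docxTargetFor(string $source, string $outputDir): string
{
    $name = pathinfo($source, PATHINFO_FILENAME);
    $ext = strtolower(pathinfo($source, PATHINFO_EXTENSION));

    return rtrim($outputDir, '/') . '/' . $name . '_' . $ext . '.docx';
}

// Assumption: phpdocx has been loaded beforehand (its own loader or an
// autoloader). Without it, the conversion loop is skipped.
if (class_exists('Phpdocx\\Create\\CreateDocx')) {
    $docx = new Phpdocx\Create\CreateDocx();

    foreach ($sources as $source) {
        // transformDocument is the conversion plugin method named above:
        // source file first, target file second.
        $docx->transformDocument($source, docxTargetFor($source, 'converted'));
    }
}
```

With these names, book.odt would be written to converted/book_odt.docx. Documents that are already DOCX can be passed to Indexer directly.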
Using Indexer, the contents of the converted documents can then be analyzed, for example by extracting their text and image contents.
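A minimal sketch of the analysis step follows. It assumes that Indexer lives at Phpdocx\Utilities\Indexer, takes the DOCX path in its constructor and returns a nested array from getOutput(), with a 'body' section holding 'text' and 'images' entries. That layout is an assumption to check against the installed phpdocx version; the helper summarizeContents and the file name are hypothetical.

```php
<?php
/**
 * Summarize an Indexer-style output array (hypothetical helper).
 * Assumption: the array has a 'body' section with 'text' and 'images'
 * entries; missing keys are treated as empty.
 */
function summarizeContents(array $output): array
{
    $body = $output['body'] ?? [];
    $text = $body['text'] ?? [];
    $images = $body['images'] ?? [];

    // Accept either a single string or a list of text fragments.
    if (is_array($text)) {
        $text = implode(' ', array_filter($text, 'is_string'));
    }

    return [
        'text' => (string) $text,
        'imageCount' => is_array($images) ? count($images) : 0,
    ];
}

// Assumption: phpdocx has been loaded beforehand; otherwise this is skipped.
if (class_exists('Phpdocx\\Utilities\\Indexer')) {
    $indexer = new Phpdocx\Utilities\Indexer('converted/book_odt.docx');
    $summary = summarizeContents($indexer->getOutput());

    echo strlen($summary['text']), " bytes of text, ",
        $summary['imageCount'], " images\n";
}
```

Keeping the summary logic in a plain function means it can be checked against a hand-built array without a phpdocx installation.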