summarize

PREMIUM TRIAL / ADVANCED

Summarizes the contents of a DOCX document.

Description
public summarize ( $source [, array $options = array()] )

Summarizes the contents of a DOCX document.

Parameters

source

DOCX document.

options

An array with the available options.

The possible keys and values are:

key Type Description
minLength int Minimum length of keywords. Defaults to null.
orderByScore bool If true, orders the results by the paragraph scores computed from the keyword scores. Defaults to false.
referenceNode array DOCXPath options for custom queries. Defaults to all paragraphs:
  • 'type' (string) paragraph (default)
  • 'contains' (string)
  • 'occurrence' (int)
  • 'attributes' (array)
  • 'parent' (string) '/' (any parent, default), w:body or any other specific parent (/w:tbl/, /w:tc/, /w:r/...)
  • 'customQuery' (string) if set, overrides all previous references. It must be a valid XPath query
regExprCleanWords string Regular expression used to clean the contents by removing extra symbols. Defaults to '/[^\p{L}\p{N}\s]/u'.
stopWords array Words to be ignored. Defaults to empty. Stop word lists for many languages are available at https://github.com/stopwords-iso.
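
To make the options above more concrete, here is a minimal sketch in plain PHP. It builds an options array using the keys from the table, then shows what the default regExprCleanWords pattern removes from a sample string. The sample text, the stop-word list, the case-insensitive stop-word comparison and the "length greater than or equal to minLength" rule are all assumptions made for illustration. How the library applies these options internally is not described here, and the summarize call itself is not shown because this page does not name the class that provides it.

```php
<?php
// Options array using the keys listed in the table above.
// The values are illustrative, not recommendations.
$options = array(
    'minLength' => 4,
    'orderByScore' => true,
    // Default reference node values as listed in the table.
    'referenceNode' => array('type' => 'paragraph', 'parent' => '/'),
    // Default cleaning pattern as listed in the table.
    'regExprCleanWords' => '/[^\p{L}\p{N}\s]/u',
    // Hypothetical stop-word list.
    'stopWords' => array('the', 'of', 'it'),
);

// Hypothetical sample text.
$text = "Hello, world! It's 2024: the year of DOCX.";

// The pattern removes every character that is not a letter, a number or whitespace.
$cleaned = preg_replace($options['regExprCleanWords'], '', $text);

// Split on whitespace, then drop stop words and short words.
// Assumption: stop words are compared case-insensitively and
// minLength means "at least this many characters".
$words = preg_split('/\s+/u', $cleaned, -1, PREG_SPLIT_NO_EMPTY);
$keywords = array_values(array_filter($words, function ($word) use ($options) {
    $isStopWord = in_array(strtolower($word), $options['stopWords'], true);
    $isLongEnough = strlen($word) >= $options['minLength'];
    return !$isStopWord && $isLongEnough;
}));

echo $cleaned, PHP_EOL;
echo implode(' ', $keywords), PHP_EOL;
```

Note that the default pattern also strips apostrophes, so "It's" becomes "Its" before any stop-word comparison takes place. A stop-word list may therefore need to contain the apostrophe-free forms.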

Return values

string or array

Exceptions

Thrown when the source is not a valid DOCX.

Code samples

Example #1

The resulting output looks like:

Release notes
  • phpdocx 14.0:
    • new method.