extractKeywords

PREMIUM TRIAL / ADVANCED

Extract the top keywords from a DOCX document.

Description
public extractKeywords ( $source [, array $options = array()] )

Extracts the top keywords from a DOCX document.

Parameters

source

DOCX document.

options

An array with the available options.

The possible keys and values are:

Key Type Description
maxKeywords int Maximum number of keywords to return. Defaults to unlimited.
minLength int Minimum keyword length. Defaults to null.
referenceNode array DOCXPath options for custom queries. Defaults to all paragraphs:
  • 'type' (string) paragraph (default)
  • 'contains' (string)
  • 'occurrence' (int)
  • 'attributes' (array)
  • 'parent' (string) '/' (any parent, default), w:body or any other specific parent (/w:tbl/, /w:tc/, /w:r/...)
  • 'customQuery' (string) if set, overrides all previous references. It must be a valid XPath query
regExprCleanWords string Regular expression used to strip extra symbols from the contents. Defaults to '/[^\p{L}\p{N}\s]/u'.
stopWords array Words to ignore. Defaults to empty. Stop word lists for many languages are available at https://github.com/stopwords-iso.
target array Targets to extract keywords from:
  • document
  • headers
  • footers
  • footnotes
  • endnotes
  • comments
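
To make the effect of maxKeywords, minLength, regExprCleanWords and stopWords more concrete, the following is a minimal sketch of how these options could plausibly combine. It is not phpdocx's implementation: the sketchKeywords helper, its plain-text input (instead of a DOCX), the case-insensitive matching and the keyword => count return shape are all assumptions made for illustration.

```php
<?php
// Hypothetical helper, NOT part of phpdocx. It works on plain text and only
// illustrates how the documented options could interact.
function sketchKeywords(string $text, array $options = array()): array
{
    // Defaults mirror the options table above.
    $options += array(
        'maxKeywords' => null, // null = unlimited
        'minLength' => null,   // null = no minimum length
        'regExprCleanWords' => '/[^\p{L}\p{N}\s]/u',
        'stopWords' => array(),
    );

    // Remove every character matched by the cleaning expression
    // (by default: anything that is not a letter, a digit or whitespace).
    $clean = preg_replace($options['regExprCleanWords'], '', $text);

    // Assumption: words are compared case-insensitively.
    $words = preg_split('/\s+/u', mb_strtolower($clean), -1, PREG_SPLIT_NO_EMPTY);
    $stopWords = array_map('mb_strtolower', $options['stopWords']);

    $counts = array();
    foreach ($words as $word) {
        if (in_array($word, $stopWords, true)) {
            continue; // stop word: ignored
        }
        if ($options['minLength'] !== null && mb_strlen($word) < $options['minLength']) {
            continue; // shorter than minLength
        }
        $counts[$word] = ($counts[$word] ?? 0) + 1;
    }

    // Most frequent words first.
    arsort($counts);

    if ($options['maxKeywords'] !== null) {
        $counts = array_slice($counts, 0, $options['maxKeywords'], true);
    }

    return $counts;
}

$keywords = sketchKeywords(
    'Templates, templates, templates: reports and reports in a DOCX.',
    array(
        'maxKeywords' => 2,
        'minLength' => 2,
        'stopWords' => array('and', 'in'),
    )
);

print_r($keywords); // templates => 3, reports => 2
```

In this sketch the cleaning expression runs before the text is split into words, so punctuation attached to a word does not produce a separate keyword. Stop words and minLength are applied before counting, and maxKeywords only trims the sorted result.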

Return values

A string or an array containing the keywords.

Exceptions

Thrown when the source is not a valid DOCX.

Code samples

Example #1

The resulting output looks like:

Release notes
  • phpdocx 14.0:
    • new method.