extractKeywords

PREMIUM

TRIAL / ADVANCED

Extracts the top keywords from a DOCX document.

Description

public extractKeywords ( $source [, array $options = array()] )

Extracts the top keywords from the contents of a DOCX document.

Supported AI integrations:

GPT OpenAI
phpdocx AI

Parameters GPT OpenAI

source

DOCX document.

options

An array with the available options.

The possible keys and values are:

Key	Type	Description
frequency_penalty	float	Defaults to 0.8.
max_tokens	int	Defaults to 1000. Each OpenAI model has its own maximum token limit.
model	string	Defaults to 'text-davinci-003'.
presence_penalty	float	Defaults to 0.0.
prompt	string	Defaults to 'Extract keywords from this text:'.
referenceNode	array	Defaults to all paragraphs. DOCXPath options for custom queries: 'type' (string): paragraph (default); 'contains' (string); 'occurrence' (int); 'attributes' (array); 'parent' (string): '/' (any parent, default), w:body or any other specific parent (/w:tbl/, /w:tc/, /w:r/...); 'customQuery' (string): if set, overrides all previous references and must be a valid XPath query.
returnFullResponse	bool	If true, returns the whole GPT response. Defaults to false.
target	array	Targets to extract from: document, headers, footers, footnotes, endnotes, comments.
temperature	float	Defaults to 0 (set to 0.5 to generate related keywords).
top_p	float	Defaults to 1.0.
url	string	Defaults to 'https://api.openai.com/v1/completions'.
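
As a sketch of how several of these options combine, the call below limits extraction to paragraphs containing a given string and overrides a few GPT defaults. The option keys come from the table above, and the AIGPT class and its constructor argument follow Example #2. The 'Summary' string, the model name and the numeric values are illustrative assumptions, not recommendations. OpenAI retired 'text-davinci-003' in early 2024, so a current completions model probably needs to be passed explicitly.

```php
<?php
require_once 'classes/CreateDocx.php';

// Replace the placeholder with a real OpenAI API key.
$aiGPT = new AIGPT('OPENAI_API_KEY');

$options = array(
    // Assumption: a model your account can use with the completions endpoint.
    'model' => 'gpt-3.5-turbo-instruct',
    'max_tokens' => 200,
    // 0.5 also allows related keywords, as noted in the options table.
    'temperature' => 0.5,
    // Only read paragraphs that contain the (hypothetical) string 'Summary'.
    'referenceNode' => array(
        'type' => 'paragraph',
        'contains' => 'Summary',
    ),
);

$keywords = $aiGPT->extractKeywords('document.docx', $options);

print_r($keywords);
```

Running this needs the phpdocx library, a valid API key and network access.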

Parameters phpdocx AI

source

DOCX document.

options

An array with the available options.

The possible keys and values are:

Key	Type	Description
maxKeywords	int	Maximum number of keywords to return. Defaults to unlimited.
minLength	int	Minimum keyword length. Defaults to null.
referenceNode	array	Defaults to all paragraphs. DOCXPath options for custom queries: 'type' (string): paragraph (default); 'contains' (string); 'occurrence' (int); 'attributes' (array); 'parent' (string): '/' (any parent, default), w:body or any other specific parent (/w:tbl/, /w:tc/, /w:r/...); 'customQuery' (string): if set, overrides all previous references and must be a valid XPath query.
regExprCleanWords	string	Regular expression used to strip extra symbols from the contents. Defaults to '/[^\p{L}\p{N}\s]/u'.
stopWords	array	Words to ignore. Defaults to empty. Stop word lists for many languages are available at https://github.com/stopwords-iso.
target	array	Targets to extract from: document, headers, footers, footnotes, endnotes, comments.
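
To make the regExprCleanWords default more concrete, the standalone snippet below applies that pattern to a sample string with PHP's preg_replace. How the library applies the pattern internally is not stated here, so the empty replacement string is an assumption. The sample text and the hyphen-preserving variant are hypothetical.

```php
<?php
// Default pattern of the regExprCleanWords option: matches every character
// that is not a Unicode letter, a Unicode number or whitespace.
$defaultPattern = '/[^\p{L}\p{N}\s]/u';

// Hypothetical variant that also keeps hyphens, so compound terms survive.
$keepHyphens = '/[^\p{L}\p{N}\s\-]/u';

// Hypothetical sample text, not taken from a real document.
$text = 'State-of-the-art design (2024): 15% cheaper!';

// Assumption: matched symbols are removed (replaced with an empty string).
$cleanedDefault = preg_replace($defaultPattern, '', $text);
$cleanedCustom = preg_replace($keepHyphens, '', $text);

echo $cleanedDefault, PHP_EOL; // Stateoftheart design 2024 15 cheaper
echo $cleanedCustom, PHP_EOL;  // State-of-the-art design 2024 15 cheaper
```

If hyphenated terms matter for your documents, a pattern like the second one could be passed as the 'regExprCleanWords' option.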

Return values

A string or an array containing the keywords.
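
Because the result can be a string or an array, calling code may want to normalize it before use. The helper below is a minimal sketch. keywordsToArray is a hypothetical name, and it assumes that a string result separates keywords with commas or line breaks, which this page does not guarantee. Arrays are returned untouched because their exact shape is not described here.

```php
<?php
/**
 * Returns keyword results as an array.
 * Assumption: string results separate keywords with commas or line breaks.
 */
function keywordsToArray($result)
{
    if (is_array($result)) {
        // Already an array: return it unchanged.
        return $result;
    }

    // Split the string, trim each piece and drop empty pieces.
    $parts = preg_split('/[,\r\n]+/', (string) $result);

    return array_values(array_filter(array_map('trim', $parts), 'strlen'));
}

print_r(keywordsToArray("invoice, payment\nVAT"));
print_r(keywordsToArray(array('invoice', 'payment')));
```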

Exceptions

Not valid DOCX source.

Error connecting to GPT.

GPT error.
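
The errors listed above can be handled with a standard try/catch. The wrapper below is a minimal sketch. extractKeywordsOrNull is a hypothetical helper name, and it assumes the failures surface as \Exception or a subclass, since the concrete exception class is not named on this page.

```php
<?php
/**
 * Runs a keyword extraction callback and returns null instead of throwing.
 * Assumption: failures surface as \Exception (or a subclass of it).
 */
function extractKeywordsOrNull(callable $extract)
{
    try {
        return $extract();
    } catch (\Exception $e) {
        // Log the reason (invalid DOCX source, connection problem, GPT error).
        error_log('Keyword extraction failed: ' . $e->getMessage());

        return null;
    }
}

// Usage with phpdocx (needs the library, so it is left commented out here):
// require_once 'classes/CreateDocx.php';
// $keywords = extractKeywordsOrNull(function () {
//     $aiPhpdocx = new AIPhpdocx();
//     return $aiPhpdocx->extractKeywords('document.docx');
// });
```

Returning null keeps the calling code simple. Rethrowing, or catching narrower exception types, may suit stricter applications better.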

Code samples

Example #1

require_once 'classes/CreateDocx.php';

$aiPhpdocx = new AIPhpdocx();
// optional filters; stop word lists for many languages are available at https://github.com/stopwords-iso
$keywords = $aiPhpdocx->extractKeywords('document.docx', array(
    'maxKeywords' => 20,
    'stopWords' => array('the', 'that', 'and', 'its'),
    'minLength' => 4,
));

print_r($keywords);

The resulting output looks like:

Example #2

require_once 'classes/CreateDocx.php';

// replace 'OPENAI_API_KEY' with your OpenAI API key
$aiGPT = new AIGPT('OPENAI_API_KEY');
$keywords = $aiGPT->extractKeywords('document.docx');

print_r($keywords);

The resulting output looks like:

Release notes

phpdocx 14.0:
- new method.