Detailed explanation of the PHPAnalysis word segmentation class

  • 2021-06-29 10:39:41
  • OfStack

PHPAnalysis is a widely used Chinese word segmentation class. It segments text using reverse matching, which makes it compatible with a wider range of encodings. Its member variables and commonly used functions are described in detail below.
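To make the idea of reverse matching concrete, here is a minimal sketch of reverse maximum matching over a toy dictionary. It is not PHPAnalysis's actual implementation: the function name reverseMaxMatch, the $maxLen parameter, and the four-word dictionary are all assumptions made for illustration. The scan starts at the end of the string and, at each position, takes the longest dictionary word that ends there.

```php
<?php
// Toy reverse maximum matching. Illustrative only; this is NOT how
// PHPAnalysis is implemented internally.
function reverseMaxMatch(string $text, array $dict, int $maxLen = 4): array
{
    // Split the UTF-8 string into an array of single characters.
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
    $words = [];
    $end = count($chars);

    while ($end > 0) {
        $matched = false;
        // Try the longest candidate ending at $end first, then shorter ones.
        for ($len = min($maxLen, $end); $len > 1; $len--) {
            $candidate = implode('', array_slice($chars, $end - $len, $len));
            if (isset($dict[$candidate])) {
                array_unshift($words, $candidate);
                $end -= $len;
                $matched = true;
                break;
            }
        }
        if (!$matched) {
            // No dictionary word ends here: emit a single character.
            array_unshift($words, $chars[$end - 1]);
            $end--;
        }
    }
    return $words;
}

// Hypothetical dictionary: keys are the known words.
$dict = ['研究' => true, '研究生' => true, '生命' => true, '起源' => true];
print_r(reverseMaxMatch('研究生命起源', $dict)); // 研究 / 生命 / 起源
```

With this dictionary, scanning from the start of the string would greedily pick 研究生 first and leave 命 stranded, whereas scanning from the end yields 研究, 生命, 起源.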

1. Important member variables

$resultType = 1: type of segmentation result data to generate (1 for everything; 2 for dictionary words plus single Chinese, Japanese, and Korean characters and English; 3 for dictionary words and English)
This variable is typically set using the SetResultType($rstype) method.
$notSplitLen = 5: minimum length of a sentence to be split
$toLower = false: whether to convert all English words to lowercase
$differMax = false: whether to use maximum segmentation mode to disambiguate two-element words
$unitWord = true: whether to try to merge words (that is, new word recognition)
$differFreq = false: whether to use popular-word-priority mode to disambiguate

2. List of main member functions

1. public function __construct($source_charset='utf-8', $target_charset='utf-8', $load_all=true, $source='')
Function description: Constructor
Parameter list:
$source_charset Source String Encoding
$target_charset target string encoding
$load_all whether to fully load the dictionary (this parameter is obsolete)
$source source string
If both input and output are utf-8, you can create the object without any parameters and set the text to be processed with the SetSource method.

2. public function SetSource ($source, $source_charset='utf-8', $target_charset='utf-8')
Function description: Set source string
Parameter list:
$source source string
$source_charset Source String Encoding
$target_charset target string encoding
Return value: bool

3. public function StartAnalysis ($optimize=true)
Function description: Begin word breaking operation
Parameter list:
$optimize whether to try to optimize the results after segmentation
Return value: void
A basic word segmentation process:
//////////////////////////////////////
// Assumes the file defining the PhpAnalysis class has already been included
$pa = new PhpAnalysis();

$pa->SetSource('String to be segmented');

// Set segmentation properties
$pa->resultType = 2;
$pa->differMax = true;

$pa->StartAnalysis();

// Get the results you want
$index = $pa->GetFinallyIndex();
////////////////////////////////////////

4. public function SetResultType ($rstype)
Function description: Set the type of result returned
This actually operates on the member variable $resultType.
The parameter $rstype takes the following values:
1 for everything; 2 for dictionary words plus single Chinese, Japanese, and Korean characters and English; 3 for dictionary words and English
Return value: void

5. public function GetFinallyKeywords ($num = 10)
Function description: Gets the specified number of terms that occur most frequently (usually used to extract document keywords)
Parameter list:
$num = 10 number of entries to return
Return value: a list of keywords separated by ','
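The documented behaviour (most frequent terms, joined by ',') can be sketched on a plain token list. This is only an illustration of the return format, not the library's internal logic, which may apply further filtering; the function name topKeywords and the sample tokens are hypothetical.

```php
<?php
// Illustrative only: returns the $num most frequent tokens joined by ','.
// Not PHPAnalysis's internal implementation.
function topKeywords(array $tokens, int $num = 10): string
{
    $counts = array_count_values($tokens); // token => number of occurrences
    arsort($counts);                       // highest frequency first
    $top = array_slice(array_keys($counts), 0, $num);
    return implode(',', $top);
}

// Hypothetical token list standing in for a segmentation result.
$tokens = ['分词', '词典', '分词', '编码', '分词', '词典'];
echo topKeywords($tokens, 2), "\n"; // 分词,词典
```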

6. public function GetFinallyResult ($spword='')
Function description: Get the final word break result
Parameter list:
$spword separator between entries
Return value: string
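A small wrapper can combine SetSource, StartAnalysis, and GetFinallyResult into one call. The sketch below assumes $pa is a PhpAnalysis instance created elsewhere and that the bool returned by SetSource signals success; the helper name segmentText is hypothetical and not part of the class.

```php
<?php
// Hypothetical helper, not part of PhpAnalysis. $pa is expected to expose
// SetSource(), StartAnalysis() and GetFinallyResult() as described above.
function segmentText($pa, string $text, string $separator = ' '): string
{
    // Assumption: a false return value means the source could not be set.
    if (!$pa->SetSource($text)) {
        throw new RuntimeException('SetSource() rejected the input string');
    }
    $pa->StartAnalysis(true); // true: try to optimize the results
    return $pa->GetFinallyResult($separator);
}
```

A call such as segmentText($pa, $text, '/') would then return the segmented words joined by '/'.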

7. public function GetSimpleResult()
Function description: Get the coarse segmentation results
Return value: array

8. public function GetSimpleResultAll()
Function description: Get the coarse segmentation results with attribute information
Attributes (1 Chinese phrases, 2 ANSI words (including full-width), 3 ANSI punctuation (including full-width), 4 numbers (including full-width), 5 Chinese punctuation or unrecognized characters)
Return value: array

9. public function GetFinallyIndex()
Function description: Get the hash index array
Return value: array('word' => count, ...), sorted by frequency of occurrence
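Because the result has the shape array('word' => count, ...), it can be processed with ordinary array code. The sketch below keeps only words that occur at least a given number of times; the function name wordsWithMinCount and the sample array are assumptions for illustration, not real output from the library.

```php
<?php
// Works on any array shaped like the documented return value:
// array('word' => count, ...). Illustrative helper, not part of PhpAnalysis.
function wordsWithMinCount(array $index, int $minCount): array
{
    $kept = [];
    foreach ($index as $word => $count) {
        if ($count >= $minCount) {
            // PHP turns numeric-looking keys into ints, so cast back to string.
            $kept[] = (string) $word;
        }
    }
    return $kept;
}

// Sample data in the documented shape (hypothetical values).
$index = ['分词' => 3, '词典' => 2, '编码' => 1];
print_r(wordsWithMinCount($index, 2)); // 分词, 词典
```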

10. public function MakeDict ($source_file, $target_file='')
Function description: Compile a text file lexicon into a dictionary
Parameter list:
$source_file Source Text File
$target_file target file (defaults to the current dictionary if not specified)
Return value: void

11. public function ExportDict ($targetfile)
Function description: Export all entries of the current dictionary as text files
Parameter list:
$targetfile target file
Return value: void
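Since MakeDict overwrites the current dictionary when no target file is given, it may be prudent to export the existing entries first. The following is a sketch under that assumption; the helper name rebuildDictionary and both file arguments are hypothetical, and $pa is assumed to be a PhpAnalysis instance created elsewhere.

```php
<?php
// Hypothetical helper, not part of PhpAnalysis: back up the current
// dictionary as text, then compile a new one from $sourceFile.
function rebuildDictionary($pa, string $sourceFile, string $backupFile): void
{
    if (!is_readable($sourceFile)) {
        throw new InvalidArgumentException("Cannot read word list: $sourceFile");
    }
    $pa->ExportDict($backupFile); // keep a text copy of the current entries
    $pa->MakeDict($sourceFile);   // no target given: compiles into the current dictionary
}
```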

