Analysis of the usage of pinyin4j in Java

  • 2020-04-01 04:33:09
  • OfStack

This article illustrates the usage of pinyin4j in Java and is shared here for your reference.

Converting Chinese characters to pinyin is a common task in everyday development. For example, on 12306 (China's railway ticketing website), typing "WH" into the station-name field suggests places such as Wuhan (武汉), Wuhu (芜湖) and Weihai (威海), while typing "WUHU" brings up Wuhu.
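The suggestion behaviour described above can be sketched as a prefix match on pinyin initials or on the full pinyin. This is a minimal illustration, not how 12306 actually works: it assumes each name has already been converted to toneless pinyin syllables (for example with pinyin4j), and the class name StationSearch and the sample data are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class StationSearch {
    // Hypothetical sample data: place name -> toneless pinyin syllables,
    // the kind of output a library such as pinyin4j can produce.
    static final Map<String, String[]> PLACES = new LinkedHashMap<>();
    static {
        PLACES.put("武汉", new String[] {"wu", "han"});
        PLACES.put("芜湖", new String[] {"wu", "hu"});
        PLACES.put("威海", new String[] {"wei", "hai"});
        PLACES.put("上海", new String[] {"shang", "hai"});
    }

    // Returns the names whose pinyin initials or full pinyin start with the query, ignoring case.
    static List<String> search(String query) {
        String q = query.toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, String[]> e : PLACES.entrySet()) {
            StringBuilder initials = new StringBuilder();
            StringBuilder full = new StringBuilder();
            for (String syllable : e.getValue()) {
                initials.append(syllable.charAt(0));
                full.append(syllable);
            }
            if (initials.toString().startsWith(q) || full.toString().startsWith(q)) {
                hits.add(e.getKey());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(search("WH"));   // [武汉, 芜湖, 威海]
        System.out.println(search("WUHU")); // [芜湖]
    }
}
```

Matching both the initials and the full pinyin lets short abbreviations like "WH" and complete spellings like "WUHU" go through the same lookup.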

To get the pinyin of Chinese characters in Java, the pinyin4j library is a good solution.

Download address: http://sourceforge.net/projects/pinyin4j/

Download and extract the archive. It contains pinyin4j-2.5.0.jar; add this jar to your project to use the library.

Getting the pinyin of a Chinese character:

String[] pinyin = PinyinHelper.toHanyuPinyinStringArray('重');

The line above converts a single Chinese character to pinyin. For 重 ("heavy"), the method returns a String array:

"zhong4"

"chong2"

重 is a polyphonic character (it has more than one pronunciation), so the returned array contains the pinyin of every reading. The digit at the end of each entry is the tone number.
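To make the tone digit concrete, here is a small sketch that turns a numbered syllable such as zhong4 into its tone-marked form (zhòng), the kind of output pinyin4j gives with its WITH_TONE_MARK setting. It is a hand-written illustration of the standard tone-mark placement rule, not pinyin4j's internal code; the class name ToneMarks is hypothetical, and treating the digit 5 as the neutral tone is an assumption.

```java
import java.util.Locale;

class ToneMarks {
    private static final String VOWELS = "aeiouü";
    // Tone-marked forms of each vowel above, for tones 1 to 4.
    private static final String[] MARKED = {
        "āáǎà", "ēéěè", "īíǐì", "ōóǒò", "ūúǔù", "ǖǘǚǜ"
    };

    // Converts a numbered syllable such as "zhong4" or "lu:4" to "zhòng" or "lǜ".
    static String toToneMark(String numbered) {
        String s = numbered.toLowerCase(Locale.ROOT);
        if (s.isEmpty()) {
            return s;
        }
        char last = s.charAt(s.length() - 1);
        boolean hasTone = Character.isDigit(last);
        String syllable = (hasTone ? s.substring(0, s.length() - 1) : s)
                .replace("u:", "ü").replace('v', 'ü'); // normalise the three spellings of ü
        int tone = hasTone ? last - '0' : 0;
        if (tone < 1 || tone > 4) {
            return syllable; // no digit, or 5 for the neutral tone (assumed): no mark
        }
        // Standard placement: a or e takes the mark, o in "ou", otherwise the last vowel.
        int pos = syllable.indexOf('a');
        if (pos < 0) pos = syllable.indexOf('e');
        if (pos < 0 && syllable.contains("ou")) pos = syllable.indexOf('o');
        for (int i = syllable.length() - 1; pos < 0 && i >= 0; i--) {
            if (VOWELS.indexOf(syllable.charAt(i)) >= 0) pos = i;
        }
        if (pos < 0) {
            return syllable; // no vowel to mark
        }
        char marked = MARKED[VOWELS.indexOf(syllable.charAt(pos))].charAt(tone - 1);
        return syllable.substring(0, pos) + marked + syllable.substring(pos + 1);
    }

    public static void main(String[] args) {
        System.out.println(toToneMark("zhong4")); // zhòng
        System.out.println(toToneMark("chong2")); // chóng
        System.out.println(toToneMark("lu:4"));   // lǜ
    }
}
```

The marked vowels are stored as precomposed characters, so the source file should be compiled as UTF-8.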

This is the simplest way to get the pinyin of a single character. You can also pass a HanyuPinyinOutputFormat to control how the returned pinyin is formatted:


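import net.sourceforge.pinyin4j.PinyinHelper;
import net.sourceforge.pinyin4j.format.HanyuPinyinCaseType;
import net.sourceforge.pinyin4j.format.HanyuPinyinOutputFormat;
import net.sourceforge.pinyin4j.format.HanyuPinyinToneType;
import net.sourceforge.pinyin4j.format.HanyuPinyinVCharType;
import net.sourceforge.pinyin4j.format.exception.BadHanyuPinyinOutputFormatCombination;

HanyuPinyinOutputFormat format = new HanyuPinyinOutputFormat();
// UPPERCASE: upper case (ZHONG)
// LOWERCASE: lower case (zhong)
format.setCaseType(HanyuPinyinCaseType.LOWERCASE);
// WITHOUT_TONE: no tone (zhong)
// WITH_TONE_NUMBER: tone as a digit 1-4 (zhong4)
// WITH_TONE_MARK: tone mark on the vowel (zhòng); requires WITH_U_UNICODE, otherwise an exception is thrown
format.setToneType(HanyuPinyinToneType.WITH_TONE_MARK);
// WITH_V: ü written as v (nv)
// WITH_U_AND_COLON: ü written as u: (nu:)
// WITH_U_UNICODE: ü written directly (nü)
format.setVCharType(HanyuPinyinVCharType.WITH_U_UNICODE);
try {
    String[] pinyin = PinyinHelper.toHanyuPinyinStringArray('重', format);
} catch (BadHanyuPinyinOutputFormatCombination e) {
    // thrown for invalid combinations, e.g. WITH_TONE_MARK without WITH_U_UNICODE
    e.printStackTrace();
}

If the character passed to toHanyuPinyinStringArray is not a Chinese character and cannot be converted to pinyin, the method returns null.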
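Because of that null return, code that converts a whole string should check each result before using it. The sketch below keeps characters without pinyin unchanged and takes only the first reading of each character. To keep it runnable without the library, the per-character lookup is passed in as a function; with pinyin4j on the classpath it could be `c -> PinyinHelper.toHanyuPinyinStringArray(c)`. The class and method names are hypothetical, and the sample map only stands in for real pinyin4j output.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.StringJoiner;
import java.util.function.Function;

class PinyinJoiner {
    // Converts text to space-separated pinyin. lookup is expected to behave like
    // PinyinHelper.toHanyuPinyinStringArray(char): readings, or null if there is no pinyin.
    static String toPinyin(String text, Function<Character, String[]> lookup) {
        StringJoiner out = new StringJoiner(" ");
        StringBuilder other = new StringBuilder(); // current run of characters without pinyin
        for (char c : text.toCharArray()) {
            String[] readings = lookup.apply(c);
            if (readings == null || readings.length == 0) {
                other.append(c); // null result: keep the character unchanged
            } else {
                if (other.length() > 0) {
                    out.add(other.toString());
                    other.setLength(0);
                }
                out.add(readings[0]); // first reading only; may be wrong for polyphonic characters
            }
        }
        if (other.length() > 0) {
            out.add(other.toString());
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Hypothetical sample data standing in for pinyin4j's output.
        Map<Character, String[]> sample = new HashMap<>();
        sample.put('重', new String[] {"zhong4", "chong2"});
        sample.put('庆', new String[] {"qing4"});
        System.out.println(toPinyin("重庆G1", sample::get)); // zhong4 qing4 G1
    }
}
```

Taking the first reading yields zhong4 for 重 even inside 重庆, which is exactly the kind of mistake a character-by-character lookup cannot avoid.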

Although pinyin4j is useful, it has limits. The code above only gets the pinyin of a single Chinese character, so it cannot settle the reading of a word that contains a polyphonic character. For example, for 重庆 (Chongqing) it cannot decide between "chongqing" and "zhongqing": pinyin4j does not work out the pronunciation of a polyphonic character from its context.

So, to get the pronunciation of a word that contains polyphonic characters, you can return a list of all possible readings; picking the correct one has to be left to a person.
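Returning such a list amounts to taking every combination of the per-character readings. A minimal sketch, assuming the readings of each character have already been looked up (for example with pinyin4j, tones removed) and that none of the arrays is null; the class name WordReadings is hypothetical. Readings that are identical, as can happen once tones are dropped, are listed only once.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;

class WordReadings {
    // Lists every combination of per-character readings for a word.
    // Each array must be non-null (check pinyin4j's null result first).
    static List<String> candidates(List<String[]> perChar) {
        List<String> result = new ArrayList<>();
        result.add("");
        for (String[] readings : perChar) {
            // LinkedHashSet drops duplicate readings but keeps their original order.
            List<String> unique = new ArrayList<>(new LinkedHashSet<>(Arrays.asList(readings)));
            List<String> next = new ArrayList<>();
            for (String prefix : result) {
                for (String r : unique) {
                    next.add(prefix.isEmpty() ? r : prefix + " " + r);
                }
            }
            result = next;
        }
        return result;
    }

    public static void main(String[] args) {
        // Toneless readings of 重 and 庆 (sample data).
        List<String[]> chongqing = Arrays.asList(
                new String[] {"zhong", "chong"},
                new String[] {"qing"});
        System.out.println(candidates(chongqing)); // [zhong qing, chong qing]
    }
}
```

The number of candidates is the product of the number of readings per character, so it grows quickly for long words; the user still has to pick the right one.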

I hope this article has been helpful to you in Java programming.

