A brief introduction to the JavaScript character set

  • 2020-03-30 03:03:59
  • OfStack

JavaScript is case sensitive:

Keywords, variables, function names, and all other identifiers must always be written with the same capitalization. By convention we usually write them in lower case, which differs from the mixed capitalization styles common in C#.

For example (taking the variables str and Str):

var str = 'abc';
var Str = 'ABC';
alert(str); // Outputs abc


If str and Str were the same variable, alert(str) would output ABC rather than abc. This shows that JavaScript is case sensitive.
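The same rule applies to function names and keywords. The sketch below uses made-up names (greet, Greet, If) purely for illustration, and console.log instead of alert so that it also runs outside a browser:

```javascript
// Hypothetical names chosen for illustration only.
function greet() { return 'lower'; }
function Greet() { return 'upper'; }

// `If` is not the keyword `if`, so it is a legal variable name.
var If = 'not a keyword';

var results = [greet(), Greet(), If];
console.log(results.join(', ')); // lower, upper, not a keyword
```

greet and Greet are two separate functions, and only the lower-case spelling if is treated as a keyword.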

Unicode escape sequences

The Unicode character set was created to overcome the limitation that ASCII can represent only 128 characters; ASCII obviously cannot display Chinese or Japanese characters. Unicode is therefore a superset of ASCII and Latin-1.

JavaScript programs are written in the Unicode character set, but some computer hardware and software cannot fully display or input Unicode characters (such as é). To work around this, JavaScript defines a special sequence of six ASCII characters that can represent any 16-bit Unicode code unit. This is called a Unicode escape sequence: the prefix \u followed by four hexadecimal digits.

For example:

var str = 'caf\u00e9';
var Str = 'café';
alert(Str + ' ' + str); // Both strings are displayed the same way
alert(Str === str); // Outputs true
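To see how the escape form relates to the character itself, here is a minimal sketch that goes the other way and builds the \uXXXX form from a string. The helper name toUnicodeEscape is an assumption made for this example, not a built-in, and console.log is used instead of alert so that the snippet also runs outside a browser:

```javascript
// Hypothetical helper: turn each UTF-16 code unit above 0x7F into \uXXXX.
function toUnicodeEscape(text) {
  var out = '';
  for (var i = 0; i < text.length; i++) {
    var code = text.charCodeAt(i);
    if (code > 0x7f) {
      // Pad to four hexadecimal digits, as the \u form requires.
      out += '\\u' + ('0000' + code.toString(16)).slice(-4);
    } else {
      out += text.charAt(i);
    }
  }
  return out;
}

var escaped = toUnicodeEscape('caf\u00e9');
console.log(escaped); // caf\u00e9
```

The result is a string of plain ASCII characters; pasted into source code as a string literal, it denotes the original text again.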


However, note that Unicode allows more than one way to encode the same character. Take the é from the example above:

1. It can be represented by the single Unicode character \u00E9

2. It can also be represented by e followed by \u0301 (the combining acute accent)

var str = 'caf\u00e9';
var Str = 'cafe\u0301';
alert(str + ' ' + Str); // str and Str are displayed the same way
alert(Str === str); // They look the same, but their binary representations differ, so this outputs false

Although the two strings look the same in a text editor, their underlying binary encodings differ. The computer decides equality by comparing those encodings (=== compares strings code unit by code unit), so the result is false.
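The difference becomes visible when the code units are printed. This is a small sketch; the helper name codeUnits is an assumption made for this example, and console.log is used instead of alert so that it also runs outside a browser:

```javascript
// Hypothetical helper: list the UTF-16 code units of a string in hex.
function codeUnits(text) {
  var units = [];
  for (var i = 0; i < text.length; i++) {
    units.push(text.charCodeAt(i).toString(16));
  }
  return units;
}

var composed = 'caf\u00e9';     // é as a single code unit
var decomposed = 'cafe\u0301';  // e followed by a combining acute accent

console.log(composed.length, codeUnits(composed).join(' '));     // 4 63 61 66 e9
console.log(decomposed.length, codeUnits(decomposed).join(' ')); // 5 63 61 66 65 301
```

The two strings differ in both length and content, which is why the strict comparison fails.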

This is a good illustration of the fact that Unicode allows more than one way to encode the same character. For this reason, the Unicode standard defines a preferred encoding for every character, along with a normalization procedure that converts text into a unified form suitable for comparison.

Again, take é as an example:

Is the é in 'caf\u00e9' the same as the é in 'cafe\u0301'?
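One way to compare the two spellings is to normalize both strings first. The sketch below assumes an ES2015 or later environment, where String.prototype.normalize is available; the helper name sameText is made up for this example, and the choice of the NFC form is an assumption (any single form would do, as long as both sides use it):

```javascript
var composed = 'caf\u00e9';     // precomposed é
var decomposed = 'cafe\u0301';  // e followed by a combining acute accent

// Hypothetical helper: compare two strings after normalizing both to NFC.
function sameText(a, b) {
  return a.normalize('NFC') === b.normalize('NFC');
}

console.log(composed === decomposed);        // false
console.log(sameText(composed, decomposed)); // true
```

After normalization both strings consist of the same code units, so the comparison succeeds.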
