Parsing JavaScript's escape() function in PHP

  • 2020-06-23 00:04:07
  • OfStack

js is used to encode Chinese characters in URL.
< a href = "" onclick =" window open (' product_list. php? p_sort='+escape(' this site '));" > This is the effect of clicking a link:
Reference: http: / / 127.0.0.1 shop/product_list php? p_sort = u6E90 u8D44 u53D1 u5F00 PHP % % % % % u7F51
With this effect, it is clear that urldecode() or base64_decode() with PHP cannot be reversed.
The solution is to write an inverse solution function with PHP:

function js_unescape($str)
{
    $ret = '';
    $len = strlen($str);
    for ($i = 0; $i < $len; $i++) {
        if ($str[$i] == '%' && $i + 5 < $len && $str[$i + 1] == 'u') {
            // %uXXXX: a 16-bit code unit, re-encoded here as UTF-8
            $val = hexdec(substr($str, $i + 2, 4));
            if ($val < 0x80) {
                $ret .= chr($val);
            } elseif ($val < 0x800) {
                $ret .= chr(0xc0 | ($val >> 6)) . chr(0x80 | ($val & 0x3f));
            } else {
                $ret .= chr(0xe0 | ($val >> 12)) . chr(0x80 | (($val >> 6) & 0x3f)) . chr(0x80 | ($val & 0x3f));
            }
            $i += 5; // skip "uXXXX"; the loop's $i++ skips the "%"
        } elseif ($str[$i] == '%') {
            // %XX: ordinary percent-encoding
            $ret .= urldecode(substr($str, $i, 3));
            $i += 2;
        } else {
            $ret .= $str[$i];
        }
    }
    return $ret;
}
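As a cross-check, the same decoding can be sketched with preg_replace_callback() and the mbstring extension. This is only a minimal sketch, not the article's method: the name js_unescape_sketch is hypothetical, mbstring is assumed to be loaded, and %XX is treated as a code point in the range 0x00-0xFF, which is how JavaScript's escape() emits it.

```php
<?php
// Hypothetical helper (not part of the original article).
// Decodes escape() output (%uXXXX and %XX) into a UTF-8 string.
function js_unescape_sketch(string $str): string
{
    return preg_replace_callback(
        '/((?:%u[0-9A-Fa-f]{4})+)|%([0-9A-Fa-f]{2})/',
        function (array $m): string {
            if ($m[1] !== '') {
                // A run of %uXXXX units: drop the "%u" markers and read the
                // remaining hex digits as UTF-16BE (this also keeps surrogate pairs together).
                $utf16 = hex2bin(str_replace('%u', '', $m[1]));
                return mb_convert_encoding($utf16, 'UTF-8', 'UTF-16BE');
            }
            // %XX: a single code point between 0x00 and 0xFF.
            return mb_convert_encoding(chr(hexdec($m[2])), 'UTF-8', 'ISO-8859-1');
        },
        $str
    );
}

// Print the result as hex so the output does not depend on the terminal encoding.
echo bin2hex(js_unescape_sketch('%u4E2D%20%E9')), "\n"; // e4b8ad20c3a9
```

Matching whole runs of %uXXXX at once is a design choice: it lets the UTF-16 conversion see both halves of a surrogate pair, which a unit-by-unit loop cannot do.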

Note that js_unescape() returns UTF-8, so if the page uses GB2312 you must convert the encoding to get the correct result; otherwise the Chinese text will be garbled. If the page itself uses UTF-8, this step is not needed.
print iconv('utf-8', 'gb2312', js_unescape($_REQUEST['p_sort']));
At this point, we have successfully decoded the output of JavaScript's escape().
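The conversion step can be checked in isolation. The following is a minimal sketch that assumes the iconv extension is available; it converts one Chinese character from UTF-8 to GB2312 and prints the bytes of both forms.

```php
<?php
// "\u{4E2D}" is a PHP 7+ Unicode escape for one Chinese character, so the
// example does not depend on the encoding of this source file.
$utf8 = "\u{4E2D}";
$gb   = iconv('UTF-8', 'GB2312', $utf8);

echo bin2hex($utf8), "\n"; // e4b8ad (three UTF-8 bytes)
echo bin2hex($gb), "\n";   // d6d0   (two GB2312 bytes)
```

The same character takes three bytes in UTF-8 and two in GB2312, which is why printing the UTF-8 result on a GB2312 page without converting it shows garbled text.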
In addition, I found a function that implements JavaScript's escape() encoding in PHP:

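function phpescape($str)
{
    $sublen = strlen($str);
    $returnString = "";
    for ($i = 0; $i < $sublen; $i++) {
        if (ord($str[$i]) >= 0x80) {
            // A GB2312 character occupies two bytes; convert the pair to UCS-2.
            // Big-endian is requested explicitly. With plain "ucs-2" the byte order
            // depends on the platform, and under Windows you may need to enable:
            // $tmpString = substr($tmpString, 2, 2) . substr($tmpString, 0, 2);
            $tmpString = bin2hex(iconv("gb2312", "UCS-2BE", substr($str, $i, 2)));
            $returnString .= "%u" . $tmpString;
            $i++;
        } else {
            // Single-byte character: emit it as zero-padded %XX
            $returnString .= sprintf("%%%02x", ord($str[$i]));
        }
    }
    return $returnString;
}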
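When the input is UTF-8 rather than GB2312, a comparable encoder can be sketched with mbstring. This is an illustration under assumptions, not the function found above: the name php_escape_utf8 is hypothetical, the input is assumed to be valid UTF-8, and the set of characters left unencoded follows what JavaScript's escape() leaves alone (letters, digits and @*_+-./).

```php
<?php
// Hypothetical helper (not part of the original article).
// Encodes a UTF-8 string the way JavaScript's escape() would.
function php_escape_utf8(string $str): string
{
    $out = '';
    // Split into characters; the /u flag makes PCRE treat the string as UTF-8.
    foreach (preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY) as $ch) {
        if (preg_match('/^[A-Za-z0-9@*_+\-.\/]$/', $ch)) {
            $out .= $ch; // escape() leaves these characters untouched
            continue;
        }
        // Hex digits of the character's UTF-16BE code unit(s)
        $hex = strtoupper(bin2hex(mb_convert_encoding($ch, 'UTF-16BE', 'UTF-8')));
        if (strlen($hex) === 4 && substr($hex, 0, 2) === '00') {
            $out .= '%' . substr($hex, 2);                    // below 0x100: %XX
        } else {
            $out .= '%u' . implode('%u', str_split($hex, 4)); // otherwise %uXXXX per unit
        }
    }
    return $out;
}

echo php_escape_utf8("PHP \u{5F00}\u{53D1}"), "\n"; // PHP%20%u5F00%u53D1
```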

JSON did not cope with Chinese text here: sending Chinese directly could lose data or produce garbled characters, so the string has to be encoded before it is sent and decoded by JavaScript once it arrives. Since JavaScript already has an unescape() function, it is convenient to have an escape() function in PHP that encodes the data, and to decode it on the client with unescape().
First, I searched the Internet. Many escape functions implemented in PHP are similar, such as the following one:

function phpEscape($str) {
    preg_match_all("/[\x80-\xff].|[\x01-\x7f]+/", $str, $r);
    $ar = $r[0];
    foreach ($ar as $k => $v) {
        if (ord($v[0]) < 128)
            $ar[$k] = rawurlencode($v);
        else
            // UCS-2BE fixes the byte order; plain "UCS-2" is platform-dependent
            $ar[$k] = "%u" . bin2hex(iconv("GB2312", "UCS-2BE", $v));
    }
    return join("", $ar);
}

This function works well, but for novices (like me) who do not understand how it works, here is an explanation. Reusing other people's code is like standing on the shoulders of giants, but if you do not understand that code, sooner or later you will fall to the ground.
preg_match_all("/[\x80-\xff].|[\x01-\x7f]+/",$str,$r); uses a regular expression to split the string into pieces. [\x80-\xff]. matches a Chinese character: \x introduces a character's hexadecimal code, [] is a character class, and "." matches any single character, so [\x80-\xff]. matches two bytes, the first of which lies between hexadecimal 80 and ff. That happens to be the range of the first byte of a Chinese character's encoding, so one Chinese character is matched in full. For the Unicode codes of Chinese characters, you can search the Internet.
Similarly, [\x01-\x7f]+ matches English text: English was originally encoded in ASCII, whose values are below 128, that is, hexadecimal 01 to 7f, and "+" means one or more characters, so [\x01-\x7f]+ matches a run of consecutive English characters.

$ar = $r[0];             // $r[0] holds the array of matched pieces
foreach($ar as $k=>$v) {
    if(ord($v[0]) < 128)                 // a first byte below 128 means an English (ASCII) piece
      $ar[$k] = rawurlencode($v);    // encode it directly with rawurlencode
    else
      $ar[$k] = "%u".bin2hex(iconv("GB2312","UCS-2BE",$v));    // otherwise use iconv to convert the Chinese character to UCS-2 (big-endian), i.e. Unicode
}
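The matching step described above can be checked on a small GB2312 byte string. In this sketch the two bytes "\xD6\xD0" are written out explicitly as an assumed GB2312 character, so the result does not depend on the encoding of the source file.

```php
<?php
// "ab", then one two-byte GB2312 character, then "cd".
$str = "ab" . "\xD6\xD0" . "cd";

// Same pattern as in the function: either a high byte plus the byte after it,
// or a run of ASCII bytes.
preg_match_all('/[\x80-\xff].|[\x01-\x7f]+/', $str, $r);

foreach ($r[0] as $piece) {
    echo bin2hex($piece), "\n";
}
// Prints three lines: 6162, d6d0, 6364
```

The ASCII runs stay together as single pieces, while the double-byte character is isolated, which is what lets the loop choose between rawurlencode() and iconv() per piece.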

In JavaScript, unescape() is available for decoding.
Both \u0391-\uFFE5 and \u4e00-\u9fa5 are used to match Chinese.
However, the former appears to span everything from Α (U+0391) to ￥ (U+FFE5), while the latter is probably limited to Chinese characters.
The PHP decoding function is:

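function unescape($str) {
    $str = rawurldecode($str);
    // Split into %uXXXX, &#xXXXX;, &#NNNN; and single other characters ("s" lets "." match newlines)
    preg_match_all("/%u.{4}|&#x.{4};|&#\d+;|.+/Us", $str, $r);
    $ar = $r[0];
    foreach ($ar as $k => $v) {
        // pack() yields big-endian bytes, so the source encoding is named as UCS-2BE
        if (substr($v, 0, 2) == "%u") {
            $ar[$k] = iconv("UCS-2BE", "GBK", pack("H4", substr($v, -4)));
        } elseif (substr($v, 0, 3) == "&#x") {
            $ar[$k] = iconv("UCS-2BE", "GBK", pack("H4", substr($v, 3, -1)));
        } elseif (substr($v, 0, 2) == "&#") {
            $ar[$k] = iconv("UCS-2BE", "GBK", pack("n", (int) substr($v, 2, -1)));
        }
    }
    return join("", $ar);
}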
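For the two numeric-entity forms handled above (&#x...; and &#...;), PHP's built-in html_entity_decode() can serve as a reference when the target encoding is UTF-8 rather than GBK. A minimal sketch, assuming UTF-8 output is acceptable:

```php
<?php
// One hexadecimal and one decimal numeric entity, each for a Chinese character.
$decoded = html_entity_decode('&#x4E2D;&#25991;', ENT_QUOTES | ENT_HTML5, 'UTF-8');

// Print as hex so the output does not depend on the terminal encoding.
echo bin2hex($decoded), "\n"; // e4b8ade69687
```

Note that html_entity_decode() does not understand the %uXXXX form, so it only covers the entity branches of the function, not the escape() branch.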

1. Coding ranges
(1) GBK (GB2312/GB18030)
\x00-\xff GBK double-byte encoding range
\x20-\x7f ASCII
\xa1-\xff Chinese
\x80-\xff Chinese
(2) UTF-8 (Unicode)
\u4e00-\u9fa5 (Chinese)
\u3130-\u318F (Korean)
\uAC00-\uD7A3 (Korean)
\u0800-\u4e00 (Japanese)
PS: Korean Hangul syllables are the characters above \u9fa5
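Regular expression examples:
preg_replace('/([\x80-\xff])/', '', $str);           // strip every non-ASCII byte
preg_replace('/([\x{4e00}-\x{9fa5}])/u', '', $str);  // strip Chinese characters from a UTF-8 string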
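The Unicode ranges listed above only work in PHP when they are written as \x{...} and the pattern carries the u modifier. A short sketch, assuming PHP 7 or later for the "\u{...}" string escapes:

```php
<?php
// "\u{...}" escapes keep the example independent of this file's encoding.
$str = "abc\u{4E2D}\u{6587}def"; // "abc" + two Chinese characters + "def"

// Strip characters in the \u4e00-\u9fa5 range; /u switches PCRE to UTF-8 mode.
$stripped = preg_replace('/[\x{4e00}-\x{9fa5}]/u', '', $str);
echo $stripped, "\n"; // abcdef

// Test a single Hangul syllable (U+D55C) against the \uAC00-\uD7A3 range.
$isHangul = preg_match('/^[\x{AC00}-\x{D7A3}]$/u', "\u{D55C}") === 1;
var_dump($isHangul); // bool(true)
```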
