PHP development: security rules that must not be violated when filtering user input

  • 2020-05-05 11:03:42
  • OfStack

As the most basic precaution, you need to pay attention to data submitted from outside your application. Handling it properly is the first line of defense in your security mechanism.
Rule 1: never trust external data or input
The first thing you must realize about Web application security is that external data should not be trusted. External data (outside data) includes any data not entered directly by the programmer in PHP code. Any data from any other source (such as GET variables, form POST data, databases, configuration files, session variables, or cookies) is not trusted until steps are taken to ensure that it is safe.
For example, the following data elements can be considered safe because they are set in PHP.
 
<?php 
$myUsername = 'tmyer'; 
$arrayUsers = array('tmyer', 'tom', 'tommy'); 
define("GREETING", 'hello there ' . $myUsername);
?> 

However, the following data elements are flawed.
Listing 2. Unsafe, flawed code
 
<?php 
$myUsername = $_POST['username']; //tainted! 
$arrayUsers = array($myUsername, 'tom', 'tommy'); //tainted! 
define("GREETING", 'hello there ' . $myUsername); //tainted!
?> 

Why is the first variable $myUsername flawed? Because it comes directly from a form POST. The user can enter any string in this input field, including malicious commands to delete files or run previously uploaded files. You might ask, "couldn't a client-side (JavaScript) form validation script that accepts only the letters A-Z avoid this danger?" Yes, this is always a good step, but as you'll see later, anyone can download any form to their machine, modify it, and then resubmit whatever they want.
The solution is simple: you must run cleanup code on $_POST['username']. If you don't, you risk contaminating every other place where $myUsername is used, such as in an array or a constant.
An easy way to clean up user input is to use regular expressions to process it. In this example, you only want to accept letters. It might also be a good idea to limit strings to a certain number of characters, or to require all letters to be lowercase.
Listing 3. Making user input safe
 
<?php 
$myUsername = cleanInput($_POST['username']); //clean! 
$arrayUsers = array($myUsername, 'tom', 'tommy'); //clean! 
define("GREETING", 'hello there ' . $myUsername); //clean!
function cleanInput($input){
    $clean = strtolower($input);
    $clean = preg_replace("/[^a-z]/", "", $clean);
    $clean = substr($clean, 0, 12);
    return $clean;
}
?> 
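As a complement to cleaning, the same whitelist can be used to reject bad input outright instead of silently rewriting it. The sketch below assumes the same policy as cleanInput() (lowercase letters only, at most 12 characters); the function name isValidUsername() is hypothetical and not part of the original listing.

```php
<?php
// Hypothetical helper: validates rather than cleans.
// Assumes the same policy as cleanInput(): 1 to 12 lowercase letters.
function isValidUsername(string $input): bool
{
    // \A and \z anchor the whole string, so a trailing newline is rejected too.
    return preg_match('/\A[a-z]{1,12}\z/', $input) === 1;
}

var_dump(isValidUsername('tmyer'));        // bool(true)
var_dump(isValidUsername("tmyer'; drop")); // bool(false)
```

Rejecting tells the user that something was wrong with the input, whereas cleaning quietly changes it; which behavior fits better depends on the form.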

Rule 2: disable PHP settings that make security difficult to enforce
You already know that you can't trust user input; you should also know that you shouldn't trust the way PHP is configured on your machine. For example, be sure to disable register_globals. If register_globals is enabled, you might do something careless, such as using a $variable in place of the GET or POST string of the same name. With this setting disabled, PHP forces you to refer to the correct variable in the correct namespace. To use a variable from a form POST, you refer to $_POST['variable']. This prevents you from mistaking that particular variable for a cookie, session, or GET variable. (The setting was deprecated in PHP 5.3 and removed in PHP 5.4, so this step applies to older installations.)
Rule 3: if you can't understand it, you can't protect it
Some developers use strange syntax or pack statements tightly together, producing code that is short but ambiguous. This approach may be efficient, but if you don't understand what your code is doing, you can't decide how to protect it.
For example, which of the following two pieces of code do you like?
Listing 4. Making code easy to protect
 
<?php 
//obfuscated code
$input = (isset($_POST['username']) ? $_POST['username'] : '');
//unobfuscated code
$input = '';
if (isset($_POST['username'])){
    $input = $_POST['username'];
}else{
    $input = '';
}
?> 

In the second, cleaner snippet, it's easy to see that $input is flawed and needs to be cleaned up before it can be safely handled.
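On PHP 7.0 or later, the null coalescing operator gives the same result as both snippets in one readable line. The following is only a sketch: readField() is a hypothetical helper name, and the request array is passed in as a parameter so the code can be tried outside a web request. The value it returns is still tainted and still needs cleaning.

```php
<?php
// Hypothetical helper: returns the named field, or '' when it is missing.
// The result is still untrusted input.
function readField(array $request, string $name): string
{
    $value = $request[$name] ?? '';
    // Arrays can arrive via name[]=... parameters; treat them as empty here.
    return is_string($value) ? $value : '';
}

// The literal array stands in for $_POST.
$input = readField(['username' => 'tmyer'], 'username');
```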
Rule 4: "defense in depth" is the new rule
This tutorial uses examples to show how to protect online forms while taking the necessary steps in the PHP code that handles them. Likewise, even if a PHP regex is used to ensure that a GET variable is entirely numeric, you should still take steps to ensure that SQL queries use escaped user input.
Defense in depth is not just a good idea, it ensures you don't get into serious trouble.
Now that we've discussed the ground rules, let's look at the first threat: the SQL injection attack.
Preventing SQL injection attacks
In an SQL injection attack, the user adds information to a database query by manipulating a form or the GET query string. For example, suppose you have a simple login database. Each record in this database has a username field and a password field. You build a login form so that users can log in.
Listing 5. Simple login form
 
<html> 
<head> 
<title>Login</title> 
</head> 
<body> 
<form action="verify.php" method="post">
<p><label for='user'>Username</label> 
<input type='text' name='user' id='user'/> 
</p> 
<p><label for='pw'>Password</label> 
<input type='password' name='pw' id='pw'/> 
</p> 
<p><input type='submit' value='login'/></p> 
</form> 
</body> 
</html> 

This form accepts the user's username and password and submits the user input to a file named verify.php. In this file, PHP processes the data from the login form, as shown below:
Listing 6. Unsafe PHP form handling code
 
<?php 
$okay = 0;
$username = $_POST['user'];
$pw = $_POST['pw'];
$sql = "select count(*) as ctr from users where username='" . $username . "' and password='" . $pw . "' limit 1";
$result = mysql_query($sql);
while ($data = mysql_fetch_object($result)){
    if ($data->ctr == 1){
        //they're okay to enter the application!
        $okay = 1;
    }
}
if ($okay){
    $_SESSION['loginokay'] = true;
    header("Location: index.php");
}else{
    header("Location: login.php");
}
?> 

This code looks fine, right? Hundreds (if not thousands) of PHP/MySQL sites around the world use this code. What's wrong with it? Well, remember that "user input cannot be trusted". There is no escaping of any information from the user, thus leaving the application vulnerable. Specifically, any type of SQL injection attack can occur.
For example, if the user enters foo as the username and ' or '1'='1 as the password, the following string is built in PHP and the query is passed to MySQL:
 
<?php 
$sql = "select count(*) as ctr from users where username='foo' and password='' or '1'='1' limit 1";
?> 

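Because '1'='1' is always true, the WHERE clause matches every row, so the query no longer checks the password at all. Whenever the count comes back as 1, as it does for a table with a single user, PHP allows access. By injecting some malicious SQL at the end of the password string, the hacker can pretend to be a legitimate user.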
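The effect is easy to reproduce without a database, because the damage is done while the query string is being built. In this sketch, buildUnsafeLoginQuery() is a hypothetical helper that mirrors the concatenation in Listing 6.

```php
<?php
// Hypothetical helper that mirrors the string concatenation in Listing 6.
function buildUnsafeLoginQuery(string $username, string $pw): string
{
    return "select count(*) as ctr from users where username='" . $username
        . "' and password='" . $pw . "' limit 1";
}

// The "password" closes the quoted value and appends its own condition.
echo buildUnsafeLoginQuery('foo', "' or '1'='1"), "\n";
// select count(*) as ctr from users where username='foo' and password='' or '1'='1' limit 1
```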
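The solution to this problem is to use PHP's built-in mysql_real_escape_string() function as a wrapper for any user input. This function escapes special characters in a string, such as apostrophes, so that MySQL treats them as literal data rather than as part of the SQL syntax. (The original mysql extension was deprecated in PHP 5.5 and removed in PHP 7.0; on current versions the counterpart is mysqli_real_escape_string() or a prepared statement.) Listing 7 shows the code with escape handling.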
Listing 7. Secure PHP form handling code
 
<?php 
$okay = 0;
$username = $_POST['user'];
$pw = $_POST['pw'];
$sql = "select count(*) as ctr from users where username='" . mysql_real_escape_string($username) . "' and password='" . mysql_real_escape_string($pw) . "' limit 1";
$result = mysql_query($sql);
while ($data = mysql_fetch_object($result)){
    if ($data->ctr == 1){
        //they're okay to enter the application!
        $okay = 1;
    }
}
if ($okay){
    $_SESSION['loginokay'] = true;
    header("Location: index.php");
}else{
    header("Location: login.php");
}
?> 

Using mysql_real_escape_string() as a wrapper for user input prevents malicious SQL from being injected through that input. If the user attempts to pass the malformed password through SQL injection, only the following query is passed to the database:
select count(*) as ctr from users where username='foo' and password='\' or \'1\'=\'1' limit 1
There is nothing in the database that matches such a password. With one simple step, you plugged a big hole in the Web application. The rule of thumb here is that user input for SQL queries should always be escaped.
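On PHP versions where the mysql_* functions are no longer available, a prepared statement achieves the same goal by keeping user input out of the SQL text altogether. The sketch below is an assumption-laden illustration, not the article's method: it uses PDO, the helper name checkLogin() is hypothetical, and it keeps the article's plain username and password columns only so that it can be compared with Listing 7.

```php
<?php
// Hypothetical helper; the users table layout follows the article's example.
function checkLogin(PDO $db, string $username, string $pw): bool
{
    // Placeholders keep user input out of the SQL text entirely, so
    // quote characters in the input cannot change the query's structure.
    $stmt = $db->prepare(
        'select count(*) as ctr from users where username = :u and password = :p'
    );
    $stmt->execute([':u' => $username, ':p' => $pw]);
    return (int) $stmt->fetchColumn() === 1;
}
```

Called with an open PDO connection, checkLogin($db, 'foo', "' or '1'='1") simply looks for a user whose password is that literal string and finds none.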
But there are several more security holes to plug. The next item is manipulation of GET variables.
Preventing users from manipulating GET variables
In the previous section, users were prevented from logging in with a malformed password. If you are smart, you should apply what you have learned to ensure that all user input in the SQL statement is escaped.
However, the user is now safely logged in. Just because a user has a valid password doesn't mean he will play by the rules; he has plenty of opportunities to do damage. For example, an application might allow the user to view special content, with all links pointing to locations such as template.php?pid=33 or template.php?pid=321. The part after the question mark in the URL is called the query string. Because the query string is placed directly in the URL, it is also called the GET query string.
In PHP, if register_globals is disabled, the string can be accessed with $_GET['pid']. On the template.php page, you might do something similar to Listing 8.
Listing 8. Example template.php
 
<?php 
$pid = $_GET['pid']; 
//we create an object of a fictional class Page 
$obj = new Page; 
$content = $obj->fetchPage($pid); 
//and now we have a bunch of PHP that displays the page 
?> 

What's wrong with that? First, it implicitly trusts that the GET variable pid coming from the browser is safe. What's going to happen? Most users are not smart enough to construct a semantic attack. However, if they notice pid=33 in the browser's URL location field, they might start messing around. If they type in another number, that might be fine; but what happens if they type something else, such as an SQL command or the name of a file (such as /etc/passwd), or if they play a prank, such as typing a value 3,000 characters long?
In this case, remember the ground rules and don't trust user input. The application developer knows that the personal identifier (PID) accepted by template.php should be a number, so you can use PHP's is_numeric() function to ensure that a non-numeric PID is not accepted, as shown below:
Listing 9. Using is_numeric() to limit the GET variable
 
<?php 
$pid = $_GET['pid'];
if (is_numeric($pid)){
    //we create an object of a fictional class Page
    $obj = new Page;
    $content = $obj->fetchPage($pid);
    //and now we have a bunch of PHP that displays the page
}else{
    //didn't pass the is_numeric() test, do something else!
}
?> 

This method seems to work, but all of the following inputs pass the is_numeric() check:
100 (valid)
100.1 (there should be no decimal places)
+0123.45e6 (scientific notation, bad)
a hexadecimal value (dangerous! dangerous! Accepted by is_numeric() only before PHP 7.0)
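The behavior described in this list can be checked directly. The sample strings below are taken from the list; the hexadecimal sample '0x1A' is an arbitrary value chosen for illustration.

```php
<?php
// Each of these strings passes is_numeric(), although only the first
// one is a plausible page ID.
$samples = ['100', '100.1', '+0123.45e6'];
$accepted = array_filter($samples, 'is_numeric');

// Hexadecimal strings were treated as numeric only before PHP 7.0.
$hexAccepted = is_numeric('0x1A'); // false on PHP 7.0 and later

var_dump(count($accepted) === count($samples)); // bool(true)
```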
So what should security-conscious PHP developers do? Years of experience have shown that it is best to use a regular expression to ensure that the entire GET variable is composed of digits, as shown below:
Listing 10. Limiting the GET variable with a regular expression
 
<?php 
$pid = $_GET['pid'];
if (strlen($pid)){
    //preg_match() replaces ereg(), which was removed in PHP 7.0
    if (!preg_match('/^[0-9]+$/', $pid)){
        //do something appropriate, like maybe logging them out or sending them back to home page
    }
}else{
    //empty $pid, so send them back to the home page
}
//we create an object of a fictional class Page, which is now
//moderately protected from evil user input
$obj = new Page;
$content = $obj->fetchPage($pid);
//and now we have a bunch of PHP that displays the page
?> 

All you need to do is use strlen() to check whether the variable has a non-zero length; if it does, an all-digit regular expression ensures that the data element is valid. If the PID contains letters, slashes, dots, or anything resembling hexadecimal, this routine catches it and keeps the page away from the user's activity. If you look behind the scenes of the Page class, you will see that the security-conscious PHP developer has also escaped the user's $pid to protect the fetchPage() method, as shown below:
Listing 11. Escaping in the fetchPage() method
 
<?php 
class Page{
    function fetchPage($pid){
        //desc is a reserved word in MySQL, so it is quoted with backticks
        $sql = "select pid,title,`desc`,kw,content,status from page where pid='" . mysql_real_escape_string($pid) . "'";
    }
}
?> 

You might ask, "why escape when you've already made sure the PID is a number?" Because you don't know in how many different contexts and situations the fetchPage() method will be used. The method must be protected everywhere it is called, and escaping inside the method is defense in depth.
What happens if the user tries to enter a very long number, say 1,000 characters, in an attempt to launch a buffer overflow attack? The next section discusses this in more detail, but for now you can add another check to make sure that the PID entered has the correct length. You know that the maximum length of the pid field in the database is 5 digits, so you can add the following check.
Listing 12. Use regular expressions and length checks to restrict the GET variable
 
<?php 
$pid = $_GET['pid'];
if (strlen($pid)){
    //reject the value if it is not all digits OR if it is longer than 5 digits
    if (!preg_match('/^[0-9]+$/', $pid) || strlen($pid) > 5){
        //do something appropriate, like maybe logging them out or sending them back to home page
    }
} else {
    //empty $pid, so send them back to the home page
}
//we create an object of a fictional class Page, which is now
//even more protected from evil user input
$obj = new Page;
$content = $obj->fetchPage($pid);
//and now we have a bunch of PHP that displays the page
?> 

Now no one can cram a 5,000-digit number into the database application, at least not where the GET string is involved. Imagine a hacker gnashing his teeth in frustration as he tries to break through your application. And with error reporting turned off, it is harder for the hacker to probe the application.
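The digit check and the length check can also be folded into one function that returns either a usable value or nothing, which makes it harder to fall through to fetchPage() with a rejected value by accident. This is a sketch: validatePid() is a hypothetical helper, and the 5-digit limit is the article's assumption about the pid column.

```php
<?php
// Hypothetical helper: returns the page ID as an int, or null when the
// input is not a string of 1 to 5 digits.
function validatePid($raw): ?int
{
    if (!is_string($raw) || preg_match('/\A[0-9]{1,5}\z/', $raw) !== 1) {
        return null;
    }
    return (int) $raw;
}

$pid = validatePid('33'); // '33' stands in for $_GET['pid'] ?? null
if ($pid === null) {
    // reject the request here, e.g. send the user back to the home page
}
```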
Buffer overflow attack
Buffer overflow attacks attempt to overflow a memory allocation buffer in the PHP application (or, more precisely, in Apache or the underlying operating system). Keep in mind that you may be writing a Web application in a high-level language like PHP, but you ultimately end up calling C (in the case of Apache). Like most low-level languages, C has strict rules about memory allocation.
A buffer overflow attack sends a large amount of data to a buffer, causing part of the data to overflow into adjacent memory, thereby corrupting that memory or overwriting program logic. This can cause a denial of service, corrupt data, or execute malicious code on the remote server.
The only way to prevent a buffer overflow attack is to check the length of all user input. For example, if you have a form element that asks for the user's name, add a maxlength attribute with a value of 40 to the field and check the value with substr() on the back end. Listing 13 shows a short example of the form and the PHP code.
Listing 13. Check the length of the user's input
 
<?php 
if (isset($_POST['submit']) && $_POST['submit'] == "go"){
    $name = substr($_POST['name'], 0, 40);
}
?>
<form action="<?php echo htmlspecialchars($_SERVER['PHP_SELF']); ?>" method="post">
<p><label for="name">Name</label>
<input type="text" name="name" id="name" size="20" maxlength="40"/></p>
<p><input type="submit" name="submit" value="go"/></p>
</form>

Why provide both the maxlength attribute and the substr() check on the back end? Because defense in depth is always good. The browser prevents the user from entering an extremely long string that PHP or MySQL cannot safely handle (imagine someone trying to enter a name 1,000 characters long), and the back-end PHP check ensures that no one is manipulating the form data remotely or in the browser.
As you can see, this is similar to checking the length of the GET variable pid with strlen() in the previous section. In that example, any input value longer than 5 digits is rejected, but you can just as easily truncate the value to an appropriate length, as shown below:
Listing 14. Changing the length of the input GET variable
 
<?php 
$pid = $_GET['pid'];
if (strlen($pid)){
    //preg_match() replaces ereg(), which was removed in PHP 7.0
    if (!preg_match('/^[0-9]+$/', $pid)){
        //if non numeric $pid, send them back to home page
    }
}else{
    //empty $pid, so send them back to the home page
}
//we have a numeric pid, but it may be too long, so let's check
if (strlen($pid) > 5){
    $pid = substr($pid, 0, 5);
}
//we create an object of a fictional class Page, which is now
//even more protected from evil user input
$obj = new Page;
$content = $obj->fetchPage($pid);
//and now we have a bunch of PHP that displays the page
?> 

Note that buffer overflow attacks are not limited to long strings of digits or letters. You may also see long hexadecimal strings (which often look like \xA3 or \xFF). Remember that the purpose of any buffer overflow attack is to flood a particular buffer and place malicious code or instructions into the next buffer, thereby corrupting data or executing malicious code. The easiest way to deal with a hex buffer overflow is also to not allow the input to exceed a specific length.
If you're dealing with a form text area that allows you to enter long items into the database, you can't easily limit the length of the data on the client side. After the data reaches PHP, you can use regular expressions to clean up any strings like hexadecimal.
Listing 15. Preventing hexadecimal strings
 
<?php 
if (isset($_POST['submit']) && $_POST['submit'] == "go"){
    $name = substr($_POST['name'], 0, 40);
    //clean out any potential hexadecimal characters
    $name = cleanHex($name);
    //continue processing ...
}
function cleanHex($input){
    //matches a literal backslash, then x or X, then 1 to 3 hex digits
    $clean = preg_replace('/\\\\[xX][A-Fa-f0-9]{1,3}/', '', $input);
    return $clean;
}
?>
<form action="<?php echo htmlspecialchars($_SERVER['PHP_SELF']); ?>" method="post">
<p><label for="name">Name</label>
<input type="text" name="name" id="name" size="20" maxlength="40"/></p>
<p><input type="submit" name="submit" value="go"/></p>
</form>

You may find this series of operations a bit too strict. After all, hexadecimal strings have legitimate uses, such as outputting characters in a foreign language. How you deploy the hexadecimal regex is up to you. A better strategy is to delete hexadecimal strings only if there are too many of them in a row, or if the string exceeds a certain number of characters (such as 128 or 255).
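The "too many in a row" strategy could be sketched as follows. cleanHexRuns() is a hypothetical variant of cleanHex(), and the default threshold of 4 consecutive escapes is an arbitrary value chosen for illustration, not a recommendation from the article.

```php
<?php
// Hypothetical variant of cleanHex(): removes hexadecimal escapes only
// when at least $minRun of them appear back to back.
function cleanHexRuns(string $input, int $minRun = 4): string
{
    $minRun = max(1, $minRun);
    // A literal backslash, x or X, and 1 to 3 hex digits, repeated $minRun+ times.
    $pattern = '/(?:\\\\[xX][A-Fa-f0-9]{1,3}){' . $minRun . ',}/';
    // preg_replace() returns null on a regex error; fall back to an empty string.
    return preg_replace($pattern, '', $input) ?? '';
}

$kept    = cleanHexRuns('caf\xE9 au lait');       // single escape, left alone
$cleaned = cleanHexRuns('a\x41\x42\x43\x44z');    // run of four, removed
```

Because the pattern accepts up to three hex digits after the x, a letter such as b or f that directly follows an escape is consumed with it, which is the same behavior as the article's original expression.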
Cross-site scripting attacks
In a cross-site scripting (XSS) attack, a malicious user typically enters information in a form (or through some other user input method) that inserts malicious client-side markup into the process or the database. For example, suppose the site has a simple guestbook program that lets visitors leave their names, e-mail addresses, and short messages. A malicious user can use this opportunity to insert something other than a short message, such as images that are inappropriate for other users, JavaScript that redirects users to another site, or code that steals cookie information.
Fortunately, PHP provides the strip_tags() function, which strips HTML and PHP tags from a string. The strip_tags() function also lets you provide a list of allowed tags, such as <b> or <i>.
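A minimal sketch of both uses follows, together with htmlspecialchars() for escaping at output time. The helper names cleanMessage() and escapeForHtml() are hypothetical, and allowing only <b> and <i> is simply the example from the text.

```php
<?php
// Hypothetical helper: strips every tag except <b> and <i>.
// Note that strip_tags() removes the tags, not the text between them.
function cleanMessage(string $message): string
{
    return strip_tags($message, '<b><i>');
}

// Hypothetical helper: escapes text so the browser shows it literally.
function escapeForHtml(string $text): string
{
    return htmlspecialchars($text, ENT_QUOTES, 'UTF-8');
}

echo cleanMessage('<b>hi</b> <script>alert(1)</script>'), "\n"; // <b>hi</b> alert(1)
echo escapeForHtml('<script>'), "\n";                           // &lt;script&gt;
```

One caveat when allowing tags: strip_tags() leaves the attributes of allowed tags in place, so an allowed <b> can still carry an event handler attribute. Escaping on output is the stricter of the two choices.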
Data manipulation within the browser
There is a class of browser plug-ins that allow users to tamper with the headers and form elements on a page. With Tamper Data (a Mozilla plug-in), you can easily manipulate a simple form with many hidden text fields in order to send instructions to PHP and MySQL.
The user can launch Tamper Data before clicking Submit on the form. When he submits the form, he sees a list of the form's data fields. Tamper Data allows the user to tamper with this data, and the browser then completes the form submission.
Let's go back to the example we created earlier. The string length has been checked, the HTML tags have been stripped, and the hexadecimal characters have been deleted. However, some hidden text fields have been added, as shown below:
Listing 17. Hidden variables
 
<?php 
if (isset($_POST['submit']) && $_POST['submit'] == "go"){
    //strip_tags
    $name = strip_tags($_POST['name']);
    $name = substr($name, 0, 40);
    //clean out any potential hexadecimal characters
    $name = cleanHex($name);
    //continue processing ...
}
function cleanHex($input){
    //matches a literal backslash, then x or X, then 1 to 3 hex digits
    $clean = preg_replace('/\\\\[xX][A-Fa-f0-9]{1,3}/', '', $input);
    return $clean;
}
?>
<form action="<?php echo htmlspecialchars($_SERVER['PHP_SELF']); ?>" method="post">
<p><label for="name">Name</label>
<input type="text" name="name" id="name" size="20" maxlength="40"/></p>
<input type="hidden" name="table" value="users"/>
<input type="hidden" name="action" value="create"/>
<input type="hidden" name="status" value="live"/>
<p><input type="submit" name="submit" value="go"/></p>
</form>

Notice that one of the hidden variables exposes the table name: users. You will also see an action field with the value create. With basic SQL experience, you can see that these commands may control an SQL engine in the middleware. People who want to wreak havoc simply change the table name or offer another option, such as delete.
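One way to limit this kind of tampering, sketched here under assumptions, is to treat hidden fields like any other user input: compare them against a fixed whitelist and keep anything structural, such as the table name, on the server. The names ALLOWED_ACTIONS, TARGET_TABLE, and isAllowedAction() are hypothetical, as is the choice of permitted actions.

```php
<?php
// Hypothetical whitelist: the only actions this form handler supports.
const ALLOWED_ACTIONS = ['create', 'update'];

// The table name is fixed on the server and never read from the form.
const TARGET_TABLE = 'users';

function isAllowedAction($action): bool
{
    // Strict comparison, so only these exact strings are accepted.
    return is_string($action) && in_array($action, ALLOWED_ACTIONS, true);
}

var_dump(isAllowedAction('create')); // bool(true)
var_dump(isAllowedAction('delete')); // bool(false)
```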
Now what's left? Remote form submission.
Remote form submission
The benefit of the Web is that information and services can be shared. The downside is also that information and services can be shared, because some people act without scruples.
Take forms. Anyone can visit a Web site and use File > Save As in the browser to create a local copy of a form. He can then modify the action parameter to point to a fully qualified URL (not formHandler.php but http://www.yoursite.com/formHandler.php, because the form lives on that site), make any changes he wants, and click Submit; the server will receive this form data as legitimate traffic.
You might first consider checking $_SERVER['HTTP_REFERER'] to see if the request is coming from your own server, which blocks most malicious users but not the best hackers. These people are smart enough to tamper with the referrer information in the header to make a remote copy of the form look like it was submitted from your server.
A better way to handle remote form submissions is to generate a token based on a unique string or timestamp and place that token in session variables and forms. After the form is submitted, check that the two tokens match. If there is a mismatch, you know someone is trying to send data from a remote copy of the form.
To create a random token, use PHP's built-in md5(), uniqid(), and rand() functions, as shown below:
Listing 18. Defending against remote form submission
 
<?php 
session_start(); 
if (isset($_POST['submit']) && $_POST['submit'] == "go"){
    //check token; both values must exist, or two missing tokens would compare as equal
    if (isset($_POST['token'], $_SESSION['token']) && $_POST['token'] == $_SESSION['token']){
        //strip_tags
        $name = strip_tags($_POST['name']);
        $name = substr($name, 0, 40);
        //clean out any potential hexadecimal characters
        $name = cleanHex($name);
        //continue processing ...
    }else{
        //stop all processing! remote form posting attempt!
    }
}
$token = md5(uniqid(rand(), true));
$_SESSION['token'] = $token;
function cleanHex($input){
    //matches a literal backslash, then x or X, then 1 to 3 hex digits
    $clean = preg_replace('/\\\\[xX][A-Fa-f0-9]{1,3}/', '', $input);
    return $clean;
}
?>
<form action="<?php echo htmlspecialchars($_SERVER['PHP_SELF']); ?>" method="post">
<p><label for="name">Name</label>
<input type="text" name="name" id="name" size="20" maxlength="40"/></p>
<input type="hidden" name="token" value="<?php echo $token; ?>"/>
<p><input type="submit" name="submit" value="go"/></p>
</form>

This technique works because session data cannot be migrated between servers in PHP. Even if someone gets hold of your PHP source code, transfers it to their own server, and submits the information to your server, your server will receive only empty or malformed session tokens and the original supplied form tokens. They don't match, and the remote form submission fails.
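On PHP 7.0 or later, the token could also be drawn from the operating system's cryptographic random source and compared in constant time. This is a sketch of an alternative, not the article's method: newFormToken() and tokensMatch() are hypothetical helper names, and the 32-byte token size is an assumption.

```php
<?php
// Hypothetical helper: 32 random bytes, hex-encoded to 64 characters.
function newFormToken(): string
{
    return bin2hex(random_bytes(32));
}

// Hypothetical helper: true only when both tokens are non-empty strings
// and identical. hash_equals() compares in constant time.
function tokensMatch($submitted, $stored): bool
{
    return is_string($submitted) && is_string($stored) && $stored !== ''
        && hash_equals($stored, $submitted);
}

$token = newFormToken(); // would be stored in $_SESSION['token'] and the form
```

Requiring both values to be non-empty strings means that a request with no token at all can never match a session that has no token either.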
