Due to the relative simplicity of PHP, more and more young webmasters are getting their hooks into scripting. This can be a good thing — it increases the range of functionality and fun that we can add to our websites without the need to learn how to code ourselves — the problem is, a lot of these scripters don’t know what they’re doing. They’re either unaware of massive holes in their scripts, or oblivious to the damage they can do. This checklist covers some of the most basic mistakes, and how to fix them.

Lack of Validation

Lack of input validation — or ‘cleaning’ of user submitted data — is one of the most common errors I see in a huge variety of scripts. This often varies from complete trust in the user with data being sent to the browser or a database with no restrictions at all, to accidental slip-ups like the display of information through the $_SERVER superglobal array, which not many people realise is manipulatable. Often people rely on pre-specified options in a drop-down menu or hidden form fields too; these can all be altered at another location and processed through your script!

By not validating data you risk all sorts of problems. Minor problems like broken formatting on your page because of user-added mark-up, huge problems like injected JavaScript (which could steal your cookies), or lines of SQL which can empty your tables and destroy your database. If you’ve got confidential data stored, don’t expect it to stay private for long.

You can combat this problem by making sure that any data the user has access to is cleaned before it is stored and displayed. Stripping tags, converting entities and trimming whitespace is a good start, and can be done with a custom function:

function cleanData($data) {
	$data = trim(htmlentities(strip_tags($data)));
	
	return $data;
}

Or, to clean data prior to inserting into a MySQL database:

function cleanDataForDB($data) {
	$data = trim(htmlentities(strip_tags($data)));
	
	if (get_magic_quotes_gpc())
		$data = stripslashes($data);
	
	$data = mysql_real_escape_string($data);
	
	return $data;
}

You can then apply these functions to user-submitted data, either one at a time:

$cleanName = cleanData($_POST['name']);

…or by looping through all of the data at once, assigning it to a ‘clean’ array):

foreach($_POST as $key => $value) {
	$clean[$key] = cleanData($value);
}

(Although the example above is designed to loop through the $_POST superglobal array, the same can be done with $_SERVER, $_GET and $_COOKIE — in fact, any array, using any function you fancy.)

Further checks on specific fields can be done with regular expressions. Input should specifically match a pattern before it is processed: e.g. an e-mail address could be matched against a pattern of letters and/or numbers, an @, and then a domain and tld. Regular expressions (regex) are great for this purpose because you don’t need to know exactly what the user will submit, just what it should ‘look’ like.

Relying on register_globals

The majority of those who code for register_globals turned on either learnt PHP in the dark ages, or haven’t read a decent article on PHP recently. When register_globals is turned on, data that would normally need to be accessed/manipulated through the specific superglobal arrays can be accessed like a normal variable (e.g. a url of page.php?x=foo would create a variable of $x with a value of “foo”). Because PHP does not require variable initilisation to execute, variables that are only supposed to be set when certain arguments are met can be maliciously altered. Take this piece of code:

if (is_logged_in())
	$authorised = true;
if ($authorised) // show sensitive admin panel data here

Even if our is_logged_in() function does fancy password checking logic, with register_globals turned on we can bypass it all and access the restricted area simply by visiting our-page.php?authorised=1. How secure is that?

Avoid this by declaring variables before they’re checked, e.g:

$authorised = false;
if (is_logged_in())
	$authorised = true;
if ($authorised) // show sensitive admin panel data here

…and by using the predefined superglobals $_ENV, $_GET, $_POST, $_COOKIE, and $_SERVER.

Plain text Password Cookies

Storing passwords as plain text is a bad idea. Worse still, is storing plain text passwords in a cookie. No ifs, buts or maybes: it’s bad. Cookies can be stolen a variety of ways (including using JavaScript injected into unvalidated scripts) and if they’re not hashed or mangled (not 100% secure in itself) they can be used to ‘break in’ to whatever site the unsuspecting cookie-holder is using that same password for.

The alternative is hashing the password using md5() or sha1() with a ‘salt’ (a word or phrase that normal users don’t know), which helps prevent the password being guessed even if the password hash is obtained. Lastly, you can avoid storing the password at all and use sessions instead. Sessions themselves have their own set of problems, but you can learn more about that on Shiflett’s Truth About Sessions article.

While we’re on the subject of cookies: don’t rely on their existance for verification that a person is who they say they are! Make sure that after checking for the existance of cookies, that you check a cookie contains what it should. Otherwise a person will be able to create a cookie with fake data and be logged straight in. Again, this is more data validation.

Stop Using $_REQUEST

$_REQUEST takes its data from $_GET, $_POST and $_COOKIE (in that order of priority). You should avoid using $_REQUEST when possible unless you don’t care where your data is coming from. As all good scripters should track where their data is coming from and what it contains, using it defeats the point of even reading this article.

SQL Injections

SQL injections are usually caused by data that hasn’t been validated! (See how important cleaning up is yet?) A common mistake when working with databases is allowing user submitted data to interact with the database without any sort of checking. A good example of bad coding is as follows:

<?php
$pass = md5($_POST['pass']);
$sql = "SELECT * FROM users WHERE username = $_POST[user] AND password = $pass";
?>

First, the data is not clean, secondly, it’s not escaped. You should always form SQL statements using backticks around table/field names and apostrophes around values being passed. This is not foolproof though, take a look at what the query would become if we added user' OR 'foo' = 'foo' -- as our username (even with backticks and apostrophes):

<?php
$pass = md5($_POST['pass']);
$sql = "SELECT * FROM `users` WHERE `username` = 'user' OR 'foo' = 'foo' -- ' AND password = 'random hashed password'";
?>

The query checks to see if user is a valid username, which is probably wrong, but it’s the next part that is the important part: OR 'foo' = 'foo'. Of course foo equals foo (and the — comments out everything left), leaving your malicious user logged in. You can avoid this by escaping submitted data. The PHP manual has an entry on mysql_real_escape_string() and how best to use it (see: “Example 3. A “Best Practice” query”). Alternatively, refer to the cleanDataForDB() function above.

Summary

Of course, the best way to avoid writing insecure PHP scripts is to not write scripts at all. Unfortunately this probably isn’t what you want to hear, and doesn’t condone a healthy learning environment. If you’re going to put your website and your users potentially at risk by writing your own scripts, do as much studying beforehand as possible to reduce the risk. Take advice from experts in the field and don’t base your scripts on known insecure code.

Recommended Reading

These are a few books and articles I’ve found beneficial when studying PHP security:

The following links are also interesting: