Regular Expressions

  • Perl Compatible Regular Expressions (PCRE). preg_match(); preg_match_all();
  • POSIX Regular Expressions. ereg(); eregi(); split();  Deprecated in PHP 5.3 and removed in PHP 7.0

Regular expressions are enclosed within delimiters, usually the forward slash /

 

preg_match( "/RegEx/", "Target String", $storageArray ) ;

or

preg_match( "/$regEx/", $targetString, $storageArray ) ;

 

Where:

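  • 1st argument is the regular expression to search for
  • 2nd argument is the target string to search in
  • 3rd (optional) argument is an array that receives the match: index 0 holds the text matched by the whole pattern, index 1 onwards hold any captured sub-patterns
  • A match returns 1
  • No match returns 0 (false is returned if an error occurs)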
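
As a minimal sketch of the three arguments and the return value, the following uses a made-up target string (the sample text and variable names are assumptions, not part of the original notes):

```php
<?php
// Hypothetical sample data for illustration only.
$target = "Order 66 shipped";

// $matches is filled by preg_match(); index 0 holds the whole match.
$result = preg_match("/[0-9]+/", $target, $matches);

if ($result === 1) {
    echo "Matched: " . $matches[0] . "\n";   // prints "Matched: 66"
} elseif ($result === 0) {
    echo "No match\n";
} else {
    // preg_match() returns false when an error occurs.
    echo "Regex error\n";
}
```

Comparing with === keeps a "no match" result (0) apart from an error (false).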

Any other non-alphanumeric, non-whitespace symbol (except the backslash) can be used as the delimiter, as long as it's the same at both ends. For instance, using the vertical bar | avoids having to escape forward slashes, as would be needed in "/https:\/\//"

 

echo preg_match( "|https://|", "https://www.tech-academy.co.uk") ; // returns 1
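
To compare delimiters side by side, here is a small sketch; the variable names are illustrative, and # is just one more commonly used alternative delimiter:

```php
<?php
$url = "https://www.tech-academy.co.uk";

// With / as the delimiter, each / inside the pattern must be escaped.
$withSlash = preg_match('/https:\/\//', $url);

// With | or # as the delimiter, the slashes can be written as they are.
$withBar  = preg_match('|https://|', $url);
$withHash = preg_match('#https://#', $url);

var_dump($withSlash, $withBar, $withHash);   // int(1) three times
```

Note that when | is the delimiter it can no longer be used unescaped for alternation inside the pattern, so # is often the more convenient choice.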

 

Adding a lower case i after the closing delimiter makes the search case insensitive:

 

echo preg_match( "/HtTp/i", "http://www.tech-academy.co.uk") ; // returns 1
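
For contrast, a short sketch of the same pattern with and without the i modifier (variable names are illustrative):

```php
<?php
$url = "http://www.tech-academy.co.uk";

$caseSensitive   = preg_match('/HtTp/', $url);    // 0: the letter case differs
$caseInsensitive = preg_match('/HtTp/i', $url);   // 1: case is ignored

echo "without i: $caseSensitive, with i: $caseInsensitive\n";
```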

 

 

Meta-Characters 

^ Match at beginning of the string (or of each line in multi-line mode, m)
$ Match at end of the string (or of each line in multi-line mode, m)
. Match any single character except a newline
? Match zero or one of the preceding item, i.e. the preceding item is optional
( ) Group a sub-pattern (and capture what it matches)
[ ] Character class: match any one of the listed characters
[^ ] Negated character class: match any one character not listed
- Inside a character class, all characters between two characters
+ Match one or more of the preceding item
* Match zero or more of the preceding item
{ , } Minimum and maximum number of repetitions, or an exact count if just one number is given
\ Escape: the following meta-character is treated as a literal
( | ) Match a set of alternate strings, logical OR'ing either side of the |
e.g.
ab*c will match ac, abc or abbbc
colou?r will match the US spelling color or the UK spelling colour
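
A quick way to try the meta-characters is to run a handful of patterns against sample strings. This is only a sketch; the patterns are anchored with ^ and $ so the whole string has to match, and the sample strings are made up:

```php
<?php
// Each entry is [pattern, subject]; the data is illustrative only.
$checks = [
    ['/^ab*c$/',        'ac'],       // * allows zero b's
    ['/^ab*c$/',        'abbbc'],    // ... or several
    ['/^colou?r$/',     'color'],    // ? makes the u optional
    ['/^colou?r$/',     'colour'],
    ['/^colou?r$/',     'colouur'],  // but only one u is allowed
    ['/^(cat|dog)s?$/', 'dogs'],     // alternation inside a group
    ['/^a.c$/',         'abc'],      // . matches any single character
    ['/^a\.c$/',        'abc'],      // \. matches a literal dot only
];

$results = [];
foreach ($checks as [$pattern, $subject]) {
    $results[] = preg_match($pattern, $subject);
    printf("%-18s %-8s => %d\n", $pattern, $subject, end($results));
}
```

Only the 'colouur' and the escaped-dot checks return 0; the rest return 1.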

 

 

 

Character types

\d Any decimal digit
\D Any character not a decimal digit
\h Any horizontal whitespace character
\H Any character not a horizontal whitespace
\s Any whitespace character
\S Any character not a whitespace character
\v Any vertical whitespace character
\V Any character not a vertical whitespace
\w Any word (underscore or alphanumeric) character
\W Any non-word character
e.g.
\S*[Ff]red\S* will match any word containing Fred or fred, e.g. Alfredo
^\s*$ will match an empty string or one containing only whitespace
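
The character types can be tried out as follows. This is a sketch with made-up sample strings and illustrative variable names:

```php
<?php
// \d+ finds each run of digits.
preg_match_all('/\d+/', 'Room 12, floor 3', $digits);
print_r($digits[0]);    // 12 and 3

// \w+ finds each run of word characters (letters, digits, underscore).
preg_match_all('/\w+/', 'snake_case and kebab-case', $words);
print_r($words[0]);     // snake_case, and, kebab, case

// ^\s*$ matches only if the whole string is empty or whitespace.
$blank    = preg_match('/^\s*$/', " \t ");   // 1
$notBlank = preg_match('/^\s*$/', ' x ');    // 0

// \S* on either side pulls in the rest of the surrounding word.
preg_match('/\S*[Ff]red\S*/', 'Hello Alfredo!', $fred);
echo $fred[0], "\n";    // Alfredo!
```

The hyphen is not a word character, which is why kebab-case is split in two while snake_case is not.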

 

 

 

Square brackets [ ] define a character class, which matches any one character from a set or range

[ Denotes start of character class
] Denotes end of character class
^ Negates the class, but only if it is the first character
- Indicates character range
[a-z] Matches any lowercase letter from a to z
[A-Z] Matches any UPPERCASE letter from A to Z
[a-zA-Z] Matches any letter, lowercase or UPPERCASE (note that [a-Z] is not valid, as the range is out of order)
[0-9] Matches decimal digit from 0 to 9
[abc] a, b, or c
[^A-Z] Any character that is not an uppercase letter
(gif|jpg) Matches either "gif" or "jpg"
[a-z]+ One or more lowercase letters
[0-9.-] Any digit, dot, or minus sign
^[a-zA-Z0-9_]{1,}$ Any word of at least one letter, number or _
([wx])([yz]) Sub-patterns: wy, wz, xy, or xz
[^A-Za-z0-9] Any character that is not a letter or a digit
([A-Z]{3}|[0-9]{4}) Matches three uppercase letters or four digits
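
Some of the character class patterns listed above, applied to sample input. This is a sketch only; the strings and variable names are assumptions made for illustration:

```php
<?php
// A "word" made only of letters, digits and underscores.
$validName   = preg_match('/^[a-zA-Z0-9_]{1,}$/', 'user_42');   // 1
$invalidName = preg_match('/^[a-zA-Z0-9_]{1,}$/', 'user 42');   // 0 (contains a space)

// Alternation: capture a gif or jpg file extension.
preg_match('/\.(gif|jpg)$/', 'photo.jpg', $ext);
echo $ext[1], "\n";   // jpg

// Three uppercase letters or four digits, and nothing else.
$code = '/^([A-Z]{3}|[0-9]{4})$/';
$codes = [
    'ABC'  => preg_match($code, 'ABC'),    // 1
    '1234' => preg_match($code, '1234'),   // 1
    'AB12' => preg_match($code, 'AB12'),   // 0
];

// A negated class strips everything that is not a letter or digit.
$stripped = preg_replace('/[^A-Za-z0-9]/', '', 'a-b_c!1');
echo $stripped, "\n"; // abc1
```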

 

 

Multipliers

r+ At least one r
r* Zero or more r's
r? Zero or one r (r is optional)
r{N} Match exactly N r's
r{N,M} Match at least N r's, but no more than M
r{N,} Match N or more r's
e.g.
ab{3,7}c will match an a followed by a min of 3 and a max of 7 b's followed by a c
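
The ab{3,7}c example can be checked by generating strings with a growing number of b's. In this sketch the pattern is anchored with ^ and $ (an addition, so that extra b's cannot be skipped over) and the loop bounds are arbitrary:

```php
<?php
$pattern = '/^ab{3,7}c$/';

$results = [];
for ($n = 2; $n <= 8; $n++) {
    // Build "a", then $n b's, then "c".
    $subject = 'a' . str_repeat('b', $n) . 'c';
    $results[$n] = preg_match($pattern, $subject);
    printf("%d b's: %s\n", $n, $results[$n] ? 'match' : 'no match');
}
```

Only the strings with 3 to 7 b's match; 2 and 8 fall outside the range.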

 

 

Functions

preg_match() Searches a string for the pattern, returning 1 if it matches, 0 if it does not, and false on error.
preg_match_all() Finds all occurrences of the pattern in a string and returns the number of matches.
preg_replace() Searches a string for matches of the pattern and replaces them with the replacement, which may reference captured sub-patterns.
preg_split() Splits a string into an array, using matches of the pattern as the separators.
preg_grep() Searches all elements of an input array, returning the elements that match the pattern.
preg_quote() Escapes the regular expression meta-characters in a string, so that it can be used as literal text within a pattern.
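
A short sketch exercising each of the functions listed above. The input strings and variable names are made up for illustration:

```php
<?php
// preg_match_all(): returns the number of matches, fills $numbers.
$count = preg_match_all('/\d+/', 'a1b22c333', $numbers);
echo $count, "\n";            // 3
print_r($numbers[0]);         // 1, 22, 333

// preg_replace(): collapse each run of whitespace to a single space.
$clean = preg_replace('/\s+/', ' ', 'too   many    spaces');
echo $clean, "\n";            // too many spaces

// preg_split(): split on any run of commas and/or whitespace.
$parts = preg_split('/[\s,]+/', 'red, green  blue');
print_r($parts);              // red, green, blue

// preg_grep(): keep the elements that match; original keys are preserved.
$numeric = preg_grep('/^\d+$/', ['10', 'abc', '42']);
print_r($numeric);            // [0 => 10, 2 => 42]

// preg_quote(): escape meta-characters; 2nd argument names the delimiter.
$quoted = preg_quote('1.5*2', '/');
echo $quoted, "\n";           // 1\.5\*2
```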

 

 

https://webcheatsheet.com/php/regular_expressions.php

https://www.tutorialspoint.com/php/php_regular_expression.htm

 

Password checker example:

<?php
	// Show the form until a password has been submitted
	if ( !isset($_POST["passWord"]) ) { ?>

<form action="<?php echo htmlspecialchars($_SERVER['PHP_SELF']); ?>" method="POST">
	Please enter a password: <input type="text" name="passWord" />
	<input type="submit" />
</form>

<?php
	} else {

		$passWord = $_POST["passWord"] ;

		// At least 8 characters, with a digit, a lower case and an upper case letter
		$regEx = "/^.*(?=.{8,})(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).*$/" ;

		if (preg_match($regEx, $passWord)) {

			echo "You have a strong password.";

		} else {

			echo "You have a weak password.";
		}
	}
?>

In this example, the regular expression has been assigned to the string variable $regEx as follows:

 

$regEx = "/^.*(?=.{8,})(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).*$/" ;

 

Breaking this down, we have four groups:

 

  1. ^.*(?=.{8,}) checks there are at least 8 characters
  2. (?=.*\d) any character zero or more times, then any digit (therefore checks there is at least one digit, anywhere)
  3. (?=.*[a-z]) checks there is at least one lower case letter
  4. (?=.*[A-Z]).*$ checks there is at least one upper case letter

The caret ^ means match at the start of the string.

The . means any character (except a newline) and the * means zero or more, therefore together .* means any character zero or more times.

The (?= ) is a lookahead assertion: the text that follows must match the enclosed pattern, but the assertion itself consumes no characters, so several lookaheads can test the same text.

So in applying the above, (?=.{8,}) means that at least 8 characters must follow.

And then (?=.*\d) means that a digit must follow, after zero or more other characters.
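
To see the lookaheads at work, the same expression can be run against a few candidate passwords. This is a sketch; the sample passwords are made up, each chosen to fail a different check:

```php
<?php
$regEx = '/^.*(?=.{8,})(?=.*\d)(?=.*[a-z])(?=.*[A-Z]).*$/';

// Hypothetical sample passwords.
$samples = [
    'Passw0rdX',      // long enough, has a digit, lower and upper case
    'short1A',        // only 7 characters
    'alllowercase1',  // no upper case letter
    'NoDigitsHere',   // no digit
];

$results = [];
foreach ($samples as $sample) {
    $results[$sample] = preg_match($regEx, $sample);
    printf("%-14s %s\n", $sample, $results[$sample] ? 'strong' : 'weak');
}
```

Only the first sample passes, since every lookahead has to succeed for the expression to match.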

 

 

 

Subpattern Capturing

( (red|white) (King|Queen) )
$1 $2 $3
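  1. Outer brackets values are captured to $1
  2. First inner brackets are assigned to $2
  3. Second inner brackets are assigned to $3

The padding spaces just inside the outer brackets are shown for readability only; in a real pattern a space is matched literally.

In PHP, preg_match() stores these values in the array passed as its 3rd argument, at indexes 1, 2 and 3 (index 0 holds the whole match), while the $1, $2 and $3 names are used in preg_replace() replacement strings. The array is overwritten by the next call that is passed the same variable, so copy any values you need to keep.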
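
A sketch of the capture groups in PHP, with the pattern written without the padding spaces. The subject string, replacement text and variable names are made up for illustration:

```php
<?php
$pattern = '/((red|white) (King|Queen))/';
$subject = 'the white Queen moves';

// The captures land in $m: index 0 is the whole match,
// indexes 1 to 3 follow the order of the opening brackets.
preg_match($pattern, $subject, $m);
print_r($m);   // white Queen, white Queen, white, Queen

// In a replacement string the same groups are written $1, $2, $3.
$swapped = preg_replace($pattern, '$3 ($2)', $subject);
echo $swapped, "\n";   // the Queen (white) moves
```

The replacement string is single-quoted so that PHP does not try to interpolate $2 and $3 as variables.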
