I found this rather useful for testing mutliple strings when developing a regex pattern.
<?php
/**
* Runs preg_match on an array of strings and returns a result set.
* @author wjaspers4[at]gmail[dot]com
* @param String $expr The expression to match against
* @param Array $batch The array of strings to test.
* @return Array
*/
function preg_match_batch( $expr, $batch=array() )
{
// create a placeholder for our results
$returnMe = array();
// for every string in our batch ...
foreach( $batch as $str )
{
// test it, and dump our findings into $found
preg_match($expr, $str, $found);
// append our findings to the placeholder
$returnMe[$str] = $found;
}
return $returnMe;
}
?>
preg_match
(PHP 4, PHP 5)
preg_match — Perform a regular expression match
Opis
Searches subject for a match to the regular expression given in pattern .
Parametry
- pattern
-
The pattern to search for, as a string.
- subject
-
The input string.
- matches
-
If matches is provided, then it is filled with the results of search. $matches[0] will contain the text that matched the full pattern, $matches[1] will have the text that matched the first captured parenthesized subpattern, and so on.
- flags
-
flags can be the following flag:
- PREG_OFFSET_CAPTURE
- If this flag is passed, for every occurring match the appendant string offset will also be returned. Note that this changes the return value in an array where every element is an array consisting of the matched string at index 0 and its string offset into subject at index 1.
- offset
-
Normally, the search starts from the beginning of the subject string. The optional parameter offset can be used to specify the alternate place from which to start the search (in bytes).
Informacja: Using offset is not equivalent to passing substr($subject, $offset) to preg_match() in place of the subject string, because pattern can contain assertions such as ^, $ or (?<=x). Compare:
<?php
$subject = "abcdef";
$pattern = '/^def/';
preg_match($pattern, $subject, $matches, PREG_OFFSET_CAPTURE, 3);
print_r($matches);
?>Powyższy przykład wyświetli:
Array ( )
while this example
<?php
$subject = "abcdef";
$pattern = '/^def/';
preg_match($pattern, substr($subject,3), $matches, PREG_OFFSET_CAPTURE);
print_r($matches);
?>will produce
Array ( [0] => Array ( [0] => def [1] => 0 ) )
Zwracane wartości
preg_match() returns the number of times pattern matches. That will be either 0 times (no match) or 1 time because preg_match() will stop searching after the first match. preg_match_all() on the contrary will continue until it reaches the end of subject . preg_match() returns FALSE if an error occurred.
Rejestr zmian
| Wersja | Opis |
|---|---|
| 4.3.3 | The offset parameter was added |
| 4.3.0 | The PREG_OFFSET_CAPTURE flag was added |
| 4.3.0 | The flags parameter was added |
Przykłady
Example #1 Find the string of text "php"
<?php
// The "i" after the pattern delimiter indicates a case-insensitive search
if (preg_match("/php/i", "PHP is the web scripting language of choice.")) {
echo "A match was found.";
} else {
echo "A match was not found.";
}
?>
Example #2 Find the word "web"
<?php
/* The \b in the pattern indicates a word boundary, so only the distinct
* word "web" is matched, and not a word partial like "webbing" or "cobweb" */
if (preg_match("/\bweb\b/i", "PHP is the web scripting language of choice.")) {
echo "A match was found.";
} else {
echo "A match was not found.";
}
if (preg_match("/\bweb\b/i", "PHP is the website scripting language of choice.")) {
echo "A match was found.";
} else {
echo "A match was not found.";
}
?>
Example #3 Getting the domain name out of a URL
<?php
// get host name from URL
preg_match('@^(?:http://)?([^/]+)@i',
"http://www.php.net/index.html", $matches);
$host = $matches[1];
// get last two segments of host name
preg_match('/[^.]+\.[^.]+$/', $host, $matches);
echo "domain name is: {$matches[0]}\n";
?>
Powyższy przykład wyświetli:
domain name is: php.net
Example #4 Using named subpattern
<?php
$str = 'foobar: 2008';
preg_match('/(?<name>\w+): (?<digit>\d+)/', $str, $matches);
print_r($matches);
?>
Powyższy przykład wyświetli:
Array ( [0] => foobar: 2008 [name] => foobar [1] => foobar [digit] => 2008 [2] => 2008 )
Notatki
preg_match
28-Aug-2008 04:55
10-Aug-2008 11:12
For validation of email addresses, Cal Henderson's RFC 822 and RFC 2822 is_valid_email() functions rule all:
http://code.iamcal.com/php/rfc822/
09-Jul-2008 01:11
preg_match and preg_replace_callback doesnt match up in the structure of the array that they fill-up for a match.
preg_match, as the example shows, supports named patterns, whereas preg_replace_callback doesnt seem to support it at all. It seem to ignore any named pattern matched.
08-Jul-2008 05:01
I made a mistake in my previous post. Mail addresses may of course only be "exotic" in their local parts, not in the domain part. Therefore, an exotic mail address would be "exotic#%$mail@domain.com".
07-Jul-2008 11:51
For those not so familiar with regex's, I post my algorithmic email validation routine. It can more easily be changed for individual needs than regex's. My function does NOT recognize exotic email addresses as allowed by RFC. (For example, info@exotic%&$#mail.com is a legal email address but not allowed by my function.)
-Tim
<?php
function email_is_valid($email) {
if (substr_count($email, '@') != 1)
return false;
if ($email{0} == '@')
return false;
if (substr_count($email, '.') < 1)
return false;
if (strpos($email, '..') !== false)
return false;
$length = strlen($email);
for ($i = 0; $i < $length; $i++) {
$c = $email{$i};
if ($c >= 'A' && $c <= 'Z')
continue;
if ($c >= 'a' && $c <= 'z')
continue;
if ($c >= '0' && $c <= '9')
continue;
if ($c == '@' || $c == '.' || $c == '_' || $c == '-')
continue;
return false;
}
$TLD = array (
'COM', 'NET',
'ORG', 'MIL',
'EDU', 'GOV',
'BIZ', 'NAME',
'MOBI', 'INFO',
'AERO', 'JOBS',
'MUSEUM'
);
$tld = strtoupper(substr($email, strrpos($email, '.') + 1));
if (strlen($tld) != 2 && !in_array($tld, $TLD))
return false;
return true;
}
?>
03-Jul-2008 11:30
The regexp below thinks that the e-mail address:
'me@de.com' is invalid, which it is not.
'/^([a-z0-9])(([-a-z0-9._])*([a-z0-9]))*\@
([a-z0-9])([-a-z0-9_])+([a-z0-9])*
(\.([a-z0-9])([-a-z0-9_-])([a-z0-9])+)*$/i'
I modified it and it seems to work for me in my limited tests of it.
YMMV.
26-Jun-2008 04:48
Paperweight, this pattern worked fine for me (even for intranet adresses, like "john@localhost"; and also for subdomain emails, like "john@foo.bar.com"):
'/([a-z0-9])([-a-z0-9._])+([a-z0-9])\@
([a-z0-9])([-a-z0-9_])+([a-z0-9])
(\.([a-z0-9])([-a-z0-9_-])([a-z0-9])+)*/i'
but, still, this won't replace the "activation link", that is the better way to check if an e-mail is valid or not.
26-May-2008 09:50
Because making a truly correct email validation function is harder than one may think, consider using this one which comes with PHP through the filter_var function (http://www.php.net/manual/en/function.filter-var.php):
<?php
$email = "someone@domain .local";
if(!filter_var($email, FILTER_VALIDATE_EMAIL)) {
echo "E-mail is not valid";
} else {
echo "E-mail is valid";
}
?>
04-Apr-2008 11:36
In addition to reiner-keller's comment about Umlaute using setlocale (LC_ALL, 'de_DE');
To enable 'de_DE' on my Debian 4 machine I first had to:
- uncomment 'de_DE' in file /etc/locale.gen and afterwards
- run locale-gen from the shell
