PHP
Last updated: Fri, 27 Jun 2008


token_get_all

(PHP 4 >= 4.2.0, PHP 5)

token_get_all — Split given source into PHP tokens

Description

array token_get_all ( string $source )

token_get_all() parses the given source string, which contains PHP code, into PHP language tokens using the Zend engine's lexical scanner.

For the tokens recognized by the parser, see List of Parser Tokens; you can also use token_name() to translate a token value into its name.

Parameters

source

The PHP source to parse.

Return Values

An array of token identifiers. Each individual token identifier is either a single character (e.g. ;, ., >, !), or a three-element array containing the token index in element 0, the string content of the original token in element 1 and the line number in element 2.

Examples

Example #1 token_get_all() examples

<?php
$tokens = token_get_all('<?php echo; ?>'); /* => roughly (whitespace and line numbers omitted): array(
                                                  array(T_OPEN_TAG, '<?php'),
                                                  array(T_ECHO, 'echo'),
                                                  ';',
                                                  array(T_CLOSE_TAG, '?>') ); */

/* Note in the following example that the string is parsed as T_INLINE_HTML
   rather than the otherwise expected T_COMMENT (T_ML_COMMENT in PHP <5).
   This is because no open/close tags were used in the "code" provided.
   This would be equivalent to putting a comment outside of <?php ?> tags
   in a normal file. */
$tokens = token_get_all('/* comment */'); // => array(array(T_INLINE_HTML, '/* comment */'));
?>
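
Since every token is either a bare string or an array whose element 1 holds the original text, concatenating those texts should reproduce the input. The following is a minimal sketch of that idea; the helper name tokens_to_source() is hypothetical and not part of the tokenizer extension.

```php
<?php
// Hypothetical helper: rebuild the source text from a token array.
function tokens_to_source(array $tokens)
{
    $out = '';
    foreach ($tokens as $token) {
        // Array tokens carry their text in element 1; string tokens are the text itself.
        $out .= is_array($token) ? $token[1] : $token;
    }
    return $out;
}

$source = '<?php echo "hi"; ?>';
$tokens = token_get_all($source);
var_dump(tokens_to_source($tokens) === $source);
```

A round trip like this is a quick sanity check before rewriting individual tokens and writing the result back out.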

Changelog

Version Description
5.2.2 Line numbers are returned in element 2
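
A minimal sketch of reading the line number from element 2, guarded so it also runs on versions before 5.2.2 where that element is absent. The helper name first_line_of() is hypothetical.

```php
<?php
// Hypothetical helper: line number of the first token with the given id,
// or null if the token is absent or carries no line information.
function first_line_of($source, $tokenId)
{
    foreach (token_get_all($source) as $token) {
        if (is_array($token) && $token[0] === $tokenId) {
            // Element 2 exists only as of PHP 5.2.2.
            return isset($token[2]) ? $token[2] : null;
        }
    }
    return null;
}

// The echo statement sits on the third line of this source string.
$line = first_line_of("<?php\n\necho 1;", T_ECHO);
```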



User Contributed Notes
token_get_all
kevin at metalaxe dot com
26-Apr-2008 11:58
Rogier, thanks for that fix. This bug still exists in PHP 5.2.5. I did notice, though, that your code can trigger a notice. Changing this line:

            $temp[] = $tokens[0][2];

To read this:

            $temp[] = isset($tokens[0][2])?$tokens[0][2]:'unknown';

fixes this notice.
rogier
10-Jan-2008 08:01
Complementary note to the code below:
Note that only the FIRST 2 (or 3, if needed) array elements will be updated.

Since I only encountered incorrect results on the FIRST occurrence of T_OPEN_TAG, I wrote this quick fix.
Any following T_OPEN_TAG tokens are parsed correctly on my test system (Apache 2.0.52, PHP 5.0.3).

So this function assumes that only the first T_OPEN_TAG can be incorrect.
It also assumes that the possibly incorrect token is the very first element (and ONLY the first element) of the token array.
In effect, the tokenized source must begin with a PHP opening tag: '<' followed by either '?' (as in '<?php') or '%' (ASP style).
rogier at dsone dot nl
10-Jan-2008 05:37
On several PHP versions (pre 5.1), token_get_all does NOT always return the correct result.
As far as I know, this bug only shows up when PHP is loaded as a module; in the CLI the bug seems non-existent.
Bugs 29761 and 34782 are related.
To work around this, here's a fixing function:

<?php
// works around bugs 29761 and 34782 => token_get_all returns <?php NOT as T_OPEN_TAG
function token_fix(&$tokens) {
    if (!is_array($tokens) || (count($tokens) < 2)) {
        return;
    }

    // return if no fixing is needed
    if (is_array($tokens[0]) && (($tokens[0][0] == T_OPEN_TAG) || ($tokens[0][0] == T_OPEN_TAG_WITH_ECHO))) {
        return;
    }

    // text of the first two tokens
    $p1 = (is_array($tokens[0]) ? $tokens[0][1] : $tokens[0]);
    $p2 = (is_array($tokens[1]) ? $tokens[1][1] : $tokens[1]);
    $p3 = '';

    if (($p1.$p2 == '<?') || ($p1.$p2 == '<%')) {
        $type = ($p2 == '?') ? T_OPEN_TAG : T_OPEN_TAG_WITH_ECHO;
        $del = 2;

        // update token type for 3rd part?
        if (count($tokens) > 2) {
            $p3 = is_array($tokens[2]) ? $tokens[2][1] : $tokens[2];
            $del = (($p3 == 'php') || ($p3 == '=')) ? 3 : 2;
            $type = ($p3 == '=') ? T_OPEN_TAG_WITH_ECHO : $type;
        }

        // rebuild erroneous token
        $temp = array($type, $p1.$p2.$p3);
        if (version_compare(phpversion(), '5.2.2', '<') === false) {
            // PHP >= 5.2.2: carry over the line number
            $temp[] = $tokens[0][2];
        }

        // blank out the tokens that were merged into the first one
        $tokens[1] = '';
        if ($del == 3) $tokens[2] = '';
        $tokens[0] = $temp;
    }
    return;
}
?>
nicolas dot grekas+php at gmail dot com
03-Dec-2007 10:10
Well, there is a way to parse for errors. See
http://www.php.net/manual/function.php-check-syntax.php#77318
smp_info at yahoo dot com
30-Nov-2007 03:50
As far as I am aware, there is no way to tell whether the source code passed in is free of parse errors.

You might come across such a situation when you're using PHP to analyze PHP source code.

In a case like this, you'll get a warning similar to the following (the details vary): Warning: Unexpected character in input: ''' (ASCII=39) state=1

If it doesn't matter to you whether the source is free of parse errors, use @token_get_all($source) to suppress the warning.
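
On PHP 7.0 and later, token_get_all() accepts a second flags argument, and with TOKEN_PARSE a syntax error is reported as a ParseError exception. A minimal sketch under that assumption follows; it does not run on the PHP 4 and 5 versions discussed in these notes, and the helper name has_valid_syntax() is hypothetical.

```php
<?php
// Hypothetical helper: true if $source tokenizes and parses without a syntax error.
// Assumes PHP >= 7.0, where the TOKEN_PARSE flag and ParseError exist.
function has_valid_syntax($source)
{
    try {
        token_get_all($source, TOKEN_PARSE);
        return true;
    } catch (ParseError $e) {
        return false;
    }
}

$ok  = has_valid_syntax('<?php echo 1;');
$bad = has_valid_syntax('<?php if (');
```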
phpcomments at majiclab dot com
01-Aug-2005 07:08
Regarding bertrand at toggg dot com's comment: there is another use of { } curly braces in PHP, the string index, which token_get_all() treats just like a code block. Example:

<?php
$text = "Hello";
if ($text{ 0 } == 'H') {
    echo "This example uses { for both a PHP block and a string index.";
}
?>

Just in case some people were wondering. Since PHP treats them as the same token, parsing gets a little more interesting: you can't assume that { ... } is a code block, since it may just enclose a string index.
bertrand at toggg dot com
08-Mar-2005 07:41
If you want to retrieve the PHP blocks, count up on opening curly braces '{' and down on closing ones '}' (a counter of zero means the block is finished).
CAUTION: the opening curly brace token can take 3 values:
1) '{' for all PHP code blocks,
2) T_CURLY_OPEN for "protected" variables within strings, as in "{$var}"
3) T_DOLLAR_OPEN_CURLY_BRACES for the extended format "${var}"

On the other hand, the closing token is always '}'!

So counting up must take place on all 3 tokens:
'{', T_CURLY_OPEN and T_DOLLAR_OPEN_CURLY_BRACES

Have fun with the PHP tokenizer!
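
The counting rule above can be sketched as follows. This is only an illustration; the helper name curly_depths() is hypothetical, and it returns the final and the maximum nesting depth rather than extracting blocks.

```php
<?php
// Hypothetical helper: returns array(final depth, maximum depth) of curly braces.
function curly_depths($source)
{
    $depth = 0;
    $max = 0;
    foreach (token_get_all($source) as $token) {
        // Array tokens are identified by their id, string tokens by their text.
        $id = is_array($token) ? $token[0] : $token;
        if ($id === '{' || $id === T_CURLY_OPEN || $id === T_DOLLAR_OPEN_CURLY_BRACES) {
            $depth++;
            $max = max($max, $depth);
        } elseif ($id === '}') {
            $depth--;
        }
    }
    return array($depth, $max);
}

list($final, $max) = curly_depths('<?php function f($v) { if ($v) { echo "a{$v}b"; } }');
```

A final depth of zero means every opening token was matched by a closing '}'. Counting only the plain '{' token would leave the depth at -1 for this input, because the '}' that closes "{$v}" is an ordinary '}'.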
bishop
08-Dec-2004 07:58
You may want to know the line and column number at which a token begins (or ends). Since this tokenizer interface doesn't provide that information, you have to track it manually, like below:

<?php
function update_line_and_column_positions($c, &$line, &$col)
{
    // update line count
    $numNewLines = substr_count($c, "\n");
    if (1 <= $numNewLines) {
        // have new lines, add them in
        $line += $numNewLines;
        $col = 1;

        // skip to right past the last new line, as it won't affect the column position
        $c = substr($c, strrpos($c, "\n") + 1);
        if ($c === false) {
            $c = '';
        }
    }

    // update column count
    $col += strlen($c);
}
?>

Now use it, something like:

<?php
$line = 1;
$col  = 1;
foreach ($tokens as $token) {
    if (is_array($token)) {
        list ($token, $text) = $token;
    } else if (is_string($token)) {
        $text = $token;
    }

    update_line_and_column_positions($text, $line, $col);
}
?>

Note this assumes that your desired coordinate system is 1-based (eg (1,1) is the upper left). Zero-based is left as an exercise for the reader.
Leon Atkinson
07-Dec-2002 12:17
This function tokenizes PHP code. Here's an example of its use.
<?php
    $code = '<?$a = 3;?>';

    foreach (token_get_all($code) as $c)
    {
        if (is_array($c))
        {
            print(token_name($c[0]) . ": '" . htmlentities($c[1]) . "'\n");
        }
        else
        {
            print("$c\n");
        }
    }
?>
