Hacking PHP syntax

Posted on Apr 27, 2013 in Linux, PHP Programming

Have you ever thought about how to extend the core of PHP? What does it take to create a new keyword or even design a whole new syntax? If you have some basic knowledge of C, you shouldn’t have any problem making small changes. Yes, I know it might be a little bit pointless, but that doesn’t matter because it’s fun.

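Let’s create an alternative way to define a class. The simplest class definition allowed in PHP is the “class” keyword, a class name and an empty pair of curly brackets, for example “class Foo {}” (the name is only an illustration).

We can simplify the syntax and replace the curly brackets with a semicolon, so that “class Foo;” would define the same empty class.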

If you try to execute this code it will obviously throw an error. That’s not a problem, we can fix it.

The first step is to install some software.

PHP is written in C; however, the parser is created with Bison. Bison is a parser generator. The home page defines it as: a general-purpose parser generator that converts an annotated context-free grammar into a deterministic LR or generalized LR (GLR) parser employing LALR parser tables.

It’s a very powerful piece of software and one could write a whole book about it. If you would like to learn more, I refer you to the documentation. It’s not a very easy read but there is a good example. If you ever want to create a programming language, that might be a good place to start.

Go to http://php.net and get the latest PHP sources.

Take your hat off. You are looking at the core of PHP. Code in those files powers the vast majority of web servers. Let’s break it.

The default extension for Bison files is “y”.

We don’t want to mess with the “ini” syntax, so the only choice is “zend_language_parser.y”. Open it with your editor of choice.

If you search for “class” you will find

Parsers like to operate on tokens. The “class” token is “T_CLASS”. If you search for “T_CLASS” you will find something like this:

You are looking at four different ways to define a class.

  • class
  • abstract class
  • trait
  • final class

In curly brackets you can see some low-level assignments. I can only guess what they are for. Let’s ignore them ;)

We are on the right track but it’s not exactly what we’re looking for. Search for “class_entry_type”, which groups those four definitions.

That takes you to the final destination. It’s easy but not very readable at the beginning.

There are two declarations here: one for a class and one for an interface. We are interested in the first one. It starts with “class_entry_type”, which resolves to: class | abstract class | trait | final class. The next element is the token T_STRING. That’s going to be the class name. The element after that, “extends_from”, is a group. It can be “extends T_STRING” or nothing.

After that the parser calls the Zend engine to begin the class declaration.

You can find this function in the zend_compiler.c file.

The first argument is the class token “class_entry_type”, the second is the class name “T_STRING” and the last one is the parent class “extends_from”.

Under that we have another group, “implements_list”. I’m sure you can guess it. Yes, it’s for assigning interfaces. The following lines define the mandatory class body: the opening bracket “{”, the “class_statement_list” group and the closing bracket “}”. Finally the parser informs the Zend engine that the class declaration has ended.

We need to duplicate that code as a second alternative, with a semicolon in place of the class body definition.
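To make the idea of a second alternative concrete, here is a minimal sketch in C of a hand-written recogniser that accepts both forms. It is not Bison and it is not PHP’s code: the names toy_next, toy_is_name and toy_accepts_class are made up for this illustration, the class body has to be empty, and modifiers and implements lists are left out.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Toy helper (hypothetical, not PHP's scanner): copy the next word or
   single punctuation character from *src into buf. Returns 0 at end of
   input. Words longer than cap - 1 characters are split. */
static int toy_next(const char **src, char *buf, size_t cap)
{
    size_t n = 0;
    while (isspace((unsigned char)**src)) {
        (*src)++;
    }
    if (**src == '\0') {
        return 0;
    }
    if (isalnum((unsigned char)**src) || **src == '_') {
        while ((isalnum((unsigned char)**src) || **src == '_') && n + 1 < cap) {
            buf[n++] = *(*src)++;
        }
    } else {
        buf[n++] = *(*src)++;
    }
    buf[n] = '\0';
    return 1;
}

/* A name starts with a letter or an underscore (simplified rule). */
static int toy_is_name(const char *tok)
{
    return isalpha((unsigned char)tok[0]) || tok[0] == '_';
}

/* Returns 1 if src is exactly one class declaration in either form:
     class Name [extends Parent] { }
     class Name [extends Parent] ;      <- the new alternative
   and 0 otherwise. */
int toy_accepts_class(const char *src)
{
    char tok[64];
    assert(src != NULL);

    if (!toy_next(&src, tok, sizeof tok) || strcmp(tok, "class") != 0) return 0;
    if (!toy_next(&src, tok, sizeof tok) || !toy_is_name(tok)) return 0;
    if (!toy_next(&src, tok, sizeof tok)) return 0;

    /* Optional part: "extends Parent" or nothing. */
    if (strcmp(tok, "extends") == 0) {
        if (!toy_next(&src, tok, sizeof tok) || !toy_is_name(tok)) return 0;
        if (!toy_next(&src, tok, sizeof tok)) return 0;
    }

    if (strcmp(tok, "{") == 0) {
        /* Original form: an (empty) body in curly brackets. */
        if (!toy_next(&src, tok, sizeof tok) || strcmp(tok, "}") != 0) return 0;
    } else if (strcmp(tok, ";") != 0) {
        /* Neither a body nor the new semicolon form. */
        return 0;
    }

    /* Nothing may follow the declaration. */
    return !toy_next(&src, tok, sizeof tok);
}
```

Calling toy_accepts_class("class Foo { }") and toy_accepts_class("class Foo;") both return 1, while toy_accepts_class("class Foo") returns 0. In the real grammar the same effect comes from adding one more alternative to the rule rather than from writing any of this by hand.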

It was quite simple, wasn’t it? Now you just have to compile it.

The first compilation always takes a while.

Paste the test code.

Go and test your hack.

Well done, you’ve hacked PHP!

Let’s add one more thing. In PHP you define a class with the “class” keyword. How about making it shorter? “cls” should do fine.

Look for the lexer files.

The Bison file operates on tokens. The lexer defines how the source code is converted into those tokens.

Open zend_language_scanner.l and search for “class”.

Duplicate this block and change class to cls.

Job done. Compile the code and you can use “cls” instead of the “class” keyword.
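Why does one extra lexer rule suffice? Both spellings end up as the same token, so the parser never learns which one was typed. The sketch below shows that idea in plain C. It is only an illustration: toy_token, toy_keyword and the token values are made-up names, and the real scanner is generated from zend_language_scanner.l and looks quite different.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical token ids for this sketch; PHP's real values are
   generated by Bison. */
enum toy_token {
    TOY_T_STRING = 1, /* any other word, e.g. a class name */
    TOY_T_CLASS = 2   /* the class keyword */
};

/* Map a word to a token. "class" and "cls" produce the same token,
   which mirrors duplicating the lexer block for the new spelling. */
enum toy_token toy_keyword(const char *word)
{
    assert(word != NULL);
    if (strcmp(word, "class") == 0 || strcmp(word, "cls") == 0) {
        return TOY_T_CLASS;
    }
    return TOY_T_STRING;
}
```

Here toy_keyword("cls") and toy_keyword("class") both return TOY_T_CLASS, so every grammar rule that mentions the class token works for the short keyword without any change to the parser.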

Wasn’t that fun? I hope you enjoyed it as much as I did. Play around, break it. If you really like it, think about closing some bugs on https://bugs.php.net/.

4 Comments

  1. Theodore R. Smith (PHP Experts, Inc.)
    02/05/2013

    What if I wanted to rename the function “strpos” to “string_position” and create an alias named “strpos”?

    • Lukasz Kujawa
      02/05/2013

      Hello Theodore. Your question is more related to extending PHP than hacking the Zend engine. The function “strpos” is part of the standard extension and is defined in “ext/standard/string.c” – grep for “PHP_FUNCTION(strpos)”. In this case I would rather create a new extension, define the “string_position” wrapper and call “php_strpos” from there. Extending PHP is quite well explained at Zend Devzone. Google for “writing php extension”; there are a few good articles. You can also find many examples in the “ext/” directory and on PECL. I hope that answers your question.

  2. solu
    10/05/2013

    It’s fun! :)
    I want to add array slice syntax like Python (list[1:2]),
    but I found it’s too hard for me.

    • Lukasz Kujawa
      11/05/2013

      Thank you for your comment. I agree. It doesn’t sound like a super easy tweak.
