Systems Architect

A quick guide to writing good code

Lukasz Kujawa — Tue, 03 Feb 2015 15:43:57 +0000

Are you writing a high quality code? Most of the developers would probably say yes. The really honest one would say that it might not be perfect but it’s OK. Both answers are fair. Most of us are very serious about the job we do and we always try to give our best (at least most of the time). The big question here is that if we’re all genuinely convinced about a reasonable quality of our code why there is so much technical debt wherever we look?

Should we worry about software quality at all? Surely we should be focused on solving business problems, not spending hundreds of man-hours on making a pretty code, right? We’ve heard it so many times in so many different forms that it has to be true! I don’t know who was the first person to make this statement but with eyes of my imagination I can see an overcaffeinated sales person.

If you think about how development time is distributed you might come to a conclusion that only 20% of it goes into new development and 80% is maintenance. In purely commercial terms cutting corners in the new development phase is saving $1 today and than losing $4 every following day. It sounds bad but, it’s actually even worse. Tolerating bad code has a snowball effect. Something which might look harmless at the beginning will quickly turn into a big problem. Roy Osherove in his book The Art of Unit Testing calls it the broken window (the Broken window theory). In a nutshell the theory says that one broken window will quickly turn into all broken windows, graffitis and burned cars in front of the building. Bad code is an invitation for other developers to produce even more bad code.

Following the best coding practice should support you in working fast and effectively. It’s counter intuitive for many people who wrongly associate it with slowness and unjustified perfectionism. Of course, even with the best code base there will be a time when you will have to break rules. When that happens make sure it won’t become part of your routine (if it’s a repeating patterns something doesn’t work), chose corners to cut very wisely and plan to fix all wrongdoings in the next few days.

So what is the definition of a good code? How can one tell the difference between good and bad. In simple terms I will risk to say that good code is a de-coupled code and bad code is a highly coupled code. For those who haven’t heard this phrase before coupling by Wikipedia definition is the manner and degree of interdependencies between software modules.

Why coupling is bad? Because it makes your code unpredictable, difficult to read, maintain and sometimes even execute. You might experience the Butterfly effect with it when adding a new menu item to a dropdown will for example result in truncating users table. This is of course an extreme case but how many applications are coupled with a particular environment, configuration or even dataset. How many developers can’t do their work properly because they can’t reproduce certain behaviours on their dev environment? How many bugs were closed only to open different bugs in a different part of the system. There is a countless list of examples which cost employers across the world billions of dollars or even a good name.

I assume we can agree that coupling is bad but how to avoid it? There is a very good term which I’ve found in the “Head First Design Patterns” book: “Program to an interface, not an implementation”. It’s simple but very powerful approach. It means that instead of writing a solution to a specific problem one should create an API for dealing with this class of problems and than use the API to solve the problem.

If you prefer more granular guidelines there is “SOLID programming” which in fact is 5 different principles. SOLID is an acronym and stands for: single responsibilities, open/close. Leskov substitution and interface segregation.

Single responsibilities principle – a class should have a single responsibility. What it means is that a single class should be focused on solving very narrow problem. For example, if you write an encryption library (which for the record would be a crazy thing to do) you don’t want all algorithms in one class. You will do better with a separate class for every algorithm and few additional classes to glue it all together. This principle can be also applied to methods and functions.

Open/close principle – software entities should be open for extension but close for modification. In other words new code should be extending the existing one instead of modifying it. The logic behind it is that modifying the existing code might break already working features so it’s safer to avoid it.

Liskov substitution principle – objects in a program should be replaceable with instances of their subtypes without altering the correctness of that program. This could be understood as whenever you decide to override a parent method from a child class keep the behaviour consistent.

Interface segregation principle – many client-specific interfaces are better than one general-purpose interface. This could be seen as an extension of the Single responsibilities principle. In simple terms it encourages to continue narrowing down responsibilities of a single class. If a class grows too large It can be broken into at least two different classes. Analogically this can be also applied to methods and functions.

Dependency inversion principle – one should depend upon abstractions. Do not depend upon concretions. In essence this principle says that classes should depend on abstraction not a concrete implementation.

Although software quality is a vast topic this short list of guidelines should be enough to (if followed) significantly improve code quality. The domain specific terminology will also help to discuss code in an objective non-personal and respectful way.

Achieving and (what’s more difficult) maintaining a high quality source code requires team’s commitment and is a never ending process. As with the best coding guidelines it can be very smart and complex but I will try to simplify it.

There are two key components which I believe are sufficient to lay a foundation for a well maintain and stable software. Those are also prerequisites to more mature methodologies which you can build around them with a time. Those components are: unit testing and code reviews:

I deliberately said unit testing instead of a Test Driver Development because TDD can be a big move for an inexperienced team. You have to start somewhere and it’s better to take it easy. A good start would be writing Unit Tests for all core classes and for reproducing bugs before fixing them. The less critical code should be written with Unit Testing in mind. What I mean by that is it should be possible to write a Unit Test for such a class without a major refactoring (ideally no refactoring should be needed).

Unit tests will not only improve the stability of your software but also it’s quality! The reason for that is you can’t create a correct Unit Tests for a coupled code. If you’re not experience with Unit Testing I do recommend you a great book I’ve mentioned before – The Art of Unit Testing by Roy Osherove. If you already have some experience with Unit Testing than perhaps you might be interested in refreshing you knowledge about Unit Testing best practices.

All of the clever ideas I’ve already mentioned won’t help you much without consistency. Code can get easy out of control if the team doesn’t keep an eye on quality and how the new commits fit into the big picture. Regular code reviews will help you embracing the quality-centric culture and collaboration.

The easiest approach would be to setup weekly code reviews where everybody go though the latest commits of their peers. They can list out all issues and then discuss them on a meeting following the review. Ideally all of the issues should be fix on the same day after the meeting (it might not always be possible).

Regular reviews are good but the team might find them impractical at certain periods. If you’re using GIT for version control then you can build code review around pull requests. There are few good products to support it but if you’re not happy to pay you can install something like GitLab. It’s an open source version of GitHub and provides you similar functionality.

Writing a good code is not difficult and doesn’t require lots of additional time. I would say it’s actually the opposite. Bad code is very difficult to work with and cost lots of money in the long run. I hope you will find the proposed approach useful and give it a try. If you’re familiar with this subject but have a different opinion or perhaps some suggestions I would love to hear it.

Intorduction to Munin monitoring tool and extending it with PHP

Lukasz Kujawa — Fri, 16 May 2014 11:19:37 +0000

This is a quick introduction to http://munin-monitoring.org/ monitoring tool. To make it little bit more interesting I decided to step outside of my comfort zone and record a video instead of posting plane text.

Enjoy watching!

Hello_world extension

#!/usr/bin/php graph_title Hello World! graph_vlabel important value graph_total total rn1.label First random number rn2.label Second random number

If you need a VirtualHost configuration you can use the one below.



    ServerName 192.168.0.7

    DocumentRoot /var/www/html/munin

    ErrorLog  /var/log/httpd/munin-error.log
    CustomLog /var/log/httpd/munin-access.log combined

        
               Order allow,deny
               Allow from All
               Options None
               AuthUserFile /etc/munin/munin-htpasswd
               AuthName "Munin"
               AuthType Basic
               require valid-user
           
               ExpiresActive On
               ExpiresDefault M310

Parsing binary data in PHP on an example with the PCAP format

Lukasz Kujawa — Wed, 12 Mar 2014 18:16:06 +0000

Not every problem should be solved with PHP. For certain types of applications it’s simply more appropriate to select a different programming language. You might be glad to hear that parsing binary data, is **NOT** one of them. If you don’t have to worry about microseconds and the biggest concern is development time, PHP might be in fact a very good tool for the job.

There are many different file formats which we potentially could investigate but for this post I will focus only on PCAP. It’s easy, fun to work with and holds an interesting kind of data.

The libpcap file format (full name) is the main capture format used in TcpDump/WinDump, Wireshark/TShark, snort, and many other networking tools. If you are not familiar with any of the above names, those programs are packet analysers (or network sniffers).

The format is well document on the Wireshark website. It consists of 3 different data structures:

Packet data is an array of chars (in PHP it’s a string) and doesn’t have a defined size. It’s encapsulated by a packet header which defines its length.
A very simple (and useless) parser would have to find the first packet header, figure out length of the packet data and move to another packet header (which lays immediately after data section).

Before we get to the first packet header we need to read the global header. It begins at the first byte and holds some basic information about the file.

The global header structure is defined as follows:

typedef struct pcap_hdr_s {
        guint32 magic_number;   /* magic number */
        guint16 version_major;  /* major version number */
        guint16 version_minor;  /* minor version number */
        gint32  thiszone;       /* GMT to local correction */
        guint32 sigfigs;        /* accuracy of timestamps */
        guint32 snaplen;        /* max length of captured packets, in octets */
        guint32 network;        /* data link type */
} pcap_hdr_t;

If you are not familiar with C, please allow me to explain what the above notation means. The simples way to think about a C structure is to imagine a PHP class with public attributes and no methods. A PHP translation can look this:

class pcap_hdr_t {
     public $magic_number; /* magic number */
     public $version_major; /* major version number */
     public $version_minor; /* minor version number */
     public $thiszone;  /* GMT to local correction */
     public $sigfigs; /* accuracy of timestamps */
     public $snaplen; /* max length of captured packets, in octets */
     public $network;  /* data link type */
}

The biggest difference is that in PHP our attributes can hold any kind of data while in C they have strictly defined type and length.

guint32 – Unsigned Integer 4 bytes long (32 bits)
guint16 – Unsigned integer 2 bytes long (16bits)
gint32 – Integer 4 bytes long (32 bits)

Based on this information we can be sure that:

the global header has exactly 24 bytes
the first 4 bytes are the “magic_number” and should be interpreted as an unsigned integer
following 2 bytes are the “version_major” and should be interpreted as an unsigned integer
and so on… I’m sure you can recognise the pattern

Lets create an example pcap file on which we are going to experiment. The easiest way would be running tcpdump:

$ tcpdump -i eth0 -w example.pcap
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 65535 bytes

(make sure you specify a network interface “-i eth0”, tcpdump can sniff on all interfaces but the output file will be saved in PCAP-NG instead of PCAP)

Generate some network traffic by visiting any website or pinging google.com, and than switch back to the terminal, and press CTRL+C.

118 packets captured
118 packets received by filter
0 packets dropped by kernel

Run `ls` to make sure there is some data in the file.

$ ls -la
total 56
drwxr-xr-x   3 lukasz  staff    102  9 Mar 01:39 .
drwxr-xr-x  62 lukasz  staff   2108  9 Mar 01:38 ..
-rw-r--r--   1 lukasz  staff  25689  9 Mar 01:39 example.pcap

Now lets open the file in PHP and read the first 24 bytes of data.

<?php

$fh = fopen( 'example.pcap', 'rb');

if( ! $fh ) {
     throw new Exception( "Can't opend the PCAP file" );
}

$buffer = fread( $fh, 24 );

fclose( $fh );

Please notice the ‘b’ flag passed to the fopen() function. It forces binary mode and prevents PHP from being clever about the data.

Now we have a raw data in the buffer and we need to translate it to something what makes sense. In C one could simply cast the buffer on a desired structure (although it’s a discourage practice because it makes the code less portable). In PHP (and PERL from which the function was borrowed) you can use unpack().

Unpack takes a binary string and translates it to an array of values. To start simple lets read only the first 4 bytes which represent the magic_number.

<?php

/* ... */

$buffer = unpack( "Nmagic_number", fread( $fh, 24 ) );
print_r( $buffer );

The function should return something like this:

Array
(
    [magic_number] => 3569595041
)

You probably noticed the strange “N” character before “magic_number” passed to the unpack(). It tells the function how to parse binary string. “N” stands for “unsigned long (always 32 bit, bigendian byte order)“. You can find all codes under pack() documentation.

Code	Description
a	NUL-padded string
A	SPACE-padded string
h	Hex string, low nibble first
H	Hex string, high nibble first
c	signed char
C	unsigned char
s	signed short (always 16 bit, machine byte order)
S	unsigned short (always 16 bit, machine byte order)
n	unsigned short (always 16 bit, big endian byte order)
v	unsigned short (always 16 bit, little endian byte order)
i	signed integer (machine dependent size and byte order)
I	unsigned integer (machine dependent size and byte order)
l	signed long (always 32 bit, machine byte order)
L	unsigned long (always 32 bit, machine byte order)
N	unsigned long (always 32 bit, big endian byte order)
V	unsigned long (always 32 bit, little endian byte order)
f	float (machine dependent size and representation)
d	double (machine dependent size and representation)
x	NUL byte
X	Back up one byte
Z	NUL-padded string (new in PHP 5.5)
@	NUL-fill to absolute position

If you want to unpack more than one value you have to use “/” character as a separator. To read the full header you can do:


That will return all header’s values.
Array
(
    [magic_number] => 3569595041
    [version_major] => 2
    [version_minor] => 4
    [thiszone] => 0
    [sigfigs] => 0
    [snaplen] => 65535
    [network] => 1
)

You might have noticed that there are 3 different types for a 32 bit long:

machine byte order
big endian byte order
little endian byte order

There are two ways of storing a number which is longer than a byte. For example, number 2864434397 can encoded as: 0xAA 0xBB 0xCC 0xDD (Big-endian) or 0xDD 0xCC 0xBB 0xAA (Little-endian). Different platforms might prefer a different order. If you want to stick to your default settings you can use the “machine byte order” which is relative.
So why did I decide to use Little-endian for all values following the magic number?

The magic number in my PCAP is 3569595041. That’s 0xd4c3b2a1 in hex. PCAP documentation states that:
The writing application writes 0xa1b2c3d4 with it’s native byte ordering format into this field. The reading application will read either 0xa1b2c3d4 (identical) or 0xd4c3b2a1 (swapped). If the reading application reads the swapped 0xd4c3b2a1 value, it knows that all the following fields will have to be swapped too.
A good parser should handle both cases but for the sake of simplicity I will ignore the other possibility.
At this moment you should have a script which can read and understand the first 24 bytes of a PCAP file. The next step is to read the first packet header.
typedef struct pcaprec_hdr_s {
        guint32 ts_sec;         /* timestamp seconds */
        guint32 ts_usec;        /* timestamp microseconds */
        guint32 incl_len;       /* number of octets of packet saved in file */
        guint32 orig_len;       /* actual length of packet */
} pcaprec_hdr_t;

This structure is simpler and is always 16 bytes long. Now we have all information we need to finish the parser.
<?php

$fh = fopen( 'example.pcap', 'rb');

if( ! $fh ) {
     throw new Exception( "Can't opend the PCAP file" );
}

/* Reading global header */

$buffer = unpack( "Nmagic_number/vversion_major/vversion_minor/lthiszone/Vsigfigs/Vsnaplen/Vnetwork", fread( $fh, 24 ) );

printf( "Magic number: 0x%s, Version: %d.%d, Snaplen: %dn",
          dechex( $buffer['magic_number']),
          $buffer['version_major'],
          $buffer['version_minor'],
          $buffer['snaplen'] );

/* Reading packets */

$frame = 1;
while( ( $data = fread( $fh, 16 ) ) ) {

     /* Read packet header */
     $buffer = unpack( "Vts_sec/Vts_usec/Vincl_len/Vorig_len", $data );

     /* Read packet raw data */
     $packetData = fread( $fh, $buffer['incl_len'] );

     printf( "Frame: %d, Packetlen: %d, Captured: %dn",
               $frame,
               $buffer['orig_len'],
               $buffer['incl_len'] );
    
     $frame++;
}

fclose( $fh );

Processing PCAP is very straight forward. You have to read 16 bytes of a packet header and get “incl_len” to find out where the next packet header starts.
Although the parser doesn’t do much this example should give you a good understanding of how to deal with binary data in PHP. If you would like to push it further and find out an IP address, or TCP payload open the PCAP file in Wireshark and try to figure it out. If you need some extra help have a look at this great article about programming with libpcap. You will find there all the structures you need (sniff_ethernet, sniff_ip and sniff_tcp) with a solid explanation.
As you can see PHP is capable of doing more than generating a dynamic HTML code. It’s obviously not so as fast as C but there are many occasions when that’s not a big problem.

A painless guide to Apache CouchDB for a PHP developer

Lukasz Kujawa — Sat, 04 Jan 2014 00:40:34 +0000

World of technology loves buzzwords. There is always something new which allegedly is the game changer, and we all should upgrade in the matter of urgency. On the other hand, absorbing new knowledge costs time and effort. It’s difficult to decide which technology is worth sacrificing brain glucose levels.

Few months ago I had to chose a database for my web crawler. I needed something which is simple, fast and could hypothetically store the whole internet. I subconsciously felt that I needed a NoSQL database although I didn’t know which one.

My liberation came from Nathan Hurst (thank you) and his priceless post – Visual Guide to NoSQL Systems. The CAP Triangle (Consistency, Availability and Partition Tolerance) narrowed down my options to “only” 8 systems (AP). I tossed an eight sided coin and chose Apache CouchDB.

Three months later I can say I’m still happy with my choice. CouchDB is simple (only 18,000 lines of code) yet powerful database. It talks JSON though the HTTP protocol. You don’t need any special tools or libraries to use it. In fact, you don’t even need a backend to create a web application because it’s capable of hosting static files.

If you’re also looking for a scalable and fast document store, or you just simply want to try a NoSQL have a look at CouchDB. It’s very easy to learn and I believe you will find it useful.

To install CouchDB on Linux use a package manager of your choice, for example:

$ sudo apt-get install couchdb

or if you are Mac/Windows user get the CouchDB binary and run it.

Once the database is running you can start playing with it though a web interface called Futon. To access it visit http://127.0.0.1:5984/_utils/.

CouchDB provides REST API and you can perform all operations with CURL. The API is well documented so I won’t discuss it. CouchDB manual is distributed with the database and you can access it locally at http://127.0.0.1:5984/_utils/docs/.

There are few PHP libraries for CouchDB although I wasn’t happy with them, and started developing my own. You can find it on GitHub if you like, but I recommend to install it with the composer.

$ mkdir couchdb-test
$ cd couchdb-test
$ curl -sS https://getcomposer.org/installer | php 
$ vim composer.json

{
    "require": {
        "lukaszkujawa/couch-php": "dev-master"
    }
}

$ php composer.phar install

Now lets create some PHP code.

 'product-00001',
     'schema' => 'product',
     'name' => 'Your Brain at Work: Strategies for Overcoming Distraction, Regaining Focus, and Working Smarter All Day Long',
     'type' => 'book',
     'reviews' => array(
          array(
               'stars' => 5,
               'title' => 'Outstanding book',
               'text' => 'One of the best books I've ever read. David Rock collects together a bunch of neuroscience(..)'
          ),

          array(
               'stars' => 3,
               'title' => 'Tough to Get Your Brain Round 'Your Brain'',
               'text' => 'I had a difficult time getting into and through this book(..)'

          ),
      )
));

$doc->insert();

When you run the script for the first time it will as expected populate a new document in your database.

An interesting thing will happen when you run the script again.

PHP Fatal error:  Uncaught exception 'CouchPHPClientException' with message 'conflict'

You can’t insert the same document twice. If you want to amend a document you need to assure CouchDB that you have the latest version of it. You do it by setting a revision property “_rev” which is automatically added to a document on insert and update (see the screenshot above).

To fix it you can first get the product-00001, read _rev value and add pass it in the Document constructor.

 'product-00001',
     'schema' => 'product',
     'name' => 'Your Brain at Work: Strategies for Overcoming Distraction, Regaining Focus, and Working Smarter All Day Long',
     'type' => 'book',
     'reviews' => array(
          array(
               'stars' => 5,
               'title' => 'Outstanding book',
               'text' => 'One of the best books I've ever read. David Rock collects together a bunch of neuroscience(..)'
          ),

          array(
               'stars' => 3,
               'title' => 'Tough to Get Your Brain Round 'Your Brain'',
               'text' => 'I had a difficult time getting into and through this book(..)'

          ),
      )
);

$doc = CouchPHPDocument::getById('product-00001');

if( $doc ) {
     $fields['_rev'] = $doc->_rev;
}

$doc  = new CouchPHPDocument($fields);

$doc->insert();

If you don’t worry about concurrency you can use “insertOverwrite” method, and the library will handle getting the latest “_rev” for you.

 'product-00001',
     'schema' => 'product',
     'name' => 'Your Brain at Work: Strategies for Overcoming Distraction, Regaining Focus, and Working Smarter All Day Long',
     'type' => 'book',
     'reviews' => array(
          array(
               'stars' => 5,
               'title' => 'Outstanding book',
               'text' => 'One of the best books I've ever read. David Rock collects together a bunch of neuroscience(..)'
          ),

          array(
               'stars' => 3,
               'title' => 'Tough to Get Your Brain Round 'Your Brain'',
               'text' => 'I had a difficult time getting into and through this book(..)'

          ),
      )
));

$doc->insertOverwrite();

Look for the moment at the document fields. One of the properties is called “schema”. It’s an arbitrary label which defines document type. In our simple example we have only one document type but in real live you could have some additional types like: basket or order. CouchDB database doesn’t have tables as such but you can query for documents with certain properties.

To select documents of a specific kind you need to create a view. View is a JavaScript function which is stored in so called “design document”. Design documents are special type of CouchDB document and are always prefixed with “_design/“. Futon doesn’t provide a dedicated interface for creating design documents but there is a special section for creating views (which are part of a design document).

If you navigate to the test database page (http://127.0.0.1:5984/_utils/database.html?test) you will see “View: All Document” drop down in the top right hand side.

Click on the drop down and select “Temporary view…”. You will get redirected to a special page where you can design a view.

The default map function doesn’t have any conditions and will always return all documents.

function(doc) {
  emit(null, doc);
}

Click “Run” button to see how it works.

Lets create a view which will return only product documents.

function(doc) {
  if( doc.schema && doc.schema == 'product' ) {
    emit(null, doc);
  }
}

That will do the trick but we can push it further. The first argument of the emit function is null. You can set it to product type which will allow some more advance search.

function(doc) {
  if( doc.schema && doc.schema == 'product' ) {
    emit(doc.type, doc);
  }
}

Click “Save As…” to save the view. First you need to specify which design document will hold the view – name it “products”. Second input is the view name – call it “all”.

Once the view is saved access it directly from your web browser http://127.0.0.1:5984/test/_design/products/_view/all. As expected the view will return our one and only product.

Now let’s imagine you want to see products which are books. The view emits doc type as a key so we can do it with two parameters “startkey” and “endkey”.

http://127.0.0.1:5984/test/_design/products/_view/all?startkey=”book”&endkey=”book”

To find all querying options look into the documentation.

Key emitted by the view doesn’t have to be flat. You can emit list of values, for example:

emit([doc.type,doc.category], doc);

Getting all books for this configuration would look like this:

startkey=["book"]&endkey=["book",{}]

It’s important to know that CouchDB creates an index for every view. It makes search faster but it cost space on disk. Another interesting fact is that index won’t be created until first query to that view. On some occasions (after inserting hundreds of thousands of documents) it might take even few minutes.

Although I covered only basics of CouchDB it should be enough for many applications. If you would like to learn more about the database I recommend you CouchDB – The Definitive Guide which is free to read.

API-based Web Application with Backbone, Require.js and Slim framework

Lukasz Kujawa — Sat, 21 Sep 2013 11:14:46 +0000

A single-page application (SPA) was something I’ve been exploring for the last few months. I always liked the idea of moving certain responsibilities to the client’s side. After all, why would you like to waste server’s RAM/CPU to buil a HTML page (and pay for a transfer to deliver it) when a web browser is perfectly capable of doing that on its own? You will not only save money but also provide a better user experience. In addition to the performance, moving the presentation layer to the web browser gives a clearer division between back-end and front-end code.

Recently I came across an extremely useful boilerplate for Backbone.js created by Thomas Davis. The boilerplate brings together one of the most popular (and useful) front-end libraries:

It’s a great start because if you are new to those technologies it’s really hard to guess what would be the best way to put everything together. I used it as a foundation to create an example App which talks to a PHP back-end.

The back-end could be a plain PHP but instead I went for the Slim micro framework. It’s fast, very easy to learn and will enforce a better code structure. After all, this is one of the best use cases for a micro framework.

Before I will go into details get a copy of the example from GitHub and install it on your web server.

$ git clone https://github.com/lukaszkujawa/api-based-application.git
$ cd api-based-application/
$ php composer.phar install
$ sudo vim /etc/apache2/sites-available/your-vhost


        ServerName 192.168.186.133
        ServerAdmin admin@localhost

        DocumentRoot /home/lukasz/projects/api-based-application/public

        
                Options Indexes FollowSymLinks MultiViews
                AllowOverride None
                Order allow,deny
                allow from all

                RewriteEngine On
                RewriteBase /
                RewriteRule ^index.php$ - [L]
                RewriteCond %{REQUEST_FILENAME} !-f
                RewriteCond %{REQUEST_FILENAME} !-d
                RewriteRule . /index.php [L]

$ sudo a2ensite your-vhost
$ sudo /etc/init.d/apache2 reload

When you open the page in your web browser you should see something like this:

A very minimalistic website with 3 links in the main content area. You can add some more if you follow the “create” link in the gray nav bar.

While you’re switching between pages to submit a new URL, notice that your web browser won’t reload. All front-end code is stored in JavaScript. When the web browser needs to talk to the web server it does it in the background with AJAX. It makes the application very responsive and limits number of requests.

To understand how it works lets have a look into “public/index.php“. The script defines 3 endpoints:

$app->get(‘/link/latest‘, function () use ($app)
$app->post(‘/link/submit‘, function () use ($app)
$app->get(‘/‘, function() use ($app)

The first one returns a JSON object with all available links. In the real live it would look for them in a database but for the sake of simplicity I’m using a session. The second one handles POST request and stores links in the session. The last one is the simplest. It renders “templates/index.php” for all requests to the root “/”.

The template is an almost empty HTML5 file.

The only interesting line here is:

http://js/libs/require/require.js

This is where the front-end gets initialised. It loads the Require.js library and refers it to js/main.js.

If you’re not familiar with Require.js it’s a module loader. Small but powerful library for breaking JavaScript code into multiple files/modules. If you want to learn only one new front-end framework this is where I would start. It has a good documentation and will help you organise your JavaScripts.

The next file we are going to look at is obviously “public/js/main.js”.

require([
  'views/app',
  'router',
  'vm'
], function(AppView, Router, Vm){
  var appView = Vm.create({}, 'AppView', AppView);
  appView.render();
  Router.initialize({appView: appView});  // The router now has a copy of all main appview
});

It loads 3 modules: views/app.js, router.js and vm.js.

Router.js is the heart of the application. It defines available pages and assigns them with the appropriate views.

var AppRouter = Backbone.Router.extend({
    routes: {
      // Pages
      'create': 'create',

      // Default - catch all
      '*actions': 'defaultAction'
    },

    instance: null

  });

The AppRouter extends Backbone.js router which is documented on the official website.

router.on('route:create', function () {
        require(['views/create'], function (CreateView) {
          console.log( "router::create" );

          var createView = Vm.create(appView, 'CreateView', CreateView);
          createView.render();
        });
    });

router.on('route:defaultAction', function (actions) {
      require(['views/index'], function (IndexView) {
        console.log( "router::defaultAction" );

        var indexView = Vm.create(appView, 'IndexView', IndexView);
        // indexView.render();
      });
    });

In this example we have only one custom route – “create“. It’s handled by “public/js/views/create.js“. Any other request will be resolved to the default action handled by “public/js/views/index.js“.

The index view does two things: initialise and render. When the object is initialised it’s calling LinksModel (public/js/models/links.js) model to fetch all links from the PHP back-end.

model: new LinksModel(),

initialize: function() {
        console.log( "controllers/IndexController::initialize()" );

        var context = this;

        this.model.fetch({
                success: function () {
                        context.render();
                }
            });

    },

Once a JSON object is successfully pulled from the web server the “public/templates/index/index.html” template is parsed and rendered with Lo-Dash template function.

render: function () {
        console.log( "controllers/IndexController::render()" );
        console.log( indexTemplate );
        console.log( this.model.toJSON() );

        this.$el.html ( _.template( indexTemplate, this.model.toJSON() ) );
}

The LinksModel I mentioned above extends Backbone.js model. Backbone models are the heart of any JavaScript application, containing the interactive data as well as a large part of the logic surrounding it: conversions, validations, computed properties, and access control. It’s a very useful and important entity so try to learn more about it.

define([
  'lodash',
  'backbone'
], function(_, Backbone) {

        var linksModel = Backbone.Model.extend({

           defaults: {
              links: []
           },

            url: "link/latest",

            initialize: function(){
                console.log("Links model init...");
            }

        });

        return linksModel;

});

In our example the model defines a data type called “links” which is an array and a URL “link/latest” to populate the defaults from.

As you can see it’s all quite simple. The only problem is that you have to embrace multiple JavaScript frameworks at once but the boilerplate makes it much easier. What I really like about this setup is the structure. It’s similar to how things are done in the back-end and it should be fine even with big applications.

I know that some developers feel anxious about writing one application in two languages simultaneously. I can sympathise with them because I prefer simple solution. The problem is that we like it or not, the web applications will have more and more JavaScript code. As the old saying goes – “if you can’t fight them, join them”. With the latest front-end frameworks you can improve user experience, cut hosting costs and have a well structured code. Single-page applications will probably become the industry standard in the next few years so you have all the reasons to give it a go.

Continuous Integration for PHP with Jenkins

Lukasz Kujawa — Fri, 06 Sep 2013 23:12:04 +0000

Today I would like to show you how to setup an continues integration server for a PHP project with Jenkins. If you don’t know what the continues integration is have a look at Wikipedia. In a simple words It can be explained as a process of frequent commits to the project’s mainline and running all sort of automated tests to discover problems as soon as it’s possible. CI comes with a long list of benefits and only two disadvantages (according to Wikipedia). One of them is “Initial setup time required”.

Setting up a CI server can be tricky if you haven’t done it before. It relies on various tools which can yell for dependencies and configuration. Fortunately Jenkins on it’s own is very simple to install.

$ apt-get install jenkins

That should install and run the service. You should be able to access it immediately on port 8080.

You need to create a new job but before we do that lets install the PHP template for Jenkins. It is a very nice project created by Sebastian Bergmann (and other contributors) to standardise Jenkins jobs for PHP projects.

$ wget http://localhost:8080/jnlpJars/jenkins-cli.jar
# install dependencies for the template
$ java -jar jenkins-cli.jar -s http://localhost:8080 install-plugin checkstyle cloverphp dry htmlpublisher jdepend plot pmd violations xunit
$ java -jar jenkins-cli.jar -s http://localhost:8080 safe-restart
$ sudo apt-get install curl
# install the template
$ curl https://raw.github.com/sebastianbergmann/php-jenkins-template/master/config.xml | java -jar jenkins-cli.jar -s http://localhost:8080 create-job php-template
$ java -jar jenkins-cli.jar -s http://localhost:8080 safe-restart

The template is installed but you will need some PHP tools to generate artifacts for Jenkins:

To simplify the installation I created a Jenkins skeleton project which handles dependencies with the Composer. The project comes with an example code and configuration so you can see a real output straight out of the box. Build script is compatible with Phing so you don’t have to install Ant.

$ git clone https://github.com/lukaszkujawa/jenkins-php-quickstart.git
$ cd jenkins-php-quickstart/
$ php composer.phar install
# 5 minutes later...
$ sudo vendor/bin/phing install

Now you are ready to create a new Jenkins job. Click on the “New Job” link in the left hand side menu. Make up a job name, select “Copy existing Job” and type “php-template” into the “Copy from field”.

The first important thing is to set the project’s workspace. Got to “Advance Project Options” and check “Use custom workspace”.

Scroll down to the “Build” section and add a new build step.

Select “Execute shell” and paste “vendor/bin/phing -logger phing.listener.NoBannerLogger” to the “Command” text box.

Delete Invoke Ant.

Find PHPUnit section (almost at the bottom of the page) and uncheck “Delete temporary JUnit files”.

Save settings and enable the project.

You are ready to perform your first build. Click on “Build Now” on the left hand side navigation. Give if a few moments and you should see a new line in the “Build History” below the navigation.

If everything went right you should see a blue circle.

As you can see on the above screenshot I had 2 unsuccessful attempts (I forgot to uncheck deleting temporary JUnit files). If you want to know what went wrong you can always click on a red circle.

I recommend you to play around with Jekins settings and features. You should also look into the configuration files and tweak them according to your needs:

phpcs.xml
phpdox.xml
phpmd.xml
phpunit.xml
build.xml

The last challenge is to update code and run build on every commit but I will leave that with you. I hope this article gave you a smooth start with Jenkins and you will try continues integration with one of your projects.

Sucesfull planning in software development

Lukasz Kujawa — Sun, 01 Sep 2013 07:48:57 +0000

All successful people have a plan. They plan literally everything from work related stuff to details of their personal live. One of my friends used to plan how much he’s going to earn the next day. He used to make up a number and write a short sentence in his notebook like “Tomorrow I’m going to earn £5,000” (It was actually 25,000zl because he lives in Poland). In some mysterious way he was able to earn that money which was astonishing even himself. Obviously if you don’t own a business your salary won’t spontaneously change over a night but a good plan will help you to manage your boss, the development and ultimately will contribute to your success.

What I’m going to say here is not rocket science. It’s actually pretty simple common sense stuff but if you didn’t try it before you are going to be amazed how effective that is. It might be also used as the first step before moving into SCRUM.

Lets assume you received a high level requirements to integrate an online store with PayPal (which by the way is the worse integration experience ever). The requirements might come with a request to do it in 3 weeks or if you’re lucky somebody will ask you for your estimates.

Regardless of the time pressure you need to break the requirements into user stories. If you’re not familiar with Agile a user story is a sentence written in a business language describing what a user does or needs to do as part of his job function. In this case that might be:

as a merchant I want to setup my PayPal account to collect payments
as a merchant I want to return money to my customers for orders paid via PayPal if I change order status to “refunded”
as a customer I want to use PayPal to pay for my order

It’s important to confirm user stories with stakeholders. Usually you can capture and have them confirmed on the same meting you received the requirements. Even if that’s the case don’t worry about over communicating. Write a short e-mail summarising what are you going to work on.

Once you have a good understanding of what to do and you confirmed it’s exactly what the stakeholders want, you can start working on your plan.

Break down user stories into separate tasks (you can think about them as points on a todo list), for example:

as a merchant I want to setup my PayPal account to collect payments:

research how to integrate with PayPal
design PayPal section on the payment settings page
integrate the PayPal design with the payment section page
design and create database schema
create PayPal model UnitTest
create PayPal model
alter payments settings controller to handle PayPal payments
log updates to PayPal account settings

It might be boring and laborious process because in real live you will have more than 3 user stories. Don’t rush and stay focused. It’s very important to do it right because you are going to build your strategy around those tasks.

Once the list is done you need to estimate how long it will take to complete each of the tasks. There are few ways to do it and different people might prefer a different approach. I usually go for the following formula:

estimate in hours
use simple values like: 0.5h, 1h, 2h, 4h, 8h, 12h, 16h, 24h
get a pessimistic and optimistic estimate and chose a value between them
don’t estimate other peoples tasks
confirm your estimates with other developers

If you would like to have a proper Agile experience go for the planning poker.

Now it’s going to get fun.

Sum up all numbers and add about 10-15% for QA and bug fixing. Confront it with available resources and you will get a guess (and this is the right way to call it) of delivery date. Obviously that’s under assumption that the world is a perfect place. Nobody will get off sick and everybody is going to be 100% productive. In the real live something will always go wrong so you need an extra 15% of time for contingency.

Just to make it clear, contingency is not a free time added to a project by default. You need to operate as it’s not there and you’re allowed to claim it only for a valid reason.

If the delivery date doesn’t fit your deadline you have the responsibility to communicate it to the business. I recommend a simple but very effective presentation where you display high level requirements as multi color boxes. In our simplified example I would create only two of them: “paypal payments” and “paypal returns”.

This is the complete set of features which the business expects you to deliver (in the real live you would be looking at something similar to a chessboard). Now, lets assume you have only 20 man-days available so the realistic scenario will look as follows:

This diagram shows core functionality, which can be delivered to the end user in a given time frame. It provides 20% contingency (4 days) which reduce a risk and is much more realistic.

Obviously there might be a business need to deliver everything in the given time. If that’s the case you will need to ask for additional resources. If there is no budget available then you need to be transparent about the risk. There is always a chance you will deliver on time but if things go wrong than everybody will know what to expect.

The idea is to print those diagrams and handle them to everybody on the meeting. Encourage people to write on them and rearrange things if they want to prioritise.

At this point you should have a complete and approved plan. You can print it off and monitor your progress on a paper or (more likely) us a web application. There are plenty of good tools over the Internet and you need to select one which will suit you the best. My recommendations are:

Google Docs – good old Excel spreadsheet with version control is a sufficient solution for many organisations
Trello – I personally didn’t use it for my projects but I know many people who absolutely love it
Basecamp – very successful project management and collaboration tool
Moovia – Trello on steroids although little bit confusing interface

If you would like to see more tools have a look at Agile Designer website. They have a great catalog and it’s a good website to know.

Planning is extremely important even you are the only developer. It improves communication with stakeholders, gives a structure to your work and motivates. Every time you tick a task off your list you will experience an immediate sense of accomplishment. If on the other hand you were not productive enough you will instantly notice an annoying trend change.

Load testing with BlazeMeter

Lukasz Kujawa — Sun, 28 Jul 2013 23:42:00 +0000

Some time ago I wrote a post about using jMeter to benchmark performance of a web application. Apache jMeter is a great piece of software and I strongly recommend it but it’s not everything. You also need a high capacity bandwidth and a reliable hosts to run a test from. You often need more than a one testing server because it’s impossible to generate real concurrent connections from a single computer. The reason for that is a network interface can physically send only one packet at the time. People from BlazeMeter recognised this problem and created a web based service which makes the load testing very simple.

BlazeMeter is “the load testing platform for developers”. It allows you to upload jMeter tests, allocate resources and analyse results though a straight forward graphs. It has many interesting features and I would like to highlight some of them.

To add a test to the BlazeMeter you have to specify a geographical location from where the load test is going to be executed. At the time of writing this post there are 8 locations available: EU West (Irleand), US Easy (Virginia), US West (N.California), US West (Oregon), Asia Pacific (Singapore), Australia (Sydney), Japan (Tokyo) and South America (Sao Paulo). This quite impressive selection allows you to load test your website from different parts of the world.

BlazeMeter knows how to overwrite jMeter’s concurrent users settings so you don’t have to edit jMeter file every time you need to change a number of connections. It can also ramp up to a desired number of users in a specified time frame. This is very useful for visualising how increasing number of users will impact the response time.

If you don’t want to create a jMeter file you can still use BlazeMeter by simply specifying target URLs.

Tests can be scheduled to run automatically so BlazeMeter can be used as a monitoring tool.

Another interesting feature is network emulation. BlazeMeter allows to set a download limit and a network delay to simulate traffic from various networks (for example slow 3G connetion).

There are also things like API access, New Relic and Selenium integration. All of that is packed into nice and easy interface. If you need any help BlazeMeter has active community and lots of useful resources.

I’ve been using BlazeMeter for the last few months and it’s a great product. It’s simple, powerful and does the job. Check it yourself. Free account allows you to perform 10 tests per month with 50 concurrent users which is more than enough to begin with.

How to improve PHP programming skills

Lukasz Kujawa — Sat, 27 Jul 2013 23:25:07 +0000

“How do I improve my PHP skills?” is a recurring question on various boards and chats. It’s often asked by newbies but even experienced developers ask themselves the same thing. After all trying to be better is in the human nature. This is a deep question and when you think about it there isn’t a straight forward reply. Nevertheless I will try to give a comprehensive answer which hopefully is going to be useful not only to the beginners but also to the people with some commercial experience.

Matthew Syed in his book “Bounce” tells us that to reach an elite level in any discipline one has to practice for 10,000 hours (which is roughly 10 years). I don’t think that mastering PHP requires 10 years but practicing is essential. Obviously practice must be purposeful, otherwise it is useless.

The last sentence of the above paragraph is the key to sucess. It’s not only about what you have to do but also how are you doing it.

Make sure you’ve chosen the most effective way of learning. It will depend on your actual knowledge. Beginners might prefer direct interaction with people or follow video tutorials while experience developer could chose blogs, books or conferences.

Once you learn something new – practice it. It might require creating a proof of concept, a pet project or changing your habits. Stay curious and open minded. Don’t be afraid of breaking things. If you can check something yourself do it.

Although context of this post is PHP the question doesn’t have much to do with this particular language. Improving PHP skills means improving your universal programming skills. That can be broken down into 3 areas:
– engineering (coding standards, design patterns, unit testing, algorithms etc.)
– managing (application life cycle, version control, agile etc)
– environment (databases, operating system, networks, protocols etc)

Each person needs a different mixtures of the above skills – depends on what’s their ultimate goal. There are various senior positions for a developer but generally speaking all of them are either technical (head of development, senior architect etc) or commercial (development manager, CTO etc). The senior roles can be paid equally well so it’s a matter of choosing between science and business.

Ok, after this somewhat long introduction it’s the time to answer the question. To improve you PHP (or generally speaking web development) skills pick a point from the below list and try to learn as much as you can about it. You don’t have to follow the proposed order but ultimately you want to know everything from this list.

http://rcm-eu.amazon-adsystem.com/e/cm?t=systearchi-21&o=2&p=48&l=ur1&category=pcvideogames&banner=0ARHTTTPV6PH0V84N202&f=ifr

PHP programming
– PHP basics: variables, loops and functions
– Arrays (http://uk1.php.net/manual/en/book.array.php)
– File system functions (http://uk3.php.net/manual/en/ref.filesystem.php)

Font-end basics
– HTML
– CSS

Object oriented programming in PHP
– Classes and Objects (http://php.net/manual/en/language.oop5.php)
– Exceptions (http://www.php.net/manual/en/language.exceptions.php)
– Namespaces (http://www.php.net/manual/en/language.namespaces.php)

Database basics
– SQL basics (select, insert, update, delete)
– PHP PDO (http://uk3.php.net/manual/en/class.pdo.php)

Front-end
– JavaScript
– jQuery
– Responsive web design

PHP
– XML & DOM (http://uk3.php.net/manual/en/book.dom.php)
– Regular expressions (http://www.regular-expressions.info/tutorial.html)
– SPL (http://uk3.php.net/manual/en/book.spl.php)
– Magic Methods (http://php.net/manual/en/language.oop5.magic.php)
– GD (http://uk3.php.net/manual/en/book.image.php)
– JSON (http://uk3.php.net/manual/en/book.json.php)

Database
– Database design (http://en.wikipedia.org/wiki/Database_design)
– Indexing
– Maintenance (manage users, backups)
– SQL optimisation

Software design
– Design patterns (“PHP Objects, Patterns and Practice”)
– Algorithms and data structures (Introduction to Algorithms)
– Unit Testing (The Art of Unit Testing: with Examples in .NET)
– PHP Frameworks (one is enough)
– UML

Web application security
– MySQL injections
– Cross site scripting

Code managment
– Version control (SVN or GIT)
– Branching (http://nvie.com/posts/a-successful-git-branching-model/)
– Bug tracking (any available software)
– Coding standards (“Clean Code: A Handbook of Agile Software Craftsmanship”)

Linux
– Command line
– SSH
– Installation and configuration of LAMP environment
– Installing PHP extensions

Apache web server
– Virtual Hosts
– MOD_Rewrite

Alternative storage
– Caching: Memcached or Redis
– NoSQL: MongoDB or CouchDB or Cassandra
– Search engine: SOLR or ElasticSearch

Networking
– OSI Model (http://en.wikipedia.org/wiki/OSI_model)
– TCP/IP protocol
– HTTP protocol
– Working with sniffers (tcpdump or wireshark)
– CURL (http://uk3.php.net/manual/en/book.curl.php)

Leading development
– SCRUM (“Agile Project Management with Scrum”)
– Leading (“How to Lead: What the best leaders know, do and say”)
– Test Driven development

I would like this list to be as useful as possible so I will be extending it with the time. If you feel I missed something or you would like to recommend a good tutorial please let me know.

http://rcm-eu.amazon-adsystem.com/e/cm?t=systearchi-21&o=2&p=12&l=ur1&category=kindle&banner=0KM6H4ZKZB223DNMFAR2&f=ifr
http://rcm-eu.amazon-adsystem.com/e/cm?t=systearchi-21&o=2&p=12&l=ur1&category=kindle_paperwhite&banner=1JGP9S6XAG8Q8VQV2HG2&f=ifr

Reading PHP session from Varnish Cache

Lukasz Kujawa — Mon, 08 Jul 2013 09:00:50 +0000

In my previous post I showed how to integrate Varnish Cache with a PHP application. The example can solve various simple problems but it might not be enough for a complex software. A good example is a multilingual application. One URL can have multiple caches. You might also need to know more about a user (is he logged in? has he received a notification? etc) to make some additional caching decisions.

All of that can be handled with a special cookie(s) which will flag different scenarios but in my experience this is a clumsy solution. You will need to think of all possible user journeys and make sure that appropriate cookies are created. Caching is very difficult on it’s own so there is no need to make it even more complicated. Much better approach in my opinion is pulling data directly from a PHP session.

PHP by default stores session in a file. This might be OK with a single server architecture but if you have more than one web server than you need a centralised storage. Independently of your setup much better place for a PHP session is memcached. It will improve access time, scalability and of course – you will be able to access session from Varnish.

Storing session data inside the memcached is very simple to do with PHP.

$ sudo apt-get install memcached php5-memcached
$ sudo /etc/init.d/memcached start

Edit the php.ini file.

$ sudo vim /etc/php5/apache2/php.ini

Look for session settings

[Session]
; Handler used to store/retrieve data.
; http://php.net/session.save-handler
session.save_handler = files

and change it to

[Session]
; Handler used to store/retrieve data.
; http://php.net/session.save-handler
session.save_handler = memcached
session.save_path = "localhost:11211"

Now restart apache and it’s done.

$ sudo /etc/init.d/apache2 restart

If you like you can test it with the below code.

addServer('localhost', 11211);

foreach( $m->getAllKeys() as $key ) {
  printf( '%s', $key );
  var_dump ( $m->get( $key ) );
}

It should return something like this:

memc.sess.key.lock.78uso0onvumb665c1gm739er36

string

 '1' (length=1)
memc.sess.key.78uso0onvumb665c1gm739er36

string

 'test|s:11:"Hello World";' (length=24)

If it’s all working lets create a simple page which will simulate multilingual support.

My language is: %s (%s)", $lang, time() );

?>



   English


   Spanish


   German

The idea is simple. If langues is set PHP will store it in session as “lang” and appropriate content will be displayed.

The challenge for Varnish is to create and return an appropriate cache based on selected language. Language is saved as a serialised string inside the memcached. It’s stored under “memc.sess.key.UNIQUE_KEY” where the UNIQUE_KEY is a value from the PHPSESSID cookie.

To access memcached from Varnish Cache script you have to install VMOD-Memcached. To compile this module you need Varnish source code.

$ wget http://repo.varnish-cache.org/source/varnish-3.0.3.tar.gz
$ tar zxfv varnish-3.0.3.tar.gz

Get the VMOD and all dependencies.

$ git clone https://github.com/sodabrew/libvmod-memcached
$ sudo apt-get install libmemcached-dev python-docutils
$ cd libvmod-memcached
$ ./autogen.sh 
$ ./configure VARNISHSRC=../varnish-3.0.3/
$ make
$ sudo make install

The extension should be copied into your Varnish vmod directory.

$ ls /usr/local/lib/varnish/vmods/ | grep memcached
libvmod_memcached.a
libvmod_memcached.la
libvmod_memcached.so

The last missing thing is the default.vcl file.

import std;
import memcached;

backend default {
  .host = "127.0.0.1";
  .port = "80";
}

sub vcl_init {
  memcached.servers({"--SERVER=localhost:11211 --NAMESPACE="memc.sess.key.""});
  return (ok);
}

sub vcl_recv {

  if (req.restarts == 0) {
    if (req.http.x-forwarded-for) {
      set req.http.X-Forwarded-For =
      req.http.X-Forwarded-For + ", " + client.ip;
    } else {
      set req.http.X-Forwarded-For = client.ip;
    }
  }

  if (req.request != "GET" &&
      req.request != "HEAD" &&
      req.request != "PUT" &&
      req.request != "POST" &&
      req.request != "TRACE" &&
      req.request != "OPTIONS" &&
      req.request != "DELETE") {
      /* Non-RFC2616 or CONNECT which is weird. */
      return (pipe);
  }

  if (req.request != "GET" && req.request != "HEAD") {
    /* We only deal with GET and HEAD by default */
    return (pass);
  }

  set req.http._sess = regsub( regsub( req.http.Cookie, ".*PHPSESSID=", "" ), ";.*", "" );
  std.log( "Cookie: " + req.http._sess );
  set req.http._sess = memcached.get( req.http._sess );
  std.log( "Session: " + req.http._sess );

  return (lookup);
}


sub vcl_pipe {
  return (pipe);
}

sub vcl_pass {
  return (pass);
}

sub vcl_hash {
  hash_data(req.url);
  if (req.http.host) {
    hash_data(req.http.host);
  } else {
    hash_data(server.ip);
  }

  if( req.http._sess && req.http._sess ~ "lang" ) {
    set req.http._lang = regsub( regsub( req.http._sess, ".*lang.*?x22", "" ), "x22.*", "" );
    std.log( "Lang: " + req.http._lang );
    hash_data( req.http._lang );
  }

  return (hash);
}

sub vcl_hit {
  return (deliver);
}

sub vcl_miss {
  return (fetch);
}

sub vcl_fetch {

  if( req.url ~ "^/$" ) {
    set beresp.ttl = 30m;
    remove beresp.http.set-cookie;
    return(deliver);
  }

  if (beresp.ttl <= 0s ||
    beresp.http.Set-Cookie ||
    beresp.http.Vary == "*") {
    /*
    * Mark as "Hit-For-Pass" for the next 2 minutes
    */
      set beresp.ttl = 520 s;
      return (hit_for_pass);
  }

  return (deliver);
}


sub vcl_deliver {
  return (deliver);
}

sub vcl_error {
  set obj.http.Content-Type = "text/html; charset=utf-8";
  set obj.http.Retry-After = "5";
  synthetic {"
  ERROR
  "};
  return (deliver);
}


sub vcl_fini {
  return (ok);
}

There are few interesting things going on here.

sub vcl_init {
  memcached.servers({"--SERVER=localhost:11211 --NAMESPACE="memc.sess.key.""});
  return (ok);
}

As you probably can guess Varnish will connect to the memcached server on init.

Now look at the bottom of the vcl_recv function.

set req.http._sess = regsub( regsub( req.http.Cookie, ".*PHPSESSID=", "" ), ";.*", "" );
std.log( "Cookie: " + req.http._sess );
set req.http._sess = memcached.get( req.http._sess );
std.log( "Session: " + req.http._sess );

VCL language doesn’t allow to define new variables although you can reuse the predefined one (like in this example “req.http“). By the end of this block you should have the whole PHP session stored inside req.http._sess.

You can use

$ varnishlog | grep Log

to see output of the std.log function.

The most important code happens inside the vcl_hash subroutine.

if( req.http._sess && req.http._sess ~ "lang" ) {
  set req.http._lang = regsub( regsub( req.http._sess, ".*lang.*?x22", "" ), "x22.*", "" );
  std.log( "Lang: " + req.http._lang );
  hash_data( req.http._lang );
}

You can read more about VCL subroutines here but in a nutshell vcl_hash is responsible for building a hash string under which a cache is going to be saved.

By default Varnish is caching per URL and host but we have to extend it by a language name. This is exactly what happens here. A full hash string will look more less like this:

"/" + "localhost:8080" + "English"

The last thing worth explaining is what happens inside the vcl_fetch.

sub vcl_fetch {

  if( req.url ~ "^/$" ) {
    set beresp.ttl = 30m;
    remove beresp.http.set-cookie;
    return(deliver);
  }

If there is a cookie attached to a request Varnish will never return cached content. It comes from an assumption that if there is a cookie the page must be dynamic.

The point of this exercise if to handle dynamic content so we walk around this limitation for http://localhost:8080/ requests by unsetting cookies (it happens only in the Varnish scope).

Now you can start Varnish server (don’t forget to type start).

$ sudo varnishd -f /usr/local/etc/varnish/default.vcl -s malloc,128M -T 127.0.0.1:2000 -a 0.0.0.0:8080 -d
Platform: Linux,3.5.0-30-generic,x86_64,-smalloc,-smalloc,-hcritbit
200 244 
-----------------------------
Varnish Cache CLI 1.0
-----------------------------
Linux,3.5.0-30-generic,x86_64,-smalloc,-smalloc,-hcritbit

Type 'help' for command list.
Type 'quit' to close CLI session.
Type 'start' to launch worker process.

start
child (4913) Started
200 0

Child (4913) said Child starts

Open two different web browsers, go to http://localhost:8080/ and start changing languages. POST requests are always forwarded to the web server so session value should be updated. For every GET Varnish should return an appropriate content (according to the current language selection) from cache.

It’s little bit tricky to set it up for the first time but the reward is worth it. Making the Varnish Cache aware of user’s status gives much more flexibility and allows to handle more requests directly from cache. That dramatically drops your hosting costs and increases capacity of your server. Give it a go.