A painless guide to Apache CouchDB for a PHP developer

The world of technology loves buzzwords. There is always something new that is allegedly a game changer, and we should all upgrade as a matter of urgency. On the other hand, absorbing new knowledge costs time and effort. It’s difficult to decide which technology is worth sacrificing brain glucose for.

A few months ago I had to choose a database for my web crawler. I needed something simple, fast and hypothetically able to store the whole internet. I subconsciously felt that I needed a NoSQL database, although I didn’t know which one.

My liberation came from Nathan Hurst (thank you) and his priceless post, Visual Guide to NoSQL Systems. The CAP triangle (Consistency, Availability and Partition Tolerance) narrowed down my options to “only” 8 systems on the AP side (available and partition tolerant). I tossed an eight-sided coin and chose Apache CouchDB.

Three months later I can say I’m still happy with my choice. CouchDB is a simple (only 18,000 lines of code) yet powerful database. It talks JSON through the HTTP protocol. You don’t need any special tools or libraries to use it. In fact, you don’t even need a backend to create a web application, because CouchDB is capable of hosting static files.

If you’re also looking for a scalable and fast document store, or you simply want to try a NoSQL database, have a look at CouchDB. It’s very easy to learn and I believe you will find it useful.

To install CouchDB on Linux use a package manager of your choice, for example:

$ sudo apt-get install couchdb

or, if you are a Mac/Windows user, get the CouchDB binary and run it.

Once the database is running you can start playing with it through a web interface called Futon. To access it visit http://127.0.0.1:5984/_utils/.

[Screenshot: Futon]

CouchDB provides a REST API, and you can perform all operations with cURL. The API is well documented so I won’t discuss it. The CouchDB manual is distributed with the database and you can access it locally at http://127.0.0.1:5984/_utils/docs/.
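If you are curious what “JSON over HTTP” looks like in practice, here is a minimal sketch. It only describes two requests as plain objects and doesn’t send anything. The helper names are made up for this example, and the database name test is an assumption borrowed from later in this post.

```javascript
// Sketch only: describes the HTTP requests, it does not send them.
// BASE_URL is the local address used in this post; the helper names
// and the "test" database name are assumptions for illustration.
const BASE_URL = 'http://127.0.0.1:5984';

// PUT /{db}/{docid} stores a document; the body is the document as JSON.
function putDocumentRequest(db, doc) {
  return {
    method: 'PUT',
    url: `${BASE_URL}/${encodeURIComponent(db)}/${encodeURIComponent(doc._id)}`,
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(doc),
  };
}

// GET /{db}/{docid} reads it back.
function getDocumentRequest(db, id) {
  return {
    method: 'GET',
    url: `${BASE_URL}/${encodeURIComponent(db)}/${encodeURIComponent(id)}`,
  };
}

const req = putDocumentRequest('test', { _id: 'product-00001', schema: 'product' });
console.log(req.method, req.url);
console.log(req.body);
```

Each object maps directly onto a cURL call (method, URL, header, body), which is really all there is to the protocol.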

There are a few PHP libraries for CouchDB, but I wasn’t happy with them and started developing my own. You can find it on GitHub if you like, but I recommend installing it with Composer.

$ mkdir couchdb-test
$ cd couchdb-test
$ curl -sS https://getcomposer.org/installer | php 
$ vim composer.json
{
    "require": {
        "lukaszkujawa/couch-php": "dev-master"
    }
} 
$ php composer.phar install

Now let’s create some PHP code.

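<?php
// Load Composer's autoloader so the library classes are available.
require 'vendor/autoload.php';

$doc = new CouchPHPDocument(array(
     '_id' => 'product-00001',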
     'schema' => 'product',
     'name' => 'Your Brain at Work: Strategies for Overcoming Distraction, Regaining Focus, and Working Smarter All Day Long',
     'type' => 'book',
     'reviews' => array(
          array(
               'stars' => 5,
               'title' => 'Outstanding book',
               'text' => 'One of the best books I\'ve ever read. David Rock collects together a bunch of neuroscience(..)'
          ),

          array(
               'stars' => 3,
               'title' => 'Tough to Get Your Brain Round \'Your Brain\'',
               'text' => 'I had a difficult time getting into and through this book(..)'

          ),
      )
));

$doc->insert(); 

When you run the script for the first time it will, as expected, create a new document in your database.

[Screenshot: CouchDB document]

An interesting thing will happen when you run the script again.

PHP Fatal error:  Uncaught exception 'CouchPHPClientException' with message 'conflict' 

You can’t insert the same document twice. If you want to amend a document you need to assure CouchDB that you have the latest version of it. You do it by setting a revision property “_rev” which is automatically added to a document on insert and update (see the screenshot above).

To fix it, you can first fetch product-00001, read its _rev value and pass it to the Document constructor.

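<?php
// Load Composer's autoloader so the library classes are available.
require 'vendor/autoload.php';

$fields = array(
     '_id' => 'product-00001',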
     'schema' => 'product',
     'name' => 'Your Brain at Work: Strategies for Overcoming Distraction, Regaining Focus, and Working Smarter All Day Long',
     'type' => 'book',
     'reviews' => array(
          array(
               'stars' => 5,
               'title' => 'Outstanding book',
               'text' => 'One of the best books I\'ve ever read. David Rock collects together a bunch of neuroscience(..)'
          ),

          array(
               'stars' => 3,
               'title' => 'Tough to Get Your Brain Round \'Your Brain\'',
               'text' => 'I had a difficult time getting into and through this book(..)'

          ),
      )
);

$doc = CouchPHPDocument::getById('product-00001');

if( $doc ) {
     $fields['_rev'] = $doc->_rev;
}

$doc  = new CouchPHPDocument($fields);

$doc->insert(); 
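The revision check is easier to reason about with a toy model. The sketch below is an in-memory stand-in written in plain JavaScript; it is not how CouchDB is implemented, and the TinyStore class and its methods are invented for this example. It only mimics the rule described above: a write is accepted when it carries the revision that is currently stored.

```javascript
// Toy in-memory model of the revision check. NOT CouchDB's implementation;
// TinyStore and its methods are invented for this illustration.
class TinyStore {
  constructor() {
    this.docs = new Map();
  }

  get(id) {
    return this.docs.get(id) || null;
  }

  // Accept a write only if the caller supplies the _rev currently stored.
  put(doc) {
    const current = this.docs.get(doc._id);
    const currentRev = current ? current._rev : undefined;
    if (doc._rev !== currentRev) {
      throw new Error('conflict');
    }
    // Real revisions look like "1-<hash>"; a counter plus a fixed
    // suffix stands in for that here.
    const generation = current ? parseInt(current._rev, 10) + 1 : 1;
    const stored = { ...doc, _rev: `${generation}-demo` };
    this.docs.set(doc._id, stored);
    return stored._rev;
  }
}

const store = new TinyStore();
const fields = { _id: 'product-00001', schema: 'product', type: 'book' };

// First insert: nothing is stored yet, so no _rev is needed.
const firstRev = store.put(fields);

// Second insert of the same fields without a _rev is rejected.
let secondAttempt;
try {
  store.put(fields);
  secondAttempt = 'ok';
} catch (e) {
  secondAttempt = e.message;
}

// Fetch the latest revision first, then write again.
const latest = store.get('product-00001');
const thirdRev = store.put({ ...fields, _rev: latest._rev });

console.log(firstRev, secondAttempt, thirdRev); // 1-demo conflict 2-demo
```

The point of the design is that a writer who hasn’t seen the latest version can’t silently overwrite somebody else’s change.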

If you aren’t worried about concurrency, you can use the “insertOverwrite” method, and the library will handle getting the latest “_rev” for you.

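<?php
// Load Composer's autoloader so the library classes are available.
require 'vendor/autoload.php';

$doc = new CouchPHPDocument(array(
     '_id' => 'product-00001',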
     'schema' => 'product',
     'name' => 'Your Brain at Work: Strategies for Overcoming Distraction, Regaining Focus, and Working Smarter All Day Long',
     'type' => 'book',
     'reviews' => array(
          array(
               'stars' => 5,
               'title' => 'Outstanding book',
               'text' => 'One of the best books I\'ve ever read. David Rock collects together a bunch of neuroscience(..)'
          ),

          array(
               'stars' => 3,
               'title' => 'Tough to Get Your Brain Round \'Your Brain\'',
               'text' => 'I had a difficult time getting into and through this book(..)'

          ),
      )
));

$doc->insertOverwrite(); 

Look at the document fields for a moment. One of the properties is called “schema”. It’s an arbitrary label which defines the document type. In our simple example we have only one document type, but in real life you could have additional types like basket or order. A CouchDB database doesn’t have tables as such, but you can query for documents with certain properties.

To select documents of a specific kind you need to create a view. A view is a JavaScript function stored in a so-called “design document”. Design documents are a special type of CouchDB document, and their IDs are always prefixed with “_design/”. Futon doesn’t provide a dedicated interface for creating design documents, but there is a special section for creating views (which are part of a design document).

If you navigate to the test database page (http://127.0.0.1:5984/_utils/database.html?test) you will see a “View: All documents” drop-down in the top right-hand corner.

Click on the drop down and select “Temporary view…”. You will get redirected to a special page where you can design a view.

[Screenshot: CouchDB view]

The default map function doesn’t have any conditions and will always return all documents.

function(doc) {
  emit(null, doc);
} 

Click the “Run” button to see how it works.

Let’s create a view which will return only product documents.

function(doc) {
  if( doc.schema && doc.schema == 'product' ) {
    emit(null, doc);
  }
} 

That will do the trick, but we can push it further. The first argument of the emit function is null. You can set it to the product type, which allows more advanced searches.

function(doc) {
  if( doc.schema && doc.schema == 'product' ) {
    emit(doc.type, doc);
  }
} 
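If you want to see what this map function emits without clicking through Futon, you can run it offline with a stand-in for emit. This is only a sketch: runMap and the sample documents are invented for this example, and a real view would also sort the rows by key.

```javascript
// Offline harness: runs a map function over sample documents with a
// stand-in emit(). runMap and the sample data are invented for this sketch.
function runMap(mapSource, docs) {
  const rows = [];
  let currentId = null;
  const emit = (key, value) => rows.push({ id: currentId, key, value });
  // Build the function from its source text with emit in scope,
  // since a view is stored as text rather than as a live function.
  const map = new Function('emit', `return (${mapSource});`)(emit);
  for (const doc of docs) {
    currentId = doc._id;
    map(doc);
  }
  return rows;
}

const mapSource = `function(doc) {
  if( doc.schema && doc.schema == 'product' ) {
    emit(doc.type, doc);
  }
}`;

const sampleDocs = [
  { _id: 'product-00001', schema: 'product', type: 'book' },
  { _id: 'product-00002', schema: 'product', type: 'dvd' },   // hypothetical
  { _id: 'order-00001', schema: 'order' },                    // hypothetical
];

const rows = runMap(mapSource, sampleDocs);
rows.forEach((row) => console.log(row.id, row.key));
```

Only the two product documents produce rows; the order document fails the schema check and emits nothing.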

Click “Save As…” to save the view. First you need to specify which design document will hold the view; name it “products”. The second input is the view name; call it “all”.

[Screenshot: CouchDB save view]

Once the view is saved, you can access it directly from your web browser at http://127.0.0.1:5984/test/_design/products/_view/all. As expected, the view will return our one and only product.
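It helps to know what was actually stored. A design document is an ordinary JSON document whose views are kept as source text. The sketch below shows the rough shape for the names used in this post; the exact set of fields Futon writes (for example “_rev”) may differ, so treat it as an illustration rather than a dump of the real document.

```javascript
// Approximate shape of the design document holding the "all" view.
// An illustration only; the saved document will also carry a _rev.
const designDoc = {
  _id: '_design/products',
  language: 'javascript',
  views: {
    all: {
      // The map function is stored as a string, not as a live function.
      map: "function(doc) {\n  if( doc.schema && doc.schema == 'product' ) {\n    emit(doc.type, doc);\n  }\n}",
    },
  },
};

console.log(JSON.stringify(designDoc, null, 2));
```

The design document name and the view name are exactly the two path segments in the view URL: _design/products and _view/all.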

Now let’s imagine you want to see products which are books. The view emits the document type as its key, so we can do it with two parameters, “startkey” and “endkey”.

http://127.0.0.1:5984/test/_design/products/_view/all?startkey="book"&endkey="book"

To find all querying options look into the documentation.
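One detail that is easy to trip over: key parameters have to be valid JSON, so a string key keeps its double quotes, and the whole value should be URL-encoded when you build the address in code. Here is a small sketch; viewUrl is a made-up helper, and the base URL is the one used in this post.

```javascript
// viewUrl is a hypothetical helper; the view path is the one from this post.
function viewUrl(params) {
  const base = 'http://127.0.0.1:5984/test/_design/products/_view/all';
  const query = Object.entries(params)
    // Each value is serialised as JSON first, then URL-encoded.
    .map(([name, value]) => `${name}=${encodeURIComponent(JSON.stringify(value))}`)
    .join('&');
  return query ? `${base}?${query}` : base;
}

console.log(viewUrl({ startkey: 'book', endkey: 'book' }));
// http://127.0.0.1:5984/test/_design/products/_view/all?startkey=%22book%22&endkey=%22book%22
```

The %22 sequences are the encoded double quotes; a browser does the same encoding for you when you type the quotes into the address bar.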

The key emitted by a view doesn’t have to be flat. You can emit a list of values, for example:

emit([doc.type,doc.category], doc);

Getting all books for this configuration would look like this:

startkey=["book"]&endkey=["book",{}]
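Why {} as the upper bound? CouchDB sorts keys by type before value (null, false, true, numbers, strings, arrays, objects), and arrays are compared element by element, so an object in the second position sorts after any string. The sketch below is a simplified model of that ordering, not CouchDB’s own code: real CouchDB compares strings with ICU rules and also orders objects by their contents, while this version uses plain string comparison and treats all objects as equal. The sample keys are hypothetical.

```javascript
// Simplified model of view key ordering. Assumptions: plain string
// comparison instead of ICU collation, and all objects compare as equal.
function typeRank(v) {
  if (v === null) return 0;
  if (v === false) return 1;
  if (v === true) return 2;
  if (typeof v === 'number') return 3;
  if (typeof v === 'string') return 4;
  if (Array.isArray(v)) return 5;
  return 6; // object
}

function collate(a, b) {
  const ra = typeRank(a);
  const rb = typeRank(b);
  if (ra !== rb) return ra - rb;
  if (ra === 3) return a - b;
  if (ra === 4) return a < b ? -1 : a > b ? 1 : 0;
  if (ra === 5) {
    const n = Math.min(a.length, b.length);
    for (let i = 0; i < n; i += 1) {
      const c = collate(a[i], b[i]);
      if (c !== 0) return c;
    }
    return a.length - b.length; // a prefix sorts before the longer array
  }
  return 0;
}

function inRange(key, startkey, endkey) {
  return collate(key, startkey) >= 0 && collate(key, endkey) <= 0;
}

// Hypothetical [type, category] keys.
const keys = [
  ['book', 'psychology'],
  ['book', 'travel'],
  ['dvd', 'comedy'],
  ['audio', 'podcast'],
];

const books = keys.filter((k) => inRange(k, ['book'], ['book', {}]));
console.log(books); // the two 'book' keys
```

So ["book"] sits just before every ["book", category] pair and ["book", {}] just after them, which is what makes the range catch all books whatever their category.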

It’s important to know that CouchDB creates an index for every view. It makes searching faster, but it costs disk space. Another interesting fact is that the index isn’t built until the first query to that view. On some occasions (after inserting hundreds of thousands of documents) that first query might take a few minutes.

Although I covered only the basics of CouchDB, it should be enough for many applications. If you would like to learn more about the database, I recommend CouchDB: The Definitive Guide, which is free to read.
