Scaling CouchDB

My latest book, Scaling CouchDB, is now available in ebook format. This is a short book (about 72 pages) and serves as a practical guide to scaling CouchDB and designing a distributed system to meet your capacity needs. Replication, conflict resolution, load balancing, clustering, distributed load testing, and monitoring are covered. The chapters on load balancing (using Apache) and distributed load testing (using Tsung) are broadly applicable, even if you aren’t a CouchDB user.

If you blog and would like to review Scaling CouchDB or my previous book, Writing and Querying MapReduce Views in CouchDB, then please let me know and I can arrange to have a review copy sent to you.

Update (3/31/2011): At the request of a commenter, here is the table of contents:

  1. Defining Scaling Goals
    1. What is Scalability?
    2. Capacity Planning
    3. The CAP Theorem
      1. Consistency
      2. Availability
      3. Partition Tolerance
  2. Tuning and Designing for Scale
    1. Performance Tips
    2. Document Design
  3. Replication
    1. Filters and Specifying Documents
    2. Conflict Resolution
      1. Picking the Same Revision as CouchDB
      2. Picking a Conflicted Revision
      3. Merging Revisions
  4. Load Balancing
    1. CouchDB Nodes
    2. Replication Setup
    3. Proxy Server Configuration
    4. Testing
  5. Clustering
    1. BigCouch
    2. Lounge
    3. Pillow
  6. Distributed Load Testing
    1. Installing Tsung
    2. Configuring Tsung
    3. Running Tsung
    4. Monitoring
    5. Identifying Bottlenecks
    6. Test Configuration

One Web

Almost two years ago, Luke Wroblewski first described a trend in web development called mobile first. The basic idea was that web applications should be designed for mobile first, rather than for the desktop first. Luke gave some compelling reasons for this, including the explosive growth of mobile adoption, the way mobile forces you to focus on the key areas of your application, and the way mobile extends your capabilities. Today I saw this tweet from Chris Shiflett:

The Web still trumps the Mobile Web. If you’re making a web app, don’t let “mobile first” lead you astray.

Luke’s original premise is not wrong, but I think many web developers have interpreted “mobile first” as “mobile only” (often intentionally, sometimes by accident). As Chris pointed out, it’s easy to let “mobile first” lead you astray. My response to Chris was that the goal should be One Web, even when starting with “mobile first”. “Mobile only” should rarely, if ever, be the goal. The concept of One Web is described in the W3C’s Mobile Web Best Practices 1.0 Basic Guidelines. From the original document:

The recommendations in this document are intended to improve the experience of the Web on mobile devices. While the recommendations are not specifically addressed at the desktop browsing experience, it must be understood that they are made in the context of wishing to work towards “One Web”.

As discussed in the Scope document [Scope], One Web means making, as far as is reasonable, the same information and services available to users irrespective of the device they are using. However, it does not mean that exactly the same information is available in exactly the same representation across all devices. The context of mobile use, device capability variations, bandwidth issues and mobile network capabilities all affect the representation. Furthermore, some services and information are more suitable for and targeted at particular user contexts (see 5.1.1 Thematic Consistency of Resource Identified by a URI).

Let’s not let the web fragment into a “mobile web”, a “desktop web”, and a “whatever comes next” web—there’s no reason for this. The underlying technology is designed to allow for One Web, as Ben Ramsey added:

@BradleyHolt @shiflett This is why we have content types, accept headers, user agent strings, and content negotiation. 🙂

If you’re considering taking a mobile first approach, please consider taking the One Web approach instead. Your main focus can be on mobile to start, but at least deliver something of value that is not dependent on the client being a mobile device. As Ben suggested, use content types, accept headers, user agent strings, and content negotiation to deliver the best experience based on your user’s device or browser.
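Ben’s suggestion can be made concrete with a small sketch of server-side negotiation on the Accept header. The Python below is hypothetical: the function names are mine, not from any framework. It ignores the user agent string and any media type parameters other than q. It follows the precedence rule in RFC 7231 §5.3.2, under which the most specific matching media range decides a type’s quality value. A real application would more likely rely on its framework’s own negotiation support.

```python
def parse_accept(header):
    """Split an Accept header into (media_range, q) pairs."""
    ranges = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_range = fields[0].lower()
        if not media_range:
            continue
        q = 1.0  # q defaults to 1 when absent
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        ranges.append((media_range, q))
    return ranges


def matches(media_range, media_type):
    """True if a range such as */*, text/*, or text/html covers media_type."""
    if media_range == "*/*":
        return True
    r_type, _, r_sub = media_range.partition("/")
    m_type, _, m_sub = media_type.partition("/")
    return r_type == m_type and r_sub in ("*", m_sub)


def specificity(media_range):
    """Rank */* < type/* < type/subtype."""
    if media_range == "*/*":
        return 0
    if media_range.endswith("/*"):
        return 1
    return 2


def negotiate(accept_header, available):
    """Pick the best representation from `available` (listed in server preference order).

    Returns None when nothing is acceptable (an HTTP 406 in practice).
    """
    if not accept_header:
        return available[0]  # no Accept header: the client takes anything
    ranges = parse_accept(accept_header)
    best, best_q = None, 0.0
    for media_type in available:
        candidates = [(specificity(r), q) for r, q in ranges if matches(r, media_type)]
        if not candidates:
            continue
        # The most specific matching range decides this type's q-value.
        _, q = max(candidates)
        if q > best_q:  # ties keep the earlier (server-preferred) type
            best, best_q = media_type, q
    return best
```

For example, negotiate("application/json;q=0.5, text/html;q=0.1", ["text/html", "application/json"]) returns "application/json". A typical browser Accept header gets "text/html" instead. This lets one URI serve both an API client and a desktop or mobile browser.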

Load Balancing with Apache

At last night’s Burlington, Vermont PHP Users Group meeting, I gave a presentation on Load Balancing with Apache.

I’ve posted the example configuration files for reference. They cover basic load balancing, sticky sessions in PHP, creating your own sticky sessions, routing based on HTTP method, and distributed load testing with Tsung.

I cover load balancing and distributed load testing in more detail in my upcoming book, Scaling CouchDB.
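Here is a rough illustration of three of the routing ideas above: basic round-robin balancing, sticky sessions, and routing by HTTP method. It is a toy router in Python, not Apache configuration, and not how mod_proxy_balancer is implemented. Several details are my own assumptions for illustration:

- the class name and route names;
- the convention that a sticky session carries a route name;
- sending every request other than GET or HEAD to a single write node, with replication assumed to copy those writes to the read nodes.

```python
import itertools


class Balancer:
    """Toy request router illustrating three balancing ideas (hypothetical)."""

    READ_METHODS = {"GET", "HEAD"}

    def __init__(self, read_nodes, write_node):
        # read_nodes maps a route name (as a sticky-session cookie might carry it)
        # to a backend base URL; write_node receives every non-read request.
        self.read_nodes = dict(read_nodes)
        self.write_node = write_node
        self._round_robin = itertools.cycle(list(self.read_nodes.values()))

    def pick(self, method, route=None):
        """Return the backend URL that should handle one request."""
        if method.upper() not in self.READ_METHODS:
            # Route by HTTP method: writes go to one node (assumed to replicate out).
            return self.write_node
        if route in self.read_nodes:
            # Sticky session: keep the client on the node it was routed to before.
            return self.read_nodes[route]
        # Basic load balancing: spread the remaining reads round-robin.
        return next(self._round_robin)
```

Suppose two read nodes are configured. Successive GET requests without a route alternate between them. A PUT always lands on the write node, and a GET carrying route "b" always goes to the node registered as "b".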

Voices of the ElePHPant

ElePHPant alliance by DragonBe, on Flickr

Cal Evans, PHP community member extraordinaire, has started a new podcast called Voices of the ElePHPant. The podcast is a series of short “interviews with the people that are making the PHP community special.” So far, Cal has interviewed ten different members of the PHP community: Jeremy Kendall, Anna Filina, Matthew Turland, Paul M. Jones, Kathy Reid, Ian Barber, Pablo Godel, Ivo Jansch, me, and David Coallier.

You can subscribe to the podcast feed with iTunes or subscribe to the feed directly. If you know someone who helps make the PHP community special, please nominate him or her to be featured on Voices of the ElePHPant.

Support Japanese Disaster Relief

Today you can save 50% on the purchase of any ebook or video from O’Reilly Media, and they will donate all revenues, less author royalties, from today’s “Deal of the Day” sales to the Japanese Red Cross Society. Like many other authors, I’ve opted to also donate my royalties from today’s “Deal of the Day” sales. This means that you can purchase an ebook copy of my book, Writing and Querying MapReduce Views in CouchDB, today at 50% off and all revenues and author royalties will go towards helping with the relief efforts in Japan.

Big Data and APIs for PHP Developers

If you’re in Austin, Texas for SXSW Interactive, be sure to check out the Big Data and APIs for PHP Developers Workshop that I’m curating. The Workshop speakers will be Julie Steele, Laura Thomson, Eli White, Dennis Yang, and David Zülke. The session will be on Monday, March 14, 2011 from 11am to 1:30pm at the Sheraton in Capitol E-H. From the description:

Big Data creates problems and opportunities that do not exist when dealing with smaller datasets. You will learn how to scale, utilize, and visualize Big Data as well as create and integrate Big Data related APIs. We will talk about how to scale your data, expose your data through APIs, integrate existing data from the data marketplace, and communicate your data through visualization.

You will find out what techniques and strategies work best when working with Big Data. Many developers have learned how to scale their systems for high levels of concurrency. However, scaling for Big Data has its own unique challenges. Sometimes strategies that would make no sense for smaller systems work great when dealing with larger datasets. This Workshop is geared towards PHP developers, but all are welcome.

Decentralization as a Strategy

The other day Chris Dixon tweeted:

strategy != tactics. Having a website was strategic in 1995, seo was strategic in 2005, mobile is strategic today.

Chris is likely talking about startups here (since startups are the focus of much of his writing). If you’re not sure what the difference is between strategy and tactics, Seth Godin has a good explanation:

Here’s the obligatory January skiing analogy: Carving your turns better is a tactic. Choosing the right ski area in the first place is a strategy. Everyone skis better in Utah, it turns out.

Based on the pattern that Chris outlines, by 2015 mobile may transform from a strategy to a tactic. By 2015, websites, SEO, mobile, and likely even social will all be tactics that startups can employ, but not strategies in themselves. This got me thinking: what will be the major new and successful strategy for startups in 2015?

My bet is that, by 2015, radical decentralization and user ownership of his or her own data will be used by startups to gain a strategic advantage over incumbents. To a certain extent, this has already started. The distributed social networking service DIASPORA* is attempting to gain market share from the likes of Facebook by offering users “choice”, “ownership”, and “simplicity”. I don’t know if DIASPORA* will ultimately succeed (they should focus on creating a better web, not just a better Facebook), but they got a lot of attention and traction with their attempt to create a decentralized social network. This demonstrates at least some interest in their stated ideals of freedom, user ownership of his or her own data, privacy, and decentralization.

By 2015 I think that these concepts will evolve and we will see implementations that are much more interesting and compelling than what we see today. These implementations (in aggregate) will have a broader scope than just social networking. Users of centralized services will be more savvy when it comes to issues such as privacy (no, privacy is not dead). Everyone will have the computing capacity needed to run their own part of a truly decentralized system (arguably this is true today). The technology and tools needed to create radically decentralized systems will be accessible to developers (arguably this is also true today). The combination of user demand for more control, more ubiquitous computing (via the evolution of mobile computing), availability of decentralized software platforms (e.g. CouchDB), and startups looking for a strategic advantage over incumbents will lead these new startups to focus on creating radically decentralized systems.

CouchDB at New York PHP

Last night I gave a presentation on CouchDB at the New York PHP User Group. I talked about the basics of CouchDB, its JSON documents, its RESTful API, writing and querying MapReduce views, using CouchDB from within PHP, and scaling. The talk was broadcast and recorded on Ustream.

A big thanks to New York PHP (especially Hans Zaunere, Daniel Krook, Alan Seiden, and Isaac Foster) for having me as a guest! If you like what you see here, then I hope you’ll consider buying my book, Writing and Querying MapReduce Views in CouchDB, or my upcoming book, Scaling CouchDB.

Update (2/27/2011): I’ve uploaded the slides from my talk to SlideShare and used the recorded audio to create a slidecast.

Writing and Querying MapReduce Views in CouchDB

My first book, Writing and Querying MapReduce Views in CouchDB, has been published by O’Reilly Media. It is a short and concise ebook with step-by-step instructions and lots of sample code. Most examples are shown both in Futon and through CouchDB’s RESTful HTTP API (with cURL).

In my experience, web developers who are new to CouchDB often encounter three main barriers to understanding and using CouchDB. First, its JSON documents. Second, its RESTful HTTP API. Third, its MapReduce views. The first two are fairly straightforward, and many web developers already have experience with JSON and RESTful HTTP APIs. Once you understand the benefits of using JSON as a document format and exposing the database through a RESTful HTTP API, understanding MapReduce views becomes the main barrier.

The goal of this book is to walk readers through both writing MapReduce views and then querying these same views. Both the Map and Reduce steps are explained separately. Several example Map functions are demonstrated and the built-in Reduce functions are covered. There are also discussions about custom Reduce functions and the limitations of MapReduce. There are examples of creating both temporary views and saving views permanently to design documents. Finally, there is a chapter on querying views which talks about range queries, limiting rows, skipping rows, reversing results, exact grouping, group levels, and including the original documents in query results.
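CouchDB views are JavaScript functions run by the server, so the following is only a Python simulation of the semantics described above:

- a map function emits key/value rows, which are kept sorted by key;
- range queries use startkey and endkey, with skip and limit;
- a group_level reduce, modeled on the built-in _sum, aggregates values by key prefix.

The function names and sample documents are hypothetical. Key ordering is simplified to Python’s list comparison rather than CouchDB’s full collation rules.

```python
def map_fn(doc):
    """Rough Python analogue of a JavaScript map function such as:
    function (doc) { if (doc.type === "post") { emit(doc.date, 1); } }
    """
    if doc.get("type") == "post":
        yield doc["date"], 1  # key: [year, month, day], value: 1


def build_view(docs, map_function):
    """Run the map function over every document and sort rows by key (then id)."""
    rows = [{"id": doc["_id"], "key": key, "value": value}
            for doc in docs
            for key, value in map_function(doc)]
    rows.sort(key=lambda row: (row["key"], row["id"]))
    return rows


def query(rows, startkey=None, endkey=None, skip=0, limit=None, group_level=None):
    """Range query over the sorted rows (endkey is inclusive).

    With group_level set, rows are reduced like the built-in _sum:
    values are summed per key prefix of length group_level.
    """
    selected = [row for row in rows
                if (startkey is None or row["key"] >= startkey)
                and (endkey is None or row["key"] <= endkey)]
    if group_level is not None:
        sums = {}  # insertion order follows the sorted rows
        for row in selected:
            prefix = tuple(row["key"][:group_level])
            sums[prefix] = sums.get(prefix, 0) + row["value"]
        selected = [{"key": list(k), "value": v} for k, v in sums.items()]
    end = None if limit is None else skip + limit
    return selected[skip:end]


# Hypothetical sample documents.
docs = [
    {"_id": "a", "type": "post", "date": [2011, 2, 22]},
    {"_id": "b", "type": "post", "date": [2011, 2, 27]},
    {"_id": "c", "type": "post", "date": [2011, 3, 14]},
    {"_id": "d", "type": "comment", "date": [2011, 3, 14]},
]
rows = build_view(docs, map_fn)
```

Here query(rows, group_level=2) returns per-month totals: [2011, 2] with 2 and [2011, 3] with 1. The comment document is skipped because the map function never emits a row for it.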

P.S. If you are a blogger and would like to review this book, then please ping me and let me know.

Speaking at SXSW Interactive 2011

I will be speaking at SXSW Interactive 2011. My talk will be a solo presentation on Zend Framework 2.0 and PHP 5.3 Web Applications. A big thanks to everyone who voted for my talk in the panel picker! I’ll be giving a preview of this talk at DC PHP on February 9th and at New York PHP on February 22nd. If you have specific questions you’d like to see addressed, then please let me know!

Update (2/9/2011): I will no longer be giving this presentation. Instead, SXSW has invited me to curate a Workshop on Big Data and APIs for PHP Developers. While I would have really enjoyed giving my Zend Framework talk, this new Workshop is shaping up to be an excellent session. The Workshop’s speakers will be Laura Thomson, Eli White, Julie Steele, Dennis Yang, and David Zülke. I’ll update my blog with more details later.