Here are the slides from today’s ZendCon UnCon session on Domain-Driven Design:
If you were in this session, please give me feedback on Joind.in.
Here are the slides from today’s ZendCon tutorial on Learning CouchDB:
You can instead download the PDF version, if you’d prefer.
If you were in this session, please give me feedback on Joind.in.
This past Monday evening I was appointed by the City Council of the City of Burlington, Vermont to the Telecommunications Advisory Committee. In this volunteer position, my role is to advise the City Council on matters related to Burlington Telecom, a municipally owned telecommunications services provider.
I have many of my own opinions about Burlington Telecom, its role in our community, and its future. As a member of this committee, I’m more interested in bringing the opinions of others forward. If you’re part of the Burlington community, please share with me your thoughts about Burlington Telecom. What do you want out of Burlington Telecom? What attributes of Burlington Telecom are important to you? Please share these thoughts publicly on Google Moderator. I’ve shared a few of my own thoughts to get things started.
As I mentioned, I have my own opinions about Burlington Telecom. To understand further where I’m coming from, I’ve included here a truncated version of the letter I sent applying to be on the committee.
Dear City Councilors:
Enclosed for your consideration is my application to join the Burlington Telecom Advisory Committee. I was one of the first beta customers when Burlington Telecom launched. I am now both a residential and a business customer. I was enthusiastic about the vision for Burlington Telecom from the moment I first heard about it. As someone who spends a good portion of his personal and professional life online, I understood immediately the cultural and economic benefits that an advanced fiber optic network could bring to our city.
…
Burlington Telecom is the 21st century equivalent of rural electrification, at a local scale. As with electricity in the 1930s, private companies are unwilling to invest in a telecommunications infrastructure that will spur economic growth in less populated areas. Companies like Verizon have gone so far as to divest their networks in non-metropolitan areas so that they don’t need to build out modern broadband infrastructure in these places.
The Burlington Electric Department proudly states, “public power since 1905.” The City of Burlington created the Burlington Electric Department a full 30 years before the forming of the Rural Electrification Administration at the federal level. The City of Burlington seemed to have similar foresight when they created Burlington Telecom. Perhaps Burlington Telecom is an idea that is ahead of its time. Maybe we should wait several decades for a national initiative on telecommunications infrastructure. Alternatively, we can be proud of what has been done with Burlington Telecom. There are serious challenges to maintaining Burlington Telecom as a public resource, but these are challenges worth addressing.
I believe that the people of Burlington can again be proud of Burlington Telecom, just as they are of the Burlington Electric Department. During Tropical Storm Irene, my house lost power for only 30 minutes. This was a point of civic pride for me and provided an exemplar of how our public utilities can be operated. Burlington taxpayers have regularly voted to fund Burlington Electric Department initiatives around green energy, energy efficiency, and general infrastructure improvement. If people can be proud of things as intrinsically boring as utilities and electricity, then we can be proud of a state-of-the-art fiber optic network that provides high-speed Internet, television, and telephone services.
As a member of the Burlington Telecom Advisory Committee, I hope to understand more fully the challenges facing Burlington Telecom and advise the City Council on addressing these challenges while maintaining Burlington Telecom as the vital public resource that it is. I am an advocate for a vision of Burlington Telecom not as a burden, but as a source of great opportunity for our city.
…
Here are the slides from today’s jQuery Conference presentation on CouchApps with CouchDB & jQuery:
If you were in this talk, please give me feedback on SpeakerRate.
Related links:
I’m pleased to announce that I’ll be speaking at CouchConf New York City on October 24, 2011. This event is part of the CouchConf World Tour presented by Couchbase. My talk will be on CouchApps with CouchDB, JavaScript and HTML5. From the talk description:
In this talk we’ll see how to build CouchApps using CouchDB, JavaScript, and HTML5. We’ll look at related tools such as the couchapp command line tool, the Evently jQuery plugin, the CouchDB API jQuery plugin, the CouchApp Loader, Pathbinder, and the Mustache templating framework.
Web application frameworks have varying support for the concepts behind Representational State Transfer (REST). Most web application frameworks, if not all, allow you to create “fully” RESTful web applications. However, there does not seem to be a focus on explicitly applying RESTful principles. So, here are the key concepts that I’d like to see addressed:
… GET, POST, PUT, DELETE, OPTIONS, HEAD) to perform operations on entities/resources.
… Accept and Content-Type.
Benefits:
There has been some discussion recently on the Zend Framework mailing list around release cycles. I proposed a release cycle of six months for major versions (someone else suggested eighteen months, which may be more reasonable for a framework). Rapid releases allow one to accelerate the cycle of building, measuring, and learning. Gathering data from actual usage (measuring) provides an opportunity for learning that can be applied to the next release (building).
Zend Framework 2.0 should be released soon, and it has been four years since the last major release (1.0). This is not to imply that Zend Framework has been stagnant—far from it. There has been a ton of development effort and many improvements to Zend Framework since 1.0. I have a great amount of trust in the team, and I have complete confidence that Zend Framework 2.0 will be an awesome framework. This post is intended to make the case for rapid release cycles for software in general, and is not meant to be a criticism of Zend Framework or the development processes behind it. However, the discussions around Zend Framework’s release cycle are what’s prompting me to make this post.
First, let me describe what I mean by a “rapid release cycle”. In this context, I mean rapid releases of major versions. Put simply, major versions are those that allow backwards compatibility breaks. This is somewhat controversial. I don’t think anyone really has any big concerns with the rapid release of minor (introduction of new features while maintaining backwards compatibility) or maintenance (bug and/or security fix) releases. “Rapid” depends on the context. Both Chrome and Firefox have adopted a six-week release cycle. As I mentioned before, six months could be considered “rapid” for a framework.
For a framework (and maybe for other software), I think the following rules are necessary in order for a rapid release cycle to work:
What are the concerns with a rapid release cycle? I’ll paraphrase, and then address, the major concerns that I’ve heard.
“Rapid releases of major versions are just for psychological effect and have no effect on the delivery pace of new features.” This is both true and false. See my earlier post on iterative vs. incremental development. If development is incremental and driven entirely by a pre-determined roadmap then there is no tangible difference between a “normal” and a rapid release cycle. The development of many consumer software packages is perceived as incremental, in which case major version bumps are mostly psychological. However, if you take an iterative development approach and build learning from end-users into your process, then a rapid release cycle gives you the chance to change course based on outside feedback. Learning opportunities are introduced that you would never have had if your software wasn’t actually used by real people in the real world.
“Rapid release cycles are for consumer software where you don’t have to care for backward compatibility.” This is related to the previous concern. My response is that rapid release cycles are for any product where learning from real-world usage and outside input can be used to improve the product. To quote Steve Blank, “There are no answers inside the building.”
“It forces people to upgrade too often and rewrite their code, or get left behind.” See my earlier note about minimizing backwards compatibility changes in each major release. Additionally, it is much easier to automate upgrades if the backwards compatibility changes are small. There should be little code rewriting for applications built using the framework with each major version upgrade of the framework.
“Having lots of end-of-life (EOL) versions being used could be a security risk.” See my earlier note about providing LTS releases. Each major release should come with a pre-determined EOL date. It is the responsibility of the end-user (in the case of a framework, the developer) to be aware of a release’s EOL date. Using EOLed software is always a security risk.
While not specifically a concern with rapid release cycles, there’s a general mentality that major releases are “our chance to get it right.” Hopefully you’re a better software developer than you were even six months ago. Chances are you know more than you did then, and would approach solving problems differently now. Think six months before that, and six months before that. Now project this into the future. Where will you be in six months? Will you know more than you do now? Will you approach solving problems differently than you do now? If you’re a good software developer, you will never get it “right”—you will always be better six months from now than you are today and know more than you know today. A rapid release cycle allows you to apply new learning, knowledge, and perspective as often as possible. Do your best today, and give yourself opportunities to do your best in the future as well.
Vermont is a beautiful place to visit—especially in the fall! We’re looking for Vermonters and non-Vermonters alike to speak at this year’s Vermont Code Camp. Vermont Code Camp is organized entirely by community volunteers, with the help of our great sponsors (we’re still accepting sponsorships, too). Vermont Code Camp is a polyglot event. We’re looking for sessions on .NET, PHP, Ruby, Python, Java, and more. Abstracts are due this Friday, August 12 and we’re going to try and have the session list available by August 19. Check out the 2010 schedule to get an idea of what we had for talks last year.
Personally, I’d like to encourage submissions on the following topics:
Propose a session or two now—you know you want to!
I’ve found CouchDB to be a great fit for domain-driven design (DDD). Specifically, CouchDB fits very well with the building block patterns and practices found within DDD. Two of these building blocks include Entities and Value Objects. Entities are objects defined by a thread of continuity and identity. A Value Object “is an object that describes some characteristic or attribute but carries no concept of identity.” Value objects should be treated as immutable.
Aggregates are groupings of associated Entities and Value Objects. Within an Aggregate, one member is designated as the Aggregate Root. External references are limited to only the Aggregate Root. Aggregates should follow transaction, distribution, and concurrency boundaries. Guess what else is defined by transaction, distribution, and concurrency boundaries? That’s right, JSON documents in CouchDB.
Let’s take a look at an example Aggregate, one representing a blog entry and related metadata. Note that the following UML diagrams are for classes in PHP, but it should be easy enough to translate these examples to any object-oriented programming language. We’ll start with the Entry Entity, which will serve as our Aggregate Root:
-----------------------------------------
|                 Entry                 |
-----------------------------------------
|+ id : string                          |
|+ rev : string                         |
|+ title : Text                         |
|+ updated : Date                       |
|+ authors : Person[*]                  |
|+ content : Text                       |
-----------------------------------------
|+ __construct(entry : array) : void    |
|+ toArray() : array                    |
-----------------------------------------
The Text Value Object:
----------------------------------------------
|                    Text                    |
----------------------------------------------
|- type : string                             |
|- text : string                             |
----------------------------------------------
|+ __construct(type : string, text : string) |
|+ toArray() : array                         |
----------------------------------------------
The Date Value Object:
--------------------------------------
|                Date                |
--------------------------------------
|- timestamp : integer               |
--------------------------------------
|+ __construct(timestamp : integer)  |
|+ __toString() : string             |
--------------------------------------
The Person Value Object:
-------------------------------------------------------------
|                          Person                           |
-------------------------------------------------------------
|- name : string                                            |
|- uri : string                                             |
|- email : string                                           |
-------------------------------------------------------------
|+ __construct(name : string, uri : string, email : string) |
|+ toArray() : array                                        |
-------------------------------------------------------------
I recommend serializing each Aggregate, starting with the Aggregate Root, into a JSON document. Control access to Aggregate Roots through a Repository. The toArray() methods above return an associative array representation of each object. The Repository can then transform the array into JSON for storage in CouchDB. Let’s take a look at the EntryRepository:
---------------------------------
|        EntryRepository        |
---------------------------------
|                               |
---------------------------------
|+ get(id : string) : Entry     |
|+ post(entry : Entry) : void   |
|+ put(entry : Entry) : void    |
|+ delete(entry : Entry) : void |
---------------------------------
Here’s an example of what the Aggregate’s object graph might look like, serialized as a JSON document:
{
    "_id": "http://bradley-holt.com/?p=1251",
    "title": {
        "type": "text",
        "text": "CouchDB and Domain-Driven Design"
    },
    "updated": "2011-08-02T15:30:00+00:00",
    "authors": [
        {
            "name": "Bradley Holt",
            "uri": "http://bradley-holt.com/",
            "email": "bradley.holt@foundline.com"
        }
    ],
    "content": {
        "type": "html",
        "text": "<p>I've found CouchDB to be a great fit for…</p>"
    }
}
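To make the serialization step concrete, here is a minimal sketch of the Aggregate above. It is written in Python rather than PHP, on the assumption that the diagrams translate directly. Treat it as an illustration, not a reference implementation: `to_array` mirrors `toArray()`, the constructors take named fields instead of a single array, and `_rev` is only emitted once a revision has been assigned (both of those are my assumptions).

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List


@dataclass(frozen=True)  # frozen: Value Objects are treated as immutable
class Text:
    type: str
    text: str

    def to_array(self) -> dict:
        return {"type": self.type, "text": self.text}


@dataclass(frozen=True)
class Date:
    timestamp: int  # seconds since the Unix epoch

    def __str__(self) -> str:
        # ISO 8601 in UTC, the same shape as the "updated" field above
        moment = datetime.fromtimestamp(self.timestamp, tz=timezone.utc)
        return moment.isoformat()


@dataclass(frozen=True)
class Person:
    name: str
    uri: str
    email: str

    def to_array(self) -> dict:
        return {"name": self.name, "uri": self.uri, "email": self.email}


@dataclass
class Entry:  # Entity, and the Aggregate Root
    id: str
    title: Text
    updated: Date
    authors: List[Person]
    content: Text
    rev: str = ""  # assumption: empty until the database assigns a revision

    def to_array(self) -> dict:
        doc = {
            "_id": self.id,
            "title": self.title.to_array(),
            "updated": str(self.updated),
            "authors": [author.to_array() for author in self.authors],
            "content": self.content.to_array(),
        }
        if self.rev:
            doc["_rev"] = self.rev
        return doc


# Build the example Aggregate and serialize it, starting from the root.
updated_at = datetime(2011, 8, 2, 15, 30, tzinfo=timezone.utc)
entry = Entry(
    id="http://bradley-holt.com/?p=1251",
    title=Text("text", "CouchDB and Domain-Driven Design"),
    updated=Date(int(updated_at.timestamp())),
    authors=[Person("Bradley Holt", "http://bradley-holt.com/",
                    "bradley.holt@foundline.com")],
    content=Text("html", "<p>I've found CouchDB to be a great fit for…</p>"),
)
document = json.dumps(entry.to_array(), indent=4, ensure_ascii=False)
print(document)
```

Only the Aggregate Root knows how to assemble the whole document, so a Repository needs nothing more than the result of `to_array()` to produce the JSON it stores.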
You can also provide access to CouchDB views through Repositories. In the above example, this could be through the addition of an index(skip : integer, limit : integer) : Entry[*] method to the EntryRepository (note that this is a naive pagination implementation, especially on large data sets, but that’s beyond the scope of this blog post). For more complex views, you may want to create a separate Repository for each CouchDB view.
There were quite a few NoSQL critics at OSCON this year. I imagine this was true of past years as well, but I don’t know that first hand. I think there are several reasons behind the general disdain for NoSQL databases.
First, NoSQL is a horrible name. It implies that there’s something wrong with SQL and it needs to be replaced with a newer and better technology. If you have structured data that needs to be queried, you should probably use a database that enforces a schema and implements Structured Query Language. I’ve heard people start redefining NoSQL as “not only SQL”. This is a much better definition and doesn’t antagonize those who use existing SQL databases. An SQL database isn’t always the right tool for the job and NoSQL databases give us some other options.
Second, there are way too many different types of databases that are categorized as NoSQL. There are document-oriented databases, key/value stores, graph databases, column-oriented databases, in-memory databases, and other database types. There are also databases that combine two or more of these properties. It’s easy to criticize something that is vague and loosely defined. As the NoSQL space matures, we’ll start to get some more specific definitions, which will be much more helpful.
Third, at least one very popular vendor in the NoSQL space has a history of making irresponsible claims about their database’s capabilities. Antony Falco of Basho (makers of Riak) has a great blog post on the topic: It’s Time to Drop the “F” Bomb – or “Lies, Damn Lies, and NoSQL.” If you care about your data, please read Tony’s blog post. It’s unfortunate that the specious claims of a few end up making everyone in the NoSQL space look bad.
I also want to address some of the specific criticisms that I’ve heard of NoSQL, as they apply (or don’t apply) to CouchDB (I’m not familiar enough with other NoSQL databases to talk about those).
This is absolutely true. If you pick a NoSQL database, you should do your homework and make sure that your database of choice truly respects the fact that writing a reliable database is a very difficult task. Most of the NoSQL databases take the problem very seriously, and try to learn from those that have come before them. But why create a new type of database in the first place? Because an SQL database is not the right solution to every problem. When all you have is a schema, everything looks like a join. The data model in CouchDB (JSON documents) is a great fit for many web applications.
SQL Scales Just Fine
This is also true. If you’re picking a NoSQL database because it “scales”, you’re likely doing it wrong. Scaling is typically more aspiration than reality. There are many other factors to consider and questions to ask when choosing a database technology other than, “does it scale?” If you do actually have to scale, then your database isn’t going to magically do it for you. You can’t abstract scaling problems to your database layer. However, I will say that many NoSQL databases have properties (such as eventual consistency) that will make scaling easier and more intuitive. For example, it’s dead simple to replicate data between CouchDB databases.
CouchDB is ACID compliant. Within a CouchDB server, for a single document update, CouchDB has the properties of atomicity, consistency, isolation, and durability (ACID). No, you can’t have transactions across document boundaries. No, you can’t have transactions across multiple servers (although BigCouch does have quorum reads and writes). Not all NoSQL databases are durable (at least with default settings).
If you want the best possible guarantee of durability, you can change CouchDB’s delayed_commits configuration option from true (the default) to false. Basically, this will cause CouchDB to do an explicit fsync after each operation (which is very expensive and slow). Note that operating systems, virtual machines, and hard drives often lie about fsync, so you really need to research more about how your particular system works if you’re concerned about durability. If you think your write speeds are too good to be true, they probably are.
If you leave delayed commits on, CouchDB has the option of setting a batch=ok parameter when creating or updating a document. This will queue up batches of documents in memory and write them to disk when a predetermined threshold has been reached (or when triggered by the user). In this case, CouchDB will respond with an HTTP response code of 202 Accepted, rather than the normal 201 Created, so that the client is informed about the reduced integrity guarantee.
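As a rough illustration of the batch mode described above, here is a small Python sketch that builds the URL for a batched write and interprets the two status codes. The helper names, the `http://localhost:5984` address, and the `blog` database are assumptions for the example. Nothing is sent over the network.

```python
from urllib.parse import quote, urlencode


def document_url(base_url: str, database: str, doc_id: str,
                 batch: bool = False) -> str:
    """Return the URL to PUT a document to, optionally in batch mode."""
    url = f"{base_url}/{quote(database, safe='')}/{quote(doc_id, safe='')}"
    if batch:
        # batch=ok asks the server to queue the write in memory
        url += "?" + urlencode({"batch": "ok"})
    return url


def describe_write(status: int) -> str:
    """Explain what a write response status tells the client."""
    if status == 201:
        return "201 Created: the normal write path was used"
    if status == 202:
        return "202 Accepted: queued in memory, reduced integrity guarantee"
    return f"{status}: not a successful write response covered here"


print(document_url("http://localhost:5984", "blog", "entry-1", batch=True))
print(describe_write(202))
```

A client that cares about a particular document can simply leave `batch` off for that write and keep the batched path for less important data such as logs.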
At least one NoSQL database requires a consistency check after a crash (guess which one). This can be a very slow process, causing additional downtime. CouchDB’s crash-only design and append-only files mean that there is no need for consistency checks. There’s no shutdown process in CouchDB: shutting it down is the same as killing the process.
CouchDB’s append-only files do come at a cost. That cost is disk space and the need for compaction. If you don’t compact your database, it will eventually fill up your hard drive. There is no automatic compaction in CouchDB. Compaction is triggered manually (it can easily be automated through a cron job) and should be done when the database’s write load is not at full capacity.
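For the cron job mentioned above, the script only has to issue one HTTP request. Here is a minimal Python sketch that builds that request without sending it. The server address and the `blog` database name are assumptions, and on a server with admin accounts configured the request would also need admin credentials, which are left out here.

```python
import urllib.request


def compaction_request(base_url: str, database: str) -> urllib.request.Request:
    """Build (but do not send) the request that starts database compaction."""
    return urllib.request.Request(
        url=f"{base_url}/{database}/_compact",
        data=b"",  # empty body; the endpoint takes no parameters
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = compaction_request("http://localhost:5984", "blog")
print(req.get_method(), req.full_url)
# To actually trigger compaction: urllib.request.urlopen(req)
```

Scheduling a script like this for a quiet period keeps compaction away from the times when the write load is at its peak.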
MapReduce is Limiting and Hard to Understand
It can take some time to get up to speed with MapReduce views in CouchDB. However, it’s not a very difficult concept to understand and most developers are already proficient with JavaScript (the default language for Map and Reduce functions in CouchDB). There’s a lot you can do with MapReduce, but there are some limitations. Views are one dimensional so full text indexing and geospatial data are difficult (if not impossible) to index. However, there are plugins for integrating with Lucene and ElasticSearch. For geospatial data, you can use GeoCouch.
This is a feature, not a bug. CouchDB only lets you query against indexes. This means that queries in CouchDB will be extremely fast, even on huge data sets. Most web applications have predefined usage patterns and don’t need ad hoc queries. If you need ad hoc queries, say for business intelligence reporting, you can replicate your data (using CouchDB’s changes feed) to an SQL database.
If you have a large number of documents in CouchDB, the first build of an index will be very slow. However, each query after that will be very fast. CouchDB’s MapReduce is incremental, meaning new or updated documents can be processed without needing to rebuild the entire index. In most scenarios, this means that there will be a small performance hit to process documents that are new or updated since the last time the view was queried. You can optionally include the stale=ok parameter with your query. This will instruct CouchDB to not bother processing new or updated documents and just give you a stale result set (which will be faster than processing new or updated documents). As of CouchDB 1.1, you can include a stale=update_after parameter with your query. This will return a stale result set, but will trigger an update of the index (if necessary) after your query results are returned, bringing the index up-to-date for future queries by you or other clients.
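The incremental behavior is easier to see in a toy model. The Python sketch below is an in-memory simulation, not CouchDB’s implementation: real map functions are written in JavaScript and the index lives on disk. `ToyView`, its change list of `(sequence, document)` pairs, and the `by_author` map function are all invented for illustration, and deletions are ignored.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Row = Tuple[str, object, str]  # (key, value, document id)


class ToyView:
    """In-memory stand-in for an incrementally updated view index."""

    def __init__(self, map_fn: Callable[[dict], Iterable[Tuple[str, object]]]):
        self.map_fn = map_fn
        self.rows_by_doc: Dict[str, List[Tuple[str, object]]] = {}
        self.indexed_seq = 0     # last change sequence folded into the index
        self.docs_processed = 0  # how many times map_fn has been run

    def _update(self, changes: List[Tuple[int, dict]]) -> None:
        # Only changes newer than the last indexed sequence are mapped.
        for seq, doc in changes:
            if seq <= self.indexed_seq:
                continue
            self.rows_by_doc[doc["_id"]] = list(self.map_fn(doc))
            self.indexed_seq = seq
            self.docs_processed += 1

    def _rows(self) -> List[Row]:
        rows = [(key, value, doc_id)
                for doc_id, emitted in self.rows_by_doc.items()
                for key, value in emitted]
        return sorted(rows, key=lambda row: (row[0], row[2]))

    def query(self, changes: List[Tuple[int, dict]],
              stale: Optional[str] = None) -> List[Row]:
        if stale == "ok":
            return self._rows()    # answer from the index as it stands
        if stale == "update_after":
            rows = self._rows()    # answer first...
            self._update(changes)  # ...then catch up for later queries
            return rows
        self._update(changes)      # default: catch up, then answer
        return self._rows()


def by_author(doc: dict) -> Iterable[Tuple[str, object]]:
    for author in doc.get("authors", []):
        yield author["name"], doc["title"]


changes = [
    (1, {"_id": "a", "title": "First", "authors": [{"name": "Bradley"}]}),
    (2, {"_id": "b", "title": "Second", "authors": [{"name": "Ada"}]}),
]
view = ToyView(by_author)
first = view.query(changes)  # first build: every document is mapped
changes.append((3, {"_id": "c", "title": "Third",
                    "authors": [{"name": "Bradley"}]}))
stale_rows = view.query(changes, stale="ok")  # document "c" is not visible
fresh = view.query(changes)                   # only document "c" is mapped
print(len(first), len(stale_rows), len(fresh), view.docs_processed)
```

In this model the first query pays for the whole data set, and every later query pays only for what changed since the previous one.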
Some say that not having a schema is a problem. Sure—if you have structured data, you probably want to enforce a schema. However, not all applications have highly structured data. Many web applications work with unstructured data. If you’ve encountered any of the following, you may want to consider a schema-free database:
… NULL values because many columns only apply to a subset of your rows.
I’ll add that you can enforce schemas in CouchDB through the use of document update validation functions.
Did I miss anything? What other criticisms exist of NoSQL databases? Please comment and I’ll do my best to address each.