Comments on: Addressing the NoSQL Criticism http://bradley-holt.com/2011/07/addressing-the-nosql-criticism/ Mon, 29 Sep 2014 19:50:55 +0000

By: Bradley Holt http://bradley-holt.com/2011/07/addressing-the-nosql-criticism/comment-page-1/#comment-3554 Mon, 08 Aug 2011 18:06:56 +0000 http://bradley-holt.com/?p=1216#comment-3554 @Roland: I’m glad you found the original post useful! For what it’s worth, I didn’t find your comment long-winded. It challenged me to really think about what ACID means in the context of CouchDB.

You bring up some interesting points. However, I still maintain that even though CouchDB does not support transactions, it is ACID compliant. I agree that this can be confusing. Being ACID compliant is often thought of as being the same as supporting transactions.

On a side note, this got me wondering whether MyISAM is ACID compliant. Everyone (I hope) knows that it doesn’t support transactions, but are individual statements ACID compliant? I’m almost completely sure that they are not. In that case, I think one could say without redundancy that MyISAM is neither ACID compliant nor transactional.

To re-iterate your scenario in CouchDB terminology: Client 1 requests a view that returns data from Documents A, B, and C. I believe the view as it exists at the moment the response begins to be sent will be returned to Client 1. Effectively, this means that all writes that have succeeded, up until that moment, to Documents A, B, and C will be visible to Client 1 as part of the view’s response. Client 2 deletes Document C immediately after CouchDB started building the response for Client 1, but before that response has been sent to Client 1. In this case, I believe that CouchDB will include the data from the previous revision (before the delete) of Document C in the response to Client 1. This is the data that had been indexed, and at the point-in-time that the view was requested it was a valid and consistent view of the database.

In CouchDB, reads don’t block writes, and writes don’t block reads. This is one of the benefits of its multi-version concurrency control (MVCC) and append-only files. To remain ACID compliant, reads only ever see writes that have fully completed. Not sure if this helps, but it’s an interesting conversation regardless!
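The scenario above (Client 1 reading while Client 2 deletes Document C) can be sketched with a toy append-only store. This is only an illustration of the snapshot idea, not CouchDB's actual storage engine; the class `AppendOnlyStore` and its methods are hypothetical names invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class AppendOnlyStore:
    """Toy model: every write appends a new immutable snapshot."""

    # Each snapshot maps document id -> body. Index 0 is the empty database.
    snapshots: list = field(default_factory=lambda: [{}])

    def write(self, doc_id, body):
        new = dict(self.snapshots[-1])  # copy, never mutate an old snapshot
        new[doc_id] = body
        self.snapshots.append(new)

    def delete(self, doc_id):
        new = dict(self.snapshots[-1])
        new.pop(doc_id, None)
        self.snapshots.append(new)

    def begin_read(self):
        """Pin the latest fully written snapshot; later writes cannot alter it."""
        return self.snapshots[-1]


store = AppendOnlyStore()
for doc_id in ("A", "B", "C"):
    store.write(doc_id, {"value": doc_id.lower()})

client1_view = store.begin_read()  # Client 1 starts its read
store.delete("C")                  # Client 2 deletes C afterwards

seen_by_client1 = sorted(client1_view)            # still includes "C"
seen_by_new_reader = sorted(store.begin_read())   # no longer includes "C"
```

Neither client waits on the other in this model: the reader keeps its pinned snapshot and the writer appends a new one.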

By: Roland Bouman http://bradley-holt.com/2011/07/addressing-the-nosql-criticism/comment-page-1/#comment-3550 Sat, 06 Aug 2011 00:33:04 +0000 http://bradley-holt.com/?p=1216#comment-3550 Hi Bradley,

thanks for your kind words – I forgot to thank you for your original article, which I found highly useful (despite my longwinded comment).

I just read your retort, and I’d like to comment on that some more:

“In CouchDB, you are limited to the equivalent of this autocommit mode and implicit transactions. These ‘transactions’ are ACID compliant, just like their counterparts in RDBMSs.”

I understand why you compare CouchDB’s “atomic write/update” to autocommitting ACID transactions. However, I’d argue that they are not the same. I wrote about this in some detail a few years ago here: http://rpbouman.blogspot.com/2007/02/mysql-transactions-and-autocommit.html

The gist of it is that a transaction – autocommitting or not – can (and often will) encompass multiple individual row-level operations. I argue that what I called an “atomic write” in CouchDB should be compared to one such row-level operation in a RDBMS.

Assuming we can agree on that, then in MySQL it is easy to demonstrate how an SQL statement that affects multiple rows in a table backed by the transactional InnoDB engine in autocommit mode differs from an equivalent SQL statement targeted at a table backed by the non-transactional MyISAM engine. In the InnoDB case, if a single row-level operation fails due to a violation of a database constraint, then all row operations performed by the current statement up to that violation will be rolled back. In the MyISAM case, the statement simply aborts without undoing any changes it made prior to the constraint violation.
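A rough way to see the difference described above, using Python's built-in `sqlite3` module as a stand-in (this is SQLite, not MySQL, so it only illustrates the two behaviours and is not a test of InnoDB or MyISAM). The table names are made up. The first table receives one multi-row statement in autocommit mode; the second emulates the non-atomic case by issuing each row as its own autocommitted statement.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE atomic_t (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE partial_t (id INTEGER PRIMARY KEY)")
conn.execute("INSERT INTO atomic_t VALUES (3)")
conn.execute("INSERT INTO partial_t VALUES (3)")

# One statement, three row operations; the third violates the primary key.
# SQLite aborts the statement and backs out the rows it already inserted.
try:
    conn.execute("INSERT INTO atomic_t VALUES (1), (2), (3)")
except sqlite3.IntegrityError:
    pass

# Emulated non-atomic behaviour: each row is its own statement, so rows
# written before the violation stay in the table.
for value in (1, 2, 3):
    try:
        conn.execute("INSERT INTO partial_t VALUES (?)", (value,))
    except sqlite3.IntegrityError:
        break

atomic_rows = [r[0] for r in conn.execute("SELECT id FROM atomic_t ORDER BY id")]
partial_rows = [r[0] for r in conn.execute("SELECT id FROM partial_t ORDER BY id")]
```

After the failure, `atomic_t` holds only the original row, while `partial_t` also keeps the rows inserted before the violation.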

So basically, autocommit mode does not fundamentally change the nature of transactions; it merely demarcates them automatically for each statement. But that is independent of the fact that a statement may perform many individual writes that are either committed or rolled back atomically. As far as I can see, CouchDB has no such functionality. Batch operations in CouchDB are capable of executing multiple write operations, but as far as I understand, if such a batch operation is interrupted, the changes made up until the interrupt will have been applied (just like in the MySQL MyISAM example in my article).

“Consistency: On a single node, any view queries will show a consistent snapshot of the database. In other words, all of the writes that have succeeded will show up in view results.”

This is an interesting point. It depends on the definition of “consistent snapshot”. I just don’t know enough about CouchDB, so I wonder: suppose we have session 1 executing a view that would, in the absence of any other concurrent operations, return documents A, B, and C. Now suppose another session 2 is started immediately after session 1 was started, but before session 1 returns document C. Now let’s assume session 2 deletes document C. Will session 1 return document C? I should probably come up with a few more tests for this before I can fully understand and compare this to ACID transactions.

Concerning isolation: I believe I understand what isolation means in the CouchDB case. My point was mainly that because of the absence of a transaction concept, this kind of isolation is virtually not comparable to the concept of isolation in a transactional system. In the case of atomic writes, there doesn’t seem to be any opportunity to witness whether sessions are isolated from each other. Calling this kind of isolation “ACID-isolation” seems a major source of confusion to me.

Regarding durability: I agree this is a thorny subject.

By: Bradley Holt http://bradley-holt.com/2011/07/addressing-the-nosql-criticism/comment-page-1/#comment-3549 Fri, 05 Aug 2011 16:06:03 +0000 http://bradley-holt.com/?p=1216#comment-3549 @Roland: Thank you for your thorough comment!

Atomicity, Consistency, Isolation, and Durability (ACID) have very specific technical meanings. By definition, all transactions must be ACID compliant, but transactions are not the only way to achieve ACID compliance. The ACID properties are the building blocks from which transactions are defined. To say that “ACID” is a synonym of “transaction” is an oversimplification. From the paper you linked to:

“These four properties, atomicity, consistency, isolation, and durability (ACID), describe the major highlights of the transaction paradigm, which has influenced many aspects of development in database systems.”

One interesting thing to note is that most, if not all, RDBMSs have a concept of implicit transactions. By default, MySQL (and other RDBMSs) run in what’s called “autocommit mode”. If you don’t explicitly begin a transaction, the database does a commit (if no error) or rollback (if there’s an error) after each statement. In an RDBMS, it’s not just multiple statements run together in a transaction that are ACID compliant; single statements in implicit transactions are also ACID compliant. In CouchDB, you are limited to the equivalent of this autocommit mode and implicit transactions. These “transactions” are ACID compliant, just like their counterparts in RDBMSs.

CouchDB implements more than just Atomicity. I think it would be helpful to look at what ACID means, in the context of CouchDB.

Atomicity: Individual document updates succeed or fail (“all or nothing”). This includes the ability to run document update validation and reject the update if it fails validation.

Consistency: On a single node, any view queries will show a consistent snapshot of the database. In other words, all of the writes that have succeeded will show up in view results. Even if you specify in your query that you’re OK with stale results you will still get a consistent, but old, view of the database.

Isolation: CouchDB achieves isolation through its multi-version concurrency control (MVCC). Document updates are isolated to the previous version of the document. For example, I can update revision 3-a734 to revision 4-ca7e (revision numbers contrived and shortened). As you can see, these revision numbers represent the state of the document and reads and writes are isolated based on the document’s state.

Durability: There is no such thing as complete durability. Hard drives crash, data centers explode, entire regions of the country lose power at the same time (so even having multiple data centers may not always help). True durability is a matter of looking at your entire system, not just your database. However, a baseline of durability for databases can be defined as handing off the data to the filesystem and being told that the file has been written to disk. An explicit fsync is not required to meet this definition of durability. Many servers have RAID controllers with battery-backed caches. Even with external power cut, these controllers will still get the data written to disk without an explicit fsync. However, I would argue that databases that cache writes to RAM are not durable, as these systems now rely on external power and on the machine itself not crashing. This is why CouchDB will give a “202 Accepted” rather than a “201 Created” if you specify that batch mode is OK when writing. It’s also worth noting that *any* database that requires an fsync will be slow to write data (assuming the operating system truly performs an fsync when told to). This is a matter of the laws of physics and spinning disks (SSDs are a different story). Again, if your database writes appear too fast to be true, they probably are.
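The atomicity, validation, and revision points in the list above can be sketched with a small in-memory model. This is an assumption-laden toy, not CouchDB code: `DocStore`, `Conflict`, `Forbidden`, and `validate` are hypothetical names, the validation rule is invented, and plain integers stand in for real revision identifiers such as 3-a734.

```python
class Conflict(Exception):
    """The caller's revision is not the document's current revision."""


class Forbidden(Exception):
    """The validation hook rejected the new document."""


def validate(new_doc):
    # Hypothetical rule standing in for a document update validation function.
    if "title" not in new_doc:
        raise Forbidden("title is required")


class DocStore:
    def __init__(self):
        self._docs = {}  # doc id -> (revision number, body)

    def get(self, doc_id):
        return self._docs[doc_id]

    def put(self, doc_id, body, expected_rev=0):
        current_rev = self._docs.get(doc_id, (0, None))[0]
        if expected_rev != current_rev:
            raise Conflict(f"expected rev {expected_rev}, found {current_rev}")
        validate(body)
        # A single assignment: the update is applied entirely or not at all.
        self._docs[doc_id] = (current_rev + 1, dict(body))
        return current_rev + 1


store = DocStore()
rev1 = store.put("post", {"title": "Draft"})
rev2 = store.put("post", {"title": "Final"}, expected_rev=rev1)

outcomes = []
for body, rev in (({"title": "Stale edit"}, rev1), ({"body": "no title"}, rev2)):
    try:
        store.put("post", body, expected_rev=rev)
    except (Conflict, Forbidden) as exc:
        outcomes.append(type(exc).__name__)
```

The stale writer is rejected because it names an old revision, and the invalid document is rejected by the validation hook; in both cases the stored document is left exactly as it was.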

By: Thought this was cool: Addressing the NoSQL Criticism | Lisheng Yu http://bradley-holt.com/2011/07/addressing-the-nosql-criticism/comment-page-1/#comment-3547 Fri, 05 Aug 2011 05:46:47 +0000 http://bradley-holt.com/?p=1216#comment-3547 […] Karwin’s comment is spot […]

By: Roland Bouman http://bradley-holt.com/2011/07/addressing-the-nosql-criticism/comment-page-1/#comment-3545 Thu, 04 Aug 2011 08:09:56 +0000 http://bradley-holt.com/?p=1216#comment-3545 Hi!

no criticism of CouchDB but this here:

“CouchDB is ACID compliant. Within a CouchDB server, for a single document update, CouchDB has the properties of atomicity, consistency, isolation, and durability (ACID). No, you can’t have transactions across document boundaries.”

is a contradiction, or if it’s not a contradiction, a redefinition of terms that is very confusing.

In the context of database operations, ACID refers to the properties of a database *transaction*. The concept of a transaction explicitly includes the possibility of multiple interactions with the database. This has been the case since the term ACID was first coined (http://cc.usst.edu.cn/Download/5b953407-339b-46c3-9909-66dfa9c3d52a.pdf).

What you describe for CouchDB is what is known as an “atomic write” or an “atomic update”. In this terminology, the term atomic has the same meaning as the A in ACID, namely that the operation either completes as a whole or fails as a whole. But in the CouchDB case that operation is not a transaction but an individual write, whereas the term ACID is historically linked to operations that are transactions. Because the term ACID has always been used to describe the properties of *transactions* (and not individual writes), claiming that CouchDB is “ACID compliant” suggests that CouchDB supports ACID-transactions, which it clearly doesn’t.

To be clear: I am not disputing whether or not the individual properties that make up the ACID acronym apply to CouchDB database writes/updates. I am simply observing that these writes/updates are not transactions. Given that the ACID acronym has such a long history of describing the properties of database transactions, a claim that CouchDB is “ACID-Compliant” will lead to at least some people drawing the incorrect conclusion that CouchDB supports ACID transactions.

You can of course say something like “ACID properties are applicable to CouchDB database operations” (which would be correct, although it could still be misunderstood as meaning that CouchDB has ACID transactions). But let’s consider what ACID means in an environment that only supports atomic writes/updates.

By definition, an atomic write is atomic, so that accounts for the A in ACID. So how can we account for the CID part? I know CouchDB supports a validate_doc_update() function so let’s assume that gives you the C. Next up is the I – Isolation.

I am having a very hard time coming up with a non-trivial meaning for the Isolation property if you’re only handling atomic writes/updates. In the case of ACID-transactions, Isolation describes to what extent sessions are aware of the state of transactions in other sessions. If you’re only handling atomic single-operation writes/updates, there is no state, and thus, nothing to isolate.

Finally, you address durability yourself, and you mention that true durability can be satisfied by disabling the delayed_commits setting, in which case the responsibility for durability is left to the fsync implementation of the underlying platform. At the same time, you seem to imply that this is not really recommended, or at least out of the ordinary.
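The trade-off behind this point can be sketched in a few lines of Python. This is not how CouchDB implements its delayed_commits setting; it only shows the general difference between handing data to the operating system and additionally requesting an fsync. The function name `append_record` and the file name are assumptions made up for the example.

```python
import os
import tempfile


def append_record(path, record, durable):
    """Append one record; with durable=True, also ask the OS to sync to disk."""
    with open(path, "ab") as log:
        log.write(record + b"\n")
        log.flush()  # hand the buffered bytes to the operating system
        if durable:
            # Blocks until the OS reports the data as written to the device.
            os.fsync(log.fileno())


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "db.log")
    append_record(path, b"doc-1", durable=False)  # fast, may still sit in OS cache
    append_record(path, b"doc-2", durable=True)   # slower, explicitly synced
    with open(path, "rb") as log:
        records = log.read().splitlines()
```

Both records are readable afterwards either way; the difference only shows up if the machine loses power before the operating system has written the unsynced data out.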

So, my conclusions are that:
1) by default, CouchDB satisfies only the Atomicity in ACID.
2) if the user chooses to implement validate_doc_update(), they can achieve Consistency (whether they actually do depends on how they implement it).
3) Isolation is meaningless since there are no transactions
4) Durability is not enabled by default, and can be achieved only if you can afford it to be slow.

Given all these ifs and buts, I think it would be clearer to simply not call CouchDB “ACID-Compliant”. It would at least avoid considerable confusion for those who work with systems supporting ACID transactions.

By: Bradley Holt http://bradley-holt.com/2011/07/addressing-the-nosql-criticism/comment-page-1/#comment-3535 Mon, 01 Aug 2011 15:29:42 +0000 http://bradley-holt.com/?p=1216#comment-3535 @Bill: Thank you for your comments—glad to have an SQL expert weigh-in on the subject! I’ll do my best to address each of your points…

“First, NoSQL is a marketing term, not a technology term.”

I completely agree. I personally hate the term “NoSQL” and try to avoid it whenever possible and use a more meaningful term instead (e.g. “document-oriented database”). For better or for worse, it is the word that people are using to describe the class of databases that don’t implement SQL.

“Second, non-relational data management does optimize better than relational data management — but only for a subset of usage of the data.”

I’d argue that this “subset” of data usage patterns tends to correlate with the problems encountered by many web applications. For example, content management systems and wikis are a great fit for document-oriented databases. However, I’d like to re-iterate my point that scalability, and to a lesser extent performance, shouldn’t be your only concerns when picking a database technology. Developer productivity, ease of deployment, and fitting your data model to your database are all very important as well. In most situations, I think picking a NoSQL database because it’s “more scalable” or “faster” is a warning sign that one hasn’t really considered all of the options thoroughly, and that one probably doesn’t understand what “scalable” and “fast” really mean in the context of one’s entire system.

“It’s also very easy to get your NoSQL database design wrong if you skip your query analysis step, because one believes the marketing message of “just start putting data in.””

Database vendors that imply you can “just start putting data in” are being quite irresponsible, in my opinion. I certainly don’t advocate skipping the query analysis step, even when using a NoSQL database. In fact, I think it’s even more critical that you think about your schema design when your database doesn’t enforce a schema. Without guard rails, it’s very easy to fly off the cliff. I recommend using the building block patterns outlined in domain-driven design and serializing each aggregate root (and the entire object graph associated with the aggregate root) to a single JSON document. Designing your aggregate roots involves a large amount of analysis.
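As a rough illustration of serializing an aggregate root and its object graph to a single JSON document, here is a minimal Python sketch. The `BlogPost` and `Comment` classes and the `to_document` helper are hypothetical examples, not part of any CouchDB client library; the only CouchDB-specific assumption is that the document's identifier lives in an `_id` field.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Comment:
    author: str
    text: str


@dataclass
class BlogPost:
    """Hypothetical aggregate root: owns its comments outright."""

    post_id: str
    title: str
    comments: list = field(default_factory=list)


def to_document(post):
    """Flatten the aggregate and everything it owns into one JSON document."""
    doc = asdict(post)               # recurses into the nested Comment objects
    doc["_id"] = doc.pop("post_id")  # use the aggregate's identity as the document id
    return json.dumps(doc, sort_keys=True)


post = BlogPost("post-1", "Addressing the NoSQL Criticism")
post.comments.append(Comment("Roland", "Hi!"))
document = to_document(post)
```

Because the comments travel inside the post's document, adding a comment and updating the post is one document write, which is where the analysis of aggregate boundaries pays off.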

“The problem is the hype, the advocacy, and the claims that one can get optimization for free without doing analysis. TANSTAAFL is still true.”

Again, I completely agree. I think the hype has been counterproductive to the NoSQL space. Speaking from personal experience, I can say it’s the hype that kept me from exploring NoSQL databases sooner than I did. If you can look past the hype, I think you’ll see that there are some really useful tools to be found in the NoSQL space.

By: Bradley Holt http://bradley-holt.com/2011/07/addressing-the-nosql-criticism/comment-page-1/#comment-3534 Mon, 01 Aug 2011 15:00:19 +0000 http://bradley-holt.com/?p=1216#comment-3534 @Clive: I think limiting transaction boundaries to one document in CouchDB is a good thing. Having transactions across document boundaries would bring a ton of extra complications along with it that aren’t worth the benefits. If you think you need this feature then ask yourself this: why isn’t all of the related data in one document to begin with?

By: Bill Karwin http://bradley-holt.com/2011/07/addressing-the-nosql-criticism/comment-page-1/#comment-3530 Sun, 31 Jul 2011 07:26:39 +0000 http://bradley-holt.com/?p=1216#comment-3530 First, NoSQL is a marketing term, not a technology term. Once we acknowledge that, all the arguments about “what exactly is NoSQL” are moot.

Second, non-relational data management does optimize better than relational data management — but only for a subset of usage of the data. The process of designing a non-relational database is the same as designing a denormalized relational database, e.g. a Data Warehouse. You need to itemize the queries you run against the non-relational data store, and design the data store with those queries in mind.

It’s not surprising that NoSQL _can_ be faster and more scalable for those queries, if you do this right. So can DW be very scalable, but with similar limitations on the queries that are served by a given DW schema.

It’s also very easy to get your NoSQL database design wrong if you skip your query analysis step, because one believes the marketing message of “just start putting data in.” This explains some of the disappointments over the past couple of years when companies tried to use NoSQL as a drop-in replacement for RDBMS.

One workaround is to store data redundantly, in different document collections optimized to serve different query patterns. This is also similar to DW, materialized views, or other denormalizations.

The problem with NoSQL is not the technology. It’s fine technology when used appropriately, and it fits an important specialty role in data management. The problem is the hype, the advocacy, and the claims that one can get optimization for free without doing analysis. TANSTAAFL is still true.

By: Alex Feinberg http://bradley-holt.com/2011/07/addressing-the-nosql-criticism/comment-page-1/#comment-3529 Sun, 31 Jul 2011 06:38:57 +0000 http://bradley-holt.com/?p=1216#comment-3529 @Bradley

It isn’t just about scaling reads vs. writes: it’s also the issue of maintaining reasonable throughput and latency when the amount of data is several times (usually 3-5) larger than a single machine’s memory. In this case, you need to partition the data (Google for “The Case for Shared Nothing” for a discussion of why you need to do this rather than just use a heavy machine and a SAN).
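One very simple way to partition data across machines is to hash each key and take the result modulo the number of partitions. The sketch below shows only that basic idea; it makes no claim about how any particular system assigns data to nodes, and `partition_for` is a name made up for this example.

```python
import hashlib


def partition_for(key, partitions):
    """Map a document key to a partition number in [0, partitions)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes of the hash as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % partitions


keys = [f"doc-{n}" for n in range(1000)]
placement = {key: partition_for(key, 4) for key in keys}
per_partition = [list(placement.values()).count(p) for p in range(4)]
```

The same key always lands on the same partition, so reads know where to look. A known drawback of plain modulo hashing is that changing the partition count moves most keys.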

Out of the systems you’ve mentioned, I am familiar with BigCouch (I’ve known one of its developers): it is indeed a horizontally scalable partitioned system, so it would fit the bill.

By: clive boulton http://bradley-holt.com/2011/07/addressing-the-nosql-criticism/comment-page-1/#comment-3528 Sun, 31 Jul 2011 06:08:17 +0000 http://bradley-holt.com/?p=1216#comment-3528 @Bradley: I’d like to see support for transaction bundles from the NoSQL movement. Bundles in CouchDB would be great, because using SQL is over-engineering and prohibits horizontal scaling.
