Ian Eure wrote an interesting piece on scalability at Digg:
http://about.digg.com/blog/looking-future-cassandra
A short quotation perfectly summarizes the motivation to move away from existing SQL systems:
The fundamental problem is endemic to the relational database mindset, which places the burden of computation on reads rather than writes. This is completely wrong for large-scale web applications, where response time is critical.
There are two things I like about the above statement:
- The author does not suggest that the problem is inherent, but rather endemic.
- He presents — with crystal clarity — exactly what’s keeping SQL systems out of the running; and it’s not the SQL language. It’s the processing model.
The various NoSQL processing models can be integrated seamlessly into a SQL system. For instance, Truviso (my employer) answers exactly this problem by offering a stream processing model, which computes results as the data arrives. The engine uses the SQL language and is fully integrated with a mature SQL implementation.
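To make the read-versus-write distinction concrete, here is a minimal sketch in Python. It is not Truviso's engine or Digg's code; the class names, the `story_id` parameter, and the vote-counting scenario are all hypothetical, chosen only to illustrate the idea. The first class does its work at read time by scanning stored events, while the second updates a running result as each event arrives, which is the essence of computing on write.

```python
from collections import defaultdict


class ReadTimeCounter:
    """Hypothetical example: store raw events, aggregate on every read."""

    def __init__(self):
        self.events = []

    def record(self, story_id):
        # Writes are cheap: just append the raw event.
        self.events.append(story_id)

    def count(self, story_id):
        # Reads are expensive: cost grows with the number of stored events.
        return sum(1 for event in self.events if event == story_id)


class WriteTimeCounter:
    """Hypothetical example: maintain the aggregate as each event arrives."""

    def __init__(self):
        self.counts = defaultdict(int)

    def record(self, story_id):
        # Writes do a little extra work: update the running total.
        self.counts[story_id] += 1

    def count(self, story_id):
        # Reads are a single lookup, regardless of how many events arrived.
        return self.counts.get(story_id, 0)
```

Both classes answer the same question with the same result; only the moment at which the computation happens differs. That is the sense in which the processing model can change while the question being asked stays the same.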
The author is moving toward NoSQL, which is a kind of “back to basics” database system movement trying to build database systems from the ground up. The big push behind NoSQL is clearly performance; but discarding SQL systems also discards all of the lessons learned over the years for managing the variety of queries that real businesses require.
One of those lessons is the declarative language itself, SQL, which started out as a primitive language but grew much richer over time. NoSQL systems either use a new declarative language that is much less powerful than SQL, or regress all the way to a key/value (or graph) storage system. Poor language support means a poor optimizer. It’s often possible to work around a poor optimizer, but these workarounds quickly turn into herculean engineering efforts as you try to use a dumb engine in a clever way.
The next lesson is that database systems must be suitable for a wide variety of queries. If you are running only one query, and you know what it is in advance, then clearly you can engineer your whole data architecture around that single query. But for most companies, that’s far from reality: they need to add queries on the fly, query historical data, and join new data with historical data. Additionally, they need a language flexible enough that this can be done immediately, rather than kicking off a new engineering project every time they need to add a query.
A unified database management system that integrates NoSQL processing models with a traditional SQL system is the real answer here; and streaming is one way to accomplish that. This integration allows a wide range of data processing strategies to work together (traditional tables offer recovery of streaming data, for instance) rather than forcing you to choose a single processing model.
In other words, the language and logical model should be separate from the processing model. And isn’t that what the relational model is all about?




