
Truviso Blog

NoSQL can be fast, but what if SQL were fast and flexible?

By Jeff Davis
December 13, 2009 @ 11:49 pm

Ian Eure wrote an interesting piece on scalability at Digg:

http://about.digg.com/blog/looking-future-cassandra

A short quotation perfectly summarizes the motivation to move away from existing SQL systems:

“The fundamental problem is endemic to the relational database mindset, which places the burden of computation on reads rather than writes. This is completely wrong for large-scale web applications, where response time is critical.”

There are two things I like about the above statement:

  1. The author does not suggest that the problem is inherent, but rather endemic.
  2. He presents — with crystal clarity — exactly what’s keeping SQL systems out of the running; and it’s not the SQL language. It’s the processing model.

The various NoSQL processing models can be integrated seamlessly into a SQL system. For instance, Truviso (my employer) answers exactly this problem by offering a stream processing model, which computes results as the data arrives. The engine uses the SQL language and is fully integrated with a mature SQL implementation.
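To make the difference between the two processing models concrete, here is a minimal sketch in Python. It is not Truviso's engine or API; the class names `ReadTimeCounter` and `WriteTimeCounter` and the story keys are hypothetical, chosen only to illustrate placing the burden of computation on reads versus on writes.

```python
from collections import defaultdict


class ReadTimeCounter:
    """Hypothetical store-first model: keep raw events, aggregate on every read."""

    def __init__(self):
        self.events = []

    def write(self, key):
        self.events.append(key)  # cheap write, no computation yet

    def read(self, key):
        # Each read rescans the full history, so cost grows with stored data.
        return sum(1 for k in self.events if k == key)


class WriteTimeCounter:
    """Hypothetical streaming model: update the result as each event arrives."""

    def __init__(self):
        self.counts = defaultdict(int)

    def write(self, key):
        self.counts[key] += 1  # a small amount of work per arriving event

    def read(self, key):
        return self.counts.get(key, 0)  # the answer is already computed


if __name__ == "__main__":
    stream = ["story-1", "story-2", "story-1", "story-3", "story-1"]
    lazy, eager = ReadTimeCounter(), WriteTimeCounter()
    for key in stream:
        lazy.write(key)
        eager.write(key)
    # Same logical question, same answer, different processing model.
    print(lazy.read("story-1"), eager.read("story-1"))
```

Both classes answer the same logical question; only the moment at which the work happens differs, which is the point of the quotation above.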

The author is moving toward NoSQL, a kind of “back to basics” movement that tries to build database systems from the ground up. The big push behind NoSQL is clearly performance, but discarding SQL systems also discards all of the lessons learned over the years for managing the variety of queries that real businesses require.

One of those lessons is the declarative language itself, SQL, which started out as a primitive language but grew much richer over time. NoSQL systems either use a new declarative language that is much less powerful than SQL, or regress all the way to a key/value (or graph) storage system. Poor language support means a poor optimizer. It’s often possible to work around a poor optimizer, but these workarounds quickly turn into herculean engineering efforts as you try to use a dumb engine in a clever way.

The next lesson is that database systems must be suitable for a wide variety of queries. If you are running only one query, and you know what it is in advance, then clearly you can engineer your whole data architecture around that single query. But for most companies, that’s far from reality: they need to add queries on the fly, query historical data, and join new data with historical data. Additionally, they need a language flexible enough that this can be done immediately, rather than kicking off a new engineering project every time they need to add a query.

A unified database management system that integrates NoSQL processing models with a traditional SQL system is the real answer here, and streaming is one way to accomplish that. This integration allows a wide range of data processing strategies to work together (traditional tables offer recovery of streaming data, for instance) rather than forcing you to choose a single processing model.

In other words, the language and logical model should be separate from the processing model. And isn’t that what the relational model is all about?




Decouple Data Processing from Data Consumption

By Sailesh Krishnamurthy
December 4, 2009 @ 4:32 pm

The philosophy of the Truviso continuous analytics approach is that data is most efficiently processed while on the move as opposed to while at rest. Traditional store-first, query-later data warehouses are like the Hotel California in the famous Eagles song – easy to get into, hard to get out of – which is more complimentary than what a partner of ours calls them: the “Roach Motel” of enterprises where data goes to die.

What really sets the Truviso approach apart from other related real-time technologies is that the timing of data processing is decoupled from data consumption. In other words, just because the analysis of data occurs in real time does not mean that the results of the analysis must also be consumed immediately. This subtlety was lost on various CEP vendors who focus on the “now” and analyze only current conditions and exceptions, as described perspicaciously by Doug Henschen in an Intelligent Enterprise Q&A. This decoupled approach is also increasingly finding other uses, such as in “assist or suggest” capabilities for Internet search, as discussed in a good GigaOM post.

Truviso’s approach is realized in a Stream-Relational Database Management System (see our CIDR 2009 paper for more details) where the results of continuous analysis of data are stored natively in a high-performance fashion. This lets us blend the real-time-only nature of stream processing with the stability, flexibility and familiarity of OLAP-style analytics in a single architecture. Furthermore, having both real-time and OLAP functionalities tightly integrated in a single system enables our customers to easily marry analyses of both live and historical data using standard SQL queries.
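As a rough illustration of storing continuously computed results in ordinary tables and then combining them with historical data in standard SQL, here is a sketch using Python's built-in `sqlite3` module. SQLite is only a stand-in here, not a stream-relational engine, and the table names (`minute_counts`, `daily_history`), the function names, and the sample data are all hypothetical.

```python
import sqlite3


def open_database():
    """Create an in-memory database with one live table and one historical table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE minute_counts (
            minute TEXT, page TEXT, hits INTEGER,
            PRIMARY KEY (minute, page)
        );
        CREATE TABLE daily_history (day TEXT, page TEXT, hits INTEGER);
        INSERT INTO daily_history VALUES ('2009-12-03', '/home', 100);
        INSERT INTO daily_history VALUES ('2009-12-03', '/blog', 40);
    """)
    return conn


def on_event(conn, timestamp, page):
    """Fold one arriving event into the stored per-minute aggregate."""
    minute = timestamp[:16]  # keep 'YYYY-MM-DD HH:MM'
    conn.execute(
        "INSERT OR IGNORE INTO minute_counts VALUES (?, ?, 0)", (minute, page)
    )
    conn.execute(
        "UPDATE minute_counts SET hits = hits + 1 WHERE minute = ? AND page = ?",
        (minute, page),
    )


def live_vs_history(conn, live_day, history_day):
    """Join the continuously maintained results with historical data in plain SQL."""
    return conn.execute(
        """
        SELECT live.page, live.hits, hist.hits
        FROM (SELECT page, SUM(hits) AS hits
              FROM minute_counts
              WHERE substr(minute, 1, 10) = ?
              GROUP BY page) AS live
        JOIN daily_history AS hist
          ON hist.page = live.page AND hist.day = ?
        ORDER BY live.page
        """,
        (live_day, history_day),
    ).fetchall()


if __name__ == "__main__":
    conn = open_database()
    for ts, page in [
        ("2009-12-04 16:32:05", "/home"),
        ("2009-12-04 16:32:40", "/home"),
        ("2009-12-04 16:33:10", "/blog"),
    ]:
        on_event(conn, ts, page)
    print(live_vs_history(conn, "2009-12-04", "2009-12-03"))
```

The aggregation work happens in `on_event` as data arrives, while the comparison against yesterday's figures is an ordinary SQL join that can be added or changed at any time without touching the ingestion path.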

With this hybrid architecture, Truviso has created a solution for analyzing recent data. In some cases, such as Internet, video or mobile usage dashboards, analytics should be delivered in real time. In contrast, some back office systems may only require updates by the hour or by the day to meet operational needs and service level agreements.

In other words, it’s like the old U.S. Army saying, “Hurry up and wait.” While data processing occurs continuously in real time for maximum efficiency, the analytics are available on demand at whatever intervals operational systems and business users need. This distinction is critical to the success of the Truviso solution: maximize efficiency and scalability through continuous processing, while providing analytics “whenever needed” for both people (internal users, customers and partners) and operational systems.
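The decoupling of processing from consumption can be sketched in a few lines of Python. This is an assumption-laden toy, not a description of how Truviso implements it: the class `ContinuousCounts` and its methods are hypothetical, and it simply counts events per minute as they arrive and lets consumers pull hourly or daily totals whenever they ask.

```python
from collections import Counter
from datetime import datetime


class ContinuousCounts:
    """Hypothetical sketch: process every event on arrival, serve results on demand."""

    def __init__(self):
        self.per_minute = Counter()

    def process(self, ts):
        # Continuous processing: fold each event into its minute bucket right away.
        self.per_minute[ts.replace(second=0, microsecond=0)] += 1

    def by_hour(self):
        # Consumption on an hourly schedule: a cheap rollup of finished work.
        hourly = Counter()
        for minute, n in self.per_minute.items():
            hourly[minute.replace(minute=0)] += n
        return dict(hourly)

    def by_day(self):
        # Consumption on a daily schedule, from the same stored results.
        daily = Counter()
        for minute, n in self.per_minute.items():
            daily[minute.date()] += n
        return dict(daily)


if __name__ == "__main__":
    counts = ContinuousCounts()
    for ts in [
        datetime(2009, 12, 4, 16, 32, 5),
        datetime(2009, 12, 4, 16, 32, 40),
        datetime(2009, 12, 4, 17, 1, 0),
        datetime(2009, 12, 5, 9, 0, 0),
    ]:
        counts.process(ts)
    print(counts.by_hour())
    print(counts.by_day())
```

A dashboard could read `per_minute` directly, while a back office job could call `by_day` once a day; neither changes when or how the events are processed.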

With Truviso, you’re providing analytics in real time to those who want and need them, while integrating seamlessly with existing infrastructures that operate on timed intervals.

In my next post I’ll describe the evolution of Continuous Analytics in historical context. Stay tuned!




© Truviso, Inc. 2009-2010. All Rights Reserved.
Truviso™, Continuous Analytics™, VIA™, TruCQ™, and TruView™ are trademarks of Truviso, Inc.