Truviso Blog | Immediate Insight

Fathoming the Deep Real-Time Web

By Michael Franklin
August 24, 2009 @ 9:19 am

The “Deep Web” is the vast collection of information that is hidden in databases and file systems. The more familiar “surface” Web, which consists largely of HTML and text, contains only a small fraction of the information available on the Internet. The Deep Web contains much of the valuable data on the Web, but is largely invisible to standard web crawling techniques. As a result, search engine companies and researchers are hard at work developing new approaches to get at this important information. (See “Exploring a Deep Web that Google Can’t Grasp”)

It should be no surprise, then, that a similar phenomenon arises in the “Real-Time Web”. All the tweets, blog posts, and comments on the latest viral video that make up the surface real-time web sit atop a much larger, constantly evolving, continuously growing set of structured data streams and conversations that are driven by the activities on the surface. And, as in the regular Web, much of the real value lies under the surface.

Where does this deep real-time information come from?   It is machine-generated.   It is the constant chatter of the complex and dynamic software and hardware stack that underlies all Web 2.0 applications.  Every user action (or sometimes even inaction) generates a cascade of data describing what was shown, clicked, viewed and interacted with as well as log and performance data from all the systems and networks along the way.

Let’s say a user visits a Web site to watch a video. Logs record this page view, each ad that is served and presented, the various requests made to collect the data needed to construct the page, etc. Clicking on an ad generates yet more data. When the video is started, “beacons” in the video return a stream of periodic status updates recording content, user actions (stop/start/fast forward), video quality and bit rate, etc. When the user posts a comment about the video, tweets about it, or forwards a link to his or her friends, still more data is generated. Thus, each surface action results in many data events in the Deep Real-Time Web.

Why is this Deep Real-Time Web data valuable?  Consider an advertiser who purchases a video advertising campaign targeted at certain types of customers.   The success of this campaign depends on many factors, including the accuracy of the targeting,  the other campaigns that might be running concurrently with this one,  and the video quality in terms of both content and delivery.   Quickly and correctly understanding what is going on with the campaign can mean the difference between success and failure.

The Deep Real-Time Web presents new data analytics challenges due to both time constraints and scale. Companies that use Deep Real-Time Web data can easily find themselves dealing with terabytes of data a day, and the analysis of that data must be done quickly enough that the company can react when something is going wrong, or capitalize on new opportunities as they arise.

Truviso is built for this world of continuously streaming real-time data. Early on, with a partner, we developed the first dynamic tag cloud that showed the evolution of hot topics on the blogosphere (“Truviso Shows Off Dynamic Database with Technorati Tag Cloud”). More recently, our customers have been using Truviso to extract value from the less visible but vastly larger streams of the Deep Real-Time Web. Truviso Continuous Analytics enables them to understand the performance of ad campaigns as they evolve, quickly detect unexpected problems, and even take load off data warehouses that are struggling to keep up with the massive data volumes of the Deep Real-Time Web.

And more breakthroughs will come from combining insights gleaned through both the Surface and Deep Real-Time Webs.




Truviso: Breakthrough Solutions for “Big Data” Analytics

By Michael Franklin
July 14, 2009 @ 5:20 pm

Welcome to the first installment of the Truviso blog.    Truviso is in the business of Data Analytics.   It’s no secret that data analytics is a hot area in technology right now.   And it’s no secret why.

Businesses of all sizes are faced with unprecedented amounts of information. As the world becomes increasingly interconnected through networks, every activity of every participant, whether a company, a department, or an individual user, generates streams of data that are crucial to their needs and goals. Understanding and then quickly acting upon this information is the key to competitiveness and success across all industries.

The volumes of data that need to be analyzed are growing at a rate much faster than Moore’s Law and the other technology laws that in the past have allowed analytics systems to keep up. At the same time, as the pace of business and interaction increases, the pressure for more immediate visibility has increased. These dual pressures, the growth of “big data” and the “need for speed”, have caused the traditional approaches to data management to break down. The result has been a tremendous acceleration of activity and innovation in Data Analytics.

Many emerging approaches are based on parallelizing the query engine. Data processing has long been known to be a great use case for parallelism. Teradata has been doing this for decades. More recently, companies have been exploiting commodity hardware, and increasingly commodity software, to lower the cost of parallel database systems. At the same time, open source data parallel infrastructure such as Hadoop has been attracting a lot of interest as well. The problem with relying solely on parallelism is that it is really a “brute force” approach. With data volumes growing faster than hardware is improving, users of these systems are signing up for buying, managing, and powering an ever-growing complex of servers. Furthermore, parallelism doesn’t help with providing answers quickly; in fact, it makes low latency processing harder.

Solving the Data Analytics problem requires more than just using more hardware to run more copies of the same old query engine.   Rather, what is required is a rethinking of how the query engine works in the first place.   This is where Truviso comes in.

Truviso’s Continuous Analytics is a different way of analyzing data, one that is tuned for the challenging analytics workloads of network-centric businesses.   As the name implies, Continuous Analytics is always on.   Rather than saving up data, then storing it to disk, then starting to process it,  Truviso takes advantage of the additive nature of data streams.   Queries run continuously and as new data arrives it is pushed directly through the queries.   Queries incrementally produce new results, which can then be immediately sent to dashboards, message buses or other applications, or can be stored in native tables for later access and reporting.
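To make the idea of an always-on query concrete, here is a minimal sketch in Python. It is only an illustration of incremental processing in general, not Truviso's engine or query language: the `ContinuousCount` class, the 60-second window, and the event names are all assumptions made up for this example. Each arriving event is pushed through the query, which updates one running count and emits a new result right away instead of waiting for a batch to be stored and rescanned.

```python
from collections import defaultdict


class ContinuousCount:
    """Hypothetical continuous query: count events per (time window, key)."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.counts = defaultdict(int)  # (window_start, key) -> running count

    def push(self, timestamp, key):
        """Process one event as it arrives and return the updated result."""
        # Assign the event to the fixed-size window that contains it.
        window_start = timestamp - (timestamp % self.window_seconds)
        # Counts are additive, so one increment is all the work needed.
        self.counts[(window_start, key)] += 1
        return window_start, key, self.counts[(window_start, key)]


# Invented sample stream of (timestamp in seconds, event type).
events = [(0, "ad_view"), (10, "ad_click"), (30, "ad_view"), (65, "ad_view")]

query = ContinuousCount(window_seconds=60)
results = [query.push(ts, key) for ts, key in events]

for row in results:
    print(row)  # each row could go to a dashboard, a message bus, or a table
```

The work per event is constant regardless of how much data has already been seen, which is the property that the additive nature of data streams makes possible.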

Continuous Analytics involves innovative stream processing technology, but is much more than that. Truviso has a unique “stream-relational” architecture in which streams and persistent tables are unified to provide a seamless mechanism for querying the present as well as the past. As a result, Truviso can solve the scalability problem for traditional reporting applications while also enabling those applications to evolve into a more real-time mode of operation as business demands require and as business processes evolve.
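As a rough illustration of what querying the present and the past together can look like, consider the following Python sketch. It is a toy model under stated assumptions, not a description of Truviso's architecture: the `StreamRelationalStore` class and its methods are hypothetical, the "table" is just an in-memory dictionary, and events are assumed to arrive in window order.

```python
class StreamRelationalStore:
    """Toy model: a live stream window plus an archive of closed windows."""

    def __init__(self):
        self.archive = {}        # window -> {key: count}; stands in for a table
        self.live = {}           # counts for the window still receiving events
        self.live_window = None

    def push(self, window, key):
        """Add one event; assumes windows arrive in non-decreasing order."""
        if self.live_window is not None and window != self.live_window:
            # The previous window is complete, so move it to the archive.
            self.archive[self.live_window] = self.live
            self.live = {}
        self.live_window = window
        self.live[key] = self.live.get(key, 0) + 1

    def total(self, key):
        """One query that spans archived (past) and live (present) data."""
        past = sum(counts.get(key, 0) for counts in self.archive.values())
        present = self.live.get(key, 0)
        return past + present


store = StreamRelationalStore()
for window, key in [(0, "a"), (0, "a"), (60, "a"), (60, "b")]:
    store.push(window, key)

print(store.total("a"))  # combines the archived window 0 with the live window 60
```

The point of the sketch is that the caller asks a single question and does not need to know which part of the answer came from stored history and which from data still in flight.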

That’s a quick overview of the underlying approach we are taking at Truviso.

Through this blog we will be describing our technology and its uses in more detail, as well as looking at the evolving landscape of data analytics in general. It is exciting to be in such a fast-moving and crucial area, and we look forward to sharing the journey with you.




Welcome to Truviso

By admin
July 13, 2009 @ 4:17 pm

You’ve made it to the Truviso blog – thanks for visiting.

Since we’re leading the way with revolutionary and game-changing data processing and analytics technologies, we’ll be using this blog to discuss some of the things we’re working on.  These include the challenges our customers have solved with our software, upcoming events, other articles and blog posts, and business intelligence, data warehouse, stream-based processing and web analytics trends.

If you’re “real-time” you can also follow us on Twitter:  http://twitter.com/truviso

Thanks for commenting and joining in the discussion!

The Truviso Team


© Truviso, Inc. 2009-2010. All Rights Reserved.
Truviso™, Continuous Analytics™, VIA™, TruCQ™, and TruView™ are trademarks of Truviso, Inc.