Menu:

Cayuga: Stateful Publish/Subscribe for Event Monitoring

People    Publications    Source Code    Related Projects    Funding

Publish/Subscribe is a popular paradigm for users to express interests ("subscriptions") in events ("publications"). It allows efficient asynchronous interaction among distributed applications. For many years it has been an active field of research, with topics spanning active databases, event systems, high performance implementations of pub/sub, and distributed pub/sub. Today, a publish/subscribe system is part of a typical message-oriented middleware, and major vendors sell products with the functionality of message brokers.

Traditional publish/subscribe (pub/sub) systems such as topic-based and content-based pub/sub systems have a major limitation: They only allow users to express stateless subscriptions that are evaluated over individual events that arrive at the system. However, many applications require the ability to handle stateful subscriptions that involve more than a single event, and users need to be notified with customized witness events as soon as one of their stateful subscriptions is satisfied.

Application Scenario 1: Monitoring The Blogosphere and Social Networks

Social networks with blogs, discussion forums and multimedia content have become increasingly important for online exchange of news and opinions, as well as a means for social interaction across geographic boundaries. As the amount of data overwhelms individual users, the only massively used medium to reduce the overhead of staying updated is web feeds in diverse XML formats including RSS, RDF, and Atom. Current web feed aggregators allow users to specify simple filters on the title and/or category of published feed items. In Cayuga, we can pose the following more interesting subscriptions:

  1. Notify me when I write a comment on someone's scrapbook or blog and someone writes another comment that references my comment within a given time span.
  2. Notify me when a camera with some feature (like 12X zoom) is released by Canon and there are at least 10 positive reviews on this product within a given time span.
  3. Notify me when there is a news article on CNN about US troop morale with positive sentiment and this is followed by an article on BBC about US troop morale with negative sentiment.
  4. Notify me when a friend of mine on a particular social network, adds a new friend who has a profile very similar to mine (May be my friend)
  5. Notify me when a new topic starts in a discussion forum (social network community) and there have been postings from at least 20 distinct people within a given time span.

While simple to understand for humans, Cayuga requires users to further specify their interests in the structured Cayuga Event Language (CEL). The following table contains the CEL equivalents of the examples listed above. For detailed explanation of the CEL syntax we refer to our CIDR '07 paper.  

Subscription

Description

1

SELECT *

FROM FILTER {DUR <= w}

(S FOLD {CONTAINS(Content,$1.URL), ,} S)  

2

SELECT *

FROM FILTER {DUR <= w and count >= 10}

(FILTER{Company = ‘Canon' and Category = ‘Camera' and CONTAINS(Description, '12X zoom')}

(SELECT *, 1 as count FROM Products) FOLD {CONTAINS(Content, $1.URL) and IsPositive(Content), , $.count + 1 as count} S) 

3

SELECT *

FROM FILTER{Author = CNN and CONTAINS(Content, 'US troop morale') and IsPositive(Content)}(S) NEXT{,} FILTER{Author = ‘BBC' and CONTAINS(Content, 'US troop morale') and IsNegative(Content)}(S) 

4

SELECT *

FROM FILTER{IsMyFriend(userID) and Type = add and IsSimilar(ObjectProfile, p) } F  

5

SELECT *

FROM FILTER {DUR <= w and Size(AuthorSet) >= 20} (

(SELECT *, 1 as cnt, Set(Author) as AuthorSet FROM S)

FOLD{CONTAINS(Content, $1.URL), , $1.cnt+1 as cnt, Union($1.AuthorSet, Author) as AuthorSet} S)

 

Application Scenario 2: Technical Analysis for Stock Investors

Technical analysis, also known as charting, is the study of the trading history (the price and volume over time) of any type of traded market (stocks, commodities, etc.) to attempt to predict future prices. Technical analysts look for many different patterns in price movements. The interpretations of such patterns are used to support trading decisions.

Investors that are able to see trends before other market players will be able to make early moves and thus increase profit margins. Cayuga can be used to monitor in real time many different complex patterns across a large number of different securities.

 

Figure 1: Double Top Formation Marked In Red

 

Subscription 1: Double Top (Figure 1)

“Notify me whenever there is double top formation in the price chart of any stock.” A well-known pattern amongst analysts, the double top (or M-shaped) pattern can be expressed in CEL as follows:

SELECT Name, PriceA, PriceB, PriceC, PriceD, Price_1 AS PriceE, Price AS PriceF
FROM FILTER {Price >= Price_1 AND Price <= PriceA}
(FILTER{Price <= 1.1*PriceB} (
SELECT Name, PriceA, PriceB, PriceC, Price_1 AS PriceD, Price
FROM
FILTER{Price >= 0.9*PriceB} (
SELECT Name, PriceA, PriceB, Price_1 AS PriceC, Price
FROM
FILTER{Price >= 0.9*PriceA AND Price <= 1.1*PriceA} (
SELECT Name, PriceA, Price_1 AS PriceB, Price
FROM
FILTER{Price >= 1.2*PriceA} (
SELECT Name, Price_1 AS PriceA, Price
FROM
FILTER {Price < Price_1}
(SELECT Name, Price FROM Stock NEXT {$1.Name=$2.Name} Stock)
FOLD {$1.Name = $2.Name, $2.Price >= $.Price,} Stock)
FOLD {$1.Name = $2.Name, $2.Price <= $.Price,} Stock)
FOLD {$1.Name = $2.Name, $2.Price >= $.Price,} Stock)
FOLD {$1.Name = $2.Name, $2.Price <= $.Price,} Stock)
NEXT {$1.Name = $2.Name}
Stock)
PUBLISH MShapeStock 

Subscription 2: Flag

“Notify me whenever the price of a stock has been increasing for the past 10 minutes followed by a 3 minute duration when the price is constant.”

SELECT *

FROM FILTER{DUR >= 3min} (

FILTER{DUR >= 10 min} (S FOLD {Name=$1.Name, Price > $.Price, } S)

FOLD {Name=$1.Name, Price = $1.Price, } S)

   

Subscription 3: Divergence of Moving Average and Price Lines

“Notify me when the difference between the current price of a stock and its 10 day moving average is greater than some threshold value.”

In this example, we will also demonstrate how we apply divide and conquer to more easily express queries. A feature called resubscription is used to allow a query to subscribe to the output of another. Key to achieve this disconnection is that our operators are stream-to-stream, implying that the input and output of every operator must be a stream. The PUBLISH clause then specifies an identifier of the output stream which other queries can specify as their input stream. Queries can also be nested, as this query exemplifies in declaring the output from an inner SELECT clause as the input for the FILTER operator in line two.

SELECT Name, sum / count AS Price

FROM FILTER{DUR = 10day} (

SELECT *, Price as sum, 1 as count FROM S

FOLD {Name=$1.Name, , $.count + 1 as count, $.sum + Price as sum} S)

PUBLISH MovingAverage

 

SELECT *

FROM MovingAverage NEXT {Name = $1.Name, Price - $1.Price > threshold} S

 

Subscription 4: Gap

“Notify me when the lowest price in a (t, t+5 minute) window is greater than the highest price in a (t',t'+5 minute) window, where t'>t and t'-t is less than some constant value.”

Let t' - t <= threshold

SELECT Name, MinPrice

FROM FILTER{DUR = 5min} (

SELECT *, Price as MinPrice FROM S

FOLD {Name=$1.Name, , min($.MinPrice, Price) as MinPrice} S)

PUBLISH T_WINDOW

 

SELECT Name, MaxPrice

FROM FILTER{DUR = 5min} (

SELECT *, Price as MaxPrice FROM S

FOLD {Name=$1.Name, , max($.MaxPrice, Price) as MaxPrice} S)

PUBLISH T_PRIME_WINDOW

 

SELECT *

FROM FILTER {DUR <= threshold} (

T_WINDOW NEXT {Name = $1.Name, MaxPrice < $1.MinPrice} T_PRIME_WINDOW)

 

Subscription 5: Pair Trades

“Notify me when some stock increases more than a set threshold within some constant time t.”

SELECT *
FROM FILTER {DUR < t} (S NEXT {Name = $1.Name, Price - $1.Price > threshold} S)

Other applications that motivate stateful subscriptions include transactions in retail chains, ATM and credit card operations in banks, online analysis of OS and Web server log records, network monitoring, and event streams from RFIDs.

 

Cayuga Non-Finite Automatons

Cayuga uses non-finite automatons to match event patterns to queries.

Consider a stock ticker stream with schema (Name, Price, Volume). The query monitors this stream to find the following pattern for any listed company. There is a large trade for a company (volume > 10,000), followed by a monotonic decrease of the stock price of this company for at least 10 minutes. After the decrease, the stock suddenly rebounds (increases at least 5% in value).

The query can be read and understood like a standard SQL query, but it has a few new constructs.

The innermost expression is a filter to find the initial large trade. Once such a trade is found, a new automaton instance is spawned. It holds the data of the stock ticker event and makes a transition from state Q0 to A.

The highlighted construct matches the monotonically decreasing stock price. This is done by the FOLD construct. It repeatedly finds the next ticker event for the same company ($2.Name=$.Name predicate) and it keeps matching as long as the price is falling ($2.Price < $.Price). If the price increases, the automaton instance is deleted.

The corresponding automaton state has two self-loops. The top loop, called a filter edge, corresponds to the first argument of FOLD, i.e., the predicate on NAME in the example. It essentially enables the automaton instance to skip over all stock ticker events for other companies. The bottom loop, called a rebind edge, implements the iteration for the second argument of FOLD. In the example, it repeatedly matches on decreasing price for the same company, each time updating the current lowest price stored in the automaton instance.

The output of this expression, i.e., the instances that leave state A, has company name, highest and lowest price in the schema.

The FILTER condition outside the FOLD expression ensures that the monotonic price decrease has to last for at least 10 minutes. Only those automaton instances are allowed to make the transition from state A to B. Notice that for an monotonic decline of more than 10 minutes duration, each time a new automaton instance is spawned to advance to state B, while the original instance at state A keeps matching until the monotonic trend ends.

Finally, the NEXT expression finds the immediately next stock ticker event for the company. The filter edge in state B enables the automaton instance to make a transition when a ticker event for a different company is processed, i.e., it can skip over such events until the next matching company event arrives. The last transition from state B to C can only be made if the price increased by at least 5% in this matched event.

If an automaton instance reaches the final state, it contains the company name, the price of the initial large trade (MaxPrice), the lowest price after the monotonic decline (MinPrice), and the final price after the rebound (FinalPrice).

Cayuga: The Tradeoffs.

Traditional pub/sub systems scale to millions of registered subscriptions and very high event rates, but have limited expressive power. In our stateful pub/sub system, however, subscriptions can span multiple events, involving parameterization and aggregation, while maintaining scalability in the number of subscriptions and event rate. In comparison, full-fledged Data Stream Management Systems (DSMS) have powerful query languages that allow them to express much more powerful subscriptions than stateful pub/sub systems; however, this limits their scalability with the number of subscriptions, and existing DSMSs only do limited multi-query optimization. The table below illustrates these tradeoffs.

 


Number of concurrent subscriptions
 


few
 

many

Complexity of Subscriptions

low

boring :-)

publish/subscribe

high

DSMS

Cayuga

Cayuga is designed to allow users to express subscriptions that span multiple events, and it supports powerful language features such as parameterization and aggregation, which significantly extend the expressive power of standard pub/sub systems. Based on a set of formally defined language operators, the subscription language of Cayuga provides non-ambiguous subscription semantics as well as unique opportunities for optimizations. Subscriptions written in Cayuga algebra will be translated into Cayuga automata, extended from standard non-deterministic finite automata. The formal properties of this language, as well as the exact expressiveness of the language are the subject of another research project on the expressiveness of stream processing languages.

The Cayuga system architecture is designed to efficiently support a large number of concurrent subscriptions. Its core components include a query processing engine, an index component, a meta data manager, and a memory manager. The query processing engine extends the functionality of traditional publish/subscribe to support stateful subscriptions. We have developed novel multi-query optimization (MQO) techniques, essential for large-scale, high-speed pub/sub systems, to speed up event processing. These techniques are performed when Cayuga automata are loaded into the query processing engine. The meta data manager and memory manager have their counterparts in traditional database systems, but in Cayuga they are tuned for efficient in-memory processing of subscriptions.

Cayuga is also the subscription matching engine of the Cornell Knowledge Broker.

People

Publications

Funding

This research has been supported by the National Science Foundation under Grants IIS-0621438 and IIS-0121175, and by the Air Force Office of Scientific Research. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the sponsors.