Where?
Find the code here: http://github.com/bwlewis/esperr.
Source package: http://illposed.net/esperr_0.1.0.tar.gz
Find R here: http://www.r-project.org.
Find Esper here: http://esper.codehaus.org/.
What?
esperr: Streaming event processing for R
The esperr package for R incorporates the Esper framework and implements an
R-language interface to its XML and Java bean event API. Events are defined by
Java objects or by XML documents that follow an XML schema definition (XSD)
document. The package includes example schemas and events.
An event is an immutable, structured data object associated with a time in
the past.
Esper is a set of open-source libraries for performing computations over
multiple streams of events.
Paul Fremantle of WSO2 defines the following simple taxonomy
of event-processing terms:
- Simple event processing
Filters that work on single events.
- Event stream processing
Filters and other computations that work across multiple events.
- Complex event processing
Filters and other computations that work across multiple event streams.
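The three levels above can be illustrated with a small, self-contained Java
sketch. It uses only plain Java collections, not Esper or esperr, and every
name in it (TaxonomySketch, Tick, isLargePrice, meanOfLast, latestSpread) is
hypothetical and chosen for illustration. The Tick class also shows the sense
in which an event is an immutable, structured object tied to a time.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the three taxonomy levels using plain Java
// collections. None of these names come from Esper or esperr.
class TaxonomySketch {

    // A minimal immutable event: a symbol, a price, and a timestamp.
    static final class Tick {
        final String symbol;
        final double price;
        final long timeMillis;

        Tick(String symbol, double price, long timeMillis) {
            this.symbol = symbol;
            this.price = price;
            this.timeMillis = timeMillis;
        }
    }

    // Simple event processing: a filter that looks at one event only.
    static boolean isLargePrice(Tick t, double threshold) {
        return t.price > threshold;
    }

    // Event stream processing: a computation across several events of one
    // stream, here the mean price of the last n events (stream must be
    // non-empty and n must be positive).
    static double meanOfLast(List<Tick> stream, int n) {
        int from = Math.max(0, stream.size() - n);
        double sum = 0.0;
        for (int i = from; i < stream.size(); i++) {
            sum += stream.get(i).price;
        }
        return sum / (stream.size() - from);
    }

    // Complex event processing: a computation across several streams, here
    // the difference between the latest prices of two non-empty streams.
    static double latestSpread(List<Tick> a, List<Tick> b) {
        return a.get(a.size() - 1).price - b.get(b.size() - 1).price;
    }

    public static void main(String[] args) {
        List<Tick> x = Arrays.asList(
            new Tick("X", 10.0, 1L), new Tick("X", 12.0, 2L), new Tick("X", 14.0, 3L));
        List<Tick> y = Arrays.asList(
            new Tick("Y", 9.0, 1L), new Tick("Y", 11.0, 3L));
        System.out.println(isLargePrice(x.get(2), 13.0)); // true
        System.out.println(meanOfLast(x, 2));             // 13.0
        System.out.println(latestSpread(x, y));           // 3.0
    }
}
```

In practice Esper expresses such computations as queries over event streams
rather than as hand-written loops; the sketch is only meant to make the
distinction between the three levels concrete.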
Here are some slides from a talk given at the R in Finance 2010 conference:
LewisKaneRInFinance.pdf.
Note that this package is quite new and still in active development! Please,
please, feel free to contribute.
Why?
- R is not particularly well-suited to handling streaming data; Esper is.
- Esper doesn't provide sophisticated analysis tools; R does.
Event types and performance
Events may be described programmatically in many ways. The esperr
package exposes two event descriptions directly to R: XML and plain-old Java
objects (POJOs). Each event description has advantages and disadvantages.
The chief difference is performance. In our talk we reported a throughput
of about 4,000 events/second using XML-described events. POJO events in the
same VWAP example on the same hardware yielded about 250,000 events/second,
roughly 60 times as many.
XML events are defined by a text XML schema document. Their chief
advantage is their flexibility. They require only a text editor to define,
and are human-readable. No extra software is required to create and use
XML events. Their chief drawback is that the XML must be parsed, incurring
extra processing overhead. The esperr package presently uses a very simple
XML event representation implemented with the "document object model" (DOM).
A much higher-performance framework is available in the Apache Axiom library.
Axiom is geared to processing streaming XML events efficiently and can also
handle raw binary data, among other advantages. We plan to include Axiom
support in a future revision of the esperr package.
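To make the DOM parsing overhead concrete, here is a minimal sketch of reading
one XML event with the DOM parser that ships with Java. It is not the esperr
implementation. The event layout (a trade element with a symbol attribute and
price and volume children) and the class and method names are assumptions
made up for this example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical sketch: parse one XML event into a DOM tree and read fields.
class XmlEventSketch {

    // Assumed event layout, invented for illustration only.
    static final String SAMPLE =
        "<trade symbol=\"IBM\"><price>101.5</price><volume>200</volume></trade>";

    // Build a complete DOM tree for the event. This whole-document parse is
    // the per-event cost that POJO events avoid.
    static Document parse(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not a well-formed XML event", e);
        }
    }

    // Read the symbol attribute from the root element.
    static String symbol(Document event) {
        return event.getDocumentElement().getAttribute("symbol");
    }

    // Read the text of the first price element as a number.
    static double price(Document event) {
        return Double.parseDouble(
            event.getElementsByTagName("price").item(0).getTextContent());
    }

    public static void main(String[] args) {
        Document event = parse(SAMPLE);
        System.out.println(symbol(event) + " " + price(event)); // IBM 101.5
    }
}
```

Every event goes through a full parse before any field can be read, which is
one plausible reason for the throughput gap reported above.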
POJO events are simply Java objects that conform to the JavaBeans
conventions. They require a Java compiler to create, although once created
the esperr package can process them without a compiler. Their
main advantage is performance. The Esper library is particularly well-suited
to processing POJOs. You will see up to several orders of magnitude greater
performance with POJO events and esperr than with XML events.
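As a rough illustration of what such an event class might look like, the
sketch below defines an immutable trade event whose properties are exposed
through getters, as the JavaBeans conventions prescribe, and computes a
volume-weighted average price (VWAP) over a list of trades. The class, field,
and method names are hypothetical and are not taken from the esperr examples.
The VWAP is computed here in plain Java rather than by an Esper query.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of a POJO event and a VWAP computation over it.
class TradeBeanSketch {

    // An immutable event object with JavaBeans-style getters.
    public static final class Trade {
        private final String symbol;
        private final double price;
        private final long volume;

        public Trade(String symbol, double price, long volume) {
            this.symbol = symbol;
            this.price = price;
            this.volume = volume;
        }

        public String getSymbol() { return symbol; }
        public double getPrice() { return price; }
        public long getVolume() { return volume; }
    }

    // VWAP = sum(price * volume) / sum(volume); total volume must be nonzero.
    static double vwap(List<Trade> trades) {
        double notional = 0.0;
        long totalVolume = 0L;
        for (Trade t : trades) {
            notional += t.getPrice() * t.getVolume();
            totalVolume += t.getVolume();
        }
        return notional / totalVolume;
    }

    public static void main(String[] args) {
        List<Trade> trades = Arrays.asList(
            new Trade("IBM", 10.0, 100L),
            new Trade("IBM", 20.0, 300L));
        System.out.println(vwap(trades)); // 17.5
    }
}
```

Because the fields are already typed Java values, no parsing is needed to
read them, in contrast to the XML representation.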
How to choose?
XML events provide a very quick and dynamic way to create and experiment
with event structure. POJO events require a bit more work and a Java compiler.
One approach is to prototype with XML and implement with Java in
production.
So, is that it?
Not quite. The esperr package also includes a basic prototype interface to
the Redis database that makes it easy to send output events for offline or
distributed processing. Our intuition is that the combination of Esper stream
processing with a Redis output event cache is a powerful one, but we are
just beginning to explore this combination.
Right now at least, the package includes all the elements one needs to
perform distributed parallel event processing with R.