3db5e67e180fb8c992e513c9c9142e62.ppt
- Number of slides: 34
Scalable Realtime Analytics with Declarative, SQL-like Complex Event Processing Scripts
Srinath Perera, Director, Research, WSO2; Apache Member (@srinath_perera), srinath@wso2.com
(Batch) Analytics
§ Scientists have been doing this for 25 years, with MPI (1991) on special hardware.
§ It took off with Google's MapReduce paper (2004); Apache Hadoop, Hive, and a whole ecosystem followed.
§ It was successful, so we are here!
§ But processing takes time.
The Value of Some Insights Degrades Fast!
§ For some use cases (e.g. stock markets, traffic, surveillance, patient monitoring), the value of insights degrades very quickly with time.
  - E.g. stock markets and the speed of light
§ We need technology that can produce outputs fast:
  - Static queries that need very fast output (alerts, realtime control)
  - Dynamic and interactive queries (data exploration)
History
§ Realtime analytics is not new either!
  - Active databases (2000+)
  - Stream processing (Aurora, Borealis (2005+), and later Storm)
  - Distributed streaming operators (e.g. a database research topic around 2005)
  - CEP vendor roadmap (from http://www.complexevents.com/2014/12/03/ceptooling-market-survey-2014/)
Realtime Analytics Tools
I. Stream Processing
§ Program a set of processors and wire them up; data flows through the graph.
§ A middleware framework handles data flow, distribution, and fault tolerance (e.g. Apache Storm, Samza).
§ Processors may run on the same machine or on multiple machines.
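The idea of wiring processors into a graph can be sketched in a few lines of Python. This is only an illustration on a single machine, not how Storm or Samza are programmed: the processor names (`parse`, `only_failures`, `count_by_sensor`) and the `sensor,status` line format are assumptions made up for the example.

```python
from collections import Counter

def parse(lines):
    """Processor 1: turn raw 'sensor,status' lines into event dicts."""
    for line in lines:
        sensor, status = line.split(",")
        yield {"sensor": sensor, "status": status}

def only_failures(events):
    """Processor 2: pass through only the failure events."""
    for event in events:
        if event["status"] == "FAIL":
            yield event

def count_by_sensor(events):
    """Processor 3: keep a running failure count per sensor and emit it."""
    counts = Counter()
    for event in events:
        counts[event["sensor"]] += 1
        yield event["sensor"], counts[event["sensor"]]

def run(lines):
    """Wire the processors into a linear graph; events flow one at a time."""
    return list(count_by_sensor(only_failures(parse(lines))))
```

In a real stream processing framework each processor could run on a different machine, with the middleware moving events between them.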
II. Complex Event Processing
III. Micro Batch
§ Process data in small batches, then combine the batch results into the final result (e.g. Spark).
§ Works for simple aggregates, but is tricky for complex operations (e.g. event sequences).
§ Can be done with MapReduce as well if the deadlines are not too tight.
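A minimal sketch of the micro-batch idea for a simple aggregate, assuming an average is wanted: each batch produces a partial `(sum, count)` pair, and the partials are combined at the end. The function names are hypothetical and this is plain Python, not the Spark API.

```python
from itertools import islice

def batches(events, size):
    """Cut an event stream into consecutive batches of at most `size` items."""
    it = iter(events)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk

def partial_aggregate(batch):
    """Per-batch result: a (sum, count) pair that can be merged later."""
    return sum(batch), len(batch)

def combine(partials):
    """Merge the per-batch (sum, count) pairs into one overall average."""
    total, count = 0.0, 0
    for batch_sum, batch_count in partials:
        total += batch_sum
        count += batch_count
    return total / count if count else None

def micro_batch_average(events, size):
    return combine(partial_aggregate(b) for b in batches(events, size))
```

An average decomposes cleanly into mergeable partials; an event sequence that straddles a batch boundary does not, which is why the slide calls complex operations tricky.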
IV. OLAP-Style In-Memory Computing
§ Usually done to support interactive queries.
§ Index data to make it readily accessible, so you can respond to queries fast (e.g. Apache Drill).
§ Tools like Druid, VoltDB, and SAP HANA can do this with all data in memory to make things really fast.
Realtime Analytics Patterns
§ Simple counting (e.g. failure count)
§ Counting with windows (e.g. failure count every hour)
§ Preprocessing: filtering, transformations (e.g. data cleanup)
§ Alerts, thresholds (e.g. alarm on high temperature)
§ Data correlation, detecting missing events, detecting erroneous data (e.g. detecting failed sensors)
§ Joining event streams (e.g. detect a hit on a soccer ball)
§ Merge with data in a database; collect and update data conditionally
Realtime Analytics Patterns (contd.)
§ Detecting event sequence patterns (e.g. a small transaction followed by a large transaction)
§ Tracking: follow some related entity's state in space, time, etc. (e.g. location of airline baggage, vehicles, wildlife)
§ Detecting trends: rise, turn, fall, outliers, and complex trends like a triple bottom (e.g. algorithmic trading, SLA, load balancing)
§ Learning a model (e.g. predictive maintenance)
§ Predicting the next value and corrective actions (e.g. automated car)
Apache Hive
§ A SQL-like data processing language
§ Since many people understand SQL, Hive made large-scale (Big Data) processing accessible to many
§ Expressive, short, and sweet
§ Defines core operations that cover 90% of problems
§ Lets experts dig in when they like!
(Batch Processing, Hive) (Realtime Analytics, X) What is X?
CEP = SQL for Realtime Analytics
§ Easy to follow from SQL
§ Expressive, short, and sweet
§ Defines core operations that cover 90% of problems
§ Lets experts dig in when they like!
Let's look at the core operations.
Operators: Filters
    define stream TempStream (ts long, roomNo int, temp double);
    from TempStream[weather:convertFtoC(temp) > 30.0 and roomNo != 2043]
    select roomNo, temp
    insert into HotRoomsStream;
§ Assume a temperature stream.
§ Here weather:convertFtoC() is a user-defined function; such functions are used to extend the language.
§ Use cases:
  - Alerts, thresholds (e.g. alarm on high temperature)
  - Preprocessing: filtering, transformations (e.g. data cleanup)
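To make the filter semantics concrete, here is a rough Python equivalent of the query on this slide. It is a sketch, not the CEP engine's implementation: events are assumed to be dicts with `roomNo` and `temp` (in Fahrenheit) keys, and `convert_f_to_c` stands in for the user-defined function.

```python
def convert_f_to_c(temp_f):
    """Stand-in for the user-defined function weather:convertFtoC()."""
    return (temp_f - 32.0) * 5.0 / 9.0

def hot_rooms(temp_stream, threshold_c=30.0, excluded_room=2043):
    """Emit (roomNo, temp) for events that pass the filter condition."""
    for event in temp_stream:
        is_hot = convert_f_to_c(event["temp"]) > threshold_c
        if is_hot and event["roomNo"] != excluded_room:
            # 'select roomNo, temp': project only the selected attributes
            yield {"roomNo": event["roomNo"], "temp": event["temp"]}
```

Each event is evaluated independently and no state is kept, which is what makes filters cheap and easy to parallelize.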
Operators: Windows and Aggregation
    from TempStream#window.time(1 min)
    select roomNo, avg(temp) as avgTemp
    insert into HotRoomsStream;
§ Supports many window types
  - Batch windows, sliding windows, custom windows
§ Use cases
  - Simple counting (e.g. failure count)
  - Counting with windows (e.g. failure count every hour)
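A sliding time window with an average can be sketched as follows. The assumptions here are mine: events are `(ts, temp)` tuples with `ts` in milliseconds, they arrive in timestamp order, and expiry is driven only by arriving events (a real engine would also expire events on a timer).

```python
from collections import deque

def sliding_time_avg(events, window_ms=60_000):
    """For each arriving event, emit the average temp over the last window."""
    window = deque()  # events currently inside the window, oldest first
    for ts, temp in events:
        window.append((ts, temp))
        # Evict everything that is a full window length (or more) old.
        while window[0][0] <= ts - window_ms:
            window.popleft()
        yield ts, sum(t for _, t in window) / len(window)
```

A batch (tumbling) window would instead collect events until the window closes, emit once, and start empty again.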
Operators: Patterns
    from every (a1 = TempStream) -> a2 = TempStream[temp > a1.temp + 5] within 1 day
    select a2.ts as ts, a2.temp - a1.temp as diff
    insert into HotDayAlertStream;
§ Models a "followed by" relation, e.g. event A followed by event B
§ Use cases
  - Detecting event sequence patterns
  - Tracking
  - Detecting trends
§ A very powerful tool for tracking and detecting patterns
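One way to read the "followed by" pattern is sketched below in Python. It assumes that `every` starts a new pending match for each incoming event, and that each pending match completes on the first later event that is more than 5 degrees warmer and arrives within one day; the engine's exact matching semantics may differ. Events are assumed to be `(ts, temp)` tuples in timestamp order, with `ts` in milliseconds.

```python
DAY_MS = 24 * 60 * 60 * 1000

def detect_rises(events, min_rise=5.0, within_ms=DAY_MS):
    """Emit {ts, diff} when an event a2 follows an earlier a1 with a big rise."""
    pending = []  # (ts, temp) of a1 candidates still waiting for their a2
    for ts, temp in events:
        still_waiting = []
        for a1_ts, a1_temp in pending:
            if ts - a1_ts > within_ms:
                continue  # the 'within 1 day' deadline has passed; drop it
            if temp > a1_temp + min_rise:
                yield {"ts": ts, "diff": temp - a1_temp}
            else:
                still_waiting.append((a1_ts, a1_temp))
        # 'every': the current event also starts a new pending match.
        still_waiting.append((ts, temp))
        pending = still_waiting
```

The list of pending matches is the state the engine has to keep, and the `within` clause is what bounds it.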
Operators: Joins
    from TempStream[temp > 30.0]#window.time(1 min) as T
        join RegulatorStream[isOn == false]#window.length(1) as R
        on T.roomNo == R.roomNo
    select T.roomNo, R.deviceID, 'start' as action
    insert into RegulatorActionStream;
§ Joins two data streams based on a condition and windows
§ Use cases
  - Data correlation, detecting missing events, detecting erroneous data
  - Joining event streams
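The join on this slide can be approximated in Python as below. This is a sketch under several assumptions of mine: both streams are merged into one timestamp-ordered sequence of dicts tagged with a hypothetical `stream` key, an arriving event on either side is matched against the other side's current window, and the length-1 window holds only the single most recent "off" regulator event.

```python
from collections import deque

def regulator_actions(events, window_ms=60_000):
    """Yield (roomNo, deviceID, 'start') for each joined pair of events."""
    hot = deque()    # TempStream[temp > 30.0]#window.time(1 min)
    last_off = None  # RegulatorStream[isOn == false]#window.length(1)
    for e in events:
        # Expire hot readings that have left the one-minute window.
        while hot and hot[0]["ts"] <= e["ts"] - window_ms:
            hot.popleft()
        if e["stream"] == "temp" and e["temp"] > 30.0:
            hot.append(e)
            if last_off is not None and last_off["roomNo"] == e["roomNo"]:
                yield (e["roomNo"], last_off["deviceID"], "start")
        elif e["stream"] == "regulator" and not e["isOn"]:
            last_off = e
            for t in hot:
                if t["roomNo"] == e["roomNo"]:
                    yield (t["roomNo"], e["deviceID"], "start")
```

The windows matter because a stream is unbounded: they define how much of each side is kept around for the other side to match against.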
Operators: Access Data from the Disk
    define stream TempStream (ts long, temp double);
    define table HistTempTable (day long, avgT double);
    from TempStream#window.length(1) join HistTempTable
        on getDayOfYear(ts) == HistTempTable.day and temp > avgT
    select ts, temp
    insert into PurchaseUserStream;
§ Event tables allow users to map a database to a window and join a data stream with that window
§ Use cases
  - Merge with data in a database; collect and update data conditionally
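Joining a stream against stored data can be sketched in Python with a dictionary standing in for the database-backed event table. Assumptions in this sketch: the table maps day-of-year to a historical average, the intended condition compares the current temperature with that average, `ts` is a Unix timestamp in milliseconds interpreted in UTC, and `get_day_of_year` is a hypothetical helper mirroring the function named in the query.

```python
from datetime import datetime, timezone

def get_day_of_year(ts_ms):
    """Day of the year (1-366) for a millisecond Unix timestamp, in UTC."""
    moment = datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc)
    return moment.timetuple().tm_yday

def above_historical_avg(temp_stream, hist_temp_table):
    """Emit (ts, temp) when temp exceeds the stored average for that day."""
    for ts, temp in temp_stream:
        avg_t = hist_temp_table.get(get_day_of_year(ts))
        # No table row for this day means the join produces no output.
        if avg_t is not None and temp > avg_t:
            yield ts, temp
```

In the engine the lookup would hit a database (or a cache of it) rather than an in-memory dict, but the per-event join logic is the same.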
Revisit Patterns
§ Simple counting
§ Counting with windows
§ Preprocessing: filtering, transformations
§ Alerts, thresholds
§ Data correlation, detecting missing events, detecting erroneous data
§ Joining event streams
§ Tracking
§ Merge with data in a database; collect and update data conditionally
§ Detecting event sequence patterns
§ Detecting trends
§ Learning a model
§ Predicting the next value and corrective actions
Predictive Analytics
§ Build models and use them with WSO2 CEP, BAM, and ESB using the upcoming WSO2 Machine Learner product (2015 Q2)
§ Build models using R, export them as PMML, and use them within WSO2 CEP
§ Call R scripts from CEP queries
§ Regression and anomaly detection operators in CEP
Case Study: Realtime Soccer Analysis
Watch at: https://www.youtube.com/watch?v=nRI6buQ0NOM
TFL Traffic Analysis
Built using TFL (Transport for London) open data feeds.
http://goo.gl/04tX6k
http://goo.gl/9xNiCm
Great, Does it Scale?
Idea I: Network of CEP Nodes
§ For scaling, we arrange CEP processing nodes in a graph, as in stream processing.
§ The graph can be implemented using a stream processing engine like Apache Storm.
Idea II: Compile SQL-like Queries to a Network of CEP Nodes
    from TempStream[temp > 33]
    insert into HighTempStream;

    from HighTempStream#window(1h)
    select max(temp) as max
    insert into HourlyMaxTempStream;
How Do We Partition the Data to Scale Up the Analysis?
§ Let's follow MapReduce.
§ MapReduce does not scale by itself; it asks users to break the problem into many small, independent problems.
Idea III: Let the Users Specify Parallelism
    define partition on TempStream.region {
        from TempStream[temp > 33]
        insert into HighTempStream;
    }

    from HighTempStream#window(1h)
    select max(temp) as max
    insert into HourlyMaxTempStream;
§ The language includes parallel constructs: partitions, pipelines, distributed operators
§ Assign each partition to a different node, and partition the data accordingly
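The partitioning step can be illustrated with a small Python sketch. The assumptions are mine: events are dicts with `ts` (milliseconds), `region`, and `temp`; each region is assigned to a node by a stable hash; and the one-hour window is treated as a tumbling hourly bucket. The function names are hypothetical.

```python
import zlib
from collections import defaultdict

def node_for(region, num_nodes):
    """Stable assignment of a partition key to a node index."""
    return zlib.crc32(region.encode("utf-8")) % num_nodes

def partition(events, num_nodes):
    """Route each event to the shard of the node that owns its region."""
    shards = defaultdict(list)
    for event in events:
        shards[node_for(event["region"], num_nodes)].append(event)
    return shards

def high_temps(shard, threshold=33):
    """The partitioned query: runs independently on every node."""
    return [e for e in shard if e["temp"] > threshold]

def hourly_max(events, hour_ms=3_600_000):
    """The non-partitioned query: max temp per hour over the merged output."""
    best = {}
    for e in events:
        hour = e["ts"] // hour_ms
        best[hour] = max(best.get(hour, e["temp"]), e["temp"])
    return best
```

`zlib.crc32` is used rather than Python's built-in `hash()` because string hashes are randomized per process, and every node must agree on who owns which key.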
Handling Ordering
§ When data is processed in parallel, output might be generated out of order.
§ Without a global notion of time, we cannot trigger windows and other time-sensitive constructs.
§ Solution: the current time needs to be propagated through the graph.
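One possible way to propagate time through the graph is sketched below; the slides do not say how WSO2 CEP does it, so treat this as an illustration only. It assumes each upstream partition emits events in timestamp order and can also send bare time updates; the hypothetical `OrderedMerger` buffers events and releases them in order once every partition's clock has passed them.

```python
import heapq

class OrderedMerger:
    """Merge parallel partition outputs back into timestamp order."""

    def __init__(self, partitions):
        # Latest time reported by each upstream partition.
        self.clock = {p: float("-inf") for p in partitions}
        self.buffer = []  # min-heap of (ts, seq, payload)
        self.seq = 0      # tie-breaker so payloads are never compared

    def on_event(self, partition, ts, payload):
        """Buffer an event; it also advances that partition's clock."""
        heapq.heappush(self.buffer, (ts, self.seq, payload))
        self.seq += 1
        return self.on_time(partition, ts)

    def on_time(self, partition, ts):
        """Advance a partition's clock and release events that are now safe."""
        self.clock[partition] = max(self.clock[partition], ts)
        safe = min(self.clock.values())  # no partition can emit earlier
        released = []
        while self.buffer and self.buffer[0][0] <= safe:
            ts_out, _, payload = heapq.heappop(self.buffer)
            released.append((ts_out, payload))
        return released
```

The bare `on_time` updates are what keep a quiet partition from stalling everyone else: without them, the slowest clock would hold back all downstream windows.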
Putting Everything Together
WSO2 CEP & Big Data Platform
CEP = SQL for Realtime Analytics
§ Easy to follow from SQL
§ Expressive, short, sweet, and fast!
§ Defines core operations that cover 90% of problems
§ Lets experts dig in when they like!
And it scales!
Questions?
Visit us at Booth 1025
http://wso2.com/landing/stratahadoop-world-ca-2015/