• Share
  • Email
  • Embed
  • Like
  • Save
  • Private Content
 

Stream processing and Norikra

on

  • 0 views

 

Statistics

Views

Total Views
0
Views on SlideShare
0
Embed Views
0

Actions

Likes
0
Downloads
0
Comments
0

0 Embeds 0

No embeds

Accessibility

Categories

Upload Details

Uploaded via SlideShare as Adobe PDF

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment

    Stream processing and Norikra Stream processing and Norikra Presentation Transcript

    • @tagomoris Norikra meetup 2014/07/09 Stream processing and Norikra 14年7月9日水曜日
    • TAGOMORI Satoshi (@tagomoris) LINE Corporation Analytics Platform Team 14年7月9日水曜日
    • THE ONE THING WHAT YOU MUST LEAN TODAY IS 14年7月9日水曜日
    • Norikra 14年7月9日水曜日
    • Norikra IS NOT Norika 14年7月9日水曜日
    • STREAM PROCESSING 14年7月9日水曜日
    • Processing models Batch processing: RDBMS, Hadoop(Hive), BigQuery/RedShift Stream processing: Storm, Spark streaming, Norikra 14年7月9日水曜日
    • Batch processing RDBMS, Hadoop/Hive, .... (transaction is out of this topic) Target window: hours - weeks (or more) Total throuput: HIGHEST Query Latency: LARGEST (20sec - mins - hours) 14年7月9日水曜日
    • Stream processing Storm, Esper, Norikra, Fluentd, .... Kafka(?), Spark streaming(?) Target window: seconds - hours Total throughput: Normal Query latency: SMALLEST (milliseconds) Queries must be written BEFORE DATA Once registered, runs forever 14年7月9日水曜日
    • Data flow and latency data window query execution Batch Stream incremental query exection 14年7月9日水曜日
    • Query for stored data v1,v2,v3,v4,v5,v6 v1,v2,v3,v4,v5,v6 v1,v2,v3,v4,v5,v6 v1,v2,v3,v4,v5,v6 v1,v2,v3,v4,v5,v6 v1,v2,v3,v4,v5,v6 v1,v2,v3,v4,v5,v6 v1,v2,v3,v4,v5,v6 v1,v2,v3,v4,v5,v6 v1,v2,v3,v4,v5,v6 v1,v2,v3,v4,v5,v6 v1,v2,v3,v4,v5,v6 table At first, all data MUST be stored. 14年7月9日水曜日
    • Query for stored data v1,v2,v3,v4,v5,v6 v1,v2,v3,v4,v5,v6 v1,v2,v3,v4,v5,v6 v1,v2,v3,v4,v5,v6 v1,v2,v3,v4,v5,v6 v1,v2,v3,v4,v5,v6 v1,v2,v3,v4,v5,v6 v1,v2,v3,v4,v5,v6 v1,v2,v3,v4,v5,v6 v1,v2,v3,v4,v5,v6 v1,v2,v3,v4,v5,v6 v1,v2,v3,v4,v5,v6 SELECT v1,v2,COUNT(*) FROM table WHERE v3=’x’ GROUP BY v1,v2 table 14年7月9日水曜日
    • Query for stored data v1,v2,v3,v4,v5,v6 v1,v2,v3,v4,v5,v6 v1,v2,v3,v4,v5,v6 v1,v2,v3,v4,v5,v6 v1,v2,v3,v4,v5,v6 v1,v2,v3,v4,v5,v6 v1,v2,v3,v4,v5,v6 v1,v2,v3,v4,v5,v6 v1,v2,v3,v4,v5,v6 v1,v2,v3,v4,v5,v6 v1,v2,v3,v4,v5,v6 v1,v2,v3,v4,v5,v6 SELECT v1,v2,COUNT(*) FROM table WHERE v3=’x’ GROUP BY v1,v2 table SELECT v4,COUNT(*) FROM table WHERE v1 AND v2 GROUP BY v4 14年7月9日水曜日
    • Query for stored data v1,v2,v3,v4,v5,v6 v1,v2,v3,v4,v5,v6 v1,v2,v3,v4,v5,v6 v1,v2,v3,v4,v5,v6 v1,v2,v3,v4,v5,v6 v1,v2,v3,v4,v5,v6 v1,v2,v3,v4,v5,v6 v1,v2,v3,v4,v5,v6 v1,v2,v3,v4,v5,v6 v1,v2,v3,v4,v5,v6 v1,v2,v3,v4,v5,v6 v1,v2,v3,v4,v5,v6 SELECT v1,v2,COUNT(*) FROM table WHERE v3=’x’ GROUP BY v1,v2 table SELECT v4,COUNT(*) FROM table WHERE v1 AND v2 GROUP BY v4 “All data” means “data that not be used”. 14年7月9日水曜日
    • Query for stream data v1,v2,v3,v4,v5,v6 SELECT v1,v2,COUNT(*) FROM table.win:xxx WHERE v3=’x’ GROUP BY v1,v2 stream SELECT v4,COUNT(*) FROM table.win:xxx WHERE v1 AND v2 GROUP BY v4 v1,v2,v3,v4,v5,v6 v1,v2,v3,v4,v5,v6 v1,v2,v3,v4,v5,v6 v1,v2,v3,v4,v5,v6 v1,v2,v3,v4,v5,v6 v1,v2,v3,v4,v5,v6 v1,v2,v3,v4,v5,v6 v1,v2,v3,v4,v5,v6 v1,v2,v3,v4,v5,v6 v1,v2,v3,v4,v5,v6 v1,v2,v3,v4,v5,v6 14年7月9日水曜日
    • Query for stream data v1,v2,v3,v4,v5,v6 SELECT v1,v2,COUNT(*) FROM table.win:xxx WHERE v3=’x’ GROUP BY v1,v2 stream SELECT v4,COUNT(*) FROM table.win:xxx WHERE v1 AND v2 GROUP BY v4 v1,v2,v3 v1,v2,v4 v1,v2,v3,v4,v5,v6 v1,v2,v3,v4,v5,v6 v1,v2,v3,v4,v5,v6 v1,v2,v3,v4,v5,v6 v1,v2,v3,v4,v5,v6 v1,v2,v3,v4,v5,v6 v1,v2,v3,v4,v5,v6 v1,v2,v3,v4,v5,v6 v1,v2,v3,v4,v5,v6 14年7月9日水曜日
    • v1,v2,v3,v4,v5,v6 Query for stream data SELECT v1,v2,COUNT(*) FROM table.win:xxx WHERE v3=’x’ GROUP BY v1,v2 stream SELECT v4,COUNT(*) FROM table.win:xxx WHERE v1 AND v2 GROUP BY v4 v1,v2,v3 v1,v2,v4v1,v2,v3,v4,v5,v6 v1,v2,v3,v4,v5,v6 v1,v2,v3,v4,v5,v6 v1,v2,v3,v4,v5,v6 v1,v2,v3,v4,v5,v6 v1,v2,v3,v4,v5,v6 v1,v2,v3,v4,v5,v6 v1,v2,v3,v4,v5,v6 14年7月9日水曜日
    • v1,v2,v3,v4,v5,v6 Query for stream data SELECT v1,v2,COUNT(*) FROM table.win:xxx WHERE v3=’x’ GROUP BY v1,v2 stream SELECT v4,COUNT(*) FROM table.win:xxx WHERE v1 AND v2 GROUP BY v4 v1,v2,v3 v1,v2,v4 v1,v2,v3,v4,v5,v6 v1,v2,v3,v4,v5,v6 v1,v2,v3,v4,v5,v6 v1,v2,v3,v4,v5,v6 All data will be discarded just after inserted. (Bye-bye storage system maintenance!) 14年7月9日水曜日
    • Incremental calculation v1,v2,v3,v4,v5,v6 SELECT v1,v2,COUNT(*) FROM table.win:xxx WHERE v3=’x’ GROUP BY v1,v2 stream v1,v2,v3,v4,v5,v6 v1,v2,v3,v4,v5,v6 v1,v2,v3,v4,v5,v6 v1,v2,v3,v4,v5,v6 v1,v2,v3,v4,v5,v6 v1,v2,v3,v4,v5,v6 v1,v2,v3,v4,v5,v6 v1,v2,v3,v4,v5,v6 v1,v2,v3,v4,v5,v6 v1,v2,v3,v4,v5,v6 v1,v2,v3,v4,v5,v6 internal data (memory) v1 v2 COUNT TRUE TRUE 0 TRUE FALSE 1 FALSE TRUE 33 FALSE FALSE 2 14年7月9日水曜日
    • Incremental calculation v1,v2,v3,v4,v5,v6 SELECT v1,v2,COUNT(*) FROM table.win:xxx WHERE v3=’x’ GROUP BY v1,v2 stream v1,v2,v3,v4,v5,v6 v1,v2,v3,v4,v5,v6 v1,v2,v3,v4,v5,v6 v1,v2,v3,v4,v5,v6 v1,v2,v3,v4,v5,v6 v1,v2,v3,v4,v5,v6 v1,v2,v3,v4,v5,v6 v1,v2,v3,v4,v5,v6 v1,v2,v3,v4,v5,v6 v1,v2,v3,v4,v5,v6 v1,v2,v3,v4,v5,v6 internal data (memory) v1 v2 COUNT TRUE TRUE 1 TRUE FALSE 1 FALSE TRUE 33 FALSE FALSE 2 14年7月9日水曜日
    • Incremental calculation v1,v2,v3,v4,v5,v6 SELECT v1,v2,COUNT(*) FROM table.win:xxx WHERE v3=’x’ GROUP BY v1,v2 stream v1,v2,v3,v4,v5,v6 v1,v2,v3,v4,v5,v6 v1,v2,v3,v4,v5,v6 v1,v2,v3,v4,v5,v6 v1,v2,v3,v4,v5,v6 v1,v2,v3,v4,v5,v6 v1,v2,v3,v4,v5,v6 v1,v2,v3,v4,v5,v6 v1,v2,v3,v4,v5,v6 v1,v2,v3,v4,v5,v6 internal data (memory) v1 v2 COUNT TRUE TRUE 1 TRUE FALSE 1 FALSE TRUE 34 FALSE FALSE 2 14年7月9日水曜日
    • Incremental calculation SELECT v1,v2,COUNT(*) FROM table.win:xxx WHERE v3=’x’ GROUP BY v1,v2 stream v1,v2,v3,v4,v5,v6 v1,v2,v3,v4,v5,v6 v1,v2,v3,v4,v5,v6 v1,v2,v3,v4,v5,v6 v1,v2,v3,v4,v5,v6 v1,v2,v3,v4,v5,v6 internal data (memory) v1 v2 COUNT TRUE TRUE 1 TRUE FALSE 2 FALSE TRUE 37 FALSE FALSE 3 memory can store internal data 14年7月9日水曜日
    • Data window Target time (or size) range of queries Batch (or short-batch) FROM-TO: WHERE dt >= ‘2014-07-07 00:00:00‘ AND dt <= ‘2014-07-08 23:59:59’ Stream “Calculate this query for every 3 minutes” Extended SQL required SELECT v1,v2,COUNT(*) FROM table.win:xxx WHERE v3=’x’ GROUP BY v1,v2 14年7月9日水曜日
    • Stream processing with SQL Esper: Java library to process Stream With schema 14年7月9日水曜日
    • Stream processing with SQL Esper: Java library to process Stream Esper EPL SELECT param1, param2 FROM tbl WHERE age > 30 14年7月9日水曜日
    • Stream processing with SQL SELECT param, COUNT(*) AS c FROM tbl WHERE age > 30 GROUP BY param Esper: Java library to process Stream Esper EPL 14年7月9日水曜日
    • Stream processing with SQL SELECT param, COUNT(*) AS c FROM tbl.win:time_batch(1 hour) WHERE age > 30 GROUP BY param Esper: Java library to process Stream Esper EPL 14年7月9日水曜日
    • 14年7月9日水曜日
    • Norikra: Schema-less Stream Processing with SQL Server software, runs on JVM Open source software (GPLv2) http://norikra.github.io/ https://github.com/norikra/norikra 14年7月9日水曜日
    • Norikra: Schema-less event stream: Add/Remove data fields whenever you want SQL: No more restarts to add/remove queries w/ JOINs, w/ SubQueries w/ UDF (in Java/Ruby from rubygem) Truly Complex events: Nested Hash/Array, accessible directly from SQL HTTP RPC w/ JSON or MessagePack (fluentd plugin available!) 14年7月9日水曜日
    • How to setup Norikra: Install JRuby download jruby.tar.gz, extract it and export $PATH ‘rbenv install jruby-1.7.xx’ & ‘rbenv shell jruby-..’ Install Norikra ‘gem install norikra’ Execute Norikra server ‘norikra start’ 14年7月9日水曜日
    • Norikra Interface: Command line: norikra-client norikra-client target open ... norikra-client query add ... tail -f ... | norikra-client event send ... WebUI show status show/add/remove queries HTTP API JSON, MessagePack 14年7月9日水曜日
    • Norikra Queries: (1) SELECT name, age FROM events target 14年7月9日水曜日
    • Norikra Queries: (1) SELECT name, age FROM events {“name”:”tagomoris”, “age”:34, “address”:”Tokyo”, “corp”:”LINE”, “current”:”Tsukuba”} {“name”:”tagomoris”,”age”:34} 14年7月9日水曜日
    • Norikra Queries: (1) SELECT name, age FROM events nothing {“name”:”tagomoris”, “address”:”Tokyo”, “corp”:”LINE”, “current”:”Tsukuba”} 14年7月9日水曜日
    • Norikra Queries: (2) SELECT name, age FROM events WHERE current=”Tsukuba” {“name”:”tagomoris”,”age”:34} {“name”:”tagomoris”, “age”:34, “address”:”Tokyo”, “corp”:”LINE”, “current”:”Tsukuba”} 14年7月9日水曜日
    • Norikra Queries: (2) SELECT name, age FROM events WHERE current=”Tsukuba” nothing {“name”:”kawashima”, “age”:99, “address”:”Tsukuba”, “corp”:”Univ”, “current”:”Dream”} 14年7月9日水曜日
    • Norikra Queries: (3) SELECT age, COUNT(*) as cnt FROM events.win:time_batch(5 mins) GROUP BY age 14年7月9日水曜日
    • Norikra Queries: (3) SELECT age, COUNT(*) as cnt FROM events.win:time_batch(5 mins) GROUP BY age {”age”:34,”cnt”:3}, {“age”:33,”cnt”:1}, ... every 5 mins {“name”:”tagomoris”, “age”:34, “address”:”Tokyo”, “corp”:”LINE”, “current”:”Tsukuba”} 14年7月9日水曜日
    • Norikra Queries: (4) SELECT age, COUNT(*) as cnt FROM events.win:time_batch(5 mins) GROUP BY age {”age”:34,”cnt”:3}, {“age”:33,”cnt”:1}, ... SELECT max(age) as max FROM events.win:time_batch(5 mins) {“max”:51} {“name”:”tagomoris”, “age”:34, “address”:”Tokyo”, “corp”:”LINE”, “current”:”Tsukuba”} every 5 mins 14年7月9日水曜日
    • Norikra Queries: (5) SELECT age, COUNT(*) as cnt FROM events.win:time_batch(5 mins) GROUP BY age {“name”:”tagomoris”, “user:{“age”:34, “corp”:”LINE”, “address”:”Tokyo”}, “current”:”Tsukuba”, “speaker”:true, “attend”:[true,true,false, ...] } 14年7月9日水曜日
    • Norikra Queries: (5) SELECT user.age, COUNT(*) as cnt FROM events.win:time_batch(5 mins) GROUP BY user.age {“name”:”tagomoris”, “user:{“age”:34, “corp”:”LINE”, “address”:”Tokyo”}, “current”:”Tsukuba”, “speaker”:true, “attend”:[true,true,false, ...] } 14年7月9日水曜日
    • Norikra Queries: (5) SELECT user.age, COUNT(*) as cnt FROM events.win:time_batch(5 mins) WHERE current=”Tsukuba” AND attend.$0 AND attend.$1 GROUP BY user.age {“name”:”tagomoris”, “user:{“age”:34, “corp”:”LINE”, “address”:”Tokyo”}, “current”:”Kyoto”, “speaker”:true, “attend”:[true,true,false, ...] } 14年7月9日水曜日
    • Use cases in real world Enjoy following sessions! 14年7月9日水曜日
    • More queries, more simplicity and less latency. Thanks! 14年7月9日水曜日
    • See also: http://norikra.github.io/ “Batch processing and Stream processing by SQL” http://www.slideshare.net/tagomoris/hcj2014-sql “Log analysis systems and its designs in LINE Corp 2014 Early” http://www.slideshare.net/tagomoris/log-analysis-system-and-its-designs-in- line-corp-2014-early “Norikra in Action” http://www.slideshare.net/tagomoris/norikra-in-action-ver-2014-spring 14年7月9日水曜日