Fluentd and Embulk Game Server 4

Published on: http://peatix.com/event/81553
Published in: Technology


Transcript

  • 1. Fluentd / Embulk: for easy and reliable data transfer. Masahiro Nakagawa, Apr 18, 2015, Game Server meetup #4
  • 2. Who are you? > Masahiro Nakagawa > github/twitter: @repeatedly > Treasure Data, Inc. > Senior Software Engineer > Fluentd / td-agent developer > Living at OSS :) > D language - Phobos committer > Fluentd - main maintainer > MessagePack / RPC - D and Python (RPC only) > Organizer of several meetups (Presto, DTM, etc…) > etc…
  • 3. Structured logging / Reliable forwarding / Pluggable architecture http://fluentd.org/
  • 4. What’s Fluentd? > Data collector for a unified logging layer > Streaming data transfer based on JSON > Written in Ruby > Various gem-based plugins > http://www.fluentd.org/plugins > Working in production > http://www.fluentd.org/testimonials
  • 5. Background
  • 6. Data Analytics Flow: Data source → Collect → Store → Process → Visualize → Reporting / Monitoring
  • 7. Data Analytics Flow: Store / Process is covered by Cloudera, Hortonworks, Treasure Data; Visualize by Tableau, Excel, R; Collect remains the ??? (the goal: easier & shorter time to value)
  • 8. TD Service Architecture: Time to Value. Acquire: web logs, app logs, sensors, CRM, ERP, RDBMS, and POS data via Treasure Agent (server), SDKs (JS, Android, iOS, Unity), a streaming collector, and a bulk uploader (Embulk, TD Toolbelt). Store & Analyze: Plazma DB (flexible, scalable, columnar storage), SQL-based queries (SQL, Pig) @AWS or @IDCF, covering both batch / reliability and ad-hoc / low-latency workloads. Result push: KPI dashboards, BI tools (Metric Insights, Tableau, Motion Board), RDBMS, Google Docs, AWS S3, FTP server, etc., via REST API and ODBC / JDBC. Connectivity, economy & flexibility, simple & supported.
  • 9. Dive into Concept
  • 10. Divide & Conquer & Retry (diagram: batch, stream, and other-stream paths, each retried on error)
  • 11. Before: each application server (Server1, Server2, Server3) ships logs to a FluentLog server. High latency! Must wait for a day...
  • 12. After: a local Fluentd on each application server (Server1, Server2, Server3) forwards to aggregator Fluentd nodes. In streaming!
  • 13. Why JSON / MessagePack? (1) > Schema on Write (traditional MPP DB) > Write data using a schema to improve query performance > Pros > minimal query overhead > Cons > schema and workload must be designed beforehand > data load is an expensive operation
  • 14. Why JSON / MessagePack? (2) > Schema on Read (Hadoop) > Write data without a schema and map a schema at query time > Pros > robust against schema and workload changes > data load is a cheap operation > Cons > high overhead at query time
  • 15. Features
  • 16. Core & Plugins
    Core: Divide & Conquer, Buffering & Retrying, Error handling, Message routing, Parallelism
    Plugins: Read / receive data, Parse data, Filter data, Buffer data, Format data, Write / send data

  • 17. Core & Plugins
    Core (common concerns): Divide & Conquer, Buffering & Retrying, Error handling, Message routing, Parallelism
    Plugins (use-case specific): Read / receive data, Parse data, Filter data, Buffer data, Format data, Write / send data
  • 18. Event structure (log message)
    ✓ Tag: for message routing; where is it from?
    ✓ Time: seconds by default; from the data source
    ✓ Record: JSON format (MessagePack internally); schema-free
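    A concrete event, as an illustrative sketch (values are not from the deck):
      tag:    apache.access                        # routing key
      time:   1429315200                           # Unix time in seconds
      record: {"host":"127.0.0.1","method":"GET"}  # schema-free JSON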
  • 19. Architecture (v0.12 or later)
    Engine (not pluggable): Input → Filter → Buffer → Output
    Input: Forward, File tail, ...   Filter: grep, record_transformer, ...
    Buffer: File, Memory             Output: Forward, File, ...
    Parser and Formatter are pluggable as well
  • 20. Configuration and operation > No central / master node > include helps configuration sharing > Operation depends on your environment > Use your own daemon management > We use Chef at Treasure Data > Apache-like syntax
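    A minimal sketch of include-based sharing (paths and ports are illustrative):

    # pull shared settings in from a common location
    @include /etc/fluent/conf.d/*.conf

    # node-specific source
    <source>
      @type forward
      port 24224
    </source>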
  • 21. How to use
  • 22. Setup fluentd (e.g. Ubuntu)
    $ apt-get install ruby
    $ gem install fluentd
    $ edit fluent.conf
    $ fluentd -c fluent.conf
    http://docs.fluentd.org/articles/faq#w-what-version-of-ruby-does-fluentd-support
  • 23. Treasure Agent (td-agent) > Treasure Data distribution of Fluentd > includes Ruby, popular plugins, etc. > Treasure Agent 2 is the current stable > Updated core components > We recommend v2, not v1 > Latest version is 2.2.0, with Fluentd v0.12
  • 24. Setup td-agent
    $ curl -L http://toolbelt.treasuredata.com/sh/install-redhat-td-agent2.sh | sh
    $ edit /etc/td-agent/td-agent.conf
    $ sudo service td-agent start
    See: http://docs.fluentd.org/categories/installation
  • 25. Apache to Mongo
    Web Server --(tail)--> Fluentd --(event buffering, routing)--> --(insert)--> MongoDB
    127.0.0.1 - - [11/Dec/2014:07:26:27] "GET / ...
    127.0.0.1 - - [11/Dec/2014:07:26:30] "GET / ...
    ...
    becomes the event: 2014-02-04 01:33:51 apache.log { "host": "127.0.0.1", "method": "GET", ... }
  • 26. Plugins - use rubygems
    $ fluent-gem search -rd fluent-plugin
    $ fluent-gem search -rd fluent-mixin
    $ fluent-gem install fluent-plugin-mongo
    In td-agent: /usr/sbin/td-agent-gem install fluent-plugin-mongo
  • 27.
    # receive events via HTTP
    <source>
      @type http
      port 8888
    </source>

    # read logs from a file
    <source>
      @type tail
      path /var/log/httpd.log
      format apache
      tag apache.access
    </source>

    # save access logs to MongoDB
    <match apache.access>
      @type mongo
      database apache
      collection log
    </match>

    # save alerts to a file
    <match alert.**>
      @type file
      path /var/log/fluent/alerts
    </match>

    # forward other logs to servers
    <match **>
      @type forward
      <server>
        host 192.168.0.11
        weight 20
      </server>
      <server>
        host 192.168.0.12
        weight 60
      </server>
    </match>

    @include http://example.com/conf
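    With the HTTP source above, a test event can be injected from the shell; a sketch (the debug.test tag is illustrative):

    $ curl -X POST -d 'json={"action":"login","user":38}' http://localhost:8888/debug.test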
  • 28. Filter
    > Apply a filtering routine to the event stream
    > No more tag tricks!
    v0.10 (tag rewrite):
    <match access.**>
      @type record_reformer
      tag reformed.${tag}
    </match>

    <match reformed.**>
      @type growthforecast
    </match>

    v0.12 (filter):
    <filter access.**>
      @type record_transformer
      …
    </filter>

    <match access.**>
      @type growthforecast
    </match>
  • 29. Before
  • 30. After or Embulk
  • 31. Sources (Apache frontend access logs, app logs, syslogd system logs, backend databases) fan into Fluentd for buffering / processing / routing, and out to alerting (Nagios), analysis (MongoDB, Hadoop), and archiving (Amazon S3, MySQL): M x N connections become M + N.
  • 32. Roadmap > v0.10 (old stable) > v0.12 (current stable) > Filter / Label / At-least-once > v0.14 (spring to early summer 2015) > New plugin APIs, ServerEngine, Time… > v1 (summer to fall 2015) > Finalize new features / APIs https://github.com/fluent/fluentd/wiki/V1-Roadmap
  • 33. Use-cases
  • 34. Simple forwarding
  • 35.
    # logs from a file
    <source>
      type tail
      path /var/log/httpd.log
      pos_file /tmp/pos_file
      format apache2
      tag backend.apache
    </source>

    # logs from client libraries
    <source>
      type forward
      port 24224
    </source>

    # store logs to MongoDB
    <match backend.*>
      type mongo
      database fluent
      collection test
    </match>
  • 36. Client libraries > Ruby > Java > Perl > PHP > Python > D > Scala > ...
    # Ruby
    Fluent.open("myapp")
    Fluent.event("login", {"user" => 38})
    #=> 2012-12-11 07:56:01 myapp.login {"user":38}
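    The snippet above is slide shorthand; with the fluent-logger gem the same event looks roughly like this (host and port are illustrative):

    require 'fluent-logger'

    # connect to a local Fluentd that has a forward <source> on 24224
    Fluent::Logger::FluentLogger.open(nil, host: 'localhost', port: 24224)

    # emits the event: myapp.login {"user":38}
    Fluent::Logger.post('myapp.login', { 'user' => 38 })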
  • 37. Less Simple Forwarding
    - At-most-once / At-least-once
    - HA (failover)
    - Load balancing
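    A sketch of such a setup with the forward output (hosts are illustrative):

    <match backend.**>
      @type forward
      require_ack_response   # at-least-once semantics via receiver acks
      <server>               # primary
        host 192.168.0.11
      </server>
      <server>               # used only when the primary is down
        host 192.168.0.12
        standby
      </server>
    </match>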
  • 38. Near realtime and batch combo! (diagram: hot data takes the realtime path, all data takes the batch path)
  • 39.
    # logs from a file
    <source>
      type tail
      path /var/log/httpd.log
      pos_file /tmp/pos_file
      format apache2
      tag web.access
    </source>

    # logs from client libraries
    <source>
      type forward
      port 24224
    </source>

    # store logs to ES and HDFS
    <match web.*>
      type copy
      <store>
        type elasticsearch
        logstash_format true
      </store>
      <store>
        type webhdfs
        host namenode
        port 50070
        path /path/on/hdfs/
      </store>
    </match>
  • 40. CEP for Stream Processing: Norikra is a SQL-based CEP engine: http://norikra.github.io/
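    Norikra queries use an Esper-style SQL dialect; a hypothetical example counting events per path over a one-minute batch window (the access target and its fields are illustrative, not from the deck):

    -- hypothetical Norikra (Esper EPL) query
    SELECT path, COUNT(*) AS cnt
    FROM access.win:time_batch(1 min)
    GROUP BY path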
  • 41. Container Logging
  • 42. Fluentd on Kubernetes / GCE > Kubernetes > Google Compute Engine > https://cloud.google.com/logging/docs/install/compute_install
  • 43. Treasure Data's own monitoring: Frontend → Job Queue → Worker → Hadoop / Presto. Applications push metrics to Fluentd (via a local Fluentd); Fluentd sums up the data over minutes (partial aggregation); Datadog handles realtime monitoring, Treasure Data the historical analysis.
  • 44. Cookpad ✓ Over 100 RoR servers (2012/2/4). Hundreds of app servers (Rails app + td-agent) send event logs to Treasure Data; logs are available after several minutes. Daily/hourly batches feed MySQL and Google Spreadsheet for KPI visualization and feedback rankings. Unlimited scalability / Flexible schema / Realtime / Less performance impact.
  • 45. Slideshare http://engineering.slideshare.net/2014/04/skynet-project-monitor-scale-and-auto-heal-a-system-in-the-cloud/
  • 46. Log Analysis System and Its Designs at LINE Corp., early 2014
  • 47. LINE Business Connect http://developers.linecorp.com/blog/?p=3386
  • 48. Eco-system
  • 49. fluent-bit > Made for Embedded Linux > OpenEmbedded & Yocto Project > Intel Edison, Raspberry Pi & BeagleBone Black boards > https://github.com/fluent/fluent-bit > Standalone application or library mode > Built-in plugins > input: cpu, kmsg; output: fluentd > First release at the end of March 2015
  • 50. fluentd-forwarder > Forwarding agent written in Go > Focused on forwarding logs to Fluentd > Works on Windows > Bundles TCP input/output and TD output > No flexible plugin mechanism > We plan to add more inputs/outputs > Similar products: fluent-agent-lite, fluent-agent-hydra, ik
  • 51. fluentd-ui > Manage Fluentd instances via a Web UI > https://github.com/fluent/fluentd-ui
  • 52. Bulk loading / Parallel processing / Pluggable architecture http://embulk.org/
  • 53. The problems at Treasure Data > Treasure Data Service on the Cloud > Customers want to try Treasure Data, but > SEs write scripts to bulk load their data. Hard work :( > Customers want to migrate their big data, but > Hard work :( > Fluentd solved streaming data collection, but > bulk data loading is another problem.
  • 54. Embulk > Bulk Loader version of Fluentd > Pluggable architecture > JRuby, JVM languages > High performance parallel processing > Share your script as a plugin > https://github.com/embulk
  • 55. The problems of bulk load > Data cleaning (normalization) > How to normalize broken records? > Error handling > How to remove broken records? > Idempotent retrying > How to retry without duplicated loading? > Performance optimization
  • 56. Embulk plugins bulk-load between stores: inputs (CSV files, SequenceFile, Salesforce.com, Amazon S3, MySQL, …) → Embulk → outputs (HDFS, Elasticsearch, Cassandra, Hive, Redis, …)
    ✓ Parallel execution ✓ Data validation ✓ Error recovery ✓ Deterministic behaviour ✓ Idempotent retrying
    http://www.embulk.org/plugins/
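    Plugins are distributed as Ruby gems and installed through the embulk command; for example (the plugin name is illustrative):

    $ embulk gem install embulk-output-elasticsearch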
  • 57. How to use
  • 58. Setup embulk (e.g. Linux/Mac)
    $ curl --create-dirs -o ~/.embulk/bin/embulk -L "http://dl.embulk.org/embulk-latest.jar"
    $ chmod +x ~/.embulk/bin/embulk
    $ echo 'export PATH="$HOME/.embulk/bin:$PATH"' >> ~/.bashrc
    $ source ~/.bashrc
  • 59. Try example
    $ embulk example ./try1
    $ embulk guess ./example.yml -o config.yml
    $ embulk preview config.yml
    $ embulk run config.yml
  • 60. Guess format & schema (by guess plugins)
    # install
    $ wget http://dl.embulk.org/embulk-latest.jar -O embulk.jar
    $ chmod 755 embulk.jar

    # guess
    $ vi example.yml
    $ ./embulk guess example.yml -o config.yml

    Before (example.yml):
    in:
      type: file
      path_prefix: /path/to/sample_
    out:
      type: stdout

    After (config.yml):
    in:
      type: file
      path_prefix: /path/to/sample_
      decoders:
      - {type: gzip}
      parser:
        charset: UTF-8
        newline: CRLF
        type: csv
        delimiter: ','
        quote: '"'
        skip_header_lines: 1
        columns:
        - {name: id, type: long}
        - {name: account, type: long}
        - {name: time, type: timestamp, format: '%Y-%m-%d %H:%M:%S'}
        - {name: purchase, type: timestamp, format: '%Y%m%d'}
        - {name: comment, type: string}
    out:
      type: stdout
  • 61. Preview & fix config
    # install
    $ wget http://dl.embulk.org/embulk-latest.jar -O embulk.jar
    $ chmod 755 embulk.jar

    # guess
    $ vi example.yml
    $ ./embulk guess example.yml -o config.yml

    # preview
    $ ./embulk preview config.yml
    $ vi config.yml # if necessary

    +-------------------------+----------+-------------+
    | time:timestamp          | uid:long | word:string |
    +-------------------------+----------+-------------+
    | 2015-01-27 19:23:49 UTC |   32,864 | embulk      |
    | 2015-01-27 19:01:23 UTC |   14,824 | jruby       |
    | 2015-01-28 02:20:02 UTC |   27,559 | plugin      |
    | 2015-01-29 11:54:36 UTC |   11,270 | fluentd     |
    +-------------------------+----------+-------------+
  • 62. Deterministic run
    # install
    $ wget http://dl.embulk.org/embulk-latest.jar -O embulk.jar
    $ chmod 755 embulk.jar

    # guess
    $ vi example.yml
    $ ./embulk guess example.yml -o config.yml

    # preview
    $ ./embulk preview config.yml
    $ vi config.yml # if necessary

    # run
    $ ./embulk run config.yml -o config.yml

    Resulting config.yml:
    exec: {}
    in:
      type: file
      path_prefix: /path/to/sample_
      decoders:
      - {type: gzip}
      parser:
        charset: UTF-8
        newline: CRLF
        type: csv
        delimiter: ','
        quote: '"'
        skip_header_lines: 1
        columns:
        - {name: id, type: long}
        - {name: account, type: long}
        - {name: time, type: timestamp, format: '%Y-%m-%d %H:%M:%S'}
        - {name: purchase, type: timestamp, format: '%Y%m%d'}
        - {name: comment, type: string}
      last_path: /path/to/sample_001.csv.gz
    out:
      type: stdout
  • 63. Repeat
    # install
    $ wget http://dl.embulk.org/embulk-latest.jar -O embulk.jar
    $ chmod 755 embulk.jar

    # guess
    $ vi example.yml
    $ ./embulk guess example.yml -o config.yml

    # preview
    $ ./embulk preview config.yml
    $ vi config.yml # if necessary

    # run
    $ ./embulk run config.yml -o config.yml

    # repeat
    $ ./embulk run config.yml -o config.yml
    $ ./embulk run config.yml -o config.yml

    config.yml is the same as in slide 62, with last_path updated, e.g.:
      last_path: /path/to/sample_01.csv.gz
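    Why repeated runs are safe, sketched as comments (file names are illustrative):

    # each run rewrites config.yml, recording the newest file it loaded:
    #   last_path: /path/to/sample_01.csv.gz
    # the next run only picks up paths that sort after last_path
    # (e.g. /path/to/sample_02.csv.gz), so a repeated run never
    # loads the same file twice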
  • 64. Use-cases
  • 65. Quipper (from the GDS slide)
  • 66. Other cases > Treasure Data > Embulk worker for automatic import > Web services > Load existing logs into Elasticsearch > Business / batch systems > Database to database > etc…
  • 67. Check: treasuredata.com Cloud service for the entire data pipeline