2.
Who am I
> Masahiro Nakagawa
> github: @repeatedly
> Treasure Data, Inc.
> Senior Software Engineer
> Fluentd / td-agent developer
> Living at OSS :)
> D language - Phobos (a.k.a. the standard library) committer
> Fluentd - Main maintainer
> MessagePack / RPC - D and Python (RPC only)
> Organizer of several meetups (Presto, DTM, etc.)
> etc…
4.
What’s Fluentd?
> Data collector for a unified logging layer
> Streaming data transfer based on JSON
> Simple core + plugins written in Ruby
> Various plugins distributed as Ruby gems
> http://www.fluentd.org/plugins
> List of users
> http://www.fluentd.org/testimonials
5.
Before
✓ duplicated code for error handling...
✓ messy code for retrying mechanism...
10.
Core and Plugins
Core: common concerns
> Divide & Conquer
> Buffering & Retrying
> Error handling
> Message routing
> Parallelism
Plugins: use case specific
> Read / receive data
> Parse data
> Filter data
> Buffer data
> Format data
> Write / send data
11.
Event structure (log message)
✓ Time
> from data source
> default second unit
✓ Tag
> where is it from?
> for message routing
✓ Record
> JSON format
> MessagePack internally
> schema-free
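To make the three parts concrete, here is a minimal Ruby sketch of one event as a (tag, time, record) triple. The `Event` struct and `to_json_line` method are hypothetical names, not Fluentd classes, and JSON is used only for readability, since Fluentd itself uses MessagePack internally.

```ruby
require "json"

# Hypothetical model of one event (not Fluentd's own class):
#   tag    - where the event came from; used for routing
#   time   - Unix time in seconds (the default unit)
#   record - schema-free key/value data
Event = Struct.new(:tag, :time, :record) do
  # Serialize as [tag, time, record]. JSON here is for readability only;
  # Fluentd itself uses MessagePack internally.
  def to_json_line
    JSON.generate([tag, time, record])
  end
end

time = Time.utc(2015, 4, 1, 12, 34, 56).to_i
event = Event.new("backend.apache", time, { "host" => "192.168.0.1", "code" => 200 })
puts event.to_json_line
# → ["backend.apache",1427891696,{"host":"192.168.0.1","code":200}]
```

Because the record is just a hash, two events with the same tag may carry completely different keys; nothing in the struct enforces a schema.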
12.
Reliable streaming data transfer
> Batch vs Stream vs Other stream (micro batch)
> On error: retry, and retry again if needed
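The "error → retry → retry" idea can be sketched with exponential backoff, where each failed attempt waits longer than the previous one before trying again. The helper below is a generic, hypothetical illustration of that technique; `with_retry`, `base_wait`, `max_retries` and `sleeper` are made-up names, not Fluentd configuration parameters.

```ruby
# Hypothetical helper: retry a block with exponential backoff.
# base_wait and max_retries are illustrative, not Fluentd parameters.
# sleeper is injectable so the waits can be recorded instead of slept.
def with_retry(base_wait: 1.0, max_retries: 5, sleeper: ->(seconds) { sleep(seconds) })
  attempt = 0
  begin
    yield attempt
  rescue StandardError
    raise if attempt >= max_retries          # give up and surface the last error
    sleeper.call(base_wait * (2**attempt))   # waits grow: base, 2x, 4x, ...
    attempt += 1
    retry                                    # run the begin block again
  end
end

# Usage: a flaky sender that fails twice, then succeeds.
waits = []
calls = 0
result = with_retry(base_wait: 0.5, sleeper: ->(s) { waits << s }) do |attempt|
  calls += 1
  raise IOError, "destination unavailable" if attempt < 2
  "sent"
end
puts "#{result} after #{calls} calls, waits=#{waits.inspect}"
# → sent after 3 calls, waits=[0.5, 1.0]
```

Doubling the wait keeps a struggling destination from being hammered, while the retry cap stops a permanently broken destination from blocking forever.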
13.
> Sources: Access logs (Apache), App logs (Frontend, Backend), System logs (syslogd), Databases
> Fluentd: buffering / retrying / routing
> Destinations: Alerting (Nagios), Analysis (PostgreSQL, Hadoop, Elasticsearch), Archiving (Amazon S3)
> M x N → M + N plugins
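The "M x N → M + N" point is about integration count. Wiring every source directly to every destination needs up to M × N connectors, while routing everything through one collector needs one input plugin per source plus one output plugin per destination. The small Ruby check below does the arithmetic, assuming for illustration 4 sources and 5 destinations like the ones listed on this slide.

```ruby
# Connectors needed when every source talks to every destination directly.
def point_to_point(sources, destinations)
  sources * destinations
end

# Connectors needed when all traffic goes through one collector:
# one input plugin per source plus one output plugin per destination.
def via_collector(sources, destinations)
  sources + destinations
end

# Illustrative counts: 4 sources (Apache, app logs, syslogd, databases)
# and 5 destinations (Nagios, PostgreSQL, Hadoop, Elasticsearch, Amazon S3).
m, n = 4, 5
puts "direct: #{point_to_point(m, n)}, via collector: #{via_collector(m, n)}"
# → direct: 20, via collector: 9
```

Adding a sixth destination costs 4 new connectors in the direct model but only 1 new output plugin behind the collector.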
16.
# logs from a file
<source>
  type tail
  path /var/log/httpd.log
  pos_file /tmp/pos_file
  format apache2
  tag backend.apache
</source>

# logs from client libraries
<source>
  type forward
  port 24224
</source>

# store logs to MongoDB (requires the fluent-plugin-mongo gem)
<match backend.*>
  type mongo
  database fluent
  collection test
</match>
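Routing hinges on `<match>` patterns. Fluentd's documentation describes `*` as matching exactly one tag part and `**` as matching zero or more tag parts. The sketch below re-implements only those two wildcards so that `<match backend.*>` can be checked by hand. `tag_match?` and `match_parts` are hypothetical helpers, not Fluentd's matcher, and brace patterns such as `{a,b}` are deliberately left out.

```ruby
# Hypothetical, simplified tag matcher (not Fluentd's implementation):
#   *  matches exactly one tag part   (backend.* matches backend.apache)
#   ** matches zero or more parts     (backend.** matches backend, backend.a.b)
# Only whole-part wildcards are handled; {a,b} alternation is omitted.
def tag_match?(pattern, tag)
  match_parts(pattern.split("."), tag.split("."))
end

def match_parts(pat, parts)
  return parts.empty? if pat.empty?
  head, *rest = pat
  if head == "**"
    # Let ** consume 0, 1, 2, ... tag parts and try to match the rest.
    (0..parts.size).any? { |i| match_parts(rest, parts.drop(i)) }
  elsif parts.empty?
    false
  else
    (head == "*" || head == parts.first) && match_parts(rest, parts.drop(1))
  end
end

p tag_match?("backend.*", "backend.apache")      # → true
p tag_match?("backend.*", "backend.apache.err")  # → false
p tag_match?("backend.**", "backend")            # → true
p tag_match?("web.*", "backend.apache")          # → false
```

Events from the `forward` source carry whatever tag the client library sends, so they reach MongoDB only if their tag also fits `backend.*`.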
17.
Less Simple Forwarding
- At-most-once / At-least-once
- HA (failover)
- Load-balancing
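Load balancing and failover can be pictured as a choice of destination node per chunk: active nodes share traffic round-robin, and standby nodes take over only when every active node is down. The `Forwarder` class below is a hypothetical illustration of that policy, not the forward output plugin's code, and it does not model at-most-once vs at-least-once delivery.

```ruby
# Hypothetical sketch of load balancing with failover.
# Active nodes share traffic round-robin; standby nodes are used only
# when every active node is marked down.
class Forwarder
  Node = Struct.new(:name, :standby, :up)

  def initialize(nodes)
    @nodes = nodes
    @cursor = 0
  end

  def mark_down(name)
    node = @nodes.find { |n| n.name == name }
    node.up = false if node
  end

  # Returns the name of the node that should receive the next chunk.
  def pick
    active = @nodes.select { |n| !n.standby && n.up }
    pool = active.empty? ? @nodes.select { |n| n.standby && n.up } : active
    raise "no healthy node" if pool.empty?
    node = pool[@cursor % pool.size]
    @cursor += 1
    node.name
  end
end

nodes = [
  Forwarder::Node.new("agg1", false, true),
  Forwarder::Node.new("agg2", false, true),
  Forwarder::Node.new("backup", true, true)
]
fwd = Forwarder.new(nodes)
first_picks = 4.times.map { fwd.pick }
p first_picks          # → ["agg1", "agg2", "agg1", "agg2"]
fwd.mark_down("agg1")
fwd.mark_down("agg2")
failover_pick = fwd.pick
p failover_pick        # → "backup"
```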
18.
Near realtime and batch combo!
> Hot data (near realtime)
> All data (batch)
19.
# logs from a file
<source>
  type tail
  path /var/log/httpd.log
  pos_file /tmp/pos_file
  format apache2
  tag web.access
</source>

# logs from client libraries
<source>
  type forward
  port 24224
</source>

# store logs to ES and HDFS
<match web.*>
  type copy
  <store>
    # requires the fluent-plugin-elasticsearch gem
    type elasticsearch
    logstash_format true
  </store>
  <store>
    # requires the fluent-plugin-webhdfs gem
    type webhdfs
    host namenode
    port 50070
    path /path/on/hdfs/
  </store>
</match>
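The `copy` output is what makes the near realtime and batch combo work: each event matched by `web.*` goes to every `<store>`. A rough Ruby sketch of that fan-out follows. `CopyOutput` and the lambda "stores" are hypothetical stand-ins for Elasticsearch and HDFS. Handing each store its own deep copy of the record is a design choice of this sketch, not a statement about Fluentd's default behavior.

```ruby
# Hypothetical sketch of a "copy" output: every event routed to it is
# handed to each configured store in turn.
class CopyOutput
  def initialize(*stores)
    @stores = stores
  end

  def emit(tag, time, record)
    # Give each store its own deep copy so one store cannot mutate another's record.
    @stores.each { |store| store.call(tag, time, Marshal.load(Marshal.dump(record))) }
  end
end

hot  = []  # stands in for Elasticsearch (hot data, near realtime search)
cold = []  # stands in for HDFS (all data, batch analysis)
copy = CopyOutput.new(->(_tag, _time, rec) { hot << rec },
                      ->(_tag, _time, rec) { cold << rec })
copy.emit("web.access", 1427891696, { "path" => "/index.html" })
p hot   # → [{"path"=>"/index.html"}]
p cold  # → [{"path"=>"/index.html"}]
```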
20.
CEP for Stream Processing
Norikra is a SQL-based CEP engine: http://norikra.github.io/
28.
Input plugins
File tail (in_tail)
Syslog (in_syslog)
HTTP (in_http)
HTTP/2 (in_http2 WIP)
...
✓ Receive logs
✓ Or pull logs from data sources
✓ non-blocking
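`in_tail` pairs the log path with a `pos_file` so a restart resumes where reading stopped instead of re-reading the whole file. The sketch below shows that idea under simplifying assumptions: `read_new_lines` is a hypothetical helper, it reads synchronously rather than in a non-blocking way, and it ignores log rotation and truncation.

```ruby
require "tmpdir"

# Hypothetical sketch of the idea behind a position file: remember the byte
# offset already read so the next call only returns new, complete lines.
# Rotation and truncation handling are omitted.
def read_new_lines(log_path, pos_path)
  pos = File.exist?(pos_path) ? File.read(pos_path).to_i : 0
  lines = []
  File.open(log_path, "r") do |f|
    f.seek(pos)
    while (line = f.gets)
      break unless line.end_with?("\n")  # leave a partial last line for next time
      lines << line.chomp
      pos = f.pos
    end
  end
  File.write(pos_path, pos.to_s)
  lines
end

first = second = nil
Dir.mktmpdir do |dir|
  log = File.join(dir, "httpd.log")
  pos = File.join(dir, "httpd.log.pos")
  File.write(log, "line 1\nline 2\n")
  first = read_new_lines(log, pos)
  File.open(log, "a") { |f| f.write("line 3\npartial") }
  second = read_new_lines(log, pos)
end
p first   # → ["line 1", "line 2"]
p second  # → ["line 3"]
```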
29.
Parser plugins
JSON
Regexp
Apache/Nginx/Syslog
CSV/TSV
etc.
✓ Parse into JSON
✓ Common formats out of the box
✓ Some input plugins depend on parser plugins
✓ v0.10.46 and above
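A regexp parser turns one raw line into a record by mapping named capture groups to keys. The Ruby sketch below illustrates this with a simplified access-log pattern. `ACCESS_LOG` and `parse_line` are hypothetical, and the pattern is not Fluentd's built-in `apache2` format.

```ruby
# Hypothetical sketch of a regexp parser: named capture groups become record keys.
# Simplified access-log pattern, not Fluentd's built-in apache2 format.
ACCESS_LOG = /\A(?<host>\S+) \S+ (?<user>\S+) \[(?<time>[^\]]+)\] "(?<method>\S+) (?<path>\S+) \S+" (?<code>\d{3}) (?<size>\d+|-)\z/

def parse_line(line)
  m = ACCESS_LOG.match(line)
  return nil unless m                  # unparsable line: caller decides how to report it
  record = m.named_captures            # {"host"=>..., "user"=>..., ...}
  record["code"] = record["code"].to_i # typed field instead of a string
  record
end

line = '192.168.0.1 - - [01/Apr/2015:12:34:56 +0000] "GET /index.html HTTP/1.1" 200 512'
record = parse_line(line)
p record["path"]  # → "/index.html"
p record["code"]  # → 200
```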
30.
Filter plugins
grep
record_transformer
suppress
…
✓ Filter / Mutate record
✓ Record level and Stream level
✓ v0.12 and above
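The two filter granularities differ in what they can see. A record-level filter looks at one record and returns it (possibly modified) or drops it. A stream-level filter sees a whole batch of events, so it can apply rules across records. Below is a hypothetical Ruby sketch in the spirit of `grep` plus `record_transformer` at the record level, with a simple stream-level rule that drops consecutive duplicate messages. The function names and the hard-coded `"web01"` hostname are illustrative, not Fluentd's plugin API.

```ruby
# Record level, in the spirit of grep + record_transformer:
# keep only errors and add fields. "web01" is a hypothetical placeholder.
def filter_record(tag, _time, record)
  return nil unless record["level"] == "error"       # grep-like: drop non-errors
  record.merge("tag" => tag, "hostname" => "web01")  # transformer-like: add fields
end

# Stream level: sees the whole batch, so it can drop consecutive duplicates.
def filter_stream(tag, events)
  last = nil
  events.each_with_object([]) do |(time, record), out|
    filtered = filter_record(tag, time, record)
    next if filtered.nil? || filtered["message"] == last
    last = filtered["message"]
    out << [time, filtered]
  end
end

events = [
  [1, { "level" => "info",  "message" => "ok" }],
  [2, { "level" => "error", "message" => "disk full" }],
  [3, { "level" => "error", "message" => "disk full" }],
  [4, { "level" => "error", "message" => "timeout" }]
]
result = filter_stream("app.log", events)
p result.map { |time, rec| [time, rec["message"]] }
# → [[2, "disk full"], [4, "timeout"]]
```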
31.
Buffer plugins
✓ Improve performance
✓ Provide reliability
✓ Provide thread-safety
Memory (buf_memory)
File (buf_file)
32.
Buffer internal
✓ Chunk = adjustable unit of data
✓ Buffer = Queue of chunks
> Input → [chunk][chunk][chunk] → output
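The chunk/queue structure can be sketched in a few lines. Records are appended to the current chunk; a full chunk is enqueued; the output flushes from the head of the queue, and a chunk whose write fails stays queued so it can be retried. `ChunkBuffer`, `chunk_limit` (counted in records here, not bytes) and `flush_one` are hypothetical names. The sketch is single-threaded, so the thread-safety a real buffer provides is left out.

```ruby
# Hypothetical sketch of a buffer as a queue of chunks (single-threaded).
class ChunkBuffer
  attr_reader :queue

  def initialize(chunk_limit:)
    @chunk_limit = chunk_limit  # records per chunk, for simplicity
    @current = []
    @queue = []
  end

  def append(record)
    @current << record
    enqueue_current if @current.size >= @chunk_limit
  end

  # Move the partially filled chunk into the queue (e.g. on a flush timer).
  def enqueue_current
    @queue << @current unless @current.empty?
    @current = []
  end

  # Write the oldest chunk; remove it from the queue only if the block succeeds.
  def flush_one
    chunk = @queue.first
    return false unless chunk
    yield chunk   # raising here leaves the chunk queued for a retry
    @queue.shift
    true
  end
end

buf = ChunkBuffer.new(chunk_limit: 2)
%w[a b c].each { |r| buf.append(r) }
buf.enqueue_current
p buf.queue  # → [["a", "b"], ["c"]]

begin
  buf.flush_one { |_chunk| raise IOError, "output down" }
rescue IOError
  # the chunk is still at the head of the queue
end
written = []
buf.flush_one { |chunk| written << chunk }
p written    # → [["a", "b"]]
p buf.queue  # → [["c"]]
```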
33.
Formatter plugins
✓ Format output
✓ Some plugins depend on formatter plugins
✓ v0.10.46 and above
JSON
CSV/TSV
“single value”
msgpack
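A formatter decides how a record is rendered by the output. The Ruby sketch below renders one record as JSON, as TSV and as a "single value" (just one field). The function names and the `keys`/`key` arguments are hypothetical, not Fluentd's formatter parameters, and the msgpack formatter is left out because it needs a third-party gem.

```ruby
require "json"

def format_json(record)
  JSON.generate(record) + "\n"
end

# TSV: pick the listed keys, in order, separated by tabs.
def format_tsv(record, keys)
  keys.map { |k| record[k].to_s }.join("\t") + "\n"
end

# "Single value": emit just one field, e.g. the original log message.
def format_single_value(record, key)
  record[key].to_s + "\n"
end

record = { "host" => "192.168.0.1", "code" => 200, "message" => "GET /index.html" }
print format_json(record)                    # {"host":"192.168.0.1","code":200,"message":"GET /index.html"}
print format_tsv(record, %w[host code])      # 192.168.0.1<TAB>200
print format_single_value(record, "message") # GET /index.html
```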
37.
fluent-bit
> Made for Embedded Linux
> OpenEmbedded & Yocto Project
> Intel Edison, Raspberry Pi & BeagleBone Black boards
> https://github.com/fluent/fluent-bit
> Standalone application or Library mode
> Built-in plugins
> input: cpu, kmsg, output: fluentd
> First release at the end of Mar 2015
38.
fluentd-ui
> Manage Fluentd instance via Web UI
> https://github.com/fluent/fluentd-ui
39.
Treasure Agent (td-agent)
> Treasure Data distribution of Fluentd
> including Ruby and QA’ed plugins
> Treasure Agent 2 is current stable
> We recommend v2, not v1
> including fluentd-ui
> The next release, 2.2.0, uses Fluentd v0.12
40.
Embulk
> Bulk Loader version of Fluentd
> Pluggable architecture
> JRuby, JVM languages
> High performance parallel processing
> Share your script as a plugin
> https://github.com/embulk
http://www.slideshare.net/frsyuki/embuk-making-data-integration-works-relaxed