Presto as a Service - Tips for operation and monitoring

Transcript

  • 1. Copyright ©2015 Treasure Data. All Rights Reserved. Presto as a Service: Tips for operation and monitoring. Taro L. Saito, Treasure Data, Inc. leo@treasure-data.com. January 20, 2015. Presto Meetup @ FreakOut, Roppongi
  • 2. About Me: @taroleo • 2007 University of Tokyo, Ph.D. – XML DBMS, Transaction Processing • Relational-Style XML Query. ACM SIGMOD 2008 • ~ 2014 Assistant Professor at University of Tokyo – Genome Science Research • Distributed Computing • March 2014 ~ Treasure Data – Software Engineer, MPP Team Leader
  • 3. My Open Source Projects • sqlite-jdbc – SQLite DBMS for Java – 1 file = 1 DB • snappy-java – Fast compression library – More than 100,000 downloads/month • Used in Spark, Parquet, etc. • msgpack-java • UT Genome Browser (UTGB) – Visualization of massive amounts of genome science data
  • 4. Topics • Presto as a Service in Treasure Data – Error Recovery – Presto Deployment • Tips for Monitoring Presto – JSON API – Presto + Fluentd
  • 5. Treasure Data: Presto as a Service (figure label: Presto Public Release)
  • 6. Diagram labels: TD API / Web Console • Interactive query • Batch query • Hive • Presto • td-presto connector • Treasure Data PlazmaDB: MessagePack Columnar Storage
  • 7. Deployment • Building Presto takes more than 20 minutes. • Facebook frequently releases new versions • Let CircleCI build Presto – Deploy jar files to a private Maven repository – We sometimes use non-release versions • for fixing serious bugs • hot-fix patches • Integration Test – td-presto connector • PlazmaDB, Multi-tenant query scheduler • Query optimizer – Run test queries on the staging cluster
  • 8. Production: Blue-Green Deployment • http://martinfowler.com/bliki/BlueGreenDeployment.html • 2 Presto Coordinators (Blue/Green) – Route Presto queries to the active cluster – No downtime upon deployment • Launch Presto worker instances with chef (takes less than 5 min. in AWS) • The inactive cluster is used for pre-production testing and customer support – Investigation and tuning of customer query performance – Troubleshooting
  • 9. Error Recovery • Presto has no fault tolerance • Error types – User error • Syntax errors – SQL syntax, missing function • Semantic errors – missing tables/columns – Insufficient resources • Exceeded task memory size – Internal failure • I/O error – S3/Riak CS • worker failure • etc. (Callout: Worth a retry!)
  • 10. Failed Query Rate
  • 12. Query Retry Patterns used in TD • Error code + message pattern
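The "error code + message pattern" idea on this slide can be sketched as follows. This is a minimal illustration, not Treasure Data's actual rule set: the error-code names, the regular expressions, the `QueryError` exception, and the `run_with_retry` helper are all hypothetical and chosen only to show the shape of the technique.

```python
import re

# Hypothetical retry rules: (error code, regex applied to the error message).
# The real rules used at TD are not given in the slides.
RETRY_RULES = [
    ("INTERNAL_ERROR", re.compile(r"connection reset|timed out", re.I)),
    ("EXTERNAL", re.compile(r"S3|Riak", re.I)),
]


class QueryError(Exception):
    """Hypothetical exception carrying an error code and a message."""

    def __init__(self, error_code, message):
        super().__init__(f"{error_code}: {message}")
        self.error_code = error_code
        self.message = message


def is_retryable(error_code, message):
    """Return True if some rule matches both the code and the message."""
    return any(
        code == error_code and pattern.search(message) is not None
        for code, pattern in RETRY_RULES
    )


def run_with_retry(run_query, max_attempts=3):
    """Call run_query(); retry only on errors that match a retry rule."""
    for attempt in range(1, max_attempts + 1):
        try:
            return run_query()
        except QueryError as e:
            # Give up on the last attempt or on non-retryable errors
            # (for example, user errors such as SQL syntax errors).
            if attempt == max_attempts or not is_retryable(e.error_code, e.message):
                raise
```

Keeping the rules in a plain table separates the retry policy from the retry loop, so new patterns can be added without touching control flow.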
  • 13. Monitoring Presto • REST API for monitoring Presto state – JSON format • (presto server IP):8080/v1/query – List of recent queries (BasicQueryInfo class) • (presto server IP):8080/v1/query/(query id) – Detailed query state information – Query plan, tasks and running worker IDs – Processed rows/data size
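A minimal sketch of polling the two endpoints named on this slide, using only the Python standard library. The endpoint paths and port come from the slide; the helper names are hypothetical, and the `state` field is an assumption about the JSON layout that may differ between Presto versions. Some deployments may also require authentication or extra request headers, which this sketch omits.

```python
import json
from collections import Counter
from urllib.request import urlopen


def fetch_query_list(coordinator, timeout=10):
    """GET /v1/query: list of recent queries as parsed JSON."""
    url = f"http://{coordinator}:8080/v1/query"
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def fetch_query_info(coordinator, query_id, timeout=10):
    """GET /v1/query/(query id): detailed state of one query."""
    url = f"http://{coordinator}:8080/v1/query/{query_id}"
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def count_by_state(queries):
    """Count queries per state; 'state' is an assumed field name."""
    return Counter(q.get("state", "UNKNOWN") for q in queries)
```

For example, `count_by_state(fetch_query_list("10.0.0.1"))` would give a per-state tally that can be sent to a metrics system; the IP address here is a placeholder.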
  • 14. Query List /v1/query
  • 15. Detailed query Info /v1/query/(query id)
  • 16. /ui/query-execution/(query id)
  • 17. Complex Queries
  • 19. Presto Coordinator • Organizes query execution pipelines – Coordinates Presto workers • Retrieves table partition and split locations from connectors – Creates distributed query plans • Full GC – Stalls the coordinator • When memory is insufficient – Use a memory-rich machine – GC Tuning • CMSInitiatingOccupancyFraction
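As an illustration of the GC tuning item, a coordinator's `etc/jvm.config` could carry HotSpot flags like the ones below. This is a configuration fragment under stated assumptions, not the settings used at Treasure Data: the threshold of 70 is an arbitrary example value, and the flags apply only to JVM versions that still ship the CMS collector.

```
-XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=70
-XX:+UseCMSInitiatingOccupancyOnly
```

A lower `CMSInitiatingOccupancyFraction` starts concurrent collection earlier, which trades some throughput for a smaller chance of falling back to a stop-the-world full GC.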
  • 20. Monitoring Presto with Fluentd (diagram labels: Hive, Presto)
  • 21. presto-metrics (Ruby) • https://github.com/xerial/presto-metrics
  • 24. Detecting Anomalies • Started Query Rate (in 5min/15min) – If no query has started, the cluster may be down (or not started properly) • Processed rows in a query – Sum up the number of processed rows from all of the sub stages – Simple, but the most reliable measure • Send an alert – HipChat notification – PagerDuty call • JP/US team rotation
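The two checks on this slide can be sketched as below. The field names `stageStats`, `processedInputPositions`, and `subStages` are assumptions about the detailed query JSON and may not match a given Presto version; the function names and the 300-second default window (the slide's 5-minute case) are illustrative.

```python
def sum_processed_rows(stage):
    """Recursively sum processed rows over a stage and all of its sub stages.

    Field names are assumed, not confirmed by the slides.
    """
    if not stage:
        return 0
    rows = stage.get("stageStats", {}).get("processedInputPositions", 0)
    return rows + sum(sum_processed_rows(s) for s in stage.get("subStages", []))


def no_query_started(start_times, now, window_sec=300):
    """Return True if no query started within the last window_sec seconds.

    start_times and now are Unix timestamps in seconds.
    """
    return not any(0 <= now - t <= window_sec for t in start_times)
```

A monitoring loop could raise an alert when `no_query_started(...)` is true, or when a long-running query's `sum_processed_rows(...)` value stops increasing between polls.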
  • 25. Benchmarking • Query performance comparison – between two versions of Presto • Benchmark – Run query set multiple times – Store the results to TD – Report the result with Presto • Aggregation query
  • 26. Presto Operation Tool • Prestop – Our internal tool for managing multiple Presto clusters • written in Scala – Query monitoring – Benchmarking – Workload simulation • stress testing • Monitoring – Librato – Datadog – ChartIO (query stats)
  • 27. WE ARE HIRING! Check: www.treasuredata.com