Presto at Twitter

Lessons learned while taking Presto from alpha to production at Twitter. Presented at the Presto meetup at Facebook on 2016.03.22.

  1. Presto at Twitter: From Alpha to Production
     Bill Graham - @billgraham
     Sailesh Mittal - @saileshmittal
     March 22, 2016, Facebook Presto Meetup
  2. Previously at Twitter
     ● Scheduled jobs: Pig
     ● Ad-hoc jobs: Pig
  3. Then
     ● Pig out
     ● Scalding in
  4. Then
     ● Scheduled jobs: Scalding
     ● Ad-hoc queries for engineers: Scalding REPL
     ● Ad-hoc queries for non-engineers: ?
     ● Low-latency queries: ?
  5. Now
     ● Scheduled jobs: Scalding
     ● Ad-hoc queries: Presto
     ● Low-latency queries: Presto
  6. Evaluation
     ● Qualitative comparison early 2015
     ● Considered: Presto, SparkSQL, Impala, Drill, and Hive-on-Tez
     ● Selected Presto
       ○ Maturity: high
       ○ Customer feedback: high
       ○ Ease of deploy: high
       ○ Community: strong, open
       ○ Nested data: yes
       ○ Language: Java
  7. Evaluation
     Thanks to those we consulted with:
     ● Cloudera
     ● HortonWorks
     ● Yahoo
     ● MapR
     ● Rocana
     ● Stripe
     ● Playtika
     ● Facebook
     ● Dropbox
     ● Neilson
     ● TellApart
     ● Netflix
     ● JD.com
  8. Alpha to Beta to Production
     ● Deployment
     ● Integration
     ● Monitoring/Alerting
     ● Log Collection
     ● Authorization
     ● Stability
  9. Cluster
     ● 192 bare-metal workers
     ● 76GB RAM
     ● 24 cores
     ● 2 x 1 GbE NIC
  10. Deployment
     ● Publish to internal maven repo
     ● Python + pssh
     ● Brittle
  11. Mesos/Aurora
     ● Building a dedicated Mesos cluster
       ○ 200 nodes
       ○ 128GB RAM
       ○ 56 cores
       ○ 10 GbE
     ● One worker per container per host
     ● Consistent support model within Twitter
  12. Integration
     (architecture diagram: Presto alongside the Hive Metastore, HDFS, MySQL, DAL, the Event Queue, and the Data Pipeline / Scalding / ETL)
  13. Monitoring & Alerting
     ● Internal system called viz
     ● Plugin on each node
     ● curl JMX stats and send (sketch below)
     ● Load spiky by nature, alerts hard
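
     A minimal sketch of the "curl JMX stats and send" loop, in Java: fetch one MBean from a JMX-over-HTTP endpoint on the local node and hand the payload to the metrics pipeline. The /v1/jmx/mbean path (Airlift's JMX HTTP resource), the port, and the MBean name are assumptions here, and printing stands in for the internal viz sender.

       import java.net.URI;
       import java.net.http.HttpClient;
       import java.net.http.HttpRequest;
       import java.net.http.HttpResponse;

       public class JmxScraper
       {
           // Assumed endpoint on the local Presto server; adjust to whatever your build exposes.
           private static final String MBEAN_URL =
                   "http://localhost:8080/v1/jmx/mbean/java.lang:type=Memory";

           public static void main(String[] args) throws Exception
           {
               HttpClient client = HttpClient.newHttpClient();
               HttpRequest request = HttpRequest.newBuilder(URI.create(MBEAN_URL)).GET().build();

               // The real plugin runs this on a timer, parses the JSON attributes, and pushes
               // values to viz; printing the raw body keeps the sketch small.
               HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
               System.out.println(response.body());
           }
       }
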
  14. Monitoring & Alerting
  15. Log Collection
     ● Internal system called loglens
     ● Java LogHandler adapters (sketch below)
     ● Airlift integration challenges
     ● Using Python log tailing adapter
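
     Presto's logging (via Airlift) sits on java.util.logging, so a "Java LogHandler adapter" is essentially a Handler subclass that forwards each record. The sketch below is an illustration only; the Consumer<String> stands in for the real loglens transport, which is not part of the deck.

       import java.util.function.Consumer;
       import java.util.logging.Handler;
       import java.util.logging.LogRecord;
       import java.util.logging.SimpleFormatter;

       // Forwards formatted log records to a sink (e.g. a socket writer for loglens).
       public class LoglensLogHandler
               extends Handler
       {
           private final Consumer<String> sender;

           public LoglensLogHandler(Consumer<String> sender)
           {
               this.sender = sender;
               setFormatter(new SimpleFormatter());
           }

           @Override
           public void publish(LogRecord record)
           {
               if (isLoggable(record)) {
                   sender.accept(getFormatter().format(record));
               }
           }

           @Override
           public void flush() {}

           @Override
           public void close() {}
       }

     Registering it on the root logger (Logger.getLogger("").addHandler(...)) is the usual hook; the Airlift integration challenges mentioned above made that wiring harder than it looks, which is why the Python log tailing adapter won out.
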
  16. Log Collection
  17. Authorization
     ● User-level auth required
     ● UGI.proxyUser when accessing HDFS (PR #4382 and Teradata PR #105); see the sketch below
     ● Manage access via LDAP groups per cluster that Presto can proxy as
     ● HMS cache complicates things
     ● HMS file-based auth on writes only
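
     A minimal sketch of the proxy-user pattern referenced above: the long-lived Presto service principal impersonates the querying user via UserGroupInformation.createProxyUser before touching HDFS, so file permissions are enforced per end user. The user name and path are placeholders, and the cluster must whitelist the service user via hadoop.proxyuser.* settings in core-site.xml; the actual change lives in the PRs cited on the slide.

       import java.security.PrivilegedExceptionAction;

       import org.apache.hadoop.conf.Configuration;
       import org.apache.hadoop.fs.FileStatus;
       import org.apache.hadoop.fs.FileSystem;
       import org.apache.hadoop.fs.Path;
       import org.apache.hadoop.security.UserGroupInformation;

       public class ProxyUserExample
       {
           public static void main(String[] args) throws Exception
           {
               Configuration conf = new Configuration();
               String endUser = "alice";  // placeholder for the authenticated query user

               // Impersonate the end user on top of the logged-in Presto service principal.
               UserGroupInformation proxyUgi = UserGroupInformation.createProxyUser(
                       endUser, UserGroupInformation.getLoginUser());

               // All HDFS access inside doAs() is performed (and authorized) as the end user.
               FileStatus[] listing = proxyUgi.doAs((PrivilegedExceptionAction<FileStatus[]>) () -> {
                   FileSystem fs = FileSystem.get(conf);
                   return fs.listStatus(new Path("/user/" + endUser));
               });

               for (FileStatus status : listing) {
                   System.out.println(status.getPath());
               }
           }
       }
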
  18. Authorization Challenges
     ● Hadoop client memory leaks (user x query x FileSystems)
     ● GC Pressure on coordinator
     ● Implemented FileSystem cache (user x FileSystems); see the sketch below
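
     One way to picture the fix described above, sketched with a Guava LoadingCache keyed only by user so repeated queries reuse a single Hadoop client instead of leaking one per (user x query). This is an illustration of the idea, not the actual patch; the namenode URI is a placeholder.

       import java.net.URI;
       import java.util.concurrent.ExecutionException;

       import com.google.common.cache.CacheBuilder;
       import com.google.common.cache.CacheLoader;
       import com.google.common.cache.LoadingCache;
       import org.apache.hadoop.conf.Configuration;
       import org.apache.hadoop.fs.FileSystem;

       public class UserFileSystemCache
       {
           private final Configuration conf = new Configuration();
           private final URI hdfsUri = URI.create("hdfs://namenode:8020");  // placeholder

           // One FileSystem per user, shared across that user's queries.
           private final LoadingCache<String, FileSystem> cache = CacheBuilder.newBuilder()
                   .maximumSize(1000)
                   .build(new CacheLoader<String, FileSystem>()
                   {
                       @Override
                       public FileSystem load(String user) throws Exception
                       {
                           // newInstance() avoids Hadoop's global FileSystem cache, so closing
                           // one user's client never invalidates another user's.
                           return FileSystem.newInstance(hdfsUri, conf, user);
                       }
                   });

           public FileSystem forUser(String user) throws ExecutionException
           {
               return cache.get(user);
           }
       }
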
  19. Stability #1
     java.lang.OutOfMemoryError: unable to create new native thread
     ● Queries failing on the coordinator
     ● Coordinator is thread-hungry, up to 1500 threads
     ● Default user process limit is 1024
       $ ulimit -u 1024
     ● Increase ulimit (see below)
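
     Raising the ulimit persistently usually means a higher per-user process/thread limit for the account Presto runs as, e.g. via a limits.d drop-in. The user name and value below are illustrative, not the settings used at Twitter.

       # /etc/security/limits.d/presto.conf (illustrative)
       presto    soft    nproc    8192
       presto    hard    nproc    8192
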
  20. Stability #2
     Encountered too many errors talking to a worker node
     ● Outbound network spikes hitting caps (300 Mb/s)
     ● Coordinator sending plan was costly (Fixed in PR #4538)
     ● Tuned timeouts
     ● Increased Tx cap
  21. Stability #3
     Encountered too many errors talking to a worker node
     ● Timeouts still being hit
     ● Correlated GC pauses with errors
     ● Tuned GC
     ● Changed to the G1 GC collector - BAM! (example jvm.config below)
     ● 10s of seconds -> 100s of millis
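
     For reference, switching a Presto JVM to G1 is a few lines in etc/jvm.config. The flags below follow common Presto guidance, but the heap size is only an illustrative fit for the 76GB machines described earlier, not the value used at Twitter.

       -server
       -Xmx54G
       -XX:+UseG1GC
       -XX:G1HeapRegionSize=32M
       -XX:+ExplicitGCInvokesConcurrent
       -XX:+HeapDumpOnOutOfMemoryError
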
  22. Stability #3
     G1 Garbage Collector
  23. Stability #4
     No worker nodes available
     ● Happens sporadically
     ● Network, HTTP responses, GC all look good
     ● Problem: workers saturating NICs (2 Gb/sec)
     ● Solution #1: reduce task.max-worker-threads (example below)
     ● Solution #2: Larger NICs
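
     task.max-worker-threads is set in each worker's etc/config.properties; its default scales with the core count, so capping it reduces the number of splits a node processes concurrently and, with them, the outbound traffic. The value below is illustrative only.

       # etc/config.properties on each worker (illustrative value)
       task.max-worker-threads=48
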
  24. Lessons Learned
     ● Distributed log collection
     ● Metrics tracking
     ● Measure and tune JVM pauses
     ● G1 Garbage Collector
     ● Measure network/NIC throughput vs capacity
  25. Future Work
     ● MySQL connector with per-user auth
     ● Support for LZO/Thrift
     ● Improvements for Parquet nested data structures
  26. Q&A
