Scaling Apache Storm - Strata + Hadoop World 2014

Scaling Apache Storm: Cluster Sizing and Performance Optimization

Slides from my presentation at Strata + Hadoop World 2014

Scaling Apache Storm - Strata + Hadoop World 2014 Presentation Transcript

  • 1. Scaling Apache Storm P. Taylor Goetz, Hortonworks @ptgoetz
  • 2. About Me Member of Technical Staff / Storm Tech Lead @ Hortonworks Apache Storm PMC Chair @ Apache
  • 3. About Me Member of Technical Staff / Storm Tech Lead @ Hortonworks Apache Storm PMC Chair @ Apache Volunteer Firefighter since 2004
  • 4. 1M+ messages / sec. on a 10-15 node cluster How do you get there?
  • 5. How do you fight fire?
  • 6. Put the wet stuff on the red stuff. Water, and lots of it.
  • 7. When you're dealing with big fire, you need big water.
  • 8. Static Water Sources Lakes Streams Reservoirs, Pools, Ponds
  • 9. Data Hydrant Active source Under pressure
  • 10. How does this relate to Storm?
  • 11. Little’s Law: L = λW. The long-term average number of customers in a stable system, L, is equal to the long-term average effective arrival rate, λ, multiplied by the average time a customer spends in the system, W. http://en.wikipedia.org/wiki/Little's_law
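    To make the formula concrete with hypothetical numbers: at an arrival rate of λ = 1,000,000 tuples/sec and an average time in the system of W = 50 ms, the topology carries L = λW = 1,000,000 × 0.05 = 50,000 tuples in flight on average. If buffers and in-flight limits cannot absorb that many, queues back up.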
  • 12. Batch vs. Streaming
  • 13. Batch Processing Operates on data at rest Velocity is a function of performance Poor performance costs you time
  • 14. Stream Processing Data in motion At the mercy of your data source Velocity fluctuates over time Poor performance….
  • 15. Poor performance bursts the pipes. Buffers fill up and eat memory Timeouts / Replays “Sink” systems overwhelmed
  • 16. What can developers do?
  • 17. Keep tuple processing code tight. Worry about this: execute(), the per-tuple hot path.

        public class MyBolt extends BaseRichBolt {

            public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
                // initialize task
            }

            public void execute(Tuple input) {
                // process input — QUICKLY!
            }

            public void declareOutputFields(OutputFieldsDeclarer declarer) {
                // declare output
            }
        }
  • 18. Keep tuple processing code tight. Not this: the one-time prepare() and declareOutputFields() methods shown above.
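    A minimal sketch of the idea (an assumption, not code from the slides; the class name is hypothetical): do one-time setup, such as compiling a regex, in prepare() so that execute() touches only the tuple itself.

        import java.util.Map;
        import java.util.regex.Pattern;

        import backtype.storm.task.OutputCollector;
        import backtype.storm.task.TopologyContext;
        import backtype.storm.topology.OutputFieldsDeclarer;
        import backtype.storm.topology.base.BaseRichBolt;
        import backtype.storm.tuple.Fields;
        import backtype.storm.tuple.Tuple;
        import backtype.storm.tuple.Values;

        public class SplitSentenceBolt extends BaseRichBolt {
            private OutputCollector collector;
            private Pattern whitespace;

            public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
                this.collector = collector;
                // One-time, comparatively expensive setup lives here, not in execute().
                this.whitespace = Pattern.compile("\\s+");
            }

            public void execute(Tuple input) {
                // Hot path: no setup, no blocking calls, just split and emit.
                for (String word : whitespace.split(input.getStringByField("sentence"))) {
                    collector.emit(input, new Values(word));
                }
                collector.ack(input);
            }

            public void declareOutputFields(OutputFieldsDeclarer declarer) {
                declarer.declare(new Fields("word"));
            }
        }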
  • 19. Know your latencies (https://gist.github.com/jboner/2841832):

        L1 cache reference                        0.5 ns
        Branch mispredict                           5 ns
        L2 cache reference                          7 ns     14x L1 cache
        Mutex lock/unlock                          25 ns
        Main memory reference                     100 ns     20x L2 cache, 200x L1 cache
        Compress 1K bytes with Zippy            3,000 ns
        Send 1K bytes over 1 Gbps network      10,000 ns     0.01 ms
        Read 4K randomly from SSD*            150,000 ns     0.15 ms
        Read 1 MB sequentially from memory    250,000 ns     0.25 ms
        Round trip within same datacenter     500,000 ns     0.5 ms
        Read 1 MB sequentially from SSD*    1,000,000 ns     1 ms      4x memory
        Disk seek                          10,000,000 ns     10 ms     20x datacenter round trip
        Read 1 MB sequentially from disk   20,000,000 ns     20 ms     80x memory, 20x SSD
        Send packet CA->Netherlands->CA   150,000,000 ns     150 ms
  • 20. Use a Cache Guava is your friend.
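    A sketch of this suggestion, under the assumption that the expensive work is a per-key external lookup (class and method names here are hypothetical, not from the slides): a bounded, time-expiring Guava cache keeps hot keys in memory so execute() rarely blocks on the slow call.

        import java.util.Map;
        import java.util.concurrent.TimeUnit;

        import com.google.common.cache.Cache;
        import com.google.common.cache.CacheBuilder;

        import backtype.storm.task.OutputCollector;
        import backtype.storm.task.TopologyContext;
        import backtype.storm.topology.OutputFieldsDeclarer;
        import backtype.storm.topology.base.BaseRichBolt;
        import backtype.storm.tuple.Fields;
        import backtype.storm.tuple.Tuple;
        import backtype.storm.tuple.Values;

        public class CachingLookupBolt extends BaseRichBolt {
            private OutputCollector collector;
            private Cache<String, String> cache;

            public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
                this.collector = collector;
                // Bound the cache so it cannot eat the worker's heap.
                this.cache = CacheBuilder.newBuilder()
                        .maximumSize(10000)
                        .expireAfterWrite(10, TimeUnit.MINUTES)
                        .build();
            }

            public void execute(Tuple input) {
                String key = input.getStringByField("key");
                String value = cache.getIfPresent(key);
                if (value == null) {
                    value = expensiveLookup(key); // e.g. a database or REST call
                    cache.put(key, value);
                }
                collector.emit(input, new Values(key, value));
                collector.ack(input);
            }

            public void declareOutputFields(OutputFieldsDeclarer declarer) {
                declarer.declare(new Fields("key", "value"));
            }

            private String expensiveLookup(String key) {
                // Hypothetical placeholder for a slow external call.
                return key.toUpperCase();
            }
        }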
  • 21. Expose your knobs and gauges. DevOps will appreciate it.
  • 22. Externalize Configuration. Hard-coded values require recompilation/repackaging:

        conf.setNumWorkers(3);
        builder.setSpout("spout", new RandomSentenceSpout(), 5);
        builder.setBolt("split", new SplitSentence(), 8).shuffleGrouping("spout");
        builder.setBolt("count", new WordCount(), 12).fieldsGrouping("split", new Fields("word"));

        Values from external config require no repackaging:

        conf.setNumWorkers(Integer.parseInt(props.getProperty("num.workers")));
        builder.setSpout("spout", new RandomSentenceSpout(), Integer.parseInt(props.getProperty("spout.parallelism")));
        builder.setBolt("split", new SplitSentence(), Integer.parseInt(props.getProperty("split.parallelism"))).shuffleGrouping("spout");
        builder.setBolt("count", new WordCount(), Integer.parseInt(props.getProperty("count.parallelism"))).fieldsGrouping("split", new Fields("word"));
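    The slide leaves open where a props object like the one above comes from. One simple option (an assumption, not the presenter's code) is a plain java.util.Properties file whose path is passed on the command line, so tuning values change without rebuilding the jar:

        import java.io.FileInputStream;
        import java.io.IOException;
        import java.util.Properties;

        public class TopologyProps {
            public static Properties load(String path) throws IOException {
                Properties props = new Properties();
                try (FileInputStream in = new FileInputStream(path)) {
                    props.load(in);  // e.g. num.workers=3, spout.parallelism=5, ...
                }
                return props;
            }

            public static void main(String[] args) throws IOException {
                // Pass the config file path as the first argument when submitting the topology.
                Properties props = load(args[0]);
                System.out.println("num.workers = " + props.getProperty("num.workers", "3"));
            }
        }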
  • 23. What can DevOps do?
  • 24. How big is your hose?
  • 25. Find out!
  • 26. Performance testing is essential!
  • 27. How to deal with small pipes? (i.e. When your output is more like a garden hose.)
  • 28. Parallelize Slow sinks
  • 29. Parallelism == Manifold Take input from one big pipe and distribute it to many smaller pipes The bigger the size difference, the more parallelism you will need
  • 30. Sizeup Initial assessment
  • 31. Every fire is different.
  • 33. Every streaming use case is different.
  • 34. Sizeup — Fire What are my water sources? What GPM (gallons per minute) can they support? How many lines (hoses) do I need? How much water will I need to flow to put this fire out?
  • 35. Sizeup — Storm What are my input sources? At what rate do they deliver messages? What size are the messages? What's my slowest data sink?
  • 36. There is no magic bullet.
  • 37. But there are good starting points.
  • 38. Numbers Where to start.
  • 39. 1 Worker / Machine / Topology Keep unnecessary network transfer to a minimum
  • 40. 1 Acker / Worker Default in Storm 0.9.x
  • 41. 1 Executor / CPU Core Optimize Thread/CPU usage
  • 42. 1 Executor / CPU Core (for CPU-bound use cases)
  • 43. 1 Executor / CPU Core Multiply by 10x-100x for I/O bound use cases
  • 44. Example 10 Worker Nodes 16 Cores / Machine 10 * 16 = 160 “Parallelism Units” available
  • 45. Example 10 Worker Nodes 16 Cores / Machine 10 * 16 = 160 “Parallelism Units” available. Subtract # Ackers: 160 - 10 = 150 Units.
  • 46. Example 10 Worker Nodes 16 Cores / Machine (10 * 16) - 10 = 150 “Parallelism Units” available
  • 47. Example 10 Worker Nodes 16 Cores / Machine (10 * 16) - 10 = 150 “Parallelism Units” available (* 10-100 if I/O bound). Distribute this among tasks in the topology: higher for slow tasks, lower for fast tasks.
  • 48. Example: 150 “Parallelism Units” available. Emit: 10, Calculate: 40, Persist: 100.
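    A sketch of how that 10/40/100 split might look in topology code (EmitSpout, CalculateBolt, and PersistBolt are hypothetical placeholders for your own components; the grouping choices are illustrative):

        import backtype.storm.Config;
        import backtype.storm.topology.TopologyBuilder;
        import backtype.storm.tuple.Fields;

        public class SizedTopology {
            public static void main(String[] args) {
                TopologyBuilder builder = new TopologyBuilder();
                // 150 parallelism units spread across the pipeline: 10 + 40 + 100.
                builder.setSpout("emit", new EmitSpout(), 10);
                builder.setBolt("calculate", new CalculateBolt(), 40).shuffleGrouping("emit");
                builder.setBolt("persist", new PersistBolt(), 100).fieldsGrouping("calculate", new Fields("key"));

                Config conf = new Config();
                conf.setNumWorkers(10);  // 1 worker per machine per topology (10 nodes)
                conf.setNumAckers(10);   // 1 acker per worker
                // Submit with StormSubmitter (cluster) or LocalCluster (testing) as usual.
            }
        }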
  • 49. Watch Storm’s “capacity” metric This tells you how hard components are working. Adjust parallelism unit distribution accordingly.
  • 50. This is just a starting point. Test, test, test. Measure, measure, measure.
  • 51. Internal Messaging Handling backpressure.
  • 52. Internal Messaging (Intra-worker)
  • 53. Key Settings topology.max.spout.pending Spout/Bolt API: Controls how many tuples are in-flight (not ack’ed) Trident API: Controls how many batches are in flight (not committed)
  • 54. Key Settings topology.max.spout.pending When reached, Storm will temporarily stop emitting data from Spout(s) WARNING: Default is “unset” (i.e. no limit)
  • 55. Key Settings topology.max.spout.pending Spout/Bolt API: Start High (~1,000) Trident API: Start Low (~1-5)
  • 56. Key Settings topology.message.timeout.secs Controls how long a tuple tree (Spout/Bolt API) or batch (Trident API) has to complete processing before Storm considers it timed out and fails it. Default value is 30 seconds.
  • 57. Key Settings topology.message.timeout.secs Q: “Why am I getting tuple/batch failures for no apparent reason?” A: Timeouts due to a bottleneck. Solution: Look at the “Complete Latency” metric. Increase timeout and/or increase component parallelism to address the bottleneck.
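    A sketch of setting those two knobs on a topology Config (the values are the suggested starting points from the slides, not tuned numbers):

        import backtype.storm.Config;

        public class KeySettings {
            public static Config startingPoint() {
                Config conf = new Config();
                // topology.max.spout.pending: start high (~1,000) for the Spout/Bolt API.
                conf.setMaxSpoutPending(1000);
                // topology.message.timeout.secs: default is 30s; raise it if the "Complete Latency"
                // metric shows tuples timing out at a bottleneck rather than failing outright.
                conf.setMessageTimeoutSecs(30);
                return conf;
            }
        }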
  • 58. Turn knobs slowly, one at a time.
  • 59. Don't mess with settings you don't understand.
  • 60. Storm ships with sane defaults Override only as necessary
  • 61. Hardware Considerations
  • 62. Nimbus Generally light load Can collocate Storm UI service m1.xlarge (or equivalent) should suffice Save the big metal for Supervisor/Worker machines…
  • 63. Supervisor/Worker Nodes Where hardware choices have the most impact.
  • 64. CPU Cores More is usually better The more you have the more threads you can support (i.e. parallelism) Storm potentially uses a lot of threads
  • 65. Memory Highly use-case specific How many workers (JVMs) per node? Are you caching and/or holding in-memory state? Tests/metrics are your friends
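    One place the per-worker memory decision shows up is the worker JVM heap, which can be set per topology via worker child options (the 2 GB value below is purely illustrative; measure cache and state usage before settling on a number):

        import backtype.storm.Config;

        public class WorkerMemory {
            public static Config withWorkerHeap() {
                Config conf = new Config();
                // Hypothetical 2 GB heap per worker JVM; size it from measured usage.
                conf.put(Config.TOPOLOGY_WORKER_CHILDOPTS, "-Xmx2g");
                return conf;
            }
        }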
  • 66. Network Use bonded NICs if necessary Keep nodes “close”
  • 67. Other performance considerations
  • 68. Don’t “Pancake!” Separate concerns.
  • 69. Don’t “Pancake!” Separate concerns. CPU Contention I/O Contention Disk Seeks (ZooKeeper)
  • 70. Keep this guy happy. He has big boots and a shovel.
  • 71. ZooKeeper Considerations Use dedicated machines, preferably bare-metal if an option Start with 3 node ensemble (can tolerate 1 node loss) I/O is ZooKeeper’s main bottleneck Dedicated disk for ZK storage SSDs greatly improve performance
  • 72. Recap Know/track your latencies and code appropriately Externalize configuration Scaling is a factor of balancing the I/O and CPU requirements of your use case Dev + DevOps + Ops coordination and collaboration is essential
  • 73. Thanks! P. Taylor Goetz, Hortonworks @ptgoetz