
Redis at LINE

Shunsuke Nakamura (中村 俊介, LINE) / Redis at LINE
Presentation slides from "LINE Developer Meetup #36 - Redis -".



  1. Redis at LINE
  2. About me
     ● Shunsuke Nakamura (中村 俊介)
     ● At LINE: Redis team lead, messaging server tech lead
     ● Previously worked on HBase scalability for LINE
     ● Past talks:
       ● "Storage infrastructure using HBase behind LINE messages"
         https://www.slideshare.net/naverjapan/storage-infrastructure-using-hbase-behind-line-messages
       ● a talk on HBase and Redis behind LINE messaging
         https://www.slideshare.net/linecorp/a-5-47983106
  3. Redis talks for LINE messaging
     ● RedisConf18 @ San Francisco
       ● "Redis at LINE, 25 Billion Messages Per Day"
         https://www.slideshare.net/RedisLabs/redisconf18-redis-at-line-25-billion-messages-per-day (open today!)
     ● Tech Planet 2015 @ Korea
       ● "LINE Redis Cluster"
         https://www.slideshare.net/lovejinstar/redis-at-line-tech-planet-2015
  4. Agenda
     ● How LINE messaging uses Redis
     ● Redis details for LINE messaging
     ● Challenge to the Redis 3.2 official cluster
     ● Daily handling of Redis hotspots
  5. Redis for LINE messaging (as of 2011.6)
     ● LINE messaging has used Redis since the service launched in 2011
     ● messaging data is served in-memory
     ● started as 3 shards x 2 (master/slave) with client-side sharding
  6. Redis for LINE messaging (2011 → 2018)
     ● as of 2018.5:
       ● 60 Redis clusters
       ● 2 ~ 2,000 master+slave nodes per cluster
       ● 14,000 Redis nodes
       ● 4 commands/sec
       ● 2 keys, 60 TB of memory used
  7. LINE is powered by Redis
     ● sequences
       ● user/group/message sequence IDs
       ● event revisions
     ● caches
       ● message/event time-series data with TTLs
       ● immutable read-heavy data
     ● storages
       ● secondary indexes (e.g. follower list)
       ● CAS values (e.g. unread badge count)
     ● local queue in each API server
       ● async tasks for the API server (IO, RPC context)
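The unread badge count above is a check-and-set (CAS) value; a minimal sketch of that pattern as an optimistic compare-and-swap loop, with an in-memory `AtomicLong` standing in for Redis `WATCH`/`MULTI`/`EXEC` (class and method names are hypothetical, not LINE's actual code):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// In-memory simulation of a CAS-style counter such as an unread badge count.
public class BadgeStore {
    private final ConcurrentHashMap<String, AtomicLong> counts = new ConcurrentHashMap<>();

    // Optimistic CAS loop: read, compute, swap; retry if a concurrent writer won.
    public long increment(String userId, long delta) {
        AtomicLong c = counts.computeIfAbsent(userId, k -> new AtomicLong());
        while (true) {
            long old = c.get();
            long updated = Math.max(0, old + delta); // badge count never goes negative
            if (c.compareAndSet(old, updated)) {
                return updated;
            }
            // CAS failed: another writer updated the value first; retry.
        }
    }

    public long get(String userId) {
        AtomicLong c = counts.get(userId);
        return c == null ? 0 : c.get();
    }
}
```

Against real Redis the same shape would be a `WATCH` on the key, a `MULTI`/`EXEC` transaction, and a retry when `EXEC` aborts.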
  8. How messaging uses Redis
     1. The app calls the sendMessage thrift API
     2. Acquire a messageId from a Redis sequence
     3. Store the Message and Event into Redis caches
        1. Store into HBase storage
        2. Enqueue a task to the local Redis queue if the HBase write fails
     4. Check receiver info with Redis storages
     5. Deliver and notify the event to receivers
     (Diagram: Gateway → API servers → Redis sequences/caches/storages and local queues)
  9. Redis server at LINE
     ● Redis versions 2.8 ~ 3.2
     ● multiple nodes per host
     ● 10 Gbps network
     ● NIC interrupt CPU affinity tuned per Redis node
     ● Redis node setup:
       ● standalone: 1 master to 1 slave, with master and slave on different hosts
       ● diskless: no BGSAVE backup, no AOF, no Virtual Memory
       ● non-HA: no Sentinel, no slave reads
  10. In-house cluster at LINE
      ● built before the 3.0 official cluster existed (2011~2012)
      ● proxy-less client-side sharding
      ● ZooKeeper: holds the shard configuration
      ● Cluster Manager Server: operates the cluster
      ● LINE Redis Client:
        ● watches ZooKeeper for the shard map
        ● maps each key to a shard by hash (MurmurHash3)
        ● sends Redis commands to the shard's master
  11. Redis client at LINE
      ● Jedis (sync) or Lettuce (async) Java client
      ● ser/des templates for commands
      ● client-side metrics per Redis command
      ● availability:
        ● back pressure: driven by ZooKeeper
        ● circuit breaker: fail fast
        ● replicated clusters for reads
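The fail-fast circuit breaker can be sketched like this: after a number of consecutive failures the breaker opens and requests fail immediately until a cooldown elapses. Thresholds, names, and the time-as-parameter style are illustrative, not the actual client's API:

```java
// Minimal circuit breaker: closed -> open after N consecutive failures,
// open -> closed again once the cooldown has elapsed (a simplified
// half-open step). Time is passed in explicitly to keep it testable.
public class CircuitBreaker {
    private final int failureThreshold;
    private final long cooldownMillis;
    private int consecutiveFailures = 0;
    private long openedAt = -1; // -1 means the breaker is closed

    public CircuitBreaker(int failureThreshold, long cooldownMillis) {
        this.failureThreshold = failureThreshold;
        this.cooldownMillis = cooldownMillis;
    }

    public synchronized boolean allowRequest(long nowMillis) {
        if (openedAt < 0) return true;                 // closed: let it through
        if (nowMillis - openedAt >= cooldownMillis) {  // cooldown over: try again
            openedAt = -1;
            consecutiveFailures = 0;
            return true;
        }
        return false;                                  // open: fail fast, no Redis call
    }

    public synchronized void recordSuccess() { consecutiveFailures = 0; }

    public synchronized void recordFailure(long nowMillis) {
        if (++consecutiveFailures >= failureThreshold) openedAt = nowMillis;
    }
}
```

Failing fast matters here because a stalled Redis node would otherwise tie up API-server threads waiting on timeouts, amplifying the outage.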
  12. Replicated clusters
      ● replica Redis clusters serve cache reads
      ● the client picks a replica cluster at random and falls back to the
        origin storage on a miss (read-through cache)
      ● example: LINE official accounts with large follower counts
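The read-through pattern above, simulated with in-memory maps standing in for the replica clusters and the origin storage (names and the backfill-on-miss detail are illustrative assumptions):

```java
import java.util.List;
import java.util.Map;
import java.util.Random;

// Read-through replicated cache: pick a replica at random for the read,
// and on a miss fall back to origin storage and backfill that replica.
public class ReplicatedCache {
    private final List<Map<String, String>> replicas;
    private final Map<String, String> origin;
    private final Random random = new Random();

    public ReplicatedCache(List<Map<String, String>> replicas, Map<String, String> origin) {
        this.replicas = replicas;
        this.origin = origin;
    }

    public String get(String key) {
        Map<String, String> replica = replicas.get(random.nextInt(replicas.size()));
        String v = replica.get(key);
        if (v != null) return v;            // cache hit on the chosen replica
        v = origin.get(key);                // miss: fall back to origin storage
        if (v != null) replica.put(key, v); // read-through backfill
        return v;
    }
}
```

Random replica selection spreads a hot key (such as a popular official account) across several clusters instead of hammering one node.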
  13. Recent works in the LINE Redis team
      ● Redis 3.2 official cluster
      ● async Redis client
      ● Redis hotspot handling
  14. Challenge to the Redis 3.2 official cluster
      ● motivation: the in-house cluster is 7 years old...
        ● no dynamic resizing: growing it means building a second cluster and
          doing a cluster-to-cluster migration
        ● keep up with Redis OSS
      ● migrated a real cache cluster to the 3.2 official cluster
        ● switched the client from the in-house cluster client to the Jedis Cluster client
        ● crash-tested it against real service traffic
      ● => can the Redis 3.2 cluster become how the in-house cluster resizes?
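Unlike the in-house cluster's hash-mod sharding, the official cluster routes keys to one of 16384 hash slots, which is what makes dynamic resizing possible: slots (not keys) move between nodes. A sketch of the slot computation from the Redis Cluster specification (CRC16/XModem mod 16384, honoring `{hash tags}`):

```java
import java.nio.charset.StandardCharsets;

// Redis Cluster key -> hash slot, per https://redis.io/topics/cluster-spec.
public class HashSlot {
    // CRC-16/XMODEM: polynomial 0x1021, init 0, no reflection (as in the spec).
    static int crc16(byte[] data) {
        int crc = 0;
        for (byte b : data) {
            crc ^= (b & 0xff) << 8;
            for (int i = 0; i < 8; i++) {
                crc = ((crc & 0x8000) != 0) ? ((crc << 1) ^ 0x1021) : (crc << 1);
                crc &= 0xffff;
            }
        }
        return crc;
    }

    public static int slotFor(String key) {
        // Hash tag rule: if the key contains a non-empty {...} section,
        // only that substring is hashed, so related keys share a slot.
        int open = key.indexOf('{');
        if (open >= 0) {
            int close = key.indexOf('}', open + 1);
            if (close > open + 1) key = key.substring(open + 1, close);
        }
        return crc16(key.getBytes(StandardCharsets.UTF_8)) % 16384;
    }
}
```

Resizing then becomes `CLUSTER SETSLOT`-style slot migration between nodes while clients follow `MOVED`/`ASK` redirections, rather than a whole-cluster copy.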
  15. 3 issues of Redis 3.2 cluster for LINE
      ● #1: gossip traffic around 1,000 nodes
        ● the Redis official document says
          "High performance and linear scalability up to 1,000 nodes."
          https://redis.io/topics/cluster-spec
        ● => but the LINE in-house cluster has 1,400 master nodes
        ● gossip issue: https://github.com/antirez/redis/issues/3929
      ● #2: PSYNC1
        ● => full resync on failover (next slide)
      ● #3: clustering memory overhead
        ● => hurts memory-bound clusters
  16. Redis3.2 cluster - Issue #2: PSYNC1
      ● case: H/W decommission of a master (A → B)
      ● the CLUSTER FAILOVER command swaps the master/slave roles
        => the new master (B) cannot partially resync; PSYNC falls back to an RDB full sync !!
      ● PSYNC1 limitation: replication state is tied to the master instance
        ● https://gist.github.com/antirez/ae068f95c0d084891305
        ● https://github.com/antirez/redis/issues/2683
      ● workaround: combine slot migration with CLUSTER FAILOVER
      ● fixed by PSYNC2 in Redis4
  17. Redis3.2 cluster - Issue #3: huge memory overhead
      ● the slots → keys mapping is held in one in-memory ZSET
        ● https://github.com/antirez/redis/blob/3.2.11/src/cluster.c#L469
        ● backs the CLUSTER GETKEYSINSLOT command
        ● costs extra memory per key
        ● https://github.com/antirez/redis/issues/3800
      ● Redis4 replaces it with RAX
        ● https://github.com/antirez/rax (a radix-tree-like structure)
        ● 40% improvement (11.69 GB → 9.42 GB)
        ● still 60% overhead
  18. Redis hotspots daily handling
      ● slow commands
      ● OPS bursting
      ● connection bursting
  19. Slow command
      ● a slow command blocks single-threaded Redis
      ● the monitoring system alerts on commands slower than 10 ms
      ● cause: huge Hash / (Z)SET values
        ● O(N) heavy commands
        ● such data belongs in HBase or Cassandra
        ● hunt for bigkeys (keys with huge element counts)
      ● fix: use the SCAN family of commands
        ● replace blocking iteration such as SMEMBERS with SSCAN
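The SMEMBERS → SSCAN replacement works because iteration is split into many small pages, so no single command holds the single-threaded server for the whole set. A simulation of cursor-style paging over a plain list (real code would call `SSCAN` through Jedis; the simple offset cursor here is an illustrative stand-in for Redis's reverse-binary cursor):

```java
import java.util.List;

// SSCAN-like iteration: each call returns one page of up to `count`
// elements plus the next cursor; a returned cursor of 0 means done,
// matching the SCAN-family convention (cursor 0 both starts and ends).
public class ScanIteration {
    public static int scanPage(List<String> all, int cursor, int count, List<String> out) {
        int end = Math.min(all.size(), cursor + count);
        out.addAll(all.subList(cursor, end));
        return end >= all.size() ? 0 : end;
    }
}
```

The caller loops `do { cursor = scanPage(...); } while (cursor != 0);` — each round trip is a cheap O(count) command instead of one O(N) blocking call.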
  20. OPS bursting in the old days
      ● OPS spiked to 2.6 million/min
      ● only per-min metrics were available to see the spike
      ● CPU hit 98%
  21. OPS and connection bursting
      ● per-sec cluster monitoring with automatic burst detection
        ● metrics collector: Akka cluster nodes
        ● store: ElasticSearch
        ● view: Kibana + Grafana
      ● ops/connections are inspected via the MONITOR command
        ● commands and client/server IPs are stored into ElasticSearch
        ● collection IO is reduced by command dedup, Lua scripts, and a local cache
      (Diagram: collect → store/view)
  22. Future works
      ● Redis4 cluster
      ● async client connections
        ● a Redis proxy near the API servers to cut message service latency
        ● today: API servers use a connection pool + sync client, holding connections to each Redis node
        ● tradeoff: latency
      ● Redis storage as the scalability bottleneck
        ● a single redis-server handles 100k+ OPS/s
        ● 3 ~ 4
      ● avoid Redis lock-in
  23. THANK YOU
