
SpringOne Platform 2019


SB Payment Service (part of the SoftBank

Published in: Engineering

  1. 1. 
  2. 2. Credit Card Issuer: We issue "Softbank Card" credit cards to consumers. Payment Aggregator: We provide a comprehensive payment platform that offers various online payment solutions. Credit Card Acquirer: We are the only payment aggregator in Japan that accepts and processes transactions made with major brands (VISA / Mastercard / UnionPay). Carrier Billing Provider: As Japan’s leading carrier, we let Softbank customers pay for online purchases via their phone bill.
  3. 3. Credit Card Issuer: We issue "Softbank Card" credit cards to consumers. Payment Aggregator: We provide a comprehensive payment platform that offers various online payment solutions. Credit Card Acquirer: We are the only payment aggregator in Japan that accepts and processes transactions made with major brands (VISA / Mastercard / UnionPay) as an acquirer. Carrier Billing Provider: As Japan’s leading carrier, we let consumers pay for online purchases through their phone bill.
  4. 4. If you have any questions, we’d be happy to answer them at the end of our presentation
  5. 5. ● ● ●
  6. 6. We left all development of our services to outside vendors. There were zero in-house engineers writing code. The development environment was not ready.
  7. 7. Problem Team
  8. 8. Problem Developed Support Tools Introduced Tools Team
  9. 9. Problem Team Developed Support Tools Introduced Tools 3 Engineers joined! Accelerated KAIZEN tasks as a team
  10. 10. ① Problem Team
  11. 11. ① Problem Created dashboards Introduced Tools Elasticsearch Logstash Kibana Team
  12. 12. ② Problem Team
  13. 13. ② Problem Re-architected with Spring Introduced Tools     Team
  14. 14. ② Problem Re-architected with Spring Introduced Tools     Team One more engineer joined!
  15. 15. ● ● ●
  16. 16. Merchants Financial Institutions E-Commerce Gaming Education Real Estate Etc. E-Books/Movies Unified Payment Service Tickets Provides several payment methods as API to EC sites Credit Card Mobile Carrier Convenience Store Prepaid Card Account Transfer Point Account Integration Our Company API Development target Online Payment Service
  17. 17. Merchants Financial Institutions E-Commerce Gaming Education Real Estate Etc. E-Books/Movies Unified Payment Service Tickets Provides several payment methods as API to EC sites Credit Card Mobile Carrier Convenience Store Prepaid Card Account Transfer Point Account Integration Our Company API Development target Online Payment Service Number of adopting stores: 111,742 (as of May 2019) Transaction amount: $28 billion (as of 2018)
  18. 18. Merchants Financial Institutions E-Commerce Gaming Education Real Estate Etc. E-Books/Movies Unified Payment Service Tickets Provides several payment methods as API to EC sites Credit Card Mobile Carrier Convenience Store Prepaid Card Account Transfer Point Account Integration Our Company API Development target Online Payment Service 40+ payment methods supported
  19. 19. Merchants Financial Institutions E-Commerce Gaming Education Real Estate Etc. E-Books/Movies Unified Payment Service Tickets Provides several payment methods as API to EC sites Credit Card Mobile Carrier Convenience Store Prepaid Card Account Transfer Point Account Integration Our Company API Development target Online Payment Service Located between merchant systems and financial institution systems
  20. 20. Requirements for the new system ● ● ● Before… Every project was led by external vendors (a long path from estimation, contract, and requirement definition to acceptance)
  21. 21. Requirements for the new system ● ● ● Until now… Each project was built with the help of development vendors (a long road from estimation and requirements definition to acceptance). Outsourcing made it impossible to deliver incrementally and quickly in an agile way.
  22. 22. Requirements for the new system ● ● ● Speedy delivery and Continuous improvement through in-house development
  23. 23. Team In-house dev PJ One more engineer joined! 6-person team
  24. 24. Introduced Tools Cloud Native Platform based on Pivotal Platform
  25. 25. ・ ・
  26. 26. ?? ・ ・ ・
  27. 27. ?? What we needed ・ ・
  28. 28. ● ● ●
  29. 29. ● ● ● Buildpack converts source code to a container image. Developers don’t need to write a Dockerfile. cf push + Buildpack reduces extra work!
  30. 30. Team structure and responsibility boundary
  31. 31. Team structure and responsibility boundary Networking Storage Servers Virtualization O/S Middleware Runtime Platform Operators 2 people
  32. 32. Team structure and responsibility boundary Networking Storage Servers Virtualization O/S Middleware Runtime Data Application Application Developers 4 people Platform Operators 2 people
  33. 33. Networking Storage Servers Virtualization O/S Middleware Runtime Data Application Focus on building and operating platforms Application Developers 4 people Platform Operators 2 people Team structure and responsibility boundary
  34. 34. Networking Storage Servers Virtualization O/S Middleware Runtime Data Application Focus on design and implementation of business code Team structure and responsibility boundary Application Developers 4 people Platform Operators 2 people
  35. 35. Networking Storage Servers Virtualization O/S Middleware Runtime Data Application 12 Factor App The only contract is “12 Factor App”. No vendor lock in. Application Developers 4 people Team structure and responsibility boundary Platform Operators 2 people
  36. 36. ➢ ➢ ➢ ➢
  37. 37. syslog+TLS Logstash Elasticsearch Kibana cf push Concourse Prometheus Grafana git push cf create-service cf bind-service
  38. 38. ➢ ➢ ➢ ➢
  39. 39. ( ➡ ) API Gateway Service A Service B Service C Merchant X Merchant Y Merchant Z Financial Institution A Financial Institution B Financial Institution C
  40. 40. ( ➡ ) API Gateway Service A Service B Service C Merchant X Merchant Y Merchant Z Financial Institution A Financial Institution B Financial Institution C
  41. 41. ( ➡ ) API Gateway Service A Service B Service C Financial Institution A Financial Institution B Financial Institution C Merchant X Merchant Y Merchant Z
  42. 42. ( ➡ ) API Gateway Service A Service B Service C Financial Institution A Financial Institution B Financial Institution C Merchant X Merchant Y Merchant Z
  43. 43. ( ➡ ) API Gateway Service A Service B Service C Financial Institution A Financial Institution B Financial Institution C Merchant X Merchant Y Merchant Z Each app is deployed on PAS as a microservice
  44. 44. ( ➡ ) API Gateway Service A Service B Service C Financial Institution A Financial Institution B Financial Institution C Merchant X Merchant Y Merchant Z Each app is implemented with Java and Spring Boot
  45. 45. ( ➡ ) API Gateway Service A Service B Service C Financial Institution A Financial Institution B Financial Institution C Merchant X Merchant Y Merchant Z
  46. 46. API Gateway Service A Service B Service C Financial Institution A Financial Institution B Financial Institution C ( ➡ ) Merchant X Merchant Y Merchant Z
  47. 47. API Gateway Service A Service B Service C Merchant X Merchant Y Merchant Z Financial Institution A Financial Institution B Financial Institution C The merchants’ systems and the financial institutions’ systems are outside our control ( ➡ )
  48. 48. Hystrix API Gateway Service A Service B Service C Merchant X Merchant Y Merchant Z Financial Institution A Financial Institution B Financial Institution C Introduced Hystrix Circuit Breaker for inter-system communications Hystrix Hystrix Hystrix ( ➡ )
  49. 49. API Gateway Service A Service B Service C Merchant X Merchant Y Merchant Z Financial Institution A Financial Institution B Financial Institution C Without Circuit Breaker, if a system outage happens in financial institution A … ( ➡ )
  50. 50. API Gateway Service A Service B Service C Merchant X Merchant Y Merchant Z Financial Institution A Financial Institution B Financial Institution C Slow Response, Timeout ( ➡ )
  51. 51. ( ➡ ) API Gateway Service A Service B Service C Merchant X Merchant Y Merchant Z Financial Institution A Financial Institution B Financial Institution C The failure is propagated to Service A, blocking processes and causing possible thread exhaustion
  52. 52. API Gateway Service A Service B Service C Merchant X Merchant Y Merchant Z Financial Institution A Financial Institution B Financial Institution C Failure propagated to API Gateway causing blocked processes, thread depletion ( ➡ )
  53. 53. API Gateway Service A Service B Service C Merchant X Merchant Y Merchant Z Financial Institution A Financial Institution B Financial Institution C ( ➡ ) Failure propagated to API Gateway causing blocked processes, thread depletion
  54. 54. API Gateway Service A Service B Service C Merchant X Merchant Y Merchant Z Financial Institution A Financial Institution B Financial Institution C B and C are affected by the failure of financial institution A. ( ➡ )
  55. 55. API Gateway Service A Service B Service C Merchant X Merchant Y Merchant Z Financial Institution A Financial Institution B Financial Institution C Hystrix Hystrix Hystrix Hystrix ( ➡ ) With Circuit Breaker If a system outage happens in financial institution A …
  56. 56. API Gateway Service A Service B Service C Merchant X Merchant Y Merchant Z Financial Institution A Financial Institution B Financial Institution C Hystrix Hystrix Hystrix Circuit Breaker prevents the failure from propagating. No need to worry about the effect on other financial institutions. ( ➡ )
  57. 57. API Gateway Service A Service B Service C Merchant X Merchant Y Merchant Z Financial Institution A Financial Institution B Financial Institution C Hystrix Hystrix Hystrix Circuit Breaker adds fault tolerance and resiliency to the app ( ➡ )
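The circuit-breaker behaviour described in slides 49 to 57 can be sketched in plain Java. This is a minimal illustration of the pattern, not the Hystrix implementation and not the presenters' code: the class name `SimpleCircuitBreaker`, the consecutive-failure threshold, and the injected clock are all assumptions made for the example (Hystrix itself uses rolling error-percentage windows and thread or semaphore isolation).

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/** Illustrative circuit breaker; a teaching sketch, not Hystrix itself. */
class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold; // consecutive failures before the circuit opens
    private final long openMillis;      // how long to fail fast before allowing a trial call
    private final LongSupplier clock;   // injected so that tests can control time
    private State state = State.CLOSED;
    private int consecutiveFailures = 0;
    private long openedAt = 0L;

    SimpleCircuitBreaker(int failureThreshold, long openMillis, LongSupplier clock) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
        this.clock = clock;
    }

    synchronized State state() {
        return state;
    }

    /** Runs remoteCall unless the circuit is open; returns the fallback on failure or when open. */
    <T> T call(Supplier<T> remoteCall, Supplier<T> fallback) {
        if (shouldShortCircuit()) {
            return fallback.get(); // fail fast: no thread is blocked on the broken system
        }
        try {
            T result = remoteCall.get();
            onSuccess();
            return result;
        } catch (RuntimeException e) {
            onFailure();
            return fallback.get();
        }
    }

    private synchronized boolean shouldShortCircuit() {
        if (state != State.OPEN) {
            return false;
        }
        if (clock.getAsLong() - openedAt < openMillis) {
            return true;
        }
        state = State.HALF_OPEN; // the wait has elapsed: let a trial call through
        return false;
    }

    private synchronized void onSuccess() {
        consecutiveFailures = 0;
        state = State.CLOSED;
    }

    private synchronized void onFailure() {
        consecutiveFailures++;
        if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
            state = State.OPEN;
            openedAt = clock.getAsLong();
        }
    }

    public static void main(String[] args) {
        SimpleCircuitBreaker breaker =
                new SimpleCircuitBreaker(3, 5_000L, System::currentTimeMillis);
        for (int i = 0; i < 5; i++) {
            String reply = breaker.call(
                    () -> { throw new RuntimeException("financial institution A timed out"); },
                    () -> "fallback response");
            System.out.println(reply + " (state: " + breaker.state() + ")");
        }
    }
}
```

Once the circuit is open, callers get the fallback immediately instead of waiting on a timeout, which is what keeps a failure in one financial institution from exhausting threads in Service A and the API Gateway.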
  58. 58. Notification Gateway Receiver A Receiver B Receiver C Merchant X Merchant Y Merchant Z Financial Institution A Financial Institution B Financial Institution C Hystrix Hystrix Hystrix Hystrix ( ➡ )
  59. 59. ( ➡ ) Notification Gateway Receiver A Receiver B Receiver C Merchant X Merchant Y Merchant Z Financial Institution A Financial Institution B Financial Institution C Hystrix Hystrix Hystrix Hystrix
  60. 60. ( ➡ ) Notification Gateway Receiver A Receiver B Receiver C Merchant X Merchant Y Merchant Z Financial Institution A Financial Institution B Financial Institution C Hystrix Hystrix Hystrix Hystrix Introduced RabbitMQ + Spring Cloud Stream for async processing
  61. 61. ( ➡ ) Notification Gateway Receiver A Receiver B Receiver C Merchant X Merchant Y Merchant Z Financial Institution A Financial Institution B Financial Institution C Hystrix Hystrix Hystrix Hystrix
  62. 62. ( ➡ ) Notification Gateway Receiver A Receiver B Receiver C Merchant X Merchant Y Merchant Z Financial Institution A Financial Institution B Financial Institution C Hystrix Hystrix Hystrix Hystrix When a failure happens in the merchant system
  63. 63. ( ➡ ) Notification Gateway Receiver A Receiver B Receiver C Merchant X Merchant Y Merchant Z Financial Institution A Financial Institution B Financial Institution C Hystrix Hystrix Hystrix Hystrix The message will be diverted to a “Dead Letter Queue” and requeued later
  64. 64. ( ➡ ) Notification Gateway Receiver A Receiver B Receiver C Merchant X Merchant Y Merchant Z Financial Institution A Financial Institution B Financial Institution C Hystrix Hystrix Hystrix Even if an outage happens in the merchant, the Circuit Breaker will prevent the failure propagation. Hystrix
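The dead-letter flow from slides 62 to 64 can be illustrated with an in-memory simulation. This is a sketch under stated assumptions, not the RabbitMQ or Spring Cloud Stream API and not the presenters' configuration: the class `DeadLetterRequeueSketch`, the `maxAttempts` limit, and the `parked` list for messages that exhaust their attempts are hypothetical names introduced for the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.Predicate;

/** In-memory illustration of "divert to a dead letter queue, requeue later". */
class DeadLetterRequeueSketch {
    /** A notification plus the number of delivery attempts made so far. */
    static final class Message {
        final String body;
        int attempts = 0;

        Message(String body) {
            this.body = body;
        }
    }

    final Deque<Message> mainQueue = new ArrayDeque<>();
    final Deque<Message> deadLetterQueue = new ArrayDeque<>();
    final List<Message> parked = new ArrayList<>(); // gave up after maxAttempts (assumption)
    private final int maxAttempts;

    DeadLetterRequeueSketch(int maxAttempts) {
        this.maxAttempts = maxAttempts;
    }

    void publish(String body) {
        mainQueue.add(new Message(body));
    }

    /** Drains the main queue; messages the merchant does not accept go to the dead letter queue. */
    int consume(Predicate<String> deliverToMerchant) {
        int delivered = 0;
        Message message;
        while ((message = mainQueue.poll()) != null) {
            message.attempts++;
            boolean accepted;
            try {
                accepted = deliverToMerchant.test(message.body);
            } catch (RuntimeException e) {
                accepted = false; // treat an exception like a failed delivery
            }
            if (accepted) {
                delivered++;
            } else {
                deadLetterQueue.add(message);
            }
        }
        return delivered;
    }

    /** Moves dead-lettered messages back to the main queue, parking those out of attempts. */
    void requeueDeadLetters() {
        Message message;
        while ((message = deadLetterQueue.poll()) != null) {
            if (message.attempts >= maxAttempts) {
                parked.add(message);
            } else {
                mainQueue.add(message);
            }
        }
    }

    public static void main(String[] args) {
        DeadLetterRequeueSketch broker = new DeadLetterRequeueSketch(3);
        broker.publish("payment-completed");
        broker.consume(body -> false); // merchant system is down
        System.out.println("dead-lettered: " + broker.deadLetterQueue.size());
        broker.requeueDeadLetters();
        int delivered = broker.consume(body -> true); // merchant system has recovered
        System.out.println("delivered after requeue: " + delivered);
    }
}
```

In a real deployment the broker performs the diversion and the delayed requeue; the point of the sketch is only that a failed notification is kept and retried rather than lost while the merchant system is unavailable.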
  65. 65.
  66. 66.
  67. 67. ➢ ➢ ➢ ➢
  68. 68. https://concourse-ci.org/ Concourse
  69. 69. https://concourse-ci.org/ Concourse
  70. 70. Staging Develop Production Push to a branch triggers the pipeline
  71. 71. Staging Develop Production
  72. 72. Unit tests run on each Java version to detect issues with new versions earlier Staging Develop Production
  73. 73. Staging Develop Production
  74. 74. Staging Develop Production
  75. 75. Develop Cycle by CI Staging Develop Production
  76. 76. Staging Develop Production
  77. 77. Merging to the master branch triggers a deploy to the Nexus repository Staging Develop Production
  78. 78. Production One-click Deploy to production from Nexus Repository using nexus concourse resource
  79. 79. API Gateway Service A Service B Service C Financial Institution B Mock Financial Institution B Mock Financial Institution B Mock
  80. 80. ● ●
  81. 81. Problems: HTTP transmission processing has low throughput. The Circuit Breaker short-circuits unexpectedly.
  82. 82. Report html cf push
  83. 83. Performance testing should be repeated over and over during development to ensure continuous improvement.
  84. 84. ➢ ○ ○ ○
  85. 85. ※Metrics, tracing, and logging  https://peter.bourgon.org/blog/2017/02/21/metrics-tracing-and-logging.html
  86. 86. ➢ ○ ○ ○
  87. 87. ・ ・ ・ ・
  88. 88.   Preset Dashboard Preset Alerts
  89. 89. rabbitmq ⇒ { "targets":[ "10.0.1.1", "10.0.2.1", "10.0.3.1", "10.0.1.2", "10.0.2.2" ], "labels":{ "__meta_bosh_deployment":"rabbitmq", "__meta_bosh_job_process_name":"rabbitmq-server" } } Targets are added automatically
  90. 90. Org A ⇒ Promregator SpaceA CF Cloud Controller Devs don’t care Prometheus https://github.com/promregator/promregator Considering migration to Metrics Registrar
  91. 91. Micrometer <dependency> <groupId>io.micrometer</groupId> <artifactId>micrometer-registry-prometheus</artifactId> </dependency> management: endpoints: web: exposure: include: "*" base-path: /actuator
  92. 92. Also shows past changes Visualize if each deployment is healthy at this moment Visualize if each VM is healthy at this moment Warning Ops Warning
  93. 93. Ops ・ ・
  94. 94. Dev We can always check in Grafana all the metrics we have ever collected from logs ・CPU ・JVM Memory (per area) ・Thread ・GC (frequency, time) ・Classloader
  95. 95. Dev We can always check in Grafana the metrics we have ever collected from logs ・CPU ・JVM Memory (per area) ・Thread ・GC (frequency, time) ・Classloader
  96. 96. Action is triggered by alerting
  97. 97. Ops Dev No outsourcing to external monitoring center org_name: OrgA severity: fatal org_name: OrgA severity: /.*/ severity: fatal severity: /.*/ Set route/receiver using org_name, severity as keys Twilio call for emergency (24 hour support)
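The routing idea on this slide, choosing a receiver from the `org_name` and `severity` labels, can be sketched as a first-match rule list. This is an illustrative model in plain Java, not Alertmanager's configuration format or code. The receiver names (`dev-phone-call`, `dev-chat`, `ops-phone-call`, `ops-chat`) and the mapping of each rule to Dev or Ops are assumptions; the slide only shows the four label matchers and the Twilio call for emergencies.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

/** First-match alert routing keyed on labels such as org_name and severity. */
class AlertRouterSketch {
    static final class Route {
        final Map<String, Pattern> matchers = new LinkedHashMap<>();
        final String receiver;

        Route(String receiver) {
            this.receiver = receiver;
        }

        /** Adds a matcher; the regex must match the whole label value. */
        Route match(String label, String regex) {
            matchers.put(label, Pattern.compile(regex));
            return this;
        }

        boolean matches(Map<String, String> labels) {
            for (Map.Entry<String, Pattern> matcher : matchers.entrySet()) {
                String value = labels.getOrDefault(matcher.getKey(), ""); // missing label = ""
                if (!matcher.getValue().matcher(value).matches()) {
                    return false;
                }
            }
            return true;
        }
    }

    private final List<Route> routes;

    AlertRouterSketch(List<Route> routes) {
        this.routes = routes;
    }

    /** Returns the receiver of the first route whose matchers all match. */
    String receiverFor(Map<String, String> labels) {
        for (Route route : routes) {
            if (route.matches(labels)) {
                return route.receiver;
            }
        }
        return "default";
    }

    /** Builds a label map from alternating key/value arguments. */
    static Map<String, String> labels(String... keyValues) {
        Map<String, String> result = new HashMap<>();
        for (int i = 0; i + 1 < keyValues.length; i += 2) {
            result.put(keyValues[i], keyValues[i + 1]);
        }
        return result;
    }

    /** The four rules from the slide, in order; receiver names are hypothetical. */
    static AlertRouterSketch example() {
        return new AlertRouterSketch(Arrays.asList(
                new Route("dev-phone-call").match("org_name", "OrgA").match("severity", "fatal"),
                new Route("dev-chat").match("org_name", "OrgA").match("severity", ".*"),
                new Route("ops-phone-call").match("severity", "fatal"),
                new Route("ops-chat").match("severity", ".*")));
    }

    public static void main(String[] args) {
        AlertRouterSketch router = example();
        System.out.println(router.receiverFor(labels("org_name", "OrgA", "severity", "fatal")));
        System.out.println(router.receiverFor(labels("org_name", "OrgB", "severity", "warning")));
    }
}
```

Rule order matters: the more specific `OrgA` plus `fatal` rule has to come before the catch-all `/.*/` rules, otherwise an emergency would be routed to the lower-urgency receiver.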
  98. 98. Link to the Grafana dashboard Detect RabbitMQ Dead Letter Queue Call
  99. 99. ・ ・ ・ ・ ・ ・ ・ ・ ・ Detect a drop in service level before users report it
  100. 100. ➢ ○ ○ ○
  101. 101.  ・  ・
  102. 102. Ops Dev Biz ElastAlert Praeco Firehose to syslog
  103. 103. Firehose-to-syslog Firehose-to-syslog Firehose-to-syslog Loggregator Agent Doppler STDOUT STDOUT Considering migration to Syslog Drain
  104. 104. --- releases: - name: syslog version: "11.4.0" url: "https://bosh.io/xxx?v=11.4.0" sha1: "xxx" addons: - name: syslog exclude: jobs: - name: syslog_forwarder release: syslog jobs: - name: syslog_forwarder release: syslog properties: syslog: address: logstash.xxx.jp port: 5514 transport: tcp tls_enabled: true permitted_peer: "*.xxx.jp" ca_cert: | -----BEGIN CERTIFICATE----- ... -----END CERTIFICATE-----
  105. 105. Ops Dev Biz
  106. 106. Biz
  107. 107. Dev System error at a merchant High latency at a financial institution
  108. 108. Ops
  109. 109. Ops Dev cf_org_name: OrgA log_level: ERROR cf_org_name: OrgA log_level: WARN @message: xxx @message: xxx Set notification destination using cf_org_name, log_level as keys Twilio call for emergency (24 hour support) ElastAlert
  110. 110. ➢ ○ ○ ○
  111. 111. Service A Merchant X Financial Institution A API Gateway Receiver A Notification Gateway Visualize
  112. 112. You can see a bottleneck at a glance across multiple services.
  113. 113. Zipkin, Spring Cloud Sleuth <dependency> <groupId>org.springframework.cloud</groupId> <artifactId>spring-cloud-starter-zipkin</artifactId> </dependency> spring: zipkin: base-url: https://my-zipkin.example.com service.name: my-application sender.type: web
  114. 114. Ops Dev Biz Share Elasticsearch with Logging Configure instrumentations
  115. 115. # of transactions Duration (Avg) Duration (Percentile)
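The dashboard shows both an average and a percentile duration. A short sketch of why the two differ: a single slow transaction shifts the average a lot while the median stays put, and a high percentile exposes the outlier directly. The sample durations and the nearest-rank percentile method are assumptions for illustration; the slides do not say how the dashboard computes its percentiles.

```java
import java.util.Arrays;

/** Average versus nearest-rank percentile over transaction durations (milliseconds). */
class DurationStatsSketch {
    static double average(long[] durationsMs) {
        return Arrays.stream(durationsMs).average().orElse(0.0);
    }

    /** Nearest-rank percentile: the value at rank ceil(p / 100 * n) in sorted order. */
    static long percentile(long[] durationsMs, double p) {
        if (durationsMs.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (p <= 0 || p > 100) {
            throw new IllegalArgumentException("p must be in (0, 100]");
        }
        long[] sorted = durationsMs.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[rank - 1];
    }

    public static void main(String[] args) {
        long[] durationsMs = {100, 110, 120, 130, 3000}; // hypothetical samples, one outlier
        System.out.println("avg = " + average(durationsMs));
        System.out.println("p50 = " + percentile(durationsMs, 50));
        System.out.println("p95 = " + percentile(durationsMs, 95));
    }
}
```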
  116. 116. On-premise Existing app Existing system team Apps on PCF Dev Something happens in the existing app? Let me check
  117. 117. ➢ ➢
  118. 118. Case1: Effect on Business members Specialist of credit card system (NOT developer) Biz # of transactions Duration (Avg) Duration (Percentile)
  119. 119. Case1: Effect on Business members Specialist of credit card system (NOT developer) Something wrong... It should be shorter... Biz
  120. 120. Specialist of Credit Card System (Not developer) Jump to Zipkin Case1: Effect on Business members Is this process the cause? Biz
  121. 121. Dev Case1: Effect on Business members Biz members can detect the problem. I’ll check now. Something seems to be wrong with this transaction. Biz
  122. 122. Case2: Detecting an abnormal trend
  123. 123. Case2: Detecting an abnormal trend Over a long time period, delays appear in a specific time slot. 22:00 22:00 22:00 No problem
  124. 124. Case2: Detecting an abnormal trend No transaction finished. Transactions might be locked.
  125. 125. On-premise Existing app Apps on PCF Case2: Detecting an abnormal trend Delays in a specific time slot Reconsidered the degree of parallelism and the timeout.
  126. 126. Case2: Detecting an abnormal trend Our system was improved by detecting a slight abnormality
  127. 127. Improves apps’ operation efficiency dramatically
  128. 128. ● ● ● ●
  129. 129. Release Improvement (Before → After): Release Work: manual work → one click; Release Quality: human error occurs → no mistakes; Release Time: 45 min → 5 min. Use of Cloud (Before → After): Scale-out operation: manual work → one click; Container Orchestration: - → leave it to the platform; Auto-restart: self-made tools → leave it to the platform.
  130. 130. ● ● ●
  131. 131. A platform cannot be built by only relying on outsourced vendors. It’s possible to build and operate a platform by taking ownership in-house. A powerful platform allows a small engineering team to focus on application development.
