インフラ野郎Azureチーム Night


Published on 2016/12/26 at NHN Techorus



  1. #infrayarou
  2. { "name" : "Toru Makabe (真壁 徹)", "affiliation" : "Microsoft Japan", "role" : "Cloud Solution Architect", "background" : "Daiwa Institute of Research, HP Enterprise", "specialties" : "Cloud & open source" }
  3. https://docs.microsoft.com/ja-jp/azure/
  4. [Microsoft Global Datacenters and Network Infrastructure] https://www.youtube.com/watch?v=bqZrejosqWU
  5. 32 Regions Worldwide, 24 Generally Available… Central US (Iowa); West US (California); East US (Virginia); East US 2 (Virginia); US Gov Virginia; North Central US (Illinois); US Gov Iowa; South Central US (Texas); Brazil South (Sao Paulo State); West Europe (Netherlands); North Europe (Ireland); China North* (Beijing); China South* (Shanghai); Japan East (Tokyo, Saitama); Japan West (Osaka); India South (Chennai); India Central (Pune); India West (Mumbai); East Asia (Hong Kong); SE Asia (Singapore); Australia South East (Victoria); Australia East (New South Wales); Canada East (Quebec City); Canada Central (Toronto); Germany North East (Magdeburg); Germany Central (Frankfurt); United Kingdom Regions (2); Korea Regions (2); US DoD West (TBA); US DoD East (TBA). *Operated by 21Vianet. Legend: operational vs. announced/not operational. 32 regions announced, 24 operational at the time of the map (*); (*) as of this talk, 38 announced and 30 operational.
  6. $ azure location list --details
     info:    Executing command location list
     + Getting ARM registered providers
     info:    Getting locations...
     data:    Location    : eastasia
     data:    DisplayName : East Asia
     data:    […]
  7. https://blogs.technet.microsoft.com/hybridcloud/2016/05/26/microsoft-and-facebook-to-build-subsea-cable-across-atlantic/ https://azure.microsoft.com/ja-jp/blog/microsoft-invests-in-subsea-cables-to-connect-datacenters-globally/
  8. Datacenter generations and Power Usage Effectiveness (PUE):
     Generation 1, Colocation: PUE 2.0+; discrete servers; capacity-driven; 20-year technology.
     Generation 2, Density: PUE 1.4–1.6; rack density & deployment; minimized resource impact.
     Generation 3, Containment: PUE 1.2–1.5; containers, PODs; scalability & sustainability; air & water economization; differentiated SLAs.
     Generation 4, Modular: PUE 1.12–1.20; deployment areas & ITPACs; no more traditional IT; right-sized; faster time-to-market; outside-air cooled.
     Generation 5, Hyper-scale: PUE 1.07–1.19; fully integrated; resilient software; common infrastructure; operational simplicity; flexible & scalable.
  9. S. Sankar, K. Vaid, M. Shaw, “Impact of Temperature on Hard Disk Drive Reliability in Large Datacenters,” Microsoft, IEEE, 2011. Inlet Temperature and Impact on Hard Disk Failure Rates; two designs: HDDs in front (ΔT 1°C) vs. buried HDDs (ΔT 20°C cold, de-rated to ΔT 10°C hot).
     Inlet Temp      Front: Case Temp / Relative AFR    Buried: Case Temp / Relative AFR
     10 C (50 F)     11 C / 100%                        30 C / 100%
     15 C (59 F)     16 C / 100%                        34 C / 100%
     20 C (68 F)     21 C / 100%                        38 C / 100%
     25 C (77 F)     26 C / 100%                        41 C / 106%
     30 C (86 F)     31 C / 100%                        45 C / 131%
     35 C (95 F)     36 C / 100%                        49 C / 153%
     40 C (104 F)    41 C / 106%                        53 C / 189%
     45 C (113 F)    46 C / 138%                        56 C / 231%
     50 C (122 F)    51 C / 179%                        60 C / 281%
     “Azure Network and Datacenter Infrastructure: Enterprise Quality at Cloud Scale,” Microsoft Ignite 2015
  10. http://natick.research.microsoft.com/
  11. Microsoft achieved company-wide carbon neutrality in 2014 ( https://blogs.microsoft.com/green/category/renewable-energy/ ). https://news.microsoft.com/2016/11/14/microsoft-announces-largest-wind-energy-purchase-to-date
  12. Geo: a pair of Regions, each Region containing its own DCs/Zones.
  13. General-purpose and flexible vs. efficient and high-performance.
  14. ( https://docs.microsoft.com/ja-jp/azure/virtual-machines/virtual-machines-linux-sizes )
  15. October 15, 2016
  16. October 15, 2016
  17. VFP (Virtual Filtering Platform): a Hyper-V VMSwitch extension and the core component that implements SDN in Azure. It provides address virtualization for VNET, VIP -> DIP translation for SLB, and ACLs, metering, and security guards, defining per-packet actions via programmable rule/flow tables. Available in Windows Server 2016. Diagram: NIC; VM Switch containing VFP (ACLs/Metering/Security, VNET, SLB NAT); vNICs; VMs. “Microsoft's Production Configurable Cloud,” Mark Russinovich, Chief Technology Officer, Microsoft Azure, SCS Distinguished Lecture, 11/15/2016
  18. The VMSwitch exposes a match-action-table style API to the controller. The controller defines the policies; there is one table per policy, and together they specify exactly how every packet must be handled. Diagram (Host 10.4.1.5, VM1 at 10.1.1.2, VFP on its NIC): inputs are Tenant Description, VNet Description, VNet Routing Policy, NAT Endpoints, ACLs. VNET table: TO 10.2/16 -> encap to GW; TO 10.1.1.5 -> encap to 10.5.1.7; TO !10/8 -> NAT out of VNET. LB NAT table: TO 79.3.1.2 -> DNAT to 10.1.1.2; TO !10/8 -> SNAT to 79.3.1.2. ACLs table: TO 10.1.1/24 -> allow; 10.4/16 -> block; TO !10/8 -> allow. Controller. “Microsoft's Production Configurable Cloud,” Mark Russinovich, Chief Technology Officer, Microsoft Azure, SCS Distinguished Lecture, 11/15/2016
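The per-policy tables above can be sketched in a few lines of Python. Everything here (the rule helper, the packet-as-dict shape) is illustrative, not the actual VFP API; it only shows the "ordered per-policy tables, first match wins in each" idea.

```python
import ipaddress

# Illustrative sketch of match-action tables: the controller installs an
# ordered list of (match, action) rules per policy layer, and a packet
# takes the first matching rule's action in each layer.

def make_rule(dst_prefix, action, negate=False):
    net = ipaddress.ip_network(dst_prefix)
    def matches(pkt):
        hit = ipaddress.ip_address(pkt["dst"]) in net
        return hit != negate          # negate models rules like "TO: !10/8"
    return (matches, action)

# Policy layers as on the slide: VNET routing, then ACLs.
vnet_layer = [
    make_rule("10.1.1.0/24", "deliver in VNET"),
    make_rule("10.2.0.0/16", "encap to GW"),
    make_rule("10.0.0.0/8", "NAT out of VNET", negate=True),   # TO: !10/8
]
acl_layer = [
    make_rule("10.1.1.0/24", "allow"),
    make_rule("10.4.0.0/16", "block"),
]

def process(pkt, layers):
    actions = []
    for layer in layers:
        for matches, action in layer:
            if matches(pkt):
                actions.append(action)
                break                  # first match wins within a layer
    return actions

print(process({"dst": "10.1.1.7"}, [vnet_layer, acl_layer]))
# → ['deliver in VNET', 'allow']
```

A packet to an address outside 10/8 would instead hit the negated rule and take the "NAT out of VNET" action, mirroring the slide's "TO: !10/8" rows.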
  19. GFT (flow) offload: the first packet of a flow traverses the full VFP layers (SLB Decap, SLB NAT, VNET, ACL, Metering, with rule actions Decap*, DNAT*, Rewrite*, Allow*, Meter*, plus Encap); the GFT transposition engine compiles them into a single flow entry (Flow 1.2.3.1 -> 1.3.4.1, 62362 -> 80; Action: Decap, DNAT, Rewrite, Meter). The entry is pushed through the GFT offload API (NDIS) to the GFT offload engine on the 50G SmartNIC, which also provides crypto, RDMA, and QoS. Northbound API toward the controllers; southbound API in the VMSwitch. “Microsoft's Production Configurable Cloud,” Mark Russinovich, Chief Technology Officer, Microsoft Azure, SCS Distinguished Lecture, 11/15/2016
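The first-packet/fast-path split can be sketched as a flow cache. This is an illustrative model only (the names and the action list are stand-ins, not the GFT API): the first packet of a connection pays for full rule evaluation, and the compiled result serves every later packet, which on Azure hardware means the SmartNIC FPGA.

```python
# Sketch of flow offload: the first packet takes the "slow path" through
# full rule processing; the composite action is cached per 5-tuple so
# subsequent packets of the flow hit the cache directly.

flow_table = {}          # 5-tuple -> compiled action list
slow_path_hits = 0

def slow_path(flow):
    """Stand-in for full VFP layer evaluation on the first packet."""
    global slow_path_hits
    slow_path_hits += 1
    return ["decap", "dnat", "rewrite", "meter"]

def forward(flow):
    if flow not in flow_table:                 # first packet: compile entry
        flow_table[flow] = slow_path(flow)
    return flow_table[flow]                    # later packets: cached entry

f = ("1.2.3.1", 62362, "1.3.4.1", 80, "tcp")
forward(f); forward(f); forward(f)
print(slow_path_hits)    # → 1 (only the first packet took the slow path)
```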
  20. Available on IaaS virtual machines D15v2 and DS15v2; currently in private preview. “Microsoft's Production Configurable Cloud,” Mark Russinovich, Chief Technology Officer, Microsoft Azure, SCS Distinguished Lecture, 11/15/2016
  21. Diagram: two racks, each with a ToR connecting servers that carry an FPGA in front of the NIC (CS0–CS3 in one rack, SP0–SP3 in the other); fabric tiers L0 and L1/L2. “Microsoft's Production Configurable Cloud,” Mark Russinovich, Chief Technology Officer, Microsoft Azure, SCS Distinguished Lecture, 11/15/2016. October 15, 2016
  22. Elastic Router (multi-VC on-chip router) datapath. Transmit side: Connection Lookup, Send Connection Table, Transmit State Machine, Send Frame Queue, Packetizer and Transmit Buffer, Unack'd Frame Store, Ethernet Encap, 40G MAC+PHY. Receive side: Ethernet Decap, Receive Connection Table, Depacketizer, Credit Management, Ack Receiver, Ack Generation, Receive State Machine. Credits, virtual channels, and data/header paths throughout; solid links show data flow, dotted links show ACK flow. Datacenter Network. “Microsoft's Production Configurable Cloud,” Mark Russinovich, Chief Technology Officer, Microsoft Azure, SCS Distinguished Lecture, 11/15/2016
  23. “Microsoft's Production Configurable Cloud,” Mark Russinovich, Chief Technology Officer, Microsoft Azure, SCS Distinguished Lecture, 11/15/2016
  24. https://www.sdxcentral.com/articles/news/microsoft-azure-will-use-intel-silicon-photonics/2016/08/ Microsoft expects to deploy silicon photonics in Azure data centers soon, “initially going for switch-to-switch connectivity,” said Kushagra Vaid, Azure’s general manager of hardware engineering, speaking at the Intel Developer Forum.
  25. “The problem I have right now? It is supply chain. I am not so worried about technology. We have our Open Cloud Server, which I think is very compelling in that it offers some real economic capabilities. But I have got to nurture my supply chain because traditionally we bought from OEMs and now we are designing with ODMs so we can take advantage of prices and lower our overall costs. So I am moving very, very quickly to build out new capacity, and I want to do it in a very efficient and effective way and it is really about the commoditization of the infrastructure.” ( https://www.nextplatform.com/2016/09/26/rare-tour-microsofts-hyperscale-datacenters/ ) Rick Bakken, Sr. Director, Data Center Evangelism, Microsoft
  26. https://azure.microsoft.com/en-us/blog/microsoft-reimagines-open-source-cloud-hardware/
  27. Azure Storage https://infrayarou.blob.core.windows.net/vhds/myubuntu.vhd
  28. Front End (FE 2) -> Partition Layer (Partition 3, F–J) -> Stream Layer (Stream 2)
  29. Request 1, a simple example: FE 2 -> Partition 3 (F–J) -> Stream 2. Request 1: Partition F; Row 102
  30. Request 2: a different Front End, the same Partition Server, a different Stream Server: FE 1 -> Partition 3 (F–J) -> Stream 4. Request 2: Partition F; Row 507
  31. Request 3: a different Front End, a different Partition Server, the same Stream Server: FE 4 -> Partition 4 (K–T) -> Stream 2. Request 3: Partition T; Row 356
  32. Request 4: a transaction example; a single Partition Server atomically updates data on multiple Stream Servers: FE 4 -> Partition 5 (U–Z) -> Stream 3 and Stream 4. Request 4: Partition W; Rows 213 & 672
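The four walk-throughs above all reduce to one routing rule: any front end can serve any request, because the partition key alone determines the owning partition server. A minimal sketch of that key-range lookup (the range bounds come from the slides; the code itself is illustrative, not Azure Storage internals):

```python
import bisect

# Partition servers own contiguous key ranges, so every front end maps a
# given partition key to the same server. Ranges as on slides 28-32:
# Partition 3 owns F-J, Partition 4 owns K-T, Partition 5 owns U-Z.
ranges = [("A", "Partition 1"), ("F", "Partition 3"),
          ("K", "Partition 4"), ("U", "Partition 5")]
starts = [r[0] for r in ranges]      # sorted range-start keys

def partition_for(key):
    i = bisect.bisect_right(starts, key) - 1   # last range starting <= key
    return ranges[i][1]

print(partition_for("F"))   # → Partition 3 (Requests 1 and 2)
print(partition_for("T"))   # → Partition 4 (Request 3)
print(partition_for("W"))   # → Partition 5 (Request 4)
```

Because the map is shared state maintained by the partition layer, FE 1, FE 2, and FE 4 all compute the same answer, which is exactly why the requests above can enter through different front ends.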
  33. https://docs.microsoft.com/ja-jp/azure/storage/storage-scalability-targets
  34. Disk (Page Blob): C:\, /dev/sda. New disks (C:\, /dev/sda) are created as copies from an Image Cache.
  35. Traditional L3 / L2 / L3 design: East/West traffic takes a long detour, routers grow large and expensive, and the LB/FW becomes a bottleneck.
  36. Clos fabric: Regional Spine (T3-1…T3-4) -> Data Center Spine (T2-1-1…T2-1-8, …, T2-4-1…T2-4-4) -> Row Spine (T1-1…T1-8 per row) -> Racks (T0-1, T0-2, …, T0-20), each with its servers. “Microsoft's Production Configurable Cloud,” Mark Russinovich, Chief Technology Officer, Microsoft Azure, SCS Distinguished Lecture, 11/15/2016
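A Clos fabric like this offers many equal-cost paths between any two racks, and traffic is spread over them with ECMP: each switch hashes a flow's 5-tuple and picks one of its uplinks, so a single flow stays on one path (no reordering) while different flows fan out. A sketch of that selection, with made-up switch names in the slide's T1/T2 style:

```python
import hashlib

# ECMP sketch: hash the 5-tuple, index into the list of equal-cost
# uplinks. Deterministic per flow, spread across flows.

def ecmp_uplink(five_tuple, uplinks):
    h = hashlib.sha256(repr(five_tuple).encode()).digest()
    return uplinks[int.from_bytes(h[:4], "big") % len(uplinks)]

t1_uplinks = [f"T2-1-{i}" for i in range(1, 9)]   # 8 uplinks from a T1
flow = ("10.0.0.5", 51515, "10.8.0.9", 443, "tcp")

# The same flow always hashes to the same uplink:
assert ecmp_uplink(flow, t1_uplinks) == ecmp_uplink(flow, t1_uplinks)
```

Real switches hash in hardware with vendor-specific functions; the point is only that the choice is a pure function of the flow's headers.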
  37. https://azure.github.io/SONiC/
  38. Albert Greenberg, Distinguished Engineer, Director of Networking, Microsoft, SIGCOMM 2015
  39. IEEE P802.3by (25 Gb/s Ethernet)
  40. Today's server to Tier 0: the interconnect is based on 25G technology; links are 50G Ethernet (2x25G, per the 25G Ethernet Consortium spec); bandwidth growth drove the move to 50G; no 802.3 specification is required here. Tomorrow's server to Tier 0: the interconnect will be based on 50G PAM4 technology; links are expected to be 100G Ethernet (2x50G). The choice for 802.3: create the specification, or let a consortium do it.
  41. What Azure needs, versus what implementations built on LB products deliver:
     Scale: 100 Gbps per VIP, with fast reconfiguration of thousands of VIPs on failure. (LB products: $80,000 for 20 Gbps; 20 Gbps per VIP; 1 second to reconfigure each VIP.)
     Availability: N+1 redundancy and quick failover. (LB products: 1+1 redundancy or slow failover.)
     Placement flexibility: place servers and LB/NAT flexibly across L2 boundaries. (LB products: NAT and DSR (Direct Server Return) are only supported within the same L2 segment.)
     Tenant isolation: overload caused by one tenant must not affect other tenants. (LB products: excessive SNAT requests from one tenant do affect other tenants.)
  42. “Ananta: Cloud Scale Load Balancing,” Microsoft, SIGCOMM 2013. Architecture: the Ananta Manager and Controllers take the VIP configuration (VIP, ports, # DIPs); a tier of Multiplexers; and on each host a VM Switch with a Host Agent in front of the VMs (VM1 … VMN).
  43. 1st tier: packet-level (layer-3) load spreading, implemented in routers via ECMP. 2nd tier: connection-level (layer-4) load spreading, implemented in servers (the Multiplexers). 3rd tier: stateful NAT, implemented in the virtual switch on every server. “Ananta: Cloud Scale Load Balancing,” Microsoft, SIGCOMM 2013
  44. Inbound path (steps 1–8): the client sends Dest: VIP, Src: Client; a router ECMPs the packet to one MUX; the MUX encapsulates it toward the host holding the DIP (outer Src: Mux; inner Dest: VIP, Src: Client); the Host Agent decapsulates, rewrites the destination to the DIP, and delivers to the VM; the reply returns directly to the client as Dest: Client, Src: VIP (Direct Server Return, bypassing the MUX). “Ananta: Cloud Scale Load Balancing,” Microsoft, SIGCOMM 2013
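The inbound and return rewrites can be traced with toy packet dicts. Everything below is illustrative (the field names, the hard-coded addresses); it is a model of the header transformations on the slide, not Ananta's code:

```python
# Sketch of the DSR path: mux encapsulates toward the host, the host
# agent decapsulates and DNATs VIP -> DIP, and the reply goes straight
# back to the client with the source rewritten to the VIP.

VIP, DIP, CLIENT, HOST = "79.3.1.2", "10.1.1.2", "1.2.3.4", "10.4.1.5"

def mux(pkt):
    # Encapsulate: outer header to the host that owns the DIP.
    return {"outer_dst": HOST, "inner": pkt}

def host_agent_in(encapped):
    # Decapsulate, then DNAT the destination from VIP to DIP.
    pkt = dict(encapped["inner"])
    pkt["dst"] = DIP
    return pkt

def host_agent_out(pkt):
    # Direct Server Return: reply bypasses the mux, source is the VIP.
    return {"src": VIP, "dst": pkt["src"]}

inbound = {"src": CLIENT, "dst": VIP}
at_vm = host_agent_in(mux(inbound))
reply = host_agent_out(at_vm)
print(at_vm)   # → {'src': '1.2.3.4', 'dst': '10.1.1.2'}
print(reply)   # → {'src': '79.3.1.2', 'dst': '1.2.3.4'}
```

The asymmetry is the point: only inbound packets cross the mux tier, so return bandwidth (usually the larger direction) never loads the multiplexers.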
  45. Outbound (SNAT) example: the VM sends Dest: Server:80, Src: DIP2:5555; the mapping VIP:1025 -> DIP2 is allocated, and the packet leaves as Dest: Server:80, Src: VIP:1025. “Ananta: Cloud Scale Load Balancing,” Microsoft, SIGCOMM 2013
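The SNAT rewrite above can also be sketched with a small mapping table. This is a toy model (the sequential port allocator and field names are invented for illustration; Ananta allocates ports via its manager):

```python
# Sketch of outbound SNAT: record (VIP, port) -> (DIP, port), rewrite the
# source on the way out, and translate back on the return packet so the
# internal DIP never appears on the wire.

VIP = "79.3.1.2"
snat_map = {}                 # (VIP, port) -> (DIP, port)
next_port = 1025              # hypothetical sequential port allocator

def snat_out(pkt):
    global next_port
    key = (VIP, next_port)
    snat_map[key] = (pkt["src"], pkt["sport"])
    next_port += 1
    return {"src": VIP, "sport": key[1], "dst": pkt["dst"], "dport": pkt["dport"]}

def snat_in(pkt):
    dip, dport = snat_map[(pkt["dst"], pkt["dport"])]
    return {"src": pkt["src"], "sport": pkt["sport"], "dst": dip, "dport": dport}

out = snat_out({"src": "10.1.1.2", "sport": 5555, "dst": "server", "dport": 80})
back = snat_in({"src": "server", "sport": 80, "dst": out["src"], "dport": out["sport"]})
print(out["src"], out["sport"])    # → 79.3.1.2 1025 (as on the slide)
print(back["dst"], back["dport"])  # → 10.1.1.2 5555
```

The per-tenant isolation problem on slide 41 lives exactly here: this VIP-side port space is shared, so one tenant allocating ports too aggressively starves the others unless allocation is rate-limited per tenant.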
  46. “Ananta: Cloud Scale Load Balancing,” Microsoft, SIGCOMM 2013
  47. When capacity runs short, simply add servers; don't engineer each case individually. Manual build-out and configuration cannot keep up with this scale and pace of change. In a world beyond 50 Gbps, CPUs alone are not enough; a variety of chips are in use, and the FPGA is the key.
  48. LinkedIn's engineering team is also on the cutting edge, and they actively publish their work ( https://engineering.linkedin.com/blog ).
