The AWS Big Data Platform – Overview

The introductory morning session will discuss big data challenges and provide an overview of the AWS Big Data Platform. We will also cover:

• How AWS customers leverage the platform to manage massive volumes of data from a variety of sources while containing costs.
• Reference architectures for popular use cases, including: connected devices (IoT), log streaming, real-time intelligence, and analytics.
• The AWS big data portfolio of services, including Amazon S3, Kinesis, DynamoDB, Elastic MapReduce (EMR) and Redshift.
• The latest relational database engine, Amazon Aurora: a MySQL-compatible, highly available relational database engine that provides up to five times better performance than MySQL at one-tenth the cost of a commercial database.
• Amazon Machine Learning, the latest big data service from AWS, which provides visualization tools and wizards that guide you through creating machine learning (ML) models without having to learn complex ML algorithms and technology.

Published in: Technology

Transcript

  • 1. April 21, 2015 Seattle AWS Big Data Platform
  • 2. Agenda Overview: 10:30 AM Introduction to Big Data @ AWS; 12:30 PM Use Case Technical Deep Dive Sessions • Data Collection and Storage • Real-time Event Processing • Analytics
  • 3. Broad & Deep Core Services
  • 4. Rich Platform Services
  • 5. Big Data Pipeline: Data, Answers; Collect, Process, Analyze, Store
  • 6. Primitive Patterns: Collect, Process, Analyze, Store. Patterns: Data Collection and Storage, Data Processing, Event Processing, Data Analysis
  • 7. Primitive Patterns: Collect, Process, Analyze, Store. Patterns: Data Collection and Storage, Data Processing, Data Analysis, Event Processing. Services: S3, Kinesis, DynamoDB, RDS (Aurora), MySQL, AWS Lambda, KCL Apps, EMR, Redshift, Machine Learning
  • 8. Primitive Patterns: Collect, Process, Analyze, Store. Data Collection and Storage: S3, Kinesis, DynamoDB, RDS (Aurora), MySQL
  • 9. Data Collection and Storage: File, Stream, Transactional; Apps, Logging Frameworks
  • 10. AWS Services – Data Collection and Storage MySQL
  • 11. Benefits of Streamlined Data Collection
  • 12. S3: $0.030/GB-Mo • Redshift: starts at $0.25/hour • EC2: starts at $0.02/hour • Glacier: $0.010/GB-Mo • Kinesis: $0.015/shard (1 MB/s in; 2 MB/s out), $0.028/million puts
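
As a rough illustration of how the per-unit prices on slide 12 combine into a monthly bill, here is a minimal Python sketch. The prices are the 2015 figures from the slide. Treating the Kinesis shard price as a per shard-hour rate, the 730-hour month, and all function names are assumptions made for this example.

```python
import math

# Prices as listed on slide 12 (2015 figures; current pricing may differ).
S3_PER_GB_MONTH = 0.030
GLACIER_PER_GB_MONTH = 0.010
KINESIS_PER_SHARD_HOUR = 0.015    # assumption: the slide's "/shard" means per shard-hour
KINESIS_PER_MILLION_PUTS = 0.028
SHARD_INGEST_MB_PER_S = 1.0       # each shard accepts 1 MB/s in, per the slide
HOURS_PER_MONTH = 730             # assumption: average month length


def shards_needed(ingest_mb_per_s: float) -> int:
    """Smallest number of shards that covers the ingest rate."""
    return max(1, math.ceil(ingest_mb_per_s / SHARD_INGEST_MB_PER_S))


def monthly_cost(s3_gb: float, glacier_gb: float,
                 ingest_mb_per_s: float, million_puts: float) -> dict:
    """Itemized monthly cost estimate in USD (hypothetical helper)."""
    shards = shards_needed(ingest_mb_per_s)
    costs = {
        "s3": s3_gb * S3_PER_GB_MONTH,
        "glacier": glacier_gb * GLACIER_PER_GB_MONTH,
        "kinesis_shards": shards * HOURS_PER_MONTH * KINESIS_PER_SHARD_HOUR,
        "kinesis_puts": million_puts * KINESIS_PER_MILLION_PUTS,
    }
    costs["total"] = sum(costs.values())
    return costs


if __name__ == "__main__":
    for item, usd in monthly_cost(1000, 5000, 1.5, 100).items():
        print(f"{item:>15}: ${usd:,.2f}")
```

Under these assumptions, 1 TB in S3, 5 TB in Glacier, and a 1.5 MB/s stream with 100 million puts come to roughly $105 per month. Redshift and EC2 are left out because the slide only gives their starting hourly rates.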
  • 13. Cost & Scale
  • 14. Benefits of Streamlined Data Collection: Existing Application • DynamoDB table(s) • GET calls & Queries • PUT calls • Query(… • PutItem(…
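
The `PutItem(…` and `Query(…` calls on slide 14 are elided in the transcript, so the following Python sketch only illustrates the general shape of such requests. It builds the parameter dictionaries in DynamoDB's typed attribute-value format without contacting AWS. The table name, attribute names, and helper functions are hypothetical and not taken from the slides.

```python
def put_item_request(table: str, device_id: str, event_time: int, reading: float) -> dict:
    """Parameters for a DynamoDB PutItem call (hypothetical schema)."""
    return {
        "TableName": table,
        "Item": {
            "device_id": {"S": device_id},         # partition key, string type
            "event_time": {"N": str(event_time)},  # sort key; numbers are sent as strings
            "reading": {"N": str(reading)},
        },
    }


def query_request(table: str, device_id: str, since: int) -> dict:
    """Parameters for a DynamoDB Query: one device's items at or after `since`."""
    return {
        "TableName": table,
        "KeyConditionExpression": "device_id = :d AND event_time >= :t",
        "ExpressionAttributeValues": {
            ":d": {"S": device_id},
            ":t": {"N": str(since)},
        },
    }


if __name__ == "__main__":
    print(put_item_request("device_events", "cam-42", 1429600000, 21.5))
    print(query_request("device_events", "cam-42", 1429500000))
```

With an SDK such as boto3, dictionaries of this shape could be passed to the low-level client's `put_item(**params)` and `query(**params)`. That step is omitted here so the sketch runs without credentials.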
  • 15. Benefits of Streamlined Data Collection: Customers • Devices • Data Items • Item Size • Frequency • Challenge: compounding scale • Benefit: improved data quality
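
One way to read "compounding scale" on slide 15 is that the listed factors multiply, so modest growth in each produces a large jump in volume. The sketch below works through that arithmetic. The multiplicative model, the parameter names, and the sample numbers are assumptions for illustration and do not come from the deck.

```python
def daily_volume_bytes(customers: int, devices_per_customer: int,
                       items_per_report: int, item_size_bytes: int,
                       reports_per_day: int) -> int:
    """Bytes collected per day if every factor multiplies (assumed model)."""
    return (customers * devices_per_customer * items_per_report
            * item_size_bytes * reports_per_day)


if __name__ == "__main__":
    # Hypothetical fleet: 1,000 customers, 10 devices each, 5 items of
    # 100 bytes per report, one report per minute (1,440 per day).
    base = daily_volume_bytes(1_000, 10, 5, 100, 1_440)
    doubled = daily_volume_bytes(2_000, 20, 10, 200, 2_880)
    print(f"baseline: {base / 1e9:.1f} GB/day")
    print(f"all five factors doubled: {doubled // base}x the baseline")
```

In this model, doubling each of the five factors multiplies the daily volume by 2^5 = 32.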
  • 16. Primitive Patterns: Collect, Process, Analyze, Store. Event Processing: AWS Lambda, KCL Apps
  • 17. Event Processing – Enabling Capabilities: AWS Lambda, KCL Apps
  • 18. Real-Time Event Processing • Examples:
  • 19. Benefits of Event Processing: Collect | Store | Analyze; Alert
  • 20. Primitive Patterns: Collect, Process, Analyze, Store. Patterns: Data Collection and Storage, Data Processing, Event Processing, Data Analysis. Services: EMR, Redshift, Machine Learning
  • 21. Retail & POS Analytics • Process tens of TB in hours vs. 2 weeks • 80-90% reduction in costs
  • 22. Big Data Use Cases: Internet of Things, Digital Advertising, Online Gaming, Log Analytics, Customer Value Scoring, Personalization Engine. Collect, Process, Analyze, Store. Patterns: Data Collection and Storage, Data Processing, Event Processing, Data Analysis
  • 23. Connected Devices / IoT • Simple video monitoring & security • Fast growth – “suddenly petabytes” • Timeline (2009, 2010, 2011, 2012, 2013): Move to AWS; cameras; Switch to DynamoDB
  • 24. • S3 (CVR data) • DynamoDB (metadata) • EMR (activity recognition) • CloudFront (CDN) • EC2 (live streaming)
  • 25. Applying Analytics to Connected Device Data VPC Subnet MQTT Broker on EC2 Instance VPC Internet Gateway EMR Kinesis DynamoDB Redshift Lambda SNS S3 Data Pipeline
  • 26. Backend Analytics Architecture for Connected Device Data
  • 27. AWS Big Data Ecosystem: S3, Kinesis, EMR, Redshift, Data Pipeline, DynamoDB
  • 28. Big Data Partner Solutions Solutions vetted by the AWS Partner Competency Program
  • 29. Big Data Service Offers Service expertise vetted by the AWS Partner Competency Program
  • 30. AWS Marketplace: Advanced Analytics; Database and Data Enablement; Business Intelligence • 1-click deployment to launch, in multiple regions around the world • Pay-as-you-go pricing with no long-term contracts required • 2,000+ product listings to browse, test and buy software • Enterprise software store for business users who need simplified procurement
  • 31. April 21, 2015 Seattle Analytics in Minutes, Not Weeks
  • 32. What is Driving Big Data Adoption?
  • 33. Everything is Connected
  • 34. Everything is Achievable
  • 35. Everything is Achievable
  • 36. Flexible Transform all types of data into self-service analytics
  • 37. Flexible Transform all types of data into self-service analytics
  • 38. Flexible Transform all types of data into self-service analytics Amazon EMR Amazon RDS Amazon Redshift
  • 39. A Modern Cloud Based BI Architecture
  • 40. Use Data to Provide the Best Player Experience Run Agile and Lean Self-Service
  • 41. PRIVATE Amazon Redshift PUBLIC MySQL CRM MySQL Game Servers Data Files Alteryx SQL Tools Secured External Access Secured External Access Tableau Online Live Connection
  • 42. Amazon Redshift Tableau Online Tableau Desktop Tableau Desktop Live Connection
  • 43. Thank you
  • 44. Amazon Machine Learning Amazon Aurora
  • 45. Introducing… Amazon Machine Learning
  • 46. Smart Applications • e-commerce: recommendations made based on your past purchases • finance: alerts from your bank when they suspect fraudulent transactions • retail: emails when items related to things you typically buy are on sale
  • 47. Amazon Machine Learning 1. Build & Train Model • Create a datasource object (connect to Redshift, RDS, S3) • Explore and understand your data • Transform and train your model 2. Evaluate the Model & Optimize • Assess model quality • Fine-tune the model 3. Retrieve Predictions • Batch: asynchronous, large volume prediction • Real-time: synchronous, single-item prediction
  • 48. Amazon Machine Learning example use cases • Fraud detection • Demand forecasting • Predictive customer support • Click prediction • Content personalization • Document classification
  • 49. Amazon Machine Learning: Currently Available in US-East-1
  • 50. Amazon Aurora Amazon’s New Relational Database Engine
  • 51. Reimagining the relational database. What if you were inventing the database today? • You wouldn’t design it the way we did in 1970. At least not entirely. • You’d build something that can scale out, that is self-healing, and that leverages existing AWS services.
  • 52. Relational databases reimagined for the cloud • Speed and availability of high-end commercial databases • Simplicity and cost-effectiveness of open source databases • Drop-in compatibility with MySQL • Simple pay-as-you-go pricing. Delivered as a managed service
  • 53. A service-oriented architecture applied to the database • Moved the logging and storage layer into a multi-tenant, scale-out database-optimized storage service • Integrated with other AWS services like Amazon EC2, Amazon VPC, Amazon DynamoDB, Amazon SWF, and Amazon Route 53 for control plane operations • Integrated with Amazon S3 for continuous backup with 99.999999999% durability. Diagram labels: Control Plane, Data Plane, Amazon DynamoDB, Amazon SWF, Amazon Route 53, Logging + Storage, SQL, Transactions, Caching, Amazon S3
  • 54. Simplify database management • Create a database in minutes • Automated patching • Push-button scale compute • Continuous backups to S3 • Automatic failure detection and failover. Amazon RDS
  • 55. Simplify storage management • Read replicas are available as failover targets (no data loss) • Instantly create user snapshots (no performance impact) • Continuous, incremental backups to S3 • Automatic storage scaling up to 64 TB (no performance or availability impact) • Automatic restriping, mirror repair, hot spot management, encryption
  • 56. Simplify data security • Encryption to secure data at rest – AES-256; hardware accelerated – All blocks on disk and in Amazon S3 are encrypted – Key management via AWS KMS • SSL to secure data in transit • Network isolation via Amazon VPC by default • No direct access to nodes • Supports industry standard security and data protection certifications. Diagram labels: Storage, SQL, Transactions, Caching, Amazon S3, Application
  • 57. Aurora storage. Highly available by default • 6-way replication across 3 AZs • 4 of 6 write quorum – Automatic fallback to 3 of 4 if an AZ is unavailable • 3 of 6 read quorum. SSD, scale-out, multi-tenant storage • Seamless storage scalability • Up to 64 TB database size • Only pay for what you use. Log-structured storage • Many small segments, each with their own redo logs • Log pages used to generate data pages • Eliminates chatter between database and storage. Diagram labels: SQL, Transactions, AZ 1, AZ 2, AZ 3, Caching, Amazon S3
  • 58. Self-healing, fault-tolerant • Lose two copies, or an entire AZ, without read or write availability impact • Lose three copies without read availability impact • Automatic detection, replication, and repair. Diagram labels: SQL, Transactions, Caching, AZ 1, AZ 2, AZ 3; Read and write availability; Read availability
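
The quorum figures on slides 57 and 58 (six copies, 4 of 6 to write, 3 of 6 to read) can be checked with a few lines of Python. This is a minimal sketch of the counting argument only. It does not model the "fallback to 3 of 4" behaviour or automatic repair, and the class and function names are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuorumConfig:
    copies: int = 6        # 6-way replication across 3 AZs (2 copies per AZ)
    write_quorum: int = 4  # 4 of 6 copies must acknowledge a write
    read_quorum: int = 3   # 3 of 6 copies must answer a read


def availability(failed_copies: int, cfg: QuorumConfig = QuorumConfig()) -> dict:
    """Which operations still reach quorum after `failed_copies` are lost."""
    healthy = cfg.copies - failed_copies
    return {
        "read": healthy >= cfg.read_quorum,
        "write": healthy >= cfg.write_quorum,
    }


def quorums_overlap(cfg: QuorumConfig = QuorumConfig()) -> bool:
    """True if every read quorum shares at least one copy with every write quorum."""
    return cfg.read_quorum + cfg.write_quorum > cfg.copies


if __name__ == "__main__":
    for lost in range(5):
        print(lost, "copies lost ->", availability(lost))
    print("read and write quorums overlap:", quorums_overlap())
```

Losing one AZ removes two copies, which leaves four healthy copies and keeps both reads and writes available. Losing three copies leaves only reads available, matching the two bullets on slide 58.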
  • 59. Instant crash recovery. Traditional databases • Have to replay logs since the last checkpoint • Single-threaded in MySQL; requires a large number of disk accesses. Amazon Aurora • Underlying storage replays redo records on demand as part of a disk read • Parallel, distributed, asynchronous. Diagram labels: Checkpointed Data, Redo Log. Traditional: a crash at T0 requires a re-application of the SQL in the redo log since the last checkpoint. Aurora: a crash at T0 will result in redo logs being applied to each segment on demand, in parallel, asynchronously
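
The idea of replaying redo records "on demand as part of a disk read" can be shown with a toy model. The Python sketch below is not Aurora's implementation. It assumes pages hold a single integer and redo records are simple deltas, purely to contrast eager replay after a crash with lazy replay on read.

```python
class Segment:
    """Toy storage segment: page values plus per-page pending redo records."""

    def __init__(self, pages: dict):
        self.pages = dict(pages)  # page_id -> last materialized value
        self.redo = {}            # page_id -> list of pending deltas
        self.applied = 0          # number of redo records applied so far

    def log(self, page_id: int, delta: int) -> None:
        """Record a change in the redo log without touching the page."""
        self.redo.setdefault(page_id, []).append(delta)

    def read(self, page_id: int) -> int:
        """Apply this page's pending redo records, then return its value."""
        for delta in self.redo.pop(page_id, []):
            self.pages[page_id] += delta
            self.applied += 1
        return self.pages[page_id]


if __name__ == "__main__":
    seg = Segment({1: 10, 2: 20})
    seg.log(1, 5)
    seg.log(1, -2)
    seg.log(2, 1)
    # After a "crash", nothing is replayed up front; page 1 is brought
    # up to date only when it is read, and page 2 stays pending.
    print(seg.read(1), seg.applied, sorted(seg.redo))
```

Because each segment keeps its own log, segments could be caught up independently, which is the property the slide describes as parallel and distributed.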
  • 60. Write performance (console screenshot) • MySQL Sysbench • R3.8XL with 32 cores and 244 GB RAM • 4 client machines with 1,000 threads each
  • 61. Read performance (console screenshot) • MySQL Sysbench • R3.8XL with 32 cores and 244 GB RAM • Single client with 1,000 threads
  • 62. Read replica lag (console screenshot) • Aurora Replica with 7.27 ms replica lag at 13.8 K updates/second • MySQL 5.6 on the same hardware has ~2 s lag at 2 K updates/second
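
To put the two replica lag figures from slide 62 side by side, the following Python sketch computes the lag ratio and a back-of-the-envelope estimate of how many updates each replica is behind (update rate multiplied by lag). The backlog estimate is an assumption added for illustration, and the "~2 s" MySQL figure is taken as exactly 2,000 ms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReplicaBenchmark:
    name: str
    lag_ms: float
    updates_per_s: float

    def updates_behind(self) -> float:
        """Rough backlog estimate: update rate multiplied by lag in seconds."""
        return self.updates_per_s * self.lag_ms / 1000.0


# Figures as quoted on slide 62.
aurora = ReplicaBenchmark("Aurora Replica", lag_ms=7.27, updates_per_s=13_800)
mysql = ReplicaBenchmark("MySQL 5.6", lag_ms=2_000, updates_per_s=2_000)

if __name__ == "__main__":
    print(f"lag ratio (MySQL / Aurora): {mysql.lag_ms / aurora.lag_ms:.0f}x")
    print(f"update rate ratio (Aurora / MySQL): "
          f"{aurora.updates_per_s / mysql.updates_per_s:.1f}x")
    for b in (aurora, mysql):
        print(f"{b.name}: about {b.updates_behind():.0f} updates behind")
```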
  • 63. Aurora preview • Sign up for preview access at: https://aws.amazon.com/rds/aurora/preview • Now available in US West (Oregon) and EU (Ireland), in addition to US East (N. Virginia). Thousands of customers already invited to the limited preview • Now moving to unlimited preview; accepting all requests in 2–3 weeks • Full service launch in the coming months
  • 64. AWS Big Data platform • Choice – platform breadth supports many use cases • Specialization – use the best service for the job • Managed Services – eliminate undifferentiated effort. Services: S3, Kinesis, DynamoDB, RDS (Aurora), MySQL, AWS Lambda, KCL Apps, EMR, Redshift, Machine Learning
  • 65. Thank you Questions?