
Using MLOps to Bring ML to Production/The Promise of MLOps


In this final Weave Online User Group of 2019, David Aronchick asks: have you…



  1. Foundation for ML: your data + Microsoft data + breakthrough advancements. Data + Cloud + Models: the power of Azure.
  2. Microsoft ML breakthroughs in Speech, Vision, and Language (2016, 2017, 2018).
  3. ML at Microsoft: Microsoft 365 | Research
  4. ML at scale: 180 million monthly active Office 365 users using AI; 18 billion questions asked of Cortana; 6.5 trillion signals analyzed DAILY to block emerging threats.
  5. But ML is HARD!
  6. Building a model
  7. Building a model: Data ingestion → Data analysis → Data transformation → Data validation → Data splitting → Trainer → Model validation → Training at scale → Logging → Roll-out → Serving → Monitoring
  8. Ok, but, like, I'm a data scientist. IDGAF. I don't care about all that.
  9. Yes You Do!
  10. (image slide)
  11. Cowboys and Ranchers Can Be Friends! Data Scientists want: • Quick iteration • Frameworks they understand • Best-of-breed tools • No management headaches • Unlimited scale. SRE/ML Engineers want: • Reuse of tooling and platforms • Corporate compliance • Observability • Uptime.
  12. Haven't I Heard This Before?
  13. GitOps = Git + Dev + Ops
  14. GitOps == VELOCITY and SECURITY
  15. MLOps!
  16. MLOps = ML + DEV + OPS. The loop: Experiment (business understanding, data acquisition, initial modeling) → Develop (modeling, ML + testing, continuous integration) → Operate (continuous delivery, continuous deployment, data feedback loop, system + model monitoring).
  17. MLOps Benefits == VELOCITY and SECURITY (for ML). Reproducibility / Auditability: • Code drives generation and deployments • Pipelines are reproducible and verifiable • All artifacts can be tagged and audited. Validation: • SWE best practices for quality control • Offline comparisons of model quality • Minimize bias and enable explainability. Automation / Observability: • Controlled rollout capabilities • Live comparison of predicted vs. expected performance • Results fed back to watch for drift and improve the model.
  18. Internal MLOps Platforms: FBLearner Flow, TensorFlow Extended, Uber's Michelangelo, Microsoft Aether
  19. But I Don't Work at a Big Company With Thousands of ML Engineers!
  20. Build Your Own MLOps Platform (and many, MANY more…)
  21. Cloud Provider MLOps Platforms
  22. Real-World Multi-Cloud CI/CD Pipeline: Data → Process → Train → Stage → Serve, spanning a distributed cloud across ENV #1 and ENV #2, with Data Scientists and SRE/ML Engineers collaborating.
  23. Azure DevOps Pipelines: cloud-hosted pipelines for Linux, Windows, and macOS. Any language, any platform, any cloud: build, test, and deploy Node.js, Python, Java, PHP, Ruby, C/C++, .NET, Android, and iOS apps; run in parallel on Linux, macOS, and Windows; deploy to Azure, AWS, GCP, or on-premises. Extensible: a wide range of community-built build, test, and deployment tasks, plus hundreds of extensions from Slack to SonarCloud; support for YAML, reporting, and more. Containers and Kubernetes: easily build and push images to container registries like Docker Hub and Azure Container Registry, and deploy containers to individual hosts or Kubernetes.
  24. Azure DevOps + Azure ML
  25. First-Class Model Training Tasks. The CI pipeline captures: 1. Create sandbox 2. Run unit tests and code quality checks 3. Attach to compute 4. Run training pipeline 5. Evaluate model 6. Register model
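Slide 25's CI steps map naturally onto the Azure ML Python SDK (v1). A minimal sketch; the script name train.py, compute target cpu-cluster, model name loan-model, and the 0.9 accuracy gate are all hypothetical, and step 2's unit tests would run before this:

```python
# Hypothetical CI training job: submit a run, evaluate, register on success.
from azureml.core import Workspace, Experiment, Environment, ScriptRunConfig

ws = Workspace.from_config()                          # 1. attach to the workspace
env = Environment.from_conda_specification(
    name="train-env", file_path="conda.yml")          # reproducible environment

config = ScriptRunConfig(
    source_directory=".",
    script="train.py",                                # 4. run the training pipeline
    compute_target="cpu-cluster",                     # 3. attach to compute
    environment=env,
)

run = Experiment(ws, "ci-training").submit(config)
run.wait_for_completion(show_output=True)

# 5./6. evaluate the model, registering it only if it clears the quality bar
if run.get_metrics().get("accuracy", 0) > 0.9:        # threshold is illustrative
    run.register_model(model_name="loan-model", model_path="outputs/model.pkl")
```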
  26. Automated Deployment. The CD pipeline captures: 1. Package model into container image 2. Validate and profile model 3. Deploy model to DevTest (ACI) 4. If all is well, proceed to roll out to AKS. Everything is done via the CLI.
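The slide notes this flow is driven from the CLI; the equivalent steps in the Azure ML Python SDK (v1) look roughly like the sketch below. The service names, model name, and AKS cluster name are hypothetical, and step 2's validation/profiling is omitted here:

```python
# Hypothetical CD flow: package the registered model, deploy to ACI for
# dev/test, then roll out to AKS once the dev/test service is healthy.
from azureml.core import Workspace, Environment
from azureml.core.compute import AksCompute
from azureml.core.model import Model, InferenceConfig
from azureml.core.webservice import AciWebservice, AksWebservice

ws = Workspace.from_config()
model = Model(ws, name="loan-model")

env = Environment.from_conda_specification("serve-env", "conda.yml")
inference_config = InferenceConfig(entry_script="score.py", environment=env)

# 1./3. package the model and deploy to DevTest on Azure Container Instances
aci_config = AciWebservice.deploy_configuration(cpu_cores=1, memory_gb=1)
devtest = Model.deploy(ws, "loan-devtest", [model], inference_config, aci_config)
devtest.wait_for_deployment(show_output=True)

# 4. if all is well, proceed to roll out to Azure Kubernetes Service
if devtest.state == "Healthy":
    aks_config = AksWebservice.deploy_configuration(autoscale_enabled=True)
    Model.deploy(ws, "loan-prod", [model], inference_config, aks_config,
                 deployment_target=AksCompute(ws, "aks-cluster"))
```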
  27. Model Versioning & Storage: • Which data? • Which experiment / previous model(s)? • Where's the code / notebook? • Was it converted / quantized? • Private / compliant data?
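The lineage questions on slide 27 can be pinned to the model itself at registration time as tags. A sketch, again with the v1 Python SDK; every tag value below is illustrative:

```python
# Hypothetical registration that records lineage alongside the model artifact.
from azureml.core import Workspace
from azureml.core.model import Model

ws = Workspace.from_config()
model = Model.register(
    workspace=ws,
    model_path="outputs/model.pkl",
    model_name="loan-model",
    tags={
        "dataset": "loans-2019-q4",        # which data
        "parent_run": "ci-training_42",    # which experiment / previous model(s)
        "git_commit": "32c04681d7573",     # where's the code / notebook
        "quantized": "false",              # was it converted / quantized?
        "data_class": "private",           # private / compliant data
    },
)
print(model.name, model.version)           # the registry versions it automatically
```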
  28. Model Validation. Validate: • the data (changes to shape / profile) • the model in isolation (offline A/B) • the model + app (functional testing). Deploy: • only after initial validation passes • ramp up traffic to the new model using A/B experiments. Watch: • functional behavior • performance characteristics.
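The "model in isolation (offline A/B)" gate from slide 28 can be as simple as refusing to promote a candidate that does not beat the current model on a frozen holdout set. A framework-agnostic sketch; the function name, metric choice, and lift threshold are illustrative:

```python
# Hypothetical offline A/B gate: the candidate must match or beat the incumbent.
from sklearn.metrics import accuracy_score

def offline_ab_gate(current_model, candidate_model, X_holdout, y_holdout,
                    min_lift=0.0):
    """Compare both models on the same held-out data before any deployment."""
    current = accuracy_score(y_holdout, current_model.predict(X_holdout))
    candidate = accuracy_score(y_holdout, candidate_model.predict(X_holdout))
    # Only after this passes do we deploy and ramp up live traffic via A/B.
    return candidate >= current + min_lift
```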
  29. Model Profiling
  30. Model Deployment: • Focus on ML, not DevOps • Get telemetry for service health and model behavior • Code generation • API specifications / interfaces • Cloud services • Mobile / embedded applications • Edge devices • Quantize / optimize models for the target platform • Compliant + safe
  31. Seems Like a Lot of Work…
  32. (image slide)
  33. MLOps Gets You to Production: • End-to-end ownership by data science teams using SWE best practices • Continuous delivery of value to end users • Lineage, auditability, and regulatory compliance through consistency
  34. Ok… but WHY?
  35. What Does All This Stuff Solve For? 1. Does My Model Actually Work? 2. What Did My Customers See? 3. Is My Model Still Good?
  36. What Does All This Stuff Solve For? 1. Does My Model Actually Work? 2. What Did My Customers See? 3. Is My Model Still Good?
  37. Does My Model Actually Work?
  38. Does My Model Actually Work? Data Scientist (Laptop) · SRE/ML Engineers (The Cloud). "Time to test out my model…"
  39. Does My Model Actually Work? Data Scientist (Laptop) · SRE/ML Engineers (The Cloud)
  40. Does My Model Actually Work? Data Scientist (Laptop) · SRE/ML Engineers (The Cloud). "Looks good to me! To Production!"
  41. Does My Model Actually Work? Data Scientist (Laptop) · SRE/ML Engineers (The Cloud). "Wait, what? Oh… oh no…"
  42. Does My Model Actually Work? Data Scientist (Laptop) · SRE/ML Engineers (The Cloud). "WOAH there."
  43. Does My Model Actually Work? Data Scientist (Laptop) · SRE/ML Engineers (The Cloud), now with Source Control. "WOAH there."
  44. Does My Model Actually Work? Data Scientist (Laptop) · SRE/ML Engineers (The Cloud) · Source Control. "What is happening…"
  45. A Small Example of Issues You Can Have… • Inappropriate HW/SW stack • Mismatched driver versions • Crash-looping deployment • Data/model versioning [Nick Walsh] • Non-standard images/OS version • Pre-processing code doesn't match production pre-processing • Production data doesn't match training/test data • Output of the model doesn't match application expectations • Hand-coded heuristics better than the model [Adam Laiacano] • Model freshness (trained on out-of-date data/input shape changed) • Test/production statistics/population shape skew • Overfitting on training/test data • Bias introduced (or not tested for) • Over/under HW provisioning • Latency issues • Permissions/certs • Failure to obey health checks • Killed production model before roll-out of the new one/in wrong order • Thundering herd for the new model • Logging to the wrong location • Storage for the model not allocated properly/accessible by deployment tooling • Route to artifacts not available for download • API signature changes not propagated/expected • Cross-data-center latency • Expected benefit doesn't materialize (e.g. multiple components in the app change simultaneously) • Get wrong/no traffic because the A/B config didn't roll out • Get too much traffic too soon (expected canary/exponential roll-out) • Lack of visibility into real-time model behavior (detecting data drift, live data distribution vs. train data, etc.) [Nick Walsh] • Outliers not predicted [MikeBSilverman] • Change was a good change, but you didn't communicate with the rest of the team (so you must roll back) • No dates! (date to measure impact/improvement against a pre-agreed measure; date scheduled to assess data changes) [Mary Branscombe] • No CI/CD; manual changes untracked [Jon Peck] • LACK OF DOCUMENTATION!! (the problem, the testing, the solution, lots more) [Terry Christiani] • Successful model causes pain elsewhere in the organization (e.g. detecting faults previously missed) [Mark Round] Or It Just Doesn't Work! At All!
  46. Does My Model Actually Work? Data Scientist (Laptop) → Source Control → Clean/Minimize Code → Automated Validation & Profiling → Explain Model & Look for Bias → Package for Rollout → Sane Deployment → The Cloud (SRE/ML Engineers). "Nice." "Nice." ✓
  47. But I Can Do All These Manually…
  48. No.
  49. MLOps is a Platform and a Philosophy. Even if: o Every data scientist were trained... o And you had all the tools necessary... o And they all worked together... o And your SREs understood ML modeling... o And and and and... You'd still need a permanent, repeatable record of what you did.
  50. That's MLOps!
  51. What Does All This Stuff Solve For? 1. Does My Model Actually Work? 2. What Did My Customers See? 3. Is My Model Still Good?
  52. What Does All This Stuff Solve For? 1. Does My Model Actually Work? 2. What Did My Customers See? 3. Is My Model Still Good?
  53. What Did My Customers See?
  54. What Did My Customers See? Customer → Front End → Model Server (The Cloud, SRE/ML Engineers, Source Control). "I'd like a loan, please."
  55. What Did My Customers See? Customer → Front End → Model Server (The Cloud, SRE/ML Engineers, Source Control). "No."
  56. What Did My Customers See? Customer → Front End → Model Server (The Cloud, SRE/ML Engineers, Source Control). "Ok, but why?"
  57. What Did My Customers See? Customer → Front End → Model Server (The Cloud, SRE/ML Engineers, Source Control). "Uh oh." (Lawyers. Lawyers everywhere.)
  58. It's Not Just About Explainability! • Yes, models are complicated • But that's not enough: o What data did you train on? o How did you transform/exclude outliers? o What are the data statistics? o Did anything change between code and production? o What model did you actually serve (to this person)? • MLOps can help!
  59. What Did My Customers See? Customer → Front End → Model Server (The Cloud, SRE/ML Engineers). Source Control → Clean/Minimize Code → Automated Validation & Profiling → Explain Model & Look for Bias → Package for Rollout → Sane Deployment.
  60. What Did My Customers See? The same pipeline, now writing to an Immutable Metadata Store: artifact hashes b151f8e65b32a, c7f4e7607b4b7, 0ef1d58921d89, e2e1e994c4251, 786c8e57a6d51, 9ce88802f0759; the customer's request is tagged 32c04681d7573 and served by model 9ce88802f0759.
  61. What Did My Customers See? "Why didn't I get a loan?" Request 32c04681d7573 traces back through the Immutable Metadata Store to the exact model that served it, 9ce88802f0759.
  62. What Did My Customers See? The Immutable Metadata Store links request 32c04681d7573 to model 9ce88802f0759 and to every artifact that produced it.
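Stripped to its essentials, the Immutable Metadata Store on slides 60-62 is: hash every artifact, and log which hashes produced each prediction. A minimal illustration; the truncated 13-character hashes mirror the slides, and a real store would be append-only and durably backed:

```python
# Hypothetical lineage record: content-hash the artifacts, log them per request.
import hashlib
import json

def content_hash(path: str) -> str:
    """SHA-256 of a file, truncated for display like the hashes on the slides."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()[:13]

record = {
    "model": content_hash("outputs/model.pkl"),       # e.g. 9ce88802f0759
    "training_data": content_hash("data/train.csv"),
    "code_commit": "32c04681d7573",                   # from source control
}
# Attach this record to every served prediction; "why didn't I get a loan?"
# becomes a metadata lookup instead of an archaeology project.
print(json.dumps(record))
```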
  63. What Does All This Stuff Solve For? 1. Does My Model Actually Work? 2. What Did My Customers See? 3. Is My Model Still Good?
  64. What Does All This Stuff Solve For? 1. Does My Model Actually Work? 2. What Did My Customers See? 3. Is My Model Still Good?
  65. Is My Model Still Good?
  66. Is My Model Still Good? SRE/ML Engineers, The Cloud. There is a blue or orange DUCK inside this barn. What color is the duck?
  67. Let's Use Machine Learning!!
  68. Is My Model Still Good? Front End → Model Server (f7c5f9fe7b762): "It's a duck! BLUE." There is a blue or orange DUCK inside this barn. What color is the duck?
  69. But wait...
  70. Is My Model Still Good? Front End → Model Server (f7c5f9fe7b762): "It's a duck! BLUE." Population: 5 blue ducks, 995 yellow ducks. Accuracy = 99%, false positive = 1%. ???
  71. Thomas Bayes
  72. Bayes' Theorem: P(A | B) = P(B | A) · P(A) / P(B)
  73. Accuracy depends on the population distribution!
  74. Is My Model Still Good? Front End → Model Server (f7c5f9fe7b762): "It's a duck! BLUE." Population: 995 yellow ducks, 5 blue ducks. Accuracy = 99%, false positive = 1%. WRONG 2/3 of the time!
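Slide 74's "wrong 2/3 of the time" falls straight out of Bayes' theorem. Working through the numbers given on slide 70 (5 blue ducks in a population of 1,000, 99% accuracy, 1% false-positive rate):

```python
# P(actually blue | predicted blue) via Bayes' theorem, using slide 70's numbers.
p_blue = 5 / 1000              # prior: blue ducks in the population
sensitivity = 0.99             # P(predict blue | actually blue)
false_positive = 0.01          # P(predict blue | actually yellow)

p_predicted_blue = sensitivity * p_blue + false_positive * (1 - p_blue)
posterior = sensitivity * p_blue / p_predicted_blue
print(f"P(blue | predicted blue) = {posterior:.2f}")  # 0.33 -> wrong ~2/3 of the time
```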
  75. Who cares…
  76. This Can Be Addressed!
  77. Is My Model Still Good? Front End → Model Server f7c5f9fe7b762, replaced by retrained Model Server d4093cc84b267: "It's a duck! BLUE." Population: 995 yellow ducks, 5 blue ducks.
  78. But…
  79. Is My Model Still Good? Front End → Model Server d4093cc84b267. Population: 995 yellow ducks, 5 blue ducks.
  80. Is My Model Still Good? Front End → Model Server d4093cc84b267. Population: 500 yellow ducks, 500 blue ducks.
  81. Is My Model Still Good? • Models != Code – they can go stale... QUICKLY. • IMPORTANT: o Watch your model & data for drift from training o Regularly (if not continuously) retrain, even before performance begins to fail o Multiple version rollbacks are not uncommon! • Without an e2e MLOps pipeline, many of the above are O(really really hard)!
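One concrete way to "watch your model & data for drift from training", as slide 81 urges: a two-sample Kolmogorov-Smirnov test per feature, comparing training data against recent live traffic. A sketch; the significance threshold and synthetic data are illustrative:

```python
# Hypothetical per-feature drift check: flag when live data no longer looks
# like the training data, and use that signal to trigger retraining.
import numpy as np
from scipy.stats import ks_2samp

def feature_drifted(train_values: np.ndarray, live_values: np.ndarray,
                    alpha: float = 0.01) -> bool:
    """Two-sample KS test: a small p-value means the distributions differ."""
    _, p_value = ks_2samp(train_values, live_values)
    return p_value < alpha

rng = np.random.default_rng(0)
train = rng.normal(0.0, 1.0, 10_000)
live = rng.normal(0.5, 1.0, 1_000)    # the population shifted, as on slide 80
print(feature_drifted(train, live))    # True -> time to retrain
```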
  82. What Does All This Stuff Solve For? 1. Does My Model Actually Work? 2. What Did My Customers See? 3. Is My Model Still Good?
  83. Next for MLOps
  84. MLOps Gives* You… • Software best practices for building machine learning solutions • A repeatable workflow for training a model and rolling it out to production • An immutable record of what's actually running • Lineage of model creation, including data sources • Acceleration from code to customer benefits. (* Requires some human and software work.)
  85. What's Next for MLOps • Simplify monitoring and retraining • Extend MLOps to data, including prep and profiling • Enterprise features: o Test cases o Auditing o Security o Resource management (bin packing / resource optimization) o Network isolation • Metadata and API standards. Or, better yet, you tell us!
  86. It's a whole new world • Data science will touch EVERY industry. • But we can't ask everyone to get a PhD in statistics. • How do WE help everyone take advantage of this transformation?
  87. me: David Aronchick (david.aronchick@microsoft.com) · twitter: @aronchick · github: https://github.com/aronchick/kubeflow-and-mlops · https://aka.ms/mlops · THANK YOU!
