Apache Hive: Present and Future - 2016

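3. © Hortonworks Inc. 2011 – 2016. All Rights Reserved
Self-introduction
• 大浦譲太郎, Twitter: @JOOOURA
• Father of a 5-year-old and an 8-year-old
• After working in systems sales for servers and storage, joined the launch of the Japanese subsidiary of a flash memory storage company in 2011. Covered the full range of roles there, including evangelist, pre-sales SE, PR, and sales. Deployed the ioDrive series, which became synonymous with enterprise flash, at many telecom carriers, financial institutions, web service providers, ad-tech companies, and data center operators in Japan.
• Joined Hortonworks Japan in January 2016 as its second sales hire. Currently works on evangelism, enterprise sales, and partner support.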

4. About Hortonworks
Customer momentum
• ~800 customers (as of February 2016)
• 152 of them added in Q3 2015
• Listed on NASDAQ in October 2015: HDP
The Leader in Connected Data Platforms
• Hortonworks DataFlow for data in motion
• Hortonworks Data Platform for data at rest
• Powering new modern data applications
Partner for Customer Success
• Leader in open-source community, focused on innovation to meet enterprise needs
• Unrivaled support subscriptions
Founded in 2011
• Founded by 24 architects, developers, and operators who built the original Hadoop at Yahoo!
• 1000+ employees
• 1500+ ecosystem partners

5. Our Model: Drive an Enterprise-focused Roadmap
1. Innovate Existing Projects
   – Hive/Stinger, YARN, HDFS, common ops & security via Ambari & Ranger
2. Incubate New Projects
   – Metron (was OpenSOC), Ranger, Knox, Atlas, Falcon, Ambari, Tez, etc.
3. Acquire IP & Contribute
   – Acquired XASecure and created Apache Ranger; contributed OpenSOC
4. Partner & Deliver Joint Solutions
   – Microsoft, EMC, HP, SAS, Pivotal, Red Hat, Teradata, etc.
5. Rally the Ecosystem
   – Fast SQL via Stinger initiative, Data Governance initiative, ODPi
Project categories: Data Access (batch, interactive, realtime), Integration & Governance, Operations, Security
Apache Project   Hortonworks Committers   Hortonworks PMC   HWX % of Committers
Hadoop           29                       24                31%
Accumulo         2                        2                 9%
Calcite          6                        3                 43%
HBase            8                        5                 17%
Hive             19                       11                38%
NiFi             5                        5                 42%
Phoenix          5                        5                 22%
Pig              5                        5                 24%
Slider           12                       12                100%
Spark            1                        0                 2%
Storm            4                        4                 19%
Tez              15                       15                44%
Atlas            7                        0                 35%
Falcon           7                        5                 41%
Flume            1                        1                 4%
Kafka            0                        0                 0%
Sqoop            1                        1                 4%
Ambari           39                       30                76%
Oozie            4                        2                 22%
Zookeeper        2                        1                 13%
Knox             12                       2                 80%
Ranger           13                       11                76%
TOTAL            197                      144
Source: Apache Software Foundation. As of October 5, 2015.
A committer is someone who has "earned their stripes" within the Apache community and has the ability to commit code directly to their corresponding Apache project source code repository.

6. 100% Open Source Connected Data Platforms
• Eliminates risk of vendor lock-in by delivering 100% Apache open source technology
• Maximizes community innovation with hundreds of developers across hundreds of companies
• Integrates seamlessly through committed co-engineering partnerships with other leading technologies
[Figure: "The Innovation Advantage: Maximum Community Innovation". Innovation over time, open community vs. proprietary Hadoop.]

7. Self-introduction
• Yuta Imai (今井雄太), Twitter: @imai_factory
• Solutions Engineer
• First met Hadoop when using MapReduce (perl + streaming!) to build reports for an ad delivery server.
• Later, at AWS, handled ad-tech and gaming customers while mainly covering big data products such as EMR and S3. That connection led to joining Hortonworks to work on Hadoop.

9. Recent Apache Hive: Key highlights
• Up to Hive 1.2.1 (Stinger Initiative: making Hive more than 100x faster; already available on HDP!)
  – Tez
  – Cost Based Optimizer (CBO)
  – ORC File format
  – Vectorization
• Hive 2.0
  – LLAP

10. Recent Apache Hive: Key highlights
• Up to Hive 1.2.1 (Stinger Initiative: making Hive more than 100x faster; already available on HDP!)
  – Tez
  – Cost Based Optimizer (CBO)
  – ORC File format
  – Vectorization
• Hive 2.0 (Sub-second: aiming for response times under 1 second for short queries)
  – LLAP

12. Hive performance recap
• Stinger: a project started with the goal of making Apache Hive 100x faster
• Key components: vectorized SQL engine, Tez execution engine, ORC columnar format, cost-based optimizer
• Hive 0.10 (batch processing) → Hive 0.14 (human interactive, 5 seconds): 100-150x query speedup

13. TPC-DS Benchmark at 30 Terabyte Scale
• Ran 50 sample queries from TPC-DS at 30 terabyte scale
• 52x faster on average, up to 160x faster
• Total benchmark runtime cut from 7.8 days to 9.3 hours
• The Cost-Based Optimizer added in Hive 0.14 delivers a further 2.5x speedup
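
As a quick sanity check on the total-runtime figure above, the arithmetic can be written out. The two inputs are the slide's own numbers; the variable names are only illustrative.

```python
# Total benchmark runtime before and after, as quoted on the slide.
before_hours = 7.8 * 24   # 7.8 days expressed in hours
after_hours = 9.3

# Ratio of total runtimes (not the same thing as the per-query average).
overall_speedup = before_hours / after_hours
print(f"overall speedup: {overall_speedup:.1f}x")
```

The ratio of total runtimes weights every query by how long it runs, so it does not have to match the 52x per-query average.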

15. Apache Tez
• General-purpose distributed processing engine for data processing applications
  – Aimed at applications (frameworks), not end users
  – Hive on Tez, Pig on Tez, Cascading on Tez, …
• Built on the lessons learned from MapReduce
  – Substantial performance improvements
  – Batch and interactive
  – Petabyte scale
• Runs on YARN
  – Makes use of cluster resources
• DAG (directed acyclic graph)

16. MapReduce & Tez
[Figure: left, Map – Reduce with intermediate results in HDFS; right, Tez optimized pipeline]
• Intermediate data is not written to HDFS
• Shapes such as Map-Reduce-Reduce are possible
• Containers are reused through sessions
• The pipeline is optimized across the whole job

17. What is DAG & Why DAG
• Directed Acyclic Graph
• However complex a DAG is, it can basically be broken down into the following three patterns
  – Sequential (Projection, Filter, GroupBy, …)
  – Merge (Join, Union, Intersect, …)
  – Divide (Split, …)
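
To make the DAG idea concrete, here is a minimal sketch that represents a small query plan as a dependency graph and derives a valid execution order. The plan and its vertex names are hypothetical, and this uses Python's standard `graphlib` rather than anything from Tez itself.

```python
from graphlib import TopologicalSorter

# Hypothetical query plan: each vertex maps to the set of vertices it depends on.
plan = {
    "scan_sales": set(),
    "scan_product": set(),
    "filter_product": {"scan_product"},          # Sequential: scan -> filter
    "join": {"scan_sales", "filter_product"},    # Merge: two inputs, one output
    "group_by": {"join"},                        # Sequential: join -> group by
}

def execution_order(dag):
    """Return one order in which every vertex runs after its inputs."""
    return list(TopologicalSorter(dag).static_order())

order = execution_order(plan)
print(order)
```

Any order returned this way runs both scans before the join and the join before the group-by; a graph with a cycle would be rejected, which is why the plan has to be acyclic.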

19. Tez – Key benefits
• Expressiveness of DAGs
  – Easier to express computation in DAG
• No intermediate data written to HDFS
  – Lower latency
  – Less load on the NameNode
• Tez session / container reuse
  – Less overhead for allocating AM and task containers
  – Less load on the ResourceManager
  – Data reuse through the Object Registry (for example, MapJoin tables)
  – JIT optimization of the executing code
• Optimization across the whole DAG

20. Tez – architecture
• Client
  – Starts session
  – Submits DAG
• Application Master
  – DAG Scheduler
  – Task Scheduler
  – Vertex Manager
• TezTask Containers
  – Execution

24. File formats used in Hadoop
• Text
• SequenceFile
• RCFile
  + Can read only the required columns
  + Compression on each column
  - Type-free binary blobs
  - No index
  - Compression by stream-based codec

25. ORCFile – Columnar storage for Hive
• High Compression
  – Type-specific compression applied per column
  – Per-stream compression with ZLIB or SNAPPY
• High Performance
  – Indexes and metadata at the file, stripe, and row levels
  – Predicate Pushdown
• Flexible Data Model
  – Complex types (struct, list, map, union)
  – New types (datetime, decimal)

26. ORC at Facebook
• Saved more than 1,400 servers worth of storage.
• Compression ratio increased from 5x to 8x globally.

33. ORCFile – File format
[Figure: ORC stripe layout. Streams: INDEX, BLOOM FILTER, DATA, LENGTH, DICTIONARY. Row group (default: 10K records per row group).]

34. ORCFile – File format
Statistics kept at each level:
• File
  – Column1 … ColumnN: min, max, sum, hasNull
  – Compression
  – Footer Length
• Stripe1 … StripeN
  – Column1 … ColumnN: min, max, sum, hasNull
• Row groups (RG) within each column
  – Column1 … ColumnN, per RG: min, max, sum, hasNull, pos

35. Compression
• Type-specific compression (Light-Weight Compression)
  – Applied per column
  – Always applied
  – RLE, Direct, Patched Base, Delta
• Data stream compression (Generic Compression)
  – One codec applied uniformly across the whole file
  – In practice applied to each Stream and to the Footer
  – Applied on top of the Light-Weight Compression above
  – NONE, ZLIB, SNAPPY, LZO
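
The intuition behind the light-weight encodings can be sketched in a few lines. This is a deliberately simplified illustration of run-length and delta encoding on integer columns, not ORC's actual on-disk encoding; all function names are made up for the example.

```python
def rle_encode(values):
    """Collapse consecutive repeats into [value, run_length] pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return runs

def delta_encode(values):
    """Keep the first value, then store successive differences."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]

def delta_decode(deltas):
    """Rebuild the original values from a delta-encoded list."""
    out = []
    total = 0
    for i, d in enumerate(deltas):
        total = d if i == 0 else total + d
        out.append(total)
    return out

print(rle_encode([7, 7, 7, 3, 3]))            # repeated values shrink to runs
print(delta_encode([100, 101, 102, 104]))     # sorted values shrink to small deltas
```

Because these encodings exploit the column's type and value distribution, a generic codec such as ZLIB can still be applied to the encoded stream afterwards.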

38. Dumping ORC information
• orcfiledump
    hive --service orcfiledump /apps/hive/warehouse/rankings/000045_0
• To include per-row-group index information, pass --rowindex <column number>. Specifying 0 returns information for all columns.
    hive --service orcfiledump --rowindex 1 /apps/hive/warehouse/rankings/000045_0

39. File Statistics
File Statistics:
  Column 0: count: 1620325 hasNull: false
  Column 1: count: 1620325 hasNull: false min: 1.0.100.215 max: 99.99.97.199 sum: 21531540
  Column 2: count: 1620325 hasNull: false min: … max: … sum: 88890214
  Column 3: count: 1620325 hasNull: false min: 1970-01-01 max: 2012-04-30
  Column 4: count: 1620325 hasNull: false min: …-8 max: … sum: 810757.3001111746
  Column 5: count: 1620325 hasNull: false min: … max: … sum: 85357610
  Column 6: count: 1620325 hasNull: false min: ALB max: ZAF sum: 4860975

40. Stripe Statistics
Stripe Statistics:
  Stripe 1:
    Column 0: count: 1545000 hasNull: false
    Column 1: count: 1545000 hasNull: false min: 1.0.100.215 max: 99.99.97.199 sum: 20530443
    Column 2: count: 1545000 hasNull: false min: … max: … sum: 84763272
    Column 3: count: 1545000 hasNull: false min: 1970-01-01 max: 2012-04-30
    Column 4: count: 1545000 hasNull: false min: … max: … sum: 773016.625769496
    Column 5: count: 1545000 hasNull: false min: … max: … sum: 81385950
    Column 6: count: 1545000 hasNull: false min: ALB max: ZAF sum: 4635000

41. Row Group Indexes
Row group indices for column 1:
  Entry 0: count: 10000 hasNull: false min: 1.101.125.195 max: 99.98.152.204 sum: 132919 positions: 0,0,0,0,0
  Entry 1: count: 10000 hasNull: false min: 1.104.147.167 max: 99.85.51.213 sum: 132976 positions: 0,132919,0,6119,52
  Entry 2: count: 10000 hasNull: false min: 1.1.228.147 max: 99.88.166.75 sum: 132826 positions: 120403,3751,0,12339,3
  Entry 3: count: 10000 hasNull: false min: 1.104.90.89 max: 99.96.30.136 sum: 132853 positions: 120403,136577,0,18482,4
  Entry 4: count: 10000 hasNull: false min: 1.11.252.134 max: 99.71.248.30 sum: 132856 positions: 240743,7286,0,24600,2
  Entry 5: count: 10000 hasNull: false min: 1.119.19.221 max: 99.96.184.74 sum: 132977 positions: 240743,140142,0,30713,8
  Entry 6: count: 10000 hasNull: false min: 1.1.244.95 max: 99.99.242.168 sum: 132735 positions: 360961,10975,0,36946,1
  Entry 7: count: 10000 hasNull: false min: 1.1.146.20 max: 99.93.105.159 sum: 132869 positions: 360961,143710,0,43145,2

42. SARG & Predicate Pushdown
• SARG: Search ARGument
• SELECT COUNT(*) FROM CUSTOMER WHERE CUSTOMER.state = 'CA';
• For a query like the one above, the RecordReader reads from storage only the ORC files, stripes, and row groups that can match the where clause
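
The skipping decision can be sketched with min/max statistics alone. The following is a minimal illustration, assuming an equality predicate and made-up per-row-group statistics; `RowGroupStats` and the helper names are hypothetical and are not part of the ORC reader API.

```python
from dataclasses import dataclass

@dataclass
class RowGroupStats:
    """Min/max of one column within one row group (illustrative only)."""
    min_value: str
    max_value: str

def may_contain(stats, target):
    """An equality predicate can only match if target lies in [min, max]."""
    return stats.min_value <= target <= stats.max_value

def row_groups_to_read(index, target):
    """Return the positions of the row groups that cannot be skipped."""
    return [i for i, stats in enumerate(index) if may_contain(stats, target)]

# Hypothetical statistics for the `state` column of three row groups.
state_index = [
    RowGroupStats("AK", "AZ"),
    RowGroupStats("CA", "CO"),
    RowGroupStats("NY", "WA"),
]
print(row_groups_to_read(state_index, "CA"))
```

The same test applies at the file and stripe levels, which is why data that is sorted or clustered on the filter column benefits most.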

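43. Bloom Filter Index
[Figure: bit array 1 0 1 1 1 0 1 0 1 1 with m=10 and k=3; example values x, y, z, w]
• Each input value is hashed with k hash functions and the resulting positions are set in an array of m bits.
• To check a value, hash it the same k ways. If all of the resulting bits are 1, the value may be present in the indexed data; otherwise it is definitely absent and the data can be skipped.
• False positives are possible.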
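
The mechanism described above fits in a short sketch. This is an illustrative Bloom filter with the slide's parameters m and k; deriving the k hash functions from seeded SHA-256 is an assumption made for the example and is not how ORC implements its Bloom filter stream.

```python
import hashlib

class BloomFilter:
    """Illustrative Bloom filter: m bits, k hash functions."""

    def __init__(self, m, k):
        self.m = m
        self.k = k
        self.bits = [0] * m

    def _positions(self, value):
        # Derive k bit positions by hashing the value with k different seeds.
        for seed in range(self.k):
            digest = hashlib.sha256(f"{seed}:{value}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, value):
        for pos in self._positions(value):
            self.bits[pos] = 1

    def might_contain(self, value):
        # False means "definitely absent"; True means "possibly present".
        return all(self.bits[pos] for pos in self._positions(value))

bf = BloomFilter(m=10, k=3)
for v in ("x", "y", "z"):
    bf.add(v)
print(bf.might_contain("x"), bf.bits)
```

With only 10 bits the filter fills up quickly and false positives become likely; real filters size m from the expected number of values and a target false-positive rate.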

44. Bloom Filter Indexes Improvements
    select * from tpch_1000.lineitem where l_orderkey = 1212000001;
Rows read (log scale, smaller is better):
• No indexes: 5,999,989,709
• Min-max indexes: 540,000
• Bloom filter indexes: 10,000

45. Bloom Filter Indexes Improvements
    select * from tpch_1000.lineitem where l_orderkey = 1212000001;
Time taken in seconds (smaller is better):
• No indexes: 74
• Min-max indexes: 4.5 (~16x improvement)
• Bloom filter indexes: 1.34 (~3.3x improvement over min-max indexes)

46. ORCFile – Example table definition
• Defined per table or per partition
• The compression codec is selectable
    create table Addresses (
      name string,
      street string,
      city string,
      state string,
      zip int
    ) stored as orc tblproperties ("orc.compress"="ZLIB");

47. ORCFile – Converting text to ORC
• There is no reason not to use ORC
• A single SQL statement converts text to ORC
    -- Create Text & ORC tables
    CREATE TABLE test_details_txt (visit_id INT, store_id SMALLINT)
      ROW FORMAT DELIMITED FIELDS TERMINATED BY ','  -- the input file is CSV
      STORED AS TEXTFILE;
    CREATE TABLE test_details_orc (visit_id INT, store_id SMALLINT)
      STORED AS ORC;
    -- Load into Text table
    LOAD DATA LOCAL INPATH '/home/user/test_details.csv' INTO TABLE test_details_txt;
    -- Copy to ORC table
    INSERT OVERWRITE TABLE test_details_orc SELECT * FROM test_details_txt;

51. Column Store Characteristics
Row Store
• TextFile, SequenceFile, Avro
• Slower read performance
• Reads whole rows
• Lower compression ratio
• Higher local cardinality
Column Store
• RCFile, Parquet, ORC
• Faster read performance
• Reads needed columns only
• Higher compression ratio
• Lower local cardinality
• Room for further optimization
  – Vectorization

52. Hive Vectorization (2014)
Rewriting Hive execution engine for performance
• No method calls
• Low instruction count
• Cache locality to 1,024 values
• No pipeline stalls
• SIMD in Java 8
• But not excellent without SIMD
    set hive.vectorized.execution.enabled = true;
Reference: J. Sompolski, M. Zukowski, P. Boncz. Vectorization vs. Compilation in Query Execution. 2011
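
The batch-at-a-time idea can be shown structurally. This sketch processes a column in batches of 1,024 values, matching the batch size on the slide; it only illustrates the control flow, since plain Python shows none of the CPU-level benefits (cache locality, SIMD). The function names are hypothetical.

```python
BATCH_SIZE = 1024  # values per batch, as on the slide

def batches(column, size=BATCH_SIZE):
    """Yield consecutive slices of the column, each at most `size` long."""
    for start in range(0, len(column), size):
        yield column[start:start + size]

def filter_sum_batched(column, threshold):
    """Sum the values above `threshold`, one tight inner loop per batch."""
    total = 0
    for batch in batches(column):
        total += sum(v for v in batch if v > threshold)
    return total

column = list(range(3000))
print(len(list(batches(column))), filter_sum_batched(column, 2997))
```

The point of the design is that per-row overhead (operator dispatch, type checks) is paid once per batch instead of once per row.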

54. Cost Based Optimizer
• Uses Apache Calcite
• What does it do?
  – Ordering joins
  – Bushy Join Tree
  – Converting join algorithms
• Paper: https://cwiki.apache.org/confluence/display/Hive/Cost-based+optimization+in+Hive
• Anatomy: http://hortonworks.com/blog/hive-0-14-cost-based-optimizer-cbo-technical-overview/

55. Expression tree
    SELECT p."product_name", COUNT(*) AS c
    FROM "splunk"."splunk" AS s
    JOIN "mysql"."products" AS p ON s."product_id" = p."product_id"
    WHERE s."action" = 'purchase'
    GROUP BY p."product_name"
    ORDER BY c DESC
[Figure: expression tree spanning Splunk and MySQL. Operators: scan (Table: splunk), scan (Table: products), join (Key: product_id), filter (Condition: action = 'purchase'), group (Key: product_name, Agg: count), sort (Key: c DESC)]

56. Expression tree (optimized)
    SELECT p."product_name", COUNT(*) AS c
    FROM "splunk"."splunk" AS s
    JOIN "mysql"."products" AS p ON s."product_id" = p."product_id"
    WHERE s."action" = 'purchase'
    GROUP BY p."product_name"
    ORDER BY c DESC
[Figure: optimized expression tree spanning Splunk and MySQL. Operators: scan (Table: splunk), filter (Condition: action = 'purchase'), scan (Table: products), join (Key: product_id), group (Key: product_name, Agg: count), sort (Key: c DESC)]

57. Query preparation – Hive 0.13
[Figure: query preparation pipeline. Stages: SQL parser, Semantic analyzer, Logical Optimizer, Physical Optimizer, Tez. Artifacts passed between stages: Hive SQL, Abstract Syntax Tree (AST), Annotated AST, Plan, Tuned Plan]

58. Query preparation – Hive 0.14
[Figure: query preparation pipeline with the added steps "Translate to algebra" and "Optiq optimizer". Stages: SQL parser, Semantic analyzer, Logical Optimizer, Physical Optimizer, Tez. Artifacts passed between stages: Hive SQL, AST with optimized join-ordering, Tuned Plan]

60. Query combining two stars
    SELECT product.id, sum(sales.units), sum(inventory.on_hand)
    FROM sales
    JOIN customer ON …
    JOIN time ON …
    JOIN product ON …
    JOIN inventory ON …
    JOIN warehouse ON …
    WHERE time.year = 2014 AND time.quarter = 'Q1'
      AND product.color = 'Red' AND warehouse.state = 'WA'
    GROUP BY …
[Figure: two star schemas over the tables Sales, Inventory, Time, Product, Customer, Warehouse]

61. Left-deep tree
• In a "left-deep" tree, all joins are performed serially. The join order is considered, but the shape of the tree is not.
• A common plan:
  – Start with the largest table at the bottom left
  – Apply the most selective joins first
[Figure: left-deep join tree over Sales, Customer, Time, Product, Inventory, Warehouse]

62. Bushy tree
• No restriction on where joins take place
• "Bushes" are formed from a fact table (Sales and Inventory) and its related dimension tables
• The dimension tables act as filters
• As a result, fewer rows are read and less data is exchanged over the network
[Figure: bushy join tree over Sales, Customer, Time, Product, Inventory, Warehouse]

63. Cost variables
• Hr – cost of reading 1 byte from HDFS, in nanoseconds
• Hw – cost of writing 1 byte to HDFS, in nanoseconds
• Lr – cost of reading 1 byte from the local FS, in nanoseconds
• Lw – cost of writing 1 byte to the local FS, in nanoseconds
• NEt – average cost of transferring 1 byte over the network between any two nodes in the Hadoop cluster, in nanoseconds
• T(R) – number of tuples in the relation R
• Tsz – average size of a tuple in the relation
• V(R, a) – number of distinct values for attribute a in relation R
• CPUc – CPU cost of one comparison, in nanoseconds
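
To show how variables like these combine, here is a small sketch of two estimates. Both formulas are assumptions chosen for illustration: a table scan costed as bytes read times Hr, and the textbook join-cardinality estimate based on distinct values. The slide itself only defines the variables, and the numbers below are invented.

```python
def scan_io_cost_ns(t_r, tsz, hr):
    """Assumed scan cost: T(R) tuples * Tsz bytes each * Hr ns per byte."""
    return t_r * tsz * hr

def join_cardinality(t_r1, t_r2, v_r1_a, v_r2_a):
    """Textbook estimate for an equi-join on attribute a:
    T(R1) * T(R2) / max(V(R1, a), V(R2, a))."""
    return t_r1 * t_r2 / max(v_r1_a, v_r2_a)

# Invented inputs: a 1M-row fact table joined to a 1K-row dimension table.
print(scan_io_cost_ns(t_r=1_000_000, tsz=100, hr=1.5))
print(join_cardinality(1_000_000, 1_000, 1_000, 1_000))
```

An optimizer compares such estimates across candidate plans, so the absolute values matter less than their relative order.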

67. What is LLAP?
• A resident (daemon) process for executing Hive work
• Lower task startup cost
• The JIT optimizer becomes more effective
• Thread-based executors rather than process-based ones
• Metadata, MapJoin tables, and similar data can be shared between tasks
• Asynchronous IO and caching
• Query fragment API
[Figure: a node running an LLAP process with a cache and query fragments, reading from HDFS]

68. What LLAP isn't
• Not a Hive execution engine (like Tez, MR, Spark…)
  – Execution engines handle work such as assembling the processing
• Not a storage layer
  – LLAP daemons are stateless and use HDFS as the source of truth for data
• Does not supersede existing Hive
  – Container-based execution will also continue to evolve

69. Example execution: MR vs Tez vs Tez+LLAP
[Figure: three execution layouts side by side]
• Map – Reduce: intermediate results in HDFS
• Tez: optimized pipeline
• Tez with LLAP: resident process on nodes with an in-memory columnar cache; map tasks read HDFS

70. LLAP in your cluster
• LLAP daemons run on YARN
• Apache Slider provisions and recovers the containers for the daemons
• Resource management via YARN delegation model (WIP)
• LLAP and containers dynamically balance resource usage (WIP)

72. Tez + LLAP – overview
• DAG-based assembly of the work is used as is, and so is the Tez runtime
• Fragments/tasks can run in LLAP, in regular containers, or inside the AM
• Where they run is decided by the Hive client
  – Configurable: all in LLAP, none in LLAP, intelligent mix
• Policy for assigning tasks to LLAP (in auto mode)
  – No user code (or only blessed user code)
  – Data source: HDFS
  – ORC and vectorized execution (for now)
  – Others can still run in LLAP in "all" mode, w/o IO elevator and cache
  – Data size limitations (avoid heavy / long running processing within LLAP)

76. Scheduling for LLAP in Tez AM
• Greedy scheduling per query
  – Scheduling assumes the whole cluster is available
• Schedule work to preferred location (HDFS locality)
  – By setting preferred locations, multiple queries that access the same data can have their tasks run on the same daemon

77. Queuing fragments
• LLAP daemons execute tasks/fragments using a thread pool
• Each daemon has an internal queue, with a pluggable prioritization mechanism
[Figure: LLAP queue feeding executors; example entries: Q1 Map 1, Q1 Reducer 2, Q3 Map 19]

78. LLAP Scheduling – pipelining and preemption
• A fragment can start executing before all of its input data is ready
• Once all of its input data is ready, the fragment is flagged "finishable"
[Figure: LLAP queue and executors running interactive query maps 1/3, 2/3, 3/3 and a wide query reduce; callout: "Well, 10 mappers out of 100 are done!"]

79. LLAP Scheduling – pipelining and preemption
• A fragment can start executing before all of its input data is ready
• Once all of its input data is ready, the fragment is flagged "finishable"
• The executor is not released until the fragment becomes finishable
[Figure: LLAP queue and executors running interactive query maps 1/3, 2/3, 3/3 and a wide query reduce]

81. LLAP Scheduling – pipelining and preemption
• A fragment can start executing before all of its input data is ready
• Once all of its input data is ready, the fragment is flagged "finishable"
• The executor is not released until the fragment becomes finishable
• Non-finishable fragments are preempted
[Figure: LLAP queue and executors running interactive query maps 1/3, 2/3, 3/3 and a wide query reduce]
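
The preemption rule in the bullets above can be sketched as a toy scheduler. This is one hypothetical policy written for illustration, assuming a fixed number of executors and a simple FIFO wait queue; the `Fragment` and `Daemon` classes are invented here, and the real LLAP prioritization is pluggable and more involved.

```python
from dataclasses import dataclass, field

@dataclass
class Fragment:
    name: str
    finishable: bool  # True once all of the fragment's input data is ready

@dataclass
class Daemon:
    num_executors: int
    running: list = field(default_factory=list)
    queue: list = field(default_factory=list)

    def submit(self, fragment):
        """Run the fragment if possible; return the fragment it preempts, if any."""
        if len(self.running) < self.num_executors:
            self.running.append(fragment)
            return None
        if fragment.finishable:
            # A finishable fragment may take the slot of a non-finishable one.
            for i, victim in enumerate(self.running):
                if not victim.finishable:
                    self.running[i] = fragment
                    self.queue.append(victim)  # the victim waits for a free slot
                    return victim
        self.queue.append(fragment)
        return None

daemon = Daemon(num_executors=2)
daemon.submit(Fragment("wide query reduce", finishable=False))
daemon.submit(Fragment("interactive query map 1/3", finishable=True))
preempted = daemon.submit(Fragment("interactive query map 2/3", finishable=True))
print(preempted.name, [f.name for f in daemon.running])
```

Preferring finishable fragments keeps executors busy with work that is guaranteed to complete, so a long reduce waiting on upstream mappers cannot starve short interactive queries.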

84. Asynchronous IO
• In LLAP, IO elevator threads perform disk IO, decompression, and similar work asynchronously
• IO threads can be spindle aware (WIP)
• Depending on workload, IO and processing threads can balance resource usage (throttle IO, etc.) (WIP)

85. Caching and off-heap data
• Decompressed data is cached off-heap
  – So the cache does not add GC pressure
• Eliminates HDFS IO and decompression cost; especially effective for dimension tables
• Pluggable eviction policy
  – FIFO and LRFU are currently supported
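
As an illustration of what an eviction policy decides, here is a minimal FIFO cache sketch. It is a generic in-memory example with invented names, not the LLAP cache (which holds off-heap buffers); LRFU is not shown.

```python
from collections import OrderedDict

class FifoCache:
    """Fixed-capacity cache that evicts the oldest inserted entry first."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()

    def get(self, key):
        # FIFO ignores access recency, so a hit does not reorder entries.
        return self.entries.get(key)

    def put(self, key, value):
        if key in self.entries:
            self.entries[key] = value  # update in place, insertion order unchanged
            return
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # drop the oldest insertion
        self.entries[key] = value

cache = FifoCache(capacity=2)
cache.put("stripe-1", b"aaa")
cache.put("stripe-2", b"bbb")
cache.get("stripe-1")           # a read does not protect the entry under FIFO
cache.put("stripe-3", b"ccc")   # evicts stripe-1, the oldest insertion
print(list(cache.entries))
```

A policy such as LRFU would also weigh how recently and how often an entry was read, which tends to keep small, hot dimension tables resident.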

86. Other benefits
• File metadata and indexes are cached as well
  – Faster Predicate Pushdown
• MapJoin hash tables and fragment execution plans are also shared within the JVM
  – Reduces the per-task/fragment cost of deserializing the execution plan
• Better use of JIT optimizer
  – Because the daemon stays up, the JIT has more time to do its work
  – Especially good with vectorization!

88. Recent Apache Hive: Key highlights
• Up to Hive 1.2.1 (Stinger Initiative: making Hive more than 100x faster; already available on HDP!)
  – Tez
  – Cost Based Optimizer (CBO)
  – ORC File format
  – Vectorization
• Hive 2.0 (Sub-second: aiming for response times under 1 second for short queries)
  – LLAP

Yuta Imai
