Thoughts on the Direction of the Latest Database Technologies

2006/4 Cloud Study Group @Google

  1. “Stream-driven” paradigm
  2. [Architecture diagram] Legacy devices (custom protocols), IoT devices, IP-capable devices (Windows/Linux), low-power devices (RTOS); cloud gateways, field gateways; stream processing (Stream Analytics); query and search (Azure Search); visualization and analysis (Power BI); dashboards (Azure App Services); machine learning (Machine Learning API); machine learning (Machine Learning / Revolution R Enterprise); parallel data processing (Azure Data Lake Analytics); notifications to devices (Notification Hubs); DWH, document database, time-series data; event broker / device management; authentication platform (Azure Active Directory)
  3. [Diagram: an event stream split into Partition 1 through Partition “n”, read independently by Consumer Groups A, B and C; inside each group, Worker 1 through Worker “n” run a callback per partition (e.g. callbacks for partitions 1, 2 and 6, up to callback “n”)]
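Slide 3 shows partitions read by several consumer groups, with workers running a callback per partition. The following is a minimal in-memory sketch of that model in plain Python, assuming key-hash partitioning and a round-robin assignment of partitions to workers; `PartitionedLog`, `ConsumerGroup` and `poll` are hypothetical names, not the Azure Event Hubs SDK.

```python
import zlib
from collections import defaultdict


class PartitionedLog:
    """Append-only event log split into a fixed number of partitions (hypothetical)."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, key, value):
        # Deterministic hash of the key: events with the same key always land
        # in the same partition, so their relative order is preserved.
        p = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        self.partitions[p].append((key, value))
        return p


class ConsumerGroup:
    """A consumer group keeps its own read offset per partition."""

    def __init__(self, log, num_workers):
        self.log = log
        self.offsets = [0] * len(log.partitions)
        # Round-robin assignment: partition p is owned by worker p % num_workers.
        self.assignment = defaultdict(list)
        for p in range(len(log.partitions)):
            self.assignment[p % num_workers].append(p)

    def poll(self, worker, callback):
        """Pass unread events of the worker's partitions to callback(partition, key, value)."""
        delivered = 0
        for p in self.assignment[worker]:
            events = self.log.partitions[p][self.offsets[p]:]
            for key, value in events:
                callback(p, key, value)
            self.offsets[p] += len(events)  # advance this group's offset only
            delivered += len(events)
        return delivered


# Usage: two groups read the same log without interfering with each other.
log = PartitionedLog(num_partitions=4)
for i in range(10):
    log.append(f"device-{i % 3}", i)

seen_a, seen_b = [], []
group_a = ConsumerGroup(log, num_workers=2)
group_b = ConsumerGroup(log, num_workers=1)
for worker in range(2):
    group_a.poll(worker, lambda p, k, v: seen_a.append(v))
group_b.poll(0, lambda p, k, v: seen_b.append((k, v)))
```

Because offsets live in the group rather than in the log, each group sees every event once, and adding a group does not affect the others.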
  6. Intermediary broker, backpressure feedback
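Slide 6 names an intermediary broker and backpressure feedback. One common way to realize this is a bounded buffer that refuses new items when full, which forces a fast producer down to the consumer's pace. The sketch below is a single-threaded, tick-based simulation under that assumption; `BoundedBroker` and `simulate` are hypothetical names.

```python
from collections import deque


class BoundedBroker:
    """Intermediary buffer with a fixed capacity (hypothetical sketch)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.buffer = deque()

    def offer(self, item):
        # Backpressure feedback: refuse the item rather than grow without bound.
        if len(self.buffer) >= self.capacity:
            return False
        self.buffer.append(item)
        return True

    def take(self, n):
        """Let the consumer drain up to n items, oldest first."""
        out = []
        while self.buffer and len(out) < n:
            out.append(self.buffer.popleft())
        return out


def simulate(num_items, capacity, produce_rate, consume_rate):
    """Tick-based run of a fast producer and a slow consumer joined by the broker."""
    broker = BoundedBroker(capacity)
    pending = deque(range(num_items))  # items the producer has not sent yet
    consumed, max_depth, ticks = [], 0, 0
    while pending or broker.buffer:
        ticks += 1
        # The producer sends up to produce_rate items but pauses at the first refusal.
        for _ in range(produce_rate):
            if not pending or not broker.offer(pending[0]):
                break
            pending.popleft()
        max_depth = max(max_depth, len(broker.buffer))
        consumed.extend(broker.take(consume_rate))
    return consumed, max_depth, ticks


consumed, max_depth, ticks = simulate(20, capacity=5, produce_rate=4, consume_rate=1)
```

Although the producer could send 4 items per tick, the buffer never exceeds its capacity of 5, and the whole run takes as many ticks as the consumer needs at 1 item per tick.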
  8. [Diagram: three brokers]
  11. Spark Streaming DStream transformations (1/2):
      - map(func): Return a new DStream by passing each element of the source DStream through a function func.
      - flatMap(func): Similar to map, but each input item can be mapped to 0 or more output items.
      - filter(func): Return a new DStream by selecting only the records of the source DStream on which func returns true.
      - repartition(numPartitions): Changes the level of parallelism in this DStream by creating more or fewer partitions.
      - union(otherStream): Return a new DStream that contains the union of the elements in the source DStream and otherDStream.
      - count(): Return a new DStream of single-element RDDs by counting the number of elements in each RDD of the source DStream.
      - reduce(func): Return a new DStream of single-element RDDs by aggregating the elements in each RDD of the source DStream using a function func (which takes two arguments and returns one). The function should be associative so that it can be computed in parallel.
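The per-batch semantics on slide 11 can be modelled without a cluster: a DStream is a sequence of micro-batches, and each transformation is applied to every batch independently. The class below is a toy in plain Python, with lists standing in for RDDs; `MiniDStream` and its snake_case method names are hypothetical and are not the PySpark API.

```python
import functools


class MiniDStream:
    """Toy DStream: a sequence of micro-batches, with plain lists standing in for RDDs."""

    def __init__(self, batches):
        self.batches = [list(b) for b in batches]

    def map(self, func):
        return MiniDStream([[func(x) for x in b] for b in self.batches])

    def flat_map(self, func):
        # Each input element may produce zero or more output elements.
        return MiniDStream([[y for x in b for y in func(x)] for b in self.batches])

    def filter(self, func):
        return MiniDStream([[x for x in b if func(x)] for b in self.batches])

    def union(self, other):
        # Batch i of the result holds batch i of both streams
        # (assumes both streams share the same batch boundaries).
        return MiniDStream([a + b for a, b in zip(self.batches, other.batches)])

    def count(self):
        # One single-element batch per source batch.
        return MiniDStream([[len(b)] for b in self.batches])

    def reduce(self, func):
        # Spark requires func to be associative so a batch can be reduced in
        # parallel; this toy simply folds left to right. Empty batches stay empty.
        return MiniDStream([[functools.reduce(func, b)] if b else [] for b in self.batches])


lines = MiniDStream([["a b", "c"], ["d e f"]])
words = lines.flat_map(str.split)
word_counts = words.count()
totals = MiniDStream([[1, 2, 3], [4, 5]]).reduce(lambda x, y: x + y)
```

Here `words.batches` is `[["a", "b", "c"], ["d", "e", "f"]]`: no state crosses batch boundaries, which is exactly what the stateful transformations on the next slide add.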
  12. Spark Streaming DStream transformations (2/2):
      - countByValue(): When called on a DStream of elements of type K, return a new DStream of (K, Long) pairs where the value of each key is its frequency in each RDD of the source DStream.
      - reduceByKey(func, [numTasks]): When called on a DStream of (K, V) pairs, return a new DStream of (K, V) pairs where the values for each key are aggregated using the given reduce function. Note: by default, this uses Spark's default number of parallel tasks (2 for local mode; in cluster mode the number is determined by the config property spark.default.parallelism) to do the grouping. You can pass an optional numTasks argument to set a different number of tasks.
      - join(otherStream, [numTasks]): When called on two DStreams of (K, V) and (K, W) pairs, return a new DStream of (K, (V, W)) pairs with all pairs of elements for each key.
      - cogroup(otherStream, [numTasks]): When called on a DStream of (K, V) and (K, W) pairs, return a new DStream of (K, Seq[V], Seq[W]) tuples.
      - transform(func): Return a new DStream by applying an RDD-to-RDD function to every RDD of the source DStream. This can be used to do arbitrary RDD operations on the DStream.
      - updateStateByKey(func): Return a new "state" DStream where the state for each key is updated by applying the given function on the previous state of the key and the new values for the key. This can be used to maintain arbitrary state data for each key.
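The keyed transformations on slide 12 can be modelled the same way, as plain functions over a list of batches of (key, value) pairs. These are hypothetical helpers, not PySpark. The key difference they show: `reduce_by_key` and `join` work within one batch, while `update_state_by_key` carries state from batch to batch; following Spark's documented convention, the update function receives the new values and the previous state, and returning None removes the key.

```python
from collections import defaultdict


def reduce_by_key(batches, func):
    """Per batch, merge the values of each key with func (like reduceByKey)."""
    out = []
    for batch in batches:
        acc = {}
        for k, v in batch:
            acc[k] = func(acc[k], v) if k in acc else v
        out.append(sorted(acc.items()))
    return out


def join(left, right):
    """Per batch, inner-join (K, V) pairs with (K, W) pairs into (K, (V, W))."""
    out = []
    for left_batch, right_batch in zip(left, right):
        index = defaultdict(list)
        for k, w in right_batch:
            index[k].append(w)
        out.append([(k, (v, w)) for k, v in left_batch for w in index.get(k, [])])
    return out


def update_state_by_key(batches, update):
    """Carry per-key state across batches (like updateStateByKey).

    update(new_values, previous_state) returns the new state; returning None
    removes the key. Keys without new values in a batch are updated too.
    """
    state, out = {}, []
    for batch in batches:
        grouped = defaultdict(list)
        for k, v in batch:
            grouped[k].append(v)
        for k in set(state) | set(grouped):
            new_state = update(grouped.get(k, []), state.get(k))
            if new_state is None:
                state.pop(k, None)
            else:
                state[k] = new_state
        out.append(sorted(state.items()))
    return out


batches = [[("a", 1), ("b", 1), ("a", 1)], [("a", 1)]]
per_batch = reduce_by_key(batches, lambda x, y: x + y)
running = update_state_by_key(batches, lambda vals, old: sum(vals) + (old or 0))
joined = join([[("a", 1)]], [[("a", "x"), ("b", "y")]])
```

With the same input, `per_batch` restarts counting in each batch (`a` is 2, then 1), whereas `running` accumulates across batches (`a` is 2, then 3).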
  13. Apache Flink DataStream transformations (1/2):
      - Map: Takes one element and produces one element. Example: a map function that doubles the values of the input stream.
      - FlatMap: Takes one element and produces zero, one, or more elements. Example: a flatMap function that splits sentences into words.
      - Filter: Evaluates a boolean function for each element and retains those for which the function returns true. Example: a filter that filters out zero values.
      - KeyBy: Logically partitions a stream into disjoint partitions, each partition containing elements of the same key. Internally, this is implemented with hash partitioning.
      - Reduce: A "rolling" reduce on a keyed data stream. Combines the current element with the last reduced value and emits the new value.
      - Fold: A "rolling" fold on a keyed data stream with an initial value. Combines the current element with the last folded value and emits the new value.
      - Aggregations: Rolling aggregations on a keyed data stream. The difference between min and minBy is that min returns the minimum value, whereas minBy returns the element that has the minimum value in this field (same for max and maxBy).
      - Window: Windows can be defined on already partitioned KeyedStreams. Windows group the data in each key according to some characteristic (e.g., the data that arrived within the last 5 seconds).
      - WindowAll: Windows can be defined on regular DataStreams. Windows group all the stream events according to some characteristic (e.g., the data that arrived within the last 5 seconds).
      - Window Apply: Applies a general function to the window as a whole. Example: a function that manually sums the elements of a window.
      - Window Reduce: Applies a functional reduce function to the window and returns the reduced value.
      - Window Fold: Applies a functional fold function to the window and returns the folded value.
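Slide 13 contrasts a rolling reduce on a keyed stream, which emits an updated value for every element, with keyed windows, which aggregate per window. The sketch below models both over a finite list of (key, value, timestamp) events in plain Python. It is not the Flink API: the function names are hypothetical, and the window is assumed to be a tumbling (fixed, non-overlapping) event-time window, only one of several window types.

```python
from collections import defaultdict


def rolling_reduce(stream, key_fn, reduce_fn):
    """keyBy + rolling reduce: combine each element with the last reduced value
    of the same key and emit the new value."""
    last = {}
    for element in stream:
        k = key_fn(element)
        last[k] = reduce_fn(last[k], element) if k in last else element
        yield last[k]


def tumbling_window_sum(stream, key_fn, ts_fn, value_fn, size):
    """keyBy + tumbling window + window reduce: sum the values of each key inside
    fixed windows [start, start + size).

    Simplification: results are produced after the whole (finite) input is read;
    there are no watermarks, triggers or late-data handling.
    """
    sums = defaultdict(int)
    for element in stream:
        ts = ts_fn(element)
        start = ts - ts % size  # start of the window that contains ts
        sums[(key_fn(element), start)] += value_fn(element)
    return dict(sorted(sums.items()))


# Events as (key, value, timestamp) tuples.
events = [("a", 1, 0), ("b", 2, 1), ("a", 3, 4), ("a", 4, 6)]
rolling = list(rolling_reduce(events, lambda e: e[0],
                              lambda acc, e: (acc[0], acc[1] + e[1], e[2])))
windows = tumbling_window_sum(events, key_fn=lambda e: e[0], ts_fn=lambda e: e[2],
                              value_fn=lambda e: e[1], size=5)
```

The rolling reduce emits four results for four events (key `a` grows 1, 4, 8), while the window version emits one sum per key and window: `a` gets 4 in the window starting at 0 and 4 in the window starting at 5.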
  14. Apache Flink DataStream transformations (2/2):
      - Aggregations on windows: Aggregates the contents of a window. The difference between min and minBy is that min returns the minimum value, whereas minBy returns the element that has the minimum value in this field (same for max and maxBy).
      - Union: Union of two or more data streams, creating a new stream containing all the elements from all the streams. Note: if you union a data stream with itself, you will get each element twice in the resulting stream.
      - Window Join: Joins two data streams on a given key and a common window.
      - Window CoGroup: Cogroups two data streams on a given key and a common window.
      - Connect: "Connects" two data streams, retaining their types. Connect allows for shared state between the two streams.
      - CoMap, CoFlatMap: Similar to map and flatMap on a connected data stream.
      - Split: Splits the stream into two or more streams according to some criterion.
      - Select: Selects one or more streams from a split stream.
      - Iterate: Creates a "feedback" loop in the flow by redirecting the output of one operator to some previous operator. This is especially useful for defining algorithms that continuously update a model. For example, an iteration can start with a stream and apply the iteration body continuously: elements greater than 0 are sent back to the feedback channel, and the rest of the elements are forwarded downstream.
      - Extract Timestamps: Extracts timestamps from records in order to work with windows that use event time semantics.
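The Iterate entry on slide 14 describes a feedback loop in which elements greater than 0 return to the feedback channel and the rest go downstream. A minimal single-process model of that loop follows, assuming a hypothetical iteration body that subtracts 1 on each pass; `iterate` is an illustrative helper, not Flink's IterativeStream API.

```python
from collections import deque


def iterate(initial, body, should_feed_back):
    """Model of an iteration with a feedback edge.

    Every element passes through body; results for which should_feed_back is
    true go back to the head of the loop, the others are forwarded downstream.
    The loop ends only when nothing is fed back, so body must eventually move
    every element out of the feedback condition.
    """
    loop = deque(initial)
    downstream, body_calls = [], 0
    while loop:
        value = body(loop.popleft())
        body_calls += 1
        if should_feed_back(value):
            loop.append(value)  # feedback channel
        else:
            downstream.append(value)  # forwarded downstream
    return downstream, body_calls


# Hypothetical iteration body: subtract 1; values still greater than 0 loop again.
downstream, body_calls = iterate([3, 1, 0], body=lambda x: x - 1,
                                 should_feed_back=lambda x: x > 0)
```

The element 3 circulates through the body three times before leaving as 0, which is why five body calls are needed for three inputs. In a real streaming engine the input is unbounded, so the loop never "finishes"; this finite version only shows how elements move through the feedback edge.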
  15. Source: S-Store: Streaming Meets Transaction Processing
  16. http://yahoohadoop.tumblr.com/post/135370591481/benchmarking-streaming-computation-engines-at
  19. 140 million TATP tps; data on the order of petabytes
  22. Source: No compromises: distributed transactions with consistency, availability, and performance
  23. Source: ibid.
  24. Source: ibid.
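  26. The information contained in this document represents the current view of Microsoft on each of the items discussed, as of the date of publication. Because Microsoft must respond to constantly changing market conditions, it assumes no responsibility for the information contained herein and cannot guarantee the accuracy of any information presented. This document is for informational purposes only. Microsoft makes no warranties, express or implied, in this document. Complying with all applicable copyright laws is the responsibility of the user. Without the express written permission of Microsoft, no part of this document may be reproduced, stored in or introduced into a retrieval system, in any form or by any means (electronic, mechanical, photocopying, recording, or otherwise), or for any purpose. This does not limit the rights protected under copyright. Microsoft may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in this document. Except as expressly provided in any written license agreement from Microsoft, the furnishing of this document does not give you any license to these patents, trademarks, copyrights, or other intellectual property. © 2015 Microsoft Corporation. All rights reserved. Microsoft, Windows, and the other product names mentioned herein are registered trademarks or trademarks of Microsoft Corporation in the United States and/or other countries. Other company and product names mentioned herein are generally the trademarks of their respective owners.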
