LLVM, Compilers, and VMs: Why is the JVM (HotSpot) Unsafe API so fast?

Comparing deserialization speed: ByteBuffer vs DirectBuffer vs Unsafe vs C

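I read the blog post above and found the results really interesting.
"Wait, Unsafe is that fast?" I thought, so I dug into it.
...Put the other way around, using ByteBuffer will not get you as close to C.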

(1) How HotSpot handles Unsafe

The Unsafe APIs are registered as HotSpot built-in functions (vmIntrinsics).
You can also confirm this with the PrintInlining output shown later.

hotspot/src/share/vm/classfile/vmSymbols.hpp


do_intrinsic(_getObject,                sun_misc_Unsafe,        getObject_name, getObject_signature,    
do_intrinsic(_getBoolean,               sun_misc_Unsafe,        getBoolean_name, getBoolean_signature,  
do_intrinsic(_getByte,                  sun_misc_Unsafe,        getByte_name, getByte_signature,        
do_intrinsic(_getShort,                 sun_misc_Unsafe,        getShort_name, getShort_signature,      
do_intrinsic(_getChar,                  sun_misc_Unsafe,        getChar_name, getChar_signature,        
do_intrinsic(_getInt,                   sun_misc_Unsafe,        getInt_name, getInt_signature,          
do_intrinsic(_getLong,                  sun_misc_Unsafe,        getLong_name, getLong_signature,        
do_intrinsic(_getFloat,                 sun_misc_Unsafe,        getFloat_name, getFloat_signature,      
do_intrinsic(_getDouble,                sun_misc_Unsafe,        getDouble_name, getDouble_signature,    
do_intrinsic(_putObject,                sun_misc_Unsafe,        putObject_name, putObject_signature,    
do_intrinsic(_putBoolean,               sun_misc_Unsafe,        putBoolean_name, putBoolean_signature,  
do_intrinsic(_putByte,                  sun_misc_Unsafe,        putByte_name, putByte_signature,        
do_intrinsic(_putShort,                 sun_misc_Unsafe,        putShort_name, putShort_signature,      
do_intrinsic(_putChar,                  sun_misc_Unsafe,        putChar_name, putChar_signature,        
do_intrinsic(_putInt,                   sun_misc_Unsafe,        putInt_name, putInt_signature,          
do_intrinsic(_putLong,                  sun_misc_Unsafe,        putLong_name, putLong_signature,        
do_intrinsic(_putFloat,                 sun_misc_Unsafe,        putFloat_name, putFloat_signature,      
do_intrinsic(_putDouble,                sun_misc_Unsafe,        putDouble_name, putDouble_signature,

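So what are vmIntrinsics?
HotSpot pattern-matches the target of a Java bytecode invoke (class, method name, and signature)
against the entries registered above.
When bytecode is JIT-compiled, invoke bytecodes are turned into Call Nodes,
but for a match the call target is expanded as the vmIntrinsic instead of the ordinary method of the class.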

Each JIT compiler expands calls to vmIntrinsics into instruction sequences in its own way.
For opto, the C2 compiler, the relevant code is here:
hotspot/src/share/vm/opto/library_call.cpp


...
case vmIntrinsics::_getInt:                   return inline_unsafe_access(!is_native_ptr, !is_store, T_INT,     !is_volatile);
case vmIntrinsics::_getLong:                  return inline_unsafe_access(!is_native_ptr, !is_store, T_LONG,    !is_volatile);
...

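opto's intermediate representation is called ideal; it is a graph of linked Node objects.
library_call replaces the Call Node for an intrinsic with other, corresponding Nodes.
For getInt and getLong above, make_load() replaces the call with one of ideal's Load Nodes.
The LoadNode is eventually turned into assembly by the code generator.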

Because of this mechanism, a vmIntrinsic is not
replaced by a function call to some particular symbol.
In most cases it is inlined within opto as the corresponding Nodes and converted to assembly as is.
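To make the Java side of this concrete, here is a minimal sketch of the kind of call that matches the _getInt and _getLong entries: reading an int and a long straight out of a byte[] through sun.misc.Unsafe. The class name UnsafeReadSketch and its helper methods are hypothetical and are not taken from the benchmark; the sketch also assumes a JDK that still exposes sun.misc.Unsafe (recent JDKs print deprecation warnings for these methods).

```java
import java.lang.reflect.Field;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import sun.misc.Unsafe;

// Hypothetical demo class; not part of the original benchmark.
final class UnsafeReadSketch {
    private static final Unsafe UNSAFE = loadUnsafe();
    // Offset of element 0 inside a byte[] object.
    private static final long BYTE_ARRAY_BASE = UNSAFE.arrayBaseOffset(byte[].class);

    private static Unsafe loadUnsafe() {
        try {
            // Unsafe.getUnsafe() rejects application callers, so read the singleton field.
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            return (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("sun.misc.Unsafe is not accessible", e);
        }
    }

    // Reads 4 bytes in native byte order; no bounds check is performed.
    static int readInt(byte[] data, int index) {
        return UNSAFE.getInt(data, BYTE_ARRAY_BASE + index);
    }

    // Reads 8 bytes in native byte order; no bounds check is performed.
    static long readLong(byte[] data, int index) {
        return UNSAFE.getLong(data, BYTE_ARRAY_BASE + index);
    }

    public static void main(String[] args) {
        byte[] data = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12};
        // A bounds-checked ByteBuffer in native order should read the same values.
        ByteBuffer checked = ByteBuffer.wrap(data).order(ByteOrder.nativeOrder());
        System.out.println(readInt(data, 0) == checked.getInt(0));
        System.out.println(readLong(data, 4) == checked.getLong(4));
    }
}
```

The calls to UNSAFE.getInt and UNSAFE.getLong are the ones that would show up as intrinsic in the inlining log; everything else here is ordinary Java.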

(2) Inspecting the logs on the Oracle JVM
Using a few of the Oracle JVM's hidden options, I look at the compilation log
to get a rough idea of what is going on.

The options I usually reach for in this situation are:
"-Xbatch -XX:+UnlockDiagnosticVMOptions -XX:+PrintCompilation -XX:+PrintInlining"

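-Xbatch makes JIT compilation run in the foreground, so it does not run concurrently with the program being executed.
-XX:+UnlockDiagnosticVMOptions allows diagnostic -XX options such as PrintInlining to be specified.
PrintCompilation prints a log line for each JIT compilation.
PrintInlining prints the inlining decisions.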

For simple code, this output is enough to see the general tendency.
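As a small target to try these options on, the following sketch has one tiny method called from a long loop. The class InlineLogDemo is hypothetical; what the log actually prints depends on the JVM version, so the comments only state what one would expect to see.

```java
// Hypothetical stand-in for a benchmark; run it with the options above, e.g.
//   java -Xbatch -XX:+UnlockDiagnosticVMOptions -XX:+PrintCompilation -XX:+PrintInlining InlineLogDemo
final class InlineLogDemo {
    // Small enough that the JIT is expected to inline it into the loop.
    static int mix(int acc, int value) {
        return acc * 31 + value;
    }

    // The loop is long enough that an OSR compilation ('%' in the log) is expected.
    static int hotLoop(int iterations) {
        int acc = 0;
        for (int i = 0; i < iterations; i++) {
            acc = mix(acc, i);
        }
        return acc;
    }

    public static void main(String[] args) {
        System.out.println(hotLoop(5_000_000));
    }
}
```

In the resulting log, look for a header line for InlineLogDemo::hotLoop and an indented line for InlineLogDemo::mix underneath it.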

I checked the logs with the version below.
The full log file is here:
oracle_inlining.log


java version "1.7.0_51"
Java(TM) SE Runtime Environment (build 1.7.0_51-b13)
Java HotSpot(TM) 64-Bit Server VM (build 24.51-b03, mixed mode)

First, how to read the log


5049  104     n 0       sun.misc.Unsafe::compareAndSwapObject (native)   
5170  105     n 0       sun.misc.Unsafe::getByte (native)   
5170  106     n 0       sun.misc.Unsafe::getInt (native)   
5170  107     n 0       sun.misc.Unsafe::getLong (native)   
5188  108 %  b  3       DeserBenchmark$UnsafeRunnable::run @ 10 (98 bytes)
                           @ 29   sun.misc.Unsafe::getByte (0 bytes)   intrinsic
                           @ 55   sun.misc.Unsafe::getInt (0 bytes)   intrinsic
                           @ 82   sun.misc.Unsafe::getLong (0 bytes)   intrinsic

For how JIT compilation works and how to read the log above, the material by Nutter, who develops JRuby, is detailed.
slideshare

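The numbers in the first column, such as 5049 and 5170, are the elapsed time since JVM startup (ms).
The next numbers, such as 104 and 105, give the order of JIT compilation (the compile ID).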

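% the method was JIT-compiled by On-Stack Replacement (OSR)
b blocking compile: the requesting thread waited for the compilation to finish, which is what -Xbatch causes
n native method; mostly intrinsics
! the method has exception handlers. Exceptions are very likely to hurt JIT compilation, so they should be removed from kernel loops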

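About the number 0-4 next to the n:
Recent HotSpot uses tiered compilation, and 0-4 is the tier (compilation level).
0 is the interpreter, 1-3 are the C1 compiler (with increasing amounts of profiling), and 4 is the C2 compiler.
If no number is printed, TieredCompilation should be disabled.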

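inline (hot) the call was inlined
(native) a native symbol; JNI methods are shown this way too
intrinsic the call was expanded as an intrinsic
too big the callee's bytecode exceeds the inlining size threshold for a call site that is not hot enough, so inlining was given up
callee is too large the callee body itself reached the size threshold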
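The column layout described above can be turned into a small parser. This is only a sketch under the assumption that a header line is laid out as timestamp, compile ID, optional flag characters, optional tier, method name, as in the excerpts in this post; the class CompilationLine is hypothetical and does not handle the indented inlining lines.

```java
import java.util.regex.Pattern;

// Hypothetical helper that splits one PrintCompilation header line into fields.
final class CompilationLine {
    private static final Pattern FLAGS = Pattern.compile("[%s!bn]+");
    private static final Pattern NUMBER = Pattern.compile("\\d+");

    final long timestampMs; // elapsed time since JVM startup
    final int compileId;    // order of JIT compilation
    final String flags;     // any of % s ! b n, concatenated
    final int tier;         // -1 when no tier column is printed
    final String method;    // e.g. sun.misc.Unsafe::getLong

    private CompilationLine(long timestampMs, int compileId, String flags, int tier, String method) {
        this.timestampMs = timestampMs;
        this.compileId = compileId;
        this.flags = flags;
        this.tier = tier;
        this.method = method;
    }

    // Assumes a well-formed header line; malformed input throws an unchecked exception.
    static CompilationLine parse(String line) {
        String[] t = line.trim().split("\\s+");
        long timestamp = Long.parseLong(t[0]);
        int id = Integer.parseInt(t[1]);
        int i = 2;
        StringBuilder flags = new StringBuilder();
        while (FLAGS.matcher(t[i]).matches()) {
            flags.append(t[i++]);
        }
        int tier = -1;
        if (NUMBER.matcher(t[i]).matches()) {
            tier = Integer.parseInt(t[i++]);
        }
        return new CompilationLine(timestamp, id, flags.toString(), tier, t[i]);
    }

    public static void main(String[] args) {
        CompilationLine c = parse("5170  107     n 0       sun.misc.Unsafe::getLong (native)");
        System.out.println(c.timestampMs + " ms, id " + c.compileId
                + ", flags [" + c.flags + "], tier " + c.tier + ", " + c.method);
    }
}
```

Splitting on whitespace rather than fixed columns keeps the sketch short; it works because method names always contain "::" and so can never be mistaken for a flag or a tier.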

First, ByteBufferRunnable heap


    140   12 %  b        DeserBenchmark$ByteBufferRunnable::run @ 13 (74 bytes)
                            @ 23   java.nio.HeapByteBuffer::get (15 bytes)   inline (hot)
                             \-> TypeProfile (11264/11264 counts) = java/nio/HeapByteBuffer
                              @ 7   java.nio.Buffer::checkIndex (22 bytes)   inline (hot)
                              @ 10   java.nio.HeapByteBuffer::ix (7 bytes)   inline (hot)
                            @ 40   java.nio.HeapByteBuffer::getInt (19 bytes)   inline (hot)
                             \-> TypeProfile (5724/5724 counts) = java/nio/HeapByteBuffer
                              @ 5   java.nio.Buffer::checkIndex (24 bytes)   inline (hot)
                              @ 8   java.nio.HeapByteBuffer::ix (7 bytes)   inline (hot)
                              @ 15   java.nio.Bits::getInt (18 bytes)   inline (hot)
                                @ 6   java.nio.Bits::getIntB (30 bytes)   inline (hot)
                                  @ 2   java.nio.HeapByteBuffer::_get (7 bytes)   inline (hot)
                                  @ 9   java.nio.HeapByteBuffer::_get (7 bytes)   inline (hot)
                                  @ 16   java.nio.HeapByteBuffer::_get (7 bytes)   inline (hot)
                                  @ 23   java.nio.HeapByteBuffer::_get (7 bytes)   inline (hot)
                                  @ 26   java.nio.Bits::makeInt (29 bytes)   inline (hot)
                                @ 14   java.nio.Bits::getIntL (30 bytes)   never executed
                            @ 58   java.nio.HeapByteBuffer::getLong (20 bytes)   inline (hot)
                             \-> TypeProfile (5540/5540 counts) = java/nio/HeapByteBuffer
                              @ 6   java.nio.Buffer::checkIndex (24 bytes)   inline (hot)
                              @ 9   java.nio.HeapByteBuffer::ix (7 bytes)   inline (hot)
                              @ 16   java.nio.Bits::getLong (18 bytes)   inline (hot)
                                @ 6   java.nio.Bits::getLongB (60 bytes)   inline (hot)
                                  @ 2   java.nio.HeapByteBuffer::_get (7 bytes)   inline (hot)
                                  @ 9   java.nio.HeapByteBuffer::_get (7 bytes)   inline (hot)
                                  @ 16   java.nio.HeapByteBuffer::_get (7 bytes)   inline (hot)
                                  @ 23   java.nio.HeapByteBuffer::_get (7 bytes)   inline (hot)
                                  @ 30   java.nio.HeapByteBuffer::_get (7 bytes)   inline (hot)
                                  @ 37   java.nio.HeapByteBuffer::_get (7 bytes)   inline (hot)
                                  @ 45   java.nio.HeapByteBuffer::_get (7 bytes)   inline (hot)
                                  @ 53   java.nio.HeapByteBuffer::_get (7 bytes)   inline (hot)
                                  @ 56   java.nio.Bits::makeLong (77 bytes)   inline (hot)
                                @ 14   java.nio.Bits::getLongL (60 bytes)   too big

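Points of interest:
(a) TypeProfile
For methods of abstract classes and interfaces, the compiler cannot tell which implementation will be called,
so it profiles which implementation is actually called.
At this stage only HeapByteBuffer has been seen, so the HeapByteBuffer code is inlined behind a guard.
Here opto does not settle the inlining target by static class hierarchy analysis;
it relies on the collected TypeProfile to choose it.

(b) No intrinsics appear.
HeapByteBuffer is not a wrapper around intrinsics.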
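The guard mentioned in (a) can be pictured in plain Java. The sketch below is only an analogy written by hand: Source and ArraySource are hypothetical stand-ins for ByteBuffer and HeapByteBuffer, and sumGuarded shows roughly the shape of code a JIT is assumed to produce when the profile has seen a single receiver type.

```java
// Hypothetical classes standing in for ByteBuffer and HeapByteBuffer.
abstract class Source {
    abstract byte get(int index);
}

final class ArraySource extends Source {
    final byte[] data;

    ArraySource(byte[] data) {
        this.data = data;
    }

    @Override
    byte get(int index) {
        return data[index];
    }
}

final class GuardedInlineSketch {
    // What the programmer writes: a virtual call on a receiver of unknown type.
    static int sumVirtual(Source src, int length) {
        int sum = 0;
        for (int i = 0; i < length; i++) {
            sum += src.get(i);
        }
        return sum;
    }

    // Hand-written picture of guarded inlining: a type check, then the inlined body.
    static int sumGuarded(Source src, int length) {
        int sum = 0;
        for (int i = 0; i < length; i++) {
            if (src.getClass() == ArraySource.class) {
                sum += ((ArraySource) src).data[i]; // inlined body of ArraySource.get
            } else {
                sum += src.get(i); // stands in for the slow path (deoptimization or a real call)
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        Source src = new ArraySource(new byte[] {1, 2, 3});
        System.out.println(sumVirtual(src, 3) == sumGuarded(src, 3));
    }
}
```

With two profiled types, the same picture simply gains a second type check and a second inlined body.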

ByteBufferRunnable direct


   2374   14 %  b        DeserBenchmark$ByteBufferRunnable::run @ 13 (74 bytes)
                            @ 23   java.nio.HeapByteBuffer::get (15 bytes)   inline (hot)
                            @ 23   java.nio.DirectByteBuffer::get (16 bytes)   inline (hot)
                             \-> TypeProfile (4096/16384 counts) = java/nio/DirectByteBuffer
                             \-> TypeProfile (12288/16384 counts) = java/nio/HeapByteBuffer
                              @ 6   java.nio.Buffer::checkIndex (22 bytes)   inline (hot)
                              @ 9   java.nio.DirectByteBuffer::ix (10 bytes)   inline (hot)
                              @ 12   sun.misc.Unsafe::getByte (0 bytes)   (intrinsic)
                              @ 7   java.nio.Buffer::checkIndex (22 bytes)   inline (hot)
                              @ 10   java.nio.HeapByteBuffer::ix (7 bytes)   inline (hot)
                            @ 40   java.nio.HeapByteBuffer::getInt (19 bytes)   inline (hot)
                            @ 40   java.nio.DirectByteBuffer::getInt (15 bytes)   inline (hot)
                             \-> TypeProfile (2034/8273 counts) = java/nio/DirectByteBuffer
                             \-> TypeProfile (6239/8273 counts) = java/nio/HeapByteBuffer
                              @ 5   java.nio.Buffer::checkIndex (24 bytes)   inline (hot)
                              @ 8   java.nio.DirectByteBuffer::ix (10 bytes)   inline (hot)
                              @ 11   java.nio.DirectByteBuffer::getInt (39 bytes)   too big
                              @ 5   java.nio.Buffer::checkIndex (24 bytes)   inline (hot)
                              @ 8   java.nio.HeapByteBuffer::ix (7 bytes)   inline (hot)
                              @ 15   java.nio.Bits::getInt (18 bytes)   inline (hot)
                                @ 6   java.nio.Bits::getIntB (30 bytes)   inline (hot)
                                  @ 2   java.nio.HeapByteBuffer::_get (7 bytes)   inline (hot)
                                  @ 9   java.nio.HeapByteBuffer::_get (7 bytes)   inline (hot)
                                  @ 16   java.nio.HeapByteBuffer::_get (7 bytes)   inline (hot)
                                  @ 23   java.nio.HeapByteBuffer::_get (7 bytes)   inline (hot)
                                  @ 26   java.nio.Bits::makeInt (29 bytes)   inline (hot)
                                @ 14   java.nio.Bits::getIntL (30 bytes)   never executed
                            @ 58   java.nio.HeapByteBuffer::getLong (20 bytes)   inline (hot)
                            @ 58   java.nio.DirectByteBuffer::getLong (16 bytes)   inline (hot)
                             \-> TypeProfile (2062/8111 counts) = java/nio/DirectByteBuffer
                             \-> TypeProfile (6049/8111 counts) = java/nio/HeapByteBuffer
                              @ 6   java.nio.Buffer::checkIndex (24 bytes)   inline (hot)
                              @ 9   java.nio.DirectByteBuffer::ix (10 bytes)   inline (hot)
                              @ 12   java.nio.DirectByteBuffer::getLong (39 bytes)   too big
                              @ 6   java.nio.Buffer::checkIndex (24 bytes)   inline (hot)
                              @ 9   java.nio.HeapByteBuffer::ix (7 bytes)   inline (hot)
                              @ 16   java.nio.Bits::getLong (18 bytes)   inline (hot)
                                @ 6   java.nio.Bits::getLongB (60 bytes)   inline (hot)
                                  @ 2   java.nio.HeapByteBuffer::_get (7 bytes)   inline (hot)
                                  @ 9   java.nio.HeapByteBuffer::_get (7 bytes)   inline (hot)
                                  @ 16   java.nio.HeapByteBuffer::_get (7 bytes)   inline (hot)
                                  @ 23   java.nio.HeapByteBuffer::_get (7 bytes)   inline (hot)
                                  @ 30   java.nio.HeapByteBuffer::_get (7 bytes)   inline (hot)
                                  @ 37   java.nio.HeapByteBuffer::_get (7 bytes)   inline (hot)
                                  @ 45   java.nio.HeapByteBuffer::_get (7 bytes)   inline (hot)
                                  @ 53   java.nio.HeapByteBuffer::_get (7 bytes)   inline (hot)
                                  @ 56   java.nio.Bits::makeLong (77 bytes)   inline (hot)
                                @ 14   java.nio.Bits::getLongL (60 bytes)   too big

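Points of interest:
(c) There are multiple TypeProfile entries.
The recorded calls to DirectByteBuffer and HeapByteBuffer collide at the same call site.
Both are inlined, behind multiple guards.
Frankly, HeapByteBuffer is in the way.

(d) DirectByteBuffer's get ends in an intrinsic
It appears to wrap Unsafe::getByte.
This may be where the performance difference comes from.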

(e) too big
DirectByteBuffer's getInt and getLong are the ones that matter,
but with HeapByteBuffer taking most of the TypeProfile they are not inlined.

@ 11 java.nio.DirectByteBuffer::getInt (39 bytes) too big
@ 12 java.nio.DirectByteBuffer::getLong (39 bytes) too big

That looked suspicious, so I changed the code and measured again.
The change: do not run the HeapByteBuffer case first;
only the DirectBuffer and Unsafe cases are run.
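The effect being tested can be reproduced in outline as follows. This is a hypothetical reconstruction, not the DeserBenchmark source: the 13-byte record (byte, int, long) and the "heap-first" argument are assumptions made for illustration. The point is that readAll contains one call site per get method, so whichever buffer classes pass through it end up in that call site's type profile.

```java
import java.nio.ByteBuffer;

// Hypothetical reconstruction; the original DeserBenchmark source is not shown in this post.
final class ProfilePollutionSketch {
    private static final int RECORD = 13; // assumed layout: 1 byte + 4 byte int + 8 byte long

    // One shared call site per get*: its profile records every receiver class seen.
    static long readAll(ByteBuffer buf) {
        long sum = 0;
        for (int pos = 0; pos + RECORD <= buf.capacity(); pos += RECORD) {
            sum += buf.get(pos);
            sum += buf.getInt(pos + 1);
            sum += buf.getLong(pos + 5);
        }
        return sum;
    }

    static ByteBuffer fill(ByteBuffer buf) {
        for (int i = 0; i < buf.capacity(); i++) {
            buf.put(i, (byte) i);
        }
        return buf;
    }

    public static void main(String[] args) {
        boolean heapFirst = args.length > 0 && args[0].equals("heap-first");
        if (heapFirst) {
            // "Before the change": HeapByteBuffer reaches the call sites first.
            ByteBuffer heap = fill(ByteBuffer.allocate(RECORD * 1024));
            for (int i = 0; i < 20_000; i++) {
                readAll(heap);
            }
        }
        // "After the change": only DirectByteBuffer is ever passed in.
        ByteBuffer direct = fill(ByteBuffer.allocateDirect(RECORD * 1024));
        long sum = 0;
        for (int i = 0; i < 20_000; i++) {
            sum += readAll(direct);
        }
        System.out.println(sum);
    }
}
```

Running it with and without the heap-first argument under PrintInlining should show whether the TypeProfile lines list one class or two.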

Results



Before the change
-- ByteBuffer heap
  11.23 msec/loop
  849.22 MB/s
-- ByteBuffer direct
  9.18 msec/loop
  1038.86 MB/s
-- Unsafe heap
  6.26 msec/loop
  1523.44 MB/s
-- Unsafe direct
  6.30 msec/loop
  1513.77 MB/s

After the change
-- ByteBuffer direct
  6.89 msec/loop
  1384.14 MB/s
-- Unsafe heap
  6.24 msec/loop
  1528.32 MB/s
-- Unsafe direct
  6.23 msec/loop
  1530.78 MB/s

Unsafe performance is unchanged, but ByteBuffer direct got much faster (9.18 to 6.89 msec/loop).
Let's check the log as well.


    264   12 %  b        DeserBenchmark$ByteBufferRunnable::run @ 13 (74 bytes)
                            @ 23   java.nio.DirectByteBuffer::get (16 bytes)   inline (hot)
                             \-> TypeProfile (11264/11264 counts) = java/nio/DirectByteBuffer
                              @ 6   java.nio.Buffer::checkIndex (22 bytes)   inline (hot)
                              @ 9   java.nio.DirectByteBuffer::ix (10 bytes)   inline (hot)
                              @ 12   sun.misc.Unsafe::getByte (0 bytes)   (intrinsic)
                            @ 40   java.nio.DirectByteBuffer::getInt (15 bytes)   inline (hot)
                             \-> TypeProfile (5724/5724 counts) = java/nio/DirectByteBuffer
                              @ 5   java.nio.Buffer::checkIndex (24 bytes)   inline (hot)
                              @ 8   java.nio.DirectByteBuffer::ix (10 bytes)   inline (hot)
                              @ 11   java.nio.DirectByteBuffer::getInt (39 bytes)   inline (hot)
                                @ 10   sun.misc.Unsafe::getInt (0 bytes)   (intrinsic)
                                @ 26   java.nio.Bits::swap (5 bytes)   inline (hot)
                                  @ 1   java.lang.Integer::reverseBytes (26 bytes)   (intrinsic)
                            @ 58   java.nio.DirectByteBuffer::getLong (16 bytes)   inline (hot)
                             \-> TypeProfile (5540/5540 counts) = java/nio/DirectByteBuffer
                              @ 6   java.nio.Buffer::checkIndex (24 bytes)   inline (hot)
                              @ 9   java.nio.DirectByteBuffer::ix (10 bytes)   inline (hot)
                              @ 12   java.nio.DirectByteBuffer::getLong (39 bytes)   inline (hot)
                                @ 10   sun.misc.Unsafe::getLong (0 bytes)   (intrinsic)
                                @ 26   java.nio.Bits::swap (5 bytes)   inline (hot)
                                  @ 1   java.lang.Long::reverseBytes (46 bytes)   (intrinsic)

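Points of interest:
(f) The TypeProfile now contains only DirectByteBuffer.
With a single profiled type, inlining becomes easier.
Just adjusting the inlining thresholds might also have improved performance.

(g) The DirectByteBuffer methods are ultimately expanded down to Unsafe intrinsics.
The too big results are gone, and inlining reaches Unsafe's getInt and getLong.
DirectByteBuffer looks like a class that wraps Unsafe.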
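The log also shows Bits::swap and Integer::reverseBytes under DirectByteBuffer::getInt. A new ByteBuffer is big-endian by default, so on a little-endian machine (assumed here for the benchmark host, which the post does not state) a native read has to be followed by a byte swap. The hypothetical class below just demonstrates the relationship between the two byte orders.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical demo of how byte order changes what getInt returns for the same bytes.
final class DirectOrderSketch {
    // Direct buffer holding the bytes 01 02 03 04, read with the given byte order.
    static ByteBuffer sample(ByteOrder order) {
        ByteBuffer buf = ByteBuffer.allocateDirect(4).order(order);
        buf.put(0, (byte) 1).put(1, (byte) 2).put(2, (byte) 3).put(3, (byte) 4);
        return buf;
    }

    public static void main(String[] args) {
        int big = sample(ByteOrder.BIG_ENDIAN).getInt(0);
        int little = sample(ByteOrder.LITTLE_ENDIAN).getInt(0);
        System.out.println(Integer.toHexString(big));    // 1020304
        System.out.println(Integer.toHexString(little)); // 4030201
        // The two readings differ exactly by a byte swap.
        System.out.println(Integer.reverseBytes(big) == little); // true
    }
}
```

Calling order(ByteOrder.nativeOrder()) on the buffer is the usual way to avoid the swap when the data format allows it.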

Unsafe heap


   4262   29 %  b        DeserBenchmark$UnsafeRunnable::run @ 10 (98 bytes)
                            @ 29   sun.misc.Unsafe::getByte (0 bytes)   (intrinsic)
                            @ 55   sun.misc.Unsafe::getInt (0 bytes)   (intrinsic)
                            @ 82   sun.misc.Unsafe::getLong (0 bytes)   (intrinsic)

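And finally Unsafe...
Y-you probably won't understand what I'm saying,
but I didn't understand what had been done to me either...

Since the Unsafe methods are intrinsics, there is no TypeProfile like the ones above to worry about, hence this result.
No wonder this generates code equivalent to C.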

(3) Disassembling JIT-compiled code with OpenJDK

I'm worn out, so I plan to update this later.
