
Solving Various Problems with GPGPU


Published in: Education, Technology, Business

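  1. 1. Solving Various Little Problems on the GPU. Terumi YAMADA
  2. 2. About me • Terumi Yamada (in training) • Big fan of SIMD • Twitter: telmin_orca
  3. 3. Contents • About me • Lead-in • Solving the traveling salesman problem • Running Aobench • Summary
  4. 4. Lead-in
  5. 5. OpenCL: What is OpenCL? Stealth marketing
  6. 6. OpenCL: What is OpenCL? Stealth marketing
  7. 7. What is OpenCL
  8. 8. What is OpenCL
  9. 9. What is OpenCL: Heterogeneous
  10. 10. Heterogeneous?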
  11. 11. NVIDIA • GeForce GTX 580 • Fermi • 512 CUDA cores • 3GB RAM • PCIe 2.0
  12. 12. AMD • Radeon HD 7970 • GCN • 2048 stream processors • 3GB RAM • PCIe 3.0
  13. 13. HOST • Intel Core i7 2600K • Sandy Bridge • 8GB RAM
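  14. 14. Solving the traveling salesman problem
  15. 15. Traveling salesman problem?
  16. 16. Approaches • Genetic algorithms • Ant colony optimization • μ-opt method • LK (Lin-Kernighan) method
  17. 17. 2-opt method (figure: a tour through nodes i, k, l, j, drawn before and after the edge exchange)
  18. 18. Parallel 2-opt • "Applying the SIMD 2-opt method to GPGPU and evaluating it" • 74th National Convention of the Information Processing Society of Japan (IPSJ), GPU session
  19. 19. What is expensive? (figure: the same two tours through nodes i, k, l, j)
  20. 20. CPU -> GPU: path length computation, shortest path selection, shortest path exchange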
  21. 21. Result
      Size       CPU        NVIDIA     AMD
      100,000    152.241    114.02     2472.06
      120,000    235.05     168.58     3487.41
      140,000    296.395    266.211    -
      160,000    427.161    328.547    -
  22. 22. …?
  23. 23. …?
  24. 24. …? \(^o^)/
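The three stages listed on slide 20 (path length computation, best candidate selection, exchange) can be sketched on the host as one 2-opt pass. This is only an illustrative sketch, not the code from the talk: the function names `dist`, `tour_length` and `two_opt_pass`, the 2D Euclidean distance, and the "apply the single best exchange per pass" policy are all assumptions made for the example.

```python
import math


def dist(cities, a, b):
    """Euclidean distance between cities a and b (assumed metric)."""
    (ax, ay), (bx, by) = cities[a], cities[b]
    return math.hypot(ax - bx, ay - by)


def tour_length(cities, tour):
    """Length of the closed tour, including the edge back to the start."""
    n = len(tour)
    return sum(dist(cities, tour[i], tour[(i + 1) % n]) for i in range(n))


def two_opt_pass(cities, tour):
    """Run one 2-opt pass in place. Returns True if the tour got shorter."""
    n = len(tour)

    # Stage 1: path length computation. Each (i, j) pair is independent,
    # which is what makes this stage a candidate for SIMD / GPU execution.
    candidates = []
    for i in range(n - 1):
        for j in range(i + 2, n):
            if i == 0 and j == n - 1:
                continue  # the two edges share a city, nothing to exchange
            a, b = tour[i], tour[i + 1]
            c, d = tour[j], tour[(j + 1) % n]
            delta = (dist(cities, a, c) + dist(cities, b, d)
                     - dist(cities, a, b) - dist(cities, c, d))
            candidates.append((delta, i, j))

    # Stage 2: selection of the exchange that shortens the tour the most.
    if not candidates:
        return False
    delta, i, j = min(candidates)
    if delta >= -1e-12:
        return False  # no improving exchange left

    # Stage 3: exchange. Replacing edges (a,b),(c,d) with (a,c),(b,d)
    # is the same as reversing the segment between them.
    tour[i + 1:j + 1] = reversed(tour[i + 1:j + 1])
    return True


if __name__ == "__main__":
    cities = [(0.0, 0.0), (1.0, 1.0), (1.0, 0.0), (0.0, 1.0)]
    tour = [0, 1, 2, 3]  # crosses itself along both diagonals
    while two_opt_pass(cities, tour):
        pass
    print(tour, tour_length(cities, tour))
```

Stage 1 does O(n^2) independent distance evaluations per pass, while stages 2 and 3 are a reduction and a short sequential edit, which is one plausible reading of why the slides single out the length computation as the part to offload.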
  25. 25. Running Aobench
  26. 26. Aobench? • Ambient Occlusion benchmark • Created by @syoyo • A floating-point arithmetic benchmark
  27. 27. Ambient Occlusion • Global Illumination • Indirect light • Fairly expensive
  28. 28. What is expensive? • Intersection • Sphere * 3 + Plane = 4 • AO sample 64 * 64 = 256
  29. 29. CPU -> GPU
  30. 30. Result
      Image size     CPU       NVIDIA    AMD
      256 * 256      6.30      0.057     0.061     64 * 64
      512 * 512      24.58     0.213     0.131     64 * 64
      1024 * 1024    96.735    0.831     0.4462    64 * 64
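To make the "intersection is the expensive part" point from slide 28 concrete, here is a small host-side sketch of the inner loop: every ambient-occlusion sample ray is tested against every object. It is not Aobench's actual code. The names `ray_sphere_hit` and `ambient_occlusion` are hypothetical, the plane is left out, the surface normal is assumed to be +z, and a fixed grid of directions replaces random sampling so the result is reproducible.

```python
import math


def ray_sphere_hit(origin, direction, sphere):
    """Distance t > 0 to the nearest hit, or None if the ray misses.

    direction must be unit length; sphere is (center, radius). The origin
    is assumed to lie outside the sphere.
    """
    center, radius = sphere
    oc = [o - c for o, c in zip(origin, center)]
    b = sum(x * d for x, d in zip(oc, direction))
    c = sum(x * x for x in oc) - radius * radius
    disc = b * b - c
    if disc <= 0.0:
        return None
    t = -b - math.sqrt(disc)
    return t if t > 0.0 else None


def ambient_occlusion(point, spheres, n=8):
    """Fraction of n*n hemisphere rays around +z that are NOT blocked."""
    blocked = 0
    for i in range(n):
        theta = (i + 0.5) / n * (math.pi / 2)   # angle from the normal
        for j in range(n):
            phi = (j + 0.5) / n * (2 * math.pi)
            d = (math.sin(theta) * math.cos(phi),
                 math.sin(theta) * math.sin(phi),
                 math.cos(theta))
            # One intersection test per (sample ray, object) pair.
            if any(ray_sphere_hit(point, d, s) is not None for s in spheres):
                blocked += 1
    return 1.0 - blocked / (n * n)


if __name__ == "__main__":
    overhead = [((0.0, 0.0, 2.0), 1.9)]  # one large sphere above the point
    print(ambient_occlusion((0.0, 0.0, 0.0), overhead, n=8))
```

The cost per shaded point is (number of sample rays) x (number of objects) intersection tests, and each point is independent of the others, which is the kind of workload that maps well onto a GPU.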
  31. 31. (ASCII art of a bewildered character, captioned "I have no idea what is going on")
  32. 32. Device type: Unknown ???
      Max resource 2D width/height: 16384/16384
      Total GPU memory size: 3072 MB
      Total CPU cached space size: 508 MB
      Total CPU uncached space size: 1788 MB
      GPU engine clock: 925 MHz
      GPU memory clock: 1375 MHz
      Number of timing loops: 100
      [        16 bytes] CPU->GPU= 800.000 KB/sec, GPU->CPU 533.333 KB/sec
      [        32 bytes] CPU->GPU= 1.600 MB/sec, GPU->CPU 1.067 MB/sec
      [        64 bytes] CPU->GPU= 2.133 MB/sec, GPU->CPU 2.133 MB/sec
      [       128 bytes] CPU->GPU= 2.560 MB/sec, GPU->CPU 4.267 MB/sec
      [       256 bytes] CPU->GPU= 8.533 MB/sec, GPU->CPU 8.533 MB/sec
      [       512 bytes] CPU->GPU= 17.067 MB/sec, GPU->CPU 25.600 MB/sec
      [      1024 bytes] CPU->GPU= 51.200 MB/sec, GPU->CPU 34.133 MB/sec
      [      2048 bytes] CPU->GPU= 102.400 MB/sec, GPU->CPU 68.267 MB/sec
      [      4096 bytes] CPU->GPU= 204.800 MB/sec, GPU->CPU 204.800 MB/sec
      [      8192 bytes] CPU->GPU= 409.600 MB/sec, GPU->CPU 409.600 MB/sec
      [     16384 bytes] CPU->GPU= 409.600 MB/sec, GPU->CPU 819.200 MB/sec
      [     32768 bytes] CPU->GPU= 1.638 GB/sec, GPU->CPU 1.638 GB/sec
      [     65536 bytes] CPU->GPU= 2.185 GB/sec, GPU->CPU 3.277 GB/sec
      ...
      [   4194304 bytes] CPU->GPU= 6.658 GB/sec, GPU->CPU 4.033 GB/sec
      [   8388608 bytes] CPU->GPU= 6.658 GB/sec, GPU->CPU 3.884 GB/sec
      [  16777216 bytes] CPU->GPU= 6.684 GB/sec, GPU->CPU 3.233 GB/sec
      [  33554432 bytes] CPU->GPU= 6.697 GB/sec, GPU->CPU 2.993 GB/sec
      [  67108864 bytes] CPU->GPU= 6.697 GB/sec, GPU->CPU 2.870 GB/sec
      [ 134217728 bytes] CPU->GPU= 6.704 GB/sec, GPU->CPU 2.789 GB/sec
      [ 268435456 bytes] CPU->GPU= 6.699 GB/sec, GPU->CPU 2.767 GB/sec
      [ 536870912 bytes] CPU->GPU= 6.705 GB/sec, GPU->CPU 2.797 GB/sec
      [1073741824 bytes] CPU->GPU= 6.705 GB/sec, GPU->CPU 2.771 GB/sec
      calResAllocRemote2D() returned an error when trying to allocate 1874853888 bytes (uncached)!
      Peak CPU->GPU Bandwidth = 6.705 GB/sec [data size = 536870912 bytes]
      Peak GPU->CPU Bandwidth = 4.369 GB/sec [data size = 131072 bytes]
  33. 33. ????
      GeForce GTX 580
      Quick Mode
      Host to Device Bandwidth, 1 Device(s), Paged memory, direct access
        Transfer Size (Bytes)    Bandwidth(MB/s)
        33554432                 5561.7
      Device to Host Bandwidth, 1 Device(s), Paged memory, direct access
        Transfer Size (Bytes)    Bandwidth(MB/s)
        33554432                 5466.2
      Device to Device Bandwidth, 1 Device(s)
        Transfer Size (Bytes)    Bandwidth(MB/s)
        33554432                 138261.9
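One way to read the Radeon log on slide 32 is to convert each reported bandwidth back into the time a single transfer took. The sketch below does that arithmetic for a few rows of the CPU->GPU column. The helper name `seconds_per_transfer` is hypothetical, and the conversion assumes the tool uses decimal units (1 KB = 1e3 bytes, 1 MB = 1e6, 1 GB = 1e9), which the log itself does not state.

```python
def seconds_per_transfer(size_bytes, bandwidth_bytes_per_sec):
    """Time one transfer must have taken to produce the reported bandwidth."""
    return size_bytes / bandwidth_bytes_per_sec


# (transfer size in bytes, reported CPU->GPU bandwidth in bytes/sec),
# copied from the slide 32 log under the decimal-unit assumption.
samples = [
    (16, 800.000e3),
    (1024, 51.200e6),
    (8192, 409.600e6),
    (1073741824, 6.705e9),
]

if __name__ == "__main__":
    for size, bandwidth in samples:
        micros = seconds_per_transfer(size, bandwidth) * 1e6
        print(f"{size:>10} bytes: {micros:12.1f} us per transfer")
```

Under that assumption the 16-byte, 1 KB and 8 KB rows all work out to roughly 20 microseconds per transfer, which suggests a fixed per-transfer cost (or timer granularity) dominates until the payload is large enough to approach the peak rate.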
  34. 34. ?????
      __kernel void map_test(__global int* src, __global int* dst, const int limit)
      {
          int id = get_global_id(0);
          // Work-items at or past limit have nothing to copy;
          // id == limit would otherwise read src[-1].
          if (id >= limit) return;
          // Copy src into dst in reverse order.
          dst[id] = src[limit - 1 - id];
      }
  35. 35. ????? 1000 ~
  36. 36. !
                NVIDIA      AMD
                0.355824    1.70634
      1000      0.16186     0.7224
                3.54601     14.1305
      10000     1.697       6.1982
                35.4747     128.583
      100000    16.213      58.0289
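The `map_test` kernel from slide 34 is small enough to emulate on the host, which is a handy way to check its indexing. The sketch below is a reference model, not part of the talk: `map_test_reference` and `round_up` are hypothetical names, the loop variable stands in for `get_global_id(0)`, and the work-group size of 256 in the usage is an assumption. Because the global work size is commonly rounded up to a multiple of the work-group size, the bounds guard has to reject `id == limit` as well as larger ids, since that work-item would otherwise read `src[limit - 1 - limit]`, i.e. `src[-1]`.

```python
def round_up(n, multiple):
    """Smallest multiple of `multiple` that is >= n."""
    return ((n + multiple - 1) // multiple) * multiple


def map_test_reference(src, limit, global_size):
    """Host-side model of map_test: one loop iteration per work-item."""
    dst = [None] * limit
    for gid in range(global_size):      # stands in for get_global_id(0)
        if gid >= limit:                # padded work-items do nothing
            continue
        dst[gid] = src[limit - 1 - gid]  # reversed copy
    return dst


if __name__ == "__main__":
    src = list(range(1000))
    global_size = round_up(len(src), 256)  # 256 is an assumed work-group size
    dst = map_test_reference(src, len(src), global_size)
    print(global_size, dst[:3], dst[-3:])
```

Comparing a device result against a model like this is a cheap sanity check before timing the kernel.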
  37. 37. Summary
  38. 38. • For GPGPU, go with the GeForce GTX 580 • The Radeon HD 7970 is… • A slow starter, and it carries a hidden liability • If the kernel gets bigger…
