Blog posts from October 2012 - Ultra-Fast & Stable Computation for Optimization Problems

Same CPU, Part 4

October 31, 2012, 01:55:40 | Weblog

I formally measured the TEPS value on a single TSUBAME2.0 node. The results below are for Scale 25 using 24 threads. A value of 3.671 GTEPS/kW is quite high for this class of CPU.

----------------------------------------------------------------------
Parallel Breadth-First Search for Graph500 Benchmark version 3.52
----------------------------------------------------------------------
CPU name is Intel(R) Xeon(R) CPU X5670 @ 2.93GHz
freq / RAM is 2933.374 MHz / 53.17 GB
#cpu, #nodes, #cores is 24 2 12
COMPILER is GCC (GNU C Compiler) version 4.3.4
----------------------------------------------------------------------
scale, edgefactor is 25 16
energy_loop is disable
#threads, #NUMAs is 24 2
mpol_bind is ON(mmap with mbind(MPOL_BIND))
mem_interleave is OFF
switching parameter is 0.000350 (n ~= 1.174405e+04)
queue buffer size is 16384
----------------------------------------------------------------------
SCALE: 25
nvtx: 33554432
edgefactor: 16
terasize: 8.58993459199999983e-03
A: 5.69999999999999951e-01
B: 1.90000000000000002e-01
C: 1.90000000000000002e-01
D: 5.00000000000000444e-02
generation_time: 3.18413941860198975e+01
construction_time: 3.12248620986938477e+01
nbfs: 64
min_time: 1.31865024566650391e-01
firstquartile_time: 1.41672849655151367e-01
median_time: 1.48192524909973145e-01
thirdquartile_time: 1.60642564296722412e-01
max_time: 2.27340793609619141e+00
mean_time: 2.52842102199792862e-01
stddev_time: 3.77578056422843644e-01
min_nedge: 5.36865498000000000e+08
firstquartile_nedge: 5.36865498000000000e+08
median_nedge: 5.36865498000000000e+08
thirdquartile_nedge: 5.36865498000000000e+08
max_nedge: 5.36865498000000000e+08
mean_nedge: 5.36865498000000000e+08
stddev_nedge: 0.00000000000000000e+00
min_TEPS: 2.36150094083811790e+08
firstquartile_TEPS: 3.40998940083618927e+09
median_TEPS: 3.67106491095643044e+09
thirdquartile_TEPS: 3.79863995016410017e+09
max_TEPS: 4.07132596201538277e+09
harmonic_mean_TEPS: 2.12332318600869393e+09
harmonic_stddev_TEPS: 3.99487487817118108e+08
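
The TEPS figures in the log above can be cross-checked against the reported times, because every BFS run traversed the same number of edges (stddev_nedge is 0). The following is a minimal Python sketch, not part of the benchmark code. The variable names are illustrative, and it assumes that the TEPS of one run is nedge divided by that run's BFS time, which is consistent with the numbers in the log.

```python
# Values copied from the Scale 25 log above.
nedge = 5.36865498e8          # edges traversed per BFS (identical for all 64 runs)
min_time = 1.31865024566650391e-01
max_time = 2.27340793609619141e+00
mean_time = 2.52842102199792862e-01

# Assumption: TEPS of one run = nedge / time of that run.
max_teps = nedge / min_time   # fastest run
min_teps = nedge / max_time   # slowest run

# With a constant nedge, the harmonic mean of the per-run TEPS values
# equals nedge divided by the arithmetic mean of the run times.
harmonic_mean_teps = nedge / mean_time

for name, value in [("min", min_teps), ("max", max_teps),
                    ("harmonic mean", harmonic_mean_teps)]:
    print(f"{name:>13} TEPS: {value / 1e9:.3f} GTEPS")
```

Under that assumption, the single slow run (max_time of about 2.27 s) explains why harmonic_mean_TEPS is only about 2.12 GTEPS while median_TEPS is about 3.67 GTEPS.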

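Same CPU, Part 3

October 30, 2012, 01:31:20 | Weblog

Continuing from the previous post: when SDPA performance is compared instead of Graph500, the OPT cluster is much faster, as shown below, even though both systems use the same CPU (one node each). Nevertheless, TSUBAME2.0 is the faster one on Graph500.

◯ Software: SDPA 7.4.0

Problem 1: theta6.dat-s
TSUBAME2.0 : 11.47s
OPT cluster : 8.07s

Problem 2: FH2+.1A1.STO6G.pqgt1t2p.dat-s
TSUBAME2.0 : 96.00s
OPT cluster : 73.60s

Problem 3: nug12_r2.dat-s
TSUBAME2.0 : 132.79s
OPT cluster : 96.25s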
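
To put a number on "much faster", the timings above can be turned into ratios. This is only a small illustrative Python sketch; the dictionary layout and names are made up for the example, and the times are the ones listed above.

```python
# Wall-clock times in seconds, copied from the results above.
sdpa_times = {
    "theta6.dat-s": {"TSUBAME2.0": 11.47, "OPT cluster": 8.07},
    "FH2+.1A1.STO6G.pqgt1t2p.dat-s": {"TSUBAME2.0": 96.00, "OPT cluster": 73.60},
    "nug12_r2.dat-s": {"TSUBAME2.0": 132.79, "OPT cluster": 96.25},
}

# A ratio above 1 means the OPT cluster finished sooner than TSUBAME2.0.
ratios = {
    problem: t["TSUBAME2.0"] / t["OPT cluster"]
    for problem, t in sdpa_times.items()
}

for problem, ratio in ratios.items():
    print(f"{problem}: OPT cluster is {ratio:.2f}x faster")
```

For these three problems the OPT cluster is roughly 1.3x to 1.4x faster.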

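Same CPU, Part 2

October 29, 2012, 00:42:32 | Weblog

Continuing from yesterday. I do not really understand the cause of the performance difference, but measuring with the STREAM benchmark shows a large difference in memory bandwidth, as below.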

◯ TSUBAME 2.0
-------------------------------------------------------------
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:       16087.4659       0.0021       0.0020       0.0021
Scale:      15347.9392       0.0022       0.0021       0.0022
Add:        15857.4820       0.0031       0.0030       0.0032
Triad:      16483.2645       0.0030       0.0029       0.0031
-------------------------------------------------------------

◯ OPT cluster
-------------------------------------------------------------
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:       10616.8113       0.0031       0.0030       0.0034
Scale:       9822.0072       0.0033       0.0033       0.0036
Add:        10079.9375       0.0050       0.0048       0.0051
Triad:      10202.0164       0.0047       0.0047       0.0048
-------------------------------------------------------------
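
The two STREAM tables can be compared kernel by kernel, and the reported rates can be related back to the amount of data moved. The sketch below is illustrative only. It assumes the standard STREAM accounting (8-byte doubles, 1 MB = 10^6 bytes, rate computed from the minimum time), and the helper name implied_array_length is made up for this example.

```python
# (rate in MB/s, min time in s) copied from the STREAM tables above.
stream = {
    "TSUBAME 2.0": {"Copy": (16087.4659, 0.0020), "Scale": (15347.9392, 0.0021),
                    "Add": (15857.4820, 0.0030), "Triad": (16483.2645, 0.0029)},
    "OPT cluster": {"Copy": (10616.8113, 0.0030), "Scale": (9822.0072, 0.0033),
                    "Add": (10079.9375, 0.0048), "Triad": (10202.0164, 0.0047)},
}

# Standard STREAM accounting with 8-byte doubles: Copy and Scale touch
# two arrays (16 bytes per element), Add and Triad touch three (24 bytes).
BYTES_PER_ELEMENT = {"Copy": 16, "Scale": 16, "Add": 24, "Triad": 24}

def implied_array_length(kernel, rate_mb_s, min_time_s):
    """Estimate the array length from rate = bytes moved / min time."""
    return rate_mb_s * 1e6 * min_time_s / BYTES_PER_ELEMENT[kernel]

bandwidth_ratio = {
    k: stream["TSUBAME 2.0"][k][0] / stream["OPT cluster"][k][0]
    for k in BYTES_PER_ELEMENT
}
lengths = [implied_array_length(k, *v)
           for machine in stream.values() for k, v in machine.items()]

for k, r in bandwidth_ratio.items():
    print(f"{k:>5}: TSUBAME 2.0 / OPT cluster = {r:.2f}")
print(f"implied array length: {min(lengths):,.0f} to {max(lengths):,.0f} elements")
```

The TSUBAME 2.0 node delivers roughly 1.5x to 1.6x the bandwidth of the OPT cluster node on every kernel. Because the printed times are rounded to four decimals, the implied array length is only an estimate; it comes out at about two million elements on both machines.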

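Same CPU

October 28, 2012, 01:06:19 | Weblog

This compares only a single node at a small Scale, so it is not much of a comparison, but there is a considerable difference in TEPS, as shown below. Incidentally, both systems have exactly the same CPU. It depends on the network, but how many GTEPS would we get if we extended this to multiple nodes?

median_TEPS: 3.10677601976136351e+09 : TSUBAME 2.0
median_TEPS: 2.43110363832451296e+09 : OPT cluster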

◯ TSUBAME 2.0

----------------------------------------------------------------------
CPU name is Intel(R) Xeon(R) CPU X5670 @ 2.93GHz
freq / RAM is 2933.381 MHz / 53.17 GB
#cpu, #nodes, #cores is 24 2 12
COMPILER is GCC (GNU C Compiler) version 4.3.4
----------------------------------------------------------------------
scale, edgefactor is 22 16
energy_loop is disable
#threads, #NUMAs is 12 2
mpol_bind is ON(mmap with mbind(MPOL_BIND))
mem_interleave is OFF
switching parameter is 0.000350 (n ~= 1.468006e+03)
queue buffer size is 16384
----------------------------------------------------------------------
SCALE: 22
nvtx: 4194304
edgefactor: 16
terasize: 1.07374182399999998e-03
A: 5.69999999999999951e-01
B: 1.90000000000000002e-01
C: 1.90000000000000002e-01
D: 5.00000000000000444e-02
generation_time: 4.63479804992675781e+00
construction_time: 3.45396018028259277e+00
nbfs: 64
min_time: 2.03080177307128906e-02
firstquartile_time: 2.10850834846496582e-02
median_time: 2.16900110244750977e-02
thirdquartile_time: 2.34054327011108398e-02
max_time: 2.85382270812988281e-02
mean_time: 2.24080123007297516e-02
stddev_time: 1.87665045017972585e-03
min_nedge: 6.71081140000000000e+07
firstquartile_nedge: 6.71081140000000000e+07
median_nedge: 6.71081140000000000e+07
thirdquartile_nedge: 6.71081140000000000e+07
max_nedge: 6.71081140000000000e+07
mean_nedge: 6.71081140000000000e+07
stddev_nedge: 0.00000000000000000e+00
min_TEPS: 2.35151657490230417e+09
firstquartile_TEPS: 2.89759915151431465e+09
median_TEPS: 3.10677601976136351e+09
thirdquartile_TEPS: 3.18990973665092087e+09
max_TEPS: 3.30451326613275719e+09
harmonic_mean_TEPS: 2.99482672087851906e+09
harmonic_stddev_TEPS: 3.15995921852265522e+07

◯ OPT cluster

----------------------------------------------------------------------
Parallel Breadth-First Search for Graph500 Benchmark version 3.52
----------------------------------------------------------------------
CPU name is Intel(R) Xeon(R) CPU X5670 @ 2.93GHz
freq / RAM is 2926.092 MHz / 125.97 GB
#cpu, #nodes, #cores is 24 2 12
COMPILER is GCC (GNU C Compiler) version 4.4.6
----------------------------------------------------------------------
scale, edgefactor is 22 16
energy_loop is disable
#threads, #NUMAs is 12 2
mpol_bind is ON(mmap with mbind(MPOL_BIND))
mem_interleave is OFF
switching parameter is 0.000350 (n ~= 1.468006e+03)
queue buffer size is 16384
----------------------------------------------------------------------
SCALE: 22
nvtx: 4194304
edgefactor: 16
terasize: 1.07374182399999998e-03
A: 5.69999999999999951e-01
B: 1.90000000000000002e-01
C: 1.90000000000000002e-01
D: 5.00000000000000444e-02
generation_time: 4.92600226402282715e+00
construction_time: 4.21614789962768555e+00
nbfs: 64
min_time: 2.52830982208251953e-02
firstquartile_time: 2.69842743873596191e-02
median_time: 2.77304649353027344e-02
thirdquartile_time: 3.07412147521972656e-02
max_time: 3.64937782287597656e-02
mean_time: 2.86859124898910522e-02
stddev_time: 2.71162063548880793e-03
min_nedge: 6.71081140000000000e+07
firstquartile_nedge: 6.71081140000000000e+07
median_nedge: 6.71081140000000000e+07
thirdquartile_nedge: 6.71081140000000000e+07
max_nedge: 6.71081140000000000e+07
mean_nedge: 6.71081140000000000e+07
stddev_nedge: 0.00000000000000000e+00
min_TEPS: 1.83889192232537603e+09
firstquartile_TEPS: 2.21152116616008377e+09
median_TEPS: 2.43110363832451296e+09
thirdquartile_TEPS: 2.48950771186180830e+09
max_TEPS: 2.65426782010142851e+09
harmonic_mean_TEPS: 2.33941012068725300e+09
harmonic_stddev_TEPS: 2.78609775751486160e+07
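
Several header values in these logs follow directly from SCALE and edgefactor. The Python sketch below reproduces them. The relations are inferred from the numbers in the logs rather than taken from the benchmark source, and the function name and the 16-bytes-per-edge figure are assumptions made for this illustration.

```python
def graph500_size(scale, edgefactor=16, alpha=0.000350):
    """Derive problem-size figures from SCALE (relations inferred from the logs)."""
    nvtx = 2 ** scale                      # number of vertices
    nominal_edges = edgefactor * nvtx      # generated edges
    terasize = nominal_edges * 16 / 1e12   # assumed 16 bytes per edge, in TB
    switch_n = alpha * nvtx                # the "n ~=" value next to the switching parameter
    return nvtx, nominal_edges, terasize, switch_n

nvtx, edges, terasize, switch_n = graph500_size(22)
print(nvtx, edges, terasize, switch_n)

# Median TEPS of the two machines at Scale 22, copied from the logs.
tsubame = 3.10677601976136351e+09
opt = 2.43110363832451296e+09
print(f"TSUBAME 2.0 / OPT cluster = {tsubame / opt:.2f}")
```

The nedge value reported in the logs (6.7108114e+07) is the number of edges actually traversed, so it is slightly below the nominal edge count of 67,108,864. On median_TEPS, the TSUBAME 2.0 node is about 1.28x faster than the OPT cluster node.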


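Overview of Our CREST Research Plan (Results)

October 27, 2012, 02:23:05 | Weblog

◯ Challenges to be solved on next-generation post-petascale supercomputers
  – Explosive growth in parallelism, heterogeneity, and higher density
  – Deeper hierarchy and larger capacity of storage
  – Many challenges and difficulties remain to be solved, both algorithmic and system-level
◯ Large-scale graph analysis and mathematical optimization systems
  – Issues that need urgent attention, and their impact on the real world
  – Graph500 (Green Graph500) benchmark (huge graphs, BFS)
    • ISC12 : 358 GTEPS (3rd in the world), 8.15 GTEPS (1st in the world for a single node)
  – Mathematical programming problems (SDP): (new world record: 1.48 million constraints; 533 TFlops)
    • SC12 (Tech. paper) : sparse & dense data computation (24,480 CPU cores & 4,080 GPUs)
    • Solution through co-design by optimization and HPC researchers on post-petascale supercomputers
◯ Contribute to building foundational software for post-petascale supercomputers, and aim to realize a safe and secure society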

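Big Data & Data Management

October 26, 2012, 09:06:53 | Weblog

The Big Data & Data Management Expo runs until 18:00 today. I wanted to attend, but for various reasons I could not in the end, so students are attending in my place.

    http://www.data-m.jp/aki/sokuho/

  [Dates] October 24 (Wed) - 26 (Fri), 2012, 10:00 - 18:00
  [Venue] Makuhari Messe, Halls 4-6
  [Organizer] Reed Exhibitions Japan Ltd.
  [Concurrent event] Big Data & Data Management Expo Technical Seminars
  [Co-located exhibitions]
    3rd Cloud Computing EXPO [Autumn]
    2nd Information Security EXPO [Autumn]
    2nd Web & Mobile Marketing EXPO [Autumn]
    2nd Smartphone & Mobile EXPO [Autumn]
    1st Data Center Construction & Operation Expo [Autumn]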

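Cooling Methods and Power Consumption

October 25, 2012, 01:38:07 | Weblog

As I wrote before, memory is itself a major heat source, with each module drawing about 6 W. If the chassis fans and rear fans are also stopped and the machine is force-cooled from outside as shown below, power consumption drops further. The CPU fan, however, cannot be stopped.

As a further step, the rotating parts such as HDDs are removed and the entire node is submerged in oil, as shown below. At that point the GTEPS/kW value becomes very high, and ordinary air-cooled machines cannot compete.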

The 4th Graph500 List and Our CREST Team's Results

October 24, 2012, 00:31:00 | Weblog

The main results of our CREST team on the 4th Graph500 List (ISC12, June 2012). The contents of the photo have no particular significance.

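The 5th Graph500 Submission Deadline

October 23, 2012, 01:03:22 | Weblog

October 22 (probably US time) is the submission deadline for Graph500 & Green Graph500. For Green Graph500, the submission is handled by filling in the Total system power item below when entering the data. The drawback is that there are many items to enter for each machine, as listed below.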

The following information will be collected:

Computer Information:
Computer/System Name
Manufacturer
Computer Type/Model
Installation Site
Location
Year of Installation/Last Major Upgrade
Field of Use: government, university, industry, etc.
Field of Application: geophysics, automotive, etc.
Number of Nodes
Number of Cores
Main Memory Size
Total system power
Interconnection network
Graph 500 Implementation Used: reference, custom, etc.
Contact Name
Contact Email
Benchmark Information:
Problem Scale
GigaTEPS
Graph Construction Time
Full Benchmark Result
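
The submission form collects both GigaTEPS and Total system power, so an energy-efficiency figure can be derived from the two. The sketch below is an illustration only. It assumes the efficiency figure is simply TEPS divided by total system power, as the GTEPS/kW values quoted elsewhere on this blog suggest. The function name is made up, and the 1.0 kW power value is a hypothetical placeholder, not a measurement.

```python
def gteps_per_kw(teps, total_system_power_watts):
    """Energy efficiency as GTEPS per kW (numerically equal to MTEPS per W)."""
    gteps = teps / 1e9
    kilowatts = total_system_power_watts / 1e3
    return gteps / kilowatts

# Hypothetical example: a 3.0 GTEPS result on a system drawing 1.0 kW.
example = gteps_per_kw(teps=3.0e9, total_system_power_watts=1000.0)
print(f"{example:.3f} GTEPS/kW")
```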

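Graph500 and Intel Xeon 5460, Part 2

October 22, 2012, 01:59:57 | Weblog

As reported before, Graph500 performance is poor on the Intel Xeon 5460 below. Strangely, it scores worse than the AMD Barcelona 2356 below, which is notorious as a "black" CPU. That said, poor performance on an old machine like the Intel Xeon 5460 has little effect on the overall picture.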

Scale 24
◯ Compute server 1
median_TEPS: 7.17062404385241747e+08
◯ Compute server 2
median_TEPS: 1.18128896908352566e+09

◯ Compute server 1
CPU : Intel Xeon 5460 3.16GHz (quad cores) x 2 / node
Memory : 48GB / node
NIC : GbE x 2 and Myrinet-10G x 1 / node
OS : CentOS 5.8 for x86_64

◯ Compute server 2
CPU : AMD Opteron 2356 2.3GHz x 2 (2 CPUs x 4 cores = 8 cores)
Memory : 32GB
OS : CentOS 6.3 for X86_64

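Tesla C1060 and SDPARA (SDPA)

October 21, 2012, 13:29:37 | Weblog

Partly because the NVIDIA Tesla C1060 has low double-precision performance, four CPU cores outperform a single Tesla C1060, as shown below. I bought it about three years ago, but it has had little use for double-precision computation.

◯ Problem: nug12_r2.dat-s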

◯SDPARA 7.5.0-G (CPU + GPU)
ELEMENTS : 48.87s
CHOLESKY : 204.37s
Total : 268.50s

◯SDPA 7.4.0 (CPU)
ELEMENTS : 12.73s
CHOLESKY : 96.31s
Total : 112.55s

○ Compute server (1 CPU x 4 cores = 4 cores)
CPU : Intel Core i7 2600K (3.40GHz / 8MB L3) x 2
Memory : 8GB (4 x 2GB)
GPGPU : Tesla C1060 x 1 (CUDA 4.2)
OS : CentOS 6.3
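
The timing breakdown above can be summarized per phase. The following Python sketch is only an illustration built on the listed numbers; the dictionary layout and variable names are made up for the example.

```python
# Times in seconds for nug12_r2.dat-s, copied from the results above.
runs = {
    "SDPARA 7.5.0-G (CPU + GPU)": {"ELEMENTS": 48.87, "CHOLESKY": 204.37, "Total": 268.50},
    "SDPA 7.4.0 (CPU)": {"ELEMENTS": 12.73, "CHOLESKY": 96.31, "Total": 112.55},
}

gpu, cpu = runs["SDPARA 7.5.0-G (CPU + GPU)"], runs["SDPA 7.4.0 (CPU)"]

# Slowdown of the GPU run relative to the CPU-only run, per phase.
slowdown = {phase: gpu[phase] / cpu[phase] for phase in cpu}

# Share of the total time spent in the CHOLESKY phase.
cholesky_share = {name: r["CHOLESKY"] / r["Total"] for name, r in runs.items()}

for phase, s in slowdown.items():
    print(f"{phase:>8}: GPU run takes {s:.2f}x the CPU time")
for name, share in cholesky_share.items():
    print(f"{name}: CHOLESKY is {share:.0%} of the total")
```

Overall the run with the Tesla C1060 takes about 2.4x as long as the CPU-only run, and CHOLESKY dominates the total time in both cases.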

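Memory and Power Consumption

October 20, 2012, 09:29:34 | Weblog

The server below holds sixteen 16 GB memory modules (made by Actica). Removing modules evenly from each bank to leave eight in total reduces power consumption by about 0.5 A (50 W). Be careful: if too many are removed and the number of modules per bank becomes uneven, performance drops.

◯ Compute server
Sandy Bridge-EP machine: Intel Xeon E5-2690 2.90GHz : 8 Core 20M L3 cache x 2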
Memory DDR 3 1600 ECC REG 256GB (16GB x 16)
OS : CentOS 6.3
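
The power figure in this post can be checked with a short calculation. This is a sketch under one stated assumption: the 100 V supply voltage is not given in the post and is inferred from the conversion 0.5 A = 50 W.

```python
# Figures from the post; the 100 V supply voltage is an assumption implied
# by the stated conversion 0.5 A = 50 W.
SUPPLY_VOLTAGE = 100.0   # volts (assumed)
current_drop = 0.5       # amperes saved after removing modules
modules_removed = 16 - 8

power_drop = SUPPLY_VOLTAGE * current_drop      # P = V * I, in watts
per_module = power_drop / modules_removed       # average watts per module

print(f"total saving: {power_drop:.0f} W, per module: {per_module:.2f} W")
```

That works out to a little over 6 W per removed module, in line with the figure of about 6 W per module quoted in the post on cooling.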

Graph-Related Papers at SC12

October 19, 2012, 09:29:08 | Weblog

11/13 (Tue)

10:30AM - 11:00AM

Papers, Best Student Paper Finalists

Breadth First Search

Direction-Optimizing Breadth-First Search

Scott Beamer, Krste Asanović, David Patterson

255-EF

11:00AM - 11:30AM

Papers

Breadth First Search

Breaking the Speed and Scalability Barriers for Graph Exploration on Distributed-Memory Machines

Fabio Checconi, Fabrizio Petrini, Jeremiah Willcock, Andrew Lumsdaine, Yogish Sabharwal, Anamitra Choudhury

255-EF

11:30AM - 12:00PM

Papers

Breadth First Search

Large-Scale Energy-Efficient Graph Traversal - A Path to Efficient Data-Intensive Supercomputing

Nadathur Satish, Changkyu Kim, Jatin Chhugani, Pradeep Dubey

255-EF

12:15PM - 1:15PM

Birds of a Feather

Fifth Graph500 List

David A. Bader, Richard Murphy, Marc Snir

255-BC

5:15PM - 7:00PM

Posters and Electronic Posters

Research Poster Reception

Hybrid Breadth First Search Implementation for Hybrid-Core Computers

Kevin Wadleigh

East Entrance

5:15PM - 7:00PM

Posters and Electronic Posters

Research Poster Reception

Analyzing Patterns in Large-Scale Graphs Using MapReduce in Hadoop

Joshua Schultz, Jonathan Vierya, Enyue Lu

East Entrance

5:30PM - 7:00PM

Birds of a Feather

Cyber Security’s Big Data, Graphs, and Signatures

Daniel M. Best

250-AB

11/14 (Thu)

12:15PM - 1:15PM

Birds of a Feather

Graph Analytics in Big Data

2:00PM - 2:30PM

Papers

Performance Optimization

NUMA-Aware Graph Mining Techniques for Performance and Energy Efficiency

Michael R. Frasca, Kamesh Madduri, Padma Raghavan

255-BC

3:30PM - 4:00PM

Papers

Graph Algorithms

A New Scalable Parallel DBSCAN Algorithm Using the Disjoint-Set Data Structure

Md. Mostofa Ali Patwary, Diana Palsetia, Ankit Agrawal, Wei-keng Liao, Fredrik Manne, Alok Choudhary

355-EF

4:00PM - 4:30PM

Papers

Graph Algorithms

Parallel Bayesian Network Structure Learning with Application to Gene Networks

Olga Nikolova, Srinivas Aluru

355-EF

4:30PM - 5:00PM

Papers

Graph Algorithms

A Multithreaded Algorithm for Network Alignment via Approximate Matching

Arif Khan, David Gleich, Mahantesh Halappanavar, Alex Pothen

355-EF

--

Graph500 BOF Session at SC12

October 18, 2012, 09:45:06 | Weblog

Fifth Graph500 List

SESSION: Fifth Graph500 List

EVENT TYPE: Birds of a Feather

TIME: 12:15PM - 1:15PM

SESSION LEADER(S):David A. Bader, Richard Murphy, Marc Snir

ROOM:255-BC

ABSTRACT:
Data intensive applications represent increasingly important workloads but are ill suited for most of today’s machines. The Graph500 has demonstrated the challenges of even simple analytics. Backed by a steering committee of over 30 international HPC experts from academia, industry, and national laboratories, this effort serves to enhance data intensive workloads for the community. This BOF will unveil the fifth Graph500 list, and delve into the specification for the second kernel. We will further explore the new energy metrics for the Green Graph500, and unveil the first results.

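Preparing for the 5th Graph500

October 17, 2012, 00:04:18 | Weblog

I cannot disclose details at present, but for the next Graph500 & Green Graph500 we are developing optimized implementations and measuring performance on a wide range of machines, from very large supercomputers to very small mobile devices.

Submissions November 2012 List (deadline scheduled for 10/22)

Green Graph500 home page: power consumption is also entered when submitting to Graph500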

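Ultra-Fast & Stable Computation for Optimization Problems

Mainly about research on large-scale optimization problems, graph search, machine learning, digital twins, and related topics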
