The following paper will be presented at ISC14.
Name: Fast & Energy-Efficient Breadth-First Search on a Single NUMA System
Time: Thursday, June 26, 2014
09:00 am - 09:30 am
Room: Hall 5
CCL - Congress Center Leipzig
Abstract: Breadth-first search (BFS) is an important graph analysis kernel. The Graph500 benchmark measures a computer's BFS performance using the traversed edges per second (TEPS) ratio. Our previous nonuniform memory access (NUMA)-optimized BFS reduced memory accesses to remote RAM on a NUMA architecture system; its performance was 11 GTEPS (giga TEPS) on a 4-way Intel Xeon E5-4640 system. Herein, we investigated the computational complexity of the bottom-up, a major bottleneck in NUMA-optimized BFS. We clarify the relationship between vertex out-degree and bottom-up performance. In November 2013, our new implementation achieved a Graph500 benchmark performance of 37.66 GTEPS (fastest for a single node) on an SGI Altix UV1000 (one-rack) and 31.65 GTEPS (fastest for a single server) on a 4-way Intel Xeon E5-4650 system. Furthermore, we achieved the highest Green Graph500 performance of 153.17 MTEPS/W (mega TEPS per watt) on an Xperia-A SO-04E with a Qualcomm Snapdragon S4 Pro APQ8064.
Authors: Yuichiro Yasui, Chuo University & JST CREST; Katsuki Fujisawa, Chuo University & JST CREST; Yukinori Sato, JAIST & JST CREST