Installing likwid and the Himeno benchmark takes nothing more than running make, so I will skip that part.
[root@hoge Downloads]# likwid-perfctr -C 1 -g FLOPS_SP ./bmt
--------------------------------------------------------------------------------
CPU name: Intel(R) Core(TM) i7-7500U CPU @ 2.70GHz
CPU type: Intel Kabylake processor
CPU clock: 2.90 GHz
--------------------------------------------------------------------------------
mimax = 129 mjmax = 129 mkmax = 257
imax = 128 jmax = 128 kmax =256
Start rehearsal measurement process.
Measure the performance in 3 times.
MFLOPS: 5008.406795 time(s): 0.082125 1.733593e-03
Now, start the actual measurement process.
The loop will be excuted in 2191 times
This will take about one minute.
Wait for a while
Loop executed for 2191 times
Gosa : 4.477886e-04
MFLOPS measured : 6004.316730 cpu : 50.030231
Score based on Pentium III 600MHz : 73.223375
--------------------------------------------------------------------------------
Group 1: FLOPS_SP
+------------------------------------------+---------+--------------+
| Event | Counter | Core 1 |
+------------------------------------------+---------+--------------+
| INSTR_RETIRED_ANY | FIXC0 | 244033624741 |
| CPU_CLK_UNHALTED_CORE | FIXC1 | 173945548078 |
| CPU_CLK_UNHALTED_REF | FIXC2 | 144661450175 |
| FP_ARITH_INST_RETIRED_128B_PACKED_SINGLE | PMC0 | 72415611576 |
| FP_ARITH_INST_RETIRED_SCALAR_SINGLE | PMC1 | 11146222208 |
| FP_ARITH_INST_RETIRED_256B_PACKED_SINGLE | PMC2 | 0 |
+------------------------------------------+---------+--------------+
+----------------------+-----------+
| Metric | Core 1 |
+----------------------+-----------+
| Runtime (RDTSC) [s] | 50.2231 |
| Runtime unhalted [s] | 59.9028 |
| Clock [MHz] | 3491.6156 |
| CPI | 0.7128 |
| SP MFLOP/s | 5989.4533 |
| AVX SP MFLOP/s | 0 |
| Packed MUOPS/s | 1441.8797 |
| Scalar MUOPS/s | 221.9344 |
| Vectorization ratio | 86.6611 |
+----------------------+-----------+
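The derived metrics in the second table can be reproduced from the raw event counts in the first. The sketch below is only a back-of-the-envelope check, not likwid's own code. It assumes the usual weighting for this group: 4 single-precision operations per 128-bit packed instruction, 8 per 256-bit packed instruction, and 1 per scalar instruction. The variable names are my own.

```python
# Raw event counts copied from the FLOPS_SP table above (Core 1).
instr_retired = 244_033_624_741      # INSTR_RETIRED_ANY
clk_unhalted_core = 173_945_548_078  # CPU_CLK_UNHALTED_CORE
packed_128_sp = 72_415_611_576       # FP_ARITH_INST_RETIRED_128B_PACKED_SINGLE
scalar_sp = 11_146_222_208           # FP_ARITH_INST_RETIRED_SCALAR_SINGLE
packed_256_sp = 0                    # FP_ARITH_INST_RETIRED_256B_PACKED_SINGLE
runtime_s = 50.2231                  # Runtime (RDTSC) [s]

# Assumed weights: a 128-bit register holds 4 floats, a 256-bit register holds 8.
flop_total = packed_128_sp * 4 + scalar_sp + packed_256_sp * 8
packed_ops = packed_128_sp + packed_256_sp

sp_mflops = flop_total / runtime_s / 1e6
packed_muops = packed_ops / runtime_s / 1e6
scalar_muops = scalar_sp / runtime_s / 1e6
vector_ratio = 100 * packed_ops / (packed_ops + scalar_sp)
cpi = clk_unhalted_core / instr_retired

print(f"SP MFLOP/s          {sp_mflops:.4f}")
print(f"Packed MUOPS/s      {packed_muops:.4f}")
print(f"Scalar MUOPS/s      {scalar_muops:.4f}")
print(f"Vectorization ratio {vector_ratio:.4f}")
print(f"CPI                 {cpi:.4f}")
```

The results differ from the table only in the last digits, because the runtime used here is the rounded value that likwid printed.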
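As you can see, likwid reports a value close to the benchmark's own: 5989.45 SP MFLOP/s from the counters against 6004.32 MFLOPS measured by the benchmark itself. The group names accepted by the -g option can be listed with the -a option.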
[root@hoge Downloads]# likwid-perfctr -a
Group name Description
--------------------------------------------------------------------------------
DATA Load to store ratio
UOPS_ISSUE UOPs issueing
TLB_INSTR L1 Instruction TLB miss rate/ratio
TLB_DATA L2 data TLB miss rate/ratio
FLOPS_AVX Packed AVX MFLOP/s
UOPS_RETIRE UOPs retirement
BRANCH Branch prediction miss rate/ratio
UOPS_EXEC UOPs execution
ICACHE Instruction cache miss rate/ratio
UOPS UOPs execution info
L3 L3 cache bandwidth in MBytes/s
FLOPS_SP Single Precision MFLOP/s
L2CACHE L2 cache miss rate/ratio
RECOVERY Recovery duration
L3CACHE L3 cache miss rate/ratio
L2 L2 cache bandwidth in MBytes/s
CYCLE_ACTIVITY Cycle Activities
ENERGY Power and Energy consumption
FALSE_SHARE False sharing
CLOCK Power and Energy consumption
FLOPS_DP Double Precision MFLOP/s
Note that, as a grep of the Himeno benchmark (static M) source shows, the computation is done in float, so I used FLOPS_SP rather than FLOPS_DP.
[root@hoge Downloads]# grep -w float himenoBMTxps.c
float jacobi();
static float p[MIMAX][MJMAX][MKMAX];
static float a[4][MIMAX][MJMAX][MKMAX],
static float bnd[MIMAX][MJMAX][MKMAX];
static float wrk1[MIMAX][MJMAX][MKMAX],
static float omega;
float gosa;
p[i][j][k]=(float)(i*i)/(float)((imax-1)*(imax-1));
float
float gosa, s0, ss;
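Because the arrays are float, the benchmark's own operation count can be compared directly with the single-precision counters. The sketch below is a rough cross-check. It assumes the benchmark charges 34 floating-point operations per interior grid point per sweep, with (imax-2)*(jmax-2)*(kmax-2) interior points. That assumption reproduces the MFLOPS figure printed above, but check it against the source if you depend on it.

```python
# Grid size, iteration count and CPU time taken from the benchmark output above.
imax, jmax, kmax = 128, 128, 256
iterations = 2191
cpu_s = 50.030231

# Assumption: 34 floating-point operations per interior grid point per sweep.
FLOP_PER_POINT = 34
flop_per_sweep = (imax - 2) * (jmax - 2) * (kmax - 2) * FLOP_PER_POINT
benchmark_flop = flop_per_sweep * iterations
benchmark_mflops = benchmark_flop / cpu_s / 1e6

# Counter-based total from the FLOPS_SP table:
# 128-bit packed instructions count 4 operations each, scalar ones count 1.
counted_flop = 72_415_611_576 * 4 + 11_146_222_208
ratio = counted_flop / benchmark_flop

print(f"benchmark MFLOPS    {benchmark_mflops:.3f}")
print(f"counted / analytic  {ratio:.4f}")
```

Under these assumptions the counters see slightly more single-precision work than the analytic count. That is plausible, since the likwid measurement covers the whole process, including the rehearsal run, and not just the timed loop.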