
Measuring the Himeno benchmark with likwid

Installing likwid and the Himeno benchmark takes nothing more than running make, so I'll skip that part.
[root@hoge Downloads]# likwid-perfctr -C 1 -g FLOPS_SP ./bmt 
CPU name: Intel(R) Core(TM) i7-7500U CPU @ 2.70GHz
CPU type: Intel Kabylake processor
CPU clock: 2.90 GHz
mimax = 129 mjmax = 129 mkmax = 257
imax = 128 jmax = 128 kmax =256
 Start rehearsal measurement process.
 Measure the performance in 3 times.

 MFLOPS: 5008.406795 time(s): 0.082125 1.733593e-03

 Now, start the actual measurement process.
 The loop will be excuted in 2191 times
 This will take about one minute.
 Wait for a while

 Loop executed for 2191 times
 Gosa : 4.477886e-04 
 MFLOPS measured : 6004.316730 cpu : 50.030231
 Score based on Pentium III 600MHz : 73.223375
Group 1: FLOPS_SP
|                   Event                  | Counter |    Core 1    |
|             INSTR_RETIRED_ANY            |  FIXC0  | 244033624741 |
|           CPU_CLK_UNHALTED_CORE          |  FIXC1  | 173945548078 |
|           CPU_CLK_UNHALTED_REF           |  FIXC2  | 144661450175 |
| FP_ARITH_INST_RETIRED_128B_PACKED_SINGLE |   PMC0  |  72415611576 |
|    FP_ARITH_INST_RETIRED_SCALAR_SINGLE   |   PMC1  |  11146222208 |
| FP_ARITH_INST_RETIRED_256B_PACKED_SINGLE |   PMC2  |            0 |

|        Metric        |   Core 1  |
|  Runtime (RDTSC) [s] |   50.2231 |
| Runtime unhalted [s] |   59.9028 |
|      Clock [MHz]     | 3491.6156 |
|          CPI         |    0.7128 |
|      SP MFLOP/s      | 5989.4533 |
|    AVX SP MFLOP/s    |         0 |
|    Packed MUOPS/s    | 1441.8797 |
|    Scalar MUOPS/s    |  221.9344 |
|  Vectorization ratio |   86.6611 |
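The derived metrics in the second table can be reproduced from the raw counters in the first. The following is a minimal sketch, assuming the formulas match the FLOPS_SP group definition shipped with likwid for this architecture (4 floats per 128-bit packed instruction, 8 per 256-bit one). It also recomputes the benchmark's own MFLOPS figure, assuming the Himeno source counts 34 floating-point operations per interior grid point. That constant is not shown in this post. The function names are made up for illustration.

```python
# Raw values copied from the likwid-perfctr output above (Core 1).
counters = {
    "FIXC0": 244033624741,  # INSTR_RETIRED_ANY
    "FIXC1": 173945548078,  # CPU_CLK_UNHALTED_CORE
    "PMC0": 72415611576,    # FP_ARITH_INST_RETIRED_128B_PACKED_SINGLE
    "PMC1": 11146222208,    # FP_ARITH_INST_RETIRED_SCALAR_SINGLE
    "PMC2": 0,              # FP_ARITH_INST_RETIRED_256B_PACKED_SINGLE
}
runtime_s = 50.2231  # "Runtime (RDTSC) [s]" from the metric table


def flops_sp_metrics(c, time_s):
    """Derive the FLOPS_SP metrics from raw counters (formulas are assumed)."""
    packed = c["PMC0"] + c["PMC2"]
    # Assumption: a 128-bit packed op processes 4 floats, a 256-bit op 8.
    flop = 4 * c["PMC0"] + c["PMC1"] + 8 * c["PMC2"]
    return {
        "CPI": c["FIXC1"] / c["FIXC0"],
        "SP MFLOP/s": 1e-6 * flop / time_s,
        "Packed MUOPS/s": 1e-6 * packed / time_s,
        "Scalar MUOPS/s": 1e-6 * c["PMC1"] / time_s,
        "Vectorization ratio": 100.0 * packed / (packed + c["PMC1"]),
    }


def himeno_mflops(imax, jmax, kmax, loops, cpu_s, flop_per_point=34.0):
    """Benchmark-side MFLOPS; 34 FLOP per interior point is an assumption."""
    flop_per_loop = (imax - 2) * (jmax - 2) * (kmax - 2) * flop_per_point
    return 1e-6 * flop_per_loop * loops / cpu_s


metrics = flops_sp_metrics(counters, runtime_s)
# Grid size, loop count and cpu time are taken from the benchmark output above.
himeno = himeno_mflops(128, 128, 256, loops=2191, cpu_s=50.030231)
rel_diff_percent = 100.0 * abs(himeno - metrics["SP MFLOP/s"]) / himeno

for name, value in metrics.items():
    print(f"{name:>20}: {value:.4f}")
print(f"{'Himeno MFLOPS':>20}: {himeno:.4f}")
print(f"{'difference [%]':>20}: {rel_diff_percent:.2f}")
```

Under these assumptions the two figures differ by roughly 0.25 %. One plausible reason for the small gap is that likwid-perfctr counts over the whole process, including the rehearsal run, while the benchmark times only its measurement loop.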
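As you can see, likwid's result (5989.45 SP MFLOP/s) comes out close to the benchmark's own figure (6004.32 MFLOPS). The group names that can be passed to the -g option are listed with the -a option.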
[root@hoge Downloads]# likwid-perfctr -a
 Group name Description
       DATA Load to store ratio
 UOPS_ISSUE UOPs issueing
  TLB_INSTR L1 Instruction TLB miss rate/ratio
   TLB_DATA L2 data TLB miss rate/ratio
UOPS_RETIRE UOPs retirement
     BRANCH Branch prediction miss rate/ratio
  UOPS_EXEC UOPs execution
     ICACHE Instruction cache miss rate/ratio
       UOPS UOPs execution info
         L3 L3 cache bandwidth in MBytes/s
   FLOPS_SP Single Precision MFLOP/s
    L2CACHE L2 cache miss rate/ratio
   RECOVERY Recovery duration
    L3CACHE L3 cache miss rate/ratio
         L2 L2 cache bandwidth in MBytes/s
CYCLE_ACTIVITY Cycle Activities
     ENERGY Power and Energy consumption
FALSE_SHARE False sharing
      CLOCK Power and Energy consumption
   FLOPS_DP Double Precision MFLOP/s
Incidentally, grepping the source of the Himeno benchmark (the static-allocation, size M version) shows that it computes in float, which is why I used FLOPS_SP rather than FLOPS_DP.
[root@hoge Downloads]# grep -w float himenoBMTxps.c 
float jacobi();
static float  p[MIMAX][MJMAX][MKMAX];
static float  a[4][MIMAX][MJMAX][MKMAX],
static float  bnd[MIMAX][MJMAX][MKMAX];
static float  wrk1[MIMAX][MJMAX][MKMAX],
static float omega;
  float  gosa;
  float gosa, s0, ss;
