I've been confused about the scaling of pencil on Kraken. Just to remind myself, here are some timing data from a 64 x 256^2 run:
[jsoishi@krakenpf7 B100Re1600Pm4_64]$ grep microsec *
R1600P4_64_128proc.o175745: Wall clock time/timestep/meshpoint [microsec] = 0.398E-01
R1600P4_64_256proc.o175750: Wall clock time/timestep/meshpoint [microsec] = 0.216E-01
R1600P4_64_64proc.o175743: Wall clock time/timestep/meshpoint [microsec] = 0.755E-01
In table form, where "speedup" is really the parallel efficiency relative to the 64-processor run, i.e. the measured speedup divided by the perfect-scaling speedup (2 for 128, 4 for 256):
procs | usec/step/point | speedup |
64 | 0.755E-01 | 1 |
128 | 0.398E-01 | 0.95 |
256 | 0.216E-01 | 0.87 |
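As a quick check on the last column, here is a minimal Python sketch that recomputes it from the three timings in the grep output. The function name `parallel_efficiency` and the choice of the 64-processor run as the baseline are my own labels for illustration, not anything taken from pencil itself.

```python
# Timings copied from the grep output above: procs -> usec/step/point.
timings = {64: 0.755e-01, 128: 0.398e-01, 256: 0.216e-01}


def parallel_efficiency(timings, base_procs=64):
    """Return {procs: efficiency} relative to the base_procs run.

    Efficiency = (measured speedup) / (perfect-scaling speedup).
    """
    base_time = timings[base_procs]
    result = {}
    for procs, t in sorted(timings.items()):
        speedup = base_time / t      # measured speedup over the baseline
        ideal = procs / base_procs   # perfect strong scaling: 2 for 128, 4 for 256
        result[procs] = speedup / ideal
    return result


efficiency = parallel_efficiency(timings)
for procs, eff in efficiency.items():
    print(f"{procs:4d} procs: efficiency {eff:.2f}")
```

Rounded to two decimals, this gives the 0.95 and 0.87 entries in the table.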
Actual Performance
On the actual problem, the first run used 12 hours of wall time on 256 processors, going 289000 steps, to 34.4 orbits, or roughly 1/3 of the way through. At that rate the whole run should take about 3*256*12 = 9216 CPU hours. The second run, 11.9 hours long, went 354100 steps, to 74.6 orbits, or about 3/4 (rather than the expected 2/3) of the way. Performance appears quite variable. This second run didn't exceed the wall-clock limit and so reported a usec/step/point of 0.290E-01.
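To compare the two production runs directly, here is a rough Python sketch of the arithmetic. It assumes that each step count is the number of steps taken within that run (not a cumulative total), that the quoted wall times are accurate to the rounding shown, and that the grid is the 64 x 256^2 one from the top of these notes. The helper names are hypothetical.

```python
NPOINTS = 64 * 256**2  # grid size, assumed to be the 64 x 256^2 run above


def usec_per_step_per_point(wall_hours, steps, npoints=NPOINTS):
    """Wall-clock microseconds per timestep per mesh point."""
    wall_usec = wall_hours * 3600.0 * 1e6
    return wall_usec / (steps * npoints)


def cpu_hours(procs, wall_hours, fraction_done):
    """Extrapolate total CPU hours from one run covering fraction_done."""
    return procs * wall_hours / fraction_done


# Assumption: step counts are per run, not cumulative.
first = usec_per_step_per_point(12.0, 289000)
second = usec_per_step_per_point(11.9, 354100)
total = cpu_hours(256, 12.0, 1.0 / 3.0)

print(f"first run:  {first:.4f} usec/step/point")
print(f"second run: {second:.4f} usec/step/point")
print(f"estimated total: {total:.0f} CPU hours")
```

Under those assumptions the second run comes out near 0.029, consistent with the value it reported, while the first run is closer to 0.036. Both are slower than the 0.0216 from the 256-processor timing test.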
