Message boards : Multicore CPUs : Message about suboptimal build
Author | Message |
---|---|
I got these messages: "Compiled SIMD instructions: AVX_256 (Gromacs could use AVX_128_FMA on this machine, which is better)" and "The current CPU can measure timings more accurately than the code in mdrun_mtavx.901 was configured to use. This might affect your simulation speed as accurate timings are needed for load-balancing. Please consider rebuilding mdrun_mtavx.901 with the GMX_USE_RDTSCP=OFF CMake option." | |
ID: 39115 | Rating: 0 | |
Nothing important. | |
ID: 39126 | Rating: 0 | |
"Nothing important." However, I think it is important, because performance is very low. My notebook, which has 8 threads clocked at 1.86 GHz, achieves 7.603 ns/day. A system that has 64 threads (32 used by the application) clocked at 2.5 GHz is only about twice as fast, reaching just 15.829 ns/day. According to the study by Professor Agner Fog from the Technical University of Denmark on processors with the Bulldozer and Piledriver architectures, quote: "The throughput of 256-bit store instructions is less than half the throughput of 128-bit store instructions on Bulldozer and Piledriver. It is particularly bad on the Piledriver, which has a throughput of one 256-bit store per 17 - 20 clock cycles." and: "Therefore, there is no advantage in using 256-bit instructions on Bulldozer and Piledriver when the bottleneck is execution unit throughput or instruction decoding. The poor throughput of 256-bit stores makes it a disadvantage to use 256-bit registers on the Piledriver." This is probably the reason why the GROMACS developers invested time in developing an appropriate optimization. Quote from the GROMACS project site: "Currently the supported acceleration options are: none, SSE2, SSE4.1, AVX-128-FMA (AMD Bulldozer + Piledriver) and AVX-256 (Intel Sandy+Ivy Bridge)." and: "On x86, the performance difference between SSE2 and SSE4.1 is minor. All other, higher acceleration differences are significant." Therefore, I think it would be good to also have an application version built with such optimizations. I would certainly be delighted. | |
ID: 39129 | Rating: 0 | |
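
To put the numbers from the last post side by side, here is a rough sketch of the arithmetic. It assumes that "threads × clock" is a usable proxy for raw compute capacity, which is a crude simplification (it ignores per-core architecture differences, shared modules or hyperthreads, memory bandwidth, and parallel scaling losses). The variable names are made up for illustration; only the thread counts, clock speeds, and ns/day figures come from the post.

```python
# Figures quoted in the post above; dictionary keys are illustrative names.
notebook = {"threads": 8, "clock_ghz": 1.86, "ns_per_day": 7.603}
server = {"threads": 32, "clock_ghz": 2.5, "ns_per_day": 15.829}  # 32 of 64 threads used


def raw_capacity(machine):
    """Crude capacity proxy (ASSUMPTION): threads in use times clock speed."""
    return machine["threads"] * machine["clock_ghz"]


# Measured speedup of the server over the notebook.
speedup = server["ns_per_day"] / notebook["ns_per_day"]

# Speedup one might naively expect from the capacity proxy alone.
raw_ratio = raw_capacity(server) / raw_capacity(notebook)

# Fraction of the naive expectation that is actually achieved.
efficiency = speedup / raw_ratio

print(f"measured speedup:        {speedup:.2f}x")
print(f"thread*clock ratio:      {raw_ratio:.2f}x")
print(f"achieved / naive ratio:  {efficiency:.2f}")
```

Under that assumption the server delivers roughly 2.1x the throughput with about 5.4x the thread-clock capacity, which is consistent with the poster's impression that something is holding it back. The calculation alone does not show that the AVX_256 build is the cause.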