Hey,
The other day I experimented with some algorithms written in Akka and Golang (https://breaking-the-system.blogspot.com/2022/10/how-we-can-learn-parallel-computing.html), both of which are highly parallelised/threaded. I tested them on AMD ThreadRipper 2950X/3960X machines and on the Apple M1 and M1 Max. The results really shocked me. The Apple chips beat the ThreadRippers by a long shot, even with the TRs overclocked to 250W and 350W+. That is despite the algorithm being highly parallelized and the TRs being among the fastest CPUs money could buy in their time, while costing far more than the Apple chips (not even counting the insane cooling and electricity they need).
In this post I want to experiment further with the Apple processors to find out what the hell is happening here. I wanted to test "memory parallelism", which basically means how much work the processor can keep doing while it waits for memory, and how many memory accesses the CPU can have in flight in parallel.
I searched the web and found this really popular article (https://lemire.me/blog/2021/01/06/memory-access-on-the-apple-m1-processor/) claiming 26x memory parallelism (to be fair, they wrote "or more"), which did not make much sense to me. The M1 has a low clock frequency and high memory latency.
The M1 took 0.45 seconds while the 2950X TR took 2.42 seconds, roughly 5.4 times longer.