This bug was initially created as a copy of Bug #1779057.

Description of problem:
In an unrelated Fedora commit, 2b62e65299c881127f59855ac54eb999b55afc34, the --enable-fat flag, which enabled CPU-optimized code, was dropped, probably accidentally. That has a significant impact on the performance of nettle and, in turn, gnutls. In particular, the RSA implementation of nettle, as compared to openssl, drops to (hogweed-benchmark tool):

name           size  sign/ms  verify/ms
rsa            2048   0.8881    27.1422
rsa (openssl)  2048   1.4249    45.2295

While on a non-Fedora system where gmp is compiled with --enable-fat:

rsa            2048   0.2106     7.2703
rsa (openssl)  2048   0.2024     6.4992

Please compile gmp with --enable-fat.
https://src.fedoraproject.org/rpms/gmp/pull-request/2
Should be fixed in rawhide: https://koji.fedoraproject.org/koji/taskinfo?taskID=39420536
Did you measure any significant benefit? On my x86-64, if I compile the same source code locally, I get:

rsa            2048   1.5079    53.4444
rsa (openssl)  2048   1.8138    61.0715

while the Fedora build with --enable-fat is significantly slower:

rsa            2048   1.1020    41.3436
rsa (openssl)  2048   1.8138    61.0715

You can verify the benchmark by compiling https://gitlab.com/gnutls/nettle and running
$ examples/hogweed-test rsa
I've tried the benchmark with the original version, the rawhide version and [1] (this should be the last version before --enable-fat was removed) and I'm getting:

rsa 2048 1.2620 45.5201
rsa 2048 1.1557 43.8995
rsa 2048 1.1458 43.8338

respectively.

[1]: https://koji.fedoraproject.org/koji/buildinfo?buildID=843681
The values are from running examples/hogweed-benchmark from the gitlab repo from comment#3
When compiling locally, I see that the flags used by gmp are:

-mtune=skylake -march=broadwell -fomit-frame-pointer

while Fedora sets:

-mtune=generic

Adding -fomit-frame-pointer does not gain much, so it seems that -mtune and -march give a very big boost in my case. Not sure what we can do to take advantage of that :(
Nikos, we should consult with someone in glibc/dev tools on techniques for enabling different code paths on different architectures. I know it is possible to do this at runtime (for example, enabling AVX2 when available and falling back to SSSE3 when not), but it may require significant engineering around GMP to do that.
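For reference, the runtime selection mentioned above can be sketched roughly as follows. This is only an illustration, not how GMP or nettle actually implement it: the function names (sum_generic, sum_ssse3, sum_avx2, sum_u32) are hypothetical, and the "optimized" kernels are placeholders that run the same portable loop. The only real API used is the GCC/clang builtin pair __builtin_cpu_init() and __builtin_cpu_supports(), which is available on x86 targets; on other targets the sketch falls back to the generic path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Portable fallback: works on any CPU. */
static uint64_t sum_generic(const uint32_t *v, size_t n)
{
    uint64_t s = 0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* Placeholders for hand-tuned kernels. A real build would put SSSE3 or
 * AVX2 code here (assembly or intrinsics); for the sketch they simply
 * reuse the portable loop so the result is identical on every path. */
static uint64_t sum_ssse3(const uint32_t *v, size_t n) { return sum_generic(v, n); }
static uint64_t sum_avx2(const uint32_t *v, size_t n)  { return sum_generic(v, n); }

typedef uint64_t (*sum_fn)(const uint32_t *, size_t);

/* Pick the best kernel the running CPU supports: AVX2 first, then
 * SSSE3, then the generic code. */
static sum_fn resolve_sum(void)
{
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2"))
        return sum_avx2;
    if (__builtin_cpu_supports("ssse3"))
        return sum_ssse3;
#endif
    return sum_generic;
}

/* Public entry point: resolves the kernel on first use and caches the
 * pointer. Note: this lazy initialization is not thread-safe as written. */
uint64_t sum_u32(const uint32_t *v, size_t n)
{
    static sum_fn impl = NULL;

    assert(n == 0 || v != NULL);
    if (impl == NULL)
        impl = resolve_sum();
    return impl(v, n);
}
```

Callers only ever see sum_u32(); the choice of kernel happens once, on the machine where the code runs, so a single distribution binary can still use newer instructions where they exist. As far as I understand, gmp's --enable-fat build follows the same general idea (optimized low-level routines chosen at runtime according to the detected CPU), but the details there are gmp's own and are not shown here.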