Description of problem:
Glibc adds substantial overhead to audited processes: once an auditor is loaded, every PLT call is routed through the auditing path, whether or not the auditor implements PLT auditing. For applications that make frequent calls to small routines, this overhead is unacceptable for performance tools.
To avoid this overhead, some auditors rewrite the GOT themselves. However, the details of filling in GOT entries are platform dependent and somewhat tricky for load modules that use secure PLT. Moreover, tools are strongly discouraged from rewriting the GOT, and future processors will prevent the technique with hardware mechanisms.
For these reasons, glibc should take responsibility for filling in the GOT if there is no PLT auditor present.
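For illustration, here is a minimal sketch of the kind of auditor this report concerns, written against the rtld-audit interface described in rtld-audit(7). It observes symbol bindings but implements no PLT enter/exit hooks, and it sets LA_SYMB_NOPLTENTER and LA_SYMB_NOPLTEXIT to decline them for every symbol. The report states that glibc still routes every PLT call through its auditing path when such an auditor is loaded. This is not the auditor used by the reproducer. It assumes a 64-bit ELF target such as ppc64le on POWER9 (32-bit targets would implement la_symbind32 instead).

```c
/* Minimal LD_AUDIT module: observes bindings, implements no PLT enter/exit hooks.
 * Assumption: 64-bit ELF target (e.g. ppc64le), so la_symbind64 is the hook used. */
#define _GNU_SOURCE  /* link.h only exposes the la_* prototypes and LA_* constants under _GNU_SOURCE */
#include <link.h>    /* la_* prototypes, LAV_CURRENT, LA_FLG_*, LA_SYMB_*, Elf64_Sym */
#include <stdint.h>

/* Handshake with ld.so: report the audit interface version this module was built against. */
unsigned int la_version(unsigned int version)
{
    (void)version;
    return LAV_CURRENT;
}

/* Request binding notifications for symbols bound to and from every loaded object. */
unsigned int la_objopen(struct link_map *map, Lmid_t lmid, uintptr_t *cookie)
{
    (void)map;
    (void)lmid;
    (void)cookie;
    return LA_FLG_BINDTO | LA_FLG_BINDFROM;
}

/* Called once per symbol binding. Returning st_value keeps the normal target;
 * the flags ask ld.so not to call PLT enter/exit hooks for this symbol. */
uintptr_t la_symbind64(Elf64_Sym *sym, unsigned int ndx, uintptr_t *refcook,
                       uintptr_t *defcook, unsigned int *flags,
                       const char *symname)
{
    (void)ndx;
    (void)refcook;
    (void)defcook;
    (void)symname;
    *flags |= LA_SYMB_NOPLTENTER | LA_SYMB_NOPLTEXIT;
    return sym->st_value;
}
```

Such a module would typically be built with `gcc -shared -fPIC -o libaudit.so audit.c` and loaded with `LD_AUDIT=./libaudit.so ./app` (file names are hypothetical).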
How reproducible: 100% (reported by customer)
Steps to Reproduce:
1. git clone https://github.com/hpctoolkit/auditor-tests
2. cd auditor-tests/tier1/slow-audit-plt
3. make
Actual results:
2^27 PLT calls to an empty routine take ~3.54 seconds on an IBM POWER9 @ 2.8 GHz.
Expected results:
2^27 PLT calls to an empty routine should take ~0.32 seconds on an IBM POWER9 @ 2.8 GHz.
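Expressed per call, using plain arithmetic on the reported totals (2^27 = 134,217,728 calls at 2.8 GHz), the gap is roughly an order of magnitude:

```latex
\[
\begin{aligned}
t_{\text{actual}}   &= \frac{3.54\ \text{s}}{2^{27}} \approx 26.4\ \text{ns/call} \approx 74\ \text{cycles} \\
t_{\text{expected}} &= \frac{0.32\ \text{s}}{2^{27}} \approx 2.4\ \text{ns/call} \approx 6.7\ \text{cycles} \\
\frac{t_{\text{actual}}}{t_{\text{expected}}} &= \frac{3.54}{0.32} \approx 11
\end{aligned}
\]
```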
Additional info:
The problem was first identified upstream:
https://sourceware.org/bugzilla/show_bug.cgi?id=15533