Bug 1051239

Summary: Performance drop over 50% of executable produced with CFLAGS=-fPIC LDFLAGS="-z now"
Product: Red Hat Enterprise Linux 7
Component: gcc
Version: 7.0
Hardware: Unspecified
OS: Unspecified
Status: CLOSED NOTABUG
Severity: unspecified
Priority: unspecified
Target Milestone: rc
Reporter: Jiri Hladky <jhladky>
Assignee: Jakub Jelinek <jakub>
QA Contact: qe-baseos-tools-bugs
CC: aokuliar, i, jhladky, kkolakow, law, mkocka, mpolacek
Doc Type: Bug Fix
Type: Bug
Last Closed: 2014-07-11 18:41:24 UTC
Bug Blocks: 1048416

Description Jiri Hladky 2014-01-09 22:00:52 UTC
Description of problem:

Following the packaging guidelines at
https://fedoraproject.org/wiki/Packaging:Guidelines#PIE

I compiled the haveged package with the -fPIC and -z now flags:

CFLAGS=-fPIC LDFLAGS="-z now" ./configure --prefix=/${HOME}/haveged/haveged_fPIC_z_now_binary

When I compare the performance of the resulting binary with a binary built with a plain ./configure, performance drops by more than 50%.

(haveged is a random number generator that uses CPU state as its entropy source.)

The resulting speed of generating random numbers is:

$ ./haveged_default_binary/sbin/haveged -otc -n0 | dd iflag=fullblock of=/dev/null bs=4k count=262144
Writing unlimited bytes to stdout
262144+0 records in
262144+0 records out
1073741824 bytes (1.1 GB) copied, 3.30622 s, 325 MB/s

$ ./haveged_fPIC_z_now_binary/sbin/haveged -otc -n0 | dd iflag=fullblock of=/dev/null bs=4k count=262144
Writing unlimited bytes to stdout
262144+0 records in
262144+0 records out
1073741824 bytes (1.1 GB) copied, 5.34215 s, 201 MB/s


The performance drop is from 325 MB/s to 201 MB/s.
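
As a rough cross-check of these figures, the change can be expressed either as lost throughput or as extra run time for the same amount of data. The sketch below is illustrative only; the function name relative_change is made up for this example and is not part of haveged or the attached script.

```python
from typing import Tuple


def relative_change(baseline_mb_s: float, measured_mb_s: float) -> Tuple[float, float]:
    """Return (throughput drop %, run-time increase %) for a fixed amount of data."""
    if baseline_mb_s <= 0 or measured_mb_s <= 0:
        raise ValueError("rates must be positive")
    throughput_drop = (baseline_mb_s - measured_mb_s) / baseline_mb_s * 100.0
    # Time per byte is the inverse of the rate, so the time ratio is baseline / measured.
    runtime_increase = (baseline_mb_s / measured_mb_s - 1.0) * 100.0
    return throughput_drop, runtime_increase


if __name__ == "__main__":
    # Rates taken from the two dd runs above.
    drop, extra = relative_change(325.0, 201.0)
    print(f"throughput drop: {drop:.1f}%, run-time increase: {extra:.1f}%")
```

For 325 MB/s versus 201 MB/s this gives a throughput drop of about 38% and a run-time increase of about 62%, so the percentage quoted depends on which metric is meant.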


Version-Release number of selected component (if applicable):
I have reproduced it on AMD and Intel servers running RHEL7 (RHEL-7.0-20131222.0), Fedora 19 and Fedora 20. It does NOT happen on Fedora 18. 

It happens with gcc 4.8.2:
$ gcc --version
gcc (GCC) 4.8.2 20131106 (Red Hat 4.8.2-3)


How reproducible:
Run the attached script
./haveged-compile-test.sh

and watch the data rates reported by the dd command.

Actual results:

As in the dd output above: 325 MB/s for the default build versus 201 MB/s for the -fPIC -z now build.


Expected results:
Both rates to be the same.

Additional info:
It's important to run haveged with the -otc option. This turns off the online statistical tests, which have a huge impact on run time. That impact depends on the random data produced by the program, which differs on every run.

Comment 1 Jakub Jelinek 2014-01-09 22:12:30 UTC
You haven't said which architecture this is on, but in any case, I wonder why you expect that -fPIC wouldn't have a significant performance impact.  Of course it has a significant performance impact, even on x86_64.

Comment 2 Jiri Hladky 2014-01-09 22:24:58 UTC
I have performed the tests on x86_64.

Well, over 50% performance drop is a HUGE number. In fact, the performance drop is as large as 80% when online statistical testing of the produced data is enabled.

Plus there is no measurable performance drop when using gcc 4.7.2. See the results below on F18 with gcc (GCC) 4.7.2 20121109 (Red Hat 4.7.2-8). This is why I believe it's a bug in gcc 4.8.2.


$ haveged_default/sbin/haveged -otc -n0 | dd iflag=fullblock of=/dev/null bs=4k count=262144
Writing unlimited bytes to stdout
262144+0 records in
262144+0 records out
1073741824 bytes (1.1 GB) copied, 3.09276 s, 347 MB/s


$ haveged_fPIC_z_now/sbin/haveged -otc -n0 | dd iflag=fullblock of=/dev/null bs=4k count=262144
Writing unlimited bytes to stdout
262144+0 records in
262144+0 records out
1073741824 bytes (1.1 GB) copied, 3.10366 s, 346 MB/s

Comment 4 Jakub Jelinek 2014-01-09 22:42:22 UTC
Well, if it is a binary rather than a shared library, then using -fPIC makes no sense, and if you aren't linking it with -pie, even -fPIE doesn't make sense.
A hardening effect (very little, in fact) is only achieved if you build the binary with -pie, and in that case you should compile its source files with -fPIE.  You might get better performance with -fPIE than with -fPIC: in that case the compiler can assume that locally defined public symbols bind locally, and can thus use IP-relative addressing instead of an extra indirection.  Similarly, you can improve the code by using visibility attributes, which guarantee to the compiler which symbols will be defined in the binary and can thus be addressed directly (which is about as expensive as accessing variables with -fno-pic).
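
Whether a given build actually ended up as a position-independent executable can be checked from the ELF header: a binary linked with -pie has e_type ET_DYN (the same value used for shared objects), while a conventionally linked executable has ET_EXEC. The following is a minimal sketch of that check; the helper names elf_type and describe are made up for this example, and it only inspects the first 18 bytes of the file.

```python
import struct

ET_EXEC = 2  # fixed-address executable
ET_DYN = 3   # shared object; also used for position-independent executables


def elf_type(header: bytes) -> int:
    """Return the e_type field from the first bytes of an ELF file."""
    if len(header) < 18 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    # EI_DATA (byte 5) gives the byte order: 1 = little-endian, 2 = big-endian.
    byte_order = "<" if header[5] == 1 else ">"
    # e_type is a 16-bit field directly after the 16-byte e_ident array.
    (e_type,) = struct.unpack_from(byte_order + "H", header, 16)
    return e_type


def describe(path: str) -> str:
    """Classify the file at `path` by its ELF e_type."""
    with open(path, "rb") as f:
        e_type = elf_type(f.read(18))
    if e_type == ET_DYN:
        return "ET_DYN (PIE or shared object)"
    if e_type == ET_EXEC:
        return "ET_EXEC (not position-independent)"
    return f"other e_type {e_type}"


if __name__ == "__main__":
    import sys

    for path in sys.argv[1:]:
        print(path, "->", describe(path))
```

Saved under a hypothetical name such as check_pie.py, it could be run against each built haveged binary. readelf -h reports the same field in its "Type:" line.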

Comment 5 Jiri Hladky 2014-01-09 23:33:15 UTC
Hi Jakub,

thanks for sharing this information. Clearly

https://fedoraproject.org/wiki/Packaging:Guidelines#PIE

is misleading as it states:
====================================================================
%global _hardened_build 1

This adds -fPIC (if -fPIE is not already present) to the compiler flags, and adds -z now to the linker flags. 
====================================================================

The package consists of a shared library and a front-end program linked against this shared library.

I will try
CFLAGS=-fPIE LDFLAGS=-pie ./configure 

and post my results here. 

Could you please comment on the hardening effect of the CFLAGS=-fPIE LDFLAGS=-pie flags? Is it worth the effort? In comment 4 you say that the hardening effect is very small, is that right?

Jirka

Comment 6 Jiri Hladky 2014-01-09 23:45:57 UTC
Using 

CFLAGS=-fPIE LDFLAGS=-pie ./configure 

I see the same performance drop with gcc 4.7.2 and gcc 4.8.2.

Compared to a regular build, CFLAGS=-fPIE LDFLAGS=-pie ./configure drops performance from 346 MB/s to 200 MB/s:

$ ./haveged_PIE/sbin/haveged -otc -n0 | dd iflag=fullblock of=/dev/null bs=4k count=262144
Writing unlimited bytes to stdout
262144+0 records in
262144+0 records out
1073741824 bytes (1.1 GB) copied, 5.37325 s, 200 MB/s

Please comment on the results. If this is expected I will close the bug. How much does this bring in terms of hardening?

Jirka

Comment 8 Jeff Law 2014-07-11 18:41:24 UTC
Jiri,

The performance drop would be highly dependent on the code's behaviour.  While a 50% hit is bad and higher than is typically seen with these options, it's not out of the realm of possibility.  I would suggest using some profiling tools to explore why your application is impacted so heavily and take appropriate action (possibly with some input from Jakub and the larger tools team about how to mitigate the impact).


The hardening effect of PIE is to randomize the address space, which makes return-to-libc and ROP attacks harder to exploit because it's harder to predict where "interesting" code sequences are within the address space.  Typically these are secondary methods of attack, i.e., the bad guys start with a buffer overflow, and then have to resort to ROP and similar mechanisms to enable the exploit.