Bug 1051239 - Performance drop over 50% of executable produced with CFLAGS=-fPIC LDFLAGS="-z now"
Summary: Performance drop over 50% of executable produced with CFLAGS=-fPIC LDFLAGS="-z now"
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: gcc
Version: 7.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Jakub Jelinek
QA Contact: qe-baseos-tools-bugs
URL:
Whiteboard:
Depends On:
Blocks: 1048416
 
Reported: 2014-01-09 22:00 UTC by Jiri Hladky
Modified: 2014-07-11 18:41 UTC (History)
7 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-07-11 18:41:24 UTC
Target Upstream Version:
Embargoed:


Attachments


Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1048416 0 unspecified CLOSED Harden build 2021-02-22 00:41:40 UTC

Internal Links: 1048416

Description Jiri Hladky 2014-01-09 22:00:52 UTC
Description of problem:

According to
https://fedoraproject.org/wiki/Packaging:Guidelines#PIE

I have tried to compile the haveged package with -fPIC and -z now flags:

CFLAGS=-fPIC LDFLAGS="-z now" ./configure --prefix=/${HOME}/haveged/haveged_fPIC_z_now_binary

When I compare the performance of the resulting binary against a binary compiled with plain ./configure, there is a more than 50% drop in performance.

(haveged is a random number generator that uses CPU state as its entropy source)

The resulting speed of generating random numbers is:

$ ./haveged_default_binary/sbin/haveged -otc -n0 | dd iflag=fullblock of=/dev/null bs=4k count=262144
Writing unlimited bytes to stdout
262144+0 records in
262144+0 records out
1073741824 bytes (1.1 GB) copied, 3.30622 s, 325 MB/s

$ ./haveged_fPIC_z_now_binary/sbin/haveged -otc -n0 | dd iflag=fullblock of=/dev/null bs=4k count=262144
Writing unlimited bytes to stdout
262144+0 records in
262144+0 records out
1073741824 bytes (1.1 GB) copied, 5.34215 s, 201 MB/s


The performance drop is from 325 MB/s to 201 MB/s.


Version-Release number of selected component (if applicable):
I have reproduced it on AMD and Intel servers running RHEL7 (RHEL-7.0-20131222.0), Fedora 19 and Fedora 20. It does NOT happen on Fedora 18. 

It happens with gcc 4.8.2
gcc --version
gcc (GCC) 4.8.2 20131106 (Red Hat 4.8.2-3)


How reproducible:
Run the attached script:
./haveged-compile-test.sh

and watch the data rates logged by the dd command.

Actual results:

$ ./haveged_default_binary/sbin/haveged -otc -n0 | dd iflag=fullblock of=/dev/null bs=4k count=262144
Writing unlimited bytes to stdout
262144+0 records in
262144+0 records out
1073741824 bytes (1.1 GB) copied, 3.30622 s, 325 MB/s

$ ./haveged_fPIC_z_now_binary/sbin/haveged -otc -n0 | dd iflag=fullblock of=/dev/null bs=4k count=262144
Writing unlimited bytes to stdout
262144+0 records in
262144+0 records out
1073741824 bytes (1.1 GB) copied, 5.34215 s, 201 MB/s


Expected results:
Both rates to be the same.

Additional info:
It's important to run haveged with the -otc option. This turns off the online statistical tests, which have a huge impact on runtimes; that impact depends on the random data produced by the program, which differs on every run.

Comment 1 Jakub Jelinek 2014-01-09 22:12:30 UTC
You haven't said which architecture this is on, but in any case, I wonder why you expect that -fPIC wouldn't have a significant performance impact.  It does, of course, even on x86_64.

Comment 2 Jiri Hladky 2014-01-09 22:24:58 UTC
I have performed the tests on x86_64.

Well, an over-50% performance drop is a HUGE number. In fact, the drop is as large as 80% when online statistical testing of the produced data is enabled.

Plus, there is no measurable performance drop when using gcc 4.7.2. See below the results on F18 with gcc (GCC) 4.7.2 20121109 (Red Hat 4.7.2-8). This is why I believe it's a bug in gcc 4.8.2.


$haveged_default/sbin/haveged -otc -n0 | dd iflag=fullblock of=/dev/null bs=4k count=262144
Writing unlimited bytes to stdout
262144+0 records in
262144+0 records out
1073741824 bytes (1.1 GB) copied, 3.09276 s, 347 MB/s


$haveged_fPIC_z_now/sbin/haveged -otc -n0 | dd iflag=fullblock of=/dev/null bs=4k count=262144
Writing unlimited bytes to stdout
262144+0 records in
262144+0 records out
1073741824 bytes (1.1 GB) copied, 3.10366 s, 346 MB/s

Comment 4 Jakub Jelinek 2014-01-09 22:42:22 UTC
Well, if it is a binary rather than a shared library, then using -fPIC makes no sense; and if you aren't linking it as a -pie, even -fPIE doesn't make sense.
The hardening effect (very little, in fact) is only achieved if you build the binary as a -pie, and in that case you should compile its source files with -fPIE.  You might get better performance with -fPIE over -fPIC: in that case the compiler can assume locally defined public symbols bind locally, and thus can use IP-relative addressing instead of an extra indirection.  Similarly, you can improve the code by using the visibility attribute, guaranteeing to the compiler which symbols will be defined in the binary and thus can be addressed directly (which is about as expensive as accessing variables with -fno-pic).

Comment 5 Jiri Hladky 2014-01-09 23:33:15 UTC
Hi Jakub,

thanks for sharing this information. Clearly

https://fedoraproject.org/wiki/Packaging:Guidelines#PIE

is misleading as it states:
====================================================================
%global _hardened_build 1

This adds -fPIC (if -fPIE is not already present) to the compiler flags, and adds -z now to the linker flags. 
====================================================================

The package consists of shared library and front-end program compiled against this shared library.

I will try
CFLAGS=-fPIE LDFLAGS=-pie ./configure 

and post my results here. 

Could you please comment on the hardening effect of the CFLAGS=-fPIE LDFLAGS=-pie flags? Is it worth the effort? In comment 4 you say the hardening effect is very small; is that right?

Jirka

Comment 6 Jiri Hladky 2014-01-09 23:45:57 UTC
Using 

CFLAGS=-fPIE LDFLAGS=-pie ./configure 

I see the same performance drop with gcc 4.7.2 and gcc 4.8.2


Compared to the regular build, CFLAGS=-fPIE LDFLAGS=-pie ./configure brings the rate down from 346 MB/s to 200 MB/s:

./haveged_PIE/sbin/haveged -otc -n0 | dd iflag=fullblock of=/dev/null bs=4k count=262144
Writing unlimited bytes to stdout
262144+0 records in
262144+0 records out
1073741824 bytes (1.1 GB) copied, 5.37325 s, 200 MB/s

Please comment on the results. If this is expected, I will close the bug. How much hardening does this actually bring?

Jirka

Comment 8 Jeff Law 2014-07-11 18:41:24 UTC
Jiri,

The performance drop would be highly dependent on the code's behaviour.  While a 50% hit is bad and higher than is typically seen with these options, it's not out of the realm of possibility.  I would suggest using some profiling tools to explore why your application is impacted so heavily and take appropriate action (possibly with some input from Jakub and the larger tools team about how to mitigate the impact).


The hardening effect of PIE is to randomize the address space, which makes return-to-libc and ROP attacks harder to exploit because it's harder to predict where "interesting" code sequences are within the address space.  Typically these are secondary methods of attack: the bad guys start with a buffer overflow, and then have to resort to ROP and similar mechanisms to enable the exploit.

