Bug 809616 - default enabled --build-id causes excessive memory utilization for f77 codes with a large BSS
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: binutils
Version: 6.2
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Nick Clifton
QA Contact: qe-baseos-tools-bugs
URL:
Whiteboard:
Duplicates: 691347
Depends On:
Blocks: 787802
Reported: 2012-04-03 20:00 UTC by Travis Gummels
Modified: 2018-11-28 21:49 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-06-20 14:03:47 UTC
Target Upstream Version:
Embargoed:


Attachments
Reproducer (29.17 KB, application/x-bzip), 2012-04-03 20:00 UTC, Travis Gummels
Skip sections with no contents (843 bytes, patch), 2012-04-05 09:03 UTC, Nick Clifton


Links
System: Red Hat Product Errata
ID: RHBA-2012:0872
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: binutils bug fix and enhancement update
Last Updated: 2012-06-19 20:47:50 UTC

Description Travis Gummels 2012-04-03 20:00:10 UTC
Created attachment 574941 [details]
Reproducer

Description of problem:

Fortran77 requires static arrays. These end up as BSS sections in the binary. 
This particular test has a large array in memory:
[25] .bss              NOBITS           0000000000608380  00008374
      00000004fb4d1528  0000000000000000  WA       0     0     32
This 0x4fb4d1528 is 21396002088 bytes, which ends up being over 19GB.
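
As a quick sanity check of that figure, the size field from the section header can be converted by hand. This is only a small sketch in Python; the helper name section_size_gib is made up for illustration, and the hex string is copied from the readelf output above.

```python
# Size field of the .bss section, copied from the readelf output above.
BSS_SIZE_HEX = "00000004fb4d1528"


def section_size_gib(size_hex: str) -> float:
    """Convert a hexadecimal section size (in bytes) to GiB."""
    return int(size_hex, 16) / 2**30


size_bytes = int(BSS_SIZE_HEX, 16)
print(size_bytes)  # 21396002088, the same value gdb reports as sh_size
print(f"{section_size_gib(BSS_SIZE_HEX):.2f} GiB")
```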

When this binary is being linked and the linker gets to the part where it calculates the build-id, it allocates all that space (which is nothing but zeros) and then calculates the build-id including that. This greatly increases the time needed to link the binary, can trigger the OOM killer on this shared diskless machine, and impacts other users who are trying to get work done.

When we attach to the bloated process with gdb:

(gdb) where
#0  0x00000000004282a3 in sha1_process_block (buffer=0x2aaf56f99bde,  len=<value optimized out>, ctx=0x7fffffffd2c0) at ./sha1.c:355
#1  0x000000000042925b in sha1_process_bytes (buffer=<value optimized out>,  len=<value optimized out>, ctx=0x7fffffffd2c0) at ./sha1.c:245
#2  0x00002aaaaad1beca in bfd_elf64_checksum_contents (abfd=0x6a5be0,  process=0x4291b0 <sha1_process_bytes>, arg=0x7fffffffd2c0) at elfcode.h:1206
#3  0x0000000000420ee7 in gldelf_x86_64_write_build_id_section (abfd=0x6a5be0) at eelf_x86_64.c:906
#4  0x00002aaaaad26a2f in _bfd_elf_write_object_contents (abfd=0x6a5be0) at elf.c:5155
#5  0x00002aaaaad01437 in bfd_close (abfd=0x6a5be0) at opncls.c:692
#6  0x0000000000417f7c in main (argc=46, argv=0x7fffffffd6b8) at ./ldmain.c:515

Looking at where the problem seems to be:
#2  0x00002aaaaad1beca in bfd_elf64_checksum_contents (abfd=0x6a5be0,  process=0x4291b0 <sha1_process_bytes>, arg=0x7fffffffd2c0) at elfcode.h:1206
1206                    (*process) (sec->contents, i_shdr.sh_size, arg);
(gdb) p *i_shdr 

$3 = {sh_name = 240, sh_type = 8, sh_flags = 3, sh_addr = 6325120,  sh_offset = 0, sh_size = 21396002088, sh_link = 0, sh_info = 0,  sh_addralign = 32, sh_entsize = 0, bfd_section = 0xbd2b30, contents = 0x0}

There is our 19GB.

Thus we can see clearly that the problem occurs when the linker calculates the checksum for the BSS.

We have a workaround, passing in -Wl,--build-id=none or -Wl,--build-id=uuid, but we believe it would be better to have an optimized build-id calculation that does not allocate the BSS when it calculates the checksum.
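
To illustrate the kind of optimization being asked for, here is a minimal sketch in Python. It is not binutils code and not the eventual patch: the Section class and checksum_sections function are hypothetical stand-ins for the loop in bfd_elf64_checksum_contents. The only facts taken from the report are that the .bss header has sh_type = 8 (SHT_NOBITS in the ELF specification), a very large sh_size, and contents = 0x0.

```python
import hashlib
from dataclasses import dataclass
from typing import Iterable, Optional

SHT_PROGBITS = 1  # section data is stored in the file
SHT_NOBITS = 8    # section occupies no file space (e.g. .bss)


@dataclass
class Section:
    """Hypothetical, simplified stand-in for an ELF section header."""
    name: str
    sh_type: int
    sh_size: int
    contents: Optional[bytes]  # None when nothing is stored in the file


def checksum_sections(sections: Iterable[Section]) -> str:
    """SHA-1 over section contents, skipping sections that have none.

    A NOBITS section is never expanded into sh_size zero bytes, so the
    memory used does not depend on the size of .bss.
    """
    sha1 = hashlib.sha1()
    for sec in sections:
        if sec.sh_type == SHT_NOBITS or sec.contents is None:
            continue  # nothing in the file to hash
        sha1.update(sec.contents)
    return sha1.hexdigest()


# Toy input: two small sections plus a .bss with the size from this report.
sections = [
    Section(".text", SHT_PROGBITS, 4, b"\x90\x90\x90\xc3"),
    Section(".data", SHT_PROGBITS, 2, b"\x01\x02"),
    Section(".bss", SHT_NOBITS, 21396002088, None),
]
print(checksum_sections(sections))
```

With this approach the digest depends only on bytes that are actually present in the file, so the declared size of .bss has no effect on link-time memory use.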

The problem appears to happen only on RHEL 6, not on RHEL 5 or Fedora 16.

The reproducer seems to be geared toward mpich2 rather than openmpi.

Version-Release number of selected component (if applicable):
binutils-2.20.51.0.2-5.28.el6.x86_64

How reproducible:
100%

Steps to Reproduce:

The attached reproducer requires MPI.
  
Actual results:
Excessive memory utilization.

Expected results:
Appropriate memory utilization.

Additional info:

Comment 1 Nick Clifton 2012-04-05 09:03:21 UTC
Created attachment 575329 [details]
Skip sections with no contents

Comment 2 Nick Clifton 2012-04-05 09:05:39 UTC
This bug has also been reported against the FSF binutils:

  http://sourceware.org/bugzilla/show_bug.cgi?id=12451

The uploaded patch is a simplified version of the patch that fixes that PR.  Once some internal networking problems are resolved I will add it to the RHEL6.3 binutils rpm.

Cheers
  Nick

Comment 3 Ben Woodard 2012-04-05 18:17:46 UTC
Thanks Nick, 

I never would have made the association between 12451 and the reported bug. The patch looks really simple, and I think the users are OK with the current workaround for a bit. If you would like some testing on alpha/beta test packages for 6.3, just let us know where to find the packages and we'll run them through our environment for a while.

Comment 5 Michal Nowak 2012-04-10 14:33:14 UTC
(In reply to comment #0)
> Created attachment 574941 [details]
> Reproducer

$ sh REPRODUCER_SCRIPT.sh
[...]
mpif77 -c  -O3 -fPIC cg.f
make[1]: mpif77: Command not found
[...]

I have openmpi and mpich2 packages installed. How do I compile it on RHEL6?

Comment 7 Ben Woodard 2012-04-10 17:24:02 UTC
Right or wrong, mpif77 is in /usr/lib64/mpich2/bin/mpif77, so you need to add /usr/lib64/mpich2/bin to your PATH.

Comment 10 Jeff Law 2012-04-17 21:23:20 UTC
*** Bug 691347 has been marked as a duplicate of this bug. ***

Comment 15 errata-xmlrpc 2012-06-20 14:03:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2012-0872.html

