Bug 1262498 - r_ext4_small_bg test failure with gcc-4.8.5-4, valgrind errors
Summary: r_ext4_small_bg test failure with gcc-4.8.5-4, valgrind errors
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: e2fsprogs
Version: 7.2
Hardware: ppc64le
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Eric Sandeen
QA Contact: Boyang Xue
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2015-09-11 23:35 UTC by Martin Sebor
Modified: 2016-11-04 06:41 UTC (History)

Fixed In Version: e2fsprogs-1.42.9-9.el7
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-11-04 06:41:40 UTC
Target Upstream Version:
Embargoed:


Attachments
Valgrind output for e2fsck. (126.51 KB, text/plain)
2015-09-11 23:35 UTC, Martin Sebor


Links
System: Red Hat Product Errata
ID: RHBA-2016:2454
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: e2fsprogs bug fix update
Last Updated: 2016-11-03 14:04:38 UTC

Description Martin Sebor 2015-09-11 23:35:31 UTC
Created attachment 1072655
Valgrind output for e2fsck.

After upgrading GCC to gcc version 4.8.5 20150623 (Red Hat 4.8.5-4), the r_ext4_small_bg test fails on ppc64le (I haven't tested other targets) with the following output:

Running e2fsprogs test suite...
 
dumpe2fs 1.42.9 (28-Dec-2013)
r_ext4_small_bg: ext4 1024 blocksize with small block groups: failed
141 tests succeeded	1 tests failed
Tests failed: r_ext4_small_bg 

All other tests pass.

While debugging the failure I narrowed it down to the e2fsck/util.c file, where I was able to make it go away by changing compiler options or making small code changes.  For instance, disabling optimization helped, as did compiling the file with _FORTIFY_SOURCE undefined (and optimization enabled) and, surprisingly, even removing -g or stripping the e2fsck program.  Since neither the presence nor the absence of debugging symbols or other symbols has any effect on the generated code, the problem must be in the program data.  Running the e2fsck program under valgrind revealed a large number of errors pointing out uses of uninitialized data (see the attachment). I believe these are the cause of the test failure.

Comment 2 Martin Sebor 2015-09-11 23:48:24 UTC
I should clarify that the e2fsck/util.c file isn't the only one where even small changes can cause the test failure to disappear.  The failure can also be eliminated by making what should otherwise be inconsequential changes in other source files that e2fsck links with.

Comment 3 Eric Sandeen 2015-09-14 14:11:40 UTC
==85688== Conditional jump or move depends on uninitialised value(s)
==85688==    at 0x42EC394: ??? (in /usr/lib64/power8/libc-2.17.so)
==85688==    by 0x434C2BF: ??? (in /usr/lib64/power8/libc-2.17.so)
==85688==    by 0x40DC287: check_mntent_file (ismounted.c:112)
==85688==    by 0x40DC803: check_mntent (ismounted.c:227)
==85688==    by 0x40DC803: ext2fs_check_mount_point (ismounted.c:360)
==85688==    by 0x40DC91F: ext2fs_check_if_mounted (ismounted.c:400)
==85688==    by 0x100098E3: check_mount (unix.c:228)
==85688==    by 0x100098E3: main (unix.c:1234)

Could you please run this again with glibc-debuginfo installed?

Thanks,
-Eric

Comment 4 Eric Sandeen 2015-09-14 19:19:41 UTC
Rebuilding with

gcc version 4.8.5 20150623 (Red Hat 4.8.5-4) (GCC) 

on x86_64 succeeded; could this be arch-specific?

Or - did you run this by rebuilding the RHEL7 RPM, or some other e2fsprogs version?

Comment 5 Martin Sebor 2015-09-14 19:52:33 UTC
This came up during a mass RHEL 7.2 rebuild on powerpc64le.  I haven't tried any other targets.  The exact e2fsprogs version is 1.42.9-8.el7.  For some reason mock/yum isn't finding glibc-devel here so I don't have a more complete stack trace at the moment.

Comment 6 Eric Sandeen 2015-09-14 20:54:58 UTC
Ok, on ppc64le I'm getting the failure too, but so far valgrind isn't showing anything...

Comment 7 Martin Sebor 2015-09-15 14:59:36 UTC
I can reproduce the same or similar valgrind errors even with GCC 4.8.3-9 by modifying tests/scripts/resize_test like so and rerunning the test via 'make check TESTS=r_ext4_small_bg':

--- tests/scripts/resize_test.~0~	2015-09-15 10:58:04.866164753 -0400
+++ tests/scripts/resize_test	2015-09-15 10:58:08.466211056 -0400
@@ -53,7 +53,7 @@
 fi
 
 echo $FSCK -fp $TMPFILE >> $LOG 2>&1 
-if ! $FSCK -fp $TMPFILE >> $LOG 2>&1
+if ! valgrind $FSCK -fp $TMPFILE >> $LOG 2>&1
 then
 	dumpe2fs $TMPFILE >> $LOG
 	return 1

Comment 8 Eric Sandeen 2015-09-15 17:27:08 UTC
Weird, I did the same, and got nothing from valgrind.

But if you get the same errors, then the valgrind output is probably not related to the new-gcc-specific failure, I suppose.

Comment 9 Eric Sandeen 2015-09-17 00:37:35 UTC
I'm not sure this is a gcc issue.  If I take the original mkfs'd filesystem, transport it to another, older RHEL 6 machine, and run resize on it there, I get minor corruption.  It seems to be a bitmap marking problem during resize.

I can't explain the gcc impact; it looks like just a straightforward bug.

If gcc affects the layout of the original un-resized filesystem, somehow, maybe that's it?  Very strange.

Comment 10 Eric Sandeen 2015-09-17 23:04:53 UTC
This sure looks like a plain bug in resize2fs.  Patch sent upstream:

http://marc.info/?l=linux-ext4&m=144252982403894&w=2

I can't explain how gcc versions might tickle this, unless it's affecting something else which changes how allocations behave during the test...

-Eric

Comment 11 Eric Sandeen 2015-09-18 03:20:10 UTC
I think the only reason the different gcc tweaked the bug is that the
test copies the e2fsck/e2fsck binary into the filesystem under test,
and the size changes depending on the compiler.

This leads to a different allocation pattern, and tickles the bug.

I don't think it's a gcc problem, or even a regression, though of course we'd like it to pass the self-checks on rebuild...

-Eric
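
The size dependence described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not the real ext4 allocator: it assumes a contiguous first-fit placement, and the function name groups_touched and the numbers (256 blocks per group, files of 200 and 300 blocks) are hypothetical, chosen purely for illustration.

```python
def groups_touched(file_blocks, blocks_per_group, first_free_block=0):
    """Return the block-group indexes a contiguous allocation lands in.

    Toy model only: assumes the file is placed in one contiguous run
    starting at first_free_block, which the real allocator does not
    guarantee.
    """
    if file_blocks <= 0:
        return set()
    first = first_free_block // blocks_per_group
    last = (first_free_block + file_blocks - 1) // blocks_per_group
    return set(range(first, last + 1))


# Hypothetical sizes for the copied binary, in filesystem blocks.
smaller_binary = groups_touched(200, 256)
larger_binary = groups_touched(300, 256)
```

With these made-up numbers the smaller file stays within group 0, while the larger one spills into group 1. With small block groups, a modest change in the size of the copied binary can therefore move allocations into a group that was never touched before.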

Comment 12 Martin Sebor 2015-09-18 16:15:57 UTC
I also don't believe it's a gcc bug (even though I had initially suspected it because of the effect of even subtle code changes, until I noticed they had no impact on the generated assembly). Thanks for the confirmation!

Comment 13 Eric Sandeen 2016-01-14 14:10:35 UTC
The patch was sent upstream but never merged; pinged again...

Comment 14 Eric Sandeen 2016-02-19 22:51:26 UTC
Moving to RHEL 7.3; still can't get it merged upstream despite 2 reviewers, but I do have a patch that fixes this.

-Eric

Comment 16 Eric Sandeen 2016-06-13 14:48:27 UTC
commit f3745728bc254892da4c569ba3fd8801895f3524
Author: Eric Sandeen <sandeen>
Date:   Sun Mar 6 21:51:23 2016 -0500

    resize2fs: clear uninit BG if allocating from new group
    
    If resize2fs_get_alloc_block() allocates from a BLOCK_UNINIT group, we
    need to make sure that the UNINIT flag is cleared on both file system
    structures which are maintained by resize2fs.  This causes the
    modified bitmaps to not get written out, which leads to post-resize2fs
    e2fsck errors; used blocks in UNINIT groups, not marked in the block
    bitmap.  This was seen on r_ext4_small_bg.
    
    This patch uses clear_block_uninit() to clear the flag,
    and my problem goes away.
    
    Signed-off-by: Eric Sandeen <sandeen>
    Reviewed-by: Darrick J. Wong <darrick.wong>
    Reviewed-by: Andreas Dilger <adilger>
    Signed-off-by: Theodore Ts'o <tytso>
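
The failure mode in the commit message can be sketched with a small model. This is a hedged illustration, not libext2fs or resize2fs code: the Group class and the allocate, write_bitmaps, fsck_errors and simulate helpers are hypothetical names, and the model assumes only what the commit message states, namely that the bitmap of a group still flagged BLOCK_UNINIT does not get written out.

```python
from dataclasses import dataclass, field


@dataclass
class Group:
    """Toy block group: an in-memory bitmap plus a BLOCK_UNINIT-style flag."""
    block_uninit: bool = True
    bitmap: set = field(default_factory=set)


def allocate(group, block, clear_uninit):
    """Mark a block as used; optionally clear the uninit flag (the fix)."""
    group.bitmap.add(block)
    if clear_uninit:
        group.block_uninit = False


def write_bitmaps(groups):
    """Return the 'on-disk' bitmaps; groups still flagged uninit are skipped."""
    return [set() if g.block_uninit else set(g.bitmap) for g in groups]


def fsck_errors(groups, on_disk):
    """List (group, block) pairs that are in use but unmarked on disk."""
    return [(i, b)
            for i, g in enumerate(groups)
            for b in sorted(g.bitmap - on_disk[i])]


def simulate(clear_uninit):
    """Allocate one block from an uninit group, write bitmaps, then check."""
    groups = [Group(block_uninit=False), Group()]
    allocate(groups[1], 5, clear_uninit)
    return fsck_errors(groups, write_bitmaps(groups))
```

In this model simulate(False) reports block 5 of group 1 as used but not marked in the bitmap, while simulate(True) reports nothing, which mirrors the effect the commit message attributes to clearing the flag.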

Comment 20 errata-xmlrpc 2016-11-04 06:41:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-2454.html

