Bug 1262498
Summary: | r_ext4_small_bg test failure with gcc-4.8.5-4, valgrind errors | | |
---|---|---|---|
Product: | Red Hat Enterprise Linux 7 | Reporter: | Martin Sebor <msebor> |
Component: | e2fsprogs | Assignee: | Eric Sandeen <esandeen> |
Status: | CLOSED ERRATA | QA Contact: | Boyang Xue <bxue> |
Severity: | unspecified | Priority: | unspecified |
Version: | 7.2 | CC: | eguan, msebor, xzhou |
Target Milestone: | rc | Hardware: | ppc64le |
OS: | Linux | Doc Type: | Bug Fix |
Fixed In Version: | e2fsprogs-1.42.9-9.el7 | Type: | Bug |
Last Closed: | 2016-11-04 06:41:40 UTC | | |

I should clarify that the e2fsck/util.c file isn't the only one where even small changes can cause the test failure to disappear. The failure can also be eliminated by making what should otherwise be inconsequential changes in other source files that e2fsck links with.

==85688== Conditional jump or move depends on uninitialised value(s)
==85688==    at 0x42EC394: ??? (in /usr/lib64/power8/libc-2.17.so)
==85688==    by 0x434C2BF: ??? (in /usr/lib64/power8/libc-2.17.so)
==85688==    by 0x40DC287: check_mntent_file (ismounted.c:112)
==85688==    by 0x40DC803: check_mntent (ismounted.c:227)
==85688==    by 0x40DC803: ext2fs_check_mount_point (ismounted.c:360)
==85688==    by 0x40DC91F: ext2fs_check_if_mounted (ismounted.c:400)
==85688==    by 0x100098E3: check_mount (unix.c:228)
==85688==    by 0x100098E3: main (unix.c:1234)

Could you please run this again with glibc-debuginfo installed?

Thanks,
-Eric

Rebuilding with gcc version 4.8.5 20150623 (Red Hat 4.8.5-4) (GCC) on x86_64 succeeded; could this be arch-specific?

Or - did you run this by rebuilding the RHEL7 RPM, or some other e2fsprogs version?

This came up during a mass RHEL 7.2 rebuild on powerpc64le. I haven't tried any other targets. The exact e2fsprogs version is 1.42.9-8.el7. For some reason mock/yum isn't finding glibc-devel here so I don't have a more complete stack trace at the moment.

Ok, on ppc64le I'm getting the failure too, but so far valgrind isn't showing anything...

I can reproduce the same or similar valgrind errors even with GCC 4.8.3-9 by modifying tests/scripts/resize_test like so and rerunning the test via 'make check TESTS=r_ext4_small_bg':

--- tests/scripts/resize_test.~0~ 2015-09-15 10:58:04.866164753 -0400
+++ tests/scripts/resize_test 2015-09-15 10:58:08.466211056 -0400
@@ -53,7 +53,7 @@ fi
 echo $FSCK -fp $TMPFILE >> $LOG 2>&1
-if ! $FSCK -fp $TMPFILE >> $LOG 2>&1
+if ! valgrind $FSCK -fp $TMPFILE >> $LOG 2>&1
 then
 dumpe2fs $TMPFILE >> $LOG
 return 1

Weird, I did the same, and get nothing from valgrind.

But if you get the same errors, then the valgrind output is probably not related to the new-gcc-specific failure, I suppose.

I'm not sure this is a gcc issue. If I take the original mkfs'd filesystem, transport it to another old rhel6 machine, and run resize on it there, I get minor corruption. It seems to be a bitmap marking problem during resize. I can't explain the gcc impact; it looks like just a straightforward bug. If gcc affects the layout of the original un-resized filesystem, somehow, maybe that's it?

Very strange. This sure looks like a plain bug in resize2fs. Patch sent upstream:

http://marc.info/?l=linux-ext4&m=144252982403894&w=2

I can't explain how gcc versions might tickle this, unless it's affecting something else which changes how allocations behave during the test...

-Eric

I think the only reason the different gcc tweaked the bug is that the test copies the e2fsck/e2fsck binary into the filesystem under test, and the size changes depending on the compiler. This leads to a different allocation pattern, and tickles the bug. I don't think it's a gcc problem, or even a regression, though of course we'd like it to pass the self-checks on rebuild...

-Eric

I also don't believe it's a gcc bug (even though I had initially suspected it because of the effect of even subtle code changes, until I noticed they had no impact on the generated assembly). Thanks for the confirmation!

The patch has been sent upstream, but never merged, pinged again...

Moving to rhel7.3; still can't get any success merging it upstream despite 2 reviewers, but I do have a patch that fixes this.

-Eric

commit f3745728bc254892da4c569ba3fd8801895f3524
Author: Eric Sandeen <sandeen>
Date: Sun Mar 6 21:51:23 2016 -0500

resize2fs: clear uninit BG if allocating from new group

If resize2fs_get_alloc_block() allocates from a BLOCK_UNINIT group, we need to make sure that the UNINIT flag is cleared on both file system structures which are maintained by resize2fs.
This causes the modified bitmaps to not get written out, which leads to post-resize2fs e2fsck errors; used blocks in UNINIT groups, not marked in the block bitmap. This was seen on r_ext4_small_bg.

This patch uses clear_block_uninit() to clear the flag, and my problem goes away.

Signed-off-by: Eric Sandeen <sandeen>
Reviewed-by: Darrick J. Wong <darrick.wong>
Reviewed-by: Andreas Dilger <adilger>
Signed-off-by: Theodore Ts'o <tytso>

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-2454.html

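The failure mode described in the commit message above can be illustrated with a toy model. This is only a sketch, not the resize2fs code: every type, constant, and function name below (`toy_fs`, `TOY_BG_BLOCK_UNINIT`, `toy_alloc_block`, and so on) is hypothetical, and the "writer" simply assumes, as the commit message states, that bitmaps of groups still flagged UNINIT do not get written out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only: hypothetical names and sizes, NOT the libext2fs API. */
#define TOY_BG_BLOCK_UNINIT  0x0002u  /* stand-in for a BLOCK_UNINIT group flag */
#define TOY_GROUPS           4u
#define TOY_BLOCKS_PER_GROUP 8u
#define TOY_BLOCKS           (TOY_GROUPS * TOY_BLOCKS_PER_GROUP)

struct toy_fs {
    uint16_t bg_flags[TOY_GROUPS];  /* per-group flags */
    bool     bitmap[TOY_BLOCKS];    /* true = block in use */
};

/* Start with every group flagged UNINIT and every block free. */
void toy_fs_init(struct toy_fs *fs)
{
    for (unsigned g = 0; g < TOY_GROUPS; g++)
        fs->bg_flags[g] = TOY_BG_BLOCK_UNINIT;
    for (unsigned b = 0; b < TOY_BLOCKS; b++)
        fs->bitmap[b] = false;
}

/*
 * Mark blk in use in both file system structures. When clear_uninit is
 * true, also clear the UNINIT flag of the owning group in both of them,
 * which is the step the commit message says was needed.
 */
void toy_alloc_block(struct toy_fs *old_fs, struct toy_fs *new_fs,
                     unsigned blk, bool clear_uninit)
{
    assert(blk < TOY_BLOCKS);
    unsigned group = blk / TOY_BLOCKS_PER_GROUP;

    old_fs->bitmap[blk] = true;
    new_fs->bitmap[blk] = true;
    if (clear_uninit) {
        old_fs->bg_flags[group] &= (uint16_t)~TOY_BG_BLOCK_UNINIT;
        new_fs->bg_flags[group] &= (uint16_t)~TOY_BG_BLOCK_UNINIT;
    }
}

/*
 * Count the in-use blocks that would reach disk, assuming the writer
 * skips the bitmap of any group that is still flagged UNINIT.
 */
unsigned toy_used_blocks_written(const struct toy_fs *fs)
{
    unsigned written = 0;

    for (unsigned g = 0; g < TOY_GROUPS; g++) {
        if (fs->bg_flags[g] & TOY_BG_BLOCK_UNINIT)
            continue;  /* bitmap of an UNINIT group is not written */
        for (unsigned i = 0; i < TOY_BLOCKS_PER_GROUP; i++)
            if (fs->bitmap[g * TOY_BLOCKS_PER_GROUP + i])
                written++;
    }
    return written;
}
```

In this model, allocating a block with `clear_uninit` set to false leaves the block marked in memory but absent from what is written, which corresponds to the "used blocks in UNINIT groups, not marked in the block bitmap" errors that e2fsck reported after the resize.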
Created attachment 1072655
Valgrind output for e2fsck.

After upgrading GCC to gcc version 4.8.5 20150623 (Red Hat 4.8.5-4), the r_ext4_small_bg test fails on ppc64le (I haven't tested other targets) with the following output:

Running e2fsprogs test suite...

dumpe2fs 1.42.9 (28-Dec-2013)
r_ext4_small_bg: ext4 1024 blocksize with small block groups: failed
141 tests succeeded 1 tests failed
Tests failed: r_ext4_small_bg

All other tests pass.

While debugging the failure I narrowed it down to the e2fsck/util.c file, where I was able to make it go away by changing compiler options or making small code changes. For instance, disabling optimization helped, as did compiling the file with _FORTIFY_SOURCE undefined (and optimization enabled), and, surprisingly, even removing -g or stripping the e2fsck program. Since neither the presence nor the absence of debugging symbols or other symbols has any effect on the generated code, the problem must be in the program data.

Running the e2fsck program under valgrind revealed a large number of errors pointing out uses of uninitialized data (see the attachment). I believe these are the cause of the test failure.
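
The observation that stripping the binary or removing -g changes the outcome fits the explanation given in the comments: the test copies the e2fsck binary into the filesystem under test, so the file's size determines how many blocks get allocated. A minimal sketch of that arithmetic follows. The function names are hypothetical, and `groups_spanned` assumes a contiguous layout starting at a group boundary, which is a simplification and not how the ext4 allocator necessarily places data.

```c
#include <assert.h>
#include <stdint.h>

/* Number of filesystem blocks needed to hold file_size bytes of data. */
uint64_t data_blocks_needed(uint64_t file_size, uint32_t block_size)
{
    assert(block_size > 0);
    return (file_size + block_size - 1) / block_size;  /* round up */
}

/*
 * Number of block groups touched if `blocks` blocks were laid out
 * contiguously from the start of a group (hypothetical simplification).
 */
uint64_t groups_spanned(uint64_t blocks, uint32_t blocks_per_group)
{
    assert(blocks_per_group > 0);
    return (blocks + blocks_per_group - 1) / blocks_per_group;  /* round up */
}
```

With the 1024-byte block size named in the test description, a file of 1,024,000 bytes needs 1000 data blocks while one of 1,024,001 bytes needs 1001, so a size difference of a single byte can be enough to change which blocks, and potentially which block groups, are allocated. The file sizes here are made up for illustration; the thread does not give the actual binary sizes.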