Bug 694706 - [xfsprogs] xfs_repair -n segfault on corrupted image
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: xfsprogs
Version: 6.1
Hardware: x86_64
OS: Unspecified
Priority: low
Severity: medium
Target Milestone: rc
Assignee: Lukáš Czerner
QA Contact: BaseOS QE - Apps
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2011-04-08 04:52 UTC by Eryu Guan
Modified: 2011-12-06 18:18 UTC
CC: 3 users

Fixed In Version: xfsprogs-3.1.1-6.el6
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-12-06 18:18:15 UTC
Target Upstream Version:


Attachments
Core file (615.78 KB, application/x-gzip) - 2011-04-08 04:52 UTC, Eryu Guan


Links
Red Hat Product Errata RHBA-2011:1736 (SHIPPED_LIVE): xfsprogs bug fix update - 2011-12-06 01:01:55 UTC

Description Eryu Guan 2011-04-08 04:52:09 UTC
Created attachment 490692 [details]
Core file

Description of problem:
While testing Bug 694702, I tried to scan the bad image using xfs_repair with the -n option (no-modify mode), but xfs_repair segfaulted.

[root@ibm-x3550m3-02 ~]# xfs_repair -nf -o force_geometry xfs.test.img
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - scan filesystem freespace and inode maps...
agi unlinked bucket 10 is 4294945279 in ag 0 (inode=4294945279)
primary/secondary superblock 1 conflict - AG superblock geometry info conflicts with filesystem geometry
would reset bad sb for ag 1
bad uncorrected agheader 1, skipping ag...
sb_fdblocks 27887, counted 14580
        - found root inode chunk
Phase 3 - for each AG... 
        - scan (but don't clear) agi unlinked lists...
error following ag 0 unlinked list
error following ag 1 unlinked list
        - process known inodes and perform inode discovery...
        - agno = 0
bad magic number 0x49f7 on inode 95
bad magic number 0xbe4e on inode 110
entry "config.log" at block 0 offset 192 in directory inode 64 references invalid inode 182518930210889
        would clear inode number in entry at offset 192...
bad non-zero extent size 33024 for non-realtime/extsize inode 71, would reset to zero
bad attr fork offset 142 in inode 73, max=19
would have cleared inode 73
bad extent #0 count (31) in symlink 74 data fork
bad data fork in symlink 74
would have cleared inode 74
bad nblocks 957777707213 for inode 88, would reset to 205
bad nblocks 14417942 for inode 92, would reset to 22
would have corrected attribute entry count in inode 93 from 79 to 1
bad magic number 0x49f7 on inode 95, would reset magic number
would have cleared inode 95
bad nblocks 93458488360960 for inode 103, would reset to 0
bad magic number 0xbe4e on inode 110, would reset magic number
        - agno = 1
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
entry "config.log" at block 0 offset 192 in directory inode 64 references invalid inode 182518930210889
        would clear inode number in entry at offset 192...
entry "config.status" at block 0 offset 216 in directory inode 64 references free inode 74
        would clear inode number in entry at offset 216...
bad non-zero extent size 33024 for non-realtime/extsize inode 71, would reset to zero
bad attr fork offset 142 in inode 73, max=19
would have cleared inode 73
bad extent #0 count (31) in symlink 74 data fork
bad data fork in symlink 74
would have cleared inode 74
bad nblocks 957777707213 for inode 88, would reset to 205
bad nblocks 14417942 for inode 92, would reset to 22
bad magic number 0x49f7 on inode 95, would reset magic number
would have cleared inode 95
bad nblocks 93458488360960 for inode 103, would reset to 0
bad magic number 0xbe4e on inode 110, would reset magic number
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
Segmentation fault (core dumped)

If xfs_repair is run without the -n option, the image is fixed successfully.


Version-Release number of selected component (if applicable):
xfsprogs-qa-devel-3.1.1-4.el6.x86_64
xfsprogs-3.1.1-4.el6.x86_64
xfsprogs-debuginfo-3.1.1-4.el6.x86_64
xfsprogs-devel-3.1.1-4.el6.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Download bad image from Bug 694702
2. xfs_repair -nf -o force_geometry xfs.test.img
  
Actual results:
core dump

Expected results:
No segfault

Additional info:

Comment 2 RHEL Program Management 2011-04-08 05:17:33 UTC
This request was evaluated by Red Hat Product Management for
inclusion in the current release of Red Hat Enterprise Linux.
Because the affected component is not scheduled to be updated
in the current release, Red Hat is unfortunately unable to
address this request at this time. Red Hat invites you to
ask your support representative to propose this request, if
appropriate and relevant, in the next release of Red Hat
Enterprise Linux. If you would like it considered as an
exception in the current release, please ask your support
representative.

Comment 3 Eric Sandeen 2011-04-20 19:38:45 UTC
Problem persists upstream.

Comment 4 Eric Sandeen 2011-04-20 19:44:24 UTC
Actually it runs fine on a real repair.  It only segfaults with -n.

Comment 5 Eric Sandeen 2011-04-20 20:35:57 UTC
In phase 6 when it's traversing the fs, there is an invalid inode which was noted before:

entry "config.log" at block 0 offset 192 in directory inode 64 references invalid inode 182518930210889

but in no-modify mode, this inode isn't junked, so we encounter it later.

The corrupt inode number translates to a very large AG number, which runs off the end of the per-AG array used in find_inode_rec() and segfaults.

Not sure of the best way out of this; we could verify_inum() before this segfaulting call, but phase6 has many calls into this code.

We may have to pass mp into find_inode_rec() to validate the inode and return NULL for invalid... but that's a lot of churn and a lot of extra tests just for the -n case.

Not really sure what the best plan is here.
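
For illustration, here is a minimal self-contained sketch of the failure mode and of the verify_inum()-style guard described above. The types, names, array and geometry values below are made up for the example and only approximate the real xfsprogs code, which indexes its in-core inode trees by AG number in much the same way.

/*
 * Sketch only (not xfsprogs code): shows how an out-of-range AG number
 * derived from a corrupt inode number can index past a per-AG array, and
 * how a verify_inum()-style range check turns that into a clean "not
 * found" instead of a crash.
 */
#include <stdint.h>
#include <stdio.h>
#include <stddef.h>

typedef uint64_t xfs_ino_t;
typedef uint32_t xfs_agnumber_t;

struct fake_mount {
	xfs_agnumber_t	sb_agcount;	/* number of AGs in the filesystem */
	int		sb_inopblog;	/* log2(inodes per block) */
	int		sb_agblklog;	/* log2(blocks per AG), rounded up */
};

/* in the spirit of XFS_INO_TO_AGNO(): the AG number is the high bits */
static xfs_agnumber_t ino_to_agno(const struct fake_mount *mp, xfs_ino_t ino)
{
	return (xfs_agnumber_t)(ino >> (mp->sb_inopblog + mp->sb_agblklog));
}

/* hypothetical per-AG record table, indexed by AG number */
#define FAKE_AGCOUNT	2
static void *inode_tree_ptrs[FAKE_AGCOUNT];

/*
 * Unguarded lookup, analogous to the failing path: the corrupt inode
 * 182518930210889 from the directory entry maps to an AG number far beyond
 * sb_agcount, so this indexes wild memory -- the phase 6 segfault under -n.
 */
static void *find_record_unguarded(const struct fake_mount *mp, xfs_ino_t ino)
{
	return inode_tree_ptrs[ino_to_agno(mp, ino)];	/* out-of-bounds access */
}

/*
 * Guarded variant, in the spirit of the verify_inum() check discussed above:
 * reject inode numbers whose AG is out of range before indexing, and let the
 * caller treat NULL as "no record" (e.g. junk the directory entry).
 */
static void *find_record_guarded(const struct fake_mount *mp, xfs_ino_t ino)
{
	xfs_agnumber_t agno = ino_to_agno(mp, ino);

	if (agno >= mp->sb_agcount)
		return NULL;
	return inode_tree_ptrs[agno];
}

int main(void)
{
	/* geometry loosely resembling a tiny 2-AG test image (illustrative) */
	struct fake_mount mp = { .sb_agcount = FAKE_AGCOUNT,
				 .sb_inopblog = 4, .sb_agblklog = 12 };
	xfs_ino_t bad_ino = 182518930210889ULL;	/* from the corrupt entry */

	printf("corrupt inode maps to AG %u of %u\n",
	       (unsigned)ino_to_agno(&mp, bad_ino), (unsigned)mp.sb_agcount);
	printf("guarded lookup returns: %p\n", find_record_guarded(&mp, bad_ino));
	/* find_record_unguarded(&mp, bad_ino) would index far out of bounds */
	(void)find_record_unguarded;
	return 0;
}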

Comment 6 Eric Sandeen 2011-04-20 20:47:07 UTC
Program received signal SIGSEGV, Segmentation fault.
0x000000000042304d in find_inode_rec (mp=<value optimized out>, ip=0x6c39c0, num_illegal=0x7fffffffe150, need_dot=0x7fffffffe15c, 
    current_irec=0x7fffd4008930, current_ino_offset=0, bpp=0x6b8cd0, hashtab=0x6c3b10, freetabp=0x7fffffffdf10, da_bno=0, isblock=1) at incore.h:321
321		return((ino_tree_node_t *)
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.7.el6.x86_64 libuuid-2.17.2-6.el6.x86_64
(gdb) bt
#0  0x000000000042304d in find_inode_rec (mp=<value optimized out>, ip=0x6c39c0, num_illegal=0x7fffffffe150, need_dot=0x7fffffffe15c, 
    current_irec=0x7fffd4008930, current_ino_offset=0, bpp=0x6b8cd0, hashtab=0x6c3b10, freetabp=0x7fffffffdf10, da_bno=0, isblock=1) at incore.h:321
#1  longform_dir2_entry_check_data (mp=<value optimized out>, ip=0x6c39c0, num_illegal=0x7fffffffe150, need_dot=0x7fffffffe15c, 
    current_irec=0x7fffd4008930, current_ino_offset=0, bpp=0x6b8cd0, hashtab=0x6c3b10, freetabp=0x7fffffffdf10, da_bno=0, isblock=1) at phase6.c:2047
#2  0x0000000000423aab in longform_dir2_entry_check (mp=0x7fffffffe200, ino=64, ip=0x6c39c0, num_illegal=0x7fffffffe150, need_dot=0x7fffffffe15c, 
    irec=0x7fffd4008930, ino_offset=0, hashtab=0x6c3b10) at phase6.c:2519
#3  0x0000000000428912 in process_dir_inode (mp=0x7fffffffe200, agno=<value optimized out>, irec=0x7fffd4008930, ino_offset=0) at phase6.c:3290
#4  0x0000000000428ee4 in traverse_function (mp=0x7fffffffe200) at phase6.c:3606
#5  traverse_ags (mp=0x7fffffffe200) at phase6.c:3648
#6  phase6 (mp=0x7fffffffe200) at phase6.c:3740
#7  0x0000000000431cef in main (argc=<value optimized out>, argv=<value optimized out>) at xfs_repair.c:743

Comment 12 errata-xmlrpc 2011-12-06 18:18:15 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2011-1736.html

