Created attachment 490692: Core file

Description of problem:
When testing Bug 694702, I tried to scan the bad image using xfs_repair with the -n option (no modify mode), but xfs_repair segfaulted.

[root@ibm-x3550m3-02 ~]# xfs_repair -nf -o force_geometry xfs.test.img
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - scan filesystem freespace and inode maps...
agi unlinked bucket 10 is 4294945279 in ag 0 (inode=4294945279)
primary/secondary superblock 1 conflict - AG superblock geometry info conflicts with filesystem geometry
would reset bad sb for ag 1
bad uncorrected agheader 1, skipping ag...
sb_fdblocks 27887, counted 14580
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
error following ag 0 unlinked list
error following ag 1 unlinked list
        - process known inodes and perform inode discovery...
        - agno = 0
bad magic number 0x49f7 on inode 95
bad magic number 0xbe4e on inode 110
entry "config.log" at block 0 offset 192 in directory inode 64 references invalid inode 182518930210889
        would clear inode number in entry at offset 192...
bad non-zero extent size 33024 for non-realtime/extsize inode 71, would reset to zero
bad attr fork offset 142 in inode 73, max=19
would have cleared inode 73
bad extent #0 count (31) in symlink 74 data fork
bad data fork in symlink 74
would have cleared inode 74
bad nblocks 957777707213 for inode 88, would reset to 205
bad nblocks 14417942 for inode 92, would reset to 22
would have corrected attribute entry count in inode 93 from 79 to 1
bad magic number 0x49f7 on inode 95, would reset magic number
would have cleared inode 95
bad nblocks 93458488360960 for inode 103, would reset to 0
bad magic number 0xbe4e on inode 110, would reset magic number
        - agno = 1
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
entry "config.log" at block 0 offset 192 in directory inode 64 references invalid inode 182518930210889
        would clear inode number in entry at offset 192...
entry "config.status" at block 0 offset 216 in directory inode 64 references free inode 74
        would clear inode number in entry at offset 216...
bad non-zero extent size 33024 for non-realtime/extsize inode 71, would reset to zero
bad attr fork offset 142 in inode 73, max=19
would have cleared inode 73
bad extent #0 count (31) in symlink 74 data fork
bad data fork in symlink 74
would have cleared inode 74
bad nblocks 957777707213 for inode 88, would reset to 205
bad nblocks 14417942 for inode 92, would reset to 22
bad magic number 0x49f7 on inode 95, would reset magic number
would have cleared inode 95
bad nblocks 93458488360960 for inode 103, would reset to 0
bad magic number 0xbe4e on inode 110, would reset magic number
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
Segmentation fault (core dumped)

If xfs_repair is run without the -n option, the image is repaired successfully.

Version-Release number of selected component (if applicable):
xfsprogs-qa-devel-3.1.1-4.el6.x86_64
xfsprogs-3.1.1-4.el6.x86_64
xfsprogs-debuginfo-3.1.1-4.el6.x86_64
xfsprogs-devel-3.1.1-4.el6.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Download the bad image from Bug 694702
2. xfs_repair -nf -o force_geometry xfs.test.img

Actual results:
core dump

Expected results:
No segfault

Additional info:
This request was evaluated by Red Hat Product Management for inclusion in the current release of Red Hat Enterprise Linux. Because the affected component is not scheduled to be updated in the current release, Red Hat is unfortunately unable to address this request at this time. Red Hat invites you to ask your support representative to propose this request, if appropriate and relevant, in the next release of Red Hat Enterprise Linux. If you would like it considered as an exception in the current release, please ask your support representative.
Problem persists upstream.
Actually it runs fine on a real repair. It only segfaults with -n.
In phase 6, when it's traversing the fs, there is an invalid inode which was noted before:

entry "config.log" at block 0 offset 192 in directory inode 64 references invalid inode 182518930210889

but in no-modify mode, this inode isn't junked, so we encounter it later. The corrupt inode number translates to a very large AG number, which overflows the array used in find_inode_rec() and segfaults.

Not sure of the best way out of this; we could verify_inum() before this segfaulting call, but phase6 has many calls into this code. We may have to pass mp into find_inode_rec() to validate the inode and return NULL for invalid... but that's a lot of churn and a lot of extra tests just for the -n case. Not really sure what the best plan is here.
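To make the second option concrete, here is a minimal sketch of a lookup that validates the AG number before indexing the per-AG array and returns NULL for an out-of-range inode. It is not the xfsprogs code: struct fs_geom, struct ino_rec, ag_records, ino_to_agno(), find_rec_checked(), MAX_AGS and the 32-bit split between AG number and in-AG inode bits are all hypothetical, chosen only to illustrate the idea.

```c
#include <stdint.h>
#include <stddef.h>

/*
 * Hypothetical, simplified stand-ins for illustration only; these are
 * NOT the xfsprogs structures or function signatures.
 */
struct fs_geom {
    uint32_t agcount;     /* number of allocation groups in the fs */
    unsigned agino_bits;  /* low bits of an inode number that index within an AG */
};

struct ino_rec {
    uint64_t first_ino;   /* first inode number covered by this record */
};

#define MAX_AGS 4
static struct ino_rec *ag_records[MAX_AGS];  /* one slot per AG */

/* The AG number is whatever is left above the in-AG inode bits. */
static uint64_t ino_to_agno(const struct fs_geom *g, uint64_t ino)
{
    return ino >> g->agino_bits;
}

/*
 * Look up the record for an inode, but refuse to index the per-AG
 * array when the AG number derived from a (possibly corrupt) inode
 * number is out of range. Callers must handle a NULL return.
 */
static struct ino_rec *find_rec_checked(const struct fs_geom *g, uint64_t ino)
{
    uint64_t agno = ino_to_agno(g, ino);

    if (agno >= g->agcount || agno >= MAX_AGS)
        return NULL;
    return ag_records[agno];
}
```

With this shape, every caller in the traversal would need to treat NULL as "entry references an invalid inode" instead of dereferencing the result, which is the churn mentioned above.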
Program received signal SIGSEGV, Segmentation fault.
0x000000000042304d in find_inode_rec (mp=<value optimized out>, ip=0x6c39c0, num_illegal=0x7fffffffe150, need_dot=0x7fffffffe15c, current_irec=0x7fffd4008930, current_ino_offset=0, bpp=0x6b8cd0, hashtab=0x6c3b10, freetabp=0x7fffffffdf10, da_bno=0, isblock=1) at incore.h:321
321             return((ino_tree_node_t *)
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.7.el6.x86_64 libuuid-2.17.2-6.el6.x86_64
(gdb) bt
#0  0x000000000042304d in find_inode_rec (mp=<value optimized out>, ip=0x6c39c0, num_illegal=0x7fffffffe150, need_dot=0x7fffffffe15c, current_irec=0x7fffd4008930, current_ino_offset=0, bpp=0x6b8cd0, hashtab=0x6c3b10, freetabp=0x7fffffffdf10, da_bno=0, isblock=1) at incore.h:321
#1  longform_dir2_entry_check_data (mp=<value optimized out>, ip=0x6c39c0, num_illegal=0x7fffffffe150, need_dot=0x7fffffffe15c, current_irec=0x7fffd4008930, current_ino_offset=0, bpp=0x6b8cd0, hashtab=0x6c3b10, freetabp=0x7fffffffdf10, da_bno=0, isblock=1) at phase6.c:2047
#2  0x0000000000423aab in longform_dir2_entry_check (mp=0x7fffffffe200, ino=64, ip=0x6c39c0, num_illegal=0x7fffffffe150, need_dot=0x7fffffffe15c, irec=0x7fffd4008930, ino_offset=0, hashtab=0x6c3b10) at phase6.c:2519
#3  0x0000000000428912 in process_dir_inode (mp=0x7fffffffe200, agno=<value optimized out>, irec=0x7fffd4008930, ino_offset=0) at phase6.c:3290
#4  0x0000000000428ee4 in traverse_function (mp=0x7fffffffe200) at phase6.c:3606
#5  traverse_ags (mp=0x7fffffffe200) at phase6.c:3648
#6  phase6 (mp=0x7fffffffe200) at phase6.c:3740
#7  0x0000000000431cef in main (argc=<value optimized out>, argv=<value optimized out>) at xfs_repair.c:743
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2011-1736.html