| Summary: | [xfs] BUG: unable to handle kernel NULL pointer dereference at 0000000000000010 | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Eryu Guan <eguan> | ||||
| Component: | kernel | Assignee: | Dave Chinner <dchinner> | ||||
| Status: | CLOSED DUPLICATE | QA Contact: | Red Hat Kernel QE team <kernel-qe> | ||||
| Severity: | high | Docs Contact: | |||||
| Priority: | unspecified | ||||||
| Version: | 6.1 | CC: | dchinner, rwheeler | ||||
| Target Milestone: | rc | ||||||
| Target Release: | --- | ||||||
| Hardware: | x86_64 | ||||||
| OS: | Unspecified | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | Doc Type: | Bug Fix | |||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2011-10-13 03:12:05 UTC | Type: | --- | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Attachments: |
|
||||||
|
Description
Eryu Guan
2011-04-02 06:36:54 UTC
Call trace BUG: unable to handle kernel NULL pointer dereference at 0000000000000010 IP: [<ffffffffa027801e>] xfs_alloc_fix_freelist+0x2e/0x480 [xfs] PGD 2711c9067 PUD 275ac2067 PMD 0 Oops: 0000 [#1] SMP last sysfs file: /sys/devices/system/cpu/cpu15/cache/index2/shared_cpu_map CPU 0 Modules linked in: xfs exportfs autofs4 sunrpc cpufreq_ondemand acpi_cpufreq freq_table ipv6 dm_mirror dm_region_hash dm_log cdc_ether usbnet mii serio_raw iTCO_wdt iTCO_vendor_support i2c_i801 i2c_core shpchp sg ioatdma dca i7core_edac edac_core bnx2 ext4 mbcache jbd2 sd_mod crc_t10dif pata_acpi ata_generic ata_piix mptsas mptscsih mptbase scsi_transport_sas dm_mod [last unloaded: microcode] Modules linked in: xfs exportfs autofs4 sunrpc cpufreq_ondemand acpi_cpufreq freq_table ipv6 dm_mirror dm_region_hash dm_log cdc_ether usbnet mii serio_raw iTCO_wdt iTCO_vendor_support i2c_i801 i2c_core shpchp sg ioatdma dca i7core_edac edac_core bnx2 ext4 mbcache jbd2 sd_mod crc_t10dif pata_acpi ata_generic ata_piix mptsas mptscsih mptbase scsi_transport_sas dm_mod [last unloaded: microcode] Pid: 7279, comm: fstest Tainted: G W ---------------- 2.6.32-122.el6.x86_64 #1 System x3550 M3 -[7944I21]- RIP: 0010:[<ffffffffa027801e>] [<ffffffffa027801e>] xfs_alloc_fix_freelist+0x2e/0x480 [xfs] RSP: 0018:ffff880473d31b58 EFLAGS: 00010292 RAX: 0000000000000000 RBX: ffff880473d31c38 RCX: 000000000000000c RDX: ffff880473d001f8 RSI: 0000000000000002 RDI: ffff880473d31c38 RBP: ffff880473d31c28 R08: 0000000000000000 R09: ffff8802730c1d48 R10: 000000000000000a R11: 0000000000000000 R12: 000000000000de3a R13: 0000000000000002 R14: 0000000000000000 R15: ffff880473d00000 FS: 00007f25c5ecd700(0000) GS:ffff880028200000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000010 CR3: 000000027513c000 CR4: 00000000000006f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process fstest (pid: 7279, threadinfo ffff880473d30000, task ffff880475ad8b00) Stack: ffff880274d5fe00 ffff880273dfa0c0 ffff880473d31bd8 0000000000000000 <0> ffff880473d31be8 0000000000000000 ffff880273dfa100 0004125000000010 <0> ffff880273dfa0e0 ffff880273dfa0d0 ffff88026a533d18 0000000000000000 Call Trace: [<ffffffffa02b9278>] ? xfs_perag_get+0x48/0xc0 [xfs] [<ffffffffa02784f3>] xfs_free_extent+0x83/0xc0 [xfs] [<ffffffffa0281f0d>] xfs_bmap_finish+0x15d/0x1a0 [xfs] [<ffffffffa02a7d73>] xfs_itruncate_finish+0x183/0x390 [xfs] [<ffffffffa02c1622>] xfs_free_eofblocks+0x1d2/0x2a0 [xfs] [<ffffffffa02c1c50>] xfs_release+0x110/0x1d0 [xfs] [<ffffffffa02cb8c5>] xfs_file_release+0x15/0x20 [xfs] [<ffffffff811727b5>] __fput+0xf5/0x210 [<ffffffff811728f5>] fput+0x25/0x30 [<ffffffff8116de6d>] filp_close+0x5d/0x90 [<ffffffff8116df45>] sys_close+0xa5/0x100 [<ffffffff8100b172>] system_call_fastpath+0x16/0x1b Code: e5 41 57 41 56 41 55 41 54 53 48 81 ec a8 00 00 00 0f 1f 44 00 00 48 8b 47 18 4c 8b 7f 08 48 89 fb 41 89 f5 48 89 85 48 ff ff ff <80> 78 10 00 4c 8b 27 0f 84 a5 02 00 00 48 c7 45 c0 00 00 00 00 RIP [<ffffffffa027801e>] xfs_alloc_fix_freelist+0x2e/0x480 [xfs] RSP <ffff880473d31b58> CR2: 0000000000000010 (In reply to comment #0) <only addressing the call trace output in this response> > Not every bad image causes this kernel BUG, but they all do generate huge call > trace when testing with fstest, is this an issue? When testing on ext4 bad > image there is no such call trace. Isn't it obvious what the traces XFS is producing are? > Filesystem "loop0": corrupt dinode 12357, extent total = -1744830463, nblocks = > 11993113. Unmount and run xfs_repair. That is, you've corrupted the filesystem and XFS is detecting the corruption and reporting exactly how it came across it. This is what an enterprise filesystem is supposed to do when it encounters corruption. ;) If ext4 is not telling you everything it knows about the corruptions that are being detected (or even detecting them), then that is a problem with ext4, not XFS. (In reply to comment #2) > Call trace > > BUG: unable to handle kernel NULL pointer dereference at 0000000000000010 > IP: [<ffffffffa027801e>] xfs_alloc_fix_freelist+0x2e/0x480 [xfs] ..... > [<ffffffffa02b9278>] ? xfs_perag_get+0x48/0xc0 [xfs] > [<ffffffffa02784f3>] xfs_free_extent+0x83/0xc0 [xfs] > [<ffffffffa0281f0d>] xfs_bmap_finish+0x15d/0x1a0 [xfs] > [<ffffffffa02a7d73>] xfs_itruncate_finish+0x183/0x390 [xfs] > [<ffffffffa02c1622>] xfs_free_eofblocks+0x1d2/0x2a0 [xfs] > [<ffffffffa02c1c50>] xfs_release+0x110/0x1d0 [xfs] > [<ffffffffa02cb8c5>] xfs_file_release+0x15/0x20 [xfs] > [<ffffffff811727b5>] __fput+0xf5/0x210 > [<ffffffff811728f5>] fput+0x25/0x30 > [<ffffffff8116de6d>] filp_close+0x5d/0x90 > [<ffffffff8116df45>] sys_close+0xa5/0x100 > [<ffffffff8100b172>] system_call_fastpath+0x16/0x1b This is almost certainly a corrupted block number in an extent tree. On a debug kernel it would have fired an ASSERT because the AG number is greater than the number of AGs in the filesytem. Hence xfs_perag_get() returned a NULL pointer, and the first thing that xfs_alloc_fix_freelist() does is dereference pag->pagf_init which is at offset 0x10 into the perag structure. Indeed, one of the errors xfs_check outputs: inode 19353 bad block number 30601643628 [7471104,1644], offset 0 which translates as AG 74771104. Given this is a single AG filesystem, we have a winner. I'll put together a fix for upstream and test it, and go from there. BTW, when you fuzz a filesystem, you might want to fuzz a filesystem that you can actually run fsck on successfully. i.e. not just test the kernel code, but also exercise the check and repair code to see if it detects and corrects all the errors you introduced. On this filesystem image, xfs_repair tells you straight away that superblock is corrupt and that it cannot repair the fs because there is no secondary superblock: Phase 1 - find and verify superblock... bad primary superblock - inconsistent filesystem geometry in realtime filesystem component !!! attempting to find secondary superblock... .Sorry, could not find valid secondary superblock Exiting now. Basically, a single AG filesystem in XFS is a big no-no because it has no redundant superblocks to enable recovery. You should be testing fuzzing with a 32MB, 2AG filesystem at minimum. Cheers, Dave. Since RHEL 6.1 External Beta has begun, and this bug remains unresolved, it has been rejected as it is not proposed as exception or blocker. Red Hat invites you to ask your support representative to propose this request, if appropriate and relevant, in the next release of Red Hat Enterprise Linux. (In reply to comment #5) > > Basically, a single AG filesystem in XFS is a big no-no because it has no > redundant superblocks to enable recovery. You should be testing fuzzing with a > 32MB, 2AG filesystem at minimum. I'll update the script, thanks! > > Cheers, > > Dave. This is fixed in RHEL 6.2 by this commit:
commit be1c42e4920e32f877503a96c07a8a93c84ae53b
Author: Dave Chinner <dchinner>
Date: Tue Aug 9 22:55:16 2011 -0400
[fs] xfs: catch bad block numbers freeing extents.
Message-id: <1312930551-29224-327-git-send-email-dchinner>
Patchwork-id: 39900
O-Subject: [REHL6.2 PATCH 326/361] xfs: catch bad block numbers freeing extents.
Bugzilla: 695827
RH-Acked-by: Eric Sandeen <sandeen>
RH-Acked-by: Christoph Hellwig <chellwig>
RH BZ: 695827 - XFS update to 2.6.39 code base for delayed logging
Upstream Commit: be65b18a10e62321c5ba09a1dc0f70babeb0eba1
Upstream commit message:
A fuzzed filesystem crashed a kernel when freeing an extent with a
block number beyond the end of the filesystem. Convert all the debug
asserts in xfs_free_extent() to active checks so that we catch bad
extents and return that the filesytsem is corrupted rather than
crashing.
Signed-off-by: Dave Chinner <dchinner>
Reviewed-by: Christoph Hellwig <hch>
Reviewed-by: Alex Elder <aelder>
Signed-off-by: Kyle McMartin <kmcmarti>
What's the next state the BZ needs to go to so that QA can verify it is fixed?
Since RHEL 6.2 External Beta has begun, and this bug remains unresolved, it has been rejected as it is not proposed as exception or blocker. Red Hat invites you to ask your support representative to propose this request, if appropriate and relevant, in the next release of Red Hat Enterprise Linux. Mark it as a dup of the big update bug 695827. I've verified this bug has been fixed. *** This bug has been marked as a duplicate of bug 695827 *** |