Bug 153925
Summary: | Kernel panic when attempting to backup snapshot volume | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 4 | Reporter: | Stephen N. Stremmel <stremmel> |
Component: | kernel | Assignee: | LVM and device-mapper development team <lvm-team> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | |
Severity: | high | Docs Contact: | |
Priority: | medium | ||
Version: | 4.0 | CC: | agk, davej, dwysocha, ksorensen, mbroz, rh-admins |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | i686 | ||
OS: | Linux | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2011-05-03 09:13:09 UTC | Type: | --- |
Description
Stephen N. Stremmel
2005-04-05 20:45:20 UTC
I can confirm this issue on the latest RHELv4 AS kernel:

Linux version 2.6.9-22.0.2.ELsmp (bhcompile.redhat.com) (gcc version 3.4.5 20051201 (Red Hat 3.4.5-2)) #1 SMP Thu Jan 5 17:13:01 EST 2006

Using cpio to back up LVM2 snapshots causes an immediate kernel panic. It is always reproducible. I have a complete netdump of a crash for analysis if it's needed.

This really looks like an lvm issue, not a SCSI issue. Reassigning.

Any chance that this will be fixed soon? I have a server that crashes once a week because of this issue, or is this just another case of "The money you pay for your RedHat subscriptions does not imply that anyone at RedHat will lift a finger in order to fix any issue."? Seriously, this bug has been open for 4 and a half years now.

If you have such a serious problem, please file a ticket with Red Hat support (http://www.redhat.com/support) and escalate the problem through the official support channel. If you can crash the kernel from the RHEL 4.8 update, please post the kernel panic backtrace here (from a recent kernel; there have been too many fixes, so the old post is no longer usable). I think these problems were already fixed in updates (e.g. some problems with bouncing pages were fixed in 2007 in http://rhn.redhat.com/errata/RHBA-2007-0791.html, bug 156385, but for some reason the bug is private).

The crashes happen with SMP kernel 2.6.9-78.0.22. Switching to the UP kernel does not seem to help. I've just changed to the latest SMP kernel in the hope that this will stop the crashes. I can't provide you with a full stack trace for now, as I do not have a serial console on the server in question (at least not yet). The strange thing about this issue is that it seems to appear more frequently. At first it seemed to appear once every 2 or 3 months, but now it seems to be roughly once a week. The only explanation for this behaviour could be that the snapshot device contains more files now than it did when the crashes were less frequent.
I can't get a capture of the crash on 2.6.9-89.0.11 since there seems to be a bug in the e1000 driver which causes a different kernel panic up to twice a day, so that kernel is not an option for my production server. Wasn't RHEL supposed to be at least somewhat stable?

I now successfully managed to get a serial console connection up and running, so I should be able to provide you with a crash dump relating to this issue within a week or so.

OK. Here comes the backtrace:

Unable to handle kernel NULL pointer dereference at virtual address 00000000
 printing eip:
00000000
*pde = 2f2be001
Oops: 0000 [#1]
SMP
Modules linked in: md5 ipv6 w83627hf eeprom i2c_sensor i2c_isa i2c_i801 i2c_dev i2c_core nfs lockd nfs_acl sunrpc cpufreq_powersave button battery ac uhci_hcd hw_random e100 mii e1000 floppy dm_snapshot dm_zero dm_mirror ext3 jbd raid1 dm_mod ata_piix libata sd_mod scsi_mod
CPU: 1
EIP: 0060:[<00000000>] Not tainted VLI
EFLAGS: 00010082 (2.6.9-78.0.22.ELsmp)
EIP is at 0x0
eax: 00000001 ebx: db482f0c ecx: c3136de0 edx: 00000000
esi: d7e3dee4 edi: d1a6ca80 ebp: c0120572 esp: d7e3def0
ds: 007b es: 007b ss: 0068
Process gzip (pid: 18200, threadinfo=d7e3d000 task=f4542bb0)
Stack: db482f0c 00000001 c011e845 00000000 00000000 d1a6ca88 00000001 00000001
       d1a6ca80 00000001 d7e3df3c c011e8ea 00000001 00000000 00000202 00000001
       d1a6ca80 d7e3df80 080a25c0 00001000 c016757a 00000000 00000000 ecf2a000
Call Trace:
 [<c011e845>] __wake_up_common+0x36/0x51
 [<c011e8ea>] __wake_up_sync+0x3b/0x56
 [<c016757a>] pipe_readv+0x200/0x29e
 [<c0167634>] pipe_read+0x1c/0x20
 [<c015c942>] vfs_read+0xb6/0xe2
 [<c015cb57>] sys_read+0x3c/0x62
 [<c02e0a2f>] syscall_call+0x7/0xb
 [<c02e007b>] __lock_text_end+0x820/0x1071
Code: Bad EIP value.
<0>Fatal exception: panic in 5 seconds
Kernel panic - not syncing: Fatal exception

And today I had another crash:

Red Hat Enterprise Linux ES release 4 (Nahant Update 5)
Kernel 2.6.9-78.0.22.ELsmp on an i686

indus.nordija.com login: Unable to handle kernel paging request at virtual address fffff010
 printing eip:
c014a018
*pde = 00200074
Oops: 0000 [#1]
SMP
Modules linked in: md5 ipv6 w83627hf eeprom i2c_sensor i2c_isa i2c_i801 i2c_dev i2c_core nfs lockd nfs_acl sunrpc cpufreq_powersave button battery ac uhci_hcd hw_random e100 mii e1000 floppy dm_snapshot dm_zero dm_mirror ext3 jbd raid1 dm_mod ata_piix libata sd_mod scsi_mod
CPU: 0
EIP: 0060:[<c014a018>] Not tainted VLI
EFLAGS: 00010286 (2.6.9-78.0.22.ELsmp)
EIP is at lru_add_drain+0xd/0x77
eax: e1f08080 ebx: fffff000 ecx: eb65d3e4 edx: c03d5b80
esi: e1f08080 edi: da688b74 ebp: b7f12000 esp: f2dedf68
ds: 007b es: 007b ss: 0068
Process tar (pid: 31926, threadinfo=f2ded000 task=ebb58eb0)
Stack: c03d3260 c0152147 eb65d3e4 e1f08080 00000000 e1f080c4 da688b74 b7f12000
       e1f08080 c015243a b7f12000 b7f13000 b7f13000 b7f13000 eb65d3e4 e1f08080
       e1f080b0 00000000 f2ded000 c01524aa b7f12000 09562220 c02e0a2f b7f12000
Call Trace:
 [<c0152147>] unmap_region+0x24/0xef
 [<c015243a>] do_munmap+0xf8/0x116
 [<c01524aa>] sys_munmap+0x52/0x6a
 [<c02e0a2f>] syscall_call+0x7/0xb
 [<c02e007b>] __lock_text_end+0x820/0x1071
Code: 53 0c f0 ff 42 04 8b 01 89 5c 81 08 40 83 f8 0e 89 01 75 08 5b 89 c8 e9 9c 03 00 00 5b c3 53 bb 00 f0 ff ff ba 80 5b 3d c0 21 e3 <8b> 43 10 03 14 85 20 f1 3d c0 83 3a 00 74 07 89 d0 e8 c3 02 00
<0>Fatal exception: panic in 5 seconds
Kernel panic - not syncing: Fatal exception

This always happens at night when the snapshot LV is mounted.

Hm, it seems this bug, reported in 2005, got lost in the queue for a long time, sorry for that. Comment #9 is probably an unrelated crash. Anyway, there were several DM snapshot fixes in the RHEL4 kernel, so I think it should be fixed now.
If you still see the problem, please file a new bug or open a support ticket, thanks.