From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.6) Gecko/20050317 Firefox/1.0.2

Description of problem:
When using the snapshot feature, the kernel panics when any backup of the mounted snapshot is attempted.

Version-Release number of selected component (if applicable):
lvm2-2.00.31-1.0.RHEL4 & kernel-smp-2.6.9-5.0.3.EL

How reproducible:
Always

Steps to Reproduce:
1. [root@flamingo ~]# lvcreate --size 500m --snapshot --name bksnap /dev/VolGroup00/LogVol02
   Rounding up size to full physical extent 512.00 MB
   Logical volume "bksnap" created
   (/var/log/messages entry:
   Apr 5 11:41:51 flamingo kernel: kjournald starting. Commit interval 5 seconds
   Apr 5 11:41:51 flamingo kernel: EXT3 FS on dm-5, internal journal
   Apr 5 11:41:51 flamingo kernel: EXT3-fs: mounted filesystem with ordered data mode.
   Apr 5 11:41:51 flamingo kernel: SELinux: initialized (dev dm-5, type ext3), uses xattr
   )
2. [root@flamingo ~]# lvscan
   ACTIVE '/dev/VolGroup00/LogVol00' [14.00 GB] inherit
   ACTIVE Original '/dev/VolGroup00/LogVol02' [9.75 GB] inherit
   ACTIVE '/dev/VolGroup00/LogVol01' [1.94 GB] inherit
   ACTIVE Snapshot '/dev/VolGroup00/bksnap' [512.00 MB] inherit
3. [root@flamingo ~]# mount /dev/VolGroup00/bksnap /mnt
4. [root@flamingo ~]# mount
   /dev/mapper/VolGroup00-LogVol00 on / type ext3 (rw)
   none on /proc type proc (rw)
   none on /sys type sysfs (rw)
   none on /dev/pts type devpts (rw,gid=5,mode=620)
   usbfs on /proc/bus/usb type usbfs (rw)
   /dev/sda1 on /boot type ext3 (rw)
   none on /dev/shm type tmpfs (rw)
   /dev/mapper/VolGroup00-LogVol02 on /var type ext3 (rw)
   none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
   sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
   /dev/mapper/VolGroup00-bksnap on /mnt type ext3 (rw)
5. [root@flamingo ~]# rsync -av --stats /mnt /rsync-depot/test
   Unable to handle kernel NULL pointer dereference at virtual address 00000000
   printing eip:
   c0146691
   *pde = 203cc001
   Oops: 0000 [#1]
   SMP
   Modules linked in: nfs lockd md5 ipv6 parport_pc lp parport autofs4 i2c_dev i2c_core sunrpc ipt_REJECT ipt_state ip_conntrack iptable_filter ip_tables button battery ac uhci_hcd e1000 e100 mii dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod 3w_xxxx sd_mod scsi_mod
   CPU: 0
   EIP: 0060:[<c0146691>] Not tainted VLI
   EFLAGS: 00010282 (2.6.9-5.0.3.ELsmp)
   EIP is at page_address+0x6/0x6e
   eax: 00000000 ebx: 00000000 ecx: dfdf0700 edx: dfdf0680
   esi: dfdf0700 edi: 00000000 ebp: 00000000 esp: c03b8ef8
   ds: 007b es: 007b ss: 0068
   Process swapper (pid: 0, threadinfo=c03b8000 task=c0312a60)
   Stack: f7fb3c80 dfdf0700 00000000 00000000 c0146315 dfdf0680 dfdf0680 f7c09b00
          c0146411 00000000 c0146406 dfdf0680 00002000 c0146427 c015a43e 00002000
          dfdf0680 00000000 c03b8f68 c0219298 f7d6d5ec 00000000 00000000 00001000
   Call Trace:
    [<c0146315>] copy_to_high_bio_irq+0x2b/0x4c
    [<c0146411>] bounce_end_io_read+0x0/0x1b
    [<c0146406>] __bounce_end_io_read+0x19/0x24
    [<c0146427>] bounce_end_io_read+0x16/0x1b
    [<c015a43e>] bio_endio+0x50/0x55
    [<c0219298>] __end_that_request_first+0xea/0x1ab
    [<f8843620>] scsi_end_request+0x1b/0xa0 [scsi_mod]
    [<f88439e3>] scsi_io_completion+0x20b/0x417 [scsi_mod]
    [<f883fad6>] scsi_finish_command+0xad/0xb1 [scsi_mod]
    [<f883f9fb>] scsi_softirq+0xb6/0xbe [scsi_mod]
    [<c0124b2c>] __do_softirq+0x4c/0xb1
    [<c0107f39>] do_softirq+0x4f/0x56
    =======================
    [<c010784f>] do_IRQ+0x125/0x130
    [<c02c6a68>] common_interrupt+0x18/0x20
    [<c0104018>] default_idle+0x0/0x2c
    [<c0104041>] default_idle+0x29/0x2c
    [<c010409d>] cpu_idle+0x26/0x3b
    [<c0382784>] start_kernel+0x194/0x198
   Code: 08 0f 0b da 01 55 70 2d c0 89 d8 5b e9 d0 fd ff ff 5b c3 69 c0 01 00 37 9e c1 e8 19 c1 e0 07 05 00 65 42 c0 c3 55 57 56 53 89 c3 <8b> 00 f6 c4 01 75 19 2b 1d 10 c5 42 c0 c1 fb 05 c1 e3 0c 8d 83
   <0>Kernel panic - not syncing: Fatal exception in interrupt

Actual Results:
Kernel panic

Additional info:
This is a consistent problem. Everything works well until you try to access the snapshot to back it up. I tried using 'tar -cvf /rsync-depot/test.tar /mnt' (/rsync-depot is an NFS mount), as well as 'tar -cvf /tmp/test.tar /mnt' (a local FS). I booted into the installation kernel 2.6.9-5 and tried it with the same results. I tried adding '--permission r' to the 'lvcreate' command line, and using the '-r' flag when I mounted the snapshot volume. Same results. We are trying to use the snapshot feature for our node farm backups. I saw references to this bug regarding the 2.4.X kernel, but it was supposed to be resolved in RHEL4, or so it seemed.
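For anyone trying to confirm this on another machine, the reproduction steps in the report above could be collected into a small script along these lines. This is only a sketch: the `run` and `repro_steps` helpers and the `DRY_RUN` variable are hypothetical names invented for illustration, and the cleanup commands (`umount`, `lvremove -f`) are an assumption, since the original report stops at the panic. The device names, snapshot size, and the local `tar` target are taken from the report. By default the script only prints the commands; on an affected kernel, running them for real with `DRY_RUN=0` as root is expected to panic the machine.

```shell
#!/bin/sh
# Sketch of the reproduction steps from the report (hypothetical helper names).
# Defaults to a dry run that only prints the commands it would execute.
set -eu

VG=/dev/VolGroup00          # volume group path used in the report
ORIGIN="$VG/LogVol02"       # origin LV (mounted on /var in the report)
SNAP_NAME=bksnap            # snapshot name used in the report
MOUNTPOINT=/mnt
ARCHIVE=/tmp/test.tar       # local tar target mentioned under "Additional info"

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

repro_steps() {
    run lvcreate --size 500m --snapshot --name "$SNAP_NAME" "$ORIGIN"
    run mount "$VG/$SNAP_NAME" "$MOUNTPOINT"
    # Reading the snapshot is the step that triggered the panic in the report.
    run tar -cvf "$ARCHIVE" "$MOUNTPOINT"
    # Cleanup is an assumption; it is only reached if the system survives.
    run umount "$MOUNTPOINT"
    run lvremove -f "$VG/$SNAP_NAME"
}

repro_steps
```

Keeping the dry run as the default is deliberate: the commands can be reviewed and adjusted for the local volume group before anything touches a production box.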
I can confirm this issue on the latest RHELv4 AS kernel:
Linux version 2.6.9-22.0.2.ELsmp (bhcompile.redhat.com) (gcc version 3.4.5 20051201 (Red Hat 3.4.5-2)) #1 SMP Thu Jan 5 17:13:01 EST 2006
Using cpio to back up LVM2 snapshots causes an immediate kernel panic. It is always reproducible. I have a complete netdump of a crash for analysis if it's needed.
This really looks like an lvm issue, not a SCSI issue. Reassigning.
Any chance that this will be fixed soon? I have a server that crashes once a week because of this issue. Or is this just another case of "The money you pay for your RedHat subscriptions does not imply that anyone at RedHat will lift a finger in order to fix any issue."? Seriously, this bug has been open for four and a half years now.
If you have such a serious problem, please file a ticket with Red Hat support (http://www.redhat.com/support) and escalate the problem through the official support channel. If you can crash the kernel from the RHEL 4.8 update, please post the kernel panic backtrace here. It needs to come from a recent kernel; there have been too many fixes since, so the old post is no longer usable. I think these problems were already fixed in updates, though (e.g. some problems with bouncing pages were fixed in 2007 in http://rhn.redhat.com/errata/RHBA-2007-0791.html, bug 156385, but for some reason that bug is private).
The crashes happen with SMP kernel 2.6.9-78.0.22. Switching to the UP kernel does not seem to help. I've just changed to the latest SMP kernel in the hope that this will stop the crashes. I can't provide you with a full stack trace for now, as I do not have a serial console on the server in question (at least not yet). The strange thing about this issue is that it seems to be occurring more frequently. At first it seemed to occur once every 2 or 3 months, but now it is roughly once a week. The only explanation for this behaviour could be that the snapshot device contains more files now than it did when the crashes were less frequent.
I can't get a capture of the crash on 2.6.9-89.0.11, since there seems to be a bug in the e1000 driver which causes a different kernel panic up to twice a day, so that kernel is not an option for my production server. Wasn't RHEL supposed to be at least somewhat stable? I have now managed to get a serial console connection up and running, so I should be able to provide you with a crash dump relating to this issue within a week or so.
OK. Here comes the backtrace:
Unable to handle kernel NULL pointer dereference at virtual address 00000000
printing eip:
00000000
*pde = 2f2be001
Oops: 0000 [#1]
SMP
Modules linked in: md5 ipv6 w83627hf eeprom i2c_sensor i2c_isa i2c_i801 i2c_dev i2c_core nfs lockd nfs_acl sunrpc cpufreq_powersave button battery ac uhci_hcd hw_random e100 mii e1000 floppy dm_snapshot dm_zero dm_mirror ext3 jbd raid1 dm_mod ata_piix libata sd_mod scsi_mod
CPU: 1
EIP: 0060:[<00000000>] Not tainted VLI
EFLAGS: 00010082 (2.6.9-78.0.22.ELsmp)
EIP is at 0x0
eax: 00000001 ebx: db482f0c ecx: c3136de0 edx: 00000000
esi: d7e3dee4 edi: d1a6ca80 ebp: c0120572 esp: d7e3def0
ds: 007b es: 007b ss: 0068
Process gzip (pid: 18200, threadinfo=d7e3d000 task=f4542bb0)
Stack: db482f0c 00000001 c011e845 00000000 00000000 d1a6ca88 00000001 00000001
       d1a6ca80 00000001 d7e3df3c c011e8ea 00000001 00000000 00000202 00000001
       d1a6ca80 d7e3df80 080a25c0 00001000 c016757a 00000000 00000000 ecf2a000
Call Trace:
 [<c011e845>] __wake_up_common+0x36/0x51
 [<c011e8ea>] __wake_up_sync+0x3b/0x56
 [<c016757a>] pipe_readv+0x200/0x29e
 [<c0167634>] pipe_read+0x1c/0x20
 [<c015c942>] vfs_read+0xb6/0xe2
 [<c015cb57>] sys_read+0x3c/0x62
 [<c02e0a2f>] syscall_call+0x7/0xb
 [<c02e007b>] __lock_text_end+0x820/0x1071
Code: Bad EIP value.
<0>Fatal exception: panic in 5 seconds
Kernel panic - not syncing: Fatal exception
And today I had another crash:
Red Hat Enterprise Linux ES release 4 (Nahant Update 5)
Kernel 2.6.9-78.0.22.ELsmp on an i686
indus.nordija.com login: Unable to handle kernel paging request at virtual address fffff010
printing eip:
c014a018
*pde = 00200074
Oops: 0000 [#1]
SMP
Modules linked in: md5 ipv6 w83627hf eeprom i2c_sensor i2c_isa i2c_i801 i2c_dev i2c_core nfs lockd nfs_acl sunrpc cpufreq_powersave button battery ac uhci_hcd hw_random e100 mii e1000 floppy dm_snapshot dm_zero dm_mirror ext3 jbd raid1 dm_mod ata_piix libata sd_mod scsi_mod
CPU: 0
EIP: 0060:[<c014a018>] Not tainted VLI
EFLAGS: 00010286 (2.6.9-78.0.22.ELsmp)
EIP is at lru_add_drain+0xd/0x77
eax: e1f08080 ebx: fffff000 ecx: eb65d3e4 edx: c03d5b80
esi: e1f08080 edi: da688b74 ebp: b7f12000 esp: f2dedf68
ds: 007b es: 007b ss: 0068
Process tar (pid: 31926, threadinfo=f2ded000 task=ebb58eb0)
Stack: c03d3260 c0152147 eb65d3e4 e1f08080 00000000 e1f080c4 da688b74 b7f12000
       e1f08080 c015243a b7f12000 b7f13000 b7f13000 b7f13000 eb65d3e4 e1f08080
       e1f080b0 00000000 f2ded000 c01524aa b7f12000 09562220 c02e0a2f b7f12000
Call Trace:
 [<c0152147>] unmap_region+0x24/0xef
 [<c015243a>] do_munmap+0xf8/0x116
 [<c01524aa>] sys_munmap+0x52/0x6a
 [<c02e0a2f>] syscall_call+0x7/0xb
 [<c02e007b>] __lock_text_end+0x820/0x1071
Code: 53 0c f0 ff 42 04 8b 01 89 5c 81 08 40 83 f8 0e 89 01 75 08 5b 89 c8 e9 9c 03 00 00 5b c3 53 bb 00 f0 ff ff ba 80 5b 3d c0 21 e3 <8b> 43 10 03 14 85 20 f1 3d c0 83 3a 00 74 07 89 d0 e8 c3 02 00
<0>Fatal exception: panic in 5 seconds
Kernel panic - not syncing: Fatal exception
This always happens at night when the snapshot LV is mounted.
Hm, it seems this bug, reported in 2005, got lost in the queue for a long time; sorry about that. The crash in comment #9 is probably unrelated. Anyway, there have been several DM snapshot fixes in the RHEL4 kernel, so I think it should be fixed now. If you still see the problem, please report a new bug or open a support ticket. Thanks.