Description of problem: 1. Made a 10G filesystem 2. Put a 3G (or so) file in it using dd 3. Created two 3G snapshots of that filesystem 4. Started dd'ing to another file in that filesystem 5. With the dd I/O running, attempted to remove both of the snapshots [that caused the machine to panic] ----------- [cut here ] --------- [please bite here ] --------- Kernel BUG at kcopyd:145 invalid operand: 0000 [1] CPU 0 Modules linked in: md5 ipv6 parport_pc lp parport autofs4 i2c_dev i2c_core sunrpc ds yenta_socket pcmcia_core button battery ac ohci_hcd hw_random tg3 floppy dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod qla2300 qla2xxx scsi_transport_fc mptscsih mptsas mptspi mptfc mptscsi mptbase sd_mod scsi_mod Pid: 4410, comm: lvremove Not tainted 2.6.9-29.EL RIP: 0010:[<ffffffffa00ba779>] <ffffffffa00ba779>{:dm_mod:client_free_pages+12} RSP: 0000:000001001f4b1d18 EFLAGS: 00010283 RAX: 0000000000000100 RBX: 00000100062d3680 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00000100062d3680 RBP: ffffff0000032080 R08: 000001003ffd2010 R09: 000001003ffef280 R10: 0000000000000046 R11: 00000100062d3680 R12: ffffffffa00c4e60 R13: 00000000c138fd04 R14: 00000000006aa940 R15: ffffffffa00b8714 FS: 0000002a955783e0(0000) GS:ffffffff80538980(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00000000006bb92b CR3: 0000000000101000 CR4: 00000000000006e0 Process lvremove (pid: 4410, threadinfo 000001001f4b0000, task 00000100195e8570) Stack: 00000100062d3680 ffffffffa00bb9d5 000001002f0fe2c0 000001002f0fe2c0 0000000000000000 ffffffffa010d72c ffffffff80216690 ffffff0000032080 000001001d106000 ffffffffa00b67b5 Call Trace:<ffffffffa00bb9d5>{:dm_mod:kcopyd_client_destroy+30} <ffffffffa010d72c>{:dm_snapshot:snapshot_dtr+242} <ffffffff80216690>{kobject_release+0} <ffffffffa00b67b5>{:dm_mod:table_destroy+72} <ffffffffa00b7fde>{:dm_mod:__hash_remove+124} <ffffffffa00b87ad>{:dm_mod:dev_remove+153} <ffffffffa00b9a96>{:dm_mod:ctl_ioctl+599} <ffffffff801a37e2>{sys_ioctl+1006} <ffffffff80110b76>{tracesys+209} Code: 0f 0b 36 cc 0b a0 ff ff ff ff 91 00 48 8b 7b 40 e8 b3 ff ff RIP <ffffffffa00ba779>{:dm_mod:client_free_pages+12} RSP <000001001f4b1d18> <0>Kernel panic - not syncing: Oops Here is how far the two lvremove processes made it: [...] stat("/dev/mapper/VolGroup00-LogVol00", {st_mode=S_IFBLK|0660, st_rdev=makedev(253, 0), ...}) = 0 lstat("/dev", {st_mode=S_IFDIR|0755, st_size=5780, ...}) = 0 lstat("/dev", {st_mode=S_IFDIR|0755, st_size=5780, ...}) = 0 lstat("/dev/mapper", {st_mode=S_IFDIR|0755, st_size=220, ...}) = 0 lstat("/dev/VolGroup00", {st_mode=S_IFDIR|0700, st_size=80, ...}) = 0 lstat("/dev/mapper/VolGroup00-LogVol00", {st_mode=S_IFBLK|0660, st_rdev=makedev(253, 0), ...}) = 0 lstat("/dev/VolGroup00/LogVol00", {st_mode=S_IFLNK|0777, st_size=31, ...}) = 0 stat("/dev/dm-2", {st_mode=S_IFBLK|0640, st_rdev=makedev(253, 2), ...}) = 0 stat("/dev/hda1", {st_mode=S_IFBLK|0660, st_rdev=makedev(3, 1), ...}) = 0 stat("/dev/ram1", {st_mode=S_IFBLK|0660, st_rdev=makedev(1, 1), ...}) = 0 lstat("/dev", {st_mode=S_IFDIR|0755, st_size=5780, ...}) = 0 lstat("/dev", {st_mode=S_IFDIR|0755, st_size=5780, ...}) = 0 lstat("/dev/ram1", {st_mode=S_IFBLK|0660, st_rdev=makedev(1, 1), ...}) = 0 lstat("/dev/ram", {st_mode=S_IFLNK|0777, st_size=4, ...}) = 0 stat("/dev/sda1", {st_mode=S_IFBLK|0660, st_rdev=makedev(8, 1), ...}) = 0 stat("/dev/sdb1", {st_mode=S_IFBLK|0660, st_rdev=makedev(8, 17), ...}) = 0 stat("/dev/sdc1", {st_mode=S_IFBLK|0660, st_rdev=makedev(8, 33), ...}) = 0 stat("/dev/sdd1", {st_mode=S_IFBLK|0660, st_rdev=makedev(8, 49), ...}) = 0 stat("/dev/sde1", {st_mode=S_IFBLK|0660, st_rdev=makedev(8, 65), ...}) = 0 stat("/dev/sdf1", {st_mode=S_IFBLK|0660, st_rdev=makedev(8, 81), ...}) = 0 stat("/dev/ramdisk", {st_mode=S_IFBLK|0660, st_rdev=makedev(1, 0), ...}) = 0 stat("/dev/mapper/VolGroup00-LogVol01", {st_mode=S_IFBLK|0660, st_rdev=makedev(253, 1), ...}) = 0 lstat("/dev", {st_mode=S_IFDIR|0755, st_size=5780, ...}) = 0 lstat("/dev", {st_mode=S_IFDIR|0755, st_size=5780, ...}) = 0 lstat("/dev/mapper", {st_mode=S_IFDIR|0755, st_size=220, ...}) = 0 lstat("/dev/VolGroup00", {st_mode=S_IFDIR|0700, st_size=80, ...}) = 0 lstat("/dev/mapper/VolGroup00-LogVol01", {st_mode=S_IFBLK|0660, st_rdev=makedev(253, 1), ...}) = 0 lstat("/dev/VolGroup00/LogVol01", {st_mode=S_IFLNK|0777, st_size=31, ...}) = 0 stat("/dev/ram0", {st_mode=S_IFBLK|0660, st_rdev=makedev(1, 0), ...}) = 0 lstat("/dev", {st_mode=S_IFDIR|0755, st_size=5780, ...}) = 0 lstat("/dev", {st_mode=S_IFDIR|0755, st_size=5780, ...}) = 0 lstat("/dev/ram0", {st_mode=S_IFBLK|0660, st_rdev=makedev(1, 0), ...}) = 0 lstat("/dev/ramdisk", {st_mode=S_IFLNK|0777, st_size=4, ...}) = 0 stat("/dev/root", {st_mode=S_IFBLK|0660, st_rdev=makedev(253, 0), ...}) = 0 stat("/dev/dm-1", {st_mode=S_IFBLK|0640, st_rdev=makedev(253, 1), ...}) = 0 stat("/dev/hda2", {st_mode=S_IFBLK|0660, st_rdev=makedev(3, 2), ...}) = 0 stat("/dev/ram2", {st_mode=S_IFBLK|0660, st_rdev=makedev(1, 2), ...}) = 0 stat("/dev/ram8", {st_mode=S_IFBLK|0660, st_rdev=makedev(1, 8), ...}) = 0 stat("/dev/ram9", {st_mode=S_IFBLK|0660, st_rdev=makedev(1, 9), ...}) = 0 stat("/dev/dm-0", {st_mode=S_IFBLK|0640, st_rdev=makedev(253, 0), ...}) = 0 lstat("/dev", {st_mode=S_IFDIR|0755, st_size=5780, ...}) = 0 lstat("/dev", {st_mode=S_IFDIR|0755, st_size=5780, ...}) = 0 lstat("/dev/dm-0", {st_mode=S_IFBLK|0640, st_rdev=makedev(253, 0), ...}) = 0 lstat("/dev/root", {st_mode=S_IFLNK|0777, st_size=24, ...}) = 0 stat("/dev/ram15", {st_mode=S_IFBLK|0660, st_rdev=makedev(1, 15), ...}) = 0 stat("/dev/hda3", {st_mode=S_IFBLK|0660, st_rdev=makedev(3, 3), ...}) = 0 stat("/dev/ram3", {st_mode=S_IFBLK|0660, st_rdev=makedev(1, 3), ...}) = 0 stat("/dev/ram4", {st_mode=S_IFBLK|0660, st_rdev=makedev(1, 4), ...}) = 0 stat("/dev/ram7", {st_mode=S_IFBLK|0660, st_rdev=makedev(1, 7), ...}) = 0 stat("/dev/snapper/origin", {st_mode=S_IFBLK|0660, st_rdev=makedev(253, 2), ...}) = 0 getpid() = 4412 stat("/etc/lvm/lvm.conf", {st_mode=S_IFREG|0644, st_size=10437, ...}) = 0 stat("/var/lock/lvm", {st_mode=S_IFDIR|0700, st_size=4096, ...}) = 0 access("/var/lock/lvm", R_OK|W_OK|X_OK) = 0 stat("/proc/lvm/VGs/snapper", 0x7fbfffb4c0) = -1 ENOENT (No such file or directory) rt_sigprocmask(SIG_SETMASK, ~[RTMIN RT_1], [], 8) = 0 open("/var/lock/lvm/V_snapper", O_RDWR|O_APPEND|O_CREAT, 0777) = 3 rt_sigaction(SIGINT, {0x44fd60, [INT], SA_RESTORER|SA_RESTART, 0x336f82e380}, {SIG_DFL}, 8) = 0 rt_sigprocmask(SIG_SETMASK, ~[INT RTMIN RT_1], NULL, 8) = 0 rt_sigaction(SIGINT, NULL, {0x44fd60, [INT], SA_RESTORER|SA_RESTART, 0x336f82e380}, 8) = 0 rt_sigaction(SIGINT, {0x44fd60, [INT], SA_RESTORER, 0x336f82e380}, NULL, 8) = 0 flock(3, LOCK_EX [...] read(5, "snapper {\nid = \"Vm8vhZ-i5oP-dStF"..., 512) = 512 lseek(6, 0, SEEK_SET) = 0 read(6, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 2048) = 2048 lseek(6, 4096, SEEK_SET) = 4096 read(6, "\17A\0335 LVM2 x[5A%r0N*>\1\0\0\0\0\20\0\0\0\0\0\0"..., 512) = 512 lseek(6, 52736, SEEK_SET) = 52736 read(6, "snapper {\nid = \"Vm8vhZ-i5oP-dStF"..., 512) = 512 lseek(7, 0, SEEK_SET) = 0 read(7, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 2048) = 2048 lseek(7, 4096, SEEK_SET) = 4096 read(7, "\17A\0335 LVM2 x[5A%r0N*>\1\0\0\0\0\20\0\0\0\0\0\0"..., 512) = 512 lseek(7, 52736, SEEK_SET) = 52736 read(7, "snapper {\nid = \"Vm8vhZ-i5oP-dStF"..., 512) = 512 lseek(8, 0, SEEK_SET) = 0 read(8, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 2048) = 2048 lseek(8, 4096, SEEK_SET) = 4096 read(8, "\17A\0335 LVM2 x[5A%r0N*>\1\0\0\0\0\20\0\0\0\0\0\0"..., 2048) = 2048 lseek(8, 51200, SEEK_SET) = 51200 read(8, "OQ-t2W0-qx5w-ty2x-LX7NVg\"\nstatus"..., 2048) = 2048 lseek(9, 0, SEEK_SET) = 0 read(9, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 2048) = 2048 lseek(9, 4096, SEEK_SET) = 4096 read(9, "\17A\0335 LVM2 x[5A%r0N*>\1\0\0\0\0\20\0\0\0\0\0\0"..., 512) = 512 lseek(9, 52736, SEEK_SET) = 52736 read(9, "snapper {\nid = \"Vm8vhZ-i5oP-dStF"..., 512) = 512 lseek(9, 4096, SEEK_SET) = 4096 read(9, "\17A\0335 LVM2 x[5A%r0N*>\1\0\0\0\0\20\0\0\0\0\0\0"..., 512) = 512 lseek(9, 52736, SEEK_SET) = 52736 read(9, "snapper {\nid = \"Vm8vhZ-i5oP-dStF"..., 512) = 512 lseek(9, 52736, SEEK_SET) = 52736 read(9, "snapper {\nid = \"Vm8vhZ-i5oP-dStF"..., 2560) = 2560 lseek(4, 0, SEEK_SET) = 0 read(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 2048) = 2048 lseek(4, 4096, SEEK_SET) = 4096 read(4, "\17A\0335 LVM2 x[5A%r0N*>\1\0\0\0\0\20\0\0\0\0\0\0"..., 512) = 512 lseek(4, 52736, SEEK_SET) = 52736 read(4, "snapper {\nid = \"Vm8vhZ-i5oP-dStF"..., 512) = 512 lseek(5, 0, SEEK_SET) = 0 read(5, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 2048) = 2048 lseek(5, 4096, SEEK_SET) = 4096 read(5, "\17A\0335 LVM2 x[5A%r0N*>\1\0\0\0\0\20\0\0\0\0\0\0"..., 512) = 512 lseek(5, 52736, SEEK_SET) = 52736 read(5, "snapper {\nid = \"Vm8vhZ-i5oP-dStF"..., 512) = 512 lseek(6, 0, SEEK_SET) = 0 read(6, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 2048) = 2048 lseek(6, 4096, SEEK_SET) = 4096 read(6, "\17A\0335 LVM2 x[5A%r0N*>\1\0\0\0\0\20\0\0\0\0\0\0"..., 512) = 512 lseek(6, 52736, SEEK_SET) = 52736 read(6, "snapper {\nid = \"Vm8vhZ-i5oP-dStF"..., 512) = 512 lseek(7, 0, SEEK_SET) = 0 read(7, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 2048) = 2048 lseek(7, 4096, SEEK_SET) = 4096 read(7, "\17A\0335 LVM2 x[5A%r0N*>\1\0\0\0\0\20\0\0\0\0\0\0"..., 512) = 512 lseek(7, 52736, SEEK_SET) = 52736 read(7, "snapper {\nid = \"Vm8vhZ-i5oP-dStF"..., 512) = 512 lseek(8, 0, SEEK_SET) = 0 read(8, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 2048) = 2048 lseek(8, 4096, SEEK_SET) = 4096 read(8, "\17A\0335 LVM2 x[5A%r0N*>\1\0\0\0\0\20\0\0\0\0\0\0"..., 2048) = 2048 lseek(8, 51200, SEEK_SET) = 51200 read(8, "OQ-t2W0-qx5w-ty2x-LX7NVg\"\nstatus"..., 2048) = 2048 lseek(9, 0, SEEK_SET) = 0 read(9, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 2048) = 2048 lseek(9, 4096, SEEK_SET) = 4096 read(9, "\17A\0335 LVM2 x[5A%r0N*>\1\0\0\0\0\20\0\0\0\0\0\0"..., 512) = 512 lseek(9, 52736, SEEK_SET) = 52736 read(9, "snapper {\nid = \"Vm8vhZ-i5oP-dStF"..., 512) = 512 ioctl(10, DM_DEV_STATUS, 0x6aa940) = 0 mlockall(MCL_CURRENT|MCL_FUTURE) = 0 brk(0x12bb000) = 0x12bb000 brk(0xabb000) = 0xabb000 getpriority(PRIO_PROCESS, 0) = 20 setpriority(PRIO_PROCESS, 0, 4294967278 Version-Release number of selected component (if applicable): [root@link-02 ~]# lvremove --version LVM version: 2.02.01 (2005-11-23) Library version: 1.02.02 (2006-01-04) Driver version: 4.5.0 [root@link-02 ~]# rpm -q device-mapper device-mapper-1.02.02-3.0.RHEL4 [root@link-02 ~]# rpm -q lvm2 lvm2-2.02.01-1.3.RHEL4 [root@link-02 ~]# uname -ar Linux link-02 2.6.9-29.EL #1 Wed Jan 25 11:18:11 EST 2006 x86_64 x86_64 x86_64 GNU/Linux It was actually link-08 that paniced, but link-02 has that same installed rpm versions.
Just a note that with mulitple snapshots, this is very reproducable.
I can reproduce this without the filesystem involved, (unless this is a different bug). I was running b_doio/b_iogen to an origin volume (no filesystems involved at all). And while that I/O was going on, I created two snapshots and then successfully deleted one, and then attempted to delete the other, which caused this. slab error <1>Unable to handle kernel NULL pointer dereference at 0000000000000030 RIP: <ffffffff8016a618>{mempool_free+11} PML4 38196067 PGD 38b32067 PMD 0 Oops: 0000 [1] CPU 0 Modules linked in: md5 ipv6 parport_pc lp parport autofs4 i2c_dev i2c_core sunrpc ds yenta_socket pcmcia_core button battery ac ohci_hcd hw_random tg3 floppy dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod qla2300 qla2xxx scsi_transport_fc mptscsih mptsas mptspi mptfc mptscsi mptbase sd_mod scsi_mod Pid: 4155, comm: lvremove Not tainted 2.6.9-29.EL RIP: 0010:[<ffffffff8016a618>] <ffffffff8016a618>{mempool_free+11} RSP: 0000:ffffffff804a4258 EFLAGS: 00010246 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000002000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: 00000100357d9580 RBP: 0000000000000000 R08: 0000000000000000 R09: 00000100357d9580 R10: 0000000000000246 R11: ffffffffa00b9f2c R12: 00000100362381d0 R13: ffffffffa00ba87c R14: 0000000000000000 R15: 0000000000080000 FS: 0000002a955783e0(0000) GS:ffffffff80538980(0000) knlGS:00000000f7fdd6c0 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 0000000000000030 CR3: 0000000000101000 CR4: 00000000000006e0 Process lvremove (pid: 4155, threadinfo 00000100383dc000, task 00000100389280f0) Stack: 0000000000000246 0000000000000246 0000000000000000 ffffffffa00b9f15 0000010035fa3d80 0000010038ffe268 00000100390c4a00 ffffffffa00b9fcd 00000100352598b8 ffffffffa00b5203 Call Trace:<IRQ> <ffffffffa00b9f15>{:dm_mod:dec_count+67} <ffffffffa00b9fcd>{:dm_mod:endio+161} <ffffffffa00b5203>{:dm_mod:clone_endio+205} <ffffffff8028785c>{__end_that_request_first+238} <ffffffffa0008914>{:scsi_mod:scsi_end_request+44} <ffffffffa0008cf8>{:scsi_mod:scsi_io_completion+474} <ffffffffa000330c>{:scsi_mod:scsi_softirq+219} <ffffffff8013faec>{__do_softirq+76} <ffffffff8013fb73>{do_softirq+49} <ffffffff80113c57>{do_IRQ+664} <ffffffff80110fdb>{ret_from_intr+0} <EOI> <ffffffff80196180>{bdev_destroy_inode+0} <ffffffff8027bac7>{serial_in+83} <ffffffff8027e702>{serial8250_console_write+113} <ffffffffa00b8714>{:dm_mod:dev_remove+0} <ffffffff80139e1f>{__call_console_drivers+68} <ffffffff8013a166>{release_console_sem+495} <ffffffff8013a5e6>{vprintk+873} <ffffffff8013a709>{printk+141} <ffffffff8016ff30>{kmem_freepages+192} <ffffffff80196180>{bdev_destroy_inode+0} <ffffffff8016ff30>{kmem_freepages+192} <ffffffff80196180>{bdev_destroy_inode+0} <ffffffff80170cb2>{slab_destroy+115} <ffffffff80171314>{__cache_shrink+519} <ffffffff8017155a>{kmem_cache_destroy+191} <ffffffffa00b9b1a>{:dm_mod:bio_set_exit+34} <ffffffffa00b9ca8>{:dm_mod:resize_pool+61} <ffffffffa00b8714>{:dm_mod:dev_remove+0} <ffffffffa00bb9cd>{:dm_mod:kcopyd_client_destroy+22} <ffffffffa010d72c>{:dm_snapshot:snapshot_dtr+242} <ffffffff80216690>{kobject_release+0} <ffffffffa00b67b5>{:dm_mod:table_destroy+72} <ffffffffa00b7fde>{:dm_mod:__hash_remove+124} <ffffffffa00b87ad>{:dm_mod:dev_remove+153} <ffffffffa00b9a96>{:dm_mod:ctl_ioctl+599} <ffffffff801a37e2>{sys_ioctl+1006} <ffffffff80110a12>{system_call+126} Code: 8b 46 30 39 46 34 0f 8d 3c 01 00 00 9c 8f 44 24 08 fa 48 81 RIP <ffffffff8016a618>{mempool_free+11} RSP <ffffffff804a4258> CR2: 0000000000000030 <0>Kernel panic - not syncing: Oops
What's happening here is that the snapshot device gets removed while kcopyd is still using it. Possible fixes: Maintain a counter of the number of outstanding bios referencing the snapshot structure and wait for it to drop to 0 before deletion; [Disadvantage - still have to wait for the backlog of now-worthless I/O to be generated and completed. Advantage - fairly quick and straightforward to code.] Provide a mechanism to cancel outstanding kcopyd jobs referencing a device. [Disadvantage - involves more work as the jobs that need cancelling are not indexed anywhere yet. Advantage - should avoid the long delays you can get when running 'lvremove', as I/O can get discarded.]
This does appear to be fixed however with bz182935 appearing, I'm worried about closing this bug just yet.
I doubt they're related. What I think is happening in general is that now that snapshots are usable at last, you're able to stress them in ways that you couldn't before, and this is finding new issues that used to be masked by the other problems.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2006-0132.html