Description of problem:
I had just finished up the LVM I/O on the x86_64 cluster (link-01, link-02, link-08) and was tearing down LVM volumes in order to make new ones for file system testing. An lvremove attempt caused all my nodes to panic:

Unable to handle kernel paging request at 0000000030345f4e RIP:
<ffffffff801dced5>{rb_first+10}
PML4 1d829067 PGD 1f6e1067 PMD 0
Oops: 0000 [1] SMP
CPU 0
Modules linked in: gnbd(U) lock_nolock(U) gfs(U) lock_dlm(U) dlm(U) cman(U) lock_harness(U) md5 ipv6 parport_pc lp parport autofs4 i2c_dev i2c_core sunrpc ds yenta_socket pcmcia_core button battery ac ohci_hcd hw_random tg3 floppy dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod qla2300 qla2xxx scsi_transport_fc mptscsih mptbase sd_mod scsi_mod
Pid: 14792, comm: clvmd Not tainted 2.6.9-11.ELsmp
RIP: 0010:[<ffffffff801dced5>] <ffffffff801dced5>{rb_first+10}
RSP: 0018:000001001e743ea0 EFLAGS: 00010206
RAX: 0000000030345f36 RBX: 000001001fdbb6a8 RCX: 0000010037e49c00
RDX: 0000000000000000 RSI: 000000000000006c RDI: 000001001fdbb6a0
RBP: 000001003d64c000 R08: 0000000000000025 R09: 0000000000000000
R10: 0000000000000000 R11: ffffffff80170638 R12: 000001001fdbb6a0
R13: 000000000069b4f7 R14: 000001001fdbb760 R15: 00000000006782b0
FS: 0000000041401960(005b) GS:ffffffff804c1700(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000030345f4e CR3: 0000000000101000 CR4: 00000000000006e0
Process clvmd (pid: 14792, threadinfo 000001001e742000, task 0000010037d5c7f0)
Stack: ffffffff8016da67 000001001fdbb678 000001003d64c000 000001003a608408
       ffffffff80170649 0000000000000000 ffffffff80181672 000001003a00d6d8
       000001003ffec200 00000010010889cc
Call Trace:<ffffffff8016da67>{mpol_free_shared_policy+53} <ffffffff80170649>{shmem_destroy_inode+17}
       <ffffffff80181672>{sys_unlink+261} <ffffffff8011003e>{system_call+126}

Code: 48 83 78 18 00 74 06 48 8b 40 18 eb f3 48 89 c2 48 89 d0 c3
RIP <ffffffff801dced5>{rb_first+10} RSP <000001001e743ea0>
CR2: 0000000030345f4e
 <0>Kernel panic - not syncing: Oops

Version-Release number of selected component (if applicable):
[root@link-01 ~]# rpm -qa | grep lvm2
lvm2-2.01.08-1.0.RHEL4
lvm2-cluster-2.01.09-3.1.RHEL4

How reproducible:
Still trying
Reproduced again with the exact same scenario as above.
This is caused by a forced removal of an active LV.

[root@link-02 ~]# lvscan
  ACTIVE '/dev/stripe_8_4096_4/stripe_8_4096_40' [924.00 GB] anywhere

lvremove -f /dev/stripe_8_4096_4/stripe_8_4096_40

strace:
[...]
stat("/dev/sdf1", {st_mode=S_IFBLK|0660, st_rdev=makedev(8, 81), ...}) = 0
stat("/dev/sdf1", {st_mode=S_IFBLK|0660, st_rdev=makedev(8, 81), ...}) = 0
open("/dev/sdf1", O_RDWR|O_DIRECT|0x40000) = 5
fstat(5, {st_mode=S_IFBLK|0660, st_rdev=makedev(8, 81), ...}) = 0
ioctl(5, BLKBSZGET, 0x67f9a0) = 0
lseek(5, 2048, SEEK_SET) = 2048
read(5, "_\332\24\f LVM2 x[5A%r0N*>\1\0\0\0\0\10\0\0\0\0\0\0"..., 512) = 512
lseek(5, 4096, SEEK_SET) = 4096
read(5, "stripe_8_4096_4 {\nid = \"ADcb5J-K"..., 512) = 512
close(5) = 0
lseek(4, 0, SEEK_SET) = 0
read(4, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 2048) = 2048
lseek(4, 2048, SEEK_SET) = 2048
read(4, "_\332\24\f LVM2 x[5A%r0N*>\1\0\0\0\0\10\0\0\0\0\0\0"..., 512) = 512
lseek(4, 4096, SEEK_SET) = 4096
read(4, "stripe_8_4096_4 {\nid = \"ADcb5J-K"..., 512) = 512
close(4) = 0
brk(0x6db000) = 0x6db000
open("/proc/devices", O_RDONLY) = 4
fstat(4, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2a97c3b000
read(4, "Character devices:\n 1 mem\n 4 /"..., 1024) = 445
close(4) = 0
munmap(0x2a97c3b000, 4096) = 0
open("/proc/misc", O_RDONLY) = 4
fstat(4, {st_mode=S_IFREG|0444, st_size=0, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2a97c3b000
read(4, " 60 dlm_clvmd\n 61 gnbd_ctl\n 62 d"..., 1024) = 94
close(4) = 0
munmap(0x2a97c3b000, 4096)
stat("/dev/mapper/control", {st_mode=S_IFCHR|0600, st_rdev=makedev(10, 63), ...}) = 0
open("/dev/mapper/control", O_RDWR) = 4
ioctl(4, DM_VERSION, 0x6ba260) = 0
ioctl(4, DM_DEV_STATUS, 0x6a41c0) = 0
brk(0x6d3000) = 0x6d3000
uname({sys="Linux", node="link-01", ...}) = 0
open("/etc/lvm/archive/.lvm_link-01_9269_145392622", O_WRONLY|O_APPEND|O_CREAT|O_EXCL, 0666) = 5
fcntl(5, F_SETLK, {type=F_WRLCK, whence=SEEK_SET, start=0, len=0}) = 0
fcntl(5, F_GETFL) = 0x8401 (flags O_WRONLY|O_APPEND|O_LARGEFILE|0x8000)
fstat(5, {st_mode=S_IFREG|0600, st_size=0, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x2a97c3b000
lseek(5, 0, SEEK_CUR) = 0
uname({sys="Linux", node="link-01", ...}) = 0
write(5, "# Generated by LVM2: Wed Jun 8 "..., 2292) = 2292
close(5) = 0
munmap(0x2a97c3b000, 4096) = 0
open("/etc/lvm/archive", O_RDONLY|O_NONBLOCK|O_DIRECTORY) = 5
fstat(5, {st_mode=S_IFDIR|0700, st_size=4096, ...}) = 0
fcntl(5, F_SETFD, FD_CLOEXEC) = 0
getdents64(5, /* 88 entries */, 4096) = 3472
getdents64(5, /* 0 entries */, 4096) = 0
close(5) = 0
link("/etc/lvm/archive/.lvm_link-01_9269_145392622", "/etc/lvm/archive/stripe_8_4096_4_00008.vg") = 0
stat("/etc/lvm/archive/.lvm_link-01_9269_145392622", {st_mode=S_IFREG|0600, st_size=2292, ...}) = 0
unlink("/etc/lvm/archive/.lvm_link-01_9269_145392622") = 0
write(3, "2\0\377\277\0\0\0\0\0\0\0\0C\0\0\0\0\30\0ADcb5JKgkAFga"..., 85) = 85
read(3,
Don't we get useful tracebacks on x86_64? Oh dear. If it is caused by removing a volume, then it could be a device-mapper bug. Does it happen on a non-clustered system?
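
A side note on what the register dump in the description does show, even with a short call trace: the faulting address in CR2 is RAX plus a small offset, and the low bytes of RAX happen to be printable ASCII when read in little-endian order. The Python sketch below only redoes that arithmetic on values copied from the oops; the variable and function names are illustrative. That the decoded bytes match the tail of the LV name hints that the pointer rb_first walked may have been overwritten with string data, but that is an observation, not a confirmed root cause.

```python
import struct

# Values copied verbatim from the oops in the description.
RAX = 0x0000000030345F36
CR2 = 0x0000000030345F4E
# LV name from the lvscan output in this report.
LV_NAME = "stripe_8_4096_40"


def low_bytes_as_text(value, width=4):
    """Return the low `width` bytes of a 64-bit value, in memory
    (little-endian) order, decoded as ASCII."""
    raw = struct.pack("<Q", value)[:width]
    return raw.decode("ascii", errors="replace")


# Distance between the faulting address and the register used as a pointer.
offset = CR2 - RAX
# What the "pointer" looks like when read as characters.
text = low_bytes_as_text(RAX)

print("CR2 - RAX = 0x%x" % offset)
print("low bytes of RAX as ASCII: %r" % text)
print("LV name ends with those bytes: %s" % LV_NAME.endswith(text))
```

The offset matches the 0x18 displacement in the first bytes of the Code: line (48 83 78 18 00, which reads as a compare against 0x18(%rax)), so the fault is consistent with rb_first dereferencing a node pointer that holds text rather than an address.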
*** This bug has been marked as a duplicate of 158956 ***