Description of problem:

While trying to do LVM2 cluster testing I ran into a problem where the systems are panicking and dropping into the system monitor. I was able to get backtraces in a few instances and found that the common function was device_write.

Version-Release number of selected component (if applicable):
kernel-2.6.18-53.el5
cman-2.0.73-1.el5
lvm2-cluster-2.02.26-1.el5
lvm2-2.02.26-3.el5

How reproducible:
Easily, without LVM.

Steps to Reproduce:
1. On multiple nodes in the cluster, use dlm_tool to join and leave a lockspace.

Actual results:

Unable to handle kernel paging request for data at address 0x28002482100922a0
Faulting instruction address: 0xc00000000035d820
cpu 0x0: Vector: 300 (Data Access) at [c00000006f777510]
    pc: c00000000035d820: ._spin_lock+0x20/0x88
    lr: d000000000a74f08: .dlm_user_add_ast+0xec/0x330 [dlm]
    sp: c00000006f777790
   msr: 8000000000009032
   dar: 28002482100922a0
 dsisr: 40000000
  current = 0xc0000000016582b0
  paca    = 0xc000000000474e00
    pid   = 2856, comm = clvmd
enter ? for help
0:mon> t
[c00000006f777810] d000000000a74f08 .dlm_user_add_ast+0xec/0x330 [dlm]
[c00000006f7778c0] d000000000a603a0 .dlm_add_ast+0x3c/0x158 [dlm]
[c00000006f777960] d000000000a64138 .queue_cast+0x12c/0x15c [dlm]
[c00000006f7779f0] d000000000a6619c .do_unlock+0xcc/0xf4 [dlm]
[c00000006f777a80] d000000000a66d38 .unlock_lock+0x64/0xa0 [dlm]
[c00000006f777b20] d000000000a6a300 .dlm_user_unlock+0xc4/0x1a4 [dlm]
[c00000006f777c10] d000000000a74b80 .device_write+0x4f0/0x78c [dlm]
[c00000006f777cf0] c0000000000ebda8 .vfs_write+0x118/0x200
[c00000006f777d90] c0000000000ec518 .sys_write+0x4c/0x8c
[c00000006f777e30] c0000000000086a4 syscall_exit+0x0/0x40
--- Exception: c00 (System Call) at 000000000fc8f37c

Activating VGs:

Unable to handle kernel paging request for data at address 0x00000020
Faulting instruction address: 0xc0000000000e2464
cpu 0x0: Vector: 300 (Data Access) at [c000000071597620]
    pc: c0000000000e2464: .cache_alloc_refill+0x124/0x264
    lr: c0000000000e2404: .cache_alloc_refill+0xc4/0x264
    sp: c0000000715978a0
   msr: 8000000000001032
   dar: 20
 dsisr: 40010000
  current = 0xc00000000274dae0
  paca    = 0xc000000000474e00
    pid   = 2817, comm = clvmd
enter ? for help
0:mon> t
[c000000071597950] c0000000000e2bf0 .kmem_cache_alloc+0xac/0xd8
[c0000000715979e0] d000000000a6ddf8 .allocate_lkb+0x28/0x60 [dlm]
[c000000071597a60] d000000000a66fd8 .create_lkb+0x24/0x198 [dlm]
[c000000071597b00] d000000000a6bb10 .dlm_user_request+0x68/0x20c [dlm]
[c000000071597c10] d000000000a74aa4 .device_write+0x414/0x78c [dlm]
[c000000071597cf0] c0000000000ebda8 .vfs_write+0x118/0x200
[c000000071597d90] c0000000000ec518 .sys_write+0x4c/0x8c
[c000000071597e30] c0000000000086a4 syscall_exit+0x0/0x40
--- Exception: c00 (System Call) at 000000000fc4f37c
SP (f75ee730) is in userspace

Expected results:
System should not panic during testing.

Additional info:
Once I got a set of logical volumes created, restarting clvmd on any node would cause it to panic.
I took libdlm+dlm_tool from cvs HEAD, compiled as 64 bit, and they work fine. Next, same libdlm+dlm_tool from cvs HEAD, compiled as 32 bit, and they work fine.

# ldd ./dlm_tool | grep libdlm
        libdlm.so.DEVEL => /usr/lib/libdlm.so.DEVEL (0x0ff90000)
# ldd /usr/sbin/clvmd | grep libdlm
        libdlm.so.2 => /usr/lib/libdlm.so.2 (0x0fba0000)
# ls -l /usr/lib/libdlm.so.DEVEL
lrwxrwxrwx 1 root root 26 Dec  3 15:44 /usr/lib/libdlm.so.DEVEL -> libdlm.so.DEVEL.1196717827

and /usr/lib/libdlm.so.2 -> libdlm.so.2.0.73*, but I'm not sure what that really means, if anything much.

Next, tried clvmd with the new libdlm:

# rm /usr/lib/libdlm.so.2
# cd /usr/lib; ln -s libdlm.so.DEVEL.1196717827 libdlm.so.2

and clvmd now starts up fine on all the nodes, and also shuts down fine on all of them.

I'm using cvs HEAD because I can't get libdlm to compile from the RHEL5 branch; probably something dumb I'm doing. But there's no difference between the libdlm source in RHEL5 and HEAD. So, despite my ignorance about all this build stuff, I'm inclined to say that the code is fine, and there's something wrong with the ppc rpm builds.
Chris, can you take a look at the builds to see if there is something odd there?
Adding the TestBlocker flag as this is preventing me from getting through ppc verification.
Moving this up to a beta blocker.
Some documentation on how to work with these ppc nodes: doral, basic, newport, kent.

Use qe's "console" program, e.g. 'console kent', from some machine that has it installed (use null.msp.redhat.com if you'd like). On some errors, the machines will drop you into the system monitor over the console. To reboot from there, do 'zr'. You can also get a backtrace from there with 't'.

You can also use sysrq:

Jan 11 13:39:45 <refried> dct: you can send it a sysrq
Jan 11 13:39:52 <refried> ^ecl0 b
Jan 11 13:41:16 <refried> that's in sequence ^E c l 0 b

Or apc:

Jan 11 13:39:22 <dean> dct: you can always use the apc if all else fails. http://smoke-apc
(although I think these machines take forever to come up when power cycled)

Compile userland programs on basic; the others are missing some rpm's needed to build:

cd cluster/dlm/tests/usertest/
gcc dlmtest2.c -I../../lib -o dlmtest2 -ldlm

I then scp this to the other nodes.

Compile experimental dlm.ko modules using the rhel51 linux source tree in /root/linux-rhel51/, e.g.:

- cd linux-rhel51/fs/dlm
- edit files, add printk's, etc.
- remain in the fs/dlm dir to build:
  make -C /lib/modules/`uname -r`/build M=`pwd`
- insmod ./dlm.ko

The tree with my own dlm debugging is /root/linux-dct/. Most of my own debugging is trying to determine whether there could be a race handling userland lkb's or a refcounting problem with userland lkb's. I haven't found anything wrong, though.

In my own testing, I've been starting a limited set of the clustering stuff:

modprobe configfs
mount -t configfs none /sys/kernel/config
mount -t debugfs none /sys/kernel/debug
insmod linux-dct/fs/dlm/dlm.ko
ccsd -X
cman_tool join
groupd
dlm_controld
run tests

I ran make_panic on gfs on the four nodes all weekend without a problem. I've reproduced the problems just running dlmtest2 stress, so it's not the fault of clvmd (see the sketch below). The way we've usually been reproducing the problem is running 'service clvmd start' on one or more nodes; sometimes it takes a couple of stop/start cycles before seeing it, sometimes it happens right away. 'service clvmd start' activates a number of lv's, so some dlm locking takes place. (lvm was set up by qe's 'activator' lvm test.)
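For anyone without the cluster tree handy, a dlmtest2-style stress loop is roughly the following. This is an illustrative sketch, not the real dlmtest2.c; it assumes libdlm's simple synchronous API (lock_resource/unlock_resource) on the default lockspace with the cluster stack already running, and the resource name "stress-res" is made up.

/*
 * Illustrative sketch of a dlmtest2-style stress loop -- not the real
 * dlmtest2.c.  Build roughly as:  gcc stress.c -o stress -ldlm -lpthread
 */
#include <stdio.h>
#include <stdlib.h>
#include <libdlm.h>

int main(int argc, char **argv)
{
	int iterations = (argc > 1) ? atoi(argv[1]) : 10000;
	int lockid, i;

	/* For the pthread build of libdlm: start the AST delivery
	   thread so lock_resource() can block until completion. */
	dlm_pthread_init();

	for (i = 0; i < iterations; i++) {
		/* Each lock/unlock pair is a write() to the dlm misc
		   device, i.e. the device_write() path in the oopses. */
		if (lock_resource("stress-res", LKM_EXMODE, 0, &lockid) < 0) {
			perror("lock_resource");
			exit(1);
		}
		if (unlock_resource(lockid) < 0) {
			perror("unlock_resource");
			exit(1);
		}
	}
	return 0;
}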
Found it!

This is massive memory corruption caused by the compat32 code not checking the lock name length when it copies the lock information from the userspace 32-bit structure to 64-bit kernel space.

The dlm_unlock call does not specify a name length in the structure passed into the kernel, so it can contain garbage. This causes the kernel to try to copy <garbage> bytes into its 64-bit kernel version of the data structure. However, it has only allocated enough memory to hold the bare structure, not any sort of name.

The proper fix is two-fold:

1) Fix libdlm to zero namelen before passing it into the kernel. This will fix the bug and is the easiest thing to do if building kernels is a problem in the short term.

2) Proper bounds checking of the input data in the kernel. Doing just 1) leaves an exploitable DoS bug.

I'll produce patches for these in the morning.
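To make fix 2) concrete before the patch lands, here is a minimal sketch of the kind of bounds check the compat path needs. The struct and field names (dlm_write_request32, i.lock.namelen, DLM_RESNAME_MAXLEN) follow fs/dlm/user.c and the dlm headers, but the helper itself is hypothetical and not the actual RHEL-5 patch:

/* Hypothetical helper, not the actual patch: validate the user-supplied
 * namelen before converting the 32-bit request into the 64-bit kernel
 * structure.  Assumes the caller (device_write) has already verified
 * count >= sizeof(struct dlm_write_request32). */
static int check_lock_namelen(const struct dlm_write_request32 *req32,
			      size_t count)
{
	size_t namelen = req32->i.lock.namelen;

	/* The name must fit inside what userspace actually wrote... */
	if (namelen > count - sizeof(struct dlm_write_request32))
		return -EINVAL;
	/* ...and inside the dlm resource name limit. */
	if (namelen > DLM_RESNAME_MAXLEN)
		return -EINVAL;
	return 0;
}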
Userland fix checked into the RHEL5 branch:

Checking in libdlm.c;
/cvs/cluster/cluster/dlm/lib/libdlm.c,v  <--  libdlm.c
new revision: 1.32.2.4; previous revision: 1.32.2.3
done
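In essence the change guarantees that no stack garbage reaches the kernel's namelen field on unlock. A hedged sketch of the idea (struct and command names come from linux/dlm_device.h; the send_unlock helper and its arguments are made up for illustration, not the verbatim rev 1.32.2.4 diff):

#include <string.h>
#include <unistd.h>
#include <stdint.h>
#include <linux/dlm_device.h>

/* Made-up helper illustrating the fixed unlock path: zeroing the whole
 * request means i.lock.namelen is 0, not uninitialized stack garbage,
 * when the kernel's compat32 code reads it. */
static ssize_t send_unlock(int dlm_fd, uint32_t lkid, uint32_t flags)
{
	struct dlm_write_request req;

	memset(&req, 0, sizeof(req));	/* the fix itself */
	req.version[0] = DLM_DEVICE_VERSION_MAJOR;
	req.version[1] = DLM_DEVICE_VERSION_MINOR;
	req.version[2] = DLM_DEVICE_VERSION_PATCH;
	req.cmd = DLM_USER_UNLOCK;
	req.i.lock.lkid = lkid;
	req.i.lock.flags = flags;

	return write(dlm_fd, &req, sizeof(req));
}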
Created attachment 291975 [details]
Patch for RHEL-5 kernel

This is the patch for the RHEL-5 kernel.
In 2.6.18-72.el5.

You can download this test kernel from http://people.redhat.com/dzickus/el5
I can get through the LVM test suite with the 5.2 bits.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2008-0314.html