Bug 409221 (lvm2-cluster-ppc) - DLM: panic after device_write
Summary: DLM: panic after device_write
Keywords:
Status: CLOSED ERRATA
Alias: lvm2-cluster-ppc
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.1
Hardware: ppc64
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: ---
Assignee: Christine Caulfield
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On:
Blocks: 429181 Cluster5-ppc
 
Reported: 2007-12-03 20:04 UTC by Nate Straz
Modified: 2008-05-21 15:02 UTC
CC List: 2 users

Fixed In Version: RHBA-2008-0314
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2008-05-21 15:02:53 UTC
Target Upstream Version:
Embargoed:


Attachments
Patch for RHEL-5 kernel (1.04 KB, patch)
2008-01-17 09:56 UTC, Christine Caulfield


Links
Red Hat Product Errata RHBA-2008:0314 (normal, SHIPPED_LIVE): Updated kernel packages for Red Hat Enterprise Linux 5.2, last updated 2008-05-20 18:43:34 UTC

Description Nate Straz 2007-12-03 20:04:15 UTC
Description of problem:

While trying to do LVM2 cluster testing I ran into a problem where the systems
panic and drop into the system monitor.  In the few instances where I was able
to get a backtrace, the common function was device_write.

Version-Release number of selected component (if applicable):
kernel-2.6.18-53.el5
cman-2.0.73-1.el5
lvm2-cluster-2.02.26-1.el5
lvm2-2.02.26-3.el5

How reproducible:
Easily without LVM

Steps to Reproduce:
1. On multiple nodes in the cluster use dlm_tool to join and leave a lockspace.
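
For reference, the same join/leave cycle can be driven directly through libdlm; a minimal C sketch, assuming the standard dlm_create_lockspace/dlm_release_lockspace entry points (the lockspace name and loop count are arbitrary, and it must run as root on a cluster node with the dlm module loaded):

#include <stdio.h>
#include <libdlm.h>

int main(void)
{
    int i;

    for (i = 0; i < 1000; i++) {
        /* "dlm_tool join" roughly corresponds to creating a lockspace... */
        dlm_lshandle_t ls = dlm_create_lockspace("testls", 0600);
        if (!ls) {
            perror("dlm_create_lockspace");
            return 1;
        }
        /* ...and "dlm_tool leave" to releasing it (force=1) */
        if (dlm_release_lockspace("testls", ls, 1)) {
            perror("dlm_release_lockspace");
            return 1;
        }
    }
    return 0;
}

Build with: gcc repro.c -o repro -ldlm, then run it concurrently on multiple nodes.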
  
Actual results:
 Unable to handle kernel paging request for data at address 0x28002482100922a0
Faulting instruction address: 0xc00000000035d820
cpu 0x0: Vector: 300 (Data Access) at [c00000006f777510]
    pc: c00000000035d820: ._spin_lock+0x20/0x88
    lr: d000000000a74f08: .dlm_user_add_ast+0xec/0x330 [dlm]
    sp: c00000006f777790
   msr: 8000000000009032
   dar: 28002482100922a0
 dsisr: 40000000
  current = 0xc0000000016582b0
  paca    = 0xc000000000474e00
    pid   = 2856, comm = clvmd
enter ? for help
0:mon> t
[c00000006f777810] d000000000a74f08 .dlm_user_add_ast+0xec/0x330 [dlm]
[c00000006f7778c0] d000000000a603a0 .dlm_add_ast+0x3c/0x158 [dlm]
[c00000006f777960] d000000000a64138 .queue_cast+0x12c/0x15c [dlm]
[c00000006f7779f0] d000000000a6619c .do_unlock+0xcc/0xf4 [dlm]
[c00000006f777a80] d000000000a66d38 .unlock_lock+0x64/0xa0 [dlm]
[c00000006f777b20] d000000000a6a300 .dlm_user_unlock+0xc4/0x1a4 [dlm]
[c00000006f777c10] d000000000a74b80 .device_write+0x4f0/0x78c [dlm]
[c00000006f777cf0] c0000000000ebda8 .vfs_write+0x118/0x200
[c00000006f777d90] c0000000000ec518 .sys_write+0x4c/0x8c
[c00000006f777e30] c0000000000086a4 syscall_exit+0x0/0x40
--- Exception: c00 (System Call) at 000000000fc8f37c

Activating VGs: Unable to handle kernel paging request for data at address
0x00000020
Faulting instruction address: 0xc0000000000e2464
cpu 0x0: Vector: 300 (Data Access) at [c000000071597620]
    pc: c0000000000e2464: .cache_alloc_refill+0x124/0x264
    lr: c0000000000e2404: .cache_alloc_refill+0xc4/0x264
    sp: c0000000715978a0
   msr: 8000000000001032
   dar: 20
 dsisr: 40010000
  current = 0xc00000000274dae0
  paca    = 0xc000000000474e00
    pid   = 2817, comm = clvmd
enter ? for help
0:mon> t
[c000000071597950] c0000000000e2bf0 .kmem_cache_alloc+0xac/0xd8
[c0000000715979e0] d000000000a6ddf8 .allocate_lkb+0x28/0x60 [dlm]
[c000000071597a60] d000000000a66fd8 .create_lkb+0x24/0x198 [dlm]
[c000000071597b00] d000000000a6bb10 .dlm_user_request+0x68/0x20c [dlm]
[c000000071597c10] d000000000a74aa4 .device_write+0x414/0x78c [dlm]
[c000000071597cf0] c0000000000ebda8 .vfs_write+0x118/0x200
[c000000071597d90] c0000000000ec518 .sys_write+0x4c/0x8c
[c000000071597e30] c0000000000086a4 syscall_exit+0x0/0x40
--- Exception: c00 (System Call) at 000000000fc4f37c
SP (f75ee730) is in userspace

Expected results:
System should not panic during testing.

Additional info:

Once I got a set of logical volumes created, restarting clvmd on any node would
cause it to panic.

Comment 1 David Teigland 2007-12-03 22:28:40 UTC
I took libdlm+dlm_tool from cvs HEAD, compiled as 64 bit, and they work fine.
Next, same libdlm+dlm_tool from cvs HEAD, compiled as 32 bit, and they work fine.

# ldd ./dlm_tool | grep libdlm
        libdlm.so.DEVEL => /usr/lib/libdlm.so.DEVEL (0x0ff90000)

# ldd /usr/sbin/clvmd | grep libdlm
        libdlm.so.2 => /usr/lib/libdlm.so.2 (0x0fba0000)

# ls -l /usr/lib/libdlm.so.DEVEL
lrwxrwxrwx 1 root root 26 Dec  3 15:44 /usr/lib/libdlm.so.DEVEL ->
libdlm.so.DEVEL.1196717827

and /usr/lib/libdlm.so.2 -> libdlm.so.2.0.73*, but I'm not sure what
that really means, if anything much.

Next, tried clvmd with the new libdlm:

# rm /usr/lib/libdlm.so.2
# cd /usr/lib; ln -s libdlm.so.DEVEL.1196717827 libdlm.so.2

and clvmd now starts up fine on all the nodes, and also shuts down
fine on all of them.

I'm using cvs HEAD because I can't get libdlm to compile from the RHEL5
branch, probably something dumb I'm doing.  But there's no difference
between the libdlm source in RHEL5 and HEAD.  So, despite my ignorance about
all this build stuff, I'm inclined to say that the code is fine and there's
something wrong with the ppc rpm builds.


Comment 2 Nate Straz 2007-12-06 15:31:07 UTC
Chris, can you take a look at the builds to see if there is something odd there?  

Comment 3 Nate Straz 2007-12-10 22:57:08 UTC
Adding the TestBlocker flag as this is preventing me from getting through ppc
verification.

Comment 4 Nate Straz 2008-01-04 15:52:04 UTC
Moving this up to a beta blocker.

Comment 5 David Teigland 2008-01-14 17:52:08 UTC
Some documentation on how to work with these ppc nodes: doral, basic, newport, kent.

Use qe's "console" program, e.g. 'console kent', from some
machine that has it installed (use null.msp.redhat.com if you'd like).

On some errors, the machines will drop you into the system monitor over
the console.  To reboot from there, you do 'zr'.  You can get a backtrace
from there too with 't'.

You can also use sysrq:
Jan 11 13:39:45 <refried>       dct: you can send it a sysrq
Jan 11 13:39:52 <refried>       ^ecl0 b
Jan 11 13:41:16 <refried>       that's in sequence ^E c l 0 b

Or apc:
Jan 11 13:39:22 <dean>  dct: you can always use the apc if all else fails.
http://smoke-apc
(although I think these machines take forever to come up when power cycled)

Compile userland programs on basic; the others are missing some RPMs needed to build:
cd cluster/dlm/tests/usertest/; gcc dlmtest2.c -I../../lib -o dlmtest2 -ldlm
I then scp the binary to the other nodes.

Compile experimental dlm.ko modules using the rhel51 linux source tree in
/root/linux-rhel51/, e.g.
- cd linux-rhel51/fs/dlm
- edit files, add printk's, etc
- remain in fs/dlm dir to build...
- make -C /lib/modules/`uname -r`/build M=`pwd`
- insmod ./dlm.ko

The tree with my own dlm debugging is /root/linux-dct/.  Most of my own
debugging is trying to determine whether there could be a race handling
userland lkb's or a refcounting problem with userland lkb's.  I haven't
found anything wrong, though.

In my own testing, I've been starting a limited set of the clustering stuff:
modprobe configfs
mount -t configfs none /sys/kernel/config
mount -t debugfs none /sys/kernel/debug
insmod linux-dct/fs/dlm/dlm.ko
ccsd -X
cman_tool join
groupd
dlm_controld
run tests

I ran make_panic on gfs on the four nodes all weekend without a problem.
I've reproduced the problems just running dlmtest2 stress, so it's not the
fault of clvmd.

The way we've usually been reproducing the problem is running
'service clvmd start' on one or more nodes.  Sometimes it takes a couple
of stop/start cycles before seeing it, sometimes it hits right away.
'service clvmd start' activates a number of LVs, so some dlm locking
takes place.  (lvm was set up by qe's 'activator' lvm test.)


Comment 6 Christine Caulfield 2008-01-16 17:19:33 UTC
Found it!

This is massive memory corruption caused by the compat32 code not checking the
lock name length when it copies the lock information from the userspace 32-bit
structure into the 64-bit kernel structure.

The dlm_unlock call does not specify a name length in the structure passed into
the kernel, so the namelen field can contain garbage. This causes the kernel to
try to copy <garbage> bytes into its 64-bit version of the data structure, even
though it has only allocated enough memory to hold the bare structure, not any
name.
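
Roughly the shape of the problem, simplified and with illustrative identifiers rather than the exact RHEL-5 source:

/* 32-bit -> 64-bit request conversion on the device_write path.
 * kb has been allocated with just enough room for the bare structure. */
static void compat_input(struct dlm_write_request *kb,
                         struct dlm_write_request32 *kb32)
{
        /* ... fixed-size fields copied here ... */

        /* For dlm_unlock, namelen was never set by userspace, so this
         * copies <garbage> bytes past the end of the allocation and
         * tramples the heap: */
        memcpy(kb->i.lock.name, kb32->i.lock.name,
               kb32->i.lock.namelen);
}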

The proper fix is two-fold:

1) Fix libdlm to zero namelen before passing it into the kernel. This will fix
the bug and is the easiest thing to do in the short term if building kernels
is a problem.

2) Add proper bounds checking of the input data in the kernel. Doing only 1)
leaves an exploitable DoS bug. (Both halves are sketched at the end of this
comment.)

I'll produce patches for these in the morning.
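
A minimal sketch of both halves, again with illustrative names rather than the committed code:

/* 1) libdlm side: zero the whole request before filling it in, so
 *    namelen can never be stack garbage on the unlock path. */
struct dlm_write_request req;
memset(&req, 0, sizeof(req));
req.cmd = DLM_USER_UNLOCK;
/* ... set only the fields unlock actually uses ... */

/* 2) kernel side: bound the copy by the space actually allocated
 *    instead of trusting userspace (max_namelen here is a
 *    hypothetical limit derived from the write(2) count). */
if (kb32->i.lock.namelen > max_namelen)
        return -EINVAL;
memcpy(kb->i.lock.name, kb32->i.lock.name, kb32->i.lock.namelen);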


Comment 7 Christine Caulfield 2008-01-17 09:40:39 UTC
userland fix checked into RHEL5 branch:

Checking in libdlm.c;
/cvs/cluster/cluster/dlm/lib/libdlm.c,v  <--  libdlm.c
new revision: 1.32.2.4; previous revision: 1.32.2.3
done


Comment 8 Christine Caulfield 2008-01-17 09:56:47 UTC
Created attachment 291975 [details]
Patch for RHEL-5 kernel

This is the patch for the RHEL-5 kernel.

Comment 13 Don Zickus 2008-01-22 18:49:43 UTC
in 2.6.18-72.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Comment 15 Nate Straz 2008-02-28 00:59:44 UTC
I can get through the LVM test suite with the 5.2 bits.

Comment 17 errata-xmlrpc 2008-05-21 15:02:53 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2008-0314.html


