Bug 193851 - corrupt dlm message during heavy nfs usage
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.0
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: David Teigland
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2006-06-02 06:20 UTC by Andrej Filipcic
Modified: 2009-09-03 16:50 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2006-12-12 21:49:04 UTC
Target Upstream Version:
Embargoed:



Description Andrej Filipcic 2006-06-02 06:20:35 UTC
Description of problem:

An NFS-exported GFS partition has severe problems with dlm. After heavy NFS load
from several NFS clients, dlm_recvd starts to consume all available CPU and
eventually locks up GFS access from the cluster nodes. nfsd consumes the rest of
the (SMP) CPU in the system. The only workaround is to reboot the server.
The same problem occurs with cluster-1.02.00, but the CVS version does not corrupt
the cluster state, so the rest of the cluster remains operational after the
problematic server is rebooted. The server also rejoins after the reboot without
any problem.


Version-Release number of selected component (if applicable):
cluster CVS stable, 2006-05-30

How reproducible:
Occurs after heavy NFS load, when many small files are accessed from several NFS
clients.

Steps to Reproduce:
1.
2.
3.
  
Actual results:


Expected results:


Additional info:
dmesg log
----------------
dlm: midcomms: bad header version 18ce0378
dlm: midcomms: cmd=0, flags=0, length=256, lkid=4290794496, lockspace=4294934785
dlm: midcomms: base=ffff810048b83000, offset=256, len=2640, ret=256,
limit=00001000 newbuf=0
78 03 ce 18 00 00 00 01-00 54 c0 ff 01 81 ff ff
fe ff fe ff 02 81 ff ff-00 02 64 98 00 81 ff ff
15 d4 16 80 ff ff ff ff-01 00 00 00 00 00 00 00
00 54 c0 ff 01 81 ff ff-d0 00 00 00 00 00 00 00
01 00 01 00 05 00 48 00-9c 01 09 19 08 00 00 01
00 54 c0 ff 01 81 ff ff-fe ff fe ff 02 81 ff ff
00 02 64 98 00 81 ff ff-15 d4 16 80 ff ff ff ff
01 00 00 00 00 00 00 00-00 54 c0 ff 01 81 ff ff
d0 00 00 00 00 00 00 00-01 00 01 00 05 00 48 00
22 00 ee 18 08 00 00 01-00 54 c0 ff 01 81 ff ff
fe ff fe ff 02 81 ff ff-00 02 64 98 00 81 ff ff
15 d4 16 80 ff ff ff ff-01 00 00 00 00 00 00 00
00 54 c0 ff 01 81 ff ff-d0 00 00 00 00 00 00 00
01 00 01 00 05 00 48 00-63 02 c9 18 08 00 00 01
00 54 c0 ff 01 81 ff ff-fe ff fe ff 02 81 ff ff
00 02 64 98 00 81 ff ff-15 d4 16 80 ff ff ff ff
dlm: lowcomms: addr=ffff810048b83000, base=0, len=2896, iov_len=4096,
iov_base[0]=ffff810048b83b50, read=2896
dlm: midcomms: bad header version 18ce0378
dlm: midcomms: cmd=0, flags=0, length=256, lkid=4290794496, lockspace=4294934785
dlm: midcomms: base=ffff810048b83000, offset=256, len=3840, ret=256,
limit=00001000 newbuf=0
78 03 ce 18 00 00 00 01-00 54 c0 ff 01 81 ff ff
fe ff fe ff 02 81 ff ff-00 02 64 98 00 81 ff ff
15 d4 16 80 ff ff ff ff-01 00 00 00 00 00 00 00
00 54 c0 ff 01 81 ff ff-d0 00 00 00 00 00 00 00
01 00 01 00 05 00 48 00-9c 01 09 19 08 00 00 01
00 54 c0 ff 01 81 ff ff-fe ff fe ff 02 81 ff ff
00 02 64 98 00 81 ff ff-15 d4 16 80 ff ff ff ff
01 00 00 00 00 00 00 00-00 54 c0 ff 01 81 ff ff
d0 00 00 00 00 00 00 00-01 00 01 00 05 00 48 00
22 00 ee 18 08 00 00 01-00 54 c0 ff 01 81 ff ff
fe ff fe ff 02 81 ff ff-00 02 64 98 00 81 ff ff
15 d4 16 80 ff ff ff ff-01 00 00 00 00 00 00 00
00 54 c0 ff 01 81 ff ff-d0 00 00 00 00 00 00 00
01 00 01 00 05 00 48 00-63 02 c9 18 08 00 00 01
00 54 c0 ff 01 81 ff ff-fe ff fe ff 02 81 ff ff
00 02 64 98 00 81 ff ff-15 d4 16 80 ff ff ff ff
dlm: lowcomms: addr=ffff810048b83000, base=0, len=4096, iov_len=1200,
iov_base[0]=ffff810048b84000, read=1200
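
To make the log above easier to read, here is a minimal sketch that decodes the
first 16 bytes of the hex dump and compares them with the values printed on the
"dlm: midcomms:" lines. The field layout (u32 version, u8 cmd, u8 flags,
u16 length, u32 lkid, u32 lockspace, all little-endian) is an assumption inferred
from the log itself, not taken from the DLM source; decode_header and DUMP_LINE
are hypothetical names used only for illustration.

```python
import struct

# First line of the hex dump, copied from the dmesg log above.
DUMP_LINE = "78 03 ce 18 00 00 00 01-00 54 c0 ff 01 81 ff ff"


def decode_header(hex_line):
    """Decode 16 dumped bytes as little-endian header fields.

    ASSUMED layout (inferred from the log output, not from the DLM source):
    u32 version, u8 cmd, u8 flags, u16 length, u32 lkid, u32 lockspace.
    """
    # The dump separates the two 8-byte halves with '-'; treat it as a space.
    raw = bytes.fromhex(hex_line.replace("-", " "))
    version, cmd, flags, length, lkid, lockspace = struct.unpack("<IBBHII", raw)
    return {
        "version": version,
        "cmd": cmd,
        "flags": flags,
        "length": length,
        "lkid": lkid,
        "lockspace": lockspace,
    }


if __name__ == "__main__":
    header = decode_header(DUMP_LINE)
    print("version=%08x" % header["version"])
    print("cmd=%(cmd)d, flags=%(flags)d, length=%(length)d, "
          "lkid=%(lkid)d, lockspace=%(lockspace)d" % header)
```

Under that assumed layout the decoded fields match the logged values
(version 18ce0378, cmd=0, flags=0, length=256, lkid=4290794496,
lockspace=4294934785), which suggests the kernel parsed the bytes shown in the
dump as a message header. The midcomms offset and len values also add up to the
len reported on the following lowcomms line in both reports
(256 + 2640 = 2896 and 256 + 3840 = 4096).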

System info:
Gentoo Base System version 1.12.0_pre19
Portage 2.1_rc2-r3 (default-linux/amd64/2005.1, gcc-4.1.1, glibc-2.4-r3,
2.6.16-gentoo-r4 x86_64)
=================================================================
System uname: 2.6.16-gentoo-r4 x86_64 AMD Opteron(tm) Processor 246
distcc 2.18.3 x86_64-pc-linux-gnu (protocols 1 and 2) (default port 3632) [enabled]
dev-lang/python:     2.3.5-r2, 2.4.3-r1
dev-python/pycrypto: 2.0.1-r5
dev-util/ccache:     [Not Present]
dev-util/confcache:  [Not Present]
sys-apps/sandbox:    1.2.18
sys-devel/autoconf:  2.13, 2.59-r7
sys-devel/automake:  1.4_p6, 1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.6-r2
sys-devel/binutils:  2.16.1-r2
sys-devel/libtool:   1.5.22
virtual/os-headers:  2.6.11-r3

Comment 1 David Teigland 2006-10-17 17:07:39 UTC
Is the cman/dlm traffic running on a separate network from the
nfs traffic?  If not, is it possible to try that?


Comment 2 Nate Straz 2007-12-13 17:40:50 UTC
Moving all RHCS ver 5 bugs to RHEL 5 so we can remove RHCS v5 which never existed.

