Bug 193851 - corrupt dlm message during heavy nfs usage
Status: CLOSED INSUFFICIENT_DATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.0
Hardware: All Linux
Priority: medium Severity: medium
Assigned To: David Teigland
QA Contact: Red Hat Kernel QE team
Reported: 2006-06-02 02:20 EDT by Andrej Filipcic
Modified: 2009-09-03 12:50 EDT (History)
Doc Type: Bug Fix
Last Closed: 2006-12-12 16:49:04 EST
Description Andrej Filipcic 2006-06-02 02:20:35 EDT
Description of problem:

An NFS-exported GFS partition has severe problems with DLM. After heavy NFS load
from several NFS clients, dlm_recvd starts to consume all available CPU and
eventually locks up GFS access from the cluster nodes. nfsd consumes the rest of
the CPU on the (SMP) system. The only workaround is to reboot the server.
The same problem occurs with cluster-1.02.00, but the CVS version does not
corrupt the cluster state, so the rest of the cluster remains operational after
the problematic server is rebooted. The server also rejoins after the reboot
without any problem.


Version-Release number of selected component (if applicable):
cluster CVS stable, 2006-05-30

How reproducible:
After heavy NFS load, when accessing lots of small files from several NFS
clients.

Steps to Reproduce:
1.
2.
3.
  
Actual results:


Expected results:


Additional info:
dmesg log
----------------
dlm: midcomms: bad header version 18ce0378
dlm: midcomms: cmd=0, flags=0, length=256, lkid=4290794496, lockspace=4294934785
dlm: midcomms: base=ffff810048b83000, offset=256, len=2640, ret=256,
limit=00001000 newbuf=0
78 03 ce 18 00 00 00 01-00 54 c0 ff 01 81 ff ff
fe ff fe ff 02 81 ff ff-00 02 64 98 00 81 ff ff
15 d4 16 80 ff ff ff ff-01 00 00 00 00 00 00 00
00 54 c0 ff 01 81 ff ff-d0 00 00 00 00 00 00 00
01 00 01 00 05 00 48 00-9c 01 09 19 08 00 00 01
00 54 c0 ff 01 81 ff ff-fe ff fe ff 02 81 ff ff
00 02 64 98 00 81 ff ff-15 d4 16 80 ff ff ff ff
01 00 00 00 00 00 00 00-00 54 c0 ff 01 81 ff ff
d0 00 00 00 00 00 00 00-01 00 01 00 05 00 48 00
22 00 ee 18 08 00 00 01-00 54 c0 ff 01 81 ff ff
fe ff fe ff 02 81 ff ff-00 02 64 98 00 81 ff ff
15 d4 16 80 ff ff ff ff-01 00 00 00 00 00 00 00
00 54 c0 ff 01 81 ff ff-d0 00 00 00 00 00 00 00
01 00 01 00 05 00 48 00-63 02 c9 18 08 00 00 01
00 54 c0 ff 01 81 ff ff-fe ff fe ff 02 81 ff ff
00 02 64 98 00 81 ff ff-15 d4 16 80 ff ff ff ff
dlm: lowcomms: addr=ffff810048b83000, base=0, len=2896, iov_len=4096,
iov_base[0]=ffff810048b83b50, read=2896
dlm: midcomms: bad header version 18ce0378
dlm: midcomms: cmd=0, flags=0, length=256, lkid=4290794496, lockspace=4294934785
dlm: midcomms: base=ffff810048b83000, offset=256, len=3840, ret=256,
limit=00001000 newbuf=0
78 03 ce 18 00 00 00 01-00 54 c0 ff 01 81 ff ff
fe ff fe ff 02 81 ff ff-00 02 64 98 00 81 ff ff
15 d4 16 80 ff ff ff ff-01 00 00 00 00 00 00 00
00 54 c0 ff 01 81 ff ff-d0 00 00 00 00 00 00 00
01 00 01 00 05 00 48 00-9c 01 09 19 08 00 00 01
00 54 c0 ff 01 81 ff ff-fe ff fe ff 02 81 ff ff
00 02 64 98 00 81 ff ff-15 d4 16 80 ff ff ff ff
01 00 00 00 00 00 00 00-00 54 c0 ff 01 81 ff ff
d0 00 00 00 00 00 00 00-01 00 01 00 05 00 48 00
22 00 ee 18 08 00 00 01-00 54 c0 ff 01 81 ff ff
fe ff fe ff 02 81 ff ff-00 02 64 98 00 81 ff ff
15 d4 16 80 ff ff ff ff-01 00 00 00 00 00 00 00
00 54 c0 ff 01 81 ff ff-d0 00 00 00 00 00 00 00
01 00 01 00 05 00 48 00-63 02 c9 18 08 00 00 01
00 54 c0 ff 01 81 ff ff-fe ff fe ff 02 81 ff ff
00 02 64 98 00 81 ff ff-15 d4 16 80 ff ff ff ff
dlm: lowcomms: addr=ffff810048b83000, base=0, len=4096, iov_len=1200,
iov_base[0]=ffff810048b84000, read=1200
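
For reference, the fields printed by midcomms can be reproduced from the first
16 bytes of the hex dump. The Python sketch below assumes a little-endian layout
of a 32-bit version followed by cmd, flags, a 16-bit length, lkid and lockspace.
That layout is inferred only from the values in the log, not taken from the DLM
sources, and the names HEADER and decode_header are illustrative.

```python
import struct

# First 16 bytes of the dumped message, copied from the hex dump in the log.
raw = bytes.fromhex("78 03 ce 18 00 00 00 01 00 54 c0 ff 01 81 ff ff")

# Assumed layout (inferred from the log output, not from the DLM source):
# u32 version, u8 cmd, u8 flags, u16 length, u32 lkid, u32 lockspace,
# all little-endian with no padding.
HEADER = struct.Struct("<IBBHII")


def decode_header(buf):
    """Decode the assumed 16-byte header into a dict of named fields."""
    version, cmd, flags, length, lkid, lockspace = HEADER.unpack_from(buf, 0)
    return {
        "version": version,
        "cmd": cmd,
        "flags": flags,
        "length": length,
        "lkid": lkid,
        "lockspace": lockspace,
    }


hdr = decode_header(raw)
print("version=%08x cmd=%d flags=%d length=%d lkid=%d lockspace=%d" % (
    hdr["version"], hdr["cmd"], hdr["flags"],
    hdr["length"], hdr["lkid"], hdr["lockspace"]))
# prints: version=18ce0378 cmd=0 flags=0 length=256 lkid=4290794496 lockspace=4294934785

# Bytes 8..15 (lkid and lockspace) read as a single 64-bit little-endian value.
(as_u64,) = struct.unpack_from("<Q", raw, 8)
print("%016x" % as_u64)
# prints: ffff8101ffc05400
```

Under that assumed layout the decoded values match the "bad header version" and
"cmd=0, flags=0, length=256, ..." lines. Bytes 8 to 15 taken together resemble an
x86_64 kernel address similar to base=ffff810048b83000, which may indicate that
the buffer held unrelated memory rather than a DLM message.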

System info:
Gentoo Base System version 1.12.0_pre19
Portage 2.1_rc2-r3 (default-linux/amd64/2005.1, gcc-4.1.1, glibc-2.4-r3,
2.6.16-gentoo-r4 x86_64)
=================================================================
System uname: 2.6.16-gentoo-r4 x86_64 AMD Opteron(tm) Processor 246
distcc 2.18.3 x86_64-pc-linux-gnu (protocols 1 and 2) (default port 3632) [enabled]
dev-lang/python:     2.3.5-r2, 2.4.3-r1
dev-python/pycrypto: 2.0.1-r5
dev-util/ccache:     [Not Present]
dev-util/confcache:  [Not Present]
sys-apps/sandbox:    1.2.18
sys-devel/autoconf:  2.13, 2.59-r7
sys-devel/automake:  1.4_p6, 1.5, 1.6.3, 1.7.9-r1, 1.8.5-r3, 1.9.6-r2
sys-devel/binutils:  2.16.1-r2
sys-devel/libtool:   1.5.22
virtual/os-headers:  2.6.11-r3
Comment 1 David Teigland 2006-10-17 13:07:39 EDT
Is the cman/dlm traffic running on a separate network from the
nfs traffic?  If not, is it possible to try that?
Comment 2 Nate Straz 2007-12-13 12:40:50 EST
Moving all RHCS ver 5 bugs to RHEL 5 so we can remove RHCS v5 which never existed.
